# nanoServices Data Type Compliance Policy
*Generated: 2025-08-14 10:18:17 UTC*

This notebook documents the compliance rules for **data types** in the nanoServices core ontology and how those types are handled at the Java/Jena boundary.


## Scope
This policy covers:
- Unicode strings (`xsd:string`, Java `String`), including serialization encodings.
- Identifiers (`uid`/`pid`) via `nano:UniqueIdentifier`.
- Numeric weights via `nano:UnitInterval` (e.g., `priority`, `entropy`).
- RDF Resource handling (IRI vs Blank Node) at the Java ↔️ Jena adapter boundary.


## Decision Summary (authoritative)
- **Unicode everywhere**: Java `String` and `xsd:string` carry Unicode. All serializations use **UTF‑8**.
- **Identifiers (`uid`, `pid`)**:
  - Datatype: **`nano:UniqueIdentifier`** = `xsd:token` with pattern `[^ \t\r\n]+` (no whitespace).
  - Exactly one per instance: each `Port` has one `uid` and each `Process` has one `pid`, enforced via `owl:FunctionalProperty` + `owl:hasKey` + OWL cardinality 1.
  - **Empty string is forbidden**. If absent in the domain model, treat as `null`.
- **Numeric weights** (`priority`, `entropy`): **`nano:UnitInterval`** (decimal in **[0, 1]** inclusive).
- **RDF Resources**:
  - Domain model uses **`String`** for resource identifiers.
  - Adapter to Jena:
    - `null` ⇒ create **fresh Blank Node** (`ResourceFactory.createResource()`).
    - non-empty string ⇒ **IRI** (`ResourceFactory.createResource(value)`).
    - empty string `""` ⇒ **invalid** (reject before adapter).
- **No “has*” requirement**: Property names are relational/verb-like; compliance is about **ranges and cardinality**, not naming.
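
For ingestion tests that run without Jena, the resource rule above (null, non-empty, empty) can be mirrored in Python. This is a minimal sketch: `NodeKind` and `classify_resource_ref` are hypothetical names introduced here for illustration and are not part of the policy or of any library.

```python
from enum import Enum
from typing import Optional


class NodeKind(Enum):
    """Hypothetical tag for the kind of RDF node a domain value maps to."""
    BLANK = "blank"
    IRI = "iri"


def classify_resource_ref(value: Optional[str]) -> NodeKind:
    """Apply the adapter rule: None -> blank node, non-empty -> IRI, "" -> error."""
    if value is None:
        return NodeKind.BLANK
    if value == "":
        # The policy rejects empty strings before they reach the adapter.
        raise ValueError("Empty string is not a valid IRI; use None for a blank node.")
    return NodeKind.IRI
```

Raising on `""` rather than silently mapping it to a blank node keeps an accidental empty value from turning into an anonymous resource.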


## Type Matrix
| Concept | RDF/XSD Datatype | Value Space & Constraints | Notes |
|---|---|---|---|
| Text (labels, comments) | `xsd:string` | Unicode, any scalar value | Serialized as UTF‑8 |
| Unique Identifier (`uid`, `pid`) | `nano:UniqueIdentifier` | `xsd:token` with pattern `[^ \t\r\n]+` (no whitespace) | Exactly one per Port/Process |
| Numeric weight (`priority`, `entropy`) | `nano:UnitInterval` | `xsd:decimal`, `0.0 ≤ v ≤ 1.0` | Inclusive bounds |
| IRI for resources | IRI (RDF 1.1) | Unicode per RFC 3987 | Passed to Jena as string |
| Blank Node | (no lexical form) | Created via Jena factory | Use `null` in domain ⇒ fresh bnode |


## Java ↔️ Jena Adapter (reference)
```java
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.ResourceFactory;
import org.apache.jena.rdf.model.AnonId;

public final class RdfAdapters {
  private RdfAdapters() {}

  /** Convert domain value to a Jena Resource. 
   *  @param iriOrNull Unicode string IRI, or null for blank node
   *  @throws IllegalArgumentException if iriOrNull is empty
   */
  public static Resource toResource(String iriOrNull) {
    if (iriOrNull == null) {
      return ResourceFactory.createResource(); // fresh blank node
    }
    if (iriOrNull.isEmpty()) {
      throw new IllegalArgumentException("Empty string is not a valid IRI; use null for blank node.");
    }
    return ResourceFactory.createResource(iriOrNull);
  }

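  /** Explicit blank-node creation with a supplied label (optional).
   *  A labelled blank node is created via Model#createResource(AnonId) on a
   *  throwaway in-memory model; ResourceFactory only offers the no-argument
   *  and String forms of createResource.
   */
  public static Resource toBNode(String bnodeIdOrNull) {
    return (bnodeIdOrNull == null)
        ? ResourceFactory.createResource()
        : org.apache.jena.rdf.model.ModelFactory.createDefaultModel()
            .createResource(AnonId.create(bnodeIdOrNull));
  }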
}
```


```python
# Lightweight validators used during ingestion or tests.
from __future__ import annotations
import re
import unicodedata
from decimal import Decimal, InvalidOperation

# Both patterns are applied with fullmatch(), so no ^/$ anchors are needed.
# (With match() and "$", a value ending in "\n" would be accepted by mistake.)
UUID_V4_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}")  # lowercase hex only
UNIQUE_IDENTIFIER_RE = re.compile(r"[^\t\r\n ]+")  # no space, tab, CR, or LF


def normalize_unicode_nfc(s: str) -> str:
    """Normalize for stable comparisons (policy doesn't require it globally,
    but it's often wise for identifiers)."""
    return unicodedata.normalize("NFC", s)


def is_unique_identifier(value: str) -> bool:
    """nano:UniqueIdentifier compliance: no whitespace (token-based), non-empty."""
    return isinstance(value, str) and bool(UNIQUE_IDENTIFIER_RE.fullmatch(value))


def is_unit_interval(value: str | float | Decimal) -> bool:
    """nano:UnitInterval compliance: decimal in [0, 1]. Accept str/float/Decimal inputs."""
    try:
        v = Decimal(str(value))
    except (InvalidOperation, ValueError):
        return False
    # NaN and infinities parse as Decimal, but comparing NaN would raise.
    if not v.is_finite():
        return False
    return Decimal("0") <= v <= Decimal("1")


def is_uuid_v4(value: str) -> bool:
    """Optional helper; not mandated by policy, but provided for convenience."""
    return isinstance(value, str) and bool(UUID_V4_RE.fullmatch(value))

# Demo assertions (can be adjusted or removed by adopters)
assert is_unique_identifier("urn:uuid:550e8400-e29b-41d4-a716-446655440000")
assert not is_unique_identifier("")
assert not is_unique_identifier("trailing-newline\n")
assert is_unit_interval(0)
assert is_unit_interval("0.75")
assert not is_unit_interval("1.01")
assert not is_unit_interval(float("nan"))
assert is_uuid_v4("550e8400-e29b-41d4-a716-446655440000")
print("Sample validations passed.")
```


## Serialization Requirements
- **UTF‑8** encoding for **all** serializations (Turtle, JSON‑LD, RDF/XML, files & streams).
- RDF/XML must carry an explicit header:
  ```xml
  <?xml version="1.0" encoding="UTF-8"?>
  ```
- Writers in Java must open streams with `StandardCharsets.UTF_8`.
- Empty string `""` is **invalid** as IRI; use **null** to request a blank node.
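
On the Python side (ingestion scripts or tests), the same encoding rules can be checked with a short sketch like the one below. The helper names `write_utf8` and `has_utf8_xml_header` are hypothetical and introduced only for illustration; the header check is deliberately strict and assumes the declaration is the very first thing in the file.

```python
from pathlib import Path

XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>'


def write_utf8(path: Path, text: str) -> None:
    """Write text with an explicit UTF-8 encoding instead of the platform default."""
    path.write_text(text, encoding="utf-8")


def has_utf8_xml_header(path: Path) -> bool:
    """Return True if the file decodes as UTF-8 and starts with the XML declaration.

    A leading byte order mark or any other prefix makes the check fail.
    """
    try:
        content = path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return content.startswith(XML_HEADER)
```

A test could write a small RDF/XML document with `write_utf8` and assert that `has_utf8_xml_header` accepts it, while a file saved in another encoding with non-ASCII content is rejected.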


## Conformance Checklist
- [ ] All identifiers (`uid`, `pid`) are non-empty, contain no whitespace, and occur **exactly once** per instance.
- [ ] All numeric weights are decimals in **[0, 1]**.
- [ ] No empty-string IRIs are written; `null` becomes a blank node.
- [ ] All files are saved/transmitted as **UTF‑8**.
- [ ] (Optional) UUIDv4 used where suitable; pattern validated.
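
The identifier and weight items of the checklist can be automated per record. The following is a sketch under stated assumptions: a record is a plain `dict`, identifier values arrive as a list so that cardinality can be checked, and weights are validated only when present. `check_record` and the record layout are hypothetical and not defined by the policy.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Any, Dict, List

_NO_WHITESPACE = re.compile(r"[^ \t\r\n]+")


def check_record(record: Dict[str, Any], id_key: str = "uid") -> List[str]:
    """Return a list of checklist violations for one Port/Process-like record."""
    problems: List[str] = []

    # Identifier: exactly one value, non-empty, no whitespace.
    ids = record.get(id_key)
    if not isinstance(ids, list) or len(ids) != 1:
        problems.append(f"{id_key}: expected exactly one value")
    elif not (isinstance(ids[0], str) and _NO_WHITESPACE.fullmatch(ids[0])):
        problems.append(f"{id_key}: empty or contains whitespace")

    # Numeric weights: finite decimal in [0, 1], checked only if present.
    for key in ("priority", "entropy"):
        if key not in record:
            continue
        try:
            v = Decimal(str(record[key]))
        except (InvalidOperation, ValueError):
            problems.append(f"{key}: not a decimal in [0, 1]")
            continue
        if not v.is_finite() or not (0 <= v <= 1):
            problems.append(f"{key}: not a decimal in [0, 1]")

    return problems
```

An empty result means the record passed these checks; the encoding and blank-node items of the checklist still need to be verified at the serialization and adapter layers.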
