Feat/consys proto arrays#224

Draft
tipatterson-dev wants to merge 17 commits into master from feat/consys-proto-arrays

tipatterson-dev commented Jun 29, 2026

Depends on a change to core that adds support for custom endpoints in the Connected Systems module. See opensensorhub/osh-core#354

This change allows users to read and write protobuf-serialized messages according to the schema outlined in the included .proto files. Those files are the basis of an in-progress draft of Part 5 of OGC's Connected Systems API. This work also ties into efforts to align our MQTT implementation with Part 3 of the CS API.

- add cloud event types
- add a resource event publisher
  - Replace permission checks in ConSysApiMqttConnector with validator
  - Add unit tests for ConSysTopicValidator
- fix a parsing bug with # wildcards
- simplify test readability for validator

Recognize an optional format token appended to a Resource Data Topic
(e.g. .../observations:data/swe-json) per the OGC CS API Part 3 content
negotiation rules, and apply it as the request/response format.
ConSysTopicValidator gains DATA_SUFFIX, FORMAT_SUBTOPICS (json, swe-json,
swe-binary, swe-csv, swe-proto, om-json, sml-json), isDataTopic,
parseDataTopicFormat and stripDataSuffix; ConSysApiMqttConnector consumes
them for both publish (request content type) and subscribe (response format),
replacing the bare :data suffix handling.
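
The topic handling described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the class name `DataTopicSketch`, the method signatures, and the `Optional` return type are assumptions; only the constant and method names and the list of format tokens come from the commit message.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical stand-in for the topic helpers named in the commit message. */
final class DataTopicSketch {
    static final String DATA_SUFFIX = ":data";
    static final Set<String> FORMAT_SUBTOPICS = Set.of(
        "json", "swe-json", "swe-binary", "swe-csv", "swe-proto", "om-json", "sml-json");

    private DataTopicSketch() {}

    /** Returns the token after ":data/" if it is one of the recognized formats. */
    static Optional<String> parseDataTopicFormat(String topic) {
        int idx = topic.lastIndexOf(DATA_SUFFIX + "/");
        if (idx < 0) return Optional.empty();
        String token = topic.substring(idx + DATA_SUFFIX.length() + 1);
        return FORMAT_SUBTOPICS.contains(token) ? Optional.of(token) : Optional.empty();
    }

    /** True for "...:data" and for "...:data/<recognized format>". */
    static boolean isDataTopic(String topic) {
        return topic.endsWith(DATA_SUFFIX) || parseDataTopicFormat(topic).isPresent();
    }

    /** Drops ":data" and any recognized format token, leaving the resource path. */
    static String stripDataSuffix(String topic) {
        if (topic.endsWith(DATA_SUFFIX))
            return topic.substring(0, topic.length() - DATA_SUFFIX.length());
        if (parseDataTopicFormat(topic).isPresent())
            return topic.substring(0, topic.lastIndexOf(DATA_SUFFIX + "/"));
        return topic; // not a data topic: leave untouched
    }
}
```

In this sketch an unrecognized token (for example `:data/foo`) is treated as not being a data topic at all; whether the real validator rejects it or falls back to a default format is not stated in the commit message.
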
Add the sensorhub-service-consys-proto module skeleton: the Connected
Systems and SWE Common .proto definitions (including the swe_options field
annotations carrying SWE metadata), the Gradle build wiring the protobuf
plugin, and the IModuleProvider service registration. No Java yet —
the codec implementation follows in the next commit.

Implement the swe+proto codec end to end. Outbound: SWE→proto schema
generation (ProtoSchemaWriter, GeneratedSchemaCache, DataStreamSchemaCache)
and observation/command encoding (ProtoObsEncoder, ProtoFormat bindings).
Inbound: observation/command decode for ingestion (ProtoRecordDecoder,
ObsBindingProto, CommandBindingProto) and client schema ingestion that
rebuilds a SWE Common record from a wire FileDescriptorSet
(ProtoSchemaReader, DataStream/CommandStream schema bindings). Includes the
module service classes (Activator, ConSysApiProtoService, config, descriptor)
and the test suite.

…print cache

GeneratedSchemaCache now rebuilds the FileDescriptor/FileDescriptorSet on every
get() instead of memoizing per stream behind a structural fingerprint. The
hashing (fingerprint()/mix(), the entries map, Entry.fingerprint) is removed;
the class name and get(BigId, DataComponent) signature are kept (streamId
unused) so the cache can be reinstated without touching ProtoFormat or the
bindings.

The prior fingerprint-cache implementation is preserved on branch
parked/schema-fingerprint-cache.
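
For illustration, a rebuild-on-every-call "cache" that keeps its old signature might look like the sketch below. The class `RebuildingCacheSketch`, its generic parameters, and the `buildCount()` helper are hypothetical; the real class takes a `BigId` and a `DataComponent` and returns protobuf descriptors.

```java
import java.util.function.Function;

/** Hypothetical sketch: same get(id, structure) shape, but no memoization. */
final class RebuildingCacheSketch<S, D> {
    private final Function<S, D> builder;
    private int buildCount;

    RebuildingCacheSketch(Function<S, D> builder) {
        this.builder = builder;
    }

    /** streamId is kept only so callers need not change; it is unused here. */
    D get(long streamId, S structure) {
        buildCount++;
        return builder.apply(structure); // rebuilt on every call
    }

    /** Test hook (an assumption of this sketch) to observe per-call rebuilds. */
    int buildCount() {
        return buildCount;
    }
}
```

A test in the spirit of the updated TestGeneratedSchemaCache would then check that two calls trigger two builds and yield equal output, rather than checking that the second call was served from memory.
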

- TestGeneratedSchemaCache: assert per-call rebuild + deterministic output
  instead of build-count memoization.
- ObsBindingProto / CommandBindingProto / ProtoFormat: refresh now-stale
  "memoized/fingerprinted" comments.
Move bindings/codecs/schema machinery out of the flat proto package into
subpackages by concern: schema/ (writer, reader, both caches), codec/
(encoder, decoder), and datastreams/ observations/ controlstreams/ commands/
for the resource bindings. Service/format/infra classes (ProtoFormat,
ConSysApiProtoService, *Config, *Descriptor, Activator) stay at the root so
the deployment-config FQN, the IModuleProvider SPI entry, and the OSGi
Bundle-Activator keep resolving.

Add the now-required cross-package imports and make the four binding name
constants (OBS_PACKAGE/OBS_MESSAGE, CMD_PACKAGE/CMD_MESSAGE) public so
ProtoFormat reads them across the boundary. Tests mirror the layout.
No behavior change; full module suite green.

Increment 1 of DataArray support. Fixed-size arrays of scalars and of
records/vectors now round-trip on the same-node encode->decode path:

- ProtoObsEncoder: DataArray -> repeated field; K = getComponentCount()
  elements walked from the flat DataBlock (contiguous for fixed arrays).
- ProtoRecordDecoder: repeated field -> flat block, with a wire-length vs
  schema-size check.
- ProtoArrays.hasNonFlatLayout: shared guard so a fixed array hiding a
  DataChoice or variable-size sub-array (DataBlockList-backed, not flat-
  addressable) throws loudly in both stages instead of mis-reading atoms.
- Extracted scalarValue()/setAtom() so the array path reuses the scalar
  type switch (no fifth copy).

Not yet: variable-size arrays, Matrix, foreign ingest (ProtoSchemaReader
still rejects repeated fields) - follow-up increments.

Tests: TestProtoArrayRoundTrip. Full module suite green (37).
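
The fixed-array path can be pictured with the following sketch. It uses a plain `double[]` in place of the SWE DataBlock and a `List<Double>` in place of a protobuf repeated field; `FixedArraySketch` and its method names are hypothetical and only illustrate the contiguous walk and the wire-length check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the fixed-size array round trip on a flat block. */
final class FixedArraySketch {
    private FixedArraySketch() {}

    /** Copies k contiguous atoms, starting at offset, into the "repeated field". */
    static List<Double> encodeArray(double[] flatBlock, int offset, int k) {
        List<Double> repeated = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            repeated.add(flatBlock[offset + i]);
        }
        return repeated;
    }

    /** Writes the values back after checking wire length against schema size. */
    static void decodeArray(List<Double> repeated, double[] flatBlock, int offset, int k) {
        if (repeated.size() != k) {
            throw new IllegalStateException(
                "wire length " + repeated.size() + " != schema size " + k);
        }
        for (int i = 0; i < k; i++) {
            flatBlock[offset + i] = repeated.get(i);
        }
    }
}
```
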
Increment 2. Variable-size arrays now round-trip; the element count travels
as the repeated field's wire length (no separate size field needed).

- ProtoObsEncoder.encode/encodeCommand: bind struct.setData(data) so a
  variable array's getComponentCount() reflects the observation's data; drop
  the variable-size rejection.
- ProtoRecordDecoder: the choice pre-pass becomes prepass(), which also
  updateSize()s each variable array from msg.getRepeatedFieldCount() before
  createDataBlock() so the flat block is allocated correctly; drop the
  variable-size rejection.
- Non-flat (DataBlockList) elements — a fixed array hiding a DataChoice or a
  variable-size sub-array — still throw via ProtoArrays.hasNonFlatLayout.

Tests: variable-size round-trip (trailing scalar catches drift) + a direct
unit test of the non-flat guard. Full module suite green.
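
A rough model of why the pre-pass must run before allocation: the flat block's size depends on the repeated field's length, so the count has to be read first. The record layout below (one variable array followed by one trailing scalar) and the names in `VariableArraySketch` are assumptions made for the example.

```java
import java.util.List;

/** Hypothetical model: record = { array[var] of double, trailing double }. */
final class VariableArraySketch {
    private VariableArraySketch() {}

    /** Pre-pass: the block size is known only once the wire length is read. */
    static int prepassBlockSize(List<Double> repeated, int trailingScalars) {
        return repeated.size() + trailingScalars;
    }

    /** Allocates after the pre-pass, then fills the array and the trailing scalar. */
    static double[] decode(List<Double> repeated, double trailing) {
        double[] block = new double[prepassBlockSize(repeated, 1)];
        int i = 0;
        for (double v : repeated) {
            block[i++] = v;
        }
        block[i] = trailing; // lands in the last slot only if sizing was right
        return block;
    }
}
```

The trailing scalar plays the same role as in the tests mentioned above: if the array were sized wrongly, the scalar would end up at the wrong index or overflow the block.
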
Increment 3. A DataArray whose element is itself a DataArray (a Matrix row;
proto has no repeated-of-repeated) now round-trips via a generated wrapper
message holding the inner array as its single repeated field.

- ProtoSchemaWriter: route a DataArray element to the nested-message path;
  buildMessage already wraps a non-structured component as a one-field
  message, so the wrapper falls out for free (recurses for N-D).
- ProtoObsEncoder: treat a DataArray element as nested (encodeMessage wraps
  the inner array as the wrapper's field 1).
- ProtoRecordDecoder: new branch decodes each wrapper's field-1 inner array.

Fixed-size matrices only (their flat DataBlock is contiguous); variable-size
inner arrays remain non-flat and are rejected by ProtoArrays.hasNonFlatLayout.

Test: 2x3 array-of-array round-trip with a trailing scalar. Suite green.
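
The wrapper idea can be sketched without protobuf as a list of row objects, each holding one list. `MatrixWrapperSketch` and `RowWrapper` are hypothetical names; the generated wrapper message in the PR is a protobuf message with the inner array as field 1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of array-of-array encoded as repeated wrapper messages. */
final class MatrixWrapperSketch {
    private MatrixWrapperSketch() {}

    /** Stand-in for the generated wrapper: a single repeated field. */
    static final class RowWrapper {
        final List<Double> values = new ArrayList<>();
    }

    /** Splits a contiguous row-major block into one wrapper per row. */
    static List<RowWrapper> encode(double[] flat, int rows, int cols) {
        if (flat.length != rows * cols) {
            throw new IllegalArgumentException("block is not " + rows + "x" + cols);
        }
        List<RowWrapper> out = new ArrayList<>(rows);
        for (int r = 0; r < rows; r++) {
            RowWrapper w = new RowWrapper();
            for (int c = 0; c < cols; c++) {
                w.values.add(flat[r * cols + c]);
            }
            out.add(w);
        }
        return out;
    }

    /** Reads each wrapper's inner list back into a row-major block. */
    static double[] decode(List<RowWrapper> wrappers, int cols) {
        double[] flat = new double[wrappers.size() * cols];
        for (int r = 0; r < wrappers.size(); r++) {
            List<Double> row = wrappers.get(r).values;
            if (row.size() != cols) {
                throw new IllegalStateException("row " + r + " has " + row.size() + " values");
            }
            for (int c = 0; c < cols; c++) {
                flat[r * cols + c] = row.get(c);
            }
        }
        return flat;
    }
}
```
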
Increment 4. ProtoSchemaReader (the foreign-schema ingest path) no longer
rejects repeated fields — it rebuilds a variable-size DataArray:

- buildArray(): element from the field's scalar/record type, or — for a
  Matrix-row wrapper message (one repeated field) — a nested DataArray
  (recursive, so N-D matrices ingest too).
- The array is implicit variable-size (an elementCount carrying a Count whose
  value is unset → isVariableSize()), so ProtoRecordDecoder.prepass() sizes it
  from the repeated field's wire length. A foreign descriptor carries no
  count, so ingested arrays are necessarily variable-size.

Test: encode with the original struct, decode with the reader-rebuilt one
(the foreign-ingest property). Suite green.

Increment 5. ProtoFormat.isCompatible no longer returns true unconditionally
(its standing TODO): it returns canEncode(recordStructure), so a datastream
whose structure the codec can't handle no longer lists swe+proto among its
formats only to fail with a 500 on encode.

canEncode rejects:
- structures with a non-flat array (a DataArray whose element hides a
  DataChoice or variable-size sub-array — DataBlockList-backed; the writer
  emits these but the flat-index codec can't walk them), via
  ProtoArrays.hasNonFlatLayout (made public);
- anything the schema build throws on (Geometry, unmapped scalar types,
  invalid descriptor) — caught by attempting write()+resolve().

Test: TestProtoFormat. Suite green.
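
The two rejection paths can be sketched as a guard followed by a trial build. Everything below is a hypothetical shape: the generic `canEncode` signature, the `SchemaBuilder` interface, and the predicate parameter are assumptions standing in for ProtoArrays.hasNonFlatLayout and the write()+resolve() attempt.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of a "guard, then try to build" compatibility check. */
final class CompatibilitySketch {
    private CompatibilitySketch() {}

    /** Stand-in for the schema build step, which throws on unsupported input. */
    interface SchemaBuilder<T> {
        void build(T structure) throws Exception;
    }

    static <T> boolean canEncode(T structure,
                                 Predicate<T> hasNonFlatArray,
                                 SchemaBuilder<T> builder) {
        if (hasNonFlatArray.test(structure)) {
            return false; // the codec cannot walk this layout
        }
        try {
            builder.build(structure); // any failure means "not encodable"
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Running the real build as the check keeps the compatibility answer in step with what the encoder can actually do, at the cost of building the schema once more per check.
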
…rows)

Proves a capability the increment-3 message understated ("fixed matrices
only"). The codec's only constraint is that every array ELEMENT has a fixed
atom count, so a variable-size array is fine as the OUTER matrix dimension:
array[var n] of (array[3] double) — a variable count of fixed-width rows —
round-trips on a flat block. Only a variable INNER array (ragged rows) or an
array-of-choice is DataBlockList-backed and rejected.

Increment 6. A nested DataArray with BOTH dimensions variable (M x N, sized by
count components, rectangular / "square per message") now round-trips — M and N
may differ per message. Verified empirically that SWE lays these out as a flat
DataBlockMixed (NOT a DataBlockList), so the flat-index codec handles them; the
earlier guard rejected them on a wrong premise.

- ProtoArrays: hasNonFlatLayout -> elementHasChoice. Rectangular arrays (fixed
  or variable, nested) stay flat and are no longer flagged; only a DataChoice
  in an array element is still rejected (the pre-pass doesn't apply per-element
  selections inside an array yet).
- ProtoRecordDecoder.prepass: size the (shared) inner array dimension from row
  0's wrapper message, so createDataBlock allocates the full M x N flat block.
  A ragged row then mismatches the per-row wireCount check and throws loudly.
- ProtoFormat: hasNonFlatArray -> hasArrayWithChoiceElement.

Encoder already handled it via setData + the nested-array path. Tests: M,N
varying per message (2x3, 3x2, 1x5) on one schema; updated guard unit test.
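
The decode side for a rectangular M x N message can be modelled as below: the inner dimension is taken from row 0, the full flat block is allocated, and any row of a different length fails the per-row check. `RectangularDecodeSketch` is a hypothetical name, and nested lists stand in for the repeated wrapper messages.

```java
import java.util.List;

/** Hypothetical model of sizing a variable M x N block from row 0. */
final class RectangularDecodeSketch {
    private RectangularDecodeSketch() {}

    static double[] decode(List<List<Double>> rows) {
        int m = rows.size();
        int n = (m == 0) ? 0 : rows.get(0).size(); // shared inner dimension
        double[] flat = new double[m * n];         // allocated before filling
        for (int r = 0; r < m; r++) {
            List<Double> row = rows.get(r);
            if (row.size() != n) {
                throw new IllegalStateException(
                    "ragged row " + r + ": " + row.size() + " values, expected " + n);
            }
            for (int c = 0; c < n; c++) {
                flat[r * n + c] = row.get(c);
            }
        }
        return flat;
    }
}
```
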