Tagged components #6889
One instance of tagged components that @jleibs and I discussed recently is fill and wireframe color, in a situation where both solid fill and wireframe are supported at the same time (another outcome of the discussion was that we don't want to support that right now; more on that later and elsewhere).
This seems like potentially a lot of added fundamental complexity, both in the implementation and possibly in the API. I wonder if there's a simpler way to get the same effect, and I also wonder what exactly the semantics of this tag system are going to be.
We had a long discussion about leaf (née "out-of-tree") transforms a couple of days back, during which the desire for tagged components came up a lot. Again. In parallel, @gavrelina has been experimenting with the Dataframe View and has hit more of the same issues. We keep running into data modeling deadlocks, again and again.

My guess is that tagged components will greatly influence the design of our data model and query language, and to an extent even the UX of the viewer itself. As such, I think it's really important that we get there sooner rather than later. It would also be quite nice to be completely done with major ABI-breaking changes ASAP, before we enter the ✨ disk-based era ✨.

Thus, here's a quick proposal to move us towards that goal.

Context

Today, the atomic unit of data in Rerun is a Chunk column. A Chunk column is fully qualified by two things: a Rerun
The two are, for the most part, completely orthogonal to one another. This information is all stored within the column metadata, and denormalized into the store indices as needed.

At runtime, a Rerun system will look for a piece of data by searching for the semantic it's interested in, and then interpreting the returned data based on its datatype:

```rust
let data = store.latest_at("my_entity", TimeInt::MAX, "rerun.components.Position3D")?; // untyped (!)
let data = data.try_downcast::<&[[f32; 3]]>()?; // typed
```

All of this works pretty nicely, except for one major limitation: you cannot re-use the same semantics twice (or more) on a single entity. Example: imagine an entity This problem infects everything and leads to all kinds of nasty data modeling deadlocks all over the place.

Proposal

The core idea of this proposal is trivial: to replace the very limited Rerun We should be able to get there in small increments that can be merged as they come, with complete feature parity and no visible impact on end-users whatsoever. Once we're there, we'll be able to start experimenting with all kinds of crazy ideas.

Data model changes

A Chunk column would still be fully qualified by two bits of information: a Rerun A

```rust
/// A [`ComponentDescriptor`] fully describes the semantics of a column of data.
pub struct ComponentDescriptor {
    /// Optional name of the `Archetype` associated with this data.
    ///
    /// `None` if the data wasn't logged through an archetype.
    ///
    /// Example: `rerun.archetypes.Points3D`.
    archetype_name: Option<ArchetypeName>,

    /// Semantic name associated with this data.
    ///
    /// Example: `rerun.components.Position3D`.
    component_name: ComponentName,

    /// Optional label to further qualify the data.
    ///
    /// Example: "positions".
    //
    // TODO: Maybe it's a dedicated type or an `InternedString` or w/e, doesn't matter.
    tag: Option<String>,
}

// NOTE: Take a careful look at this implementation, so you know what I mean later in this doc.
//
// Examples:
// * `rerun.archetypes.Points3D::rerun.components.Position3D#positions`
// * `rerun.components.Translation3D#translation`
// * `third_party.barometric_pressure`
impl std::fmt::Display for ComponentDescriptor {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let ComponentDescriptor {
            archetype_name,
            component_name,
            tag,
        } = self;

        // The component name is always present; the archetype prefix and tag suffix are optional.
        match (archetype_name, tag) {
            (None, None) => write!(f, "{component_name}"),
            (Some(archetype_name), None) => write!(f, "{archetype_name}::{component_name}"),
            (None, Some(tag)) => write!(f, "{component_name}#{tag}"),
            (Some(archetype_name), Some(tag)) => {
                write!(f, "{archetype_name}::{component_name}#{tag}")
            }
        }
    }
}
```

This information could be trivially code generated already today, and is a strict superset of the status quo. This would already be enough to get rid of our old indicator components.

IDL changes

I'm going to use The major change at the IDL level is that the entire (NOTE: I've omitted the usual attributes in all the IDL samples below. They haven't changed in any way.) I.e.

```
table Points3D {
  positions: [rerun.components.Position3D] (/* … */);
  radii: [rerun.components.Radius] (/* … */);
  colors: [rerun.components.Color] (/* … */);
  labels: [rerun.components.Text] (/* … */);
  class_ids: [rerun.components.ClassId] (/* … */);
  keypoint_ids: [rerun.components.KeypointId] (/* … */);
}
```

into this (the generated

```
table Points3D {
  // ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Position3D#positions"
  positions: [rerun.datatypes.Vec3D] ("attr.rerun.component": "rerun.components.Position3D", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Radius#radii"
  radii: [rerun.datatypes.Float32] ("attr.rerun.component": "rerun.components.Radius", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Color#colors"
  colors: [rerun.datatypes.UInt32] ("attr.rerun.component": "rerun.components.Color", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Label#labels"
  labels: [rerun.datatypes.Utf8] ("attr.rerun.component": "rerun.components.Label", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.ClassId#class_ids"
  class_ids: [rerun.datatypes.UInt16] ("attr.rerun.component": "rerun.components.ClassId", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.KeypointId#keypoint_ids"
  keypoint_ids: [rerun.datatypes.UInt16] ("attr.rerun.component": "rerun.components.KeypointId", /* … */);
}
```
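Because every field of the generated tables above maps to exactly one descriptor (archetype name, component name, and the field name as tag), the code generation mentioned earlier could plausibly be as simple as the sketch below. It is a hedged, self-contained illustration rather than Rerun code: `archetype_descriptors` is a hypothetical helper, and `ArchetypeName`/`ComponentName` are replaced by plain `&'static str` aliases purely so the snippet compiles on its own. Its `Display` impl is an `if let` rewrite of the match-based one above and should produce the same strings.

```rust
use std::fmt;

// Hypothetical stand-ins: the proposal uses dedicated `ArchetypeName` /
// `ComponentName` types; plain `&'static str` keeps this sketch self-contained.
type ArchetypeName = &'static str;
type ComponentName = &'static str;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ComponentDescriptor {
    archetype_name: Option<ArchetypeName>,
    component_name: ComponentName,
    tag: Option<&'static str>,
}

impl fmt::Display for ComponentDescriptor {
    // `archetype::component#tag`, with the `archetype::` prefix and the
    // `#tag` suffix only written when they are present.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if let Some(archetype_name) = self.archetype_name {
            write!(f, "{archetype_name}::")?;
        }
        write!(f, "{}", self.component_name)?;
        if let Some(tag) = self.tag {
            write!(f, "#{tag}")?;
        }
        Ok(())
    }
}

/// Hypothetical codegen helper: one descriptor per archetype field, with the
/// field name used as the tag (as in the `ComponentDescriptor:` IDL comments).
fn archetype_descriptors(
    archetype_name: ArchetypeName,
    fields: &[(&'static str, ComponentName)],
) -> Vec<ComponentDescriptor> {
    fields
        .iter()
        .map(|&(field_name, component_name)| ComponentDescriptor {
            archetype_name: Some(archetype_name),
            component_name,
            tag: Some(field_name),
        })
        .collect()
}

fn main() {
    let descriptors = archetype_descriptors(
        "rerun.archetypes.Points3D",
        &[
            ("positions", "rerun.components.Position3D"),
            ("radii", "rerun.components.Radius"),
            ("colors", "rerun.components.Color"),
        ],
    );
    for descriptor in &descriptors {
        // e.g. "rerun.archetypes.Points3D::rerun.components.Position3D#positions"
        println!("{descriptor}");
    }
}
```

Using the field name as the tag is what keeps two fields with the same component (such as `vertex_colors` and `albedo_factor` below) distinguishable.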
```
table Mesh3D {
  vertex_positions: [rerun.components.Position3D] (/* … */);
  triangle_indices: [rerun.components.TriangleIndices] (/* … */);
  vertex_normals: [rerun.components.Vector3D] (/* … */);
  vertex_colors: [rerun.components.Color] (/* … */);
  vertex_texcoords: [rerun.components.Texcoord2D] (/* … */);
  albedo_factor: [rerun.components.AlbedoFactor] (/* … */);
  class_ids: [rerun.components.ClassId] (/* … */);
}
```

into this (the generated

```
table Mesh3D {
  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Position3D#vertex_positions"
  vertex_positions: [rerun.datatypes.Vec3D] ("attr.rerun.component": "rerun.components.Position3D", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.TriangleIndices#triangle_indices"
  triangle_indices: [rerun.datatypes.UVec3D] ("attr.rerun.component": "rerun.components.TriangleIndices", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Vector3D#vertex_normals"
  vertex_normals: [rerun.datatypes.Vec3D] ("attr.rerun.component": "rerun.components.Vector3D", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Color#vertex_colors"
  vertex_colors: [rerun.datatypes.UInt32] ("attr.rerun.component": "rerun.components.Color", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Texcoord2D#vertex_texcoords"
  vertex_texcoords: [rerun.datatypes.Vec2D] ("attr.rerun.component": "rerun.components.Texcoord2D", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Color#albedo_factor"
  albedo_factor: [rerun.datatypes.UInt32] ("attr.rerun.component": "rerun.components.Color", /* … */);

  // ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.ClassId#class_ids"
  class_ids: [rerun.datatypes.UInt16] ("attr.rerun.component": "rerun.components.ClassId", /* … */);
}
```

Logging changes

Logging archetypes will yield fully-specified

```python
rr.log(
    "points_and_mesh",
    rr.Points3D(
        # ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Position3D#positions"
        [[0, 0, 0], [1, 1, 1]],
        # ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Radius#radii"
        radii=10,
        # ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Color#colors"
        colors=[1, 1, 1],
        # ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.Label#labels"
        labels="some_label",
        # ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.ClassId#class_ids"
        class_ids=42,
        # ComponentDescriptor: "rerun.archetypes.Points3D::rerun.components.KeypointId#keypoint_ids"
        keypoint_ids=666,
    ),
)
```
```python
rr.log(
    "points_and_mesh",
    rr.Mesh3D(
        # ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Position3D#vertex_positions"
        vertex_positions=[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        # ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Vector3D#vertex_normals"
        vertex_normals=[0.0, 0.0, 1.0],
        # ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Color#vertex_colors"
        vertex_colors=[[0, 0, 255], [0, 255, 0], [255, 0, 0]],
        # ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.TriangleIndices#triangle_indices"
        triangle_indices=[2, 1, 0],
        # ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.Color#albedo_factor"
        albedo_factor=[32, 32, 32],
        # ComponentDescriptor: "rerun.archetypes.Mesh3D::rerun.components.ClassId#class_ids"
        class_ids=420,
    ),
)
```

Logging components directly omits the archetype part of the descriptor:

```python
rr.log(
    "points_and_mesh",
    rr.components.Translation3D(
        # ComponentDescriptor: "rerun.components.Translation3D#translation"
        translation=[1, 2, 3],
    ),
)
```

A third-party ad-hoc component might not even have a tag at all…

```python
rr.log(
    "points_and_mesh",
    # ComponentDescriptor: "third_party.size"
    rr.AnyValues({"third_party.size": 42}),
)
```

…although we could expose ways of setting one:

```python
rr.log(
    "points_and_mesh",
    # ComponentDescriptor: "third_party.size#some_tag"
    rr.AnyValues({"third_party.size": 42}, tag="some_tag"),
)
```

Store changes

Columns are now uniquely identified by a This means we never overwrite data from an archetype with data from another one. We store everything, so we can do whatever we want. The batcher and other compaction systems will never merge two columns with different descriptors.

Indexing-wise, the store will add an extra layer of indices for tags (

Query changes

Queries don't look for a E.g. to look for all columns with position semantics:
Here are a few example queries using the

```
LatestAt(TimeInt::MAX) @ "points_and_mesh" for (*, *, *):
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.Position3D", "positions" }
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.Radius", "radii" }
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.Color", "colors" }
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.Label", "labels" }
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.ClassId", "class_ids" }
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.KeypointId", "keypoint_ids" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Position3D", "vertex_positions" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Vector3D", "vertex_normals" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Color", "vertex_colors" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.TriangleIndices", "triangle_indices" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Color", "albedo_factor" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.ClassId", "class_ids" }

LatestAt(TimeInt::MAX) @ "points_and_mesh" for (*, "rerun.components.Position3D", *):
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.Position3D", "positions" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Position3D", "vertex_positions" }

LatestAt(TimeInt::MAX) @ "points_and_mesh" for (*, "rerun.components.Color", *):
- ComponentDescriptor { "rerun.archetypes.Points3D", "rerun.components.Color", "colors" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Color", "vertex_colors" }
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Color", "albedo_factor" }

LatestAt(TimeInt::MAX) @ "points_and_mesh" for (*, "rerun.components.Color", "albedo_factor"):
- ComponentDescriptor { "rerun.archetypes.Mesh3D", "rerun.components.Color", "albedo_factor" }
```

It's basically pattern matching. This should be fairly trivial to implement on the query side.

Viewer changes

Today, each visualizer indicates the In that world, each visualizer would not only show the

Examples
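The store and query changes above can be sketched in a few lines, assuming a toy in-memory model rather than the real chunk store: columns are keyed by the full descriptor (so the three `Color` columns on `points_and_mesh` coexist), and a query is a pattern in which each `None` plays the role of `*`. `EntityColumns`, `Pattern` and `descr` are hypothetical names introduced only for this illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified descriptor (plain strings instead of Rerun's real types).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ComponentDescriptor {
    archetype_name: Option<&'static str>,
    component_name: &'static str,
    tag: Option<&'static str>,
}

fn descr(archetype: &'static str, component: &'static str, tag: &'static str) -> ComponentDescriptor {
    ComponentDescriptor {
        archetype_name: Some(archetype),
        component_name: component,
        tag: Some(tag),
    }
}

/// Query pattern: each `None` acts as the `*` wildcard from the examples above.
/// (Simplification: this sketch cannot express "must have *no* archetype/tag".)
struct Pattern {
    archetype_name: Option<&'static str>,
    component_name: Option<&'static str>,
    tag: Option<&'static str>,
}

impl Pattern {
    fn matches(&self, d: &ComponentDescriptor) -> bool {
        self.archetype_name.map_or(true, |a| d.archetype_name == Some(a))
            && self.component_name.map_or(true, |c| d.component_name == c)
            && self.tag.map_or(true, |t| d.tag == Some(t))
    }
}

/// Toy per-entity store: one column per *full* descriptor, so columns sharing a
/// component name (e.g. three `Color` columns) never overwrite each other.
#[derive(Default)]
struct EntityColumns {
    columns: BTreeMap<ComponentDescriptor, Vec<u32>>,
}

impl EntityColumns {
    fn insert(&mut self, descriptor: ComponentDescriptor, data: Vec<u32>) {
        self.columns.insert(descriptor, data);
    }

    /// Returns every descriptor matching `pattern`, in descriptor order.
    fn query(&self, pattern: &Pattern) -> Vec<&ComponentDescriptor> {
        self.columns.keys().filter(|d| pattern.matches(d)).collect()
    }
}

fn main() {
    let mut entity = EntityColumns::default();
    // Packed RGBA values, standing in for the `rerun.datatypes.UInt32` color datatype.
    entity.insert(descr("rerun.archetypes.Points3D", "rerun.components.Color", "colors"), vec![0xFFFFFFFF]);
    entity.insert(
        descr("rerun.archetypes.Mesh3D", "rerun.components.Color", "vertex_colors"),
        vec![0x0000FFFF, 0x00FF00FF, 0xFF0000FF],
    );
    entity.insert(descr("rerun.archetypes.Mesh3D", "rerun.components.Color", "albedo_factor"), vec![0x202020FF]);

    // (*, "rerun.components.Color", *): all three color columns.
    let any_color = Pattern { archetype_name: None, component_name: Some("rerun.components.Color"), tag: None };
    for d in entity.query(&any_color) {
        println!("{d:?}");
    }
}
```

Keying the map by the whole descriptor is what guarantees nothing gets overwritten; this sketch simply scans every key per query, whereas a real store would presumably rely on the extra tag indices mentioned above.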
I disagree with this. At most, it's partially orthogonal. My big takeaway from the last year of working with our components and datatypes is that the line between semantics and datatypes is far from clear. I think this gives us an opportunity to improve that dramatically, and we might as well plan to take advantage of it.

The previous strict hierarchy of component > datatype forced us into arbitrary distinctions. For example, consider color. It was fairly clear that Color should be a component. But if color was the component, then the encoding of the color must (by necessity) be a datatype. This led to datatypes like

Tags appear to give us a flexible yet structured way to introduce more dimensionality to the way we talk about data. In particular, things like color spaces, units, encodings, etc. all probably fit much better into semantic tags than they do in the datatype system. For me, this path leads to, for example, talking about rotations as:
We could theoretically try to have BOTH component-tags and datatype-tags, but I suspect the nitpicking over which kind of tag goes where would be endless while providing little to no practical value. The logical end-point is that datatypes are JUST Arrow schema aliases: they ONLY talk about the shape of the data (e.g. Vec4f) and never about what that data represents (e.g. Quat4f). Everything else becomes a component-descriptor tag. This generally means that "datatype conversions" are totally generic and follow very standard software equivalents:
A big chunk of the complexity previously attributed to datatype conversions now becomes "component descriptor conversions". Same problem, new name.
Agree 100%
I had a discussion with @abey79 today that touched upon this issue. For the graph primitives PR #7500, it would be great if we could log multiple components of the same datatype to the same entity, possibly by distinguishing them with tags, so that edges and nodes can live in the same entity while still allowing fields like

I was wondering: how will that work with selections? If I understand correctly, we currently refer to "objects" in an entity by their
Consider the `Color` component as applied to a `Mesh3D` archetype. Does the `Color` specify the color of faces, edges, or vertices?

To support this, we could introduce tagged components, allowing each entity to have multiple instances of the same component, as long as they have different tags. Each `(Tag, ComponentName)` tuple would be its own column of data.

Workarounds

For now we can introduce multiple components instead: `SolidColor`, `EdgeColor`, `VertexColor`, etc.

References
https://www.flecs.dev/flecs/md_docs_2Relationships.html
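To make the `(Tag, ComponentName)` column idea from the issue concrete, here is a minimal sketch assuming a plain `HashMap` keyed by that tuple; `ColumnKey` and `build_mesh_columns` are hypothetical names, and none of this is an actual Rerun API. It contrasts an untagged `Color` column, where a second write simply replaces the first, with separately tagged columns for faces, edges and vertices.

```rust
use std::collections::HashMap;

// Hypothetical column key, as suggested in the issue: `(Tag, ComponentName)`.
// `None` stands for "untagged", i.e. every column as it exists today.
type ColumnKey = (Option<&'static str>, &'static str);

const COLOR: &str = "rerun.components.Color";

fn build_mesh_columns() -> HashMap<ColumnKey, Vec<u32>> {
    let mut columns: HashMap<ColumnKey, Vec<u32>> = HashMap::new();

    // Untagged: there is only room for one `Color` column, so the second write
    // replaces the first and we can't tell faces from edges from vertices.
    columns.insert((None, COLOR), vec![0xFF0000FF]);
    columns.insert((None, COLOR), vec![0x00FF00FF]);

    // Tagged: faces, edges and vertices each get their own `Color` column,
    // instead of separate `SolidColor` / `EdgeColor` / `VertexColor` components.
    for (tag, rgba) in [("faces", 0xFF0000FF), ("edges", 0x000000FF), ("vertices", 0x00FF00FF)] {
        columns.insert((Some(tag), COLOR), vec![rgba]);
    }

    columns
}

fn main() {
    let columns = build_mesh_columns();
    for ((tag, component), data) in &columns {
        println!("{component} tag={tag:?} -> {data:?}");
    }
}
```

The tagged variant keeps a single `Color` semantic while letting the tag answer the faces/edges/vertices question, which is the trade-off against the multiple-components workaround.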