Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate legacy splats #1893

Closed
Tracked by #1619
teh-cmc opened this issue Apr 18, 2023 · 5 comments · Fixed by #6104
Closed
Tracked by #1619

Eliminate legacy splats #1893

teh-cmc opened this issue Apr 18, 2023 · 5 comments · Fixed by #6104
Assignees
Labels
⛃ re_datastore affects the datastore itself 🔍 re_query affects re_query itself

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Apr 18, 2023

Splats used to require a dedicated row with a special InstanceKey and associated machinery.

With the new store, anything that doesn't match the explicit number of instances given by the row should be treated as a splat:

  • within a row with num_instances == 0, a component of unit length is a splat
  • within a row with num_instances == 1, a component of unit length is not a splat
  • within a row with num_instances == N (N > 1), a component of unit length is a splat

Note: the renderer uses the old InstanceKey::SPLAT constant to drive some related (but different) logic in the selection/hovering subsystems.

Fixing this will fix #1014 in the process.

@teh-cmc teh-cmc changed the title Fix splats all around (rs sdk, py sdk, re_query...) Eliminate legacy splats Apr 18, 2023
@teh-cmc teh-cmc added ⛃ re_datastore affects the datastore itself 🔍 re_query affects re_query itself labels Apr 18, 2023
@Wumpf
Copy link
Member

Wumpf commented Apr 18, 2023

Expanding on the InstanceKey::SPLAT part:
Right now this is the instance id used to indicate that there is no particular instance id and the entire object is selected. Ideally, we would also use zero there since this is the default instance key for all picking layer renderings whenever no instance key was specified directly.
Afaik the only reason we can't do that yet is that we resolve "context dependent selection" when setting up the picking layer not before: We should always render the same picking layer out no matter what is selected at the moment, but right now we may or may not write out SPLAT for instance keys depending on whether an object or an instance is selectable. This ofc means that the picking layer needs to transfer this semantic so it can be picked up again when resolving a retrieved picking layer rectangle - 0 would mean instance 0, SPLAT means all instances. Instead, we should put this logic in the code that reads the picking layer back.
Fixing it this way should give a small performance & simplicity boost to SceneParts!

@Wumpf
Copy link
Member

Wumpf commented Apr 18, 2023

Created separate ticket for that ⬆️

@jleibs
Copy link
Member

jleibs commented Apr 18, 2023

within a row with num_instances == 1, a component of unit length is not a splat

I've mentioned it elsewhere, but I want to note that the legacy explicit-splat design did allow us to differentiate between these two calls:

rr.log_points("points", positions = [p1], colors = csplat)
rr.log_points("points", positions = [p1], colors = [c1])

which have different behavior if later followed by the call:

# Just log new positions; rely on previous splat for colors.
rr.log_points("points", positions = [p1, p2])

We can work around this by, again, logging a separate row, but then we're back to having multiple rows again. Alternatively we can only log the extra row if the primary component is length-1 but that special-casing feels worse to me because it's going to create a host of new special edge-case handling for the 1-element path, which starts to feel a lot like going back to having mono-records.

@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 10, 2023

That's where it becomes funky...

If you do this:

rr.log("random", rr.Points3D(positions, colors=colors, radii=radii))
rr.log_components("random", [rr.components.ColorBatch([255, 0, 0])])

Then you're going to end up with the original colors being discarded, a single red point and the rest of the points using the default color for this entity path (because that ColorBatch is not a splat).

Now, there is a trick at your disposal... you could do this:

rr.log("random", rr.Points3D(positions, colors=colors, radii=radii))
rr.log_components("random", [rr.components.ColorBatch([255, 0, 0])], num_instances=2)

And now you'll end up with only red points, because you explicitly said that the data was 2 instances wide, and so the log function considers the ColorBatch to be a splat...

Of course we could change things so that logging 1 thing is always considered a splat, but then you have the opposite problem, which might or might not be better depending on the situation 🤷.

And this is why I don't like that splats are a logging-time rather than a query-time concern: the view should get to decide what to do with the data that it has as its disposal, and that behavior should be configurable through blueprints and through the UI.
This instance key business is pretty similar to e.g. configurable texture clamping modes in gfx APIs after all.

@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 12, 2023

Here's another interesting case:

import rerun as rr
import numpy as np

rr.init("rerun_example_arrows_splat", spawn=True)

vertex_positions = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], dtype=np.float32)

rr.log(
    "triangle",
    rr.Mesh3D(
        vertex_positions=vertex_positions,
        vertex_normals=[0.0, 0.0, 1.0],
        vertex_colors=[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    ),
    [rr.Arrows3D.indicator()],
)

image
image

Look at how the Arrow is rendered.

The arrow was logged as a splat, so it has a single entry whose instance-key is u64::MAX.

Since we also logged an arrow indicator, the viewer will query an Arrow3D archetype using that same mesh data, except this time the arrow is our primary/required component, not an optional one.
The archetype query will try to join our one single arrow with the available positions and colors, but since position and color are not splatted, there isn't a single instance key in common and so the join doesn't yield anything.
The result is that we draw the arrow on its own, with the default origin and default color.

The "fix" is to log the normal multiple times so that it doesn't get marked as a splat at log-time:

-        vertex_normals=[0.0, 0.0, 1.0],
+        vertex_normals=[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],

image

This would not be an issue if splatting was a query-time concern.


Another somewhat related issue here is how our query model works in and of itself: since we join optional components onto primary components, even if the above issue was fixed, we would still only show a single arrow (properly joined with the first position and first color, but still, something to think about...).

@teh-cmc teh-cmc self-assigned this Mar 14, 2024
teh-cmc added a commit that referenced this issue Apr 8, 2024
This implements the new uncached latest-at APIs, and introduces some
basic types for promises.

Tests and benchmarks have been backported from `re_query`.

We already get a pretty decent improvement because the join process
(clamped-zip) is cheaper and we don't need to query for instance keys at
all:
```
group                         re_query                                re_query2
-----                         --------                                ---------
arrow_batch_points2/query     1.39      2.5±0.03µs 379.7 MElem/sec    1.00  1810.6±23.62ns 526.7  MElem/sec
arrow_mono_points2/query      1.44   1082.7±8.66µs 902.0 KElem/sec    1.00    753.6±9.28µs 1295.9 KElem/sec
```

- Fixes #3379
- Part of #1893  

Here's an example/guide of using the new API:
```rust
// First, get the raw results for this query.
//
// Raw here means that these results are neither deserialized, nor resolved/converted.
// I.e. this corresponds to the raw `DataCell`s, straight from our datastore.
let results: LatestAtResults = re_query2::latest_at(
    &store,
    &query,
    &entity_path.into(),
    MyPoints::all_components().iter().cloned(), // no generics!
);

// Then, grab the raw results for each individual components.
//
// This is still raw data, but now a choice has been made regarding the nullability of the
// _component batch_ itself (that says nothing about its _instances_!).
//
// * `get_required` returns an error if the component batch is missing
// * `get_optional` returns an empty set of results if the component if missing
// * `get` returns an option
let points: &LatestAtComponentResults = results.get_required::<MyPoint>()?;
let colors: &LatestAtComponentResults = results.get_optional::<MyColor>();
let labels: &LatestAtComponentResults = results.get_optional::<MyLabel>();

// Then comes the time to resolve/convert and deserialize the data.
// These steps have to be done together for efficiency reasons.
//
// Both the resolution and deserialization steps might fail, which is why this returns a `Result<Result<T>>`.
// Use `PromiseResult::flatten` to simplify it down to a single result.
//
// A choice now has to be made regarding the nullability of the _component batch's instances_.
// Our IDL doesn't support nullable instances at the moment -- so for the foreseeable future you probably
// shouldn't be using anything but `iter_dense`.

let points = match points.iter_dense::<MyPoint>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let colors = match colors.iter_dense::<MyColor>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let labels = match labels.iter_sparse::<MyLabel>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

// With the data now fully resolved/converted and deserialized, the joining logic can be
// applied.
//
// In most cases this will be either a clamped zip, or no joining at all.

let color_default_fn = || MyColor::from(0xFF00FFFF);
let label_default_fn = || None;

let results =
    clamped_zip_1x2(points, colors, color_default_fn, labels, label_default_fn).collect_vec();
```


---

Part of a PR series to completely revamp the data APIs in preparation
for the removal of instance keys and the introduction of promises:
- #5573
- #5574
- #5581
- #5605
- #5606
- #5633
- #5673
- #5679
- #5687
- #5755
- TODO
- TODO

Builds on top of the static data PR series:
- #5534
teh-cmc added a commit that referenced this issue Apr 8, 2024
Static-aware, key-less, component-based, cached latest-at APIs.

The overall structure of this new cache is very similar to what we had
before. Effectively it is just an extremely simplified version of
`re_query_cache`.

This introduces a new temporary `re_query_cache2` crate, which won't
ever be published.
It will replace the existing `re_query_cache` crate once all the
necessary features have been backported.

- Fixes #3232
- Fixes #4733
- Fixes #4734
- Part of #3379
- Part of #1893  

Example:
```rust
let caches = re_query_cache2::Caches::new(&store);

// First, get the results for this query.
//
// They might or might not already be cached. We won't know for sure until we try to access
// each individual component's data below.
let results: CachedLatestAtResults = caches.latest_at(
    &store,
    &query,
    &entity_path.into(),
    MyPoints::all_components().iter().cloned(), // no generics!
);

// Then, grab the results for each individual components.
// * `get_required` returns an error if the component batch is missing
// * `get_optional` returns an empty set of results if the component if missing
// * `get` returns an option
//
// At this point we still don't know whether they are cached or not. That's the next step.
let points: &CachedLatestAtComponentResults = results.get_required::<MyPoint>()?;
let colors: &CachedLatestAtComponentResults = results.get_optional::<MyColor>();
let labels: &CachedLatestAtComponentResults = results.get_optional::<MyLabel>();

// Then comes the time to resolve/convert and deserialize the data.
// These steps have to be done together for efficiency reasons.
//
// Both the resolution and deserialization steps might fail, which is why this returns a `Result<Result<T>>`.
// Use `PromiseResult::flatten` to simplify it down to a single result.
//
// A choice now has to be made regarding the nullability of the _component batch's instances_.
// Our IDL doesn't support nullable instances at the moment -- so for the foreseeable future you probably
// shouldn't be using anything but `iter_dense`.
//
// This is the step at which caching comes into play.
//
// If the data has already been accessed with the same nullability characteristics in the
// past, then this will just grab the pre-deserialized, pre-resolved/pre-converted result from
// the cache.
//
// Otherwise, this will trigger a deserialization and cache the result for next time.

let points = match points.iter_dense::<MyPoint>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let colors = match colors.iter_dense::<MyColor>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

let labels = match labels.iter_sparse::<MyLabel>(&mut resolver).flatten() {
    PromiseResult::Pending => {
        // Handle the fact that the data isn't ready appropriately.
        return Ok(());
    }
    PromiseResult::Ready(data) => data,
    PromiseResult::Error(err) => return Err(err.into()),
};

// With the data now fully resolved/converted and deserialized, the joining logic can be
// applied.
//
// In most cases this will be either a clamped zip, or no joining at all.

let color_default_fn = || {
    static DEFAULT: MyColor = MyColor(0xFF00FFFF);
    &DEFAULT
};
let label_default_fn = || None;

let results =
    clamped_zip_1x2(points, colors, color_default_fn, labels, label_default_fn).collect_vec();
```


---

Part of a PR series to completely revamp the data APIs in preparation
for the removal of instance keys and the introduction of promises:
- #5573
- #5574
- #5581
- #5605
- #5606
- #5633
- #5673
- #5679
- #5687
- #5755
- TODO
- TODO

Builds on top of the static data PR series:
- #5534
teh-cmc added a commit that referenced this issue Apr 8, 2024
This implements the new uncached range APIs.

Latest-at & range queries are now much more similar than before and
share a lot of nice traits.

Tests have been backported from `re_query`.

Here's an example/guide of using the new API:
```rust
// First, get the raw results for this query.
//
// Raw here means that these results are neither deserialized, nor resolved/converted.
// I.e. this corresponds to the raw `DataCell`s, straight from our datastore.
let results: RangeResults = re_query2::range(
    &store,
    &query,
    &entity_path.into(),
    MyPoints::all_components().iter().cloned(), // no generics!
);

// Then, grab the raw results for each individual components.
//
// This is still raw data, but now a choice has been made regarding the nullability of the
// _component batch_ itself (that says nothing about its _instances_!).
//
// * `get_required` returns an error if the component batch is missing
// * `get_optional` returns an empty set of results if the component if missing
// * `get` returns an option
let all_points: &RangeComponentResults = results.get_required(MyPoint::name())?;
let all_colors: &RangeComponentResults = results.get_optional(MyColor::name());
let all_labels: &RangeComponentResults = results.get_optional(MyLabel::name());

let all_indexed_points = izip!(
    all_points.iter_indices(),
    all_points.iter_dense::<MyPoint>(&resolver)
);
let all_indexed_colors = izip!(
    all_colors.iter_indices(),
    all_colors.iter_sparse::<MyColor>(&resolver)
);
let all_indexed_labels = izip!(
    all_labels.iter_indices(),
    all_labels.iter_sparse::<MyLabel>(&resolver)
);

let all_frames = range_zip_1x2(all_indexed_points, all_indexed_colors, all_indexed_labels);

// Then comes the time to resolve/convert and deserialize the data, _for each timestamp_.
// These steps have to be done together for efficiency reasons.
//
// Both the resolution and deserialization steps might fail, which is why this returns a `Result<Result<T>>`.
// Use `PromiseResult::flatten` to simplify it down to a single result.
//
// A choice now has to be made regarding the nullability of the _component batch's instances_.
// Our IDL doesn't support nullable instances at the moment -- so for the foreseeable future you probably
// shouldn't be using anything but `iter_dense`.
eprintln!("results:");
for ((data_time, row_id), points, colors, labels) in all_frames {
    let points = match points.flatten() {
        PromiseResult::Pending => {
            // Handle the fact that the data isn't ready appropriately.
            continue;
        }
        PromiseResult::Ready(data) => data,
        PromiseResult::Error(err) => return Err(err.into()),
    };

    let colors = if let Some(colors) = colors {
        match colors.flatten() {
            PromiseResult::Pending => {
                // Handle the fact that the data isn't ready appropriately.
                continue;
            }
            PromiseResult::Ready(data) => data,
            PromiseResult::Error(err) => return Err(err.into()),
        }
    } else {
        vec![]
    };
    let color_default_fn = || Some(MyColor::from(0xFF00FFFF));

    let labels = if let Some(labels) = labels {
        match labels.flatten() {
            PromiseResult::Pending => {
                // Handle the fact that the data isn't ready appropriately.
                continue;
            }
            PromiseResult::Ready(data) => data,
            PromiseResult::Error(err) => return Err(err.into()),
        }
    } else {
        vec![]
    };
    let label_default_fn = || None;

    // With the data now fully resolved/converted and deserialized, the joining logic can be
    // applied.
    //
    // In most cases this will be either a clamped zip, or no joining at all.

    let results = clamped_zip_1x2(points, colors, color_default_fn, labels, label_default_fn)
        .collect_vec();
    eprintln!("{data_time:?} @ {row_id}:\n    {results:?}");
}
```

- Fixes #3379
- Part of #1893  

---

Part of a PR series to completely revamp the data APIs in preparation
for the removal of instance keys and the introduction of promises:
- #5573
- #5574
- #5581
- #5605
- #5606
- #5633
- #5673
- #5679
- #5687
- #5755
- TODO
- TODO

Builds on top of the static data PR series:
- #5534
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⛃ re_datastore affects the datastore itself 🔍 re_query affects re_query itself
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants