Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDK DataLoaders 5: customizable (external) loaders for Rust #5351

Merged
merged 14 commits into from
Feb 29, 2024

Conversation

teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Feb 29, 2024

Adds new RecommendedLoadSettings that gets passed to all DataLoaders -- builtin and external -- in order to customize their behaviors.

This includes:

cargo r -p rerun-loader-rust-file -- run_wasm/src/main.rs  --recording-id this-one --entity-path-prefix a/b/c  --time sim_time=1000 --time wall_time=1709204046 --sequence sim_frame=42 | rerun -

image

Checks:

  • external loader ran manually (loader | rerun)
  • external loader via rerun (rerun xxx.rs)
  • log_file with external loader (log_file xxx.rs)

Part of series of PR to expose configurable DataLoaders to our SDKs:

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!

@teh-cmc teh-cmc added 🦀 Rust API Rust logging API do-not-merge Do not merge this PR include in changelog 🪵 Log-API Affects the user-facing API for all languages labels Feb 29, 2024
@teh-cmc teh-cmc changed the base branch from main to cmc/sdk_dataloader_3_weak_recordings February 29, 2024 11:53
@teh-cmc teh-cmc changed the title Cmc/sdk dataloader 5 rust settings SDK DataLoaders 5: customizable (external) loaders for Rust Feb 29, 2024
@teh-cmc teh-cmc marked this pull request as draft February 29, 2024 11:59
@teh-cmc teh-cmc force-pushed the cmc/sdk_dataloader_3_weak_recordings branch from 1489296 to 33c9161 Compare February 29, 2024 12:04
@teh-cmc teh-cmc force-pushed the cmc/sdk_dataloader_5_rust_settings branch from 6aac240 to d61342b Compare February 29, 2024 12:05
@teh-cmc teh-cmc marked this pull request as ready for review February 29, 2024 12:28
@Wumpf Wumpf self-requested a review February 29, 2024 14:19
teh-cmc added a commit that referenced this pull request Feb 29, 2024
Exposes raw `DataLoader`s to the Python SDK through 2 new methods:
`RecordingStream.log_file_from_path` &
`RecordingStream.log_file_from_contents`.

Everything streams asynchronously, as expected.

There's no way to configure the behavior of the `DataLoader` at all,
yet. That comes next.

```python
rr.log_file_from_path(filepath)
```

![image](https://github.com/rerun-io/rerun/assets/2910679/0fe2d39c-f069-44a6-b836-e31001b3adaa)


---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355
Copy link
Member

@Wumpf Wumpf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "would you kindly" nature of the arg struct is annoying me a bit, but what can we do other than "handling whatever real users throw at it" :) 🤷

Comment on lines +102 to +126
fn timepoint_from_args(args: &Args) -> anyhow::Result<rerun::TimePoint> {
let mut timepoint = rerun::TimePoint::default();
{
for time_str in &args.time {
let Some((timeline_name, time)) = time_str.split_once('=') else {
continue;
};
timepoint.insert(
rerun::Timeline::new_temporal(timeline_name),
time.parse::<i64>()?.into(),
);
}

for seq_str in &args.sequence {
let Some((seqline_name, seq)) = seq_str.split_once('=') else {
continue;
};
timepoint.insert(
rerun::Timeline::new_sequence(seqline_name),
seq.parse::<i64>()?.into(),
);
}
}
Ok(timepoint)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like somthing we may want to provide as a public utility? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep the whole external DataLoader API surface as small as possible for now -- we know we're going to break things a million times; I'd rather break less...

Comment on lines +36 to +50
/// optional prefix for all entity paths
#[argh(option)]
entity_path_prefix: Option<String>,

/// optional mark data to be logged as timeless
#[argh(switch)]
timeless: bool,

/// optional timestamps to log at (e.g. `--time sim_time=1709203426`) (repeatable)
#[argh(option)]
time: Vec<String>,

/// optional sequences to log at (e.g. `--sequence sim_frame=42`) (repeatable)
#[argh(option)]
sequence: Vec<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if there's a way for data loader authors to "inherit" these arguments. Can clap work with nested arg strucs..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can do inheritance with clap (we already do so for connect, spawn, etc).

But again I want to keep this whole thing as small and quick-n-dirty as possible, at least for now -- the whole point is to show that's it's just a dumb CLI app and there's no magic involved and one could do the same using any language / CLI library.


// ---

/// Recommended settings for the [`DataLoader`].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the recommended method on this, but I'm confused/not convinced that this thing as a whole is "recommended settings". Why not just DataLoaderSettings and then each field elaborates what effect it expects (already the case)? I get that we're defensive here and want to make clear that a data loader that ignores the settings is still a valid data loader and these are all "would you kindly please"-arguments, but I find baking that into the name goes a bit too far and is more confusing than helping.

... anyways, not a hill I gonna die on so if you prefer this as is keep it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if there's something more we could write about the (lack of) consequences if a data loader ignores the args

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure we can rename to DataLoaderSettings.

I wonder if there's something more we could write about the (lack of) consequences if a data loader ignores the args

We've already done so in a bunch of places, including the guide.

rec = rec.recording_id(recording_id);
};

// The most important part of this: log to standard output so the Rerun Viewer can ingest it!
rec.stdout()?
};

rec.log_timeless(
rerun::EntityPath::from_file_path(&args.filepath),
// TODO(#3841): really need those send APIs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is example code to be consumed by users it would be nice(r) to write about what they can expect to improve here in the future inead :)

teh-cmc added a commit that referenced this pull request Feb 29, 2024
Exposes raw `DataLoader`s to the C++ SDK through 2 new methods:
`RecordingStream::log_file_from_path` &
`RecordingStream::log_file_from_contents`.

Everything streams asynchronously, as expected.

There's no way to configure the behavior of the `DataLoader` at all,
yet. That comes next.

:warning: I'm carrying bytes around in a `string_view` (see `rr_bytes`)
and related, and the internet has been unhelpful to tell e whether I
should or shouldn't.

```cpp
rec.log_file_from_path(filepath);
```

![image](https://github.com/rerun-io/rerun/assets/2910679/1933f08b-1b2b-483b-9576-ce2e0421cb62)



---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355

---------

Co-authored-by: Andreas Reich <andreas@rerun.io>
Base automatically changed from cmc/sdk_dataloader_3_weak_recordings to main February 29, 2024 15:36
teh-cmc added a commit that referenced this pull request Feb 29, 2024
Introduces `RecordingStream::clone_weak`, so `DataLoader` threads
started from the SDK don't prevent the recording from flushing once
`main` goes out of scope, all the while making sure that `DataLoader`s
run to completion.


![image](https://github.com/rerun-io/rerun/assets/2910679/94bf7b16-cf9b-40ce-a190-d255328af3f8)


- Mitigates #5335 

---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355
@teh-cmc teh-cmc force-pushed the cmc/sdk_dataloader_5_rust_settings branch from e50f9d3 to c220ab0 Compare February 29, 2024 15:38
@teh-cmc teh-cmc removed the do-not-merge Do not merge this PR label Feb 29, 2024
@teh-cmc teh-cmc merged commit 9e3ffc2 into main Feb 29, 2024
28 of 31 checks passed
@teh-cmc teh-cmc deleted the cmc/sdk_dataloader_5_rust_settings branch February 29, 2024 16:33
teh-cmc added a commit that referenced this pull request Mar 1, 2024
Introduces the new `DataLoaderSettings` business to Python and update
examples accordingly (`external_data_loader` & `log_file`).

```bash
python examples/python/external_data_loader/main.py --recording-id this-one --entity-path-prefix a/b/c  --time sim_time=1000 --time wall_time=1709204046 --sequence sim_frame=42 examples/python/dna/main.py | rerun -
```

![image](https://github.com/rerun-io/rerun/assets/2910679/bfda567d-3d16-42cd-be8e-8b1a0767a784)



Checks:
- [x] external loader ran manually (`python loader | rerun`)
- [x] external loader via rerun (`rerun xxx.py`)
- [x] log_file with external loader (`log_file xxx.py`)

---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355
- #5361
teh-cmc added a commit that referenced this pull request Mar 4, 2024
This makes the `log_file` APIs behave more like the standard `log` APIs;
i.e. they inherit the state of their associated `RecordingStream`
(app_id, rec_id, timepoint, etc...).

Also inherit the application ID while we're at it.

Makes the API much nicer to use _and_ much more consistent with the
rest.

Checks:
- [x] external loader ran manually (`python loader | rerun`)
- [x] external loader via rerun (`rerun xxx.py`)
- [x] log_file with external loader (`log_file xxx.py`)
- [x] external loader ran manually (`loader | rerun`)
- [x] external loader via rerun (`rerun xxx.rs`)
- [x] log_file with external loader (`log_file xxx.rs`)

---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355
- #5379
- #5361
- #5388

---------

Co-authored-by: Andreas Reich <andreas@rerun.io>
teh-cmc added a commit that referenced this pull request Mar 4, 2024
Introduces the new `DataLoaderSettings` business to C++ and update
examples accordingly (`external_data_loader` & `log_file`).

```bash
./build/debug/examples/cpp/log_file/example_log_file --recording-id this-one --entity-path-prefix a/b/c  --time sim_time=1000 --time wall_time=1709204046 --sequence sim_frame=42 rerun_cpp/tests/main.cpp | rerun -
```

![image](https://github.com/rerun-io/rerun/assets/2910679/b979a24c-29b6-473b-91b1-de3832bea436)


Checks:
- [x] external loader ran manually (`loader.exe | rerun`)
- [x] external loader via rerun (`rerun xxx.cpp`)
- [x] log_file with external loader (`log_file xxx.cpp`)

---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355
- #5379
- #5361
- #5388
teh-cmc added a commit that referenced this pull request Mar 6, 2024
I guess that's good enough 🤷. I don't know, my brain has been
completely friend by C++ non-sense all day.

This includes a fix to make sure that a viewer that was spawned from the
python SDK is still allowed to spawn dataloaders implemented in python
(`RERUN_APP_ONLY` shenaniganeries).

- Fixes #4526 

---

Part of series of PR to expose configurable `DataLoader`s to our SDKs:
- #5327 
- #5328 
- #5330
- #5337
- #5351
- #5355
- #5379
- #5361
- #5388
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include in changelog 🪵 Log-API Affects the user-facing API for all languages 🦀 Rust API Rust logging API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants