Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a resampling actor #32

Merged

Conversation

jkozlowicz
Copy link
Contributor

@jkozlowicz jkozlowicz commented Oct 17, 2022

Summary of this PR (25.10.2022):

Last updates:

  • removed the DataProvider
  • removed the core package entirely
  • removed the Sample class and use the one in data pipeline
  • removed subscriptions.py entirely
  • renamed TimeSeriesResamplingActor to ComponentMetricsResamplingActor
  • updated ComponentMetricsResamplingActor to:
    • take a channel sender in the constructor to send subscriptions to the data sourcing actor
    • take a channel receiver in the constructor to receive subscriptions
    • take channel registry as a parameter for receiving component data from the data source and sending out resampled metrics
    • change the request's namespace before forwarding it to the channel registry
  • updated the sdk resampling example to use the data sourcing actor ComponentMetricRequest class as the subscription request message

Updates in the N -1 round:

  • added use case documentation for the TimeSeriesResamplingActor
  • added tests
  • added a simple example for using the TimeSeriesResamplingActor with a dummy data source (because the real one is still under construction)
  • discard samples holding a NaN value

Updates in the N - 2 round:

  • removal of the RingBuffer and using a simple deque in the TimeSeriesResampler
  • removing outdated samples in the TimeSeriesResampler's buffer
  • renaming IndividualTimeSeriesResampler -> TimeSeriesResampler and TimeSeriesResampler to GroupTimeSeriesResampler
  • addressing various comments from the reviews

@jkozlowicz jkozlowicz requested a review from a team as a code owner October 17, 2022 09:31
@github-actions github-actions bot added part:data-pipeline Affects the data pipeline part:tests Affects the unit, integration and performance (benchmarks) tests labels Oct 17, 2022
@jakub-toptal jakub-toptal linked an issue Oct 17, 2022 that may be closed by this pull request
@jakub-toptal

This comment was marked as outdated.

@leandro-lucarella-frequenz

This comment was marked as resolved.

@leandro-lucarella-frequenz

This comment was marked as resolved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ring buffer implementation is way too complicated and way too specific. I was suggesting putting it in the core package too so it is something that can be reused. We even have a trivial ad-hoc ring buffer implementation in the channels repo (receivers in Broadcast use a ring buffer for example).

Also the Sample doesn't need to be parametrized, we agreed on using a float always to keep thing simple.

Nitpick: I wouldn't name core.common common. Maybe core.timeseries as all in there seems to be related to timeseries?

@jakub-toptal

This comment was marked as resolved.

@jakub-toptal

This comment was marked as outdated.

@leandro-lucarella-frequenz

This comment was marked as outdated.

@leandro-lucarella-frequenz

This comment was marked as outdated.

@sahas-subramanian-frequenz

This comment was marked as outdated.

@thomas-nicolai-frequenz

This comment was marked as outdated.

@thomas-nicolai-frequenz

This comment was marked as outdated.

@thomas-nicolai-frequenz

This comment was marked as outdated.

@jakub-toptal

This comment was marked as outdated.

@ela-kotulska-frequenz

This comment was marked as outdated.

examples/sdk_resampling_example.py Outdated Show resolved Hide resolved
examples/sdk_resampling_example.py Outdated Show resolved Hide resolved
examples/sdk_resampling_example.py Outdated Show resolved Hide resolved
examples/sdk_resampling_example.py Outdated Show resolved Hide resolved
tests/data_ingestion/test_timeseries_resampler.py Outdated Show resolved Hide resolved
tests/data_ingestion/test_group_timeseries_resampler.py Outdated Show resolved Hide resolved
@jkozlowicz jkozlowicz force-pushed the resampling-actor-revised branch 2 times, most recently from 81c5a71 to c5febe7 Compare October 27, 2022 14:52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, just one follow-up comment and we have the subscription stuff pending, which now that the data sourcing actor is about to be merged we can change with more confidence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing that is still missing from previous reviews.

@leandro-lucarella-frequenz
Copy link
Contributor

OK, so the data sourcing actor is almost ready, so we have much more clarity on how the resampler needs to subscribe to data.

Key changes needed in this PR:

  • Remove the DataProvider.
  • Remove the core package entirely.
  • Remove the Sample class and use the one in data sourcing instead.
  • Remove subscriptions.py entirely.
  • ComponentMetricsResamplingActor
    • Rename TimeSeriesResamplingActor to ComponentMetricsResamplingActor (and the file names too).
    • Take a channel sender in the constructor to send subscriptions to the data sourcing actor.
    • Take a channel receiver in the constructor to receive subscriptions.
    • Use the data sourcing actor ComponentMetricRequest class as the subscription request message.
    • When forwarding the request, we need to change the request namespace. Probably append a _data_source suffix? @sahas-subramanian-frequenz maybe you can give your opinion as I never had this approach very clear in my head in terms of channels management.

Please see the data sourcing actor PR for more details, and rebase on top of it if needed:

Optional unrelated change:

  • Kill the ComponentMetricGroupResampler class and move the code to the actor.

@jkozlowicz jkozlowicz force-pushed the resampling-actor-revised branch 2 times, most recently from 295b121 to f9501b4 Compare October 28, 2022 16:49
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final minor comments.

Signed-off-by: Jakub Kozlowicz <jakub.kozlowicz@gmail.com>
Signed-off-by: Jakub Kozlowicz <jakub.kozlowicz@gmail.com>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just 2 small suggestions, but we are good to go! 🎉

This class enables creating/removing resamplers for individual metrics
and delegates resampling to them.

Signed-off-by: Jakub Kozlowicz <jakub.kozlowicz@gmail.com>
Signed-off-by: Jakub Kozlowicz <jakub.kozlowicz@gmail.com>
Signed-off-by: Jakub Kozlowicz <jakub.kozlowicz@gmail.com>
@leandro-lucarella-frequenz leandro-lucarella-frequenz changed the title Resampling actor revised Add a resampling actor Nov 1, 2022
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the PR title so it gets a better commit message when merged.

@jakub-toptal jakub-toptal merged commit 1e01c21 into frequenz-floss:v0.x.x Nov 1, 2022
Copy link
Contributor

@shsms shsms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@leandro-lucarella-frequenz I added some comments, maybe we can discuss tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:data-pipeline Affects the data pipeline part:tests Affects the unit, integration and performance (benchmarks) tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a data resampling actor
8 participants