[P0] End-to-end examples: simple ML ingest, scalable batch inference, etc.
[P0] User guide overhaul: concentrate on Dataset creation, since this is the starting point for many Ray ML workflows.
[P0] API docs audit
[P0] Knob tuning and debugging guides
[P0] Document the Datasets resource provisioning model (i.e., that Datasets resource allocation is implicit: Datasets uses the cluster resources left over in the margins of explicit allocations to other libraries like Tune).
[P0] De-emphasize global/windowed shuffling.
[P0] Document lazy mode.
[P1] Make the distinction between simple and Arrow datasets clearer, including: (a) what the differences are, (b) what makes a dataset simple vs. Arrow (method of creation, outputs of mappers, etc.), and (c) examples of (a) and (b).
[P1] Improve the docs for .map() and .map_batches(), stating that the former is for row-based mapping and the latter is for batch-based mapping (i.e., vectorized column operations and the like). It should be clear how data scientists can apply pandas operations in a distributed way.
[P1] Explicitly note that column selection isn't supported on Datasets; direct users to batch mapping for columnar transformations and to the aggregation API for columnar aggregations.
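To make the row vs. batch distinction concrete for the docs, here is a minimal, single-process sketch. It is NOT the Ray Datasets API: map_rows and map_batches below are hypothetical local stand-ins, and a batch is modeled as a plain dict of column lists (the real library hands the function a pandas or Arrow batch instead). It shows the same derived column written once per row and once per columnar batch, and column selection expressed as a batch transform.

```python
from typing import Callable, Dict, List

# Hypothetical local stand-ins, not Ray code.
Row = Dict[str, float]
Batch = Dict[str, List[float]]  # columnar batch: column name -> values


def map_rows(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    # Row-based mapping: fn is called once per record.
    return [fn(row) for row in rows]


def map_batches(rows: List[Row], fn: Callable[[Batch], Batch],
                batch_size: int = 2) -> List[Row]:
    # Batch-based mapping: fn is called once per chunk of records,
    # receiving the chunk in columnar form.
    out: List[Row] = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        batch = {col: [r[col] for r in chunk] for col in chunk[0]}
        result = fn(batch)
        # Convert the columnar result back into individual rows.
        n = len(next(iter(result.values())))
        out.extend({col: vals[i] for col, vals in result.items()}
                   for i in range(n))
    return out


rows = [{"price": 10.0, "qty": 2.0},
        {"price": 5.0, "qty": 4.0},
        {"price": 1.0, "qty": 3.0}]

# Row-based: derive a column from each record individually.
by_row = map_rows(rows, lambda r: {**r, "total": r["price"] * r["qty"]})

# Batch-based: the same transform written as a column-wise operation.
by_batch = map_batches(
    rows, lambda b: {**b, "total": [p * q for p, q in zip(b["price"], b["qty"])]})

# Column selection expressed as a batch transform that keeps one column.
prices_only = map_batches(rows, lambda b: {"price": b["price"]})
```

Both styles yield the same rows here; the batch form is the one that maps naturally onto vectorized pandas operations, which is the point the .map_batches() docs should make.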
Tracker for docs and UX push for Datasets GA.
Docs
UX (code changes)