Currently, when a workflow is run through the API, the data to process must be passed in the API request. This is fine for most situations but inefficient for large data processing operations.

When a workflow is run through Python, a generator can be passed to efficiently process a large dataset. This issue will add a new YAML argument to workflows named `stream`. A stream will effectively act as a data generator and stream data to the workflow. This will enable full server-side processing of content.
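For reference, this is roughly what the Python-side approach looks like today. A minimal sketch assuming a txtai `Application` with a basic index workflow; the embeddings model and workflow layout are illustrative, not taken from this issue.

```python
# Sketch of the existing Python-side approach: a generator is passed
# to the workflow so the dataset is never fully held in memory.
from datasets import load_dataset
from txtai.app import Application

app = Application("""
writable: true
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2

workflow:
  index:
    tasks:
      - action: index
""")

def stream():
    # Yield one article at a time from the Hugging Face dataset
    for row in load_dataset("cnn_dailymail", "3.0.0", split="train"):
        yield row["article"]

# Workflows are generators themselves; consuming the output executes
# the indexing
for _ in app.workflow("index", stream()):
    pass
```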
The following is an example workflow that indexes the cnn_dailymail Hugging Face dataset.
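Since the `stream` argument is the proposal here, the sketch below only illustrates what such a configuration could look like. Only the `stream` name comes from this issue; the nested fields (`dataset`, `config`, `split`, `field`) are hypothetical placeholders for how a Hugging Face dataset might be declared.

```python
# Hypothetical sketch of the proposed "stream" argument. Only the
# "stream" name comes from this issue; dataset/config/split/field are
# assumed placeholders, not an implemented txtai schema.
from txtai.app import Application

app = Application("""
writable: true
embeddings:
  path: sentence-transformers/all-MiniLM-L6-v2

workflow:
  index:
    stream:
      dataset: cnn_dailymail
      config: "3.0.0"
      split: train
      field: article
    tasks:
      - action: index
""")

# Proposed behavior: the workflow would generate its own inputs server
# side, so the caller no longer needs to pass data in the request
for _ in app.workflow("index", []):
    pass
```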