We have an add_columns() API for fragments and a merge() API for datasets. It would be nice to create an add_columns() API for datasets, similar to the fragment-level one.
It would also be very nice to offer progress tracking as an option, since this can be a long-running operation. For example, users might call this API to add a new embedding column to the dataset.
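As a rough illustration of the proposal, a dataset-level add_columns() could simply fan out over the existing fragment-level API and invoke an optional progress callback after each fragment. The classes and the callback signature below are illustrative stand-ins, not the real lance API:

```python
# Hypothetical sketch: dataset-level add_columns() built on the existing
# fragment-level API, with optional progress reporting. Fragment, Dataset,
# and the progress callback signature are assumptions for illustration.

class Fragment:
    def __init__(self, rows):
        self.rows = rows  # list of dicts, standing in for fragment data

    def add_columns(self, udf):
        # The fragment-level API already exists: apply the UDF row-wise
        # and attach the returned columns to each row.
        for row in self.rows:
            row.update(udf(row))
        return self

class Dataset:
    def __init__(self, fragments):
        self.fragments = fragments

    def add_columns(self, udf, progress=None):
        # Proposed dataset-level wrapper: run the UDF over every
        # fragment and report progress as each one completes.
        total = len(self.fragments)
        for i, frag in enumerate(self.fragments, start=1):
            frag.add_columns(udf)
            if progress is not None:
                progress(i, total)
        return self

# Example: add an "embedding" column (a toy length feature here).
ds = Dataset([Fragment([{"text": "hello"}, {"text": "world"}]),
              Fragment([{"text": "lance"}])])
done = []
ds.add_columns(lambda row: {"embedding": [float(len(row["text"]))]},
               progress=lambda i, n: done.append((i, n)))
```

Running the UDF per fragment (rather than per dataset) is what makes both progress reporting and crash recovery tractable, since each fragment is an independent unit of work.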
Another requirement: for the UDF, it would be nice to provide some way for the data to be staged so that the operation can be resumed after a crash.
One way to do that in Python is to use a SQLite file as a durable mapping from hash(input_data) to output_data. We could even be extra smart and keep track of which fragments have already been written and where; this could also help with cleanup.
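A minimal sketch of that staging idea, using the standard-library sqlite3 module as the durable hash(input_data) → output_data map. The schema, helper names, and hashing scheme are assumptions, not an existing lance API:

```python
import hashlib
import pickle
import sqlite3

# Illustrative sketch: a durable SQLite-backed map from hash(input_data)
# to output_data, so a crashed add_columns() run can resume without
# recomputing finished work. Schema and names are hypothetical.

class StagingCache:
    def __init__(self, path=":memory:"):
        # Use a real file path for durability; ":memory:" is demo-only.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS staged (key TEXT PRIMARY KEY, value BLOB)")

    @staticmethod
    def _key(input_data):
        return hashlib.sha256(pickle.dumps(input_data)).hexdigest()

    def get(self, input_data):
        row = self.db.execute(
            "SELECT value FROM staged WHERE key = ?",
            (self._key(input_data),)).fetchone()
        return pickle.loads(row[0]) if row else None

    def put(self, input_data, output_data):
        self.db.execute(
            "INSERT OR REPLACE INTO staged (key, value) VALUES (?, ?)",
            (self._key(input_data), pickle.dumps(output_data)))
        self.db.commit()  # commit per entry so results survive a crash

def cached_udf(udf, cache):
    # Wrap a UDF so previously-seen inputs are served from SQLite
    # instead of being recomputed.
    def wrapper(input_data):
        cached = cache.get(input_data)
        if cached is not None:
            return cached
        result = udf(input_data)
        cache.put(input_data, result)
        return result
    return wrapper

calls = []
def expensive_udf(text):
    calls.append(text)  # track how many times real work happens
    return {"embedding": [float(len(text))]}

cache = StagingCache()
udf = cached_udf(expensive_udf, cache)
udf("hello")  # computes and stages the result
udf("hello")  # served from the cache, no recomputation
```

Tracking already-written fragments could follow the same pattern with a second table keyed by fragment id, which would also give cleanup a list of staged files to delete once the commit succeeds.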