I added two new methods to the `Dataset` class:

- `from_pandas()` to create a dataset from a pandas DataFrame
- `from_dict()` to create a dataset from a dictionary (keys = columns)

They use the `pa.Table.from_pandas` and `pa.Table.from_pydict` functions to do so.

It is also possible to specify the feature types via `features=...` when the data is ambiguous (e.g. null/NaN values). Otherwise, pyarrow infers the Arrow schema from the data automatically.

One question that I have right now:
- Should we add a `save()` method that writes the dataset to disk? Right now, if we create a `Dataset` with one of these two new methods, the data are kept in RAM. Once the data are written to disk, we could reload them by calling the `from_file()` method.