
I/O operations similar to Numpy #354

Closed
AlexJuca opened this issue Mar 25, 2021 · 11 comments
Labels
area:nx Applies to nx note:discussion Details or approval are up for discussion

Comments

@AlexJuca
Contributor

AlexJuca commented Mar 25, 2021

Will Nx support I/O operations similar to those found in Numpy's npyio?

Functions like:

```python
np.save("tmp/123", np.array([[1, 1, 0], [0, 1, 0]]))
np.load("tmp/123.npy")
```

If so, will these functions be packaged into their own module or called directly from Nx?

@josevalim josevalim added area:nx Applies to nx note:discussion Details or approval are up for discussion labels Mar 25, 2021
@josevalim
Collaborator

We can discuss it, I am interested in the use cases. :)

@AlexJuca
Contributor Author

AlexJuca commented Mar 25, 2021

@josevalim It is common in the ML community to save NumPy arrays to disk so they can be used at a later time.

And from my research, the functionality fits the following use cases:

  1. To save predictions from a model and reuse these predictions later.
  2. To save pre-processed tensors, like a corpus of text (integers) or a collection of rescaled image data (pixels), and reuse them later.

In essence, these example use cases capture the main idea:

Saving Tensors to file so they can be loaded up quickly and reused later.

It would also be interesting to see how these saved tensors could be used in a distributed system. Perhaps saving the file to one or multiple nodes.

Or loading very large tensors saved in binary compressed format on various nodes to be processed.
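
In NumPy terms, the round trip described above is just `np.save`/`np.load`; a minimal sketch (the path and data are illustrative):

```python
import os
import tempfile

import numpy as np

# Save preprocessed data (e.g. rescaled pixels) once, reload it cheaply later.
data = np.array([[1, 1, 0], [0, 1, 0]])

path = os.path.join(tempfile.gettempdir(), "preprocessed.npy")
np.save(path, data)  # writes NumPy's binary .npy format

restored = np.load(path)  # for very large arrays, mmap_mode="r" avoids loading it all
assert np.array_equal(restored, data)
```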

@AlexJuca
Contributor Author

It is common to save to .csv files but loading and saving gigabytes of data using this format is slow.
Numpy solves that with a binary format that is much more efficient when loading and saving to disk.

@josevalim
Collaborator

Right! For NNs though, they have their own formats, so I am wondering if most tools will have their own formats on top, meaning ours won’t be used much. :)

@AlexJuca
Contributor Author

AlexJuca commented Mar 25, 2021

NumPy has its own .npy binary format, and this format seems to be used in the R community as well.
R also has its own .rdata format.

We could add support for reading .npy files if the format is common enough, and try to convert them to valid Nx tensors.

This might be interesting for Python developers who have workflows that use .npy files and want to experiment with Elixir.

This would essentially be like reading another format, such as .csv, and converting it to a valid tensor.

But yeah, most tools have their own specialized binary format to save large arrays.

@shoz-f

shoz-f commented Mar 30, 2021

Hello wizards.
A few days ago, I reworked my Npy module to load/save Nx.tensor into npy/npz files.

I created the original Npy module to take the tensor processed by python's tensorflow and use it in the tensorflow-lite module on Nerves. The initial specification was to load/save my own %Npy{} into a npy file.

If you are interested, please visit my github:
https://github.com/shoz-f/npy_ex

@AlexJuca
Contributor Author

@shoz-f Nice! I will take a look at your implementation. It would be nice for this to be part of Nx itself, in my opinion. Tell me, from your experience, how common is this feature in an ML researcher's or practitioner's workflow?

@sekunho

sekunho commented May 15, 2021

Are there any temporary workarounds for this? I don't intend to use the model elsewhere, just within Axon. But I'm having trouble thinking of an approach that allows me to reuse a trained model later on, or in a different environment/application instance. Like, training locally, and using the model in a production environment. Or maybe, just to apply the model in a different instance.

@josevalim
Collaborator

@imsekun if production and dev have the same endianness, then it is a matter of :erlang.term_to_binary to serialize it and write the result to a file. Then File.read! and :erlang.binary_to_term to read it back.
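
A minimal sketch of that workaround (the map of parameters is illustrative; :erlang.term_to_binary works on any Erlang term, including tensor structs):

```elixir
# Serialize trained parameters (here a plain map standing in for model state)
# to disk, then restore them later in another environment.
params = %{w: [[0.1, 0.2], [0.3, 0.4]], b: [0.0, 0.0]}

path = Path.join(System.tmp_dir!(), "model.bin")
File.write!(path, :erlang.term_to_binary(params))

# Later, or on another node with the same endianness:
restored = path |> File.read!() |> :erlang.binary_to_term()
true = restored == params
```

Note that :erlang.binary_to_term on untrusted input is unsafe; for files from outside sources, pass the [:safe] option.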

@sekunho

sekunho commented May 15, 2021

@josevalim Oh wow TIL that existed. Awesome. Thanks so much!

@josevalim
Collaborator

We currently have from_numpy and from_numpy_archive, which are Numpy specific. I will open up a new discussion about file storage for Nx.
