Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding support for saving matrices to files in npz format #154

Merged
merged 9 commits into from May 20, 2018

Conversation

nimroha
Copy link
Contributor

@nimroha nimroha commented May 17, 2018

Closes #153

100% test coverage python 2.7 and 3.5

added all the docs but could not build HTML locally

@codecov-io
Copy link

codecov-io commented May 17, 2018

Codecov Report

Merging #154 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
+ Coverage   96.89%   96.93%   +0.04%     
==========================================
  Files          10       11       +1     
  Lines        1191     1207      +16     
==========================================
+ Hits         1154     1170      +16     
  Misses         37       37
Impacted Files Coverage Δ
sparse/io.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f939529...d5c19a7. Read the comment docs.

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is an awesome effort. We strive for perfection, so you can make changes and I'll approve. 😄

One thing not in this review is that you should add load_npz and save_npz into the list of functions in docs/generated/sparse.rst before building the docs.

sparse/io.py Outdated
[0. , 0.86522495]]])
>>> os.remove('mat.npz')

:param filename: string or file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters must be documented Numpy-style. See https://numpydoc.readthedocs.io/en/latest/format.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sparse/io.py Outdated
[0. , 0.86522495]]])
>>> os.remove('mat.npz')

:param filename: file-like object, string, or pathlib.Path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above.

from sparse.utils import assert_eq


def test_save_load_npz_file():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try parametrizing compressed as True/False. See example parametrizations in test_coo.py.

assert_eq(x, z)
assert_eq(y, z.todense())

# test exception on wrong format
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this into another test.

sparse/io.py Outdated
"""

nodes = {'data': matrix.data,
'coords': matrix.coords}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You want to store the shape too, sometimes these aren't enough on their own.

sparse/io.py Outdated
@@ -0,0 +1,96 @@
import numpy as np

from sparse.coo.core import COO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be .coo.core. Inside a package, we should always use relative imports.

@@ -0,0 +1,6 @@
io.load_npz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this file and build the docs. It will be auto-generated, then you can do git add.

@@ -0,0 +1,6 @@
io.save_npz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

with pytest.raises(RuntimeError):
load_npz(filename)

shutil.rmtree(dir_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try os.rmdir

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ignore. I just read the docs on that... It doesn't remove unless empty.

sparse/io.py Outdated
except KeyError:
raise RuntimeError('The file {} does not contain a valid sparse matrix'.format(filename))

return COO(coords=coords, data=data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add sorted=True, has_duplicates=False and shape. The first two will make it a lot faster, and shape ensures consistency.

@hameerabbasi
Copy link
Collaborator

If this is too much for you, let me know and I'll fix it up. 🙂 The PR will still be in your name.

@nimroha
Copy link
Contributor Author

nimroha commented May 17, 2018

no worries, just got caught up at work.
will work on it tonight

@nimroha
Copy link
Contributor Author

nimroha commented May 17, 2018

addressed all your comments and added the doc files

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like most of the major concerns have been addressed, here are some other recommendations. Good work by the way. 😄

sparse/io.py Outdated
Whether to save in compressed or uncompressed mode

Returns
-------
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this section, as the function isn't returning anything.


import sparse

from sparse.io import save_npz, load_npz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be addressed. In scipy.sparse, the functions are in scipy.sparse, not scipy.sparse.io.

Therefore, here, they should be in sparse directly, not sparse.io. You need to import them from sparse here, not sparse.io.

with pytest.raises(RuntimeError):
load_npz(filename)

shutil.rmtree(dir_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flake8 is failing because of a missing newline here.

[[0. , 0. ],
[0. , 0.86522495]]])
>>> os.remove('mat.npz')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, but would be really nice to have: A see also section with load_npz, np.savez and scipy.sparse.save_npz. Also a note saying that it's binary incompatible. See examples in our code and here: https://numpydoc.readthedocs.io/en/latest/format.html

[[0. , 0. ],
[0. , 0.86522495]]])
>>> os.remove('mat.npz')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above:

Optional, but would be really nice to have: A see also section with save_npz, np.load and scipy.sparse.load_npz. Also a note saying that it's binary incompatible.

@@ -53,3 +53,6 @@ API

where

save_npz
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these should be added in alphabetical order, otherwise they're hard to find.

Copy link
Collaborator

@hameerabbasi hameerabbasi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more changes.

We want the user-facing API to be compatible with scipy.sparse, so the .io should be removed everywhere and it should just be treated as an internal submodule.

sparse/io.py Outdated
>>> mat
<COO: shape=(2, 2, 2), dtype=float64, nnz=2>
>>> sparse.io.save_npz('mat.npz', mat)
>>> loaded_mat = sparse.io.load_npz('mat.npz')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to sparse.load_npz

sparse/io.py Outdated
>>> mat = sparse.COO(dense_mat)
>>> mat
<COO: shape=(2, 2, 2), dtype=float64, nnz=2>
>>> sparse.io.save_npz('mat.npz', mat)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to sparse.save_npz

sparse/io.py Outdated
>>> mat
<COO: shape=(2, 2, 2), dtype=float64, nnz=2>
>>> sparse.io.save_npz('mat.npz', mat)
>>> loaded_mat = sparse.io.load_npz('mat.npz')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as above. Also, it would be best to remove the examples section and in the main description, just add

See :obj:`save_npz` for usage examples.

@hameerabbasi hameerabbasi merged commit dfaee0c into pydata:master May 20, 2018
@hameerabbasi
Copy link
Collaborator

This is in! Thanks, @nimroha!

@nimroha
Copy link
Contributor Author

nimroha commented May 20, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants