Adding support for saving matrices to files in npz format #154

Merged
hameerabbasi merged 9 commits into pydata:master from nimroha:npz_file_IO on May 20, 2018

3 participants
@nimroha
Contributor

nimroha commented May 17, 2018

Closes #153

100% test coverage on Python 2.7 and 3.5.

Added all the docs, but could not build the HTML locally.

@codecov-io

codecov-io commented May 17, 2018

Codecov Report

Merging #154 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
+ Coverage   96.89%   96.93%   +0.04%     
==========================================
  Files          10       11       +1     
  Lines        1191     1207      +16     
==========================================
+ Hits         1154     1170      +16     
  Misses         37       37

Impacted Files | Coverage Δ
sparse/io.py | 100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f939529...d5c19a7.

@hameerabbasi

Overall, this is an awesome effort. We strive for perfection, so you can make changes and I'll approve. 😄

One thing not in this review is that you should add load_npz and save_npz into the list of functions in docs/generated/sparse.rst before building the docs.

[0. , 0.86522495]]])
>>> os.remove('mat.npz')
:param filename: string or file

hameerabbasi May 17, 2018

Collaborator

Parameters must be documented Numpy-style. See https://numpydoc.readthedocs.io/en/latest/format.html
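
For illustration, the `:param filename:` line from the diff could be restated in numpydoc style roughly as follows. The function name `save_npz_sketch` and the wording of the descriptions are assumptions made for this example, not the docstring that was merged.

```python
def save_npz_sketch(filename, matrix, compressed=True):
    """Save a sparse matrix to disk in NumPy's ``.npz`` format.

    Hypothetical stub: only the docstring layout matters here.

    Parameters
    ----------
    filename : str or file-like object
        Where to write the archive.
    matrix : COO
        The sparse matrix to save.
    compressed : bool, optional
        Whether to save in compressed or uncompressed mode.
    """
```

Each parameter gets a `name : type` line followed by an indented description, and the section title is underlined with dashes of the same length.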

[0. , 0.86522495]]])
>>> os.remove('mat.npz')
:param filename: file-like object, string, or pathlib.Path

hameerabbasi May 17, 2018

Collaborator

See comment above.

from sparse.utils import assert_eq
def test_save_load_npz_file():

hameerabbasi May 17, 2018

Collaborator

Try parametrizing compressed as True/False. See example parametrizations in test_coo.py.
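
A parametrized round-trip test might look like the sketch below. To keep it runnable with only NumPy and pytest, plain arrays stand in for the COO matrix and `np.savez` / `np.savez_compressed` stand in for `save_npz`; the test name and the sample values are assumptions made for this example.

```python
import os
import tempfile

import numpy as np
import pytest


@pytest.mark.parametrize('compressed', [True, False])
def test_save_load_roundtrip(compressed):
    # Plain NumPy arrays stand in for the COO matrix used in the PR.
    coords = np.array([[0, 1], [0, 1]])
    data = np.array([0.5, 0.25])
    saver = np.savez_compressed if compressed else np.savez

    with tempfile.TemporaryDirectory() as dir_name:
        filename = os.path.join(dir_name, 'mat.npz')
        saver(filename, coords=coords, data=data)
        # Read the archive back and compare with what was written.
        with np.load(filename) as fp:
            np.testing.assert_array_equal(fp['coords'], coords)
            np.testing.assert_array_equal(fp['data'], data)
```

pytest runs the function once per value in the list, so both code paths are exercised without duplicating the test body.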

assert_eq(x, z)
assert_eq(y, z.todense())
# test exception on wrong format

hameerabbasi May 17, 2018

Collaborator

Make this into another test.

"""
nodes = {'data': matrix.data,
         'coords': matrix.coords}

hameerabbasi May 17, 2018

Collaborator

You want to store the shape too, sometimes these aren't enough on their own.
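
A small NumPy-only sketch of why the coordinates and data can be insufficient: when the trailing rows or columns of an array hold no nonzero entries, the shape cannot be recovered from the stored coordinates. The array used here is an assumption chosen for illustration.

```python
import numpy as np

dense = np.zeros((3, 4))
dense[0, 0] = 1.0
dense[1, 2] = 2.0   # the last row and last column stay empty

coords = np.array(np.nonzero(dense))   # shape (ndim, nnz)
data = dense[tuple(coords)]

# Best guess at the shape from the coordinates alone.
inferred_shape = tuple(int(n) for n in coords.max(axis=1) + 1)

print(inferred_shape)   # (2, 3)
print(dense.shape)      # (3, 4)
```

Saving the shape alongside `coords` and `data` removes the guesswork.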

@@ -0,0 +1,96 @@
import numpy as np
from sparse.coo.core import COO

hameerabbasi May 17, 2018

Collaborator

This should be .coo.core. Inside a package, we should always use relative imports.

@@ -0,0 +1,6 @@
io.load_npz

hameerabbasi May 17, 2018

Collaborator

Remove this file and build the docs. It will be auto-generated, then you can do git add.

@@ -0,0 +1,6 @@
io.save_npz

hameerabbasi May 17, 2018

Collaborator

Same as above.

with pytest.raises(RuntimeError):
    load_npz(filename)
shutil.rmtree(dir_name)

hameerabbasi May 17, 2018

Collaborator

Oh, ignore. I just read the docs on that... It doesn't remove unless empty.

except KeyError:
    raise RuntimeError('The file {} does not contain a valid sparse matrix'.format(filename))
return COO(coords=coords, data=data)

hameerabbasi May 17, 2018

Collaborator

Add sorted=True, has_duplicates=False and shape. The first two will make it a lot faster, and shape ensures consistency.
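
The loading side could be sketched as below. To stay runnable with NumPy alone, the hypothetical helper `load_triplet` returns the three arrays instead of building a `COO`; the comment shows where the suggested keyword arguments would go, on the assumption that coordinates written from an existing `COO` are already sorted and free of duplicates.

```python
import io

import numpy as np


def load_triplet(file):
    """Hypothetical helper: read coords, data and shape from an .npz archive."""
    with np.load(file) as fp:
        try:
            coords = fp['coords']
            data = fp['data']
            shape = tuple(int(n) for n in fp['shape'])
        except KeyError:
            raise RuntimeError(
                'The file {} does not contain a valid sparse matrix'.format(file))
    # In the PR these would feed the constructor, along the lines of
    # COO(coords, data, shape=shape, sorted=True, has_duplicates=False).
    return coords, data, shape


# Write a small archive to an in-memory buffer and read it back.
buf = io.BytesIO()
np.savez(buf, coords=np.array([[0, 1], [0, 2]]),
         data=np.array([1.0, 2.0]), shape=np.array([3, 4]))
buf.seek(0)
coords, data, shape = load_triplet(buf)
print(shape)   # (3, 4)
```

An archive that lacks any of the three keys raises `RuntimeError`, which is the behavior the wrong-format test in this PR checks for.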

@hameerabbasi

Collaborator

hameerabbasi commented May 17, 2018

If this is too much for you, let me know and I'll fix it up. 🙂 The PR will still be in your name.

@nimroha

Contributor

nimroha commented May 17, 2018

no worries, just got caught up at work.
will work on it tonight

@nimroha nimroha force-pushed the nimroha:npz_file_IO branch from 58ecee6 to 770e433 May 17, 2018

@nimroha

Contributor

nimroha commented May 17, 2018

addressed all your comments and added the doc files

@hameerabbasi

Looks like most of the major concerns have been addressed, here are some other recommendations. Good work by the way. 😄

Whether to save in compressed or uncompressed mode
Returns
-------

hameerabbasi May 18, 2018

Collaborator

Remove this section, as the function isn't returning anything.

import sparse
from sparse.io import save_npz, load_npz

hameerabbasi May 18, 2018

Collaborator

This still needs to be addressed. In scipy.sparse, the functions are in scipy.sparse, not scipy.sparse.io.

Therefore, here, they should be in sparse directly, not sparse.io. You need to import them from sparse here, not sparse.io.

with pytest.raises(RuntimeError):
    load_npz(filename)
shutil.rmtree(dir_name)

hameerabbasi May 18, 2018

Collaborator

flake8 is failing because of a missing newline here.

[[0. , 0. ],
[0. , 0.86522495]]])
>>> os.remove('mat.npz')

hameerabbasi May 18, 2018

Collaborator

Optional, but would be really nice to have: A see also section with load_npz, np.savez and scipy.sparse.save_npz. Also a note saying that it's binary incompatible. See examples in our code and here: https://numpydoc.readthedocs.io/en/latest/format.html
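
As a rough sketch, the two optional sections could be laid out like this. The function name `save_npz_documented` is hypothetical, and the note assumes the incompatibility meant here is with the files written by `scipy.sparse.save_npz`.

```python
def save_npz_documented(filename, matrix, compressed=True):
    """Save a sparse matrix to disk in NumPy's ``.npz`` format.

    Hypothetical stub showing only the extra docstring sections.

    See Also
    --------
    load_npz
    scipy.sparse.save_npz
    numpy.savez

    Notes
    -----
    The on-disk layout is not binary compatible with the files written
    by :obj:`scipy.sparse.save_npz`.
    """
```

In numpydoc ordering, `See Also` comes before `Notes`, and both come after the `Parameters` section.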

[[0. , 0. ],
[0. , 0.86522495]]])
>>> os.remove('mat.npz')

hameerabbasi May 18, 2018

Collaborator

Same as above:

Optional, but would be really nice to have: A see also section with save_npz, np.load and scipy.sparse.load_npz. Also a note saying that it's binary incompatible.

@@ -53,3 +53,6 @@ API
where
save_npz

hameerabbasi May 18, 2018

Collaborator

Both of these should be added in alphabetical order, otherwise they're hard to find.

@hameerabbasi

A few more changes.

We want the user-facing API to be compatible with scipy.sparse, so the .io should be removed everywhere and it should just be treated as an internal submodule.
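
The pattern being asked for, a relative import in the package's `__init__.py` that re-exports the function at the top level, can be tried out on a throwaway package. The package name `demo_pkg` and the stub `save_npz` body are assumptions made for this self-contained sketch.

```python
import importlib
import os
import shutil
import sys
import tempfile

# Build a throwaway package named ``demo_pkg`` (hypothetical) on disk.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_pkg')
os.mkdir(pkg_dir)

with open(os.path.join(pkg_dir, 'io.py'), 'w') as f:
    f.write("def save_npz(filename, matrix):\n    return 'saved'\n")

with open(os.path.join(pkg_dir, '__init__.py'), 'w') as f:
    # Relative import: re-export the function at the package top level.
    f.write("from .io import save_npz\n")

sys.path.insert(0, root)
demo_pkg = importlib.import_module('demo_pkg')

# Callers use the top-level name and never need to spell out ``.io``.
print(demo_pkg.save_npz('mat.npz', None))   # saved

# Clean up the temporary files; the module stays loaded in memory.
shutil.rmtree(root)
```

The submodule still exists as `demo_pkg.io`, but documentation and examples only need to mention the top-level name.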

>>> mat
<COO: shape=(2, 2, 2), dtype=float64, nnz=2>
>>> sparse.io.save_npz('mat.npz', mat)
>>> loaded_mat = sparse.io.load_npz('mat.npz')

hameerabbasi May 18, 2018

Collaborator

Change this to sparse.load_npz

>>> mat = sparse.COO(dense_mat)
>>> mat
<COO: shape=(2, 2, 2), dtype=float64, nnz=2>
>>> sparse.io.save_npz('mat.npz', mat)

hameerabbasi May 18, 2018

Collaborator

Change this to sparse.save_npz

>>> mat
<COO: shape=(2, 2, 2), dtype=float64, nnz=2>
>>> sparse.io.save_npz('mat.npz', mat)
>>> loaded_mat = sparse.io.load_npz('mat.npz')

hameerabbasi May 18, 2018

Collaborator

Same comments as above. Also, it would be best to remove the examples section and in the main description, just add

See :obj:`save_npz` for usage examples.

@hameerabbasi hameerabbasi merged commit dfaee0c into pydata:master May 20, 2018

4 checks passed

ci/circleci: build_27 Your tests passed on CircleCI!
ci/circleci: build_36 Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.89%)
codecov/project 96.93% (+0.04%) compared to f939529
@hameerabbasi

Collaborator

hameerabbasi commented May 20, 2018

This is in! Thanks, @nimroha!

@nimroha

Contributor

nimroha commented May 20, 2018
