Failure reading single observation h5ads #42

Closed
mckinsel opened this issue Jul 24, 2018 · 3 comments · Fixed by #55

Comments

@mckinsel

AnnData doesn't seem to like h5ad files with a single observation:

In [54]: anndata.__version__
Out[54]: '0.6.6'

In [55]: df.shape
Out[55]: (23465, 1)

In [56]: adata = anndata.AnnData(df.values.T, {"cell_names": df.columns.values}, {"gene_names": df.index.values})

In [57]: adata.write("test.h5ad")

In [58]: bdata = anndata.read_h5ad("test.h5ad")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-222d2ae8c407> in <module>()
----> 1 bdata = anndata.read_h5ad("test.h5ad")

~/.local/lib/python3.5/site-packages/anndata/readwrite/read.py in read_h5ad(filename, backed)
    342         # load everything into memory
    343         d = _read_h5ad(filename=filename)
--> 344         return AnnData(d)
    345
    346

~/.local/lib/python3.5/site-packages/anndata/base.py in __init__(self, X, obs, var, uns, obsm, varm, raw, dtype, shape, filename, filemode, asview, oidx, vidx)
    639                 obsm=obsm, varm=varm, raw=raw,
    640                 dtype=dtype, shape=shape,
--> 641                 filename=filename, filemode=filemode)
    642
    643     def _init_as_view(self, adata_ref, oidx, vidx):

~/.local/lib/python3.5/site-packages/anndata/base.py in _init_as_actual(self, X, obs, var, uns, obsm, varm, raw, dtype, shape, filename, filemode)
    758                 raise ValueError(
    759                     'If `X` is a dict no further arguments must be provided.')
--> 760             X, obs, var, uns, obsm, varm, raw = self._from_dict(X)
    761
    762         # init from AnnData

~/.local/lib/python3.5/site-packages/anndata/base.py in _from_dict(ddata)
   1920                     if key in d_true_keys[true_key].dtype.names:
   1921                         d_true_keys[true_key] = pd.DataFrame.from_records(
-> 1922                             d_true_keys[true_key], index=key)
   1923                         break
   1924                 d_true_keys[true_key].index = d_true_keys[true_key].index.astype('U')

~/.local/lib/python3.5/site-packages/pandas/core/frame.py in from_records(cls, data, index, exclude, columns, coerce_float, nrows)
   1267         else:
   1268             arrays, arr_columns = _to_arrays(data, columns,
-> 1269                                              coerce_float=coerce_float)
   1270
   1271             arr_columns = _ensure_index(arr_columns)

~/.local/lib/python3.5/site-packages/pandas/core/frame.py in _to_arrays(data, columns, coerce_float, dtype)
   7493     else:
   7494         # last ditch effort
-> 7495         data = lmap(tuple, data)
   7496         return _list_to_arrays(data, columns, coerce_float=coerce_float,
   7497                                dtype=dtype)

~/.local/lib/python3.5/site-packages/pandas/compat/__init__.py in lmap(*args, **kwargs)
    129
    130     def lmap(*args, **kwargs):
--> 131         return list(map(*args, **kwargs))
    132
    133     def lfilter(*args, **kwargs):

TypeError: 'numpy.int64' object is not iterable

But faking another cell works fine:

In [59]: df2 = pandas.concat([df, df], axis=1)

In [60]: df2.shape
Out[60]: (23465, 2)

In [61]: adata = anndata.AnnData(df2.values.T, {"cell_names": df2.columns.values}, {"gene_names": df2.index.values})

In [62]: adata.write("test.h5ad")

In [63]: bdata = anndata.read_h5ad("test.h5ad")

In [64]: bdata.n_obs, bdata.n_vars
Out[64]: (2, 23465)

I'm guessing this has something to do with _fix_shapes.
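
One possible reading of the traceback above, which nothing in this thread confirms: if the one-row annotation table collapses from a length-1 structured array into a single record somewhere along the way, then iterating it yields field values rather than rows, and calling tuple on an integer field fails with exactly this message. The sketch below reproduces that mechanism with NumPy alone. The field names and the `np.atleast_1d` repair are assumptions for illustration, not anndata's actual code or the fix in #55.

```python
import numpy as np

# Hypothetical one-observation annotation table as a structured array.
obs = np.array(
    [(0, "cell_0")],
    dtype=[("index", np.int64), ("cell_names", "U10")],
)

# Dropping the length-1 axis leaves a single record (np.void), not an array of rows.
record = obs[0]

# Iterating a record yields its field values, so the first item is an np.int64.
fields = list(record)

try:
    tuple(fields[0])  # mirrors lmap(tuple, data) from the traceback
    error = None
except TypeError as exc:
    error = str(exc)

# Assumed repair for illustration: put the length-1 axis back before
# treating the data as a sequence of rows.
restored = np.atleast_1d(record)

print(error)
print(restored.shape, restored.dtype.names)
```

If this reading is right, it would also explain why the two-cell file reads fine: with two records the axis never collapses, so each row is still iterable.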

@falexwolf
Member

Ah, yes, thanks for pointing this out. It should work, but there is a non-trivial catch: the data matrix is cast to a vector when it has a single row or column (just like numpy arrays), so that flattening is unnecessary. I never looked into this at the level of the file, though. We're giving the whole file backing an overhaul anyway and will fix it as part of that.
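
To illustrate the behavior described here: once a matrix with a single row is squeezed to a vector, the array alone no longer says whether it held one observation or one variable. The sketch below shows the squeeze and a restoration step that relies on separately stored dimensions. `restore_2d` is a hypothetical helper written for this example, not part of anndata.

```python
import numpy as np


def restore_2d(X, n_obs, n_vars):
    """Hypothetical helper (not anndata API): reshape a squeezed matrix back to 2D."""
    X = np.asarray(X)
    if X.size != n_obs * n_vars:
        raise ValueError(
            "cannot reshape {} values into ({}, {})".format(X.size, n_obs, n_vars)
        )
    return X.reshape(n_obs, n_vars)


# One observation with five variables.
X = np.arange(5.0).reshape(1, 5)

# Squeezing drops the length-1 axis; the result looks the same as a
# squeezed (5, 1) matrix, so the orientation is lost.
squeezed = X.squeeze()

# With the dimensions known from elsewhere, the 2D shape can be restored.
restored = restore_2d(squeezed, n_obs=1, n_vars=5)

print(squeezed.shape, restored.shape)
```

Keeping the matrix 2D throughout avoids this bookkeeping entirely.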

@flying-sheep
Member

I’m very sure I got rid of that behavior at one point and enforced that the data matrix was always 2D.

We should go back to that behavior; Mohammad is struggling with problems related to this as well.

@falexwolf
Member

Finally fixed via #55.

Btw @mckinsel, looking at your call above: for the past few weeks, you can also initialize an AnnData directly from a DataFrame (adata = AnnData(df)).
