Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update scanpy filter, fixes crashes on scanpy tutorial data #3485

Merged
merged 18 commits into from Mar 16, 2021

Conversation

@mvdbeek
Copy link
Member

@mvdbeek mvdbeek commented Feb 16, 2021

Fixes

Traceback (most recent call last):
  File "/Users/mvandenb/src/galaxy/database/jobs_directory/034/34876/configs/tmpgl47oehv", line 9, in <module>
    adata = sc.read('anndata.h5ad')
  File "/usr/local/lib/python3.6/site-packages/scanpy/readwrite.py", line 97, in read
    backup_url=backup_url, cache=cache, **kwargs,
  File "/usr/local/lib/python3.6/site-packages/scanpy/readwrite.py", line 499, in _read
    return read_h5ad(filename, backed=backed)
  File "/usr/local/lib/python3.6/site-packages/anndata/readwrite/read.py", line 447, in read_h5ad
    constructor_args = _read_args_from_h5ad(filename=filename, chunk_size=chunk_size)
  File "/usr/local/lib/python3.6/site-packages/anndata/readwrite/read.py", line 502, in _read_args_from_h5ad
    return AnnData._args_from_dict(d)
  File "/usr/local/lib/python3.6/site-packages/anndata/core/anndata.py", line 2157, in _args_from_dict
    if key in d_true_keys[true_key].dtype.names:
AttributeError: 'dict' object has no attribute 'dtype'

... which you can only see if you remove the stderr redirect. I didn't touch the stderr redirect, but this should probably be either removed or replaced with tee.

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)
@mvdbeek
Copy link
Member Author

@mvdbeek mvdbeek commented Feb 16, 2021

Minor hiccup in scanpy plot should be fixed in theislab/scanpy#1654. I should be able to patch that in.

Loading

mvdbeek added 7 commits Feb 16, 2021
Otherwise this falls back to the last plotted layout, which is not a
good thing for reprodcubility or transparency.
This likely corrects a fixed bug in the anndata version that produced
this file. See theislab/scanpy#1656 (comment)
@mvdbeek
Copy link
Member Author

@mvdbeek mvdbeek commented Feb 19, 2021

Alright, at this point only scanpy plot fails, which should be fixed in theislab/scanpy#1654, theislab/scanpy#1669 and theislab/scanpy#1655

Loading

@mvdbeek mvdbeek mentioned this pull request Mar 6, 2021
5 tasks
@mtekman
Copy link
Contributor

@mtekman mtekman commented Mar 12, 2021

@mvdbeek thanks for all the fixes, it really looks like you've improved the upstream too. I'm free this weekend and a little bit of next week to help here if needed. Is this PR just waiting for 1669 to merge before it is complete, or is there more to do?

Loading

@pcm32
Copy link
Member

@pcm32 pcm32 commented Mar 15, 2021

Can I help on anything with this as well? We have a need for this for a training that we have on the second half of April. Thanks!

Loading

tools/scanpy/macros.xml Outdated Show resolved Hide resolved
Loading
@pcm32 pcm32 mentioned this pull request Mar 15, 2021
5 tasks
@bgruening
Copy link
Member

@bgruening bgruening commented Mar 15, 2021

Thanks @pcm32 for pushing this forward!

Loading

@mvdbeek
Copy link
Member Author

@mvdbeek mvdbeek commented Mar 16, 2021

Okay, so the last error that I'm stuck with is:

Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
/usr/local/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version.
  res = method(*args, **kwargs)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3292, in _ensure_valid_index
    value = Series(value)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/series.py", line 364, in __init__
    data = sanitize_array(data, index, dtype, copy, raise_cast_failure=True)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/construction.py", line 529, in sanitize_array
    raise ValueError("Data must be 1-dimensional")
ValueError: Data must be 1-dimensional

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/tmpfwsltt02/job_working_directory/000/50/configs/tmp7zg72a5_", line 15, in <module>
    sc.pl.rank_genes_groups_violin(
  File "/usr/local/lib/python3.8/site-packages/scanpy/plotting/_tools/__init__.py", line 909, in rank_genes_groups_violin
    df[g] = X_col
  File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3241, in _set_item
    self._ensure_valid_index(value)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/frame.py", line 3294, in _ensure_valid_index
    raise ValueError(
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

This one is theislab/scanpy#1669 and is probably specific to anndata objects without .raw attributes. I don't know enough to judge how common this is. If it is not common we can exchange the test data (the pbmc68k dataset does not have this problem), but it seems like this is supposed to work. @mtekman @pcm32 do you know what's up with the .raw attributes ?

Loading

@pcm32
Copy link
Member

@pcm32 pcm32 commented Mar 16, 2021

I think it can be fairly common not to have .raw, but what you often see in these cases where .raw is not available is that people copy data into .raw before. So we could check for that and do that copy if .raw is not available, maybe with a checkbox in the UI for "Copy data to .raw if .raw is not available?" and maybe handling the error to provide a more explicit explanation that the .raw is missing. I will ask some people to chip in who know more.

Loading

@pcm32
Copy link
Member

@pcm32 pcm32 commented Mar 16, 2021

I'm hearing from others who use AnnData more than I do that they would find surprising not to find a .raw, in case it helps.

Loading

@mtekman
Copy link
Contributor

@mtekman mtekman commented Mar 16, 2021

Yeah raw is definitely mentioned in the tutorials, and I use it a lot too

Loading

@mvdbeek
Copy link
Member Author

@mvdbeek mvdbeek commented Mar 16, 2021

copy data into .raw before

I suppose that's not just adata.raw = adata ? Which didn't work for me. I think we can just disable that one plot until there's a upstream fix.

Loading

@pcm32
Copy link
Member

@pcm32 pcm32 commented Mar 16, 2021

Sorry, it is adata.raw = adata.X if memory serves well. Let me ask if that plot is key.

Loading

@mtekman
Copy link
Contributor

@mtekman mtekman commented Mar 16, 2021

It's adata.raw = adata according to : https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html

And yep, happy to drop the rank genes plot for now -- for the PBMC tutorial above (use in the training), it's not that useful

Loading

@mvdbeek
Copy link
Member Author

@mvdbeek mvdbeek commented Mar 16, 2021

OK, both didn't work, so going with disabling the plot now.

Loading

@nomadscientist
Copy link

@nomadscientist nomadscientist commented Mar 16, 2021

This sounds good! I think there is usually a .raw, I couldn't actually tell you how to make it not exist, and also that's not the most useful plot in the world so binning it for now seems solid

Loading

Copy link
Contributor

@mtekman mtekman left a comment

The code changes are fantastic, cannot thank you enough.

The h5ad files for the recipes seem to be empty, which I can live with -- but I am wondering about the the empty files for the violin and UMAP plots

Loading

@pcm32
Copy link
Member

@pcm32 pcm32 commented Mar 16, 2021

Thanks @mvdbeek for all these improvements! For reference on the .raw discussion, our friend Ni, an expert from the Sanger said:

That error there was caused by adata[:, g].X not being handled correctly when it’s a sparse/dense (can’t remember which) matrix, which can be considered a bug in scanpy.
.raw is optional in anndata. I’d say it’s less common for a “processed” anndata to not have it, as saving log-transformed and normalised data to .raw is common practice. However, for data distrubtion, people quite often make an anndata with X being the log-transformed and normalised data, and leave .raw empty. Also, sometimes people may want to plot on X even when .raw exists, so I’d prefer getting that code in scanpy right.

Loading

@nh3
Copy link

@nh3 nh3 commented Mar 16, 2021

Thanks @pcm32.

I think theislab/scanpy#1669 is in the right thing to do.

It's an issue in scanpy that is not specific to anndata without .raw. When anndata.X is not sparse, anndata[:, g].X returns a 2d anndata._core.views.ArrayView rather than a 1d array that can be used directly downstream, so .toarray().flatten() is required for both sparse and non-sparse cases as in that fix.

Loading

@mvdbeek
Copy link
Member Author

@mvdbeek mvdbeek commented Mar 16, 2021

The h5ad files for the recipes seem to be empty, which I can live with -- but I am wondering about the the empty files for the violin and UMAP plots

Are you sure this is not the github UI being misleading ? None of the files in test-data are empty

Loading

@mtekman
Copy link
Contributor

@mtekman mtekman commented Mar 16, 2021

You're right -- this looks good

Loading

@pcm32
Copy link
Member

@pcm32 pcm32 commented Mar 16, 2021

I think it would help to @nomadscientist's training anxieties :-) if we can merge this with the missing plot and then add it later once the base scanpy issue is fixed and released. Unless that we see this affecting other parts of the plotting or tools in this PR (but I don't see it at least).

Loading

@mtekman
Copy link
Contributor

@mtekman mtekman commented Mar 16, 2021

there is a workshop on thursday which will need these, second that

Loading

@bgruening
Copy link
Member

@bgruening bgruening commented Mar 16, 2021

Thanks, everyone!

Loading

@bgruening bgruening merged commit 7b8af82 into galaxyproject:master Mar 16, 2021
10 checks passed
Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

6 participants