Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error on shared column names when writing #1335

Merged
merged 4 commits into from Jan 29, 2024

Conversation

ivirshup
Copy link
Member

@ivirshup ivirshup commented Jan 25, 2024

Copy link

codecov bot commented Jan 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (299ca97) 85.65% compared to head (470f255) 83.47%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1335      +/-   ##
==========================================
- Coverage   85.65%   83.47%   -2.19%     
==========================================
  Files          34       34              
  Lines        5460     5465       +5     
==========================================
- Hits         4677     4562     -115     
- Misses        783      903     +120     
Flag Coverage Δ
gpu-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
anndata/_io/specs/methods.py 88.06% <100.00%> (-0.10%) ⬇️

... and 7 files with indirect coverage changes

@ivirshup ivirshup added this to the 0.10.6 milestone Jan 25, 2024
df.index, index=df.index
).equals(df[df.index.name]):
raise ValueError(
f"DataFrame.index.name ({df.index.name!r}) is also used by a column "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"DataFrame.index.name ({df.index.name!r}) is also used by a column "
f"DataFrame.index.name ({repr(df.index.name)}) is also used by a column "

https://peps.python.org/pep-0498/#s-r-and-a-are-redundant I had to google this tbh.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that PEP is saying that we shouldn't use !r it's just saying that there is another way to do it. Plus it's just a point in the discussion.

For this kind of thing, I would say if our formatter doesn't care then it's fine. If we want to enforce style rules, then we'll do it with the formatter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, !r is the way.

@@ -663,10 +663,23 @@ def write_dataframe(f, key, df, _writer, dataset_kwargs=MappingProxyType({})):
if reserved in df.columns:
raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
group = f.require_group(key)
if not df.columns.is_unique:
duplicates = list(df.columns[df.columns.duplicated()])
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit, but this seems like a "breaking change" since the behavior is changing. I'm not sure how strict things are about that here, though. Should we warn until 0.11.0? Maybe use #1270 to ease things? But I see the point that this is a "bug" in some sense as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is throwing an error on malformed input which was allowed through before (leading to more problems). To me, that's much more fixing of a bug than changing behaviour.

@ivirshup ivirshup merged commit d07306f into scverse:main Jan 29, 2024
14 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/anndata that referenced this pull request Jan 29, 2024
ivirshup added a commit that referenced this pull request Jan 29, 2024
Co-authored-by: Isaac Virshup <ivirshup@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loss of data when loading AnnData object with duplicated .obs column names from .h5ad
3 participants