BUG: sample IDs that look like scientific notation are treated as numbers #131

colinbrislawn · 2024-01-31T20:29:04Z

Trying to close #130

More testing is needed, especially a regression test for sampleIDs like this.

lizgehret · 2024-01-31T21:57:48Z

Thanks for working on this @colinbrislawn! Lmk when it's ready for a review and I'll take a look 🙂

colinbrislawn · 2024-02-01T16:39:58Z

This seems to work:

t(read.delim(inp_abundances_path,
  check.names = TRUE, # set to TRUE so the empty first column header becomes .$X
  row.names = 1,
  colClasses = c(X = "character") # read the .$X column as character, leave the other columns alone
))

Any idea why check.names = F originally?

For the regression test, maybe import this file and make sure the sample names match?
table-SI-names.tsv

Could I just run test_ids_in_table_not_in_md() again, with a new set of files?
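
To make the failure mode concrete, here is a minimal Python sketch. It is not code from this PR: the helper name `looks_like_e_notation` and the example IDs are made up for illustration. It flags IDs that a type-inferring reader could parse as numbers in e-notation, and shows that once such an ID has been converted to a number, its original spelling is gone.

```python
import re

# Optional sign, a decimal number, then an exponent part: 1E5, 2e3, 3.5e-2, ...
E_NOTATION = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)[eE][+-]?\d+")


def looks_like_e_notation(sample_id):
    """Return True if the whole ID reads as a number in e-notation."""
    return E_NOTATION.fullmatch(sample_id) is not None


# Made-up sample IDs for illustration.
sample_ids = ["1E5", "2e3", "sampleE1", "12E", "3.5e-2"]
at_risk = [sid for sid in sample_ids if looks_like_e_notation(sid)]
print(at_risk)  # ['1E5', '2e3', '3.5e-2']

# Converting to a number discards the original spelling of the ID.
print(float("1E5") == float("1e5"))  # True
print(str(float("1E5")))  # 100000.0
```

Forcing the ID column to be read as text, as the `colClasses` argument above does on the R side, avoids that conversion entirely.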

lizgehret · 2024-02-01T20:48:35Z

> This seems to work:
>
> t(read.delim(inp_abundances_path,
>   check.names = TRUE, # set to TRUE so the empty first column header becomes .$X
>   row.names = 1,
>   colClasses = c(X = "character") # read the .$X column as character, leave the other columns alone
> ))
>
> Any idea why check.names = F originally?

There wasn't a strong reason for that being set to False in the initial implementation - I hadn't run into this issue while developing, so I didn't realize this would be an edge case we'd need to consider. This looks reasonable to me though!

> For the regression test, maybe import this file and make sure the sample names match? table-SI-names.tsv
>
> Could I just run test_ids_in_table_not_in_md() again, with a new set of files?

Yeah that should work! I think as long as we continue to use the original files/tests, and then just create additional tests with your new set of files to make sure the behavior is the same, that should work well.

colinbrislawn · 2024-02-01T20:49:41Z

Hey Liz, I think I'm ready for your help here.

I've got a new test function, test_ids_in_table_with_es, that I expected to fail because I've reverted the ancom import code... but it passes. I'm also unfamiliar with the dataloaf format.

EDIT:

> and then just create additional tests with your new set of files to make sure the behavior is the same, that should work well

Yes! I've added those files. Can you help me write that?

lizgehret · 2024-02-01T21:49:34Z

Yes, happy to help! Hung up with a few other things this afternoon, but I'll take a closer look tomorrow 🙂

q2_composition/assets/run_ancombc.R

q2_composition/tests/test_ancombc.py

lizgehret · 2024-02-02T16:39:50Z

Okay just adding some notes here for myself as I'm investigating:

* Forum user who reported this issue was using QIIME 2 2023.9 and R 4.2.2.
* I replicated the error with the same package versions using their dataset.
* I created a dummy table/md file inline in a test to replicate the behavior using the same sample IDs that the forum user did to make sure the error would catch (which it did).
* I added two tests (that both fail) for uppercase E and lowercase E (to ensure that R doesn't care about casing).

Adding just these two tests (not finalized) in a commit so you can take a look @colinbrislawn!

EDIT: Let me know if you want me to finish this up, or if you'd rather take it across the finish line - I'm happy either way, just don't want to steal your thunder!
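
A rough sketch of what an inline regression test covering both casings could look like. This is not the test from this PR: it uses a stand-in reader (`read_ids_as_text`) rather than the plugin's R import, and the table layout, ID values, and all names are hypothetical.

```python
import csv
import io
import unittest

# Hypothetical inline table: the first column holds sample IDs, the rest are counts.
TABLE_TEMPLATE = (
    "sample-id\tfeature1\tfeature2\n"
    "{first}\t10\t3\n"
    "{second}\t7\t12\n"
)


def read_ids_as_text(tsv_text):
    """Return the first-column values exactly as written (no type inference)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    return [row[0] for row in reader]


class TestIdsWithEs(unittest.TestCase):
    def test_upper_and_lower_case_e(self):
        # One pass with uppercase E, one with lowercase e.
        for ids in (("1E5", "2E3"), ("1e5", "2e3")):
            with self.subTest(ids=ids):
                table = TABLE_TEMPLATE.format(first=ids[0], second=ids[1])
                self.assertEqual(read_ids_as_text(table), list(ids))
```

Saved in a test module, this can be run with `python -m unittest`. Building the table inline keeps the problematic IDs visible next to the assertion that checks them.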

lizgehret · 2024-02-02T22:05:40Z

Another thing I decided to do @colinbrislawn was to just create the table/MD files inline instead of using your newly created files - sorry for flip flopping on that, I realized that would be easier for me to see/iterate on the table as I was testing!

colinbrislawn · 2024-02-02T23:28:28Z

Okay, I'll take a look. I'm excited to see how you set up testing.

> Let me know if you want me to finish this up, or if you'd rather take it across the finish line

I would like more experience with Python, so I'll see if I can get this finished. This may be more work for you, but if you are willing to help me out I would very much appreciate it.

lizgehret · 2024-02-02T23:31:05Z

> I would like more experience with Python, so I'll see if I can get this finished. This may be more work for you, but if you are willing to help me out I would very much appreciate it.

Happy to help, and in full support of you learning! You just let me know what questions you have 🙂

colinbrislawn · 2024-02-03T01:29:39Z

I've run into issues around metadata, so I've updated both import functions.
(Good thing we tested!)

In the metadata importer, I reference the sampleID column with the key sample-id.
Is this first column always called that in this plugin or should I support other names?

Zooming out a little, Tidyverse packages like readr may give us better ways to handle this.
We could start the tidyverse retooling, or save it for the ancom-bc2 function!

lizgehret · 2024-02-06T18:16:34Z

Hey @colinbrislawn, sample-id is only one of the supported names for metadata identifiers (that's just the one I happened to use in the original test data); the full list can be found here. I'm a bit wary of hard-coding all of these into the import function unless absolutely necessary - but definitely open to any improvements that might be available within tidyverse, since that's already a dependency!

Re: ancom-bc2 - we were waiting for that paper to be published before starting development on this method, but it looks like that's finally happened as of a little over a month ago! I'll chat with the rest of the eng team in this week's meeting about this and see what thoughts are there.

colinbrislawn · 2024-02-06T22:12:56Z

I thought I remembered a bunch of allowed sample-ids!

Here are the options I can see:

* Use the python code to make sure that column is called `sample-id` before passing it to the R code I have in this PR
* Switch to the tidyverse `readr::read_delim()` and use their better API.

What do you suggest?
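
The first option could be sketched roughly as follows. This is a standard-library illustration, not the patch that went into _ancombc.py: `normalize_id_header` is a hypothetical name, and `KNOWN_ID_HEADERS` lists only two of the identifier headers QIIME 2 accepts.

```python
import csv
import io

# Illustrative subset only; QIIME 2 accepts more identifier headers than these.
KNOWN_ID_HEADERS = {"sample-id", "#Sample ID"}


def normalize_id_header(tsv_text, target="sample-id"):
    """Rename the first column header of a metadata TSV to `target`."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0]:
        raise ValueError("metadata is empty")
    if rows[0][0] not in KNOWN_ID_HEADERS:
        raise ValueError(f"unrecognized ID header: {rows[0][0]!r}")
    rows[0][0] = target  # only the header changes; ID values stay as text
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


metadata = "#Sample ID\tbodysite\n1E5\tgut\n2e3\ttongue\n"
print(normalize_id_header(metadata))
```

With the header fixed on the Python side, the R script can refer to a single column name instead of hard-coding every accepted variant.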

lizgehret · 2024-02-07T20:45:53Z

> I thought I remembered a bunch of allowed sample-ids!
>
> Here are the options I can see:
>
> * Use the python code to make sure that column is called `sample-id` before passing it to the R code I have in this PR
> * Switch to the tidyverse `readr::read_delim()` and use their better API.
>
> What do you suggest?

Let's see what can be done with readr::read_delim() - I was reviewing the available parameters for this method in the R docs, and I'm wondering if a good solution would be to utilize the col_types param and set that to character for the first column (i.e. the index column containing the sample IDs). I haven't investigated this in detail, so not sure if this will actually work like we want it to, but it seems like a good option to explore!

colinbrislawn · 2024-03-15T20:12:58Z

Checks pass on my machine!

I've got to remove my own notes and code before we merge...

colinbrislawn · 2024-03-15T20:24:18Z

@lizgehret, I've got the R code working, including a patch to _ancombc.py to always output sample-id

When you have a chance, could you take a look at the unit test?

Thanks!

lizgehret · 2024-04-16T21:02:29Z

Hey @colinbrislawn!

Sorry for the delay, still getting caught up on things after vacation and travel. Will give this a proper review shortly!

lizgehret · 2024-04-25 (review comment)

Hey @colinbrislawn,

Sorry (again) for the delay on this! Overall this looks reasonable to me, thanks again for working on this! I think the remaining task here is (just as you mention in your test comment) to add assertions into the contents of the dataloaf itself for these new test cases.

One additional (small) comment is that we may want to re-name IDE with something a bit more verbose (like IDs-with-Es, or something like that). Only reason I bring that up is because I was initially thinking of IDE in terms of interactive development environment and that may cause confusion in the future.

colinbrislawn · 2024-04-25T21:46:57Z

> I was initially thinking of IDE in terms of interactive development environment

Good idea. I noticed that too when I wrote it. This I can fix.

> add assertions into the contents of the dataloaf itself for these new test cases.

I'm unfamiliar with the dataloaf format. Would you like me to try this first or is this something you could wrap up for me?

lizgehret · 2024-04-25T21:55:46Z

> I'm unfamiliar with the dataloaf format. Would you like me to try this first or is this something you could wrap up for me?

I can finish that up, thanks @colinbrislawn! I'll wait for your name change updates re: IDE and then take it from there. 🙂

lizgehret · 2024-05-02T00:12:14Z

okay @colinbrislawn i've updated these tests to be more comprehensive - they now assert that the contents of the resultant IDs in each slice of the dataloaf are what we'd expect.

i also apologize for asking you to update the names of the files you added that contained IDE - i realized after the fact that we didn't need those files after all 🤦‍♀️

i'm going to get someone else to give this a final review (@colinvwood TIA!) since i'm no longer an unbiased reviewer 😉 but this should be g2g!

patch sampleIDs as e notation with examples

73e70fb

colinbrislawn marked this pull request as draft January 31, 2024 20:29

colinbrislawn mentioned this pull request Jan 31, 2024

BUG: ANCOM-BC Fails if sample IDs look too much like exponent #130

Closed

only set first col to characte with named list

5bf7c73

lizgehret self-assigned this Feb 1, 2024

Add WIP testing code

d816202

fix lint

725d073

spellcheck TRUE and FALSE

75029dd

colinbrislawn commented Feb 2, 2024

q2_composition/assets/run_ancombc.R (outdated, resolved)

remove old IDs with Es tables

f717682

colinbrislawn commented Feb 2, 2024

q2_composition/tests/test_ancombc.py (outdated, resolved)

adds two failing tests to replicate undesired behavior

1f7da24

adding some test comments

1218876

colinbrislawn added 2 commits February 2, 2024 20:23

remove old testing function

fbc810e

update data import

c242d55

colinbrislawn marked this pull request as ready for review February 5, 2024 19:34

more notes and testing to this PR

5c4cf16

colinbrislawn added 3 commits March 15, 2024 16:18

Use #Sample ID to test rename

461d06a

removed unused code in comments

79bb182

remove my notes

9fcbfa5

lizgehret reviewed Apr 25, 2024

lizgehret assigned colinbrislawn and unassigned gregcaporaso and lizgehret Apr 25, 2024

IDE to IDs_with_Es

ba53272

lizgehret assigned lizgehret and unassigned colinbrislawn May 1, 2024

lizgehret added 2 commits May 1, 2024 16:59

adds ID assertions in tests

cd9fc1e

adding gh issue ref

98aed0f

lizgehret changed the title ~~SampleIDs as number~~ BUG: sample IDs that look like scientific notation are treated as numbers May 2, 2024

better test comment

c15415b

lizgehret requested a review from colinvwood May 2, 2024 00:12

lizgehret assigned colinvwood and unassigned lizgehret May 2, 2024

colinvwood reviewed May 2, 2024

q2_composition/assets/run_ancombc.R (outdated, resolved)

use reserved TRUE instead of T

575b66b

Co-authored-by: colinvwood <68213641+colinvwood@users.noreply.github.com>

colinvwood reviewed May 2, 2024

q2_composition/assets/run_ancombc.R (outdated, resolved)

updating md import to use tidyverse

217f5e6

lizgehret merged commit 33e8a9e into qiime2:dev May 2, 2024
4 checks passed

lizgehret unassigned colinvwood May 2, 2024

colinbrislawn deleted the ancomSamples branch May 3, 2024 19:37
