BUG: sample IDs that look like scientific notation are treated as numbers #131
Thanks for working on this @colinbrislawn! Lmk when it's ready for a review and I'll take a look 🙂
This seems to work:
t(read.delim(inp_abundances_path,
             check.names = TRUE,  # TRUE so the empty first column header becomes .$X
             row.names = 1,
             colClasses = c(X = "character")  # read .$X as character; other columns keep their default types
))
Any idea why
For the regression test, maybe import this file and make sure the sample names match? Could I just run
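For comparison, here is a minimal pandas sketch of the same pitfall on the Python side of the plugin. It is not the PR's code: the file contents and the `sample-id` column name are made up for illustration. It shows default type inference turning IDs like `1E5` into floats, and a per-column `dtype` override (roughly what `colClasses` does in the R snippet) keeping them as text.

```python
import io

import pandas as pd

# Hypothetical sample table: every ID happens to look like a number in
# scientific notation, so type inference converts the whole column.
TSV = "sample-id\tcount\n1E5\t10\n2E3\t20\n"

# Default parsing: the ID column is inferred as float64.
naive = pd.read_csv(io.StringIO(TSV), sep="\t").set_index("sample-id")
print(naive.index.tolist())  # [100000.0, 2000.0]

# Pinning the ID column to str (similar in spirit to colClasses in R)
# keeps the IDs exactly as written; other columns are still inferred.
safe = pd.read_csv(io.StringIO(TSV), sep="\t",
                   dtype={"sample-id": str}).set_index("sample-id")
print(safe.index.tolist())  # ['1E5', '2E3']
```

The bug only shows up when every ID in the column parses as a number; a single ID like `S1` would keep the whole column as text, which is why it is easy to miss during development.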
There wasn't a strong reason for check.names being set to FALSE in the initial implementation. I hadn't run into this issue while developing, so I didn't realize this would be an edge case we'd need to consider. This looks reasonable to me though!
Yeah, that should work! As long as we keep the original files/tests and add new tests with your new set of files to confirm the behavior is the same, we should be in good shape.
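A rough sketch of what such an additional regression test could look like, assuming a Python `unittest` setup. `read_table_preserving_ids` is a hypothetical stand-in for whichever loader is under test, and the IDs are invented so that each one would parse as a float if left to inference. The check itself is simply that the IDs come back exactly as written.

```python
import os
import tempfile
import unittest

import pandas as pd


def read_table_preserving_ids(path, id_column):
    """Hypothetical loader: read a TSV, keeping the ID column as text."""
    df = pd.read_csv(path, sep="\t", dtype={id_column: str})
    return df.set_index(id_column)


class ScientificNotationIDTests(unittest.TestCase):
    # Each ID would be parsed as a float by default type inference.
    IDS = ["1E5", "2e10", "3E-2"]

    def test_ids_round_trip(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "table.tsv")
            # Write the fixture inline instead of relying on a checked-in file.
            pd.DataFrame({"sample-id": self.IDS, "count": [1, 2, 3]}).to_csv(
                path, sep="\t", index=False)
            table = read_table_preserving_ids(path, "sample-id")
        # IDs must match the originals exactly, spelling and type included.
        self.assertEqual(table.index.tolist(), self.IDS)
```

Because the fixture is generated in a temporary directory, this kind of test doesn't depend on any extra files in the repo; it can sit alongside the existing tests and be run with `python -m unittest`.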
Hey Liz, I think I'm ready for your help here. I've got a new
EDIT:
Yes! I've added those files. Can you help me write that?
Yes, happy to help! Hung up with a few other things this afternoon, but I'll take a closer look tomorrow 🙂
Okay just adding some notes here for myself as I'm investigating:
Adding just these two tests (not finalized) in a commit so you can take a look @colinbrislawn! EDIT: Let me know if you want me to finish this up, or if you'd rather take it across the finish line. I'm happy either way, just don't want to steal your thunder!
Another thing I decided to do @colinbrislawn was to create the table/MD files inline instead of using your newly created files. Sorry for flip-flopping on that; I realized it would be easier for me to see and iterate on the table while testing!
Okay, I'll take a look. I'm excited to see how you set up testing.
I would like more experience with Python, so I'll see if I can get this finished. This may be more work for you, but if you are willing to help me out I would very much appreciate it.
Happy to help, and in full support of you learning! You just let me know what questions you have 🙂
I've run into issues around metadata, so I've updated both import functions. In the metadata importer, I reference the sampleID column with the key
Zooming out a little, Tidyverse packages like readr may give us better ways to handle this.
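Since the ID column's header can vary between metadata files, one option is to pin the ID column by position rather than by name. A minimal pandas sketch, assuming the IDs are always in the first column; `read_metadata` and the file contents are hypothetical. pandas' `converters` argument accepts column positions as keys, so the IDs stay verbatim while the remaining columns are still type-inferred.

```python
import io

import pandas as pd

# Hypothetical metadata file: numeric-looking IDs, a numeric covariate,
# and a categorical column. The ID header is just one possible spelling.
METADATA = "#SampleID\tdepth\tgroup\n1E5\t1200\tA\n2E3\t800\tB\n"


def read_metadata(handle):
    # Key 0 = first column, whatever its header says; str keeps IDs as-is.
    # All other columns go through pandas' normal type inference.
    df = pd.read_csv(handle, sep="\t", converters={0: str})
    return df.set_index(df.columns[0])


md = read_metadata(io.StringIO(METADATA))
print(md.index.tolist())  # ['1E5', '2E3']
```

On the R side, readr's `col_types` argument offers a similar per-column override.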
Hey @colinbrislawn, Re: ancom-bc2 - we were waiting for that paper to be published before starting development on this method, but it looks like that's finally happened as of a little over a month ago! I'll chat with the rest of the eng team in this week's meeting about this and see what thoughts are there.
I thought I remembered a bunch of allowed
Here are the options I can see:
What do you suggest?
Let's see what can be done with
Checks pass on my machine! I've got to remove my own notes and code before we merge...
@lizgehret, I've got the R code working, including a patch to _ancombc.py to always output
When you have a chance, could you take a look at the unit test? Thanks!
Hey @colinbrislawn! Sorry for the delay, still getting caught up on things after vacation and travel. Will give this a proper review shortly!
Hey @colinbrislawn,
Sorry (again) for the delay on this! Overall this looks reasonable to me, thanks again for working on this! I think the remaining task here is (just as you mention in your test comment) to add assertions into the contents of the dataloaf itself for these new test cases.
One additional (small) comment is that we may want to rename IDE to something a bit more verbose (like IDs-with-Es, or something like that). The only reason I bring that up is that I initially read IDE as "integrated development environment", and that may cause confusion in the future.
Good idea. I noticed that too when I wrote it. This I can fix.
I'm unfamiliar with the dataloaf format. Would you like me to try this first or is this something you could wrap up for me?
I can finish that up, thanks @colinbrislawn! I'll wait for your name change updates re: IDE and then take it from there. 🙂
okay @colinbrislawn i've updated these tests to be more comprehensive - they now assert that the contents of the resultant IDs in each slice of the dataloaf are what we'd expect. i also apologize for asking you to update the names of the files you added that contained IDE - i realized after the fact that we didn't need those files after all 🤦♀️ i'm going to get someone else to give this a final review (@colinvwood TIA!) since i'm no longer an unbiased reviewer 😉 but this should be g2g!
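For anyone else unfamiliar with the dataloaf output, the shape of such an assertion can be sketched with plain pandas. This is illustrative only: the dataloaf is modeled here as a hypothetical dict of slice name to DataFrame, and the slice names and `id` column are assumptions, not the plugin's actual API. Each slice must contain exactly the expected IDs as strings, so a float like `100000.0` where `1E5` was intended fails.

```python
import pandas as pd


def assert_ids_in_every_slice(slices, expected_ids, id_column="id"):
    """Check that each slice lists exactly the expected IDs.

    `slices` is a hypothetical stand-in for the dataloaf: a mapping of
    slice name -> DataFrame. Row order is ignored; spelling and type are not.
    """
    expected = sorted(expected_ids)
    for name, frame in slices.items():
        observed = sorted(frame[id_column].tolist())
        # A float such as 100000.0 never equals the string "1E5",
        # so a numeric conversion anywhere upstream fails this check.
        assert observed == expected, f"{name}: {observed} != {expected}"


# Illustrative slices with made-up names and values.
example = {
    "slice_a": pd.DataFrame({"id": ["1E5", "2E3"], "value": [0.5, -1.2]}),
    "slice_b": pd.DataFrame({"id": ["2E3", "1E5"], "value": [0.01, 0.2]}),
}
assert_ids_in_every_slice(example, ["1E5", "2E3"])
```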
Co-authored-by: colinvwood <68213641+colinvwood@users.noreply.github.com>
Trying to close #130
More testing is needed, especially a regression test for sampleIDs like this.