-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add duplicates and tube_ids feature #3291
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider my comments and add a unit-test for the functionality that you wrote. The test should include BLANKS in upper and lower-case as well as normal sample-names. It should also include sample-names with the qiita-id pre-pended and not.
snames[i] = f'{tube_ids_rev[sname]} ({sname})' | ||
|
||
# Finds duplicates in the samples | ||
seen = dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try using collections.Counter. Once you have a Counter() object populated you can get the list of duplicates with a list comprehension:
list_of_dupes = [x for x in my_counter if my_counter[x] > 1]
The code will look cleaner, but more importantly list comprehensions and Counter() will be more efficient under the hood than the basic for loop/conditional implementation you have here.
for i, qsname in enumerate(qsnames): | ||
if qsname.startswith(qid): | ||
qsnames[i] = qsname.replace(f'{qid}.', "", 1) | ||
|
||
# Adds tube ids to a dict with key as tube id and value as qsname | ||
tube_ids_dict = dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid naming variables in terms of their data-structure (something_list, something_dict). They nearly always can be given a more descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, would tube_id_lookup
be an appropriate name for the variable? Would it be better if I also change the name of the tube_id_rev
variable too as well then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, those sound good.
qid = self.get_argument("qid") | ||
snames = self.get_argument("snames").split() | ||
|
||
# Get study give qiita id | ||
st = Study(qid).sample_template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
st is a not a properly descriptive name for a variable, especially one that plays an important role some ten lines later in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would changing the variable name to study
work better, or would a variable name like qt_study
be more appropriate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
study is fine.
for i, qsname in enumerate(qsnames): | ||
if qsname.startswith(qid): | ||
qsnames[i] = qsname.replace(f'{qid}.', "", 1) | ||
|
||
# Adds tube ids to a dict with key as tube id and value as qsname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you're munging data like this to get it into the proper shape you need, it often seems intuitive to the writer, but to later readers it can appear somewhat opaque. In these cases it's important to write comments that communicate why you're doing it and what you hope to achieve. The above comment is just an English-language summary of the code below it, which I can already read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be appropriate:
# Creates a way to access a tube_id by its corresponding sample name
# and vice versa, which is important to adding tube_id in parentheses
# after a sample name a few lines later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
much improved, thanks!
Thanks for the comments! I've pushed by current changes based on the comments that are there if you would like to take another look. Quick question -- how and where do I implement unit tests? |
@CatFish47 Try making a tests directory under qiita_pet/handlers and use the test framework in analysis_handlers/tests and api_proxy/tests as a template. Try to exercise this code. |
Closing as this has been superseded by #3295 |
Addresses issue #3275