Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add duplicates and tube_ids feature #3291

Closed
wants to merge 3 commits into from

Conversation

CatFish47
Copy link
Contributor

Addresses issue #3275

  • Adds a duplicate column formatted like "SAMPLE_NAME × 2" if there are 2 copies of "SAMPLE_NAME"
  • Removed all duplicates from all other columns (blanks, matching, extra)
  • Added checking for tube_ids. If tube_ids are detected, will validate on tube_ids and will add the tube id after the sample name in listen (e.g. "SAMPLE_NAME (tube_id)")

@coveralls
Copy link

coveralls commented May 19, 2023

Coverage Status

coverage: 92.894% (-0.03%) from 92.928%
when pulling 5843087 on CatFish47:sample-val-tube
into aac0ec7 on qiita-spots:dev.

@charles-cowart charles-cowart self-requested a review May 19, 2023 04:31
Copy link
Contributor

@charles-cowart charles-cowart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider my comments and add a unit-test for the functionality that you wrote. The test should include BLANKS in upper and lower-case as well as normal sample-names. It should also include sample-names with the qiita-id pre-pended and not.

snames[i] = f'{tube_ids_rev[sname]} ({sname})'

# Finds duplicates in the samples
seen = dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try using collections.Counter. Once you have a Counter() object populated you can get the list of duplicates with a list comprehension:
list_of_dupes = [x for x in my_counter if my_counter[x] > 1]

The code will look cleaner, but more importantly list comprehensions and Counter() will be more efficient under the hood than the basic for loop/conditional implementation you have here.

for i, qsname in enumerate(qsnames):
if qsname.startswith(qid):
qsnames[i] = qsname.replace(f'{qid}.', "", 1)

# Adds tube ids to a dict with key as tube id and value as qsname
tube_ids_dict = dict()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid naming variables in terms of their data-structure (something_list, something_dict). They nearly always can be given a more descriptive name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, would tube_id_lookup be an appropriate name for the variable? Would it be better if I also change the name of the tube_id_rev variable too as well then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, those sound good.

qid = self.get_argument("qid")
snames = self.get_argument("snames").split()

# Get study give qiita id
st = Study(qid).sample_template
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

st is a not a properly descriptive name for a variable, especially one that plays an important role some ten lines later in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would changing the variable name to study work better, or would a variable name like qt_study be more appropriate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

study is fine.

for i, qsname in enumerate(qsnames):
if qsname.startswith(qid):
qsnames[i] = qsname.replace(f'{qid}.', "", 1)

# Adds tube ids to a dict with key as tube id and value as qsname
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you're munging data like this to get it into the proper shape you need, it often seems intuitive to the writer, but to later readers it can appear somewhat opaque. In these cases it's important to write comments that communicate why you're doing it and what you hope to achieve. The above comment is just an English-language summary of the code below it, which I can already read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be appropriate:

# Creates a way to access a tube_id by its corresponding sample name
# and vice versa, which is important to adding tube_id in parentheses
# after a sample name a few lines later

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

much improved, thanks!

@CatFish47
Copy link
Contributor Author

Thanks for the comments! I've pushed by current changes based on the comments that are there if you would like to take another look. Quick question -- how and where do I implement unit tests?

@charles-cowart
Copy link
Contributor

@CatFish47 Try making a tests directory under qiita_pet/handlers and use the test framework in analysis_handlers/tests and api_proxy/tests as a template. Try to exercise this code.

@antgonza antgonza linked an issue May 31, 2023 that may be closed by this pull request
@antgonza
Copy link
Member

antgonza commented Jun 2, 2023

Closing as this has been superseded by #3295

@antgonza antgonza closed this Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sample_Validation?
4 participants