Add duplicates and tube_ids feature #3291

CatFish47 · 2023-05-19T01:57:01Z

Addresses issue #3275

Adds a duplicates column formatted like "SAMPLE_NAME × 2" when there are 2 copies of "SAMPLE_NAME"
Removes all duplicates from the other columns (blanks, matching, extra)
Adds checking for tube_ids. If tube_ids are detected, validates against the tube_ids and adds the tube id after the sample name in the list (e.g. "SAMPLE_NAME (tube_id)")

coveralls · 2023-05-19T02:28:19Z

coverage: 92.894% (-0.03%) from 92.928% when pulling 5843087 on CatFish47:sample-val-tube into aac0ec7 on qiita-spots:dev.

charles-cowart

Please consider my comments and add a unit test for the functionality that you wrote. The test should include BLANKs in upper- and lower-case as well as normal sample names. It should also include sample names both with and without the Qiita ID prepended.
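As a sketch of how tests can feed names both with and without the Qiita ID prefix, a small normalizer puts both forms into one shape before comparison. `strip_qiita_prefix` is a hypothetical name, not part of this PR. It checks for the full `f'{qid}.'` prefix (dot included) rather than `qid` alone, so a name like `11.x` is left untouched when `qid` is `'1'`.

```python
def strip_qiita_prefix(name: str, qid: str) -> str:
    """Return `name` without a leading "<qid>." prefix, if one is present.

    Hypothetical helper: the check includes the dot, so "11.x" is not
    mistaken for a prefixed name when qid is "1".
    """
    prefix = f"{qid}."
    if name.startswith(prefix):
        # Slice off exactly the prefix instead of replacing the first match
        # anywhere in the string.
        return name[len(prefix):]
    return name
```

For example, with `qid = "1"`, both `"1.SAMPLE_A"` and `"SAMPLE_A"` normalize to `"SAMPLE_A"`. On Python 3.9+, `name.removeprefix(f"{qid}.")` does the same thing in one call.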

charles-cowart · 2023-05-23T15:53:41Z

qiita_pet/handlers/admin_processing_job.py

+                    snames[i] = f'{tube_ids_rev[sname]} ({sname})'
+
+        # Finds duplicates in the samples
+        seen = dict()


Try using collections.Counter. Once you have a Counter() object populated you can get the list of duplicates with a list comprehension:
list_of_dupes = [x for x in my_counter if my_counter[x] > 1]

The code will look cleaner, but more importantly list comprehensions and Counter() will be more efficient under the hood than the basic for loop/conditional implementation you have here.
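The suggested `Counter` approach could look like the sketch below. `find_duplicates` and `unique_in_order` are hypothetical names, not functions from the PR; the label format follows the PR description. Iterating over `counts.items()` is equivalent to the `[x for x in my_counter if my_counter[x] > 1]` comprehension above, just without a second lookup per key.

```python
from collections import Counter


def find_duplicates(snames):
    """Return labels like "SAMPLE_NAME × 2" for names entered more than once.

    Counter is a dict subclass, so keys keep the order in which each name
    was first seen; labels therefore follow the order of entry.
    """
    counts = Counter(snames)
    return [f"{name} × {n}" for name, n in counts.items() if n > 1]


def unique_in_order(snames):
    """Drop repeated names while keeping first-seen order (for the other columns)."""
    return list(dict.fromkeys(snames))
```

For example, `find_duplicates(["A", "B", "A", "C", "B", "A"])` returns `["A × 3", "B × 2"]`, and `unique_in_order(["A", "B", "A", "C"])` returns `["A", "B", "C"]`.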

charles-cowart · 2023-05-23T15:59:35Z

qiita_pet/handlers/admin_processing_job.py

        for i, qsname in enumerate(qsnames):
            if qsname.startswith(qid):
                qsnames[i] = qsname.replace(f'{qid}.', "", 1)

+        # Adds tube ids to a dict with key as tube id and value as qsname
+        tube_ids_dict = dict()


Avoid naming variables in terms of their data-structure (something_list, something_dict). They nearly always can be given a more descriptive name.

In this case, would tube_id_lookup be an appropriate name for the variable? Should I also rename the tube_ids_rev variable, then?

Sure, those sound good.

charles-cowart · 2023-05-23T16:02:22Z

qiita_pet/handlers/admin_processing_job.py

        qid = self.get_argument("qid")
        snames = self.get_argument("snames").split()

+        # Get study given qiita id
+        st = Study(qid).sample_template


st is not a properly descriptive name for a variable, especially one that plays an important role some ten lines later in the code.

Would changing the variable name to study work better, or would a variable name like qt_study be more appropriate?

study is fine.

charles-cowart · 2023-05-23T16:05:31Z

qiita_pet/handlers/admin_processing_job.py

        for i, qsname in enumerate(qsnames):
            if qsname.startswith(qid):
                qsnames[i] = qsname.replace(f'{qid}.', "", 1)

+        # Adds tube ids to a dict with key as tube id and value as qsname


When you're munging data like this to get it into the proper shape you need, it often seems intuitive to the writer, but to later readers it can appear somewhat opaque. In these cases it's important to write comments that communicate why you're doing it and what you hope to achieve. The above comment is just an English-language summary of the code below it, which I can already read.

Would this be appropriate:

# Creates a way to access a tube_id by its corresponding sample name
# and vice versa, which is needed to add the tube_id in parentheses
# after a sample name a few lines later

much improved, thanks!
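The two-way lookup that comment describes might be sketched as follows. `build_tube_id_lookups`, `annotate_tube_ids`, and `sample_by_tube_id` are hypothetical names (only `tube_id_lookup` comes from the thread), and the `tube_ids_by_sample` input stands in for however the handler actually reads the tube_id column from the sample template. The reverse mapping assumes tube ids are unique; a repeated tube id would silently keep only the last sample.

```python
def build_tube_id_lookups(tube_ids_by_sample):
    """Return (sample name -> tube_id, tube_id -> sample name) lookups.

    `tube_ids_by_sample` may be a dict or an iterable of (sample, tube_id) pairs.
    """
    tube_id_lookup = dict(tube_ids_by_sample)
    # Invert the mapping so an entered tube id can be traced back to its sample.
    sample_by_tube_id = {tube: sample for sample, tube in tube_id_lookup.items()}
    return tube_id_lookup, sample_by_tube_id


def annotate_tube_ids(snames, sample_by_tube_id):
    """Rewrite entered tube ids as "SAMPLE_NAME (tube_id)"; leave other names as-is."""
    return [
        f"{sample_by_tube_id[name]} ({name})" if name in sample_by_tube_id else name
        for name in snames
    ]
```

For example, with lookups built from `{"S1": "T001", "S2": "T002"}`, `annotate_tube_ids(["T001", "S3"], sample_by_tube_id)` returns `["S1 (T001)", "S3"]`.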

CatFish47 · 2023-05-27T00:34:42Z

Thanks for the comments! I've pushed my current changes based on the comments that are there if you would like to take another look. Quick question: how and where do I implement unit tests?

charles-cowart · 2023-05-31T20:33:20Z

@CatFish47 Try making a tests directory under qiita_pet/handlers and use the test framework in analysis_handlers/tests and api_proxy/tests as a template. Try to exercise this code.
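A minimal sketch of the test cases requested earlier (BLANKs in upper- and lower-case alongside normal sample names) is below. It uses the standard `unittest` module rather than the Tornado-based framework in `analysis_handlers/tests`, and `split_blanks` is a hypothetical helper that assumes a blank is any name whose upper-cased form starts with `"BLANK"`; the real handler's rule may differ.

```python
import unittest


def split_blanks(snames):
    """Hypothetical helper: separate blanks from regular sample names.

    Assumes a blank is any name whose upper-cased form starts with "BLANK".
    """
    blanks = [s for s in snames if s.upper().startswith("BLANK")]
    others = [s for s in snames if not s.upper().startswith("BLANK")]
    return blanks, others


class SplitBlanksTests(unittest.TestCase):
    def test_blanks_in_either_case(self):
        blanks, others = split_blanks(["BLANK.1", "blank.2", "Blank.3", "SAMPLE_A"])
        self.assertEqual(blanks, ["BLANK.1", "blank.2", "Blank.3"])
        self.assertEqual(others, ["SAMPLE_A"])

    def test_only_regular_samples(self):
        blanks, others = split_blanks(["SAMPLE_A", "SAMPLE_B"])
        self.assertEqual(blanks, [])
        self.assertEqual(others, ["SAMPLE_A", "SAMPLE_B"])
```

Saved as a `test_*.py` file in the new tests directory, it can be run with `python -m unittest`.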

antgonza · 2023-06-02T18:57:46Z

Closing as this has been superseded by #3295

Add duplicates and tube_ids feature draft 1 · 5843087

Fix linting · 5dff0ba

charles-cowart self-requested a review May 19, 2023 04:31

charles-cowart requested changes May 23, 2023


Update code with requested changes · c22d6e8

antgonza linked an issue May 31, 2023 that may be closed by this pull request: Sample_Validation? #3275 (Closed)

antgonza closed this Jun 2, 2023
