Further revisions to batch validation #20

jamesaoverton · 2019-06-28T13:39:12Z

I'd like some more changes to the batch validation.

The cached data should be divided by assay type: data/fcsAnalyzed/SDY113.json, data/hai/SDY113.json. The current code will overwrite the JSON without checking the assay types.

Currently the script is re-authenticating for every fetch from ImmPort, but it should only authenticate once each time the script is executed.

In general, I don't want the commands to be interactive. I want everything scripted and version controlled, so it's reproducible. I'd like to pass the username and password using environment variables. I'd like to be able to call make build/hai.tsv (or something), and have a script look in HIPC_Studies.tsv to get all the right SDY accessions, then process all those studies.

For interactive scripts, having an option for each input (e.g. --studiesinfo build/HIPC_Studies.tsv) is good. For scripts used in Makefiles I prefer the convention of specifying the required files in the right order for a Make task, and then just calling the list of inputs and the output:

result.tsv: src/script.py build/input1.tsv build/input2.tsv
    $^ $@

and so

$ make result.tsv
src/script.py build/input1.tsv build/input2.tsv result.tsv
$

The main advantage is avoiding duplication of the arguments. I also like make to notice if the script has been updated.

The text was updated successfully, but these errors were encountered:

lmcmicu · 2019-06-29T13:37:07Z

Hi James, ok for all of these. In regards to the first point, though, it isn't quite true that the script will overwrite the JSON withoug checking the assay type. In the current version of the script, if I ask for (for example) SDY113 and SDY44, two json files will be generated in the build directory corresponding to each.

Then further calls of the script which ask for those ones will not overwrite the old json files.

lmcmicu · 2019-06-29T13:38:16Z

E.g., here is a case where SDY113 is already cached, but SDY200 is not:

Enter a list of Cytometry studies to validate (e.g. SDY113) separated by whitespace: SDY113 SDY200
Retrieved JSON data for SDY113 from cached file build/SDY113.json
Fetching JSON data for SDY200 from ImmPort
Retrieving authentication token from Immport ...
Sending request: https://api.immport.org/data/query/result/fcsAnalyzed?studyAccession=SDY200

lmcmicu · 2019-06-29T14:06:21Z

Also, just to clarify, the script currently can be run non-interactively by passing the credentials, etc. as command line arguments. It's just that currently this isn't being done via the Makefile, as I didn't want to save those credentials in the file.

I did not think of having them in environment variables, though. Clearly that's the right solution. Definitely I can do that. Thanks for that suggestion and for your other comments.

jamesaoverton · 2019-06-29T14:38:01Z

Thanks.

My point is that a single study often includes multiple assay types. If I ask for HAI for SDY123 and then later flow cytometry for SDY123, they both go to a single SDY123.json.

lmcmicu · 2019-06-29T15:17:58Z

Oh. That's part of a different (wider) issue, though. This script does not handle HAI but only cytometry. The script which handles hai and neutAbTiter is in hipc-validation, so it didn't occur to me to cache across repos.

You mentioned the possibility of at some point combining these scripts, and maybe even the repos, but I thought you said that was on hold for now.

Should I combine the scripts, or just the caching? If the scripts, which repo should the script be placed into? If just the caching, then I guess what needs to be done is that each script should be made aware of the other.

jamesaoverton · 2019-06-29T15:37:08Z

No, you’re right. I got mixed up.

lmcmicu · 2019-06-29T15:48:33Z

Ok.

I just realised, though, that it didn't occur to me to add the caching into the hipc-validation script. I only did it for cell-name-and-marker-validator (total oversight on my part, I guess just because the issue was opened only in this repo).

If we're not going to merge the scripts, then I should at least duplicate all of the changes that I'll be making for this issue (and for the previous one) in that repo as well.

Also, since the hipc-validation script has two different assay types, I should be distinguishing between them when caching the json, exactly as you said. I guess in cell-name-and-marker-validator that isn't strictly speaking necessary, but if the solution is general enough I guess it won't matter.

I will duplicate this issue in that repo.

jamesaoverton · 2019-06-29T16:57:03Z

Yes, it will be good to have caching in the other repo. Thanks.

jamesaoverton assigned lmcmicu Jun 28, 2019

lmcmicu mentioned this issue Jun 29, 2019

Mirror the revisions to batch validation that are being made in the cell-name-and-marker-validator repo jamesaoverton/hipc-validation#6

Closed

lmcmicu mentioned this issue Jul 12, 2019

Batch validation changes and refactoring of common module #21

Merged

jamesaoverton closed this as completed in #21 Jul 22, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Further revisions to batch validation #20

Further revisions to batch validation #20

jamesaoverton commented Jun 28, 2019

lmcmicu commented Jun 29, 2019 •

edited

Loading

lmcmicu commented Jun 29, 2019 •

edited

Loading

lmcmicu commented Jun 29, 2019

jamesaoverton commented Jun 29, 2019

lmcmicu commented Jun 29, 2019 •

edited

Loading

jamesaoverton commented Jun 29, 2019

lmcmicu commented Jun 29, 2019

jamesaoverton commented Jun 29, 2019

Further revisions to batch validation #20

Further revisions to batch validation #20

Comments

jamesaoverton commented Jun 28, 2019

lmcmicu commented Jun 29, 2019 • edited Loading

lmcmicu commented Jun 29, 2019 • edited Loading

lmcmicu commented Jun 29, 2019

jamesaoverton commented Jun 29, 2019

lmcmicu commented Jun 29, 2019 • edited Loading

jamesaoverton commented Jun 29, 2019

lmcmicu commented Jun 29, 2019

jamesaoverton commented Jun 29, 2019

lmcmicu commented Jun 29, 2019 •

edited

Loading

lmcmicu commented Jun 29, 2019 •

edited

Loading

lmcmicu commented Jun 29, 2019 •

edited

Loading