Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further revisions to batch validation #20

Closed
jamesaoverton opened this issue Jun 28, 2019 · 8 comments · Fixed by #21
Closed

Further revisions to batch validation #20

jamesaoverton opened this issue Jun 28, 2019 · 8 comments · Fixed by #21
Assignees

Comments

@jamesaoverton
Copy link
Owner

I'd like some more changes to the batch validation.

The cached data should be divided by assay type: data/fcsAnalyzed/SDY113.json, data/hai/SDY113.json. The current code will overwrite the JSON without checking the assay types.

Currently the script is re-authenticating for every fetch from ImmPort, but it should only authenticate once each time the script is executed.

In general, I don't want the commands to be interactive. I want everything scripted and version controlled, so it's reproducible. I'd like to pass the username and password using environment variables. I'd like to be able to call make build/hai.tsv (or something), and have a script look in HIPC_Studies.tsv to get all the right SDY accessions, then process all those studies.

For interactive scripts, having an option for each input (e.g. --studiesinfo build/HIPC_Studies.tsv) is good. For scripts used in Makefiles I prefer the convention of specifying the required files in the right order for a Make task, and then just calling the list of inputs and the output:

result.tsv: src/script.py build/input1.tsv build/input2.tsv
    $^ $@

and so

$ make result.tsv
src/script.py build/input1.tsv build/input2.tsv result.tsv
$

The main advantage is avoiding duplication of the arguments. I also like make to notice if the script has been updated.

@lmcmicu
Copy link
Collaborator

lmcmicu commented Jun 29, 2019

Hi James, ok for all of these. In regards to the first point, though, it isn't quite true that the script will overwrite the JSON withoug checking the assay type. In the current version of the script, if I ask for (for example) SDY113 and SDY44, two json files will be generated in the build directory corresponding to each.

Then further calls of the script which ask for those ones will not overwrite the old json files.

@lmcmicu
Copy link
Collaborator

lmcmicu commented Jun 29, 2019

E.g., here is a case where SDY113 is already cached, but SDY200 is not:

Enter a list of Cytometry studies to validate (e.g. SDY113) separated by whitespace: SDY113 SDY200
Retrieved JSON data for SDY113 from cached file build/SDY113.json
Fetching JSON data for SDY200 from ImmPort
Retrieving authentication token from Immport ...
Sending request: https://api.immport.org/data/query/result/fcsAnalyzed?studyAccession=SDY200

@lmcmicu
Copy link
Collaborator

lmcmicu commented Jun 29, 2019

Also, just to clarify, the script currently can be run non-interactively by passing the credentials, etc. as command line arguments. It's just that currently this isn't being done via the Makefile, as I didn't want to save those credentials in the file.

I did not think of having them in environment variables, though. Clearly that's the right solution. Definitely I can do that. Thanks for that suggestion and for your other comments.

@jamesaoverton
Copy link
Owner Author

Thanks.

My point is that a single study often includes multiple assay types. If I ask for HAI for SDY123 and then later flow cytometry for SDY123, they both go to a single SDY123.json.

@lmcmicu
Copy link
Collaborator

lmcmicu commented Jun 29, 2019

Oh. That's part of a different (wider) issue, though. This script does not handle HAI but only cytometry. The script which handles hai and neutAbTiter is in hipc-validation, so it didn't occur to me to cache across repos.

You mentioned the possibility of at some point combining these scripts, and maybe even the repos, but I thought you said that was on hold for now.

Should I combine the scripts, or just the caching? If the scripts, which repo should the script be placed into? If just the caching, then I guess what needs to be done is that each script should be made aware of the other.

@jamesaoverton
Copy link
Owner Author

No, you’re right. I got mixed up.

@lmcmicu
Copy link
Collaborator

lmcmicu commented Jun 29, 2019

Ok.

I just realised, though, that it didn't occur to me to add the caching into the hipc-validation script. I only did it for cell-name-and-marker-validator (total oversight on my part, I guess just because the issue was opened only in this repo).

If we're not going to merge the scripts, then I should at least duplicate all of the changes that I'll be making for this issue (and for the previous one) in that repo as well.

Also, since the hipc-validation script has two different assay types, I should be distinguishing between them when caching the json, exactly as you said. I guess in cell-name-and-marker-validator that isn't strictly speaking necessary, but if the solution is general enough I guess it won't matter.

I will duplicate this issue in that repo.

@jamesaoverton
Copy link
Owner Author

Yes, it will be good to have caching in the other repo. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants