-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further revisions to batch validation #20
Comments
Hi James, ok for all of these. In regards to the first point, though, it isn't quite true that the script will overwrite the JSON withoug checking the assay type. In the current version of the script, if I ask for (for example) SDY113 and SDY44, two json files will be generated in the build directory corresponding to each. Then further calls of the script which ask for those ones will not overwrite the old json files. |
E.g., here is a case where SDY113 is already cached, but SDY200 is not:
|
Also, just to clarify, the script currently can be run non-interactively by passing the credentials, etc. as command line arguments. It's just that currently this isn't being done via the Makefile, as I didn't want to save those credentials in the file. I did not think of having them in environment variables, though. Clearly that's the right solution. Definitely I can do that. Thanks for that suggestion and for your other comments. |
Thanks. My point is that a single study often includes multiple assay types. If I ask for HAI for SDY123 and then later flow cytometry for SDY123, they both go to a single SDY123.json. |
Oh. That's part of a different (wider) issue, though. This script does not handle HAI but only cytometry. The script which handles You mentioned the possibility of at some point combining these scripts, and maybe even the repos, but I thought you said that was on hold for now. Should I combine the scripts, or just the caching? If the scripts, which repo should the script be placed into? If just the caching, then I guess what needs to be done is that each script should be made aware of the other. |
No, you’re right. I got mixed up. |
Ok. I just realised, though, that it didn't occur to me to add the caching into the If we're not going to merge the scripts, then I should at least duplicate all of the changes that I'll be making for this issue (and for the previous one) in that repo as well. Also, since the I will duplicate this issue in that repo. |
Yes, it will be good to have caching in the other repo. Thanks. |
I'd like some more changes to the batch validation.
The cached data should be divided by assay type:
data/fcsAnalyzed/SDY113.json
,data/hai/SDY113.json
. The current code will overwrite the JSON without checking the assay types.Currently the script is re-authenticating for every fetch from ImmPort, but it should only authenticate once each time the script is executed.
In general, I don't want the commands to be interactive. I want everything scripted and version controlled, so it's reproducible. I'd like to pass the username and password using environment variables. I'd like to be able to call
make build/hai.tsv
(or something), and have a script look inHIPC_Studies.tsv
to get all the right SDY accessions, then process all those studies.For interactive scripts, having an option for each input (e.g.
--studiesinfo build/HIPC_Studies.tsv
) is good. For scripts used inMakefiles
I prefer the convention of specifying the required files in the right order for a Make task, and then just calling the list of inputs and the output:and so
The main advantage is avoiding duplication of the arguments. I also like
make
to notice if the script has been updated.The text was updated successfully, but these errors were encountered: