This repository has been archived by the owner on Jan 23, 2021. It is now read-only.

Populate Alfresco study using sequencescape-alfresco study mapping file #44

Open
podpearson opened this issue Dec 17, 2018 · 8 comments

@podpearson

In many cases there is a 1-to-1 mapping between sequencescape study and alfresco study. In some cases, the names are identical. In some cases, the alfresco study can be (and has been) inferred from the sequencescape study name (e.g. the "IHTP_PWGS 1134-PF-ML-CONWAY" study is Alfresco study 1134-PF-ML-CONWAY). In some cases, the two are subtly different (IHTP_1131-PF-BN-BERTIN vs 1131-PF-BJ-BERTIN - note BN vs BJ). In some cases, domain knowledge is required (e.g. sequencescape study "Plasmodium HB3xDD2 progeny" maps to Alfresco study "1041-PF-US-FERDIG").
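To make the limits of rule-based inference concrete, here is a minimal sketch that pulls an Alfresco-style code out of a sequencescape study name. The regular expression and the function name `infer_alfresco_study` are assumptions made for illustration; they are not taken from FITS or SIMS, and the pattern is only a guess at the shape of the study codes quoted above.

```python
import re

# Assumed shape of an Alfresco study code, e.g. 1134-PF-ML-CONWAY.
# This pattern is a guess based on the examples in this issue only.
STUDY_CODE = re.compile(r"\d{4}-[A-Z]{2}-[A-Z0-9]{2,}-[A-Z0-9]+")


def infer_alfresco_study(sequencescape_study):
    """Return the first Alfresco-style code found in the name, or None."""
    match = STUDY_CODE.search(sequencescape_study)
    return match.group(0) if match else None


# The three example study names from this issue.
examples = [
    "IHTP_PWGS 1134-PF-ML-CONWAY",
    "IHTP_1131-PF-BN-BERTIN",
    "Plasmodium HB3xDD2 progeny",
]
inferred = {name: infer_alfresco_study(name) for name in examples}
```

Under these assumptions the first name is inferred correctly, the second yields 1131-PF-BN-BERTIN rather than the correct 1131-PF-BJ-BERTIN, and the third yields nothing at all, which is the gap a curated mapping would close.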

Rather than inferring based on some rule, a more complete and accurate method for populating "Alfresco study" from sequencescape study might be to use a mapping file. I previously created such a thing when I was building manifests. A symlink to the latest version can be found at /nfs/team112_internal/rp7/src/github/malariagen/SIMS/meta/mlwh/sequencescape_alfresco_study_mappings.txt.

Could we consider incorporating such a mapping file into the process of populating the "Alfresco study" tags?
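As a rough sketch of what this could look like, the snippet below loads a mapping into a dictionary and looks up the Alfresco study for a sequencescape study. The file layout (two tab-separated columns, sequencescape study first) and the function names are assumptions for illustration; the real layout of sequencescape_alfresco_study_mappings.txt may differ.

```python
import csv
import io


def load_study_mappings(handle):
    """Read (sequencescape_study, alfresco_study) pairs into a dict.

    Assumes two tab-separated columns; this layout is hypothetical.
    """
    mappings = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short, or comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        mappings[row[0].strip()] = row[1].strip()
    return mappings


def alfresco_study_for(sequencescape_study, mappings):
    """Return the mapped Alfresco study, or None if there is no entry."""
    return mappings.get(sequencescape_study)


# In-memory stand-in for the mapping file, using pairs quoted in this issue.
example_file = io.StringIO(
    "IHTP_1131-PF-BN-BERTIN\t1131-PF-BJ-BERTIN\n"
    "Plasmodium HB3xDD2 progeny\t1041-PF-US-FERDIG\n"
    "Team 112 R&D\t1089-R&D\n"
)
mappings = load_study_mappings(example_file)
bertin = alfresco_study_for("IHTP_1131-PF-BN-BERTIN", mappings)
unmapped = alfresco_study_for("Malaria R&D", mappings)
```

Returning None for an unmapped study keeps "no Alfresco study" distinguishable from a wrong guess, so unmapped samples can be listed and reviewed rather than silently tagged.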

@tnguyensanger

@sclaugoncalves Do you foresee any issues with always using the alfresco study as the sequencescape study going forward?

@sclaugoncalves

@tnguyensanger no issues, we have been doing this for a while now...

@sclaugoncalves

@tnguyensanger, @podpearson we can also update old study names in sequencescape to map 1 to 1 to alfresco

@tnguyensanger

> @tnguyensanger no issues, we have been doing this for a while now...

Sweet :).

@sclaugoncalves @podpearson There are over 10000 samples in FITS which are missing an Alfresco Study but are marked as R&D. See attached file for full list.

There is an entry in /nfs/team112_internal/rp7/src/github/malariagen/SIMS/meta/mlwh/sequencescape_alfresco_study_mappings.txt to map SequenceScape study <==> Alfresco study for "Team 112 R&D" <==> "1089-R&D".

Do we ever do R&D for other teams/labs?

How should we handle samples with SequenceScape study "Malaria Programme R&D" or "Malaria R&D" but an empty Alfresco study?

To see which samples fall under this category, try running this query in FITS:

select * from vw_pivot_sample where vw_pivot_sample.alfresco_study is null and vw_pivot_sample.sequenscape_study_name like '%R&D';

fits_r_and_d_samples_missing_alfresco_study.tsv.txt

@podpearson

Thanks @sclaugoncalves and @tnguyensanger . I have been considering for a while whether we should apply all the "exceptions" I have found (see https://github.com/malariagen/SIMS/tree/master/meta/mlwh) back to sequencescape. The more I think about this the more I think it is what we should do. Before doing this, I think we first need to understand whether this information would then feed through to other systems, particularly mlwh, iRODS and subtrack. I think we would also need to think about what audit trails we might need. Any thoughts gratefully received!

@podpearson

> Do we ever do R&D for other teams/labs?

Not that I'm aware of

> How should we handle samples with SequenceScape study "Malaria Programme R&D" or "Malaria R&D" but an empty Alfresco study?

I think in general these have empty Alfresco study because they are simply not associated with any Alfresco study. They are R&D samples, e.g. created by doing stuff in the lab to cultured lab strains, rather than samples received from partners.

@sclaugoncalves

@tnguyensanger yes, R&D samples are not assigned to an alfresco study...

In the past all groups in the programme were submitting R&D samples through our R&D study (Team 112), but when the number of samples started to increase we decided to split it. All other groups now submit to study Malaria R&D, but I guess there still might be some old samples from other groups in our study. We can move those to the correct study.

@podpearson agreed, we need to check with core whether any change will pass on to all systems.
As a small test, I asked core to correct the study name for study 1131 (your first comment here). I'll let you know when it's done and we can then check it...

@podpearson

@sclaugoncalves, after you requested core to correct the study name from IHTP_1131-PF-BN-BERTIN to 1131-PF-BJ-BERTIN, it seems that files have had the study name changed correctly in both mlwh and iRODS.

You had earlier requested that 5 samples (RCN03610, RCN06860, RCN06881, RCN06893, RCN06911) were moved from study 1195-PF-TRAC2-DONDORP to study 1180-PF-TRAC2-DONDORP. These changes don't appear to have propagated through, but this might be because the original study name was actually 1195-PF-TRAC2_DONDORP (note the underscore rather than hyphen before DONDORP). I'll forward on the original email about this.
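A small check along the following lines might catch this kind of near-miss before a change request is sent. It is a sketch only: `study_key` and `near_misses` are hypothetical helpers, and treating every run of non-alphanumeric characters as equivalent is an assumption, not an existing rule in sequencescape or FITS.

```python
import re


def study_key(name):
    """Normalise a study name so punctuation differences are ignored."""
    return re.sub(r"[^A-Z0-9]+", "-", name.upper()).strip("-")


def near_misses(requested, known_names):
    """Return known names that differ from `requested` only in punctuation or case."""
    key = study_key(requested)
    return [n for n in known_names if n != requested and study_key(n) == key]


# Names taken from this comment; the list of known studies is illustrative.
known = ["1195-PF-TRAC2_DONDORP", "1180-PF-TRAC2-DONDORP"]
candidates = near_misses("1195-PF-TRAC2-DONDORP", known)
```

Here `candidates` contains only the underscore variant, which would flag that the requested name does not exactly match any existing study.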

For details see https://github.com/malariagen/fits/blob/master/work/44_populating_alfresco_study/20190123_check_if_study_was_changed_in_mlwh_and_irods.ipynb
