# Introduction
See https://github.com/malariagen/fits/issues/44

This notebook checks whether study changes made in SequenceScape have propagated through to mlwh and iRODS. The following changes were made previously:
- The following samples were moved from 1195-PF-TRAC2-DONDORP to 1180-PF-TRAC2-DONDORP in SequenceScape:
  - RCN03610
  - RCN06860
  - RCN06881
  - RCN06893
  - RCN06911
- Study IHTP_1131-PF-BN-BERTIN was renamed to 1131-PF-BJ-BERTIN

In [1]:
%run ../setup.ipynb

python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
numpy 1.15.4
scipy 1.1.0
pandas 0.23.4
numexpr 2.6.8
pysam 0.15.2
pysamstats 1.1.2
petlx 1.0.3
vcf 0.6.8
h5py 2.8.0
tables 3.4.4
zarr 2.2.1.dev126
scikit-allel 1.2.0


In [2]:
mlwh_conn = MySQLdb.connect(host='mlwh-db', port=3435, user='mlwh_malaria', passwd='Solaris&2015', db='mlwarehouse')

# Check if samples were moved from 1195-PF-TRAC2-DONDORP to 1180-PF-TRAC2-DONDORP

In [3]:
sql_query = 'SELECT \
        study.name as study_name, \
        study.id_study_lims as study_lims, \
        sample.supplier_name as sample, \
        iseq_product_metrics.id_run, \
        iseq_product_metrics.position, \
        iseq_product_metrics.tag_index \
    FROM \
        study, \
        iseq_flowcell, \
        sample, \
        iseq_product_metrics \
    WHERE \
        study.id_study_tmp = iseq_flowcell.id_study_tmp and \
        iseq_flowcell.id_sample_tmp = sample.id_sample_tmp and \
        iseq_flowcell.manual_qc = 1 and \
        iseq_product_metrics.id_iseq_flowcell_tmp = iseq_flowcell.id_iseq_flowcell_tmp and \
        sample.supplier_name in ("RCN03610", "RCN06860", "RCN06881", "RCN06893", "RCN06911"); \
'

df_trac2 = pd.read_sql(sql_query, con=mlwh_conn)    
print(df_trac2.shape)
df_trac2

(5, 6)


Unnamed: 0,study_name,study_lims,sample,id_run,position,tag_index
0,1195-PF-TRAC2_DONDORP,4531,RCN03610,23117,8,9
1,1195-PF-TRAC2_DONDORP,4531,RCN06860,23117,8,12
2,1195-PF-TRAC2_DONDORP,4531,RCN06881,23117,8,15
3,1195-PF-TRAC2_DONDORP,4531,RCN06893,23117,8,19
4,1195-PF-TRAC2_DONDORP,4531,RCN06911,23117,8,23
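The same check can be made explicit instead of being read off the table. The sketch below is a minimal, hypothetical helper (`samples_not_in_study` is not part of this notebook) that takes (sample, study name) pairs and returns the samples that are not in the expected study. The rows are copied from the output above.

```python
def samples_not_in_study(rows, expected_study):
    """Return the sorted sample names whose recorded study differs from expected_study.

    rows is an iterable of (sample, study_name) pairs.
    """
    return sorted(sample for sample, study in rows if study != expected_study)


# Rows copied from the query output above.
observed = [
    ("RCN03610", "1195-PF-TRAC2_DONDORP"),
    ("RCN06860", "1195-PF-TRAC2_DONDORP"),
    ("RCN06881", "1195-PF-TRAC2_DONDORP"),
    ("RCN06893", "1195-PF-TRAC2_DONDORP"),
    ("RCN06911", "1195-PF-TRAC2_DONDORP"),
]

# Samples that have not reached the study they were meant to move to.
not_moved = samples_not_in_study(observed, "1180-PF-TRAC2-DONDORP")
print(not_moved)
```

With the DataFrame from the query, the pairs could be supplied as `zip(df_trac2["sample"], df_trac2["study_name"])`.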


In [4]:
!imeta ls -d /seq/23117/23117_8#9.cram | grep -A 1 'attribute: study$'

attribute: study
value: 1195-PF-TRAC2_DONDORP
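The `grep -A 1 'attribute: study$'` pattern works, but it depends on the exact attribute name ending the line. As an alternative, the sketch below parses `imeta ls -d` text into (attribute, value) pairs so the study can be looked up by name. `parse_imeta_avus` is a hypothetical helper. In the sample text, only the `attribute: study` and `value:` lines come from the output above; the header, `units:` and `----` lines and the `study_id` entry are assumptions about the full layout.

```python
def parse_imeta_avus(text):
    """Parse `imeta ls -d` text output into a list of (attribute, value) pairs."""
    avus = []
    attribute = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator lines such as "----" carry no key
        key, value = key.strip(), value.strip()
        if key == "attribute":
            attribute = value
        elif key == "value" and attribute is not None:
            avus.append((attribute, value))
            attribute = None
    return avus


# Sample text: the study attribute and value are from the output above;
# the other lines are an assumed layout, included to show that similarly
# named attributes are not confused with "study".
sample_output = """AVUs defined for dataObj /seq/23117/23117_8#9.cram:
attribute: study
value: 1195-PF-TRAC2_DONDORP
units:
----
attribute: study_id
value: 4531
units:
"""

study_values = [v for a, v in parse_imeta_avus(sample_output) if a == "study"]
print(study_values)
```

In the notebook, the text could be captured with `subprocess.run(["imeta", "ls", "-d", path], stdout=subprocess.PIPE, universal_newlines=True).stdout`, which works on the Python 3.6 kernel used here.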


In [5]:
sql_query = 'SELECT * from study where study.name like "%1195%"'

df_1195 = pd.read_sql(sql_query, con=mlwh_conn)    
print(df_1195.shape)
df_1195

(1, 40)


Unnamed: 0,id_study_tmp,id_lims,uuid_study_lims,id_study_lims,last_updated,recorded_at,deleted_at,created,name,reference_genome,...,data_release_delay_reason,remove_x_and_autosomes,aligned,separate_y_chromosome_data,data_access_group,prelim_id,hmdmc_number,data_destination,s3_email_list,data_deletion_period
0,4487,SQSCP,1158db20-c088-11e6-bf8a-68b59976a384,4531,2018-08-22 14:41:11,2018-08-22 14:41:11,,2016-12-12 16:28:52,1195-PF-TRAC2_DONDORP,Plasmodium_falciparum (3D7_Jan16v3),...,,0,1,0,,,,,,


# Check if study IHTP_1131-PF-BN-BERTIN was renamed to 1131-PF-BJ-BERTIN

In [6]:
sql_query = 'SELECT * from study where study.name like "%1131%"'

df_1131 = pd.read_sql(sql_query, con=mlwh_conn)    
print(df_1131.shape)
df_1131

(2, 40)


Unnamed: 0,id_study_tmp,id_lims,uuid_study_lims,id_study_lims,last_updated,recorded_at,deleted_at,created,name,reference_genome,...,data_release_delay_reason,remove_x_and_autosomes,aligned,separate_y_chromosome_data,data_access_group,prelim_id,hmdmc_number,data_destination,s3_email_list,data_deletion_period
0,2567,SQSCP,62da6cf0-a6cb-11e2-8985-68b59976a382,2598,2017-05-11 12:32:00,2017-05-11 12:32:00,,2013-04-16 19:25:24,ILC 1131 Nosocomial Klebsiella pneumoniae with...,Klebsiella_pneumoniae (KCTC_2242),...,,0,1,0,,,,,,
1,3365,SQSCP,0dd36ab0-6c20-11e4-8102-68b59976a382,3398,2019-01-14 16:06:28,2019-01-14 16:06:28,,2014-11-14 17:02:45,1131-PF-BJ-BERTIN,Plasmodium_falciparum (3D7_Jan16v3),...,,0,1,0,,,,,,


In [7]:
sql_query = 'SELECT \
        study.name as study_name, \
        study.id_study_lims as study_lims, \
        sample.supplier_name as sample, \
        iseq_product_metrics.id_run, \
        iseq_product_metrics.position, \
        iseq_product_metrics.tag_index \
    FROM \
        study, \
        iseq_flowcell, \
        sample, \
        iseq_product_metrics \
    WHERE \
        study.id_study_tmp = iseq_flowcell.id_study_tmp and \
        iseq_flowcell.id_sample_tmp = sample.id_sample_tmp and \
        iseq_flowcell.manual_qc = 1 and \
        iseq_product_metrics.id_iseq_flowcell_tmp = iseq_flowcell.id_iseq_flowcell_tmp and \
        study.name = "1131-PF-BJ-BERTIN"; \
'

df_1131_samples = pd.read_sql(sql_query, con=mlwh_conn)
print(df_1131_samples.shape)
df_1131_samples.head()

(420, 6)


Unnamed: 0,study_name,study_lims,sample,id_run,position,tag_index
0,1131-PF-BJ-BERTIN,3398,QT0001-C,15966,1,1
1,1131-PF-BJ-BERTIN,3398,QT0002-C,15966,1,2
2,1131-PF-BJ-BERTIN,3398,QT0004-C,15966,1,3
3,1131-PF-BJ-BERTIN,3398,QT0005-C,15966,1,4
4,1131-PF-BJ-BERTIN,3398,QT0005-CW,15966,1,5


In [8]:
df_1131_samples.tail()

Unnamed: 0,study_name,study_lims,sample,id_run,position,tag_index
415,1131-PF-BJ-BERTIN,3398,SPT16783,23720,8,18
416,1131-PF-BJ-BERTIN,3398,SPT16768,23720,8,19
417,1131-PF-BJ-BERTIN,3398,SPT16776,23720,8,20
418,1131-PF-BJ-BERTIN,3398,SPT16769,23720,8,21
419,1131-PF-BJ-BERTIN,3398,SPT16777,23720,8,22
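Cells 3 and 7 run the same four-table join and differ only in the final filter. The sketch below shows one way to factor that out with explicit `JOIN`s and query parameters instead of values embedded in the SQL string. It runs against an in-memory SQLite database so that it is self-contained: the table and column names follow the queries above, the study row and run/lane/tag values come from the outputs above, and the `id_sample_tmp` and `id_iseq_flowcell_tmp` values (1 and 10) are made up for the toy data. `query_by_samples` and `query_by_study` are hypothetical helper names.

```python
import sqlite3

BASE_QUERY = """
SELECT
    study.name AS study_name,
    study.id_study_lims AS study_lims,
    sample.supplier_name AS sample,
    iseq_product_metrics.id_run,
    iseq_product_metrics.position,
    iseq_product_metrics.tag_index
FROM iseq_product_metrics
JOIN iseq_flowcell
    ON iseq_product_metrics.id_iseq_flowcell_tmp = iseq_flowcell.id_iseq_flowcell_tmp
JOIN sample ON iseq_flowcell.id_sample_tmp = sample.id_sample_tmp
JOIN study ON iseq_flowcell.id_study_tmp = study.id_study_tmp
WHERE iseq_flowcell.manual_qc = 1
"""


def query_by_samples(n_samples, placeholder="?"):
    """Build the query filtered on n_samples supplier names."""
    marks = ", ".join([placeholder] * n_samples)
    return BASE_QUERY + "  AND sample.supplier_name IN ({})".format(marks)


def query_by_study(placeholder="?"):
    """Build the query filtered on a single study name."""
    return BASE_QUERY + "  AND study.name = {}".format(placeholder)


# Toy database with one row per table; the *_tmp ids 1 and 10 are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (id_study_tmp INTEGER, id_study_lims INTEGER, name TEXT);
CREATE TABLE sample (id_sample_tmp INTEGER, supplier_name TEXT);
CREATE TABLE iseq_flowcell (
    id_iseq_flowcell_tmp INTEGER, id_study_tmp INTEGER,
    id_sample_tmp INTEGER, manual_qc INTEGER);
CREATE TABLE iseq_product_metrics (
    id_iseq_flowcell_tmp INTEGER, id_run INTEGER,
    position INTEGER, tag_index INTEGER);
INSERT INTO study VALUES (4487, 4531, '1195-PF-TRAC2_DONDORP');
INSERT INTO sample VALUES (1, 'RCN03610');
INSERT INTO iseq_flowcell VALUES (10, 4487, 1, 1);
INSERT INTO iseq_product_metrics VALUES (10, 23117, 8, 9);
""")

samples = ["RCN03610", "RCN06860"]
rows_by_sample = conn.execute(query_by_samples(len(samples)), samples).fetchall()
rows_by_study = conn.execute(query_by_study(), ["1195-PF-TRAC2_DONDORP"]).fetchall()
print(rows_by_sample)
print(rows_by_study)
```

SQLite uses `?` placeholders, whereas MySQLdb uses `%s`, so against mlwh the call would look like `pd.read_sql(query_by_samples(len(samples), "%s"), con=mlwh_conn, params=samples)`.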


In [9]:
!imeta ls -d /seq/15966/15966_1#1.cram | grep -A 1 'attribute: study$'

attribute: study
value: 1131-PF-BJ-BERTIN


In [10]:
# What study was this assigned to in Pf 6.2?
!grep SPT16777 /nfs/team112_internal/production_files/Pf/6_2/pf_62_irods_manifest_20181023.txt

/lustre/scratch118/malaria/team112/pipelines/setups/pf_62/input/23027_5#1.cram	1131-PF-BJ-BERTIN	SPT16777	23027_5#1	2929816	1	/seq/23027/23027_5#1.cram	3398STDY6950148	5833	3398	IHTP_1131-PF-BN-BERTIN	23027	5	1.0	2017-07-18 12:54:37	1.0	qc complete	HX1	HiSeqX	150	450.0	450.0	98.95	23027_5#1.cram	197117638.0	ERR2094526	
/lustre/scratch118/malaria/team112/pipelines/setups/pf_62/input/23720_8#22.cram	1131-PF-BJ-BERTIN	SPT16777	23720_8#22	3765878	1	/seq/23720/23720_8#22.cram	3398STDY6950148	5833	3398	IHTP_1131-PF-BN-BERTIN	23720	8	22.0	2017-09-18 12:01:42	1.0	qc complete	HX8	HiSeqX	150	450.0	450.0	97.44	23720_8#22.cram	251321129.0	ERR2193441	
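Each manifest line above contains a study name in more than one tab-separated field. The sketch below lists those fields by position. `fields_containing` is a hypothetical helper, and because the manifest header is not shown here, the fields are identified by index only. The example line uses the first 13 fields of the first line of output above.

```python
def fields_containing(line, needle):
    """Return (index, value) for each tab-separated field that contains needle."""
    fields = line.rstrip("\n").split("\t")
    return [(i, field) for i, field in enumerate(fields) if needle in field]


# First 13 fields of the first manifest line shown above.
manifest_line = "\t".join([
    "/lustre/scratch118/malaria/team112/pipelines/setups/pf_62/input/23027_5#1.cram",
    "1131-PF-BJ-BERTIN",
    "SPT16777",
    "23027_5#1",
    "2929816",
    "1",
    "/seq/23027/23027_5#1.cram",
    "3398STDY6950148",
    "5833",
    "3398",
    "IHTP_1131-PF-BN-BERTIN",
    "23027",
    "5",
])

study_fields = fields_containing(manifest_line, "BERTIN")
print(study_fields)
```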


# Conclusion
Sonia asked core to correct the study name from IHTP_1131-PF-BN-BERTIN to 1131-PF-BJ-BERTIN. The change appears to have propagated correctly: the mlwh study table now shows 1131-PF-BJ-BERTIN, and the iRODS metadata for the file checked (15966_1#1.cram) carries the same name.

Sonia had earlier requested that 5 samples be moved from study 1195-PF-TRAC2-DONDORP to study 1180-PF-TRAC2-DONDORP. These changes do not appear to have propagated: mlwh still lists all 5 samples under 1195-PF-TRAC2_DONDORP, and the iRODS metadata for 23117_8#9.cram still has that study name. This might be because the original study name was actually 1195-PF-TRAC2_DONDORP (note the underscore rather than a hyphen before DONDORP).
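One way to catch an underscore/hyphen mismatch like this before a change request is raised is to compare study names after normalising separators. The sketch below is a minimal illustration; `normalise_study_name` and `same_study_modulo_separators` are hypothetical helpers, and treating `_` and `-` as equivalent is an assumption that suits this comparison only.

```python
def normalise_study_name(name):
    """Strip whitespace, upper-case, and treat underscores as hyphens."""
    return name.strip().replace("_", "-").upper()


def same_study_modulo_separators(a, b):
    """True if two study names differ only in case, padding, or '_' versus '-'."""
    return normalise_study_name(a) == normalise_study_name(b)


requested = "1195-PF-TRAC2-DONDORP"  # name used in the request
recorded = "1195-PF-TRAC2_DONDORP"   # name found in mlwh and iRODS

print(requested == recorded)
print(same_study_modulo_separators(requested, recorded))
```

An exact comparison treats these as different studies, while the normalised comparison flags them as a likely match that needs checking by hand.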