## Downloading from an ImmuneDB Link
For a hosted ImmuneDB instance, you can download and load data directly from the website URL. Depending on the database size, the initial download may take some time. Once the data is downloaded, the cached copy is used on later calls unless it is explicitly deleted.

In [1]:
import hicutils.core.io as io

io.pull_immunedb_data(
    'https://myurl.com/immunedb',
    'mydb',
    'example_data_immunedb'
)

Unnamed: 0,clone_id,subject,v_gene,j_gene,functional,insertions,deletions,cdr3_nt,cdr3_num_nts,cdr3_aa,...,METADATA_sequencing_date,METADATA_sequencing_type,METADATA_species,METADATA_umi,METADATA_gad,METADATA_ia2,METADATA_iaa,METADATA_znt8,METADATA_date_hic_received,METADATA_collapse_name
8248,6311533,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCACACAGCTGGGTACGGTATAACAGTGGCTGGGGCTTTCACT...,51,CAHSWVRYNSGWGFHYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
9810,6326493,HPAP015,IGHV2-70|2-70D,IGHJ4,T,,,TGTGCACGGCCCCATGGCAGCAGTGGCTGGTACTACTTTGACTACTGG,48,CARPHGSSGWYYFDYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
8697,6315829,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCCAGGGGCCAGTGGCTGGCACCGAACCACTTTGACTACTGG,45,CARGQWLAPNHFDYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
7970,6308963,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCACACAGGGGCAGCAGCTGGGACTACTGG,33,CAHRGSSWDYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
8549,6314347,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCGCACAGTACGATACGATTTCAGTACTACTTTGACTCCTGG,45,CAHSTIRFQYYFDSW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4137,7029341,HPAP017,IGHV1-46,IGHJ3,T,,,TGTGCGGCAGTTCGTTACTATGATAGTAGTGGTTATTTTGCTGCCG...,87,CAAVRYYDSSGYFAAGDSDYGRAGAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
4135,7029336,HPAP017,IGHV1-46,IGHJ3,T,,,TGTGCGGCAGCAAATTACTATGATAGNAGTGGTTATTACCACTATG...,60,CAAANYYDXSGYYHYAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
4134,7029309,HPAP017,IGHV1-46,IGHJ3,T,,,TGTGCGAGAGATCTCTATGATAGTATTGGTTATTACCGGGCCGANG...,60,CARDLYDSIGYYRAXAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
4133,7029295,HPAP017,IGHV1-46,IGHJ3,T,,,NGTGCGAGAGACAAGTATAGTGGGAGCTACTACTTGTCCGATGCTT...,57,XARDKYSGSYYLSDAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
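
Because the cached copy is reused, forcing a fresh download means deleting it first. The sketch below assumes that the third argument to `pull_immunedb_data` (`'example_data_immunedb'` above) is the local directory holding the cached files, which this tutorial does not state outright. `clear_cached_data` is a hypothetical helper written for illustration and is not part of hicutils.

```python
import shutil
from pathlib import Path


def clear_cached_data(cache_dir):
    """Hypothetical helper: delete a local data directory if it exists.

    Returns True if a directory was removed, False otherwise.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


# Assumption: this directory is where the downloaded files were cached.
# Uncomment to remove it so that the next pull_immunedb_data call
# has to fetch the data again.
# clear_cached_data('example_data_immunedb')
```

The call is left commented out because it permanently deletes the directory and everything in it.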


## Importing from existing files with metadata in filenames
Alternatively, if you have existing files that were exported from ImmuneDB (either using `immunedb_export ... clones ...` or via the website), they can be imported directly. Take, for example, the files below:

In [2]:
%%bash
ls example_data_meta_in_names

HPAP015.T1D.pooled.tsv
HPAP017.Control.pooled.tsv


The files can be imported with the following:

In [3]:
import hicutils.core.io as io

# Specify that the metadata in the filename is the disease status.
# If there are multiple features separated with the _AND_ string,
# per the ImmuneDB specification, the second parameter should
# be a list of all features (e.g. for age and disease, ['age', 'disease']).
io.read_tsvs('example_data_meta_in_names', ['disease'])

Unnamed: 0,clone_id,subject,v_gene,j_gene,functional,insertions,deletions,cdr3_nt,cdr3_num_nts,cdr3_aa,...,copies,germline,parent_id,avg_v_identity,top_copy_seq,copies_fraction,copies_percent,shm,clones,disease
16548,6310562,HPAP015,IGHV2-5,IGHJ4|5,T,,,TGTGCACGTGCGCGGGGGGCTTATTGG,27,CARARGAYW,...,41,CAGGTCACCTTGAAGGAGTCTGGTCCT---GCGCTGGTGAAACCCA...,,0.955849,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000953,0.095316,4.415122,1,T1D
16771,6311533,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCACACAGCTGGGTACGGTATAACAGTGGCTGGGGCTTTCACT...,51,CAHSWVRYNSGWGFHYW,...,34,CAGATCACCTTGAAGGAGTCTGGTCCT---ACGCTGGTGAAACCCA...,,0.988600,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000790,0.079042,1.140000,1,T1D
19430,6326493,HPAP015,IGHV2-70|2-70D,IGHJ4,T,,,TGTGCACGGCCCCATGGCAGCAGTGGCTGGTACTACTTTGACTACTGG,48,CARPHGSSGWYYFDYW,...,31,CAGGTCACCTTGAAGGAGTCTGGTCCT---GCGCTGGTGAAACCCA...,,0.956600,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000721,0.072068,4.340000,1,T1D
17713,6315829,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCCAGGGGCCAGTGGCTGGCACCGAACCACTTTGACTACTGG,45,CARGQWLAPNHFDYW,...,30,CAGATCACCTTGAAGGAGTCTGGTCCT---ACGCTGGTGAAACCCA...,,0.953710,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000697,0.069743,4.629000,1,T1D
7648,6262779,HPAP015,IGHV1-3,IGHJ4,T,,,TGTGCGAGAGCCGTGGAGAATCATTTTGACTGGTTAAGTAACTACTGG,48,CARAVENHFDWLSNYW,...,30,CAGGTCCAGCTTGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTG...,,0.940033,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000697,0.069743,5.996667,1,T1D
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8487,7016857,HPAP017,IGHV1-3,IGHJ3,F,,,TGNNCGAGACAGGGTGCGTAGCAGTGGCTGGTACTGTGGGGGGGGG...,63,XXRQGA*QWLVLWGGDAFDIW,...,1,CAGGTCCAGCTTGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTG...,,0.967300,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000020,0.001979,3.270000,1,Control
8488,7016859,HPAP017,IGHV1-3,IGHJ3,T,,,TGTGCGAGAGTCATGGTGGGTTATAGTGGCTACGGAGGTNNCTACG...,75,CARVMVGYSGYGGXYXVSGYAFDIW,...,1,CAGGTCCAGCTTGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTG...,,0.972100,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000020,0.001979,2.790000,1,Control
8492,7016881,HPAP017,IGHV1-3,IGHJ3,T,,,TGTGCGAGAGGGGGTTNTCGGCAGAGGGTGGCGAATTACTNTGGTT...,72,CARGGXRQRVANYXGSGRGAFDIW,...,1,CAGGTCCAGCTTGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTG...,,0.958100,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000020,0.001979,4.190000,1,Control
8493,7016885,HPAP017,IGHV1-3,IGHJ3,T,,,TGTGCGAGAGTATCCAGCTATGGTTGGGAAAGTGCAGGGCCTGATG...,60,CARVSSYGWESAGPDAFDXW,...,1,CAGGTCCAGCTTGTGCAGTCTGGGGCT---GAGGTGAAGAAGCCTG...,,0.953500,NNNNNNNNNNNNNNNNNNNNNNNNNNN---NNNNNNNNNNNNNNNN...,0.000020,0.001979,4.650000,1,Control
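
To make the filename convention concrete, here is a minimal sketch of how a name such as `HPAP015.T1D.pooled.tsv` could be split into a subject and its metadata values. It assumes the layout `<subject>.<values>.pooled.tsv` seen in the listing above, with multiple values joined by `_AND_`. `parse_filename_metadata` is a hypothetical helper, not the hicutils implementation, and the two-feature filename is invented for illustration.

```python
from pathlib import Path


def parse_filename_metadata(filename, features):
    """Hypothetical helper: map the metadata field of a filename to feature names.

    Assumes names shaped like '<subject>.<values>.pooled.tsv', where
    <values> holds one value per feature, joined by '_AND_'.
    """
    subject, values, *_ = Path(filename).name.split('.')
    parts = values.split('_AND_')
    if len(parts) != len(features):
        raise ValueError(
            f'{filename}: expected {len(features)} values, found {len(parts)}'
        )
    return {'subject': subject, **dict(zip(features, parts))}


# A single feature, as in the example directory above
single = parse_filename_metadata('HPAP015.T1D.pooled.tsv', ['disease'])

# Two features; this filename is made up to show the _AND_ separator
double = parse_filename_metadata(
    'HPAP015.29_AND_T1D.pooled.tsv', ['age', 'disease']
)
```

Values parsed this way are strings, so a numeric feature such as age would still need converting before use.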


## Importing from existing replicate files and associated metadata file

Finally, you can load individual replicate files as long as there is an associated metadata file with a `replicate_name` column followed by the metadata for each file. For example, here are the files:

In [13]:
%%bash
ls example_data_immunedb

HPAP015.IgH_HPAP015_rep1_200p0ng.pooled.tsv
HPAP015.IgH_HPAP015_rep2_200p0ng.pooled.tsv
HPAP017.IgH_HPAP017_rep1_200p0ng.pooled.tsv
HPAP017.IgH_HPAP017_rep2_200p0ng.pooled.tsv
metadata.tsv


Here is the example metadata file (**this is only to show the file; you should not actually run this command**):

In [14]:
import pandas as pd
pd.read_csv('example_data_immunedb/metadata.tsv', sep='\t')

Unnamed: 0,replicate_name,aab,aab_summary,age,age_units,biological_sample,comments,date_sample_received,disease,dq8_call,...,sequencing_date,sequencing_type,species,umi,gad,ia2,iaa,znt8,date_hic_received,collapse_name
0,IgH_HPAP015_rep1_200p0ng,1,yes,29,years,HPAP015,,,T1D,yes,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
1,IgH_HPAP015_rep2_200p0ng,1,yes,29,years,HPAP015,,,T1D,yes,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
2,IgH_HPAP017_rep1_200p0ng,1,yes,30,years,HPAP017,,,Control,no,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
3,IgH_HPAP017_rep2_200p0ng,1,yes,30,years,HPAP017,,,Control,no,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
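
The link between each replicate file and its metadata row is the `replicate_name` value. As a rough illustration of that relationship, the sketch below joins a toy clone table to a toy metadata table with pandas and prefixes the metadata columns with `METADATA_`, matching the column names in the loaded output. The tables are small stand-ins built inline, and this is an assumption about the general idea, not the code hicutils actually runs.

```python
import pandas as pd

# Toy stand-in for rows read from two replicate files
clones = pd.DataFrame({
    'clone_id': [6311533, 7029341],
    'subject': ['HPAP015', 'HPAP017'],
    'replicate_name': [
        'IgH_HPAP015_rep1_200p0ng',
        'IgH_HPAP017_rep1_200p0ng',
    ],
})

# Toy stand-in for metadata.tsv, keeping only a few columns
metadata = pd.DataFrame({
    'replicate_name': [
        'IgH_HPAP015_rep1_200p0ng',
        'IgH_HPAP017_rep1_200p0ng',
    ],
    'disease': ['T1D', 'Control'],
    'age': [29, 30],
})

# Index the metadata by replicate name and prefix its columns,
# then attach the matching metadata row to every clone row
prefixed = metadata.set_index('replicate_name').add_prefix('METADATA_')
merged = clones.join(prefixed, on='replicate_name')

print(merged[['clone_id', 'METADATA_disease', 'METADATA_age']])
```

A replicate whose name is missing from the metadata table would end up with empty metadata columns in this sketch, so it is worth checking that every file has a matching row.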


Given these files, load the directory with:

In [15]:
import hicutils.core.io as io

io.read_directory('example_data_immunedb')

Unnamed: 0,clone_id,subject,v_gene,j_gene,functional,insertions,deletions,cdr3_nt,cdr3_num_nts,cdr3_aa,...,METADATA_sequencing_date,METADATA_sequencing_type,METADATA_species,METADATA_umi,METADATA_gad,METADATA_ia2,METADATA_iaa,METADATA_znt8,METADATA_date_hic_received,METADATA_collapse_name
8248,6311533,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCACACAGCTGGGTACGGTATAACAGTGGCTGGGGCTTTCACT...,51,CAHSWVRYNSGWGFHYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
9810,6326493,HPAP015,IGHV2-70|2-70D,IGHJ4,T,,,TGTGCACGGCCCCATGGCAGCAGTGGCTGGTACTACTTTGACTACTGG,48,CARPHGSSGWYYFDYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
8697,6315829,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCCAGGGGCCAGTGGCTGGCACCGAACCACTTTGACTACTGG,45,CARGQWLAPNHFDYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
7970,6308963,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCACACAGGGGCAGCAGCTGGGACTACTGG,33,CAHRGSSWDYW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
8549,6314347,HPAP015,IGHV2-5,IGHJ4,T,,,TGTGCGCACAGTACGATACGATTTCAGTACTACTTTGACTCCTGG,45,CAHSTIRFQYYFDSW,...,2019-08-05,Bulk,human,0,neg,neg,pos,neg,2019-07-31,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4137,7029341,HPAP017,IGHV1-46,IGHJ3,T,,,TGTGCGGCAGTTCGTTACTATGATAGTAGTGGTTATTTTGCTGCCG...,87,CAAVRYYDSSGYFAAGDSDYGRAGAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
4135,7029336,HPAP017,IGHV1-46,IGHJ3,T,,,TGTGCGGCAGCAAATTACTATGATAGNAGTGGTTATTACCACTATG...,60,CAAANYYDXSGYYHYAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
4134,7029309,HPAP017,IGHV1-46,IGHJ3,T,,,TGTGCGAGAGATCTCTATGATAGTATTGGTTATTACCGGGCCGANG...,60,CARDLYDSIGYYRAXAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
4133,7029295,HPAP017,IGHV1-46,IGHJ3,T,,,NGTGCGAGAGACAAGTATAGTGGGAGCTACTACTTGTCCGATGCTT...,57,XARDKYSGSYYLSDAFDIW,...,2019-08-05,Bulk,human,0,pos,neg,neg,neg,2019-07-31,
