
Flu Pipeline Notes

VDB

Upload documents to VDB

  1. Download sequences and meta information from GISAID
  • In EpiFlu, set host to human, select HA as a required segment, and set Submission Date >= the date of the last upload to VDB
  • Ideally download about 5000 isolates at a time; you may need to split downloads by submission date
  • Download isolates as XLS, with dates in YYYY-MM-DD format
  • Download isolates as "Sequences (DNA) as FASTA"
    • Select all DNA
    • Set the FASTA header to 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab, so that each header reads:
    • DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | Submitting lab
  2. Move the files to fauna/data/ as gisaid_epiflu.xls and gisaid_epiflu.fasta.
  3. Upload to the vdb database
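Because the header layout is fixed by the download options above, the FASTA file can be sanity-checked before upload. This is a minimal Python 3 sketch, assuming the six fields are separated by "|" in exactly the order listed; parse_header, iter_headers and the field names are hypothetical helpers, not part of fauna.

```python
from collections import namedtuple

# Assumed field order, matching the EpiFlu header options selected above:
# DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | Submitting lab
HEADER_FIELDS = ("accession", "strain", "isolate_id", "segment", "passage", "submitting_lab")
Header = namedtuple("Header", HEADER_FIELDS)


def parse_header(line):
    """Split one FASTA header line into its six pipe-delimited fields."""
    # Drop the leading '>' and surrounding whitespace, then split on '|'.
    fields = [f.strip() for f in line.lstrip(">").strip().split("|")]
    if len(fields) != len(HEADER_FIELDS):
        raise ValueError("expected %d fields, got %d: %r"
                         % (len(HEADER_FIELDS), len(fields), line))
    return Header(*fields)


def iter_headers(path):
    """Yield a Header for every record in a FASTA file, skipping sequence lines."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield parse_header(line)
```

For a made-up header such as ">EPI000001 | A/Example/1/2019 | EPI_ISL_000001 | HA | E3 | Example Lab", parse_header returns a record whose segment is "HA". A ValueError usually means the header options were not set as described.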

Update documents in VDB

All of these updates are slow because each one runs over ~600k documents, so use them sparingly.

  • Update genetic grouping fields

    • python2 vdb/flu_update.py -db vdb -v flu --update_groupings
    • updates vtype, subtype, lineage
  • Update locations

    • python2 vdb/flu_update.py -db vdb -v flu --update_locations
    • updates division, country and region from location
  • Update passage_category fields

    • python2 vdb/flu_update.py -db vdb -v flu --update_passage_categories
    • updates passage_category based on the passage field
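Since each pass is slow, it can help to script only the passes you actually need. This is a minimal Python 3 sketch, assuming it is run from the fauna/ root with python2 on the PATH; UPDATE_FLAGS, update_command and run_updates are hypothetical helpers. By default they only print the commands.

```python
import subprocess

# The three update flags documented above; each pass scans every document in VDB.
UPDATE_FLAGS = ("--update_groupings", "--update_locations", "--update_passage_categories")


def update_command(flag, db="vdb", virus="flu"):
    """Build the argument list for one flu_update.py pass."""
    return ["python2", "vdb/flu_update.py", "-db", db, "-v", virus, flag]


def run_updates(flags=UPDATE_FLAGS, dry_run=True):
    """Print each requested update; when dry_run is False, also run it."""
    commands = [update_command(flag) for flag in flags]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check_call raises on a non-zero exit, so later passes are skipped.
            subprocess.check_call(cmd)
    return commands
```

For example, run_updates(flags=("--update_locations",), dry_run=False) reruns only the location pass.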

Download documents from VDB

  • python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2
  • python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm
  • python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic
  • python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam
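The four downloads differ only in lineage and file stem, so they can be generated from one table. This is a minimal Python 3 sketch, assuming it is run from the fauna/ root; LINEAGE_STEMS, vdb_download_commands and download_lineages are hypothetical names, and nothing is executed unless dry_run is False.

```python
import subprocess

# VDB lineage value -> output file stem, mirroring the four commands above.
LINEAGE_STEMS = [
    ("seasonal_h3n2", "h3n2"),
    ("seasonal_h1n1pdm", "h1n1pdm"),
    ("seasonal_vic", "vic"),
    ("seasonal_yam", "yam"),
]


def vdb_download_commands():
    """Build one flu_download.py call per seasonal lineage, all restricted to HA."""
    return [
        ["python2", "vdb/flu_download.py", "-db", "vdb", "-v", "flu",
         "--select", "locus:HA", "lineage:" + lineage, "--fstem", stem]
        for lineage, stem in LINEAGE_STEMS
    ]


def download_lineages(dry_run=True):
    """Print each command; when dry_run is False, also run it and stop on the first failure."""
    for cmd in vdb_download_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
```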

TDB

Upload documents to TDB

Raw tables from NIMR reports

  1. Convert the NIMR report PDFs to CSV files
  2. Move the CSV files to the subtype directory in fauna/data/
  3. Upload to the tdb database
  • python2 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem h3n2_nimr_titers
  • We recommend running with --preview first to confirm that strain names are parsed correctly before uploading
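The recommended preview step can be wired in front of the real upload. This is a minimal Python 3 sketch, assuming it is run from the fauna/ root and that --preview can simply be appended to the same command; nimr_upload_command and preview_then_upload are hypothetical helpers, and the confirm argument is there so the prompt can be replaced.

```python
import subprocess


def nimr_upload_command(subtype, fstem, preview=False):
    """Build the tdb/upload.py call for one NIMR table, optionally in --preview mode."""
    cmd = ["python2", "tdb/upload.py", "-db", "tdb", "-v", "flu",
           "--subtype", subtype, "--ftype", "flat", "--fstem", fstem]
    if preview:
        cmd.append("--preview")
    return cmd


def preview_then_upload(subtype, fstem, confirm=input):
    """Run a --preview pass, ask for confirmation, then upload for real."""
    subprocess.check_call(nimr_upload_command(subtype, fstem, preview=True))
    answer = confirm("Strain names look correct? Upload to tdb [y/N]: ")
    if answer.strip().lower() != "y":
        return False
    subprocess.check_call(nimr_upload_command(subtype, fstem))
    return True
```

For example, preview_then_upload("h3n2", "h3n2_nimr_titers") shows the parsed strains and only uploads after you answer "y".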

Flat files

  1. Move line-list tsv files to fauna/data/
  2. Upload to the tdb database with python2 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload

CDC files

  1. Move line-list tsv files to fauna/data/
  2. Upload HI titers to the cdc_tdb database with python2 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem HITest_Oct2018_to_Sep2019_titers
  3. Upload FRA titers to the cdc_tdb database with python2 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem FRA_Oct2018_to_Sep2019_titers
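A quick check that both CDC files are in place can save a failed upload. This is a minimal Python 3 sketch, assuming it is run from the fauna/ root and that each --fstem maps to data/<fstem>.tsv; the .tsv extension is only inferred from the "line-list tsv files" step, and CDC_FSTEMS, missing_inputs and cdc_upload_command are hypothetical names.

```python
import os

# The two CDC file stems used above. ASSUMPTION: inputs are data/<fstem>.tsv.
CDC_FSTEMS = ["HITest_Oct2018_to_Sep2019_titers", "FRA_Oct2018_to_Sep2019_titers"]


def missing_inputs(fstems, data_dir="data", ext=".tsv"):
    """Return the expected input paths under data_dir that do not exist yet."""
    paths = [os.path.join(data_dir, stem + ext) for stem in fstems]
    return [p for p in paths if not os.path.isfile(p)]


def cdc_upload_command(fstem):
    """Build the cdc_upload.py call for one CDC titer file."""
    return ["python2", "tdb/cdc_upload.py", "-db", "cdc_tdb", "-v", "flu",
            "--ftype", "flat", "--fstem", fstem]
```

If missing_inputs(CDC_FSTEMS) returns an empty list, the two upload commands can be run in order.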

Crick files

  1. Move Excel documents to fauna/data/
  2. Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H3N2HIs
  3. Run python2 tdb/crick_upload.py -db crick_tdb --assay_type fra --fstem H3N2VNs
  4. Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H1N1pdm09HIs
  5. Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BVicHIs
  6. Run python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BYamHIs
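The five Crick uploads differ only in assay type and file stem, so the steps above can be driven from a list. This is a minimal Python 3 sketch, assuming it is run from the fauna/ root; CRICK_FILES, crick_commands and upload_crick are hypothetical names, and commands are only printed unless dry_run is False.

```python
import subprocess

# (assay_type, fstem) pairs from the Crick steps above, in the same order.
CRICK_FILES = [
    ("hi", "H3N2HIs"),
    ("fra", "H3N2VNs"),
    ("hi", "H1N1pdm09HIs"),
    ("hi", "BVicHIs"),
    ("hi", "BYamHIs"),
]


def crick_commands(files=CRICK_FILES, db="crick_tdb"):
    """Build one crick_upload.py call per (assay_type, fstem) pair."""
    return [["python2", "tdb/crick_upload.py", "-db", db,
             "--assay_type", assay, "--fstem", fstem]
            for assay, fstem in files]


def upload_crick(dry_run=True):
    """Print each command; when dry_run is False, also run it and stop on the first failure."""
    for cmd in crick_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)
```

Keeping the pairs in a list makes it easy to add a new Excel file without touching the loop.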

NIID files

  1. Make sure NIID-Tokyo-WHO-CC/ is a sister directory to fauna/
  2. Upload all titers with python2 tdb/upload_all.py --sources niid -db niid_tdb

VIDRL files

  1. Make sure VIDRL-Melbourne-WHO-CC/ is a sister directory to fauna/
  2. Upload all titers with python2 tdb/upload_all.py --sources vidrl -db vidrl_tdb
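Both the NIID and VIDRL uploads depend on a repository sitting next to fauna/. This is a minimal Python 3 sketch of that layout check, assuming the directory names given above; SISTER_DIRS, sister_path and check_sources are hypothetical helpers, not part of upload_all.py.

```python
import os

# Source name -> directory that must sit next to fauna/, per the steps above.
SISTER_DIRS = {
    "niid": "NIID-Tokyo-WHO-CC",
    "vidrl": "VIDRL-Melbourne-WHO-CC",
}


def sister_path(fauna_dir, source):
    """Return the sibling directory this README says the given source needs."""
    # abspath normalizes trailing slashes and '.', so dirname gives fauna's parent.
    parent = os.path.dirname(os.path.abspath(fauna_dir))
    return os.path.join(parent, SISTER_DIRS[source])


def check_sources(fauna_dir, sources):
    """Map each source to True if its sister directory exists next to fauna_dir."""
    return {s: os.path.isdir(sister_path(fauna_dir, s)) for s in sources}
```

Running check_sources(".", ["niid", "vidrl"]) from inside fauna/ should report True for both before you run upload_all.py.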

Download documents from TDB

  • python2 tdb/download.py -db tdb -v flu --subtype h3n2
  • python2 tdb/download.py -db tdb -v flu --subtype h1n1pdm
  • python2 tdb/download.py -db tdb -v flu --subtype vic
  • python2 tdb/download.py -db tdb -v flu --subtype yam
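The four TDB downloads are independent, so one failing subtype need not block the others. This is a minimal Python 3 sketch, assuming it is run from the fauna/ root; SUBTYPES, tdb_download_command and download_all are hypothetical names. It runs every download and reports which ones exited non-zero.

```python
import subprocess

# Subtypes from the download commands above.
SUBTYPES = ("h3n2", "h1n1pdm", "vic", "yam")


def tdb_download_command(subtype, db="tdb"):
    """Build the tdb/download.py call for one subtype."""
    return ["python2", "tdb/download.py", "-db", db, "-v", "flu", "--subtype", subtype]


def download_all(subtypes=SUBTYPES, runner=subprocess.call):
    """Run every download and return the subtypes whose command exited non-zero."""
    failed = []
    for subtype in subtypes:
        if runner(tdb_download_command(subtype)) != 0:
            failed.append(subtype)
    return failed
```

Passing runner in, rather than calling subprocess directly, lets you substitute a stub (for example, one that only prints) when you want to check the commands without touching the database.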