Merge pull request #94 from nextstrain/avian

Added functionality for uploading avian flu sequences from the Influenza Research Database to fauna.
trvrb committed Apr 17, 2019
2 parents e9a3936 + 2581638 commit 804b413299f2f51858c8a981722bc4b33fd321a0
Showing with 267 additions and 61 deletions.
  1. +19 −1 builds/AVIAN_FLU.md
  2. +56 −0 source-data/flu_fix_location_label.tsv
  3. +34 −0 source-data/geo_synonyms.tsv
  4. +158 −60 vdb/avian_flu_upload.py
@@ -4,6 +4,8 @@

### Upload documents to VDB

#### Upload from GISAID

1. Download sequences and metadata from [GISAID](http://platform.gisaid.org/)
* In EPIFLU, select either H7N9 or H5N1 sequences, select `HA` as the required segment, and set Submission Date >= the date of the last upload to vdb
* Download at most 5000 isolates at a time; you may have to split downloads by submission date
@@ -14,7 +16,23 @@
* `DNA Accession no. | Isolate name | Isolate ID | Segment | Passage details/history | Submitting lab`
2. Move files to `fauna/data` as `gisaid_epiflu.xls` and `gisaid_epiflu.fasta`.
3. Upload to vdb database
* `python2 vdb/avian_flu_upload.py -db vdb -v avian_flu --source gisaid --fname gisaid_epiflu`
* `python2 vdb/avian_flu_upload.py -db vdb -v avian_flu --data_source gisaid --source gisaid --fname gisaid_epiflu`
* Run with `--preview` first to confirm that strain names and locations are parsed correctly before uploading
* To fix location formatting, add entries to the [geo_synonyms file](source-data/geo_synonyms.tsv) and the [flu_fix_location_label file](source-data/flu_fix_location_label.tsv).

#### Upload from IRD

1. Download sequences from [IRD](https://www.fludb.org)
* Search for Sequences and strains
* Select Data Type as Strain
* Enter either "H5N1" or "H7N9" under Subtype
* Click Search
* Click download all
...
* Download "Segment FASTA" as `GenomicFastaResults.fasta`. Select "Custom format", select all and add.
2. Move file to `fauna/data` as `GenomicFastaResults.fasta`.
3. Upload to vdb database
* `python2 vdb/avian_flu_upload.py -db vdb -v avian_flu --data_source ird --source ird --fname GenomicFastaResults.fasta`
* Run with `--preview` first to confirm that strain names and locations are parsed correctly before uploading
* To fix location formatting, add entries to the [geo_synonyms file](source-data/geo_synonyms.tsv) and the [flu_fix_location_label file](source-data/flu_fix_location_label.tsv).
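
The synonym file mentioned in the steps above reads as a two-column table that maps a raw location label to a canonical name, with `#` lines acting as country headers. The following is a minimal Python 3 sketch of how such a table could be loaded and applied. It assumes the columns are tab-separated, and `load_synonyms` and `fix_location` are hypothetical helper names for illustration, not functions taken from `vdb/avian_flu_upload.py`.

```python
import io


def load_synonyms(handle):
    """Parse a two-column synonyms table into a dict (assumes tab-separated columns)."""
    synonyms = {}
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and "# Country" section headers.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # ignore malformed rows in this sketch
        label, fixed = parts
        synonyms[label] = fixed
    return synonyms


def fix_location(label, synonyms):
    """Return the canonical name for a label, or the label unchanged if unmapped."""
    return synonyms.get(label, label)


# A small in-memory sample using entries that appear in geo_synonyms.tsv.
sample = io.StringIO(
    "# Vietnam\n"
    "Viet_Nam\tVietnam\n"
    "Bac_Lieu\tBacLieu\n"
    "\n"
    "# Turkey\n"
    "SUrfa\tSanliurfa\n"
)
table = load_synonyms(sample)
print(fix_location("Viet_Nam", table))
print(fix_location("Hanoi", table))
```

Labels without an entry pass through unchanged, which is why checking the `--preview` output is a convenient way to spot names that still need a synonym.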

@@ -47,11 +47,16 @@ S?oPaulo SaoPaulo
StaCatarina SantaCatarina
MinasGerias MinasGerais

# Burkina Faso
Burkina_Faso BurkinaFaso

# Canada
NorthwestTerritories NorthWestTerritories
BC BritishColumbia
Canada_SK Saskatchewan
Canada_QC Quebec
Canada_ON Ontario
ON Ontario
Canada_NWT NorthwestTerritories
Canada_NFL NewfoundlandAndLabrador
Canada_NB NewBrunswick
@@ -110,6 +115,7 @@ Beijinxuanwu BeijingXuanwu
Cheng-mei ChengMei
Chongqingyuzhong ChongqingYuzhong
Guandong Guangdong
Eastern_China EasternChina
Fujiangulou FujianGulou
Gansuchengguan GansuChengguan
Guanxi Guangxi
@@ -172,6 +178,7 @@ NanPing Nanping
Nangchang Nanchang
Neimeng Neimenggu
NingDe Ningde
North_China NorthChina
PuTian Putian
Qingfang Qingfeng
Quinghai-Gangcha Gangca
@@ -211,6 +218,9 @@ CoteDIvorie CoteDIvoire
YopougonGR926 Yopougon/GR926
Korogho Korhogo
IvoryCoast CoteDIvoire
Ivory_Coast CoteDIvoire
Cote_dIvoire CoteDIvoire
Cote_d'Ivoire CoteDIvoire

# CzechRepublic
CzechRep CzechRepublic
@@ -231,6 +241,8 @@ Equador Ecuador

# Egypt
Helwan746 Helwan/746
El_Fayoum Faiyum
Sharkia Sharqia

# Finland
Finlad Finland
@@ -372,10 +384,20 @@ Pachgani Panchgani
Patna-India Patna
Ujain Ujjain
Uttrakhand Uttarakhand
West_Bengal WestBengal
Ytml Yavatmal

# Indonesia
Bandungjava Bandung
Central_Java CentralJava
East_Java EastJava
East_Kalimantan EastKalimantan
North_Sumatra Sumatra
Sulawesi_Selatan SulawesiSelatan
South_Kalimantan SouthKalimantan
West_Java WestJava
Yogjakarta Yogyakarta
WestJava_Sbg_ WestJava

# Iran
Bandar_Abbas BandarAbbas
@@ -475,6 +497,7 @@ Darkhan-uul DarkhanUul
Dornogobi Dornogovi
Dundgobi Dundgovi
Gobisumber Govisumber
Inner_Mongolia InnerMongolia
Orchon Orkhon
Ulgii Olgii
Ulaabaatar Ulaanbaatar
@@ -492,6 +515,7 @@ RABT Rabat

# Myanmar
Myanmer Myanmar
Kyaing_Tong KyaingTong

# Netherlands
Netherland Netherlands
@@ -620,6 +644,7 @@ NorthNovgorod NizhnyNovgorod
Ossetia-Alania OssetiaAlania
Rostov-On-Don RostovOnDon
Rostov-on-Don RostovOnDon
Rostov_on_Don RostovOnDon
Rostovoblast Rostov
Russia_Krasnodar Krasnodar
Salehard Salekhard
@@ -669,6 +694,7 @@ Daejeion Daejeon
Deajeon Daejeon
Gyeonggibuk Gyeongbuk
Gyongbuk Gyeongbuk
Ha_Nam Hanam
Inchon Incheon
Kangwon Gangwon
Kyounggi Gyeonggi
@@ -693,6 +719,9 @@ Rathnapura Ratnapura
# SaintLucia
StLucia SaintLucia

# SaudiArabia
Saudi_Arabia SaudiArabia

# StVincentAndTheGrenadines
StVincentAndGrenadines StVincentAndTheGrenadines
StVincentTheGrenadines StVincentAndTheGrenadines
@@ -726,9 +755,11 @@ Chantuaburi Chanthaburi
CU-THA CuTha
Cu-Tha CuTha
Kamphaengphet KamphaengPhet
Kohn_Kaen KohnKaen
MaeHongSorn MaeHongSon
Nakhonratchasima NakhonRatachasima
Nakhonratchaisma NakhonRatachasima
Nakhon_Pathom NakhonPathom
NONG KAI NongKhai
Nongkai NongKhai
NONTHANURI Nonthaburi
@@ -737,6 +768,8 @@ Nonthanuri Nonthaburi
Nothaburi Nonthaburi
Pathumthani PathumThani
Petchaburi Phetchaburi
Phathumthani PathumThani
Prachinburi__Thailand_ PrachinBuri
Prachuapkhirikhan PrachuapKhiriKhan
Prachinburi PrachinBuri
Prajianburi PrachinBuri
@@ -752,6 +785,7 @@ Trinidad-Tobago Trinidad

# Turkey
SUrfa Sanliurfa
Ha_Tay Hatay

# TurksAndCaicos
TurksandCaicos TurksAndCaicos
@@ -791,6 +825,8 @@ Delware Delaware
DictrictofColumbia DistrictOfColumbia
DistrictofColombia DistrictOfColumbia
DC DistrictOfColumbia
DE Delaware
Delaware_Bay DelawareBay
Ft.Benning FtBenning
Georgia_NHRC GeorgiaNHRC
Lousiana Louisiana
@@ -799,20 +835,24 @@ Massachussetts Massachusetts
Mem Memphis
ID Idaho
GA Georgia
VA Virginia
VI VirginIslands
North_Carolina NorthCarolina
WI Wisconsin
MD Maryland
WV WestVirginia
Md Maryland
MI Michigan
NEW-YORK NewYork
New_York NewYork
North_Dakota NorthDakota
West_Virginia WestVirginia
Los_Angeles LosAngeles
New_Mexico NewMexico
NewMexcio NewMexico
New-York NewYork
Pennsalvanya Pennsylvania
PA Pennsylvania
South_Carolina SouthCarolina
South_Dakota SouthDakota
West_Virginia WestVirginia
@@ -827,7 +867,23 @@ Venezuala Venezuela

# Vietnam
VietNam Vietnam
Viet_Nam Vietnam
Bac_Lieu BacLieu
Ca_Mau CaMau
Cao_Bang CaoBang
HaNoi Hanoi
Hai_Duong HaiDuong
Hau_Giang HauGiang
Lang_Son LangSon
Long_An LongAn
Nam_Dinh NamDinh
Ninh_Binh NinhBinh
Quang_Ninh QuangNinh
Vietnam_Hau_Giang HauGiang
Soc_Trang SocTrang
Son_La SonLa
Vietnam_NamDinh NamDinh
Vietnam_BacLiu BacLieu

# WallisAndFutuna
WALLIS-FUTUNA WallisAndFutuna
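
As the synonyms table grows, the same label can end up listed more than once; `West_Virginia`, for example, appears twice in the USA hunk above. The following is a minimal Python 3 sketch of a check that reports repeated labels. It assumes tab-separated columns, and `find_repeated_labels` is a hypothetical helper, not part of fauna.

```python
import io
from collections import defaultdict


def find_repeated_labels(handle):
    """Return {label: [(line_number, canonical_name), ...]} for labels listed more than once."""
    seen = defaultdict(list)
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and "# Country" section headers.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # ignore malformed rows in this sketch
        label, fixed = parts
        seen[label].append((line_number, fixed))
    return {label: hits for label, hits in seen.items() if len(hits) > 1}


# In-memory sample; with the real file, pass open("source-data/geo_synonyms.tsv") instead.
sample = io.StringIO(
    "# USA\n"
    "West_Virginia\tWestVirginia\n"
    "New_York\tNewYork\n"
    "West_Virginia\tWestVirginia\n"
)
repeated = find_repeated_labels(sample)
for label, hits in repeated.items():
    print(label, hits)
```

A repeated label with two different canonical names is the case worth fixing first, since which mapping wins then depends on how the file is loaded.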