
Streamline dengue to just do a simple pull from ViPR

This should be far more maintainable and allows nextstrain.org/dengue to be updated.
trvrb committed Aug 16, 2019
1 parent f9e7955 commit 84dbc87af2a0d6592c1b4a0bad4280bf44b85d5b
@@ -1,23 +1,31 @@
# DENGUE Pipeline Notes

## Upload documents to VDB
1. Download sequences from [LANL](https://hfv.lanl.gov/components/sequence/HCV/search/searchi.html)
    * Select GB Submission Date `>=MM/YYYY` of last VDB update or "Last GenBank Update" in the upper right corner (whichever is earlier).
    * Set the rest of the parameters as shown:
    ![Parameters](figures/download_instructions.png)
    * Hit "Search"
    * Select "Save Background Info" and check the box for "Click here to include the sequence."
2. Move downloaded file to `fauna/data`
## Upload

### [ViPR sequences](https://www.viprbrc.org/brc/vipr_genome_search.spg?method=ShowCleanSearch&decorator=flavi_dengue)

1. Download sequences
    * Select genome length >= 5000
    * Download as Genome Fasta
    * Set Custom Format Fields to 0: GenBank Accession, 1: Strain Name, 2: Segment, 3: Date, 4: Host, 5: Country, 6: Subtype, 7: Virus Type
2. Move downloaded sequences to `fauna/GenomicFastaResults.fasta`
3. Upload to vdb database
    * `python2 vdb/dengue_upload.py -db vdb -v dengue --fname results.tbl --ftype tsv`
    * `python2 vdb/dengue_upload.py -db vdb -v dengue --source genbank --locus genome --fname GenomicFastaResults.fasta`
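
As a rough illustration of how the Custom Format Fields from step 1 map onto a FASTA header, the sketch below splits one header into named fields. It assumes the fields arrive pipe-delimited in the order listed above, which depends on how the ViPR download was configured. `parse_vipr_header`, the field keys, and the sample header are hypothetical and are not taken from `vdb/dengue_upload.py`.

```python
# Hypothetical field names, in the order of the Custom Format Fields above.
VIPR_FIELDS = ['accession', 'strain', 'segment', 'date',
               'host', 'country', 'subtype', 'virus_type']

def parse_vipr_header(header, delimiter='|'):
    """Split one FASTA header line into a dict keyed by VIPR_FIELDS.

    Assumes the header is delimiter-separated in the custom field order;
    the actual delimiter is an assumption, not confirmed by the source.
    """
    values = [v.strip() for v in header.lstrip('>').strip().split(delimiter)]
    if len(values) != len(VIPR_FIELDS):
        raise ValueError('expected %d fields, got %d'
                         % (len(VIPR_FIELDS), len(values)))
    return dict(zip(VIPR_FIELDS, values))

# Made-up header, for illustration only.
example = '>XX000000|DENV-1/EXAMPLE/2010|N/A|2010/01/01|Human|Thailand|DENV-1|Dengue virus'
record = parse_vipr_header(example)
```

Raising on a field-count mismatch makes a misconfigured download fail early instead of silently shifting values into the wrong fields.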

## Update

* Update citation fields
    * `python2 vdb/dengue_update.py -db vdb -v dengue --update_citations`
    * Updates `authors`, `title`, `url`, `journal` and `puburl` fields from GenBank files
    * If you get `ERROR: Couldn't connect with entrez, please run again`, just run the command again

## Download sequence documents from VDB

* `python2 vdb/dengue_download.py` # all serotypes together
* `python2 vdb/dengue_download.py --select serotype:1` # just serotype 1
* `python2 vdb/dengue_download.py --select serotype:2` # just serotype 2
* `python2 vdb/dengue_download.py --select serotype:3` # just serotype 3
* `python2 vdb/dengue_download.py --select serotype:4` # just serotype 4
* `python2 vdb/dengue_download.py --select serotype:dengue_virus_1` # just serotype 1
* `python2 vdb/dengue_download.py --select serotype:dengue_virus_2` # just serotype 2
* `python2 vdb/dengue_download.py --select serotype:dengue_virus_3` # just serotype 3
* `python2 vdb/dengue_download.py --select serotype:dengue_virus_4` # just serotype 4
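
The four per-serotype downloads can be scripted rather than typed by hand. The sketch below only builds the command strings and does not run them. It assumes the `serotype:dengue_virus_N` selector form shown above; `serotype_commands` is a hypothetical helper name.

```python
def serotype_commands(script='vdb/dengue_download.py', interpreter='python2'):
    """Return one download command string per dengue serotype (1 to 4)."""
    commands = []
    for n in range(1, 5):
        selector = 'serotype:dengue_virus_%d' % n
        commands.append('%s %s --select %s' % (interpreter, script, selector))
    return commands

for command in serotype_commands():
    print(command)
```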

## Download titer documents from TDB

@@ -1,16 +1,5 @@
# ZIKA Pipeline Notes

## Update

* Update citation fields
    * `python2 vdb/zika_update.py -db vdb -v zika --update_citations`
    * Updates `authors`, `title`, `url`, `journal` and `puburl` fields from GenBank files
    * If you get `ERROR: Couldn't connect with entrez, please run again`, just run the command again
* Update location fields
    * After hand editing `location` in [chateau](https://github.com/blab/chateau)
    * `python2 vdb/zika_update.py -db vdb -v zika --update_locations`
    * Updates `division`, `country`, `region` fields

## Download

python2 vdb/zika_download.py -db vdb -v zika --fstem zika --resolve_method choose_genbank
@@ -38,3 +27,10 @@ Upload with:
python2 vdb/zibra_upload.py -db vdb -v zika --source fh --locus genome --authors "Black et al" --url https://github.com/blab/zika-colombia/ --title "Genomic epidemiology supports multiple introductions and cryptic transmission of Zika virus in Colombia" --fname ZIKA-COL-good.fasta

python2 vdb/zibra_upload.py -db vdb -v zika --source fh --locus genome --authors "Black et al" --url https://github.com/blab/zika-colombia/ --title "Genomic epidemiology supports multiple introductions and cryptic transmission of Zika virus in Colombia" --fname ZIKA-COL-partial.fasta

## Update

* Update citation fields
    * `python2 vdb/zika_update.py -db vdb -v zika --update_citations`
    * Updates `authors`, `title`, `url`, `journal` and `puburl` fields from GenBank files
    * If you get `ERROR: Couldn't connect with entrez, please run again`, just run the command again
@@ -0,0 +1 @@
label fix
@@ -0,0 +1,2 @@
label fix

@@ -1,7 +1,8 @@
label fix
NC_001477 DENV1/NAURUISLAND/REFERENCE/1997
NC_001474 DENV2/THAILAND/REFERENCE/1964
NC_001475 DENV3/SRI_LANKA/REFERENCE/2000
NC_002640 DENV4/NA/REFERENCE/2003
NC_002640 DENV4/NA/REFERENCE/2003
KT452802 DENV4/CAMBODIA/V0624301AC33/2011
KT452803 DENV4/NICARAGUA/703/1999
KT452800 DENV3/PUERTORICO/PRS228762AC27/1963
@@ -1,45 +1,20 @@
import os,datetime
from download import download
from download import get_parser
import rethinkdb as r
import time
import re

class dengue_download(download):
    def __init__(self, **kwargs):
        download.__init__(self, **kwargs)

    def add_selections_command(self, command, selections=[], **kwargs): # Command is an instance of r.table
        '''
        Add selections filter to command
        '''
        if len(selections)>0:
            for sel in selections: # sel like (field, [value1, value2, ...])
                field = sel[0]
                values = sel[1]
                if field == 'gene_list':
                    print "Only downloading documents with one or more of %s in 'gene_list' field."%str(values)
                    command = command.filter(lambda doc: doc[field] in values)
                else:
                    print("Only downloading documents with field \'" + field + "\' equal to one of " + str(values))
                    command = command.filter(lambda doc: r.expr(values).contains(doc[field]))
        return command

if __name__=="__main__":
    parser = get_parser()
    args = parser.parse_args()
    args.fasta_fields = ['strain', 'accession', 'collection_date', 'region', 'country', 'division', 'location', 'authors']
    if args.virus == None:
        setattr(args, 'virus', 'dengue')
    if args.database == None:
        setattr(args, 'database', 'vdb')
    fasta_fields = ['strain', 'virus', 'accession', 'collection_date', 'region',
                    'country', 'division', 'location', 'source', 'locus', 'authors', 'url', 'title', 'journal', 'puburl']
    args.fasta_fields = fasta_fields
    current_date = str(datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d'))
    if args.fstem is None:
        try:
            serotype=args.select[0][-1]
            args.fstem = 'dengue_denv%s'%serotype
        except:
            args.fstem = 'dengue_all'

        args.fstem = args.virus + '_' + current_date
    if not os.path.isdir(args.path):
        os.makedirs(args.path)
    connfluVDB = dengue_download(**args.__dict__)
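
A minimal sketch of the default file-stem rule in the `__main__` block above, assuming the form `virus + '_' + current_date` with the date formatted as `%Y_%m_%d`. `default_fstem` and its `now` parameter are hypothetical and exist only to make the rule easy to check in isolation.

```python
import datetime

def default_fstem(virus, fstem=None, now=None):
    """Return fstem if given, otherwise '<virus>_<YYYY_MM_DD>'."""
    if fstem is not None:
        return fstem
    # `now` is injectable so the result is reproducible in a test.
    now = now or datetime.datetime.now()
    return virus + '_' + now.strftime('%Y_%m_%d')

stem = default_fstem('dengue', now=datetime.datetime(2019, 8, 16))
```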
@@ -2,12 +2,12 @@
from dengue_upload import dengue_upload
from update import parser

class dengue_update(dengue_upload):
class dengue_update(update, dengue_upload):
    def __init__(self, **kwargs):
        update.__init__(self, **kwargs)
        dengue_upload.__init__(self, **kwargs)

if __name__=="__main__":
    args = parser.parse_args()
    connVDB = dengue_update(**args.__dict__)
    connVDB.location_fields=['location', 'division', 'country', 'region']
    connVDB.update(**args.__dict__)
