
Explicitly specify python2 calls in build docs

trvrb committed May 26, 2019
1 parent 7f09294 commit 6739de12c112e190e23324603759273e65e3e86c
Showing with 58 additions and 58 deletions.
  1. +2 −2 builds/AVIAN_FLU.md
  2. +7 −7 builds/DENGUE.md
  3. +4 −4 builds/EBOLA.md
  4. +22 −22 builds/FLU.md
  5. +2 −2 builds/MEASLES.md
  6. +8 −8 builds/MUMPS.md
  7. +7 −7 builds/ZIBRA.md
  8. +6 −6 builds/ZIKA.md
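
The edits in all eight files follow one pattern: a documented command that starts with `python` and runs a `.py` script now starts with `python2`. The sketch below shows one way such a substitution could be scripted. It is an illustration under stated assumptions, not necessarily how this commit was produced; the function name `pin_python2` and the regular expression are hypothetical.

```python
import re

# Hypothetical pattern: a bare `python` token immediately followed by a
# script path ending in `.py`. The lookbehind skips tokens that are part of
# a longer word or path; `python2` never matches because `2` is not a space.
_BARE_PYTHON = re.compile(r"(?<![\w/.-])python(?= +\S+\.py\b)")


def pin_python2(text):
    """Return `text` with bare `python <script>.py` calls rewritten to `python2`."""
    return _BARE_PYTHON.sub("python2", text)


if __name__ == "__main__":
    line = "* `python tdb/download.py -db tdb -v dengue --fstem dengue`"
    print(pin_python2(line))
```

Because an existing `python2` call does not match the pattern, running the function twice over the same text leaves it unchanged after the first pass.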
@@ -38,6 +38,6 @@

### Download documents from VDB

```bash
python vdb/avian_flu_download.py -db vdb -v avian_flu --select locus:HA subtype:h7n9 --fstem h7n9_ha
python2 vdb/avian_flu_download.py -db vdb -v avian_flu --select locus:HA subtype:h7n9 --fstem h7n9_ha
```
@@ -9,16 +9,16 @@
* Select "Save Background Info" and check the box for "Click here to include the sequence."
2. Move downloaded file to `fauna/data`
3. Upload to vdb database
* `python vdb/dengue_upload.py -db vdb -v dengue --fname results.tbl --ftype tsv`
* `python2 vdb/dengue_upload.py -db vdb -v dengue --fname results.tbl --ftype tsv`

## Download sequence documents from VDB

* `python vdb/dengue_download.py` # all serotypes together
* `python vdb/dengue_download.py --select serotype:1` # just serotype 1
* `python vdb/dengue_download.py --select serotype:2` # just serotype 2
* `python vdb/dengue_download.py --select serotype:3` # just serotype 3
* `python vdb/dengue_download.py --select serotype:4` # just serotype 4
* `python2 vdb/dengue_download.py` # all serotypes together
* `python2 vdb/dengue_download.py --select serotype:1` # just serotype 1
* `python2 vdb/dengue_download.py --select serotype:2` # just serotype 2
* `python2 vdb/dengue_download.py --select serotype:3` # just serotype 3
* `python2 vdb/dengue_download.py --select serotype:4` # just serotype 4

## Download titer documents from TDB

* `python tdb/download.py -db tdb -v dengue --fstem dengue`
* `python2 tdb/download.py -db tdb -v dengue --fstem dengue`
@@ -8,17 +8,17 @@
4. Replace `sierra_leone\|\?` with `sierra_leone|sierra_leone`, `liberia\|\?` with `liberia|liberia` and `guinea\|\?` with `guinea|guinea`
5. Replace `sierra_leone\|\|` with `sierra_leone|sierra_leone|`, `liberia\|\|` with `liberia|liberia|` and `guinea\|\|` with `guinea|guinea|`
6. Upload to vdb database
* `python vdb/ebola_upload.py -db vdb -v ebola --source genbank --locus genome --fname Makona_1610_genomes_genbank.fasta`
* `python2 vdb/ebola_upload.py -db vdb -v ebola --source genbank --locus genome --fname Makona_1610_genomes_genbank.fasta`
7. Hand edit author and url info into other 153 genomes
8. Upload to vdb database
* `python vdb/ebola_upload.py -db vdb -v ebola --source genbank --locus genome --fname Makona_1610_genomes_quick.fasta --authors "Quick et al" --url https://github.com/nickloman/ebov/`
* `python2 vdb/ebola_upload.py -db vdb -v ebola --source genbank --locus genome --fname Makona_1610_genomes_quick.fasta --authors "Quick et al" --url https://github.com/nickloman/ebov/`
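
The find-and-replace in steps 4 and 5 can also be scripted. The escaped patterns `\|\?` and `\|\|` stand for the literal text `|?` and `||`, so plain string replacement is enough. This is a minimal sketch: the function name `fill_missing_field` and the example header layout are hypothetical, and only the three country names and the replacement rules come from the steps above.

```python
COUNTRIES = ("sierra_leone", "liberia", "guinea")


def fill_missing_field(text):
    """Apply the step 4 and step 5 replacements to FASTA text."""
    # Step 4: `country|?` becomes `country|country`.
    for country in COUNTRIES:
        text = text.replace(country + "|?", country + "|" + country)
    # Step 5: `country||` becomes `country|country|`.
    for country in COUNTRIES:
        text = text.replace(country + "||", country + "|" + country + "|")
    return text


if __name__ == "__main__":
    # Made-up header lines, for illustration only.
    print(fill_missing_field(">EXAMPLE_1|liberia|?|2014-08-01"))
    print(fill_missing_field(">EXAMPLE_2|guinea||2014-03-01"))
```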

## Update

1. Update citation fields
* `python vdb/ebola_update.py -db vdb -v ebola --update_citations`
* `python2 vdb/ebola_update.py -db vdb -v ebola --update_citations`
* Updates `authors`, `title`, `url`, `journal` and `puburl` fields from genbank files
* If you get `ERROR: Couldn't connect with entrez, please run again` just run command again
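
Re-running after the Entrez error can be automated with a small wrapper. The sketch below assumes the error message appears in the command's combined stdout and stderr, which the docs do not confirm; `run_with_retry`, `attempts`, and `runner` are hypothetical names, not part of fauna.

```python
import subprocess

ENTREZ_ERROR = "ERROR: Couldn't connect with entrez, please run again"


def _run(cmd):
    """Run `cmd` and return its combined stdout and stderr as text."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def run_with_retry(cmd, attempts=3, runner=_run):
    """Re-run `cmd` until the Entrez error is absent; return the attempt count."""
    for attempt in range(1, attempts + 1):
        # Assumption: the error text shows up in the captured output.
        if ENTREZ_ERROR not in runner(cmd):
            return attempt
    raise RuntimeError("Entrez error persisted after %d attempts" % attempts)
```

A call might look like `run_with_retry(["python2", "vdb/ebola_update.py", "-db", "vdb", "-v", "ebola", "--update_citations"])`, run from `fauna/`.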

## Download documents from VDB
* `python vdb/ebola_download.py -db vdb -v ebola --fstem ebola --resolve_method choose_genbank`
* `python2 vdb/ebola_download.py -db vdb -v ebola --fstem ebola --resolve_method choose_genbank`
@@ -23,23 +23,23 @@
All of these functions are quite slow given they run over ~600k documents. Use sparingly.

* Update genetic grouping fields
* `python vdb/flu_update.py -db vdb -v flu --update_groupings`
* `python2 vdb/flu_update.py -db vdb -v flu --update_groupings`
* updates `vtype`, `subtype`, `lineage`

* Update locations
* `python vdb/flu_update.py -db vdb -v flu --update_locations`
* `python2 vdb/flu_update.py -db vdb -v flu --update_locations`
* updates `division`, `country` and `region` from `location`

* Update passage_category fields
* `python vdb/flu_update.py -db vdb -v flu --update_passage_categories`
* `python2 vdb/flu_update.py -db vdb -v flu --update_passage_categories`
* update `passage_category` based on `passage` field

### Download documents from VDB

* `python vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2`
* `python vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm`
* `python vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic`
* `python vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam`
* `python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2`
* `python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm`
* `python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic`
* `python2 vdb/flu_download.py -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam`

## TDB

@@ -50,43 +50,43 @@ All of these functions are quite slow given they run over ~600k documents. Use s
1. Convert [NIMR report](https://www.crick.ac.uk/research/worldwide-influenza-centre/annual-and-interim-reports/) pdfs to csv files
2. Move csv files to subtype directory in `fauna/data/`
3. Upload to tdb database
* `python tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem h3n2_nimr_titers`
* `python2 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem h3n2_nimr_titers`
* Recommend running with `--preview` to confirm strain names are correctly parsed before uploading
* Can add to [HI_ref_name_abbreviations file](source-data/HI_ref_name_abbreviations.tsv) and [HI_flu_strain_name_fix file](source-data/HI_flu_strain_name_fix.tsv) to fix some strain names.

#### Flat files

1. Move line-list tsv files to `fauna/data/`
2. Upload to tdb database with `python tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload`
2. Upload to tdb database with `python2 tdb/upload.py -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload`

#### CDC files

1. Move line-list tsv files to `fauna/data/`
2. Upload HI titers to tdb database with `python tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem HITest_Oct2016_to_Sep2017_titers`
3. Upload FRA titers to tdb database with `python tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem FRA_Oct2016_to_Sep2017_titers`
2. Upload HI titers to tdb database with `python2 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem HITest_Oct2016_to_Sep2017_titers`
3. Upload FRA titers to tdb database with `python2 tdb/cdc_upload.py -db cdc_tdb -v flu --ftype flat --fstem FRA_Oct2016_to_Sep2017_titers`

#### Crick files

1. Move Excel documents to `fauna/data/`
2. Run `python tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H3N2HIs`
3. Run `python tdb/crick_upload.py -db crick_tdb --assay_type fra --fstem H3N2VNs`
4. Run `python tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H1N1pdm09HIs`
5. Run `python tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BVicHIs`
6. Run `python tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BYamHIs`
2. Run `python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H3N2HIs`
3. Run `python2 tdb/crick_upload.py -db crick_tdb --assay_type fra --fstem H3N2VNs`
4. Run `python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem H1N1pdm09HIs`
5. Run `python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BVicHIs`
6. Run `python2 tdb/crick_upload.py -db crick_tdb --assay_type hi --fstem BYamHIs`

#### NIID files

1. Make sure `NIID-Tokyo-WHO-CC/` is a sister directory to `fauna/`
2. Upload all titers with `python tdb/upload_all.py --sources niid -db niid_tdb`
2. Upload all titers with `python2 tdb/upload_all.py --sources niid -db niid_tdb`

#### VIDRL files

1. Make sure `VIDRL-Melbourne-WHO-CC/` is a sister directory to `fauna/`
2. Upload all titers with `python tdb/upload_all.py --sources vidrl -db vidrl_tdb`
2. Upload all titers with `python2 tdb/upload_all.py --sources vidrl -db vidrl_tdb`

### Download documents from TDB

* `python tdb/download.py -db tdb -v flu --subtype h3n2`
* `python tdb/download.py -db tdb -v flu --subtype h1n1pdm`
* `python tdb/download.py -db tdb -v flu --subtype vic`
* `python tdb/download.py -db tdb -v flu --subtype yam`
* `python2 tdb/download.py -db tdb -v flu --subtype h3n2`
* `python2 tdb/download.py -db tdb -v flu --subtype h1n1pdm`
* `python2 tdb/download.py -db tdb -v flu --subtype vic`
* `python2 tdb/download.py -db tdb -v flu --subtype yam`
@@ -8,8 +8,8 @@

## Upload to fauna

`python vdb/measles_upload.py -db vdb -v measles --ftype accession --source genbank --locus genome --fname sequence.seq`
`python2 vdb/measles_upload.py -db vdb -v measles --ftype accession --source genbank --locus genome --fname sequence.seq`

## Download from fauna

`python vdb/measles_download.py -db vdb -v measles --fstem measles --resolve_method choose_genbank`
`python2 vdb/measles_download.py -db vdb -v measles --fstem measles --resolve_method choose_genbank`
@@ -8,7 +8,7 @@

## Upload to fauna

`python vdb/mumps_upload.py -db vdb -v mumps --ftype accession --source genbank --locus genome --fname sequence.seq`
`python2 vdb/mumps_upload.py -db vdb -v mumps --ftype accession --source genbank --locus genome --fname sequence.seq`


FASTA header field ordering:
@@ -24,35 +24,35 @@ FASTA header field ordering:

_This is not necessary when uploading accessions as we do here._
This is needed to populate certain attributes such as author & paper title.
`python vdb/mumps_update.py -db vdb -v mumps --update_citations`
`python2 vdb/mumps_update.py -db vdb -v mumps --update_citations`

## Download from fauna

`python vdb/mumps_download.py -db vdb -v mumps --fstem mumps --resolve_method choose_genbank`
`python2 vdb/mumps_download.py -db vdb -v mumps --fstem mumps --resolve_method choose_genbank`

## Upload Broad genomes

Preprocess to fix metadata and header ordering

`python vdb/mumps_preprocess_fasta.py --fasta data/muv-nextstrain-20170718.pruned.fasta > data/mumps_broad.fasta`
`python2 vdb/mumps_preprocess_fasta.py --fasta data/muv-nextstrain-20170718.pruned.fasta > data/mumps_broad.fasta`

Upload to fauna

`python vdb/mumps_upload.py -db vdb -v mumps --source broad --locus genome --fname mumps_broad.fasta --authors "Wohl et al" --title "Unpublished"`
`python2 vdb/mumps_upload.py -db vdb -v mumps --source broad --locus genome --fname mumps_broad.fasta --authors "Wohl et al" --title "Unpublished"`

## Upload BCCDC genomes

If you have a FASTA file and CSV metadata, this script will help (with minor modifications as needed)

`python scripts/mumps.csv-and-fasta-to-vipr-fasta.py data/input.mumps.raw.fasta data/input.mumps.csv data/input.mumps.vipr.fasta`
`python2 scripts/mumps.csv-and-fasta-to-vipr-fasta.py data/input.mumps.raw.fasta data/input.mumps.csv data/input.mumps.vipr.fasta`


Upload to fauna

`python vdb/mumps_upload.py -db vdb -v mumps --source bccdc --locus genome --fname mumps.bc.fasta --authors "Gardy et al" --title "Unpublished"`
`python2 vdb/mumps_upload.py -db vdb -v mumps --source bccdc --locus genome --fname mumps.bc.fasta --authors "Gardy et al" --title "Unpublished"`

## Upload Fred Hutch genomes

Upload to fauna

`python vdb/mumps_upload.py -db vdb -v mumps --source fh --locus genome --fname MuVs-WA0268502_buccal-Washington.USA-16.fasta --authors "Moncla et al" --title "Unpublished"`
`python2 vdb/mumps_upload.py -db vdb -v mumps --source fh --locus genome --fname MuVs-WA0268502_buccal-Washington.USA-16.fasta --authors "Moncla et al" --title "Unpublished"`
@@ -42,27 +42,27 @@ Remember to [install rethinkdb bindings](README.md#install).

Upload metadata with:

python vdb/zibra_metadata_upload.py -db vdb -tb zibra --fname zibra.tsv --ftype tsv --source zibra --virus zika --country brazil --authors ZiBRA --local
python2 vdb/zibra_metadata_upload.py -db vdb -tb zibra --fname zibra.tsv --ftype tsv --source zibra --virus zika --country brazil --authors ZiBRA --local

Upload sequences with:

python vdb/zibra_upload.py -db vdb -tb zibra --fname minion.fasta --ftype fasta --source zibra --virus zika --locus genome --local
python2 vdb/zibra_upload.py -db vdb -tb zibra --fname minion.fasta --ftype fasta --source zibra --virus zika --locus genome --local

Download metadata with:

python vdb/zibra_download.py -db vdb -tb zibra --fstem zibra --ftype tsv --local
python2 vdb/zibra_download.py -db vdb -tb zibra --fstem zibra --ftype tsv --local

Download just metadata for samples from `natal`:

python vdb/zibra_download.py -db vdb -tb zibra --fstem zibra --ftype tsv --select location:natal --local
python2 vdb/zibra_download.py -db vdb -tb zibra --fstem zibra --ftype tsv --select location:natal --local

Push local rethinkdb `vdb.zibra` documents to remote `vdb.zibra` rethinkdb table:

python vdb/sync.py --push --local_table vdb.zibra --remote_table vdb.zibra
python2 vdb/sync.py --push --local_table vdb.zibra --remote_table vdb.zibra

Pull remote rethinkdb `vdb.zibra` documents to local `vdb.zibra` rethinkdb table:

python vdb/sync.py --pull --local_table vdb.zibra --remote_table vdb.zibra
python2 vdb/sync.py --pull --local_table vdb.zibra --remote_table vdb.zibra

## Download latest metadata for consensus builds

@@ -71,6 +71,6 @@ Remember to [install rethinkdb bindings](README.md#install).
From `fauna/` run:

source environment_rethink.sh
python vdb/zibra_download.py -db vdb -tb zibra --fstem zibra --ftype tsv
python2 vdb/zibra_download.py -db vdb -tb zibra --fstem zibra --ftype tsv

This will result in the file `fauna/data/zibra.tsv` that has all necessary metadata. This file can be searched for `2_NB07`, etc... in the `minion_barcode` column to match MinION output to metadata, including strain name.
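
Looking up a barcode in the downloaded file can be sketched as follows. The `minion_barcode` column comes from the text above; the `strain` column name, the exact-match comparison, and the function name `find_by_barcode` are assumptions made for illustration.

```python
import csv
import io


def find_by_barcode(tsv_file, barcode):
    """Return all rows whose `minion_barcode` column equals `barcode`."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return [row for row in reader if row.get("minion_barcode") == barcode]


if __name__ == "__main__":
    # Made-up two-row table; the `strain` column name is an assumption.
    sample = io.StringIO(
        "strain\tminion_barcode\n"
        "EXAMPLE_A\t2_NB07\n"
        "EXAMPLE_B\t2_NB08\n"
    )
    for row in find_by_barcode(sample, "2_NB07"):
        print(row["strain"])
```

For the real file, pass an open handle instead, for example `open("data/zibra.tsv", newline="")` when running from `fauna/`.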
@@ -3,17 +3,17 @@
## Update

* Update citation fields
* `python vdb/zika_update.py -db vdb -v zika --update_citations`
* `python2 vdb/zika_update.py -db vdb -v zika --update_citations`
* updates `authors`, `title`, `url`, `journal` and `puburl` fields from genbank files
* If you get `ERROR: Couldn't connect with entrez, please run again` just run command again
* Update location fields
* After hand editing `location` in [chateau](https://github.com/blab/chateau)
* `python vdb/zika_update.py -db vdb -v zika --update_locations`
* `python2 vdb/zika_update.py -db vdb -v zika --update_locations`
* Updates `division`, `country`, `region` fields

## Download

python vdb/zika_download.py -db vdb -v zika --fstem zika --resolve_method choose_genbank
python2 vdb/zika_download.py -db vdb -v zika --fstem zika --resolve_method choose_genbank

## Upload

@@ -25,11 +25,11 @@
* Set Custom Format Fields to 0: GenBank Accession, 1: Strain Name, 2: Segment, 3: Date, 4: Host, 5: Country, 6: Subtype, 7: Virus Species
2. Move downloaded sequences to `fauna/data`
3. Upload to vdb database
* `python vdb/zika_upload.py -db vdb -v zika --source genbank --locus genome --fname GenomeFastaResults.fasta`
* `python2 vdb/zika_upload.py -db vdb -v zika --source genbank --locus genome --fname GenomeFastaResults.fasta`
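
The Custom Format Fields setting in step 1 fixes the order of values in each FASTA header. A minimal sketch of reading such a header back into named fields follows; the `|` separator, the key names, and the function `parse_header` are assumptions for illustration and are not taken from the fauna upload scripts.

```python
# Field order from the Custom Format Fields setting (indices 0 to 7).
FIELDS = (
    "accession",
    "strain",
    "segment",
    "date",
    "host",
    "country",
    "subtype",
    "virus_species",
)


def parse_header(header, sep="|"):
    """Split one FASTA header line into a dict keyed by FIELDS."""
    values = header.strip().lstrip(">").split(sep)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


if __name__ == "__main__":
    # Made-up header, for illustration only.
    record = parse_header(">ACC0001|EXAMPLE/2016|NA|2016-01-01|Human|Brazil|NA|Zika virus")
    print(record["strain"], record["date"])
```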

### [Fred Hutch sequences](https://github.com/blab/zika-usvi/tree/master/data)

Upload with:

python vdb/zibra_upload.py -db vdb -v zika --source fh --locus genome --authors "Black et al" --fname zika_usvi_good.fasta --url https://github.com/blab/zika-usvi/ --title "Genetic characterization of the Zika virus epidemic in the US Virgin Islands"
python vdb/zibra_upload.py -db vdb -v zika --source fh --locus genome --authors "Black et al" --fname zika_usvi_partial.fasta --url https://github.com/blab/zika-usvi/ --title "Genetic characterization of the Zika virus epidemic in the US Virgin Islands"
python2 vdb/zibra_upload.py -db vdb -v zika --source fh --locus genome --authors "Black et al" --fname zika_usvi_good.fasta --url https://github.com/blab/zika-usvi/ --title "Genetic characterization of the Zika virus epidemic in the US Virgin Islands"
python2 vdb/zibra_upload.py -db vdb -v zika --source fh --locus genome --authors "Black et al" --fname zika_usvi_partial.fasta --url https://github.com/blab/zika-usvi/ --title "Genetic characterization of the Zika virus epidemic in the US Virgin Islands"
