add example data and increase travis perl to 5.24
andrewjpage committed Nov 3, 2016
1 parent a3fa025 commit cc8cb70
Showing 17 changed files with 301,796 additions and 34 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -9,7 +9,7 @@ addons:
- zlib1g
- zlib1g-dev
perl:
-  - "5.14"
+  - "5.24"
sudo: false
install:
- "source ./install_dependencies.sh"
33 changes: 0 additions & 33 deletions INSTALL

This file was deleted.

10 changes: 10 additions & 0 deletions example/README.md
@@ -0,0 +1,10 @@
# Example dataset
In the input_data subdirectory there are 3 published Salmonella genomes from different STs. A snapshot of the Salmonella MLST database is in this subdirectory.

```
cd example
export MLST_DATABASES=./mlst_databases
get_sequence_type -c -s 'Salmonella enterica' input_data/*.fa
```

The expected output files are located in the expected_output_data directory. Each of the 3 genomes has been assigned an ST and there is a file with the sequences of the alleles concatenated. This can then be used as input to RAxML to produce a basic phylogenetic tree.
171 changes: 171 additions & 0 deletions example/expected_output_data/concatenated_alleles.fa
@@ -0,0 +1,171 @@
>Salmonella_enterica_subsp_enterica_serovar_Typhi_str_CT18_v1/1-3336
GTTTTTCGCCCGGGACACGCGGATTACACCTATGAGCAGAAATACGGCCTGCGCGATTAC
CGCGGCGGTGGACGTTCTTCCGCGCGTGAAACCGCGATGCGCGTAGCGGCAGGGGCGATC
GCCAAGAAATACTTGGCGGAAAAGTTCGGCATCGAAATCCGCGGCTGCCTGACCCAGATG
GGCGACATTCCGCTGGAGATTAAAGACTGGCGTCAGGTTGAGCTTAATCCGTTCTTTTGC
CCCGATGCGGACAAACTTGACGCGCTGGACGAACTGATGCGCGCGCTGAAAAAAGAGGGT
GACTCCATCGGCGCGAAAGTGACGGTGATGGCGAGCGGCGTGCCGGCAGGGCTTGGCGAA
CCGGTATTTGACCGACTGGATGCGGACATCGCCCATGCGCTGATGAGCATCAATGCGGTG
AAAGGCGTGGAGATCGGCGAAGGATTTAACGTGGTGGCGCTGCGCGGCAGCCAGAATCGC
GATGAAATCACGGCGCAGGGTATGGAGATGGTCGCGCGCGTTACGCTTTCTCAGCCGCAT
GAGCCAGGCGCCACTACCGTGCCGGCGCGGAAATTCTTTGATATCTGCCGCGGCCTGCCG
GAGGGCGCGGAGATTGCCGTTCAGTTGGAAGGCGATCGGATGCTGGTGCGTTCTGGCCGT
AGCCGCTTCTCGCTGTCTACGCTGCCTGCCGCCGATTTCCCGAATCTTGACGACTGGCAA
AGCGAAGTTGAATTTACGCTGCCGCAGGCCACGATGAAGCGCCTGATTGAAGCGACCCAG
TTTTCGATGGCCCATCAGGATGTGCGCTACTACTTAAACGGTATGCTGTTTGAAACGGAA
GGTAGCGAACTGCGCACTGTTGCGACCGACGGCCACCGTCTGGCGGTGTGCTCAATGCCG
CTGGAGGCGTCTTTACCTAGCCACTCGGTGATTGTGCCGCGTAAAGGCGTGATTGAACTG
ATGCGTATGCTCGACGGTGGCGAAAACCCGCTGCGCGTGCAGGCAACGCTGACGGAAAAC
GATCTGGTTTTTGCCCTTTCACAGCACTCCGTCGCCTTTGCTCACGCCCAGCTCCAGCGG
GATGGACGAAACTGGCCTGCGTCGCCGCGCTATTTCTCGATTGGCCGCACCACGGCGCTC
GCCCTTCATACCGTTAGCGGGTTCGATATTCGTTATCCATTGGATCGGGAAATCAGCGAA
GCCTTGCTACAATTACCTGAATTACAAAATATTGCGGGCAAACGCGCGCTGATTTTGCGT
GGCAATGGCGGCCGCGAACTGCTGGGCGAAACCCTGACAGTTCGCGGAGCCGAAGTCAGT
TTTTGTGAATGTTATCAACGATGTGCGAAACATTACGATGGCGCGGAAGAAGCGATGCGC
TGGCATACTCGCGGCGTAACAACGCTTGTTGTTACCAGCGGCGAGATGTTGCAAATTGCG
GGATGCCAGAAGGTGGTTCTGTGCTCGCCGCCACCCATCGCTGATGAAATCCTCTATGCG
GCGCAACTGTGTGGCGTGCAGGAAATCTTTAACGTCGGCGGCGCGCAGGCGATTGCCGCT
CTGGCCTTCGGCAGCGAGTCCGTACCGAAAGTGGATAAAATTTTTGGCCCCGGCAACGCC
TTTGTAACCGAAGCCAAGCGTCAGGTCAGCCAGCGTCTCGACGGCGCGGCTATCGATATG
CCAGCCGGGCCGTCTGAAGTGCTGGTGATCGCCGACAGCGGCGCAACACCGGATTTCGTC
GCTTCTGACCTGCTCTCCCAGGCTGAGCACGGCCCGGATTCCCAGGTGATCCTGCTGACG
CCGGATGCTGACATTGCCCGCAAGGTGGCGGAGGCGGTAGAACGTCAACTGGCGGAACTG
CCGCGCGCGGGCACCGCCCGGCAGGCCCTGAGCGCCAGTCGTCTGATTGTGACCAAAGAT
TTAGCGCAGTGCGTCAGCGACTGGGCTACCATGCAATTCGCCGCCGAAATTTTTGAAATT
CTGGATGTCCCGCACCATGTAGAAGTGGTTTCCGCCCATCGCACCCCCGATAAACTGTTC
AGCTTCGCCGAAACGGCGGAAGAGAACGGATATCAAGTGATTATTGCCGGCGCGGGCGGC
GCGGCACACCTGCCGGGAATGATTGCGGCAAAAACGCTGGTCCCGGTACTCGGCGTGCCG
GTACAAAGCGCTGCGCTCAGCGGCGTGGATAGCCTCTACTCCATCGTGCAGATGCCGCGC
GGCATTCCGGTGGGTACGCTGGCGATCGGTAAAGCCGGGGCGGCGAACGCCGCACTGCTG
GCAGCGCAAATTTTGGCTACGCATGATAGCGCGCTGCATCGGCGCATCGCCGACAAACGC
TTCCTGAACGAACTGACCGCCGCTGAAGGGCTGGAACGTTATCTGGGCGCCAAATTCCCG
GGTGCGAAACGTTTCTCGCTCGAGGGGGGAGATGCGCTGATACCTATGCTGAAAGAGATG
GTTCGCCATGCGGGTAACAGCGGCACTCGCGAAGTGGTGCTGGGGATGGCGCACCGCGGT
CGTCTGAACGTGCTGATCAACGTACTGGGTAAAAAACCGCAGGATCTGTTCGACGAGTTT
GCCGGTAAACATAAAGAACATCTGGGTACCGGCGACGTGAAGTATCACATGGGCTTCTCG
TCAGATATCGAAACTGAAGGCGGTCTGGTTCACCTGGCGCTGGCGTTTAACCCATCGCAT
CTGGAAATTGTGAGCCCGGTGGTGATGGGCTCCGTGCGCGCCCGTCTGGACCGACTGGAC
GAACCGAGCAGTAATAAAGTGCTGCCGATCACTATTCACGGCGACGCCGCGGTGACCGGC
CAGGGCGTGGTTCAGGTGCTGGGCCGTAATGGTTCCGACTATTCCGCCGCCGTGCTGGCC
GCCTGTTTACGCGCTGACTGCTGTGAAATCTGGACTGACGTCGATGGCGTGTATACCTGT
GACCCGCGCCAGGTGCCGGACGCCAGGCTGTTGAAATCGATGTCCTACCAGGAAGCGATG
GAGCTCTCTTACTTCGGCGCTAAAGTCCTTCACCCTCGCACCATAACGCCTATCGCCCAG
TTCCAGATCCCCTGTCTGATTAAAAATACCGGCAATCCGCAGGCGCCAGGAACGCTGATC
GGCGCGTCCAGCGACGATGATAATCTGCCGGTTAAAGGGATCTCTAACCTTAACAACATG
GCGATGTTTAGCGTCTCCGGCCCGGGAATGAAAGGGATGATTGGGATGGCGGCGCGTGTT
TTCGCCGCCATGTCTCGCGCCGGGATCTCGGTGGTGCTCATTACCCAGTCCTCCTCTGAG
TACAGCATCAGCTTCTGTGTGCCGCAGAGTGACTGC
>Salmonella_enterica_subsp_enterica_serovar_Typhimurium_DT104_v1/1-3336
gtttttcgtccgggacacgcggattacacctatgagcagaaatacggcctgcgcgattac
cgtggcggtggacgttcttccgcgcgtgaaaccgcgatgcgcgtagcggcaggggcgatc
gccaagaaatacctggcggaaaagttcggcatcgaaatccgcggctgcctgacccagatg
ggcgacattccgctggagattaaagactggcgtcaggttgagcttaatccgttcttttgt
cccgatgcggacaaacttgacgcgctggacgaactgatgcgcgcgctgaaaaaagagggt
gactccatcggcgcgaaagtgacggtgatggcgagcggcgtgccggcagggcttggcgaa
ccggtatttgaccgactggatgcggacatcgcccatgcgctgatgagcattaatgcggtg
aaaggcgtggagatcggcgaaggatttaacgtggtggcgctgcgcggcagccagaatcgc
gatgaaatcacggcgcagggtatggagatggtcgcgcgcgttacgctttctcagccgcat
gagccaggcgccactaccgtgccggcgcggaaattctttgatatctgccgcggcctgccg
gagggcgcggagattgccgttcagttggaaggcgatcggatgctggtgcgttctggccgt
agccgcttctcgctgtctacgctgcctgccgccgatttcccgaatcttgacgactggcaa
agcgaagttgaatttacgctgccgcaggccacgatgaagcgcctgattgaatcgacccag
ttttcgatggcccatcaggatgtgcgctactacttaaacggtatgctgtttgaaacggaa
ggtagcgaactgcgcactgtcgcgaccgacggccaccgtctggcggtgtgctcaatgccg
ctggaagcgtctttacccagccactcggtgattgtgccgcgtaaaggcgtgattgaactg
atgcgtatgctcgacggcggcgaaaacccgctgcgcgtgcaggcgacgctgacggaaaac
gatctggtttttgccctttcacagcacgctgtcgcctttgctcacgcccagctccagcgg
gatggtcgaaactggcctgtggcgccgcgctatttcgcgattggccgcaccacggcgctc
gcccttcataccgttagcgggttcgatattcgttatccattggatcgggaaatcagcgaa
gccttgctacaattacctgaattacaaaatattgcgggcaaacgcgcgctgattttgcgt
ggcaatggcggccgcgaactgctgggcgaaaccctgacagcgcgcggagccgaagtcagt
ttttgtgaatgttatcaacgatgtgcgaaacattacgatggcgcggaagaagcgatgcgc
tggcatactcgcggcgtaacaacgcttgttgttaccagcggcgagatgttgcaaattgcg
ggatgccagaaggtggttctgtgctcgccgccgcccatcgctgatgaaatcctctatgcg
gcgcaactgtgtggcgtgcaggaaatctttaacgtcggcggcgcgcaggcgattgccgct
ctggccttcggcagcgagtccgtaccgaaagtggataaaatttttggccccggcaacgcc
tttgtaaccgaagccaaacgtcaggtcagccagcgtctcgacggcgcggctatcgatatg
ccagccgggccgtctgaagtactggtgatcgcagacagcggcgcaacaccggatttcgtc
gcttctgacctgctctcccaggctgagcacggcccggattcccaggtgatcctgctgacg
cctgatgctgacattgcccgcaaggtggcggaggcggtagaacgtcaactggcggaactg
ccgcgcgcggacaccgcccggcaggccctgagcgccagtcgtctgattgtgaccaaagat
ttagcgcagtgcgtcagcgactgggctaccatgcaattcgccgccgaaatttttgaaatt
ctggatgtcccgcaccatgtagaagtggtttccgctcatcgcacccccgataaactgttc
agcttcgccgaaacggcggaagagaacggatatcaagtgattattgccggcgcgggcggc
gcggcgcacctgccgggaatgattgcggcaaaaacgctggtcccggtactcggcgtgccg
gtacaaagcgctgcgctaagcggcgtggatagcctctactccatcgtgcagatgccgcgc
ggcattccggtgggtacgctggcgatcggtaaagccggtgccgctaacgccgccctgctc
gccgcgcagattctggcgcaacacgacgcggaactgcatcagcgcattgccgacaaacgc
ttcctgaacgaactgaccgccgctgaagggctggaacgttatctgggcgccaaattcccg
ggtgcgaaacgtttctcgctcgaggggggagatgcgctgatacccatgctgaaagagatg
gttcgccatgcgggtaacagcggcactcgcgaagtggtgctggggatggcgcaccgcggt
cgcctgaacgtgctgatcaacgtactgggtaaaaaaccgcaggatctgttcgacgaattt
gccggtaagcataaagaacatctgggtaccggcgacgtgaagtatcacatgggcttctcg
tcagatatcgaaaccgaaggcggtctggttcacctggcgctggcgtttaacccatcgcat
ctggaaattgtgagcccggtggtgatgggctccgtgcgcgcccgtctggacagactggac
gaaccgagcagcaacaaagtgttgccgatcactattcacggcgacgccgcggtgaccggc
cagggcgtggttcaggtgctgggccgtaatggttccgactattccgccgccgtgctggcc
gcctgtttacgcgctgactgctgtgaaatctggactgacgtcgatggcgtgtatacctgt
gacccgcgccaggtgccggacgccaggctgctgaaatcgatgtcctaccaggaagcgatg
gaactctcttacttcggcgccaaagtccttcaccctcgcaccattacgcccatcgcccag
ttccagatcccctgtctgattaaaaataccggtaatccgcaggcgccaggaacgctgatc
ggcgcgtccagcgacgatgataacctgccggttaaagggatctctaaccttaacaacatg
gcgatgtttagcgtctccggcccgggaatgaaagggatgattgggatggcggcgcgtgtt
ttcgccgccatgtctcgcgccgggatctcggtggtgctcattacccagtcctcctctgag
tacagcatcagtttctgtgtgccgcagagtgactgc
>Salmonella_enterica_subsp_enterica_serovar_Weltevreden_str_10259_v0.2/1-3336
GTTTTTCGTCCGGGACACGCGGATTACACCTATGAGCAGAAATACGGCCTGCGCGATTAC
CGTGGCGGTGGACGTTCTTCCGCGCGTGAAACCGCGATGCGCGTAGCGGCAGGGGCGATC
GCCAAGAAATACCTGGCGGAAAAGTTCGGCATCGAAATCCGCGGCTGCCTGACCCAGATG
GGCGATATTCCGCTGGAGATTAAAGACTGGCGTCAGGTTGAGCTTAATCCGTTCTTTTGT
CCCGATGCGGACAAACTTGACGCGCTGGACGAACTGATGCGCGCGCTGAAAAAAGAGGGC
GACTCCATCGGCGCGAAAGTGACGGTGATGGCGAGCGGCGTGCCGGCAGGGCTTGGCGAA
CCGGTATTTGACCGACTGGATGCGGACATTGCCCATGCGCTGATGAGCATCAATGCGGTG
AAAGGCGTGGAGATCGGCGAAGGATTTAACGTGGTGGCGCTGCGCGGCAGCCAGAATCGC
GATGAAATCACGGCGCAGGGTATGGAGATGGTCGCGCGCGTTACGCTTTCTCAGCCGCAT
GAGCCGGGTGCTACTACTGTGCCGGCGCGGAAATTCTTTGATATCTGCCGCGGCCTGCCG
GAGGGCGCGGAGATTGCCGTTCAGTTGGAAGGCGATCGGATGCTGGTGCGTTCTGGCCGT
AGCCGCTTCTCGCTGTCTACACTGCCTGCCGCCGATTTCCCGAATCTTGACGACTGGCAA
AGCGAAGTTGAATTTACGCTGCCGCAGGCCACGATGAAGCGCCTGATTGAAGCGACCCAG
TTTTCGATGGCCCATCAGGATGTGCGCTACTACTTAAACGGTATGCTGTTTGAAACGGAA
GGTAGCGAACTGCGCACTGTCGCGACCGACGGCCACCGCCTGGCGGTGTGCTCAATGCCG
CTGGAAGCGTCTTTACCCAGCCACTCGGTGATTGTGCCGCGTAAAGGCGTGATTGAACTG
ATGCGTATGCTCGACGGCGGCGAAAACCCGCTGCGCGTGCAGGCGACTCTGACGGAAAAC
GATCTGGTTTTTGCCCTTTCACAGCACGCCGTCGCCTTTGCTCACGCCCAGCTCCAGCGG
GATGGTCGAAACTGGCCTGCGTCGCCGCGCTATTTCGCGATTGGCCGCACCACGGCGCTC
GCCCTTCATACCGTTAGCGGGTTCGATATTCGTTATCCATTGGATCGGGAAATCAGCGAA
GCCTTGCTACAATTACCTGAATTACAAAATATTGCGGGCAAACGCGCGCTGATTTTGCGT
GGCAATGGCGGCCGCGAACTGCTGGGCGAAACCCTGACAGCTCGCGGAGCCGAAGTCAGT
TTTTGTGAATGTTATCAACGATGTGCGAAACATTACGATGGCGCGGAAGAAGCGATGCGC
TGGCATACTCGCGGCGTAACAACGCTTGTTGTTACCAGCGGCGAGATGTTGCAAATTGCG
GGATGTCAGAACGTGGTTCTGTGCTCGCCGCCGCCCATCGCTGATGAAATCCTCTATGCG
GCACAACTGTGTGGCGTGCAGGAAATCTTTAACGTCGGCGGCGCGCAGGCGATTGCCGCT
CTGGCCTTCGGCAGCGAGTCCGTACCGAAAGTGGATAAAATTTTTGGCCCCGGCAACGCC
TTTGTAACCGAAGCCAAGCGTCAGGTCAGCCAGCGTCTCGACGGCGCGGCTATCGATATG
CCAGCCGGGCCGTCTGAAGTACTGGTGATCGCCGACAGCGGCGCAACACCGGATTTCGTC
GCTTCTGACCTGCTCTCCCAGGCTGAGCACGGCCCGGATTCCCAGGTGATCCTGCTGACG
CCTGATGCTGACATTGCCCGCAAGGTGGCGGAGGCGGTAGAACGTCAACTGGCGGAACTG
CCGCGCGCGGACACCGCCCGGCAGGCCCTGAGCGCCAGTCGTCTGATTGTGACCAAAGAT
TTAGCGCAGTGCGTCAGCGACTGGGCTACCATGCAATTCGCCGCCGAAATTTTTGAAATT
CTGGATGTCCCGCACCATGTAGAAGTGGTTTCCGCTCATCGCACCCCCGATAAACTGTTC
AGCTTCGCCGAAACGGCGGAAGAGAACGGATATCAAGTGATTATTGCCGGCGCGGGCGGC
GCGGCGCACCTGCCGGGAATGATTGCGGCAAAAACGCTGGTCCCGGTACTCGGCGTGCCG
GTACAAAGCGCTGCGCTCAGCGGCGTGGATAGCCTCTACTCCATCGTGCAGATGCCGCGC
GGCATTCCGGTGGGTACGCTGGCGATCGGTAAAGCCGGTGCCGCTAACGCCGCCCTGCTC
GCCGCGCAGATTCTGGCGCAACACGACGCGGAACTGCATCAGCGCATCGCTGACAAACGC
TTCCTGAACGAACTGACCGCCGCTGAAGGGCTGGAACGTTATCTGGGCGCCAAATTCCCG
GGTGCGAAACGTTTCTCGCTCGAGGGGGGAGATGCGCTGATACCCATGCTGAAAGAGATG
GTTCGCCATGCGGGTAACAGCGGCACTCGCGAAGTGGTGCTGGGGATGGCGCACCGCGGT
CGCCTGAACGTGCTGATCAACGTACTGGGTAAAAAACCGCAGGATCTGTTCGACGAATTT
GCCGGTAAGCATAAAGAACATCTGGGTACCGGCGACGTGAAGTATCACATGGGCTTCTCG
TCAGATATCGAAACCGAAGGCGGTCTGGTTCACCTGGCGCTGGCGTTTAACCCATCGCAT
CTGGAAATTGTGAGCCCGGTGGTGATGGGCTCCGTGCGCGCCCGTCTGGACAGACTGGAC
GAACCGAGCAGCAACAAAGTGTTGCCGATCACTATTCACGGCGACGCCGCGGTGACCGGC
CAGGGCGTGGTTCAGGTGCTGGGCCGTAATGGTTCCGACTATTCCGCCGCCGTGCTGGCC
GCCTGTTTACGCGCTGACTGCTGTGAAATCTGGACTGACGTCGATGGCGTGTATACCTGT
GACCCGCGTCAGGTGCCGGACGCCAGGCTGCTGAAATCGATGTCCTACCAGGAAGCGATG
GAACTCTCTTACTTCGGCGCCAAAGTCCTTCACCCTCGCACCATAACGCCTATCGCCCAG
TTCCAGATCCCCTGTCTGATTAAAAATACCGGTAATCCGCAGGCGCCAGGAACGCTGATC
GGCGCGTCCAGCGACGATGATAATCTGCCGGTGAAAGGGATCTCTAACCTTAACAACATG
GCGATGTTTAGCGTCTCCGGCCCTGGAATGAAAGGGATGATTGGGATGGCGGCGCGTGTT
TTCGCCGCCATGTCTCGCGCCGGGATCTCGGTGGTGCTCATTACCCAGTCCTCCTCTGAG
TACAGCATCAGTTTCTGTGTGCCGCAGAGTGACTGC
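
The concatenated allele file above holds three records, each labelled `1-3336`, and one of them is written in lowercase. Before handing such a file to a tree builder it can be worth checking that every record really has the same length. The following is a minimal Python sketch of that check, not part of the commit; the function names `read_fasta` and `alignment_length` are hypothetical, and the sketch assumes plain FASTA with `>` header lines and wrapped sequence lines.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Upper-case so mixed-case records compare consistently.
            records[name] += line.upper()
    return records


def alignment_length(records: Dict[str, str]) -> int:
    """Return the shared sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()
```

Calling `alignment_length(read_fasta(open(path).read()))` on a file like this one would either return the common length or raise a `ValueError` naming the lengths it found.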
4 changes: 4 additions & 0 deletions example/expected_output_data/mlst_results.allele.csv
@@ -0,0 +1,4 @@
"Isolate" "ST" "New ST" "Contamination" "aroC" "dnaN" "hemD" "hisD" "purE" "sucA" "thrA"
"Salmonella_enterica_subsp_enterica_serovar_Typhi_str_CT18_v1" "2" "" "" "1" "1" "2" "1" "1" "1" "5"
"Salmonella_enterica_subsp_enterica_serovar_Typhimurium_DT104_v1" "19" "" "" "10" "7" "12" "9" "5" "9" "2"
"Salmonella_enterica_subsp_enterica_serovar_Weltevreden_str_10259_v0.2" "365" "" "" "130" "97" "25" "125" "84" "9" "101"
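
The allele table in `mlst_results.allele.csv` lists, for each isolate, its ST followed by one allele number per locus (`aroC` through `thrA`). The following is a rough Python sketch of reading such a table into dictionaries, not part of the commit. It assumes every field is double-quoted, as displayed here, and separated by whitespace (tabs or spaces); the function name `parse_allele_table` is hypothetical.

```python
import shlex
from typing import Dict, List


def parse_allele_table(text: str) -> List[Dict[str, str]]:
    """Turn a quoted, whitespace-separated table into one dict per isolate."""
    # shlex.split keeps quoted fields intact, including empty ones ("")
    # and headers that contain a space, such as "New ST".
    rows = [shlex.split(line) for line in text.splitlines() if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, fields)) for fields in body]
```

Each returned dictionary maps a column name to its string value, so `row["ST"]` and `row["aroC"]` give the sequence type and the `aroC` allele number for that isolate.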
