NOTE: The Open Tree of Life recently updated its API, causing the curl scripts
in this file to stop working. We are waiting for some issues with the Open Tree of Life
API v3 to be resolved, and we will then revise our parser to match its responses.
In the meantime, this program will not work.
##########################
Retrieve Open Tree of Life Phylogenetic Reference Tree
getOTLtree.py
Created By: Lauren McKinnon
Email: laurenmckinnon77@gmail.com
##########################
Purpose: Retrieve an Open Tree of Life reference tree.
##########################
ARGUMENT OPTIONS:
-h, --help    show this help message and exit
-i INPUT      Input Newick tree file
-o OUTPUT     Output file name
-e EXCLUDE    Exclude the list of species not found in the tree from the output
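As a rough illustration, the options above could be declared with argparse as in the sketch below. This is not the script's actual source: the function name build_parser is hypothetical, -i is marked required only because typical usage needs it, and -e is treated as a simple on/off flag because the example usage passes it without a value (so its help line would read "-e" rather than "-e EXCLUDE").

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options of getOTLtree.py."""
    parser = argparse.ArgumentParser(
        description="Retrieve an Open Tree of Life reference tree.")
    # -h/--help is added automatically by argparse.
    parser.add_argument("-i", dest="input", required=True,
                        help="Input Newick tree file")
    parser.add_argument("-o", dest="output", default=None,
                        help="Output file name (standard out if omitted)")
    # Assumption: -e is a flag, as in "python getOTLtree.py -i file -e".
    parser.add_argument("-e", dest="exclude", action="store_true",
                        help="Exclude the list of species not found in the tree")
    return parser
```

In a script, build_parser().parse_args() would then read the options from the command line.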
##########################
REQUIREMENTS:
getOTLtree.py requires Python 2.7.
The script uses the following Python libraries:
1. sys
2. argparse
3. requests
4. json
5. re
All of these except requests are included in the Python 2.7 standard library. If a library
(usually requests) is not currently on your Python path, install it with:
pip install --user [library_name]
##########################
Input File:
This algorithm requires a phylogenetic tree in Newick format.
The input file may also contain multiple phylogenetic trees, one per line.
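A minimal sketch of reading a multi-tree input file one tree per line. It assumes blank lines are skipped and that lines are numbered from 1; this README states neither detail, and the function name iter_trees is hypothetical.

```python
def iter_trees(lines):
    """Yield (line_number, text) for each non-blank line.

    `lines` may be an open file or any iterable of strings.
    Assumption: line numbers start at 1 and count blank lines too.
    """
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if text:  # a blank line carries no tree
            yield line_number, text
```

Passing an open file works directly, for example: with open("test/list_of_species") as handle: trees = list(iter_trees(handle)).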
Output File:
An output file is not required. If an output file is not supplied, the phylogenetic tree
will be written to standard out.
##########################
USAGE:
Typical usage requires the -i option.
The program uses all species names in the input file to retrieve
a reference phylogenetic tree from the Open Tree of Life database.
Example input Newick tree:
(Spiroplasma_taiwanense,(Mycoplasma_pulmonis,(Mycoplasma_anseris,Mycoplasma_glycophilum)));
A malformed Newick tree file will also work, as will a file that lists each species on a separate
line or separates species with commas. All parentheses, commas, semicolons, and end-of-line
characters are removed from the input, and underscores are replaced by spaces.
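The cleaning step described above can be sketched with a regular expression. This is an illustration, not the script's own code: extract_species is a hypothetical name, and the removed characters are assumed to act as separators between names, since deleting them outright would join adjacent names together. Branch lengths and quoted labels are not handled.

```python
import re

# Parentheses, commas, semicolons and end-of-line characters, treated here
# as separators between species names (an assumption, see above).
_SEPARATORS = re.compile(r"[(),;\r\n]+")


def extract_species(text):
    """Return species names from a (possibly malformed) Newick string or list."""
    names = []
    for token in _SEPARATORS.split(text):
        # Underscores become spaces, then surrounding whitespace is trimmed.
        name = token.replace("_", " ").strip()
        if name:  # skip empty pieces left by leading/trailing separators
            names.append(name)
    return names
```

For the example tree above, extract_species returns ["Spiroplasma taiwanense", "Mycoplasma pulmonis", "Mycoplasma anseris", "Mycoplasma glycophilum"].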
By default, the reference phylogenetic tree is first stored in memory and then written to the output file
or standard out.
By default, a list of species that could not be included in the reference tree is also written to the
output file. To omit this list, include the -e option.
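The two default behaviors above (build the result in memory, then write it to a file or to standard out, with the missing-species list unless -e is given) might look roughly like this. The function names and the exact layout, including the header line before the missing species, are hypothetical; the real tool's output format may differ.

```python
import sys


def render_result(newick, missing, exclude=False):
    """Build the complete output text in memory before writing it anywhere.

    Hypothetical layout: the tree first, then (unless `exclude`, i.e. -e,
    is set) a header line and one line per species not found in the tree.
    """
    lines = [newick]
    if not exclude and missing:
        lines.append("Species not found in reference tree:")  # hypothetical wording
        lines.extend(missing)
    return "\n".join(lines) + "\n"


def write_result(text, output_path=None):
    """Write `text` to output_path if one is given, otherwise to standard out."""
    if output_path is None:
        sys.stdout.write(text)
    else:
        with open(output_path, "w") as handle:
            handle.write(text)
```

Keeping rendering separate from writing means the same text can go to a file or to standard out without duplicating the formatting logic.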
Example usage:
python getOTLtree.py -i test/list_of_species -e -o outputName
python getOTLtree.py -i test/list_of_species
python getOTLtree.py -i test/list_of_species -e
The first command writes one output file per tree in the input file, named
outputName[lineNumber], to the current directory. If the input file contains only one tree,
a single output file is produced in the current directory.
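A small sketch of the outputName[lineNumber] naming scheme. It assumes the line number is appended directly to the base name and that a suffix is always added; the README does not say what a single-tree input's file is called. The function name output_paths is hypothetical.

```python
import os


def output_paths(base, line_numbers, directory="."):
    """Map each input line number to the path base + line number.

    Files go to `directory`, the current directory by default. Assumption:
    a numeric suffix is always appended, even for a single-tree input.
    """
    return {n: os.path.join(directory, "%s%d" % (base, n)) for n in line_numbers}
```

On a POSIX system, output_paths("outputName", [1, 2]) maps 1 to "./outputName1" and 2 to "./outputName2".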
The second command writes the reference phylogenetic trees to standard out.
It should take a few seconds on a single core.
The third command writes the reference phylogenetic trees to standard out
without the list of species that could not be included.
##########################
Thank you, and happy researching!!