-
Notifications
You must be signed in to change notification settings - Fork 4
/
README
131 lines (81 loc) · 4.02 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
RACA (Reference-Assisted Chromosome Assembly)
=============================================
This is the version used in the original PNAS paper.
0. Warning
----------
Please do not use a '.' symbol as a part of a chromosome name.
The '.' symbol is used as a delimiter to separate a species name and a chromosome name in output files.
If the '.' symbol is already in a chromosome name in your sequence files, please convert it to other symbol,
like '_', before creating the chain/net files and read mapping files.
1. How to compile?
------------------
Just type 'make' to compile the RACA package.
2. How to run?
--------------
2.1 Configuration file
RACA requires a single configuration file as a parameter.
The configuration file has all parameters that are needed for RACA.
Example dataset below has a sample configuration file 'params.txt'
and all other parameter files which are self-explanatory.
(also refer to the data directory.)
Please read carefully the description of each configuration variable
and modify them as needed.
2.2 Run RACA
There is a wrapper Perl script, 'Run_RACA.pl'. To run RACA, type as:
<path to RACA>/Run_RACA.pl <path to the configuration file>
3. Example dataset (Tibetan antelope assembly)
----------------------------------------------
3.1 Download the dataset
Visit http://bioinfo.konkuk.ac.kr/RACA/ and click the
link "Tibetan antelope (TA) data". Then you can download the file
TAdata.tgz file.
3.2 Compile the dataset
Go into the directory where you downloaded the TAdata.tgz file and run:
tar xvfz TAdata.tgz
cd TAdata/
make
3.3 Run RACA for the dataset
In the TAdata directory run:
<path to RACA>/Run_RACA.pl params.txt
3.4 Where are output files?
In the TAdata/Out_RACA directory.
4. What are produced?
---------------------
In the output directory that is specified in the above configuration file,
the following files are produced.
- rec_chrs.refined.txt
This file contains the order and orientation of target scaffolds in
each reconstructed RACA chromosome fragment. Each column is defined
as:
Column1: the RACA chromosome fragment id
Column2: start position (0-based) in the RACA chromosome fragment
Column3: end position (1-based) in the RACA chromosome fragment
Column4: target scaffold id or 'GAPS'
Column5: start position (0-based) in the target scaffold
Column6: end position (1-based) in the target scaffold
- rec_chrs.<ref_spc>.segments.refined.txt
This file contains the mapping between the RACA chromosome fragments
and the genome sequences of the reference species <ref_spc>.
- ref_chrs.<tar_spc>.segments.refined.txt
This file contains the mapping between the RACA chromosome fragments
and the genome sequences of the target species <tar_spc>.
- ref_chrs.<out_spc>.segments.refined.txt
This file contains the mapping between the RACA chromosome fragments
and the genome sequences of the outgroup species <out_spc>. This file
is created for each outgroup species.
- rec_chrs.adjscores.txt
This file constins the adjacency scores that were used to reconstruct
the RACA chromosome fragments. Each column is defined as:
Column1: the RACA chromosome fragment id
Column2: start position (1-based) in the RACA chromosome fragment
Column3: end position (1-based) in the RACA chromosome fragment
Column4: the adjacency score
- rec_chrs.size.txt
This file contains the total size (the second column) and the total
number of target scaffolds (the third column) that are placed in each
RACA chromosome fragment (the first column).
There are other intermediate files and directories in the output directory.
They can be safely ignored.
4. How to ask questions?
------------------------
Contact Jaebum Kim (Jaebum.Kim@gmail.com)