Script to preprocess RNA-Seq data and create a mastertable. Uses RNA-Seq processed data from multiple subfolders, where each subfolder contains the output file with the same name, and the name of the subfolder is the experimental condition.

milospjanic/moveExt2MasterTable

moveExt2MasterTable

This script is handy when you have run an experiment (e.g., RNA-Seq) and received output in multiple subfolders, one per experimental condition, where each subfolder contains an output file of the same name and the subfolder name is the name of the condition. This is the usual output layout you get from various sequencing facilities. Building a master table from such output by hand can be tedious, so I wrote a script, moveExt2MasterTable, to automate the process.

Usage

To collect all files with a tsv extension from the subfolders and build a master table from the first column (ID) and the 5th column (e.g., where you have RPKM values), call the script with two parameters: the extension name and the column number.

chmod 775 moveExt2MasterTable.sh
./moveExt2MasterTable.sh tsv 5

Dependencies

The script connects to GitHub and downloads another script from the raw URL https://raw.githubusercontent.com/milospjanic/fileMulti2TableMod1/master/fileMulti2TableMod1.awk
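If you want to fetch this dependency ahead of time (for example, before working on a machine without network access), the download can be sketched as below. This is only an illustration: fetch_awk_helper is a hypothetical helper name, and the reuse of an existing local copy is an assumption, not necessarily how moveExt2MasterTable.sh behaves. The URL is the one given above.

```shell
#!/usr/bin/env bash
# Sketch only: fetch the awk helper once and reuse the local copy afterwards.
# fetch_awk_helper is a hypothetical name; the caching behavior is an assumption.
fetch_awk_helper() {
    local url="https://raw.githubusercontent.com/milospjanic/fileMulti2TableMod1/master/fileMulti2TableMod1.awk"
    local dest="${1:-fileMulti2TableMod1.awk}"

    # Reuse an existing non-empty copy instead of downloading again.
    [ -s "$dest" ] && return 0

    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects.
    curl -fsSL "$url" -o "$dest"
}
```

Calling fetch_awk_helper with no argument saves the file as fileMulti2TableMod1.awk in the current folder; passing a path saves it there instead.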

Output

The output is a master table written to a file named mastertable. In addition, all matching files are copied from the subfolders to the current folder and given a .file extension.
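The copy step described above can be sketched roughly as follows. This is a minimal illustration, not the code from moveExt2MasterTable.sh: collect_files is a hypothetical helper, and it assumes that each subfolder holds exactly one matching file and that each copy is named after its subfolder.

```shell
#!/usr/bin/env bash
# Sketch only: copy every *.EXT file found one level down into the current
# folder, naming each copy <subfolder>.file.
# collect_files is a hypothetical helper, not part of the original script.
collect_files() {
    local ext="$1"
    local path sample
    for path in */*."$ext"; do
        [ -e "$path" ] || continue     # the glob matched nothing
        sample="${path%%/*}"           # subfolder name = condition name
        cp -- "$path" "./${sample}.file"
    done
}
```

For example, collect_files tsv run in the top-level folder would leave one .file copy per condition next to the subfolders. If a subfolder held several matching files, later copies would overwrite earlier ones in this sketch.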

Example

Here is an example of a .tsv file located in the subfolders:

target_id       length  eff_length      est_counts      tpm
ENST00000456328.2       1657    1478    14.0122 0.181189
ENST00000450305.2       632     453     0       0
ENST00000488147.1       1351    1172    250.598 4.0865
ENST00000619216.1       68      3.2997  0.936144        5.42212
ENST00000473358.1       712     533     0       0
ENST00000469289.1       535     356     0       0
ENST00000607096.1       138     6.64697 0       0
ENST00000417324.1       1187    1008    0.0151873       0.000287953
ENST00000461467.1       590     411     12.054  0.560521
ENST00000606857.1       840     661     0       0
ENST00000492842.1       940     761     0       0
ENST00000335137.3       918     739     0       0
ENST00000466430.5       2748    2569    33.7062 0.250753
ENST00000477740.5       491     312     0       0
ENST00000471248.1       629     450     0       0
ENST00000610542.1       723     544     0       0
ENST00000453576.2       336     157     0       0
ENST00000495576.1       1319    1140    17.672  0.296266
ENST00000442987.3       3812    3633    113.034 0.594629

You want to extract the 5th column (tpm in this example) to create the master table.

Run:

./moveExt2MasterTable.sh tsv 5

The output will be the following master table with all the samples combined. The header contains the sample or condition names (i.e., the subfolder names), and the row names are the gene/transcript IDs (from column 1).

 071_2_E7_24h_TAG 071_2_E7_72h_TAG 071_2_E8_24h_TAG 071_2_E8_72h_TAG 334_1_E7_24h_TAG 334_1_E7_72h_TAG 334_1_E8_24h_TAG 334_1_E8_72h_TAG 756_3_E7_24h_TAG 756_3_E7_72h_TAG 756_3_E8_24h_TAG 756_3_E8_72h_TAG 835_1_E7_24h_TAG 835_1_E7_72h_TAG 835_1_E8_24h_TAG 835_1_E8_72h_TAG H1_E7_24h_TAG H1_E7_72h_TAG H1_E8_24h_TAG H1_E8_72h_TAG H7_E7_24h_TAG H7_E7_72h_TAG H7_E8_24h_TAG H7_E8_72h_TAG
ENST00000610542.1 0 0 0 0 0 0 9.8316e-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENST00000619216.1 3.33385 2.8217 5.81411 1.82309 5.42212 0 2.62023 1.28491 0 1.78071 0 3.54873 0 0 0 2.10255 0 0 1.56025 1.44704 1.2346 0 4.14966 1.30646
ENST00000450305.2 0 0 0 0 0 0.0804298 0 0 0 0.0357021 0 0 0 0 0 0 0.00802667 0 0 0 0 0 0 0
ENST00000442987.3 1.04805 1.81335 0.324263 0.331067 0.594629 1.13613 0.367012 0.548072 1.01927 1.15806 0.859238 0.715326 1.02043 1.23834 0.580001 0.311195 1.21486 0.718531 0.704979 0.48342 0.981332 1.32601 0.754685 0.717388
ENST00000492842.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENST00000469289.1 0 0 0 0 0 0.0586353 0 0 0.0540325 0.0638736 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ENST00000471248.1 0.2645 0 0 0 0 0.111799 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0763442 0 0 0 0
ENST00000461467.1 0.0471473 0 0.270835 0.245762 0.560521 0 0.198852 0.0859918 0.101713 0.329731 0.612336 0.118115 0.0975844 0.02535 0.133357 0 0.344864 0.142344 0.157868 1.3982 0.348791 0.204793 0 0.185079
ENST00000495576.1 0.308196 0.354407 0.208001 0.192907 0.296266 0.118206 0.127151 0.245715 0.126846 0.116218 0.190281 0.122054 0.257678 0 0.228791 0.0479056 0.249924 0.138056 0.135063 0.216182 0.183194 0.258463 0.140834 0.108693
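
For readers curious how a table like this can be assembled, here is a minimal awk-based sketch of the merge. It is not the fileMulti2TableMod1.awk script that moveExt2MasterTable downloads. build_mastertable is a hypothetical function, and it assumes that each input file is named <condition>.file, that the first line of each file is a header, and that an ID missing from a file should be shown as 0.

```shell
#!/usr/bin/env bash
# Sketch only: merge one column from several per-condition files into a
# single table keyed by the ID in column 1.
# Usage: build_mastertable COLUMN FILE...
# build_mastertable is a hypothetical name, not the original awk script.
build_mastertable() {
    local col="$1"
    shift
    awk -v col="$col" '
        FNR == 1 {                       # first line of each input file
            name = FILENAME
            sub(/^.*\//, "", name)       # drop any leading directory
            sub(/\.file$/, "", name)     # drop the .file suffix
            samples[++n] = name          # one output column per file
            next                         # skip the header row
        }
        {
            if (!($1 in seen)) { seen[$1] = 1; ids[++m] = $1 }
            value[$1, n] = $col          # value for this ID in this sample
        }
        END {
            header = ""
            for (j = 1; j <= n; j++) header = header " " samples[j]
            print header
            for (i = 1; i <= m; i++) {
                row = ids[i]
                for (j = 1; j <= n; j++)
                    # Assumption: IDs absent from a file are reported as 0.
                    row = row " " (((ids[i], j) in value) ? value[ids[i], j] : 0)
                print row
            }
        }
    ' "$@"
}
```

A call such as build_mastertable 5 ./*.file > mastertable would write the combined table. Rows come out in the order the IDs are first seen, which may differ from the row order shown above.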
