Utils

Content:

Rows into columns
Print a range of fields
Convert GTF into a Bed file
DNA reverse complementary
Flybase Orthology relationship
Synteny analysis
Convert a txt file into a csv
Merge all Gene-Expression values from the grape-nf pipeline
QC_Bam.files
Obtain a window around the TSS of some genes
GTF analysis
Calculate the overlapping between genic-exonics and their neighbor PCGs

1) Rows into columns:

rows.to.columns.count.sh

Description:

Receives rows and convert them into columns and count them

Usage:


head -1 some_file.txt | Utils/rows.to.columns.count.sh

Example of output:

    Rows to Columns script:
    -----------------------

1   GENE
2   EVENT
3   COORD
4   LENGTH
5   FullCO

    End of the script
    -----------------

2) Print a range of fields

Print.fields.interval.awk

Description:

Some times you have a tab separated file and you want to print a interval of fields using awk and it is prone to errors to type manually the fields using the following code:


awk -F "\t" '{print $2, "\t", $3, "\t", $4}' file.txt

Usage:


awk -v start=2 -v end=5 -f Utils/Print.fields.interval.awk  file.txt

With the example above it will print the fields from 2 until 5

3) Convert GTF into a Bed file

gtf.to.bed.R

Description:

Within GTF files the 4th and 5th fields are the start and end of a genomic feature (gene, transcript, exon, etc.). However GTFs are 1-based (closed intervals) whereas Beds are 0-counted (half-open intervals). This script does the conversion and prints the Bed file. Also it can calculate the length of a genomic features (gene, transcript, exon, etc. --length=TRUE).

For ID printing default is 2 (the second feature of the 9th field of a GTF file) this can be changed, for example 1 if you want Gene_ID, if 9th file of the GTF file is the following:

gene_id (1)	gene_symbol (2)	transcript_id (3)	transcript_symbol (4)
"FBgn0031208"	"CG11023"	"FBtr0300689"	"CG11023-RB"

Usage:


cat GTF.file.gtf | Utils/gtf.to.bed.R

4) DNA reverse complementary

DNA.reverse.complementary.py

Description:

The aim of this script is to change the direction of a DNA sequence from 5'-3' to 3'-5' and vice versa.

Usage:


cat DNA.sequence.negative.strand.fasta | Utils/DNA.reverse.complementary.py

5) Flybase Orthology relationship

Flybase_orthoParalogs.py

Written in Python 3.5.4

Description:

Script to create a file including all orthology relationships for the 4 fly species [D_me, D_sechelia (phylogenetically close), D_persimilis (halfway), & D_willistoni (far away)]

WARNING: prints all orthologs, including paralogs; paralogs are assigned with the same myID

5.1) gene_list: one column with gene_id like: FBgn0031208
5.2) orthologs_list: obtained from the Flybase


python Flybase_orthoParalogs.py gene_list orthologs_list output_name

Example output:

D_me	D_sec	D_per	D_wil	myID
FBgn0031208	FBgn0177630	FBgn0150621	FBgn0212170	myID1
FBgn0031208	FBgn0177629	FBgn0150620	FBgn0212170	myID1
FBgn0031208	FBgn0177629	FBgn0160043	FBgn0212170	myID1
FBgn0002121	FBgn0171620	FBgn0151538	FBgn0226562	myID2
FBgn0031209	FBgn0171619	FBgn0151527	FBgn0218492	myID3

6) Synteny analysis

synteny_Flies.4sp.py

Written in Python 3.5.4

Description:


python synteny_Flies.4sp.py  dme_geneorder.list.txt\
dpse.gene.order.list.txt\
Orthologs.out\
Out.synteny.txt\
3 3 1 'yes'

7) Convert a txt file into a csv

convert.txt.to.csv.py

Written in Python 3.5.4

Description:


cat file.txt | python ~/Utils/convert.txt.to.csv.py

cat file.txt | python ~/Utils/convert.txt.to.csv.py --output="Important.file.csv"

8) Merge all Gene-Expression values from the grape-nf pipeline

generate.ExpressionMatrix.py
Written in Python.3.5

Description:

When you run the grape-nf pipeline you obtain a file called: pipeline.db which it contains the description of results and their absolute file path . In order to merge all the results of quantification (RSEM: gene/transcript-expression) in one dataframe use the script: generate.ExpressionMatrix.py as is shown below:

8.1) To obtain expression values in TPMs:


~/Utils/generate.ExpressionMatrix.py --input=pipeline.db

8.2) To obtain expression values in FPKMs:


~/Utils/generate.ExpressionMatrix.py --input=pipeline.db --type=FPKM

8.3) To obtain expression values in Counts:


~/Utils/generate.ExpressionMatrix.py --input=pipeline.db --type=Count

9) QC_Bam.files

This folder contains 4 scripts to obtain QC of aligned bam files.
The whole pipeline relies that input file has to be called: input_bamQC.tsv if not it chrashes.

9.1) Input file

If you just run grape-nf run: bamqc_input.py in the following way:


python ~/Utils/QC_Bam.files/bamqc_input.py pipeline.db\
input_bamQC.tsv

9.2) QC

Qualimap:

Qualimap_array.sh: job array

Description: You don't need to worry to create cluster_out folder or any type of subdirectory. Uses the current directory so no absolute paths are needed.

Modify:

1) Must modify: -t according of the number of bam files that you want to analyze.
2) You can modify the number of parallel jobs by modify: -tc
3) You can also modify the gtf file, currently is using: gtf dm6 r6.29


qsub Qualimap_array.sh

Picard

Picard_array.sh: CollectAlignmentSummaryMetrics and CollectGcBiasMetrics
Picard_array.sh: job array

Description: Run in the same folder as Qualimap_array.sh. And don't worry for the rest.

Modify:

1) Must modify: -t according of the number of bam files that you want to analyze.
2) You can modify the number of parallel jobs by modify: -tc
3) You can modify reference genome.


qsub Picard_array.sh

9.3) Merge all the QC results

merge.QC_files.sh

Description: this is not a array script you need to run it. The suffix of the input files are: _metrics.txt, _bamqc.txt and _GC_summary.txt. The final output filename is: QC_Data.csv

1) Copy the header.txt file into ~/QC/QC
2) Run the script in ~/QC
3) And run the script in the following way:

path=/no_backup_isis/rg/ramador/D_me/RNA-seq/grape-nf-modENCODE/QC

~/Utils/QC_Bam.files/merge.QC_files.sh $path

10) Obtain a window around the TSS

ObtainWindow_TSS.bed.py
Written in Python.3.5

Description:

It receives as an input 1) a bed file, 2) a window interger number, and 3) the outputname bed file. In the example below it uses a 50 bp window around the TSS of all genes within bed file to obtain the promoters bed file.


python ~/Utils/ObtainWindow_TSS.bed.py input_BedFile.bed 50 output_name.bed

11) GTF analysis:

GTF.analysis.GeneIDName.py

Description:

This script has two options either gene, transcript or exon.

For gene from a gtf file you obtain: gene id, gene name, gene type, gene length and strand.
For transcript you obtain gene id, gene name, transcript id, transcript name, transcript type (it is the same as gene type), transcript length and strand.

Gene Usage:


/Utils/GTF.analysis.GeneIDName.py  --input=dme.r6.29.gtf --output=gene.example.tsv

Transcript Usage:


Utils/GTF.analysis.GeneIDName.py  --input=dme.r6.29.gtf --output=transcript.example.tsv --type=transcript

12) Calculate the overlapping between genic-exonics and their neighbor PCGs:

calculate.overlapping.genic.exonic.sh

Description: to calculate the overlapping between the genic-exonics and their neighbor PCGs we used bedtools intersect. You need to modify three things to run again this script:

1) Bed file by exons: this means that you need the chr, start and end for each exon per gene.
2) The complete list of all genic-exonics with their neighbor PCGs.
3) Modify -t for the number of rows from the the list of all genic-exonics.


qsub calculate.overlapping.genic.exonic.sh

Name		Name	Last commit message	Last commit date
Latest commit History 174 Commits
Emacs		Emacs
QC_Bam.files		QC_Bam.files
Sequence_Conservation		Sequence_Conservation
Synteny		Synteny
bigwig		bigwig
convert_gtf_bed		convert_gtf_bed
nexflow_configs		nexflow_configs
.gitignore		.gitignore
DNA.reverse.complementary.py		DNA.reverse.complementary.py
GTF.analysis.GeneIDName.py		GTF.analysis.GeneIDName.py
LICENSE		LICENSE
ObtainWindow_TSS.bed.py		ObtainWindow_TSS.bed.py
Print.fields.interval.awk		Print.fields.interval.awk
README.md		README.md
calculate.overlapping.genic.exonic.sh		calculate.overlapping.genic.exonic.sh
convert.txt.to.csv.py		convert.txt.to.csv.py
csvLook.sh		csvLook.sh
generate.ExpressionMatrix.py		generate.ExpressionMatrix.py
rows.to.columns.count.csv.sh		rows.to.columns.count.csv.sh
rows.to.columns.count.sh		rows.to.columns.count.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Utils

Content:

1) Rows into columns:

2) Print a range of fields

3) Convert GTF into a Bed file

4) DNA reverse complementary

5) Flybase Orthology relationship

6) Synteny analysis

7) Convert a txt file into a csv

8) Merge all Gene-Expression values from the grape-nf pipeline

8.1) To obtain expression values in TPMs:

8.2) To obtain expression values in FPKMs:

8.3) To obtain expression values in Counts:

9) QC_Bam.files

9.1) Input file

9.2) QC

Qualimap:

Picard

9.3) Merge all the QC results

10) Obtain a window around the TSS

11) GTF analysis:

12) Calculate the overlapping between genic-exonics and their neighbor PCGs:

About

Releases

Packages

Languages

License

razielar/utils

Folders and files

Latest commit

History

Repository files navigation

Utils

Content:

1) Rows into columns:

2) Print a range of fields

3) Convert GTF into a Bed file

4) DNA reverse complementary

5) Flybase Orthology relationship

6) Synteny analysis

7) Convert a txt file into a csv

8) Merge all Gene-Expression values from the grape-nf pipeline

8.1) To obtain expression values in TPMs:

8.2) To obtain expression values in FPKMs:

8.3) To obtain expression values in Counts:

9) QC_Bam.files

9.1) Input file

9.2) QC

Qualimap:

Picard

9.3) Merge all the QC results

10) Obtain a window around the TSS

11) GTF analysis:

12) Calculate the overlapping between genic-exonics and their neighbor PCGs:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages