Pre-processing of raw FASTQ files

see subfolder 'ngs_pipeline'

Set up conda environment

conda env create --file annot_env.yml
conda activate annot_env

Assemble AnnData

expression matrix + sample info from pre-processing NOTE: adjust in MIRSORT_ANNOTATION_DF.json accordingly

python assemble_anndata.py

Generate small RNA annotation

Run sRNA annotation pipelines (unitas and sports)

Use fasta file of sequences after pre-processing for sequence annotation Allow 1 missmatch in annotation pipelines (unitas and sports)

Generate fasta file from snoDB tsv file

NOTE: snoDB tsv file downloaded from https://bioinfo-scottgroup.med.usherbrooke.ca/snoDB/

python snoDB2fa.py

Run unitas (https://www.smallrnagroup.uni-mainz.de/software.html)

perl unitas_1.7.7.pl -i features_detected_sequences__publication.fa -species homo_sapiens -species_miR_only -tail 2 -intmod 1 -mismatch 1 -insdel 0 -refseq snoDB.fa -dump_prefix unitas_annotation/UNITAS

Run sports (https://github.com/junchaoshi/sports1.1)

NOTE: get sports pre-compiled 'Homo_sapiens' annotation database from https://ncrnainfo-my.sharepoint.com/personal/sports_ncrna_info/_layouts/15/guestaccess.aspx?docid=0773ed3d5f6b74f35bbd643e1af221c31&authkey=AcRxf8walnGUIEhgI--8CDc

perl sports.pl -i features_detected_sequences__publication.fa -p 4 -k -M 1 -g Homo_sapiens/genome/hg38/genome -m Homo_sapiens/miRBase/21/miRBase_21-hsa -r Homo_sapiens/rRNAdb/human_rRNA -t Homo_sapiens/GtRNAdb/hg19/hg19-tRNAs -w Homo_sapiens/piRBase/piR_human -e Homo_sapiens/Ensembl/release-89/Homo_sapiens.GRCh38.ncrna -f Homo_sapiens/Rfam/12.3/Rfam-12.3-human -o sports_annotation/

Drop all sequences that do not have any annotation in unitas or sports

cd unitas_annotation/UNITAS_dd-mm-yyyy_features_detected_sequences__publication.fa_#1
awk 'NF>=3' unitas.full_annotation_matrix.txt | awk '$3 !~ "low_complexity" {print $0}' > unitas.full_annotation_matrix_justannoseqs.txt
cd sports_annotation/1_features_detected_sequences__publication/features_detected_sequences__publication_result
awk '$6 !~ "NO_Annotation" {print $0}' features_detected_sequences__publication_output.txt > features_detected_sequences__publication_output_justannoseqs.txt

Merge annotations

python merge_sRNAclass_annotations.py

Get subclassification for rRNAs and YRNAs

python rRNA_position_classification.py

Generate sRNA subclass annotation

python generate_sRNA_sub_class_annotation_df.py

Create sequence-based sRNA annotation dataframe

python create_seq_annotation_df.py

Aggregate expression on sRNA names

Add sRNA annotation dataframe as var and reduce AnnData to features with 'subclass_name' annotation

python ad_reduce_features.py

Aggregate AnnData based on 'subclass_name'

python ad_aggregate.py

Create subclass-name-based annotation dataframe and add to aggregated AnnData

python create_aggregated_annotation_df.py

Reduce features by expression threshold and subset to blood components and whole blood samples

python ad_reduce_features_further.py

Create csv files for dashboard

python ad2csv.py

Compare to previous benchmark dataset (Juzenas et al. 2017 NAR)

python compare_2_Juzenas.py

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
ngs_pipeline		ngs_pipeline
LICENSE		LICENSE
MIRSORT_ANNOTATION_DF.json		MIRSORT_ANNOTATION_DF.json
README.md		README.md
ad2csv.py		ad2csv.py
ad_aggregate.py		ad_aggregate.py
ad_reduce_features.py		ad_reduce_features.py
ad_reduce_features_further.py		ad_reduce_features_further.py
annot_env.yml		annot_env.yml
assemble_anndata.py		assemble_anndata.py
compare_2_Juzenas.py		compare_2_Juzenas.py
create_aggregated_annotation_df.py		create_aggregated_annotation_df.py
create_seq_annotation_df.py		create_seq_annotation_df.py
generate_sRNA_sub_class_annotation_df.py		generate_sRNA_sub_class_annotation_df.py
merge_sRNAclass_annotations.py		merge_sRNAclass_annotations.py
rRNA_position_classification.py		rRNA_position_classification.py
snoDB.tsv		snoDB.tsv
snoDB2fa.py		snoDB2fa.py
utils.py		utils.py

License

gitHBDX/mirblood-code

Folders and files

Latest commit

History

Repository files navigation