Applying FLAMES to PacBIO #26

Closed
apc1992 opened this issue Jul 14, 2022 · 0 comments

apc1992 commented Jul 14, 2022

Hello,

I am trying to use FLAMES in an isoform characterization benchmarking study with a single sample. Since I am new to the long-read world, it is not yet clear to me which key parameters I need to consider in the configuration file. After running FLAMES, I found that my filtered isoform file (isoform_annotated.filtered.gff3) is almost empty. This is my output:

SIZE (bytes)  DATE          FILE
  2444550950  Jul 13 19:28  align2genome.bam
     3252184  Jul 13 19:28  align2genome.bam.bai
          16  Jul 13 19:43  isoform_annotated.filtered.gff3
    15122175  Jul 13 19:32  isoform_annotated.gff3
          61  Jul 13 19:43  isoform_FSM_annotation.csv
  3534807677  Jul 13 18:11  merged.fastq.gz
          59  Jul 13 18:11  pseudo_barcode_annotation.csv
  1505577353  Jul 13 19:41  realign2transcript.bam
     3221800  Jul 13 19:41  realign2transcript.bam.bai
    98666092  Jul 13 19:33  transcript_assembly.fa
     2062401  Jul 13 19:33  transcript_assembly.fa.fai
      118564  Jul 13 19:42  transcript_count.bad_coverage.csv.gz
      186937  Jul 13 19:42  transcript_count.csv.gz
     3617886  Jul 13 19:42  tss_tes.bedgraph
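
To put numbers on the gap between the two GFF3 files, the records per feature type can be counted before and after filtering. The following is only a minimal sketch: it assumes both files are standard tab-separated GFF3 with the feature type in the third column, and the helper name `count_gff3_features` is made up for illustration and is not part of FLAMES. For context, 16 bytes is exactly the size of a lone `##gff-version 3` header line plus a newline, so the filtered file may contain no records at all; that reading is a guess until the file is inspected.

```python
import gzip
from collections import Counter
from pathlib import Path


def count_gff3_features(path):
    """Count records per feature type (third column) in a GFF3 file.

    Hypothetical helper for illustration; not part of FLAMES.
    """
    path = Path(path)
    # Transparently handle gzip-compressed files.
    opener = gzip.open if path.suffix == ".gz" else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines, header directives, and comments.
            if not line.strip() or line.startswith("#"):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    outdir = Path("FLAMES_output")  # the --outdir used in this run
    for name in ("isoform_annotated.gff3", "isoform_annotated.filtered.gff3"):
        gff = outdir / name
        if gff.exists():
            print(name, dict(count_gff3_features(gff)))
        else:
            print(name, "not found")
```

If the unfiltered file reports many records and the filtered one reports none, the isoforms are being assembled and then discarded by the filtering step, which would point at the filtering thresholds rather than at alignment.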

My input parameters and data were:
--gff3 gencode.v40.annotation.gtf (human annotations)
--genomefa GRCh38.primary_assembly.genome.fa (human reference genome)
--outdir FLAMES_output/
--fq_dir fastq/ (path to the directory containing my single FASTQ file)

I am not using a configuration file, so FLAMES is applying its default parameters. I suspect this is the main problem, since FLAMES is designed for ONT data. So my question is: which parameters are best for running an analysis with PacBio files? What are your recommendations?

Below I paste a config file I used for ONT data. Could you indicate whether these are all the parameters I need to adjust for PacBio, or whether there are additional parameters to consider?

{
    "pipeline_parameters": {
        "do_genome_alignment": true,
        "do_isoform_identification": true,
        "do_read_realignment": true,
        "do_transcript_quantification": true
    },
    "global_parameters": {
        "generate_raw_isoform": false,
        "has_UMI": false
    },
    "isoform_parameters": {
        "MAX_DIST": 10,
        "MAX_TS_DIST": 120,
        "MAX_SPLICE_MATCH_DIST": 10,
        "min_fl_exon_len": 40,
        "Max_site_per_splice": 3,
        "Min_sup_cnt": 10,
        "Min_cnt_pct": 0.001,
        "Min_sup_pct": 0.2,
        "strand_specific": 0,
        "remove_incomp_reads": 5
    },
    "alignment_parameters": {
        "use_junctions": true,
        "no_flank": false
    },
    "realign_parameters": {
        "use_annotation": true
    },
    "transcript_counting": {
        "min_tr_coverage": 0.3,
        "min_read_coverage": 0.3
    }
}
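
Before changing any values, it may help to confirm that the config parses as valid JSON and to generate variants of it programmatically. The following is only a sketch: the file names `config_ont.json` and `config_pacbio.json`, the helpers `load_config` and `with_overrides`, and the override value are all hypothetical placeholders. Which values actually suit PacBio data is exactly the open question here, so nothing below is a recommendation.

```python
import json
from pathlib import Path

# Top-level sections that appear in the config pasted above.
EXPECTED_SECTIONS = {
    "pipeline_parameters",
    "global_parameters",
    "isoform_parameters",
    "alignment_parameters",
    "realign_parameters",
    "transcript_counting",
}


def load_config(path):
    """Parse a JSON config and fail loudly if a top-level section is missing."""
    with open(path) as handle:
        config = json.load(handle)  # raises json.JSONDecodeError if malformed
    missing = EXPECTED_SECTIONS - set(config)
    if missing:
        raise KeyError(f"missing sections: {sorted(missing)}")
    return config


def with_overrides(config, section, **overrides):
    """Return a deep copy of config with existing keys in one section replaced."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    unknown = set(overrides) - set(updated[section])
    if unknown:
        # Guard against typos silently adding parameters the tool never reads.
        raise KeyError(f"unknown keys in {section}: {sorted(unknown)}")
    updated[section].update(overrides)
    return updated


if __name__ == "__main__":
    source = Path("config_ont.json")  # hypothetical file name
    if source.exists():
        config = load_config(source)
        # Placeholder value only, not a PacBio recommendation.
        variant = with_overrides(config, "isoform_parameters", Min_sup_cnt=5)
        Path("config_pacbio.json").write_text(json.dumps(variant, indent=4))
```

Rejecting unknown keys is a deliberate choice: a misspelled parameter name would otherwise be written to the file and presumably ignored, leaving the default in effect without any warning.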

Thank you very much in advance for your help, and my apologies for such a basic question!
Best,
AP

@apc1992 apc1992 closed this as completed May 19, 2023