Mapping with uLTRA without GTF? #12

Closed
unique379r opened this issue Mar 9, 2022 · 3 comments

Comments

@unique379r

Hi,
I am trying to run uLTRA without an annotation GTF (since I don't have any annotation), but it seems uLTRA needs one and gives me an error. It fails both with and without the "--disable_mm2" option.

Command I used:

conda activate ultra
uLTRA align --prefix MySample --isoseq genome.fa reads.fastq output_dir/
conda deactivate

ERROR:

FileNotFoundError: [Errno 2] No such file or directory: 'output_dir/ref_part_sequences.pickle'

I know that uLTRA tries to build its indices within the output directory, but in this case it just complains. Any idea why?

Please help
Thanks

-best
Rupesh

@ksahlin
Owner

ksahlin commented Mar 10, 2022

Hi @unique379r,

uLTRA requires an annotation, and it would not have any advantage over, e.g., minimap2 for annotation-free alignment.

I recommend that you use minimap2 or deSALT for annotation-free alignment.

Best,
K

@ksahlin ksahlin closed this as completed Mar 10, 2022
@unique379r
Author

unique379r commented Mar 15, 2022

Hi again @ksahlin,
Thank you for your suggestion, but I really wanted to try uLTRA too, along with minimap2 and deSALT. Therefore, I got a GTF (https://tinyurl.com/5e8emmf2), but when I tried to build the indices, a utility script of uLTRA gave me an error. I am not sure why, as the file seems fine to me. Please have a look; I look forward to your help.

uLTRA index chm13v2.0.fasta CHM13.v1.v2.gtf .

ERROR

Traceback (most recent call last):
  File "/Apps/envs/ultra/bin/uLTRA", line 714, in <module>
    prep_splicing(args, refs_lengths)
  File "/Apps/envs/ultra/bin/uLTRA", line 80, in prep_splicing
    max_intron_chr, exon_choordinates_to_id, chr_to_id, id_to_chr = augmented_gene.create_graph_from_exon_parts(db, args.flank_size, args.small_exon_threshold, args.min_segm, refs_lengths)
  File "/Apps/envs/ultra/lib/python3.9/site-packages/modules/create_augmented_gene.py", line 323, in create_graph_from_exon_parts
    exon_gene_ids = exon.attributes["gene_id"] # is a list of strings
  File "/Apps/envs/ultra/lib/python3.9/site-packages/gffutils/attributes.py", line 63, in __getitem__
    v = self._d[k]
KeyError: 'gene_id'

My gtf

chr1	Liftoff	exon	11136	11635	.	-	.	gene_id "LOFF_G0000001"; transcript_id "LOFF_T0000001"; gene_name "AL627309.3";
chr1	Liftoff	exon	11630	11831	.	+	.	gene_id "LOFF_G0000002"; transcript_id "LOFF_T0000002"; gene_name "AP006222.2";
chr1	Liftoff	exon	11639	12457	.	-	.	gene_id "LOFF_G0000001"; transcript_id "LOFF_T0000001"; gene_name "AL627309.3";
chr1	Liftoff	exon	12900	13433	.	+	.	gene_id "LOFF_G0000002"; transcript_id "LOFF_T0000002"; gene_name "AP006222.2";
chr1	CAT	exon	14253	14325	.	+	.	gene_id "CHM13_G0000001"; transcript_id "CHM13_T0000001"; gene_name "CHM13_G0000001";
chr1	CAT	exon	14292	14353	.	+	.	gene_id "CHM13_G0000001"; transcript_id "CHM13_T0000002"; gene_name "CHM13_G0000001";
chr1	CAT	exon	20566	20905	.	+	.	gene_id "CHM13_G0000001"; transcript_id "CHM13_T0000002"; gene_name "CHM13_G0000001";
chr1	CAT	exon	20566	21099	.	+	.	gene_id "CHM13_G0000001"; transcript_id "CHM13_T0000001"; gene_name "CHM13_G0000001";
chr1	Liftoff	exon	52976	53422	.	-	.	gene_id "LOFF_G0000003"; transcript_id "LOFF_T0000003"; gene_name "AL731661.1";
chr1	Liftoff	exon	53560	53826	.	-	.	gene_id "LOFF_G0000003"; transcript_id "LOFF_T0000003"; gene_name "AL731661.1";

@ksahlin
Owner

ksahlin commented Mar 16, 2022

Hi @unique379r,

I tested it on your dataset. Some of the lines in your file use geneID instead of gene_id as the attribute key. The key should consistently be gene_id, so I believe this is a file format error. It can easily be fixed by replacing geneID with gene_id on the lines where it occurs in your GTF file.

The first offending line in the GTF file is pasted below (there are several lines where this happens):

chrY	Gnomon	exon	19436164	19436323	.	+	.	geneID "gene-PRYP5"; transcript_id "rna-XM_017030085.2"; gene_name "HSFY1";
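
One way to apply this replacement is sketched below in Python. It is only an illustration, not part of uLTRA: the function names `fix_gtf_line` and `fix_gtf` are hypothetical, and it assumes the GTF is tab-separated with nine columns and that the misspelled key appears as `geneID "..."` at the start of an attribute in the last column.

```python
import re

# Matches the attribute key `geneID` only where an attribute starts:
# at the beginning of the attribute column or right after a ';'.
_GENEID_KEY = re.compile(r'(^|;\s*)geneID(\s+")')


def fix_gtf_line(line: str) -> str:
    """Rename the attribute key geneID to gene_id in a single GTF record."""
    if line.startswith("#"):
        return line  # comment/header line, nothing to fix
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return line  # not a 9-column tab-separated record; leave untouched
    fields[8] = _GENEID_KEY.sub(r"\1gene_id\2", fields[8])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


def fix_gtf(in_path: str, out_path: str) -> int:
    """Write a corrected copy of in_path to out_path.

    Returns the number of lines that were changed.
    """
    changed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fixed = fix_gtf_line(line)
            changed += fixed != line
            dst.write(fixed)
    return changed
```

For example, `fix_gtf("CHM13.v1.v2.gtf", "CHM13.v1.v2.fixed.gtf")` (the output file name is made up here) would write a corrected copy, which can then be passed to `uLTRA index` in place of the original GTF. Writing to a new file rather than editing in place keeps the original annotation available for comparison.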

Best,
K
