bed file #561

Open
ZiliaMR opened this issue Feb 28, 2024 · 3 comments

Comments

@ZiliaMR

ZiliaMR commented Feb 28, 2024

Hello,

I am conducting an analysis with Helicobacter pylori genomes. I first ran it on a small dataset (n=21) and now have some questions about visualizing the results. I want to zoom into a specific region containing genes of interest, but it seems I need a BED or GFF file for this. Are these files obtained from the annotation process? If so, which genome's file from my dataset (n=21) should I use?

Thanks in advance

@ekg
Contributor

ekg commented Feb 28, 2024

You can use annotations over any of the genomes you've put into the pangenome graph. These annotations can be derived by any method that produces them: in silico prediction, RNA-based analyses, or comparative genomics. In odgi, a BED file can be used to extract a subgraph of interest (odgi extract), to guide different processes (like odgi depth), or to affect visualization (odgi draw). Let me know if this helps explain.
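As a rough illustration, here is a minimal Python sketch (not part of odgi or pggb) that turns selected gene records from a GFF3 annotation of one genome into BED lines. It assumes that the first BED column has to match the name of the corresponding path in the graph, so it takes a mapping from GFF3 sequence IDs to path names. The function name, gene names, and path names are all hypothetical. The coordinate shift reflects that GFF3 is 1-based and inclusive while BED is 0-based and half-open.

```python
from typing import Dict, Iterable, List, Set


def gff3_genes_to_bed(
    gff_lines: Iterable[str],
    path_for_seqid: Dict[str, str],
    wanted: Set[str],
) -> List[str]:
    """Convert selected GFF3 'gene' features into BED lines (hypothetical helper)."""
    bed_lines = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only well-formed gene features
        if cols[0] not in path_for_seqid:
            continue  # sequence has no known path name in the graph
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        name = attrs.get("Name", attrs.get("ID", ""))
        if name not in wanted:
            continue
        start = int(cols[3]) - 1  # GFF3 start is 1-based, BED start is 0-based
        end = int(cols[4])        # inclusive GFF3 end equals exclusive BED end
        bed_lines.append(f"{path_for_seqid[cols[0]]}\t{start}\t{end}\t{name}")
    return bed_lines


if __name__ == "__main__":
    # Toy, invented records; real input would come from your annotation tool.
    toy_gff = [
        "##gff-version 3",
        "contig_1\tannot\tgene\t1001\t2500\t.\t+\t.\tID=gene_0001;Name=geneA",
        "contig_1\tannot\tCDS\t1001\t2500\t.\t+\t0\tID=cds_0001;Parent=gene_0001",
        "contig_1\tannot\tgene\t3001\t3600\t.\t-\t.\tID=gene_0002;Name=geneB",
    ]
    paths = {"contig_1": "genome_01#1#contig_1"}  # hypothetical path name
    for row in gff3_genes_to_bed(toy_gff, paths, {"geneA"}):
        print(row)
```

The resulting lines could be written to a file and passed to whichever odgi subcommand you are using; check that subcommand's help output for the exact option name.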

@ZiliaMR
Author

ZiliaMR commented Mar 7, 2024

Hello,

Thank you very much. I believe I have successfully managed it. However, I am now facing a new challenge.

I am attempting to analyze 1012 genomes of Helicobacter pylori on a server using 1 node (128 GB RAM) with the following command.

pggb -i 1012seqs_hp.fasta.gz -o output_1012 -x auto -n 1012 -p 90 -m

The analysis does not seem to have completed.
Here is the output I obtained:

put_1012/1012seqs_hp.fasta.gz.999a088.mappings.wfmash.paf --invert-filtering
991586.78s user 1694.34s system 1996% cpu 49744.84s total 2476824Kb max memory
[seqwish::seqidx] 0.002 indexing sequences
[seqwish::seqidx] 15.853 index built
[seqwish::alignments] 15.853 processing alignments
[seqwish::alignments] 594.701 indexing
[seqwish::alignments] 14440.858 index built
[seqwish::transclosure] 14440.969 computing transitive closures
[seqwish::transclosure] 14441.373 0.00% 0-10000000 overlap_collect
Command terminated by signal 9
seqwish -s 1012seqs_hp.fasta.gz -p pggb_output_1012/1012seqs_hp.fasta.gz.999a088.alignments.wfmash.paf -k 19 -f 0 -g pggb_output_1012/1012seqs_hp.fasta.gz.999a088.417fcdf.seqwish.gfa -B 10000000 -t 20 --temp-dir pggb_output_1012 -P
22362.05s user 9519.70s system 208% cpu 15274.65s total 125926572Kb max memory

What could be the reason for this? Is it feasible to perform this analysis with the resources and number of strains that I am using?

I would appreciate any insights or suggestions you have regarding this.

Thank you in advance for your help.

@subwaystation
Member

My first guess would be that you ran out of RAM: seqwish was using over 125G when it was killed (signal 9). Add something like --transclose-batch 10000 --resume to your pggb command and you should quickly find out whether this was the limiting factor.
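To make the arithmetic behind this guess explicit, the sketch below reads the peak-memory field from the seqwish summary line in the log and compares it with the node's RAM. It assumes the Kb figure is in kibibytes (as GNU time reports maximum resident set size) and that the "128 GB" node has 128 GiB; both are assumptions, and the helper name is hypothetical.

```python
import re

# Summary line printed after seqwish was killed (copied from the log).
SEQWISH_SUMMARY = (
    "22362.05s user 9519.70s system 208% cpu 15274.65s total "
    "125926572Kb max memory"
)
NODE_GIB = 128  # assumption: the "128 GB" node means 128 GiB of RAM


def peak_memory_gib(summary_line: str) -> float:
    """Return the peak memory from a '<N>Kb max memory' field, in GiB.

    Assumes the Kb figure is kibibytes, like GNU time's max RSS.
    """
    match = re.search(r"(\d+)Kb max memory", summary_line)
    if match is None:
        raise ValueError("no 'max memory' field found")
    return int(match.group(1)) / 1024**2  # KiB -> GiB


if __name__ == "__main__":
    peak = peak_memory_gib(SEQWISH_SUMMARY)
    # prints: peak: 120.1 GiB (93.8% of a 128 GiB node)
    print(f"peak: {peak:.1f} GiB ({peak / NODE_GIB:.1%} of a {NODE_GIB} GiB node)")
```

A process peaking this close to the node's capacity and then ending with SIGKILL (signal 9) is consistent with the kernel's out-of-memory killer or a scheduler memory limit, which is presumably why a smaller transitive-closure batch is the quick test suggested here.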
