Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error after identifying connected components #5

Closed
cifuj opened this issue Apr 5, 2022 · 15 comments
Closed

Error after identifying connected components #5

cifuj opened this issue Apr 5, 2022 · 15 comments

Comments

@cifuj
Copy link

cifuj commented Apr 5, 2022

Hi! I have an error after the step "identifying connected components".
I installed SuperPang using conda and I am working with 4 bins (>90% compl <5% contamination) from the same species (ANI > 97%). test-SuperPang.py runs with out problems.
The error occurs after running
$SuperPang.py -f genomes/*fa -t 20 -o SuperPang --force-overwrite --checkm checkm_sox.txt

Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
main(parse_args())
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 144, in main
contigs = Assembler(input_minimap2, args.ksize, args.threads).run(args.minlen, args.mincov, args.bubble_identity_threshold, args.genome_assignment_threshold, args.threads)
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 322, in run
psets = get_psets(comp2nvs[nc1] | comp2nvs[nc2])
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 292, in get_psets
assert len(nvs_) == 2
AssertionError

The checkM file has the format Bin_Id Completeness Contamination, but I have the same error after running SuperPang with the --assume-complete flag.

@fpusan
Copy link
Owner

fpusan commented Apr 5, 2022

Could you share the input bins with me here in a zip file?

@cifuj
Copy link
Author

cifuj commented Apr 5, 2022

I'm sorry, but I can't share them. Is there any other information I can provide?
I tried running it with another set of bins (5 bins medium- to high-quality >95% ANI) and I have the same error.

@fpusan
Copy link
Owner

fpusan commented Apr 5, 2022

Hm, I wonder what the problem may be. Does test-SuperPang.py work for you?
Are these "standard" bacterial/archaeal genomes? something "strange" about them?
I have an idea of a couple of things that might be throwing that error, but it's hard to know what's really happening without being able to reproduce it myself (of course it is understandable if you can't share your data).

@fpusan
Copy link
Owner

fpusan commented Apr 5, 2022

Let's test a thing
Edit the
/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py
In line 292 change
assert len(nvs_) == 2
for
if len(nvs_) != 2: print(nvs_); assert False
This will still die but will print some info about what's going wrong. You can share the output (a horrible bunch of numbers) with me.

EDIT: Fixed a typo in the code

@cifuj
Copy link
Author

cifuj commented Apr 6, 2022

I only obtained this output after the "Extracting forward component"
[0, 1, 2, 3]

I changed the order of the bins and SuperPang run until comp 246/718 and then it fails also after:
Extracting forward component
[0, 1, 2, 3]
Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
main(parse_args())
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 144, in main
contigs = Assembler(input_minimap2, args.ksize, args.threads).run(args.minlen, args.mincov, args.bubble_identity_threshold, args.genome_assignment_threshold, args.threads)
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 322, in run
psets = get_psets(comp2nvs[nc1] | comp2nvs[nc2])
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 292, in get_psets
if len(nvs_) != 2: print(nvs_); assert False
AssertionError

This is how it look lines 291-293

 for nvs_ in psets.values():
    if len(nvs_) != 2: print(nvs_); assert False
 return psets

@fpusan
Copy link
Owner

fpusan commented Apr 7, 2022

Thanks for trying this. It seems like a different case of a bug I'd fixed before, but I would need more info
Can you now try changing the whole get_psets function to this one?

            ### Extract forward component
            def get_psets(nvs):
                """Get a dict with a frozenset of vertices as keys, and the two complementary non-branching paths sharing those vertices as values"""
                psets = defaultdict(list)
                for nv in nvs:
                    pset = frozenset(vertex2NBP[nv])
                    psets[pset].append(nv)
                for fs, nvs_ in psets.items():
                    if len(nvs_) != 2:
                        print('\nERROR!', len(nvs_), fs)
                        print()
                        for nv_ in nvs_:
                            print(nv_, vertex2NBP[nv_])
                        print('\nDONE\n')
                    assert len(nvs_) == 2
                return psets

@cifuj
Copy link
Author

cifuj commented Apr 7, 2022

This is the output after modifiying Assembler.py
SuperPang_error_cifuj_20220407.txt

@fpusan
Copy link
Owner

fpusan commented Apr 7, 2022

Thanks! This gives me something to work with... I'll come back to you if I need some extra stuff

@fpusan
Copy link
Owner

fpusan commented Apr 8, 2022

Ok, try replacing the function with this

            ### Extract forward component
            def get_psets(nvs):
                """Get a dict with a frozenset of vertices as keys, and the two complementary non-branching paths sharing those vertices as values"""
                psets = defaultdict(list)
                for nv in nvs:
                    pset = frozenset(vertex2NBP[nv])
                    psets[pset].append(nv)
                for ps, nvs_ in psets.items():
                    if len(nvs_) > 2: # If more than two NBPs share the same pset, select the first two that are RC of each other
                        for nv1, nv2 in combinations(nvs_): # this is a temporary patch, I should look into why this happens when I get a proper example
                            NBP1, NBP2 = vertex2NBP[nv1], vertex2NBP[nv2]
                            if NBP1 == NBP2[::-1]:
                                psets[ps] = [nv1, nv2]
                                break
                        else:
                            assert False # if there is no RC pair, die anyways
                for nvs_ in psets.values():         
                    assert len(nvs_) == 2
                return psets

Hopefully this'll work

@cifuj
Copy link
Author

cifuj commented Apr 8, 2022

I obtained the following error.

Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
main(parse_args())
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 144, in main
contigs = Assembler(input_minimap2, args.ksize, args.threads).run(args.minlen, args.mincov, args.bubble_identity_threshold, args.genome_assignment_threshold, args.threads)
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 331, in run
psets = get_psets(comp2nvs[nc1] | comp2nvs[nc2])
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 293, in get_psets
for nv1, nv2 in combinations(nvs_): # this is a temporary patch, I should look into why this happens when I get a proper example
TypeError: combinations() missing required argument 'r' (pos 2)

I tried with 2 different set of bins from 2 different species and I obtained the same error after a different number of "Working on comp XX/YY"

@fpusan
Copy link
Owner

fpusan commented Apr 8, 2022

My fault, was missing an argument

            ### Extract forward component
            def get_psets(nvs):
                """Get a dict with a frozenset of vertices as keys, and the two complementary non-branching paths sharing those vertices as values"""
                psets = defaultdict(list)
                for nv in nvs:
                    pset = frozenset(vertex2NBP[nv])
                    psets[pset].append(nv)
                for ps, nvs_ in psets.items():
                    if len(nvs_) > 2: # If more than two NBPs share the same pset, select the first two that are RC of each other
                        for nv1, nv2 in combinations(nvs_, 2): # this is a temporary patch, I should look into why this happens when I get a proper example
                            NBP1, NBP2 = vertex2NBP[nv1], vertex2NBP[nv2]
                            if NBP1 == NBP2[::-1]:
                                psets[ps] = [nv1, nv2]
                                break
                        else:
                            assert False # if there is no RC pair, die anyways
                for nvs_ in psets.values():         
                    assert len(nvs_) == 2
                return psets

@cifuj
Copy link
Author

cifuj commented Apr 13, 2022

Now I have a different error in the function #Find reverse complement Non-Branching Paths

Traceback (most recent call last):
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 291, in
main(parse_args())
File "/home/jcifuentes/miniconda3/envs/SuperPang/bin/SuperPang.py", line 144, in main
contigs = Assembler(input_minimap2, args.ksize, args.threads).run(args.minlen, args.mincov, args.bubble_identity_threshold, args.genome_assignment_threshold, args.threads)
File "/home/jcifuentes/miniconda3/envs/SuperPang/lib/python3.8/site-packages/SuperPang/lib/Assembler.py", line 332, in run
assert len(psets) == len(comp2nvs[nc1]) == len(comp2nvs[nc1]) # each sequence path should have a reverse complement equivalent (same vertices in the kmer graph, reverse order)
AssertionError

@fpusan fpusan closed this as completed in a6cf029 Apr 25, 2022
@fpusan
Copy link
Owner

fpusan commented Apr 25, 2022

Hi again!
I managed to reproduce the issue with data from another user. Version 0.9.4beta1 (available in conda) has a temptative fix for this problem. Let me know if it works for you!

@cifuj
Copy link
Author

cifuj commented Apr 26, 2022

Thank you very much!
It works with the 2 datasets I had problems before.

@fpusan
Copy link
Owner

fpusan commented Apr 26, 2022

Glad to hear. Thanks for testing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants