Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kmer_finder test failing in architecture i386 #730

Closed
linqigang888 opened this issue Sep 21, 2023 · 5 comments
Closed

kmer_finder test failing in architecture i386 #730

linqigang888 opened this issue Sep 21, 2023 · 5 comments

Comments

@linqigang888
Copy link

While preparing to bring cutadapt 4.4 with Python 3.11 into Debian, the tests for kmer_finder worked as expected in amd64, but for i386, test_kmer_finder_initialize_total_greater_than_max resulted in the following error:

    def test_kmer_finder_initialize_total_greater_than_max():
>       kmer_finder = KmerFinder([(0, None, ["A" * 31, "B" * 31, "C" * 31, "D" * 43])])

tests/test_kmer_finder.py:85: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD of length 43 is longer than the maximum of 31.

src/cutadapt/_kmer_finder.pyx:141: ValueError
============================================================================================= short test summary info =============================================================================================
FAILED tests/test_kmer_finder.py::test_kmer_finder_initialize_total_greater_than_max - ValueError: DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD of length 43 is longer than the maximum of 31.
=================================================================================== 1 failed, 621 passed, 1 deselected in 5.87s ===================================================================================

Here is the current patch prepared for Debian to accept differing test results:

--- a/tests/test_kmer_finder.py
+++ b/tests/test_kmer_finder.py
@@ -82,12 +82,15 @@
 
 
 def test_kmer_finder_initialize_total_greater_than_max():
-    kmer_finder = KmerFinder([(0, None, ["A" * 31, "B" * 31, "C" * 31, "D" * 43])])
-    assert kmer_finder.kmers_present("X" * 100 + "A" * 31)
-    assert kmer_finder.kmers_present("X" * 100 + "B" * 31)
-    assert kmer_finder.kmers_present("X" * 100 + "C" * 31)
-    assert kmer_finder.kmers_present("X" * 100 + "D" * 43)
-    assert not kmer_finder.kmers_present(string.ascii_letters)
+    try:
+        kmer_finder = KmerFinder([(0, None, ["A" * 31, "B" * 31, "C" * 31, "D" * 43])])
+        assert kmer_finder.kmers_present("X" * 100 + "A" * 31)
+        assert kmer_finder.kmers_present("X" * 100 + "B" * 31)
+        assert kmer_finder.kmers_present("X" * 100 + "C" * 31)
+        assert kmer_finder.kmers_present("X" * 100 + "D" * 43)
+        assert not kmer_finder.kmers_present(string.ascii_letters)
+    except ValueError as error:
+        assert ("D" * 43) in str(error)
 
 
 def test_kmer_finder_finds_all():

I am sure this test could be handled differently, so I have left it as an issue instead of a pr.

@marcelm
Copy link
Owner

marcelm commented Sep 21, 2023

Thanks for the report. It’s possible that the test failure actually indicates the code won’t work on 32 bit because I see a hardcoded DEF MAX_WORD_SIZE = 64 in the code. I’ll wait for @rhpvorderman to chime in. We could in the simplest case just disable the k-mer heuristic (with identical results, it’s just somewhat slower) if Cutadapt is running on a 32-bit machine.

@rhpvorderman
Copy link
Collaborator

Isn't it more convenient to not support i386? I mean, you have done your wonderful cutting in cutadapt... And then, what? You are going to try assembly or mapping on a platform that can only address 2GiB of memory?

Anyway, I think the most convenient fix is to change this line:

https://github.com/marcelm/cutadapt/blob/main/src/cutadapt/_kmer_finder.pyx#L49C1-L49C26

And not use size_t, but use uint64_t instead. Then it should work on 32-bit as well. (Allthough it will be much slower than using a size_t)

@rhpvorderman
Copy link
Collaborator

I made a small patch #731. @linqigang888 does that fix your problem?

@linqigang888
Copy link
Author

Thank you. I have replaced the patch in Debian with #731.

@rhpvorderman
Copy link
Collaborator

rhpvorderman commented Sep 25, 2023

Awesome! Always glad to help out Debian. My favourite operating system ❤️. Thank you for maintaining cutadapt in the Debian repos!

@marcelm marcelm closed this as completed Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants