amptk database error #79

Closed
yuanyuan12543 opened this issue Jan 17, 2021 · 2 comments

Comments

@yuanyuan12543

Hi, I am trying to use amptk to build a chordate COI database from the data linked on this page:
https://amptk.readthedocs.io/en/latest/taxonomy.html
First, I reformatted the taxonomy, which worked. However, when I then tried to build the database with the amptk database command, it failed.

Here is the command I used:
amptk database -i chordates.bold-reformated.fasta -f GGWACWGGWTGAACWGTWTAYCCYCC -r CCNCCTCCNGCWGGRTCRAARAA --primer_required none --derep_fulllength --format off --primer_mismatch 10 -o COI --min_len 200 --create_db usearch

And here is the error message I got:
[12:49:15 PM]: OS: Red Hat Enterprise Linux Server 7.7, 32 cores, ~ 131 GB RAM. Python: 3.7.9
[12:49:15 PM]: Base name set to: COI
[12:49:15 PM]: Searching for primers, this may take awhile: Fwd: GGWACWGGWTGAACWGTWTAYCCYCC Rev: TTYTTYGAYCCWGCNGGAGGNGG
[12:49:16 PM]: 178,744 records loaded
[12:49:16 PM]: Using 32 cpus to process data
[12:49:30 PM]: 173,536 records passed (97.09%)
[12:49:30 PM]: Errors: 0 no taxonomy info, 0 no ID, 5,208 length out of range, 0 too many ambiguous bases, 0 no primers found
[12:49:30 PM]: Now dereplicating sequences (collapsing identical sequences)
Traceback (most recent call last):
  File "/apps/amptk/1.5.1/bin/amptk", line 788, in <module>
    main()
  File "/apps/amptk/1.5.1/bin/amptk", line 779, in main
    mod.main(arguments)
  File "/apps/amptk/1.5.1/lib/python3.7/site-packages/amptk/extract_region.py", line 614, in main
    dereplicate(derep_tmp, OutName, args=args)
  File "/apps/amptk/1.5.1/lib/python3.7/site-packages/amptk/extract_region.py", line 58, in dereplicate
    if not sequence in seqs:
TypeError: unhashable type: 'dict'

How can there be no taxonomy information found? When I checked the chordates.bold-reformated.fasta file, it looked fine. I am not sure what is wrong with my command. Can anybody help me with this?

Thanks!

Junli

@JonathanVanHamme

Hi Junli,

It looks like the log file is saying that you have zero errors in the taxonomy information, zero records with no ID, 5208 with length out of range etc. So, I think you are fine on that front.

I had a similar problem with "line 58" (see your error message) as you can read here:

#77

Jon updated the code for AMPtk 1.5.1 to fix a typo in the dereplicate function, so you may need to update your install and run it again. You can read Jon's comment in the link above, this is the command he provided:

python -m pip install --no-deps --force git+https://github.com/nextgenusfs/amptk.git

That worked for me. Good luck!
Jon
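
For anyone wondering why line 58 raises this error: in Python, `TypeError: unhashable type: 'dict'` appears whenever a dict is used as a key or in a membership test against a dict or set. The sketch below only reproduces that general pattern. It is not the actual amptk `dereplicate` code, and the names `record` and `seqs` and the sample header are made up for illustration.

```python
# Hypothetical illustration only; NOT the real amptk dereplicate() code.
seqs = {}  # maps sequence string -> header of the first record seen

# Made-up record; the header and sequence are placeholders.
record = {"header": "example_id", "sequence": "ACGTACGT"}

error_message = None
try:
    # Bug pattern: the whole record (a dict) is tested for membership.
    # Dicts are mutable and therefore unhashable, so this raises TypeError.
    if record not in seqs:
        seqs[record] = record["header"]
except TypeError as err:
    error_message = str(err)

# Working pattern: use the sequence string, which is hashable, as the key.
sequence = record["sequence"]
if sequence not in seqs:
    seqs[sequence] = record["header"]

print(error_message)
print(seqs)
```

If a traceback like the one above shows up, it usually means a variable holds a dict where a string was expected, which is consistent with the typo fix mentioned in #77.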

@nextgenusfs
Owner

Hi @yuanyuan12543 -- it looks like you've set the minimum read length at 200 bp and then the log file says 5,208 length out of range. So I'm guessing your fragment is shorter than 200 bp? The COI primers typically amplify something around 150 bp if my memory is correct.
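
One way to check this is to count how many sequences in the FASTA file fall below the `--min_len` value. The following is a rough sketch using only the Python standard library. The function names are hypothetical, and it measures raw sequence lengths, which is an assumption: amptk may apply its length filter after primer trimming, so the numbers need not match the log exactly.

```python
def fasta_lengths(lines):
    """Yield the length of each sequence from an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def count_short(lines, min_len=200):
    """Return (number of sequences shorter than min_len, total sequences)."""
    lengths = list(fasta_lengths(lines))
    short = sum(1 for n in lengths if n < min_len)
    return short, len(lengths)


# Small in-memory example: seq1 is 240 bp, seq2 is 140 bp over two lines.
example = [">seq1", "ACGT" * 60, ">seq2", "ACGT" * 30, "ACGT" * 5]
short, total = count_short(example)
print(f"{short} of {total} sequences are shorter than 200 bp")
```

To run it on the real input, pass an open file handle instead of the list, for example `with open("chordates.bold-reformated.fasta") as fh: print(count_short(fh))`.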
