KeyError: 'sequence_length_template' when basecalling is turned off #37

Closed
lerminin opened this issue Mar 10, 2023 · 2 comments

Comments

@lerminin

Hi there,

I'm trying to run duplex basecalling on data from a kit14/R10.4.1 Mk1B run. During run setup, we set basecalling to OFF in MinKNOW because we basecall with Guppy on a separate server after the run is complete.

When I run duplex_tools pairs_from_summary using the output sequencing_summary.txt file, I get the following KeyError:

[12:15:18 - FindPairs] Duplex tools version: 0.3.1
[12:15:18 - FindPairs] Loading sequencing summary.
[12:15:30 - FindPairs] Calculating metrics.
Traceback (most recent call last):
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'sequence_length_template'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "conda_envs/duplex_tools_v0.3.1/bin/duplex_tools", line 8, in <module>
    sys.exit(main())
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/duplex_tools/__init__.py", line 39, in main
    args.func(args)
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/duplex_tools/pairs_from_summary.py", line 367, in main
    find_pairs(
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/duplex_tools/pairs_from_summary.py", line 86, in find_pairs
    seqsummary = calculate_metrics_for_next_strand(seqsummary)
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/duplex_tools/pairs_from_summary.py", line 267, in calculate_metrics_for_next_strand
    seqsummary["sequence_length_template"].shift(-1)
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/pandas/core/frame.py", line 3807, in __getitem__
    indexer = self.columns.get_loc(key)
  File "conda_envs/duplex_tools_v0.3.1/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'sequence_length_template'

I realized this is because our sequencing_summary.txt file only has these headers, and is missing the sequence_length_template header (among others):

filename_fastq filename_fast5 filename_pod5 parent_read_id read_id run_id channel mux minknow_events start_time duration pore_type experiment_id sample_id end_reason

This only seems to happen for runs where we have explicitly turned off basecalling; for runs where basecalling is enabled in MinKNOW, all required headers are present in the sequencing_summary.txt file.

Is there a way to recover the sequence_length_template information somehow to get duplex_tools to run?

@cjw85
Member

cjw85 commented Mar 10, 2023

Hi @lerminin,

You need to provide the sequencing_summary.txt file produced by Guppy during basecalling.
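
For anyone unsure which summary file they have, a header check before running pairs_from_summary can tell the two apart. The following is a minimal sketch, not part of duplex_tools: the helper name missing_columns is hypothetical, it assumes the summary is tab-separated with a single header line, and it only checks for sequence_length_template because that is the one column the traceback confirms is needed (the tool may require others).

```python
import csv
import io

# Only this column is confirmed by the traceback in this thread;
# other columns duplex_tools may need are not listed here.
REQUIRED_COLUMNS = {"sequence_length_template"}


def missing_columns(handle, required=REQUIRED_COLUMNS):
    """Return the required columns absent from a summary file's header row.

    Hypothetical helper for illustration, not a duplex_tools function.
    """
    # Assumption: the file is tab-separated with one header line.
    header = next(csv.reader(handle, delimiter="\t"), [])
    return sorted(set(required) - set(header))


# Header of the MinKNOW summary quoted in the question (basecalling off).
minknow_header = "\t".join([
    "filename_fastq", "filename_fast5", "filename_pod5", "parent_read_id",
    "read_id", "run_id", "channel", "mux", "minknow_events", "start_time",
    "duration", "pore_type", "experiment_id", "sample_id", "end_reason",
]) + "\n"

print(missing_columns(io.StringIO(minknow_header)))
# → ['sequence_length_template']
```

To check a file on disk, pass an open handle instead, e.g. `with open("sequencing_summary.txt", newline="") as fh: print(missing_columns(fh))`. An empty list suggests the file came from a basecaller; a non-empty list suggests it is the MinKNOW summary from a run with basecalling turned off.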

@lerminin
Author

whoops silly oversight on my part - working now with the proper file, thanks!
