ERROR: 264 columns passed, passed data had 265 columns - Prime Editing #367

Closed
mbiokyle29 opened this issue Jan 9, 2024 · 0 comments

Describe the bug
An unhandled error seems to occur at this line:
https://github.com/pinellolab/CRISPResso2/blob/master/CRISPResso2/CRISPRessoCORE.py#L4568

The code is:

pe_modification_percentage_summary_df = pd.DataFrame(mod_pcts, columns=colnames).apply(pd.to_numeric,errors='ignore')

The error is:

ipdb> pe_modification_percentage_summary_df = pd.DataFrame(mod_pcts, columns=colnames).apply(pd.to_numeric,errors='ignore')
*** ValueError: 264 columns passed, passed data had 265 columns

In this particular case, the following 2 things happened:

  • The intended edit is a single-bp deletion plus a few mismatches/point mutations, so the prime-edited amplicon sequence is 1 bp shorter than the reference
  • It seems that 100% of the aligned reads went to the Prime-edited amplicon (none went to the Reference), which may be related

As far as I can tell, pe_modification_percentage_summary_df should have 2 label columns followed by 1 column per base position in the prime-edited amplicon sequence (per: list(refs[ref_names_for_pe[0]]['sequence'])). The row data comes from ref1_all_insertion_count_vectors. In this case, the raw data array in ref1_all_insertion_count_vectors for the prime-edited ref is not the same length as the prime-edited sequence; it matches the length of the reference instead.
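To make the shape arithmetic concrete, here is a minimal sketch with toy sequences. It is not CRISPResso2's code: the label names and the helpers `build_colnames` and `build_row` are hypothetical, and it assumes only what is described above (2 label columns plus one column per base, with rows built from per-reference count vectors).

```python
# Toy reconstruction of the shape mismatch; all names are hypothetical.
reference_seq = "ACGTAC"      # stands in for the 263 bp reference
prime_edited_seq = "ACTAC"    # 1 bp shorter, like the 262 bp prime-edited amplicon

# Assumption: the count vector for the prime-edited reference was sized
# to the reference amplicon rather than to its own sequence.
insertion_counts = [0] * len(reference_seq)

LABELS = ["label_1", "label_2"]  # hypothetical names for the 2 label columns


def build_colnames(seq):
    """2 label columns followed by 1 column per base position."""
    return LABELS + list(seq)


def build_row(name, kind, counts):
    """2 label values followed by 1 value per entry of the count vector."""
    return [name, kind] + list(counts)


colnames = build_colnames(prime_edited_seq)
row = build_row("Prime-edited", "Insertions", insertion_counts)

print(len(colnames), len(row))  # 7 8
```

With the real lengths, the same arithmetic gives 2 + 262 = 264 column names against 2 + 263 = 265 values, which matches the error message.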

A few relevant things from my debugging:

ipdb> len(refs['Prime-edited']['sequence'])
262
ipdb> len(refs['Reference']['sequence'])
263
ipdb> len(ref1_all_insertion_count_vectors['Prime-edited'])
263
ipdb> len(ref1_all_insertion_count_vectors['Reference'])
263

# the totals field (last in mod_pcts) has the correct length
ipdb> len(mod_pcts[-1])
264
ipdb> len(mod_pcts[0])
265
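The lengths above suggest a simple invariant to check while debugging: each reference's count vector should be as long as that reference's own sequence. The sketch below works on the lengths copied from the session rather than on the real (confidential) data; `find_length_mismatches` is a hypothetical helper, not part of CRISPResso2.

```python
def find_length_mismatches(seq_lengths, vector_lengths):
    """Return {ref_name: (sequence_length, vector_length)} for every
    reference whose count vector is not as long as its sequence."""
    return {
        name: (seq_len, vector_lengths[name])
        for name, seq_len in seq_lengths.items()
        if vector_lengths[name] != seq_len
    }


# Lengths copied from the ipdb session above.
seq_lengths = {"Prime-edited": 262, "Reference": 263}
vector_lengths = {"Prime-edited": 263, "Reference": 263}

mismatches = find_length_mismatches(seq_lengths, vector_lengths)
print(mismatches)  # {'Prime-edited': (262, 263)}
```

Only the prime-edited reference is flagged, which is consistent with the extra column in mod_pcts[0].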

Expected behavior

  • The tool should not crash here
  • The tool should accommodate the case where the prime editing changes the length of the expected amplicon
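One way the first expectation could be met is to validate row widths before constructing the DataFrame and fail with a descriptive message. This is a minimal sketch, assuming `mod_pcts` is a list of rows and `colnames` a list of column names as in the line quoted above; `check_row_widths` is a hypothetical helper and the data is placeholder, so this is not a proposed patch to CRISPResso2.

```python
def check_row_widths(rows, colnames):
    """Raise a descriptive ValueError if any row's width differs from
    the number of column names; otherwise return None."""
    expected = len(colnames)
    bad = [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]
    if bad:
        details = ", ".join(f"row {i} has {n} values" for i, n in bad)
        raise ValueError(
            f"Expected {expected} values per row to match the column names, "
            f"but {details}."
        )


# Placeholder data with the widths seen in the debugging session:
# the first row has 265 values, the totals row has 264.
colnames = [f"c{i}" for i in range(264)]
mod_pcts = [[0] * 265, [0] * 264]

try:
    check_row_widths(mod_pcts, colnames)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

A check like this only replaces the crash with a clearer message. Meeting the second expectation would presumably still require sizing each count vector to its own reference sequence.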

To reproduce
The data is confidential, so the best I can share publicly is the redacted command below (I will reach out via email):

> CRISPResso -n ############ -a ############ -o ./output --min_frequency_alleles_around_cut_to_plot 0.05 --max_rows_alleles_around_cut_to_plot 50 --needleman_wunsch_gap_extend -1 --fastq_r1 ./input/##########.fastq.gz --fastq_r2 ./input/P########.fastq.gz --max_paired_end_reads_overlap 500 -w 5 -wc 0 --ignore_substitutions --prime_editing_pegRNA_spacer_seq ############## --prime_editing_pegRNA_extension_seq ############## --prime_editing_pegRNA_extension_quantification_window_size 5 --prime_editing_pegRNA_scaffold_seq ################ --prime_editing_pegRNA_scaffold_min_match_length 3 --debug

Debug output
Paste the entire output when you run CRISPResso with the flag --debug.

Here is the output (redacted as necessary; again, I will reach out):

> CRISPResso -n ############ -a ############ -o ./output --min_frequency_alleles_around_cut_to_plot 0.05 --max_rows_alleles_around_cut_to_plot 50 --needleman_wunsch_gap_extend -1 --fastq_r1 ./input/##########.fastq.gz --fastq_r2 ./input/P########.fastq.gz --max_paired_end_reads_overlap 500 -w 5 -wc 0 --ignore_substitutions --prime_editing_pegRNA_spacer_seq ############## --prime_editing_pegRNA_extension_seq ############## --prime_editing_pegRNA_extension_quantification_window_size 5 --prime_editing_pegRNA_scaffold_seq ################ --prime_editing_pegRNA_scaffold_min_match_length 3 --debug

                               ~~~CRISPResso 2~~~
        -Analysis of genome editing outcomes from deep sequencing data-

                                        _
                                       '  )
                                       .-'
                                      (____
                                   C)|     \
                                     \     /
                                      \___/

                          [CRISPResso version 2.2.11]
[Note that starting in version 2.1.0 insertion quantification has been changed
to only include insertions completely contained by the quantification window.
To use the legacy quantification method (i.e. include insertions directly adjacent
to the quantification window) please use the parameter --use_legacy_insertion_quantification]
                 [For support contact kclement@mgh.harvard.edu]

WARNING @ Tue, 09 Jan 2024 08:54:55:
     Folder ############################## already exists.

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Alignment between extension sequence and reference sequence:
---------------------------------------------------------------################-################------------------------------------------------------------------------------------------------------------------------------------------------------------------------
############

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Using cut points from Reference as template for other references

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Reference 'Prime-edited' has cut points defined: [93, 80]. Not inferring.

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Reference 'Prime-edited' has sgRNA_intervals defined: [(63, 93), (61, 79)]. Not inferring.

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Estimating average read length...

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Checking average read length from ./input/################.fastq.gz

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Average read length is 124 from ./input/################.fastq.gz

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Merging paired sequences with Flash...

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Running FLASH command: flash "./input/################.fastq.gz" "./input/################.fastq.gz" --min-overlap 10 --max-overlap 500 --allow-outies -z -d ############## -o out >> ################/CRISPResso_RUNNING_LOG.txt 2>&1

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Done!

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Aligning sequences...

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Processing pegRNA scaffold sequence...

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Searching for scaffold-templated reads with the sequence: 'GCA' starting at position 94 in reads that align to the prime-edited sequence

INFO  @ Tue, 09 Jan 2024 08:54:55:
     Processing reads; N_TOT_READS: 0 N_COMPUTED_ALN: 0 N_CACHED_ALN: 0 N_COMPUTED_NOTALN: 0 N_CACHED_NOTALN: 0

INFO  @ Tue, 09 Jan 2024 08:54:56:
     Finished reads; N_TOT_READS: 5359 N_COMPUTED_ALN: 405 N_CACHED_ALN: 1691 N_COMPUTED_NOTALN: 335 N_CACHED_NOTALN: 2928

INFO  @ Tue, 09 Jan 2024 08:54:56:
     Done!

INFO  @ Tue, 09 Jan 2024 08:54:56:
     Quantifying indels/substitutions...

INFO  @ Tue, 09 Jan 2024 08:54:57:
     Done!

INFO  @ Tue, 09 Jan 2024 08:54:57:
     Calculating allele frequencies...

INFO  @ Tue, 09 Jan 2024 08:54:57:
     Done!

INFO  @ Tue, 09 Jan 2024 08:54:57:
     Saving processed data...

INFO  @ Tue, 09 Jan 2024 08:54:57:
     Making Plots...

INFO  @ Tue, 09 Jan 2024 08:55:11:
     Processing pegRNA scaffold sequence...

INFO  @ Tue, 09 Jan 2024 08:55:11:
     Searching for scaffold-templated reads with the sequence: 'GCA' starting at position 94 in reads that align to the prime-edited sequence

Traceback (most recent call last):
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 264 columns passed, passed data had 265 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/CRISPResso2-2.2.11-py3.10-macosx-12.3-x86_64.egg/CRISPResso2/CRISPRessoCORE.py", line 4470, in main
    pe_modification_percentage_summary_df = pd.DataFrame(mod_pcts, columns=colnames).apply(pd.to_numeric,errors='ignore')
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 746, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 264 columns passed, passed data had 265 columns
CRITICAL @ Tue, 09 Jan 2024 08:55:11:
     Traceback (most recent call last):
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 969, in _finalize_columns_and_data
    columns = _validate_or_indexify_columns(contents, columns)
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 1017, in _validate_or_indexify_columns
    raise AssertionError(
AssertionError: 264 columns passed, passed data had 265 columns

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/CRISPResso2-2.2.11-py3.10-macosx-12.3-x86_64.egg/CRISPResso2/CRISPRessoCORE.py", line 4470, in main
    pe_modification_percentage_summary_df = pd.DataFrame(mod_pcts, columns=colnames).apply(pd.to_numeric,errors='ignore')
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/frame.py", line 746, in __init__
    arrays, columns, index = nested_data_to_arrays(
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 510, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 875, in to_arrays
    content, columns = _finalize_columns_and_data(arr, columns, dtype)
  File "/path/CRISPResso2-2.2.11/.venv/lib/python3.10/site-packages/pandas/core/internals/construction.py", line 972, in _finalize_columns_and_data
    raise ValueError(err) from err
ValueError: 264 columns passed, passed data had 265 columns


CRITICAL @ Tue, 09 Jan 2024 08:55:11:
     Unexpected error, please check your input.

ERROR: 264 columns passed, passed data had 265 columns