ValueError("All arrays must be of the same length") #12

Open
smilenaderi opened this issue Jun 21, 2023 · 4 comments

Comments

@smilenaderi

smilenaderi commented Jun 21, 2023

Bug Description

I tried to run it on the following FASTA file, and it gives me this error:

>seq-2
MKKKKKKKLKKLKKKLKKKLKKKKKLLLLLLLLKKKKKKK
>seq-9
MKKKIKKIKKKIEKKKKKKLKKLKKKKKKKKLLLLLLLLL
>seq-10
MSEKFSEIAEKYDEERILSRSAGELAELTRELGLKPGDRVLDVGCGTGYLTLPLAERVGPEGTVIGIDRSEEMLARARERAAAAGLSNVEFQVADAEALPFPDESFDLVTCRLVLHHLPDPAKALREMRRVLKPGGRFVVSDWDASSMAFPDEEAELAERLRRYAEARAAAGGERDALRRALEAAGFRDVTVRSLTAWRRRAGEAAAAAL
>seq-13
MKKKKKLKKKLKKKKKKKK

Runtime Environment

Fresh install of requirements

Logs

annopro -i test_proteins.fasta -o output-test
Download cafa4.dmnd...
100% [........................................................................] 46988123 / 46988123
Validate md5sum of cafa4.dmnd...
diamond v2.1.0.154 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 4
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: output-test
#Target sequences to report alignments for: 25
Opening the database...  [0.042s]
Database: /home/ubuntu/.annopro/data/cafa4.dmnd (type: Diamond database, sequences: 87514, letters: 44798577)
Block size = 2000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0s]
Masking queries...  [0.001s]
Algorithm: Double-indexed
Building query histograms...  [0s]
Loading reference sequences...  [0.055s]
Masking reference...  [0.588s]
Initializing temporary storage...  [0s]
Building reference histograms...  [0.493s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 1/4.
Building reference seed array...  [0.163s]
Building query seed array...  [0s]
Computing hash join...  [0.004s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 2/4.
Building reference seed array...  [0.192s]
Building query seed array...  [0s]
Computing hash join...  [0.002s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 3/4.
Building reference seed array...  [0.213s]
Building query seed array...  [0s]
Computing hash join...  [0.003s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 1/2, index chunk 4/4.
Building reference seed array...  [0.154s]
Building query seed array...  [0s]
Computing hash join...  [0.003s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 1/4.
Building reference seed array...  [0.155s]
Building query seed array...  [0s]
Computing hash join...  [0.003s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 2/4.
Building reference seed array...  [0.19s]
Building query seed array...  [0s]
Computing hash join...  [0.003s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 3/4.
Building reference seed array...  [0.211s]
Building query seed array...  [0s]
Computing hash join...  [0.002s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2, index chunk 4/4.
Building reference seed array...  [0.154s]
Building query seed array...  [0s]
Computing hash join...  [0.004s]
Masking low complexity seeds...  [0s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.004s]
Clearing query masking...  [0s]
Computing alignments... Loading trace points...  [0.001s]
Sorting trace points...  [0s]
Computing alignments...  [0s]
Deallocating buffers...  [0s]
Loading trace points...  [0s]
 [0.002s]
Deallocating reference...  [0.002s]
Loading reference sequences...  [0s]
Deallocating buffers...  [0s]
Deallocating queries...  [0s]
Loading query sequences...  [0s]
Closing the input file...  [0s]
Closing the output file...  [0s]
Closing the database...  [0.002s]
Cleaning up...  [0s]
Total time = 2.766s
Reported 21 pairwise alignments, 21 HSPs.
1 queries aligned.
Invalid feature 0.6934-309 for seq-13 at line 596
Invalid feature 0.6934-309 for seq-13 at line 596
Invalid feature 0.5127-315 for seq-13 at line 596
Invalid feature 0.5127-315 for seq-13 at line 596
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/annopro/bin/annopro", line 8, in <module>
    sys.exit(console_main())
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/__init__.py", line 27, in console_main
    main(
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/__init__.py", line 71, in main
    process(
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/data_procession/__init__.py", line 8, in process
    data = Data_process(protein_file=profeat_file,
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/data_procession/data_predict.py", line 36, in __init__
    self.__data__()
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/annopro/data_procession/data_predict.py", line 39, in __data__
    proteins_f = profeat_to_df(self.protein_file)
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/profeat/__init__.py", line 69, in profeat_to_df
    return pd.DataFrame(feature_list).T
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/home/ubuntu/anaconda3/envs/annopro/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
@swallow-design
Contributor

The error is likely due to a problem with profeat when calculating protein features, possibly because profeat cannot recognize your input sequence. If it is convenient for you, please provide us with the complete sequence file for analysis or use our website: https://idrblab.org/annopro
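The traceback shows where the failure surfaces: `profeat_to_df` calls `pd.DataFrame(feature_list).T`, and pandas rejects the input in `_extract_index`. The following is a minimal sketch of that mechanism, assuming `feature_list` is a dict mapping each protein ID to a list of feature values (the `dict_to_mgr` frame suggests a dict, but the value layout is an assumption). `build_feature_frame` is a hypothetical stand-in, not profeat's code, and the feature values are made up. If profeat computes fewer features for one protein than for the others, the columns have unequal lengths and the same `ValueError` appears:

```python
import pandas as pd


def build_feature_frame(feature_list):
    """Hypothetical stand-in for the last step of profeat_to_df.

    Assumption: feature_list maps protein ID -> list of feature values.
    Each dict value becomes a column, so .T turns proteins into rows.
    """
    return pd.DataFrame(feature_list).T


# Made-up values: seq-13 ends up with one feature fewer than seq-2.
try:
    build_feature_frame({"seq-2": [0.12, 0.34, 0.56], "seq-13": [0.78, 0.90]})
except ValueError as err:
    print(err)  # All arrays must be of the same length
```

Under this assumption, the error does not identify the problematic protein. The `Invalid feature ... for seq-13` warnings printed just before the traceback are the more useful clue.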

@swallow-design
Contributor

We recently reproduced the same bug during testing and found that the input contained multiple protein sequences with the same ID. You may have run into a similar problem, so please check your input for duplicate IDs.
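Duplicate IDs can be checked before running annopro. The following is a minimal sketch, assuming the ID is the first whitespace-separated token of each `>` header line. How annopro or profeat actually derives IDs is not stated in this thread, and the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable, List


def read_fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the ID of every FASTA record, in file order.

    Assumption: the ID is the first whitespace-separated token after '>'.
    """
    ids = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            ids.append(tokens[0] if tokens else "")
    return ids


def find_duplicate_ids(lines: Iterable[str]) -> List[str]:
    """Return the sorted IDs that appear on more than one header line."""
    counts = Counter(read_fasta_ids(lines))
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


example = [">seq-1", "MKKK", ">seq-2", "MLLL", ">seq-1 duplicate", "MEEE"]
print(find_duplicate_ids(example))  # ['seq-1']
```

An open file handle works as well, for example `with open("test_proteins.fasta") as handle: print(find_duplicate_ids(handle))`. In the original report, the four IDs (seq-2, seq-9, seq-10, seq-13) are all distinct, so this check alone would not explain that particular failure.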

@Jialeen

Jialeen commented Apr 7, 2024

> The error is likely due to a problem with profeat when calculating protein features, possibly because profeat cannot recognize your input sequence. If it is convenient for you, please provide us with the complete sequence file for analysis or use our website: https://idrblab.org/annopro

I ran it with the data from https://idrblab.org/annopro, but the problem still occurs: ValueError: All arrays must be of the same length

@1813805349

> > The error is likely due to a problem with profeat when calculating protein features, possibly because profeat cannot recognize your input sequence. If it is convenient for you, please provide us with the complete sequence file for analysis or use our website: https://idrblab.org/annopro
>
> I run with the data from https://idrblab.org/annopro, but the problem still exists: ValueError: All arrays must be of the same length

This problem occurs when an amino acid sequence is shorter than 30 residues, which profeat cannot handle when computing features.
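Short sequences can be removed before running annopro. In the original report, seq-13 (`MKKKKKLKKKLKKKKKKKK`, 19 residues) is the only sequence under 30 residues, which matches the `Invalid feature ... for seq-13` warnings in the log. The following is a minimal sketch of a pre-filter, assuming the 30-residue threshold reported in this comment. It is not a documented annopro or profeat constant, and the helper names are hypothetical:

```python
from typing import Iterable, Iterator, List, TextIO, Tuple

MIN_LENGTH = 30  # threshold reported in this thread, not a documented annopro constant

Record = Tuple[str, str]


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records: Iterable[Record],
                    min_length: int = MIN_LENGTH) -> Tuple[List[Record], List[Record]]:
    """Separate records into (kept, dropped) by sequence length."""
    kept, dropped = [], []
    for header, seq in records:
        (kept if len(seq) >= min_length else dropped).append((header, seq))
    return kept, dropped


def write_fasta(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out as single-line FASTA."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")


report = """>seq-2
MKKKKKKKLKKLKKKLKKKLKKKKKLLLLLLLLKKKKKKK
>seq-13
MKKKKKLKKKLKKKKKKKK""".splitlines()
kept, dropped = split_by_length(parse_fasta(report))
print([(h, len(s)) for h, s in dropped])  # [('seq-13', 19)]
```

The kept records could then be written with `write_fasta(kept, out)` to a new file (the name `filtered.fasta` is only an example) and passed to `annopro -i filtered.fasta -o output-test`. The sketch reports dropped sequences instead of silently discarding them, so they can be annotated another way.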
