After the release of #120 in v4.6.2, I now get the following error traceback when trying to parse fastas:
99 with _get_filesystem(fasta_uri).open(fasta_uri, "r") as fastafile:
100 results = []
--> 101 for description, sequence in MyUniProt(fastafile):
102 description['sequence'] = sequence
103 results.append(description)
File /opt/conda/lib/python3.8/site-packages/pyteomics/auxiliary/file_helpers.py:178, in IteratorContextManager.__next__(self)
176 def __next__(self):
177 # try:
--> 178 return next(self._reader)
File /opt/conda/lib/python3.8/site-packages/pyteomics/fasta.py:232, in FASTA._read(self)
230 sequence = sequence[:-1]
231 if self.parser is not None:
--> 232 description = self.parser(description)
233 yield Protein(description, sequence)
234 accumulated_strings = [stripped_string[1:]]
File /opt/conda/lib/python3.8/site-packages/pyteomics/fasta.py:144, in _add_raw_field.<locals>._new_parser(instance, descr)
142 parsed[RAW_HEADER_KEY] = descr
143 else:
--> 144 raise aux.PyteomicsError('Cannot save raw protein header, since the corresponsing'
145 'key ({}) already exists.'.format(RAW_HEADER_KEY))
146 return parsed
PyteomicsError: Pyteomics error, message: 'Cannot save raw protein header, since the corresponsingkey (__raw__) already exists.'
MyUniProt is just a custom parser with a more robust regex pattern:
class MyUniProt(fasta.UniProt):
    """Redefine the header-parsing pattern to tolerate '-' in the entry field."""
    header_pattern = r'^(?P<db>\w+)\|(?P<id>[-\w]+)\|(?P<entry>[-\w]+)\s+(?P<name>.*?)(?:(\s+OS=(?P<OS>[^=]+))|(\s+OX=(?P<OX>\d+))|(\s+GN=(?P<GN>\S+))|(\s+PE=(?P<PE>\d))|(\s+SV=(?P<SV>\d+)))*\s*$'

    def parser(self, header):
        """
        Catch errors when parsing a header and return a simpler dict; this allows
        parsing FASTAs where not all entries are in a valid Uniprot format.
        """
        try:
            return fasta.UniProt.parser(self, header)
        except:
            _logger.warning("Error parsing header: %s", header, exc_info=True)
            return {
                "id": header,
                "entry": header,
            }
This parsing works without a problem in v4.6.1.
I see, the parent method is already wrapped, but the metaclass cannot tell, so it tries to wrap it again. The interim solution is to remove fasta.RAW_HEADER_KEY from the return value of fasta.UniProt.parser(self, header) before returning it.
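A minimal sketch of that interim workaround, runnable without pyteomics. The `add_raw_field` decorator, `Base`, and `Child` are hypothetical stand-ins for the library's wrapper, `fasta.UniProt`, and `MyUniProt`; `ValueError` stands in for `PyteomicsError`, and the decorator is applied by hand here, whereas in the library the metaclass applies it.

```python
RAW_HEADER_KEY = "__raw__"  # same value as shown in the error message above


def add_raw_field(parser):
    """Stand-in wrapper: store the raw header, refuse to overwrite the key."""
    def wrapped(instance, descr):
        parsed = parser(instance, descr)
        if RAW_HEADER_KEY in parsed:
            raise ValueError("key ({}) already exists".format(RAW_HEADER_KEY))
        parsed[RAW_HEADER_KEY] = descr
        return parsed
    return wrapped


class Base:
    """Plays the role of the parent parser class, already wrapped once."""
    @add_raw_field
    def parser(self, header):
        return {"id": header.split("|")[1]}


class Child(Base):
    """Plays the role of the subclass whose parser gets wrapped a second time."""
    @add_raw_field
    def parser(self, header):
        parsed = Base.parser(self, header)
        # Workaround: drop the key the parent wrapper already set,
        # so the outer wrapper can add it again without raising.
        parsed.pop(RAW_HEADER_KEY, None)
        return parsed


result = Child().parser("sp|P12345|TEST_HUMAN")
```

In `MyUniProt.parser`, the equivalent change would presumably be to call `pop(fasta.RAW_HEADER_KEY, None)` on the dict returned by `fasta.UniProt.parser(self, header)` before returning it.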
The longer-term solution would be to modify the check in the _add_raw_field wrapper: if the fasta.RAW_HEADER_KEY key is already present and its value equals the string we would otherwise assign, don't raise an error.
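The relaxed check could look roughly like the sketch below. This is not the library's code: `add_raw_field_tolerant`, `Base`, `Child`, and `Conflicting` are hypothetical names, and `ValueError` stands in for `PyteomicsError`. The wrapper only raises when the key already holds a value different from the raw header.

```python
RAW_HEADER_KEY = "__raw__"  # same value as shown in the error message above


def add_raw_field_tolerant(parser):
    """Stand-in wrapper that tolerates a key already set to the same header."""
    def wrapped(instance, descr):
        parsed = parser(instance, descr)
        # Raise only on a genuine conflict: key present with a different value.
        if RAW_HEADER_KEY in parsed and parsed[RAW_HEADER_KEY] != descr:
            raise ValueError("key ({}) already exists".format(RAW_HEADER_KEY))
        parsed[RAW_HEADER_KEY] = descr
        return parsed
    return wrapped


class Base:
    @add_raw_field_tolerant
    def parser(self, header):
        return {"id": header.split("|")[1]}


class Child(Base):
    @add_raw_field_tolerant  # second wrapping, as happens with a subclass
    def parser(self, header):
        return Base.parser(self, header)  # result already carries the raw key


class Conflicting(Base):
    @add_raw_field_tolerant
    def parser(self, header):
        # A parser that puts unrelated data under the reserved key.
        return {"id": "x", RAW_HEADER_KEY: "something else"}


ok = Child().parser("sp|P12345|TEST_HUMAN")

try:
    Conflicting().parser("sp|P12345|TEST_HUMAN")
    conflict_raised = False
except ValueError:
    conflict_raised = True
```

With this check, a subclass that delegates to an already-wrapped parent parser passes through unchanged, while a parser that stores different data under the reserved key still fails.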