A small but non-zero number of the ontology transforms can't be parsed properly by pandas. One of the existing validations probably catches this, but at the kg-bioportal merge step it surfaces as an error like the following:
15:44:08 Traceback (most recent call last):
15:44:08 File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
15:44:08 result = (True, func(*args, **kwds))
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/kgx/cli/cli_utils.py", line 809, in parse_source
15:44:08 transformer.transform(input_args)
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/kgx/transformer.py", line 303, in transform
15:44:08 self.process(source_generator, sink)
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/kgx/transformer.py", line 343, in process
15:44:08 for rec in source:
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/kgx/source/tsv_source.py", line 184, in parse
15:44:08 for chunk in file_iter:
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1187, in __next__
15:44:08 return self.get_chunk()
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1284, in get_chunk
15:44:08 return self.read(nrows=size)
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1254, in read
15:44:08 index, columns, col_dict = self._engine.read(nrows)
15:44:08 File "/var/lib/jenkins/workspace/NCBO/kg-bioportal/gitrepo/venv/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 230, in read
15:44:08 data = self._reader.read(nrows)
15:44:08 File "pandas/_libs/parsers.pyx", line 787, in pandas._libs.parsers.TextReader.read
15:44:08 File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
15:44:08 File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
15:44:08 File "pandas/_libs/parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
15:44:08 pandas.errors.ParserError: Error tokenizing data. C error: Expected 8 fields in line 6, saw 9
Most of these errors are due to #32, so the solution is to re-transform, but the pandas error does not say which ontology caused it. A pre-screening step in this repo would help: load each graph file into pandas and warn loudly if it doesn't parse.
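A rough sketch of what that pre-screening could look like, not a finished implementation. The function names `prescreen_tsv` and `find_graph_files` are hypothetical, and the `*_nodes.tsv` / `*_edges.tsv` file pattern and the plain tab-separated read are assumptions; the read options may need to match whatever KGX passes to pandas at merge time.

```python
import warnings
from pathlib import Path

import pandas as pd


def prescreen_tsv(paths):
    """Try to parse each TSV with pandas; return {path: error message} for failures."""
    failures = {}
    for path in paths:
        try:
            # Read every column as a string so only parsing problems are reported.
            pd.read_csv(path, sep="\t", dtype=str)
        except (pd.errors.ParserError, pd.errors.EmptyDataError) as err:
            failures[str(path)] = str(err)
            # Warn loudly, naming the file that pandas could not parse.
            warnings.warn(f"{path} could not be parsed by pandas: {err}")
    return failures


def find_graph_files(root):
    """Collect node and edge TSVs under a transform output directory.

    The *_nodes.tsv / *_edges.tsv naming is an assumption about the layout.
    """
    root = Path(root)
    return sorted(root.rglob("*_nodes.tsv")) + sorted(root.rglob("*_edges.tsv"))
```

Running `prescreen_tsv(find_graph_files(...))` over the transform output before the merge would name each file that fails, so the affected ontologies can be re-transformed without guessing from the merge traceback.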