Loading data using builder.py fails #52
Comments
A similar error related to building the database is:

2021-02-14 18:17:31,592 - importer - ERROR - Writing Stats object full_stats_1_0 in file:/CKG/src/graphdb_builder/../../data/imports/stats/stats.hdf > Trying to store a string with len [9] in [date] column but
2021-02-14 18:17:31,296 - ontologies_controller - ERROR - Error: Tag-value pair parsing failed for:
2021-02-14 18:40:09,476 - database_controller - ERROR - Database UniProt: (<class 'Exception'>, Exception('Something went wrong. Exception raised when an error code signifying a permanent error. 550 Failed to open file..\nURL:ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640_9606.fasta.gz.\nURL:ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640_9606.fasta.gz'), <traceback object at 0x10a961780>), file: databases_controller.py,line: 66
-1 / unknown
Done Parsing database GWASCatalog
2021-02-15 04:28:00,160 - database_controller - ERROR - Database DGIdb: (<class 'Exception'>, Exception("mapping - No mapping file ../../../data/databases/DrugBank/complete_mapping.tsv for entity Drug. Error: [Errno 2] No such file or directory: '../../../data/databases/DrugBank/complete_mapping.tsv'"), <traceback object at 0x1927e3640>), file: databases_controller.py,line: 143
The first error:

2021-02-14 18:18:26,308 - database_controller - ERROR - Database DrugBank: (<class 'lxml.etree.XMLSyntaxError'>, XMLSyntaxError('Document is empty, line 1, column 1'), <traceback object at 0x19d61a3c0>), file: databases_controller.py,line: 205

This came from decompressing the DrugBank download. Using the OS X compress tool created a __MACOSX directory in the archive, which caused the failure during parsing. It can be fixed after the fact with:

zip -d filename.zip __MACOSX/*

See https://stackoverflow.com/questions/10924236/mac-zip-compress-without-macosx-folder
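For anyone who prefers to do the same cleanup from Python, here is a minimal sketch of the idea behind the `zip -d` command above. The function name `strip_macosx_entries` is made up for illustration and is not part of CKG; it assumes the unwanted entries all live under a top-level `__MACOSX/` folder.

```python
import zipfile


def strip_macosx_entries(src_zip, dst_zip):
    """Copy src_zip to dst_zip, skipping every entry under __MACOSX/.

    Hypothetical helper (not part of CKG). Returns the skipped entry names.
    """
    skipped = []
    with zipfile.ZipFile(src_zip) as src, zipfile.ZipFile(dst_zip, "w") as dst:
        for info in src.infolist():
            if info.filename.startswith("__MACOSX/"):
                skipped.append(info.filename)
                continue
            # Passing the ZipInfo keeps the entry's name, timestamp and
            # compression method from the source archive.
            dst.writestr(info, src.read(info.filename))
    return skipped
```

Unlike `zip -d`, this writes a new archive instead of editing the original in place, so the cleaned file would still need to be moved to wherever the DrugBank parser expects it.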
UniProt error:

2021-02-14 18:40:09,476 - database_controller - ERROR - Database UniProt: (<class 'Exception'>, Exception('Something went wrong. Exception raised when an error code signifying a permanent error. 550 Failed to open file..\nURL:ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640_9606.fasta.gz.\nURL:ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640_9606.fasta.gz'), <traceback object at 0x10a961780>), file: databases_controller.py,line: 66

Fixed by updating this line in ./src/graphdb_builder/databases/config/uniprotConfig.yml so that the file path includes the proteome directory:

uniprot_fasta_file: 'ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/reference_proteomes/Eukaryota/UP000005640/UP000005640_9606.fasta.gz'
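The only difference between the failing and the working URL is the extra per-proteome directory. A small sketch of that layout follows; it is inferred solely from the one path in this comment, and `reference_proteome_fasta_url` is a hypothetical helper, not something CKG provides.

```python
# Base path copied from the URL in the comment above.
BASE = ("ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/"
        "knowledgebase/reference_proteomes")


def reference_proteome_fasta_url(kingdom, proteome_id, taxon_id):
    """Build a FASTA URL assuming the <kingdom>/<proteome>/<proteome>_<taxon> layout.

    Hypothetical helper; the layout is an assumption based on a single example.
    """
    return f"{BASE}/{kingdom}/{proteome_id}/{proteome_id}_{taxon_id}.fasta.gz"


# Human reference proteome, as used in uniprotConfig.yml.
print(reference_proteome_fasta_url("Eukaryota", "UP000005640", 9606))
```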
The ICD10 code import fails because the input file seems incompatible with the parser. I am not sure what the correct file should be. The parser (ontologies/parsers/icdParser.py) seems to expect a tab-separated file with at least 6 columns, but the downloaded file has just two columns and no tabs: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Publications/ICD10CM/2020/icd10cm_codes_2020.txt
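A quick way to confirm this kind of mismatch before running the import is to count tab-separated fields per line. The sketch below is only an illustration: the 6-column threshold comes from my reading of the parser, the function names are made up, and the sample lines are invented placeholders rather than real ICD10 records.

```python
def min_tab_columns(lines):
    """Return the smallest number of tab-separated fields among non-empty lines."""
    counts = [len(line.rstrip("\n").split("\t")) for line in lines if line.strip()]
    return min(counts) if counts else 0


def looks_parseable(lines, required_columns=6):
    """True if every non-empty line has at least required_columns tab-separated fields."""
    return min_tab_columns(lines) >= required_columns


# Invented placeholder lines: one space-separated, one tab-separated.
space_separated = ["X00     some description\n"]
tab_separated = ["a\tb\tc\td\te\tf\n"]
print(looks_parseable(space_separated), looks_parseable(tab_separated))
```

To check a real download, the lines of the opened file can be passed in directly, e.g. `looks_parseable(open(path))`.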
Hi, apologies for the late response. ICD10 codes are not included in this version of CKG. The parser we committed last year was in development and was not finalized. Closing until there is a parser supporting this node type.
Describe the bug
Loading data into the database fails when trying to load the latest version of DrugBank. It seems the file names, and potentially the format of the input file, have changed.
2021-02-14 18:18:26,308 - database_controller - ERROR - Database DrugBank: (<class 'lxml.etree.XMLSyntaxError'>, XMLSyntaxError('Document is empty, line 1, column 1'), <traceback object at 0x19d61a3c0>), file: databases_controller.py,line: 205
To Reproduce
Steps to reproduce the behavior:
Go to builder.py and run it with the standard command for a minimal or full build.
Expected behavior
No errors in the log.