missing XML files? #2

Closed
kasev opened this issue Nov 11, 2020 · 3 comments

kasev commented Nov 11, 2020

We need to check which files are missing from the XML download we are using.

@kasev kasev self-assigned this Nov 11, 2020

kasev commented Nov 11, 2020

The XML data for download (version from August 12, 2020) currently contains 81,156 XML files. Five of these files are empty (zero bytes):

['./xml/HD070403.xml',
 './xml/HD071377.xml',
 './xml/HD071378.xml',
 './xml/HD072745.xml',
 './xml/HD072755.xml']

That means we have 81,151 usable files, which is fewer than we can get via the API, which currently returns 81,476.
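
A check like this can be sketched in a few lines of Python. This is only a minimal sketch, not the script used in this project: the function name `find_empty_xml` is hypothetical, and the `./xml` directory is assumed from the paths listed above.

```python
from pathlib import Path


def find_empty_xml(xml_dir):
    """Return (all_files, empty_files) for the *.xml files directly inside xml_dir."""
    all_files = sorted(Path(xml_dir).glob("*.xml"))
    # A file counts as empty when its size on disk is zero bytes.
    empty_files = [p for p in all_files if p.stat().st_size == 0]
    return all_files, empty_files


if __name__ == "__main__":
    # "./xml" is an assumed location for the unpacked dump.
    all_files, empty_files = find_empty_xml("./xml")
    print(f"{len(all_files)} XML files, {len(empty_files)} empty")
    print(f"{len(all_files) - len(empty_files)} usable files")
    for path in empty_files:
        print(path)
```

The usable count is simply the total minus the empty files, which is the subtraction done above (81,156 - 5 = 81,151).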

@petrifiedvoices

The following five XML files are 0 bytes in the data dump, but the records exist on the EDH website when searched for individually. I will report them to EDH directly so they can be fixed and added to the dump.

edhEpidocDump_HD070001-HD082046/xml/HD070403.xml
edhEpidocDump_HD070001-HD082046/xml/HD071377.xml
edhEpidocDump_HD070001-HD082046/xml/HD071378.xml
edhEpidocDump_HD070001-HD082046/xml/HD072745.xml
edhEpidocDump_HD070001-HD082046/xml/HD072755.xml

The records HD082047 to HD082389 do not yet have XML files in the dump, but they are available through the API. Their XML is available when searching for individual records but not as bulk data, since they were created after 12 Aug 2020 (the date when the latest XML dump was created).

So there is nothing wrong with our script; the problem lies with the source of the data.
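
To list exactly which records the API has but the dump lacks, the two sets of IDs can be compared. The following is a rough sketch under stated assumptions: it presumes the IDs returned by the API have already been collected into a list (fetching them is not shown), and the names `dump_ids` and `missing_from_dump` are hypothetical, not part of our script.

```python
from pathlib import Path


def dump_ids(xml_dir):
    """Collect record IDs (e.g. 'HD070403') from the non-empty XML files in xml_dir."""
    # Zero-byte files are skipped, so their records count as missing from the dump.
    return {p.stem for p in Path(xml_dir).glob("*.xml") if p.stat().st_size > 0}


def missing_from_dump(api_ids, xml_dir):
    """Return the sorted IDs that appear in api_ids but have no usable XML file."""
    return sorted(set(api_ids) - dump_ids(xml_dir))
```

Because zero-byte files are excluded, the result would contain both the empty files and the records created after the dump date.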

@petrifiedvoices

Old problem, solved a long time ago.
