Fix imported data that are missing raw info (scraping error occurred) #98

Open
georgeslabreche opened this issue Jul 17, 2017 · 3 comments

Comments

@georgeslabreche

Find the documents that are missing raw info (their raw.info array is empty) with the following query, which returns how many there are:

db.businesses.find({"raw.info": []}).count()

@georgeslabreche georgeslabreche self-assigned this Jul 17, 2017
@diamanthaxhimusa
Contributor

@georgeslabreche this has to wait until we have the rest of the data, so that we don't end up doing the same thing twice.

@dafinaolluri dafinaolluri added this to the Sprint 10 milestone Jul 24, 2017
@georgeslabreche
Author

georgeslabreche commented Jul 26, 2017

Here is the businesses collection dump for those 6 businesses: arbk-data-dump-empty-docs-fix.zip

Be sure to delete the 6 documents from the database before you import this dump; otherwise you will end up with documents that have duplicate registration numbers!

  1. db.businesses.find({"raw.info": []}).count() --> should return 6.
  2. db.businesses.remove({"raw.info": []}) --> should return WriteResult({ "nRemoved" : 6 }).
  3. db.businesses.find({"raw.info": []}).count() --> should return 0.
  4. Import the data dump.
  5. db.businesses.find({"raw.info": []}).count() --> should still return 0.
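
Steps 1 to 3 could also be wrapped in a small guard, so the removal stops if the counts are not what is expected. This is only a sketch. The function name `removeEmptyRawInfoDocs` and its `expectedCount` parameter are made up for illustration, and it assumes the legacy mongo shell, where `remove()` returns a WriteResult with an `nRemoved` field as shown in step 2.

```javascript
// Hypothetical helper, not part of the project. Assumes the legacy mongo shell,
// where collection.remove() returns a WriteResult exposing `nRemoved`.
var EMPTY_RAW_INFO = { "raw.info": [] };

function removeEmptyRawInfoDocs(collection, expectedCount) {
  // Step 1: confirm how many documents are affected before touching anything.
  var before = collection.find(EMPTY_RAW_INFO).count();
  if (before !== expectedCount) {
    throw new Error("Expected " + expectedCount + " documents, found " + before);
  }

  // Step 2: remove them and check the count reported by the shell.
  var result = collection.remove(EMPTY_RAW_INFO);
  if (result.nRemoved !== expectedCount) {
    throw new Error("Expected to remove " + expectedCount + ", removed " + result.nRemoved);
  }

  // Step 3: nothing should match the query any more.
  var after = collection.find(EMPTY_RAW_INFO).count();
  if (after !== 0) {
    throw new Error(after + " documents still match after removal");
  }

  return result.nRemoved;
}
```

In the shell this would be called as `removeEmptyRawInfoDocs(db.businesses, 6)`, followed by the import in step 4 and the final count check in step 5.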
