How to fix Yelp missing or mismatched business ids in evaluation data? #1

Open
Stone-1024 opened this issue Nov 17, 2021 · 4 comments

@Stone-1024

Hey! I'm testing your model and running into trouble with the Yelp evaluation data. Some business ids are missing from business.json.
How can I repair the Yelp summaries_0-200_cleaned.csv?

@jinbae
Collaborator

jinbae commented Nov 17, 2021

Hi! I downloaded the original Yelp summaries_0-200_cleaned.csv from https://github.com/sosuperic/MeanSum. I found some missing or mismatched business ids and fixed them using reviews.json. (The 156th business id is NULL because I couldn't find any corresponding reviews in reviews.json.) Please check reviews.json!
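
For anyone who wants to attempt the same kind of repair, here is a minimal sketch of one way to look up a business id from review text. It is not the script used for the fix described above. It assumes the review file is JSON lines with `business_id` and `text` fields, and that the original reviews from each CSV row are available as a list of strings. The function names and the whitespace/lowercase normalization are illustrative assumptions.

```python
import json
from collections import Counter


def normalize(text):
    """Collapse whitespace and lowercase so minor formatting differences still match."""
    return " ".join(text.split()).lower()


def build_text_index(review_lines):
    """Map normalized review text to business_id, given an iterable of JSON-lines records."""
    index = {}
    for line in review_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        index[normalize(record["text"])] = record["business_id"]
    return index


def recover_business_id(review_texts, index):
    """Return the business_id that most of the given reviews map to, or None if none match."""
    votes = Counter()
    for text in review_texts:
        business_id = index.get(normalize(text))
        if business_id is not None:
            votes[business_id] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

With the real data, `review_lines` could be an open file handle on the review file, and `review_texts` the original reviews of one row of the CSV. A `None` result means no review of that row was found, which is what you would expect when the review file comes from a different dataset release than the evaluation data. Holding every review text in memory can be expensive for the full dataset, so storing a hash of the normalized text instead is a reasonable variation.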

@Stone-1024
Author

Thanks for your reply! I did check reviews.json, but Input.original_reviews in summaries_0-200_cleaned.csv doesn't match it. I then read the Yelp dataset user agreement and found 'Last Updated: February 16, 2021'. Is that the same version as yours?

@jinbae
Collaborator

jinbae commented Nov 18, 2021

That seems to be the cause of the problem. I used the previous version of the Yelp dataset (last updated: February 21, 2020). The evaluation data seems to have been created with that version.

@Stone-1024
Author

I see. Truly grateful for your help.
