YELP raw_train.csv file no longer available on Google Drive, please provide alternate source #38

Open
richlysakowski opened this issue Nov 13, 2022 · 2 comments

Comments

@richlysakowski

raw_train.csv

https://drive.google.com/open?id=1xeUnqkhuzGGzZKThzPeXe2Vf6Uu_g_xM returns a 404 error.

Please provide an updated link to the exact dataset used in the book, or to an entirely new set of Yelp CSV-formatted datasets (train, test, and reviews_with_splits_lite).

@ajhergenroeder

@richlysakowski, I had the same problem. I think this Yelp review dataset on Kaggle is identical, so that's what I'm going to use.
https://www.kaggle.com/datasets/ilhamfp31/yelp-review-dataset

@photomz

photomz commented Jun 11, 2023

@richlysakowski
Here's what worked for me in a Jupyter notebook (Google Colab, June 2023).
First, make sure ~/.kaggle/kaggle.json exists with 600 permissions (readable and writable by the owner only).

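from pathlib import Path

# Paste the contents of the kaggle.json API token downloaded from Kaggle.com.
creds = 'your JSON credentials from Kaggle.com'
cred_path = Path('~/.kaggle/kaggle.json').expanduser()
if not cred_path.exists():
    # Create ~/.kaggle if it is missing, then write the token.
    cred_path.parent.mkdir(parents=True, exist_ok=True)
    cred_path.write_text(creds)
    # Restrict access to the owner (read/write only).
    cred_path.chmod(0o600)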
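
If authentication fails later, it can help to check the credentials file first. The following is only a sketch: `check_kaggle_credentials` is a hypothetical helper (not part of the Kaggle package), and it assumes the token file is a JSON object with `username` and `key` fields, which is what the file downloaded from Kaggle.com normally contains.

```python
import json
import stat
from pathlib import Path


def check_kaggle_credentials(cred_path: Path) -> list[str]:
    """Return a list of problems found with a kaggle.json file (empty if none).

    Hypothetical helper for illustration; not part of the kaggle package.
    """
    if not cred_path.exists():
        return [f"{cred_path} does not exist"]

    problems = []

    # Keep only the permission bits and flag any access for group/others.
    mode = stat.S_IMODE(cred_path.stat().st_mode)
    if mode & 0o077:
        problems.append(f"permissions are {oct(mode)}, expected 0o600")

    # The file must parse as JSON before its fields can be inspected.
    try:
        data = json.loads(cred_path.read_text())
    except json.JSONDecodeError:
        problems.append("file is not valid JSON")
        return problems

    # Assumption: the token is an object with 'username' and 'key' fields.
    if not isinstance(data, dict):
        problems.append("file does not contain a JSON object")
        return problems
    for field in ("username", "key"):
        if field not in data:
            problems.append(f"missing field: {field}")
    return problems
```

Calling `check_kaggle_credentials(Path('~/.kaggle/kaggle.json').expanduser())` should return an empty list when the file looks usable. Note that the placeholder string used above for `creds` would be reported as invalid JSON until it is replaced with the real token.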

Then, download the dataset directly with the Kaggle API:

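from kaggle.api.kaggle_api_extended import KaggleApi

# Authenticate using the credentials in ~/.kaggle/kaggle.json.
api = KaggleApi()
api.authenticate()

# Download the dataset into the current directory and unzip it.
dataset_slug = 'ilhamfp31/yelp-review-dataset'
api.dataset_download_files(dataset_slug, unzip=True)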

You may have to rename a few files and folders:

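# Create the target folder (and its parent) in one step.
mkdir -p data/yelp
# Move the extracted CSV files into place.
mv yelp_review_polarity_csv/* data/yelp/
# Rename the files to the names the notebooks expect.
mv data/yelp/test.csv data/yelp/raw_test.csv
mv data/yelp/train.csv data/yelp/raw_train.csv
# Remove the now-empty extraction folder.
rm -r yelp_review_polarity_csv/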
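
If you'd rather stay in Python (for example, inside the same notebook cell), the renaming steps above can be sketched roughly as follows. `arrange_yelp_files` is a hypothetical helper name, and the sketch assumes the archive extracted into a folder that contains `train.csv` and `test.csv`, as described above.

```python
import shutil
from pathlib import Path


def arrange_yelp_files(src_dir: Path, dest_dir: Path) -> None:
    """Move extracted files into dest_dir, prefixing train/test with 'raw_'.

    Hypothetical helper for illustration; mirrors the shell commands above.
    """
    # Create the destination folder and any missing parents.
    dest_dir.mkdir(parents=True, exist_ok=True)

    # Files that need new names; everything else keeps its name.
    renames = {"train.csv": "raw_train.csv", "test.csv": "raw_test.csv"}

    # sorted() takes a snapshot of the listing before anything is moved.
    for item in sorted(src_dir.iterdir()):
        target = dest_dir / renames.get(item.name, item.name)
        shutil.move(str(item), str(target))

    # The source folder is empty at this point, so rmdir() is enough.
    src_dir.rmdir()
```

With the folder names from this thread, the call would be `arrange_yelp_files(Path('yelp_review_polarity_csv'), Path('data/yelp'))`.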

You should be able to run the rest of the Yelp notebooks as per normal.
