News categorization.
-
Install pyenv:
curl https://pyenv.run | bash
-
Install and activate the Python version:
pyenv install
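The step above does not name a version, so a bare `pyenv install` has to take it from a pinned-version file. As a minimal sketch, assuming the repository pins its version in a `.python-version` file at the project root (an assumption, not confirmed here), the hypothetical helper below prints the version that would be installed:

```shell
# Assumption: the repository pins its Python version in a .python-version file.
# pinned_python_version is a hypothetical helper, not part of this project.
pinned_python_version() {
  version_file="${1:-.python-version}"
  if [ ! -f "$version_file" ]; then
    echo "no version file at $version_file" >&2
    return 1
  fi
  # Print the first non-blank line of the file.
  grep -v '^[[:space:]]*$' "$version_file" | head -n 1
}
```

If the file exists, `pyenv install "$(pinned_python_version)"` would request that version explicitly.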
-
Install Poetry:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
source $HOME/.poetry/env
-
Install dependencies:
poetry install
poetry env info # Show virtualenv information
-
Install the pre-commit Git hooks:
poetry run pre-commit install
-
Configure Google Drive Remote in DVC using Google Service Account credentials:
# Download and copy credentials file to .dvc/tmp/gdrive-user-credentials.json
poetry run dvc remote modify gdrive-remote \
--local gdrive_service_account_json_file_path .dvc/tmp/gdrive-user-credentials.json
# Download data from the DVC remote
poetry run dvc pull data/input/raw/agnews.zip
# Prepare the raw data
poetry run python -m app.data.prep_agnews \
--download-url https://corise-mlops.s3.us-west-2.amazonaws.com/project1/agnews.zip \
--output-dir data/input
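The DVC commands above depend on the credentials file being in place, so a quick check before running them may save a confusing authentication error. The sketch below is a hypothetical helper, not part of this project. It assumes the file is a standard Google service account key, which is a JSON object containing a `client_email` field, and it defaults to the path used above:

```shell
# Hypothetical pre-flight check for the service account credentials file.
# Default path mirrors the one passed to `dvc remote modify` above.
check_credentials() {
  cred_file="${1:-.dvc/tmp/gdrive-user-credentials.json}"
  # -s is true only if the file exists and is not empty.
  if [ ! -s "$cred_file" ]; then
    echo "missing or empty credentials file: $cred_file" >&2
    return 1
  fi
  # Service account key files contain a "client_email" field.
  if ! grep -q '"client_email"' "$cred_file"; then
    echo "file does not look like a service account key: $cred_file" >&2
    return 1
  fi
  echo "credentials file looks OK: $cred_file"
}
```

Running `check_credentials && poetry run dvc pull data/input/raw/agnews.zip` would then attempt the pull only when the file passes the check.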