# Data Retrieval (Kaggle Version)

For the sake of completeness, let us briefly outline how the data may be downloaded (albeit as a CSV file) from [Kaggle](https://www.kaggle.com). To this end, Kaggle API credentials are required (they can be generated at <https://www.kaggle.com/settings>). If no credentials are supplied, the Kaggle API raises an error when it is loaded.

Note that the two datasets are not necessarily identical: the version provided for the course may contain contaminations deliberately injected by the instructors for exercise purposes.

The following imports are required:

In [None]:
import os
from kaggle.api.kaggle_api_extended import KaggleApi
from dotenv import load_dotenv

from ipynb_utils import CFG

We now specify the dataset to be downloaded, as well as the filename under which it is to be stored:

In [None]:
# Kaggle identifier of the dataset.
DATASET_SLUG = "uciml/pima-indians-diabetes-database"

# Filename assigned by kaggle on download.
TMP_NAME = "diabetes.csv"

# Desired filename for local use.
TARGET_NAME = "df_kaggle.csv"

We now load the Kaggle API credentials, which must reside in the `.env` file:

In [None]:
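load_dotenv()

# load_dotenv() already exports the variables from .env into os.environ,
# so we only verify that both credentials are actually present.
missing = [key for key in ("KAGGLE_USERNAME", "KAGGLE_KEY") if not os.getenv(key)]
if missing:
    raise RuntimeError(f"Missing Kaggle credentials: {', '.join(missing)}")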

Finally, we download the dataset from Kaggle and rename the resulting file:

In [None]:
DATA_DIR = CFG["DATA_DIR"]

api = KaggleApi()
api.authenticate()

api.dataset_download_files(DATASET_SLUG, path=DATA_DIR, unzip=True)

tmp_path = os.path.join(DATA_DIR, TMP_NAME)
target_path = os.path.join(DATA_DIR, TARGET_NAME)

# os.replace overwrites an existing target, so the cell can be re-run safely.
os.replace(tmp_path, target_path)
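
As a quick sanity check after the download, one might confirm that the renamed file is readable and inspect its header and row count. The following is a minimal sketch using only the standard library. The helper name `summarize_csv` is hypothetical (it is not part of `ipynb_utils` or the Kaggle API), and the demonstration writes a tiny throwaway file with made-up values rather than touching the real data.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])        # first row holds the column names
        n_rows = sum(1 for _ in reader)  # remaining rows are data records
    return header, n_rows


# Demo on a small throwaway file (made-up values, not the real dataset).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", newline="") as handle:
        csv.writer(handle).writerows([["a", "b"], [1, 2], [3, 4]])
    demo_header, demo_rows = summarize_csv(demo_path)

print(demo_header, demo_rows)
```

In the notebook itself, the corresponding call would be `summarize_csv(target_path)`, assuming the download cell above has run successfully.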