## Download TACO data

### Option 1: Download directly

You can download the raw data used in this analysis directly from the [TACO project](https://github.com/pedropro/TACO) repository. In our notebooks, we use a version of the dataset that was uploaded to Kaggle at the end of 2019.

Kaggle is a platform where data science enthusiasts and professionals compete, collaborate, and learn through machine learning challenges and datasets. To download the files manually, open the [TACO Kaggle repository](https://www.kaggle.com/datasets/kneroma/tacotrashdataset) and click the **Download** button.

After downloading the files, move them to the `data/TACO` folder to mimic the structure below.

```plaintext
project_root/
│
├── data/
│   ├── TACO/
│   │   ├── data/
│   │   │   ├── batch1/
│   │   │   ├── batch2/
│   │   │   ├── batch3/
│   │   │   ├── ...
│   │   │   └── batch15/
│   │   └── annotations.json
│
├── README.md
├── environment.yml
└── ...
```
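
The layout above can be checked with a short script before running the rest of the analysis. The following is a minimal sketch, not part of the original notebooks: the helper name `find_missing_taco_items` is hypothetical, and the relative path assumes the script runs from a folder one level below `project_root`.

```python
from pathlib import Path


def find_missing_taco_items(taco_dir, n_batches=15):
    """Return the expected paths that are absent from taco_dir."""
    taco_dir = Path(taco_dir)
    # Expected items taken from the folder tree above:
    # annotations.json plus data/batch1 ... data/batch15
    expected = [taco_dir / "annotations.json"]
    expected += [taco_dir / "data" / f"batch{i}" for i in range(1, n_batches + 1)]
    return [str(path) for path in expected if not path.exists()]


# Assumed location: the script runs one level below project_root
missing = find_missing_taco_items(Path("..") / "data" / "TACO")
if missing:
    print("Missing items:")
    for item in missing:
        print("  ", item)
else:
    print("TACO folder layout looks complete.")
```

An empty result means every folder and file shown in the tree was found; it does not verify that the images inside each batch are complete.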

### Option 2: Download using API

If you wish to download the data using the Kaggle API, follow the steps below.

#### Step 1: Create Kaggle API token

1. If you don't have one, create a [Kaggle](https://www.kaggle.com) account.
2. Visit your Kaggle [account settings](https://www.kaggle.com/account/login?phase=startRegisterTab&returnUrl=%2F).
3. Click on **Create New API Token**.
4. Save the downloaded `kaggle.json` file, which contains your username and API key, in the `data` folder of this repository.

```plaintext
project_root/
│
├── data/
│   └──── kaggle.json
│
├── README.md
├── environment.yml
└── ...
```

```python
import os
import pandas as pd

# Retrieve directories where we will save the data
parent_directory = os.path.abspath('../')
data_directory = os.path.join(parent_directory, 'data')

# Read the API credentials from data/kaggle.json
file_path = os.path.join(data_directory, 'kaggle.json')
kaggle_credentials = pd.read_json(file_path, typ='series')
username = kaggle_credentials["username"]
api_key = kaggle_credentials["key"]
```

#### Step 2: Save token file in local user folder

The Kaggle API looks for the token in a folder called `.kaggle` inside your user (home) directory, that is, at `~/.kaggle/kaggle.json`. The folder is hidden because its name starts with a `.`.

```python
import json

# Save the credentials in the Kaggle config directory, e.g. ~/.kaggle/
token_dir = os.path.expanduser("~/.kaggle")  # Path to Kaggle config directory
os.makedirs(token_dir, exist_ok=True)  # Create directory if it doesn't exist

# Write API credentials to kaggle.json; json.dump escapes special characters
token_path = os.path.join(token_dir, "kaggle.json")
with open(token_path, "w") as file:
    json.dump({"username": username, "key": api_key}, file)

# Restrict access to the current user, as the Kaggle client recommends
os.chmod(token_path, 0o600)
```
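
As an alternative to copying the token file, the Kaggle client can also read credentials from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables. The sketch below shows one way to set them from the `kaggle.json` file in the `data` folder. The helper name `load_kaggle_credentials_into_env` is hypothetical, and the relative path is an assumption about where the notebook runs.

```python
import json
import os


def load_kaggle_credentials_into_env(token_path):
    """Read a kaggle.json token and expose it through environment variables."""
    with open(token_path) as file:
        credentials = json.load(file)
    os.environ["KAGGLE_USERNAME"] = credentials["username"]
    os.environ["KAGGLE_KEY"] = credentials["key"]
    return credentials["username"]


# Assumed location of the token, relative to the notebook folder
token_path = os.path.join("..", "data", "kaggle.json")
if os.path.exists(token_path):
    user = load_kaggle_credentials_into_env(token_path)
    print("Credentials loaded for", user)
```

The variables only last for the current Python process and must be set before the Kaggle client authenticates.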

#### Step 3: Use Kaggle API to download files

Running the cells below downloads the images and the annotation file (which includes the image metadata) into the `data/TACO` folder. The dataset is 2.8 GB, so make sure you have at least that much free disk space.
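
The available disk space can be checked from Python before starting the download. This is a minimal sketch using the standard library. The helper name `has_enough_free_space` is hypothetical, and because the archive is unzipped after download, the space actually needed may temporarily exceed the 2.8 GB figure.

```python
import shutil


def has_enough_free_space(path, required_gb):
    """Return True if the drive holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# Check the drive that holds the current working directory
if has_enough_free_space(".", 2.8):
    print("Enough free space for the TACO dataset.")
else:
    print("Less than 2.8 GB free; consider freeing up space first.")
```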

```python
# Set to True to download the data, which can be time-consuming
download_files = False
```

```python
from kaggle.api.kaggle_api_extended import KaggleApi

# Initialize Kaggle API
api = KaggleApi()

# Authenticate with your Kaggle credentials
api.authenticate()

# This is the online location of the dataset
dataset_name = 'kneroma/tacotrashdataset'

# Download the dataset if download_files is set to True
if download_files:
    api.dataset_download_files(
        dataset_name, path=os.path.join(data_directory, 'TACO'), unzip=True
    )
```
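
Once the download has finished, a quick look at the annotation file can confirm that it arrived intact. The sketch below assumes the file follows the COCO layout used by the TACO project, with top-level `images`, `annotations`, and `categories` lists. The helper name `summarize_coco_annotations` and the file location are assumptions and may need adjusting to match where the file ends up on your machine.

```python
import json
import os


def summarize_coco_annotations(annotation_path):
    """Count the entries in the main sections of a COCO-style annotation file."""
    with open(annotation_path) as file:
        coco = json.load(file)
    sections = ("images", "annotations", "categories")
    return {name: len(coco.get(name, [])) for name in sections}


# Assumed location, following the folder structure described earlier
annotation_path = os.path.join("..", "data", "TACO", "annotations.json")
if os.path.exists(annotation_path):
    print(summarize_coco_annotations(annotation_path))
else:
    print("annotations.json not found at", annotation_path)
```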

