# Downloading the dataset
This notebook downloads the raw datasets directly from Kaggle and unzips them into the expected directory. Other libraries and notebooks then use this raw data for preprocessing, feature engineering, and later steps.

#### Steps:
- [x] Set up the Kaggle command-line client
- [x] Download dataset from Kaggle
- [x] Decompress the files into the expected directories

In [1]:
# First we must mount Google Drive
import os.path
from google.colab import drive
GDRIVE_BASE_PATH = '/content/gdrive'
drive.mount(GDRIVE_BASE_PATH, force_remount=True)

# Clone the project from GitHub into `HOME_DIR` if it is not already there
HOME_DIR = f'{GDRIVE_BASE_PATH}/My Drive/tappy_parkinsons'
if not os.path.isdir(os.path.expanduser(HOME_DIR)):
  ! git clone https://github.com/sweeloke/tappy_parkinsons.git '$HOME_DIR'

# Go to the home directory and load the project setup
%cd '$HOME_DIR'
! git fetch origin && git reset --hard origin/master
from util.project_setup import ProjectSetup

Mounted at /content/gdrive
Cloning into '/content/gdrive/My Drive/tappy_parkinsons'...
remote: Enumerating objects: 94, done.
remote: Counting objects: 100% (94/94), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 94 (delta 38), reused 22 (delta 4), pack-reused 0
Unpacking objects: 100% (94/94), done.
/content/gdrive/My Drive/tappy_parkinsons
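
The clone-if-missing step in the cell above can also be sketched in plain Python, without Colab shell magics. This is a minimal sketch, not the project's own code: `needs_clone` and `ensure_repo` are hypothetical helper names, and the check assumes that a `.git` folder inside the target directory means the project was already cloned.

```python
import os
import subprocess


def needs_clone(target_dir):
    """Return True when target_dir does not yet contain a Git checkout."""
    return not os.path.isdir(os.path.join(target_dir, ".git"))


def ensure_repo(repo_url, target_dir, run=subprocess.run):
    """Clone repo_url into target_dir only when no checkout exists there.

    `run` is injectable (hypothetical design choice) so the guard can be
    exercised without network access. Returns True if a clone was
    attempted and False if it was skipped.
    """
    if not needs_clone(target_dir):
        return False
    run(["git", "clone", repo_url, target_dir], check=True)
    return True
```

In the notebook this might be called as `ensure_repo('https://github.com/sweeloke/tappy_parkinsons.git', HOME_DIR)`. Checking for `.git` rather than for the directory itself avoids skipping the clone when an empty folder was left behind by an earlier failed run.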


In [3]:
# Now we set up the Kaggle token so we can download the dataset directly from Kaggle
from google.colab import files
import os.path

if not os.path.isfile(os.path.expanduser('~/.kaggle/kaggle.json')):
  print('Could not find ~/.kaggle/kaggle.json to download the dataset. Please follow these steps:')
  print('  1. Go to your account on Kaggle')
  print('  2. Scroll to the API section and click on `Expire API Token` to remove previous tokens')
  print('  3. Click on `Create New API Token`. It will download a `kaggle.json` file to your machine')
  print('  4. Upload your kaggle.json in the box below')

  %cd /content
  files.upload()

  ! mkdir -p ~/.kaggle
  ! cp kaggle.json ~/.kaggle/
  ! chmod 600 ~/.kaggle/kaggle.json
  ! pip install -q kaggle

# This should output our desired dataset
! kaggle datasets list -s tappy

Could not find ~/.kaggle/kaggle.json to download the dataset. Please follow these steps:
  1. Go to your account on Kaggle
  2. Scroll to the API section and click on `Expire API Token` to remove previous tokens
  3. Click on `Create New API Token`. It will download a `kaggle.json` file to your machine
  4. Upload your kaggle.json in the box below
/content


Saving kaggle.json to kaggle.json
ref                                                     title                                           size  lastUpdated          downloadCount  
------------------------------------------------------  ----------------------------------------------  ----  -------------------  -------------  
valkling/tappy-keystroke-data-with-parkinsons-patients  Tappy Keystroke Data with Parkinson's Patients  85MB  2018-02-04 05:41:47            731  
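
The token installation in the cell above (`mkdir`, `cp`, `chmod 600`) can be sketched in Python with a small sanity check added. This is only a sketch under stated assumptions: `install_kaggle_token` is a hypothetical helper, and it assumes the downloaded `kaggle.json` is a JSON object with `username` and `key` fields.

```python
import json
import os
import shutil
import stat


def install_kaggle_token(src_path, kaggle_dir):
    """Validate a downloaded kaggle.json and copy it into kaggle_dir.

    The copy is restricted to owner read/write (mode 0600), mirroring
    the `chmod 600` step. Returns the destination path.
    """
    with open(src_path, encoding="utf-8") as fh:
        token = json.load(fh)

    # Assumption: a usable token is an object with these two fields.
    if not isinstance(token, dict):
        raise ValueError("kaggle.json does not contain a JSON object")
    missing = {"username", "key"} - set(token)
    if missing:
        raise ValueError(f"kaggle.json is missing fields: {sorted(missing)}")

    os.makedirs(kaggle_dir, exist_ok=True)
    dest = os.path.join(kaggle_dir, "kaggle.json")
    shutil.copyfile(src_path, dest)
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

A call such as `install_kaggle_token('/content/kaggle.json', os.path.expanduser('~/.kaggle'))` would stand in for the three shell commands. Failing early on a malformed file gives a clearer message than a later authentication error from the command-line client.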


In [4]:
# Now we download and unzip the datasets into our Google Drive
%mkdir -p '$ProjectSetup.raw_downloaded_dir'
%cd '$ProjectSetup.raw_downloaded_dir'

! kaggle datasets download --unzip -o valkling/tappy-keystroke-data-with-parkinsons-patients
! unzip -o -q '*.zip'
! rm *.zip

/content/gdrive/My Drive/tappy_parkinsons/data/raw_downloaded
Downloading tappy-keystroke-data-with-parkinsons-patients.zip to /content/gdrive/My Drive/tappy_parkinsons/data/raw_downloaded
 91% 77.0M/85.0M [00:01<00:00, 65.8MB/s]
100% 85.0M/85.0M [00:01<00:00, 73.2MB/s]

2 archives were successfully processed.
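
The `unzip -o -q '*.zip'` and `rm *.zip` steps above can be sketched with Python's standard `zipfile` module. This is a rough equivalent, not the notebook's own method: `extract_and_remove_zips` is a hypothetical helper name, and it assumes every `*.zip` file in the directory should be extracted in place and then deleted.

```python
import glob
import os
import zipfile


def extract_and_remove_zips(directory):
    """Extract every *.zip found directly in `directory`, then delete it.

    Existing files are overwritten, like `unzip -o`. Returns the list of
    archive file names that were processed.
    """
    processed = []
    # glob.escape guards against special characters in the directory path.
    pattern = os.path.join(glob.escape(directory), "*.zip")
    for archive in sorted(glob.glob(pattern)):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
        os.remove(archive)
        processed.append(os.path.basename(archive))
    return processed
```

Calling `extract_and_remove_zips(ProjectSetup.raw_downloaded_dir)` after the download would return the names of the inner archives it unpacked, which makes it easy to confirm how many were processed.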
