# 📊 **SageMaker Data Ingestion using Kaggle**
---

## Background 
This notebook demonstrates how to ingest data into SageMaker using the Kaggle API.
[Kaggle](https://www.kaggle.com/) is an online community platform that hosts datasets and ML challenges for data scientists and machine learning enthusiasts.

---


## **Setup** 

### Import Python Packages

In [14]:
import pandas as pd
import time

In [7]:
!pip install --quiet kaggle



To use the Kaggle API, you need a Kaggle account and an API token. Both are free; the [Kaggle API documentation](https://www.kaggle.com/docs/api) explains how to register and generate a token.

In [8]:
!mkdir -p ~/.kaggle && touch ~/.kaggle/kaggle.json # Create the directory and the JSON file that will store the Kaggle API credentials
kaggle_api_token = {"username":"<username>","key":"<api_key>"}  # Insert your own username and API key here

In [9]:
import json
import os

# Write the API credentials to ~/.kaggle/kaggle.json, the file created in the previous cell
with open(os.path.expanduser('~/.kaggle/kaggle.json'), 'w') as file:
    json.dump(kaggle_api_token, file)


For security reasons, we must ensure that other users of our computer do not have read access to our Kaggle credentials.

In [10]:
!chmod 600 ~/.kaggle/kaggle.json
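
The three steps above (create the file, write the token, restrict permissions) can also be done in a single Python helper. The following is a minimal sketch, not part of the original notebook: the function name `write_kaggle_credentials` and its `config_dir` parameter are hypothetical, and the demo writes placeholder credentials to a temporary directory rather than to `~/.kaggle`.

```python
import json
import os
import stat
import tempfile


def write_kaggle_credentials(token: dict, config_dir: str) -> str:
    """Write `token` to <config_dir>/kaggle.json, readable by the owner only."""
    os.makedirs(config_dir, mode=0o700, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    # Create the file with mode 0o600 so it is never world-readable, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(token, fh)
    # Also tighten permissions if the file already existed with a looser mode.
    os.chmod(path, 0o600)
    return path


# Demo against a temporary directory (assumption: a POSIX system such as Linux or macOS).
demo_dir = os.path.join(tempfile.mkdtemp(), ".kaggle")
demo_path = write_kaggle_credentials(
    {"username": "<username>", "key": "<api_key>"}, demo_dir
)
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
print(oct(demo_mode))  # 0o600 on Linux/macOS
```

To use it for real, pass `os.path.expanduser("~/.kaggle")` as `config_dir` together with your own token.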

Now that the access token is configured, we can list and download the available datasets. This might take some time depending on your network connection.

In [16]:
%%time

!kaggle datasets list # List available datasets
!kaggle datasets download -d iamsouravbanerjee/game-of-thrones-dataset --unzip # Download and unzip the dataset

ref                                                             title                                       size  lastUpdated          downloadCount  voteCount  usabilityRating  
--------------------------------------------------------------  -----------------------------------------  -----  -------------------  -------------  ---------  ---------------  
akshaydattatraykhare/diabetes-dataset                           Diabetes Dataset                             9KB  2022-10-06 08:55:25          12997        409  1.0              
whenamancodes/covid-19-coronavirus-pandemic-dataset             COVID -19 Coronavirus Pandemic Dataset      11KB  2022-09-30 04:05:11          10257        321  1.0              
thedevastator/240000-household-electricity-consumption-records  Household Electricity Consumption            3MB  2022-10-24 01:22:40            869         27  1.0              
akshaydattatraykhare/data-for-admission-in-the-university       Data for Admission in the University     
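
Outside a notebook, the `!` shell magic is not available, so the same download can be driven from plain Python with `subprocess`. This is a rough sketch under stated assumptions: the helper names `build_download_command` and `download_dataset` are hypothetical, the `kaggle` CLI is assumed to be installed and configured as described above, and `-p` is the CLI option that sets the destination folder.

```python
import shutil
import subprocess
from typing import List


def build_download_command(dataset: str, dest: str = ".") -> List[str]:
    """Return the argument list for `kaggle datasets download`."""
    return ["kaggle", "datasets", "download", "-d", dataset, "-p", dest, "--unzip"]


def download_dataset(dataset: str, dest: str = ".") -> None:
    """Run the Kaggle CLI; raises CalledProcessError if the download fails."""
    if shutil.which("kaggle") is None:
        raise RuntimeError("kaggle CLI not found; install it with `pip install kaggle`")
    subprocess.run(build_download_command(dataset, dest), check=True)


# Build (but do not run) the command for the dataset used in this notebook.
cmd = build_download_command("iamsouravbanerjee/game-of-thrones-dataset")
print(" ".join(cmd))
```

Calling `download_dataset("iamsouravbanerjee/game-of-thrones-dataset")` would then perform the actual download, which requires network access and valid credentials.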

Now that the dataset is downloaded, let us look at the contents of the CSV file. We will use pandas to load and display the data.

In [18]:
data = pd.read_csv("Game_of_Thrones.csv", header=0)
df = data.copy()
df.head() 


Unnamed: 0,Season,No. of Episode (Season),No. of Episode (Overall),Title of the Episode,Running Time (Minutes),Directed by,Written by,Original Air Date,U.S. Viewers (Millions),Music by,Cinematography by,Editing by,IMDb Rating,Rotten Tomatoes Rating (Percentage),Metacritic Ratings,Ordered,Filming Duration,Novel(s) Adapted,Synopsis
0,1,1,1,Winter Is Coming,61,Tim Van Patten,"David Benioff, D. B. Weiss",17-Apr-2011,2.22,Ramin Djawadi,Alik Sakharov,Oral Norrie Ottey,8.9,100,9.1,"March 2, 2010",Second half of 2010,A Game of Thrones,"North of the Seven Kingdoms of Westeros, Night..."
1,1,2,2,The Kingsroad,55,Tim Van Patten,"David Benioff, D. B. Weiss",24-Apr-2011,2.2,Ramin Djawadi,Alik Sakharov,Oral Norrie Ottey,8.6,100,8.9,"March 2, 2010",Second half of 2010,A Game of Thrones,"Ned, the new Hand of the King, travels to King..."
2,1,3,3,Lord Snow,57,Brian Kirk,"David Benioff, D. B. Weiss",1-May-2011,2.44,Ramin Djawadi,Marco Pontecorvo,Frances Parker,8.5,81,8.7,"March 2, 2010",Second half of 2010,A Game of Thrones,Ned attends the King's Small Council and learn...
3,1,4,4,"Cripples, Bastards, and Broken Things",55,Brian Kirk,Bryan Cogman,8-May-2011,2.45,Ramin Djawadi,Marco Pontecorvo,Frances Parker,8.6,100,9.1,"March 2, 2010",Second half of 2010,A Game of Thrones,"While returning to King's Landing, Tyrion stop..."
4,1,5,5,The Wolf and the Lion,54,Brian Kirk,"David Benioff, D. B. Weiss",15-May-2011,2.58,Ramin Djawadi,Marco Pontecorvo,Frances Parker,9.0,95,9.0,"March 2, 2010",Second half of 2010,A Game of Thrones,"King Robert's eunuch spy, Varys, has uncovered..."
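
The output above shows an `Unnamed: 0` column (the saved row index) and air dates stored as text such as `17-Apr-2011`. As an optional cleanup, here is a small sketch of how both could be handled at load time. It is an illustration rather than part of the original notebook: it reads a three-row, five-column excerpt of the rows shown above from an in-memory string instead of `Game_of_Thrones.csv`, and the variable names are hypothetical.

```python
import io

import pandas as pd

# Excerpt of the rows displayed above (subset of columns), used in place of the real file.
sample_csv = """,Season,Title of the Episode,Running Time (Minutes),Original Air Date,IMDb Rating
0,1,Winter Is Coming,61,17-Apr-2011,8.9
1,1,The Kingsroad,55,24-Apr-2011,8.6
2,1,Lord Snow,57,1-May-2011,8.5
"""

# index_col=0 uses the first (unnamed) column as the index instead of keeping it as data.
sample_df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Convert day-month-year strings such as "17-Apr-2011" into real datetimes.
sample_df["Original Air Date"] = pd.to_datetime(
    sample_df["Original Air Date"], format="%d-%b-%Y"
)

print(sample_df.dtypes)
print(sample_df["Running Time (Minutes)"].mean())
```

For the full dataset, the same two arguments would be applied to `pd.read_csv("Game_of_Thrones.csv", index_col=0)`, assuming every row follows the date format seen in the preview.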
