# Create Dataset using API

In this notebook, we'll go through the steps to access a website's information using its API.

We'll use the Twitch API provided by Free Code Camp (https://wind-bow.glitch.me/), which makes the API simple to use, to access a list of channels and their information.


## Import Libraries
We will use three libraries for this project: pandas to handle the dataset, requests to get the data from the API, and json to format the API response for display.

In [1]:
# Import necessary libraries
import pandas as pd
import requests
import json

## Understand the API
Let's first understand what data is available through the API, using the freecodecamp channel as an example. Later on, we will query all the channels in our list.

In [2]:
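# Request the freecodecamp channel and decode the JSON response
url = "https://wind-bow.glitch.me/twitch-api/channels/freecodecamp"
JSONContent = requests.get(url).json()
# Pretty-print the response with sorted keys so it is easy to read
content = json.dumps(JSONContent, indent=4, sort_keys=True)
print(content)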

{
    "_id": 79776140,
    "_links": {
        "chat": "https://api.twitch.tv/kraken/chat/freecodecamp",
        "commercial": "https://api.twitch.tv/kraken/channels/freecodecamp/commercial",
        "editors": "https://api.twitch.tv/kraken/channels/freecodecamp/editors",
        "follows": "https://api.twitch.tv/kraken/channels/freecodecamp/follows",
        "self": "https://api.twitch.tv/kraken/channels/freecodecamp",
        "stream_key": "https://api.twitch.tv/kraken/channels/freecodecamp/stream_key",
        "subscriptions": "https://api.twitch.tv/kraken/channels/freecodecamp/subscriptions",
        "teams": "https://api.twitch.tv/kraken/channels/freecodecamp/teams",
        "videos": "https://api.twitch.tv/kraken/channels/freecodecamp/videos"
    },
    "background": null,
    "banner": null,
    "broadcaster_language": "en",
    "created_at": "2015-01-14T03:36:47Z",
    "delay": null,
    "display_name": "FreeCodeCamp",
    "followers": 10122,
    "game": "Creative",
    "langua

We can see that there is a lot of useful information here. We can take '_id', 'display_name', 'status', 'followers', and 'views' and compile them into a dataset.
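
As a minimal sketch of that selection step, the snippet below pulls the five fields out of a response dictionary. The helper name `extract_fields` and the sample dictionary are assumptions for illustration: the sample is hand-written, and its "status" value is a placeholder rather than real API output.

```python
# Fields named in the text above; the helper and sample are illustrative only.
FIELDS = ["_id", "display_name", "status", "followers", "views"]


def extract_fields(channel_json, fields=FIELDS):
    """Return the selected values in order, using None for any missing key."""
    return [channel_json.get(field) for field in fields]


# Hand-written sample; "status" is a placeholder, not real API output.
sample = {
    "_id": 79776140,
    "display_name": "FreeCodeCamp",
    "status": "placeholder status text",
    "followers": 10122,
    "views": 163747,
    "game": "Creative",  # extra keys are simply ignored
}

row = extract_fields(sample)
print(row)
```

Using `dict.get` means a missing key becomes `None` instead of raising a `KeyError`, which fits with dropping incomplete rows later on.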

## Create the dataset

In [5]:
# List of channels we want to access
channels = ["ESL_SC2", "OgamingSC2", "cretetion", "freecodecamp", "storbeck", "habathcx", "RobotCaleb", "noobs2ninjas",
            "ninja", "shroud", "Dakotaz", "esltv_cs", "pokimane", "tsm_bjergsen", "boxbox", "wtcn", "a_seagull",
            "kinggothalion", "amazhs", "jahrein", "thenadeshot", "sivhd", "kingrichard"]

channels_list = []
# For each channel, we request its information from the API
for channel in channels:
    JSONContent = requests.get("https://wind-bow.glitch.me/twitch-api/channels/" + channel).json()
    # Skip channels for which the API returns an error instead of channel data
    if 'error' not in JSONContent:
        channels_list.append([JSONContent['_id'], JSONContent['display_name'], JSONContent['status'],
                             JSONContent['followers'], JSONContent['views']])
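
A somewhat more defensive version of the loop above might look like the sketch below. It is not the notebook's code: the function `collect_channels`, the injected `get_json` callable, and the stand-in `fake_get_json` (including its made-up error payload and channel values) are all assumptions, chosen so the example runs without network access.

```python
FIELDS = ["_id", "display_name", "status", "followers", "views"]


def collect_channels(names, get_json):
    """Build one row per channel and record the names that were skipped.

    get_json is any callable that maps a channel name to a decoded JSON dict.
    """
    rows, skipped = [], []
    for name in names:
        try:
            data = get_json(name)
        except Exception:
            # Network or JSON-decoding failure: skip this channel and move on
            skipped.append(name)
            continue
        if "error" in data:
            skipped.append(name)
            continue
        rows.append([data.get(field) for field in FIELDS])
    return rows, skipped


def fake_get_json(name):
    """Offline stand-in for the API; all values here are made up."""
    canned = {
        "alpha": {"_id": 1, "display_name": "Alpha", "status": "demo",
                  "followers": 10, "views": 100},
    }
    if name == "broken":
        raise ValueError("simulated decoding failure")
    return canned.get(name, {"error": "made-up error payload"})


rows, skipped = collect_channels(["alpha", "missing", "broken"], fake_get_json)
print(rows)
print(skipped)

# With the real API, get_json could be something like:
# lambda name: requests.get(
#     "https://wind-bow.glitch.me/twitch-api/channels/" + name, timeout=10
# ).json()
```

Passing the fetch function in as an argument keeps the row-building logic separate from the network call, so it can be checked with canned data first.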

We have all the information inside a list and now we can create a dataset out of it.

In [6]:
dataset = pd.DataFrame(channels_list)
dataset.sample(5)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 5 | 6726509 | Habathcx | Massively Effective | 14 | 764 |
| 2 | 90401618 | cretetion | It's a Divison kind of Day | 908 | 11631 |
| 1 | 71852806 | OgamingSC2 | UnderDogs - Rediffusion - Qualifier. | 40895 | 20694507 |
| 0 | 30220059 | ESL_SC2 | RERUN: StarCraft 2 - Terminator vs. Parting (P... | 135394 | 60991791 |
| 3 | 79776140 | FreeCodeCamp | Greg working on Electron-Vue boilerplate w/ Ak... | 10122 | 163747 |


Our dataset is ready, but we can see a couple of problems. First, the column headings do not describe the data the columns contain. Second, some entries may be empty.

In [5]:
dataset.columns = ['Id', 'Name', 'Status', 'Followers', 'Views']
dataset.dropna(axis = 0, how = 'any', inplace = True)
dataset.index = pd.RangeIndex(len(dataset.index))
dataset

|   | Id | Name | Status | Followers | Views |
|---|---|---|---|---|---|
| 0 | 30220059 | ESL_SC2 | RERUN: StarCraft 2 - Terminator vs. Parting (P... | 135394 | 60991791 |
| 1 | 71852806 | OgamingSC2 | UnderDogs - Rediffusion - Qualifier. | 40895 | 20694507 |
| 2 | 90401618 | cretetion | It's a Divison kind of Day | 908 | 11631 |
| 3 | 79776140 | FreeCodeCamp | Greg working on Electron-Vue boilerplate w/ Ak... | 10122 | 163747 |
| 4 | 6726509 | Habathcx | Massively Effective | 14 | 764 |
| 5 | 54925078 | RobotCaleb | Code wrangling | 20 | 4602 |
| 6 | 82534701 | noobs2ninjas | Building a new hackintosh for #programming and... | 835 | 48102 |
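
The renaming and cleaning steps can also be expressed as one chained expression by passing the column names when the DataFrame is built. The sketch below is an alternative to the cell above, not a replacement for it, and its rows are made-up sample data.

```python
import pandas as pd

# Made-up sample rows; the first one has a missing status on purpose
rows = [
    [101, "example_one", None, 5, 50],
    [102, "example_two", "Streaming", 20, 400],
    [103, "example_three", "Coding", 7, 90],
]
columns = ["Id", "Name", "Status", "Followers", "Views"]

dataset = (
    pd.DataFrame(rows, columns=columns)  # name the columns up front
    .dropna(axis=0, how="any")           # drop rows with any missing value
    .reset_index(drop=True)              # renumber the remaining rows from 0
)
print(dataset)
```

Here `reset_index(drop=True)` does the same job as assigning a new `pd.RangeIndex`, and avoiding `inplace=True` lets the steps be read from top to bottom.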


## Export the dataset
Our complete dataset is now ready to be exported to a .csv file.

In [6]:
dataset.to_csv("Dataset.csv", index = False)
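
One way to check an export like this is to read the file back and compare it with the original. The sketch below assumes a small stand-in DataFrame and writes to a temporary directory, so it does not touch the real 'Dataset.csv'.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the real dataset, using two rows shown earlier
dataset = pd.DataFrame(
    {
        "Id": [30220059, 71852806],
        "Name": ["ESL_SC2", "OgamingSC2"],
        "Followers": [135394, 40895],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Dataset.csv")
    # index=False keeps the row numbers out of the file
    dataset.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored)
```

Because the file was written with `index=False`, reading it back does not produce an extra unnamed index column.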

Our dataset is now available as the file 'Dataset.csv'. We can use a similar procedure to access information from any site that exposes its data through an API.