# Get RAW data

## Modules

In [1]:
import requests
import json

We will use wikitable2json, a "Wikipedia Table API", to extract data from the Wikipedia pages that list all the episodes of The Simpsons.

References:
- https://www.wikitable2json.com/
- https://github.com/atye/wikitable2json/

Wikipedia pages:
- https://en.wikipedia.org/wiki/List_of_The_Simpsons_episodes_(seasons_1%E2%80%9320)
- https://en.wikipedia.org/wiki/List_of_The_Simpsons_episodes_(season_21%E2%80%93present)

Wikipedia stores the episodes of each season in a separate table. We have to exclude the first table of each page because it is a sort of index.

Note that the API numbers the tables starting from 0, so we have to use the table=1 parameter to get the second table, and so on.

https://www.wikitable2json.com/api/List_of_The_Simpsons_episodes_(seasons_1%E2%80%9320)?table=1&keyRows=1&lang=en
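As an illustration of how such a URL is put together, the sketch below assembles it from the page name and the query parameters using only the standard library. The helper name `build_table_url` and the constant `API_BASE` are hypothetical and introduced only for this example; the parameter names are the ones that appear in the example URL above.

```python
from urllib.parse import urlencode

# Base endpoint taken from the example URL above
API_BASE = "https://www.wikitable2json.com/api/"


def build_table_url(page, table, key_rows=1, lang="en"):
    """Build the request URL for one table of a Wikipedia page.

    `table` is 0-indexed, so table=1 selects the second table on the page.
    `page` is expected to be already percent-encoded, as in the notebook.
    """
    query = urlencode({"table": table, "keyRows": key_rows, "lang": lang})
    return f"{API_BASE}{page}?{query}"


page = "List_of_The_Simpsons_episodes_(seasons_1%E2%80%9320)"
print(build_table_url(page, table=1))
```

Calling it with `table=1` should reproduce the example URL, and changing only `table` selects a different table of the same page.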


In [2]:
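url1 = 'https://www.wikitable2json.com/api/List_of_The_Simpsons_episodes_(seasons_1%E2%80%9320)'
url2 = 'https://www.wikitable2json.com/api/List_of_The_Simpsons_episodes_(season_21%E2%80%93present)'
# Query parameters sent with every request; 'table' is overwritten inside the download loops.
# Note: the example URL above names the key-row parameter 'keyRows', while this payload uses 'Rows'.
payload = {'Rows': 1, 'lang': 'en', 'table': 1}
print("Parameters: ",payload)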

Parameters:  {'Rows': 1, 'lang': 'en', 'table': 1}


In [3]:
destination_folder='01.Raw data'
destination_filename_prefix='Episodes_RAW_S'
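The download loops write one file per season into this folder, using a zero-padded season number. A minimal sketch of that naming scheme with `pathlib` is shown here; the helper names `season_filepath` and `ensure_folder` are hypothetical, and the sketch assumes the folder may not exist yet when the notebook is first run.

```python
from pathlib import Path

destination_folder = "01.Raw data"
destination_filename_prefix = "Episodes_RAW_S"


def season_filepath(season, folder=destination_folder, prefix=destination_filename_prefix):
    """Return the output path for a season, e.g. season 3 -> Episodes_RAW_S03.json."""
    return Path(folder) / f"{prefix}{season:02d}.json"


def ensure_folder(folder=destination_folder):
    """Create the destination folder if it is missing and return it as a Path."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    return path


print(season_filepath(3))
```

Calling `ensure_folder()` once before the loops would avoid a `FileNotFoundError` from `open()` when the folder is absent.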

## Download of episodes

### Part one

In [4]:
# range(1, 20) yields table indices 1..19, saved as seasons 01..19
for x in range(1, 20):
    # Update parameters
    payload.update({"table": x})
    print(payload)

    # Calculate destination file path
    destination_filepath=destination_folder+'/'+destination_filename_prefix+str(x).zfill(2)+'.json'

    # Request of API response (payload is sent as the query string)
    resp = requests.get(url1, params=payload)
    print("Season %d: %d" % (x,resp.status_code))

    # Read response and transform it to JSON
    data = resp.json()

    # Write to file
    with open(destination_filepath, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

{'Rows': 1, 'lang': 'en', 'table': 1}
Season 1: 200
{'Rows': 1, 'lang': 'en', 'table': 2}
Season 2: 200
{'Rows': 1, 'lang': 'en', 'table': 3}
Season 3: 200
{'Rows': 1, 'lang': 'en', 'table': 4}
Season 4: 200
{'Rows': 1, 'lang': 'en', 'table': 5}
Season 5: 200
{'Rows': 1, 'lang': 'en', 'table': 6}
Season 6: 200
{'Rows': 1, 'lang': 'en', 'table': 7}
Season 7: 200
{'Rows': 1, 'lang': 'en', 'table': 8}
Season 8: 200
{'Rows': 1, 'lang': 'en', 'table': 9}
Season 9: 200
{'Rows': 1, 'lang': 'en', 'table': 10}
Season 10: 200
{'Rows': 1, 'lang': 'en', 'table': 11}
Season 11: 200
{'Rows': 1, 'lang': 'en', 'table': 12}
Season 12: 200
{'Rows': 1, 'lang': 'en', 'table': 13}
Season 13: 200
{'Rows': 1, 'lang': 'en', 'table': 14}
Season 14: 200
{'Rows': 1, 'lang': 'en', 'table': 15}
Season 15: 200
{'Rows': 1, 'lang': 'en', 'table': 16}
Season 16: 200
{'Rows': 1, 'lang': 'en', 'table': 17}
Season 17: 200
{'Rows': 1, 'lang': 'en', 'table': 18}
Season 18: 200
{'Rows': 1, 'lang': 'en', 'table': 19}
Season 
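The loop above prints the status code but saves the response body regardless of it. A more defensive variant could look like the sketch below. The function name `download_table` and its `get` argument are hypothetical: `get` is meant to be `requests.get` (or any callable with the same interface), which keeps the sketch free of network access when it is tried out.

```python
import json
from pathlib import Path


def download_table(get, url, params, filepath, timeout=30):
    """Fetch one table and save it as JSON; raise if the HTTP request failed.

    `get` is any callable with the same interface as requests.get.
    """
    resp = get(url, params=params, timeout=timeout)
    # Stop on 4xx/5xx responses instead of writing an error body to disk
    resp.raise_for_status()
    data = resp.json()

    # Make sure the destination folder exists before writing
    filepath = Path(filepath)
    filepath.parent.mkdir(parents=True, exist_ok=True)
    with open(filepath, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    return data


# Possible use inside the loop (assumes `import requests` and the notebook's variables):
# download_table(requests.get, url1, dict(payload, table=x), destination_filepath)
```

Passing the getter as an argument is a design choice for this example only; calling `requests.get` directly inside the function would work just as well.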

### Part two

In [5]:
# Table indices to request from the second page (2 and 7 are not requested);
# the season number is counted separately, starting from 21
tables = [1,3,4,5,6,8,9,10,11,12,13,14,15,16]

for n, x in enumerate(tables, start=21):
    # Update parameters
    payload.update({"table": x})
    print(payload)

    # Calculate destination file path
    destination_filepath=destination_folder+'/'+destination_filename_prefix+str(n).zfill(2)+'.json'

    # Request of API response (payload is sent as the query string)
    resp = requests.get(url2, params=payload)
    print("Season %d: %d" % (n,resp.status_code))

    # Read response and transform it to JSON
    data = resp.json()

    # Write to file
    with open(destination_filepath, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False, indent=4)

{'Rows': 1, 'lang': 'en', 'table': 1}
Season 21: 200
{'Rows': 1, 'lang': 'en', 'table': 3}
Season 22: 200
{'Rows': 1, 'lang': 'en', 'table': 4}
Season 23: 200
{'Rows': 1, 'lang': 'en', 'table': 5}
Season 24: 200
{'Rows': 1, 'lang': 'en', 'table': 6}
Season 25: 200
{'Rows': 1, 'lang': 'en', 'table': 8}
Season 26: 200
{'Rows': 1, 'lang': 'en', 'table': 9}
Season 27: 200
{'Rows': 1, 'lang': 'en', 'table': 10}
Season 28: 200
{'Rows': 1, 'lang': 'en', 'table': 11}
Season 29: 200
{'Rows': 1, 'lang': 'en', 'table': 12}
Season 30: 200
{'Rows': 1, 'lang': 'en', 'table': 13}
Season 31: 200
{'Rows': 1, 'lang': 'en', 'table': 14}
Season 32: 200
{'Rows': 1, 'lang': 'en', 'table': 15}
Season 33: 200
{'Rows': 1, 'lang': 'en', 'table': 16}
Season 34: 200
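After both parts have run, it can be useful to check which season files actually ended up on disk. The sketch below is one way to do that; the function name `missing_seasons` is hypothetical, and the range 1 to 34 is an assumption based on the season numbers printed above.

```python
from pathlib import Path


def missing_seasons(folder, prefix, seasons):
    """Return the season numbers that have no JSON file in `folder`."""
    folder = Path(folder)
    return [s for s in seasons if not (folder / f"{prefix}{s:02d}.json").is_file()]


# Folder and prefix as defined earlier in the notebook
print(missing_seasons("01.Raw data", "Episodes_RAW_S", range(1, 35)))
```

An empty list would mean every season in the range has a file. Any number in the list points to a table that was never requested or a download that did not complete.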
