# Data Import - Working with Web APIs and JSON

### Importing Data from JSON Files

In [1]:
import pandas as pd 
import json 

We can load the data stored in a JSON file into a Python object with `json.load()`.
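As a minimal sketch of how JSON types map to Python types, the snippet below parses a short, hand-written string with `json.loads()`, the string counterpart of `json.load()`. The two records are trimmed copies of values shown in this notebook. Storing Titanic's missing collection as `null` is an assumption about the file's layout.

```python
import json

# Hand-written, trimmed sample shaped like blockbusters.json (assumed layout).
raw = """
[
  {"title": "Avengers: Endgame", "id": 299534, "runtime": 181,
   "genres": [{"id": 12, "name": "Adventure"}],
   "belongs_to_collection": {"id": 86311, "name": "The Avengers Collection"}},
  {"title": "Titanic", "id": 597, "runtime": 194,
   "genres": [{"id": 18, "name": "Drama"}],
   "belongs_to_collection": null}
]
"""

# json.loads parses a string; json.load does the same for an open file.
data = json.loads(raw)

# JSON arrays become lists, objects become dicts, and null becomes None.
print(type(data).__name__)               # list
print(type(data[0]).__name__)            # dict
print(data[1]["belongs_to_collection"])  # None
```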

In [2]:
with open("blockbusters.json") as f:
    data = json.load(f)

In [6]:
type(data)

list

In [7]:
len(data)

18

The file contains a list of 18 movie records. Let's check the first element.

In [8]:
data[0]

{'title': 'Avengers: Endgame',
 'id': 299534,
 'revenue': 2797800564,
 'genres': [{'id': 12, 'name': 'Adventure'},
  {'id': 878, 'name': 'Science Fiction'},
  {'id': 28, 'name': 'Action'}],
 'belongs_to_collection': {'id': 86311,
  'name': 'The Avengers Collection',
  'poster_path': '/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg',
  'backdrop_path': '/zuW6fOiusv4X9nnW3paHGfXcSll.jpg'},
 'runtime': 181}

Next, let's convert the list of dictionaries into a DataFrame.

In [9]:
df = pd.DataFrame(data)
df

,title,id,revenue,genres,belongs_to_collection,runtime
0,Avengers: Endgame,299534,2797800564,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...","{'id': 86311, 'name': 'The Avengers Collection...",181
1,Avatar,19995,2787965087,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 87096, 'name': 'Avatar Collection', 'po...",162
2,Star Wars: The Force Awakens,140607,2068223624,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 10, 'name': 'Star Wars Collection', 'po...",136
3,Avengers: Infinity War,299536,2046239637,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...","{'id': 86311, 'name': 'The Avengers Collection...",149
4,Titanic,597,1845034188,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",,194
5,Jurassic World,135397,1671713208,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 328, 'name': 'Jurassic Park Collection'...",124
6,The Lion King,420818,1656943394,"[{'id': 12, 'name': 'Adventure'}, {'id': 10751...",,118
7,The Avengers,24428,1519557910,"[{'id': 878, 'name': 'Science Fiction'}, {'id'...","{'id': 86311, 'name': 'The Avengers Collection...",143
8,Furious 7,168259,1506249360,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...","{'id': 9485, 'name': 'The Fast and the Furious...",137
9,Avengers: Age of Ultron,99861,1405403694,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 86311, 'name': 'The Avengers Collection...",141
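The nested columns hold ordinary Python objects, so they can still be inspected directly. A small sketch on a trimmed, hand-written subset of the records above; `genre_names` is a hypothetical helper column, not part of the data:

```python
import pandas as pd

# Trimmed subset of the records shown above (only two fields kept).
data = [
    {"title": "Avengers: Endgame",
     "genres": [{"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]},
    {"title": "Titanic", "genres": [{"id": 18, "name": "Drama"}]},
]
df = pd.DataFrame(data)

# Each cell of the object-dtype "genres" column is a plain list of dicts.
first_cell = df.loc[0, "genres"]

# Pull just the genre names out of every nested list.
df["genre_names"] = df["genres"].apply(lambda genres: [g["name"] for g in genres])
print(df[["title", "genre_names"]])
```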


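The DataFrame still contains nested information: each `genres` cell holds a list of dictionaries, and each `belongs_to_collection` cell holds a dictionary (or is empty when the movie has no collection). Before flattening the data, let's load it in a more direct way with `pd.read_json()`, which reads the file straight into a DataFrame.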

In [13]:
df = pd.read_json("blockbusters.json")
df

,title,id,revenue,genres,belongs_to_collection,runtime
0,Avengers: Endgame,299534,2797800564,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...","{'id': 86311, 'name': 'The Avengers Collection...",181
1,Avatar,19995,2787965087,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 87096, 'name': 'Avatar Collection', 'po...",162
2,Star Wars: The Force Awakens,140607,2068223624,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 10, 'name': 'Star Wars Collection', 'po...",136
3,Avengers: Infinity War,299536,2046239637,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...","{'id': 86311, 'name': 'The Avengers Collection...",149
4,Titanic,597,1845034188,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",,194
5,Jurassic World,135397,1671713208,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 328, 'name': 'Jurassic Park Collection'...",124
6,The Lion King,420818,1656943394,"[{'id': 12, 'name': 'Adventure'}, {'id': 10751...",,118
7,The Avengers,24428,1519557910,"[{'id': 878, 'name': 'Science Fiction'}, {'id'...","{'id': 86311, 'name': 'The Avengers Collection...",143
8,Furious 7,168259,1506249360,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...","{'id': 9485, 'name': 'The Fast and the Furious...",137
9,Avengers: Age of Ultron,99861,1405403694,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...","{'id': 86311, 'name': 'The Avengers Collection...",141
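`pd.read_json()` accepts a file path, a URL, or a file-like object. To try it without `blockbusters.json`, one option is to wrap a JSON string in `io.StringIO`; passing a literal JSON string directly is deprecated since pandas 2.1. The one-record sample below is trimmed from the Avatar row above.

```python
import io

import pandas as pd

# One trimmed record from the table above, as a JSON string.
raw = '[{"title": "Avatar", "id": 19995, "genres": [{"id": 28, "name": "Action"}], "runtime": 162}]'

# Wrapping the string in StringIO makes it a file-like object for read_json.
df = pd.read_json(io.StringIO(raw))

# As with pd.DataFrame(data), nested values stay as Python lists/dicts.
print(type(df.loc[0, "genres"]).__name__)  # list
```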


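Let's normalize the `belongs_to_collection` column. `pd.json_normalize()` expands nested dictionaries into separate columns, and `sep` sets the separator used to build the new column names (for example, `belongs_to_collection_id`).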

In [14]:
pd.json_normalize(data=data, sep="_")

,title,id,revenue,genres,runtime,belongs_to_collection_id,belongs_to_collection_name,belongs_to_collection_poster_path,belongs_to_collection_backdrop_path,belongs_to_collection
0,Avengers: Endgame,299534,2797800564,"[{'id': 12, 'name': 'Adventure'}, {'id': 878, ...",181,86311.0,The Avengers Collection,/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg,/zuW6fOiusv4X9nnW3paHGfXcSll.jpg,
1,Avatar,19995,2787965087,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",162,87096.0,Avatar Collection,/nslJVsO58Etqkk17oXMuVK4gNOF.jpg,/8nCr9W7sKus2q9PLbYsnT7iCkuT.jpg,
2,Star Wars: The Force Awakens,140607,2068223624,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",136,10.0,Star Wars Collection,/iTQHKziZy9pAAY4hHEDCGPaOvFC.jpg,/d8duYyyC9J5T825Hg7grmaabfxQ.jpg,
3,Avengers: Infinity War,299536,2046239637,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",149,86311.0,The Avengers Collection,/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg,/zuW6fOiusv4X9nnW3paHGfXcSll.jpg,
4,Titanic,597,1845034188,"[{'id': 18, 'name': 'Drama'}, {'id': 10749, 'n...",194,,,,,
5,Jurassic World,135397,1671713208,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",124,328.0,Jurassic Park Collection,/qIm2nHXLpBBdMxi8dvfrnDkBUDh.jpg,/pJjIH9QN0OkHFV9eue6XfRVnPkr.jpg,
6,The Lion King,420818,1656943394,"[{'id': 12, 'name': 'Adventure'}, {'id': 10751...",118,,,,,
7,The Avengers,24428,1519557910,"[{'id': 878, 'name': 'Science Fiction'}, {'id'...",143,86311.0,The Avengers Collection,/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg,/zuW6fOiusv4X9nnW3paHGfXcSll.jpg,
8,Furious 7,168259,1506249360,"[{'id': 28, 'name': 'Action'}, {'id': 53, 'nam...",137,9485.0,The Fast and the Furious Collection,/uv63yAGg1zETAs1XQsOQpava87l.jpg,/z5A5W3WYJc3UVEWljSGwdjDgQ0j.jpg,
9,Avengers: Age of Ultron,99861,1405403694,"[{'id': 28, 'name': 'Action'}, {'id': 12, 'nam...",141,86311.0,The Avengers Collection,/yFSIUVTCvgYrpalUktulvk3Gi5Y.jpg,/zuW6fOiusv4X9nnW3paHGfXcSll.jpg,
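To see what `sep` controls and where the float IDs (such as `86311.0`) come from, here is a small sketch on two trimmed records, one with a collection and one without. Rows without a collection get `NaN`, which forces the ID column to a float dtype.

```python
import pandas as pd

# Two trimmed records: Avatar has a collection, Titanic does not.
records = [
    {"title": "Avatar", "id": 19995,
     "belongs_to_collection": {"id": 87096, "name": "Avatar Collection"}},
    {"title": "Titanic", "id": 597, "belongs_to_collection": None},
]

flat_default = pd.json_normalize(records)            # default separator is "."
flat_underscore = pd.json_normalize(records, sep="_")

print([c for c in flat_default.columns if c.startswith("belongs_to_collection.")])
print(flat_underscore[["title", "belongs_to_collection_id"]])
```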


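Next, let's flatten the `genres` column. `record_path` points to the nested list, so each genre becomes its own row; `meta` attaches the movie's `title` and `id` to every row; and `record_prefix` renames the genre columns to `genre_id` and `genre_name` so they do not clash with the movie `id`.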

In [17]:
pd.json_normalize(data=data, record_path="genres", meta=["title", "id"],
                  record_prefix="genre_")

,genre_id,genre_name,title,id
0,12,Adventure,Avengers: Endgame,299534
1,878,Science Fiction,Avengers: Endgame,299534
2,28,Action,Avengers: Endgame,299534
3,28,Action,Avatar,19995
4,12,Adventure,Avatar,19995
5,14,Fantasy,Avatar,19995
6,878,Science Fiction,Avatar,19995
7,28,Action,Star Wars: The Force Awakens,140607
8,12,Adventure,Star Wars: The Force Awakens,140607
9,878,Science Fiction,Star Wars: The Force Awakens,140607
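The `record_prefix` argument is not only cosmetic here: every genre and every movie has an `id`, and `json_normalize` raises a `ValueError` rather than letting the meta `id` overwrite the genre `id`. A sketch on two trimmed records shows both the working call and the conflict that occurs without the prefix.

```python
import pandas as pd

# Two trimmed records; both the movies and their genres carry an "id" key.
movies = [
    {"title": "Avengers: Endgame", "id": 299534,
     "genres": [{"id": 12, "name": "Adventure"}, {"id": 878, "name": "Science Fiction"}]},
    {"title": "Avatar", "id": 19995, "genres": [{"id": 28, "name": "Action"}]},
]

# One row per genre; meta copies the parent movie's title and id onto each row.
genres = pd.json_normalize(movies, record_path="genres", meta=["title", "id"],
                           record_prefix="genre_")
print(genres)

# Without record_prefix, the genre "id" and the meta "id" collide.
try:
    pd.json_normalize(movies, record_path="genres", meta=["title", "id"])
    conflict_error = None
except ValueError as err:
    conflict_error = str(err)
print(conflict_error)
```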


## Working with APIs and JSON 

Let's import the `requests` library to send HTTP requests, and let pandas display up to 30 columns.

In [23]:
import requests
pd.options.display.max_columns = 30

Use your own API key from The Movie Database (TMDb) for authentication.

In [27]:
api_key = "85e49e8973f1dedd4876f20043342a39"
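Hard-coding a key in a notebook makes it easy to leak when the notebook is shared. A common alternative is to read it from an environment variable; the name `TMDB_API_KEY` below is an assumption, not something TMDb requires.

```python
import os

# TMDB_API_KEY is a hypothetical variable name; set it in your shell first,
# e.g. `export TMDB_API_KEY=...` on macOS/Linux.
api_key = os.environ.get("TMDB_API_KEY", "")

if not api_key:
    # Without a valid key, TMDb is expected to reject the request.
    print("TMDB_API_KEY is not set.")
```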

We want to pull the data for Star Wars: The Force Awakens, whose TMDb movie ID is 140607.

In [28]:
movie_id = 140607

In [34]:
movie_api = "https://api.themoviedb.org/3/movie/{}?api_key="
movie_api

'https://api.themoviedb.org/3/movie/{}?api_key='

Let's build the URL for the HTTP request by filling in the movie ID and appending the API key.

In [35]:
url = movie_api.format(movie_id) + api_key
url

'https://api.themoviedb.org/3/movie/140607?api_key=85e49e8973f1dedd4876f20043342a39'

In [36]:
r = requests.get(url)
r

<Response [200]>

Let's parse the JSON body of the response into a Python object.

In [37]:
data = r.json()

In [38]:
data

{'adult': False,
 'backdrop_path': '/8BTsTfln4jlQrLXUBquXJ0ASQy9.jpg',
 'belongs_to_collection': {'id': 10,
  'name': 'Star Wars Collection',
  'poster_path': '/r8Ph5MYXL04Qzu4QBbq2KjqwtkQ.jpg',
  'backdrop_path': '/d8duYyyC9J5T825Hg7grmaabfxQ.jpg'},
 'budget': 245000000,
 'genres': [{'id': 12, 'name': 'Adventure'},
  {'id': 28, 'name': 'Action'},
  {'id': 878, 'name': 'Science Fiction'}],
 'homepage': 'http://www.starwars.com/films/star-wars-episode-vii',
 'id': 140607,
 'imdb_id': 'tt2488496',
 'original_language': 'en',
 'original_title': 'Star Wars: The Force Awakens',
 'overview': 'Thirty years after defeating the Galactic Empire, Han Solo and his allies face a new threat from the evil Kylo Ren and his army of Stormtroopers.',
 'popularity': 52.573,
 'poster_path': '/wqnLdwVXoBjKibFRR5U3y0aDUhs.jpg',
 'production_companies': [{'id': 1,
   'logo_path': '/o86DbpburjxrqAzEDhXZcyE8pDb.png',
   'name': 'Lucasfilm Ltd.',
   'origin_country': 'US'},
  {'id': 11461,
   'logo_path': '/p9Fo

In [39]:
type(data)

dict

For further investigation, let's convert the dictionary into a one-row pandas DataFrame.

In [40]:
df = pd.Series(data).to_frame().T
df
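`pd.Series(data).to_frame().T` works, but because the Series mixes numbers, strings and lists it has `object` dtype, and so does every column after the transpose. Wrapping the dictionary in a list, `pd.DataFrame([data])`, builds the same one-row frame while letting pandas infer a dtype per column. A sketch with a trimmed subset of the response fields:

```python
import pandas as pd

# A trimmed subset of the fields in the response above.
movie = {"id": 140607, "title": "Star Wars: The Force Awakens",
         "budget": 245000000, "genres": [{"id": 12, "name": "Adventure"}]}

via_series = pd.Series(movie).to_frame().T   # approach used above
via_list = pd.DataFrame([movie])             # list of one record

print(via_series.shape, via_list.shape)      # (1, 4) (1, 4)
print(via_series.dtypes.unique())            # every column is object
print(via_list.dtypes)                       # id and budget get integer dtypes
```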

,adult,backdrop_path,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,popularity,poster_path,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,/8BTsTfln4jlQrLXUBquXJ0ASQy9.jpg,"{'id': 10, 'name': 'Star Wars Collection', 'po...",245000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",http://www.starwars.com/films/star-wars-episod...,140607,tt2488496,en,Star Wars: The Force Awakens,Thirty years after defeating the Galactic Empi...,52.573,/wqnLdwVXoBjKibFRR5U3y0aDUhs.jpg,"[{'id': 1, 'logo_path': '/o86DbpburjxrqAzEDhXZ...","[{'iso_3166_1': 'US', 'name': 'United States o...",2015-12-15,2068223624,136,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Every generation has a story.,Star Wars: The Force Awakens,False,7.295,18224


Next, let's flatten the nested dictionaries with `pd.json_normalize()`.

In [41]:
pd.json_normalize(data, sep="_")

,adult,backdrop_path,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,popularity,poster_path,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,belongs_to_collection_id,belongs_to_collection_name,belongs_to_collection_poster_path,belongs_to_collection_backdrop_path
0,False,/8BTsTfln4jlQrLXUBquXJ0ASQy9.jpg,245000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 28, '...",http://www.starwars.com/films/star-wars-episod...,140607,tt2488496,en,Star Wars: The Force Awakens,Thirty years after defeating the Galactic Empi...,52.573,/wqnLdwVXoBjKibFRR5U3y0aDUhs.jpg,"[{'id': 1, 'logo_path': '/o86DbpburjxrqAzEDhXZ...","[{'iso_3166_1': 'US', 'name': 'United States o...",2015-12-15,2068223624,136,"[{'english_name': 'English', 'iso_639_1': 'en'...",Released,Every generation has a story.,Star Wars: The Force Awakens,False,7.295,18224,10,Star Wars Collection,/r8Ph5MYXL04Qzu4QBbq2KjqwtkQ.jpg,/d8duYyyC9J5T825Hg7grmaabfxQ.jpg


In [42]:
pd.json_normalize(data=data, record_path="genres", meta="title")

,id,name,title
0,12,Adventure,Star Wars: The Force Awakens
1,28,Action,Star Wars: The Force Awakens
2,878,Science Fiction,Star Wars: The Force Awakens


In [43]:
pd.json_normalize(data=data, record_path="production_companies", meta="title")

,id,logo_path,name,origin_country,title
0,1,/o86DbpburjxrqAzEDhXZcyE8pDb.png,Lucasfilm Ltd.,US,Star Wars: The Force Awakens
1,11461,/p9FoEt5shEKRWRKVIlvFaEmRnun.png,Bad Robot,US,Star Wars: The Force Awakens
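Both calls above pass only `meta="title"`. To also carry the movie `id` (useful for joining these tables back to the movie later), the company `id` and the meta `id` would clash; `meta_prefix` is one way around that. A sketch on the two companies shown above, with `logo_path` omitted:

```python
import pandas as pd

# The two companies from the output above; logo_path omitted for brevity.
movie = {
    "id": 140607,
    "title": "Star Wars: The Force Awakens",
    "production_companies": [
        {"id": 1, "name": "Lucasfilm Ltd.", "origin_country": "US"},
        {"id": 11461, "name": "Bad Robot", "origin_country": "US"},
    ],
}

# meta_prefix renames the meta columns to movie_id and movie_title,
# so they no longer collide with each company's own "id".
companies = pd.json_normalize(movie, record_path="production_companies",
                              meta=["id", "title"], meta_prefix="movie_")
print(list(companies.columns))
# ['id', 'name', 'origin_country', 'movie_id', 'movie_title']
```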


### Discover Module

In [50]:
discover_api = "https://api.themoviedb.org/3/discover/movie?api_key="

Let's pull the movies released between 2020-01-01 and 2020-02-29.

In [51]:
query = "&primary_release_date.gte=2020-01-01&primary_release_date.lte=2020-02-29"

Let's create the URL for the request.

In [52]:
url = discover_api + api_key + query
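Concatenating strings works, but the query string can also be built from a dictionary with `urllib.parse.urlencode()`, which inserts the `&` and `=` separators and escapes any special characters. `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Placeholder key; replace with your own TMDb API key.
params = {
    "api_key": "YOUR_API_KEY",
    "primary_release_date.gte": "2020-01-01",
    "primary_release_date.lte": "2020-02-29",
}

base = "https://api.themoviedb.org/3/discover/movie"
url = base + "?" + urlencode(params)
print(url)
# https://api.themoviedb.org/3/discover/movie?api_key=YOUR_API_KEY&primary_release_date.gte=2020-01-01&primary_release_date.lte=2020-02-29
```

With `requests`, the same dictionary can be passed as `requests.get(base, params=params)`, which encodes it in the same way.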

In [53]:
data = requests.get(url).json()

In [54]:
data

{'page': 1,
 'results': [{'adult': False,
   'backdrop_path': '/stmYfCUGd8Iy6kAMBr6AmWqx8Bq.jpg',
   'genre_ids': [28, 878, 35, 10751],
   'id': 454626,
   'original_language': 'en',
   'original_title': 'Sonic the Hedgehog',
   'overview': 'Powered with incredible speed, Sonic The Hedgehog embraces his new home on Earth. That is, until Sonic sparks the attention of super-uncool evil genius Dr. Robotnik. Now it’s super-villain vs. super-sonic in an all-out race across the globe to stop Robotnik from using Sonic’s unique power for world domination.',
   'popularity': 81.297,
   'poster_path': '/aQvJ5WPzZgYVDrxLX4R6cLJCEaQ.jpg',
   'release_date': '2020-02-12',
   'title': 'Sonic the Hedgehog',
   'video': False,
   'vote_average': 7.4,
   'vote_count': 8817},
  {'adult': False,
   'backdrop_path': '/uozb2VeD87YmhoUP1RrGWfzuCrr.jpg',
   'genre_ids': [28, 80],
   'id': 495764,
   'original_language': 'en',
   'original_title': 'Birds of Prey (and the Fantabulous Emancipation of One Harley

In [55]:
pd.DataFrame(data)

,page,results,total_pages,total_results
0,1,"{'adult': False, 'backdrop_path': '/stmYfCUGd8...",277,5529
1,1,"{'adult': False, 'backdrop_path': '/uozb2VeD87...",277,5529
2,1,"{'adult': False, 'backdrop_path': '/yFRpUmsreY...",277,5529
3,1,"{'adult': False, 'backdrop_path': '/v7eMYrYpcT...",277,5529
4,1,"{'adult': False, 'backdrop_path': '/xFxk4vnirO...",277,5529
5,1,"{'adult': False, 'backdrop_path': '/sizHX5Vbwl...",277,5529
6,1,"{'adult': False, 'backdrop_path': '/4J1Vu6oGzt...",277,5529
7,1,"{'adult': False, 'backdrop_path': '/z8sNNjEXEp...",277,5529
8,1,"{'adult': False, 'backdrop_path': '/AgkGh4epJy...",277,5529
9,1,"{'adult': False, 'backdrop_path': '/isBjNHBblz...",277,5529
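`pd.DataFrame(data)` repeats the page-level fields (`page`, `total_pages`, `total_results`) in every row because pandas broadcasts scalars against the `results` list. If the page number is worth keeping next to each movie, `pd.json_normalize()` with `record_path="results"` and `meta` is one option. A sketch on a trimmed, two-movie payload using IDs and titles from this section's discover results:

```python
import pandas as pd

# Trimmed payload shaped like the discover response (two movies only).
payload = {
    "page": 1,
    "results": [
        {"id": 454626, "title": "Sonic the Hedgehog"},
        {"id": 481848, "title": "The Call of the Wild"},
    ],
    "total_pages": 277,
    "total_results": 5529,
}

# Scalars are broadcast against the list: every row repeats page and totals.
wrapped = pd.DataFrame(payload)

# One row per movie, keeping the page number as an extra column.
movies = pd.json_normalize(payload, record_path="results", meta=["page"])
print(movies)
```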


In [56]:
pd.DataFrame(data["results"])

,adult,backdrop_path,genre_ids,id,original_language,original_title,overview,popularity,poster_path,release_date,title,video,vote_average,vote_count
0,False,/stmYfCUGd8Iy6kAMBr6AmWqx8Bq.jpg,"[28, 878, 35, 10751]",454626,en,Sonic the Hedgehog,"Powered with incredible speed, Sonic The Hedge...",81.297,/aQvJ5WPzZgYVDrxLX4R6cLJCEaQ.jpg,2020-02-12,Sonic the Hedgehog,False,7.4,8817
1,False,/uozb2VeD87YmhoUP1RrGWfzuCrr.jpg,"[28, 80]",495764,en,Birds of Prey (and the Fantabulous Emancipatio...,"Harley Quinn joins forces with a singer, an as...",68.116,/h4VB6m0RwcicVEZvzftYZyKXs6K.jpg,2020-02-05,Birds of Prey (and the Fantabulous Emancipatio...,False,7.0,9555
2,False,/yFRpUmsreYO5Bc0HVBTsJsHIIox.jpg,"[12, 10751, 18]",481848,en,The Call of the Wild,Buck is a big-hearted dog whose blissful domes...,47.152,/33VdppGbeNxICrFUtW2WpGHvfYc.jpg,2020-02-19,The Call of the Wild,False,7.6,3338
3,False,/v7eMYrYpcTJORioMKMgjnzyewWH.jpg,[10749],679057,ko,가슴 큰 태희,Cha-wook and Min-joo are about to get married....,52.865,/aKoGgpKn6N04kWQp03LSAweN3dn.jpg,2020-02-27,Bosomy Tae-hee,False,6.5,4
4,False,/xFxk4vnirOtUxpOEWgA1MCRfy6J.jpg,"[10751, 16, 12, 35, 14]",508439,en,Onward,"In a suburban fantasy world, two teenage elf b...",44.437,/f4aul3FyD3jv3v4bul1IrkWZvzq.jpg,2020-02-29,Onward,False,7.7,5617
5,False,/sizHX5VbwlBihaathTQHVGk1jdi.jpg,"[878, 18, 28]",514207,ru,Вторжение,"Two years after the fall of the alien ship, th...",40.575,/dqKqzcdhtJwOhjqe89RTURqILtl.jpg,2020-01-01,Invasion,False,6.9,715
6,False,/4J1Vu6oGzt60fakP4delEPDqEhI.jpg,"[53, 28, 80]",38700,en,Bad Boys for Life,Marcus and Mike are forced to confront new thr...,49.903,/y95lQLnuNKdPAzw9F9Ab8kJ80c3.jpg,2020-01-15,Bad Boys for Life,False,7.1,7507
7,False,/z8sNNjEXEpZNQCHCuo3QH8kK00t.jpg,[10749],665142,ko,어린 이모 3,Seok-yeong has been living with his father eve...,34.892,/qD7kT9LysayisTiCrdFyhZWIuK1.jpg,2020-01-02,Young Aunt 3,False,5.3,6
8,False,/AgkGh4epJyhvs9JqSEUVABz1gJh.jpg,"[10751, 14, 12]",448119,en,Dolittle,"After losing his wife seven years earlier, the...",35.144,/uoplwswBDy7gsOyrbGuKyPFoPCs.jpg,2020-01-02,Dolittle,False,6.6,3358
9,False,/isBjNHBblzxrzHrAtRbjgkYzAut.jpg,"[10749, 18]",664413,pl,365 dni,"A woman falls victim to a dominant mafia boss,...",36.154,/6KwrHucIE3CvNT7kTm2MAlZ4fYF.jpg,2020-02-07,365 Days,False,7.1,8373
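The discover response is paginated: each request returns one page of the matches, and `total_pages` (277 here) tells how many pages exist. TMDb's discover endpoint accepts a `page` query parameter, so a loop can gather several pages. The sketch below keeps the HTTP call outside the loop through a `fetch_page` function (a hypothetical name) so it can be tried offline with a fake fetcher; `max_pages` caps the number of requests.

```python
from typing import Callable


def collect_results(fetch_page: Callable[[int], dict], max_pages: int = 3) -> list:
    """Gather the "results" lists from consecutive pages.

    fetch_page(page) is assumed to return a dict with "results" and
    "total_pages" keys, shaped like the discover response above.
    """
    results = []
    page = 1
    while True:
        payload = fetch_page(page)
        results.extend(payload["results"])
        # Stop at the last available page or at the max_pages cap.
        if page >= min(payload["total_pages"], max_pages):
            break
        page += 1
    return results


# Offline stand-in for requests.get(...).json(): two fake movies per page.
def fake_fetch(page: int) -> dict:
    return {"page": page, "total_pages": 277,
            "results": [{"id": page * 10}, {"id": page * 10 + 1}]}


movies = collect_results(fake_fetch, max_pages=3)
print(len(movies))  # 6
```

Against the live API, `fetch_page` could be something like `lambda p: requests.get(url + "&page=" + str(p)).json()`, using the `url` built earlier, and `pd.DataFrame(collect_results(fetch_page))` would then give one row per movie.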
