# Notebook 4: JSON and APIs

In [1]:
import json
import random

from io import StringIO  # Avoids read_json warnings when passing a JSON string

import pandas as pd
import requests

## JSON format

We consider two artificial datasets to illustrate limitations of the JSON format that are worth keeping in mind in practice.

In [2]:
nombres = pd.DataFrame({"Nombre": [random.random() for _ in range(5)]})
nananinf = pd.DataFrame({"Valeur": [3.14, pd.NA, float("nan"), float("inf")]})

1. Convert `nombres` to JSON with the `to_json` method and store the result in a variable `nombres_json`.

In [3]:
nombres_json = nombres.to_json()
nombres_json

'{"Nombre":{"0":0.2447201296,"1":0.6929552924,"2":0.4137836212,"3":0.7827722473,"4":0.9695096807}}'

2. Import `nombres_json` with the Pandas `read_json` function into a dataframe `nombres_bis`. Compare the objects `nombres` and `nombres_bis`.

In [4]:
nombres_bis = pd.read_json(StringIO(nombres_json))
nombres_bis

Unnamed: 0,Nombre
0,0.24472
1,0.692955
2,0.413784
3,0.782772
4,0.96951


3. Read the `to_json` documentation to find the option that mitigates (but does not solve) the previous problem, namely that floats are written with a limited number of decimal places.
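As a rough illustration for question 3, the sketch below assumes the option in question is `double_precision` (default 10, maximum 15 according to the pandas documentation). It uses a fixed value rather than `random.random()` so that the result is reproducible; the names `x`, `demo` and `roundtrip_error` are made up for this example.

```python
from io import StringIO

import pandas as pd

# Fixed value with more significant digits than the default export keeps
x = 0.12345678901234568
demo = pd.DataFrame({"Nombre": [x]})


def roundtrip_error(df, **to_json_kwargs):
    """Export df to JSON, read it back, and return the largest absolute error."""
    df_bis = pd.read_json(StringIO(df.to_json(**to_json_kwargs)))
    return float((df["Nombre"] - df_bis["Nombre"]).abs().max())


error_default = roundtrip_error(demo)  # default: 10 decimal places
error_precise = roundtrip_error(demo, double_precision=15)  # 15 decimal places

print(f"default: {error_default:.1e}, double_precision=15: {error_precise:.1e}")
```

Even at the maximum precision the round trip is not guaranteed to be exact, since a float64 can need up to 17 significant digits; the option reduces the loss rather than removing it.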

4. Convert `nananinf` to JSON with the `to_json` method and store the result in a variable `nananinf_json`. What has become of `NA`, `NaN` and `inf`?

5. Import `nananinf_json` with the Pandas `read_json` function into a dataframe `nananinf_bis`. Compare the objects `nananinf` and `nananinf_bis`.
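A minimal sketch for questions 4 and 5, on a simplified float column without `pd.NA` (the names `special`, `special_json` and `special_bis` are hypothetical). It assumes the usual pandas behaviour, in which non-finite floats are written as JSON `null`.

```python
import json
from io import StringIO

import pandas as pd

special = pd.DataFrame({"Valeur": [3.14, float("nan"), float("inf")]})
special_json = special.to_json()

# Parse with the standard library to inspect the raw JSON values
raw_values = list(json.loads(special_json)["Valeur"].values())
print(raw_values)

# Reading back: every null becomes NaN, so the original inf cannot be recovered
special_bis = pd.read_json(StringIO(special_json))
print(special_bis["Valeur"].isna().tolist())
```

Once written as `null`, `NaN` and `inf` are indistinguishable in the JSON text, which is why the imported dataframe differs from the original.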

6. Repeat questions 4 and 5 on the object `[float("nan"), float("inf")]` with the `dumps` and `loads` functions. What is the difference? Read the `dumps` documentation to understand the `allow_nan` option.
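For question 6, the standard library behaves differently from pandas. The following sketch uses only the `json` module; the variable names are illustrative.

```python
import json
import math

values = [float("nan"), float("inf")]

# By default, dumps emits the JavaScript-style tokens NaN and Infinity,
# which are not part of the JSON specification
text = json.dumps(values)
print(text)  # [NaN, Infinity]

# loads accepts these tokens, so the round trip preserves both values
back = json.loads(text)
print(math.isnan(back[0]), math.isinf(back[1]))  # True True

# With allow_nan=False, dumps refuses to produce non-standard JSON
try:
    json.dumps(values, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc
print(type(strict_error).__name__)  # ValueError
```

The default output round-trips within Python but may be rejected by strict JSON parsers in other languages, which is the trade-off `allow_nan` controls.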

## Iris

Here we return to the [Fisher's Iris](https://fr.wikipedia.org/wiki/Iris_de_Fisher) dataset to study the different ways of exporting a dataframe to JSON.

1. Load the dataset into a dataframe `iris` from the file `iris.csv`.

In [7]:
iris = pd.read_csv("/home/onyxia/work/formation_cepe/data/iris.csv")

2. Compare the results obtained by exporting `iris` to JSON with `to_json` and:
- `orient="columns"`,
- `orient="index"`,
- `orient="records"`.

In [8]:
iris.to_json(orient="columns")

'{"SepalLength":{"0":5.1,"1":4.9,"2":4.7,"3":4.6,"4":5.0,"5":5.4,"6":4.6,"7":5.0,"8":4.4,"9":4.9,"10":5.4,"11":4.8,"12":4.8,"13":4.3,"14":5.8,"15":5.7,"16":5.4,"17":5.1,"18":5.7,"19":5.1,"20":5.4,"21":5.1,"22":4.6,"23":5.1,"24":4.8,"25":5.0,"26":5.0,"27":5.2,"28":5.2,"29":4.7,"30":4.8,"31":5.4,"32":5.2,"33":5.5,"34":4.9,"35":5.0,"36":5.5,"37":4.9,"38":4.4,"39":5.1,"40":5.0,"41":4.5,"42":4.4,"43":5.0,"44":5.1,"45":4.8,"46":5.1,"47":4.6,"48":5.3,"49":5.0,"50":7.0,"51":6.4,"52":6.9,"53":5.5,"54":6.5,"55":5.7,"56":6.3,"57":4.9,"58":6.6,"59":5.2,"60":5.0,"61":5.9,"62":6.0,"63":6.1,"64":5.6,"65":6.7,"66":5.6,"67":5.8,"68":6.2,"69":5.6,"70":5.9,"71":6.1,"72":6.3,"73":6.1,"74":6.4,"75":6.6,"76":6.8,"77":6.7,"78":6.0,"79":5.7,"80":5.5,"81":5.5,"82":5.8,"83":6.0,"84":5.4,"85":6.0,"86":6.7,"87":6.3,"88":5.6,"89":5.5,"90":5.5,"91":6.1,"92":5.8,"93":5.0,"94":5.6,"95":5.7,"96":5.7,"97":6.2,"98":5.1,"99":5.7,"100":6.3,"101":5.8,"102":7.1,"103":6.3,"104":6.5,"105":7.6,"106":4.9,"107":7.3,"108":6.7,"10

In [9]:
iris.to_json(orient="index")

'{"0":{"SepalLength":5.1,"SepalWidth":3.5,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},"1":{"SepalLength":4.9,"SepalWidth":3.0,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},"2":{"SepalLength":4.7,"SepalWidth":3.2,"PetalLength":1.3,"PetalWidth":0.2,"Species":"setosa"},"3":{"SepalLength":4.6,"SepalWidth":3.1,"PetalLength":1.5,"PetalWidth":0.2,"Species":"setosa"},"4":{"SepalLength":5.0,"SepalWidth":3.6,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},"5":{"SepalLength":5.4,"SepalWidth":3.9,"PetalLength":1.7,"PetalWidth":0.4,"Species":"setosa"},"6":{"SepalLength":4.6,"SepalWidth":3.4,"PetalLength":1.4,"PetalWidth":0.3,"Species":"setosa"},"7":{"SepalLength":5.0,"SepalWidth":3.4,"PetalLength":1.5,"PetalWidth":0.2,"Species":"setosa"},"8":{"SepalLength":4.4,"SepalWidth":2.9,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},"9":{"SepalLength":4.9,"SepalWidth":3.1,"PetalLength":1.5,"PetalWidth":0.1,"Species":"setosa"},"10":{"SepalLength":5.4,"SepalWidth":3.7,"PetalL

In [10]:
iris.to_json(orient="records")

'[{"SepalLength":5.1,"SepalWidth":3.5,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":4.9,"SepalWidth":3.0,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":4.7,"SepalWidth":3.2,"PetalLength":1.3,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":4.6,"SepalWidth":3.1,"PetalLength":1.5,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":5.0,"SepalWidth":3.6,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":5.4,"SepalWidth":3.9,"PetalLength":1.7,"PetalWidth":0.4,"Species":"setosa"},{"SepalLength":4.6,"SepalWidth":3.4,"PetalLength":1.4,"PetalWidth":0.3,"Species":"setosa"},{"SepalLength":5.0,"SepalWidth":3.4,"PetalLength":1.5,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":4.4,"SepalWidth":2.9,"PetalLength":1.4,"PetalWidth":0.2,"Species":"setosa"},{"SepalLength":4.9,"SepalWidth":3.1,"PetalLength":1.5,"PetalWidth":0.1,"Species":"setosa"},{"SepalLength":5.4,"SepalWidth":3.7,"PetalLength":1.5,"PetalWidth":0.2,"Species":"setosa
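Because the outputs above are long, the three layouts are easier to compare on a tiny dataframe. The sketch below uses a made-up two-row frame `toy` and parses each export with `json.loads` to show its structure.

```python
import json

import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

by_columns = json.loads(toy.to_json(orient="columns"))
by_index = json.loads(toy.to_json(orient="index"))
by_records = json.loads(toy.to_json(orient="records"))

print(by_columns)  # {'a': {'0': 1, '1': 2}, 'b': {'0': 'x', '1': 'y'}}
print(by_index)    # {'0': {'a': 1, 'b': 'x'}, '1': {'a': 2, 'b': 'y'}}
print(by_records)  # [{'a': 1, 'b': 'x'}, {'a': 2, 'b': 'y'}]
```

Note that `orient="records"` produces a list and drops the index labels, whereas the other two keep them as string keys.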

3. Export `iris` to a file `iris.json` in NDJSON format. Open the file in a text editor to check that each line contains one document.

In [11]:
iris.to_json("iris.json", orient="records", lines=True)  # To NDJSON
obj_copy = pd.read_json("iris.json", lines=True)  # From NDJSON
print(obj_copy)


     SepalLength  SepalWidth  PetalLength  PetalWidth    Species
0            5.1         3.5          1.4         0.2     setosa
1            4.9         3.0          1.4         0.2     setosa
2            4.7         3.2          1.3         0.2     setosa
3            4.6         3.1          1.5         0.2     setosa
4            5.0         3.6          1.4         0.2     setosa
..           ...         ...          ...         ...        ...
145          6.7         3.0          5.2         2.3  virginica
146          6.3         2.5          5.0         1.9  virginica
147          6.5         3.0          5.2         2.0  virginica
148          6.2         3.4          5.4         2.3  virginica
149          5.9         3.0          5.1         1.8  virginica

[150 rows x 5 columns]
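The "one document per line" property can also be checked in code rather than in a text editor. This is a small sketch on a made-up dataframe `toy`; it relies on `to_json` returning the text when no path is given.

```python
import json

import pandas as pd

toy = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Without a path, to_json returns the NDJSON text instead of writing a file
ndjson_text = toy.to_json(orient="records", lines=True)

# Every non-empty line should be a complete JSON document on its own
documents = [json.loads(line) for line in ndjson_text.splitlines() if line]
print(len(documents), documents[0])
```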


4. Import the NDJSON file `iris.json` into a dataframe `iris2`.

In [12]:
iris2 = pd.read_json("iris.json", lines=True)
iris2

Unnamed: 0,SepalLength,SepalWidth,PetalLength,PetalWidth,Species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica


## Star Wars API

The SWAPI (*Star Wars API*) project is a data source about the Star Wars universe. The API provides several datasets on the planets, starships, vehicles, characters, films and species of the saga from a galaxy far, far away.

1. Use the Pandas `read_json` function to import the planet data, available as JSON at [https://swapi-node.vercel.app/api/planets](https://swapi-node.vercel.app/api/planets), into a dataframe. Is the result easy to work with in this form?

In [38]:
planets_url = "https://swapi-node.vercel.app/api/planets"
pd.read_json(planets_url)


Unnamed: 0,count,pages,next,previous,results
0,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.411...
1,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.420...
2,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.421...
3,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.423...
4,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.425...
5,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.427...
6,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.429...
7,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.430...
8,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.432...
9,60,6,/api/planets?page=2,,{'fields': {'edited': '2014-12-20T20:58:18.434...
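In the table above, `results` holds one nested dictionary per row, which is awkward to work with. One possible way to flatten such a structure is `pd.json_normalize`, sketched below on a trimmed, hand-copied excerpt shaped like the response (no network call is made; `payload` and `flat` are names chosen for this example).

```python
import pandas as pd

# Trimmed, hand-copied excerpt shaped like the API response (not a live call)
payload = {
    "count": 60,
    "pages": 6,
    "next": "/api/planets?page=2",
    "previous": None,
    "results": [
        {"fields": {"name": "Tatooine", "climate": "arid", "diameter": "10465"}},
        {"fields": {"name": "Alderaan", "climate": "temperate", "diameter": "12500"}},
    ],
}

# Flatten the nested "fields" objects into one column per attribute
flat = pd.json_normalize(payload["results"])

# Columns come out as "fields.name", "fields.climate", ...; drop the prefix
flat.columns = [col.removeprefix("fields.") for col in flat.columns]
print(flat)
```

`str.removeprefix` requires Python 3.9 or later.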


2. Use the `get` function from the `requests` module to retrieve the same data as in the previous question and check the HTTP status code returned.

In [39]:
r = requests.get(planets_url)
r.raise_for_status()  # Raises an HTTPError if the status code is 4xx or 5xx
print(r.text)

{"count":60,"pages":6,"next":"/api/planets?page=2","previous":null,"results":[{"fields":{"edited":"2014-12-20T20:58:18.411Z","climate":"arid","surface_water":"1","name":"Tatooine","diameter":"10465","rotation_period":"23","created":"2014-12-09T13:50:49.641Z","terrain":"desert","gravity":"1 standard","orbital_period":"304","population":"200000","residents":[],"films":[],"url":"/api/planets/1"}},{"fields":{"edited":"2014-12-20T20:58:18.420Z","climate":"temperate","surface_water":"40","name":"Alderaan","diameter":"12500","rotation_period":"24","created":"2014-12-10T11:35:48.479Z","terrain":"grasslands, mountains","gravity":"1 standard","orbital_period":"364","population":"2000000000","residents":[],"films":[],"url":"/api/planets/2"}},{"fields":{"edited":"2014-12-20T20:58:18.421Z","climate":"temperate, tropical","surface_water":"8","name":"Yavin IV","diameter":"10200","rotation_period":"24","created":"2014-12-10T11:37:19.144Z","terrain":"jungle, rainforests","gravity":"1 standard","orbital

3. Make sense of the elements of the response obtained in the previous question. In particular, how many planets are there in `results`, and what does `next` correspond to?
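As a hint for question 3, the pagination metadata can be read directly from the top-level keys. The sketch below copies those keys from the response shown above and omits `results`; it assumes every page holds the same number of planets, which the API does not guarantee.

```python
from urllib.parse import urljoin

planets_url = "https://swapi-node.vercel.app/api/planets"

# Top-level keys copied from the response above; "results" is omitted here
meta = {"count": 60, "pages": 6, "next": "/api/planets?page=2", "previous": None}

# Planets per page, assuming pages are evenly filled
per_page = meta["count"] // meta["pages"]

# "next" is a path relative to the site root; resolve it against the base URL
next_url = urljoin(planets_url, meta["next"])

print(per_page)  # 10
print(next_url)  # https://swapi-node.vercel.app/api/planets?page=2
```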

4. Write a loop that retrieves the information for all planets available in the API and stores the result in a dataframe `planets_df`.

In [79]:
url_base = "https://swapi-node.vercel.app"
url_current = "/api/planets"
planets = []  # One dataframe per page of results
while url_current is not None:
    r = requests.get(url_base + url_current)
    r.raise_for_status()  # Stop on any HTTP error status
    obj = r.json()
    # Each planet's attributes are nested under the "fields" key
    planets.append(pd.DataFrame([planet["fields"] for planet in obj["results"]]))
    url_current = obj["next"]  # Expected to be None (JSON null) on the last page

planets_df = pd.concat(planets, ignore_index=True)
planets_df

Unnamed: 0,edited,climate,surface_water,name,diameter,rotation_period,created,terrain,gravity,orbital_period,population,residents,films,url
0,2014-12-20T20:58:18.411Z,arid,1,Tatooine,10465,23,2014-12-09T13:50:49.641Z,desert,1 standard,304,200000,[],[],/api/planets/1
1,2014-12-20T20:58:18.420Z,temperate,40,Alderaan,12500,24,2014-12-10T11:35:48.479Z,"grasslands, mountains",1 standard,364,2000000000,[],[],/api/planets/2
2,2014-12-20T20:58:18.421Z,"temperate, tropical",8,Yavin IV,10200,24,2014-12-10T11:37:19.144Z,"jungle, rainforests",1 standard,4818,1000,[],[],/api/planets/3
3,2014-12-20T20:58:18.423Z,frozen,100,Hoth,7200,23,2014-12-10T11:39:13.934Z,"tundra, ice caves, mountain ranges",1.1 standard,549,unknown,[],[],/api/planets/4
4,2014-12-20T20:58:18.425Z,murky,8,Dagobah,8900,23,2014-12-10T11:42:22.590Z,"swamp, jungles",,341,unknown,[],[],/api/planets/5
5,2014-12-20T20:58:18.427Z,temperate,0,Bespin,118000,12,2014-12-10T11:43:55.240Z,gas giant,"1.5 (surface), 1 standard (Cloud City)",5110,6000000,[],[],/api/planets/6
6,2014-12-20T20:58:18.429Z,temperate,8,Endor,4900,18,2014-12-10T11:50:29.349Z,"forests, mountains, lakes",0.85 standard,402,30000000,[],[],/api/planets/7
7,2014-12-20T20:58:18.430Z,temperate,12,Naboo,12120,26,2014-12-10T11:52:31.066Z,"grassy hills, swamps, forests, mountains",1 standard,312,4500000000,[],[],/api/planets/8
8,2014-12-20T20:58:18.432Z,temperate,unknown,Coruscant,12240,24,2014-12-10T11:54:13.921Z,"cityscape, mountains",1 standard,368,1000000000000,[],[],/api/planets/9
9,2014-12-20T20:58:18.434Z,temperate,100,Kamino,19720,27,2014-12-10T12:45:06.577Z,ocean,1 standard,463,1000000000,[],[],/api/planets/10
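The pagination logic can be separated from the HTTP call, which makes it testable offline. The sketch below is one possible arrangement, not the notebook's own solution: `iter_pages` and `fake_api` are hypothetical names, and the two in-memory pages stand in for the real API.

```python
import pandas as pd


def iter_pages(fetch, first_path):
    """Yield each page's parsed JSON, following "next" until it is None."""
    path = first_path
    while path is not None:
        page = fetch(path)
        yield page
        path = page["next"]


# Stand-in for the API: two in-memory pages (hypothetical, trimmed data)
fake_api = {
    "/api/planets": {
        "next": "/api/planets?page=2",
        "results": [{"fields": {"name": "Tatooine"}}],
    },
    "/api/planets?page=2": {
        "next": None,
        "results": [{"fields": {"name": "Alderaan"}}],
    },
}

# Collect all rows first, then build a single dataframe
rows = [
    planet["fields"]
    for page in iter_pages(fake_api.__getitem__, "/api/planets")
    for planet in page["results"]
]
demo_df = pd.DataFrame(rows)
print(demo_df["name"].tolist())  # ['Tatooine', 'Alderaan']
```

Against the real API, `fetch` could be a small function that calls `requests.get` on the base URL plus the path and returns the parsed JSON.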


5. Export the dataframe obtained in the previous question to a file `planets.json` in NDJSON format.

In [84]:
planets_df.to_json("planets.json", orient="records", lines=True)  # To NDJSON
obj_copy = pd.read_json("planets.json", lines=True)  # From NDJSON
print(obj_copy.head())

                     edited              climate surface_water      name  \
0  2014-12-20T20:58:18.411Z                 arid             1  Tatooine   
1  2014-12-20T20:58:18.420Z            temperate            40  Alderaan   
2  2014-12-20T20:58:18.421Z  temperate, tropical             8  Yavin IV   
3  2014-12-20T20:58:18.423Z               frozen           100      Hoth   
4  2014-12-20T20:58:18.425Z                murky             8   Dagobah   

  diameter rotation_period                   created  \
0    10465              23  2014-12-09T13:50:49.641Z   
1    12500              24  2014-12-10T11:35:48.479Z   
2    10200              24  2014-12-10T11:37:19.144Z   
3     7200              23  2014-12-10T11:39:13.934Z   
4     8900              23  2014-12-10T11:42:22.590Z   

                              terrain       gravity orbital_period  \
0                              desert    1 standard            304   
1               grasslands, mountains    1 standard            364