# Web Data Extraction - Part I

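__Web API:__ Application Programming Interface exposed over the web: a client sends HTTP requests and a server answers with responses (Client --> Server)

__REST:__ Representational State Transfer, an architectural style in which resources are identified by URIs and accessed through standard HTTP methods (GET, POST, PUT, DELETE)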

__URI:__ Uniform Resource Identifier ([RFC 3986](https://www.rfc-editor.org/rfc/rfc3986))

__HTTP library for Python:__ [Requests](https://requests.readthedocs.io/en/latest/)
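
To make the URI definition concrete, here is a minimal sketch that splits a URI into its RFC 3986 components using only the standard library. The URI is the GitHub endpoint assembled in this notebook; the variable names `uri` and `parts` are illustrative.

```python
from urllib.parse import urlsplit

# URI for the GitHub pulls endpoint used in this notebook
uri = 'https://api.github.com/repos/ih-datapt-mad/dataptmad0924_labs/pulls?state=closed'

parts = urlsplit(uri)
print(parts.scheme)   # scheme (protocol)
print(parts.netloc)   # authority (host)
print(parts.path)     # path to the resource
print(parts.query)    # query string (filters, options)
```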

In [None]:
# Import libraries
import pandas as pd
import requests    # if it is missing, install it with: !conda install requests

# Pandas display options
pd.set_option('display.max_columns', None)

---

In [None]:
response = requests.get('https://jsonplaceholder.typicode.com/todos')
print(type(response))

### HTTP Response

[Boring Reference](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

[Funny Reference](https://http.cat/)

In [None]:
status = response.status_code
status
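
The first digit of a status code gives its class (2xx success, 4xx client error, and so on). As a rough sketch, the standard library's `http.HTTPStatus` can turn a code into a readable phrase. The helper `describe_status` is a hypothetical name introduced here for illustration, not part of Requests.

```python
from http import HTTPStatus

def describe_status(code):
    """Return (class, phrase) for an HTTP status code."""
    classes = {
        1: 'informational',
        2: 'success',
        3: 'redirection',
        4: 'client error',
        5: 'server error',
    }
    category = classes.get(code // 100, 'unknown')
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Code is not a registered HTTP status
        phrase = 'Unknown'
    return category, phrase

for code in (200, 301, 404, 500):
    print(code, describe_status(code))
```

With Requests itself, `response.ok` and `response.raise_for_status()` offer a quick check of the same information.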

In [None]:
content = response.content     # raw binary content of the response body
type(content)

In [None]:
json_data = response.json()     # parse the body of an HTTP response as JSON (JavaScript Object Notation)
print(type(json_data))          # the result is a dict or a list, depending on the structure of the JSON
print(len(json_data))
print(type(json_data[0]))
json_data[0].keys()
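
How JSON types map to Python types can be checked offline. The sketch below parses a small hand-written payload shaped like a `/todos` item; the values are sample data, not fetched from the API. It uses the standard `json` module, which performs the same kind of decoding as `response.json()`.

```python
import json

# Illustrative payload (sample values, not an actual API response)
raw = b'[{"userId": 1, "id": 1, "title": "sample task", "completed": false}]'

parsed = json.loads(raw)
print(type(parsed))             # JSON array  -> Python list
print(type(parsed[0]))          # JSON object -> Python dict
print(parsed[0]['completed'])   # JSON false  -> Python False
```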

In [None]:
# Other attributes/methods: .headers, .links, .cookies

headers = response.headers
headers

In [None]:
# But we like DataFrames

df = pd.DataFrame(json_data)
df

---

In [None]:
# Main endpoint (base URL)
end_point = 'https://api.github.com/'
# Path segments and query string (these go in the URL, not in the request body)
par1 = 'repos/'
par2 = 'ih-datapt-mad/'
par3 = 'dataptmad0924_labs/'
par4 = 'pulls'     # https://docs.github.com/en/rest/reference/pulls
par5 = '?state=closed'

In [None]:
full_url = end_point + par1 + par2 + par3 + par4 + par5
pulls_response = requests.get(full_url)
pulls_json = pulls_response.json()
print(full_url)
print(type(pulls_json))
print(len(pulls_json))
print(pulls_json[0].keys())
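
String concatenation works, but it is easy to lose a slash or forget to escape a value. One possible alternative, sketched here with the standard library only, joins the base URL and path with `urljoin` and encodes the query with `urlencode`. The names `base_url`, `path` and `query` are illustrative.

```python
from urllib.parse import urlencode, urljoin

base_url = 'https://api.github.com/'
path = 'repos/ih-datapt-mad/dataptmad0924_labs/pulls'
query = {'state': 'closed'}     # query parameters as a dict

# Join base and path, then append the encoded query string
url = urljoin(base_url, path) + '?' + urlencode(query)
print(url)
```

Requests can also build the query string itself if the dict is passed as `requests.get(url, params=query)`.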

In [None]:
df_pulls = pd.DataFrame(pulls_json)
df_pulls

---

In [None]:
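# pd.json_normalize flattens nested JSON objects into columns with dotted names (e.g. user.login)

df_flat = pd.json_normalize(pulls_json)  #.T.reset_index(drop=True)
df_flat.info()     # .info() prints its summary and returns None, so wrapping it in print() adds a stray "None"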
df_flat
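
To see what flattening means, here is a plain-Python sketch of the idea: nested dictionaries are unrolled into a single level whose keys are joined with a dot. This is a simplified illustration, not the pandas implementation. The `flatten` helper and the sample record (including the `octocat` login) are hypothetical.

```python
def flatten(record, parent_key='', sep='.'):
    """Unroll nested dicts into one level with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are
            flat[full_key] = value
    return flat

# Hypothetical record loosely shaped like a pull request item
sample = {'number': 1, 'user': {'login': 'octocat', 'id': 42}, 'labels': []}
print(flatten(sample))
```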

---

__Some useful tools:__

- https://curlconverter.com/

- https://www.postman.com/

__Some REST APIs to practice with:__

- https://jsonplaceholder.typicode.com

- https://docs.github.com/en/rest

- https://github.com/Kaggle/kaggle-api

- https://polygon.io/

- https://coinmarketcap.com/api/documentation/v1/#section/Quick-Start-Guide

- https://datos.gob.es/es/documentacion/guia-practica-para-la-publicacion-de-datos-abiertos-usando-apis