## Process REST Payload using pandas

Let us understand how to process a REST payload using Pandas DataFrame APIs.
* We can get details about public repositories using `GET /repositories` from **https://api.github.com**. Each call returns one page of results.
* As we are reading data from an external application, the request uses `GET`. The payload is a JSON array.
* We can convert this JSON array to a Python `list`. Each element in the list will be of type `dict`.
* We can pass this list of dicts to `pandas.json_normalize` to get a flattened DataFrame.
* Let us understand how the data in this DataFrame can be processed using appropriate Pandas APIs as per our requirements.
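
The flattening step described above can be tried offline with a small hand-written list of dicts. This is only a sketch: the records below are made up for illustration and merely mimic the nested `owner` object of the real payload; they are not actual API output.

```python
import pandas as pd

# Hypothetical records shaped like the GitHub payload (not real API output)
sample_payload = [
    {"id": 1, "name": "repo-a", "owner": {"login": "alice", "type": "User"}},
    {"id": 2, "name": "repo-b", "owner": {"login": "acme", "type": "Organization"}},
]

# pd.DataFrame keeps each nested dict in a single object column named "owner"
nested_df = pd.DataFrame(sample_payload)

# pd.json_normalize expands nested keys into dot-separated column names
flat_df = pd.json_normalize(sample_payload)

print(list(nested_df.columns))
print(list(flat_df.columns))
```

The dot-separated names such as `owner.type` are what the later cells rely on when selecting and filtering columns.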

In [None]:
import requests

In [None]:
payload = requests.get('https://api.github.com/repositories').json()

In [None]:
type(payload)

In [None]:
payload  # A list that contains dicts

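* Now we can convert the returned list to a Pandas DataFrame. `pd.DataFrame` keeps nested objects such as `owner` in a single column, whereas `json_normalize` flattens them into columns such as `owner.type`.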

In [None]:
import pandas as pd

In [None]:
pd.DataFrame(payload)

In [None]:
repos_df = pd.json_normalize(payload)

In [None]:
repos_df

In [None]:
repos_df.columns

In [None]:
repos_df.dtypes

In [None]:
repos_df.shape

In [None]:
repos_df.count()

Here are some of the tasks you can work on using the `repos_df` data. We will explore the solutions using Pandas APIs. We first rebuild `repos_df` from the repositories whose id is greater than the `since` value.

In [None]:
since = 369

In [None]:
repos = requests.get(f'https://api.github.com/repositories?since={since}').json()

In [None]:
repos_df = pd.json_normalize(repos)

In [None]:
repos_df

* Get number of repositories.

In [None]:
repos_df.shape

In [None]:
repos_df.shape[0]

* Get repository name, url and owner type of all repositories.

In [None]:
repos_df

In [None]:
repos_df[['name', 'url', 'owner.type']]

* Get all unique or distinct owner types of the repositories. The output should be of type **list**.

In [None]:
repos_df['owner.type']

In [None]:
repos_df['owner.type'].unique()

In [None]:
list(repos_df['owner.type'].unique())
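
As an alternative sketch for the same task, `Series.drop_duplicates().tolist()` also returns a plain Python `list`. The Series below is made-up sample data standing in for `repos_df['owner.type']`.

```python
import pandas as pd

# Hypothetical stand-in for repos_df['owner.type']
owner_types = pd.Series(["User", "Organization", "User"], name="owner.type")

# drop_duplicates keeps the first occurrence of each value; tolist returns a Python list
distinct_types = owner_types.drop_duplicates().tolist()

print(distinct_types)
```

Both approaches keep values in order of first appearance. `tolist` additionally converts numeric values to native Python types, which `list(...)` does not.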

* Get number of repositories where owner type is **User**.

In [None]:
repos_df['owner.type'] == 'User'

In [None]:
repos_df[repos_df['owner.type'] == 'User']

In [None]:
repos_df[repos_df['owner.type'] == 'User'].shape

In [None]:
repos_df[repos_df['owner.type'] == 'User'].shape[0]
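
The count can also be taken straight from the boolean mask, without building the filtered DataFrame first. A minimal sketch on made-up sample data (the frame below is hypothetical, not API output):

```python
import pandas as pd

# Hypothetical stand-in for repos_df
sample_df = pd.DataFrame(
    {"id": [1, 2, 3], "owner.type": ["User", "Organization", "User"]}
)

# Each True in the mask counts as 1 when summed
is_user = sample_df["owner.type"] == "User"
user_count = int(is_user.sum())

print(user_count)
```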

* Get number of repositories where owner type is **Organization**.

In [None]:
repos_df[repos_df['owner.type'] == 'Organization'].shape[0]

* Get number of repositories by each owner type.

In [None]:
repos_df.groupby('owner.type')

In [None]:
repos_df.groupby('owner.type')['owner.type'].count()
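
Two shorter ways to get the same tallies are `value_counts` and `groupby(...).size()`. The sketch below uses made-up sample data in place of `repos_df`.

```python
import pandas as pd

# Hypothetical stand-in for repos_df
sample_df = pd.DataFrame(
    {"id": [1, 2, 3], "owner.type": ["User", "Organization", "User"]}
)

# value_counts sorts by frequency, most common value first
counts = sample_df["owner.type"].value_counts()

# size returns the number of rows in each group, ordered by group key
sizes = sample_df.groupby("owner.type").size()

print(counts)
print(sizes)
```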

* Sort the data by owner type and then by id. Check the data type of `id` first, so that it is sorted as a number and not as text.

In [None]:
repos_df.dtypes

In [None]:
repos_df.sort_values(by=['owner.type', 'id']).head(10)
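
If `dtypes` were to report `id` as `object` (text), the sort would be lexicographic. The sketch below assumes that situation on made-up sample data and converts the column with `pd.to_numeric` before sorting. Whether the conversion is needed for the real payload depends on what `repos_df.dtypes` reports.

```python
import pandas as pd

# Hypothetical frame where id was read as text
sample_df = pd.DataFrame(
    {
        "id": ["10", "9", "100", "2"],
        "owner.type": ["User", "User", "Organization", "User"],
    }
)

# Text sort compares character by character, so "10" comes before "2"
as_text = sample_df.sort_values(by=["owner.type", "id"])["id"].tolist()

# Convert to a numeric dtype, then sort again
sample_df["id"] = pd.to_numeric(sample_df["id"])
as_number = sample_df.sort_values(by=["owner.type", "id"])["id"].tolist()

print(as_text)
print(as_number)
```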