<a href="https://colab.research.google.com/github/fvgm-spec/learn-airbyte/blob/main/Traditional_ETL_process.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Traditional ETL process

The following is a traditional ETL process based on batch processing of data retrieved through API calls to the [GitHub REST API](https://docs.github.com/en/rest). The data is collected and processed in predefined blocks, implemented as Python functions.

### Step 1: Importing required packages and setting global variables

In [1]:
import pandas as pd
import requests
import time
import json

In [2]:
url = 'https://api.github.com'

### Step 2: Defining functions

In [14]:
## Helper functions

#Defines a function to extract data from API endpoint
def get_users(URL: str) -> pd.DataFrame:
    """
    Extracts data from GitHub API endpoint

    Args:
        URL (str): url that directs to API endpoint.

    Returns:
        df: The JSON data extracted from the API call.
    """
    #Defines an empty DataFrame
    df = pd.DataFrame()

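    for i in range(9):
        #Passes `since` as a query parameter; it is a user ID, and the API returns users with a higher ID
        #Note: consecutive values of `since` (0, 1, 2, ...) return largely overlapping pages
        r = requests.get(f'{URL}/users', params={'since': i})
        #Stops on HTTP errors (e.g. rate limiting) instead of parsing an error payload
        r.raise_for_status()
        data = r.json()
        #Converts json response to dataframe
        table = pd.DataFrame.from_dict(data)
        #Appends the newly extracted page to the accumulated dataframe
        df = pd.concat([df,table])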
        #Waits 2 seconds to perform the next API call
        time.sleep(2)

    return df


#Function that parses extracted data
def parse_df(df):
    """
    Parses the data extracted from the API call.

    Args:
        df (pd.DataFrame): Data extracted from the API converted into a DataFrame.

    Returns:
        df: pandas DataFrame with parsed data.
    """
    #Drops the first row
    df = df.iloc[1: , :]
    #Selects and reorders the columns ('login' is listed once to avoid a duplicated column)
    df = df[['id', 'login','node_id', 'avatar_url', 'gravatar_id', 'url', 'html_url',
           'followers_url', 'following_url', 'gists_url', 'starred_url',
           'subscriptions_url', 'organizations_url', 'repos_url', 'events_url',
           'received_events_url', 'type', 'site_admin']]

    return df
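
In the GitHub `GET /users` endpoint, `since` is a user ID and each response contains only users with a higher ID, so walking through the list means passing the last ID of one page as `since` for the next request. A minimal sketch of that cursor-style loop follows. The names `collect_pages` and `fetch_github_users` are hypothetical and not part of the notebook, and the page fetcher is injected so the loop can be exercised without network access.

```python
from typing import Callable, Dict, List

Page = List[Dict]


def collect_pages(fetch_page: Callable[[int], Page], n_pages: int = 9) -> Page:
    """Cursor-style pagination: each request starts after the last ID seen."""
    records: Page = []
    since = 0
    for _ in range(n_pages):
        page = fetch_page(since)
        if not page:
            # An empty page means there is nothing left to read
            break
        records.extend(page)
        # The last record's ID becomes the cursor for the next request
        since = page[-1]["id"]
    return records


def fetch_github_users(since: int) -> Page:
    """Hypothetical fetcher for one page of users from the GitHub API."""
    import requests  # imported lazily so the loop above can run offline

    r = requests.get(
        "https://api.github.com/users", params={"since": since}, timeout=10
    )
    r.raise_for_status()
    return r.json()
```

Calling `collect_pages(fetch_github_users)` would then request nine non-overlapping pages; a pause between calls, as in `get_users`, can be added inside the fetcher.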


### Step 3: Using helper functions to extract data from API endpoints and then performing data transformations

In [15]:
df = get_users(url)

In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 270 entries, 0 to 29
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   login                270 non-null    object
 1   id                   270 non-null    int64 
 2   node_id              270 non-null    object
 3   avatar_url           270 non-null    object
 4   gravatar_id          270 non-null    object
 5   url                  270 non-null    object
 6   html_url             270 non-null    object
 7   followers_url        270 non-null    object
 8   following_url        270 non-null    object
 9   gists_url            270 non-null    object
 10  starred_url          270 non-null    object
 11  subscriptions_url    270 non-null    object
 12  organizations_url    270 non-null    object
 13  repos_url            270 non-null    object
 14  events_url           270 non-null    object
 15  received_events_url  270 non-null    object
 16  type          
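
The line `Index: 270 entries, 0 to 29` shows that the index labels repeat: `pd.concat` keeps each page's own 0 to 29 index unless told otherwise. The sketch below, using two made-up toy pages rather than real API data, shows one way to get a continuous index and to drop repeated users when pages overlap. The frame and variable names are illustrative assumptions.

```python
import pandas as pd

# Two toy "pages" standing in for API responses (made-up values)
page_1 = pd.DataFrame({"id": [1, 2, 3], "login": ["a", "b", "c"]})
page_2 = pd.DataFrame({"id": [3, 4, 5], "login": ["c", "d", "e"]})

# Plain concat keeps each page's index, so labels repeat
stacked = pd.concat([page_1, page_2])
print(stacked.index.tolist())

# ignore_index renumbers the rows; drop_duplicates removes users seen twice
clean = (
    pd.concat([page_1, page_2], ignore_index=True)
    .drop_duplicates(subset="id")
    .reset_index(drop=True)
)
print(clean.index.tolist())
```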

In [18]:
df.head(5)

| | login | id | node_id | avatar_url | gravatar_id | url | html_url | followers_url | following_url | gists_url | starred_url | subscriptions_url | organizations_url | repos_url | events_url | received_events_url | type | site_admin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | mojombo | 1 | MDQ6VXNlcjE= | https://avatars.githubusercontent.com/u/1?v=4 | | https://api.github.com/users/mojombo | https://github.com/mojombo | https://api.github.com/users/mojombo/followers | https://api.github.com/users/mojombo/following... | https://api.github.com/users/mojombo/gists{/gi... | https://api.github.com/users/mojombo/starred{/... | https://api.github.com/users/mojombo/subscript... | https://api.github.com/users/mojombo/orgs | https://api.github.com/users/mojombo/repos | https://api.github.com/users/mojombo/events{/p... | https://api.github.com/users/mojombo/received_... | User | False |
| 1 | defunkt | 2 | MDQ6VXNlcjI= | https://avatars.githubusercontent.com/u/2?v=4 | | https://api.github.com/users/defunkt | https://github.com/defunkt | https://api.github.com/users/defunkt/followers | https://api.github.com/users/defunkt/following... | https://api.github.com/users/defunkt/gists{/gi... | https://api.github.com/users/defunkt/starred{/... | https://api.github.com/users/defunkt/subscript... | https://api.github.com/users/defunkt/orgs | https://api.github.com/users/defunkt/repos | https://api.github.com/users/defunkt/events{/p... | https://api.github.com/users/defunkt/received_... | User | False |
| 2 | pjhyett | 3 | MDQ6VXNlcjM= | https://avatars.githubusercontent.com/u/3?v=4 | | https://api.github.com/users/pjhyett | https://github.com/pjhyett | https://api.github.com/users/pjhyett/followers | https://api.github.com/users/pjhyett/following... | https://api.github.com/users/pjhyett/gists{/gi... | https://api.github.com/users/pjhyett/starred{/... | https://api.github.com/users/pjhyett/subscript... | https://api.github.com/users/pjhyett/orgs | https://api.github.com/users/pjhyett/repos | https://api.github.com/users/pjhyett/events{/p... | https://api.github.com/users/pjhyett/received_... | User | False |
| 3 | wycats | 4 | MDQ6VXNlcjQ= | https://avatars.githubusercontent.com/u/4?v=4 | | https://api.github.com/users/wycats | https://github.com/wycats | https://api.github.com/users/wycats/followers | https://api.github.com/users/wycats/following{... | https://api.github.com/users/wycats/gists{/gis... | https://api.github.com/users/wycats/starred{/o... | https://api.github.com/users/wycats/subscriptions | https://api.github.com/users/wycats/orgs | https://api.github.com/users/wycats/repos | https://api.github.com/users/wycats/events{/pr... | https://api.github.com/users/wycats/received_e... | User | False |
| 4 | ezmobius | 5 | MDQ6VXNlcjU= | https://avatars.githubusercontent.com/u/5?v=4 | | https://api.github.com/users/ezmobius | https://github.com/ezmobius | https://api.github.com/users/ezmobius/followers | https://api.github.com/users/ezmobius/followin... | https://api.github.com/users/ezmobius/gists{/g... | https://api.github.com/users/ezmobius/starred{... | https://api.github.com/users/ezmobius/subscrip... | https://api.github.com/users/ezmobius/orgs | https://api.github.com/users/ezmobius/repos | https://api.github.com/users/ezmobius/events{/... | https://api.github.com/users/ezmobius/received... | User | False |


### Step 4: Further transformations and loading

From this point, we can apply further custom transformations to the data and persist the result in a DBMS such as Postgres, MySQL, or DuckDB using the SQLAlchemy library.
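
As a rough illustration of the load step, the sketch below writes a small DataFrame to a table through a SQLAlchemy engine. Several parts are assumptions made for the example: it uses an in-memory SQLite database so that it runs without a server (the connection URL would be replaced for Postgres, MySQL, or DuckDB, with the matching driver installed), the table name `github_users` is hypothetical, and the rows are made-up stand-ins for the parsed data.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite keeps the sketch self-contained;
# replace the URL with the connection string of the target DBMS.
engine = create_engine("sqlite:///:memory:")

# Toy frame standing in for the output of parse_df (made-up values)
users = pd.DataFrame(
    {"id": [1, 2], "login": ["user_a", "user_b"], "site_admin": [False, False]}
)

# Load step: write the frame to a table, replacing it if it already exists
users.to_sql("github_users", engine, if_exists="replace", index=False)

# Read back a row count to confirm the write
with engine.connect() as conn:
    row_count = conn.execute(text("SELECT COUNT(*) FROM github_users")).scalar()
print(row_count)
```

Using `if_exists="replace"` suits a full batch reload; `if_exists="append"` would be the choice for incremental loads, at the cost of handling duplicates separately.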