In this lab you will:

*   Run the ETL process
*   Extract bank and market cap data from the JSON file `bank_market_cap_1.json`
*   Transform the market cap currency using the exchange rate data
*   Load the transformed data into a separate CSV file

## Setup

### Install libraries

In [None]:
!pip install pandas==1.3.3
!pip install requests==2.26.0

### Import libraries

In [4]:
import os
import glob
import pandas as pd
from datetime import datetime

## Extract

In [22]:
def extract(json_file, cols=('Name', 'Market Cap (US$ Billion)')):
    # Read the JSON file into a dataframe
    dataframe = pd.read_json(json_file)
    # Keep only the requested columns
    extracted_data = pd.DataFrame(dataframe, columns=list(cols))
    return extracted_data

In [23]:
json_file = os.path.join("data", "bank_market_cap_1.json")

extracted_data = extract(json_file)
extracted_data.head()

| | Name | Market Cap (US$ Billion) |
|---|---|---|
| 0 | JPMorgan Chase | 390.934 |
| 1 | Industrial and Commercial Bank of China | 345.214 |
| 2 | Bank of America | 325.331 |
| 3 | Wells Fargo | 308.013 |
| 4 | China Construction Bank | 257.399 |
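
To illustrate how the column selection in `extract` behaves, here is a minimal sketch that reads a small in-memory JSON document instead of the lab file. The sample records, including the extra `Country` field, are hypothetical; the real layout of `bank_market_cap_1.json` is not shown here and may differ.

```python
import io
import pandas as pd

# Hypothetical sample records; the real JSON file may be structured differently.
sample_json = io.StringIO(
    '[{"Name": "Bank A", "Market Cap (US$ Billion)": 100.5, "Country": "X"},'
    ' {"Name": "Bank B", "Market Cap (US$ Billion)": 50.25, "Country": "Y"}]'
)

raw = pd.read_json(sample_json)

# Passing `columns` keeps only the listed columns, as extract() does.
subset = pd.DataFrame(raw, columns=["Name", "Market Cap (US$ Billion)"])
print(subset)
```

Columns that are not listed, such as the made-up `Country` field, are dropped from the result.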


Load the `exchange_rates.csv` data into a dataframe, find the exchange rate for British pounds (symbol `GBP`), and store it in the variable `exchange_rate`. Hint: set the `index_col` parameter to 0 so that the currency symbols become the row index.

In [29]:
def get_exchange_rate(targetfile):
    dataframe = pd.read_csv(targetfile, index_col=0)
    exchange_rate = dataframe.loc['GBP','Rates']
    return exchange_rate

In [31]:
exchg_rate_file = os.path.join("data", "exchange_rates.csv")

exchange_rate = get_exchange_rate(exchg_rate_file)
exchange_rate

0.7323984208000001
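
The effect of `index_col=0` can be seen on a small in-memory CSV. This is only a sketch: the rates below are made up, and the sample assumes the same shape the lookup above relies on (currency symbols in the first column, values in a `Rates` column).

```python
import io
import pandas as pd

# Made-up rates for illustration; not the contents of exchange_rates.csv.
sample_csv = io.StringIO(",Rates\nUSD,1.0\nGBP,0.75\nEUR,0.9\n")

# index_col=0 turns the first column (the currency symbols) into the row index,
# so a rate can be looked up by label with .loc[row, column].
rates = pd.read_csv(sample_csv, index_col=0)
gbp_rate = rates.loc["GBP", "Rates"]
print(gbp_rate)  # 0.75
```

Without `index_col=0`, the symbols would be an ordinary column and `rates.loc["GBP", "Rates"]` would raise a `KeyError`.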

## Transform

Use the USD-to-GBP rate stored in `exchange_rate` (read from `exchange_rates.csv`) to write a `transform` function that:

1.  Converts the `Market Cap (US$ Billion)` column from USD to GBP
2.  Rounds the `Market Cap (US$ Billion)` column to 3 decimal places
3.  Renames `Market Cap (US$ Billion)` to `Market Cap (GBP$ Billion)`

In [26]:
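def transform(df, USD_2_GBP):
    # Work on a copy so the extracted dataframe is left unchanged
    df = df.copy()
    # Convert USD to GBP and round to 3 decimal places
    df['Market Cap (US$ Billion)'] = df['Market Cap (US$ Billion)'].multiply(USD_2_GBP).round(3)
    # Rename the column to reflect the new currency
    df = df.rename(columns={'Market Cap (US$ Billion)':'Market Cap (GBP$ Billion)'})
    return df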

In [27]:
transformed_data = transform(extracted_data, exchange_rate)
transformed_data.head()

| | Name | Market Cap (GBP$ Billion) |
|---|---|---|
| 0 | JPMorgan Chase | 286.319 |
| 1 | Industrial and Commercial Bank of China | 252.834 |
| 2 | Bank of America | 238.272 |
| 3 | Wells Fargo | 225.588 |
| 4 | China Construction Bank | 188.519 |
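
As a sanity check, the first transformed row can be reproduced by hand from the numbers shown earlier: the JPMorgan Chase market cap from the extract output and the GBP rate returned by `get_exchange_rate`. The variable names below are illustrative only.

```python
usd_market_cap = 390.934          # JPMorgan Chase, from the extract output
usd_to_gbp = 0.7323984208000001   # GBP rate from exchange_rates.csv

# Multiply by the rate, then round to 3 decimal places.
gbp_market_cap = round(usd_market_cap * usd_to_gbp, 3)
print(gbp_market_cap)  # 286.319
```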


## Load

Create a function that takes a dataframe and loads it into a CSV file named `bank_market_cap_gbp.csv`. Make sure to set `index` to `False`.

In [18]:
def load(df, save_path):
    df.to_csv(save_path, index=False)
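
A small round trip shows why `index=False` matters. This sketch writes a made-up two-row dataframe to a temporary directory rather than to the lab's `data` folder; the file name `sample.csv` and the values are assumptions for illustration.

```python
import os
import tempfile
import pandas as pd

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "Name": ["Bank A", "Bank B"],
    "Market Cap (GBP$ Billion)": [75.5, 37.25],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")
    sample.to_csv(path, index=False)   # no row-index column is written

    # Read the raw header line, then load the file back with pandas.
    with open(path) as f:
        header = f.readline().strip()
    reloaded = pd.read_csv(path)

print(header)
print(list(reloaded.columns))
```

With the default `index=True`, the file would start with an empty header field, and reading it back with `pd.read_csv` would produce an extra `Unnamed: 0` column.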

## Logging Function

Write the logging function `log`, which appends a timestamped message to a log file.

In [20]:
def log(message, save_path="log.txt"):
    timestamp_format = '%Y-%b-%d-%H:%M:%S'
    # Year-AbbreviatedMonthName-Day-Hour:Minute:Second
    now = datetime.now() # get current timestamp
    timestamp = now.strftime(timestamp_format)
    with open(save_path, "a") as f:
        f.write(timestamp + ',' + message + '\n')
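
To show what a log line looks like and how it can be parsed back, here is a sketch that formats a fixed, made-up moment instead of the current time. It uses `%b`, the documented directive for the abbreviated month name; the month text depends on the active locale.

```python
from datetime import datetime

timestamp_format = '%Y-%b-%d-%H:%M:%S'

# Fixed, made-up moment so the example is reproducible.
moment = datetime(2021, 9, 15, 14, 30, 5)
line = moment.strftime(timestamp_format) + ',' + "ETL Job Started"
print(line)

# Split on the first comma only, in case the message itself contains commas.
timestamp_text, message = line.split(',', 1)
parsed = datetime.strptime(timestamp_text, timestamp_format)
```

Because the same format string is used for writing and parsing, `parsed` equals the original `moment`.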

## Running the ETL Process

In [32]:
log("ETL Job Started")

# Extract
log("Extract phase Started")
json_file = os.path.join("data", "bank_market_cap_1.json")
extracted_data = extract(json_file)
display(extracted_data.head())
log("Extract phase Ended")

# Transform
log("Transform phase Started")
exchg_rate_file = os.path.join("data", "exchange_rates.csv")
exchange_rate = get_exchange_rate(exchg_rate_file)
transformed_data = transform(extracted_data, exchange_rate)
display(transformed_data.head())
log("Transform phase Ended")

# Load
log("Load phase Started")
save_data_path = os.path.join("data", "bank_market_cap_gbp.csv")
load(transformed_data, save_data_path)
log("Load phase Ended")

| | Name | Market Cap (US$ Billion) |
|---|---|---|
| 0 | JPMorgan Chase | 390.934 |
| 1 | Industrial and Commercial Bank of China | 345.214 |
| 2 | Bank of America | 325.331 |
| 3 | Wells Fargo | 308.013 |
| 4 | China Construction Bank | 257.399 |


| | Name | Market Cap (GBP$ Billion) |
|---|---|---|
| 0 | JPMorgan Chase | 286.319 |
| 1 | Industrial and Commercial Bank of China | 252.834 |
| 2 | Bank of America | 238.272 |
| 3 | Wells Fargo | 225.588 |
| 4 | China Construction Bank | 188.519 |
