<center><h1>Data engineering project</h1></center>

# Part 3 - ETL


In this final part, we'll:

*   Run the ETL process
*   Extract bank and market cap data from the JSON file `bank_market_cap.json` prepared in part-1
*   Transform the market cap currency using the `exchange rate data` prepared in part-2
*   Load the transformed data into a separate CSV


In [None]:
# glob and datetime are part of the Python standard library, so they need no install
#!pip install pandas
#!pip install requests

## Imports

In [None]:
import glob
import pandas as pd
from datetime import datetime

## Extract


### JSON Extract Function

This function reads a JSON file into a `pandas` dataframe.


In [None]:
def extract_from_json(file_to_process):
    dataframe = pd.read_json(file_to_process)
    return dataframe
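
As a minimal sketch of what `pd.read_json` returns, the example below parses a small in-memory JSON document instead of the real file. The record layout and the bank names are assumptions made for illustration; the actual contents of `bank_market_cap.json` may be structured differently.

```python
import io

import pandas as pd

# Hypothetical sample records; the real file's contents are not shown here.
sample_json = io.StringIO(
    '[{"Name": "Bank A", "Market Cap (US$ Billion)": 100.5},'
    ' {"Name": "Bank B", "Market Cap (US$ Billion)": 50.25}]'
)

# read_json accepts a path or a file-like object and returns a dataframe
sample_df = pd.read_json(sample_json)

print(sample_df.columns.tolist())
print(len(sample_df))
```

Each JSON key becomes a column and each record becomes a row.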

### Extract Function

We'll define the extract function, which reads the JSON file `bank_market_cap.json` by calling the function created above and stores the result in a `pandas` dataframe.


In [None]:
def extract():
    # create an empty data frame to hold extracted data
    extracted_data = pd.DataFrame(columns=['Name','Market Cap (US$ Billion)'])
    # DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
    extracted_data = pd.concat([extracted_data, extract_from_json('bank_market_cap.json')], ignore_index=True)
    return extracted_data

Next, we'll load the file <code>exchange_rates.csv</code> prepared in part-2 as a dataframe, look up the exchange rate for British pounds (symbol <code>GBP</code>), and store it in the variable <code>exchange_rate</code>.

In [None]:
df = pd.read_csv('exchange_rates.csv', index_col=0)
# select row 'GBP' and column 'Rates' in a single .loc call
exchange_rate = df.loc['GBP', 'Rates']
exchange_rate
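
The lookup above can be sketched on an in-memory CSV. The layout (currency symbols in the first column, a `Rates` column) follows the code in this notebook, but the numbers are made up for illustration and are not real exchange rates.

```python
import io

import pandas as pd

# Hypothetical rates, for illustration only
sample_csv = io.StringIO(",Rates\nEUR,0.9\nGBP,0.75\n")

# index_col=0 turns the first column (the currency symbols) into the index
rates = pd.read_csv(sample_csv, index_col=0)

# label-based lookup: row 'GBP', column 'Rates'
gbp_rate = rates.loc["GBP", "Rates"]
print(gbp_rate)
```

Because the currency symbol is the index, `.loc` can address the rate directly by label without filtering rows.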

## Transform

Using the <code>exchange_rate</code> value read from `exchange_rates.csv` above as the USD to GBP rate, we'll write a transform function that

1.  Changes the `Market Cap (US$ Billion)` column from USD to GBP
2.  Rounds the `Market Cap (US$ Billion)` column to 3 decimal places
3.  Renames `Market Cap (US$ Billion)` to `Market Cap (GBP£ Billion)`


In [None]:
def transform(data, r):
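    # convert USD to GBP with rate r and round to 3 decimal places
    data['Market Cap (US$ Billion)'] = round((data['Market Cap (US$ Billion)'] * r),3)
    # rename the column to match the new currency (modifies data in place)
    data.rename(columns={'Market Cap (US$ Billion)':'Market Cap (GBP£ Billion)'},inplace=True)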
    return data
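
To make the three transform steps concrete, here is a small worked example on made-up values. `transform_copy` is a hypothetical variant written for this sketch: it applies the same convert, round and rename steps but works on a copy, so the input dataframe is left unchanged. The rate of 0.75 is an assumption, not a real exchange rate.

```python
import pandas as pd

USD_COL = "Market Cap (US$ Billion)"
GBP_COL = "Market Cap (GBP£ Billion)"


def transform_copy(data, rate):
    """Hypothetical non-mutating variant of the transform step."""
    out = data.copy()
    # steps 1 and 2: convert USD to GBP, then round to 3 decimal places
    out[USD_COL] = (out[USD_COL] * rate).round(3)
    # step 3: rename the column to reflect the new currency
    return out.rename(columns={USD_COL: GBP_COL})


# Made-up sample data
sample = pd.DataFrame({"Name": ["Bank A", "Bank B"], USD_COL: [200.0, 10.1234]})
converted = transform_copy(sample, 0.75)
print(converted)
```

With these inputs, 200.0 becomes 150.0 and 10.1234 becomes 7.593 (7.59255 rounded to 3 decimal places), while `sample` still holds the original USD column.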

## Load

We'll create a function that takes a dataframe and writes it to a CSV file named `bank_market_cap_gbp.csv`.


In [None]:
target_file = 'bank_market_cap_gbp.csv'
def load(target_file,data_to_load):
    data_to_load.to_csv(target_file,index=False) 
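
A quick way to check what `to_csv(..., index=False)` writes is a round trip through a temporary file. The sketch below uses made-up data and a hypothetical file name, `sample_output.csv`, so it does not touch the real output file.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data
sample = pd.DataFrame({"Name": ["Bank A"], "Market Cap (GBP£ Billion)": [150.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_output.csv")
    # index=False keeps the row index out of the file
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.columns.tolist())
```

Without `index=False`, the row index would be written as an extra unnamed first column and would show up again when the file is read back.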

## Logging Function


In [None]:
def log(message):
    timestamp_format = '%Y-%b-%d-%H:%M:%S' # Year-MonthAbbreviation-Day-Hour:Minute:Second (%b is portable; %h is not accepted on every platform)
    now = datetime.now() # get current timestamp
    timestamp = now.strftime(timestamp_format)
    with open("logfile.txt","a") as f:
        f.write(timestamp + ',' + message + '\n')
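
To show what a log line looks like, the sketch below formats a fixed, hypothetical timestamp instead of the current time. It assumes the `%b` directive for the abbreviated month name and the default C locale, where January is rendered as `Jan`.

```python
from datetime import datetime

timestamp_format = "%Y-%b-%d-%H:%M:%S"

# Hypothetical fixed timestamp so the result is reproducible
fixed_time = datetime(2024, 1, 5, 14, 30, 0)

# Same layout as a log entry: timestamp, comma, message
line = fixed_time.strftime(timestamp_format) + "," + "ETL Job Started"
print(line)  # 2024-Jan-05-14:30:00,ETL Job Started
```

Since the timestamp contains no commas, splitting each log line on the first comma recovers the timestamp and the message.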

## Running the ETL Process


In [None]:
log('ETL Job Started')
log('Extract phase Started')

### Extract


In [None]:
MarketCap = extract()
MarketCap.head()

In [None]:
log('Extract phase Ended')

### Transform


In [None]:
log("Transform phase Started")

In [None]:
transformed_data = transform(MarketCap,exchange_rate)
transformed_data.head()

In [None]:
log("Transform phase Ended")

### Load


In [None]:
log("Load phase Started")

In [None]:
load(target_file,transformed_data)

In [None]:
log("Load phase Ended")

In [None]:
!cat logfile.txt