# Data Engineer - ETL Assignment


## Objectives

In this notebook you will:

*   Run the ETL process
*   Extract bank and market cap data from the JSON file `bank_market_cap.json`
*   Transform the market cap currency using the exchange rate data
*   Load the transformed data into a separate CSV file


In [None]:
%pip install pandas
%pip install requests

## Imports

Import any additional libraries you may need here.


In [None]:
import glob
import pandas as pd
from datetime import datetime

Because exchange rates fluctuate, everyone will download the same dataset to make marking simpler. It has the same format as the dataset you used in the previous section.


In [None]:
%pip install wget --quiet


In [None]:
# Downloading the data files
#!python -m wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0221EN-SkillsNetwork/labs/module%206/Lab%20-%20Extract%20Transform%20Load/data/bank_market_cap_1.json
#!python -m wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0221EN-SkillsNetwork/labs/module%206/Lab%20-%20Extract%20Transform%20Load/data/bank_market_cap_2.json
#!python -m wget https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0221EN-SkillsNetwork/labs/module%206/Final%20Assignment/exchange_rates.csv

## Extract


### JSON Extract Function

This function will extract JSON files.


In [None]:
def extract_from_json(file_to_process):
    dataframe = pd.read_json(file_to_process)
    return dataframe
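
To see what `pd.read_json` returns without touching the downloaded files, the sketch below reads a small in-memory JSON document. The two records are hypothetical sample data; it assumes the real file is a list of records with the same two keys, which may not hold exactly.

```python
import io

import pandas as pd

# Hypothetical sample records; the real file's contents may differ.
sample_json = io.StringIO(
    '[{"Name": "Bank A", "Market Cap (US$ Billion)": 100.5},'
    ' {"Name": "Bank B", "Market Cap (US$ Billion)": 80.25}]'
)

# read_json accepts a path or a file-like object and returns a dataframe
sample_df = pd.read_json(sample_json)
print(sample_df.columns.tolist())
print(len(sample_df))
```

Each JSON key becomes a column and each record becomes a row.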

### Extract Function

Define the extract function that finds the JSON file `bank_market_cap_1.json` and calls the function created above to extract its data. Store the data in a `pandas` dataframe. Use the following list for the columns.


In [None]:
n_columns=['Name','Market Cap (US$ Billion)']

In [None]:
def extract():
    # Read the single source file; its columns should match n_columns
    df = extract_from_json('all_data/bank_market_cap_1.json')
    return df
    
#extract()
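
If more than one matching file had to be processed, the already imported `glob` module could locate them. The following is only a sketch under that assumption: `extract_all`, the file pattern, and the sample records are hypothetical and not part of the assignment.

```python
import glob
import json
import os
import tempfile

import pandas as pd

columns = ['Name', 'Market Cap (US$ Billion)']

def extract_all(folder, pattern='bank_market_cap_*.json'):
    """Read every JSON file in `folder` matching `pattern` into one dataframe."""
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    frames = [pd.read_json(path) for path in paths]
    if not frames:
        # Nothing matched: return an empty dataframe with the expected columns
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)[columns]

# Write two hypothetical sample files to a temporary folder and read them back
with tempfile.TemporaryDirectory() as tmp:
    samples = [
        [{"Name": "Bank A", "Market Cap (US$ Billion)": 100.5}],
        [{"Name": "Bank B", "Market Cap (US$ Billion)": 80.25}],
    ]
    for i, rows in enumerate(samples, start=1):
        with open(os.path.join(tmp, f'bank_market_cap_{i}.json'), 'w') as f:
            json.dump(rows, f)
    combined = extract_all(tmp)

print(combined)
```

Sorting the paths keeps the row order stable between runs, since `glob` does not guarantee an order.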
    

<b>Question 1</b> Load the file <code>exchange_rates.csv</code> as a dataframe, find the exchange rate for British pounds (symbol <code>GBP</code>), and store it in the variable <code>exchange_rate</code>. You will be asked for this number. Hint: set the parameter <code>index_col</code> to 0.


In [None]:
# Write your code here
df = pd.read_csv('all_data/exchange_rates.csv', index_col=0)
df



In [None]:
# Look the rate up by currency symbol rather than by row position
exchange_rate = df.loc['GBP', 'Rates']
print(exchange_rate)
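
Looking a rate up by its currency symbol tends to be more robust than relying on a row position, which changes if the file is reordered. A minimal sketch on in-memory data, assuming the first column holds the symbols and the rate column is named `Rates`; the numbers are placeholders, not real exchange rates.

```python
import io

import pandas as pd

# Placeholder values for illustration only; these are not real exchange rates.
sample_csv = io.StringIO(",Rates\nEUR,0.9\nGBP,0.75\nJPY,110.0\n")

# index_col=0 turns the first column (the currency symbols) into the index
rates = pd.read_csv(sample_csv, index_col=0)

# Label-based lookup: row 'GBP', column 'Rates'
sample_rate = rates.loc['GBP', 'Rates']
print(sample_rate)
```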

## Transform

Using <code>exchange_rate</code> and the `exchange_rates.csv` file, find the exchange rate of USD to GBP. Write a transform function that:

1.  Changes the `Market Cap (US$ Billion)` column from USD to GBP
2.  Rounds the `Market Cap (US$ Billion)` column to 3 decimal places
3.  Renames `Market Cap (US$ Billion)` to `Market Cap (GBP$ Billion)`


In [None]:
def transform(ex_rate, file):
    # Read the extracted data (market cap in US$ billions)
    df = pd.read_json(file)
    # Convert to GBP and round to 3 decimal places
    df['Market Cap (US$ Billion)'] = (df['Market Cap (US$ Billion)'] * ex_rate).round(3)
    # Rename the column to reflect the new currency
    df = df.rename(columns={'Market Cap (US$ Billion)': 'Market Cap (GBP$ Billion)'})
    return df

#transform(exchange_rate, 'all_data/bank_market_cap_1.json')
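
The three transform steps can also be checked on a small in-memory dataframe. This is a sketch, not the required solution: `to_gbp` is a hypothetical helper that takes a dataframe instead of a file path, and both the sample rows and the rate of 0.75 are made up.

```python
import pandas as pd

USD_COL = 'Market Cap (US$ Billion)'
GBP_COL = 'Market Cap (GBP$ Billion)'

def to_gbp(df, rate):
    """Return a converted copy; the input dataframe is left unchanged."""
    out = df.copy()
    # Steps 1 and 2: convert USD to GBP, then round to 3 decimal places
    out[USD_COL] = (out[USD_COL] * rate).round(3)
    # Step 3: rename the column to reflect the new currency
    return out.rename(columns={USD_COL: GBP_COL})

# Hypothetical sample rows and a placeholder rate
sample = pd.DataFrame({'Name': ['Bank A', 'Bank B'], USD_COL: [100.0, 80.5]})
converted = to_gbp(sample, 0.75)
print(converted)
```

Working on a copy keeps the original dataframe available for re-running the cell with a different rate.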
 

    

## Load

Create a function that takes a dataframe and loads it into a CSV file named `bank_market_cap_gbp.csv`. Make sure to set `index` to `False`.


In [None]:
def load(targetfile,data_to_load):
    data_to_load.to_csv(targetfile, index=False) 

#load('all_data/output/bank_market_cap_gbp.csv', transform(exchange_rate, 'all_data/bank_market_cap_1.json'))
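
To confirm what `index=False` does, the sketch below writes a small dataframe to a temporary folder and reads it back. The sample rows are hypothetical, and the temporary folder stands in for `all_data/output`.

```python
import os
import tempfile

import pandas as pd

def load(targetfile, data_to_load):
    # index=False leaves the row index out of the CSV file
    data_to_load.to_csv(targetfile, index=False)

# Hypothetical sample rows
sample = pd.DataFrame({
    'Name': ['Bank A', 'Bank B'],
    'Market Cap (GBP$ Billion)': [75.0, 60.375],
})

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'bank_market_cap_gbp.csv')
    load(target, sample)
    # Inspect the header line, then read the whole file back
    with open(target) as f:
        header = f.readline().strip()
    reloaded = pd.read_csv(target)

print(header)
print(reloaded)
```

Note that `to_csv` does not create missing folders, so an output folder such as `all_data/output` has to exist before the call, for example via `os.makedirs(..., exist_ok=True)`.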

    

## Logging Function


Write the logging function <code>log</code> to log your data:


In [None]:
def log(message):
    timestamp_format = '%H:%M:%S-%b-%d-%Y' # Hour-Minute-Second-MonthName-Day-Year
    now = datetime.now() # get current timestamp
    timestamp = now.strftime(timestamp_format)
    with open("all_data/logs/marketCap_logfile.txt","a") as f:
        f.write(timestamp + ',' + message + '\n') 
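
The log format can be exercised without writing into `all_data/logs`. In this sketch, `log_to` is a hypothetical variant that takes the log path as a parameter, and the timestamp uses `%b`, the documented `strftime` directive for the abbreviated month name.

```python
import os
import tempfile
from datetime import datetime

TIMESTAMP_FORMAT = '%H:%M:%S-%b-%d-%Y'  # Hour-Minute-Second-MonthName-Day-Year

def log_to(path, message):
    """Append one 'timestamp,message' line to the file at `path`."""
    timestamp = datetime.now().strftime(TIMESTAMP_FORMAT)
    with open(path, 'a') as f:
        f.write(timestamp + ',' + message + '\n')

with tempfile.TemporaryDirectory() as tmp:
    logfile = os.path.join(tmp, 'logs', 'marketCap_logfile.txt')
    # open() does not create missing folders, so create the log folder first
    os.makedirs(os.path.dirname(logfile), exist_ok=True)
    log_to(logfile, 'ETL Job Started')
    log_to(logfile, 'Extract phase Started')
    with open(logfile) as f:
        lines = f.read().splitlines()

# Split on the first comma only, so messages may contain commas themselves
stamp, message = lines[0].split(',', 1)
parsed = datetime.strptime(stamp, TIMESTAMP_FORMAT)
print(message, parsed)
```

Opening the file in append mode (`'a'`) keeps earlier entries, so each run adds to the log rather than overwriting it.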
    

## Running the ETL Process


Log the start of the process using the messages <code>"ETL Job Started"</code> and <code>"Extract phase Started"</code>.


In [None]:
# Write your code here
log('ETL Job Started')
log('Extract phase Started')


### Extract


<b>Question 2</b> Use the function <code>extract</code> and print the first 5 rows:


In [None]:
# Call the function here
df = extract()
# Print the rows here
df.head()

Log the following: <code>"Extract phase Ended"</code>.


In [None]:
# Write your code here
log('Extract phase Ended')

### Transform


Log the following: <code>"Transform phase Started"</code>.


In [None]:
# Write your code here
log('Transform phase Started')

<b>Question 3</b> Use the function <code>transform</code> and print the first 5 rows of the output:


In [None]:
# Call the function here
df = transform(exchange_rate, 'all_data/bank_market_cap_1.json')
# Print the first 5 rows here
df.head()

Log the following: <code>"Transform phase Ended"</code>.


In [None]:
# Write your code here
log("Transform phase Ended")

### Load


Log the following `"Load phase Started"`.


In [None]:
# Write your code here
log("Load phase Started")

Call the load function.


In [None]:
# Write your code here
# Reuse the dataframe produced in the transform phase instead of transforming again
load('all_data/output/bank_market_cap_gbp.csv', df)

Log the following `"Load phase Ended"`.


In [None]:
# Write your code here
log("Load phase Ended")