<p style="font-size:24px;">Python ETL for Currency Exchange Data</p>


**Description:**  
This notebook demonstrates a Python-based ETL pipeline to retrieve, process, and merge currency exchange rate data. It includes functions to:

- Fetch the latest exchange rates from an API.  
- Retrieve historic exchange rate data based on a configurable number of days in the past.  
- Merge the latest and historic datasets while handling duplicates.  
- Convert currency rates to GBP, leaving the converted value empty when no GBP rate is available for a date.
- Save the final dataset to a CSV file.

This notebook serves as a practical example for data engineering tasks, showcasing API integration, data transformation with pandas, and basic error handling through HTTP status checks.

In [17]:

# We will use the API from https://exchangeratesapi.io/
# Sign up at this link to get a free API key

In [18]:
# What are APIs in data engineering?
# APIs (Application Programming Interfaces) in data engineering provide 
# a standardized way for systems to communicate and exchange data. 
# They are often used to fetch, update, or push data between various 
# software systems, making it easier to integrate and automate 
# data workflows such as ETL processes.
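To make this concrete, an API usually returns JSON text that the client parses into Python objects. The following is a minimal sketch, assuming a payload that carries only the `date` and `rates` fields this notebook reads; a real response may include more fields. The values are copied from the Step 1 output in this notebook.

```python
import json

# Illustrative payload: only the "date" and "rates" fields are assumed here;
# a real response may contain additional fields.
sample_body = '{"date": "2025-04-21", "rates": {"AED": 4.177106, "AUD": 1.780172}}'

# json.loads turns the JSON text into nested Python dicts, strings, and floats
payload = json.loads(sample_body)

# Flatten the rates mapping into one record per currency
records = [
    {"currency": code, "rate": rate, "date": payload["date"]}
    for code, rate in payload["rates"].items()
]

for record in records:
    print(record)
```

Each record has the same three fields (`currency`, `rate`, `date`) as the DataFrames built in the extraction steps.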


In [19]:
# Import the libraries used in this notebook
import pandas as pd  # data manipulation
import requests  # HTTP client for calling web APIs
import json  # parsing and creating JSON-formatted data
from datetime import datetime, timedelta  # date and time arithmetic

In [20]:
# this function fetches the latest exchange rate data from the API 

def latest_extraction():
    """
    Fetches the latest exchange rate data from the API and returns it as a pandas DataFrame.
    """
    url = "http://api.exchangeratesapi.io/v1/latest?access_key=6e45c8ce217f5beca6893ac6968c5c2f"
    # Send an HTTP GET request to the API endpoint
    response = requests.get(url)
    response.raise_for_status()  # Raise an error if the request failed
    # Parse the JSON response body into a Python dictionary
    data = response.json()
    # Convert the rates dictionary into a DataFrame with one row per currency
    df = pd.DataFrame(data["rates"].items(), columns=["currency", "rate"])
    # Add a new column to the DataFrame with the date of the exchange rates
    df["date"] = data["date"]
    return df

# Step 1: Extract latest exchange rates and display the DataFrame
print("Step 1: Extracting Latest Exchange Rates")
latest_df = latest_extraction()
display(latest_df.head(10))


Step 1: Extracting Latest Exchange Rates


index,currency,rate,date
0,AED,4.177106,2025-04-21
1,AFN,81.88165,2025-04-21
2,ALL,99.252011,2025-04-21
3,AMD,444.591336,2025-04-21
4,ANG,2.049629,2025-04-21
5,AOA,1037.159477,2025-04-21
6,ARS,1294.140508,2025-04-21
7,AUD,1.780172,2025-04-21
8,AWG,2.047025,2025-04-21
9,AZN,1.929509,2025-04-21
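The extraction function embeds the access key directly in the URL. One common alternative, sketched here under assumptions, reads the key from an environment variable and passes it as a query parameter. The variable name `EXCHANGERATES_API_KEY` and the helpers `get_api_key` and `build_request` are hypothetical names chosen for illustration, not part of the API or of this notebook.

```python
import os

BASE_URL = "http://api.exchangeratesapi.io/v1"
API_KEY_VAR = "EXCHANGERATES_API_KEY"  # hypothetical variable name


def get_api_key(env=None):
    """Read the API key from the environment (or from a dict, for testing)."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_VAR)
    if not key:
        raise RuntimeError(f"Set the {API_KEY_VAR} environment variable first")
    return key


def build_request(endpoint, env=None):
    """Return the URL and query parameters for an endpoint such as 'latest'."""
    return f"{BASE_URL}/{endpoint}", {"access_key": get_api_key(env)}


# Demonstration with a placeholder key supplied through a plain dict
url, params = build_request("latest", env={API_KEY_VAR: "demo-key"})
print(url)
```

The returned pair could then be passed along as `requests.get(url, params=params, timeout=10)`; the 10-second timeout is an arbitrary choice.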


In [21]:
# this function fetches historic exchange rates from the API and returns them as a pandas DataFrame

def historic_extraction(n_days=10):
    """
    Fetches exchange rate data from n_days ago from the API and returns it as a pandas DataFrame.
    
    Parameters:
        n_days (int): Number of days ago for which to fetch historic exchange rates.
        
    Returns:
        pd.DataFrame: A DataFrame containing currency rates and the corresponding date.
    """
    # Calculate the date n_days ago and format it as 'YYYY-MM-DD'
    date_n_days_ago = (datetime.now() - timedelta(days=n_days)).strftime('%Y-%m-%d')
    # Construct the API URL using the calculated date
    url = f"http://api.exchangeratesapi.io/v1/{date_n_days_ago}?access_key=6e45c8ce217f5beca6893ac6968c5c2f"
    # Send an HTTP GET request to the API endpoint
    response = requests.get(url)
    # Raise an error if the HTTP request failed (e.g., 404, 500)
    response.raise_for_status()
    # Parse the JSON response into a Python dictionary
    data = response.json()
    # Convert the rates dictionary from the API response into a pandas DataFrame,
    # where each item becomes a row with two columns: 'currency' and 'rate'
    df = pd.DataFrame(data["rates"].items(), columns=["currency", "rate"])
    # Add a new column 'date' to the DataFrame with the date included in the API response
    df["date"] = data["date"]
    
    return df

# Step 2: Extract historic exchange rates (10 days ago by default) and display the DataFrame
print("\nStep 2: Extracting Historic Exchange Rates")
historic_df = historic_extraction(n_days=10)
display(historic_df.head(10))



Step 2: Extracting Historic Exchange Rates


index,currency,rate,date
0,AED,4.172469,2025-04-11
1,AFN,81.226466,2025-04-11
2,ALL,100.310777,2025-04-11
3,AMD,444.244667,2025-04-11
4,ANG,2.03356,2025-04-11
5,AOA,1042.821867,2025-04-11
6,ARS,1220.13733,2025-04-11
7,AUD,1.807145,2025-04-11
8,AWG,2.044748,2025-04-11
9,AZN,1.935661,2025-04-11
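The date arithmetic behind the historic request can be checked in isolation. This is a small sketch, assuming a hypothetical `today` parameter so the result can be reproduced; `date_n_days_ago` and `last_n_dates` are illustrative helpers, not functions from this notebook.

```python
from datetime import date, timedelta


def date_n_days_ago(n_days, today=None):
    """Return the date n_days before today, formatted as 'YYYY-MM-DD'."""
    today = date.today() if today is None else today
    return (today - timedelta(days=n_days)).strftime("%Y-%m-%d")


def last_n_dates(n_days, today=None):
    """Return the formatted dates for each of the previous n_days days."""
    return [date_n_days_ago(n, today) for n in range(1, n_days + 1)]


# Fixing the reference day to the date of the latest rates shown in Step 1
reference_day = date(2025, 4, 21)
print(date_n_days_ago(10, today=reference_day))
print(last_n_dates(3, today=reference_day))
```

With the reference day fixed at 2025-04-21, ten days back gives 2025-04-11, which matches the date in the Step 2 output.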


In [22]:
# this function merges the latest and historic exchange rate data 

def merge_latest_and_historic(latest_df, historic_df):
    # Concatenate both DataFrames, resetting the index in the combined DataFrame
    result = pd.concat([latest_df, historic_df], ignore_index=True)
    # Sort the combined DataFrame by the 'currency' column in ascending order
    # and reset the index after sorting
    exchange_rates = result.sort_values(by='currency', ascending=True, ignore_index=True)  # noqa: E501
    # Drop duplicate records from the DataFrame (if any).
    # keep=False removes every copy of a duplicated row, not just later occurrences
    exchange_rates = exchange_rates.drop_duplicates(keep=False)

    return exchange_rates

# Step 3: Merge the latest and historic data and display the merged DataFrame
print("\nStep 3: Merging Latest and Historic Data")
merged_df = merge_latest_and_historic(latest_df, historic_df) 
display(merged_df.head(10))



Step 3: Merging Latest and Historic Data


index,currency,rate,date
0,AED,4.177106,2025-04-21
1,AED,4.172469,2025-04-11
2,AFN,81.226466,2025-04-11
3,AFN,81.88165,2025-04-21
4,ALL,99.252011,2025-04-21
5,ALL,100.310777,2025-04-11
6,AMD,444.591336,2025-04-21
7,AMD,444.244667,2025-04-11
8,ANG,2.049629,2025-04-21
9,ANG,2.03356,2025-04-11
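The merge step calls `drop_duplicates(keep=False)`, which behaves differently from the pandas default. The sketch below compares the two on a small illustrative frame in which the AED row is deliberately repeated; the repeated row is an assumption made for the demonstration and does not occur in the outputs above.

```python
import pandas as pd

latest = pd.DataFrame({
    "currency": ["AED", "AFN"],
    "rate": [4.177106, 81.88165],
    "date": ["2025-04-21", "2025-04-21"],
})
# Illustrative case: the second frame repeats the AED row exactly
historic = pd.DataFrame({
    "currency": ["AED", "AFN"],
    "rate": [4.177106, 81.226466],
    "date": ["2025-04-21", "2025-04-11"],
})

combined = pd.concat([latest, historic], ignore_index=True)

# keep=False removes every copy of a duplicated row
drop_all = combined.drop_duplicates(keep=False)
# The default (keep="first") retains one copy of each duplicated row
keep_one = combined.drop_duplicates()

print(len(drop_all), len(keep_one))
```

With `keep=False` the AED rate disappears from the result entirely, while the default keeps one copy. Which behaviour is appropriate depends on whether a row present in both inputs should survive the merge.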


In [23]:

# this function converts currency rates into GBP 
def add_gbp_conversion(df):
    """
    Adds a new column 'rate_in_GBP' to the DataFrame by re-expressing each currency's rate relative to GBP.
    Each rate is divided by the GBP rate for the same date, giving units of that currency per 1 GBP.
    
    Parameters:
        df (pd.DataFrame): Merged DataFrame with columns 'currency', 'rate', and 'date'.
        
    Returns:
        pd.DataFrame: The input DataFrame with an additional 'rate_in_GBP' column.
    """
    # Create a Series that maps each date to its GBP rate.
    # First, filter rows where the currency equals "GBP".
    # Drop duplicate rows for a given date in case more than one exists.
    # Then, set the 'date' column as the index and extract the 'rate' column.
    gbp_rates = (
        df.loc[df["currency"] == "GBP", ["date", "rate"]]
        .drop_duplicates(subset="date")
        .set_index("date")["rate"]
    )
    
    # Define an inner function that performs the conversion for a single row.
    def convert_row(row):
        # Get the GBP rate corresponding to the row's date.
        gbp_rate = gbp_rates.get(row["date"])
        # If a GBP rate is found, convert the rate; otherwise return None.
        return row["rate"] / gbp_rate if gbp_rate else None

    # Apply the conversion function to each row in the DataFrame,
    # creating a new column 'rate_in_GBP' with the converted values.
    df["rate_in_GBP"] = df.apply(convert_row, axis=1)
    return df

# print message 
print("\nAdding 'rate_in_GBP' column to the merged DataFrame")
# apply the function to the merged dataframe to add the new column 
merged_df = add_gbp_conversion(merged_df)
# display the first 10 rows of the updated dataframe 
display(merged_df.head(10))




Adding 'rate_in_GBP' column to the merged DataFrame


index,currency,rate,date,rate_in_GBP
0,AED,4.177106,2025-04-21,4.872465
1,AED,4.172469,2025-04-11,4.80397
2,AFN,81.226466,2025-04-11,93.520051
3,AFN,81.88165,2025-04-21,95.512418
4,ALL,99.252011,2025-04-21,115.774408
5,ALL,100.310777,2025-04-11,115.492763
6,AMD,444.591336,2025-04-21,518.602075
7,AMD,444.244667,2025-04-11,511.480874
8,ANG,2.049629,2025-04-21,2.390829
9,ANG,2.03356,2025-04-11,2.341338
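The conversion above applies a Python function row by row. A vectorized alternative, sketched here under assumptions, maps each row's date to its GBP rate and divides whole columns at once. The function name `add_gbp_conversion_vectorized` and the sample values are illustrative only and are not taken from the API.

```python
import pandas as pd


def add_gbp_conversion_vectorized(df):
    """Return a copy of df with a 'rate_in_GBP' column computed column-wise."""
    # Series mapping each date to its GBP rate
    gbp_rates = (
        df.loc[df["currency"] == "GBP", ["date", "rate"]]
        .drop_duplicates(subset="date")
        .set_index("date")["rate"]
    )
    out = df.copy()
    # Look up the GBP rate for every row's date, then divide the two columns;
    # dates with no GBP rate map to NaN, so the result is NaN as well
    out["rate_in_GBP"] = out["rate"] / out["date"].map(gbp_rates)
    return out


# Made-up sample: the second date has no GBP row
sample = pd.DataFrame({
    "currency": ["GBP", "USD", "USD"],
    "rate": [0.8, 1.2, 1.1],
    "date": ["2025-01-01", "2025-01-01", "2025-01-02"],
})
converted = add_gbp_conversion_vectorized(sample)
print(converted)
```

Two differences from the row-wise version are worth noting: this sketch returns a copy instead of modifying its input, and a GBP rate of zero would produce an infinite value rather than a missing one.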


In [24]:
# this function loads the final dataframe into a CSV file 

def load_data(df, output_file='final_exchange_rates.csv', index=False):
    """
    Saves the DataFrame to a CSV file.
    
    Parameters:
        df (pd.DataFrame): The DataFrame to be saved.
        output_file (str): The name or path to the CSV file. Defaults to 'final_exchange_rates.csv'.
        index (bool): If True, the DataFrame index will be written to the file. Defaults to False.
    
    Returns:
        None
    """
    df.to_csv(output_file, index=index)
    print(f"Data successfully saved to {output_file}")

# Example usage: Save merged_df to a CSV file
load_data(merged_df)

Data successfully saved to final_exchange_rates.csv
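A quick way to confirm that a load step wrote what was intended is to read the file back and compare it with the original frame. This is a minimal sketch, assuming a temporary directory so that no file from this notebook is overwritten; the two rows are copied from the Step 1 output.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({
    "currency": ["AED", "AFN"],
    "rate": [4.177106, 81.88165],
    "date": ["2025-04-21", "2025-04-21"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "final_exchange_rates.csv")
    # Write without the index, as load_data does by default
    frame.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# Raises AssertionError if columns or values differ after the round trip
pd.testing.assert_frame_equal(frame, reloaded, check_dtype=False)
print(reloaded.shape)
```

Note that `read_csv` returns the `date` column as plain strings unless it is asked to parse dates, which matches how the column is stored in this notebook.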
