2023WS_30310

#### Assignment 5
# The Performance of Numpy Versus Regular Python Lists when Computing a Loss Function

### Goal:
This assignment demonstrates the performance advantage of NumPy over plain Python lists, using the computation of a loss function as an example. Working with pandas along the way also strengthens familiarity with that library.

### Problem and Input Data:

#### Loss function$^{[1, 2]}$:

$$\Large Loss = \alpha * |R^{Pred.} - R^{Exp.}| + \beta * |E^{Pred.} - E^{Exp.}|$$

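Here $R$ and $E$ denote rainfall and evaporation, the superscripts mark predicted and experimentally observed values, and $\alpha$ and $\beta$ are weighting factors that sum to 1. The loss function is applied to example weather data from different locations and days in Australia$^{[3, 4]}$. It evaluates how well the predicted values match the observed ones.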
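
As a quick worked example of the formula, the sketch below evaluates the loss for a single observation. The numbers are taken from the first row of the merged table displayed in Task 1, and $\alpha = \beta = 0.5$ is assumed because those are the weights used in the timing runs; the variable names are illustrative only.

```python
# Single-row evaluation of the loss formula.
# Values: first row of the merged table (Cobar, 2009-01-01).
alpha, beta = 0.5, 0.5            # weighting factors (assumed, sum to 1)
r_pred, r_exp = 7.098998, 0.0     # rainfall: predicted, observed
e_pred, e_exp = 6.179719, 12.0    # evaporation: predicted, observed

loss = alpha * abs(r_pred - r_exp) + beta * abs(e_pred - e_exp)
print(round(loss, 4))  # 6.4596
```

Both absolute errors contribute with equal weight here: 0.5 · 7.098998 from rainfall plus 0.5 · 5.820281 from evaporation.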

---

### Imports
For the following tasks, some imported libraries will be used:
<li> numpy </li>
<li> pandas </li>
<li> timeit </li>

In [1]:
import numpy as np
import pandas as pd
import timeit

### Task 1
First, the given data is stored in a single dataframe. To do so, two dataframes are created and merged. Then the first lines are displayed.

In [2]:
experiment = pd.read_csv(filepath_or_buffer = 'weather_experiment.csv')
prediction = pd.read_csv(filepath_or_buffer = 'weather_prediction.csv')

weather_data = pd.merge(experiment, prediction, on = ['Date', 'Location'])
weather_data.head()

|  | Date | Location | MinTemp | MaxTemp | Rainfall | Evaporation | Sunshine | WindGustSpeed | RainToday | RainTomorrow | Rainfall Pred. | Evaporation Pred. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-01 | Cobar | 17.9 | 35.2 | 0.0 | 12.0 | 12.3 | 48.0 | No | No | 7.098998 | 6.179719 |
| 1 | 2009-01-02 | Cobar | 18.4 | 28.9 | 0.0 | 14.8 | 13.0 | 37.0 | No | No | 1.433238 | 6.375806 |
| 2 | 2009-01-04 | Cobar | 19.4 | 37.6 | 0.0 | 10.8 | 10.6 | 46.0 | No | No | 0.914834 | 5.687946 |
| 3 | 2009-01-05 | Cobar | 21.9 | 38.4 | 0.0 | 11.4 | 12.2 | 31.0 | No | No | 5.285904 | 6.897139 |
| 4 | 2009-01-06 | Cobar | 24.2 | 41.0 | 0.0 | 11.2 | 8.4 | 35.0 | No | No | 0.993975 | 0.050364 |
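
To illustrate what the merge on `Date` and `Location` does, here is a minimal sketch with two tiny, made-up dataframes (`experiment_toy` and `prediction_toy` are hypothetical stand-ins, not the assignment's CSV files). With the default `how='inner'`, `pd.merge` keeps only the key combinations that appear in both frames.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two CSV files
experiment_toy = pd.DataFrame({
    'Date': ['2009-01-01', '2009-01-02', '2009-01-03'],
    'Location': ['Cobar', 'Cobar', 'Cobar'],
    'Rainfall': [0.0, 0.0, 1.2],
})
prediction_toy = pd.DataFrame({
    'Date': ['2009-01-01', '2009-01-02', '2009-01-04'],
    'Location': ['Cobar', 'Cobar', 'Cobar'],
    'Rainfall Pred.': [7.1, 1.4, 0.9],
})

# Default inner join: only (Date, Location) pairs present in both frames survive
merged_toy = pd.merge(experiment_toy, prediction_toy, on=['Date', 'Location'])
print(len(merged_toy))  # 2
```

The rows dated 2009-01-03 and 2009-01-04 each exist in only one frame, so neither appears in the result.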


### Task 2
Next, two functions that calculate the loss function are defined. The first one uses Python lists, the second one uses NumPy.

#### Function with Python lists:

In [3]:
def loss_function_lists(alpha: float, beta: float, weather: pd.DataFrame) -> list:
    '''
    Calculates the loss function by using Python lists

    Parameter:
        alpha (float): rainfall weighting factor (no unit)
        beta (float): evaporation weighting factor (no unit)
        weather (pd.DataFrame): experimental and predicted weather data 
                    
    Return: 
        List with one loss value for each row
    '''
    if not isinstance(alpha, float):
        raise TypeError(f'The given value for alpha (i.e. {alpha}) must be a float type.')
    elif not isinstance(beta, float):
        raise TypeError(f'The given value for beta (i.e. {beta}) must be a float type.')
    elif not isinstance(weather, pd.DataFrame):
        raise TypeError('The given input for weather must be a Pandas DataFrame.')
    elif not np.isclose(alpha + beta, 1.0):  # tolerate floating-point rounding
        raise ValueError(f'The weighting factors have to sum up to 1.0 but are {alpha + beta}.')
    else:
        # Convert the four required columns to plain Python lists
        # (named 'columns' to avoid shadowing the built-in input())
        columns = [
            list(weather['Rainfall Pred.']), 
            list(weather['Rainfall']), 
            list(weather['Evaporation Pred.']), 
            list(weather['Evaporation'])
        ]

        # Compute the loss row by row
        result = [alpha * abs(r_pred - r_exp) + beta * abs(e_pred - e_exp)
            for r_pred, r_exp, e_pred, e_exp in zip(*columns)]
        
        return result
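
The check that the weights sum to 1 involves a floating-point sum. As a general illustration of why such comparisons are often made with a tolerance rather than with exact equality, the short sketch below uses the standard-library `math.isclose`; it is not tied to the specific weights used in this assignment.

```python
import math

# Decimal fractions are not represented exactly in binary floating point,
# so an exact comparison of a sum can fail unexpectedly.
exact_match = (0.1 + 0.2 == 0.3)
tolerant_match = math.isclose(0.1 + 0.2, 0.3)

print(exact_match)     # False
print(tolerant_match)  # True
```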

#### Function with numpy:

In [4]:
def loss_function_numpy(alpha: float, beta: float, weather: pd.DataFrame) -> np.ndarray:
    '''
    Calculates the loss function by using NumPy arrays

    Parameter:
        alpha (float): rainfall weighting factor (no unit)
        beta (float): evaporation weighting factor (no unit)
        weather (pd.DataFrame): experimental and predicted weather data 
                    
    Return: 
        NumPy array with one loss value for each row
    '''
    if not isinstance(alpha, float):
        raise TypeError(f'The given value for alpha (i.e. {alpha}) must be a float type.')
    elif not isinstance(beta, float):
        raise TypeError(f'The given value for beta (i.e. {beta}) must be a float type.')
    elif not isinstance(weather, pd.DataFrame):
        raise TypeError('The given input for weather must be a Pandas DataFrame.')
    elif not np.isclose(alpha + beta, 1.0):  # tolerate floating-point rounding
        raise ValueError(f'The weighting factors have to sum up to 1.0 but are {alpha + beta}.')
    else:
        # Extract the four required columns as NumPy arrays
        r_pred = weather['Rainfall Pred.'].to_numpy()
        r_exp = weather['Rainfall'].to_numpy()
        e_pred = weather['Evaporation Pred.'].to_numpy()
        e_exp = weather['Evaporation'].to_numpy()

        # Vectorized computation over all rows at once
        result = alpha * np.abs(r_pred - r_exp) + beta * np.abs(e_pred - e_exp)
        
        return result
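
Before timing the two approaches, it can be worth confirming that they produce the same numbers. The following sketch does this on a few made-up values (not the assignment data), using the same arithmetic as the two functions above without the input validation.

```python
import numpy as np

alpha, beta = 0.5, 0.5
# Made-up sample values: predicted and observed rainfall / evaporation
r_pred, r_exp = [7.1, 1.4, 0.9], [0.0, 0.0, 0.0]
e_pred, e_exp = [6.2, 6.4, 5.7], [12.0, 14.8, 10.8]

# List version: one Python-level iteration per row
loss_list = [alpha * abs(rp - rx) + beta * abs(ep - ex)
             for rp, rx, ep, ex in zip(r_pred, r_exp, e_pred, e_exp)]

# NumPy version: the same arithmetic applied to whole arrays at once
loss_array = (alpha * np.abs(np.array(r_pred) - np.array(r_exp))
              + beta * np.abs(np.array(e_pred) - np.array(e_exp)))

print(np.allclose(loss_list, loss_array))  # True
```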

### Task 3
To compare the performance of the two functions, each one is executed 100 times with the timeit library and the total time for these runs is measured.

#### Measure the time using lists

In [5]:
loss_lists = timeit.timeit(
    lambda:loss_function_lists(
        alpha = 0.5, 
        beta = 0.5, 
        weather = weather_data
    ), number = 100
)

#### Measure the time using numpy

In [6]:
loss_numpy = timeit.timeit(
    lambda:loss_function_numpy(
        alpha = 0.5, 
        beta = 0.5, 
        weather = weather_data
    ), number = 100
)

#### Results:

In [7]:
print(f'Time using Python lists: {loss_lists}s')
print(f'Time using Numpy arrays: {loss_numpy}s\n')

print(f'Speedup factor: {(loss_lists / loss_numpy):.2f}x')

Time using Python lists: 2.217046100002335s
Time using Numpy arrays: 0.012894200001028366s

Speedup factor: 171.94x


Using NumPy, the calculation is a lot faster: roughly 172 times faster than with Python lists in the run shown above.
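
A single `timeit.timeit` call can be affected by other activity on the machine. One common way to get a more stable figure is `timeit.repeat`, taking the minimum over several repetitions. The sketch below assumes synthetic random data and simplified loss functions (a single absolute-error term, both hypothetical) instead of the weather dataframe, so its timings are not comparable with the numbers above.

```python
import timeit
import numpy as np

# Synthetic stand-in data (assumption: 10,000 random values per series)
rng = np.random.default_rng(seed=0)
n = 10_000
pred = rng.random(n)
obs = rng.random(n)
pred_list, obs_list = pred.tolist(), obs.tolist()

def loss_lists():
    # Element-wise absolute error with a Python-level loop
    return [abs(p - o) for p, o in zip(pred_list, obs_list)]

def loss_numpy():
    # Same computation, vectorized
    return np.abs(pred - obs)

# Each repetition runs the function 20 times; keep the fastest repetition
t_lists = min(timeit.repeat(loss_lists, number=20, repeat=5))
t_numpy = min(timeit.repeat(loss_numpy, number=20, repeat=5))

print(f'lists: {t_lists:.4f}s, numpy: {t_numpy:.4f}s')
```

The minimum is used because slower repetitions usually reflect interference from other processes rather than the code being measured.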

### References

[1] Wikipedia contributors, Loss Functions, updated on November 9, 2023. https://en.wikipedia.org/wiki/Loss_function. Online; accessed on December 1, 2023. <br>
[2] Karl N. Kirschner. Scientific Programming with Python Assignment: The Performance of Numpy Versus Regular Python Lists when Computing a Loss Function. November 27, 2023. <br>
[3] Oswal, N. Predicting Rainfall using Machine Learning Techniques. arXiv, 2019 (https://arxiv.org/abs/1910.13872). <br>
[4] Joe Young and Adamyoung. Rain in Australia, Kaggle https://www.kaggle.com/datasets/jsphyg/weather-dataset-rattle-package?resource=download&select=weatherAUS.csv. Online; accessed on November 27, 2022. <br>