#### Scientific Programming with Python
## The Performance of NumPy versus Regular Python Lists

A machine learning model created by weather scientists predicts rainfall and evaporation on different days at different locations in Australia. To check its accuracy, we compare the data in the CSV-formatted files `weather_experiment.csv` and `weather_prediction.csv` [1, 2] using the following loss function [3]. Loss functions measure how far predictions deviate from observed values and are commonly used to improve machine learning models [4]:

$$\Large Loss = \alpha \, \lvert R^{Pred.} - R^{Exp.} \rvert + \beta \, \lvert E^{Pred.} - E^{Exp.} \rvert \tag{1}$$

where
 - $\alpha$ stands for the rainfall weighting factor
 - $\beta$ stands for the evaporation weighting factor
 - the weighting factors must sum to a value of 1.0
 - $R^{Pred.}$ and $R^{Exp.}$ stand for the predicted and experimental rainfall values
 - $E^{Pred.}$ and $E^{Exp.}$ stand for the corresponding evaporation values
   [3]
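
As a quick worked example of Eq. (1), the sketch below evaluates the loss for a single data point. It uses the first row of the two tables shown further down (Cobar, 2009-01-01) and the weighting $\alpha = \beta = 0.5$ that is used later in this notebook. The variable names are illustrative only.

```python
# Worked example of Eq. (1) for a single data point.
alpha, beta = 0.5, 0.5          # weighting factors, must sum to 1.0

r_pred, r_exp = 7.098998, 0.0   # rainfall in mm (Cobar, 2009-01-01)
e_pred, e_exp = 6.179719, 12.0  # evaporation in mm (same row)

# 0.5 * |7.098998 - 0.0| + 0.5 * |6.179719 - 12.0|
loss = alpha * abs(r_pred - r_exp) + beta * abs(e_pred - e_exp)

print(f'Loss: {loss:.4f}')      # Loss: 6.4596
```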

Before we start, we import the necessary libraries.

In [1]:
import numpy as np
import pandas as pd
import timeit

### Import Data

First we need to import both CSV-formatted files.

In [2]:
data_exp  = pd.read_csv('weather_experiment.csv', sep=',')
data_pred = pd.read_csv('weather_prediction.csv', sep=',')

In [3]:
data_exp

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,RainToday,RainTomorrow
0,2009-01-01,Cobar,17.9,35.2,0.0,12.0,12.3,48.0,No,No
1,2009-01-02,Cobar,18.4,28.9,0.0,14.8,13.0,37.0,No,No
2,2009-01-04,Cobar,19.4,37.6,0.0,10.8,10.6,46.0,No,No
3,2009-01-05,Cobar,21.9,38.4,0.0,11.4,12.2,31.0,No,No
4,2009-01-06,Cobar,24.2,41.0,0.0,11.2,8.4,35.0,No,No
...,...,...,...,...,...,...,...,...,...,...
55242,2017-06-20,Darwin,19.3,33.4,0.0,6.0,11.0,35.0,No,No
55243,2017-06-21,Darwin,21.2,32.6,0.0,7.6,8.6,37.0,No,No
55244,2017-06-22,Darwin,20.7,32.8,0.0,5.6,11.0,33.0,No,No
55245,2017-06-23,Darwin,19.5,31.8,0.0,6.2,10.6,26.0,No,No


In [4]:
data_pred

Unnamed: 0,Date,Location,Rainfall Pred.,Evaporation Pred.
0,2009-01-01,Cobar,7.098998,6.179719
1,2009-01-02,Cobar,1.433238,6.375806
2,2009-01-04,Cobar,0.914834,5.687946
3,2009-01-05,Cobar,5.285904,6.897139
4,2009-01-06,Cobar,0.993975,0.050364
...,...,...,...,...
55242,2017-06-20,Darwin,5.693780,3.400099
55243,2017-06-21,Darwin,1.548031,1.696780
55244,2017-06-22,Darwin,1.516136,2.945245
55245,2017-06-23,Darwin,1.158509,4.960711
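
The loss function compares both datasets element by element, so it implicitly assumes that row *i* in one file describes the same date and location as row *i* in the other. The sketch below shows one way this could be checked. The helper `rows_aligned` is a hypothetical name, and the two small frames are stand-ins built from the first rows shown above, not the full files.

```python
import pandas as pd

def rows_aligned(df_a, df_b, keys=('Date', 'Location')):
    """Return True if both frames list the same key values in the same order."""
    keys = list(keys)
    if len(df_a) != len(df_b):
        return False
    left = df_a[keys].reset_index(drop=True)
    right = df_b[keys].reset_index(drop=True)
    return left.equals(right)

# Small stand-in frames (first two rows of each table)
sample_exp = pd.DataFrame({
    'Date': ['2009-01-01', '2009-01-02'],
    'Location': ['Cobar', 'Cobar'],
    'Rainfall': [0.0, 0.0],
})
sample_pred = pd.DataFrame({
    'Date': ['2009-01-01', '2009-01-02'],
    'Location': ['Cobar', 'Cobar'],
    'Rainfall Pred.': [7.098998, 1.433238],
})

print(rows_aligned(sample_exp, sample_pred))
```

With the full data, the same call on `data_exp` and `data_pred` would indicate whether an element-wise comparison is meaningful.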


### Implement function

To compare both datasets, we implement the loss function in two ways.

The first implementation uses regular Python lists only.

In [5]:
def loss_func_python(alpha, r_pred, r_exp, beta, e_pred, e_exp):
    '''
    calculates the loss function using python lists
    :param alpha:  float, rainfall weighting factor,    no units
    :param r_pred: list,  rainfall prediction,          values in mm
    :param r_exp:  list,  rainfall experiment,          values in mm
    :param beta:   float, evaporation weighting factor, no units
    :param e_pred: list,  evaporation prediction,       values in mm
    :param e_exp:  list,  evaporation experiment,       values in mm
    :return:       list,  loss value,                   no units
    '''
    result = []
    
    for i in range(len(r_pred)):
        # Eq. (1) uses the absolute difference between prediction and experiment
        rainfall = alpha * abs(r_pred[i] - r_exp[i])
        evaporation = beta * abs(e_pred[i] - e_exp[i])
        total = rainfall + evaporation

        result.append(total)
                      
    return result

The second implementation uses NumPy.

In [6]:
def loss_func_numpy(alpha, r_pred, r_exp, beta, e_pred, e_exp):
    '''
    calculates the loss function using numpy
    :param alpha:  float, rainfall weighting factor,    no units
    :param r_pred: array, rainfall prediction,          values in mm
    :param r_exp:  array, rainfall experiment,          values in mm
    :param beta:   float, evaporation weighting factor, no units
    :param e_pred: array, evaporation prediction,       values in mm
    :param e_exp:  array, evaporation experiment,       values in mm
    :return:       array, loss value,                   no units
    '''
    
    return (alpha * np.abs(r_pred - r_exp)) + (beta * np.abs(e_pred - e_exp))
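
Before timing two implementations, it can help to confirm that they return the same values. The following is a minimal sketch of such a check. It uses compact stand-in functions (`loss_lists` and `loss_arrays` are hypothetical names) that follow Eq. (1), and the first three rows of the tables shown above as test data.

```python
import numpy as np

def loss_lists(alpha, r_pred, r_exp, beta, e_pred, e_exp):
    # Eq. (1), element by element, on plain Python lists
    return [alpha * abs(rp - rx) + beta * abs(ep - ex)
            for rp, rx, ep, ex in zip(r_pred, r_exp, e_pred, e_exp)]

def loss_arrays(alpha, r_pred, r_exp, beta, e_pred, e_exp):
    # Eq. (1), vectorized, on NumPy arrays
    return alpha * np.abs(r_pred - r_exp) + beta * np.abs(e_pred - e_exp)

# First three rows of the prediction and experiment tables
r_pred = [7.098998, 1.433238, 0.914834]
r_exp  = [0.0, 0.0, 0.0]
e_pred = [6.179719, 6.375806, 5.687946]
e_exp  = [12.0, 14.8, 10.8]

result_lists = loss_lists(0.5, r_pred, r_exp, 0.5, e_pred, e_exp)
result_arrays = loss_arrays(0.5, np.array(r_pred), np.array(r_exp),
                            0.5, np.array(e_pred), np.array(e_exp))

# allclose tolerates tiny floating-point differences between the two paths
print(np.allclose(result_lists, result_arrays))
```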

### Evaluate speed performance

Lastly, we want to compare the speed performance of both implementations. For this purpose, we use $\alpha = \beta = 0.5$ as weighting factors and the timeit library.

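We extract the required columns into separate variables for easier handling. The pandas columns are converted to plain Python lists for the first implementation and to NumPy arrays for the second.

In [7]:
# .tolist() turns each pandas Series into a regular Python list
py_r_exp  = data_exp['Rainfall'].tolist()
py_r_pred = data_pred['Rainfall Pred.'].tolist()
py_e_exp  = data_exp['Evaporation'].tolist()
py_e_pred = data_pred['Evaporation Pred.'].tolist()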

np_r_exp  = np.array(py_r_exp)
np_r_pred = np.array(py_r_pred)
np_e_exp  = np.array(py_e_exp)
np_e_pred = np.array(py_e_pred)

Now we can use the timeit library to time both functions, running them 100 times.

In [None]:
time_py = timeit.timeit(lambda: loss_func_python(0.5, py_r_pred, py_r_exp, 0.5, py_e_pred, py_e_exp), number = 100)

print(f'Time taken to compute loss function using python lists: {time_py} seconds')

In [None]:
time_np  = timeit.timeit(lambda: loss_func_numpy(0.5, np_r_pred, np_r_exp, 0.5, np_e_pred, np_e_exp), number = 100)

print(f'Time taken to compute loss function using NumPy: {time_np} seconds')
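
A single `timeit.timeit` measurement can be disturbed by other processes running on the machine. One common way to reduce this noise is `timeit.repeat`, which performs several measurements so that the smallest one can be reported. The sketch below illustrates the idea on synthetic random data and two simplified stand-in functions (`abs_diff_list` and `abs_diff_numpy` are hypothetical names), not on the weather files.

```python
import timeit
import numpy as np

# Synthetic stand-in data: 10,000 random values between 0 and 1
values = np.random.default_rng(0).random(10_000)
as_list = values.tolist()

def abs_diff_list():
    # element-wise absolute difference with a Python list
    return [abs(v - 0.5) for v in as_list]

def abs_diff_numpy():
    # the same calculation, vectorized with NumPy
    return np.abs(values - 0.5)

# Five measurements of 20 calls each; the minimum is the least disturbed one
best_py = min(timeit.repeat(abs_diff_list, repeat=5, number=20))
best_np = min(timeit.repeat(abs_diff_numpy, repeat=5, number=20))

print(f'Best of 5 (Python list): {best_py:.5f} seconds')
print(f'Best of 5 (NumPy):       {best_np:.5f} seconds')
```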

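Given the times timeit provides, we can calculate by how many percent NumPy reduces the run time compared to Python lists.

In [None]:
# relative reduction in run time, in percent of the Python-list time
time_percent = ((time_py - time_np) / time_py) * 100

print(f'NumPy needs {time_percent:.2f}% less time than Python lists.')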
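
A percentage close to 100 can be hard to interpret, so the ratio of both times (the speedup factor) is often reported alongside it. The sketch below shows both quantities side by side. The function names are illustrative, and the two timings are hypothetical round numbers, not measured results.

```python
def runtime_reduction_percent(t_slow, t_fast):
    # share of the slower run time that is saved, in percent
    return (t_slow - t_fast) / t_slow * 100

def speedup_factor(t_slow, t_fast):
    # how many times faster the fast variant is
    return t_slow / t_fast

# Hypothetical timings in seconds, chosen only for illustration
t_py, t_np = 2.0, 0.5

reduction = runtime_reduction_percent(t_py, t_np)
factor = speedup_factor(t_py, t_np)

print(f'Run time reduced by {reduction:.1f}%')   # Run time reduced by 75.0%
print(f'Speedup factor: {factor:.1f}x')          # Speedup factor: 4.0x
```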

### References

[1] N. OSWAL: **Predicting Rainfall using Machine Learning Techniques**, arXiv (2019), https://arxiv.org/abs/1910.13827, Accessed on November 29th, 2023. 

[2] J. YOUNG, ADAMYOUNG: **Rain in Australia**, Kaggle, https://kaggle.com/datasets/jsphyg/weather-dataset-rattle-package?resource=download&select=weatherAUS.csv, Accessed on November 29th, 2023.

[3] WIKIPEDIA CONTRIBUTORS: **Loss function**, Last edited on November 9th, 2023. https://en.wikipedia.org/wiki/Loss_function, Accessed on November 29th, 2023.

[4] DATAROBOT: **Introduction to Loss Functions**, Last edited on March 26th, 2021. https://www.datarobot.com/blog/introduction-to-loss-functions/, Accessed on November 29th, 2023.