# <center>Assignment 5: The Performance of NumPy Versus Regular Python Lists when Computing a Loss Function <h6> [^1] </h6> </center>  

###  SciPro_ID: 41265 
### Date: 27.11.2023

##### Note: Some references appear in the code as comments.

##### Goal: In this assignment, we use the NumPy library and examine how it affects the performance of numerical calculations.  

---

##### Problem and Input Data: A machine learning model developed by weather researchers forecasts rainfall and evaporation for various locations in Australia on different days. The experimental observations and the model predictions are presented in Table 1. The observables are described as follows:

* Date - The observation date.
* Location - The weather station location.
* MinTemp - The minimum temperature (°C).
* MaxTemp - The maximum temperature (°C).
* Rainfall - The rainfall amount in 24 hours (mm).
* Evaporation - The evaporation amount in 24 hours (mm).
* Sunshine - The sunshine amount in 24 hours (h).
* WindGustSpeed - The maximum wind gust speed in 24 hours (km/h).
* RainToday - Did it rain on that day? yes: if precipitation >= 1 mm, no: if precipitation < 1 mm.
* RainTomorrow - Did it rain on the following day? yes: if precipitation >= 1 mm, no: if precipitation < 1 mm. [^3]

##### The following loss function measures how well the model predicts the rainfall and evaporation for each date and location:

 $$\LARGE \mathrm{Loss} = \alpha \cdot | R^{Pred} - R^{Exp} | + \beta \cdot | E^{Pred} - E^{Exp} | \tag{1}$$ [^2]

 * $\alpha$ is the rainfall weighting factor.
 * $\beta$ is the evaporation weighting factor.
 * $R^{Pred}$ and $R^{Exp}$ are the predicted and experimental rainfall values.
 * $E^{Pred}$ and $E^{Exp}$ are the corresponding evaporation values.
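
As a quick illustration of the formula, the following minimal sketch evaluates the loss for a single observation. The function name `loss_single` is hypothetical and not part of the assignment code. The numbers are taken from the first row of the two tables shown in Task 1 (Cobar, 2009-01-01), with $\alpha = \beta = 0.5$ as used in Task 3.

```python
def loss_single(r_pred, r_exp, e_pred, e_exp, alpha, beta):
    """Weighted absolute error for one date and location (hypothetical helper)."""
    return alpha * abs(r_pred - r_exp) + beta * abs(e_pred - e_exp)


# First row of the data: Cobar, 2009-01-01
example = loss_single(
    r_pred=7.098998, r_exp=0.0,   # predicted and observed rainfall (mm)
    e_pred=6.179719, e_exp=12.0,  # predicted and observed evaporation (mm)
    alpha=0.5, beta=0.5,
)
print(round(example, 4))  # 6.4596
```

Each term is an absolute error scaled by its weighting factor, so the loss is zero only when both predictions match the observations exactly.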

---

# Task 1  
### Read in the data contained in **weather_experiment.csv** and **weather_prediction.csv**.

In [1]:
import pandas as pd

# Place the CSV files provided by Mr. Kirschner in the same folder as this Jupyter notebook.
# pandas and numpy must also be installed in your local Python environment.
#[^5]

df = pd.read_csv("weather_experiment.csv")
df2 = pd.read_csv("weather_prediction.csv")

In [2]:
df

Unnamed: 0,Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine,WindGustSpeed,RainToday,RainTomorrow
0,2009-01-01,Cobar,17.9,35.2,0.0,12.0,12.3,48.0,No,No
1,2009-01-02,Cobar,18.4,28.9,0.0,14.8,13.0,37.0,No,No
2,2009-01-04,Cobar,19.4,37.6,0.0,10.8,10.6,46.0,No,No
3,2009-01-05,Cobar,21.9,38.4,0.0,11.4,12.2,31.0,No,No
4,2009-01-06,Cobar,24.2,41.0,0.0,11.2,8.4,35.0,No,No
...,...,...,...,...,...,...,...,...,...,...
55242,2017-06-20,Darwin,19.3,33.4,0.0,6.0,11.0,35.0,No,No
55243,2017-06-21,Darwin,21.2,32.6,0.0,7.6,8.6,37.0,No,No
55244,2017-06-22,Darwin,20.7,32.8,0.0,5.6,11.0,33.0,No,No
55245,2017-06-23,Darwin,19.5,31.8,0.0,6.2,10.6,26.0,No,No


In [3]:
df2

Unnamed: 0,Date,Location,Rainfall Pred.,Evaporation Pred.
0,2009-01-01,Cobar,7.098998,6.179719
1,2009-01-02,Cobar,1.433238,6.375806
2,2009-01-04,Cobar,0.914834,5.687946
3,2009-01-05,Cobar,5.285904,6.897139
4,2009-01-06,Cobar,0.993975,0.050364
...,...,...,...,...
55242,2017-06-20,Darwin,5.693780,3.400099
55243,2017-06-21,Darwin,1.548031,1.696780
55244,2017-06-22,Darwin,1.516136,2.945245
55245,2017-06-23,Darwin,1.158509,4.960711
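
The loss is computed element by element, so row *i* of the experimental table has to describe the same date and location as row *i* of the prediction table. The sketch below shows one way to check this. It uses a few rows copied from the tables above instead of the CSV files, the names `exp_sample`, `pred_sample`, `rows_aligned`, and `merged` are illustrative, and pairing the rows on `Date` and `Location` is an assumption about how the two files correspond.

```python
import pandas as pd

# A few rows copied from the two tables above (illustrative subset)
exp_sample = pd.DataFrame({
    "Date": ["2009-01-01", "2009-01-02", "2009-01-04"],
    "Location": ["Cobar", "Cobar", "Cobar"],
    "Rainfall": [0.0, 0.0, 0.0],
    "Evaporation": [12.0, 14.8, 10.8],
})
pred_sample = pd.DataFrame({
    "Date": ["2009-01-01", "2009-01-02", "2009-01-04"],
    "Location": ["Cobar", "Cobar", "Cobar"],
    "Rainfall Pred.": [7.098998, 1.433238, 0.914834],
    "Evaporation Pred.": [6.179719, 6.375806, 5.687946],
})

# Check 1: do the key columns agree row by row, in the same order?
keys = ["Date", "Location"]
rows_aligned = exp_sample[keys].equals(pred_sample[keys])

# Check 2: an inner merge pairs rows by key regardless of row order;
# validate="one_to_one" raises an error if a key is duplicated on either side.
merged = exp_sample.merge(pred_sample, on=keys, how="inner", validate="one_to_one")

print(rows_aligned, len(merged))  # True 3
```

If the first check passes on the full data, the columns can be used directly as they are; otherwise the merged table provides matching pairs of observed and predicted values.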


---

# Task 2 
### Create user-defined functions that encode and compute the loss function (Equation 1), which:

1. perform the calculation using regular Python lists (i.e. **do not use NumPy or ndarrays**), and
2. perform the calculation using NumPy (i.e. **maximizing the use of NumPy's library and performance**).



### Python Lists Implementation:

In [9]:
def calculate_loss_python_list(R_pred_list, R_exp_list, E_pred_list, E_exp_list, alpha, beta):
    
    # Calculate absolute differences
    abs_diff_rainfall = [abs(rp - re) for rp, re in zip(R_pred_list, R_exp_list)]
    abs_diff_evaporation = [abs(ep - ee) for ep, ee in zip(E_pred_list, E_exp_list)]

    # Calculate the weighted sum
    loss = [alpha * r + beta * e for r, e in zip(abs_diff_rainfall, abs_diff_evaporation)]

    return loss

### Numpy Implementation:

In [10]:
import numpy as np #[^4]

def calculate_loss_numpy_array(R_pred_array, R_exp_array, E_pred_array, E_exp_array, alpha, beta):
    
    # Calculate absolute differences using NumPy
    abs_diff_rainfall = np.abs(R_pred_array - R_exp_array)
    abs_diff_evaporation = np.abs(E_pred_array - E_exp_array)

    # Calculate the weighted sum using NumPy
    loss = alpha * abs_diff_rainfall + beta * abs_diff_evaporation

    return loss
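
Before timing the two functions, it is worth confirming that they return the same values. The sketch below restates both implementations in a compact form so that it runs on its own (the list version is condensed into a single comprehension, which is a variation on the function above), and compares them on three rows taken from the tables in Task 1. `np.allclose` is used because floating-point results may differ in the last digits.

```python
import numpy as np


def calculate_loss_python_list(R_pred_list, R_exp_list, E_pred_list, E_exp_list, alpha, beta):
    # Condensed single-pass variant of the list implementation
    return [alpha * abs(rp - re) + beta * abs(ep - ee)
            for rp, re, ep, ee in zip(R_pred_list, R_exp_list, E_pred_list, E_exp_list)]


def calculate_loss_numpy_array(R_pred_array, R_exp_array, E_pred_array, E_exp_array, alpha, beta):
    return alpha * np.abs(R_pred_array - R_exp_array) + beta * np.abs(E_pred_array - E_exp_array)


# First three rows of the data shown in Task 1
R_pred = [7.098998, 1.433238, 0.914834]
R_exp = [0.0, 0.0, 0.0]
E_pred = [6.179719, 6.375806, 5.687946]
E_exp = [12.0, 14.8, 10.8]

loss_list = calculate_loss_python_list(R_pred, R_exp, E_pred, E_exp, 0.5, 0.5)
loss_array = calculate_loss_numpy_array(
    np.array(R_pred), np.array(R_exp), np.array(E_pred), np.array(E_exp), 0.5, 0.5
)

print(np.allclose(loss_list, loss_array))  # True
```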

---

# Task 3 
##### Compare the speed of your Task 2 functions by computing the loss for $\alpha = \beta = 0.5$. Use the timeit library (i.e. `timeit.timeit`) for this, and set its `number` parameter to 100.


In [8]:
import pandas as pd
import numpy as np
import timeit

#[^4], [^5]

# Extract relevant columns from the DataFrames
R_exp_list = df['Rainfall'].tolist()
R_pred_list = df2['Rainfall Pred.'].tolist()
E_exp_list = df['Evaporation'].tolist()
E_pred_list = df2['Evaporation Pred.'].tolist()
alpha = 0.5
beta = 0.5

# Convert lists to arrays for the NumPy version
R_exp_array = np.array(R_exp_list)
R_pred_array = np.array(R_pred_list)
E_exp_array = np.array(E_exp_list)
E_pred_array = np.array(E_pred_list)

# Function using regular Python lists
time_python_list = timeit.timeit(
    lambda: calculate_loss_python_list(R_pred_list, R_exp_list, E_pred_list, E_exp_list, alpha, beta),
    number=100,
)

# Function using NumPy arrays
time_numpy_array = timeit.timeit(
    lambda: calculate_loss_numpy_array(R_pred_array, R_exp_array, E_pred_array, E_exp_array, alpha, beta),
    number=100,
)


# Print the execution times
print(f"Execution time using regular Python lists: {time_python_list} seconds")
print(f"Execution time using NumPy arrays: {time_numpy_array} seconds")

# Compare the speedup
speedup_percentage = ((time_python_list - time_numpy_array) / time_python_list) * 100
print(f"NumPy version is {speedup_percentage}% faster than the regular Python version.")



Execution time using regular Python lists: 1.9461510999826714 seconds
Execution time using NumPy arrays: 0.029745399951934814 seconds
NumPy version is 98.47157808290426% faster than the regular Python version.
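
The percentage printed above is the relative reduction in runtime, (t_list - t_numpy) / t_list. The same comparison can also be expressed as a speedup factor, t_list / t_numpy. A single `timeit.timeit` call can be affected by other activity on the machine, so the sketch below uses `timeit.repeat` and keeps the fastest of several runs. It is a self-contained illustration on synthetic random data (an assumption, so its numbers will differ from those above), and the array size, repeat counts, and names such as `best_list` are illustrative.

```python
import timeit

import numpy as np

rng = np.random.default_rng(seed=0)
n = 50_000  # illustrative size, similar in magnitude to the weather data

# Synthetic stand-ins for the four data columns (not the real weather data)
r_pred, r_exp, e_pred, e_exp = (rng.random(n) for _ in range(4))
r_pred_l, r_exp_l, e_pred_l, e_exp_l = (a.tolist() for a in (r_pred, r_exp, e_pred, e_exp))
alpha = beta = 0.5


def loss_list():
    return [alpha * abs(rp - re) + beta * abs(ep - ee)
            for rp, re, ep, ee in zip(r_pred_l, r_exp_l, e_pred_l, e_exp_l)]


def loss_numpy():
    return alpha * np.abs(r_pred - r_exp) + beta * np.abs(e_pred - e_exp)


# Repeat each measurement and keep the fastest run,
# which is the one least disturbed by background load.
best_list = min(timeit.repeat(loss_list, number=10, repeat=5))
best_numpy = min(timeit.repeat(loss_numpy, number=10, repeat=5))

reduction_percent = (best_list - best_numpy) / best_list * 100
speedup_factor = best_list / best_numpy
print(f"Runtime reduction: {reduction_percent:.1f}%, speedup factor: {speedup_factor:.1f}x")
```

The two figures describe the same measurement: a runtime reduction of p percent corresponds to a speedup factor of 1 / (1 - p/100).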


### Conclusion:
In this comparison, the NumPy implementation of the loss function clearly outperformed the regular Python list version: over 100 repetitions it took about 0.030 s compared with about 1.946 s, a runtime reduction of about 98.5% (roughly 65 times faster). NumPy applies each operation to the whole array at once instead of looping over the elements in Python, which contributes to the improved performance in scientific computations.

---

**References:**
1. Adam-P (27 March 2022) Markdown Cheatsheet, GitHub. Available at: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet#tables (Accessed: 28 October 2023).
2. Macchia, M. (26 November 2020) Quick Reference for AMS-LaTeX, GitHub. Available at: https://github.com/manuelemacchia/math-latex/blob/master/amsmath.pdf (Accessed: 28 October 2023).
3. Lennard-Jones potential (2023) Wikipedia. Available at: https://en.wikipedia.org/wiki/Lennard-Jones_potential (Accessed: 28 October 2023).
4. Kirschner, K.N. (2020) Numpy, GitHub. Available at: https://github.com/karlkirschner/Scientific_Programming_Course/blob/master/numpy.ipynb (Accessed: 03 December 2023).
5. Kirschner, K.N. (2020) Pandas, GitHub. Available at: https://github.com/karlkirschner/Scientific_Programming_Course/blob/master/pandas.ipynb (Accessed: 26 November 2023).