# Md Khaled Mahmud Shujon

###### Goal: The main goal of this assignment is to examine how the use of NumPy [1] affects the performance of numerical calculations. This involves creating one user-defined function with regular Python [2] lists and another with NumPy, loading and preparing a dataset using Pandas [3], and evaluating both functions by computing the RMSE (Root Mean Square Error) of simulated water density.


NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more [4].

In [9]:
import numpy as np

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language [5].

In [10]:
import pandas as pd

### Timeit library
- timeit, https://docs.python.org/3/library/timeit.html

In [11]:
import timeit

### Root-Mean-Squared-Error (RMSE)

$\sqrt{{\displaystyle \frac{1}{n}{\sum_{i=1}^n (X_i(Calculated) - Y(Experiment))^2}}}$

* $n$ = number of comparisons made

* $X_i(Calculated)$ = the density values obtained from the simulations
* $Y(Experiment)$ = the experimental density data value.

 - The experimental density of water is 0.995659 $g/cm^3$ at 30.0 °C [6].
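
As a quick check of the formula, the sketch below evaluates it step by step for two made-up density values (assumptions for illustration, not values from the dataset). Each one deviates from the experimental value by 0.01 $g/cm^3$, so the RMSE should come out as 0.01 $g/cm^3$.

```python
import math

# Experimental density of water at 30.0 °C, as stated above (g/cm^3)
experimental = 0.995659

# Hypothetical simulated densities, chosen for illustration only:
# each one is 0.01 g/cm^3 away from the experimental value
simulated = [0.985659, 1.005659]

# Squared deviation of every simulated value from the experiment
squared_errors = [(x - experimental) ** 2 for x in simulated]

# Mean of the squared deviations, then the square root
rmse = math.sqrt(sum(squared_errors) / len(squared_errors))

print(f"RMSE = {rmse:.6f} g/cm^3")  # RMSE = 0.010000 g/cm^3
```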

### Task 1:
        Create the following two user-defined functions that encode the RMSE equation.
        a) A user-defined function that is written using regular
           Python lists (i.e., do not use NumPy/Pandas).
        b) A user-defined function that is written to maximize
           the equation's performance by using NumPy and its functions.

In [12]:
# User-defined function that is written using regular Python lists

def calculate_rmse_list(simulated_list, experimental_values: float=0.995659) -> float:
    """
    Calculates the root-mean-squared error (RMSE) between a list of simulated values and an experimental value.
    input: simulated_list - regular Python list of simulated values
           experimental_values - experimental reference value (float)
           
    return: the RMSE (float)
    """
    
    n = len(simulated_list)
    total = 0.0  # named `total` to avoid shadowing the built-in sum()
    
    # Accumulate the squared deviations from the experimental value
    for x in simulated_list:
        diff = x - experimental_values
        squared_diff = diff ** 2
        total += squared_diff
    
    mean_sum = total / n
    rmse = mean_sum ** (1/2)
    
    return rmse


### NumPy built-in functions
- numpy.array, https://numpy.org/doc/stable/reference/generated/numpy.array.html
- numpy.mean, https://numpy.org/doc/stable/reference/generated/numpy.mean.html
- numpy.sqrt, https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html

In [13]:
# User-defined function that is written using NumPy and its functions

def calculate_rmse_numpy(simulated_arry, experimental_values: float=0.995659) -> float:
    """
    Calculates the root-mean-squared error (RMSE) between an array of simulated values and an experimental value.
    input: simulated_arry - NumPy array (or array-like) of simulated values
           experimental_values - experimental reference value (float)
           
    return: the RMSE
    """
    
    squared_diff = (np.array(simulated_arry) - experimental_values) ** 2
    mean_squared_diff = np.mean(squared_diff)
    rmse = np.sqrt(mean_squared_diff)
    
    return rmse
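
The NumPy version needs no explicit loop because subtracting a scalar from an array is broadcast over every element. The sketch below illustrates this on two made-up density values (assumptions for illustration, not dataset values) and checks that the vectorized result matches a plain-Python loop.

```python
import math
import numpy as np

experimental = 0.995659  # experimental density from the text (g/cm^3)

# Hypothetical simulated densities, for illustration only
values = [0.985659, 1.005659]
arr = np.array(values)

# Scalar subtraction and squaring apply element-wise (broadcasting)
squared_diff = (arr - experimental) ** 2
rmse_vectorized = float(np.sqrt(np.mean(squared_diff)))

# Same calculation with an explicit loop over a regular list
total = 0.0
for x in values:
    total += (x - experimental) ** 2
rmse_loop = math.sqrt(total / len(values))

print(squared_diff.shape)                        # (2,)
print(math.isclose(rmse_vectorized, rmse_loop))  # True
```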


### Task 2:
        Read in the data contained in the CSV-formatted file, and compute the RMSE of the simulated water density  
        with respect to the experimental value using each of the functions.
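
The file density.csv is not reproduced here, so the sketch below uses a small inline CSV as a stand-in. Its rows and the `Step` column are invented for illustration; only the `Density` column name is taken from this notebook. It shows how `dropna` removes an incomplete row before the values are converted to a list and to an array.

```python
import io
import pandas as pd

# Hypothetical stand-in for density.csv; the values are made up
csv_text = """Step,Density
1,0.98
2,
3,1.01
"""

example_df = pd.read_csv(io.StringIO(csv_text))
print(len(example_df))        # 3 rows, one with a missing Density

example_df = example_df.dropna()  # drop the row whose Density is NaN
example_list = example_df["Density"].tolist()
example_array = example_df["Density"].to_numpy()

print(example_list)           # [0.98, 1.01]
print(example_array.shape)    # (2,)
```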



In [14]:
# Reading CSV file from source
density_df = pd.read_csv('density.csv',header=0, sep=',')

# Removing rows with missing data
density_df = density_df.dropna()

# Creating a list of simulated values
simulated_list = density_df['Density'].tolist()

# Creating an array of simulated values
simulated_arry = density_df['Density'].to_numpy()
    

# Computing RMSE using the Regular Python list function
rmse_list = calculate_rmse_list(simulated_list)
    
# Computing RMSE using the NumPy function
rmse_numpy = calculate_rmse_numpy(simulated_arry)
    

print("RMSE (List):", rmse_list)
print("RMSE (NumPy): ", rmse_numpy)


RMSE (List): 0.021091780857179263
RMSE (NumPy):  0.02109178085717924


### Task 3: 
        Evaluate the speed performance of the two functions (Task 1) by computing the RMSE of the simulated water
        density. Use the timeit library (i.e., timeit.timeit) for this, and set its "number" parameter to 20,000.

### Lambda functions
- lambda-expressions, https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions

In [15]:
# Measuring the time taken by the Regular Python list function
time_regular = timeit.timeit(lambda:calculate_rmse_list(simulated_list), number=20000)

# Measuring the time taken by the NumPy function
time_numpy = timeit.timeit(lambda:calculate_rmse_numpy(simulated_arry), number=20000)

print(f"Time taken by the Regular function: {time_regular}")
print(f"Time taken by the NumPy function: {time_numpy}")


Time taken by the Regular function: 1.122839292002027
Time taken by the NumPy function: 0.07585358300275402
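
The two totals above can be turned into a per-call time and a speedup factor. The sketch below only does arithmetic on the printed timings; the numbers are specific to this run and machine and would differ elsewhere.

```python
# Timings copied from the output above (seconds for 20,000 calls)
time_regular = 1.122839292002027
time_numpy = 0.07585358300275402
number = 20000

# Average time per call, converted to microseconds
per_call_regular_us = time_regular / number * 1e6
per_call_numpy_us = time_numpy / number * 1e6

# How many times faster the NumPy function ran in this measurement
speedup = time_regular / time_numpy

print(f"List:  {per_call_regular_us:.1f} microseconds per call")  # 56.1
print(f"NumPy: {per_call_numpy_us:.1f} microseconds per call")    # 3.8
print(f"Speedup: {speedup:.1f}x")                                 # 14.8x
```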


### Additional solution using time library
- time, https://docs.python.org/3/library/time.html
- underscore, Role of Underscore(_) in Python, https://www.datacamp.com/tutorial/role-underscore-python

In [16]:
import time

# Regular Python List
start_time = time.process_time()

for _ in range(20000):
    calculate_rmse_list(simulated_list)

stop_time = time.process_time()

print(f"Regular Python list timing: {stop_time - start_time:0.2f} seconds")


# Numpy Array
start_time = time.process_time()

for _ in range(20000):
    calculate_rmse_numpy(simulated_arry)

stop_time = time.process_time()

print(f"Numpy timing: {stop_time - start_time:0.2f} seconds")

Regular Python list timing: 1.13 seconds
Numpy timing: 0.07 seconds
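
`time.process_time` counts the CPU time of the current process and excludes time spent sleeping, whereas `timeit` uses `time.perf_counter` (wall-clock time) by default, so the two approaches measure slightly different things. The sketch below illustrates the difference with a short sleep as a hypothetical workload.

```python
import time

wall_start = time.perf_counter()
cpu_start = time.process_time()

time.sleep(0.2)  # hypothetical workload: waits without using the CPU

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

# The wall-clock timer includes the sleep; the CPU timer does not
print(f"perf_counter: {wall_elapsed:.2f} s")
print(f"process_time: {cpu_elapsed:.2f} s")
```

For a CPU-bound loop such as the RMSE calculation, the two timers are expected to give similar values, which is consistent with the timings reported in this notebook.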


## References

[1] Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature, 585 (2020) 357-362 (DOI: 10.1038/s41586-020-2649-2)

[2] Python Software Foundation. Python Language Reference, version 3.11. Available at http://www.python.org, Accessed on 18 June 2023.

[3] Pandas user guide, https://pandas.pydata.org/docs/user_guide/index.html#user-guide, Accessed on 18 June 2023.

[4] NumPy user guide, https://numpy.org/doc/stable/user/whatisnumpy.html, Accessed on 19 June 2023.

[5] The pandas development team, pandas-dev/pandas: Pandas, Zenodo, 2020 (https://pandas.pydata.org/)

[6] M. Vedamuthu, S. Singh, and G.W. Robinson. Properties of liquid water: origin of the density anomalies. The Journal of Physical Chemistry, 98 (1994): 2222-2230.