# Exercise 32: An Introduction to Scientific Computing in Python

## Objective
This notebook introduces the fundamentals of scientific computing using Python, focusing on its relevance to chemistry research.

# 1. Introduction
Scientific computing involves using computational methods to analyze data, simulate processes, and visualize results. Python, with its powerful libraries, offers a robust environment for scientific research, including chemistry. In this notebook, we'll explore the basics of scientific computing and understand how Python can support your research projects.

### Why Python for Chemists?
- Easy to learn and use.
- Powerful libraries for data analysis and visualization.
- Widely used in academia and industry.
- Ideal for automating repetitive tasks and managing experimental data.

# 2. Setting Up the Environment
Before we begin, ensure you have the following libraries installed.

**NOTE - if you are working on a university PC, these libraries are probably already installed and you can skip this step. That is why the single line of code below starts with a `#`: it comments out the command that would otherwise try to install these Python libraries. To install them yourself, remove the `#` and run the cell.**

In [None]:
#!pip install numpy pandas matplotlib

# 3. Scientific Computing Libraries Overview
We'll introduce three essential libraries for scientific computing in Python and demonstrate their use:
- **NumPy** for numerical operations.
- **Pandas** for data manipulation.
- **Matplotlib** for data visualization.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# 4. (But First... a Basic Python Refresher)
We'll start with a quick recap of essential Python concepts that will be useful for scientific computing.

### Defining different variable types

In [None]:
# Defining variables with different data types
molecular_weight_H2O = 18.015  # float
chemical_name = "Water"  # string
num_molecules = 6.022e23  # float written in scientific notation (Avogadro's number: molecules per mole)

print(f"Molecular weight of {chemical_name}: {molecular_weight_H2O} g/mol")

### Using variables as placeholders for simple arithmetic

In [None]:
# Simple stoichiometry example
moles = 2  # mol
mass = moles * molecular_weight_H2O
print(f"Mass of {moles} mol of {chemical_name}: {mass} g")

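Rearranging the same relationship gives the number of moles from a measured mass, n = m / M, and multiplying by Avogadro's number then gives the number of molecules. A minimal sketch; the 9.0 g sample mass is a made-up value chosen only for illustration:

```python
# Reverse of the calculation above: n = m / M
molecular_weight_H2O = 18.015  # g/mol (same value as defined earlier)
avogadro = 6.022e23            # molecules per mole
sample_mass = 9.0              # g, hypothetical sample mass for illustration

moles_in_sample = sample_mass / molecular_weight_H2O
molecules_in_sample = moles_in_sample * avogadro

print(f"{sample_mass} g of water is {moles_in_sample:.3f} mol")
print(f"That is about {molecules_in_sample:.3e} molecules")
```
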
### Using the `list` data type

In [None]:
# Example of handling experimental data
concentrations = [0.1, 0.2, 0.3, 0.4, 0.5]  # mol/L
print("Concentration data:", concentrations)
print()
print("The fourth item on the list is:", concentrations[3])

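Because Python counts list positions from 0, index 3 is the fourth item. A short sketch of a few other common ways to pull values out of a list: negative indices count back from the end, and a slice returns a sub-list.

```python
concentrations = [0.1, 0.2, 0.3, 0.4, 0.5]  # mol/L, same list as above

print("First item (index 0):", concentrations[0])
print("Last item (index -1):", concentrations[-1])
# A slice [start:stop] includes start but stops just before stop
print("Items at index 1 up to, but not including, 3:", concentrations[1:3])
print("Number of items:", len(concentrations))
```
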
### Using the `for` loop

In [None]:
for i in concentrations:
    print(f"The current value of 'i' is: {i} \n")

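A loop is often used to build a new list from an existing one. The sketch below converts each concentration from mol/L to mmol/L; the names `concentrations_mmol` and `concentrations_mmol_compact` are illustrative choices, and the second shows the more compact list-comprehension form of the same loop:

```python
concentrations = [0.1, 0.2, 0.3, 0.4, 0.5]  # mol/L

# Build a new list by converting each value from mol/L to mmol/L
concentrations_mmol = []
for c in concentrations:
    concentrations_mmol.append(c * 1000)

# The same result written as a list comprehension
concentrations_mmol_compact = [c * 1000 for c in concentrations]

print("Concentrations (mmol/L):", concentrations_mmol)
```
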
### Using `enumerate` with the `for` loop

In [None]:
# Use a for loop to cycle through the concentrations list, pairing each item on the list with an index, starting from 0.
# Here, 'i' holds the index and 'j' holds the value. Replace each ### with the appropriate variable.

for i, j in enumerate(concentrations):
    print(f"The current index position on the list is: ###")
    print(f"The current value of 'j' is: ### \n")

### Using dictionaries to store experimental data

In [None]:
# Example of using a dictionary to store experimental data
decay_rates = {'Time': [0,1,2,3,4,5,6,7,8,9,10,11,12], 'A': [100,74,55,41,30,22,17,12,9,7,5,4,3]}

print("Data type =", type(decay_rates))
print()

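Besides looking up values by key, a dictionary can report which keys it holds, the `in` keyword checks whether a key exists, and `len()` tells you how many values are stored under a key. A short sketch using the same `decay_rates` data, rebuilt here so the cell runs on its own:

```python
decay_rates = {'Time': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
               'A': [100, 74, 55, 41, 30, 22, 17, 12, 9, 7, 5, 4, 3]}

# The keys are the labels used to look up each list of values
print("Keys:", list(decay_rates.keys()))

# 'in' checks whether a key is present in the dictionary
print("Is 'A' a key?", 'A' in decay_rates)

# Each key maps to a list, so len() counts the data points stored under it
print("Number of time points:", len(decay_rates['Time']))
```
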
In [None]:
# Print the time values from the dictionary by referencing the 'Time' key, which returns the values associated with that key.
print("The time values in the dictionary are: ", decay_rates['Time'])

# Print the 'A' values from the dictionary by referencing the 'A' key, which returns the values associated with that key.
###

# 5. Using `numpy` arrays

### **Introduction to NumPy Arrays**

**NumPy arrays** are a core feature of the **NumPy** library, providing a powerful and efficient way to store and manipulate large datasets in Python. Unlike Python's built-in lists, NumPy arrays support advanced mathematical operations, efficient memory usage, and fast performance, making them ideal for scientific computing.

#### **Why Use NumPy Arrays in Scientific Computing?**

1. **Performance:** NumPy arrays are implemented in C, which makes array operations significantly faster compared to standard Python lists, especially for large datasets.

2. **Memory Efficiency:** Arrays are stored as contiguous blocks of memory, allowing for faster data access and manipulation. This is crucial when handling large experimental datasets in chemistry.

3. **Vectorized Operations:** Instead of looping through elements manually, NumPy allows for operations to be applied to entire arrays simultaneously (e.g., multiplying an array by a scalar). This approach not only improves code readability but also enhances performance.

4. **Mathematical Functions:** NumPy provides a broad range of mathematical functions (e.g., linear algebra, statistical operations) that operate directly on arrays.

5. **Multidimensional Data Handling:** NumPy arrays can be 1D, 2D, or even higher dimensions, which is useful for representing complex data structures like matrices, time series, or multidimensional datasets from chemical experiments.

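Points 3 and 4 can be seen side by side in a short sketch: the same doubling written as a Python list comprehension and as a single vectorized NumPy operation, followed by a NumPy function (`np.log10`) applied to every element at once. The variable names are illustrative:

```python
import numpy as np

concentrations = [0.1, 0.2, 0.3, 0.4, 0.5]    # mol/L (Python list)
concentrations_np = np.array(concentrations)  # same values as a NumPy array

# With a list, each element has to be processed explicitly
doubled_loop = [c * 2 for c in concentrations]

# With an array, one expression applies the operation to every element
doubled_vectorized = concentrations_np * 2

# NumPy's mathematical functions also work element-wise on whole arrays
log_concentrations = np.log10(concentrations_np)

print("List comprehension:", doubled_loop)
print("Vectorized:", doubled_vectorized)
print("log10 of each concentration:", log_concentrations)
```
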
#### **Example Use Cases in Chemistry:**
- **Storing Time Series Data:** Storing concentration or reaction rate data over time.
- **Simulations:** Running kinetic or thermodynamic simulations where large numerical datasets need to be processed efficiently.
- **Data Analysis:** Applying mathematical and statistical operations to datasets, such as calculating average molecular weights or analyzing spectroscopy data.

In this notebook, we explore how to create and manipulate NumPy arrays and why they are a powerful tool for scientific research and chemical data analysis.


In [None]:
# Convert the dictionary into a NumPy array
# First, collect the dictionary values into a list of lists, then convert that to an array.
# The np.array(...) call below is deliberately spread over more lines than usual,
# so that you can more easily see what a 'list of lists' looks like.
decay_array = np.array(
    [
        decay_rates['Time'],
        decay_rates['A']
    ]
)

# Display the array
print("Decay Array:\n", decay_array)


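Once the data are in a 2D array, rows and columns can be selected with square brackets: `decay_array[0]` is the first row (Time), and `decay_array[:, 0]` is the first column. A minimal sketch that rebuilds the same array so the cell runs independently; `.T` is NumPy's transpose attribute, and the other variable names are illustrative:

```python
import numpy as np

# Same 2-row array as above: row 0 = Time, row 1 = A
decay_array = np.array([
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    [100, 74, 55, 41, 30, 22, 17, 12, 9, 7, 5, 4, 3],
])

time_row = decay_array[0]        # first row: all time values
a_row = decay_array[1]           # second row: all A values
first_point = decay_array[:, 0]  # first column: (Time, A) at t = 0

# Transposing swaps rows and columns, giving one (Time, A) pair per row
decay_pairs = decay_array.T

print("Time row:", time_row)
print("A row:", a_row)
print("First column:", first_point)
print("Transposed shape:", decay_pairs.shape)
```
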

### Printing out the properties of your `numpy` array

In [None]:
# Print out properties of the NumPy array
print("\nArray Properties:")
print("Shape:", decay_array.shape)       # Shape of the array (rows, columns)
print("Size:", decay_array.size)         # Total number of elements
print("Dimensions:", decay_array.ndim)   # Number of dimensions (e.g., 1D, 2D)
print("Data Type:", decay_array.dtype)   # Data type of array elements
print("Memory Size (bytes):", decay_array.nbytes)  # Memory size of the array


### Using NumPy for Simple Calculations

In [None]:
# Generate an array of concentrations and perform arithmetic operations
concentrations_np = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
doubled_concentrations = concentrations_np * 2
print("Original concentrations:", concentrations_np)
print("Doubled concentrations:", doubled_concentrations)

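NumPy arrays also provide built-in statistical methods, and arithmetic between two arrays of the same length is applied element by element. In the sketch below, the sample volumes are hypothetical values chosen only to illustrate n = c × V; note that `.std()` returns the population standard deviation by default (`ddof=0`):

```python
import numpy as np

concentrations_np = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # mol/L

mean_c = concentrations_np.mean()
std_c = concentrations_np.std()   # population standard deviation (ddof=0)
max_c = concentrations_np.max()
print("Mean:", mean_c, "Std:", std_c, "Max:", max_c)

# Element-wise arithmetic between two arrays of the same length
volumes_L = np.array([0.05, 0.05, 0.10, 0.10, 0.20])  # hypothetical volumes in L
moles_per_sample = concentrations_np * volumes_L       # n = c * V for each sample
print("Moles in each sample:", moles_per_sample)
```
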
### What data type?

#### Using `numpy` might look like it produces a `list`, but that's not what is happening...

In [None]:
#Show what data type the above variables are being stored as

print("The concentrations variable is data type:", type(concentrations))

print("The concentrations_np variable is data type:", type(concentrations_np))

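The difference matters in practice: multiplying a Python list by 2 repeats the list, whereas multiplying a NumPy array by 2 doubles each value. A quick sketch, also showing `.tolist()` for converting an array back into a list:

```python
import numpy as np

concentrations = [0.1, 0.2, 0.3, 0.4, 0.5]
concentrations_np = np.array(concentrations)

# Multiplying a list by 2 repeats the list; it does not double the values
list_times_two = concentrations * 2
# Multiplying an array by 2 doubles every element
array_times_two = concentrations_np * 2

print("list * 2:", list_times_two)
print("array * 2:", array_times_two)

# Converting an array back into an ordinary Python list
back_to_list = concentrations_np.tolist()
print(type(back_to_list), back_to_list)
```
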

# 6. Using Pandas to Create and Display a Simple DataFrame

### **Introduction to Pandas**

**Pandas** is a powerful open-source data analysis and manipulation library for Python. It is particularly well-suited for handling structured data, such as tables or time series data, and offers functionality similar to spreadsheet tools like Excel but with much greater flexibility and performance. 

#### **Why Use Pandas in Scientific Computing?**

1. **Data Structures:** Pandas provides two primary data structures:
   - **Series:** A one-dimensional array with labeled indices, perfect for single columns of data.
   - **DataFrame:** A two-dimensional, tabular data structure with labeled rows and columns, akin to a spreadsheet or SQL table.

2. **Data Import and Export:** Pandas makes it easy to import data from various formats, including:
   - **CSV files:** Common for experimental data export (`pd.read_csv()`).
   - **Excel files:** For laboratory data and results (`pd.read_excel()`).
   - **Other formats:** Including JSON, SQL databases, and more.

3. **Data Cleaning:** Tools for handling missing values, filtering data, and transforming datasets. This is invaluable when working with experimental data that may require preprocessing before analysis.

4. **Data Analysis:** Supports descriptive statistics, grouping data, and applying mathematical operations directly to DataFrames. This is useful for analyzing experimental results or processing large datasets.

5. **Data Visualization:** Pandas integrates well with visualization libraries like **Matplotlib**, allowing for quick and easy generation of plots and graphs directly from DataFrames.

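Points 2 to 4 can be tried without a real data file. The sketch below uses a made-up CSV string (wrapped in `io.StringIO` so that `pd.read_csv()` can read it like a file), drops the row with a missing pH using `dropna()`, prints summary statistics with `describe()`, and shows that `to_csv()` returns the CSV text as a string when no file path is given. The data and variable names are illustrative only:

```python
import io
import pandas as pd

# A small, made-up CSV, as it might look when exported from an instrument.
# Sample B has no pH value, which pandas reads as a missing value (NaN).
csv_text = """Sample,Concentration (mol/L),pH
A,0.1,7.0
B,0.2,
C,0.3,6.0
"""

raw_df = pd.read_csv(io.StringIO(csv_text))
print(raw_df)

# Data cleaning: drop any row that contains a missing value
clean_df = raw_df.dropna()

# Data analysis: summary statistics for each numeric column
print(clean_df.describe())

# Data export: with no file path, to_csv returns the CSV text as a string
print(clean_df.to_csv(index=False))
```
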
#### **Example Use Cases in Chemistry:**
- **Analyzing Time Series Data:** Such as reaction rates over time or monitoring experimental conditions.
- **Handling Experimental Datasets:** Importing data from instruments, cleaning it, and performing initial analysis.
- **Data Export:** Saving processed data back to a CSV or Excel file for further reporting or sharing with collaborators.

In this notebook, we will explore how to load, manipulate, and visualize data using **Pandas**, showing how it can simplify and enhance data-driven research in chemistry.


In [None]:
# Create a simple DataFrame to manage chemical data
data = {
    "Sample": ["A", "B", "C"],
    "Concentration (mol/L)": [0.1, 0.2, 0.3],
    "pH": [7, 8, 6]
}

#Create a pandas dataframe from the dictionary called 'data'
df = pd.DataFrame(data)
print(df)

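With a DataFrame in place, you can select a column by name, filter rows with a condition, and add calculated columns. A minimal sketch using the same sample data (rebuilt as `samples_df` so the cell runs on its own); the new column applies [H+] = 10^(-pH), and its name is just an illustrative label:

```python
import pandas as pd

samples_df = pd.DataFrame({
    "Sample": ["A", "B", "C"],
    "Concentration (mol/L)": [0.1, 0.2, 0.3],
    "pH": [7, 8, 6],
})

# Select a single column by its label
print(samples_df["pH"])

# Filter rows with a condition: keep only samples with pH above 6
above_6 = samples_df[samples_df["pH"] > 6]
print(above_6)

# Add a calculated column: [H+] = 10^(-pH) in mol/L (10.0 keeps the result a float)
samples_df["[H+] (mol/L)"] = 10.0 ** (-samples_df["pH"])
print(samples_df)
```
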
In [None]:
# Create another dataframe, this time from the 'decay_rates' dictionary

decay_df = pd.DataFrame(decay_rates)

print(decay_df)

In [None]:
# For larger dataframes, you may wish to take just a 'sneak peek' at the content structure, using the head() method in pandas.
# Write the code below that prints the first few rows of the decay_df dataframe, using the head() method.

###

### What data type (again)?

#### Let's look more closely at the types of the variables we used in the code blocks above...

In [None]:
#Show what data type the above variables are being stored as

print("The data variable is data type:", type(data))

print("The df variable is data type:", type(df))

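Selecting a single column from a DataFrame gives a pandas **Series**, and `.to_numpy()` converts that Series into a NumPy array. A short sketch; the name `df_example` is illustrative, and its data match the `data` dictionary above:

```python
import pandas as pd

df_example = pd.DataFrame({
    "Sample": ["A", "B", "C"],
    "Concentration (mol/L)": [0.1, 0.2, 0.3],
    "pH": [7, 8, 6],
})

ph_column = df_example["pH"]
print("A single column is data type:", type(ph_column))

ph_values = ph_column.to_numpy()
print("After .to_numpy() it is data type:", type(ph_values))
```
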

# 7. Plotting with Matplotlib

### **Introduction to Matplotlib**

**Matplotlib** is a comprehensive library for creating static, animated, and interactive visualizations in Python. It is particularly valuable in scientific computing for generating high-quality graphs and plots to visualize data effectively. Whether you are analyzing experimental results, presenting data, or exploring trends, **Matplotlib** offers the tools needed to create clear and informative graphics.

#### **Why Use Matplotlib in Scientific Computing?**

1. **Versatile Plotting Capabilities:** Matplotlib supports a wide variety of plots, including:
   - **Line plots:** Ideal for showing trends over time or continuous data.
   - **Scatter plots:** Useful for examining relationships between variables.
   - **Bar charts:** Good for categorical data comparisons.
   - **Histograms:** Excellent for displaying data distributions.
   - **Heatmaps and more:** Advanced visualizations for complex datasets.

2. **Customization:** Almost every aspect of a plot can be customized, from colors and labels to tick marks and grid lines. This level of control ensures that your data is presented clearly and professionally.

3. **Integration with Other Libraries:** Matplotlib works seamlessly with **Pandas** and **NumPy**, allowing you to plot data directly from **DataFrames** or **arrays** with minimal code.

4. **Publication-Quality Figures:** The library supports vector graphics output (e.g., PDF, SVG), which is ideal for creating figures for scientific publications and presentations.

5. **Interactive Plots:** When used in Jupyter notebooks, Matplotlib can generate interactive plots that enhance data exploration.

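For point 4, a figure can be saved in a vector format with `savefig`. The sketch below builds the figure with `matplotlib.figure.Figure` directly, so it does not depend on the notebook's display backend, and saves it as SVG into an in-memory buffer. The filenames mentioned in the comments are hypothetical:

```python
import io
from matplotlib.figure import Figure

concentrations = [0.1, 0.2, 0.3, 0.4, 0.5]  # mol/L
doubled = [c * 2 for c in concentrations]

# Figure() creates a figure without going through pyplot
fig = Figure(figsize=(4, 3))
ax = fig.subplots()
ax.plot(concentrations, doubled, marker="o")
ax.set_xlabel("Initial Concentration (mol/L)")
ax.set_ylabel("Doubled Concentration (mol/L)")

# Save as SVG into memory. To write a file instead, pass a filename,
# e.g. fig.savefig("concentration_plot.svg") or fig.savefig("concentration_plot.pdf")
svg_buffer = io.BytesIO()
fig.savefig(svg_buffer, format="svg")
svg_text = svg_buffer.getvalue().decode("utf-8")
print("SVG output starts with:", svg_text[:60])
```
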
#### **Example Use Cases in Chemistry:**
- **Visualizing Reaction Kinetics:** Plotting concentration vs. time data for chemical reactions.
- **Comparing Experimental Conditions:** Using bar charts to compare yields or efficiencies under different conditions.
- **Analyzing Spectroscopy Data:** Creating line plots of absorbance vs. wavelength.

In this notebook, we will explore how to create and customize plots using **Matplotlib**, demonstrating how effective data visualization can enhance the interpretation and communication of research findings.


In [None]:
# Simple plot to visualize concentration data, stored in numpy arrays
plt.plot(concentrations_np, doubled_concentrations, marker='o')
plt.xlabel('Initial Concentration (mol/L)')
plt.ylabel('Doubled Concentration (mol/L)')
plt.title('Concentration Doubling Example')
plt.grid(True)
plt.show()

### Formatting `matplotlib` plots

#### In the code block below, remove the `#` that sits in front of each input argument inside the brackets of each `plt` code line (and in front of `plt.legend()`), then run the cell again.

#### In this way, you will see some examples of how we can improve on the basic graph example above.

In [None]:
# Simple plot to visualize concentration data
plt.plot(
    concentrations, 
    doubled_concentrations, 
    marker='o',           # Marker style (e.g., 'o', 's', '^')
    #linestyle='--',        # Line style (e.g., '-', '--', '-.')
    #color='b',             # Line color (e.g., 'b' for blue, 'r' for red)
    #label='Concentration Data' # Label for the legend
)

plt.xlabel(
    'Initial Concentration (mol/L)', 
    #fontsize=12,           # Font size of the label
    #color='darkblue'        # Font color
)

plt.ylabel(
    'Doubled Concentration (mol/L)', 
    #fontsize=12, 
    #color='darkgreen'
)

plt.title(
    'Concentration Doubling Example', 
    #fontsize=14, 
    #fontweight='bold'       # Make the title bold
)

plt.grid(True)              # Show grid lines
#plt.legend()                # Show the legend (if labels are used)
plt.show()

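Matplotlib also offers an object-oriented style, in which `plt.subplots()` returns a Figure and an Axes, and the formatting is set with `ax.set_...` methods. A sketch of the same plot with all of the formatting options above switched on, written in that style:

```python
import numpy as np
import matplotlib.pyplot as plt

concentrations_np = np.array([0.1, 0.2, 0.3, 0.4, 0.5])  # mol/L
doubled_concentrations = concentrations_np * 2

# plt.subplots() returns a Figure and an Axes; formatting is applied to the Axes
fig, ax = plt.subplots()
ax.plot(concentrations_np, doubled_concentrations,
        marker='o', linestyle='--', color='b', label='Concentration Data')
ax.set_xlabel('Initial Concentration (mol/L)', fontsize=12, color='darkblue')
ax.set_ylabel('Doubled Concentration (mol/L)', fontsize=12, color='darkgreen')
ax.set_title('Concentration Doubling Example', fontsize=14, fontweight='bold')
ax.grid(True)
ax.legend()
plt.show()
```
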

### One more plotting example

In [None]:
# Plot the decay data stored in the decay_df DataFrame
plt.plot(decay_df['Time'].to_numpy(),  # convert the 'Time' column to a NumPy array for the x-axis
         decay_df['A'].to_numpy(),     # convert the 'A' column to a NumPy array for the y-axis
         marker='o',
         markersize=10,
         mfc='red',     # marker face colour (fill)
         mec='black',   # marker edge colour (outline)
         linestyle='--'
         )
plt.xlabel('Time')
plt.ylabel('A')
plt.title('Decay of A over Time')
plt.grid(True)
plt.show()

# 8. Summary and Reflection
In this notebook, we covered:
- The importance of scientific computing in chemistry.
- How Python's basic features can be used for simple scientific tasks.
- Practical use of NumPy, Pandas, and Matplotlib.

### Next Steps
- Dive deeper into **NumPy** for numerical computing with **Exercises 33A and 33B**, handling larger datasets with **Pandas**, and creating informative plots using **Matplotlib**.

- Feel free to experiment with the code and explore how these tools can help you in your own research!