# Scientific Computing with Python

This worksheet provides an introduction to scientific computing using the `numpy`, `scipy`, and `matplotlib` libraries. These libraries turn Python into a powerful tool for solving problems in statistics, linear algebra, calculus, signal processing, optimization, and more. This worksheet will just scratch the surface with a few examples.

It is normal and encouraged to consult library documentation and online help to look up how to use these tools. Even experienced programmers frequently reference online resources while writing code. Just be careful: if you ever copy and paste lines of code from anywhere, make sure that you (1) understand how they work, and (2) properly credit their source.

Run all of the code sections to get a feeling for the kinds of things these libraries can do and how they interact with each other.

## NumPy

Let's start with the NumPy library, which provides a wide variety of mathematical constructs. NumPy arrays form the basis for Python's entire scientific computing ecosystem. A NumPy array is like a list with superpowers. An ordinary Python list is like Bruce Wayne, and a NumPy array is like Batman!

NumPy arrays are fast because they support vectorized operations, meaning that you can operate on all the elements in the array at once instead of having to loop over the data structure to modify each one.
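
As a small illustration of that idea, the sketch below doubles four numbers twice: once with an explicit loop over a plain list, and once with a single array expression. The values and variable names are made up for this example.

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]  # example data, invented for illustration

# plain Python list: visit each element in a loop
doubled_list = []
for v in values:
    doubled_list.append(v * 2)

# NumPy array: one expression applies to every element at once
doubled_array = np.array(values) * 2

print(doubled_list)
print(doubled_array)
```

Both approaches produce the same numbers; the array version simply leaves the looping to NumPy.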

The full NumPy reference is here: https://numpy.org/devdocs/index.html

In [None]:
import numpy as np
# this allows us to use the abbreviation np for the numpy library
# we will use it so frequently, it is convenient to make it short

# some convenient constants:
print("pi       =", np.pi)
print("e        =", np.e)

# some convenient math functions:
print("sin(1)   =", np.sin(1))   # sine (argument in radians)
print("exp(2)   =", np.exp(2))   # exponentiation
print("log(3)   =", np.log(3))   # natural log
print("log2(4)  =", np.log2(4))  # base 2 log
print("log10(5) =", np.log10(5)) # base 10 log
print("pi is roughly", np.around(np.pi, 4)) # rounding

See the full list of NumPy math functions here:
https://numpy.org/devdocs/reference/routines.math.html



In [None]:
# some ordinary Python lists
x_list = [0, 1, 2, 3, 4, 5, 6]
y_list = [1.03, 2.18, 3.15, 4.09, 4.52, 5.12, 6.9]

# converted to NumPy arrays
x_data = np.array(x_list)
y_data = np.array(y_list)

# we can increment every number in the array with a single line
print("original data:  ", y_data)
y_data += 1 # this will fail for a normal list
print("after increment:", y_data)

# other math operations can be combined and performed on all elements
z_data = np.cos(x_data * 1.2 - 0.7)
print(z_data)

# we can also do elementwise arithmetic operations between two arrays
print(x_data + y_data)

In [None]:
# as with lists, array elements can be indexed with numbers
print("the third y value is", y_data[2]) # remember counting starts at 0

# you can use slicing to capture multiple elements
print("the second and third elements are", y_data[1:3])
print("the first four elements are", y_data[:4])
print("the last two elements are", y_data[-2:])

In [None]:
# arrays can be multidimensional (created here from nested lists)
numbers = np.array([ [1,2,3], [4,5,6] ])
print(numbers)

# the first element is an array containing the first row
print("first row:", numbers[0])

# to extract a single number, separate the indices by commas
print("second row, first column:", numbers[1,0])

# use slicing to extract all elements in a certain dimension
print("all rows, second column:", numbers[:,1])

# you can set the value of an entire slice at once
numbers[-1,:] = 0
print("last row is now all zeros:")
print(numbers)

Learn more about indexing numpy arrays here:
https://numpy.org/devdocs/reference/arrays.indexing.html


In [None]:
# you can easily check the size and shape of an array
print("size =", numbers.size) # total number of elements
print("shape =", numbers.shape) # dimensions of array
# prefer the size attribute over the builtin len() function,
# because len() only counts the first dimension (the number of rows here)

In [None]:
# you can generate numpy arrays using some helpful constructors

a = np.zeros(10)  # ten zeros
print(a)
a = np.ones(10)   # ten ones
print(a)
a = np.arange(10) # like builtin range()
print(a)
a = np.arange(5, 12, 2) # start at 5 and count up to (but not including) 12 in steps of 2
print(a)
a = np.linspace(0, 1, 5) # 5 numbers evenly spaced between 0 and 1 (inclusive)
print(a)

Full list of array creation routines:
https://numpy.org/devdocs/reference/routines.array-creation.html
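
One detail worth checking for yourself is how each routine treats its stop value. In this short sketch (the variable names are illustrative), `np.arange` leaves the stop value out, just like the builtin `range()`, while `np.linspace` includes it by default.

```python
import numpy as np

stepped = np.arange(5, 12, 2)    # 5, 7, 9, 11 (12 is not included)
spaced = np.linspace(0, 1, 5)    # 0, 0.25, 0.5, 0.75, 1 (1 is included)

print(stepped)
print(spaced)

# linspace can also leave out the stop value if asked
print(np.linspace(0, 1, 5, endpoint=False))  # 0, 0.2, 0.4, 0.6, 0.8
```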

In [None]:
# numpy provides a variety of random distributions for simulations
# the output will be different every time you run this code

a = np.random.uniform(2, 4, 5) # 5 random numbers between 2 and 4
print(a)
a = np.random.normal(-2, 3, 5) # 5 numbers with mean -2 and standard deviation 3
print(a)
a = np.random.poisson(3, 5) # 5 numbers from a Poisson distribution with lambda = 3
print(a)

Full list of random distributions:
https://numpy.org/devdocs/reference/random/generator.html#distributions
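
The page linked above documents NumPy's `Generator` interface, which is created with `np.random.default_rng`. Here is a minimal sketch, assuming you want results you can reproduce: passing a seed (42 here is an arbitrary choice) makes the generator produce the same numbers on every run.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

a = rng.uniform(2, 4, 5)   # 5 random numbers between 2 and 4
b = rng.normal(-2, 3, 5)   # 5 numbers with mean -2 and standard deviation 3
c = rng.poisson(3, 5)      # 5 numbers from a Poisson distribution with lambda = 3
print(a)
print(b)
print(c)

# a second generator built with the same seed repeats the first draw exactly
rng_again = np.random.default_rng(seed=42)
a_again = rng_again.uniform(2, 4, 5)
print(np.array_equal(a, a_again))
```

Leaving the seed out gives different numbers each run, which matches the behavior of the `np.random` functions used above.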

In [None]:
# it's also easy to generate multidimensional arrays

a = np.zeros([3,4]) # 3 by 4 array of zeros
print(a)

a = np.random.normal(0, 1, (3,2)) # 3 by 2 array drawn from standard normal
print(a)

## Matplotlib

The Matplotlib library is used for plotting and visualizing data. It integrates closely with NumPy and SciPy.

The full Matplotlib reference is here:
https://matplotlib.org/contents.html

In [None]:
import matplotlib.pyplot as plt

plt.plot(x_data, y_data) # display data as a line plot
plt.show() # show the plot to the user

# the points correspond to the pairs of x and y data from the two arrays
# note that the sizes of the x and y arrays must match

In [None]:
# you can plot multiple curves with a title and axis labels

x = np.linspace(-5, 5, 100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.arctan(x)

plt.plot(x, y1)
plt.plot(x, y2)
plt.plot(x, y3)

plt.title("this is the title")
plt.xlabel("this is the x label")
plt.ylabel("this is the y label")
plt.show()

In [None]:
# can you find the parts of the code that do each of the following?
# - make the figure wide and short
# - label each of the functions
# - display a legend in the upper right corner
# - overlay a grid on the graph

plt.figure(figsize=[12,3])
plt.plot(x, y1, label='sin(x)')
plt.plot(x, y2, label='cos(x)')
plt.plot(x, y3, label='atan(x)')
plt.legend(loc='upper right')
plt.grid()
plt.show()

Learn about all the ways you can customize your plots here:
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html

In [None]:
# an alternative to plotting functions on the same graph is to use subplots

# size in inches of the whole figure (width, height)
plt.figure(figsize=[12,3])

# this subplot command divides the figure into a grid with 1 row and 3 columns
# it then focuses on the first grid square until the next subplot command
plt.subplot(1, 3, 1) # 1 by 3 grid in 1st space
plt.plot(x, y1)
plt.title("plot 1 of 3")

plt.subplot(1, 3, 2) # 1 by 3 grid in 2nd space
plt.plot(x, y2)
plt.title("plot 2 of 3")

plt.subplot(1, 3, 3) # 1 by 3 grid in 3rd space
plt.plot(x, y3)
plt.title("plot 3 of 3")

plt.show()

Learn more about the subplot command here:
https://matplotlib.org/api/_as_gen/matplotlib.pyplot.subplot.html

In [None]:
# matplotlib has a variety of styles for plotting data; here are some examples

plt.figure(figsize=[12,3])

plt.subplot(1, 3, 1)
plt.scatter(x_data, y_data)
plt.title("scatter plot")

plt.subplot(1, 3, 2)
plt.bar(x_data, y_data)
plt.title("bar chart")

plt.subplot(1, 3, 3)
plt.fill_between(x_data, y_data)
plt.title("area plot")

plt.show()

**Python Question 1**

- Make a figure with size 5 by 5 inches
- Add a scatter plot of the mystery data provided below
- Add a title describing what you see in the plot

In [None]:
mystery_x = np.array([2.143, -3.0, 0.875, 1.75, -4.226, 3.0, 2.75, -2.878, -0.75, -4.6, -4.226, -0.938, -1.0, -0.333, -1.114, 1.562, -4.6, 4.427, -0.25, 2.143, -1.7, -3.743, -2.723, -1.7, 1.391, -1.246, 3.0, -0.427, -0.201, 2.336, -1.773, -1.361, 2.333, 1.965, -2.5, 1.0, -4.855, 1.75, -0.664, 0.875, -0.938, -1.631, 4.743, 1.886, -1.965, 0.277, 3.997, -0.25, -0.358, 2.524, -1.0, -1.361, -1.773, -2.723, 1.886, -2.333, -1.0, 1.246, 5.0, -2.5, 5.0, 2.336, -3.162, 2.75, -2.511, 0.427, 2.84, -0.75, -4.855, 2.878, 0.277, 1.369, -1.114, 1.562, 0.8, -4.984, 0.8, -3.0, 2.642, -0.664, -1.667, 3.464, 4.935, -2.125, 0.489, 2.84, 0.603, 1.667, 2.642, 1.139, 1.139, -4.984, -2.511, -3.743, 4.935, -0.201, 1.391, 0.333, 0.489, -1.631, 3.997, 0.603, -2.524, 4.743, -3.162, 1.369, 4.427, -0.358, 3.464, -2.125])
mystery_y = np.array([6.017, 0.0, 4.299, 2.5, -1.172, -0.0, 3.0, -0.563, 2.5, 3.46, 4.172, 2.891, -3.399, 0.0, 1.573, 2.891, -0.46, 3.824, 3.0, -3.017, 2.717, 4.816, 3.312, 2.283, 6.303, -1.819, 0.0, -1.98, 6.496, 1.885, 6.175, 2.013, 0.0, -1.511, 5.83, 0.0, 0.303, 2.5, 4.115, 1.701, 2.109, 1.508, -0.083, 1.573, -1.511, 3.312, 4.504, 3.0, 3.61, -1.081, 0.0, 2.987, -3.175, 2.688, 4.427, 0.0, 6.399, -1.819, 1.5, -2.83, 1.5, 4.115, -2.373, 3.0, 2.118, -1.98, 5.615, 2.5, 2.697, -0.563, 2.688, 1.508, 4.427, 2.109, 2.717, 1.902, 2.283, 0.0, 3.61, 1.885, 0.0, -2.106, 0.698, 4.299, 2.118, -2.615, -3.464, 0.0, 2.39, 2.987, 2.013, 1.098, 3.882, -1.816, 2.302, -3.496, -3.303, 0.0, 3.882, 4.492, -1.504, 6.464, -1.081, 3.083, 5.373, 4.492, -0.824, 2.39, 5.106, 1.701])

In [None]:
# YOUR CODE HERE



**Python Question 2**

Modify the code below to make the following changes (you may need to reference the documentation to figure out how):

- Make the figure 6 inches wide and 2 inches tall.
- Make the plot a green dashed line.
- Set the x limits between 0 and 1.
- Set the y limits between -1 and 1.
- Add a title saying "Python is great!".
- Label the x axis as "time".
- Label the y axis as "love of Python".
- Remove the grid.

In [None]:
x = np.linspace(-5, 5, 100)
y1 = np.sin(x)

plt.plot(x, y1)
plt.xlim([-5,4])
plt.grid()
plt.show()

## SciPy

SciPy builds on NumPy to provide tools for common scientific computing tasks. SciPy provides many useful methods, but to start we will examine just a single one: `curve_fit` allows us to fit a parameterized curve to a collection of data.

The full SciPy reference is here:
https://docs.scipy.org/doc/scipy/reference/#api-reference

The `curve_fit` function is a general purpose tool that allows us to fit any kind of function to a collection of data. The programmer writes a function with some parameters, and then `curve_fit` attempts to find the parameter values that best fit the data. By default, these are the values that minimize the sum of the squared differences between the curve and the data.

`curve_fit` takes three inputs:

- a function to optimize (specified by the programmer)
- the x data to fit
- the y data to fit

and it has two outputs:

- an array of the parameters that best fit the data
- the covariance matrix of those parameters (a measure of their uncertainty)

For complex functions and datasets, the `curve_fit` function may perform better if you can provide it with an initial guess of the parameters to prevent it from falling into a local optimum.

The full reference for this method is here:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit
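
The inputs and outputs described above can be sketched on a small synthetic example. Everything here is invented for illustration: the `exponential_decay` function, the "true" values 2.5 and 1.3, and the noise level. The initial guess is passed through `curve_fit`'s `p0` argument, and the square roots of the covariance matrix's diagonal give a rough standard error for each parameter.

```python
import numpy as np
from scipy.optimize import curve_fit

# hypothetical model, chosen only for this example
def exponential_decay(t, amplitude, rate):
    return amplitude * np.exp(-rate * t)

# synthetic data: known parameters plus a little random noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 40)
y_noisy = exponential_decay(t, 2.5, 1.3) + rng.normal(0, 0.02, t.size)

# p0 is the initial guess for (amplitude, rate)
params, cov = curve_fit(exponential_decay, t, y_noisy, p0=[1.0, 1.0])

# one standard error per parameter, taken from the covariance diagonal
param_errors = np.sqrt(np.diag(cov))

print("fit parameters:", params)
print("standard errors:", param_errors)
```

The fitted values should land close to 2.5 and 1.3, with standard errors that are small compared to the parameters themselves.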

In [None]:
# suppose you recorded the data below from a laboratory experiment; you measured
# the molar concentration (y) of a chemical product over time (x) in seconds
# (later we'll read data from files - for now, assume you already have it)

x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4,
              2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0, 4.2, 4.4, 4.6, 4.8, 5.0,
              5.2, 5.4, 5.6, 5.8, 6.0, 6.2, 6.4, 6.6, 6.8, 7.0, 7.2, 7.4, 7.6,
              7.8, 8.0, 8.2, 8.4, 8.6, 8.8, 9.0, 9.2, 9.4, 9.6, 9.8, 10.0])
y = np.array([0.002, 0.023, 0.026, 0.017, 0.004, 0.029, 0.041, 0.022, 0.011,
              0.038, 0.065, 0.118, 0.078, 0.108, 0.164, 0.187, 0.18, 0.212,
              0.262, 0.286, 0.337, 0.35, 0.386, 0.4, 0.457, 0.461, 0.521, 0.542,
              0.52, 0.55, 0.598, 0.611, 0.607, 0.621, 0.586, 0.604, 0.634,
              0.623, 0.615, 0.646, 0.642, 0.648, 0.655, 0.664, 0.655, 0.648,
              0.639, 0.633, 0.652, 0.656, 0.656])

plt.scatter(x, y)
plt.show()

In [None]:
# we can use scipy to fit curves to data
from scipy.optimize import curve_fit
# this syntax allows us to import a particular function
# if we instead just said import scipy, we would have to type
# scipy.optimize.curve_fit every time instead of just curve_fit

# here we specify the type of function we want to fit
def linear(x, a, b):
    return a*x + b

params, cov = curve_fit(linear, x, y) # compute best fit params for data
print("fit parameters:", params)
linear_fit = linear(x, params[0], params[1]) # stick params back into function

plt.scatter(x, y, s=5, color='black')
plt.plot(x, linear_fit, lw=1, color='red')
plt.title("linear fit")
plt.show()

In [None]:
# curve_fit can work with any kind of function we define

# it's really not important that you understand how this function works
# the point is that it describes a type of function that we can fit to our data
def clipping(x, offset, scale, delay, slope):
    y = (x - delay) * slope
    y = np.clip(y, 0, 1)
    return y * scale + offset

params, cov = curve_fit(clipping, x, y)
clip_fit = clipping(x, *params)
# *params is shorthand to unpack all the parameters: params[0], params[1], ...

plt.scatter(x, y, s=5, color='black')
plt.plot(x, clip_fit, lw=1, color='red')
plt.title("linear fit with clipping")
plt.show()

**Python Question 3**

Fit two more functions to this same set of data: a cubic function and a logistic function.

A cubic function is a third order polynomial:
$$ f(x) = ax^3 + bx^2 + cx + d $$

A logistic curve has a sigmoid shape:
$$ f(x) = a \cdot \tanh(bx - c) + d $$

In both functions, $a$, $b$, $c$, and $d$ are all parameters.
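
The tanh form above is called logistic because it is a rescaled and shifted copy of the standard logistic function $\sigma(z) = 1/(1 + e^{-z})$. As a brief check using the standard identity relating the two:

$$ \tanh(z) = 2\,\sigma(2z) - 1 \quad\Longrightarrow\quad a \cdot \tanh(bx - c) + d = 2a \cdot \sigma\big(2(bx - c)\big) + (d - a) $$

So fitting the tanh form is equivalent to fitting a logistic curve with different parameter values.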

- Plot the fit curves side by side in adjacent subplots.
- Add titles to each subplot.
- Label the x axes as "time (s)".
- Label the y axes as "concentration [M]".
- Add legends in the upper left corners.

Which one do you think best describes this reaction data?

In [None]:
# YOUR CODE HERE



**Python Question 4**

Below are some new data. Fill in the code below to fit the data with a linear function and at least one other function that you think fits better.

In [None]:
tofitxvals = np.array([0,1,2,3,4,5,6])
tofityvals = np.array([0,1,4,9,16,25,36])
plt.title('My Fit Data')
plt.xlabel('Study Time')
plt.ylabel('My Knowledge')

def linear(x, a, b):
    return a*x + b

# DEFINE ANOTHER FUNCTION OF YOUR CHOICE

params, cov = curve_fit(linear, tofitxvals, tofityvals)
print("linear fit parameters:", params)
linear_fit = linear(tofitxvals, params[0], params[1])

# COMPUTE ANOTHER BEST FIT

plt.scatter(tofitxvals, tofityvals, s=10, color='black', label='raw data')
plt.plot(tofitxvals, linear_fit, lw=1, color='red', label='linear fit')

# PLOT THE OTHER FIT CURVE

plt.legend()
plt.show()