In [None]:
import numpy as np
import matplotlib.pyplot as plt

Suppose we want to know the sample mean and sample variance of the dataset
$$1, 2, 4, 9, 0, 19, 21, -9.$$
We can type:

In [None]:
x = np.array([1,2,4,9,0,19,21,-9])
print(np.mean(x))
print(np.var(x,ddof=1))

In [None]:
x = np.array([1,2,4,9,0,19,21,-9]) #alternative method
print(x.mean())
print(x.var(ddof=1))

What does `ddof` do here? What if we use `np.var()` without the `ddof` parameter?
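One way to explore this question is to compute the variance both ways and compare. The sketch below assumes the same dataset as above; the names `pop_var` and `sample_var` are introduced here only for illustration.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])
n = x.size

pop_var = np.var(x)             # default ddof=0: divides by n
sample_var = np.var(x, ddof=1)  # ddof=1: divides by n - 1

print(pop_var, sample_var)

# The two results differ only by the factor n / (n - 1)
print(np.isclose(sample_var, pop_var * n / (n - 1)))
```

In other words, numpy divides the sum of squared deviations by `n - ddof`.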

Other commands such as `np.sum(x)`, `np.median(x)`, and `np.std(x,ddof=1)` produce the sum of all observations, the median, and the sample standard deviation, respectively. Note, however, that a numpy array does not have a `median` method, so `x.median()` does not work.
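A short sketch of these commands on the same dataset follows. The variable names `total`, `med`, and `sd` are chosen here for illustration, and `hasattr` is used only to check for the missing method without raising an error.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])

total = np.sum(x)        # sum of all observations
med = np.median(x)       # median
sd = np.std(x, ddof=1)   # sample standard deviation

print(total, med, sd)

# A numpy array has .sum(), .mean(), .std() and .var() methods, but no .median()
print(hasattr(x, "median"))
```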

If `x` and `y` are two numpy arrays of the same size, then `x+y` yields a numpy array whose entries are the sums of the corresponding elements of `x` and `y`; `x-y` is defined similarly. Moreover, `x*y` and `x/y` are also defined element-wise. This means that to compute the sum of squares of all data points, we can use `np.sum(x**2)`. Let's use these element-wise operations to verify the formula for the sample variance.

In [None]:
xbar = np.mean(x)
ssquared = np.sum((x-xbar)**2)/(x.size - 1)
print(np.isclose(ssquared, x.var(ddof=1)))  # compare floats with a tolerance rather than ==
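As a further check, the sum of squares `np.sum(x**2)` can be used in the algebraically equivalent shortcut form of the sample variance, $s^2 = \left(\sum x_i^2 - n\bar{x}^2\right)/(n-1)$. The sketch below is only an illustration; the names `sum_of_squares` and `shortcut` are not part of the original notebook.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])
n = x.size

sum_of_squares = np.sum(x**2)   # x**2 squares each entry of x
shortcut = (sum_of_squares - n * np.mean(x)**2) / (n - 1)

print(shortcut)
print(np.isclose(shortcut, np.var(x, ddof=1)))
```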

We can plot a histogram of the given dataset with `plt.hist()`. Note that when no parameters are provided, the default histogram should be viewed with all the caveats we went over in class: the choice of bin width and bin intervals can affect how a histogram looks.

In [None]:
plt.hist(x)
plt.show()
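To see how much the bin choice matters without drawing anything, one can count the observations per bin with `np.histogram`, which takes a `bins` argument just as `plt.hist` does. The bin counts of 3 and 6 below are arbitrary choices made for this sketch.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])

# Same data, two different numbers of equal-width bins
counts3, edges3 = np.histogram(x, bins=3)
counts6, edges6 = np.histogram(x, bins=6)

print(edges3, counts3)
print(edges6, counts6)
```

The same `bins` value can be passed to the plotting call, for example `plt.hist(x, bins=6)`.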

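For a pair of datasets `x` and `y`, stored as two numpy arrays of the same size, we can compute the correlation coefficient by calling `np.corrcoef([x,y])` and construct a scatter plot with `plt.scatter(x, y)`. Note that here `np.corrcoef()` is given a list of numpy arrays as its input and returns a 2×2 correlation matrix; the correlation coefficient of `x` and `y` is its off-diagonal entry.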

In [None]:
y = np.array([4,0,1,0,8,12,19, -3])
print(np.corrcoef([x,y]))

In [None]:
plt.scatter(x, y)
plt.show()
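The correlation coefficient can also be computed by hand with element-wise operations and compared against the off-diagonal entry of the matrix returned by `np.corrcoef`. This is a minimal sketch; `dx`, `dy`, `r_manual`, and `r_numpy` are names introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])
y = np.array([4, 0, 1, 0, 8, 12, 19, -3])

dx = x - np.mean(x)   # deviations of x from its mean
dy = y - np.mean(y)   # deviations of y from its mean
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_numpy = np.corrcoef([x, y])[0, 1]   # off-diagonal entry of the 2x2 matrix

print(r_manual, r_numpy)
print(np.isclose(r_manual, r_numpy))
```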

To sort a numpy array, one can use `np.sort()`, which returns a sorted copy and leaves the original array unchanged. Note that if you use `sorted()`, which is built into Python, the result is a Python list rather than a numpy array.

In [None]:
x_sorted = np.sort(x)  # np.sort returns a sorted copy
print(x_sorted)
print(x)               # the original array is unchanged
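The difference between the sorting options can be sketched as follows. The in-place method `x_copy.sort()` is shown on a copy (`x_copy` is a name introduced here) so that the original data stay in their original order.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])

print(type(np.sort(x)))   # np.sort gives back a numpy array
print(type(sorted(x)))    # sorted gives back a Python list
print(x)                  # neither call modifies x

x_copy = x.copy()
x_copy.sort()             # the array method sorts in place
print(x_copy)
```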

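We can select the element of a numpy array `x` in position `i` with `x[i]`, where positions are counted from 0. A subset of `x` can be selected with `x[i:j]`. For instance, `x[2:6]` produces a numpy array containing the elements of `x` in positions 2 through 5; the end position 6 is excluded.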

In [None]:
print(x[2:6])
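A few more indexing and slicing patterns on the same array are sketched below; the variable names are chosen here only for illustration.

```python
import numpy as np

x = np.array([1, 2, 4, 9, 0, 19, 21, -9])

first = x[0]      # positions start at 0
last = x[-1]      # negative positions count from the end
middle = x[2:6]   # positions 2, 3, 4, 5 (6 is excluded)
head = x[:3]      # omitting the start means "from the beginning"

print(first, last, middle, head)
```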

You can also define your own functions. The following example computes the hyperbolic sine:

In [None]:
def hyp_sine(x):
    return ( np.e ** x - np.e ** (-1 * x) ) / 2

print(hyp_sine(9))

Note that `np.e` is the constant $e$ and that the exponentiation operator is `**`. Of course, `sinh` is also built into numpy.

In [None]:
print(np.sinh(9))
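Because `**` and the other arithmetic operators act element-wise, the same function also accepts a whole numpy array. The sketch below repeats the definition so that it runs on its own and compares the result with `np.sinh` on a few arbitrarily chosen test values `t`.

```python
import numpy as np

def hyp_sine(x):
    return (np.e ** x - np.e ** (-1 * x)) / 2

t = np.array([-2.0, 0.0, 0.5, 9.0])   # arbitrary test inputs

print(hyp_sine(t))
print(np.allclose(hyp_sine(t), np.sinh(t)))
```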
