# Introduction to SciPy
Tutorial at EuroSciPy 2019, Bilbao
## 1. Statistical analysis – Gyroscope data taken in a TGV

In [None]:
import numpy as np
from scipy import optimize, stats
%matplotlib notebook
import matplotlib.pyplot as plt

### Importing the data

Our data are stored in a compressed file `TGV_data.csv.bz2`. Each row of the uncompressed file contains entries separated by commas and the first row contains labels explaining the content of the respective column.

NumPy provides fairly general tools for importing data from files into NumPy arrays. We will use `genfromtxt`, which can read `bz2`-compressed files directly and also handles the labels in the first row of the data.

In [None]:
data = np.genfromtxt('data/TGV_data.csv.bz2', delimiter=',', names=True)
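
To see what `names=True` and the `.bz2` extension do without needing the tutorial's data file, here is a minimal sketch that writes a tiny compressed CSV to a temporary directory and reads it back. The file name `toy_data.csv.bz2` and its three rows are made up for illustration; only the column names mimic the real data set.

```python
import bz2
import os
import tempfile

import numpy as np

# Hypothetical miniature data set; the values are invented.
csv_text = (
    "Time_s,Gyroscope_x_rads\n"
    "0.0,0.01\n"
    "0.1,-0.02\n"
    "0.2,0.03\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "toy_data.csv.bz2")
    with bz2.open(path, "wt") as f:
        f.write(csv_text)
    # The .bz2 extension makes genfromtxt decompress the file on the fly;
    # names=True takes the field names from the first row.
    toy = np.genfromtxt(path, delimiter=",", names=True)

print(toy.dtype.names)   # field names of the structured array
print(toy["Time_s"])     # one column, accessed by name
```

The result is a one-dimensional structured array with one record per data row, which is why the columns are accessed by name rather than by index.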

There are five columns identified by names:

In [None]:
data.dtype.names

In [None]:
time = data['Time_s']
omega_x = data['Gyroscope_x_rads']
omega_y = data['Gyroscope_y_rads']

Let us first get an idea of the data.

In [None]:
plt.plot(time, omega_x, label='omega_x')
plt.plot(time, omega_y, label='omega_y')
plt.xlabel('time (s)')
plt.legend()

### Statistical analysis

We use the data for $\omega_x$ to demonstrate some statistical analysis. Let us first take a look at a histogram of the data.

In [None]:
n, bins = np.histogram(omega_x, bins=100, density=True)

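Because we have set `density=True`, the histogram is normalized so that its integral over all bins equals one, i.e. it can be read as a probability density. Since all bins have the same width, we can check this by multiplying the sum of the bin heights by the width of one bin.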

In [None]:
np.sum(n)*(bins[1]-bins[0])
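
The check above relies on all bins having the same width. As a small sketch of the general case, assuming synthetic data and arbitrarily chosen non-uniform bin edges, the integral can be computed from the individual bin widths with `np.diff`:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)   # synthetic data, not the gyroscope data

# Deliberately non-uniform bin edges (chosen arbitrarily for illustration).
heights, edges = np.histogram(sample, bins=[-5, -1, 0, 0.5, 1, 5],
                              density=True)

widths = np.diff(edges)          # width of each individual bin
area = np.sum(heights * widths)  # integral of the piecewise-constant density
print(area)
```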

The array `bins` contains the edges of the bins. To plot the histogram, we need their centers.

In [None]:
bincenters = 0.5*(bins[1:]+bins[:-1])

In [None]:
plt.plot(bincenters, n)

With SciPy we can easily determine some statistical characteristics from the original data:

In [None]:
# ddof=0: divide by N when computing the variance (no Bessel correction)
description = stats.describe(omega_x, ddof=0)
description
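
The following sketch, using a synthetic sample instead of the gyroscope data, illustrates how the `ddof` argument changes the variance reported by `stats.describe` and how the individual fields of the result can be accessed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=0.02, size=500)  # synthetic data

population = stats.describe(sample, ddof=0)  # variance divided by N
unbiased = stats.describe(sample)            # default ddof=1: divided by N-1

print(population.nobs, population.minmax)
print(population.variance, unbiased.variance)
```

For a sample this large the two variances differ only slightly, but the `ddof=0` value is the one that matches a maximum likelihood estimate.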

We can compare our histogram with a Gaussian that has the mean and variance just obtained.

In [None]:
loc = description.mean
scale = np.sqrt(description.variance)
loc, scale

In [None]:
x = np.linspace(-0.1, 0.1, 200)
plt.plot(x, stats.norm.pdf(x, loc, scale))

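A maximum likelihood fit of a Gaussian yields the same result, because the variance above was computed with zero delta degrees of freedom, i.e. without Bessel's correction.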

In [None]:
stats.norm.fit(omega_x)
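
As a quick check of this statement on synthetic data (the sample below is made up and only roughly mimics the scale of the gyroscope signal), the values returned by `stats.norm.fit` can be compared with the sample mean and the standard deviation computed with NumPy's default `ddof=0`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(loc=0.001, scale=0.02, size=2000)  # synthetic data

loc_ml, scale_ml = stats.norm.fit(sample)

# For a Gaussian, the maximum likelihood estimates are the sample mean
# and the standard deviation computed with ddof=0.
print(loc_ml, sample.mean())
print(scale_ml, sample.std())
```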

How can we fit a Gaussian to the histogram? One option is nonlinear least-squares curve fitting.

In [None]:
def gaussian(x, loc, scale):
    return stats.norm.pdf(x, loc, scale)

In [None]:
popt, pcov = optimize.curve_fit(gaussian, bincenters, n)
popt
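
When no initial guess is given, `curve_fit` starts with all parameters set to 1, which can be far from the optimum for narrowly distributed data. The sketch below, again on synthetic data, passes an initial guess via `p0` and derives one-standard-deviation uncertainties from the diagonal of the covariance matrix. Using the sample mean and standard deviation as the guess is just one reasonable choice, not something prescribed by the tutorial.

```python
import numpy as np
from scipy import optimize, stats


def gaussian(x, loc, scale):
    return stats.norm.pdf(x, loc, scale)


rng = np.random.default_rng(2)
sample = rng.normal(loc=0.0, scale=0.02, size=20000)  # synthetic data

heights, edges = np.histogram(sample, bins=100, density=True)
centers = 0.5 * (edges[1:] + edges[:-1])

# Initial guess from the sample moments (an assumption for this sketch).
p0 = [sample.mean(), sample.std()]
popt, pcov = optimize.curve_fit(gaussian, centers, heights, p0=p0)

# Parameter uncertainties: square roots of the covariance diagonal.
perr = np.sqrt(np.diag(pcov))
print(popt)
print(perr)
```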

In [None]:
plt.plot(x, gaussian(x, *popt))

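Are our data compatible with a normal distribution? `stats.normaltest` tests the null hypothesis that the sample was drawn from a normal distribution; a small p-value is evidence against normality.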

In [None]:
stats.normaltest(omega_x)
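
To get a feeling for the output of `normaltest`, the following sketch applies it to two synthetic samples: one drawn from a normal distribution and one drawn from an exponential distribution, which is strongly skewed. Both samples are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal_sample = rng.normal(size=5000)
skewed_sample = rng.exponential(size=5000)

results = {}
for name, sample in [("normal", normal_sample),
                     ("exponential", skewed_sample)]:
    # The result carries the test statistic and the p-value.
    results[name] = stats.normaltest(sample)
    print(name, results[name].statistic, results[name].pvalue)
```

The p-value for the exponential sample should be tiny, so normality is rejected, whereas the p-value for the normal sample is usually not small. It can still fall below a chosen threshold by chance.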

#### Normally distributed data

In [None]:
normdata = stats.norm.rvs(1, 0.2, 5000)
plt.plot(normdata)

In [None]:
n, bins = np.histogram(normdata, bins=100, density=True)
bincenters = 0.5*(bins[1:]+bins[:-1])
plt.plot(bincenters, n)

In [None]:
x = np.linspace(0, 1.6, 200)
plt.plot(x, stats.norm.pdf(x, *stats.norm.fit(normdata)))

In [None]:
popt, pcov = optimize.curve_fit(gaussian, bincenters, n)
popt

In [None]:
plt.plot(x, gaussian(x, *popt))

In [None]:
stats.normaltest(normdata)

#### Distribution with skewness

In [None]:
for a in range(-4, 5, 2):
    x = np.linspace(stats.skewnorm.ppf(0.001, a),
                    stats.skewnorm.ppf(0.999, a), 100)
    plt.plot(x, stats.skewnorm.pdf(x, a))
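
The effect of the shape parameter `a` plotted above can also be quantified. As a small sketch, the theoretical skewness of each distribution can be obtained from `stats.skewnorm.stats` with `moments='s'`; the dictionary name `skews` is only used here for illustration.

```python
from scipy import stats

skews = {}
for a in range(-4, 5, 2):
    # moments='s' requests only the skewness of the distribution.
    skews[a] = float(stats.skewnorm.stats(a, moments='s'))
    print(a, skews[a])
```

The sign of the skewness follows the sign of `a`, and `a = 0` recovers the symmetric normal distribution.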