# Introduction to Scientific Computing II

Last week, we did a gentle introduction to manipulating n-dimensional arrays using the NumPy library.

This week, we're learning about a couple more important scientific computing libraries:

[**SciPy**](https://docs.scipy.org/doc/scipy/reference/) - SciPy is a collection of mathematical algorithms and utility functions built on the NumPy library. We will look at some, but not all, of the SciPy subpackages including: `spatial`, `sparse`, `stats`, and `linalg`. 

[**Pandas**](https://pandas.pydata.org/docs/user_guide/index.html) - From the website: *fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.* It's a powerful tool for doing "real world" data munging and analysis.

#### Import the Packages

In [None]:
import numpy as np
import scipy  # subpackages are imported explicitly where they are used
import pandas as pd

## SciPy

First up, SciPy!

### Spatial `scipy.spatial`

#### Distance Computations `scipy.spatial.distance`

Let's import what we need!

In [None]:
from scipy.spatial.distance import pdist, cdist, cityblock, euclidean, cosine, jensenshannon, minkowski

You know what I love about SciPy? Its wonderful documentation!

Using the documentation, we can find the equations for these distances!

##### `scipy.spatial.distance.cityblock` - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cityblock.html

![](assets/cityblock.jpg)

##### `scipy.spatial.distance.euclidean` - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html

![](assets/euclidean.jpg)

##### `scipy.spatial.distance.cosine` - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html

![](assets/cosine.jpg)

##### `scipy.spatial.distance.jensenshannon` - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jensenshannon.html

![](assets/jensenshannon.jpg)

##### `scipy.spatial.distance.minkowski` - https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html

![](assets/minkowski.jpg)
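
To see these distances in action, here is a small sketch you could paste into the testing area. The vectors and points are made-up illustrative values, not data from the lecture. `squareform` is an extra helper from the same module that the import cell above does not bring in.

```python
import numpy as np
from scipy.spatial.distance import (
    cdist, cityblock, cosine, euclidean, jensenshannon, minkowski,
    pdist, squareform,
)

# Two orthogonal unit vectors (illustrative values)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

d_city = cityblock(u, v)       # sum of absolute differences
d_euc = euclidean(u, v)        # straight-line distance
d_cos = cosine(u, v)           # 1 - cosine similarity
d_mink = minkowski(u, v, p=1)  # with p=1 this matches cityblock

# jensenshannon compares probability vectors; base=2 scales the result to [0, 1]
p = np.array([1.0, 0.0])
q = np.array([0.0, 1.0])
d_js = jensenshannon(p, q, base=2)

# pdist: all pairwise distances within one set of points (condensed form)
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
condensed = pdist(points, metric="euclidean")
square = squareform(condensed)  # expand to a full symmetric matrix

# cdist: distances between every point in one set and every point in another
cross = cdist(points[:1], points[1:], metric="cityblock")

print(d_city, d_euc, d_cos, d_mink, d_js)
print(square)
print(cross)
```

Note that `pdist` returns a flat "condensed" vector with one entry per pair, which is why `squareform` is handy when you want to look up the distance between point `i` and point `j` directly.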

In [17]:
# Magical Testing Area

### Sparse Matrices `scipy.sparse`

Consider the following example: You go shopping at Wal-Mart and only have 3 small items you plan on buying. Would it be better to take a large cart or just a small basket? You're right! The small basket, because then you save space!

Similarly, with sparse matrices, we store only what we need to: our non-zero values!

![](assets/sparsevsdense.png)

There are multiple sparse matrix formats (CSR, CSC, COO, and others), each suited to different operations, but tbh, it's not that big of a deal for what we're doing here.
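
As a rough sketch of the "small basket" idea, the snippet below stores a mostly-zero matrix in one sparse format (CSR) and compares its memory footprint with the dense version. The matrix size and values are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 1000 x 1000 dense matrix with only three non-zero entries (made-up values)
dense = np.zeros((1000, 1000))
dense[0, 1] = 3.0
dense[10, 20] = 5.0
dense[999, 999] = 7.0

# Convert to Compressed Sparse Row format: only the non-zeros are stored
sparse = csr_matrix(dense)

# CSR keeps three arrays: the values, their column indices, and row pointers
sparse_bytes = sparse.data.nbytes + sparse.indices.nbytes + sparse.indptr.nbytes

print("stored non-zeros:", sparse.nnz)
print("dense bytes:", dense.nbytes)
print("sparse bytes:", sparse_bytes)

# Converting back gives the original matrix
roundtrip = sparse.toarray()
```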

Here are some things you can do with sparse matrices:

- Create graph representations and compute graph characteristics
- Linear Algebra \[covering later\]

In [22]:
from scipy.sparse.csgraph import dijkstra

dijkstra([[0, 4, 0, 0, 0, 0, 0, 8, 0], 
        [4, 0, 8, 0, 0, 0, 0, 11, 0], 
        [0, 8, 0, 7, 0, 4, 0, 0, 2], 
        [0, 0, 7, 0, 9, 14, 0, 0, 0], 
        [0, 0, 0, 9, 0, 10, 0, 0, 0], 
        [0, 0, 4, 14, 10, 0, 2, 0, 0], 
        [0, 0, 0, 0, 0, 2, 0, 1, 6], 
        [8, 11, 0, 0, 0, 0, 1, 0, 7], 
        [0, 0, 2, 0, 0, 0, 6, 7, 0] 
        ])

array([[ 0.,  4., 12., 19., 21., 11.,  9.,  8., 14.],
       [ 4.,  0.,  8., 15., 22., 12., 12., 11., 10.],
       [12.,  8.,  0.,  7., 14.,  4.,  6.,  7.,  2.],
       [19., 15.,  7.,  0.,  9., 11., 13., 14.,  9.],
       [21., 22., 14.,  9.,  0., 10., 12., 13., 16.],
       [11., 12.,  4., 11., 10.,  0.,  2.,  3.,  6.],
       [ 9., 12.,  6., 13., 12.,  2.,  0.,  1.,  6.],
       [ 8., 11.,  7., 14., 13.,  3.,  1.,  0.,  7.],
       [14., 10.,  2.,  9., 16.,  6.,  6.,  7.,  0.]])
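
The call above passes a dense list of lists. The same graph can also be stored as a sparse matrix, which is where `scipy.sparse` earns its keep on large graphs. The sketch below rebuilds the graph from an edge list (read off the matrix above), asks for shortest paths from node 0 only, and walks the predecessor array to recover one route. The helper name `path_to` is hypothetical, introduced just for this example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

# Undirected edges (node_a, node_b, weight), taken from the dense matrix above
edges = [
    (0, 1, 4), (0, 7, 8), (1, 2, 8), (1, 7, 11), (2, 3, 7), (2, 5, 4),
    (2, 8, 2), (3, 4, 9), (3, 5, 14), (4, 5, 10), (5, 6, 2), (6, 7, 1),
    (6, 8, 6), (7, 8, 7),
]
rows, cols, weights = zip(*edges)

# Store each edge once; directed=False lets dijkstra traverse it both ways
graph = csr_matrix((weights, (rows, cols)), shape=(9, 9))

# Shortest distances from node 0, plus the predecessor of each node on its path
dist, predecessors = dijkstra(
    graph, directed=False, indices=0, return_predecessors=True
)


def path_to(target, predecessors):
    """Follow predecessors back from target to the source (hypothetical helper)."""
    path = []
    node = int(target)
    while node != -9999:  # SciPy marks "no predecessor" with -9999
        path.append(node)
        node = int(predecessors[node])
    return path[::-1]


route = path_to(4, predecessors)
print(dist)
print(route)
```

The distances should match the first row of the matrix output above.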

### Statistical Functions `scipy.stats`

Yay! So this is the module I'm most familiar with!

##### Statistics \*cues shrieks of terror\*

Just kidding! Statistics is great(-ish)!

SciPy provides a huge number of statistical functions, ranging from random number generation to statistical tests and summary statistics. I personally know the most about random number generation, so that's where I'm going to focus.

Some of the random number functions I use most often are
`scipy.stats.bernoulli` and `scipy.stats.maxwell`.

SciPy makes this convenient: each distribution is an object, and they all share a common set of methods.

Ex: `scipy.stats.bernoulli.rvs(p, size=10)`

Every statistical distribution has these methods:
`rvs`, `pdf` (`pmf` for discrete distributions), `logpdf`, `cdf`, `logcdf`, `mean`, `std`, `var`, etc.

Each method has its own purpose. As described by the documentation itself:

![](assets/rv_staticmethods.jpg)
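
Here is a short sketch of a few of those methods on a discrete distribution (`bernoulli`) and a continuous one (`norm`). The parameter values (`p=0.3`, `loc=100`, `scale=15`) and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import bernoulli, norm

# A seeded generator makes the random draws reproducible
rng = np.random.default_rng(0)

# Discrete distribution: ten coin flips that come up 1 with probability 0.3
flips = bernoulli.rvs(0.3, size=10, random_state=rng)
p_one = bernoulli.pmf(1, 0.3)  # probability of drawing a 1
mean_b = bernoulli.mean(0.3)   # p
var_b = bernoulli.var(0.3)     # p * (1 - p)

# Continuous distribution: the standard normal
density_at_zero = norm.pdf(0.0)  # height of the bell curve at its peak
prob_below_zero = norm.cdf(0.0)  # half the mass lies below the mean
log_density = norm.logpdf(0.0)   # log of the pdf, useful for numerical stability

# "Freezing" a distribution fixes its parameters so you don't repeat them
shifted = norm(loc=100, scale=15)
samples = shifted.rvs(size=5, random_state=rng)

print(flips)
print(p_one, mean_b, var_b)
print(density_at_zero, prob_below_zero, log_density)
print(shifted.mean(), shifted.std(), samples)
```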

In [23]:
# Some statistical distributions to have fun with!
from scipy.stats import bernoulli, boltzmann, norm, multivariate_normal

Beyond its many random number generators, SciPy also has some other amazing functions!

In [24]:
# Other amazing functions to demonstrate

# scipy.stats.chisquare
# scipy.stats.zscore
# scipy.stats.cumfreq
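
A minimal sketch of the three functions listed above, using small made-up datasets (the die-roll counts and the sample values are illustrative, not real measurements):

```python
import numpy as np
from scipy.stats import chisquare, cumfreq, zscore

# chisquare: do observed counts match the expected ones?
# With no expected frequencies given, SciPy assumes all categories are equally likely.
observed = np.array([18, 22, 20, 19, 21, 20])  # pretend counts from 120 die rolls
chi2_stat, p_value = chisquare(observed)

# zscore: how many standard deviations each value sits from the mean
# (uses the population standard deviation, ddof=0, by default)
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z = zscore(data)

# cumfreq: cumulative histogram; the last bin counts every observation
result = cumfreq(data, numbins=4)

print(chi2_stat, p_value)
print(z)
print(result.cumcount)
```

A large p-value from `chisquare` here would mean the counts are consistent with a fair die; it does not prove the die is fair.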

### Linear Algebra `scipy.linalg`

We'll be doing a linear algebra lecture in a few weeks, so we'll introduce this subpackage then!

## Pandas

## Matplotlib