## SciPy
SciPy is a core Python library for solving scientific problems. It is a collection of mathematical algorithms and
convenience functions built on the NumPy extension of Python.

SciPy is organized into subpackages that cover different scientific computing domains. Here we use only the `scipy.cluster`, `scipy.constants` and `scipy.stats` subpackages.

#### K-means clustering
K-means is an algorithm for grouping similar data into **k groups**, where k is supplied by the user.
The algorithm _returns_ **a set of centroids**, one for each cluster. The data
are converted into observation vectors, and each observation is assigned to its closest
centroid.
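The assignment step described above can be sketched with plain NumPy. This is a minimal illustration, not SciPy's internal implementation; the toy observations and the two centroids are made-up values chosen for clarity. The result is cross-checked against `scipy.cluster.vq.vq`.

```python
import numpy as np
from scipy.cluster.vq import vq

# Hypothetical toy data: six 1-D observations and two fixed centroids
obs = np.array([[1.0], [1.2], [0.8], [5.0], [5.3], [4.9]])
centroids = np.array([[1.0], [5.0]])

# Euclidean distance from every observation to every centroid, shape (6, 2)
distances = np.linalg.norm(obs[:, None, :] - centroids[None, :, :], axis=2)

# Each observation gets the index of its nearest centroid
labels = distances.argmin(axis=1)
print(labels)

# vq performs the same nearest-centroid assignment
codes, dists = vq(obs, centroids)
print(codes)
```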

- **whiten(obs[, check_finite])**  
    Normalize a group of observations on a per-feature basis.
- **vq(obs, code_book[, check_finite])**  
    Assign codes from a code book to observations.
- **kmeans(obs, k_or_guess[, iter, thresh, ...])**  
    Perform k-means on a set of observation vectors, forming k clusters.
- **kmeans2(data, k[, iter, thresh, minit, ...])**  
    Classify a set of observations into k clusters using the k-means algorithm.
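To make the "per-feature" normalization concrete, the sketch below checks that `whiten` divides each column by its own standard deviation, so every feature ends up with unit variance. The two-column array is a made-up example.

```python
import numpy as np
from scipy.cluster.vq import whiten

# Hypothetical observations: two features on very different scales
obs = np.array([[80.0, 1.70],
                [60.0, 1.60],
                [70.0, 1.80]])

whitened = whiten(obs)

# Equivalent manual computation: divide each column by its standard deviation
manual = obs / obs.std(axis=0)

print(np.allclose(whitened, manual))
print(whitened.std(axis=0))
```

Note that `whiten` only rescales the data and does not subtract the mean.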

In [29]:
import scipy
from scipy.cluster.vq import kmeans, vq, whiten
# Scores of 20 students in one subject
student_scores = [[80],[78],[55], [60], [62],
                  [59],[48], [49], [51], [54],
                  [72], [95], [78], [54], [50],
                  [61], [57], [56], [65], [53]];
print(student_scores)

[[80], [78], [55], [60], [62], [59], [48], [49], [51], [54], [72], [95], [78], [54], [50], [61], [57], [56], [65], [53]]


In [30]:
# normalize the raw data and calculate the centroids for classifying the students into 2 groups
student_scores = whiten(student_scores)
centroids,_ = kmeans(student_scores, 2)

In [31]:
print(*student_scores, sep=" ")
print()
print(*centroids, sep=" ")

[6.53573177] [6.37233848] [4.49331559] [4.90179883] [5.06519213] [4.82010218] [3.92143906] [4.00313571] [4.16652901] [4.41161895] [5.8821586] [7.76118148] [6.37233848] [4.41161895] [4.08483236] [4.98349548] [4.65670889] [4.57501224] [5.31028207] [4.3299223]

[4.54233358] [6.58474976]


In [32]:
# assign each student to the group with the nearest centroid
result,_ = vq(student_scores, centroids)
print(result)

[1 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0]


In [33]:
# Modify the code from Exercises 1, 2 and 3 to classify the students into 4 groups
centroids,_ = kmeans(student_scores, 4)
result,_ = vq(student_scores, centroids)
print(centroids)
print(result)

[[4.22780149]
 [7.76118148]
 [6.29064183]
 [4.90179883]]
[2 2 0 3 3 3 0 0 0 0 2 1 2 0 0 3 3 3 3 0]


In [34]:
# classify the following scores of two subjects from 20 students into FOUR groups.
student_scores2 = [[80, 72], [78, 56], [55, 64], [60, 61], [62, 45],
                   [59, 71], [48, 85], [49, 45], [51, 55], [54, 62],
                   [72, 81], [95, 81], [78, 92], [54, 80], [50, 50],
                   [61, 65], [57, 62], [56, 55], [65, 63], [53, 72]];

In [40]:
student_scores2 = whiten(student_scores2)
centroids2,_ = kmeans(student_scores2, 4)
result2,_ = vq(student_scores2, centroids2)
print(centroids2, '\n\n', result2)

[[6.58474976 5.92039083]
 [4.37894029 3.87460133]
 [4.22099344 6.1218701 ]
 [4.79676028 4.9594897 ]] 

 [0 0 3 3 1 3 2 1 1 3 0 0 0 2 1 3 3 1 3 2]
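The `kmeans` followed by `vq` pattern used above can also be written as a single call to `kmeans2`, which returns both the centroids and the label of each observation. The sketch below uses the first ten score pairs and assumes a SciPy version that supports `minit="++"`. Because the initial centroids are chosen randomly, the labels can differ between runs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, whiten

# First ten score pairs from the two-subject example
scores = np.array([[80, 72], [78, 56], [55, 64], [60, 61], [62, 45],
                   [59, 71], [48, 85], [49, 45], [51, 55], [54, 62]],
                  dtype=float)

scores_w = whiten(scores)

# kmeans2 returns (centroids, labels) in one step
centroids_k2, labels_k2 = kmeans2(scores_w, 2, minit="++")

print(centroids_k2.shape)  # one row per cluster, one column per subject
print(labels_k2)           # cluster index for each student
```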


#### SciPy constants
The `scipy.constants` subpackage provides many useful physical and mathematical constants and units, which make it easy to compute solutions to scientific problems.

In [46]:
import scipy.constants

print("The PI is %.16f" %scipy.constants.pi)
print("The speed of light is c = %.1F" %scipy.constants.c)
print("The standard acceleration of gravity is g = %.1F" %scipy.constants.g)

The PI is 3.1415926535897931
The speed of light is c = 299792458.0
The standard acceleration of gravity is g = 9.8
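
Lowercase `scipy.constants.g` is the standard acceleration of gravity (in m/s²), whereas Newton's gravitational constant is the uppercase `scipy.constants.G`. A short sketch that shows the two side by side:

```python
import scipy.constants as const

# Newtonian constant of gravitation, in m^3 kg^-1 s^-2
print(const.G)

# Standard acceleration of gravity, in m s^-2
print(const.g)

# The physical_constants dictionary also gives the unit and uncertainty
print(const.physical_constants["Newtonian constant of gravitation"])
```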


In [53]:
scipy.constants.find("gram")  # find every constant with the word "gram" in it

['atomic mass unit-kilogram relationship',
 'electron volt-kilogram relationship',
 'hartree-kilogram relationship',
 'hertz-kilogram relationship',
 'inverse meter-kilogram relationship',
 'joule-kilogram relationship',
 'kelvin-kilogram relationship',
 'kilogram-atomic mass unit relationship',
 'kilogram-electron volt relationship',
 'kilogram-hartree relationship',
 'kilogram-hertz relationship',
 'kilogram-inverse meter relationship',
 'kilogram-joule relationship',
 'kilogram-kelvin relationship']

In [55]:
print(scipy.constants.physical_constants["atomic mass unit-kilogram relationship"])

(1.66053904e-27, 'kg', 2e-35)


In [45]:
import numpy as np
E = 55*np.power(scipy.constants.c, 2)  # E = mc^2 for a mass m = 55 kg, in joules
print(E)

4.943153483052497e+18


#### SciPy stats
The `scipy.stats` subpackage contains a large number of statistical functions.

**Basic Statistics**
- **describe()** Computes several descriptive statistics of the passed array.
- **gmean()** Computes the geometric mean along the specified axis.
- **hmean()** Calculates the harmonic mean along the specified axis.
- **kurtosis()** Computes the kurtosis.
- **mode()** Returns the modal value.
- **skew()** Computes the sample skewness of the data.
- **f_oneway()** Performs a one-way ANOVA.
- **iqr()** Computes the interquartile range of the data along the specified axis.
- **zscore()** Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.
- **sem()** Calculates the standard error of the mean (or standard error of measurement) of the values in the input array.
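
A few of the functions listed above can be tried on a small array. The sketch below uses a short sample of eight values; the variable names are only for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([19, 18, 21, 16, 15, 17, 20, 18])

g = stats.gmean(x)   # geometric mean
h = stats.hmean(x)   # harmonic mean
q = stats.iqr(x)     # interquartile range (75th minus 25th percentile)
s = stats.sem(x)     # standard error of the mean, uses ddof=1 by default

print(g, h, q, s)
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.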

In [47]:
import scipy.stats as stats
x = np.array([19,18,21,16,15,17,20,18])
# find min, max, mean, and variance
print(stats.tmin(x), stats.tmax(x), stats.tmean(x), stats.tvar(x))

15 21 18.0 4.0


In [76]:
print(stats.describe(x))

DescribeResult(nobs=8, minmax=(15, 21), mean=18.0, variance=4.0, skewness=0.0, kurtosis=-1.0)


##### Z-score

A Z-score measures how many standard deviations a raw score lies below or above
the mean. It is also known as a standard score and can
be placed on a normal distribution curve. The Z-score is defined as follows:  
z<sub>i</sub> = (x<sub>i</sub> − x&#772;) / S  
where x<sub>i</sub> is a test score, x&#772; is the sample mean and S is the
sample standard deviation.

In [51]:
student_scores = [[80], [78], [55], [60], [62],
                  [59], [48], [49], [51], [54],
                  [72], [95], [78], [54], [50],
                  [61], [57], [56], [65], [53]];
print(stats.tmean(student_scores))  # mean
print(stats.tstd(student_scores))   # standard deviation
print((62-stats.tmean(student_scores))/stats.tstd(student_scores))  # calculate z-score only for score=62

61.85
12.558389940383952
0.011944206280587316


In [52]:
ans = (student_scores - stats.tmean(student_scores)) / stats.tstd(student_scores)  # calculate z-score for EVERY student
print(ans)

[[ 1.44524896]
 [ 1.28599288]
 [-0.54545209]
 [-0.14731188]
 [ 0.01194421]
 [-0.22693992]
 [-1.10284838]
 [-1.02322034]
 [-0.86396425]
 [-0.62508013]
 [ 0.80822462]
 [ 2.63966959]
 [ 1.28599288]
 [-0.62508013]
 [-0.9435923 ]
 [-0.06768384]
 [-0.386196  ]
 [-0.46582404]
 [ 0.25082833]
 [-0.70470817]]
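
The manual calculation can be cross-checked with `stats.zscore`. One caveat: `stats.zscore` uses the population standard deviation (`ddof=0`) by default, while `stats.tstd` uses the sample standard deviation, so `ddof=1` has to be passed to reproduce the values above. A minimal sketch using the same 20 scores as a flat array:

```python
import numpy as np
from scipy import stats

scores = np.array([80, 78, 55, 60, 62, 59, 48, 49, 51, 54,
                   72, 95, 78, 54, 50, 61, 57, 56, 65, 53])

# Manual z-scores with the sample standard deviation (ddof=1)
manual = (scores - scores.mean()) / scores.std(ddof=1)

# Same result from SciPy when ddof=1 is given explicitly
z_sample = stats.zscore(scores, ddof=1)

# Default behaviour (ddof=0) gives slightly larger magnitudes
z_population = stats.zscore(scores)

print(np.allclose(manual, z_sample))
print(z_sample[4], z_population[4])
```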
