# Python SciPy

## 1. Intro
SciPy stands for Scientific Python.

Built on top of NumPy, it provides additional utility functions for optimization, statistics, and signal processing.

### Setup

In [2]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 

import scipy
from scipy import constants

sns.set(style="darkgrid")

### Example
How many cubic meters are in one liter:

In [3]:
print(constants.liter)

0.001


> constants: SciPy offers a set of scientific constants and units. One of them is liter, which gives the size of 1 liter in cubic meters.

### Checking SciPy Version
The version string is stored under the `__version__` attribute.

In [4]:
print(scipy.__version__)

1.5.2


## 2. Constants
As SciPy is more focused on scientific implementations, it provides many built-in scientific constants.

These constants can be helpful when you are working with Data Science.

> PI is an example of a scientific constant.

In [5]:
print(constants.pi)

3.141592653589793


### Constant Units
A list of all units under the constants module can be seen using the dir() function.

In [6]:
print(dir(constants))
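
dir() also lists Python's dunder attributes, such as __name__, alongside the units. The sketch below keeps only the public names and relies on nothing beyond standard dir() behaviour. The count it prints depends on your SciPy version:

```python
from scipy import constants

# dir() returns every attribute name, including dunders such as __name__;
# dropping names that start with "_" leaves the public constants and helpers.
public_names = [name for name in dir(constants) if not name.startswith("_")]

print(len(public_names))
print(public_names[:10])
```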



### Unit Categories
The units are placed under these categories:
* Metric (SI) prefixes
* Binary prefixes
* Mass
* Angle
* Time
* Length
* Pressure
* Area
* Volume
* Speed
* Temperature
* Energy
* Power
* Force

### Metric (SI) Prefixes:
Return the specified prefix as a plain multiplier (e.g. <span style="color:red">centi</span> returns <span style="color:red">0.01</span>)

In [7]:
print(constants.yotta)    #1e+24
print(constants.zetta)    #1e+21
print(constants.exa)      #1e+18
print(constants.peta)     #1000000000000000.0
print(constants.tera)     #1000000000000.0
print(constants.giga)     #1000000000.0
print(constants.mega)     #1000000.0
print(constants.kilo)     #1000.0
print(constants.hecto)    #100.0
print(constants.deka)     #10.0
print(constants.deci)     #0.1
print(constants.centi)    #0.01
print(constants.milli)    #0.001
print(constants.micro)    #1e-06
print(constants.nano)     #1e-09
print(constants.pico)     #1e-12
print(constants.femto)    #1e-15
print(constants.atto)     #1e-18
print(constants.zepto)    #1e-21

1e+24
1e+21
1e+18
1000000000000000.0
1000000000000.0
1000000000.0
1000000.0
1000.0
100.0
10.0
0.1
0.01
0.001
1e-06
1e-09
1e-12
1e-15
1e-18
1e-21
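
The prefixes are plain multipliers, so scaling a quantity takes a single multiplication or division. Here is a small sketch. The quantities (3 km, 250 ms, 550 nm) are illustrative values chosen for the example:

```python
from scipy import constants

# SI prefixes are plain numbers, so converting means multiplying or dividing.
distance_m = 3 * constants.kilo            # 3 km expressed in meters
duration_s = 250 * constants.milli         # 250 ms expressed in seconds
wavelength_nm = 5.5e-7 / constants.nano    # 5.5e-7 m expressed in nanometers

print(distance_m, duration_s, wavelength_nm)
```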


### Binary Prefixes:
Return the specified binary prefix as a multiplier, i.e. a power of 2 (e.g. <span style="color:red">kibi</span> returns <span style="color:red">1024</span>, which is 2^10)

In [8]:
print(constants.kibi)    #1024
print(constants.mebi)    #1048576
print(constants.gibi)    #1073741824
print(constants.tebi)    #1099511627776
print(constants.pebi)    #1125899906842624
print(constants.exbi)    #1152921504606846976
print(constants.zebi)    #1180591620717411303424
print(constants.yobi)    #1208925819614629174706176

1024
1048576
1073741824
1099511627776
1125899906842624
1152921504606846976
1180591620717411303424
1208925819614629174706176
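
Binary prefixes matter when decimal and binary sizes get mixed up. The sketch below assumes a drive sold as "4 GB" uses the decimal giga prefix. It shows how much smaller that size looks when reported in binary gibibytes:

```python
from scipy import constants

size_bytes = 4 * constants.giga          # "4 GB" as marketed (decimal prefix)
size_gib = size_bytes / constants.gibi   # the same size in binary gibibytes

print(f"{size_gib:.2f} GiB")             # 3.73 GiB
```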


### Mass:
Return the specified unit in __kg__ (e.g. <span style="color:red">gram</span> returns <span style="color:red">0.001</span>)

In [9]:
print(constants.gram)        #0.001
print(constants.metric_ton)  #1000.0
print(constants.grain)       #6.479891e-05
print(constants.lb)          #0.45359236999999997
print(constants.pound)       #0.45359236999999997
print(constants.oz)          #0.028349523124999998
print(constants.ounce)       #0.028349523124999998
print(constants.stone)       #6.3502931799999995
print(constants.long_ton)    #1016.0469088
print(constants.short_ton)   #907.1847399999999
print(constants.troy_ounce)  #0.031103476799999998
print(constants.troy_pound)  #0.37324172159999996
print(constants.carat)       #0.0002
print(constants.atomic_mass) #1.6605390666e-27
print(constants.m_u)         #1.6605390666e-27
print(constants.u)           #1.6605390666e-27

0.001
1000.0
6.479891e-05
0.45359236999999997
0.45359236999999997
0.028349523124999998
0.028349523124999998
6.3502931799999995
1016.0469088
907.1847399999999
0.031103476799999998
0.37324172159999996
0.0002
1.6605390666e-27
1.6605390666e-27
1.6605390666e-27


### Angle:
Return the specified unit in __radians__ (e.g. <span style="color:red">degree</span> returns <span style="color:red">0.017453292519943295</span>)

In [10]:
print(constants.degree)     #0.017453292519943295
print(constants.arcmin)     #0.0002908882086657216
print(constants.arcminute)  #0.0002908882086657216
print(constants.arcsec)     #4.84813681109536e-06
print(constants.arcsecond)  #4.84813681109536e-06

0.017453292519943295
0.0002908882086657216
0.0002908882086657216
4.84813681109536e-06
4.84813681109536e-06
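
NumPy's trigonometric functions expect radians, so degree is handy for converting an angle before calling them. Here is a minimal sketch with an illustrative 30 degree angle:

```python
import numpy as np
from scipy import constants

angle_deg = 30.0
angle_rad = angle_deg * constants.degree   # degrees -> radians

# np.sin expects radians; the result is 0.5 up to floating-point rounding.
print(np.sin(angle_rad))

# Converting back is a division by the same constant.
print(angle_rad / constants.degree)
```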


### Time:
Return the specified unit in __seconds__ (e.g. <span style="color:red">hour</span> returns <span style="color:red">3600.0</span>)

In [11]:
print(constants.minute)      #60.0
print(constants.hour)        #3600.0
print(constants.day)         #86400.0
print(constants.week)        #604800.0
print(constants.year)        #31536000.0
print(constants.Julian_year) #31557600.0

60.0
3600.0
86400.0
604800.0
31536000.0
31557600.0


### Length:
Return the specified unit in __meters__ (e.g. <span style="color:red">nautical_mile</span> returns <span style="color:red">1852.0</span>)

In [12]:
print(constants.inch)              #0.0254
print(constants.foot)              #0.30479999999999996
print(constants.yard)              #0.9143999999999999
print(constants.mile)              #1609.3439999999998
print(constants.mil)               #2.5399999999999997e-05
print(constants.pt)                #0.00035277777777777776
print(constants.point)             #0.00035277777777777776
print(constants.survey_foot)       #0.3048006096012192
print(constants.survey_mile)       #1609.3472186944373
print(constants.nautical_mile)     #1852.0
print(constants.fermi)             #1e-15
print(constants.angstrom)          #1e-10
print(constants.micron)            #1e-06
print(constants.au)                #149597870700.0
print(constants.astronomical_unit) #149597870700.0
print(constants.light_year)        #9460730472580800.0
print(constants.parsec)            #3.085677581491367e+16

0.0254
0.30479999999999996
0.9143999999999999
1609.3439999999998
2.5399999999999997e-05
0.00035277777777777776
0.00035277777777777776
0.3048006096012192
1609.3472186944373
1852.0
1e-15
1e-10
1e-06
149597870700.0
149597870700.0
9460730472580800.0
3.085677581491367e+16
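
Every unit here is stored in its SI base unit. That means one pattern converts between any two units of the same kind: multiply by the source unit, then divide by the target unit. The convert() helper below is hypothetical and written for illustration. It is only valid for units without an offset, so it does not work for temperatures:

```python
from scipy import constants

def convert(value, from_unit, to_unit):
    """Hypothetical helper: convert between two units of the same kind.

    Multiplying by from_unit gives SI base units (meters for lengths);
    dividing by to_unit expresses that in the target unit.
    """
    return value * from_unit / to_unit

print(convert(10, constants.mile, constants.kilo))          # 10 miles in km
print(convert(1, constants.nautical_mile, constants.mile))  # 1 nmi in miles
```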


### Pressure:
Return the specified unit in __pascals__ (e.g. <span style="color:red">psi</span> returns <span style="color:red">6894.757293168361</span>)

In [13]:
print(constants.atm)         #101325.0
print(constants.atmosphere)  #101325.0
print(constants.bar)         #100000.0
print(constants.torr)        #133.32236842105263
print(constants.mmHg)        #133.32236842105263
print(constants.psi)         #6894.757293168361

101325.0
101325.0
100000.0
133.32236842105263
133.32236842105263
6894.757293168361


### Area:
Return the specified unit in square __meters__ (e.g. <span style="color:red">hectare</span> returns <span style="color:red">10000.0</span>)

In [14]:
print(constants.hectare) #10000.0
print(constants.acre)    #4046.8564223999992

10000.0
4046.8564223999992


### Volume:
Return the specified unit in __cubic meters__ (e.g. <span style="color:red">liter</span> returns <span style="color:red">0.001</span>)

In [15]:
print(constants.liter)            #0.001
print(constants.litre)            #0.001
print(constants.gallon)           #0.0037854117839999997
print(constants.gallon_US)        #0.0037854117839999997
print(constants.gallon_imp)       #0.00454609
print(constants.fluid_ounce)      #2.9573529562499998e-05
print(constants.fluid_ounce_US)   #2.9573529562499998e-05
print(constants.fluid_ounce_imp)  #2.84130625e-05
print(constants.barrel)           #0.15898729492799998
print(constants.bbl)              #0.15898729492799998

0.001
0.001
0.0037854117839999997
0.0037854117839999997
0.00454609
2.9573529562499998e-05
2.9573529562499998e-05
2.84130625e-05
0.15898729492799998
0.15898729492799998


### Speed:
Return the specified unit in __meters per second__ (e.g. <span style="color:red">speed_of_sound</span> returns <span style="color:red">340.5</span>)

In [16]:
print(constants.kmh)            #0.2777777777777778
print(constants.mph)            #0.44703999999999994
print(constants.mach)           #340.5
print(constants.speed_of_sound) #340.5
print(constants.knot)           #0.5144444444444445

0.2777777777777778
0.44703999999999994
340.5
340.5
0.5144444444444445


### Temperature:
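Return the specified unit in __Kelvin__ (e.g. <span style="color:red">zero_Celsius</span> returns <span style="color:red">273.15</span>). Note that <span style="color:red">degree_Fahrenheit</span> is the size of one Fahrenheit degree (5/9 K), so on its own it converts temperature differences, not absolute readings.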

In [17]:
print(constants.zero_Celsius)      #273.15
print(constants.degree_Fahrenheit) #0.5555555555555556

273.15
0.5555555555555556
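
Converting an absolute Fahrenheit reading therefore also needs the offsets. The sketch below does this by hand, with a hypothetical helper name. It then cross-checks the result against scipy.constants.convert_temperature(), which handles the offsets itself:

```python
from scipy import constants
from scipy.constants import convert_temperature

def fahrenheit_to_kelvin(temp_f):
    # Hypothetical helper: remove the 32 F offset, scale the degree size
    # to kelvins (degree_Fahrenheit = 5/9), then shift by 273.15 K.
    return (temp_f - 32) * constants.degree_Fahrenheit + constants.zero_Celsius

boiling_k = fahrenheit_to_kelvin(212.0)
print(boiling_k)  # approximately 373.15

# SciPy's helper takes the value, the old scale and the new scale.
print(convert_temperature(212.0, "Fahrenheit", "Celsius"))
```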


### Energy:
Return the specified unit in __joules__ (e.g. <span style="color:red">calorie</span> returns <span style="color:red">4.184</span>)

In [18]:
print(constants.eV)            #1.602176634e-19
print(constants.electron_volt) #1.602176634e-19
print(constants.calorie)       #4.184
print(constants.calorie_th)    #4.184
print(constants.calorie_IT)    #4.1868
print(constants.erg)           #1e-07
print(constants.Btu)           #1055.05585262
print(constants.Btu_IT)        #1055.05585262
print(constants.Btu_th)        #1054.3502644888888
print(constants.ton_TNT)       #4184000000.0

1.602176634e-19
1.602176634e-19
4.184
4.184
4.1868
1e-07
1055.05585262
1055.05585262
1054.3502644888888
4184000000.0


### Power:
Return the specified unit in __watts__ (e.g. <span style="color:red">horsepower</span> returns <span style="color:red">745.6998715822701</span>)

In [19]:
print(constants.hp)         #745.6998715822701
print(constants.horsepower) #745.6998715822701

745.6998715822701
745.6998715822701


### Force:
Return the specified unit in __newton__ (e.g. <span style="color:red">kilogram_force</span> returns <span style="color:red">9.80665</span>)

In [20]:
print(constants.dyn)             #1e-05
print(constants.dyne)            #1e-05
print(constants.lbf)             #4.4482216152605
print(constants.pound_force)     #4.4482216152605
print(constants.kgf)             #9.80665
print(constants.kilogram_force)  #9.80665

1e-05
1e-05
4.4482216152605
4.4482216152605
9.80665
9.80665


## 3. Optimizers
### Optimizers in SciPy
Optimizers are a set of procedures defined in SciPy that either find the minimum value of a function, or the root of an equation.

### Optimizing Functions
Many machine learning algorithms boil down to minimizing a function (for example, a loss function) over the given data.

### Roots of an Equation
NumPy is capable of finding roots for polynomials and linear equations, but it cannot find roots for nonlinear equations like this one:

<span style="color:red">x + cos(x)</span>

For that you can use SciPy's <span style="color:red">optimize.root</span> function.

This function takes two required arguments:

**_fun_** - a function representing an equation.

**_x0_** - an initial guess for the root.

The function returns an object with information regarding the solution.

The actual solution is given under attribute <span style="color:red">x</span> of the returned object:

**Example**: Find root of the equation <span style="color:red">x + cos(x)</span>:

In [21]:
from scipy.optimize import root
import numpy as np

def eqn(x):
    # root() passes x in as a NumPy array, so use NumPy's vectorized cosine
    return x + np.cos(x)

myroot = root(eqn, 0)

print(myroot.x)

[-0.73908513]


> Note: The returned object has much more information about the solution.

**Example**: Print all information about the solution (not just <span style="color:red">x</span> which is the root)

In [22]:
print(myroot)

    fjac: array([[-1.]])
     fun: array([0.])
 message: 'The solution converged.'
    nfev: 9
     qtf: array([-2.66786593e-13])
       r: array([-1.67361202])
  status: 1
 success: True
       x: array([-0.73908513])
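
Before using myroot.x, it is worth checking the success flag and plugging the root back into the equation. Here is a minimal sketch. It uses NumPy's cos so that the function accepts the array that root() passes in:

```python
import numpy as np
from scipy.optimize import root

def eqn(x):
    return x + np.cos(x)

sol = root(eqn, 0)

# Check the solver's own verdict first...
if sol.success:
    # ...then confirm the root: the residual should be (almost) zero.
    residual = eqn(sol.x)
    print(sol.x[0], residual[0])
else:
    print("root() did not converge:", sol.message)
```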


### Minimizing a Function
A function, in this context, represents a curve, and curves have high points and low points.

High points are called _maxima_.

Low points are called _minima_.

The highest point on the whole curve is called the _global maximum_, whereas the other high points are called _local maxima_.

The lowest point on the whole curve is called the _global minimum_, whereas the other low points are called _local minima_.

### Finding Minima
We can use the <span style="color:red">scipy.optimize.minimize()</span> function to find the minimum of a function.

The <span style="color:red">minimize()</span> function takes the following arguments:

**_fun_** - a function representing an equation.

**_x0_** - an initial guess for the location of the minimum.

**_method_** - name of the method to use. Some accepted values (SciPy supports others too, such as 'Nelder-Mead' and 'Powell'):
* <span style="color:red">'CG'</span>
* <span style="color:red">'BFGS'</span>
* <span style="color:red">'Newton-CG'</span>
* <span style="color:red">'L-BFGS-B'</span>
* <span style="color:red">'TNC'</span>
* <span style="color:red">'COBYLA'</span>
* <span style="color:red">'SLSQP'</span>

**_callback_** - function called after each iteration of optimization.

**_options_** - a dictionary defining extra params:
<span style="color:red">
  {
     "disp": boolean - print detailed description,
     "gtol": number - stop when the gradient norm falls below this tolerance (used by e.g. BFGS)
  }
</span>
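
Here is a sketch of callback and options used together. The history list and the record() function are hypothetical names. For BFGS, minimize() calls the callback with the current parameter vector after each iteration. The starting point 3.0 and the gtol value are illustrative:

```python
from scipy.optimize import minimize

def eqn(x):
    return x[0]**2 + x[0] + 2

history = []  # hypothetical: parameter vector recorded after each iteration

def record(xk):
    # Store a copy so later in-place updates cannot change the recorded values.
    history.append(xk.copy())

res = minimize(eqn, x0=[3.0], method="BFGS", callback=record,
               options={"disp": False, "gtol": 1e-6})

print(res.x, len(history))
```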

**Example**: Minimize the function <span style="color:red">x^2 + x + 2</span> with <span style="color:red">BFGS</span>:

In [23]:
from scipy.optimize import minimize

def eqn(x):
  return x**2 + x + 2

mymin = minimize(eqn, 0, method='BFGS')

print(mymin)

      fun: 1.75
 hess_inv: array([[0.50000001]])
      jac: array([0.])
  message: 'Optimization terminated successfully.'
     nfev: 8
      nit: 2
     njev: 4
   status: 0
  success: True
        x: array([-0.50000001])
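
This connects back to local and global minima. minimize() only searches downhill from x0, so on a curve with several low points the answer depends on the starting guess. The sketch uses an illustrative quartic, x^4 - 3x^2 + x, which has a local minimum near x = 1.13 and its global minimum near x = -1.30. The multi-start loop at the end is one simple workaround, not a SciPy feature:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    # Two low points: a local minimum near 1.13, the global minimum near -1.30.
    return x[0]**4 - 3 * x[0]**2 + x[0]

right = minimize(f, x0=[2.0], method="BFGS")
left = minimize(f, x0=[-2.0], method="BFGS")

print(right.x, right.fun)  # settles in the local minimum
print(left.x, left.fun)    # reaches the global minimum

# Workaround: start from several points and keep the lowest result.
starts = np.linspace(-3, 3, 7)
best = min((minimize(f, x0=[s], method="BFGS") for s in starts),
           key=lambda r: r.fun)
print(best.x)
```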


## 4. Statistical Significance Tests
In statistics, a result is statistically significant when it is unlikely to have been produced randomly, or by chance, alone.

SciPy provides us with a module called <span style="color:red">scipy.stats</span>, which has functions for performing statistical significance tests.

Here are some techniques and keywords that are important when performing such tests:

### Hypothesis in Statistics
A hypothesis is an assumption about a parameter of a population.

### Null Hypothesis
It assumes that the observation is not statistically significant.

### Alternate Hypothesis
It assumes that the observations are due to some real effect rather than chance.

It is the alternative to the null hypothesis.

**Example**:

For an assessment of a student we would take:

_"student is no better than average"_ - as a null hypothesis, and:

_"student is better than average"_ - as an alternate hypothesis.

### One tailed test
When our hypothesis is testing for one side of the value only, it is called "one tailed test".

**Example**:

For the null hypothesis:

_"the mean is equal to k"_, we can have alternate hypothesis:

_"the mean is less than k"_, or:

_"the mean is greater than k"_

### Two tailed test
When our hypothesis is testing for both sides of the value.

**Example**:

For the null hypothesis:

_"the mean is equal to k"_, we can have alternate hypothesis:

_"the mean is not equal to k"_

In this case the mean could be less than or greater than k, so both sides have to be checked.
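
For a test statistic with a symmetric distribution, the two-tailed p value is twice the one-tailed one. The worked sketch below assumes a standard normal test statistic and an illustrative observed value of z = 1.96:

```python
from scipy.stats import norm

z = 1.96  # illustrative observed z-statistic

p_one_tailed = norm.sf(z)            # P(Z >= z): only the upper tail
p_two_tailed = 2 * norm.sf(abs(z))   # P(|Z| >= |z|): both tails

print(round(p_one_tailed, 4), round(p_two_tailed, 4))  # 0.025 0.05
```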

### Alpha value
Alpha value is the level of significance.

It sets how close to the extremes the data must be for the null hypothesis to be rejected.

It is usually taken as 0.01, 0.05, or 0.1.

### P value
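P value tells how close to extreme the data actually is. It is the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed.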

P value and alpha values are compared to establish the statistical significance.

If p value <= alpha, we reject the null hypothesis and say that the data is statistically significant. Otherwise, we fail to reject the null hypothesis, which is not the same as proving it true.
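
The rule above fits in a few lines. decide() is a hypothetical helper written for illustration. It is applied here to the p values printed by the t-test and KS-test examples in this notebook:

```python
def decide(p_value, alpha=0.05):
    # Hypothetical helper implementing "reject H0 when p <= alpha".
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "fail to reject the null hypothesis"

print(decide(0.01971706144089413))              # t-test p value, alpha = 0.05
print(decide(0.19066324234438317))              # KS-test p value, alpha = 0.05
print(decide(0.01971706144089413, alpha=0.01))  # stricter alpha
```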

### T-Test
T-tests are used to determine whether there is a significant difference between the means of two variables, which tells us whether they could belong to the same distribution.

By default it is a two tailed test.

The function <span style="color:red">ttest_ind()</span> takes two independent samples (they do not need to be the same size) and returns the t-statistic and the p-value.

#### **Example**
Find if the given values v1 and v2 are from same distribution:

In [24]:
from scipy.stats import ttest_ind

v1 = np.random.normal(size=100)
v2 = np.random.normal(size=100)

res = ttest_ind(v1, v2)

print(res)

Ttest_indResult(statistic=-2.350787321636274, pvalue=0.01971706144089413)


If you want to return only the p-value, use the pvalue property:

In [25]:
res = ttest_ind(v1, v2).pvalue

print(res)

0.01971706144089413
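
To see what ttest_ind() computes, the sketch below reproduces its default result by hand: a pooled-variance, two tailed test. The seed, sample sizes and means are illustrative assumptions, and the samples deliberately have different sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)   # seeded so the run is reproducible
a = rng.normal(loc=0.0, size=30)
b = rng.normal(loc=0.5, size=40)  # different size on purpose

# Pooled (equal-variance) Student's t-test, ttest_ind's default.
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)  # two tailed

res = stats.ttest_ind(a, b)
print(t_manual, res.statistic)
print(p_manual, res.pvalue)
```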


### KS-Test
KS test is used to check if given values follow a distribution.

The <span style="color:red">kstest()</span> function takes the values to be tested and the CDF as its two parameters.

> A CDF can be either a string or a callable function that returns the probability.

It can be used as a one tailed or two tailed test.

By default it is two tailed. To change this, pass the <span style="color:red">alternative</span> parameter as one of the strings 'two-sided', 'less', or 'greater'.

#### **Example**
Find if the given value follows the normal distribution:

In [26]:
from scipy.stats import kstest

v = np.random.normal(size=100)

res = kstest(v, 'norm')

print(res)

KstestResult(statistic=0.10671365977240443, pvalue=0.19066324234438317)
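
The sketch below shows both ways to pass the CDF, a clearly non-normal sample for contrast, and the alternative parameter. The seed and sample size are illustrative:

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(0)
normal_sample = rng.normal(size=200)
uniform_sample = rng.uniform(size=200)  # clearly not standard normal

# The CDF by name or as a callable: both describe N(0, 1) here.
print(kstest(normal_sample, "norm"))
print(kstest(normal_sample, norm.cdf))

# Data from a different distribution should get a tiny p value.
print(kstest(uniform_sample, "norm").pvalue)

# One tailed variant via the alternative parameter.
print(kstest(normal_sample, "norm", alternative="less"))
```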


### Statistical Description of Data
In order to see a summary of values in an array, we can use the describe() function.

It returns the following description:

1. number of observations (nobs)
2. minimum and maximum values (minmax)
3. mean
4. variance
5. skewness
6. kurtosis

**Example**
Show statistical description of the values in an array:

In [27]:
import numpy as np
from scipy.stats import describe

v = np.random.normal(size=100)
res = describe(v)

print(res)

DescribeResult(nobs=100, minmax=(-2.4675108054258765, 2.1458526753082765), mean=0.08832054819032525, variance=1.1377376508233006, skewness=-0.14614255882526833, kurtosis=-0.6105290669830414)
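
DescribeResult is a named tuple, so each field can be read by name. One detail is worth knowing: describe() reports the sample variance (ddof=1 by default), while NumPy's np.var defaults to ddof=0. The sketch below uses a seeded generator so the run is reproducible:

```python
import numpy as np
from scipy.stats import describe

rng = np.random.default_rng(1)
v = rng.normal(size=100)
res = describe(v)

# Named-tuple fields can be accessed directly.
print(res.nobs, res.minmax, res.mean)

# Sample variance (ddof=1) vs NumPy's default population variance (ddof=0).
print(res.variance, np.var(v, ddof=1), np.var(v))
```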


### Normality Tests (Skewness and Kurtosis)
Normality tests are based on the skewness and kurtosis.

The <span style="color:red">normaltest()</span> function returns a test statistic and a p value for the null hypothesis:

_"x comes from a normal distribution"_.

---

#### Skewness:
A measure of symmetry in data.

For normal distributions it is 0.

If it is negative, it means the data is skewed left.

If it is positive it means the data is skewed right.

---

#### Kurtosis:
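A measure of whether the data is heavy- or light-tailed compared to a normal distribution. SciPy's <span style="color:red">kurtosis()</span> uses Fisher's definition by default, so a normal distribution scores 0.

Positive kurtosis means heavy tailed.

Negative kurtosis means light tailed.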

**Example**
Find skewness and kurtosis of values in an array:

In [28]:
from scipy.stats import skew, kurtosis

v = np.random.normal(size=100)

print(skew(v))
print(kurtosis(v))

0.12551759973252366
-0.2647848137797779
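
The sketch below recomputes both values from central moments, assuming SciPy's defaults. Both functions default to bias=True, and kurtosis uses Fisher's definition, which subtracts 3 so a normal distribution scores 0. The seed is illustrative:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)
v = rng.normal(size=100)

d = v - v.mean()
m2 = np.mean(d**2)  # second central moment (population variance)
m3 = np.mean(d**3)
m4 = np.mean(d**4)

skew_manual = m3 / m2**1.5
kurt_manual = m4 / m2**2 - 3.0  # Fisher's (excess) kurtosis

print(skew_manual, skew(v))
print(kurt_manual, kurtosis(v))
```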


**Example**
Find if the data comes from a normal distribution:

In [29]:
from scipy.stats import normaltest

v = np.random.normal(size=100)

print(normaltest(v))

NormaltestResult(statistic=4.763378806906758, pvalue=0.09239435424430387)
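
To put the p value to use, the sketch below compares a normal sample against a strongly skewed exponential one at alpha = 0.05. The seed and sample sizes are illustrative assumptions:

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(3)
samples = {
    "normal": rng.normal(size=500),
    "exponential": rng.exponential(size=500),  # strongly right-skewed
}

alpha = 0.05
for name, data in samples.items():
    stat, p = normaltest(data)
    verdict = "not normal" if p <= alpha else "no evidence against normality"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```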
