# Machine Learning and Statistics

## Task 1 

### <center>Calculation Method</center>

The method used to calculate the square root of two is **Newton's square root method**.

#### Newtonian Optimization

$$0 = f(x_0) + f'(x_0)(x_1 -x_0)$$

$$x_1 - x_0 = - \frac{f(x_0)}{f'(x_0)}$$ 

$$x_1 = x_0 -\frac{f(x_0)}{f'(x_0)} $$

"*Newtonian optimization is one of the basic ideas in optimization where function to be optimized is evaluated at a random point. Afterwards, this point is shifted in the negative direction of gradient until convergence.*"[[1]](https://medium.com/@sddkal/newton-square-root-method-in-python-270853e9185d)

$$a = x^2$$

For

$$f(x) = x^2 - a$$

$$f'(x) = 2x$$

$$\frac{f(x)}{f'(x)} = \frac{x^2 -a}{2x} = \frac{x -\frac{a}{x}}{2}$$

Since,

$$x_{n+1} - x_n = -\frac{f(x_n)}{f'(x_n)}$$

$$x_{n+1}   = x_n -\frac{x_n - \frac{a}{x_n}}{2}$$

$$x_{n+1} = \frac{x_n + \frac{a}{x_n}}{2}$$

Newton's method is a classic algorithm for computing square roots $x = \sqrt{a}$ for $a > 0$, i.e. for solving $x^2 = a$. The algorithm starts with some guess $x_1 > 0$ and computes the sequence of improved guesses [[2]](https://math.mit.edu/~stevenj/18.335/newton-sqrt.pdf):

$$x_{n+1} = \frac{1}{2}\left(x_{n} + \frac{a}{x_{n}}\right)$$
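As a quick worked check of this update rule, take $a = 2$ and an assumed starting guess $x_1 = 2$ (any positive guess would do):

$$x_2 = \frac{1}{2}\left(2 + \frac{2}{2}\right) = \frac{3}{2} = 1.5$$

$$x_3 = \frac{1}{2}\left(\frac{3}{2} + \frac{4}{3}\right) = \frac{17}{12} \approx 1.416667$$

$$x_4 = \frac{1}{2}\left(\frac{17}{12} + \frac{24}{17}\right) = \frac{577}{408} \approx 1.414216$$

Three updates already agree with $\sqrt{2} \approx 1.414214$ to five decimal places.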

In [5]:

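def sqrt2(number_iters=500):
    """Print an approximation of the square root of 2 using Newton's method."""
    x = 2.0  # initial guess for the square root of 2
    for _ in range(number_iters):  # fixed number of Newton updates
        x = 0.5 * (x + 2 / x)  # x_{n+1} = (x_n + a / x_n) / 2 with a = 2
    print("{:.100f}".format(x))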
  

  

In [6]:
sqrt2()

1.4142135623730949234300169337075203657150268554687500000000000000000000000000000000000000000000000000
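A Python float holds only about 15 to 17 significant decimal digits, so the digits printed beyond that point describe the binary floating-point value rather than $\sqrt{2}$ itself. One possible way to obtain 100 correct decimal places, sketched here as an assumption rather than as the method used above, is to run the same Newton update on Python's arbitrary-precision integers and compute $\lfloor\sqrt{2 \cdot 10^{200}}\rfloor$. The helper names `isqrt_newton` and `sqrt2_digits` are illustrative.

```python
def isqrt_newton(n):
    """Return floor(sqrt(n)) for a non-negative integer n using Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    # Start from a power of two that is guaranteed to be >= sqrt(n).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        # Integer form of x_{n+1} = (x_n + a / x_n) / 2.
        y = (x + n // x) // 2
        if y >= x:  # the sequence has stopped decreasing: x is the floor root
            return x
        x = y


def sqrt2_digits(digits=100):
    """Return sqrt(2) truncated (not rounded) to `digits` decimal places as a string."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    scaled = isqrt_newton(2 * 10 ** (2 * digits))  # floor(sqrt(2) * 10**digits)
    text = str(scaled)
    return text[0] + "." + text[1:]


print(sqrt2_digits(100))
```

The result is truncated rather than rounded, because the integer square root discards the fractional part.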


## Task 2

### Chi-Squared Tests

The chi-square test is often used to assess the significance (if any) of the differences among k different groups. The null and alternative hypotheses of the test are generally written as:

H<sub>0</sub>: There is no significant difference between two or more groups.

H<sub>A</sub>: There exists at least one significant difference between two or more groups.

The chi-square test statistic, denoted $\chi^2$, is defined as follows: [[3]](https://aaronschlegel.me/chi-square-test-independence-contingency-tables.html)

$$\chi^2=\sum_{i=1}^r\sum_{j=1}^k\frac{(O_{ij} -E_{ij})^2}{E_{ij}}$$

Where $O_{ij}$ is the i-th observed frequency in the j-th group and $E_{ij}$ is the corresponding expected frequency. The expected frequency is typically denoted $E_{cr}$, where c is the column index and r is the row index. Stated more formally, the expected frequency is defined as:



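$$E_{cr}= \frac{\left(\sum_{i=0}^{n_r}r_i\right)\left(\sum_{i=0}^{n_c}c_i\right)}{n}$$

Where n is the total sample size and $n_r$, $n_c$ are the numbers of cells in the row and column, respectively. In other words, each expected frequency is the product of the cell's row total and column total, divided by n. The expected frequency is calculated for each 'cell' in the given array.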
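The definitions of the expected frequency and the test statistic can be checked by hand with NumPy. This is a minimal sketch, not the library's internal implementation: it builds the expected frequencies from the row and column totals and sums the squared deviations. The 3 x 4 table of counts is the same one passed to `chi2_contingency` in the next cell, and the variable names are illustrative.

```python
import numpy as np

# Observed counts: 3 rows by 4 columns.
obs = np.array([[90, 60, 104, 95],
                [30, 51, 51, 20],
                [30, 40, 45, 35]])

row_totals = obs.sum(axis=1)  # one total per row
col_totals = obs.sum(axis=0)  # one total per column
n = obs.sum()                 # total sample size

# Expected frequency of each cell: (row total * column total) / n.
expected = np.outer(row_totals, col_totals) / n

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi2_stat = ((obs - expected) ** 2 / expected).sum()

# Degrees of freedom for an r x k table: (r - 1) * (k - 1).
dof = (obs.shape[0] - 1) * (obs.shape[1] - 1)

print(chi2_stat, dof)
print(expected)
```

For a table of this size, the statistic and the expected table should agree with what `chi2_contingency` reports for the same counts.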

In [8]:

from scipy.stats import chi2_contingency
import numpy as np

# Observed counts: 3 rows by 4 columns.
obs = np.array([[90, 60, 104, 95], [30, 51, 51, 20], [30, 40, 45, 35]])

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom and the table of expected frequencies.
chi2_stat, p_val, dof, ex = chi2_contingency(obs)
print("===Chi2 Stat===")
print(chi2_stat)
print("\n")
print("===Degrees of Freedom===")
print(dof)
print("\n")
print("===P-Value===")
print(p_val)
print("\n")
print("===Expected Frequencies===")
print(ex)

===Chi2 Stat===
25.31053121097707


===Degrees of Freedom===
6


===P-Value===
0.0002990757107414016


===Expected Frequencies===
[[ 80.41474654  80.95084485 107.21966206  80.41474654]
 [ 35.02304147  35.25652842  46.69738863  35.02304147]
 [ 34.56221198  34.79262673  46.08294931  34.56221198]]
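The p-value is the upper-tail probability of a chi-square distribution with the given degrees of freedom. As a hedged cross-check (not how SciPy computes it), for an even number of degrees of freedom $2m$ this tail has the closed form $e^{-x/2}\sum_{k=0}^{m-1}\frac{(x/2)^k}{k!}$. The function name `chi2_sf_even_dof` and the 0.05 significance level are assumptions made for illustration.

```python
import math


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X > x) for a chi-square variable with even dof."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only applies to positive even dof")
    half = x / 2.0
    m = dof // 2
    # exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(m))


chi2_stat = 25.31053121097707  # statistic reported above
dof = 6                        # degrees of freedom reported above
alpha = 0.05                   # assumed significance level

p_val = chi2_sf_even_dof(chi2_stat, dof)
print(p_val)
print("reject H0" if p_val < alpha else "fail to reject H0")
```

With these inputs the p-value is about 0.0003, well below the assumed 0.05 level, so the null hypothesis of no difference between the groups would be rejected.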


## Task 3 

### Standard Deviation

Standard deviation gives you a handle on whether your data are close to the average or spread out over a wide range. For example, if a teacher wants to determine whether the grades in a class seem fair for all students, or whether there is a great disparity, the teacher can find the average of the grades in that class and then calculate the standard deviation. In general, a low standard deviation means that the data lie close to the average, so the average is a reliable summary, and a high standard deviation means that the data vary widely around the average, so it is less reliable [[4]](https://towardsdatascience.com/using-standard-deviation-in-python-77872c32ba9b).


#### Population Standard Deviation 

$$\sigma = \sqrt{\frac{\sum(X_i - \mu)^2}{N}}$$

<center>$\sigma$ = population standard deviation </center>
<center>$\sum$ = sum of </center>
<center>$X_i$ = each value in the population </center>
<center>$\mu$ = population mean</center>
<center>N = number of values in the population</center>

This is the equation that **NumPy** uses by default [[5]](https://towardsdatascience.com/why-computing-standard-deviation-in-pandas-and-numpy-yields-different-results-5b475e02d112).

#### Sample Standard Deviation

When data are collected, it is quite rare to work with a whole population. It is more likely that we will be working with a sample of the population, so it is usually better to use the sample standard deviation equation.

$$s = \sqrt{\frac{\sum(X_i - \bar{X})^2}{N - 1}}$$

<center>$s$ = sample standard deviation </center>
<center>$\sum$ = sum of </center>
<center>$X_i$ = each value in the sample </center>
<center>$\bar{X}$ = sample mean</center>
<center>N = number of values in the sample</center>

#### Difference between population and sample standard deviation

The difference is in the denominator of the equation. For the sample standard deviation, the sum of squared deviations is divided by N - 1 instead of by N, as it is for the population standard deviation.
The reason is that, when the spread of a population is estimated from a sample, dividing by N - 1 gives an unbiased estimator of the population variance (the square of the standard deviation). This is referred to as using one degree of freedom: 1 is subtracted from N to remove the bias. [[5]](https://towardsdatascience.com/why-computing-standard-deviation-in-pandas-and-numpy-yields-different-results-5b475e02d112)
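To make the role of the denominator concrete, the sketch below computes both quantities from their definitions: the square root of the summed squared deviations from the mean, divided by either N or N - 1. The function name `std_dev` and its `ddof` argument (named after NumPy's parameter) are illustrative. The data are the three weights used in the pandas example further down, and `statistics.pstdev` and `statistics.stdev` from the standard library serve as a cross-check.

```python
import math
import statistics


def std_dev(values, ddof=0):
    """Standard deviation with denominator N - ddof (0: population, 1: sample)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more values than ddof")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


weights = [67, 65, 89]

population_sd = std_dev(weights, ddof=0)  # divides by N
sample_sd = std_dev(weights, ddof=1)      # divides by N - 1

print(population_sd, statistics.pstdev(weights))
print(sample_sd, statistics.stdev(weights))
```

The sample value is always the larger of the two for the same data, and the gap shrinks as N grows.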


#### So is sample standard deviation better to use?

N - 1 should be used to get the unbiased estimator. This is usually appropriate, since we are mostly dealing with samples, not entire populations. This is why the default standard deviation in pandas is computed using one degree of freedom.
This may not always be the case, however, so be sure what your data represent before you use one or the other.

##### Code samples illustrating the case for sample standard deviation

In [2]:
import pandas as pd
df = pd.DataFrame({'height' : [161, 156, 172], 
                   'weight': [67, 65, 89]})
df.head()

| | height | weight |
|---|---|---|
| 0 | 161 | 67 |
| 1 | 156 | 65 |
| 2 | 172 | 89 |


In [6]:
df.weight.std()


13.316656236958787

In [3]:
import numpy as np
np.std(df.weight)

10.873004286866728

NumPy can be switched to the unbiased (sample) estimator by setting the degrees of freedom with the `ddof` parameter:

In [5]:
np.std(df.weight,ddof=1)

13.316656236958787

## References 


1. Sıddık Açıl, May 6, 2018, Newton Square Root Method in Python, https://medium.com/@sddkal/newton-square-root-method-in-python-270853e9185d

2. S. G. Johnson, MIT Course 18.335, February 4, 2015, Square Roots via Newton’s Method, https://math.mit.edu/~stevenj/18.335/newton-sqrt.pdf

3. Aaron Schlegel, August 17, 2020, Chi-Square Test of Independence for R x C Contingency Tables, https://aaronschlegel.me/chi-square-test-independence-contingency-tables.html

4. Reza Rajabi, Aug 15, 2019, Using Standard Deviation in Python: Mean, Standard deviation, and Error bar in Python, https://towardsdatascience.com/using-standard-deviation-in-python-77872c32ba9b

5. Magdalena Konkiewicz, Apr 29, 2020, Why computing standard deviation in pandas and NumPy yields different results? Curious? Let’s talk about statistics, populations, and samples…, https://towardsdatascience.com/why-computing-standard-deviation-in-pandas-and-numpy-yields-different-results-5b475e02d112