# Statistics

### 1. Python Statistics
In this Python Statistics tutorial, we will learn how to calculate the p-value. We will also discuss the T-test, with examples and code in Python.



## a. One-sample T-test with Python

Let’s try this on a single sample. The test tells us whether the mean of a sample differs from the mean of the population it is compared against. Consider the voting populace in India and in Gujarat. Does the average age of Gujarati voters differ from that of the overall population? Let’s find out using simulated data.

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math



In [2]:
np.random.seed(6)
# Simulate the overall voter population as two Poisson age groups, shifted so that ages start at 18.
population_ages1 = stats.poisson.rvs(loc = 18, mu = 35, size = 150000)
population_ages2 = stats.poisson.rvs(loc = 18, mu = 10, size = 100000)

population_ages = np.concatenate((population_ages1, population_ages2))  # Join both groups into one array.

# Simulate a sample of 50 Gujarati voters in the same way, with a slightly lower mean for the first group.
gujarat_ages1 = stats.poisson.rvs(loc = 18, mu = 30, size = 30)
gujarat_ages2 = stats.poisson.rvs(loc = 18, mu = 10, size = 20)

gujarat_ages = np.concatenate((gujarat_ages1, gujarat_ages2))  # Join both groups into one array.

In [3]:
population_ages.mean()


43.000112

In [4]:
gujarat_ages.mean()

39.26

In [6]:
stats.ttest_1samp(a = gujarat_ages, popmean = population_ages.mean())

Ttest_1sampResult(statistic=-2.5742714883655027, pvalue=0.013118685425061678)

#### Calculate the T-test for the mean of ONE group of scores.

This is a two-sided test for the null hypothesis that the expected value
(mean) of a sample of independent observations `a` is equal to the given
population mean, `popmean`.


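The statistic of -2.574 tells us how far the sample mean lies from the population mean assumed under the null hypothesis, measured in standard errors. The p-value of 0.0131 is below the 5% significance level, so we reject the null hypothesis that the sample comes from a population with the same mean age.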
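To make the statistic and p-value less of a black box, here is a minimal sketch that computes them by hand and compares the result with `stats.ttest_1samp`. The helper name `one_sample_t` and the demo data are assumptions made for illustration; they are not part of the notebook above.

```python
import numpy as np
from scipy import stats

def one_sample_t(sample, popmean):
    """Return (t statistic, two-sided p-value) for a one-sample t-test."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Standard error of the mean, using the sample standard deviation (ddof=1).
    std_err = sample.std(ddof=1) / np.sqrt(n)
    t_stat = (sample.mean() - popmean) / std_err
    # Two-sided p-value from the t distribution with n - 1 degrees of freedom.
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
    return t_stat, p_value

# Hypothetical demo data, not the voter ages generated earlier.
rng = np.random.default_rng(0)
demo_sample = rng.normal(loc=40, scale=12, size=50)

t_manual, p_manual = one_sample_t(demo_sample, popmean=43)
t_scipy, p_scipy = stats.ttest_1samp(demo_sample, popmean=43)

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both lines should print matching values, which shows that the statistic is simply the gap between the sample mean and `popmean` divided by the standard error.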

## b. Two-sample T-test With Python

Such a test tells us whether two data samples have different means. Here, we take the null hypothesis that both groups have equal means. We don’t need a known population parameter for this.

In [21]:
np.random.seed(16)

maharashtra_ages1 = stats.poisson.rvs(loc = 18, mu = 33, size = 30)
maharashtra_ages2 = stats.poisson.rvs(loc = 18, mu = 13, size = 20)

maharashtra_ages = np.concatenate((maharashtra_ages1, maharashtra_ages2))

maharashtra_ages.mean()

42.16

In [22]:
stats.ttest_ind(a = gujarat_ages, b = maharashtra_ages, equal_var = False)

Ttest_indResult(statistic=-1.4048298254638962, pvalue=0.16323397693922115)

#### Calculate the T-test for the means of *two independent* samples of scores.

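This is a two-sided test for the null hypothesis that 2 independent samples
have identical average (expected) values. By default, this test assumes that the
populations have identical variances. The call above passes `equal_var = False`, so SciPy runs Welch's t-test instead, which does not make that assumption.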


The p-value of 0.163 tells us there is a 16.3% chance of seeing sample means at least this far apart if the two groups really had identical means. This is greater than the 5% significance level, so we fail to reject the null hypothesis.
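As a rough illustration of what `equal_var = False` changes, the sketch below computes Welch's statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) by hand. The function name `welch_t` and the demo samples are hypothetical and chosen only for this example.

```python
import numpy as np
from scipy import stats

def welch_t(a, b):
    """Return (t statistic, degrees of freedom, two-sided p-value) for Welch's t-test."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Per-sample variance of the mean; the two variances are NOT pooled.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value

# Hypothetical demo samples with different spreads and sizes.
rng = np.random.default_rng(1)
demo_a = rng.normal(loc=40, scale=8, size=50)
demo_b = rng.normal(loc=42, scale=15, size=40)

t_welch, df_welch, p_welch = welch_t(demo_a, demo_b)
scipy_result = stats.ttest_ind(a=demo_a, b=demo_b, equal_var=False)

print(t_welch, df_welch, p_welch)
print(scipy_result.statistic, scipy_result.pvalue)
```

When the two samples have clearly different variances or sizes, this version is generally the safer choice than the pooled-variance default.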

## c. Paired T-test With Python

When you want to compare two sets of measurements taken on the same group, such as weights before and after some change, you can use a paired T-test. Let’s take an example.

In [23]:
np.random.seed(11)

before = stats.norm.rvs(scale = 30, loc = 250, size = 100)  # Simulated starting weights (mean 250, std 30).

after = before + stats.norm.rvs(scale = 5, loc = -1.25, size = 100)  # Add a per-subject change with mean -1.25.

weight_df = pd.DataFrame({"weight_before":before,
                         "weight_after":after,
                         "weight_change":after-before})
weight_df

|     | weight_before | weight_after | weight_change |
|-----|---------------|--------------|---------------|
| 0   | 302.483642    | 305.605006   | 3.121364      |
| 1   | 241.417810    | 240.526071   | -0.891739     |
| 2   | 235.463046    | 226.017788   | -9.445258     |
| 3   | 170.400443    | 165.913930   | -4.486513     |
| 4   | 249.751461    | 252.590309   | 2.838848      |
| ... | ...           | ...          | ...           |
| 95  | 258.148189    | 256.671813   | -1.476376     |
| 96  | 275.760015    | 272.554150   | -3.205866     |
| 97  | 212.077790    | 218.498901   | 6.421112      |
| 98  | 283.446109    | 281.375144   | -2.070965     |


In [24]:
weight_df.describe()

|       | weight_before | weight_after | weight_change |
|-------|---------------|--------------|---------------|
| count | 100.0         | 100.0        | 100.0         |
| mean  | 250.345546    | 249.115171   | -1.230375     |
| std   | 28.132539     | 28.422183    | 4.783696      |
| min   | 170.400443    | 165.91393    | -11.495286    |
| 25%   | 230.421042    | 229.148236   | -4.046211     |
| 50%   | 250.830805    | 251.134089   | -1.413463     |
| 75%   | 270.637145    | 268.927258   | 1.738673      |
| max   | 314.700233    | 316.720357   | 9.759282      |


In [25]:
stats.ttest_rel(a = before, b = after)

Ttest_relResult(statistic=2.5720175998568284, pvalue=0.011596444318439857)

#### Calculate the T-test on TWO RELATED samples of scores, a and b.

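This is a two-sided test for the null hypothesis that 2 related or
repeated samples have identical average (expected) values.

The p-value of 0.0116 is below the 5% significance level, so we reject the null hypothesis that the mean weight is the same before and after.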
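A paired T-test is equivalent to a one-sample T-test on the per-subject differences, tested against a mean of zero. The short sketch below checks this on hypothetical data (`demo_before` and `demo_after` are made-up names and values that only mimic the setup above).

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same 100 subjects.
rng = np.random.default_rng(2)
demo_before = rng.normal(loc=250, scale=30, size=100)
demo_after = demo_before + rng.normal(loc=-1.25, scale=5, size=100)

# Paired test on the two related samples.
paired = stats.ttest_rel(a=demo_before, b=demo_after)

# One-sample test on the differences against a mean of zero.
differences = demo_before - demo_after
on_differences = stats.ttest_1samp(a=differences, popmean=0)

print(paired.statistic, paired.pvalue)
print(on_differences.statistic, on_differences.pvalue)
```

This is why the large spread in the raw weights does not hurt the test: only the spread of the differences matters.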

## Correlation in Python Statistics

Correlation is a statistical relationship between two random variables (or bivariate data). The relationship may or may not be causal. Correlation measures how close two variables are to having a linear relationship with each other. One example is the correlation between demand and supply for a product whose supply is limited.
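Because correlation measures linear association specifically, a strong nonlinear dependence can still give a coefficient near zero. The sketch below computes the Pearson coefficient by hand for a linear and a quadratic relationship. The helper `pearson_r` and the data are illustrative assumptions.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Hypothetical data on a grid that is symmetric around zero.
x = np.linspace(-1, 1, 201)
y_linear = 3 * x + 2      # perfectly linear in x
y_quadratic = x ** 2      # fully determined by x, but not linearly

r_linear = pearson_r(x, y_linear)
r_quadratic = pearson_r(x, y_quadratic)

print(r_linear)      # close to 1
print(r_quadratic)   # close to 0
print(np.corrcoef(x, y_linear)[0, 1])  # NumPy's built-in gives the same value
```

A coefficient near zero therefore rules out a linear trend only, not dependence in general.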

Correlation can denote a predictive relationship that we can exploit. To measure the degree of correlation, we can use coefficients such as ρ (for a population) or r (for a sample). Benefits of correlation:

* Predicting one quantity from another
* Pointing to a possible causal relationship, which must then be confirmed by other means
* Foundation for other modeling techniques
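The first benefit in the list, predicting one quantity from another, can be sketched with `stats.pearsonr` and `stats.linregress`. The data below is hypothetical (a noisy straight line), and the variable names are assumptions made for this example.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y depends roughly linearly on x, plus random noise.
rng = np.random.default_rng(42)
demo_x = rng.uniform(0, 10, size=200)
demo_y = 2.0 * demo_x + 1.0 + rng.normal(0, 1.5, size=200)

# Sample correlation coefficient r and the p-value for "no correlation".
r, p_value = stats.pearsonr(demo_x, demo_y)

# A least-squares line lets us predict y for a new x value.
fit = stats.linregress(demo_x, demo_y)
predicted_y = fit.intercept + fit.slope * 5.0

print(r, p_value)
print(fit.slope, fit.intercept, predicted_y)
```

The `rvalue` reported by `linregress` is the same coefficient that `pearsonr` returns, which is one way correlation serves as a foundation for simple regression models.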