## Descriptive Statistics

In [None]:
from scipy import stats

 Import **NumPy**, **SciPy**, and **Pandas**

In [None]:
import numpy as np
import scipy as sp
import pandas as pd

Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`

In [None]:
samples = np.random.normal(100, 15, 1000)
samples

array([ 69.9981563 ,  82.27807429,  84.26246677,  95.95183012,
       131.256004  ,  78.77556854,  88.95325283,  88.10211349,
       120.37414438,  85.55777907, 111.95868618, 133.21019461,
        86.29847956, 126.09845134, 113.06770859,  83.43125405,
       107.21406732, 111.93278694, 103.24598121,  98.97172385,
        68.1954199 ,  85.79593222,  97.06820915,  95.50550778,
        83.75473404, 104.60825225, 107.28551588,  74.24347963,
       109.31796174, 117.59941815, 116.99672632, 107.25522438,
        99.54496393, 106.74640813,  92.23857975, 107.28098657,
        84.02285029,  58.95642323,  87.4554732 , 109.89873629,
       106.87656154, 103.55366919,  97.71796194, 118.32523423,
       101.50620216,  70.79148856, 105.14952307, 118.61321743,
        90.16734191,  97.05063475, 125.11055892,  71.65468615,
       105.54679739,  68.78094801,  96.24144553,  91.8197782 ,
       116.97379104,  91.4336985 ,  91.04655016, 115.84465703,
       104.84056719,  90.99268475,  99.56917749, 103.90
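
The values above change on every run because no seed is set. A minimal sketch of a repeatable draw, assuming NumPy's `default_rng` generator is available; the seed value 42 and the names `samples_seeded` and `samples_again` are arbitrary choices for illustration:

```python
import numpy as np

# A seeded generator makes the draw repeatable; the seed 42 is arbitrary.
rng = np.random.default_rng(42)
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed reproduces the same samples.
rng_again = np.random.default_rng(42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(samples_seeded.shape)
print(np.array_equal(samples_seeded, samples_again))
```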

Compute the **mean**, **median**, and **mode**

In [None]:
mean = samples.mean()
median = np.median(samples)
mode = stats.mode(samples)

print(mean)
print(median)
print(mode)

100.38094701310257
100.18951000413362
ModeResult(mode=array([56.4045]), count=array([1]))
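
The reported mode has a count of 1 and coincides with the minimum of the samples, which suggests that no value repeats: with continuous data the mode is rarely informative. The sketch below checks this on a separate, hypothetical array `demo` (seeded arbitrarily) by counting how often each distinct value occurs:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
demo = rng.normal(100, 15, 1000)

# Count how often each distinct value occurs (values come back sorted).
values, counts = np.unique(demo, return_counts=True)

# Highest frequency of any single value.
print(counts.max())
# When every count ties, the first (smallest) value is picked.
print(values[counts.argmax()] == demo.min())
```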


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
# Names end in _val so the built-in min() and max() are not shadowed
min_val = samples.min()
max_val = samples.max()
q1 = np.percentile(samples, [25])
q3 = np.percentile(samples, [75])
iqr = stats.iqr(samples)

print(min_val)
print(max_val)
print(q1)
print(q3)
print(iqr)

56.40449999949959
156.61366082091183
[91.13890098]
[109.99612091]
18.857219929696413
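
The interquartile range is simply Q3 minus Q1 (here 109.996 - 91.139 ≈ 18.857). A small sketch that confirms this against `stats.iqr`, using a hypothetical seeded array `demo` rather than the samples above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
demo = rng.normal(100, 15, 1000)

# Requesting both percentiles in one call returns a 2-element array,
# which unpacks into two scalars.
q1_demo, q3_demo = np.percentile(demo, [25, 75])
iqr_manual = q3_demo - q1_demo

print(iqr_manual)
print(np.isclose(iqr_manual, stats.iqr(demo)))
```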


Compute the **variance** and **standard deviation**

In [None]:
variance = samples.var()
std_dev = samples.std()

print(variance)
print(std_dev)

212.46171821153624
14.576066623459713
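
Note that `var()` and `std()` on a NumPy array default to `ddof=0`, the population formulas that divide by n. The sketch below, on a hypothetical seeded array `demo`, shows how `ddof=1` gives the sample (n - 1) version and how the two relate:

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
demo = rng.normal(100, 15, 1000)
n = demo.size

pop_var = demo.var()            # ddof=0: divides by n
sample_var = demo.var(ddof=1)   # ddof=1: divides by n - 1

print(pop_var, sample_var)
# The two differ only by the factor n / (n - 1).
print(np.isclose(sample_var, pop_var * n / (n - 1)))
```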


Compute the **skewness** and **kurtosis**

In [None]:
from scipy.stats import kurtosis
from scipy.stats import skew

In [None]:
skewness = skew(samples, bias=False)
# Stored as kurt so the imported kurtosis() function is not overwritten
kurt = kurtosis(samples, bias=False)

print(skewness)
print(kurt)

0.07802009127843344
0.20315228513347172
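
By default `scipy.stats.kurtosis` reports excess (Fisher) kurtosis, for which a normal distribution scores about 0; that is why the value above is close to 0 rather than 3. A brief sketch of the two conventions on a hypothetical seeded array `demo`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # arbitrary seed
demo = rng.normal(100, 15, 1000)

excess = stats.kurtosis(demo)                 # fisher=True is the default
pearson = stats.kurtosis(demo, fisher=False)  # Pearson definition

print(excess, pearson)
# The two definitions differ by exactly 3.
print(np.isclose(pearson - excess, 3.0))
```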


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [None]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [None]:
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [None]:
r = np.corrcoef(x, y)
r

array([[1., 1.],
       [1., 1.]])
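
`np.corrcoef` returns a 2×2 correlation matrix; the coefficient between x and y sits in the off-diagonal entries. The sketch below pulls out that single value and recomputes it from the definition (covariance divided by the product of the standard deviations). The `_demo` names are introduced here only to keep the example self-contained:

```python
import numpy as np

x_demo = np.arange(10, 20)
y_demo = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

r_matrix = np.corrcoef(x_demo, y_demo)
r_xy = r_matrix[0, 1]  # off-diagonal entry: correlation of x with y

# Same quantity from the definition, using population (ddof=0) moments.
cov_xy = np.mean((x_demo - x_demo.mean()) * (y_demo - y_demo.mean()))
r_manual = cov_xy / (x_demo.std() * y_demo.std())

print(r_xy, r_manual)
```

Because these y values are an exact linear function of x (y = x - 9), the coefficient comes out as 1.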

## Pandas Correlation Calculation

Run the code below

In [None]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [None]:
r = stats.pearsonr(x, y)
r

(0.758640289091187, 0.010964341301680813)
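
The call above is SciPy's `pearsonr`, which returns the coefficient together with a p-value. Since this section is about pandas, here is a sketch of the pandas route via `Series.corr`, which returns only the coefficient. The names `x_s` and `y_s` are used to keep the example self-contained:

```python
import pandas as pd

x_s = pd.Series(range(10, 20))
y_s = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# method="pearson" is the default for Series.corr.
r_pandas = x_s.corr(y_s)
print(r_pandas)
```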

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [None]:
rho = stats.spearmanr(x, y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
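
Spearman's rho is Pearson's r computed on the ranks of the data, which is why it is higher here: y increases almost monotonically with x even though the relationship is far from linear. A sketch of this equivalence using pandas, with `x_s` and `y_s` redefined so the example stands alone:

```python
import pandas as pd

x_s = pd.Series(range(10, 20))
y_s = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho_pandas = x_s.corr(y_s, method="spearman")

# Equivalent: rank both series, then take the ordinary Pearson correlation.
rho_from_ranks = x_s.rank().corr(y_s.rank())

print(rho_pandas, rho_from_ranks)
```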

## Seaborn Dataset Tips

Import Seaborn Library

In [None]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [None]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column

In [None]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [None]:
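# numeric_only=True (pandas 1.5+) skips the non-numeric columns such as sex and day
tips.corr(method='pearson', numeric_only=True)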

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
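
To read a single coefficient out of the matrix, index it by column names. The sketch below uses a small hypothetical DataFrame `df` with illustrative values (not the full tips data) and selects the numeric columns explicitly with `select_dtypes`, one way to keep text columns out of the calculation:

```python
import pandas as pd

# Hypothetical stand-in for a frame mixing numeric and text columns.
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],
})

# Keep only numeric columns, then compute the pairwise Pearson matrix.
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr(method="pearson")

# Look up one pair by row and column label.
print(corr_matrix.loc["total_bill", "tip"])
```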
