## Descriptive Statistics

Import **NumPy**, **pandas**, and **SciPy**.

In [12]:
import numpy as np
import pandas as pd
import scipy as sp

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15, using `np.random.normal()`.

In [17]:
samples = np.random.normal(loc=100, scale=15, size=1000)
np.set_printoptions(threshold=10)  # abbreviate long arrays in the output
samples

array([ 97.0138156 ,  77.17235376,  95.57763619, ...,  61.15326964,
        99.70764246, 123.72026553])
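Because `np.random.normal()` is unseeded here, the numbers above change on every run. A minimal sketch of a reproducible draw, assuming NumPy's `Generator` API is available; the seed value 42 and the name `seeded_samples` are arbitrary choices for illustration and are not part of the original notebook:

```python
import numpy as np

# Seeded generator: the same 1,000 samples are drawn on every run.
# The seed (42) is an arbitrary illustrative value.
rng = np.random.default_rng(seed=42)
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

print(seeded_samples.shape)
print(seeded_samples[:3])
```

Re-running the cell with the same seed should give identical values, which makes the statistics in the following cells repeatable.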

Compute the **mean**, **median**, and **mode**

In [18]:
from scipy import stats

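mean = samples.mean()
median = np.median(samples)
# For continuous data almost every value is unique, so the "mode" is
# simply the smallest sample, with a count of 1.
mode = stats.mode(samples)
print(" mean : ", mean,"\n", "median : ", median, "\n","mode : ", mode)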

 mean :  100.9167276353743 
 median :  100.9080140849536 
 mode :  ModeResult(mode=array([48.55833084]), count=array([1]))
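The mode reported above has a count of 1, which is expected for continuous samples where values rarely repeat. One common workaround is to bin the data and report the most populated bin. The sketch below assumes 20 equal-width bins and a freshly generated array `data`; both are illustrative choices, not part of the original notebook:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
data = rng.normal(loc=100, scale=15, size=1000)

# Count samples in 20 equal-width bins (the bin count is an arbitrary choice).
counts, edges = np.histogram(data, bins=20)

# The modal bin is the one with the highest count; report its midpoint.
i = np.argmax(counts)
modal_midpoint = (edges[i] + edges[i + 1]) / 2
print("modal bin midpoint:", modal_midpoint)
```

The result depends on the bin width, so it should be read as a rough estimate of where the data peaks rather than an exact mode.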


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [21]:
# Use names that do not shadow the built-in min() and max() functions.
min_val = samples.min()
max_val = samples.max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1
print("min :", min_val,
      "\nmax :", max_val,
      "\nq1  :", q1,
      "\nq3  :", q3,
      "\niqr :", iqr)

min : 48.55833083771193 
max : 141.19645298969857 
q1  : 91.0593066872583 
q3  : 111.33495736141468 
iqr : 20.27565067415638
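The interquartile range can also be obtained directly from `scipy.stats.iqr()`, and it is often used to flag outliers. The following is a sketch under stated assumptions: the array `data` is generated here for illustration, and the 1.5 × IQR fences follow the usual box-plot convention rather than anything specified in this notebook:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
data = rng.normal(loc=100, scale=15, size=1000)

# Both quartiles in one call, then the IQR by subtraction.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# scipy.stats.iqr computes the same quantity in a single call.
iqr_scipy = stats.iqr(data)

# Box-plot convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print("iqr:", iqr, "| outliers flagged:", outliers.size)
```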


Compute the **variance** and **standard deviation**

In [23]:
# np.var and np.std default to ddof=0, i.e. the population formulas.
variance = np.var(samples)
std_dev = np.std(samples)
print("Variance : " , variance , "\nstd_dev : ", std_dev)

Variance :  224.8562243646913 
std_dev :  14.995206712969692
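`np.var()` and `np.std()` divide by `n` by default (population formulas). Passing `ddof=1` divides by `n - 1` instead, giving the unbiased sample variance. A small worked example on a hypothetical eight-value array, chosen so the arithmetic is easy to check by hand:

```python
import numpy as np

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(data)             # 32 / 8 = 4.0  (ddof=0, the default)
sample_var = np.var(data, ddof=1)  # 32 / 7        (Bessel's correction)
pop_std = np.std(data)             # sqrt(4.0) = 2.0

print("population variance:", pop_var)
print("sample variance    :", sample_var)
print("population std dev :", pop_std)
```

With 1,000 samples the two versions differ by a factor of only 1000/999, but the distinction matters for small data sets.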


Compute the **skewness** and **kurtosis**

In [29]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)
print("skewness :", skewness,"\nkurtosis :", kurtosis)

skewness : -0.09118814730951079 
kurtosis : -0.11076530873153168
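`stats.kurtosis()` returns excess (Fisher) kurtosis by default, for which a normal distribution scores 0; that is why the value above is close to zero. Passing `fisher=False` returns the Pearson definition, which is larger by exactly 3. A short sketch on an illustrative array `data` generated here for the purpose:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only
data = rng.normal(loc=100, scale=15, size=1000)

excess_kurtosis = stats.kurtosis(data)                 # Fisher: normal -> 0
pearson_kurtosis = stats.kurtosis(data, fisher=False)  # Pearson: normal -> 3

print("excess :", excess_kurtosis)
print("pearson:", pearson_kurtosis)
```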


## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [32]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [40]:
# randint already returns an ndarray; np.array() is kept to follow the prompt.
y = np.array(np.random.randint(20, size=10))
y

array([17,  7, 14,  5,  8, 13, 14,  6,  5, 11])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`, which returns a 2×2 correlation matrix.

In [41]:
r = np.corrcoef(x,y)
r

array([[ 1.        , -0.32087225],
       [-0.32087225,  1.        ]])
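The off-diagonal entries of the matrix hold Pearson's r between `x` and `y`, while the diagonal is each array's correlation with itself. The sketch below reuses the `y` values printed above (since `y` was random, they are hard-coded here) and checks the result against the textbook formula; the names `r_xy` and `r_manual` are illustrative:

```python
import numpy as np

x = np.arange(10, 20)
# Values copied from the output of the random draw shown above.
y = np.array([17, 7, 14, 5, 8, 13, 14, 6, 5, 11])

# Pull the single coefficient out of the 2x2 matrix.
r_xy = np.corrcoef(x, y)[0, 1]

# Pearson's r by hand: covariance term over the product of spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_xy, r_manual)
```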

## Pandas Correlation Calculation

Run the code below to create two pandas Series.

In [42]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [43]:
r = x.corr(y)
r

0.7586402890911867

Optional: call the relevant method to calculate Spearman's rho correlation.

In [45]:
rho = x.corr(y, method="spearman")
rho

0.9757575757575757
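Spearman's rho is higher than Pearson's r here because it depends only on the ordering of the values, so the large jump to 96 in `y` has little effect. Equivalently, it is Pearson's r computed on the ranks, which the following sketch checks using the same two Series (the names `rho_direct` and `rho_from_ranks` are illustrative):

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Spearman's rho requested directly.
rho_direct = x.corr(y, method="spearman")

# The same quantity as Pearson's r on the rank-transformed data.
rho_from_ranks = x.rank().corr(y.rank())

print(rho_direct, rho_from_ranks)
```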

## Seaborn Dataset Tips

Import the Seaborn library.

In [46]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [49]:
tips = sns.load_dataset("tips")
tips.head()

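| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |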


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [50]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call the relevant method to calculate the pairwise Pearson's r correlation of columns (here, between `tip` and `size`).

In [52]:
tip = tips["tip"]
size = tips["size"]  # bracket access: tips.size is the DataFrame's element count
r = tip.corr(size)
r

0.4892987752303577
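To get every pairwise coefficient at once, `DataFrame.corr()` can be called on the numeric columns. The sketch below avoids downloading the dataset by rebuilding only the five rows shown by `tips.head()`, so its coefficients will not match the full 244-row result; `tips_head` is a hypothetical name used for illustration:

```python
import pandas as pd

# Only the first five rows of the tips dataset, copied from tips.head().
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep numeric columns only, then compute the pairwise Pearson matrix.
corr_matrix = tips_head.select_dtypes(include="number").corr()
print(corr_matrix)
```

On the full dataset the same call would be applied to `tips` directly, and the `tip`/`size` entry of the matrix should agree with the single coefficient computed above.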