## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [3]:
import numpy as np
import pandas as pd
from scipy import stats

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [11]:
sample = np.random.normal(loc=100, scale=15, size=1000)
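
Because the draw is random, the numbers printed in the cells that follow will differ on every run. If reproducible output is wanted, one option is a seeded generator. The sketch below uses `np.random.default_rng()`; the seed value `42` and the names `sample_seeded` and `sample_again` are arbitrary choices for illustration and are not part of the original exercise.

```python
import numpy as np

# The seed (42) is an arbitrary illustrative value.
rng = np.random.default_rng(42)
sample_seeded = rng.normal(loc=100, scale=15, size=1000)

# A second generator built from the same seed yields the same draws.
rng_again = np.random.default_rng(42)
sample_again = rng_again.normal(loc=100, scale=15, size=1000)

print(np.array_equal(sample_seeded, sample_again))
```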

Compute the **mean**, **median**, and **mode**.

In [22]:
mean = sample.mean()
median = np.median(sample)
mode = stats.mode(sample)
print(mean, median, mode, sep="\n")

99.81146762271005
99.40347127453097
ModeResult(mode=array([39.57526798]), count=array([1]))
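
Note that the reported mode has a count of 1 and coincides with the smallest value in the sample: with continuous data every value is effectively unique, so `stats.mode()` simply returns the first of the tied values. One possible workaround, shown here only as a sketch, is to round (bin) the values before counting. The seed, the name `data`, and the choice of rounding to whole numbers are assumptions made for this illustration.

```python
import numpy as np

# Stand-in for the notebook's sample; the seed is arbitrary.
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)

# Round to whole numbers so that repeated values become possible.
rounded = np.round(data).astype(int)

# Count occurrences of each rounded value and pick the most frequent one.
values, counts = np.unique(rounded, return_counts=True)
binned_mode = values[counts.argmax()]

print(binned_mode, counts.max())
```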


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**.

In [32]:
# Use names that do not shadow Python's built-in min() and max().
minimum = sample.min()
maximum = sample.max()
q1 = np.percentile(sample, 25)
q3 = np.percentile(sample, 75)
iqr = q3 - q1
print(minimum, maximum, q1, q3, iqr, sep="\n")

39.57526798473096
151.5407937200726
89.30111215985077
110.36694679176382
21.06583463191305
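
As a side note, `np.percentile()` also accepts a list of percentiles, so both quartiles can be obtained in one call. The following is a minimal sketch on a stand-in sample; the seed and the `_demo` names are assumptions for illustration.

```python
import numpy as np

# Stand-in for the notebook's sample; the seed is arbitrary.
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)

# Request the 25th and 75th percentiles together.
q1_demo, q3_demo = np.percentile(data, [25, 75])
iqr_demo = q3_demo - q1_demo

# np.quantile is the same computation expressed on a 0-1 scale.
q1_alt, q3_alt = np.quantile(data, [0.25, 0.75])

print(q1_demo, q3_demo, iqr_demo, sep="\n")
```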


Compute the **variance** and **standard deviation**.

In [34]:
variance = sample.var()
std_dev = sample.std()
print(variance, std_dev, sep="\n")

232.57152395394422
15.250295864472408
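
Keep in mind that NumPy's `.var()` and `.std()` default to `ddof=0`, i.e. the population formulas that divide by n. For the sample versions that divide by n - 1, pass `ddof=1`. The small array below is made up for illustration so the arithmetic is easy to check by hand.

```python
import numpy as np

# Illustrative data: mean is 5 and the squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = data.var()          # divides by n      -> 32 / 8
samp_var = data.var(ddof=1)   # divides by n - 1  -> 32 / 7

pop_std = data.std()
samp_std = data.std(ddof=1)

print(pop_var, samp_var, pop_std, samp_std, sep="\n")
```

Pandas makes the opposite choice: `Series.std()` and `DataFrame.std()` default to `ddof=1`, so NumPy and pandas results on the same data can differ slightly.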


Compute the **skewness** and **kurtosis**.

In [42]:
skewness = stats.skew(sample)
kurtosis = stats.kurtosis(sample)
print(skewness, kurtosis, sep="\n")

0.0510223025523639
-0.036859005860049976
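
The kurtosis close to 0 is expected here: by default `stats.kurtosis()` uses Fisher's definition (excess kurtosis), under which a normal distribution scores 0. Passing `fisher=False` gives Pearson's definition, under which a normal distribution scores 3. A short sketch on a stand-in sample (the seed and variable names are assumptions for illustration):

```python
import numpy as np
from scipy import stats

# Stand-in for the notebook's sample; the seed is arbitrary.
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)

excess_kurtosis = stats.kurtosis(data)                  # Fisher (default)
pearson_kurtosis = stats.kurtosis(data, fisher=False)   # Pearson

# The two definitions differ by exactly 3.
print(pearson_kurtosis - excess_kurtosis)
```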


## NumPy Correlation Calculation

Create an array `x` of integers between 10 (inclusive) and 20 (exclusive) using `np.arange()`.

In [43]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array `y` containing 10 arbitrary integers.

In [57]:
y = np.array([5, 11, 16, 18, 4, 9, 12, 16, 3, 0])
y

array([ 5, 11, 16, 18,  4,  9, 12, 16,  3,  0])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`, which returns the full 2×2 correlation matrix.

In [58]:
r = np.corrcoef(x, y)
r

array([[ 1.        , -0.33620859],
       [-0.33620859,  1.        ]])
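
The diagonal of this matrix is always 1 (each array correlated with itself), so the coefficient of interest is the off-diagonal entry. The sketch below extracts it and, as a cross-check, recomputes Pearson's r from its definition using the same `x` and `y`; the names `r_xy` and `r_manual` are introduced here for illustration.

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([5, 11, 16, 18, 4, 9, 12, 16, 3, 0])

# Off-diagonal entry of the correlation matrix: correlation of x with y.
r_xy = np.corrcoef(x, y)[0, 1]

# Pearson's r from its definition: sum of products of deviations,
# divided by the square root of the product of the sums of squares.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(r_xy, r_manual, sep="\n")
```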

## Pandas Correlation Calculation

Run the code below.

In [60]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [62]:
x.corr(y)

0.7586402890911867

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [63]:
x.corr(y, method="spearman")

0.9757575757575757
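
Spearman's rho is higher than Pearson's r here because `y` increases almost monotonically with `x` but not linearly (note the jump to 96). Spearman's rho is Pearson's r applied to the ranks of the data, which the following sketch checks with the same two Series; the names `rho_from_ranks` and `rho_builtin` are introduced for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank, then compute the ordinary Pearson correlation.
rho_from_ranks = x.rank().corr(y.rank())

# Built-in Spearman correlation for comparison.
rho_builtin = x.corr(y, method="spearman")

print(rho_from_ranks, rho_builtin, sep="\n")
```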

## Seaborn Dataset Tips

Import the Seaborn library.

In [64]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [65]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [69]:
tips

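| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |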


In [70]:
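# numeric_only=True skips the categorical columns (sex, smoker, day, time);
# recent pandas versions raise a TypeError without it.
tips.mean(numeric_only=True)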

total_bill    19.785943
tip            2.998279
size           2.569672
dtype: float64

In [71]:
tips.median(numeric_only=True)

total_bill    17.795
tip            2.900
size           2.000
dtype: float64

In [72]:
tips.mode()

| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 13.42 | 2.0 | Male | No | Sat | Dinner | 2 |


In [66]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


In [82]:
# Use a name that does not shadow Python's built-in range().
value_range = tips.max(numeric_only=True) - tips.min(numeric_only=True)
value_range

total_bill    47.74
tip            9.00
size           5.00
dtype: float64

In [83]:
tips.std(numeric_only=True)

total_bill    8.902412
tip           1.383638
size          0.951100
dtype: float64

In [85]:
tips_iqr = tips.quantile(0.75, numeric_only=True) - tips.quantile(0.25, numeric_only=True)
tips_iqr

total_bill    10.7800
tip            1.5625
size           1.0000
dtype: float64

Call the relevant method to calculate the pairwise Pearson's r correlation of the numeric columns.

In [68]:
tips.corr(numeric_only=True)

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
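
Each entry of the matrix is the Pearson correlation between one pair of numeric columns, so it matches what `Series.corr()` returns for that pair. The sketch below illustrates this on a small hand-built frame holding only the first five rows shown earlier (with a subset of the columns), so it runs without downloading the dataset; its values therefore differ from the full-dataset matrix. The names `tips_head`, `corr_matrix`, and `pairwise` are introduced for illustration, and `select_dtypes("number")` is used as one way to keep only the numeric columns.

```python
import numpy as np
import pandas as pd

# First five rows of the tips data as displayed above (subset of columns).
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep numeric columns only, then compute the pairwise Pearson matrix.
corr_matrix = tips_head.select_dtypes("number").corr()

# The same coefficient computed directly for one pair of columns.
pairwise = tips_head["total_bill"].corr(tips_head["tip"])

print(corr_matrix)
print(pairwise)
```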
