## Descriptive Statistics

Import **NumPy**, **Pandas**, **Matplotlib**, and SciPy's **stats** module

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`.

In [2]:
samples = np.random.normal(loc=100, scale=15, size=1000)
samples

array([ 62.43986393, 101.18890505, 112.34834931,  95.29765761,
        82.38710332, 113.5155811 ,  75.09093297,  97.9937521 ,
       104.69514886, 104.81544303,  98.08773012,  66.92655134,
        87.67416479,  99.72816672,  99.97078175, 101.41905413,
        84.83385193, 113.02801104, 118.40765539,  84.63603667,
       142.9370346 , 109.83029377,  96.92527864, 113.3131116 ,
        97.05171253,  95.57835036, 110.7936414 , 130.82745576,
        98.43017439, 111.69839511, 105.9692571 ,  98.86567727,
        83.61392927,  97.28061458, 115.68846375,  96.8039867 ,
        93.86750793, 107.45230462,  97.89165661, 108.62683604,
       110.24834396, 121.29461941, 102.56203509, 117.4766798 ,
        75.71694091,  87.83737499,  92.61621427, 111.37852305,
       107.72882233, 123.83358301,  95.79456118, 113.60185408,
       101.55522685,  75.67974567,  77.30606282, 119.56266423,
        98.11955084, 106.44836985,  94.24629751,  81.03692698,
        83.22491268, 134.53525738, 111.05797509,  95.87
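
No seed is set, so each run of the cell draws a different sample and the statistics printed below will vary slightly. Here is a minimal sketch of a reproducible alternative. It assumes NumPy 1.17 or later for the `np.random.default_rng()` Generator API, and the seed value 42 is arbitrary:

```python
import numpy as np

# A seeded Generator draws the same sample on every run (seed 42 is arbitrary)
rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

# A second generator with the same seed reproduces the draw exactly
samples_again = np.random.default_rng(42).normal(loc=100, scale=15, size=1000)

print(samples.shape)                           # (1000,)
print(np.array_equal(samples, samples_again))  # True
print(samples.mean(), samples.std())           # close to 100 and 15
```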

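Compute the **mean**, **median**, and **mode**. The samples are continuous, so every value is almost surely unique. As a result, `stats.mode` finds no repeated value and returns the smallest one with a count of 1, which matches the minimum reported below.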

In [3]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print("mean",mean)
print("median",median)
print("mode",mode)

mean 100.31221735812869
median 100.52282042896589
mode ModeResult(mode=array([38.71689988]), count=array([1]))
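
The three measures can be checked by hand. The sketch below uses a made-up integer list, not the sample above. The median is the middle of the sorted values, and `stats.mode` is only informative when values actually repeat. `np.ravel` is used because older SciPy versions return the mode and count as 1-element arrays, while newer ones return scalars.

```python
import numpy as np
from scipy import stats

data = np.array([3, 7, 7, 2, 9, 7, 4, 2])  # illustrative values

mean_manual = data.sum() / data.size        # arithmetic mean
sorted_data = np.sort(data)                  # [2 2 3 4 7 7 7 9]
n = sorted_data.size
# Even number of values: the median is the average of the two middle ones
median_manual = (sorted_data[n // 2 - 1] + sorted_data[n // 2]) / 2

mode_result = stats.mode(data)               # most frequent value and its count
print(mean_manual, np.mean(data))            # 5.125 5.125
print(median_manual, np.median(data))        # 5.5 5.5
print(np.ravel(mode_result.mode)[0], np.ravel(mode_result.count)[0])  # 7 3
```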


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [7]:
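min_val = np.min(samples)        # avoid shadowing the built-in min()
max_val = np.max(samples)        # avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)  # first quartile
q3 = np.percentile(samples, 75)  # third quartile
iqr = q3 - q1                    # interquartile range
print(f"min : {min_val}\nmax : {max_val}\nq1 : {q1}\nq3 : {q3}\niqr : {iqr}")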

min : 38.71689987958824
max : 145.8441176758177
q1 : 90.29822940003451
q3 : 110.71287616725161
iqr : 20.414646767217093
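
By default, `np.percentile` interpolates linearly between sorted values, so Q1 and Q3 need not be actual data points. The short sketch below uses illustrative values and cross-checks the IQR against `scipy.stats.iqr`:

```python
import numpy as np
from scipy import stats

values = np.arange(1, 9)  # illustrative data: 1, 2, ..., 8

# Linear interpolation: the 25th percentile sits at position 0.25 * (n - 1) = 1.75
# in the sorted array, i.e. three quarters of the way from 2 to 3
q1 = np.percentile(values, 25)   # 2.75
q3 = np.percentile(values, 75)   # position 5.25 -> 6.25
iqr_manual = q3 - q1             # 3.5

# scipy.stats.iqr computes the same Q3 - Q1 difference directly
iqr_scipy = stats.iqr(values)
print(q1, q3, iqr_manual, iqr_scipy)  # 2.75 6.25 3.5 3.5
```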


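Compute the **variance** and **standard deviation**. By default, both NumPy functions use the population formulas, which divide by n.

In [9]:
variance = np.var(samples)  # population variance (ddof=0)
std_dev = np.std(samples)   # population standard deviation, the square root of the variance
print(f"variance : {variance}\nstd_dev : {std_dev:.4f}")

variance : 223.53607332034028
std_dev : 14.9511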
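
`np.var` and `np.std` compute the population versions by default (`ddof=0`, dividing by n). Pass `ddof=1` to get the sample versions, which divide by n - 1. The small check below uses made-up values whose mean is 5:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # illustrative values, mean = 5

# Sum of squared deviations from the mean: 9+1+1+1+0+0+4+16 = 32
pop_var = np.var(data)              # 32 / 8 = 4.0 (ddof=0, the default)
pop_std = np.std(data)              # sqrt(4.0) = 2.0
sample_var = np.var(data, ddof=1)   # 32 / 7, about 4.571
sample_std = np.std(data, ddof=1)   # square root of the sample variance

print(pop_var, pop_std)                             # 4.0 2.0
print(np.isclose(sample_std, np.sqrt(sample_var)))  # True
```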


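Compute the **skewness** and **kurtosis**. By default, `stats.kurtosis` returns excess kurtosis (Fisher's definition). Values near 0 for both measures are therefore consistent with normally distributed data.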

In [10]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)
print("skewness :",skewness)
print("kurtosis:", kurtosis)

skewness : -0.05833732725867545
kurtosis: -0.029010417810407585
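
With their default arguments (`bias=True`), `stats.skew` and `stats.kurtosis` are ratios of central moments. Skewness is m3 / m2^1.5, and kurtosis is m4 / m2^2 minus 3, where mk is the k-th central moment. Under Fisher's definition, a normal distribution scores about 0. The sketch below reproduces both statistics on illustrative data:

```python
import numpy as np
from scipy import stats

data = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 10.0])  # illustrative, right-skewed

dev = data - data.mean()
m2 = np.mean(dev**2)   # second central moment (population variance)
m3 = np.mean(dev**3)   # third central moment
m4 = np.mean(dev**4)   # fourth central moment

skew_manual = m3 / m2**1.5
excess_kurt_manual = m4 / m2**2 - 3.0   # Fisher's (excess) kurtosis

print(np.isclose(skew_manual, stats.skew(data)))             # True
print(np.isclose(excess_kurt_manual, stats.kurtosis(data)))  # True
# fisher=False returns the plain (Pearson) kurtosis, i.e. the excess value plus 3
print(np.isclose(stats.kurtosis(data, fisher=False), excess_kurt_manual + 3.0))  # True
```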


## NumPy Correlation Calculation

Create an array `x` of the integers from 10 (inclusive) to 20 (exclusive) with `np.arange()`.

In [12]:
x = np.arange(10,20)
print(x)

[10 11 12 13 14 15 16 17 18 19]


Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [16]:
y = np.array([12,34,-8,45,67,54,10,99,123,300])
print(y)

[ 12  34  -8  45  67  54  10  99 123 300]


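Once you have two arrays of the same length, compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`. It returns a 2×2 correlation matrix. The diagonal entries are each array's correlation with itself (always 1), and the off-diagonal entries are Pearson's r between `x` and `y`.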

In [17]:
r = np.corrcoef(x,y)
r

array([[1.        , 0.74690856],
       [0.74690856, 1.        ]])
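
Pearson's r is the covariance of the two arrays divided by the product of their standard deviations. The sketch below recomputes the off-diagonal entry of `np.corrcoef` from that definition, using the same `x` and `y` values as above:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([12, 34, -8, 45, 67, 54, 10, 99, 123, 300])

# Center each array on its mean
dx = x - x.mean()
dy = y - y.mean()

# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)); the 1/n factors cancel
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

r_matrix = np.corrcoef(x, y)
print(r_matrix[0, 1])                        # the off-diagonal entry
print(np.isclose(r_manual, r_matrix[0, 1]))  # True
```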

## Pandas Correlation Calculation

Run the code below

In [18]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call `Series.corr()` to calculate Pearson's r correlation (Pearson is the default `method`).

In [20]:
r = y.corr(x)
r

0.7586402890911869
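
`Series.corr()` returns only the coefficient. If you also want a significance test, `scipy.stats.pearsonr` returns r together with a two-sided p-value. The sketch below applies it to the same two Series. It assumes a SciPy version whose result unpacks as an (r, p) pair, which holds for both the older tuple and the newer result object.

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_pandas = y.corr(x)                     # Pearson is the default method
r_scipy, p_value = stats.pearsonr(x, y)  # coefficient and two-sided p-value

print(r_pandas, r_scipy)  # the two coefficients agree
print(p_value)            # p-value for the null hypothesis of no linear correlation
```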

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [22]:
rho = x.corr(y,method="spearman")
rho

0.9757575757575757
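
Spearman's rho is Pearson's r computed on the ranks of the data. The sketch below verifies this on the same Series. Because `y` has no tied values here, the classic shortcut 1 - 6 * sum(d^2) / (n * (n^2 - 1)) gives the same result, where d is the difference between paired ranks:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

rho_pandas = x.corr(y, method="spearman")

# Spearman's rho = Pearson's r of the ranks (rank() averages ties by default)
rho_ranks = x.rank().corr(y.rank())

# Shortcut formula, valid only when there are no ties
d = x.rank() - y.rank()
n = len(x)
rho_formula = 1 - 6 * (d**2).sum() / (n * (n**2 - 1))

print(rho_pandas, rho_ranks, rho_formula)  # all three agree
```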

## Seaborn Dataset Tips

Import the Seaborn library

In [23]:
import seaborn as sns

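Load the `tips` dataset from Seaborn. `sns.load_dataset()` downloads it from the online seaborn-data repository and caches it, so this step needs an internet connection unless the data is already cached.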

In [24]:
tips = sns.load_dataset("tips")

Generate descriptive statistics that summarize the central tendency and dispersion of each numeric column.

In [25]:
tips.describe()

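|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |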
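
The `std` row of `describe()` is the sample standard deviation (`ddof=1`), whereas `np.std` defaults to the population version (`ddof=0`). The small sketch below makes the difference visible. It uses a made-up DataFrame so it runs without downloading `tips`:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in data, not the tips dataset
df = pd.DataFrame({"bill": [10.0, 20.0, 30.0, 40.0], "party": [1, 2, 2, 4]})

summary = df.describe()
std_from_describe = summary.loc["std", "bill"]
bill = df["bill"].to_numpy()

print(np.isclose(std_from_describe, df["bill"].std()))      # True (ddof=1)
print(np.isclose(std_from_describe, np.std(bill, ddof=1)))  # True
print(np.isclose(std_from_describe, np.std(bill)))          # False (ddof=0)
```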


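Call `DataFrame.corr()` to calculate pairwise Pearson's r between columns. `tips` also contains categorical columns (`sex`, `smoker`, `day`, `time`), so pass `numeric_only=True`; pandas 2.0 and later raise an error otherwise.

In [26]:
tips.corr(method="pearson", numeric_only=True)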

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
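
When a DataFrame mixes numeric and categorical columns, you can also select the numeric ones explicitly before correlating. `select_dtypes("number")` does this and should give the same matrix as `numeric_only=True`, which requires pandas 1.5 or later. The sketch below uses illustrative data (again avoiding the download) and also shows the `method` parameter switching to Spearman:

```python
import numpy as np
import pandas as pd

# Illustrative data with one non-numeric column, loosely mimicking tips
df = pd.DataFrame({
    "total_bill": [10.0, 15.0, 22.0, 30.0, 41.0],
    "tip": [1.5, 2.0, 3.5, 4.0, 7.0],
    "size": [1, 2, 2, 3, 4],
    "day": ["Thur", "Fri", "Sat", "Sun", "Sun"],
})

numeric = df.select_dtypes("number")        # drops the "day" column
pearson = numeric.corr(method="pearson")    # pairwise Pearson's r
spearman = numeric.corr(method="spearman")  # pairwise Spearman's rho

print(pearson.round(3))
print(spearman.round(3))
print(np.allclose(pearson, df.corr(numeric_only=True)))  # True
```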
