## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np
import pandas as pd
import scipy as sp

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [2]:
samples = np.random.normal(loc = 100, scale = 15, size = 1000)
samples

array([ 98.4896607 ,  83.45350218, 101.58148616,  71.25620735,
       114.87932552, 109.16959404,  92.69354055,  95.45610533,
        84.75824611, 101.28142097,  71.11021139,  97.07663488,
        80.60874879,  98.79439494,  86.82472196, 101.53563619,
        85.65659174, 128.74568091,  76.33918934, 106.20996825,
       114.28244877, 104.58565601,  95.41360444,  90.70757205,
       109.47640999, 101.88569735, 110.8541465 ,  88.76409105,
       105.7703377 , 101.49636378,  98.52096159,  81.32587601,
        87.17908569, 105.33262364,  98.93043141,  93.86540438,
       126.10993396, 105.48148138, 103.3006304 ,  92.9646031 ,
        99.94378828,  94.53844452,  92.76836714,  93.41784319,
        90.64216003, 124.31985798, 111.24354086, 101.4846582 ,
        94.57618084,  90.06847564, 124.09490794, 100.78226669,
       113.65952732,  93.64671964,  98.37115871,  90.73878734,
        98.55019712, 118.17100394,  85.17609991,  93.17336673,
       113.878565  ,  94.28674852, 101.30405427, 103.15
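
Because `np.random.normal()` draws from NumPy's global random state, the values above differ on every run. A minimal sketch of a reproducible draw, assuming NumPy's `default_rng` generator API; the seed value 42 and the name `samples_seeded` are arbitrary choices for illustration and are not part of the original notebook:

```python
import numpy as np

# Seeded generator: the same seed always yields the same sequence.
# The seed (42) is an arbitrary illustrative choice.
rng = np.random.default_rng(42)

# Same distribution parameters as the cell above
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

print(samples_seeded.shape)
print(samples_seeded[:5])
```

Re-running this cell prints the same five values each time, which makes the statistics in the following cells repeatable.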

Compute the **mean**, **median**, and **mode**

In [3]:
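from scipy import stats

mean = samples.mean()
median = np.median(samples)
# With continuous data every value is usually unique, so stats.mode()
# simply returns the smallest value with a count of 1.
mode = stats.mode(samples)

print("Mean: ", mean)
print("Median: ", median)
print("Mode: ", mode)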

Mean:  99.98403329900586
Median:  99.81279058935331
Mode:  ModeResult(mode=array([53.70470234]), count=array([1]))
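
A count of 1 means no value repeats, so the reported mode is just the smallest sample. One possible workaround, sketched here as an assumption rather than as part of the exercise, is to round the values into bins before counting. The array `values` below is hypothetical toy data, not the notebook's samples:

```python
import numpy as np

# Hypothetical toy data standing in for continuous measurements
values = np.array([99.6, 100.2, 100.4, 87.1, 112.9, 99.9])

# Round to the nearest integer so nearby values fall into the same bin
rounded = np.round(values)

# Count how often each rounded value occurs and pick the most frequent
uniques, counts = np.unique(rounded, return_counts=True)
binned_mode = uniques[counts.argmax()]
binned_count = counts.max()

print("Binned mode:", binned_mode, "count:", binned_count)
```

The bin width (here, whole numbers) is a modelling choice; a different width can give a different mode.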


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [4]:
# Use names that do not shadow the built-in min() and max() functions
min_val = samples.min()
max_val = samples.max()
q1 = np.quantile(samples, 0.25)
q3 = np.quantile(samples, 0.75)
iqr = q3 - q1

print("Min: ", min_val)
print("Max: ", max_val)
print("Q1: ", q1)
print("Q3: ", q3)
print("iqr ", iqr)

Min:  53.70470233863507
Max:  142.7209700790326
Q1:  90.43945419100623
Q3:  109.86848692586221
iqr  19.42903273485598
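
As a cross-check, SciPy provides `stats.iqr()`, which should agree with the manual `q3 - q1` calculation when both use their default (linear) interpolation. This is a minimal sketch on freshly generated data; the seed and the names `data`, `iqr_manual`, and `iqr_scipy` are illustrative assumptions:

```python
import numpy as np
from scipy import stats

# Illustrative data with the same parameters as above (arbitrary seed)
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)

# Manual IQR from the two quartiles
q1, q3 = np.quantile(data, [0.25, 0.75])
iqr_manual = q3 - q1

# Same quantity computed by SciPy in one call
iqr_scipy = stats.iqr(data)

print("Manual IQR:", iqr_manual)
print("SciPy IQR: ", iqr_scipy)
```

For a normal distribution the IQR is about 1.349 standard deviations, so a value near 20 is expected for a standard deviation of 15.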


Compute the **variance** and **standard deviation**

In [23]:
stats.tvar(samples)

219.56734332306536

In [5]:
df1 = pd.DataFrame(samples)

variance = df1.var()

st_dev = df1.std()

print("Variance:", variance)
print("Standard Deviation:", st_dev)

Variance: 0    219.567343
dtype: float64
Standard Deviation: 0    14.817805
dtype: float64
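
`stats.tvar()` and pandas' `var()` agree because both divide by n - 1 (the sample variance). NumPy's `np.var()` divides by n by default, so it needs `ddof=1` to match. A short sketch of the difference, using illustrative data and variable names that are not part of the original notebook:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative data (arbitrary seed)
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)
n = data.size

pop_var = np.var(data)             # divides by n (ddof=0, NumPy default)
sample_var = np.var(data, ddof=1)  # divides by n - 1

print("np.var (ddof=0):", pop_var)
print("np.var (ddof=1):", sample_var)
print("stats.tvar:     ", stats.tvar(data))
print("Series.var:     ", pd.Series(data).var())
```

Using a `pd.Series` instead of a one-column `DataFrame` also returns a plain float rather than a Series indexed by column name.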


Compute the **skewness** and **kurtosis**

In [24]:
stats.skew(samples)

-0.058200927324763375

In [25]:
stats.kurtosis(samples)

0.059544494113888735

In [27]:
skewness = df1.skew()
kurtosis = df1.kurtosis()

print("Skewness:", skewness)
print("Kurtosis:", kurtosis)

Skewness: 0   -0.058288
dtype: float64
Kurtosis: 0    0.065867
dtype: float64
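
The SciPy and pandas results differ slightly because `stats.skew()` and `stats.kurtosis()` use the biased estimators by default, while pandas applies a sample-size correction. Passing `bias=False` to the SciPy functions should reproduce the pandas values. A minimal sketch on illustrative data (the seed and variable names are assumptions):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Illustrative data (arbitrary seed)
rng = np.random.default_rng(0)
data = rng.normal(loc=100, scale=15, size=1000)
series = pd.Series(data)

# bias=False switches SciPy to the bias-corrected estimators
skew_scipy = stats.skew(data, bias=False)
kurt_scipy = stats.kurtosis(data, bias=False)  # excess kurtosis (Fisher)

print("Skewness:", skew_scipy, series.skew())
print("Kurtosis:", kurt_scipy, series.kurt())
```

Both libraries report excess kurtosis here, so a normal distribution gives values near 0 rather than 3.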


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [8]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [9]:
y = np.array([1,2,3,4,5,6,7,8,9,10])
y

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

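Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y. `np.corrcoef()` returns the full 2×2 correlation matrix; the off-diagonal entries hold the coefficient between x and y.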

In [10]:
r = np.corrcoef(x, y)
r

array([[1., 1.],
       [1., 1.]])
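
Every entry is 1 because this particular y is an exact linear function of x (y = x - 9). To pull out the single coefficient rather than the whole matrix, index an off-diagonal element. A small sketch reusing the arrays above; the names `corr_matrix` and `r_xy` are illustrative:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# corr_matrix[i, j] is the correlation between input i and input j
corr_matrix = np.corrcoef(x, y)

# Off-diagonal element: correlation between x and y
r_xy = corr_matrix[0, 1]

print(corr_matrix.shape)
print(r_xy)
```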

## Pandas Correlation Calculation

Run the code below.

In [16]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [17]:
r, p = stats.pearsonr(x, y)
r  

0.7586402890911869

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [20]:
rho = stats.spearmanr(x, y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
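
The two cells above use SciPy functions. Since x and y are pandas Series, the same coefficients can also be obtained with the `Series.corr()` method, which may be closer to what this section's title has in mind. A minimal sketch; the names `r_pandas` and `rho_pandas` are illustrative:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Pearson is the default method of Series.corr()
r_pandas = x.corr(y)

# Spearman's rho via the same method
rho_pandas = x.corr(y, method="spearman")

print("Pearson r:   ", r_pandas)
print("Spearman rho:", rho_pandas)
```

Unlike `stats.pearsonr()` and `stats.spearmanr()`, `Series.corr()` returns only the coefficient and no p-value.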

## Seaborn Dataset Tips

Import the Seaborn library

In [28]:
import seaborn as sns

Load the "tips" dataset from Seaborn

In [29]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion

In [30]:
tips.describe()

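|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |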


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [31]:
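# numeric_only=True skips the categorical columns (sex, smoker, day, time);
# without it, pandas 2.0+ raises an error on non-numeric data
tips.corr(method="pearson", numeric_only=True)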

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
