## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [2]:
import numpy as np
import pandas as pd
import scipy as sp
from scipy import stats

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [None]:
samples = np.random.normal(loc=100, scale=15, size=(1000))
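
Because the draw above is unseeded, every run produces different statistics. A minimal sketch of a reproducible variant, assuming NumPy's `Generator` API is acceptable here; the seed value 42 is an arbitrary choice, not something the exercise specifies:

```python
import numpy as np

# Seeded generator: the same seed yields the same samples on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary, assumed seed
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)
print(samples.mean(), samples.std())
```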


Compute the **mean**, **median**, and **mode**

In [None]:
mean = samples.mean()
median = np.median(samples)  # NumPy arrays have no .median() method
mode = sp.stats.mode(samples)
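
For continuous draws like these, nearly every value is unique, so the mode of the raw samples carries little information. One possible workaround, sketched here as an assumption rather than as part of the exercise, is to round the samples to whole numbers first and take the most frequent rounded value:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
samples = rng.normal(loc=100, scale=15, size=1000)

# Round to integers so that repeated values actually occur.
rounded = np.round(samples).astype(int)

# np.unique returns the sorted distinct values and how often each appears.
values, counts = np.unique(rounded, return_counts=True)
rounded_mode = values[counts.argmax()]

print(rounded_mode, counts.max())
```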

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
minimum = samples.min()  # named to avoid shadowing the built-in min()
maximum = samples.max()  # named to avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1
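
Both quartiles can be requested in a single call, and SciPy offers `stats.iqr()` as a cross-check. A short sketch, using freshly generated samples so that it runs on its own (the seed is an arbitrary assumption):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=100, scale=15, size=1000)

# Passing a list of percentiles returns one value per entry.
q1, q3 = np.percentile(samples, [25, 75])
iqr = q3 - q1

# scipy.stats.iqr computes the same 75th-minus-25th percentile range.
print(iqr, stats.iqr(samples))
```

For a normal distribution with standard deviation 15, the IQR should land near 1.349 × 15 ≈ 20.2.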

Compute the **variance** and **standard deviation**

In [None]:
variance = samples.var()
std_dev = samples.std()
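
Note that `var()` and `std()` on a NumPy array default to `ddof=0`, meaning they divide by n (the population formulas). If the 1,000 values are treated as a sample from a larger population, the n - 1 versions may be preferred. A sketch of the difference, with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
samples = rng.normal(loc=100, scale=15, size=1000)

pop_var = samples.var()           # divides by n (ddof=0, the default)
sample_var = samples.var(ddof=1)  # divides by n - 1 (Bessel's correction)
sample_std = samples.std(ddof=1)

print(pop_var, sample_var, sample_std)
```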

Compute the **skewness** and **kurtosis**

In [None]:
skewness = sp.stats.skew(samples)
kurtosis = sp.stats.kurtosis(samples)
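
`stats.kurtosis()` returns excess (Fisher) kurtosis by default, so normally distributed data scores near 0 rather than 3. The sketch below shows both conventions side by side; the seed is an arbitrary assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
samples = rng.normal(loc=100, scale=15, size=1000)

excess = stats.kurtosis(samples)                 # Fisher definition: normal -> about 0
pearson = stats.kurtosis(samples, fisher=False)  # Pearson definition: normal -> about 3
skewness = stats.skew(samples)                   # symmetric data -> about 0

print(skewness, excess, pearson)
```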

## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [None]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

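Then create a second array y containing 10 arbitrary integers. The cell below draws them with `np.random.randint()`, so the values change on each run; passing a hand-written list to `np.array()` works equally well.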

In [None]:
y = np.random.randint(10,30,10)
y


array([24, 28, 19, 25, 23, 23, 25, 25, 12, 28])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [None]:
r = np.corrcoef(x,y)
r

array([[ 1.        , -0.17923408],
       [-0.17923408,  1.        ]])
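
`np.corrcoef()` returns the full 2×2 correlation matrix; the coefficient between x and y sits in the off-diagonal entries. A sketch that reuses the y values printed above, hard-coded so the result is repeatable:

```python
import numpy as np

x = np.arange(10, 20)
# y values copied from the output shown above
y = np.array([24, 28, 19, 25, 23, 23, 25, 25, 12, 28])

r_matrix = np.corrcoef(x, y)
r_xy = r_matrix[0, 1]  # same value as r_matrix[1, 0]

print(r_xy)
```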

## Pandas Correlation Calculation

Run the code below

In [None]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [None]:
r = sp.stats.pearsonr(x,y)
r

(0.758640289091187, 0.010964341301680813)
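
The call above goes through SciPy, which also returns a p-value as the second element. If only the coefficient is needed, pandas has its own `Series.corr()` method. A minimal sketch with the same data:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Series.corr defaults to method="pearson" and returns a single float.
r = x.corr(y)
print(r)
```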

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [None]:
rho = sp.stats.spearmanr(x,y)
rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
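
Spearman's rho is Pearson's r computed on the ranks of the data, which is why it is much higher here: y increases almost monotonically with x, but not linearly. A sketch of that equivalence in pandas:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Rank each series, then take the ordinary Pearson correlation of the ranks.
rho = x.rank().corr(y.rank())
print(rho)
```

Calling `x.corr(y, method="spearman")` should give the same value.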

## Seaborn Dataset Tips

Import Seaborn Library

In [3]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [4]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion

In [5]:
tips.describe()

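|       | total_bill | tip      | size     |
|-------|------------|----------|----------|
| count | 244.0      | 244.0    | 244.0    |
| mean  | 19.785943  | 2.998279 | 2.569672 |
| std   | 8.902412   | 1.383638 | 0.9511   |
| min   | 3.07       | 1.0      | 1.0      |
| 25%   | 13.3475    | 2.0      | 2.0      |
| 50%   | 17.795     | 2.9      | 2.0      |
| 75%   | 24.1275    | 3.5625   | 3.0      |
| max   | 50.81      | 10.0     | 6.0      |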


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [6]:
tips.corr(numeric_only=True)  # pandas 2.x raises an error on the categorical columns otherwise

|            | total_bill | tip      | size     |
|------------|------------|----------|----------|
| total_bill | 1.0        | 0.675734 | 0.598315 |
| tip        | 0.675734   | 1.0      | 0.489299 |
| size       | 0.598315   | 0.489299 | 1.0      |
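
The tips dataset also contains categorical columns (such as `sex` and `day`), which is why only three columns appear in the matrix. A sketch of restricting a frame to its numeric columns before correlating; the small frame below is made-up illustrative data, not rows from the real dataset:

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the tips dataset.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.5, 3.0, 4.0, 7.0],
    "size": [1, 2, 2, 4],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr()
print(corr_matrix)
```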
