## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np
from scipy import stats
import pandas as pd

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`

In [2]:
samples = np.random.normal(loc=100, scale=15, size=1000)  # equivalent to np.random.normal(100, 15, 1000)
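
Because `np.random.normal()` draws from NumPy's global random state, the samples (and every statistic computed from them) change on each run. A minimal sketch of a reproducible alternative, assuming NumPy's `Generator` API is acceptable here; the seed value 42 and the names `rng`, `seeded_samples`, and `repeat_samples` are arbitrary choices, not part of the original notebook:

```python
import numpy as np

# Seeded generator: the same seed reproduces the same draws (42 is arbitrary)
rng = np.random.default_rng(42)
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# A second generator built from the same seed yields an identical array
repeat_samples = np.random.default_rng(42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, repeat_samples))
```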

Compute the **mean**, **median**, and **mode**

In [3]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
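
For continuous samples like these, almost every value is unique, so the mode is rarely informative on its own. One option is to round or bin the values first. The sketch below uses a small hand-made array (`values`, hypothetical data) and `np.unique` instead of `stats.mode`, so it does not depend on how a given SciPy version shapes its result:

```python
import numpy as np

values = np.array([98.2, 101.7, 99.6, 100.4, 100.1, 115.3, 84.9])  # hypothetical data
rounded = np.round(values).astype(int)  # 98, 102, 100, 100, 100, 115, 85

# Count occurrences of each distinct value and pick the most frequent one
uniques, counts = np.unique(rounded, return_counts=True)
mode_value = uniques[np.argmax(counts)]
mode_count = counts.max()
print(mode_value, mode_count)  # 100 3
```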

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [4]:
min_val = np.min(samples)  # named min_val/max_val to avoid shadowing the built-in min() and max()
max_val = np.max(samples)
# `method` replaces the `interpolation` keyword, which is deprecated since NumPy 1.22
q1 = np.percentile(samples, 25, method='midpoint')
q3 = np.percentile(samples, 75, method='midpoint')
iqr = q3 - q1
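
To see what the quartile calls return, here is a small sketch on hypothetical data (`data` and the `_demo` names are illustrative). It relies on the default linear interpolation; with `'midpoint'` the quartiles would instead be the average of the two neighbouring values (2.5 and 6.5 for this array):

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # hypothetical data

# Passing a list of percentiles returns both quartiles in one call
q1_demo, q3_demo = np.percentile(data, [25, 75])  # default linear interpolation
iqr_demo = q3_demo - q1_demo
print(q1_demo, q3_demo, iqr_demo)  # 2.75 6.25 3.5
```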

Compute the **variance** and **standard deviation**

In [5]:
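variance = np.var(samples)  # population variance (ddof=0); pass ddof=1 for the sample variance
std_dev = np.std(samples)   # population standard deviation (ddof=0)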
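
`np.var` and `np.std` divide by n by default, which gives the population versions. A short sketch of the difference from the sample versions (dividing by n - 1), using hypothetical data chosen so the numbers are easy to check by hand:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data, mean = 5

pop_var = np.var(data)             # sum of squared deviations (32) / n = 4.0
sample_var = np.var(data, ddof=1)  # 32 / (n - 1) = 32 / 7
pop_std = np.std(data)             # sqrt(4.0) = 2.0
sample_std = np.std(data, ddof=1)  # sqrt(32 / 7)
```

With 1,000 samples the two versions differ only slightly, but the gap matters for small data sets.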

Compute the **skewness** and **kurtosis**

In [6]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)  # excess (Fisher) kurtosis by default: about 0 for normal data
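
As a rough guide to what these two numbers measure, the sketch below computes skewness and excess kurtosis directly from central moments with NumPy. It is meant to mirror the default (biased, Fisher) definitions that `stats.skew` and `stats.kurtosis` use, as far as I understand them; `data` and the `_demo` names are hypothetical:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical, symmetric data

deviations = data - data.mean()
m2 = np.mean(deviations**2)  # second central moment (population variance)
m3 = np.mean(deviations**3)  # third central moment
m4 = np.mean(deviations**4)  # fourth central moment

skew_demo = m3 / m2**1.5               # 0 for symmetric data
excess_kurtosis_demo = m4 / m2**2 - 3  # subtracting 3 makes a normal distribution score 0
```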

## NumPy Correlation Calculation

Use `np.arange()` to create an array `x` of integers from 10 (inclusive) to 20 (exclusive).

In [7]:
x = np.arange(10, 20)

Then use `np.array()` to create a second array `y` containing 10 arbitrary integers.

In [8]:
y = np.array(np.random.randint(10, size=10))

Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y`.

In [9]:
r = np.corrcoef(x, y)[0, 1]  # corrcoef returns a 2x2 matrix; [0, 1] is the x-y coefficient
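
`np.corrcoef` returns the full correlation matrix rather than a single number, so the coefficient between the two arrays sits in an off-diagonal entry. A small sketch with a deliberately linear `y_demo` (hypothetical data) so the expected value is known:

```python
import numpy as np

x_demo = np.arange(10, 20)
y_demo = 2 * x_demo + 1  # hypothetical, perfectly linear in x_demo

corr_matrix = np.corrcoef(x_demo, y_demo)  # shape (2, 2), ones on the diagonal
r_demo = corr_matrix[0, 1]                 # same value as corr_matrix[1, 0]
```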

## Pandas Correlation Calculation

Run the code below

In [10]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [11]:
r = x.corr(y)
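
Pearson's r is the covariance of the two series divided by the product of their standard deviations. The sketch below checks that relationship on the same data; the `_demo` and `r_manual` names are illustrative only:

```python
import pandas as pd

x_demo = pd.Series(range(10, 20))
y_demo = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_builtin = x_demo.corr(y_demo)  # Pearson is the default method

# cov() and std() both use n - 1, so the normalisation cancels out
r_manual = x_demo.cov(y_demo) / (x_demo.std() * y_demo.std())
```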

Optional: call the relevant method to calculate Spearman's rho correlation.

In [12]:
rho = x.corr(y, method='spearman')
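
Spearman's rho is Pearson's r applied to the ranks of the values, which is why it is less sensitive to the large values near the end of `y`. A sketch that computes it from ranks and from the no-ties shortcut formula, on the same data (the `_demo`, `rho_from_ranks`, and `rho_formula` names are illustrative):

```python
import pandas as pd

x_demo = pd.Series(range(10, 20))
y_demo = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Pearson's r on the ranks
rho_from_ranks = x_demo.rank().corr(y_demo.rank())

# With no tied values: rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d = x_demo.rank() - y_demo.rank()
n = len(x_demo)
rho_formula = 1 - 6 * (d**2).sum() / (n * (n**2 - 1))
```

Both values should agree with the result of `x.corr(y, method='spearman')`.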

## Seaborn Dataset Tips

Import the Seaborn library

In [13]:
import seaborn as sns

Load the "tips" dataset from Seaborn

In [14]:
tips = sns.load_dataset("tips")

In [15]:
tips.head()

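| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |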

Compute the pairwise correlation of the numeric columns, then generate descriptive statistics that summarize central tendency and dispersion.

In [16]:
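tips.corr(numeric_only=True)  # restrict to numeric columns; pandas >= 2.0 raises on the categorical ones otherwise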

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
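
The correlation matrix only covers the numeric columns. On recent pandas versions, `DataFrame.corr` no longer drops non-numeric columns silently, so one portable approach is to select the numeric columns explicitly first. A sketch on a small frame built from a few of the rows shown above (`df` and `numeric_corr` are illustrative names, and this is one possible approach rather than the only one):

```python
import pandas as pd

df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "day": ["Sun", "Sun", "Sun", "Sun", "Sun"],  # non-numeric, excluded below
})

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric_corr = df.select_dtypes(include="number").corr()
print(numeric_corr)
```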


In [17]:
tips.total_bill.describe()

count    244.000000
mean      19.785943
std        8.902412
min        3.070000
25%       13.347500
50%       17.795000
75%       24.127500
max       50.810000
Name: total_bill, dtype: float64

In [18]:
tips.tip.describe()

count    244.000000
mean       2.998279
std        1.383638
min        1.000000
25%        2.000000
50%        2.900000
75%        3.562500
max       10.000000
Name: tip, dtype: float64
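
The result of `describe()` is itself a Series indexed by statistic name, so individual values can be pulled out and combined. A small sketch on hypothetical data (`s`, `summary`, and `iqr_demo` are illustrative names); note that the `std` entry is the sample standard deviation (n - 1), unlike the default of `np.std`:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])  # hypothetical data
summary = s.describe()

mean_demo = summary["mean"]                 # 3.0
iqr_demo = summary["75%"] - summary["25%"]  # 4.0 - 2.0 = 2.0
```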

Call the relevant method to calculate Pearson's r correlation between the `total_bill` and `tip` columns

In [19]:
r_tips = tips.total_bill.corr(tips.tip)
r_tips

0.6757341092113641