## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np
import pandas as pd
from scipy import stats

Randomly generate 1,000 samples from a normal distribution with mean 100 and standard deviation 15 using `np.random.normal()`

In [2]:
samples = np.random.normal(loc=100, scale=15, size=1000)
samples

array([ 96.56283237,  77.39712795,  98.13360988,  91.86042772,
        96.32588417,  92.2887569 ,  89.15013017, 124.42036666,
       107.5436116 , 105.01522323,  70.76930573, 105.87155104,
       109.16573365,  86.70395345,  70.689637  , 103.95597431,
       103.89717537, 123.56440378, 112.45923426, 100.04968952,
       107.8384563 ,  86.39330979,  97.05317662, 100.63796476,
       104.38666843, 108.40219536, 105.39240382,  92.19658789,
        80.5623928 ,  97.96554589, 106.74447273,  97.95551881,
        95.19860094,  71.87045879, 106.63932832,  90.45116367,
        99.87914073, 113.54100596,  75.36882421,  87.69654321,
       125.37074873,  94.84455819,  81.68105981, 113.04960231,
       131.75985377,  97.10216472,  94.62951314, 106.54795399,
       128.58940498,  76.73781055,  87.119554  , 127.81383796,
       116.12252175, 115.15262062, 103.55021981,  86.73660892,
        96.8916886 ,  85.34305614,  91.35964276,  98.63029405,
       103.33571816,  90.72739473,  91.8375861 ,  97.24
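
The values above change on every run because the generator is unseeded. A minimal sketch of a reproducible draw, assuming NumPy's `Generator` API is an acceptable stand-in for `np.random.normal()`; the seed 42 and the names `rng`, `samples_seeded`, and `samples_again` are arbitrary choices that do not appear in the original notebook:

```python
import numpy as np

# Seeded generator: the same seed always yields the same samples.
rng = np.random.default_rng(seed=42)  # seed 42 is an arbitrary, hypothetical choice
samples_seeded = rng.normal(loc=100, scale=15, size=1000)

# A second generator built from the same seed reproduces the draw exactly.
rng_again = np.random.default_rng(seed=42)
samples_again = rng_again.normal(loc=100, scale=15, size=1000)

print(np.array_equal(samples_seeded, samples_again))  # True
```

Seeding this way would not reproduce the specific numbers shown in this notebook, since those came from an unseeded run.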

Compute the **mean**, **median**, and **mode**

In [3]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)

In [4]:
mean

99.25041469240325

In [5]:
median

99.12226371057167

In [6]:
mode

ModeResult(mode=array([51.57253376]), count=array([1]))
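
A count of 1 means no value repeats, so for continuous data the reported mode is simply the smallest sample (note that it equals the minimum computed further down). One possible workaround, sketched here under the assumption that rounding to whole numbers is an acceptable way to bin the data, is to take the most frequent rounded value. The seed and the names `values`, `rounded`, and `binned_mode` are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
values = rng.normal(loc=100, scale=15, size=1000)

# Round to whole numbers so that repeated values can occur.
rounded = np.round(values).astype(int)

# Count how often each distinct rounded value appears.
unique_vals, counts = np.unique(rounded, return_counts=True)

# The binned mode is the rounded value with the highest count.
binned_mode = unique_vals[np.argmax(counts)]
mode_count = counts.max()

print(binned_mode, mode_count)
```

The result depends on the bin width, so it is best read as a rough indication of where the data peaks.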

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [7]:
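min_value = np.min(samples)  # named min_value to avoid shadowing the built-in min()
max_value = np.max(samples)  # named max_value to avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1

In [8]:
min_value

51.572533759495236

In [9]:
max_value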

147.05317818665128

In [10]:
q1

89.3096141199515

In [11]:
q3

109.3600663103575

In [12]:
iqr

20.05045219040599
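
As a cross-check, both quartiles can be requested in a single `np.percentile()` call, and SciPy offers `stats.iqr()` for the interquartile range. A small sketch on freshly generated data; the seed and the `_demo` variable names are assumptions made for this example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
values = rng.normal(loc=100, scale=15, size=1000)

# Passing a list of percentiles returns both quartiles at once.
q1_demo, q3_demo = np.percentile(values, [25, 75])
iqr_manual = q3_demo - q1_demo

# stats.iqr() computes the 75th minus the 25th percentile by default.
iqr_scipy = stats.iqr(values)

print(np.isclose(iqr_manual, iqr_scipy))  # True
```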

Compute the **variance** and **standard deviation**

In [13]:
variance = np.var(samples)  # population variance (ddof=0 by default)
std_dev = np.std(samples)   # population standard deviation (ddof=0 by default)

In [14]:
variance

218.62082048372866

In [15]:
std_dev

14.785831748120518
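
`np.var()` and `np.std()` divide by n unless told otherwise. If the data are treated as a sample from a larger population, passing `ddof=1` divides by n - 1 instead. A short sketch on a small made-up array (the numbers in `data` are hypothetical and chosen so the arithmetic is easy to check by hand):

```python
import numpy as np

# Made-up data with mean 5 and squared deviations summing to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(data)             # 32 / 8 = 4.0
pop_std = np.std(data)             # sqrt(4.0) = 2.0
sample_var = np.var(data, ddof=1)  # 32 / 7
sample_std = np.std(data, ddof=1)  # sqrt(32 / 7)

print(pop_var, pop_std)
print(sample_var, sample_std)
```

With 1,000 samples the two versions differ only slightly, but the gap grows as the sample gets smaller.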

Compute the **skewness** and **kurtosis**

In [16]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)
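
The cell above produces no output. When the values are displayed, keep in mind that `stats.kurtosis()` returns excess (Fisher) kurtosis by default, which is about 0 for normal data, while `fisher=False` gives the Pearson definition, which is about 3. A sketch on freshly generated data; the seed and variable names are assumptions for this example:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
values = rng.normal(loc=100, scale=15, size=1000)

skewness_demo = stats.skew(values)                   # near 0 for symmetric data
excess_kurt = stats.kurtosis(values)                 # Fisher definition: normal -> 0
pearson_kurt = stats.kurtosis(values, fisher=False)  # Pearson definition: normal -> 3

# The two kurtosis definitions differ by exactly 3.
print(np.isclose(pearson_kurt - excess_kurt, 3.0))  # True
```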

## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`

In [17]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [18]:
y = np.array([1, 3, 4, 4, 6, 8, 9, 10, 7, 7])
y

array([ 1,  3,  4,  4,  6,  8,  9, 10,  7,  7])

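Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y`. `np.corrcoef()` returns the full correlation matrix, so Pearson's r is the off-diagonal entry.

In [19]:
r = np.corrcoef(x, y)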
r

array([[1.        , 0.83170436],
       [0.83170436, 1.        ]])
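
To get the coefficient as a single number, index the matrix at `[0, 1]`. The sketch below also recomputes r from the textbook definition (covariance divided by the product of the standard deviations) as a sanity check; the names `r_xy`, `dx`, `dy`, and `r_manual` are introduced here for illustration:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 3, 4, 4, 6, 8, 9, 10, 7, 7])

# Off-diagonal entry of the 2x2 matrix is the correlation between x and y.
r_xy = np.corrcoef(x, y)[0, 1]

# Manual Pearson's r: sum of products of deviations over the
# square root of the product of the sums of squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(np.isclose(r_xy, r_manual))  # True
```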

## Pandas Correlation Calculation

Run the code below to create two Pandas Series, `x` and `y`

In [20]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [21]:
r = x.corr(y)
r

0.7586402890911867

In [22]:
b = y.corr(x)
b

0.7586402890911866

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [23]:
rho = x.corr(y, method='spearman')
rho

0.9757575757575757
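
Spearman's rho is higher than Pearson's r here because `y` increases almost monotonically with `x` but not linearly. Rho can be understood as Pearson's r computed on the ranks of the data, which the following sketch checks; the names `rho_builtin` and `rho_from_ranks` are introduced for this example:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho_builtin = x.corr(y, method="spearman")

# Equivalent: rank both series, then compute the default (Pearson) correlation.
rho_from_ranks = x.rank().corr(y.rank())

print(abs(rho_builtin - rho_from_ranks) < 1e-12)  # True
```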

## Seaborn Dataset Tips

Import the Seaborn library

In [24]:
import seaborn as sns

Load the "tips" dataset from Seaborn

In [26]:
tips = sns.load_dataset("tips")
tips

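| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |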


Generate descriptive statistics, including those that summarize central tendency and dispersion

In [27]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Calculate Pearson's r correlation between the `total_bill` and `tip` columns. Here `np.corrcoef()` is applied to the two columns directly.

In [29]:
np.corrcoef(tips["total_bill"], tips["tip"])

array([[1.        , 0.67573411],
       [0.67573411, 1.        ]])
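
For pairwise correlations across all numeric columns at once, Pandas provides `DataFrame.corr()`. The sketch below uses a small made-up frame rather than the real tips data (loading the dataset needs a network connection), so the values in `df` are hypothetical and the resulting numbers are not those of the actual dataset:

```python
import pandas as pd

# Made-up rows that mimic the layout of the tips data.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.0, 3.0, 4.0, 6.0, 7.0],
    "size": [1, 2, 2, 3, 4],
    "day": ["Thur", "Fri", "Sat", "Sun", "Sun"],
})

# Keep only numeric columns, then compute pairwise Pearson's r.
corr_matrix = df.select_dtypes("number").corr()

print(corr_matrix)
```

Selecting the numeric columns first avoids errors from text columns such as `day`. On the real data, the same call on `tips` would be expected to include the `total_bill` and `tip` correlation shown above.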