## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [2]:
import numpy as np
import scipy as sp
import pandas as pd

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`

In [4]:
samples = np.random.normal(100, 15, size = 1000)
samples

array([105.22682645,  97.73588995,  97.53112924, 103.00488087,
        97.81679462,  82.54713554,  95.78682363, 119.4057207 ,
        79.33151708,  80.7431314 , 113.39222777,  85.31942357,
       102.98868983, 116.99567083, 102.07070427, 104.24708699,
        96.30340483,  96.32965921,  78.52573934, 113.80366073,
       119.93796206,  82.30474339,  85.32754007,  80.07160093,
        92.30725091,  69.37976414,  87.96058171, 115.1883286 ,
       131.83209936, 102.73918617,  98.93103452, 120.45554653,
        98.1517433 , 124.88618882, 116.17998901, 112.48120657,
       101.10694173,  71.23166891, 111.46077764, 122.83994827,
       104.83411758,  90.61919399, 106.79793101,  94.89223816,
        80.95224308, 103.44459217, 111.36224988, 108.38602804,
       100.89113368, 105.50699842,  88.52155613, 122.42608668,
        91.97543708,  61.89323167, 119.75274992, 107.32286996,
       105.37679318, 106.74639012,  76.2907163 , 129.20154665,
       115.51098913,  95.63275565,  72.39755492,  99.92
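Because the samples are drawn at random, the numbers shown above change on every run. The following is a minimal sketch of a reproducible draw, assuming NumPy's `Generator` interface (`np.random.default_rng`) instead of the legacy `np.random.normal()` call used in the cell. The seed value 42 and the variable names are arbitrary choices for illustration, not something the notebook used.

```python
import numpy as np

# Seed 42 is an arbitrary, illustrative choice.
rng = np.random.default_rng(42)

# Same distribution as above: mean 100, standard deviation 15, 1,000 draws.
reproducible_samples = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed yields identical values.
rng_again = np.random.default_rng(42)
same_samples = rng_again.normal(loc=100, scale=15, size=1000)
print(np.array_equal(reproducible_samples, same_samples))  # True
```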

Compute the **mean**, **median**, and **mode**

In [11]:
from scipy import stats

mean = samples.mean()
median = np.median(samples)
mode = stats.mode(samples)
print(mean, "\n", median, "\n", mode)

99.82045389313572 
 99.53605835597276 
 ModeResult(mode=array([51.30695786]), count=array([1]))
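The reported mode has a count of 1 because values drawn from a continuous distribution almost never repeat, so `stats.mode()` falls back to the smallest value (it matches the minimum computed in the next cell). One possible workaround, sketched here as an assumption rather than as part of the original exercise, is to round the values into bins first and then take the most frequent bin. The `values` array is made up for illustration.

```python
import numpy as np

# Hypothetical continuous measurements; no exact value repeats.
values = np.array([99.6, 100.2, 100.4, 101.7, 99.8, 103.1, 100.1])

# Round to the nearest integer so that nearby values share a bin.
rounded = np.round(values).astype(int)

# np.unique returns the sorted distinct values and how often each occurs.
distinct, counts = np.unique(rounded, return_counts=True)
binned_mode = distinct[np.argmax(counts)]
print(binned_mode, counts.max())  # 100 5
```

The bin width (here, 1 unit) changes the answer, so it should be chosen to suit the data.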


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [17]:
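# Avoid the names min and max so the Python built-ins are not shadowed.
min_value = samples.min()
max_value = samples.max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1
print(min_value, "\n", max_value, "\n", iqr)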

51.306957857066394 
 162.87612580218797 
 19.014085664104684
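A common use of Q1, Q3, and the interquartile range is flagging potential outliers with the 1.5 × IQR rule. The sketch below is an illustrative addition, not part of the original exercise. It uses a small made-up array so the arithmetic is easy to check by hand, and the names `lower_fence` and `upper_fence` are hypothetical.

```python
import numpy as np

# Made-up data with one obviously extreme value.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1 = np.percentile(data, 25)  # 3.25 with the default linear interpolation
q3 = np.percentile(data, 75)  # 7.75
iqr = q3 - q1                 # 4.5

# Values beyond 1.5 * IQR from the quartiles are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]
print(q1, q3, iqr, outliers)
```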


Compute the **variance** and **standard deviation**

In [19]:
# NumPy defaults to ddof=0, i.e. the population variance and standard deviation.
variance = samples.var()
std_dev = samples.std()
print(variance, "\n", std_dev)

224.20516724748197 
 14.973482135010613
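NumPy's `var()` and `std()` divide by `n` by default (the population formulas). The sketch below, using a small made-up array, shows how the `ddof` argument switches to the sample formulas that divide by `n - 1`.

```python
import numpy as np

# Made-up data: mean is 5, squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_var = data.var()      # ddof=0: 32 / 8 = 4.0
sample_var = data.var(ddof=1)    # ddof=1: 32 / 7

population_std = data.std()      # square root of the population variance
sample_std = data.std(ddof=1)    # square root of the sample variance
print(population_var, sample_var, population_std, sample_std)
```

With 1,000 samples the two versions differ only slightly, but the gap matters for small datasets. Note that pandas uses `ddof=1` by default, so `pd.Series(data).var()` returns the sample variance.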


Compute the **skewness** and **kurtosis**

In [20]:
skewness = stats.skew(samples)
# stats.kurtosis() returns excess (Fisher) kurtosis by default; a normal distribution gives 0.
kurtosis = stats.kurtosis(samples)
print(skewness, "\n", kurtosis)

0.09263468896813584 
 0.4366727701731956
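To make these two numbers less of a black box, the sketch below computes skewness and excess kurtosis directly from central moments, which is the formula the SciPy functions use with their default settings (biased estimator, Fisher definition). The helper `central_moment` and the small `data` array are hypothetical and exist only for this illustration.

```python
import numpy as np

def central_moment(data, k):
    """Return the k-th central moment: the mean of (x - mean) ** k."""
    return np.mean((data - data.mean()) ** k)

# Made-up data with a long right tail.
data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

m2 = central_moment(data, 2)  # population variance
m3 = central_moment(data, 3)
m4 = central_moment(data, 4)

skewness = m3 / m2 ** 1.5           # positive: right tail is longer
excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
print(skewness, excess_kurtosis)
```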


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [21]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [22]:
y = np.array(np.random.randint(20, size=10))
y

array([13, 19, 15, 14, 12, 14,  0,  0,  0, 18])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [23]:
r = np.corrcoef(x, y)
r

array([[ 1.       , -0.4936193],
       [-0.4936193,  1.       ]])
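`np.corrcoef()` returns a 2 × 2 matrix: the diagonal holds each array's correlation with itself (always 1), and the off-diagonal entries hold the correlation between x and y. The sketch below pulls out that single value and cross-checks it against the textbook Pearson formula. It reuses the x and y values shown in the outputs above; the names `r_value` and `r_manual` are illustrative.

```python
import numpy as np

x = np.arange(10, 20)
# The y values printed in the output above.
y = np.array([13, 19, 15, 14, 12, 14, 0, 0, 0, 18])

# The off-diagonal entry is the correlation between x and y.
r_value = np.corrcoef(x, y)[0, 1]

# Pearson's r by hand: covariance term over the product of the spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(r_value, r_manual)
```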

## Pandas Correlation Calculation

Run the code below

In [24]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [26]:
r = x.corr(y)
r

0.7586402890911867

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [28]:
rho = x.corr(y, method="spearman")
rho

0.9757575757575757
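Spearman's rho is much higher than Pearson's r here because y grows monotonically with x for all but the last pair, just not linearly. Spearman's rho is simply Pearson's r applied to the ranks of the data, which the sketch below illustrates using the same two Series. The variable names are illustrative.

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank (1 = smallest).
x_ranks = x.rank()
y_ranks = y.rank()

# Pearson's r on the ranks equals Spearman's rho on the original data.
rho_from_ranks = x_ranks.corr(y_ranks)
print(rho_from_ranks)
```

Ranking removes the effect of the extreme value 96, which is what pulls the Pearson coefficient down.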

## Seaborn Dataset Tips

Import the Seaborn library

In [32]:
import seaborn as sns

Load the "tips" dataset from Seaborn

In [33]:
tips = sns.load_dataset("tips")


Generate descriptive statistics, including those that summarize the central tendency and dispersion of the numeric columns

In [35]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


In [36]:
tips.head()

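| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |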


Call the relevant method to calculate pairwise Pearson's r correlation of columns

In [37]:
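# numeric_only=True skips the categorical columns (sex, smoker, day, time),
# which pandas 2.0 and later no longer drop automatically.
tips.corr(numeric_only=True)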

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
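The correlation matrix covers only `total_bill`, `tip`, and `size` because Pearson's r is defined for numeric data. As a minimal sketch, the numeric columns can also be selected explicitly with `select_dtypes()` before calling `corr()`. To stay self-contained and avoid downloading the dataset, the example below types in just the five rows shown by `tips.head()`, so its coefficients will not match the full-dataset values above. The names `sample_rows` and `numeric_columns` are illustrative.

```python
import pandas as pd

# The five rows shown by tips.head(), entered by hand.
sample_rows = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "smoker": ["No"] * 5,
    "day": ["Sun"] * 5,
    "time": ["Dinner"] * 5,
    "size": [2, 3, 3, 2, 4],
})

# Keep only the numeric columns, then compute pairwise Pearson correlations.
numeric_columns = sample_rows.select_dtypes(include="number")
correlations = numeric_columns.corr()
print(correlations.round(3))
```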
