## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**

In [2]:
import numpy as np
import pandas as pd
from scipy import stats

Draw 1,000 random samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`

In [3]:
samples = np.random.normal(100, 15, 1000)
samples

array([107.77553215, 108.03335547,  91.81959928, 120.3052975 ,
       122.21640471, 117.93606629,  96.45822422,  84.34880284,
       127.43651463, 124.32422736,  90.21362715, 102.56292792,
       109.72157691, 114.49833973, 103.78674705, 131.47714786,
        77.87034363, 117.20516795, 115.97492989, 108.40500827,
        77.70693774,  85.31256178,  98.22901382, 105.86391652,
        85.73153214,  79.00942106,  98.1793417 , 108.40963947,
       132.75774613, 102.48674119,  87.03964072, 100.43293146,
       104.0891533 , 107.16432727, 100.82613035, 103.96460029,
        97.19871291,  93.52549618, 114.42533446, 116.8708648 ,
        88.56600627, 108.57298479, 101.38554709, 121.10433979,
        94.56668141, 111.09502894,  87.61737376, 122.01888612,
       132.44816599, 128.54807744,  86.14001869, 126.53321661,
        75.11668672,  97.11604557,  94.88740049, 109.29109424,
       102.01795531,  94.59625813, 103.88966722, 107.22031405,
       126.79005585,  97.36927085, 100.12145792,  87.76
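Because `np.random.normal()` draws new values on every run, the numbers printed in this notebook will not repeat exactly. A minimal sketch of a reproducible draw, assuming NumPy's `default_rng` generator and an arbitrary seed of 42 (neither is used in the original cell):

```python
import numpy as np

# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)  # (1000,)
print(samples[:5])    # first five values; identical on every run with this seed
```

Re-creating the generator with the same seed yields the same array, which makes the statistics below comparable between runs.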

Compute the **mean**, **median**, and **mode**

In [4]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)

print("mean :", mean)
print("median :", median)
print("mode :", mode)

mean : 99.98095029811792
median : 100.46988057479228
mode : ModeResult(mode=array([54.02864581]), count=array([1]))
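The reported mode has a count of 1: with continuous data essentially every value is unique, so `stats.mode()` simply returns the smallest sample. One possible alternative, sketched here under the assumption that a binned estimate is acceptable (the seed and the choice of 20 bins are arbitrary), is to report the midpoint of the fullest histogram bin:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
samples = rng.normal(100, 15, 1000)

# With continuous data, every value typically occurs exactly once.
values, counts = np.unique(samples, return_counts=True)
print("largest count:", counts.max())

# Bin the data and report the midpoint of the fullest bin instead.
hist, edges = np.histogram(samples, bins=20)
fullest = np.argmax(hist)
modal_midpoint = (edges[fullest] + edges[fullest + 1]) / 2
print("modal bin midpoint:", modal_midpoint)
```

The midpoint depends on the bin count, so it is a rough estimate of where the data are densest rather than an exact statistic.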


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
min_value = np.min(samples)  # named min_value to avoid shadowing the built-in min()
max_value = np.max(samples)  # named max_value to avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = stats.iqr(samples)

print("min :", min_value)
print("max :", max_value)
print("q1 :", q1)
print("q3 :", q3)
print("iqr :", iqr)

min : 48.53730534288019
max : 149.62625068446053
q1 : 91.14731391634174
q3 : 111.63797972369434
iqr : 20.4906658073526
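The interquartile range is Q3 minus Q1, which the output above confirms (111.638 - 91.147 ≈ 20.491). A short sketch that checks this identity and, as one common follow-up not covered in the original, flags values outside the 1.5 × IQR fences; the seed and the fence multiplier are assumptions chosen for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
samples = rng.normal(100, 15, 1000)

q1, q3 = np.percentile(samples, [25, 75])
iqr = q3 - q1
print(np.isclose(iqr, stats.iqr(samples)))  # the two computations agree

# Tukey-style fences: points beyond them are often treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = samples[(samples < lower_fence) | (samples > upper_fence)]
print("number of flagged values:", outliers.size)
```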


Compute the **variance** and **standard deviation**

In [None]:
variance = np.var(samples)
std_dev = np.std(samples)

print("variance :", variance)
print("standard deviation :", std_dev)

variance : 228.7831900912204
standard deviation : 15.125580653026859
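Note that `np.var()` and `np.std()` default to `ddof=0`, the population formulas that divide by n, whereas pandas defaults to `ddof=1`, the sample formulas that divide by n - 1. A minimal sketch of the difference, using an arbitrary seed rather than the notebook's data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
samples = rng.normal(100, 15, 1000)
n = samples.size

population_var = np.var(samples)      # divides by n (ddof=0)
sample_var = np.var(samples, ddof=1)  # divides by n - 1

# The two estimates differ by the factor n / (n - 1).
print(np.isclose(sample_var, population_var * n / (n - 1)))

# pandas uses ddof=1 by default, so it matches the sample variance.
print(np.isclose(pd.Series(samples).var(), sample_var))
```

With 1,000 samples the gap is small, but it matters for short arrays.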


Compute the **skewness** and **kurtosis**

In [None]:
skewness = stats.skew(samples)
kurtosis = stats.kurtosis(samples)

print("skewness :", skewness)
print("kurtosis :", kurtosis)

skewness : -0.012121099683658536
kurtosis : 0.04399016109062437
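Both values are close to 0, as expected for normal data: `stats.kurtosis()` returns excess (Fisher) kurtosis by default, for which a normal distribution scores 0 rather than 3. The sketch below, on an arbitrary seeded sample, shows the relationship to Pearson kurtosis and recomputes skewness from central moments:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
samples = rng.normal(100, 15, 1000)

excess_kurtosis = stats.kurtosis(samples)                 # fisher=True by default
pearson_kurtosis = stats.kurtosis(samples, fisher=False)  # normal distribution -> 3
print(np.isclose(pearson_kurtosis, excess_kurtosis + 3))

# Skewness as the third central moment over the 1.5 power of the second.
deviations = samples - samples.mean()
m2 = np.mean(deviations ** 2)
m3 = np.mean(deviations ** 3)
print(np.isclose(stats.skew(samples), m3 / m2 ** 1.5))
```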


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [5]:
x = np.arange(10,20)
print(x)

[10 11 12 13 14 15 16 17 18 19]


Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [8]:
y = np.array([3, 5, 1, 6, 9, 2, 4, 8, 7, 10])
print(y)

[ 3  5  1  6  9  2  4  8  7 10]


Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [9]:
r = np.corrcoef(x,y)
print(r)

[[1.  0.6]
 [0.6 1. ]]
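`np.corrcoef()` returns the full correlation matrix: the diagonal holds each array's correlation with itself (1.0) and the off-diagonal entries hold r. A short sketch that extracts r and cross-checks it against the textbook formula, using the same `x` and `y` as above:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([3, 5, 1, 6, 9, 2, 4, 8, 7, 10])

r_matrix = np.corrcoef(x, y)
r = r_matrix[0, 1]  # off-diagonal entry is the x-y correlation

# Pearson's r: sum of deviation products over the root of the
# product of the summed squared deviations.
dx = x - x.mean()
dy = y - y.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(r, r_manual)
```

Both values come out at 0.6 up to floating-point rounding, matching the matrix printed above.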


## Pandas Correlation Calculation

Run the code below

In [10]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant function to calculate the Pearson correlation coefficient (r).

In [14]:
r = stats.pearsonr(x,y)
print(r)  # returns Pearson's correlation coefficient and the p-value

(0.758640289091187, 0.010964341301680813)
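The cell above uses SciPy. Since `x` and `y` are pandas Series, the same coefficient is also available through the `Series.corr()` method, which returns only r and no p-value. A minimal sketch with the same data:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Series.corr defaults to method="pearson".
r_pandas = x.corr(y)
print(r_pandas)
```

The result agrees with the first element of the SciPy output, roughly 0.7586.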


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [15]:
rho = stats.spearmanr(x,y)
print(rho)  # returns Spearman's correlation coefficient and the p-value

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
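Spearman's rho is Pearson's r computed on the ranks of the data, which explains why it is higher here (about 0.976 versus 0.759): `y` rises almost monotonically with `x`, but not linearly. A sketch of that equivalence, reusing the same Series:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Replace each value by its rank, then take the ordinary Pearson correlation.
rho_from_ranks = x.rank().corr(y.rank())
print(rho_from_ranks)
```

Only two pairs of neighbouring `y` values are out of order, so the rank correlation stays close to 1.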


## Seaborn Dataset Tips

Import the Seaborn library

In [16]:
import seaborn as sns

Load the "tips" dataset from Seaborn

In [17]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion

In [18]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |
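By default `describe()` summarizes only the numeric columns, which is why the non-numeric columns of the dataset are absent from the table. A sketch of how `include="all"` changes this, using a small hypothetical DataFrame (the column names and values are made up for illustration and are not the tips data):

```python
import pandas as pd

# Hypothetical toy data, not taken from the tips dataset.
toy = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0],
    "day": ["Sat", "Sun", "Sat", "Sat"],
})

numeric_summary = toy.describe()           # numeric columns only
full_summary = toy.describe(include="all") # adds unique, top, freq rows

print(numeric_summary)
print(full_summary)
```

For non-numeric columns the extra rows report the number of distinct values, the most frequent value, and its frequency.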


Call the relevant method to calculate the pairwise Pearson correlation (r) between columns

In [19]:
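# Keep numeric columns only: recent pandas versions raise an error
# when corr() meets non-numeric columns.
tips.select_dtypes("number").corr()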

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
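The correlation matrix covers numeric columns only. The sketch below restricts a mixed-type frame to its numeric columns before calling `corr()` and also requests Spearman's rho through the `method` argument. The DataFrame is a hypothetical stand-in with made-up values, so the example runs without downloading the tips data:

```python
import pandas as pd

# Hypothetical toy data, not taken from the tips dataset.
toy = pd.DataFrame({
    "bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 2.0, 5.0],
    "day": ["Sat", "Sun", "Sat", "Sat"],
})

numeric = toy.select_dtypes(include="number")  # drops the "day" column
pearson = numeric.corr()                       # method="pearson" is the default
spearman = numeric.corr(method="spearman")     # rank-based alternative

print(pearson)
print(spearman)
```

Selecting the numeric columns explicitly keeps the call independent of how a given pandas version treats non-numeric data.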
