## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [13]:
import numpy as np
from scipy import stats
import pandas as pd

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [23]:
samples = np.random.normal(100, 15, 1000)
# print(samples)

[ 79.80889473  92.34952435  89.78285853  78.24141216  89.02317628
 106.69770015  93.22568192 102.54334572  81.79743855  82.62397703
  79.03821479  89.68812124 103.29264215  61.96538738 101.77110708
 119.94562347  92.26675714  76.46025013  79.28669221 134.89480816
  77.95890891 110.31773312 108.4932318   89.20277671 111.59885806
  83.46688444  97.21458831  84.92750697  99.56097934 124.18546283
  97.37081522 102.55625667  77.53037409 101.11340709 119.59165647
  84.84063362 110.42839335 105.94464196  91.23426139  91.78369729
 104.71473631  96.74631289 112.34964278 141.33647979  68.88694238
  75.01195191  88.00019059  63.28639545  79.74134946  84.7450934
 124.1182374  102.6227148  121.56808939 104.21138262 107.71384468
  83.00031537 121.79438301  90.56647587 100.86632853  74.83481157
 100.49005585  92.98671234 110.33657862  85.77827715 117.90126829
  82.51201149  76.46496851  83.53131569  95.21049158  98.37433051
 112.34863347 101.39097919 108.54999272 127.49017211 112.6049158
  86.8205660
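
Because `np.random.normal()` draws fresh values on every run, the numbers printed above will differ each time the cell is executed. A minimal sketch, assuming reproducibility is wanted, that uses NumPy's `Generator` API with an arbitrary seed of 42 (the seed is an assumption, not part of the original exercise):

```python
import numpy as np

# Seed value 42 is arbitrary; any fixed integer makes the draw repeatable.
rng = np.random.default_rng(42)

# Same parameters as the exercise: mean 100, standard deviation 15, 1,000 draws.
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)   # (1000,)
print(samples[:5])     # first five values, identical on every run with this seed
```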

Compute the **mean**, **median**, and **mode**

In [19]:
mean = np.mean(samples)
# print(mean)
median = np.median(samples)
# print(median)
mode = stats.mode(samples)
# print(mode)

99.70891072875098
99.20524343918862
ModeResult(mode=array([49.42767269]), count=array([1]))
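
The `ModeResult` above reports a count of 1: with continuous data every value is unique, so no value is more frequent than any other, and here `stats.mode()` returned the smallest sample (compare it with the minimum computed in the next cell). A rough sketch of one workaround, assuming that rounding to the nearest integer is an acceptable bin width (that choice is an assumption, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, for repeatability only
samples = rng.normal(100, 15, 1000)

# Round to whole numbers so that repeated values can actually occur.
rounded = np.round(samples).astype(int)

# np.unique returns the sorted distinct values and how often each appears.
values, counts = np.unique(rounded, return_counts=True)
binned_mode = values[np.argmax(counts)]

print(binned_mode, counts.max())
```

For a symmetric distribution like this one, the binned mode should land near the mean and median.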


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [20]:
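# Use min_value/max_value so Python's built-in min() and max() are not shadowed
min_value = np.min(samples)
# print(min_value)
max_value = np.max(samples)
# print(max_value)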
q1 = np.percentile(samples, 25)
# print(q1)
q3 = np.percentile(samples, 75)
# print(q3)
iqr = stats.iqr(samples)
# print(iqr)

49.427672691231834
153.951215924907
89.7703577766421
109.52866906223491
19.75831128559281
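
The interquartile range is simply Q3 minus Q1 (in the output above, 109.5287 - 89.7704 ≈ 19.7583). A small sketch that checks this identity, assuming freshly generated samples with an arbitrary seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)        # arbitrary seed
samples = rng.normal(100, 15, 1000)

# Passing a list of percentiles returns both quartiles in one call.
q1, q3 = np.percentile(samples, [25, 75])

iqr_manual = q3 - q1
iqr_scipy = stats.iqr(samples)

print(q1, q3, iqr_manual, iqr_scipy)
```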


Compute the **variance** and **standard deviation**

In [21]:
variance = np.var(samples)
# print(variance)
std_dev = np.std(samples)
# print(std_dev)

223.42067335213645
14.947263072286393
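
`np.var()` and `np.std()` default to `ddof=0`, which divides by n (the population formulas). When the 1,000 values are treated as a sample from a larger population, the usual sample variance divides by n - 1 instead. A sketch of the difference, assuming a seeded sample:

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed
samples = rng.normal(100, 15, 1000)
n = samples.size

pop_var = np.var(samples)             # ddof=0: divide by n
sample_var = np.var(samples, ddof=1)  # ddof=1: divide by n - 1
sample_std = np.std(samples, ddof=1)

print(pop_var, sample_var, sample_std)
```

With n = 1000 the two differ only slightly, but note that pandas uses `ddof=1` by default in `Series.var()` and `Series.std()`, so its results will not match NumPy's defaults exactly.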


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html).

In [22]:
skewness = stats.skew(samples)
# print(skewness)
kurtosis = stats.kurtosis(samples)
# print(kurtosis)

0.08368892031938245
0.04643691598784194
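
By default `stats.kurtosis()` reports excess kurtosis (Fisher's definition), for which a normal distribution scores 0; that is why the value above is close to zero. Passing `fisher=False` gives Pearson's definition, which is larger by exactly 3. A brief sketch, assuming a seeded sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)        # arbitrary seed
samples = rng.normal(100, 15, 1000)

excess = stats.kurtosis(samples)                 # Fisher: normal -> 0
pearson = stats.kurtosis(samples, fisher=False)  # Pearson: normal -> 3

print(excess, pearson)
```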


## NumPy Correlation Calculation

Create an array `x` of integers from 10 (inclusive) to 20 (exclusive) using `np.arange()`.

In [28]:
x = np.arange(10, 20)
# x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [29]:
y = np.array([0,1,2,3,4,5,6,7,8,9])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y`.

In [31]:
r = np.corrcoef(x, y)
# r

array([[1., 1.],
       [1., 1.]])
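
`np.corrcoef(x, y)` returns the full 2×2 correlation matrix rather than a single number: the diagonal is each array's correlation with itself, and the off-diagonal entries hold the coefficient between `x` and `y`. Every entry is 1 above because `y` is an exact linear function of `x` (`y = x - 10`). A sketch of pulling out the scalar, using a hypothetical `y` chosen only for illustration:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])  # hypothetical values, not from the exercise

matrix = np.corrcoef(x, y)
r = matrix[0, 1]          # same as matrix[1, 0]; the matrix is symmetric

print(matrix.shape)       # (2, 2)
print(r)
```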

## Pandas Correlation Calculation

Run the code below

In [40]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
# print(x)
# print(y)

0    10
1    11
2    12
3    13
4    14
5    15
6    16
7    17
8    18
9    19
dtype: int64
0     2
1     1
2     4
3     5
4     8
5    12
6    18
7    25
8    96
9    48
dtype: int64


Call the relevant method to calculate Pearson's r correlation.

In [33]:
r = stats.pearsonr(x, y)
# r

(0.758640289091187, 0.010964341301680813)
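
The cell above uses SciPy's `stats.pearsonr()`, which returns both the coefficient and a two-sided p-value. Since `x` and `y` are pandas Series, the coefficient alone is also available through the `Series.corr()` method, whose default is Pearson's r. A short sketch comparing the two:

```python
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

r_pandas = x.corr(y)                     # method="pearson" is the default
r_scipy, p_value = stats.pearsonr(x, y)  # coefficient and p-value

print(r_pandas, r_scipy, p_value)
```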

Optional: call the relevant method to calculate Spearman's rho correlation.

In [35]:
rho = stats.spearmanr(x, y)
# rho

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
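
Spearman's rho is Pearson's r computed on the ranks of the data, which explains why it is much higher here than the Pearson value of about 0.76: `y` rises almost monotonically with `x`, just not linearly. A sketch that confirms the rank interpretation with pandas:

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

rho = x.corr(y, method="spearman")

# Equivalent: rank both series, then take the ordinary Pearson correlation.
rho_from_ranks = x.rank().corr(y.rank())

print(rho, rho_from_ranks)
```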

## Seaborn Dataset Tips

Import the Seaborn library

In [1]:
import seaborn as sns

Load the "tips" dataset from Seaborn

In [38]:
tips = sns.load_dataset("tips")
tips.head()

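|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |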


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [41]:
tips.describe()

|   | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


Call the relevant method to calculate the pairwise Pearson's r correlation between the numeric columns.

In [45]:
tips.corr(numeric_only=True)  # numeric_only (pandas 1.5+) skips the categorical columns

|   | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.0 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.0 |
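
`DataFrame.corr()` is only meaningful for numeric columns, and recent pandas versions raise an error rather than silently skipping text or categorical columns such as `sex` or `day`. A version-independent sketch is to select the numeric columns explicitly first. The small frame below rebuilds only the five rows shown by `tips.head()`, with a subset of columns, so that it runs without downloading the dataset; its coefficients therefore differ from the full-data table above.

```python
import pandas as pd

# First five rows of the tips dataset, copied from the tips.head() output.
tips_head = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric = tips_head.select_dtypes(include="number")
corr_matrix = numeric.corr()

print(corr_matrix)
```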
