## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [1]:
import numpy as np
import pandas as pd
import scipy.stats as stats

Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [2]:
samples = np.random.normal(100,15,1000)
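Because the draw above is unseeded, every run produces different numbers, so the outputs shown later will not reproduce exactly. A minimal sketch of a repeatable draw, assuming NumPy's `default_rng` generator; the seed value `42` and the names `seeded_samples` and `repeat` are arbitrary illustrations, not part of the original notebook:

```python
import numpy as np

# Hypothetical seed: any fixed integer makes the draw repeatable.
rng = np.random.default_rng(seed=42)

# Same distribution as the cell above: mean 100, standard deviation 15, 1,000 draws.
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# A fresh generator built from the same seed yields the same values.
repeat = np.random.default_rng(seed=42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, repeat))
```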


Compute the **mean**, **median**, and **mode**

In [12]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
print(mean)
print(median)
print(mode)

99.63333123830938
99.82905026937968
ModeResult(mode=array([47.68199818]), count=array([1]))
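The mode reported above has a count of 1: with continuous data nearly every value is unique, so `stats.mode` simply returns the smallest sample. One possible workaround, sketched here under the assumption that a binned estimate is acceptable, is to take the centre of the fullest histogram bin. The seed, the choice of 20 bins, and the variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed
data = rng.normal(100, 15, 1000)

# Count how often each distinct value occurs; for continuous draws the
# largest count is expected to be 1.
values, counts = np.unique(data, return_counts=True)
print(counts.max())

# Bin the data and use the centre of the fullest bin as a rough mode.
bin_counts, edges = np.histogram(data, bins=20)  # 20 bins is an arbitrary choice
fullest = np.argmax(bin_counts)
modal_center = (edges[fullest] + edges[fullest + 1]) / 2
print(modal_center)
```

The estimate depends on the bin width, so it should be read as an approximate location of the peak rather than an exact value.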


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [19]:
mini = np.min(samples)
print(mini)
maxi = np.max(samples)
print(maxi)
q1 = np.percentile(samples, 25)
print(q1)
q3 = np.percentile(samples, 75)
print(q3)
iqr = q3 - q1
iqr2 = stats.iqr(samples)
print(iqr)
print(iqr2)

47.6819981831429
151.6298688175682
89.34782423656173
109.70205749985871
20.354233263296976
20.354233263296976
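The last two values match because `stats.iqr` computes the same Q3 minus Q1 difference. A small worked example on hypothetical data (the array and the `_demo` names are made up for illustration) makes the arithmetic easy to check by hand:

```python
import numpy as np
from scipy import stats

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data

# np.percentile accepts a list, so both quartiles come from one call.
q1_demo, q3_demo = np.percentile(data, [25, 75])
iqr_demo = q3_demo - q1_demo

print(q1_demo, q3_demo, iqr_demo)  # 3.0 7.0 4.0
print(stats.iqr(data))             # 4.0
```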


Compute the **variance** and **standard deviation**

In [16]:
variance = np.var(samples)  # population variance (ddof=0 is NumPy's default)
print(variance)
std_dev = np.std(samples)  # population standard deviation (ddof=0)
print(std_dev)

229.46909883888284
15.148237482918033
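`np.var` and `np.std` divide by n by default (population formulas), whereas pandas divides by n - 1 by default, which is why the two libraries can report slightly different values for the same data. A short sketch on hypothetical data showing the effect of the `ddof` argument:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # hypothetical data, mean = 5

# ddof=0 (default): divide the sum of squared deviations (32) by n = 8.
pop_var = np.var(data)
pop_std = np.std(data)

# ddof=1: divide by n - 1 = 7, the usual sample estimate.
sample_var = np.var(data, ddof=1)
sample_std = np.std(data, ddof=1)

print(pop_var, pop_std)        # 4.0 2.0
print(sample_var, sample_std)
```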


Compute the **skewness** and **kurtosis**

In [20]:
skewness = stats.skew(samples)
print(skewness)
kurtosis = stats.kurtosis(samples)
print(kurtosis)


-0.09181226111139426
0.09272957306230367
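Both values are close to 0, as expected for normal data: `stats.kurtosis` reports excess kurtosis by default (Fisher's definition, which subtracts 3). The sketch below recomputes both statistics from central moments to show what the default calls return; the seed and variable names are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # hypothetical seed
data = rng.normal(100, 15, 1000)

# Central moments of order 2, 3, and 4 (dividing by n).
dev = data - data.mean()
m2 = np.mean(dev**2)
m3 = np.mean(dev**3)
m4 = np.mean(dev**4)

manual_skew = m3 / m2**1.5
manual_excess_kurt = m4 / m2**2 - 3  # Fisher definition: a normal distribution gives 0

print(np.isclose(manual_skew, stats.skew(data)))
print(np.isclose(manual_excess_kurt, stats.kurtosis(data)))
# fisher=False returns the Pearson definition, where a normal distribution gives 3.
print(np.isclose(manual_excess_kurt + 3, stats.kurtosis(data, fisher=False)))
```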


## NumPy Correlation Calculation

Create an array x of integers between 10 (inclusive) and 20 (exclusive). Use `np.arange()`

In [21]:
x = np.arange(10,20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [23]:
y = np.array([1,2,3,4,5,6,7,8,9,10])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between x and y

In [26]:
r = np.corrcoef(x, y)
r

array([[1., 1.],
       [1., 1.]])
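`np.corrcoef` returns the full 2×2 correlation matrix, and every entry is 1 here because this particular y is an exact linear function of x. A brief sketch of pulling out the single coefficient, using the less regular y values from the pandas section of this notebook; the names `x_demo`, `y_demo`, `matrix`, and `r_xy` are illustrative:

```python
import numpy as np

x_demo = np.arange(10, 20)
y_demo = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

matrix = np.corrcoef(x_demo, y_demo)

# The diagonal holds each array's correlation with itself (always 1);
# the off-diagonal entries hold the x-y correlation.
r_xy = matrix[0, 1]

print(matrix.shape)    # (2, 2)
print(round(r_xy, 4))  # 0.7586
```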

## Pandas Correlation Calculation

Run the code below

In [27]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [31]:
r = stats.pearsonr(x, y)  # scipy.stats was imported as `stats`
print(r)

(0.758640289091187, 0.010964341301680813)
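The tuple above holds the coefficient and its p-value. Since x and y are pandas Series, the same coefficient is also available from `Series.corr`, which uses Pearson's method by default but does not return a p-value. A minimal sketch comparing the two; the names `x_s`, `y_s`, `r_pandas`, and `r_scipy` are illustrative:

```python
import pandas as pd
from scipy import stats

x_s = pd.Series(range(10, 20))
y_s = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# pandas: coefficient only.
r_pandas = x_s.corr(y_s)

# SciPy: coefficient plus two-sided p-value.
r_scipy, p_value = stats.pearsonr(x_s, y_s)

print(r_pandas)
print(r_scipy, p_value)
```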


OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [32]:
rho = stats.spearmanr(x, y)  # scipy.stats was imported as `stats`
print(rho)

SpearmanrResult(correlation=0.9757575757575757, pvalue=1.4675461874042197e-06)
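Spearman's rho is Pearson's r applied to the ranks of the data, which is why it is much higher than Pearson's r here: y increases almost monotonically even though the relationship is far from linear. The sketch below checks this equivalence with pandas; the variable names are illustrative:

```python
import pandas as pd

x_s = pd.Series(range(10, 20))
y_s = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# Built-in Spearman correlation.
rho_builtin = x_s.corr(y_s, method="spearman")

# Same result by ranking both series, then taking Pearson's r of the ranks.
rho_from_ranks = x_s.rank().corr(y_s.rank())

# With no ties, the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)) also applies.
d = x_s.rank() - y_s.rank()
n = len(x_s)
rho_formula = 1 - 6 * (d**2).sum() / (n * (n**2 - 1))

print(rho_builtin, rho_from_ranks, rho_formula)
```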


## Seaborn Dataset Tips

Import Seaborn Library

In [33]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [34]:
tips = sns.load_dataset("tips")

Generate descriptive statistics, including those that summarize central tendency and dispersion.

In [44]:
print(stats.iqr(tips["total_bill"]))
print(stats.iqr(tips["tip"]))

10.779999999999998
1.5624999999999996


In [41]:
tips.describe()

| | total_bill | tip | size |
|---|---|---|---|
| count | 244.0 | 244.0 | 244.0 |
| mean | 19.785943 | 2.998279 | 2.569672 |
| std | 8.902412 | 1.383638 | 0.9511 |
| min | 3.07 | 1.0 | 1.0 |
| 25% | 13.3475 | 2.0 | 2.0 |
| 50% | 17.795 | 2.9 | 2.0 |
| 75% | 24.1275 | 3.5625 | 3.0 |
| max | 50.81 | 10.0 | 6.0 |


In [39]:
stats.mode(tips)

ModeResult(mode=array([[13.42, 2.0, 'Male', 'No', 'Sat', 'Dinner', 2]], dtype=object), count=array([[  3,  33, 157, 151,  87, 176, 156]]))
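Passing a mixed-type DataFrame to `stats.mode` relies on SciPy handling object arrays, which may not work in every SciPy version. A pandas-only alternative is sketched below on a small hypothetical DataFrame (the data and the names `df`, `modes`, and `top_counts` are made up for illustration); the same calls could be applied to `tips`:

```python
import pandas as pd

# Hypothetical stand-in for a mixed-type dataset such as `tips`.
df = pd.DataFrame({
    "day": ["Sat", "Sun", "Sat", "Thur", "Sat"],
    "size": [2, 2, 3, 2, 4],
})

# DataFrame.mode() returns one row per tied mode; row 0 is the first mode of each column.
modes = df.mode().iloc[0]

# value_counts() sorts by frequency, so its first entry is the mode's count.
top_counts = {col: int(df[col].value_counts().iloc[0]) for col in df.columns}

print(modes)
print(top_counts)
```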

In [38]:
print(tips.agg(["min","median","std", "var"]))

| | total_bill | tip | size |
|---|---|---|---|
| min | 3.07 | 1.0 | 1.0 |
| median | 17.795 | 2.9 | 2.0 |
| std | 8.902412 | 1.383638 | 0.9511 |
| var | 79.252939 | 1.914455 | 0.904591 |


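Call the relevant method to calculate the pairwise correlation of columns. The call below passes `method='spearman'`, so the values shown are Spearman's rho; omit the argument (or pass `method='pearson'`) to get Pearson's r.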

In [36]:
tips.corr(method='spearman')

| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.0 | 0.678968 | 0.604791 |
| tip | 0.678968 | 1.0 | 0.468268 |
| size | 0.604791 | 0.468268 | 1.0 |
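Only the numeric columns appear in the matrix. Depending on the pandas version, calling `corr()` on a DataFrame that also contains text or categorical columns may raise an error instead of silently dropping them, so selecting the numeric columns first is a safer pattern. A minimal sketch on a small hypothetical DataFrame (the data and variable names are invented for illustration):

```python
import pandas as pd

# Hypothetical data with one non-numeric column, loosely shaped like `tips`.
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0],
    "tip": [1.0, 3.0, 2.0, 10.0, 4.0],
    "day": ["Sat", "Sun", "Sat", "Thur", "Fri"],
})

# Keep only numeric columns before computing correlations.
numeric = df.select_dtypes(include="number")

pearson = numeric.corr()                     # default method is Pearson's r
spearman = numeric.corr(method="spearman")   # rank-based Spearman's rho

print(pearson.loc["total_bill", "tip"])
print(spearman.loc["total_bill", "tip"])
```

The one large tip pulls Pearson's r down, while Spearman's rho depends only on the ordering of the values.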
