## Descriptive Statistics

 Import **NumPy**, **SciPy**, and **Pandas**

In [2]:
import numpy as np
from scipy import stats
import pandas as pd

Randomly generate 1,000 samples from the normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

In [5]:
samples = np.random.normal(100, 15, 1000)
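Because `np.random.normal()` draws new values on every run, the statistics below will change each time the cell is executed. A minimal sketch of a reproducible alternative, assuming NumPy's `Generator` API is available; the seed value 42 is an arbitrary choice, not part of the original exercise:

```python
import numpy as np

# Seeded generator: the same 1,000 values are drawn on every run.
# The seed (42) is an arbitrary illustrative choice.
rng = np.random.default_rng(42)
samples = rng.normal(loc=100, scale=15, size=1000)

print(samples.shape)
print(samples[:3])
```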

Compute the **mean**, **median**, and **mode**

In [6]:
mean = np.mean(samples)
median = np.median(samples)
mode = stats.mode(samples)
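For continuous data the mode is rarely informative, since each sampled value will almost certainly occur only once. The sketch below illustrates this and shows one possible workaround, rounding to whole numbers first; the rounding step and the seed are assumptions added for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
samples = rng.normal(100, 15, 1000)

# On raw continuous samples, the "mode" is just a value that occurs once.
raw = stats.mode(samples)
raw_count = int(np.atleast_1d(raw.count)[0])  # atleast_1d handles scalar or array results

# Rounding to integers groups nearby values, so the mode becomes meaningful.
rounded = np.round(samples).astype(int)
binned = stats.mode(rounded)
binned_mode = int(np.atleast_1d(binned.mode)[0])
binned_count = int(np.atleast_1d(binned.count)[0])

print(raw_count, binned_mode, binned_count)
```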

Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [7]:
# Avoid the names min and max, which would shadow Python's built-in functions
min_value = np.min(samples)
max_value = np.max(samples)
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = stats.iqr(samples)
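The interquartile range is simply Q3 minus Q1, so `stats.iqr()` can be checked against the percentiles. A small sketch, using a seeded generator as an assumption so the result is repeatable:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
samples = rng.normal(100, 15, 1000)

# Both quartiles in one call
q1, q3 = np.percentile(samples, [25, 75])

iqr_manual = q3 - q1
iqr_scipy = stats.iqr(samples)

print(np.isclose(iqr_manual, iqr_scipy))
```

For a normal distribution the IQR is roughly 1.35 standard deviations, so a value near 20 is expected here.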

Compute the **variance** and **standard deviation**

In [8]:
variance = np.var(samples)
std_dev = np.std(samples)
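Note that `np.var()` and `np.std()` default to the population formulas (`ddof=0`), whereas pandas defaults to the sample formulas (`ddof=1`). The sketch below compares the two; the seed is an arbitrary assumption:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
samples = rng.normal(100, 15, 1000)
n = samples.size

pop_var = np.var(samples)             # divides by n (ddof=0, the NumPy default)
sample_var = np.var(samples, ddof=1)  # divides by n - 1 (unbiased estimator)

# The two differ only by the factor n / (n - 1)
print(np.isclose(sample_var, pop_var * n / (n - 1)))

sample_std = np.std(samples, ddof=1)
print(np.isclose(sample_std, np.sqrt(sample_var)))
```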

Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html)

In [11]:
from scipy.stats import skew, kurtosis
skewness = skew(samples)
# Use a distinct name so the imported kurtosis() function is not overwritten
kurt = kurtosis(samples)
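By default `scipy.stats.kurtosis` reports excess (Fisher) kurtosis, which is 0 for a normal distribution; passing `fisher=False` gives Pearson kurtosis, which is 3 for a normal distribution. A short sketch of the difference, with an arbitrary seed as an assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
samples = rng.normal(100, 15, 1000)

skewness = stats.skew(samples)                          # near 0 for symmetric data
excess_kurt = stats.kurtosis(samples)                   # Fisher definition (normal -> 0)
pearson_kurt = stats.kurtosis(samples, fisher=False)    # Pearson definition (normal -> 3)

print(skewness, excess_kurt, pearson_kurt)
```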

## NumPy Correlation Calculation

Use `np.arange()` to create an array `x` of integers from 10 (inclusive) to 20 (exclusive).

In [12]:
x = np.arange(10, 20)

Then use `np.array()` to create a second array y containing 10 arbitrary integers.

In [15]:
y = np.array([1, 3, 5, 7, 9, 2, 4, 6, 8, 11])

Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y` with `np.corrcoef()`, which returns a 2x2 correlation matrix rather than a single number.

In [16]:
r = np.corrcoef(x, y)
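The matrix returned by `np.corrcoef()` has ones on the diagonal and the coefficient itself in the off-diagonal entries. The sketch below extracts it and checks it against the textbook definition of Pearson's r; the variable names `r_matrix` and `r_manual` are illustrative:

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 3, 5, 7, 9, 2, 4, 6, 8, 11])

r_matrix = np.corrcoef(x, y)   # shape (2, 2)
r = r_matrix[0, 1]             # correlation between x and y

# Pearson's r from its definition: covariance over the product of spreads
xc = x - x.mean()
yc = y - y.mean()
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

print(r_matrix)
print(np.isclose(r, r_manual))
```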

## Pandas Correlation Calculation

Run the code below to create two pandas Series.

In [17]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [18]:
r = stats.pearsonr(x, y)
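`stats.pearsonr()` returns both the coefficient and a p-value. When only the coefficient is needed, the pandas method `Series.corr()` is an alternative, since it uses Pearson's r by default. A sketch comparing the two:

```python
import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# SciPy: coefficient plus two-sided p-value
r_scipy, p_value = stats.pearsonr(x, y)

# pandas: coefficient only (method="pearson" is the default)
r_pandas = x.corr(y)

print(r_scipy, p_value)
print(np.isclose(r_scipy, r_pandas))
```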

OPTIONAL. Call the relevant method to calculate Spearman's rho correlation.

In [19]:
rho = stats.spearmanr(x, y)
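Spearman's rho is Pearson's r computed on the ranks of the data, which makes it less sensitive to outliers such as the value 96 in `y`. A sketch that checks this equivalence; `rho_from_ranks` is an illustrative name:

```python
import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

# SciPy: coefficient plus p-value
rho, p_value = stats.spearmanr(x, y)

# Same coefficient, obtained as Pearson's r on the ranks
rho_from_ranks = x.rank().corr(y.rank())

print(rho, rho_from_ranks)
```

With no ties, rho also equals 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank difference for each pair. Here only two pairs of neighbours are swapped in `y`, so sum(d^2) = 4 and rho is about 0.976.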

## Seaborn Dataset Tips

Import the Seaborn library.

In [20]:
import seaborn as sns

Load the "tips" dataset from Seaborn.

In [23]:
tips = sns.load_dataset("tips")
print(tips)

     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [24]:
tips.describe()

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000


Call the relevant method to calculate Pearson's r correlation between the `total_bill` and `tip` columns. `stats.pearsonr()` returns the coefficient together with its p-value.

In [26]:
stats.pearsonr(tips["total_bill"], tips["tip"])

(0.6757341092113647, 6.6924706468630016e-34)
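To get the correlation of every pair of numeric columns at once, `DataFrame.corr()` can be used instead. The sketch below avoids downloading the dataset by rebuilding only the first five rows printed above, so its numbers will differ from the full-data result; with the full data, `tips[["total_bill", "tip", "size"]].corr()` would be the equivalent call:

```python
import numpy as np
import pandas as pd
from scipy import stats

# First five rows of the numeric columns, copied from the printed output above
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
})

# Pairwise Pearson correlation of all selected columns (3 x 3 matrix)
corr_matrix = df[["total_bill", "tip", "size"]].corr()
print(corr_matrix)

# Each entry matches the coefficient from stats.pearsonr for that pair
r_pair, _ = stats.pearsonr(df["total_bill"], df["tip"])
print(np.isclose(corr_matrix.loc["total_bill", "tip"], r_pair))
```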