## Descriptive Statistics

Import **NumPy**, **SciPy**, and **Pandas**.

In [None]:
import numpy as np
import scipy
import pandas as pd

Randomly generate 1,000 samples from a normal distribution (mean = 100, standard deviation = 15) using `np.random.normal()`.

np.random.normal(loc=0.0, scale=1.0, size=None)  # default signature; you need to modify the arguments

Set `loc` to the mean, `scale` to the standard deviation, and `size` to the number of samples.

In [None]:
samples = np.random.normal(loc=100, scale=15, size=1000)
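Because `np.random.normal()` draws new values on every run, your results will differ from the outputs shown in this notebook. A minimal sketch of a reproducible draw, assuming NumPy's `default_rng` generator; the seed 42 and the names `rng`, `seeded_samples`, and `repeat_samples` are illustrative and not part of the original exercise:

```python
import numpy as np

# Seeded generator: the seed value 42 is an arbitrary, illustrative choice.
rng = np.random.default_rng(42)
seeded_samples = rng.normal(loc=100, scale=15, size=1000)

# Re-creating the generator with the same seed reproduces the same draw.
repeat_samples = np.random.default_rng(42).normal(loc=100, scale=15, size=1000)
print(np.array_equal(seeded_samples, repeat_samples))
```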

Compute the **mean**, **median**, and **mode**

In [None]:
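from scipy import stats

mean = np.mean(samples)
median = np.median(samples)
# With continuous data almost every value is unique, so the reported mode
# is just the smallest sample, with a count of 1.
mode = stats.mode(samples, keepdims=False)
print(mean)
print(median)
print(mode)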

100.03599094440239
99.71696456158556
ModeResult(mode=53.55170049440541, count=1)
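The count of 1 above shows that the mode carries little information for continuous samples. One possible workaround, sketched here under the assumption that rounding to whole numbers is an acceptable grouping, is to count rounded values instead. The array `values` is hypothetical data chosen only for illustration:

```python
import numpy as np

# Hypothetical measurements, chosen only to illustrate the idea.
values = np.array([98.2, 101.7, 99.6, 100.4, 100.1, 87.3])

# Round to whole numbers so that nearby values fall into the same group.
rounded = np.round(values)

# Count how often each rounded value occurs and pick the most frequent one.
unique_values, counts = np.unique(rounded, return_counts=True)
mode_value = unique_values[np.argmax(counts)]
mode_count = counts.max()
print(mode_value, mode_count)
```

The same idea could be applied to `samples`, although the result depends on the chosen rounding precision.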


Compute the **min**, **max**, **Q1**, **Q3**, and **interquartile range**

In [None]:
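min_val = np.min(samples)  # named min_val to avoid shadowing the built-in min()
max_val = np.max(samples)  # named max_val to avoid shadowing the built-in max()
q1 = np.percentile(samples, 25)
q3 = np.percentile(samples, 75)
iqr = q3 - q1

In [None]:
print("min:", min_val)
print("max:", max_val)
print("Q1:", q1)
print("Q3:", q3)
print("IQR:", iqr)


min: 53.55170049440541
max: 143.1686861989862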
Q1: 90.29025191361798
Q3: 110.03022055933252
IQR: 19.73996864571454
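As a cross-check, SciPy provides `scipy.stats.iqr`, which should agree with the manual `q3 - q1` calculation. A small sketch on a hypothetical nine-element array (the names `data`, `iqr_manual`, and `iqr_scipy` are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical data: the integers 1 through 9.
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Manual IQR from the 25th and 75th percentiles.
q1_demo, q3_demo = np.percentile(data, [25, 75])
iqr_manual = q3_demo - q1_demo

# The same quantity computed directly by SciPy.
iqr_scipy = stats.iqr(data)
print(iqr_manual, iqr_scipy)
```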


Compute the **variance** and **standard deviation**

In [None]:
variance = np.var(samples)  # population variance (ddof=0 is the default)
std_dev = np.std(samples)   # population standard deviation (ddof=0)
print("variance:", variance)
print("std_dev:", std_dev)

variance: 213.02041749441892
std_dev: 14.59521899439741
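`np.var` and `np.std` divide by n by default, which gives the population variance and standard deviation. To estimate these from a sample, the usual choice is to divide by n - 1, which NumPy exposes through the `ddof` argument. A short sketch on hypothetical data, chosen so the arithmetic is easy to follow:

```python
import numpy as np

# Hypothetical data with mean 5 and squared deviations summing to 32.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

pop_var = np.var(data)             # divides by n = 8
sample_var = np.var(data, ddof=1)  # divides by n - 1 = 7
pop_std = np.std(data)
sample_std = np.std(data, ddof=1)
print(pop_var, sample_var, pop_std, sample_std)
```

With 1,000 samples the two versions differ only slightly, but the gap grows as the sample gets smaller.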


Compute the **skewness** and **kurtosis**

You can use [`scipy.stats.skew`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.skew.html) and [`scipy.stats.kurtosis`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kurtosis.html). By default, `kurtosis` returns the excess (Fisher) kurtosis, which is 0 for a normal distribution.

In [None]:
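from scipy.stats import skew, kurtosis

skewness = skew(samples)
# Store the result under a different name so the kurtosis() function
# is not overwritten and the cell can be re-run.
kurt = kurtosis(samples)
print("skewness:", skewness)
print("kurtosis:", kurt)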

skewness: 0.04394403273702856
kurtosis: -0.04453887389361677
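A value near 0 is expected here because `scipy.stats.kurtosis` reports excess kurtosis unless told otherwise. The sketch below, on hypothetical data, contrasts the default with `fisher=False`, which returns the Pearson definition (3 for a normal distribution):

```python
import numpy as np
from scipy.stats import kurtosis

# Hypothetical data used only to compare the two definitions.
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

excess_kurt = kurtosis(data)                 # Fisher definition (normal -> 0)
pearson_kurt = kurtosis(data, fisher=False)  # Pearson definition (normal -> 3)
print(excess_kurt, pearson_kurt)
```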


## NumPy Correlation Calculation

Create an array `x` of integers between 10 (inclusive) and 20 (exclusive) using `np.arange()`.

In [None]:
x = np.arange(10, 20)
x

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Then use `np.array()` to create a second array `y` containing 10 arbitrary integers.

In [None]:
y = np.array([1, 3, 5, 6, 8, 5, 7, 6, 4, 3])

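Once you have two arrays of the same length, you can compute the **correlation coefficient** between `x` and `y`. `np.corrcoef()` returns a 2×2 correlation matrix, so the coefficient is read from element `[0, 1]`.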

In [None]:
r = np.corrcoef(x, y)[0, 1]
print(f"Correlation Coefficient between x and y: {r}")

Correlation Coefficient between x and y: 0.26243194054073893
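To see what `np.corrcoef()` computes, the coefficient can also be derived by hand from the deviations of each array about its mean: the sum of the products of the deviations, divided by the square root of the product of the summed squared deviations. A sketch using the same `x` and `y` (the names `x_dev`, `y_dev`, `r_manual`, and `r_numpy` are illustrative):

```python
import numpy as np

x = np.arange(10, 20)
y = np.array([1, 3, 5, 6, 8, 5, 7, 6, 4, 3])

# Deviations of each value from its array's mean.
x_dev = x - x.mean()
y_dev = y - y.mean()

# Pearson's r: co-deviation divided by the product of the spreads.
r_manual = np.sum(x_dev * y_dev) / np.sqrt(np.sum(x_dev**2) * np.sum(y_dev**2))
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```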


## Pandas Correlation Calculation

Run the code below

In [None]:
x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

Call the relevant method to calculate Pearson's r correlation.

In [None]:
r = x.corr(y)
print(f"Correlation Coefficient between x and y : {r}")

Correlation Coefficient between x and y : 0.7586402890911867
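Pearson's r measures linear association, and here `y` grows much faster than linearly and contains one very large value (96). `Series.corr` also accepts a `method` argument. As a sketch, the rank-based Spearman coefficient can be compared with the Pearson value (the names `pearson_r` and `spearman_r` are illustrative):

```python
import pandas as pd

x = pd.Series(range(10, 20))
y = pd.Series([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])

pearson_r = x.corr(y)                      # default: method="pearson"
spearman_r = x.corr(y, method="spearman")  # correlation of the ranks
print(pearson_r, spearman_r)
```

Because `y` is increasing apart from two swapped pairs, the rank-based coefficient comes out noticeably higher than Pearson's r.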


## Seaborn Dataset Tips

Import Seaborn Library

In [None]:
import seaborn as sns

Load "tips" dataset from Seaborn

In [None]:
tips = sns.load_dataset("tips")
print(tips.head())


   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


Generate descriptive statistics, including those that summarize the central tendency and dispersion of each numeric column.

In [None]:
descriptive_stats = tips.describe()
print(descriptive_stats)

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000


Call the relevant method to calculate the pairwise Pearson's r correlation of the numeric columns. The resulting matrix can afterwards be visualized as a heatmap.

In [None]:
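# numeric_only=True skips the categorical columns (sex, smoker, day, time);
# without it, pandas 2.0 and later raises an error on this dataset.
correlation = tips.corr(numeric_only=True)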
print("Correlation:")
print(correlation)


Correlation:
            total_bill       tip      size
total_bill    1.000000  0.675734  0.598315
tip           0.675734  1.000000  0.489299
size          0.598315  0.489299  1.000000
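Only the three numeric columns appear in the matrix above. In recent pandas versions, `DataFrame.corr` has to be told explicitly to ignore non-numeric columns. A minimal sketch on a small, hypothetical frame that mimics the mixed column types of `tips` (the frame `demo` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame with two numeric columns and one text column.
demo = pd.DataFrame(
    {
        "total_bill": [10.0, 20.0, 30.0],
        "tip": [1.0, 2.0, 3.0],
        "day": ["Sun", "Sat", "Sun"],
    }
)

# numeric_only=True restricts the calculation to numeric columns.
demo_corr = demo.corr(numeric_only=True)
print(demo_corr)
```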


