<h1>Python Statistics Fundamentals: How to Describe Your Data</h1>

<h2><b>1. Introduction</b></h2>
<p>In this notebook, I'll be taking notes on Real Python's lesson on the <b>fundamentals of statistics</b>. If you are interested, follow this <a href='https://realpython.com/python-statistics/'>link</a> to read the full content. Thanks once again to the whole Real Python team and, in this case, especially to <a href='https://realpython.com/team/mstojiljkovic/'>Mirko Stojiljković</a> for this course!</p>
<p>This notebook will cover the following <b>objectives</b>:</p>
<ul>
    <li>What numerical quantities you can use to describe and summarize your datasets</li>
    <li>How to calculate descriptive statistics in pure Python</li>
    <li>How to get descriptive statistics with available Python libraries</li>
    <li>How to visualize your datasets</li>
</ul>


<h2><b>2. Understanding Descriptive Statistics</b></h2>
<p>The term <b>descriptive statistics</b> refers to the process of <b>describing</b> and <b>summarizing</b> data. It can use <b>two main approaches</b>:</p>
<ol>
    <li><b>Quantitative approach</b> - describes data numerically.</li>
    <li><b>Visual approach</b> - describes data with data visualization tools (charts, histograms, plots, etc.).</li>
</ol>
<p>When you describe and summarize a <b>single variable</b>, you're conducting a <b>univariate analysis</b>. When you look for a statistical relationship between a pair of variables, you're performing a <b>bivariate analysis</b>. Finally, a <b>multivariate analysis</b> considers multiple variables at once.</p>


<h3><b>2.1 Types of Measures</b></h3>
<ol>
    <li><b>Central tendency</b> - describes the center of the data. Common measures include the <b>mean, median, and mode</b>.</li>
    <li><b>Variability</b> - describes the <b>spread</b> of the data. <b>Variance</b> and <b>standard deviation</b> are the most commonly used measures.</li>
    <li><b>Correlation</b> or <b>joint variability</b> - describes the <b>relation</b> between a pair of variables in a dataset. <b>Covariance</b> and the <b>correlation coefficient</b> are useful measures here.</li>
</ol>
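<p>To make the three kinds of measures concrete, here is a minimal sketch in pure Python. The data (<code>hours_studied</code>, <code>exam_scores</code>) is hypothetical and invented only for illustration; it is not part of the original lesson. The covariance and correlation are computed by hand using the sample (n - 1) convention.</p>

```python
import statistics

# Hypothetical paired observations, invented purely for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 70, 72]

# Central tendency: where the data is centered.
center_mean = statistics.mean(exam_scores)
center_median = statistics.median(exam_scores)

# Variability: how far the data spreads around the center (sample versions).
spread_variance = statistics.variance(exam_scores)
spread_stdev = statistics.stdev(exam_scores)

# Joint variability: sample covariance and Pearson correlation, by hand.
n = len(hours_studied)
mean_h = statistics.mean(hours_studied)
mean_s = statistics.mean(exam_scores)
covariance = sum(
    (h - mean_h) * (s - mean_s) for h, s in zip(hours_studied, exam_scores)
) / (n - 1)
correlation = covariance / (statistics.stdev(hours_studied) * spread_stdev)

print(center_mean, center_median)
print(spread_variance, spread_stdev)
print(covariance, correlation)
```

<p>A correlation close to 1 suggests that the two hypothetical variables increase together.</p>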

<h3><b>2.2 Populations and Samples</b></h3>
<p>A <b>sample</b> constitutes a <b>subset of a population</b> and, ideally, should preserve the fundamental <b>features</b> of the population.</p>
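<p>As a rough illustration of this idea, the sketch below draws a simple random sample from a hypothetical population (the integers from 1 to 10,000, chosen only for this example) and compares the two means. The seed value is arbitrary and is there only to make the run reproducible.</p>

```python
import random
import statistics

random.seed(42)  # arbitrary fixed seed so the sketch is reproducible

# Hypothetical population: the integers 1..10_000.
population = list(range(1, 10_001))

# Simple random sample of 1,000 elements, drawn without replacement.
sample = random.sample(population, k=1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(population_mean, sample_mean)
```

<p>If the sample is representative, its mean should land reasonably close to the population mean, even though it uses only a tenth of the data.</p>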

<h3><b>2.3 Outliers</b></h3>
<p>An <b>outlier</b> is a data point that <b>significantly differs</b> from the majority of the data taken from a sample or a population. <b>Natural variation</b> in data, <b>change</b> in the behavior of the observed system, and <b>errors</b> during the process of data collection are some of the most frequent reasons for outliers, although you must always rely on <b>experience</b> to properly identify them.</p>
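<p>The lesson does not prescribe a specific detection method. One common rule of thumb, shown here only as an assumption and not as the course's approach, flags points that lie more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The sketch below applies it with NumPy; the 1.5 factor is a convention, and the result should still be checked against what you know about the data.</p>

```python
import numpy as np

values = np.array([8.0, 1, 2.5, 4, 28.0])

# First and third quartiles (NumPy's default linear interpolation).
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Fences of the 1.5 * IQR rule of thumb (an assumed convention).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are candidates for outliers, not proven ones.
flagged = values[(values < lower_fence) | (values > upper_fence)]
print(flagged)
```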

<h2><b>3. Python Statistics Libraries</b></h2>
<p>We'll be using the following libraries:</p>
<ul>
    <li><code>statistics</code></li>
    <li><code>NumPy</code></li>
    <li><code>SciPy</code></li>
    <li><code>Pandas</code></li>
    <li><code>Matplotlib</code></li>
</ul>
<hr>
<h2><b>4. Calculating Descriptive Statistics</b></h2>
<p>Let's import all the necessary packages:</p>

In [1]:
import math
import statistics
import numpy as np
import pandas as pd
import scipy.stats

<p>Let's now create some data:</p>

In [2]:
x = [8.0, 1, 2.5, 4, 28.0]
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0] # You can also use float('nan') and np.nan

In [3]:
x

[8.0, 1, 2.5, 4, 28.0]

In [4]:
x_with_nan

[8.0, 1, 2.5, nan, 4, 28.0]
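<p>A short sketch of how <code>nan</code> behaves may help here. It shows that the three spellings mentioned in the comment above all produce ordinary Python floats, that <code>nan</code> never compares equal to itself, and that <code>math.isnan()</code> is a reliable way to locate it. The variable names are chosen only for this example.</p>

```python
import math
import numpy as np

# Three equivalent ways to get a nan value.
nan_values = [math.nan, float('nan'), np.nan]

# All three are plain Python floats.
all_floats = all(isinstance(v, float) for v in nan_values)

# nan is not equal to anything, including itself, so == cannot find it.
equal_to_itself = math.nan == math.nan

# math.isnan() is the reliable test.
x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
nan_positions = [i for i, v in enumerate(x_with_nan) if math.isnan(v)]

print(all_floats, equal_to_itself, nan_positions)
```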

<p>Let's now create <code>np.ndarray</code> and <code>pd.Series</code> objects corresponding to <code>x</code> and <code>x_with_nan</code>:</p>

In [5]:
y, y_with_nan = np.array(x), np.array(x_with_nan)
z, z_with_nan = pd.Series(x), pd.Series(x_with_nan)

In [6]:
y

array([ 8. ,  1. ,  2.5,  4. , 28. ])

In [7]:
y_with_nan

array([ 8. ,  1. ,  2.5,  nan,  4. , 28. ])

In [8]:
z

0     8.0
1     1.0
2     2.5
3     4.0
4    28.0
dtype: float64

In [9]:
z_with_nan

0     8.0
1     1.0
2     2.5
3     NaN
4     4.0
5    28.0
dtype: float64
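<p>Before computing any statistics, it can be useful to check how many values are missing and to build a cleaned copy. The sketch below is one possible way to do that with NumPy and pandas; the names <code>y_clean</code> and <code>z_clean</code> are introduced only for this example.</p>

```python
import math
import numpy as np
import pandas as pd

x_with_nan = [8.0, 1, 2.5, math.nan, 4, 28.0]
y_with_nan = np.array(x_with_nan)
z_with_nan = pd.Series(x_with_nan)

# Count the missing values in each container.
nan_count_array = int(np.isnan(y_with_nan).sum())
nan_count_series = int(z_with_nan.isna().sum())

# Build copies without the missing values.
y_clean = y_with_nan[~np.isnan(y_with_nan)]  # boolean mask on the array
z_clean = z_with_nan.dropna()                # keeps the original index labels

print(nan_count_array, nan_count_series)
print(y_clean)
print(z_clean)
```

<p>Note that <code>dropna()</code> keeps the original index labels, so the label <code>3</code> is simply absent from <code>z_clean</code>.</p>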

<h3><b>4.1 Measures of Central Tendency</b></h3>
<h4><b>4.1.1 Mean</b></h4>