[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/shaneahmed/StatswithPython/blob/main/02-DescriptiveStatistics.ipynb) 

[![Open In Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/shaneahmed/StatswithPython/blob/main/02-DescriptiveStatistics.ipynb)

# Descriptive Statistics
In the [previous notebook](https://github.com/shaneahmed/StatswithPython/blob/main/01-Introduction%20to%20Python.ipynb) we discussed basic Python syntax, data structures and objects. In this notebook, we will learn to import Python modules and use them to perform descriptive analysis.

## Modules in Python
When you close a notebook or restart the interpreter, the functions and variables you defined in the previous exercise are lost. You may also want to use a handy function, such as one that calculates _the mean_, in several programs without copying its definition into each program. Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module.

A module is a file containing Python definitions and statements. The file name is the module name with the suffix `.py` appended. Within a module, the module’s name (as a string) is available as the value of the global variable `__name__`.
In the previous notebook, you wrote a function to calculate Fibonacci numbers. You can save that code in a file named `fibo.py` in the current directory. Once the file is saved, you can import the module in this notebook.
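
As a minimal sketch, `fibo.py` could look like the following. The function body is an assumption based on the output shown in this notebook (it prints the Fibonacci numbers smaller than `n`); the version you wrote in the previous notebook may differ.

```python
# fibo.py: assumed contents; adapt to the function you wrote earlier

def fib(n):
    """Print the Fibonacci numbers smaller than n on one line."""
    a, b = 0, 1
    while a < n:
        print(a, end=" ")
        a, b = b, a + b
    print()


# __name__ is "fibo" when the file is imported and "__main__" when it is
# run directly, so this demo call only runs for `python fibo.py`.
if __name__ == "__main__":
    fib(20)
```

Keeping the demo call behind the `__name__` check means that importing the module defines `fib` without printing anything.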

In [1]:
import fibo

    You can run the `fib` function in `fibo` by calling `fibo.fib()`.

In [2]:
fibo.fib(500)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 


    You can also import a module under a different name (an alias).

In [5]:
import fibo as fib

In [6]:
fib.fib(500)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 


    You can also import a function from a module directly.

In [7]:
from fibo import fib

In [8]:
fib(500)

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 


### Statistics Module
Python has built-in modules such as `math` and `statistics` for basic mathematical and statistical calculations. You can load them with Python's `import` statement.

In [40]:
import math
math.sqrt(4)  # sqrt calculates the square root of a number

2.0

In [9]:
import statistics

In [11]:
statistics.mean([1, 2, 3, 4, 4])

2.8
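
The `statistics` module offers other descriptive measures as well. The sketch below reuses the list from the cell above and is only an illustration; `values` and the result names are introduced here.

```python
import statistics

values = [1, 2, 3, 4, 4]  # same numbers as in the cell above

mean_value = statistics.mean(values)      # arithmetic mean: 14 / 5
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 2.8 3 4
```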

### Installing Module in Python
For more sophisticated functions you can install modules such as `numpy`, `scipy` and `pandas` using the `pip` command, or `conda` if you are in an Anaconda environment. Let's install `numpy`, `scipy` and `pandas`. In a notebook, a leading `!` runs the rest of the line as a terminal command.

In [12]:
!pip install numpy scipy pandas



    After installation we can import these modules directly. The examples below calculate the mean using the numpy and pandas libraries. scipy is deprecating its own mean function and recommends using the numpy mean instead.
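
If you want to confirm that the installation worked, one option is to look up the installed versions. This is a small sketch using the standard library's `importlib.metadata`; the `installed` dictionary is a name introduced here.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed version of each package without importing it.
installed = {}
for package in ("numpy", "scipy", "pandas"):
    try:
        installed[package] = version(package)
    except PackageNotFoundError:
        installed[package] = None  # package is not installed

for package, package_version in installed.items():
    print(package, package_version or "not installed")
```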

In [8]:
import numpy as np  # numpy (Numerical Python) is usually imported as np

In [15]:
np.mean([1, 2, 3, 4, 4])

2.8

In [32]:
import pandas as pd  # pandas (Python Data Analysis Library) is usually imported as pd

In [26]:
d = pd.DataFrame([1, 2, 3, 4, 4])  # pandas stores data in a data frame; we will learn about data frames in the following notebooks

In [30]:
d

   0
0  1
1  2
2  3
3  4
4  4


In [37]:
d.mean()[0]

2.8
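
`d.mean()` returns one mean per column as a pandas Series, which is why the cell above picks out the entry labelled `0`. Here is a small sketch of two equivalent ways to get the single number, assuming the same one-column data frame; the variable names are introduced here.

```python
import pandas as pd

d = pd.DataFrame([1, 2, 3, 4, 4])  # one column, labelled 0 by default

column_means = d.mean(axis=0)    # Series with one mean per column
mean_by_label = column_means[0]  # pick the mean of the column labelled 0
mean_of_column = d[0].mean()     # or select the column first, then average

print(mean_by_label, mean_of_column)
```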

## Frequency Distribution
Let's consider the data from the lecture slides.

In [5]:
data = [15, 8, 20, 16, 12, 18, 14, 22, 17, 5,
19, 15, 18, 29, 6, 13, 16, 19, 10, 24,
15, 3, 26, 30, 13, 17, 7, 16, 23, 25,
1, 15, 18, 14, 5, 27, 16, 20, 14, 6,
24, 14, 20, 25, 21, 15, 17, 8, 23, 21,
17, 14, 10, 13, 18, 16, 21, 9, 11, 22,
15, 12, 9, 16, 20, 11, 13, 22, 17, 13,
9, 22, 16, 12, 19, 17, 14, 10, 19, 18,
11, 16, 12, 18, 13, 17, 15, 14, 15, 28]

In [6]:
print(data)

[15, 8, 20, 16, 12, 18, 14, 22, 17, 5, 19, 15, 18, 29, 6, 13, 16, 19, 10, 24, 15, 3, 26, 30, 13, 17, 7, 16, 23, 25, 1, 15, 18, 14, 5, 27, 16, 20, 14, 6, 24, 14, 20, 25, 21, 15, 17, 8, 23, 21, 17, 14, 10, 13, 18, 16, 21, 9, 11, 22, 15, 12, 9, 16, 20, 11, 13, 22, 17, 13, 9, 22, 16, 12, 19, 17, 14, 10, 19, 18, 11, 16, 12, 18, 13, 17, 15, 14, 15, 28]


In [20]:
np.bincount(data)[::-1]  # Counts of each value from 30 down to 0. Is the frequency distribution the same as in the slides?

array([1, 1, 1, 1, 1, 2, 2, 2, 4, 3, 4, 4, 6, 7, 8, 8, 7, 6, 4, 3, 3, 3,
       2, 1, 2, 2, 0, 1, 0, 1, 0], dtype=int64)

In [21]:
np.unique(data)  # Lists the unique values in the data, i.e. those with frequency f > 0

array([ 1,  3,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
       20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
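
To see each value next to its frequency, `np.unique` can also return the counts. The sketch below uses a short made-up sample rather than the lecture data so the result is easy to check by hand; `sample`, `values`, `counts` and `all_counts` are names introduced here.

```python
import numpy as np

sample = [15, 8, 15, 16, 8, 15]  # made-up sample, not the lecture data

# Sorted unique values together with how often each one occurs.
values, counts = np.unique(sample, return_counts=True)
for value, count in zip(values, counts):
    print(value, count)

# np.bincount counts every integer from 0 up to max(sample), including the
# integers that never occur; indexing it with `values` keeps only f > 0.
all_counts = np.bincount(sample)
assert (all_counts[values] == counts).all()
```

The same two calls should work on the full `data` list, giving a value/frequency table that can be compared with the slides.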

## The Mean
As shown above, the mean can be calculated using the `np.mean()` function.

In [25]:
np.mean([5, 8, 10, 11, 12])

9.2
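
The mean is the sum of the values divided by the number of values. As a quick check of the result above (`values` and `manual_mean` are names introduced here):

```python
import numpy as np

values = [5, 8, 10, 11, 12]

manual_mean = sum(values) / len(values)  # 46 / 5
print(manual_mean)                       # 9.2

# np.mean should agree with the hand calculation up to rounding error.
assert np.isclose(np.mean(values), manual_mean)
```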

    You can calculate the mean of multiple columns using the same function.

In [37]:
data = pd.DataFrame([[15, 8, 20, 16, 12, 18, 14, 22, 17, 5],
[19, 15, 18, 29, 6, 13, 16, 19, 10, 24]])

data # print the values

    0   1   2   3   4   5   6   7   8   9
0  15   8  20  16  12  18  14  22  17   5
1  19  15  18  29   6  13  16  19  10  24


In [40]:
np.mean(data, axis=0)  # axis=0 averages down each column

0    17.0
1    11.5
2    19.0
3    22.5
4     9.0
5    15.5
6    15.0
7    20.5
8    13.5
9    14.5
dtype: float64

    As data is a pandas data frame, you can also call its own mean() method

In [41]:
data.mean()

0    17.0
1    11.5
2    19.0
3    22.5
4     9.0
5    15.5
6    15.0
7    20.5
8    13.5
9    14.5
dtype: float64
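
By default pandas averages down each column. Passing `axis=1` averages across each row instead. Here is a sketch with the same two-row data frame; `column_means` and `row_means` are names introduced here.

```python
import pandas as pd

data = pd.DataFrame([[15, 8, 20, 16, 12, 18, 14, 22, 17, 5],
                     [19, 15, 18, 29, 6, 13, 16, 19, 10, 24]])

column_means = data.mean(axis=0)  # one mean per column (10 values)
row_means = data.mean(axis=1)     # one mean per row (2 values)

print(row_means)
```

Stating the axis explicitly makes it clear whether the result describes the columns or the rows.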