## Why is Data Analysis important?

Data analysis allows you to generate new insights and identify new opportunities for growth and improved decision making.


## About this training

This training teaches participants with no prior programming or data analytics experience how to perform data manipulation, data analysis and data visualisation in Python. 

Participants will learn what data analytics can do and how to match analytical techniques to business goals.

Programming will be performed interactively in a Jupyter notebook environment. Jupyter notebooks have become very popular among data scientists. A Jupyter notebook is a document that mixes executable code, equations, visualisations, and narrative text, and allows reports to be generated dynamically. Jupyter is a free and open-source platform.

## Outline:

1. Introduction to Python;
2. Loading and manipulating data (reshaping, missing values);
3. Data visualisation;
4. Descriptive statistics (mean, standard deviation, etc.);
5. Inferential statistics (ANOVA, regression, classification).

## Learning outcomes:

On completion of this module you will be able to:

1. Write basic scripts in Python;
2. Visualise data;
3. Summarise data;
4. Draw conclusions from data;
5. Generate dynamic reports from Jupyter notebooks.


## What is Python?


Invented in the late 1980s as a teaching and scripting language,
Python has since become an essential tool for many programmers,
engineers, researchers, and data scientists across academia and
industry.

Python is a popular, open-source programming language used for both scripting applications and standalone programs. It is a general-purpose language, used for tasks ranging from simple automation to scientific computing. For example, you can use Python as a calculator.

In [5]:
2+2

4

In [6]:
x = 10
x * 2

20

In [7]:
x ** 2

100
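Beyond one-line calculations, the same building blocks combine into short scripts. The following is a minimal sketch using only built-in Python; the function name `mean` and the list `measurements` are illustrative and not part of the original material.

```python
# A small script: compute the mean of a list of numbers.
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

measurements = [2, 4, 6, 8]  # hypothetical data for illustration
result = mean(measurements)
print(result)  # 5.0
```

Defining a function once and reusing it on different lists is the typical first step from calculator-style use towards scripting.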

## Why use Python?

## Data science tools

The libraries used in this training can be installed with conda:

conda install numpy scipy pandas matplotlib scikit-learn

#### NumPy

NumPy provides an efficient way to store and manipulate multidimensional dense arrays in Python. The important features of NumPy are:

- It provides an ndarray structure, which allows efficient storage and manipulation of vectors, matrices, and higher-dimensional datasets.
- It provides a readable and efficient syntax for operating on this data, from simple element-wise arithmetic to more complicated linear algebraic operations.

In [8]:
import numpy as np
x = np.arange(1, 10)
x

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
x ** 2

array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])

NumPy arrays can be multidimensional:

In [10]:
M = x.reshape((3, 3))
M

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

A two-dimensional array is one representation of a matrix, and
NumPy knows how to efficiently do typical matrix operations. For
example, you can compute the transpose using .T :

In [11]:
M.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
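Element-wise arithmetic and linear algebra use the same array objects. The sketch below rebuilds the 3x3 matrix from above and applies a few common operations; the vector `v` is an arbitrary choice made for illustration.

```python
import numpy as np

M = np.arange(1, 10).reshape((3, 3))
v = np.array([1, 0, -1])  # arbitrary example vector

# Element-wise arithmetic: every entry is doubled
doubled = M * 2

# Matrix-vector product with the @ operator
product = M @ v
print(product)  # [-2 -2 -2]

# Column means (axis=0 aggregates down the rows)
col_means = M.mean(axis=0)
print(col_means)  # [4. 5. 6.]
```

Note that `*` multiplies element by element, while `@` performs matrix multiplication.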

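#### Pandas

Pandas builds on NumPy and provides the DataFrame, a labelled, tabular data structure with convenient tools for selecting, grouping, and summarising data.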

In [12]:
import pandas as pd
df = pd.DataFrame({'label': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'value': [1, 2, 3, 4, 5, 6]})
df

  label  value
0     A      1
1     B      2
2     C      3
3     A      4
4     B      5
5     C      6


In [13]:
df['label']

0    A
1    B
2    C
3    A
4    B
5    C
Name: label, dtype: object

In [14]:
df['label'].str.lower()

0    a
1    b
2    c
3    a
4    b
5    c
Name: label, dtype: object

In [15]:
df.groupby('label').sum()

       value
label
A          5
B          7
C          9


In [16]:
df['value'].sum()

21
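The outline also mentions missing values and descriptive statistics. As a rough sketch of how Pandas handles both, the example below uses a variant of the DataFrame above in which one value has been replaced by `NaN`; this modified data is an assumption made purely for illustration.

```python
import numpy as np
import pandas as pd

# Same layout as the DataFrame above, with one value deliberately missing
df = pd.DataFrame({'label': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'value': [1, 2, np.nan, 4, 5, 6]})

# Count the missing entries in the 'value' column
n_missing = df['value'].isna().sum()
print(n_missing)  # 1

# Replace missing entries with the column mean (NaN is skipped by default)
filled = df['value'].fillna(df['value'].mean())

# Per-group means; missing values are ignored within each group
group_means = df.groupby('label')['value'].mean()
print(group_means)
```

Filling with the mean is only one possible strategy; dropping incomplete rows with `dropna()` is another common choice.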

#### Matplotlib

Matplotlib is a powerful library for creating a wide range of plots.

In [17]:
%matplotlib notebook

In [18]:
import matplotlib.pyplot as plt

In [19]:
x = np.linspace(0, 10)
y = np.sin(x)
plt.plot(x, y);

[Figure: line plot of sin(x) for x between 0 and 10]
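Plots are usually easier to read with axis labels, a title, and a legend. The following sketch uses Matplotlib's object-oriented interface (`plt.subplots`) to draw two labelled curves; the choice of functions and label text is illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)

# Create a figure and a single set of axes
fig, ax = plt.subplots()

# Draw two curves, each with a legend label
ax.plot(x, np.sin(x), label='sin(x)')
ax.plot(x, np.cos(x), linestyle='--', label='cos(x)')

# Annotate the axes
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Sine and cosine')
ax.legend()
```

In a notebook the figure is displayed automatically; in a standalone script you would typically call `plt.show()` or `fig.savefig(...)` afterwards.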

#### SciPy

SciPy is a collection of scientific functionality that is built on
NumPy.

In [20]:
from scipy import interpolate
# choose eight points between 0 and 10
x = np.linspace(0, 10, 8)
y = np.sin(x)
# create a cubic interpolation function
func = interpolate.interp1d(x, y, kind='cubic')
# interpolate on a grid of 1,000 points
x_interp = np.linspace(0, 10, 1000)
y_interp = func(x_interp)
# plot the results
plt.figure() # new figure
plt.plot(x, y, 'o')
plt.plot(x_interp, y_interp);

[Figure: the eight sampled points of sin(x) shown as markers, with the cubic interpolation drawn as a smooth curve through them]
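As a quick sanity check on the interpolation above, one can confirm that the interpolating function reproduces the original samples and measure how far it strays from the true sine curve in between. This is a sketch under the same settings as the cell above; the variable names `x_dense` and `max_error` are introduced here for illustration.

```python
import numpy as np
from scipy import interpolate

# Same eight sample points as above
x = np.linspace(0, 10, 8)
y = np.sin(x)
func = interpolate.interp1d(x, y, kind='cubic')

# The interpolant passes through the original samples
passes_through = np.allclose(func(x), y)
print(passes_through)  # True

# Between the samples it only approximates sin(x); measure the largest gap
x_dense = np.linspace(0, 10, 1000)
max_error = np.max(np.abs(func(x_dense) - np.sin(x_dense)))
print(max_error)
```

Increasing the number of sample points should generally reduce `max_error`, at the cost of needing more data.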

#### The Jupyter notebook

The Jupyter notebook is a document format that allows executable code, formatted text, graphics, and even interactive features to be combined into a single document. This document is itself a Jupyter notebook.

The notebook is useful both as a development environment and as a means of sharing work via rich computational and data-driven narratives that mix together code, figures, data, and text.

### References

[A Whirlwind Tour of Python by Jake VanderPlas (O’Reilly). Copyright
2016 O’Reilly Media, Inc., 978-1-491-96465-1](https://jakevdp.github.io/WhirlwindTourOfPython/)

![](https://jakevdp.github.io/WhirlwindTourOfPython/figures/cover-large.gif)