# Preparing, manipulating and visualizing data in Python
This notebook contains an introduction to using NumPy, Pandas and Matplotlib for machine learning purposes.

## Imports
Let's begin by importing the external dependencies we need.

In [1]:
import time
import numpy as np
import pandas as pd

## Timing some functions
We can write a simple decorator that measures how long other functions take to run. Placing `@timer` above a function definition is syntactic sugar for `our_function = timer(our_function)`, so every later call to `our_function` goes through the timing wrapper.

In [6]:
def timer(func):
    def do_timing(*args, **kwargs):
        start = time.time()
        func_ret = func(*args, **kwargs)
        end = time.time()
        print("{} took {:.3f}s to run".format(func.__name__, end-start))
        return func_ret
    return do_timing
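
To make the equivalence concrete, here is a minimal sketch showing that decorating with `@timer` behaves the same as rebinding the name by hand. The functions `add` and `add_manual` are hypothetical examples, and this variant of `timer` makes two optional changes that are not part of the cell above: `functools.wraps` keeps the wrapped function's name and docstring, and `time.perf_counter` is a clock intended for measuring short intervals.

```python
import functools
import time

def timer(func):
    @functools.wraps(func)  # optional: keep func.__name__ and its docstring
    def do_timing(*args, **kwargs):
        start = time.perf_counter()  # clock intended for measuring intervals
        func_ret = func(*args, **kwargs)
        end = time.perf_counter()
        print("{} took {:.3f}s to run".format(func.__name__, end - start))
        return func_ret
    return do_timing

@timer
def add(a, b):  # hypothetical example function
    return a + b

def add_manual(a, b):  # same body, wrapped by hand below
    return a + b

add_manual = timer(add_manual)  # what the @timer line does for us

result_decorated = add(1, 2)
result_manual = add_manual(1, 2)
print(result_decorated, result_manual)
```

Both calls print a timing line and return the value of the wrapped function, because `do_timing` passes the return value through.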

Using the timer decorator we just created, we can examine how efficient NumPy really is compared to vanilla Python.

In [7]:
@timer
def sum_trad(upper):
    X = range(upper)
    Y = range(upper)
    Z = []
    for i in range(len(X)):
        Z.append(X[i] + Y[i])

@timer
def sum_np(upper):
    X = np.arange(upper)
    Y = np.arange(upper)
    Z = X + Y

In [8]:
sum_trad(10000000)
sum_np(10000000)

sum_trad took 2.353s to run
sum_np took 0.214s to run
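
The timings only mean something if both versions compute the same thing. As a quick sanity check, the sketch below repeats the two approaches on a small input and compares the results. The names `Z_trad` and `Z_np` are introduced here for illustration; the timed functions above do not return their results.

```python
import numpy as np

upper = 5  # small input so the result is easy to inspect

# Plain Python: element-wise sum built one element at a time
X = range(upper)
Y = range(upper)
Z_trad = [X[i] + Y[i] for i in range(len(X))]

# NumPy: the same sum expressed as a single vectorized operation
Z_np = np.arange(upper) + np.arange(upper)

print(Z_trad)          # [0, 2, 4, 6, 8]
print(Z_np.tolist())   # [0, 2, 4, 6, 8]
```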


## Creating arrays in NumPy
Unlike Python lists, NumPy arrays hold elements of a single type (the array's `dtype`). A Python list can happily store strings and numbers together, whereas a NumPy array converts all of its elements to one common type.

In [9]:
arr = np.array([1, 2, 3, 4], float)

print(arr)
print(type(arr))

[1. 2. 3. 4.]
<class 'numpy.ndarray'>
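
To see what a single element type means in practice, the following sketch builds arrays from mixed inputs and inspects the resulting `dtype`. The variable names are illustrative. The exact integer and string widths can vary by platform and NumPy version, so only the kind of type is checked.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5])               # the int is upcast to float
mixed_text = np.array([1, "a", 2.5])             # everything becomes a string
as_objects = np.array([1, "a"], dtype=object)    # opt-in: keeps Python objects

print(ints.dtype.kind)        # 'i' for integer
print(mixed_numbers.dtype)    # float64
print(mixed_text.dtype.kind)  # 'U' for unicode string
print(as_objects.dtype)       # object
```

An `object` array keeps the original Python values but gives up most of the speed advantage, since NumPy can no longer operate on a compact block of same-typed values.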


These arrays can easily be converted into regular Python lists.

In [10]:
# arr_list = list(arr)
arr_list = arr.tolist()

print(arr_list)
print(type(arr_list))

[1.0, 2.0, 3.0, 4.0]
<class 'list'>
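
The commented-out `list(arr)` also produces a list, but the two conversions differ in what the list contains. A small sketch of the difference, with `via_list` and `via_tolist` as illustrative names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4], float)

via_list = list(arr)        # elements remain NumPy scalars
via_tolist = arr.tolist()   # elements become built-in Python floats

print(type(via_list[0]))    # <class 'numpy.float64'>
print(type(via_tolist[0]))  # <class 'float'>
```

`tolist` is usually the safer choice when the result is handed to code that expects plain Python values, for example a JSON serializer.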


Assigning an array to another variable does not create a new array, but rather creates a new reference to the same object in memory.

In [11]:
arr1 = np.array([1, 2, 3, 4])
arr2 = arr1

arr2[0] = 0

print(arr1)
print(arr2)

[0 2 3 4]
[0 2 3 4]


To create an independent copy of an array, we can use the `copy` method.

In [12]:
arr1 = np.array([1, 2, 3, 4])
arr2 = arr1.copy()

arr2[0] = 0

print(arr1)
print(arr2)

[1 2 3 4]
[0 2 3 4]
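
The difference between a second reference and a copy can also be checked directly, without modifying anything first. The sketch below uses the `is` operator and `np.shares_memory`. The names `original`, `alias` and `duplicate` are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3, 4])
alias = original              # a second name for the same array
duplicate = original.copy()   # an independent array with its own data

print(alias is original)                      # True
print(duplicate is original)                  # False
print(np.shares_memory(original, duplicate))  # False

alias[0] = 0                  # also changes original, but not duplicate
print(original)               # [0 2 3 4]
print(duplicate)              # [1 2 3 4]
```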


NumPy also provides a few convenience functions and methods that let us easily generate certain kinds of arrays and matrices. Some of these include:
- Filling an array with one given value
- Generating arrays with random data
- Generating identity matrices
- Generating arrays or matrices with all ones or all zeros
- Combining arrays vertically to create a kind of row matrix

In [33]:
print("Filling an array with one given value")
arr = np.array([1, 2, 3, 4], float)
arr.fill(1)
print(arr)

print("\nGenerating arrays with random data")
print(np.random.permutation(4))
print(np.random.normal(0, 1, 4))
print(np.random.random(4))

print("\nGenerating identity matrices")
print(np.identity(4))
print(np.eye(3, 4, 1))  # 3x4 matrix with ones on the diagonal shifted up by k=1

print("\nGenerating arrays or matrices with all ones or all zeros")
print(np.zeros([2, 3]))
print(np.ones(4))

print("\nCombining arrays vertically to create a kind of row matrix")
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(np.vstack([arr1, arr2]))

Filling an array with one given value
[1. 1. 1. 1.]

Generating arrays with random data
[3 1 0 2]
[-0.54653171 -0.98103995 -0.06623798 -0.08488481]
[0.10915467 0.52770116 0.83043229 0.38037727]

Generating identity matrices
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[[0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Generating arrays or matrices with all ones or all zeros
[[0. 0. 0.]
 [0. 0. 0.]]
[1. 1. 1. 1.]

Combining arrays vertically to create a kind of row matrix
[[1 2 3]
 [4 5 6]]


## Manipulating arrays
Getting to the core of data science (except not quite, because we still haven't gotten to Pandas yet).