# NumPy basics

Q: What is NumPy?

A: NumPy is an open-source (OSS) library for numerical computing and a core part of the Python ecosystem for data science and machine learning. NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.
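One way to see these terms in practice is to inspect an array's attributes. The sketch below uses a small made-up 2x3 array; `ndim`, `shape` and `dtype` are standard `ndarray` attributes. The exact integer dtype depends on the platform, so it is not shown as an expected value.

```python
import numpy as np

# a 2D array: 2 rows (axis 0) and 3 columns (axis 1)
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)   # 2, the number of axes
print(a.shape)  # (2, 3), the length along each axis
print(a.dtype)  # the single element type shared by every entry
print(a[1, 2])  # 6, indexed by a tuple of non-negative integers (row 1, column 2)
```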

# Importing NumPy into your notebook or Python script

It is a common convention to import the library under an alias; the usual alias is `np`.

In [2]:
# convention to import it as np
import numpy as np

To create a NumPy array from a standard Python list:

- Declare the Python list.
- Pass the list to `np.array(list_name)` and assign the result to a new variable.

In [3]:
# convert a normal python list to a numpy array
mylist = [1,2,3,4,5]
myarr = np.array(mylist)

# type checks output
print(type(mylist))
print(type(myarr))

<class 'list'>
<class 'numpy.ndarray'>
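Because an array is homogeneous, converting a list that mixes integers and floats gives every element one common type. The sketch below shows that upcasting and the standard `dtype` argument of `np.array`; the example lists are made up for illustration.

```python
import numpy as np

# mixing ints and a float: NumPy picks one common type for every element
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# the dtype can also be requested explicitly during conversion
floats = np.array([1, 2, 3], dtype=float)
print(floats)       # [1. 2. 3.]
```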


# Array generation helper methods

NumPy has several helper functions for generating arrays.

One such function is `arange`, which can be read as _a range_: it works like Python's built-in `range` function but returns an array. `arange` generates arrays quickly and concisely, without Python loops or even the more Pythonic list comprehensions.

In [16]:
# generate a list with a loop
looplist = []
for i in range(10):
    if i % 2 == 0:
        looplist.append(i)
        
print(looplist)

# This takes several lines of code to create a simple
# list of values. It works, but it is noisy to read.

[0, 2, 4, 6, 8]


In [10]:
# generate a list of the even numbers from 0 to 9
# using a python list comprehension.
numslist = [x for x in range(10) if x % 2 == 0]
print("{} has a type of {}".format(numslist, type(numslist)))

# the comprehension produces the values, but we still
# have to convert the list to an np.array
numslist = np.array(numslist)
print("Apply conversion")
print("{} has a type of {}".format(numslist, type(numslist)))


# the numpy version: one line of code. Arguments are start (inclusive), stop (exclusive), step
nplist = np.arange(0,10,2)
print("{} has a type of {}".format(nplist, type(nplist)))

# some additional examples to aid understanding
# np.arange(0,22,3) - 0,3,6,9,12,15,18,21
# np.arange(0,30,5) - 0,5,10,15,20,25
# np.arange(0,100,10) - 0,10,20,30,40,50,60,70,80,90

[0, 2, 4, 6, 8] has a type of <class 'list'>
Apply conversion
[0 2 4 6 8] has a type of <class 'numpy.ndarray'>
[0 2 4 6 8] has a type of <class 'numpy.ndarray'>
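The stop value of `arange` is always excluded, which is easy to check directly. The sketch below prints the first example from the comments above, plus two other standard forms: a single argument (treated as the stop value) and a negative step that counts down.

```python
import numpy as np

# the stop value is excluded, so 22 itself never appears
by_threes = np.arange(0, 22, 3)
print(by_threes)    # [ 0  3  6  9 12 15 18 21]

# a single argument is the stop value; start defaults to 0 and step to 1
first_five = np.arange(5)
print(first_five)   # [0 1 2 3 4]

# a negative step counts down; the stop value is still excluded
countdown = np.arange(10, 0, -2)
print(countdown)    # [10  8  6  4  2]
```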


# Data array shapes 

Two common tasks in NumPy are generating multidimensional arrays filled with zeros or with ones. NumPy provides two functions for this:
- `np.zeros`
- `np.ones`

In [4]:
# the shape tuple is given in the order
# (rows, columns)
np.zeros(shape=(5,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [12]:
# similarly, np.ones creates an array of the given shape
# filled with 1.0

vals = np.ones(shape=(5,5))
vals.shape  # (5, 5); not displayed, since a cell shows only its last expression
vals

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [14]:
# now suppose we want that 5 by 5 two-dimensional array
# filled with the number 42: multiplying by 42 scales
# every element, so each 1.0 becomes 42.0
vals = vals*42

vals

array([[42., 42., 42., 42., 42.],
       [42., 42., 42., 42., 42.],
       [42., 42., 42., 42., 42.],
       [42., 42., 42., 42., 42.],
       [42., 42., 42., 42., 42.]])
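NumPy also offers `np.full`, which builds a constant-filled array in one step from a shape and a fill value. The sketch below compares it with the multiply approach; one difference worth noting is the dtype, since `np.ones` defaults to `float64` while `np.full` infers the type from the fill value (an integer here, whose exact width depends on the platform).

```python
import numpy as np

# np.full builds the array directly from a shape and a fill value
vals = np.full(shape=(5, 5), fill_value=42)
print(vals.dtype)  # an integer type, because 42 is an int

# the multiply approach yields floats, since np.ones defaults to float64
vals_float = np.ones(shape=(5, 5)) * 42
print(vals_float.dtype)  # float64

# the values are equal even though the dtypes differ
print(np.array_equal(vals, vals_float))  # True
```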

In [5]:
# control random numbers by using a seed
np.random.seed(101)
arr = np.random.randint(0,100,10)
arr

array([95, 11, 81, 70, 63, 87, 75,  9, 77, 40])

In [6]:
# a second call continues the seeded sequence, so the values differ from arr
arr2 = np.random.randint(0,100, 10)
arr2

array([ 4, 63, 40, 60, 92, 64,  5, 12, 93, 40])
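Seeding makes results reproducible: setting the same seed again restarts the sequence. The sketch below checks that, and also shows `np.random.default_rng`, the seeded `Generator` object that newer NumPy code (1.17 and later) often uses instead of global state. Note that a `Generator` draws from a different algorithm, so its numbers will not match the `np.random.seed(101)` values shown above.

```python
import numpy as np

# re-seeding restarts the sequence, so the same draws come back
np.random.seed(101)
first = np.random.randint(0, 100, 10)
np.random.seed(101)
again = np.random.randint(0, 100, 10)
print(np.array_equal(first, again))  # True

# a seeded Generator keeps its own state instead of using the global one
rng = np.random.default_rng(101)
draws = rng.integers(0, 100, 10)  # 10 integers in [0, 100)
print(draws)
```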

In [7]:
minval = arr.min()
minindex = arr.argmin()
maxval = arr.max()
maxindex = arr.argmax()

print("min is {}: at index {} - max is {}: at index {}".format(minval, minindex, maxval, maxindex))

min is 9: at index 7 - max is 95: at index 0
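On a multidimensional array, `min`, `max`, `argmin` and `argmax` also accept an `axis` argument, so they reduce along one axis instead of over every element. The sketch below reuses the ten values of `arr` arranged as 2 rows by 5 columns.

```python
import numpy as np

# the same values as arr above, arranged as 2 rows x 5 columns
grid = np.array([[95, 11, 81, 70, 63],
                 [87, 75,  9, 77, 40]])

print(grid.max())           # 95, over all elements
print(grid.max(axis=0))     # [95 75 81 77 63], one result per column
print(grid.min(axis=1))     # [11  9], one result per row
print(grid.argmin(axis=1))  # [1 2], column index of each row's minimum
```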


In [8]:
arr.reshape((2,5))

array([[95, 11, 81, 70, 63],
       [87, 75,  9, 77, 40]])
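`reshape` only rearranges existing elements, so the new shape must hold exactly the same number of elements. Passing `-1` for one dimension lets NumPy infer it from the total size. The sketch below uses a fresh `np.arange(10)` so it runs on its own.

```python
import numpy as np

arr = np.arange(10)

# -1 asks NumPy to infer that dimension from the total size (10 / 2 = 5)
print(arr.reshape(2, -1).shape)  # (2, 5)

# reshape returns a new array object; arr itself keeps its 1D shape
print(arr.shape)  # (10,)

# 3 x 4 = 12 elements cannot hold 10 values, so NumPy raises ValueError
try:
    arr.reshape(3, 4)
except ValueError as err:
    print("cannot reshape:", err)
```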

# Slicing data arrays

Slicing in Python means taking elements from one index up to, but not including, another. Instead of a single index we pass a slice, `[start:end]`, optionally with a step, `[start:end:step]`.

Some rules around slicing:
- If `start` is omitted, it defaults to 0.
- If `end` is omitted, it defaults to the length of the array in that dimension.
- If `step` is omitted, it defaults to 1.
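The rules above are easiest to check on a one-dimensional array before moving to a grid. The sketch below applies each default in turn; the last line uses a negative step, which walks the array backwards.

```python
import numpy as np

a = np.arange(10)   # [0 1 2 3 4 5 6 7 8 9]

print(a[2:5])   # [2 3 4], end index 5 is excluded
print(a[:3])    # [0 1 2], start defaults to 0
print(a[7:])    # [7 8 9], end defaults to the length (10)
print(a[::3])   # [0 3 6 9], step of 3 over the whole array
print(a[::-1])  # [9 8 7 6 5 4 3 2 1 0], a negative step walks backwards
```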

In [9]:
mat = np.arange(0,100).reshape(10,10)

In [10]:
mat

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])

In [11]:
# the 10x10 grid makes 2D indexing easy to follow:
# mat[row, column], so row 4, column 5 holds 45
mat[4,5]

45

In [12]:
# select column 2 from every row (`:` means all rows),
# then reshape the resulting 1D array into a
# 10x1 column
mat[:, 2].reshape(10,1)

array([[ 2],
       [12],
       [22],
       [32],
       [42],
       [52],
       [62],
       [72],
       [82],
       [92]])

In [13]:
# get the cells 65-69 inclusive
mat[6, 5:]

array([65, 66, 67, 68, 69])

In [14]:
# get the cells 38, 48, 58, 68: rows 3 to 6
# (end index 7 is excluded), column 8
mat[3:7, 8]

array([38, 48, 58, 68])
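One consequence worth knowing: a basic slice like the ones above is a view that shares memory with the original array, so writing to the slice changes the original. The sketch below rebuilds `mat` so it runs on its own and shows `.copy()` as the standard way to get an independent array.

```python
import numpy as np

mat = np.arange(0, 100).reshape(10, 10)

# a basic slice is a view: it shares memory with mat
row_part = mat[6, 5:]
row_part[0] = -1
print(mat[6, 5])  # -1, the original array changed too

# call .copy() when an independent array is needed
safe = mat[3:7, 8].copy()
safe[0] = 0
print(mat[3, 8])  # 38, unchanged
```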