---

_You are currently looking at **version 1.1** of this notebook. To download notebooks and datafiles, as well as get help on Jupyter notebooks in the Coursera platform, visit the [Jupyter Notebook FAQ](https://www.coursera.org/learn/python-data-analysis/resources/0dhYG) course resource._

---

# The Python Programming Language: Functions

In [1]:
x = 1
y = 2
x + y

3

In [2]:
x

1

Create an `add_numbers` function that takes two numbers and adds them together, so the three lines above become a single function call.

In [3]:
def add_numbers(x, y):
    return x + y

add_numbers(1, 2)

3

<br>
`add_numbers` updated to take an optional 3rd parameter. Using `print` allows printing of multiple expressions within a single cell.

In [4]:
def add_numbers(x, y, z=None):
    if z is None:
        return x + y
    else:
        return x + y + z

print(add_numbers(1, 2))
print(add_numbers(1, 2, 3))

3
6
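
<br>
The `z=None` default only covers one extra number. Below is a sketch of a common alternative that is not part of the original lesson. The helper `add_all` is hypothetical: it accepts any number of positional arguments through `*args`, which Python collects into a tuple.

```python
def add_all(*args):
    """Hypothetical helper: add up any number of positional arguments."""
    total = 0
    for value in args:  # args is a tuple holding every positional argument passed in
        total += value
    return total

print(add_all(1, 2))     # 3
print(add_all(1, 2, 3))  # 6
print(add_all())         # 0, an empty call simply returns the starting total
```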


<br>
`add_numbers` updated to take an optional flag parameter.

In [5]:
def add_numbers(x, y, z=None, flag=False):
    if flag:
        print('Flag is true!')
    if z is None:
        return x + y
    else:
        return x + y + z

print(add_numbers(1, 2, flag=True))

Flag is true!
3


<br>
Assign function `add_numbers` to variable `a`.

In [6]:
def add_numbers(x,y):
    return x+y

a = add_numbers
a(1,2)

3
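
<br>
A function is an ordinary object, so it can also be passed as an argument to another function. The sketch below uses a hypothetical helper, `apply_to_pair`, to show this.

```python
def add_numbers(x, y):
    return x + y

def apply_to_pair(func, pair):
    """Hypothetical helper: call func with the two items of pair."""
    return func(pair[0], pair[1])

a = add_numbers                    # a refers to the same function object
print(apply_to_pair(a, (1, 2)))    # 3
print(apply_to_pair(max, (1, 2)))  # 2, built-in functions can be passed too
```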

<br>
# The Python Programming Language: Types and Sequences

<br>
Use `type` to return the object's type.

In [7]:
type('This is a string')

str

In [8]:
type(None)

NoneType

In [9]:
type(1)

int

In [10]:
type(1.0)

float

In [11]:
type(add_numbers)

function

<br>
Tuples are an **immutable** data structure (cannot be altered).

In [12]:
x = (1, 'a', 2, 'b')
type(x)

tuple

<br>
Lists are a **mutable** data structure (can be altered).

In [13]:
x = [1, 'a', 2, 'b']
type(x)

list

<br>
Use `append` to append an object to a list.

In [14]:
x.append(3.3)
print(x)

[1, 'a', 2, 'b', 3.3]


<br>
This is an example of how to loop through each item in the list.

In [15]:
for item in x:
    print(item)

1
a
2
b
3.3


<br>
Or using the indexing operator:

In [16]:
i = 0
while i != len(x):
    print(x[i])
    i = i + 1

1
a
2
b
3.3


<br>
The same loop, written with the augmented assignment operator `+=`:

In [18]:
i = 0
while i != len(x):
    print(x[i])
    i += 1

1
a
2
b
3.3


<br>
Use `+` to concatenate lists.

In [19]:
[1,2] + [3,4]

[1, 2, 3, 4]

<br>
Use `*` to repeat lists.

In [20]:
[1]*3

[1, 1, 1]

<br>
Use the `in` operator to check if something is inside a list.

In [21]:
1 in [1, 2, 3]

True

<br>
Now let's look at strings. Use bracket notation to slice a string.

In [23]:
x = 'This is a string'
print(x[0]) # first character
print(x[0:1]) # first character again, this time with an explicit end index (the end index is excluded)
print(x[0:2]) # first two characters

T
T
Th


<br>
This will return the last element of the string.

In [24]:
x[-1]

'g'

<br>
This will return the slice starting from the 4th element from the end and stopping before the 2nd element from the end.

In [25]:
x[-4:-2]

'ri'

<br>
This slice starts at the beginning of the string and stops before index 3, which gives the first three characters.

In [26]:
x[:3]

'Thi'

<br>
And this is a slice starting from the 4th element of the string and going all the way to the end.

In [27]:
x[3:]

's is a string'
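
<br>
Slices also accept an optional third value, the step, written as `x[start:stop:step]`. Here is a short sketch on the same string:

```python
x = 'This is a string'

print(x[0:4])   # This  (indices 0, 1, 2, 3)
print(x[::2])   # Ti sasrn  (every second character)
print(x[::-1])  # gnirts a si sihT  (a negative step walks backwards)
```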

In [28]:
firstname = 'Christopher'
lastname = 'Brooks'

print(firstname + ' ' + lastname)
print(firstname*3)
print('Chris' in firstname)

Christopher Brooks
ChristopherChristopherChristopher
True


In [5]:
x = 'Dr. Christopher Brooks'

print(x[4:4+len('Christopher')])

Christopher


<br>
`split` returns a list of all the words in a string, or a list split on a specific character.

In [29]:
firstname = 'Christopher Arthur Hansen Brooks'.split(' ')[0] # [0] selects the first element of the list
lastname = 'Christopher Arthur Hansen Brooks'.split(' ')[-1] # [-1] selects the last element of the list
print(firstname)
print(lastname)

Christopher
Brooks
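
<br>
`str.join` is the inverse of `split`: it glues a list of strings back together with a separator. `split` also takes an optional maximum number of splits. A brief sketch:

```python
full_name = 'Christopher Arthur Hansen Brooks'

parts = full_name.split(' ')      # ['Christopher', 'Arthur', 'Hansen', 'Brooks']
middle_names = parts[1:-1]        # ['Arthur', 'Hansen']
rejoined = ' '.join(parts)        # glue the pieces back together with spaces

print(middle_names)
print(rejoined == full_name)      # True
print(full_name.split(' ', 1))    # ['Christopher', 'Arthur Hansen Brooks'], at most one split
```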


<br>
Make sure you convert objects to strings before concatenating.

In [30]:
'Chris' + 2

TypeError: Can't convert 'int' object to str implicitly

In [31]:
'Chris' + str(2)

'Chris2'

<br>
Dictionaries are collections that associate keys with values. Since Python 3.7 they preserve insertion order. In older versions, such as the one that produced the loop outputs below, their order was arbitrary.

* You retrieve a value by its key rather than by its position.

In [7]:
# create dict
x = {'Christopher Brooks': 'brooksch@umich.edu', 'Bill Gates': 'billg@microsoft.com'}

# Retrieve a value by using the indexing operator
x['Christopher Brooks'] 

'brooksch@umich.edu'

In [8]:
# add a new key with a value of None
x['Kevyn Collins-Thompson'] = None
x['Kevyn Collins-Thompson']  # evaluates to None, so the notebook displays no output
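
<br>
Indexing with a key that is not in the dictionary raises a `KeyError`. `dict.get` returns a default instead, and the `in` operator checks whether a key exists. In this sketch, `'Ada Lovelace'` is just a hypothetical missing key.

```python
x = {'Christopher Brooks': 'brooksch@umich.edu',
     'Bill Gates': 'billg@microsoft.com',
     'Kevyn Collins-Thompson': None}

print('Bill Gates' in x)                            # True: `in` tests the keys
print(x.get('Ada Lovelace'))                        # None: missing key, no error
print(x.get('Ada Lovelace', 'unknown'))             # unknown: explicit default
print(x.get('Kevyn Collins-Thompson', 'unknown'))   # None: the key exists, its value is None

try:
    x['Ada Lovelace']                               # plain indexing fails for a missing key
except KeyError as err:
    print('KeyError:', err)
```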

<br>
Iterate over all of the keys:

In [34]:
for name in x:
    print(x[name])

None
billg@microsoft.com
brooksch@umich.edu


<br>
Iterate over all of the values:

In [35]:
for email in x.values():
    print(email)

None
billg@microsoft.com
brooksch@umich.edu


<br>
Iterate over all of the items (key-value pairs) in the dictionary:

In [36]:
for name, email in x.items():
    print(name)
    print(email)

Kevyn Collins-Thompson
None
Bill Gates
billg@microsoft.com
Christopher Brooks
brooksch@umich.edu


<br>
You can **unpack** a sequence into different variables:

In [37]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu')
fname, lname, email = x

In [38]:
fname

'Christopher'

In [39]:
lname

'Brooks'

<br>
*Make sure the number of values you are unpacking matches the number of variables being assigned.*

In [40]:
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')
fname, lname, email = x

ValueError: too many values to unpack (expected 3)
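
<br>
If you only need some of the values, extended unpacking with a starred name collects the remainder into a list. This avoids the `ValueError` above. A sketch:

```python
x = ('Christopher', 'Brooks', 'brooksch@umich.edu', 'Ann Arbor')

fname, lname, *rest = x   # rest collects whatever is left over, as a list
print(fname, lname)       # Christopher Brooks
print(rest)               # ['brooksch@umich.edu', 'Ann Arbor']

fname, *_, city = x       # `_` is a conventional name for values you want to ignore
print(city)               # Ann Arbor
```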

<br>
# The Python Programming Language: More on Strings

In [41]:
print('Chris' + 2)

TypeError: Can't convert 'int' object to str implicitly

In [42]:
print('Chris' + str(2))

Chris2


<br>
Python has a built-in method for convenient string formatting: **`str.format()`**.

In [43]:
sales_record = {
'price': 3.24,
'num_items': 4,
'person': 'Chris'}

# each {} is a placeholder, filled in order by the arguments passed to .format()
sales_statement = '{} bought {} item(s) at a price of {} each for a total of {}'

# pass the dictionary values for the relevant keys, one per {}
print(sales_statement.format(sales_record['person'],
                             sales_record['num_items'],
                             sales_record['price'],
                             sales_record['num_items']*sales_record['price']))


Chris bought 4 item(s) at a price of 3.24 each for a total of 12.96
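
<br>
Placeholders can also be named, and a format spec such as `:.2f` controls how numbers are displayed. The sketch below fills named fields by unpacking the dictionary with `**`, then builds the same sentence with an f-string (Python 3.6+). The extra `total` field is our own addition, not part of `sales_record`.

```python
sales_record = {'price': 3.24, 'num_items': 4, 'person': 'Chris'}
total = sales_record['num_items'] * sales_record['price']

# named placeholders are filled from keyword arguments; ** unpacks the dict into them
statement = '{person} bought {num_items} item(s) at a price of {price} each for a total of {total:.2f}'
formatted = statement.format(total=total, **sales_record)

# the same sentence as an f-string: expressions are evaluated inline
f_version = f"{sales_record['person']} bought {sales_record['num_items']} item(s) at a price of {sales_record['price']} each for a total of {total:.2f}"

print(formatted)
print(formatted == f_version)  # True
```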


<br>
# Reading and Writing CSV files

<br>
Let's import our datafile mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

In [12]:
import csv

%precision 2

# read the file into a list of dictionaries, one dictionary per row
with open('mpg.csv') as csvfile:
    mpg = list(csv.DictReader(csvfile))

# get first 3 dictionaries in our list.
mpg[:3] 

[{'': '1',
  'class': 'compact',
  'cty': '18',
  'cyl': '4',
  'displ': '1.8',
  'drv': 'f',
  'fl': 'p',
  'hwy': '29',
  'manufacturer': 'audi',
  'model': 'a4',
  'trans': 'auto(l5)',
  'year': '1999'},
 {'': '2',
  'class': 'compact',
  'cty': '21',
  'cyl': '4',
  'displ': '1.8',
  'drv': 'f',
  'fl': 'p',
  'hwy': '29',
  'manufacturer': 'audi',
  'model': 'a4',
  'trans': 'manual(m5)',
  'year': '1999'},
 {'': '3',
  'class': 'compact',
  'cty': '20',
  'cyl': '4',
  'displ': '2',
  'drv': 'f',
  'fl': 'p',
  'hwy': '31',
  'manufacturer': 'audi',
  'model': 'a4',
  'trans': 'manual(m6)',
  'year': '2008'}]

In [14]:
type(mpg)

list

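<br>
The same file can also be loaded into a pandas DataFrame with `pd.read_csv`. The CSV's unnamed first column appears as `Unnamed: 0`, next to the DataFrame's own index.

In [10]: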
import pandas as pd
mpg2 = pd.read_csv('mpg.csv')
mpg2.head()

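|   | Unnamed: 0 | manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
| 1 | 2 | audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
| 2 | 3 | audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
| 3 | 4 | audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
| 4 | 5 | audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |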


<br>
**`csv.DictReader`** reads *each row* of our CSV file *as a dictionary*, using the header row as the keys.

`len` shows that our list consists of 234 dictionaries.

In [46]:
len(mpg)

234

<br>
`keys` gives us the column names of our csv.

In [47]:
mpg[0].keys()

dict_keys(['', 'hwy', 'trans', 'displ', 'cyl', 'model', 'fl', 'cty', 'year', 'class', 'drv', 'manufacturer'])

<br>
Find the average **cty** fuel economy across all cars. 

* All values in the dictionaries are strings, so we need to convert to float.

In [48]:
# sum up each value of city fuel economy for each dictionary in the dataset then divide by # of records
sum(float(d['cty']) for d in mpg) / len(mpg)

16.86

<br>
Similarly this is how to find the average **hwy** fuel economy across all cars.

In [50]:
# sum up each value of highway fuel economy for each dictionary in the dataset then divide by # of records
sum(float(d['hwy']) for d in mpg) / len(mpg)

23.44

<br>
Use **`set`** to return the *unique* values for the number of **cyl**inders the cars in our dataset have.

In [51]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}

<br>
Here's a more complex example: grouping the cars by number of cylinders and finding the average cty mpg for each group.

In [52]:
# initialize the results list
CtyMpgByCyl = []

# iterate over all the cylinder levels we just found
for c in cylinders:
    # initialize the per-level accumulators
    summpg = 0
    cyltypecount = 0

    # iterate over all dictionaries/rows
    for d in mpg:
        if d['cyl'] == c: # if the current row's cylinder level matches the current level
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # count the matching row
    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple (cylinder level, average cty mpg)

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
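
<br>
The loop above scans the whole dataset once per cylinder level. A common alternative, swapped in here and not taken from the lesson, is to accumulate sums and counts in a single pass with `collections.defaultdict`. The sketch runs on a small hypothetical sample shaped like the `csv.DictReader` rows, where every value is a string. With the real data you would loop over `mpg` instead.

```python
from collections import defaultdict

# hypothetical sample rows, shaped like csv.DictReader output (all values are strings)
mpg_sample = [
    {'cyl': '4', 'cty': '18'},
    {'cyl': '4', 'cty': '21'},
    {'cyl': '6', 'cty': '16'},
    {'cyl': '8', 'cty': '12'},
    {'cyl': '6', 'cty': '18'},
]

totals = defaultdict(float)  # running sum of cty mpg per cylinder level
counts = defaultdict(int)    # number of cars per cylinder level

for d in mpg_sample:         # a single pass over the rows
    totals[d['cyl']] += float(d['cty'])
    counts[d['cyl']] += 1

cty_by_cyl = sorted((cyl, totals[cyl] / counts[cyl]) for cyl in totals)
print(cty_by_cyl)  # [('4', 19.5), ('6', 17.0), ('8', 12.0)]
```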

<br>
Use **`set`** to return the *unique* values for the class types in our dataset.

In [13]:
# what are the class types in our dataset
vehicleclass = set(d['class'] for d in mpg) 
vehicleclass

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

<br>
And here's an example of how to find the *average hwy mpg for each class of vehicle* in our dataset.

In [15]:
HwyMpgByClass = []

# for each unique vehicle class
for t in vehicleclass:
    # initialize the per-class accumulators
    summpg = 0
    vclasscount = 0
    # for each dictionary/row
    for d in mpg:
        if d['class'] == t: # if the current row's class matches the class we're processing
            summpg += float(d['hwy']) # add the hwy mpg to this class's running total
            vclasscount += 1 # increment the count of vehicles in this class
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple (class, average hwy mpg)

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]

<br>
# The Python Programming Language: Dates and Times

In [16]:
import datetime as dt
import time as tm

# current time as seconds since the Unix epoch (1970-01-01 00:00 UTC)
tm.time()

1503519676.87

<br>
Convert the **timestamp** to **datetime**.

In [17]:
# convert the current timestamp (in seconds) into a datetime object
dtnow = dt.datetime.fromtimestamp(tm.time())
dtnow

datetime.datetime(2017, 8, 23, 20, 21, 47, 948398)

<br>
Handy datetime attributes:

In [18]:
# get year, month, day, etc. from a datetime object
dtnow.year, dtnow.month, dtnow.day, dtnow.hour, dtnow.minute, dtnow.second 

(2017, 8, 23, 20, 21, 47)

<br>
**`timedelta`** is a data type representing a duration, such as the difference between two dates or times.

In [20]:
# create a timedelta of 100 days
delta = dt.timedelta(days = 100) 

# add 100 days to today
dtnow + delta

datetime.datetime(2017, 12, 1, 20, 21, 47, 948398)

<br>
**`date.today`** returns the current local date, with no time component.

In [21]:
today = dt.date.today()
today

datetime.date(2017, 8, 23)

In [22]:
# the date 100 days ago
today - delta 

datetime.date(2017, 5, 15)

In [24]:
# compare dates (is today later than 100 days ago?)
today > today-delta 

True
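
<br>
Subtracting one date from another gives a `timedelta`. `strftime` and `strptime` convert between dates and strings. The sketch uses fixed dates, matching the outputs above, so that the results are reproducible.

```python
import datetime as dt

start = dt.date(2017, 8, 23)
end = dt.date(2017, 12, 1)

gap = end - start                                  # subtracting two dates gives a timedelta
print(gap.days)                                    # 100
print(start + dt.timedelta(days=100) == end)       # True
print(start.strftime('%Y-%m-%d'))                  # 2017-08-23 (date -> string)

parsed = dt.datetime.strptime('2017-08-23', '%Y-%m-%d').date()  # string -> date
print(parsed == start)                             # True
```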

<br>
# The Python Programming Language: Objects and map()

<br>
An example of a **class** in Python. Classes can have attached methods and can be instantiated as **objects**.

* Everything indented under a class definition is within the class's **scope**.
* Class names are conventionally written in CamelCase.
* Attributes don't have to be declared in advance; they can be created on an instance at any time, for example inside a method.
* Class variables are shared across all instances (e.g., the default department for every `Person` is the School of Information).
* Each method must include **self** as the first parameter of its **signature** to get access to the instance the method is being invoked on.
* Instance variables are set and read through `self` (e.g., `self.name`, `self.location`).

In [1]:
class Person:
    # default variable across all instances of Person class
    department = 'School of Information' 
    
    # define 2 methods that set the instance variables name and location
    def set_name(self, new_name):
        self.name = new_name
    def set_location(self, new_location):
        self.location = new_location

In [2]:
# instantiate the class with ()
person = Person()

# call the instance's methods
person.set_name('Christopher Brooks')
person.set_location('Ann Arbor, MI, USA')

# print the instance attributes (name, location) and the class attribute (department)
print('{} lives in {} and works in the department {}'.format(person.name, person.location, person.department))

Christopher Brooks lives in Ann Arbor, MI, USA and works in the department School of Information


Objects in Python do NOT have private/protected members. If we instantiate an object, we have access to all methods and/or attributes of that object.

There's also no need for an explicit constructor when creating objects in Python, although you can define one with the `__init__` method.
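
<br>
Here is a sketch of the same `Person` class with an `__init__` constructor. Python calls `__init__` automatically when the object is created, so the attributes can be set in one step instead of through separate setter calls. This is an illustrative variant, not the lesson's version.

```python
class Person:
    department = 'School of Information'  # class variable shared by all instances

    def __init__(self, name, location):
        # runs automatically when Person(...) is called
        self.name = name
        self.location = location

person = Person('Christopher Brooks', 'Ann Arbor, MI, USA')
print('{} lives in {} and works in the department {}'.format(
    person.name, person.location, person.department))
```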

**Functional programming** is a programming paradigm in which we *explicitly* declare all parameters that could change through the execution of a given function.

This makes functional programming *side-effect free*, because a software contract describes what can actually change when a function is called.

Python isn't a functional programming language in the *pure* sense: functions can have many side effects, and you certainly don't have to pass in all the parameters you're interested in changing.

But functional programming encourages you to think carefully about **chaining** operations together. Chaining is an underlying theme in much of data science, and in data cleaning in particular.

Functional programming methods are often used in Python, and it's not uncommon to see a function take another function as a parameter.

**`map(function, iterable, ...)`** applies `function` to every item of the iterable(s). Every argument after the function must be iterable. With several iterables, the function receives one item from each, and mapping stops when the shortest iterable is exhausted.

<br>
Here's an example of **mapping** the `min` function between two lists.

In [9]:
# create data
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

# find the min value for each index over each list via map()
# goes through each index and for the 2 values from each store, picks the cheapest
cheapest = map(min, store1, store2)
cheapest

<map at 0x4de4ba8>

`map` returns a map object, which is an example of **lazy evaluation**.

It doesn't *actually run `min` on a pair of items until you ask it for a value*.

This is a deliberate design pattern in the language, and it's common when dealing with big data. It allows very efficient memory management, even when the computation itself is complex.

Maps are **iterable**, just like lists and tuples, so we can use a `for` loop to look at all of the values in the map.

Passing around functions together with the data structures they should be applied to is a hallmark of functional programming. It is very common in data analysis and cleaning.

Now let's iterate through the map object to actually see the values.

In [10]:
for item in cheapest:
    print(item)

9.0
11.0
12.34
2.01
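
<br>
A map object is an iterator, so it can be traversed only once. In the cell above, the `for` loop has already consumed `cheapest`, and looping over it again would print nothing. A short sketch:

```python
store1 = [10.00, 11.00, 12.34, 2.34]
store2 = [9.00, 11.10, 12.34, 2.01]

cheapest = map(min, store1, store2)
first_pass = list(cheapest)    # consumes the map object
second_pass = list(cheapest)   # nothing left to produce

print(first_pass)   # [9.0, 11.0, 12.34, 2.01]
print(second_pass)  # []
```

If you need the values more than once, store them in a list first, as `first_pass` does.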


In [19]:
# list of all faculty teaching the MOOC: write a function and apply map() to the list
# to get each person's title and last name (e.g. 'Dr. Brooks')
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    # split on spaces and join the title (first item) with the last name (last item)
    return person.split(' ')[0] + ' ' + person.split(' ')[-1]

# apply split_title_and_name to each item in people and collect the results in a list
list(map(split_title_and_name, people))

['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']

In [20]:
# optional solution given
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    title = person.split()[0]
    lastname = person.split()[-1]
    return '{} {}'.format(title, lastname)

list(map(split_title_and_name, people))


['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']

<br>
# The Python Programming Language: Lambda and List Comprehensions

**lambda** is Python's way of creating **anonymous functions**. These are functions like any other, but without a name. They're intended for simple or short-lived logic that's easier to write in one line than as a named function.

Declare a lambda with the word **lambda**, followed by a list of arguments, a colon, and then a *single* expression whose value is returned when the lambda is executed.

A lambda expression evaluates to a **function reference**, which can be assigned to a name and called like any other function. For example, `my_function` below is called with three arguments.

Because its body is limited to a single expression, a lambda can't contain statements or multi-line logic. Its parameters can still have default values, e.g. `lambda a, b=1: a + b`.

Lambdas are much more limited than full function definitions, but they're useful for simple little data cleaning tasks.

<br>
Here's an example of a **lambda** that takes 3 parameters, adds the first 2, and returns that sum.

In [21]:
my_function = lambda a, b, c : a + b

In [22]:
my_function(1, 2, 3)

3
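
<br>
Two more small illustrations. First, a lambda parameter can have a default value. Second, a typical use of a lambda is as a short `key` function for `sorted`, here ordering the faculty list by last name. The name `add_with_default` is hypothetical.

```python
# lambdas may declare default parameter values, just like def functions
add_with_default = lambda a, b=10: a + b
print(add_with_default(1))     # 11
print(add_with_default(1, 2))  # 3

# a common use: a short key function passed to sorted()
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson',
          'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']
by_last_name = sorted(people, key=lambda person: person.split()[-1])
print(by_last_name)
```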

In [36]:
# convert the function into a lambda
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

# a lambda equivalent to split_title_and_name: title (first word) + ' ' + last name (last word)
split_title_and_name2 = lambda person: person.split()[0] + ' ' + person.split()[-1]

for person in people:
    print(split_title_and_name2(person))


Dr. Brooks
Dr. Collins-Thompson
Dr. Vydiswaran
Dr. Romero


In [49]:
# convert function into a lambda
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    return person.split()[0] + ' ' + person.split()[-1]

# option 1: check that the named function and an equivalent inline lambda agree
for person in people:
    print(split_title_and_name(person) == (lambda x : x.split()[0] + ' ' + x.split()[-1])(person))

True
True
True
True


In [45]:
# option 2
list(map(split_title_and_name, people)) == list(map(lambda person : person.split()[0] + ' ' + person.split()[-1],people))

True

**Sequences** are structures we can iterate over. They're often created through loops or by reading in data from a file.

Python has built-in support for creating these **collections** with a more abbreviated, one-line syntax called **list comprehensions**.

* Start the list comprehension with the value we want *in* the list, then a `for` loop, and finally any condition clauses.
* The format is much more compact, and it often runs somewhat faster than the equivalent loop that calls `append`.

Just like lambdas, list comprehensions are a condensed format that may offer readability and performance benefits. You'll often find them in data science tutorials or on Stack Overflow.

<br>
Let's iterate from 0 to 999 and return the even numbers.

In [54]:
my_list = []
for number in range(0, 1000):
    if number % 2 == 0:
        my_list.append(number)
my_list[:10]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

<br>
Now the same thing but with **list comprehension** in 1 line

In [55]:
# keep each number in range(0, 1000), i.e. 0-999, if it's even
my_list = [number for number in range(0,1000) if number % 2 == 0]
my_list[:10]

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
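
<br>
Two variations are worth knowing. An `if/else` *expression* placed before the `for` transforms every item, while a trailing `if` *filters* items. Replacing the brackets with parentheses gives a lazy generator expression, which is the form used earlier with `sum(...)` on the mpg data. A sketch:

```python
# an if/else expression before the for transforms every item
labels = ['even' if n % 2 == 0 else 'odd' for n in range(5)]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']

# a trailing if filters; the kept items can still be transformed
halves = [n // 2 for n in range(10) if n % 2 == 0]
print(halves)  # [0, 1, 2, 3, 4]

# a generator expression: no intermediate list is built
total = sum(n for n in range(0, 1000) if n % 2 == 0)
print(total)   # 249500
```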

In [72]:
# convert the function to a list comprehension
def times_tables():
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i*j)
    return lst

times_tables() == [i*j for i in range(10) for j in range(10)]

True

In [87]:
lowercase = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'

# every combination of two lowercase letters followed by two digits ('aa00' through 'zz99')
answer = [l1 + l2 + d1 + d2 for l1 in lowercase for l2 in lowercase for d1 in digits for d2 in digits]
print(answer[:15], answer[-15:])

['aa00', 'aa01', 'aa02', 'aa03', 'aa04', 'aa05', 'aa06', 'aa07', 'aa08', 'aa09', 'aa10', 'aa11', 'aa12', 'aa13', 'aa14'] ['zz85', 'zz86', 'zz87', 'zz88', 'zz89', 'zz90', 'zz91', 'zz92', 'zz93', 'zz94', 'zz95', 'zz96', 'zz97', 'zz98', 'zz99']


<br>
# The Python Programming Language: Numerical Python (NumPy)

In [88]:
import numpy as np

## Creating Arrays

Create a list and convert it to a numpy array

In [89]:
mylist = [1, 2, 3]
x = np.array(mylist)
x

array([1, 2, 3])

<br>
Or just pass in a list directly

In [90]:
y = np.array([4, 5, 6])
y

array([4, 5, 6])

<br>
Pass in a list of lists to create a **multidimensional** array (here a 2-D array, i.e. a **matrix**).

In [91]:
m = np.array([[7, 8, 9], [10, 11, 12]])
m

array([[ 7,  8,  9],
       [10, 11, 12]])

<br>
Use the `shape` attribute to find the dimensions of the array, given as (rows, columns).

In [92]:
m.shape

(2, 3)

<br>
`arange` returns evenly spaced values within a given interval.

In [93]:
# start at 0 count up by 2, stop before 30
n = np.arange(0,30,2)
n

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

<br>
`reshape` returns an array with the same data with a new shape.

In [94]:
# reshape array to be 3x5
n = n.reshape(3, 5) 
n

array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])

<br>
`linspace` returns evenly spaced numbers over a specified interval.

In [100]:
# return 9 evenly spaced values from 0 to 4
o = np.linspace(0,4,9)
o

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ])

<br>
`resize` changes the shape and size of an array **in place**, so unlike `reshape` there's no need to reassign the result.

In [101]:
# change to 3x3 array 
o.resize(3, 3)
o

array([[ 0. ,  0.5,  1. ],
       [ 1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ]])
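
<br>
`reshape` also accepts `-1` for one dimension, and NumPy infers that dimension from the array's size. Because `reshape` returns a new array object, the original keeps its shape. A shape whose size doesn't match raises a `ValueError`. A sketch:

```python
import numpy as np

n = np.arange(0, 30, 2)   # 15 values: 0, 2, ..., 28
m = n.reshape(3, -1)      # -1 lets NumPy infer the other dimension (15 / 3 = 5)
print(m.shape)            # (3, 5)
print(n.shape)            # (15,): n itself is unchanged

try:
    n.reshape(4, 4)       # 16 slots cannot hold 15 values
except ValueError as err:
    print('ValueError:', err)
```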

<br>
`ones` returns a new array of given shape and type, filled with ones.

In [102]:
np.ones((3, 2))

array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

<br>
`zeros` returns a new array of given shape and type, filled with zeros.

In [103]:
np.zeros((2, 3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

<br>
`eye` returns a 2-D array with ones on the diagonal and zeros elsewhere. A square array like this one is the **identity matrix**.

In [104]:
np.eye(3)

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

<br>
`diag` extracts a diagonal or constructs a diagonal array.

In [109]:
y = np.array([4, 5, 6])
np.diag(y)

array([[4, 0, 0],
       [0, 5, 0],
       [0, 0, 6]])

<br>
Create an array by repeating a list (or use `np.tile`).

In [110]:
np.array([1, 2, 3] * 3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])

In [112]:
l = np.array([1, 2, 3])
np.tile(l,3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])

<br>
Repeat elements of an array using `repeat`.

In [111]:
np.repeat([1, 2, 3], 3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

<br>
### Combining Arrays

In [113]:
p = np.ones([2, 3], int)
p

array([[1, 1, 1],
       [1, 1, 1]])

<br>
Use `vstack` to stack arrays in sequence vertically (row wise).

In [114]:
np.vstack([p, 2*p])

array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])

<br>
Use `hstack` to stack arrays in sequence horizontally (column wise).

In [115]:
np.hstack([p, 2*p])

array([[1, 1, 1, 2, 2, 2],
       [1, 1, 1, 2, 2, 2]])

<br>
## Operations

Use `+`, `-`, `*`, `/` and `**` to perform element wise addition, subtraction, multiplication, division and power.

In [None]:
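# use NumPy arrays: with plain lists, + would concatenate and - would raise a TypeError
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
# elementwise addition
print(x + y)  # [5 7 9]
# elementwise subtraction
print(x - y)  # [-3 -3 -3]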

In [116]:
# elementwise multiplication
print(x * y)
# elementwise division
print(x / y)

[ 4 10 18]
[ 0.25  0.4   0.5 ]


In [117]:
# elementwise power
print(x**2)

[1 4 9]


<br>
**Dot Product:**  

$ \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix} \cdot \begin{bmatrix}y_1 \\ y_2 \\ y_3\end{bmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3 $

In [118]:
x.dot(y)

32
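
<br>
As a worked check of the formula, we assume `x` and `y` hold `[1, 2, 3]` and `[4, 5, 6]`. Those are the values implied by the outputs `[ 4 10 18]` and `32` above. The sketch computes the dot product by hand, as an elementwise product followed by a sum, and with the `@` operator.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

by_hand = x[0]*y[0] + x[1]*y[1] + x[2]*y[2]   # 1*4 + 2*5 + 3*6 = 4 + 10 + 18
print(by_hand)          # 32
print((x * y).sum())    # 32: elementwise product, then sum
print(x @ y)            # 32: the @ operator (Python 3.5+) computes the same dot product
```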

In [121]:
z = np.array([y, y**2])
# number of rows of array
print(len(z),'\n\n',z) 

2 

 [[ 4  5  6]
 [16 25 36]]


<br>
Let's look at transposing arrays. Transposing permutes the dimensions of the array.

In [127]:
z = np.array([y, y**2])
print(z.shape,'\n\n',z,'\n\n',z.T.shape,'\n\n',z.T)

(2, 3) 

 [[ 4  5  6]
 [16 25 36]] 

 (3, 2) 

 [[ 4 16]
 [ 5 25]
 [ 6 36]]


<br>
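Use `.dtype` to see the data type of the elements in the array. The default integer type can be platform-dependent: it is `int32` in the output below, and `int64` on most Linux and macOS systems.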

In [128]:
z.dtype

dtype('int32')

<br>
Use `.astype` to cast to a specific type.

In [129]:
z = z.astype('f')
z.dtype

dtype('float32')

In [130]:
z

array([[  4.,   5.,   6.],
       [ 16.,  25.,  36.]], dtype=float32)

<br>
## Math Functions

Numpy has many built in math functions that can be performed on arrays.

In [131]:
a = np.array([-4, -2, 1, 3, 5])

In [132]:
a.sum()

3

In [133]:
a.max()

5

In [134]:
a.min()

-4

In [135]:
a.mean()

0.59999999999999998

In [136]:
a.std()

3.2619012860600183
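
<br>
By default `std` computes the population standard deviation, dividing the squared deviations by n (`ddof=0`). Passing `ddof=1` divides by n - 1 instead, which gives the usual sample estimate. A sketch that checks both against a hand computation:

```python
import numpy as np

a = np.array([-4, -2, 1, 3, 5])

population_std = a.std()       # divides by n (ddof=0, the default)
sample_std = a.std(ddof=1)     # divides by n - 1

# the same numbers computed by hand from the squared deviations
squared_dev = ((a - a.mean()) ** 2).sum()
print(population_std, np.sqrt(squared_dev / len(a)))
print(sample_std, np.sqrt(squared_dev / (len(a) - 1)))
```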

<br>
`argmax` and `argmin` return the index of the maximum and minimum values in the array.

In [137]:
a.argmax()

4

In [138]:
a.argmin()

0

<br>
## Indexing / Slicing

In [139]:
# squares of the integers 0 through 12
s = np.arange(13)**2
s

array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144])

<br>
Use bracket notation to get the value at a specific index. Remember that indexing starts at 0.

In [140]:
s[0], s[4], s[-1]

(0, 16, 144)

<br>
Use `:` to indicate a range: `array[start:stop]`. The element at `stop` is not included.

Leaving `start` or `stop` empty will default to the beginning/end of the array.

In [141]:
s[1:5]

array([ 1,  4,  9, 16])

<br>
Use negatives to count from the back.

In [142]:
s[-4:]

array([ 81, 100, 121, 144])

<br>
A second `:` can be used to indicate step-size. `array[start:stop:stepsize]`

Here we start at the 5th element from the end and count backwards by 2 until the beginning of the array is reached.

In [143]:
s[-5::-2]

array([64, 36, 16,  4,  0])

<br>
Let's look at a multidimensional array.

In [147]:
# the integers 0-35, resized in place into a 6x6 matrix
r = np.arange(36)
r.resize((6, 6))
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

<br>
Use bracket notation to slice: `array[row, column]`

In [148]:
# value in row 3, col 3
r[2, 2]

14

<br>
And use : to select a range of rows or columns

In [149]:
# get values in row 4 in cols 4-6
r[3, 3:6]

array([21, 22, 23])

<br>
Here we are selecting all the rows up to (and not including) row 2, and all the columns up to (and not including) the last column.

In [150]:
# rows 1-2 and every column except the last (6th)
r[:2, :-1]

array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])

<br>
This is a slice of the last row, keeping only every other column.

In [151]:
r[-1, ::2]

array([30, 32, 34])

<br>
We can also perform conditional indexing. Here we are selecting values from the array that are greater than 30. (Also see `np.where`)

In [152]:
r[r > 30]

array([31, 32, 33, 34, 35])

<br>
Here we set every value in the array that is greater than 30 to 30.

In [153]:
r[r > 30] = 30
r

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])
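
<br>
`np.where`, mentioned above, offers a non-destructive alternative. With three arguments it builds a new array and leaves the original unchanged. With one argument it returns the indices where the condition holds. A sketch on a fresh 6x6 array:

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

capped = np.where(r > 30, 30, r)   # new array: 30 where r > 30, otherwise r's own value
print(capped[-1])                  # [30 30 30 30 30 30]
print(r[-1])                       # [30 31 32 33 34 35]: r itself is untouched

rows, cols = np.where(r > 30)      # one argument: (row indices, column indices)
print(rows)  # [5 5 5 5 5]
print(cols)  # [1 2 3 4 5]
```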

<br>
## Copying Data

Be careful with copying and modifying arrays in NumPy!


`r2` is a slice of `r`. Basic slicing in NumPy returns a **view** that shares memory with the original array, not a copy.

In [154]:
r2 = r[:3,:3]
r2

array([[ 0,  1,  2],
       [ 6,  7,  8],
       [12, 13, 14]])

<br>
Set this slice's values to zero ([:] selects the entire array)

In [155]:
r2[:] = 0
r2

array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

<br>
`r` has also been changed!

In [156]:
r

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

<br>
To avoid this, use `r.copy()` to create a **copy** that will NOT affect the original array.

In [158]:
r_copy = r.copy()
r_copy

array([[ 0,  0,  0,  3,  4,  5],
       [ 0,  0,  0,  9, 10, 11],
       [ 0,  0,  0, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 30, 30, 30, 30, 30]])

<br>
Now when r_copy is modified, r will not be changed.

In [159]:
r_copy[:] = 10
print(r_copy, '\n')
print(r)

[[10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]
 [10 10 10 10 10 10]] 

[[ 0  0  0  3  4  5]
 [ 0  0  0  9 10 11]
 [ 0  0  0 15 16 17]
 [18 19 20 21 22 23]
 [24 25 26 27 28 29]
 [30 30 30 30 30 30]]
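
<br>
`np.shares_memory` makes the difference between a view and a copy explicit. The sketch below slices one view and one copy from the same array, writes to both, and checks which writes reach the original.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

view = r[:3, :3]          # basic slicing returns a view
copy = r[:3, :3].copy()   # .copy() allocates new memory

print(np.shares_memory(r, view))  # True
print(np.shares_memory(r, copy))  # False

copy[:] = 99              # writing to the copy leaves r alone
print(r[0, 0])            # 0
view[:] = -1              # writing to the view changes r
print(r[0, 0])            # -1
```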


<br>
### Iterating Over Arrays

Let's create a new 4 by 3 array of random integers from 0 to 9. Your values will differ from the output below, since no seed is set.

In [160]:
test = np.random.randint(0, 10, (4,3))
test

array([[0, 7, 7],
       [5, 1, 0],
       [7, 0, 3],
       [9, 6, 4]])
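
<br>
To make random arrays reproducible, you can create a seeded generator. The sketch assumes NumPy 1.17 or later, which provides `np.random.default_rng`. Its `integers(low, high, size)` method draws from `[low, high)`, like `randint` above. The seed `42` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)          # seeded generator: same seed, same numbers
test = rng.integers(0, 10, size=(4, 3))  # integers in [0, 10), shape (4, 3)
print(test)

# a second generator with the same seed reproduces the array exactly
again = np.random.default_rng(42).integers(0, 10, size=(4, 3))
print(np.array_equal(test, again))       # True
```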

<br>
Iterate by row:

In [161]:
for row in test:
    print(row)

[0 7 7]
[5 1 0]
[7 0 3]
[9 6 4]


<br>
Iterate by index: print the row for each row in the matrix

In [162]:
for i in range(len(test)):
    print(test[i])

[0 7 7]
[5 1 0]
[7 0 3]
[9 6 4]


<br>
**`enumerate`** wraps an iterable and yields `(index, item)` pairs.

Iterate by row and index:

In [163]:
for i, row in enumerate(test):
    print('row', i, 'is', row)

row 0 is [0 7 7]
row 1 is [5 1 0]
row 2 is [7 0 3]
row 3 is [9 6 4]


<br>
Use `zip` to iterate over multiple iterables.

In [164]:
test2 = test**2
test2

array([[ 0, 49, 49],
       [25,  1,  0],
       [49,  0,  9],
       [81, 36, 16]])

In [165]:
for i, j in zip(test, test2):
    print(i,'+',j,'=',i+j)

[0 7 7] + [ 0 49 49] = [ 0 56 56]
[5 1 0] + [25  1  0] = [30  2  0]
[7 0 3] + [49  0  9] = [56  0 12]
[9 6 4] + [81 36 16] = [90 42 20]
