# Python warm up

To refresh your knowledge in preparation for today's lecture, we'll quickly review Python's built-in, general-purpose "container" data types: **list**, **tuple**, **range**, **set** and **dict**. We'll use a number of the common *sequence operations* as we go.

## Sequence Types

First, let us recall that the most primitive form of **sequence** is the *string*. The string object is, in fact, an *immutable sequence* of Unicode characters:

In [1]:
magicWord = 'abracadabra'

# Note the escaped single quote \' in the string handed to print()
# len() returns the length of a sequence
print('Today\'s magic word "{0}" has {1} characters.'.format(magicWord, len(magicWord))) 

Today's magic word "abracadabra" has 11 characters.


Like all sequence types, strings support the element indexer [i] and slicer [i:j:k] operations:

In [2]:
magicWord[0] # Recall: indexers begin with 0. Consequently, the first element in the sequence is retrieved with [0]

'a'

In [3]:
magicWord[-4:-2] # Recall: we use negative indexes to access elements starting from the end of the sequence

'ab'

**Exercise:** Extract the word 'cadabra'.

_**Question**_: What is the *third parameter* in the slicer operation good for?

In [None]:
# What is the step parameter good for?

We use the **step** parameter to specify the *interval* of the slice operation. For example, magicWord[1:10:3] selects every third item in our string, **start**ing with the second character (index 1!) and **stop**ping before the 11th character (index 10).
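A minimal sketch of the step parameter, assuming the slice described here is [1:10:3] on the same magic word (the variable names are illustrative):

```python
magic_word = 'abracadabra'  # same value as magicWord above

# start=1, stop=10 (exclusive), step=3 -> indexes 1, 4 and 7
every_third = magic_word[1:10:3]
print(every_third)    # bca

# A negative step walks the sequence backwards
reversed_word = magic_word[::-1]
print(reversed_word)  # arbadacarba
```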

Strings support all the common sequence operations, such as x in s, len(s), s.count(x), min(s), max(s) and s.index(x), as well as the overloaded operators + and * for concatenation and repetition, respectively.
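The following sketch runs these operations on a different, hypothetical string, so the exercises on the magic word remain open:

```python
spell = 'hocus pocus'  # hypothetical example string

print('pocus' in spell)    # membership test: True
print(len(spell))          # number of characters: 11
print(spell.count('o'))    # occurrences of 'o': 2
print(repr(min(spell)))    # smallest character by code point: ' '
print(max(spell))          # largest character by code point: u
print(spell.index('p'))    # position of the first 'p': 6
print(spell + '!')         # concatenation
print('ho' * 3)            # repetition: hohoho
```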

**Exercise:** Duplicate our _magic_ word 10 times.

In [None]:
# ten times the magic!

**Exercise:** Check whether the word 'cadabra' appears in our magic word.

In [None]:
#look for cadabra in our magic word

### Tuples

**Tuples are immutable sequences** typically used to store *heterogeneous* data. Tuples are enclosed in parentheses: **( )**.

Can we turn our magicWord string into a tuple?

Yes we can! We pass it to the **tuple()** constructor.
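As a quick illustration on a hypothetical string (not the magic word), the constructor turns each character into one tuple element:

```python
letters = tuple('hocus')  # hypothetical example string

print(letters)        # ('h', 'o', 'c', 'u', 's')
print(type(letters))  # <class 'tuple'>
```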

**Exercise:** Convert our magicWord into a _tuple_

**Question:** Can we make changes to our tuple?

In [None]:
#Try to edit our Tuple

In [None]:
#Any idea on how we could expand our tuple

In [None]:
# ABBA


In [4]:
cat = ('Cosmo', 'British shorthair', 'black', 3.5)
cat

('Cosmo', 'British shorthair', 'black', 3.5)

**Note**: the *parentheses* in the printed output tell you that cat is a tuple.
Use the **type()** function to confirm what kind of object our cat is:

In [None]:
#your code below.


How would you return the *name* of our cat?

In [None]:
# your code below. Hint: index 


**Question**: what is the problem or inconvenience of using the tuple[i] operation?

There must be a better way! And there is...

In [None]:
import collections

In [None]:
# Create a cat object based on a namedtuple 'Animal' that stores the attributes Name, Type, Colour and Weight

Now, with a named tuple we can use the "dot" syntax on the object to retrieve any of its elements by name.
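A small sketch of the dot syntax, using a hypothetical Book type rather than our cat (the type and field names are assumptions for illustration):

```python
import collections

# Hypothetical named tuple type with three fields
Book = collections.namedtuple('Book', ['title', 'author', 'year'])
book = Book(title='Dune', author='Frank Herbert', year=1965)

print(book.title)  # access by name: Dune
print(book[0])     # access by index still works: Dune
```

Access by name reads better than book[0] because the reader does not need to remember the position of each field.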

What colour is Cosmo? And what type of cat is he?

In [None]:
# your code below.

We can't change our cat object because it's a tuple. We could, however, convert it into a dictionary:
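Two possible conversions, sketched on the hypothetical Book example: a named tuple offers _asdict(), while a plain tuple needs its field names supplied separately, here through zip().

```python
import collections

Book = collections.namedtuple('Book', ['title', 'author', 'year'])
book = Book('Dune', 'Frank Herbert', 1965)

# _asdict() maps each field name to its value; dict() guarantees a plain dict
book_dict = dict(book._asdict())

# A plain tuple has no field names, so we pair them up ourselves
fields = ('title', 'author', 'year')
plain = ('Dune', 'Frank Herbert', 1965)
plain_dict = dict(zip(fields, plain))

# Dictionaries are mutable, so new keys can be added afterwards
book_dict['genre'] = 'science fiction'
print(book_dict)
print(plain_dict)
```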

**Exercise:** Convert our cat into a dictionary

In [None]:
#Conversion

**Exercise:** Add an Owner and Colour to our dictionary.

In [None]:
# Add owner and colour

### Lists

**Lists are mutable sequences** typically used to store *homogeneous* data. Lists are enclosed in square brackets: **[ ]**.

You can use [] to create an empty list.

A string can also be cast to a list using the **list()** constructor:

**Exercise:** Convert our magicWord into a list

In [None]:
magicWord

In [None]:
# Convert

**Exercise:** Sort the items in the list using the **sort()** method.

**Question:** How many times does the letter a appear in our list?

Unlike tuples, which are immutable, lists can be changed in place: *lists are mutable objects*. To add elements to a list, use the **append()** or **insert()** methods.
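A brief sketch of both calls on a hypothetical list of colours:

```python
colours = ['red', 'green']  # hypothetical example list

colours.append('blue')      # append() adds at the end
colours.insert(0, 'black')  # insert() adds before the given index
print(colours)              # ['black', 'red', 'green', 'blue']
```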

**Exercise:** Add a Z to the beginning and an X to the end.

In [None]:
## magicList

**Exercise:** Given a string "hocus pocus", append each character in this string to the magicList using the iterator pattern

To remove elements from a list we can use the **del** statement with an *index*, **remove()** with a *value* or **pop()** to *remove and return* an element.
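The three removal options side by side, sketched on a hypothetical list of numbers:

```python
numbers = [10, 20, 30, 40, 50, 60]  # hypothetical example list

del numbers[0]           # by index   -> [20, 30, 40, 50, 60]
numbers.remove(40)       # by value   -> [20, 30, 50, 60]
last = numbers.pop()     # remove and return the last element (60)
first = numbers.pop(0)   # remove and return the element at index 0 (20)

print(numbers, last, first)  # [30, 50] 60 20
```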

In [None]:
#delete the last 3 items using del
# changedList = None

We can combine del with a slicer to delete a range of values.

In [None]:
#remove the b from the list

To *remove all elements* we could use del with a slicer that has no start and stop indexes, [:], or, more simply, the **clear()** method.
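A short sketch of slice deletion and clearing on hypothetical lists. Both approaches empty the list in place, so every other name bound to the same list sees the change:

```python
items = list('abcdef')  # hypothetical example list
del items[1:3]          # deletes 'b' and 'c'
print(items)            # ['a', 'd', 'e', 'f']

alias = items           # second name for the same list object
items.clear()           # empties the list in place
print(alias)            # []

other = [1, 2, 3]
del other[:]            # same effect as other.clear()
print(other)            # []
```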

### Ranges

A **range** is an *immutable ordered sequence of integers*.
To create a range object we use the **range(start, stop, step)** function, whereby the start and step parameters are optional:
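A few illustrative calls with hypothetical values. Note that the stop value is always excluded and that a range produces its numbers lazily, so we wrap it in list() to display them:

```python
up_to_five = list(range(5))          # start defaults to 0, step to 1
evens = list(range(2, 11, 2))        # explicit start, stop and step
countdown = list(range(10, 0, -3))   # a negative step counts down

print(up_to_five)   # [0, 1, 2, 3, 4]
print(evens)        # [2, 4, 6, 8, 10]
print(countdown)    # [10, 7, 4, 1]

# Ranges support the common sequence operations without building a list
steps = range(0, 30, 2)
print(len(steps), 8 in steps, steps[3])  # 15 True 6
```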

Tip: We can use a **list comprehension** to iterate over the range sequence, additionally defining a predicate to select only even-numbered elements.
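The general pattern is [expression for item in sequence if predicate]. The sketch below uses a different predicate (multiples of three) so that the even-number exercise stays open:

```python
# Squares of all multiples of three below ten
squares = [n * n for n in range(10) if n % 3 == 0]
print(squares)  # [0, 9, 36, 81]
```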

In [None]:
#print every even number in the list
# HINT: FOR IN?

## Mapping Types

### Dictionaries

A **dictionary** is a *mutable table which maps hashable keys to arbitrary values*. Since Python 3.7, dictionaries preserve the insertion order of their keys.

**dict.keys()** returns a live dictionary view object of the dictionary's keys, **dict.values()** returns a view of the values and **dict.items()** returns a view of (key, value) pairs, each a 2-tuple that can be unpacked into k and v elements.

The key:value pairs of a dict object are enclosed in curly braces, **{ }**.
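A minimal sketch of the view objects on a hypothetical dictionary of capitals. The views are "live": they reflect changes made to the dictionary after the view was created.

```python
capitals = {'Austria': 'Vienna', 'Japan': 'Tokyo'}  # hypothetical data

countries = capitals.keys()    # live view of the keys
cities = capitals.values()     # live view of the values

capitals['Bhutan'] = 'Thimphu'  # the existing views pick up the new entry
print(list(countries))  # ['Austria', 'Japan', 'Bhutan']
print(list(cities))     # ['Vienna', 'Tokyo', 'Thimphu']

# items() yields 2-tuples that unpack into key and value
for k, v in capitals.items():
    print(k, '->', v)
```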

Let us take the following dictionary containing the favourite sports of each country:

In [None]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Ski': 'Austria',
          'Drinking Beer': 'Austria',
          'Taekwondo': 'South Korea'}

Display only the sports in the dataset

In [None]:
# Hint. Ask yourself, are the sports used as the key or value in the dictionary?

Display only countries in the dataset

In [None]:
# Hint. Ask yourself, are the countries used as the key or value in the dictionary?

Again, and this time, please remove the duplicates.

In [None]:
#Hint. What convenient container type swallows duplicates to return only distinct values?

Show us which country loves to play Golf:

What are the favourite sports of Austria?

In [None]:
# Hint: define a list comprehension using the dictionary view object returned by items() and filter on the v elements

It appears that we have an error in our data set. Could you please clean the data?

It seems like we forgot to add some data to our set. Would you please add the following sport: country pairs to the dictionary: Football: USA, Cricket: India, Baseball: Venezuela.

Other sports we need to add.

In [None]:
other_sports = {'Soccer': 'Spain',
          'Golf': 'Scotland',
          'Baseball': 'USA',
          'Ski': 'Canada'}

Oops, some of these sports already appear in our dictionary, so it seems we need a different data structure in order to combine the two. Any ideas?

In [None]:
# Combine the DataSet 

## Lambda

You may have seen the keyword lambda appear in this week's content, and you'll certainly see it more often as you spend more time with Python and data science. Lambdas are Python's way of creating anonymous functions. These are the same as other functions, but they have no name. The intent is that they're simple or short-lived, and that it's easier to write out the function in one line than to go to the trouble of creating a named function.

The lambda syntax is fairly simple. But it might take a bit of time to get used to. 

In [None]:
# You declare a lambda with the keyword lambda, followed by a list of arguments,
# a colon and then a single expression. This is key:
# there's only one expression to be evaluated in a lambda.
# The value of that expression is returned when the lambda is executed.
# The lambda expression itself evaluates to a function reference.
# So in this case, you would call my_function and pass in three arguments
# (note that c is accepted but never used).
my_function = lambda a, b, c: a + b

In [None]:
my_function(1, 2, 3)

Note that you can't have complex logic inside the lambda itself because you're limited to a single expression. Lambda parameters can, however, have default values, just like those of a named function.
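Lambdas are most often passed directly to another function instead of being given a name. A short sketch with hypothetical data, using a lambda as the sort key:

```python
# A lambda bound to a name behaves like any other function
add = lambda a, b: a + b
print(add(2, 3))  # 5

# More commonly, the lambda is handed straight to another function
words = ['tuple', 'set', 'dictionary', 'list']
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['set', 'list', 'tuple', 'dictionary']
```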

Convert this function to a lambda:

In [None]:
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    return person.split()[0] + ' ' + person.split()[-1]

## List comprehension

Redefine our times_tables function as a list comprehension:

In [None]:
def times_tables():
    lst = []
    for i in range(10):
        for j in range(10):
            lst.append(i*j)
    return lst

times_tables() == [???]

Here's a harder question which brings a few things together.

Many organizations have user ids which are constrained in some way. Imagine you work at an internet service provider and the user ids are all two letters followed by two numbers (e.g. aa49). Your task at such an organization might be to hold a record on the billing activity for each possible user.

Write an initialisation line as a single list comprehension which creates a list of all possible user ids. Assume the letters are all lower case.

In [None]:
lowercase = 'abcdefghijklmnopqrstuvwxyz'
digits = '0123456789'

answer = [???]
correct_answer == answer

## Numpy

Numpy is a package widely used in the data science community which lets us work efficiently with arrays and matrices in Python. 

In [None]:
#First, let's import Numpy as np. 

Now let's make our first array. We can start by creating a list and converting it to an array. 

In [None]:
#Create a python list of numbers

With **np.array()** you can convert a list to a numpy array

In [None]:
#Convert your list to a numpy array

We can do it more succinctly by passing the list directly.

In [None]:
# pass list directly to the np.array method

Now let's make multidimensional arrays (matrices) by passing in a list of lists. If we pass in two lists with three elements each, we get a two-by-three array.

In [None]:
# np.array with 2 lists

We can check the dimensions by using the **shape** attribute. 

In [None]:
#check with shape

For the **arange** function, we pass in a start, a stop, and a step size, and it returns evenly spaced values within a given interval. 

In [None]:
n = np.arange(0,30,2)
n

So suppose we wanted to convert this array of numbers to a three by five array. We can use reshape to do that. 

In [None]:
#reshape (3,5)

Use **resize** to change the shape and size of our array in place.
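The difference between the two methods, sketched on hypothetical arrays (assuming NumPy is installed): reshape() returns a new array object and leaves the original's shape untouched, while the resize() method changes the array itself.

```python
import numpy as np

a = np.arange(12)
b = a.reshape(3, 4)      # returns a reshaped array; a keeps its shape
print(a.shape, b.shape)  # (12,) (3, 4)

c = np.arange(12)
c.resize((2, 6))         # modifies c in place and returns None
print(c.shape)           # (2, 6)
```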

In [None]:
#resize

Use **ones()** to return an array of ones, **zeros()** for an array of zeros and **eye()** for an identity matrix.
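A sketch of the three constructors with hypothetical shapes (assuming NumPy is installed). ones() and zeros() take the shape as a tuple, while eye() takes the number of rows:

```python
import numpy as np

ones = np.ones((2, 3))               # 2x3 array of 1.0
zeros = np.zeros((2, 3), dtype=int)  # 2x3 array of integer zeros
identity = np.eye(3)                 # 3x3 identity matrix

print(ones)
print(zeros)
print(identity)
```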

In [None]:
# call ones

You can use the times operator **\*** on a Python list to replicate its items, or use **repeat()** to repeat each element of an array.
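The behaviour of the operator depends on the type it is applied to, as this sketch with hypothetical values shows (assuming NumPy is installed). np.tile is included for comparison only:

```python
import numpy as np

replicated = [1, 2, 3] * 2          # list: the whole list is replicated
scaled = np.array([1, 2, 3]) * 2    # array: elementwise multiplication
repeated = np.repeat([1, 2, 3], 2)  # each element is repeated in turn
tiled = np.tile([1, 2, 3], 2)       # the whole sequence is repeated

print(replicated)  # [1, 2, 3, 1, 2, 3]
print(scaled)      # [2 4 6]
print(repeated)    # [1 1 2 2 3 3]
print(tiled)       # [1 2 3 1 2 3]
```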

In [None]:
#Replicate using *


### Operations

Performing elementwise addition, subtraction, multiplication, and division is straightforward, as is raising all the numbers of an array to a power. 
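A minimal sketch of the elementwise operators on two small hypothetical arrays (assuming NumPy is installed):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

total = a + b     # elementwise addition
product = a * b   # elementwise multiplication, not a dot product
quotient = b / a  # true division always yields floats
squares = a ** 2  # every element raised to the power of two

print(total)     # [5 7 9]
print(product)   # [ 4 10 18]
print(quotient)  # [4.  2.5 2. ]
print(squares)   # [1 4 9]
```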

In [None]:
x = np.array([[1, 2, 3] * 2, [4, 5, 6] * 2] * 2)
#try out different operations

For those familiar with linear algebra, the dot product can be computed using the **dot()** function. We can also take the transpose of an array using the **T** attribute, which swaps the rows and columns.
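A sketch on a hypothetical 2x3 matrix (assuming NumPy is installed). Multiplying the matrix by its own transpose gives a 2x2 result:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

transposed = m.T            # shape (3, 2): rows and columns swapped
gram = m.dot(transposed)    # (2x3) dot (3x2) -> 2x2 matrix

print(transposed.shape)  # (3, 2)
print(gram)
# [[14 32]
#  [32 77]]
```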

In [None]:
# calculate a dot product and transpose

Numpy also has many useful math functions that we can use. Let's look at a few commonly used ones. 

In [None]:
#Create an array of random values using numpy.random.rand()

We can look at the sum of the values in the array, the maximum and minimum, or the mean and standard deviation.

In [None]:
#try them out

To find the index of a maximum or minimum value, we can use argmax and argmin. 
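The sketch below uses a small fixed array instead of random values so that the results are reproducible (assuming NumPy is installed):

```python
import numpy as np

values = np.array([4, 9, 1, 6])  # hypothetical fixed values

print(values.sum())     # 20
print(values.max())     # 9
print(values.min())     # 1
print(values.mean())    # 5.0
print(values.std())     # population standard deviation

print(values.argmax())  # index of the maximum: 1
print(values.argmin())  # index of the minimum: 2
```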


In [None]:
# try argmax and argmin

#### Indexing and slicing just like with tuples and lists

In [None]:
r = np.arange(36)
r.resize([6,6])
print(r)

Get the value of the 2nd row and 2nd column

Now let's use colon notation to get a slice of the third row and columns three to six. We can also do something like get the first two rows and all of the columns except the last. 
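The same notation, sketched on a smaller hypothetical 4x4 grid so that the questions about r stay open (assuming NumPy is installed). A two-dimensional array takes one index or slice per axis, separated by a comma:

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)  # hypothetical 4x4 array

single = grid[1, 1]      # row index 1, column index 1
row_part = grid[2, 1:4]  # row index 2, columns 1 to 3
block = grid[:2, :-1]    # first two rows, all columns except the last

print(single)    # 5
print(row_part)  # [ 9 10 11]
print(block)
# [[0 1 2]
#  [4 5 6]]
```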

Try out the '<' or '>' operator and figure out what happens
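A comparison on an array is evaluated element by element and yields an array of booleans, which can in turn be used as an index. A sketch on a hypothetical one-dimensional array (assuming NumPy is installed):

```python
import numpy as np

values = np.arange(10)  # hypothetical example array

mask = values > 6       # boolean array, True where the condition holds
print(mask)

selected = values[mask]  # keeps only the elements where mask is True
print(selected)          # [7 8 9]

values[values > 6] = 0   # assignment through a boolean index
print(values)            # [0 1 2 3 4 5 6 0 0 0]
```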

In [None]:
# Reassign any value bigger than 30