In [1]:
from __future__ import division, unicode_literals, print_function
import unittest

# Data structures

So far, we have mostly dealt with variables that are organized in an *ad hoc* way. Most data, however, is not organized as single variables. Data has relationships: it has a *structure* that is often as meaningful as the individual data points themselves.

As an example, imagine if characters were always stored in individual variables, and there was no concept of a "string." We would lose a lot of the common grammar that we have for dealing with string data (replacements, counting, searching for substrings, etc.). This would be incredibly painful. While the data structures that we talk about here are a little more abstract than strings, you will quickly find that they are as indispensable as strings.

Python offers four major built-in ways of organizing data, all of which we will talk about here. To list them briefly, they are:

0. Lists: Ordered, mutable collections
1. Tuples: Ordered, immutable collections
2. Dictionaries: Associated pairs
3. Sets: Unordered collections

We'll go through the first two in this lesson, and the last two in the next lesson.

# Lists

Lists are ordered collections of items. They model sequential data (for example, data observations over time, or the collection of lines in a file). By items, we mean any object in Python (numbers, strings, booleans, even lists!). The main way that people construct a list in Python is to surround the data with square brackets.

In [6]:
list_numbers = [0, 1, 2, 3, 4]

# Lists can extend over multiple lines
list_strings = ['Now', 'this', 'is', 'a', 'story', 
                'all', 'about', 'how', 'my', 'life', 
                'got', 'flipped', 'turned', 'upside', 'down']

# Any type of data can be stored in lists, including lists
list_mixed = [1, 'apple', [1,2,3], 4.567]

There are also many functions that return lists. One such function is the string method .split(), which "splits" a string into a list of "words" based on a separating character. If no separator is given, it splits on whitespace, which is what separates words in a normal sentence.

In [39]:
"This is a sentence to be split into words".split()

['This', 'is', 'a', 'sentence', 'to', 'be', 'split', 'into', 'words']
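As a small sketch of splitting on a different separating character, the example below passes a comma to .split(). The variable names and the sample string are made up for illustration.

```python
# A made-up line of comma-separated values
csv_line = "apple,banana,cherry"

# Passing "," tells split() to break the string at every comma
fields = csv_line.split(",")

print(fields)
```

Each piece between the commas becomes one item in the resulting list.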

## List slicing
Like strings, lists can be indexed and sliced. The syntax is identical to that of strings!

In [30]:
list_strings[5]

'all'

In [8]:
list_numbers[0:2]

[0, 1]

In [9]:
list_numbers[::2]

[0, 2, 4]

In [10]:
list_strings[::-1]

['down',
 'upside',
 'turned',
 'flipped',
 'got',
 'life',
 'my',
 'how',
 'about',
 'all',
 'story',
 'a',
 'is',
 'this',
 'Now']
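Since the syntax matches strings, negative indices should work on lists as well, counting from the end. A minimal sketch, using a made-up list called letters:

```python
letters = ['a', 'b', 'c', 'd', 'e']

last_item = letters[-1]    # The last item
last_two = letters[-2:]    # The final two items, as a new list
middle = letters[1:-1]     # Everything except the first and last items

print(last_item)
print(last_two)
print(middle)
```

Note that indexing with a single number gives back one item, while slicing always gives back a new list.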

## List methods

Lists are *mutable*, which means that they can be altered after they are created. For example:

In [11]:
list_strings[9] = 'cat'

In [12]:
list_strings

['Now',
 'this',
 'is',
 'a',
 'story',
 'all',
 'about',
 'how',
 'my',
 'cat',
 'got',
 'flipped',
 'turned',
 'upside',
 'down']

Because lists are mutable, they have a variety of methods allowing mutation. Let's look at a few useful ones.

Probably the most important method for a list is .append(item), which grows the list by adding an item to the end.

In [18]:
list_numbers.append(10)

In [14]:
list_numbers

[0, 1, 2, 3, 4, 10]

This allows you to grow a list bit by bit.

Some other examples of methods are:

In [15]:
# Returns the index of the first matching item in the list.
# Unlike string.find(), it raises a ValueError (instead of returning -1) if the item is not found.
list_numbers.index(10)

5
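Because .index() raises a ValueError when the item is missing, it can help to check for membership first. The sketch below wraps that check in a hypothetical helper, find_index, which is not a built-in; it mimics the -1 convention of string.find().

```python
def find_index(items, target):
    """Return the index of target in items, or -1 if it is absent.

    This is an illustrative helper, not part of Python itself.
    """
    if target in items:   # 'in' checks whether the list contains target
        return items.index(target)
    return -1

numbers = [0, 1, 2, 3, 4, 10]
found = find_index(numbers, 10)
missing = find_index(numbers, 99)

print(found)
print(missing)
```

The membership check scans the list once and .index() scans it again, which is fine for short lists like the ones in this lesson.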

In [19]:
# Removes an item by value
list_numbers.remove(10)

In [20]:
list_numbers

[0, 1, 2, 3, 4]

In [22]:
# Exactly the same as for strings. Shows you how many times an item occurs in a list
list_numbers.count(4)

1

## Lists are shared in memory

An important "gotcha" with lists is that assigning a list to another variable does not copy it. Lists can be long and take up a lot of space in memory, so Python does not copy them implicitly. Instead, the second variable just becomes a "link" (a reference) to the original list.

The effect of this is that modifying the list through the second variable also modifies the original list.

In [23]:
list_copy = list_numbers

In [25]:
list_numbers

[0, 1, 2, 3, 4]

In [28]:
list_copy.append(500) # Also affects list_numbers! (500 shows up twice below because this cell was run twice.)

In [29]:
list_numbers

[0, 1, 2, 3, 4, 500, 500]
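One way to check whether two variables are links to the same list is the `is` operator, which compares identity rather than contents. A minimal sketch with made-up variable names:

```python
original = [0, 1, 2]
alias = original           # No copy is made; both names refer to one list

alias.append(3)            # Mutating through one name...

same_object = alias is original
print(same_object)         # True: both names point at the same list
print(original)            # ...so the change is visible through the other name
```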

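If you want to make a real copy of a list, the most explicit way to do so is to use the copy module. A quick and dirty alternative is to take a full slice of the list, as below. Note that a full slice makes a *shallow* copy: the new list is independent of the original, but any lists nested inside it are still shared.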

In [44]:
real_copy = list_numbers[:]

In [45]:
real_copy.append(4567)

In [47]:
print("Original list: {}".format(list_numbers))
print("True copy: {}".format(real_copy))

Original list: [0, 1, 2, 3, 4, 500, 500]
True copy: [0, 1, 2, 3, 4, 500, 500, 4567]
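For comparison, here is a brief sketch of the copy module mentioned above, using a made-up nested list. copy.copy() makes a shallow copy (much like a full slice), while copy.deepcopy() also copies any lists nested inside.

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)       # New outer list, but inner lists are shared
deep = copy.deepcopy(nested)      # New outer list and new inner lists

nested[0].append(99)              # Mutate one of the inner lists

print(shallow[0])   # The shallow copy sees the change to the shared inner list
print(deep[0])      # The deep copy does not
```

For flat lists of numbers or strings, like list_numbers, a shallow copy is all that is needed.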


# Tuples

Tuples are somewhat similar to lists. They are created by surrounding comma-separated values with parentheses, like so:

```python
number_tuple = (1,2,3)
```

The main difference is that tuples are *immutable*, meaning that once they're created they cannot be changed. Tuples are often used in Python to represent lists where the structure of the list is fixed (i.e. where each entry has a specific meaning).

For example, colors are represented on computers as three integers representing the red, green and blue channels. In this case, the structure of the list is fixed. We will never have another color channel, and we don't want the order of the channels being mixed up. We would represent each pixel then as a tuple of (r, g, b) values.

In [56]:
color_1 = (50, 255, 127) # Green
color_2 = (127, 127, 127) # Grey

In [60]:
# Tuples are immutable: assignment causes an error

color_1[0] = 25

TypeError: 'tuple' object does not support item assignment

## Tuple indexing

Tuples can be indexed and sliced much like lists and strings. Slicing a tuple gives back a new tuple.

In [59]:
print(color_2[0])
print(color_1[0:2])

127
(50, 255)


## Multiple assignment (destructuring)

Tuples can be used to assign multiple variables in one line. For example, if we wanted to set the variables a and b to 1 and 2, respectively, we could write

In [63]:
a = 1
b = 2

But it is more concise to write this as:

In [64]:
a, b = 1, 2

This also allows for easy swapping of variables:

In [65]:
a, b = b, a
print("a = {}, b = {}".format(a, b))

a = 2, b = 1


Finally, we can easily use this to extract the individual variables from a tuple into their own variables. This is sometimes called "destructuring" in programming.

In [67]:
r, g, b = color_1

print("R: ", r)
print("G: ", g)
print("B: ", b)

R:  50
G:  255
B:  127
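Destructuring is also a convenient way to receive several values from a function, since a function can return a tuple. The sketch below uses a hypothetical helper, min_and_max, written only for this illustration.

```python
def min_and_max(numbers):
    """Return the smallest and largest number as a (min, max) tuple."""
    return min(numbers), max(numbers)   # The comma builds a tuple

# Destructure the returned tuple into two variables
lowest, highest = min_and_max([3, 1, 4, 1, 5])

print("Lowest: ", lowest)
print("Highest: ", highest)
```

The number of variables on the left must match the number of items in the tuple, otherwise Python raises a ValueError.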


# The *for* loop

One of the most common tasks in programming is to visit every item in a list, and do something with each item. Most programming languages offer a construct to perform this task, called a *for loop*. In Python, you express a *for* loop as follows:

```python
for item in my_list:
    # Code that does something with the variable item
```

The way we define a *for* loop is similar to how *if* statements or functions are defined, with a colon and an indented block. In this case, the indented block is executed once for every item in the list, with the variable *item* set to the current item each time.

Let's look at some examples:

In [31]:
# Go through the list of numbers, and print each number + 1

for number in list_numbers:
    print(number+1)

1
2
3
4
5
501
501


In [34]:
# Construct a sentence out of the list of strings using string concatenation

sentence = ""
for word in list_strings:
    sentence = sentence + word + " " # Add a space between words

sentence

'Now this is a story all about how my cat got flipped turned upside down '

You will find that *for* loops are a powerful way to work with all of these data structures.
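As one more sketch, a *for* loop can be combined with tuple destructuring and .append() to build up a new list from a list of tuples. The colors list below is made up, and the plain average used here is only an illustration, not a standard brightness formula.

```python
# A made-up list of (r, g, b) color tuples
colors = [(50, 255, 127), (127, 127, 127), (0, 0, 0)]

brightness_values = []
for r, g, b in colors:                # Each tuple is destructured into r, g, b
    average = (r + g + b) / 3         # Simple average of the three channels
    brightness_values.append(average) # Grow the result list bit by bit

print(brightness_values)
```

Starting from an empty list and appending inside the loop is a very common pattern for producing one output item per input item.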

# Exercises

Since data structures are such a critical part of Python programming, there are a few more exercises than normal to make sure you get the hang of them.

## Squaring a list of numbers

Write a function that takes a list of numbers, and returns a new list where every number is squared.

Example:
```python
square_numbers([-3, -4, -5])
```

Should output 
```python
[9, 16, 25]
```

In [76]:
def square_numbers(numbers):
    squared_numbers = []
    return squared_numbers

In [73]:
class SquaresTest(unittest.TestCase):
    def test_positive(self):
        self.assertEqual(square_numbers([3,4,5]), [9, 16, 25])
    def test_negative(self):
        self.assertEqual(square_numbers([-3,-4,-5]),  [9, 16, 25], 'Handles negative numbers incorrectly')
    def test_float(self):
        self.assertEqual(square_numbers([3.5,4.5,5.5]), [12.25, 20.25, 30.25], 'Handles floats incorrectly')
    def test_empty(self):
        self.assertEqual(square_numbers([]), [], 'Does not return an empty list when given one')

suite = unittest.TestLoader().loadTestsFromTestCase(SquaresTest)
unittest.TextTestRunner().run(suite)

....
----------------------------------------------------------------------
Ran 4 tests in 0.004s

OK


<unittest.runner.TextTestResult run=4 errors=0 failures=0>

## Use every other word from a sentence

Take a sentence as a string, and output a new sentence where you have taken every other word from it, starting with the first word. If the new sentence doesn't end in a period, add a period to it.

HINT: It would be good practice to write a loop to combine the sentence together. However, there's a useful way to convert a list back into a string using the .join() method on a *string* (Not a list!) 

HINT 2: If you can't remember how to take every other item from a collection, review slicing!
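As a quick illustration of the first hint, .join() is called on the separator string and is given the list of words. The words list here is made up.

```python
words = ['flipped', 'turned', 'upside', 'down']

# The string " " is placed between each pair of neighbouring items
joined = " ".join(words)

print(joined)
```

Unlike building the sentence with a loop and string concatenation, this does not leave a trailing space at the end.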

In [75]:
def take_every_other_word(sentence):
    return ""
    

In [47]:
class EveryOtherWordTest(unittest.TestCase):
    def test_adding_a_period(self):
        even_sentence = "I want to stand with you on the mountains"
        self.assertEqual(take_every_other_word(even_sentence), "I to with on mountains.", "Did you remember to add a period to the end?")
    def test_not_too_many_periods(self):
        even_sentence = "I want to stand with you on the mountains."
        self.assertNotEqual(take_every_other_word(even_sentence), "I to with on mountains..", "Do you check if the sentence already has a period?")
    def test_odd_sentence(self):
        odd_sentence = "I'm sorry Dave, but I can't do that."
        self.assertEqual(take_every_other_word(odd_sentence), "I'm Dave, I do.")

suite = unittest.TestLoader().loadTestsFromTestCase(EveryOtherWordTest)
unittest.TextTestRunner().run(suite)

...
----------------------------------------------------------------------
Ran 3 tests in 0.005s

OK


<unittest.runner.TextTestResult run=3 errors=0 failures=0>

## Second largest element in a list

Given a list of numbers, return the second largest number in the list.

Example input: 
```python 
[1,2,3,4,500,3]
```
Example output:
```python 
4
```

If the list contains only one number, return that number. If the list is empty, it is ok for your program to crash.

As a sample, I've provided a function that finds the largest number.

In [74]:
def largest(numbers):
    largest_so_far = numbers[0]
    # Start at the second element of the list
    # If the list is only one element, numbers[1:] is empty
    # and the for loop does not do anything
    for number in numbers[1:]: 
        if number > largest_so_far:
            largest_so_far = number
    return largest_so_far
        

def second_largest(numbers):
    return 0

In [59]:
class SecondLargestTest(unittest.TestCase):
    def test_overall(self):
        result = second_largest([1,2,3,4,500,3])
        self.assertNotEqual(result, 500, 'Looks like you accidentally found the largest!')
        self.assertEqual(result, 4)
    def test_single_element_list(self):
        self.assertEqual(second_largest([1]), 1, 'Do you handle single element lists correctly?')

suite = unittest.TestLoader().loadTestsFromTestCase(SecondLargestTest)
unittest.TextTestRunner().run(suite)

..
----------------------------------------------------------------------
Ran 2 tests in 0.002s

OK


<unittest.runner.TextTestResult run=2 errors=0 failures=0>