In the first lesson we used NumPy arrays to manipulate temperature data. NumPy arrays are not built into Python; we had to import the `numpy` library in order to use them. These data structures are the most useful for scientific computing because they allow us to easily do math on our data. For anyone who has programmed in Matlab or C before, they are also the most familiar data structure (which is why we introduced them first!).

One of the built-in data structures that you will use extensively in Python is the list. Because lists are built in, we don't need to load a library to use them. A list is exactly what it sounds like: a sequence of things, which don't all have to be the same type. Lists are ordered, so we can access the items through an integer index (like we did with NumPy arrays), and one list can simultaneously contain numbers, strings, other lists, NumPy arrays, and even functions.

Lists are created by putting values, separated by commas, inside square brackets:

In [1]:
odds = [1, 3, 5, 7]
print 'odds are:', odds

odds are: [1, 3, 5, 7]


Because they are ordered, we can select individual elements from lists by indexing:

In [3]:
print 'first and last:', odds[0], odds[-1]

first and last: 1 7
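To illustrate the earlier point that one list can hold items of different types, here is a minimal sketch. The list `mixed` and its contents are made up for this example. Single-argument `print(...)` is used so the snippet runs under both Python 2 and Python 3.

```python
# Hypothetical example values: an integer, a float, a string, and another list.
mixed = [1, 2.5, 'three', [4, 5]]

second = mixed[1]            # 2.5
word = mixed[2]              # 'three'
nested = mixed[-1]           # the inner list [4, 5]
inner_first = mixed[-1][0]   # index twice to reach inside the inner list: 4

print(word)
print(inner_first)
```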


## Negative indices?! {.callout}

One wonderful feature of sequences in Python is that they allow for indexing with respect to the end of the sequence by using negative values. An index of -1 refers to the last item in a list, -2 is the second to last, and so on. Try it out!

In [16]:
text = "The quick brown fox jumped over the lazy dog."

print text[-1]
print text[-4:-1]
print text[-4:]
print text[32:-1]

.
dog
dog.
the lazy dog
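The same negative indices and slices work on lists, not only on strings. A small sketch, using a made-up list called `values`:

```python
# Hypothetical data for illustration.
values = [10, 20, 30, 40, 50]

last = values[-1]            # 50
last_two = values[-2:]       # [40, 50]
all_but_last = values[:-1]   # [10, 20, 30, 40]
middle = values[1:-1]        # [20, 30, 40]

print(last_two)
print(middle)
```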


We can assign new values to individual elements in a list:

In [4]:
odds[-1] = 9
print 'odds are now:', odds

odds are now: [1, 3, 5, 9]


There are many ways to change the contents of lists besides assigning new values to individual elements:

In [11]:
odds.append(11)
print 'odds after adding a value:', odds

odds after adding a value: [1, 3, 5, 9, 11]


In [12]:
del odds[0]
print 'odds after removing the first element:', odds

odds after removing the first element: [3, 5, 9, 11]


In [13]:
odds.reverse()
print 'odds after reversing:', odds

odds after reversing: [11, 9, 5, 3]
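A few other list methods also change a list in place. The sketch below is not part of the original session; it starts from a fresh `odds` list and uses the standard methods `insert`, `pop`, `remove`, and `extend`:

```python
odds = [3, 5, 9, 11]

odds.insert(0, 1)      # put 1 at index 0      -> [1, 3, 5, 9, 11]
removed = odds.pop()   # remove and return the last item (11)
odds.remove(9)         # remove the first item equal to 9 -> [1, 3, 5]
odds.extend([7, 9])    # add every item of another list   -> [1, 3, 5, 7, 9]

print(odds)
print(removed)
```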


There is one important difference between lists and strings: we can change the values in a list, but we cannot change the characters in a string. For example:

In [14]:
names = ['Newton', 'Darwing', 'Turing'] # typo in Darwin's name
print 'names is originally:', names
names[1] = 'Darwin' # correct the name
print 'final value of names:', names

names is originally: ['Newton', 'Darwing', 'Turing']
final value of names: ['Newton', 'Darwin', 'Turing']


In [15]:
name = 'Bell'
name[0] = 'b'

TypeError: 'str' object does not support item assignment

## Ch-Ch-Ch-Changes {.callout}

Data which can be modified in place is called mutable, while data which cannot be modified is called immutable. Strings and numbers are immutable. This does not mean that variables with string or number values are constants, but when we want to change the value of a string or number variable, we can only replace the old value with a completely new value.
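For example, one way to "fix" the first letter of a string is to build a completely new string and assign it to the same variable. This is only a sketch of the idea, reusing the `name` variable from the failed attempt above:

```python
name = 'Bell'

# Build a new string from a new first letter plus the rest of the old string,
# then make the variable refer to the new string.
name = 'b' + name[1:]

print(name)   # bell
```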

Lists and NumPy arrays, on the other hand, are mutable: we can modify them after they have been created. We can change individual elements, append new elements, or reorder the whole list. For some operations, like sorting, we can choose whether to use a function that modifies the data in place or a function that returns a modified copy and leaves the original unchanged.
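Sorting shows the two options side by side: the built-in function `sorted` returns a new, sorted list, while the list method `sort` reorders the list in place and returns `None`. A minimal sketch, with made-up values in `temperatures`:

```python
# Hypothetical readings, for illustration only.
temperatures = [21.5, 18.0, 25.2, 19.7]

ordered_copy = sorted(temperatures)   # returns a new list
print(ordered_copy)                   # sorted values
print(temperatures)                   # original order is unchanged

result = temperatures.sort()          # sorts in place
print(temperatures)                   # now sorted
print(result)                         # None: sort() does not return the list
```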

Be careful when modifying data in place. If two variables refer to the same list, and you modify the list value, it will change for both variables! If you want variables with mutable values to be independent, you must make a copy of the value when you assign it.

Because of pitfalls like this, code which modifies data in place can be more difficult to understand. However, it is often far more efficient to modify a large data structure in place than to create a modified copy for every small change. You should consider both of these aspects when writing your code.

If we make a list, (attempt to) copy it by assigning it to a new name, and then modify the "copy" in place, we can cause all sorts of trouble:

In [17]:
odds = [1, 3, 5, 7]
primes = odds
primes += [2]
print 'primes:', primes
print 'odds:', odds

primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7, 2]


This is because Python stores a list in memory and can then use multiple names to refer to that same list. If all we want to do is copy a (simple) list, we can use the `list()` function, so that we do not modify a list we did not mean to:

In [18]:
odds = [1, 3, 5, 7]
primes = list(odds)
primes += [2]
print 'primes:', primes
print 'odds:', odds

primes: [1, 3, 5, 7, 2]
odds: [1, 3, 5, 7]
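The word "simple" matters here. `list()` copies only the outer list, so if the items are themselves lists, the copy and the original still share those inner lists. The sketch below contrasts this with `copy.deepcopy` from the standard library's `copy` module, which copies the nested items too; the name `pairs` and its values are made up for the example.

```python
import copy

pairs = [[1, 3], [5, 7]]

shallow = list(pairs)          # new outer list, same inner lists
deep = copy.deepcopy(pairs)    # new outer list and new inner lists

shallow.append([9, 11])        # only the outer list of the copy changes
shallow[0][0] = 100            # the inner list is shared, so pairs changes too
deep[1][0] = 500               # fully independent, pairs is unaffected

print(pairs)     # [[100, 3], [5, 7]]
print(shallow)   # [[100, 3], [5, 7], [9, 11]]
print(deep)      # [[1, 3], [500, 7]]
```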


# Repeating Actions with Loops

Deep down, most people want to learn how to program so they can automate things. To do that, we’ll have to teach the computer how to repeat actions.

An example task that we might want to repeat is printing each item in a list. One way to do this would be to use a series of print statements:

In [19]:
odds = [1, 3, 5, 7]
print odds[0]
print odds[1]
print odds[2]
print odds[3]

1
3
5
7


This is a bad approach for two reasons:

It doesn’t scale: if we want to print the characters in a string or the items in a list that’s hundreds of values long, writing one print statement per value would take a very long time.

It’s fragile: if we give it a longer list, it only prints part of the data, and if we give it a shorter one, it produces an error because we’re asking for an element that doesn’t exist.

In [20]:
odds = [1, 3, 5]
print odds[0]
print odds[1]
print odds[2]
print odds[3]

1
3
5


IndexError: list index out of range

A better approach is to automate it with a **for loop**:

In [21]:
word = 'lead'
for char in word:
    print char

l
e
a
d


This is shorter than writing individual print statements for each character, and more robust as well:

In [22]:
word = 'oxygen'
for char in word:
    print char

o
x
y
g
e
n


The improved version uses a for loop to repeat an operation—in this case, printing—once for each thing in a collection. The general form of a loop is:

for variable in collection:
    do things with variable

We can call the loop variable anything we like, but there must be a colon at the end of the line starting the loop, and we must indent anything we want to run inside the loop. Unlike many other languages, there is no command to end a loop (e.g. end for); what is indented after the for statement belongs to the loop.
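The same pattern handles the list of odd numbers that motivated this section. In this sketch the loop variable name `number` is an arbitrary choice, and the loop itself does not change when the list gets shorter or longer:

```python
odds = [1, 3, 5, 7]
for number in odds:
    print(number)

odds = [1, 3, 5]   # a shorter list: no IndexError, the loop just runs three times
for number in odds:
    print(number)
```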

Here’s another loop that repeatedly updates a variable:

In [24]:
length = 0
for vowel in 'aeiou':
    length = length + 1
print 'There are', length, 'vowels'

There are 5 vowels


It’s worth tracing the execution of this little program step by step. Since there are five characters in 'aeiou', the statement on line 3 will be executed five times. The first time around, length is zero (the value assigned to it on line 1) and vowel is 'a'. The statement adds 1 to the old value of length, producing 1, and updates length to refer to that new value. The next time around, vowel is 'e' and length is 1, so length is updated to be 2. After three more updates, length is 5; since there is nothing left in 'aeiou' for Python to process, the loop finishes and the print statement on line 4 tells us our final answer.
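The same accumulate-in-a-loop pattern works for other running totals. As a sketch, here it adds up the items of a list instead of counting them (`total` and `number` are names chosen for this example):

```python
odds = [1, 3, 5, 7]

total = 0
for number in odds:
    total = total + number   # total takes the values 1, 4, 9, 16 in turn

print(total)   # 16
```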

Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:

In [25]:
letter = 'z'
for letter in 'abc':
    print letter
print 'after the loop, letter is', letter

a
b
c
after the loop, letter is c


## Built-in functions {.callout}

Finding the length of a string is such a common operation that Python actually has a built-in function to do it called `len`:

In [26]:
print len('aeiou')

5


`len` is much faster than any function we could write ourselves, and much easier to read than a two-line loop; it will also give us the length of many other things that we haven’t met yet, so we should always use it when we can.
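For instance, `len` also works on lists, where it counts the items in the outermost list. A short sketch (the variable names are made up for illustration):

```python
odds = [1, 3, 5, 7]

n_odds = len(odds)                 # 4 items
n_chars = len('oxygen')            # 6 characters
n_empty = len([])                  # an empty list has length 0
n_outer = len([[1, 3], [5, 7]])    # 2: only the outer list's items are counted

print(n_odds)
print(n_outer)
```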

# ADD CHALLENGES

# Making Choices

When analyzing data, we'll often want to automatically recognize differences between values and take different actions on the data depending on some conditions. Here, we'll learn how to write code that runs only when certain conditions are true.

## Conditionals

We can ask Python to take different actions, depending on a condition, with an **if statement**:

In [35]:
num = 42

if num > 100:
    print 'greater'
else:
    print 'not greater'
    
print 'done'

not greater
done


The second statement of this code uses the keyword `if` to tell Python that we want to make a choice. If the test that follows the `if` is true, the body of the `if` (i.e., the lines indented underneath it) is executed. If the test is false, the body of the `else` is executed instead. Only one or the other is ever executed.

Conditional statements don’t have to include an else. If there isn’t one, Python simply does nothing if the test is false:

In [36]:
num = 42
print 'before conditional...'
if num > 100:
    print num, 'is greater than 100'
print '...after conditional'

before conditional...
...after conditional


We can also chain several tests together using `elif`, which is short for “else if”. The following Python code uses `elif` to print the sign of a number:

In [37]:
num = -3

if num > 0:
    print num, "is positive"
elif num == 0:
    print num, "is zero"
else:
    print num, "is negative"

-3 is negative


One important thing to notice in the code above is that we use a double equals sign `==` to test for equality rather than a single equals sign because the latter is used to mean assignment.
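To make the distinction concrete, the sketch below uses `=` once to assign a value and then `==` to ask questions about it. A comparison produces `True` or `False`, which is what `if` tests. The names `is_zero` and `is_one` are made up for this example.

```python
num = 0                  # single equals: assign the value 0 to num

is_zero = (num == 0)     # double equals: compare, giving True
is_one = (num == 1)      # compare, giving False

print(is_zero)
print(is_one)

if num == 0:
    print('num is zero')
```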

We can also combine tests using `and` and `or`. `and` is only true if both parts are true:

In [43]:
if (1 > 0) and (-1 > 0):
    print 'both tests are true'
else:
    print 'at least one test is false'

at least one test is false


while `or` is true if at least one part is true:

In [44]:
if (1 > 0) or (-1 > 0):
    print 'at least one test is true'
else:
    print 'neither test is true'

at least one test is true
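Combined tests become more useful inside a loop. The following sketch counts how many values in a made-up list `readings` fall strictly between 0 and 100; the data and variable names are assumptions for illustration only.

```python
# Hypothetical readings, for illustration only.
readings = [-5, 12, 37, 104]

in_range = 0
out_of_range = 0
for value in readings:
    if (value > 0) and (value < 100):
        in_range = in_range + 1
    else:
        out_of_range = out_of_range + 1

print(in_range)       # 2
print(out_of_range)   # 2
```

The same split could be written with `or` by testing for the opposite case, `(value < 0) or (value > 100)`, and swapping the two branches. The two forms differ only if a value is exactly 0 or 100, which the strict test above counts as out of range.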


# Add challenges