# 01_02: Lists, tuples, and the slicing syntax

In [140]:
import math
import collections
import dataclasses
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as pp  

OK, that was a good warmup about loops. We now move on to review lists and tuples, which are perhaps the quintessential Python data structure within the core language.

In addition to being incredibly useful on their own, lists and tuples are foundational for data science because they set the standard interface for accessing elements and ranges of elements by their ordinal index. Python calls that _slicing_. The same slicing interface is used in `numpy`, the most important Python library for manipulating large amounts of numerical data.

So, about lists: as you know, they provide a way to store an arbitrary number of Python objects (strings, floating-point numbers, other lists, or any other object) and to access them by a numerical index.

Lists in Python are written with square brackets, and their elements are separated by commas. For instance, here is the trio of nephews from the Disney universe:

In [141]:
nephews = ['Huey', 'Dewey', 'Louie']

In [142]:
nephews

['Huey', 'Dewey', 'Louie']

The length of a list is obtained with `len`.

In [143]:
len(nephews)

3

The empty list is written with an empty set of brackets, and obviously has length 0.

In [144]:
len([])

0

Individual list elements can be accessed by index, starting with zero for the first element, and ending at the length of the list minus one. [slide 1]

(_This convention of starting from zero follows C, one of the languages that influenced Python and the language in which the standard Python interpreter is written; that interpreter is known as CPython for this reason._)

For instance, the first nephew is Huey.

In [145]:
nephews[0]

'Huey'

So the last nephew is at index 2. If we look beyond the end of the list, we get an error.

In [146]:
nephews[2]

'Louie'

In [148]:
nephews[3]

IndexError: list index out of range

We can also index from the end, starting at -1 and going back. [slide 2]

In [149]:
nephews[-1]

'Louie'

In [150]:
nephews[-2]

'Dewey'
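A negative index `-k` is shorthand for `len(nephews) - k`, so the valid indices run from `-len(nephews)` to `len(nephews) - 1`, and anything outside that range raises `IndexError`. The minimal check below illustrates this on a fresh copy of the list; the helper `is_valid_index` is hypothetical, written only for this illustration.

```python
nephews = ['Huey', 'Dewey', 'Louie']
n = len(nephews)

# A negative index -k refers to the same element as n - k
assert nephews[-1] == nephews[n - 1]
assert nephews[-3] == nephews[0]

def is_valid_index(seq, i):
    """Hypothetical helper: True if seq[i] would not raise IndexError."""
    try:
        seq[i]
        return True
    except IndexError:
        return False

print(is_valid_index(nephews, 2), is_valid_index(nephews, 3))    # True False
print(is_valid_index(nephews, -3), is_valid_index(nephews, -4))  # True False
```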

This bracket indexing notation can also be used to reassign elements. Let's use it to give all of these ducks a surname, with a simple loop.

In [151]:
for i in range(len(nephews)):
    nephews[i] = nephews[i] + ' Duck'

In [152]:
nephews

['Huey Duck', 'Dewey Duck', 'Louie Duck']
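The loop above goes through the indices and assigns to `nephews[i]`. Rebinding the loop variable instead would not touch the list, because the loop variable is just a name for the current element. A minimal sketch of the difference, on a separate list called `names`:

```python
names = ['Huey', 'Dewey', 'Louie']

# Rebinding the loop variable creates new strings but leaves the list unchanged
for name in names:
    name = name + ' Duck'
print(names)   # ['Huey', 'Dewey', 'Louie']

# Assigning through an index replaces the element stored in the list
for i in range(len(names)):
    names[i] = names[i] + ' Duck'
print(names)   # ['Huey Duck', 'Dewey Duck', 'Louie Duck']
```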

An important point is that lists do not need to have homogeneous content, such as all strings or all numbers. We can mix it up. For instance, we can have a list consisting of a number, another list of numbers, and a string:

In [153]:
mix_it_up = [1, [2, 3], 'alpha']

In [154]:
mix_it_up

[1, [2, 3], 'alpha']
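Since the nested element is itself a list, we can chain brackets to reach inside it. A short illustration with the same `mix_it_up` list:

```python
mix_it_up = [1, [2, 3], 'alpha']

inner = mix_it_up[1]        # the nested list [2, 3]
print(inner[0])             # 2
print(mix_it_up[1][1])      # 3: index the outer list, then the inner one
print(len(mix_it_up))       # 3: the nested list counts as a single element
```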

We can check whether an element is in a list using the `in` operator:

In [155]:
'Huey' in nephews

False

In [156]:
'Huey Duck' in nephews

True
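`in` on a list compares each element with `==`, so it only finds whole elements; that is why `'Huey'` alone is not found. To look for a substring inside each element, we can test each string in a loop, since on strings `in` checks for substrings. A sketch:

```python
nephews = ['Huey Duck', 'Dewey Duck', 'Louie Duck']

whole_match = 'Huey' in nephews     # compares whole list elements with ==

partial_match = False
for name in nephews:
    if 'Huey' in name:              # on strings, `in` tests for a substring
        partial_match = True
        break

print(whole_match, partial_match)   # False True
```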

To add a single element at the end of a list, we use `append`. Note that here we are using Python in an object-oriented way, by calling a _method_ (specifically `append`) of the list object. It sounds sophisticated, but it's actually very natural.

In [157]:
nephews.append('April Duck')

In [158]:
nephews

['Huey Duck', 'Dewey Duck', 'Louie Duck', 'April Duck']

To add multiple elements in one go, we can use `extend`.

In [159]:
nephews.extend(['May Duck', 'June Duck'])

In [160]:
nephews

['Huey Duck',
 'Dewey Duck',
 'Louie Duck',
 'April Duck',
 'May Duck',
 'June Duck']
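The difference between `append` and `extend` matters when the argument is itself a list: `append` adds the whole list as a single element, while `extend` adds each of its items. A quick comparison on two throwaway lists:

```python
a = ['Huey', 'Dewey']
a.append(['April', 'May'])    # adds one element, which happens to be a list
print(a)                      # ['Huey', 'Dewey', ['April', 'May']]

b = ['Huey', 'Dewey']
b.extend(['April', 'May'])    # adds the items one by one
print(b)                      # ['Huey', 'Dewey', 'April', 'May']
```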

To concatenate two lists, we use `+`. That's an example of _operator overloading_ in Python: `+` does different things for numbers (addition) and for lists (concatenation).

In [161]:
ducks = nephews + ['Donald Duck', 'Daisy Duck']

In [162]:
ducks

['Huey Duck',
 'Dewey Duck',
 'Louie Duck',
 'April Duck',
 'May Duck',
 'June Duck',
 'Donald Duck',
 'Daisy Duck']
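Unlike `extend`, the `+` operator builds a brand-new list and leaves both operands untouched, as this small check on shortened lists shows:

```python
nephews = ['Huey Duck', 'Dewey Duck', 'Louie Duck']
ducks = nephews + ['Donald Duck', 'Daisy Duck']

print(len(nephews), len(ducks))   # 3 5
print(ducks is nephews)           # False: ducks is a separate list object
```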

Finally, we can insert an element at any position in a list using the `insert` method, which takes the index first and then the element:

In [163]:
ducks.insert(0, 'Scrooge McDuck')

In [164]:
ducks

['Scrooge McDuck',
 'Huey Duck',
 'Dewey Duck',
 'Louie Duck',
 'April Duck',
 'May Duck',
 'June Duck',
 'Donald Duck',
 'Daisy Duck']
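`insert(i, x)` places `x` so that it ends up at index `i`, shifting later elements to the right. An index past the end simply appends, and a negative index counts from the end as usual. A sketch on a short list of letters:

```python
letters = ['a', 'b', 'c']

letters.insert(1, 'X')     # now at index 1: ['a', 'X', 'b', 'c']
letters.insert(100, 'Z')   # index past the end: appended at the end
letters.insert(-1, 'Y')    # inserted before the current last element

print(letters)             # ['a', 'X', 'b', 'c', 'Y', 'Z']
```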

We have seen how to build up lists; now let's break them down. We can delete elements either by their index, with `del`, or by their value, with `remove`.

In [165]:
del ducks[0]

In [166]:
ducks

['Huey Duck',
 'Dewey Duck',
 'Louie Duck',
 'April Duck',
 'May Duck',
 'June Duck',
 'Donald Duck',
 'Daisy Duck']

In [167]:
ducks.remove('Donald Duck')

In [168]:
ducks

['Huey Duck',
 'Dewey Duck',
 'Louie Duck',
 'April Duck',
 'May Duck',
 'June Duck',
 'Daisy Duck']
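Two details are worth knowing: `remove` deletes only the _first_ element equal to its argument, and it raises `ValueError` if there is no such element. For example:

```python
calls = ['quack', 'honk', 'quack']

calls.remove('quack')          # only the first 'quack' goes
print(calls)                   # ['honk', 'quack']

missing_rejected = False
try:
    calls.remove('moo')        # not in the list
except ValueError:
    missing_rejected = True
print(missing_rejected)        # True
```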

We may want our lists _sorted_. We can do this _in place_ and modify an existing list, with `sort`:

In [169]:
ducks.sort()

In [170]:
ducks

['April Duck',
 'Daisy Duck',
 'Dewey Duck',
 'Huey Duck',
 'June Duck',
 'Louie Duck',
 'May Duck']
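Because `sort` works in place, it returns `None` rather than the sorted list, so assigning its result is a common slip. A minimal demonstration:

```python
ducks = ['May Duck', 'April Duck', 'June Duck']

result = ducks.sort()     # sorts ducks itself
print(result)             # None
print(ducks)              # ['April Duck', 'June Duck', 'May Duck']
```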

Or we can make a _new sorted list_ out of an existing one, with `sorted`:

In [171]:
reverse_ducks = sorted(ducks, reverse=True)

In [172]:
reverse_ducks

['May Duck',
 'Louie Duck',
 'June Duck',
 'Huey Duck',
 'Dewey Duck',
 'Daisy Duck',
 'April Duck']

...which also demonstrates how to sort in reverse order.
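By contrast with `sort`, `sorted` leaves its input alone and returns a new list. A short check on a smaller list:

```python
ducks = ['April Duck', 'June Duck', 'May Duck']
reverse_ducks = sorted(ducks, reverse=True)

print(reverse_ducks)   # ['May Duck', 'June Duck', 'April Duck']
print(ducks)           # ['April Duck', 'June Duck', 'May Duck'] (unchanged)
```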

All of this should be very basic to you if you've worked with Python in the past.

You probably also know that it's very easy to _loop_ over a list. We don't even need the indices for that.

In [197]:
for duck in ducks:
    print(duck, "quacks!")

April Duck quacks!
Daisy Duck quacks!
Dewey Duck quacks!
Huey Duck quacks!
June Duck quacks!
Louie Duck quacks!
May Duck quacks!


Moving on to _slices_: beyond working with individual list elements, we can manipulate them in groups. We'll take a numerical example, the first few squares of the natural numbers.

In [2]:
squares = [1, 4, 9, 16, 25, 36, 49]

The convention for a slice is the same as for `range` in Python loops: the starting index is included, the ending index is not.

It pays to imagine that the indices sit on the edges of the elements. [slide 3]

So, for instance, if we want the first two squares, we write a slice that goes from zero to two.

In [174]:
squares[0:2]

[1, 4]
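Thinking of indices as sitting on the edges between elements makes two facts easy to remember: the slice `squares[i:j]` has `j - i` elements (when `0 <= i <= j <= len(squares)`), and cutting at any index `k` splits the list into two pieces that join back into the original. A quick check:

```python
squares = [1, 4, 9, 16, 25, 36, 49]

i, j = 2, 5
print(squares[i:j], len(squares[i:j]) == j - i)   # [9, 16, 25] True

k = 3
# The two pieces share the edge at k, so no element is lost or duplicated
print(squares[0:k] + squares[k:len(squares)] == squares)   # True
```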

There are a few more tricks and shortcuts that we can use in slicing.

* We can omit the starting index to start at the beginning.
* We can omit the ending index to include elements through the end.
* We can omit both to get a copy of the entire list.
* We can move through the indices in steps.
* We can also use negative indices to count from the end.

In [175]:
squares[:4]

[1, 4, 9, 16]

In [176]:
squares[3:]

[16, 25, 36, 49]

In [177]:
squares[:]

[1, 4, 9, 16, 25, 36, 49]

In [178]:
squares[0:7:2]

[1, 9, 25, 49]

In [179]:
squares[-3:-1]

[25, 36]

In [3]:
# reverse the list!
squares[::-1]

[49, 36, 25, 16, 9, 4, 1]
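Two consequences of these rules are worth a small experiment. First, `squares[:]` builds a new (shallow) copy, so changing the copy leaves the original alone. Second, unlike plain indexing, slicing never raises `IndexError`: out-of-range bounds are clipped to the list.

```python
squares = [1, 4, 9, 16, 25, 36, 49]

squares_copy = squares[:]
squares_copy[0] = 'one'
print(squares[0], squares_copy[0])   # 1 one

print(squares[5:100])   # [36, 49]: the end is clipped to len(squares)
print(squares[10:])     # []: an empty slice, not an error
```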

We can also use slices to _reassign_ a subset of items, or to delete them.

In [180]:
squares[2:4] = ['four', 'nine']

In [181]:
squares

[1, 4, 'four', 'nine', 25, 36, 49]

In [182]:
del squares[4:6]

In [183]:
squares

[1, 4, 'four', 'nine', 49]
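The replacement in a slice assignment need not have the same length as the slice, so the list can grow or shrink. For instance, on a fresh list:

```python
numbers = [1, 2, 3, 4, 5]

numbers[1:3] = ['two', 'two and a half', 'three']   # 2 items replaced by 3
print(numbers)   # [1, 'two', 'two and a half', 'three', 4, 5]

numbers[1:4] = []   # assigning an empty list removes the slice, like del
print(numbers)   # [1, 4, 5]
```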

I'm afraid that at this point I've ruined my list of squares!

(_When we introduce NumPy arrays in chapter 4, we will see that the basic slicing syntax carries over, so it's worth understanding it fully on lists first. NumPy actually extends the syntax even further._)

Now for _tuples_, which _look_ like lists, but with parentheses instead of brackets. 

In [184]:
integers = ('one', 'two', 'three', 'four')

In [185]:
integers

('one', 'two', 'three', 'four')
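One wrinkle of the parenthesis syntax: a tuple with a single element needs a trailing comma, because parentheses alone are just grouping. The empty tuple is written `()`.

```python
not_a_tuple = ('one')    # just the string 'one' in parentheses
one_tuple = ('one',)     # the trailing comma makes it a tuple
empty = ()

print(type(not_a_tuple).__name__, type(one_tuple).__name__)  # str tuple
print(len(one_tuple), len(empty))                            # 1 0
```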

Tuples are often described as _immutable_ versions of lists: once a tuple is defined, we cannot reassign its elements or add new ones. This is a feature, not a bug: it ensures data integrity, and (provided their elements are themselves immutable, like numbers and strings) it makes it possible to use tuples as keys in dictionaries or indexes.
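As a small illustration of the dictionary-key point, a tuple of (row, column) coordinates works as a key, while the equivalent list is rejected with `TypeError` because lists are mutable and therefore unhashable. The `board` dictionary is a made-up example.

```python
# Hypothetical example: pieces on a board, keyed by (row, column) tuples
board = {(0, 0): 'rook', (0, 1): 'knight'}
print(board[(0, 1)])            # knight

list_rejected = False
try:
    board[[0, 2]] = 'bishop'    # a list cannot be a dictionary key
except TypeError:
    list_rejected = True
print(list_rejected)            # True
```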

Nevertheless, we can perform the same indexing and slicing tricks as for lists.

In [186]:
integers[-1], integers[1:3]

('four', ('two', 'three'))

Just not assignment!

In [187]:
integers[0] = 1

TypeError: 'tuple' object does not support item assignment
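To "change" a tuple we build a new one instead, for example by slicing and concatenating; the original stays as it was:

```python
integers = ('one', 'two', 'three', 'four')

updated = (1,) + integers[1:]   # a new tuple; note the trailing comma in (1,)
print(updated)                  # (1, 'two', 'three', 'four')
print(integers)                 # ('one', 'two', 'three', 'four')
```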

One context in which tuples often appear in Python is _tuple unpacking_, where the elements of a tuple are assigned to several variables in a single statement. For instance, we can assign multiple variables at once:

In [188]:
(a, b) = (1, 2)

The parentheses can even be omitted when there's no room for ambiguity.

In [189]:
c, d = 3, 4
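Because the right-hand side is evaluated completely before anything is assigned, tuple unpacking gives a one-line way to swap two variables. The number of names on the left must match the number of values, otherwise Python raises `ValueError`.

```python
a, b = 1, 2
a, b = b, a          # the tuple (2, 1) is built first, then unpacked
print(a, b)          # 2 1

mismatch_rejected = False
try:
    x, y = 1, 2, 3   # three values, two names
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)   # True
```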

Tuples also appear when we iterate over multiple variables at once. For example, the built-in `enumerate` lets us loop over the indices and the elements of a list together:

In [190]:
for i, duck in enumerate(ducks):
    print(i, duck)

0 April Duck
1 Daisy Duck
2 Dewey Duck
3 Huey Duck
4 June Duck
5 Louie Duck
6 May Duck
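Each item that `enumerate` produces is itself an `(index, element)` tuple, which the `for` statement unpacks into `i` and `duck`. Converting to a list makes this visible; the optional `start` argument changes the first index.

```python
ducks = ['April Duck', 'Daisy Duck', 'Dewey Duck']

pairs = list(enumerate(ducks))
print(pairs)          # [(0, 'April Duck'), (1, 'Daisy Duck'), (2, 'Dewey Duck')]
print(pairs[0][1])    # April Duck

for i, duck in enumerate(ducks, start=1):
    print(i, duck)    # numbering from 1 instead of 0
```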


One final useful trick is _unpacking_ a tuple with the star operator `*`, to pass its elements as separate arguments to a function that requires multiple arguments.

In [191]:
def print_three_args(a, b, c):
    print(a, b, c)

In [192]:
my_args = (1, 2, 3)

In [193]:
print_three_args(*my_args)

1 2 3
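The star works with any iterable, not just tuples, but the number of items must match the parameters the function expects. A sketch reusing `print_three_args` (repeated here so the block runs on its own):

```python
def print_three_args(a, b, c):
    print(a, b, c)

print_three_args(*[4, 5, 6])      # a list works too: prints 4 5 6

too_few_rejected = False
try:
    print_three_args(*(1, 2))     # only two values for three parameters
except TypeError:
    too_few_rejected = True
print(too_few_rejected)           # True
```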


Conversely, the star operator can also be used to define functions with a variable number of arguments, which are collected into a tuple.

In [194]:
def any_args(*args):
    print(args)

In [195]:
any_args(1, 2, 3)

(1, 2, 3)
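A starred parameter can follow ordinary ones, in which case it collects only the extra positional arguments into a tuple. The function `total` below is a hypothetical example written for this illustration:

```python
def total(first, *rest):
    """Hypothetical example: add up one or more numbers."""
    print('rest =', rest)   # rest is a tuple of the extra arguments
    result = first
    for value in rest:
        result = result + value
    return result

print(total(10))          # prints rest = (), then 10
print(total(1, 2, 3))     # prints rest = (2, 3), then 6
```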
