# Containers

Containers are a special class of variable used to store multiple values. When working with Python, you will spend much of your time working with **tuples**, **lists**, and **dicts**. These are all described, along with **sets**, in the following sections.

## Prelude: Python and indexes.

Before we dive into the various containers, we should spend a minute on how to extract data from **sequence containers**, which keep values in a specific order; both tuples and lists fall under this category.

### Basic indexing

Python uses **zero-based indexing**, meaning the first valid index in a container is 0, as opposed to 1. To access a value by index, write the container name immediately followed by the index number enclosed in square brackets ([ ]):


In [1]:
# first build a simple list
vals = [1,2,4,5]

# access the first value
print(vals[0])

# access the third value
print(vals[2])

# You can also have containers in containers
# Here's a list containing two tuples of three values
nested = [(4,3,2),(6,7,8)]

# print the third item in the second tuple
print(nested[1][2])



1
4
8


To determine how many items are in a container (and the range of acceptable indexes), the `len()` function can be used:

In [2]:
# let's see how many items are in the vals 
# variable from the previous code block:

print(len(vals))

# based on this, we know the last valid index is 3:
print(vals[3])

# An index greater than or equal to the length 
# of the container produces an IndexError:
print(vals[4])

4
5


IndexError: list index out of range
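If an index might be out of range, it can be checked against `len()` before it is used. The following is a minimal sketch; the `index` and `result` names, and the choice of `None` as a fallback, are assumptions made for illustration only.

```python
vals = [1, 2, 4, 5]

# hypothetical index that may or may not be valid
index = 4

# valid non-negative indexes run from 0 to len(vals) - 1
if 0 <= index < len(vals):
    result = vals[index]
else:
    result = None  # fallback chosen for this sketch

print(result)
```

Here `index` equals the length of the list, so the check fails and the fallback is used instead of raising an IndexError.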

In many languages, this would be the end of the indexing discussion. However, there are two other index-related topics that we should touch upon when discussing Python: **negative indexing** and **slicing**.

### Negative indexing

Python supports **negative indexes**. When a negative index is used, the retrieved index is the length of the container plus the (negative) index, so -1 refers to the last item, -2 to the second from last, and so on:

In [3]:
# negative test:
negative = [1.0,1.1,1.2,1.3,1.4]

# let's get the length for reference
print(f'There are {len(negative)} items.')

# grab the last item (length - 1):
print(negative[-1])

# and the third from last (length - 3):
print(negative[-3])

There are 5 items.
1.4
1.2


Negative indexing is a powerful tool when you need a value at a fixed distance from the end of a sequence container, regardless of its length.
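The relationship between negative and non-negative indexes can be checked directly. This short sketch reuses the `negative` list from above; the variable `n` is introduced here only for illustration.

```python
negative = [1.0, 1.1, 1.2, 1.3, 1.4]
n = len(negative)

# a negative index -k refers to the same item as index n - k
print(negative[-1] == negative[n - 1])   # True
print(negative[-3] == negative[n - 3])   # True

# the most negative valid index is -n, which is the first item
print(negative[-n])                      # 1.0
```

An index below `-n` is out of range and produces an IndexError, just like an index of `n` or greater.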

### Slicing

The last part of the discussion on indexing involves **slicing**, or grabbing a range of values from a sequence container. The basic syntax for a slice operation is `vals[<start>:<end>]` where ***\<start\>*** is the first index to include, and ***\<end\>*** is one past the last index. Either value can be omitted; in this case ***\<start\>*** is assumed to be 0 and ***\<end\>*** is assumed to be the length of the container. Here are a few examples:

In [5]:
# quick build of a tuple holding the values 0-10
# the range() function is covered in later notebooks.

nums = tuple(range(11))
print(nums)

# let's grab a slice of the second, third, and fourth values.
print(nums[1:4])

# retrieve the first five values, omitting the start index
print(nums[:5])

# retrieve the rest of the values by omitting the last index
print(nums[5:])

# omitting both indexes selects every item
# (on a list, this produces a shallow copy of the list)
print(nums[:])

# Negative indexes work too!
print(nums[-4:-2])

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
(1, 2, 3)
(0, 1, 2, 3, 4)
(5, 6, 7, 8, 9, 10)
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
(7, 8)
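Two properties of slices are worth seeing in code: out-of-range slice bounds are clamped rather than raising an IndexError, and slicing a list returns a new list. The sketch below illustrates both; the `letters` and `letters_copy` names are made up for this example.

```python
nums = tuple(range(11))

# an end index past the last item is clamped to the length
print(nums[8:100])   # (8, 9, 10)

# a start index past the last item gives an empty result
print(nums[20:])     # ()

# slicing a list produces a new list, so changes to the
# copy do not affect the original
letters = ['a', 'b', 'c']
letters_copy = letters[:]
letters_copy[0] = 'z'
print(letters)        # ['a', 'b', 'c']
print(letters_copy)   # ['z', 'b', 'c']
```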


There is a third optional argument that can be used when slicing: the step size. This argument follows a second colon (':') after the ending index, and describes how many indexes will be traversed before the next value is retrieved:

In [6]:
# Use step size to retrieve the even values only.
# The indexes are omitted since we are traversing the entire tuple.
print(nums[::2])

# Retrieve multiples of 3:
print(nums[3:-1:3])


(0, 2, 4, 6, 8, 10)
(3, 6, 9)
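The step size can also be negative, in which case the slice walks backwards through the sequence. A brief sketch, rebuilding the same `nums` tuple so that it runs on its own:

```python
nums = tuple(range(11))

# a step of -1 with no start or end reverses the sequence
print(nums[::-1])    # (10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)

# with a negative step, start should be larger than end;
# the end index is still excluded
print(nums[8:2:-2])  # (8, 6, 4)
```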


With indexing out of the way, let's start discussing the containers.

## Tuples

**Tuples** are among the most common container types in Python, and they are often created implicitly (for example, when a function returns several values). Literal tuple declarations use parentheses surrounding a comma-separated list of entries. Here are a few examples of tuple declaration:

```python
# Empty Tuple
empty = ()

# Declaring a tuple using a literal
explosions = ('Boom','Blam','Bam','Bang')

# the parentheses are optional; this is equivalent to the previous declaration:
explosions = 'Boom','Blam','Bam','Bang'

# and, we can always be explicit; tuple() takes a single iterable, such as a list
explosions = tuple(['Boom','Blam','Bam','Bang'])

# tuples can store mixed types
stuff = ('Bear', 1, False, 3.6)

# functions with multiple return values actually return tuples
# functions will be discussed in a future notebook.
def two_vals():
    return 1,2

result = two_vals()

# tuples can hold other tuples:
tup2D = ((1,2),(3,4),(5,6))

#to retrieve the second value of the third tuple:
val = tup2D[2][1]
```
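Two details of tuple declaration are easy to trip over: a single-item tuple needs a trailing comma, and a returned tuple can be unpacked straight into separate variables. The sketch below shows both; the names `single`, `not_a_tuple`, `first`, and `second` are chosen only for this example.

```python
# a trailing comma is what makes a one-item tuple
single = ('Boom',)
not_a_tuple = ('Boom')   # just a string in parentheses

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # str

# a returned tuple can be unpacked into separate names
def two_vals():
    return 1, 2

first, second = two_vals()
print(first, second)               # 1 2
```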

The most distinctive characteristic of a tuple is that it is **immutable**; that is, its items cannot be reassigned, added, or removed once it has been created. Attempting to do so will produce a `TypeError`:

In [6]:
read_only = (2,3,5)

# we can freely read the values...
print('Mid-value: {}'.format(read_only[1]))

# but we cannot change them...
read_only[1]=7

Mid-value: 3


TypeError: 'tuple' object does not support item assignment
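Although a tuple cannot be changed in place, a new tuple can be built from pieces of an existing one. Note also that immutability applies to the tuple itself: a mutable object stored inside a tuple can still be modified. The following sketch illustrates both points; the `updated` and `mixed` names are made up for this example.

```python
read_only = (2, 3, 5)

# build a new tuple with the middle value replaced
updated = read_only[:1] + (7,) + read_only[2:]
print(updated)     # (2, 7, 5)
print(read_only)   # (2, 3, 5), the original is untouched

# a list stored inside a tuple can still be changed
mixed = (1, [2, 3])
mixed[1].append(4)
print(mixed)       # (1, [2, 3, 4])
```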

If you want an indexable container that you can change, then you need a list.


## Lists

A **list** is a sequential container, like a tuple, but is **mutable**, meaning its size and content can change after it has been declared. A list literal is denoted using square brackets ([ ]). Here are several ways to declare lists:

```python
# an empty list
noValues = []

# a list of numbers
someNums = [1,2,5,6]

# a list of lists:
nested = [['a','b'],['c','d','e']]

```

There are many ways a list can be modified; here are just a few examples:

In [7]:
# start with a list of ten integers

nums = list(range(10))
print("initial list: {}".format(nums))

# change the first value to -1
nums[0] = -1
print("next list: {}".format(nums))

# append a new value
nums.append(10)
print("next list: {}".format(nums))

# append several values: 
nums+=[11,12,13]
print("expanded: {}".format(nums))

# remove the second value:
nums.pop(1)
print("removed second: {}".format(nums))

# remove several values from the middle
nums[3:6]=[]
print("cut out: {}".format(nums))

# create a doubled copy of the list
nums2 = nums*2
print("doubled copy: {}".format(nums2))

initial list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
next list: [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]
next list: [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
expanded: [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
removed second: [-1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
cut out: [-1, 2, 3, 7, 8, 9, 10, 11, 12, 13]
doubled copy: [-1, 2, 3, 7, 8, 9, 10, 11, 12, 13, -1, 2, 3, 7, 8, 9, 10, 11, 12, 13]
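A few other common list operations are sketched below: inserting at a position, removing by value, testing membership, and sorting in place. The starting values are arbitrary and chosen only for illustration.

```python
nums = [5, 3, 8, 1]

# insert the value 4 at index 1
nums.insert(1, 4)
print(nums)        # [5, 4, 3, 8, 1]

# remove the first occurrence of the value 8
nums.remove(8)
print(nums)        # [5, 4, 3, 1]

# check whether a value is present
print(3 in nums)   # True

# sort the list in place
nums.sort()
print(nums)        # [1, 3, 4, 5]
```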


While both tuples and lists allow values to be retrieved through integer-based indexing, it is sometimes useful to retrieve values using other forms of identification. This is where dictionaries become useful.

## Dictionaries (dicts)

A **dict** in Python is a **dictionary**, an associative container where *values* are referred to by *keys*. A key can be any **hashable** Python value; this includes immutable built-in types such as strings, numbers, and tuples of hashable values, but not mutable containers such as lists or dicts. Most often strings are used as keys, allowing for the labeling of a value in a direct way, but you could just as easily use integers like you would in a list or tuple. Using integers as keys allows for the emulation of *sparse arrays*, or index-based containers where there can be gaps between indices.

A dictionary literal is denoted by curly brackets ({}) surrounding a comma-separated list of entries, where each entry is a key-value pair separated by a colon (:). Here are several examples of creating a dictionary:

```python

# empty dictionary
nothing = {}

#also an empty dictionary
more_nothing = dict()

# a dictionary literal
ball = {'color':'blue',
        'diameter':2.0,
        'material':'rubber'
       }
```
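The sparse-array idea mentioned earlier can be sketched with integer keys. Only the occupied positions are stored, and `get()` supplies a value for the gaps; the `sparse` name and the `'empty'` placeholder are assumptions made for this example.

```python
# only three positions hold values; the gaps take no space
sparse = {0: 'a', 7: 'b', 1000: 'c'}

# occupied positions are read like a list index
print(sparse[7])               # b

# get() provides a default for positions in the gaps
print(sparse.get(3, 'empty'))  # empty

# the length counts stored entries, not the largest index
print(len(sparse))             # 3
```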

Dictionaries are mutable, so adding, modifying, and removing values is possible:

In [9]:
# start with a pet
my_pet = {'species':'dog',
         'age':5,
         'weight':22.5,
         'is_male': True
         }

# retrieving a value is as easy as passing the key to the index operator:
kind = my_pet['species']
print('My pet is a '+kind)

# changing a value is similar
my_pet['species'] = 'cat'
print('My pet is now a '+my_pet['species'])

# adding a new field is the same as assigning a value to an existing field
my_pet['color']='black'
print('My pet has a {} coat'.format(my_pet['color']))

# if you want to retrieve a value with a default if the applied key is absent, use get()
eyes = my_pet.get('eye_color','a secret')
print('The color of my pet\'s eyes is {}'.format(eyes))

# otherwise, asking for a value with a nonexistent key will produce a KeyError:
my_pet['bad_key']

My pet is a dog
My pet is now a cat
My pet has a black coat
The color of my pet's eyes is a secret


KeyError: 'bad_key'
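Removing entries and checking for a key before reading it can be sketched as follows. The dictionary here is a trimmed-down stand-in for `my_pet`, rebuilt so the example runs on its own.

```python
my_pet = {'species': 'cat', 'age': 5, 'color': 'black'}

# remove an entry with del
del my_pet['color']

# pop() removes an entry and returns its value
age = my_pet.pop('age')
print(age)               # 5

# the in operator checks for a key without raising a KeyError
print('age' in my_pet)   # False
print(my_pet)            # {'species': 'cat'}
```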

Note that as of Python 3.7, dicts preserve insertion order, so printing a dictionary shows the key-value pairs in the order they were added; in earlier versions the order was not guaranteed. Either way, this is rarely an issue, since values are retrieved by key rather than by index.

There is much more that can be done with dicts, some of which will be discussed in future notebooks.

## Sets

Most Python tutorials will cover the three basic container types (tuple, list, and dict) during their introductory sections. However, for our purposes there is a fourth type that can be very useful. A **set** is an unordered container that guarantees that each stored value is contained once and only once. Such a container can be useful when you have datasets with potentially unwanted duplicates.

Like a dict, a set literal is denoted by curly brackets ({}); however, only values are entered:

```python

# since '{}' is an empty dict, we have to explicitly declare an empty set
nothing_set = set()

# a simple set:
unique_nums = {1,2,99,5}
```

A set is mutable, and will ignore duplicate values:

In [12]:
# initialize a set:
some_vehicles = {'car', 'car','car','bike','truck'}

# note that duplicates are ignored:
print("My vehicles: {}".format(some_vehicles))

# a unique value can be added:
some_vehicles.add('hoverboard')
print("My vehicles: {}".format(some_vehicles))

# but duplicate values will be ignored:
some_vehicles.add('truck')
print("My vehicles: {}".format(some_vehicles))

My vehicles: {'car', 'bike', 'truck'}
My vehicles: {'car', 'hoverboard', 'bike', 'truck'}
My vehicles: {'car', 'hoverboard', 'bike', 'truck'}
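A common use of this behavior is removing duplicates from a list by passing it to `set()`. The sketch below uses made-up data; `sorted()` is applied only so that the printed result has a predictable order.

```python
# a list with repeated values (made-up data)
readings = [3, 1, 3, 2, 1, 3]

# converting to a set drops the duplicates
unique = set(readings)

# sorted() returns a list in ascending order
print(sorted(unique))              # [1, 2, 3]
print(len(readings), len(unique))  # 6 3

# membership tests work the same as with lists
print(2 in unique)                 # True
```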


You may have noticed that when printing out `some_vehicles`, the contents were not necessarily in the order that they were declared. Unlike dicts in current Python versions, *sets do not preserve order*. There is also no direct way to access individual values in a set by index or key. Instead, we have to use one of several methods to access the values indirectly. Two such ways are to first convert the set to a list, or to iterate through it using a for-loop (more on for-loops in the next notebook):

In [13]:
# create the test set
some_values = {1,7,2,6,-1,44}

# create a list copy
list_values= list(some_values)
# grab an arbitrary value

print("Some list value: {}".format(list_values[3]))

# iterate through the set, print each value
print("Direct values:")
for val in some_values:
    print('  {}'.format(val))

Some list value: 7
Direct values:
  1
  2
  6
  7
  44
  -1
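Because the order of a set is arbitrary, the value found at a given position of the list copy can vary. When a predictable order matters, one option is `sorted()`, which returns a new list in ascending order. A minimal sketch, with `ordered` as a name chosen for this example:

```python
some_values = {1, 7, 2, 6, -1, 44}

# sorted() accepts any iterable and returns a sorted list
ordered = sorted(some_values)
print(ordered)      # [-1, 1, 2, 6, 7, 44]

# the sorted list can be indexed reliably
print(ordered[0])   # -1
print(ordered[-1])  # 44
```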


This is an incomplete discussion of Python's set container, as many of the capabilities and limitations of sets won't make much sense without more knowledge about Python in general. However, it is good to be aware that sets exist, especially when working with datasets that must not contain duplicate values.