# Week 03

## Objectives

By the end of this tutorial, you should be able to create, manipulate, and interrogate different types of data structures, including a **list**, a **dictionary**, and a **list of dictionaries**.

## Data structures in Python

Most programming languages implement a variety of **data structures**: objects that store data and let you manipulate it. Each type of structure has different properties and abilities. In data analysis, we are especially interested in a structure's *level of complexity*: how many values it can hold and how it associates them with one another. In Python, for example, an `int` holds a single value, a `list` holds multiple values in a specific order, and a `dict` associates values with special *keys*.
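
To check which kind of structure a name refers to, you can ask Python with the built-in `type()` function. The sketch below is only an illustration: the variable names (`atomic_number`, `symbols`, `element`) are assumptions, not part of the tutorial's code, and `print(...)` is written with parentheses so the cell runs under either Python 2 or Python 3.

```python
# Illustrative values; the variable names are assumptions for this sketch.
atomic_number = 1                          # an int holds a single value
symbols = ['H', 'He', 'Li']                # a list holds values in order
element = {'symbol': 'H', 'number': 1}     # a dict maps keys to values

# type() reports which kind of structure each name refers to
print(type(atomic_number) is int)    # True
print(type(symbols) is list)         # True
print(type(element) is dict)         # True

# len() counts the values in a multi-valued structure
print(len(symbols))                  # 3
print(len(element))                  # 2
```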

## CRUD with a data object

When learning about any data object in any language, make sure you learn how to accomplish the following four things with it: **create**, **read**, **update**, and **delete** (**CRUD**) values. In other words, ask the following four questions:

- **Create.** How do I add new values to the object?
- **Read.** How do I find a specific value, or subset of values, in the object? And what kind of object will these values be stored in?
- **Update.** How do I change an existing value in the object?
- **Delete.** How do I remove an existing value? Does this do anything to remaining values in the object?

## Example: Python `list` object

As an example, let's illustrate these actions for Python's `list` object. We start by using “literal” notation to define the `list` while filling it with values. A list can contain any type of object: numbers, strings, even other lists.

In [257]:
atomic_symbols = ['H', 'He', 'Li', 'Be']
print atomic_symbols

['H', 'He', 'Li', 'Be']


You can **read** a value from a list using its numerical index, in brackets. 

In [258]:
print atomic_symbols[1]  # should be 'He'

He
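
Reading also raises the second half of the **Read** question: what kind of object do you get back? As a small sketch (the names `one`, `last`, and `pair` are made up for illustration), a single index returns the element itself, while a slice returns a new list. The parenthesized `print(...)` form is used so the cell runs under Python 2 or 3.

```python
symbols = ['H', 'He', 'Li', 'Be']   # same starting values as above

one = symbols[1]       # a single index returns the element itself
last = symbols[-1]     # negative indices count from the end
pair = symbols[1:3]    # a slice returns a new list (end index excluded)

print(one)                    # He
print(last)                   # Be
print(pair)                   # ['He', 'Li']
print(type(pair) is list)     # True
```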


In [259]:
# Use a loop to print an informational message about the periodic table
# (The %2s pads the string to at least two characters wide; longer strings are not cut off)

for i in range(0,4):
    print "The index of %2s is %i" %(atomic_symbols[i], i)

The index of  H is 0
The index of He is 1
The index of Li is 2
The index of Be is 3
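
The `%2s` placeholder sets a *minimum* width, which a short experiment can confirm. This is a minimal sketch: the brackets are only there to make the padding visible, and the names `padded`, `exact`, `longer`, and `left` are illustrative. It is written with `print(...)` so it behaves the same under Python 2 and 3.

```python
# %2s pads on the left to a minimum width of two characters
padded = "[%2s]" % 'H'
exact = "[%2s]" % 'He'
# Longer strings are printed in full, not cut off
longer = "[%2s]" % 'Carbon'
# A minus sign pads on the right instead
left = "[%-2s]" % 'H'

print(padded)    # [ H]
print(exact)     # [He]
print(longer)    # [Carbon]
print(left)      # [H ]
```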


### *Exercise*
- Modify the code so it prints the *atomic number* of each element

You **create** new values in the list using `.append()`, which adds an item to the end of the list. 

In [260]:
# Append C, N to the end (intentionally skipping B)

atomic_symbols.append('C')
atomic_symbols.append('N')
print atomic_symbols

['H', 'He', 'Li', 'Be', 'C', 'N']


Don't want it at the end? Use `.insert()` to specify where the new element will go. 

In [261]:
# Insert B between Be and C. What does 4 refer to in the line below?

atomic_symbols.insert(4,'B')
print atomic_symbols

['H', 'He', 'Li', 'Be', 'B', 'C', 'N']


You can **delete** a value using the `del` statement, which uses bracket notation to specify the index of the item to remove.

In [262]:
# removes N
del atomic_symbols[6]
print atomic_symbols

['H', 'He', 'Li', 'Be', 'B', 'C']
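
Deleting from the end leaves the other indices alone, but the **Delete** question also asks what happens to the remaining values. The sketch below uses a fresh list of its own (the names `before` and `after` are illustrative) and deletes from the front instead, with `print(...)` written so it runs under Python 2 or 3.

```python
symbols = ['H', 'He', 'Li', 'Be']

before = symbols.index('Li')
del symbols[0]                 # remove 'H' from the front
after = symbols.index('Li')

print(symbols)                 # ['He', 'Li', 'Be']
print(before)                  # 2
print(after)                   # 1, every later item moved down by one
print(len(symbols))            # 3
```

Any index you saved before the deletion may therefore point at a different value afterwards.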


To **update** a specific value, use an assignment operator (`=` or `+=`) in combination with indexing notation. You can also update a subset using slice notation.

In [263]:
# Change C to Carbon by appending to the string at index 5

atomic_symbols[5] += 'arbon'
print atomic_symbols

['H', 'He', 'Li', 'Be', 'B', 'Carbon']
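
Updating a subset with slice notation works the same way, with a slice on the left of the `=`. A minimal sketch, using its own copy of the list and `print(...)` in the form that runs under both Python 2 and 3:

```python
symbols = ['H', 'He', 'Li', 'Be', 'B', 'C']

# Replace the items at indices 2 and 3 in one assignment
symbols[2:4] = ['Lithium', 'Beryllium']
print(symbols)   # ['H', 'He', 'Lithium', 'Beryllium', 'B', 'C']

# The replacement may have a different length, which resizes the list
symbols[2:4] = ['Li']
print(symbols)   # ['H', 'He', 'Li', 'B', 'C']
```

Note that the second assignment shortened the list, so slice updates can also create or delete values.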


Besides the basic "CRUD" actions, lists offer further **methods**: functions attached to the object, like the `.append()` and `.insert()` calls above. If you don't know the index of a value, you can use methods that search for the value, such as `.remove()` and `.index()`.

In [264]:
# Remove 'Carbon'
atomic_symbols.remove('Carbon')
print atomic_symbols

# Find the index of 'Be'
print atomic_symbols.index('Be')

['H', 'He', 'Li', 'Be', 'B']
3
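
Searching can fail. Both `.remove()` and `.index()` raise a `ValueError` when the value is not in the list, and `.remove()` deletes only the first match. The sketch below shows one way to handle this; the names `outcome` and `duplicates` are illustrative, and `print(...)` is written so the cell runs under Python 2 or 3.

```python
symbols = ['H', 'He', 'Li', 'Be', 'B']

# Both methods raise ValueError when the value is missing
try:
    symbols.remove('Carbon')
    outcome = 'removed'
except ValueError:
    outcome = 'not found'
print(outcome)        # not found

# .remove() deletes only the first match
duplicates = ['H', 'He', 'H']
duplicates.remove('H')
print(duplicates)     # ['He', 'H']
```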


Another property of lists is the ability to **concatenate** two lists together. When working at the list level (not with individual elements of a list), the meaning of the `+` and `+=` operators changes to concatenation (instead of addition).

In [265]:
# Concatenate two lists into a new list

more_symbols = atomic_symbols + ['C', 'N', 'O', 'F', 'Ne']
print more_symbols

['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne']
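
It helps to know that `+` builds a new list and leaves both of its operands untouched. A short sketch with made-up names (`first`, `second`, `combined`), using `print(...)` so it runs under Python 2 or 3:

```python
first = ['H', 'He']
second = ['Li', 'Be']

combined = first + second    # builds a third list
print(combined)              # ['H', 'He', 'Li', 'Be']
print(first)                 # ['H', 'He'], unchanged
print(second)                # ['Li', 'Be'], unchanged

# Changing the new list does not affect the originals
combined.append('B')
print(len(first))            # 2
```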


### *Exercise* 
- Modify the code above to append these five elements "in-place", i.e., directly to `atomic_symbols` without creating a new list.

Most multivalued data structures implement a **membership** test. A list does this by responding to the `in` operator, which tests whether a given value is in the list.

In [266]:
print 'Be' in atomic_symbols
print 'bunnies' in atomic_symbols

True
False


### *Exercise* 
- Implement the loop that prints atomic numbers again, except using the `in` operator on the list instead of a range of indices.

## Example: Python `dict` object

A dictionary is similar to a list, but associates each value with a **key** instead of an index. In a data-management context, the key names an *attribute* of some observation or entity. A list can store different elements of the periodic table, but a dictionary would store the attributes of a single element, such as atomic symbol, atomic number, name, and atomic mass.

How do you implement CRUD operations for a dictionary?

In [267]:
# Use literal notation to start a dictionary for hydrogen
hydrogen = {
    'symbol' : 'H',
    'number' : 0
}

As with a list, a dictionary uses bracket notation to **read**, **update**, and **delete** a value using a key in place of an index.

In [268]:
print hydrogen['symbol']  # should be 'H'

H


In [269]:
hydrogen['number'] = 1  # correct atomic number
print hydrogen

{'symbol': 'H', 'number': 1}


In [270]:
del hydrogen['number']
print hydrogen

{'symbol': 'H'}


*Unlike* a list, however, you **create** a new key simply by using it in bracket notation, as if it already existed!

In [271]:
hydrogen['number'] = 1
hydrogen['name'] = "hydrogen"
hydrogen['mass'] = 1.008

print hydrogen

{'symbol': 'H', 'mass': 1.008, 'number': 1, 'name': 'hydrogen'}
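
The shortcut only applies to assignment. *Reading* a key that was never assigned raises a `KeyError`. The sketch below uses an illustrative dictionary named `element` (not the tutorial's `hydrogen`) and also shows `.get()`, which returns a default value instead of raising. `print(...)` is written so the cell runs under Python 2 or 3.

```python
element = {'symbol': 'H', 'number': 1}   # illustrative dictionary

# Reading a key that was never assigned raises KeyError
try:
    mass = element['mass']
except KeyError:
    mass = None
print(mass)                   # None

# Assigning to the same key creates it
element['mass'] = 1.008
print(element['mass'])        # 1.008

# .get() reads a key and returns a default if it is absent
name = element.get('name', 'unknown')
print(name)                   # unknown
```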


## Interrogating a data structure

In data management, you will often *receive* a data structure and need to read or process it in some way. In this case, you need to **interrogate** the data structure to learn how it's organized. For example, a dictionary offers the `.keys()` method, which produces a list of keys that the dictionary currently recognizes. This list is useful for looping through all of a dictionary's values.

In [272]:
llaves = hydrogen.keys()  # llaves is a list
print llaves

['symbol', 'mass', 'number', 'name']
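
A few other built-ins are handy when interrogating a dictionary you did not create. This is a minimal sketch with an illustrative dictionary named `element`. Because the order in which a dictionary reports its keys varies, and because `.keys()` returns a view object rather than a list under Python 3, the sketch wraps it in `sorted()`, which returns a list in either version.

```python
element = {'symbol': 'H', 'number': 1, 'name': 'hydrogen', 'mass': 1.008}

# len() counts the key/value pairs
print(len(element))              # 4

# `in` tests the keys, not the values
print('mass' in element)         # True
print('H' in element)            # False

# sorted() returns the keys as a list in alphabetical order,
# whatever order the dictionary itself reports them in
key_list = sorted(element.keys())
print(key_list)                  # ['mass', 'name', 'number', 'symbol']
```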


### *Exercise*

- The code below prints one attribute (the symbol) of hydrogen. Make a loop that prints all the attributes of hydrogen.

In [273]:
print "Hydrogen's %s is: %s" %('symbol', 'H')

Hydrogen's symbol is: H


## Nesting data structures

A classic way to present data is a table of rows and columns. Databases, spreadsheets, and analysis software all use this model for presenting and interacting with data. A table is a *nested* data structure: the rows are a collection of entities, and each column is an attribute or variable that these entities have in common. To translate this to Python, each row becomes a dictionary, and each column becomes a key that every dictionary should have. This means that the entire table is a *list* of *dictionaries*.

*Yes! A dictionary can be an element of a list.*

In [274]:
# Make two new dictionaries

helium = {
    'symbol' : 'He',
    'number' : 2,
    'name' : "helium",
    'mass' : 4.00
}

lithium = {
    'symbol' : 'Li',
    'number' : 3,
    'name' : "lithium",
    'mass' : 6.94
}

In [275]:
# Now make a list of all three dictionaries
# Print the name of the 2nd element, just to prove it's there

elements = [hydrogen, helium, lithium]

print elements[1]['name']  # should be "helium"

helium
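
The double brackets read left to right: the first index picks a row (a dictionary) and the second picks a column (a key). The list also holds the dictionaries themselves rather than copies, so an update made through the list shows up in the original variable. A small sketch, with shortened dictionaries and the hypothetical name `table` for the list; `print(...)` is written so it runs under Python 2 or 3.

```python
helium = {'symbol': 'He', 'number': 2}
lithium = {'symbol': 'Li', 'number': 3}
table = [helium, lithium]        # hypothetical name for the list

# First index picks the row (a dict), second picks the column (a key)
print(table[1]['symbol'])        # Li

# Updating through the list changes the original dictionary too,
# because the list holds the same object rather than a copy
table[0]['name'] = 'helium'
print(helium['name'])            # helium
print(table[0] is helium)        # True
```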


### *Exercise*

- The line below adds a new but empty dictionary to the list. Fill in the data for beryllium.

In [276]:
elements.append({})  # at index 3

### *Exercise*

- The line below prints the attributes of hydrogen as a row in a table. Write a loop that prints the first 4 rows.

In [277]:
print "%2i | %2s | %10s | %2.3f" %(hydrogen['number'],hydrogen['symbol'],hydrogen['name'],hydrogen['mass'])

 1 |  H |   hydrogen | 1.008


### *Exercise*

- Write a loop that creates a list of just the names of the elements