# Data Structures

Information lives in data structures. You can view these data structures as containers in which we store information for later use or processing. To a great extent, the selection of the right data structure depends on the scalability of the application, the format and complexity of the data, and the preference of the programmer. Many times the same information can be stored in more than one data structure.


## Lists

Lists are great for storing sequences of elements. Lists can store strings, numbers, or mixed data types. Lists can be nested and are `mutable` objects, which means that we can overwrite the information inside. A key feature of lists is that we can access information by using either a positional index (to retrieve a single item) or slicing (to retrieve one or more items). Lists are similar to arrays in other languages (e.g. Matlab). It is worth mentioning that standard Python lists do not support element-wise operations. To leverage element-wise and matrix operations we need to use NumPy arrays, which we will cover later.

>Lists are represented by square brackets `[]`
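
To illustrate the point about element-wise operations, the short sketch below shows what the `*` and `+` operators do with plain lists, and how a list comprehension can apply an operation to each item. The variable names are illustrative only.

```python
# Plain lists do not support element-wise arithmetic
decades = [1980, 1990, 2000, 2010]

# The * operator repeats the list instead of multiplying each element
repeated = decades * 2
print(repeated)

# The + operator concatenates two lists instead of adding them
combined = decades + [2020, 2030]
print(combined)

# A list comprehension applies an operation to each element
next_decades = [year + 10 for year in decades]
print(next_decades)
```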


In [47]:
# List examples
decades = [1980, 1990, 2000, 2010]
soil_orders = ['Gelisols','Histosols','Spodosols']
mixed_list = [1, 2, 3, 'four', 'five', 'six']
nested = [[1,2,3],[4,5,6],[7,8,9]]
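
Positional indexing and slicing can be sketched with two of the lists defined above. The lists are redefined here so that the example runs on its own; the variable names on the left-hand side are illustrative.

```python
# Indexing and slicing a list (Python starts counting at zero)
soil_orders = ['Gelisols', 'Histosols', 'Spodosols']
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

first = soil_orders[0]        # positional index: first item
last = soil_orders[-1]        # negative index: counts from the end
first_two = soil_orders[0:2]  # slice: items at positions 0 and 1
center = nested[1][1]         # second inner list, second item

print(first, last, first_two, center)

# Lists are mutable, so an item can be overwritten in place
soil_orders[1] = 'Mollisols'
print(soil_orders)
```

Note that the end position of a slice is not included, so `soil_orders[0:2]` returns two items rather than three.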


In [43]:
# Append elements to a list
soil_orders.append('Andisols')
print(soil_orders)


['Gelisols', 'Histosols', 'Spodosols', 'Andisols']


In [44]:
# Append multiple elements
extra_soil_orders = ['Oxisols','Aridisols','Vertisols','Ultisols']
soil_orders.extend(extra_soil_orders)
print(soil_orders)


['Gelisols', 'Histosols', 'Spodosols', 'Andisols', 'Oxisols', 'Aridisols', 'Vertisols', 'Ultisols']


>Appending multiple items with the `append()` method results in a nested list, while the `extend()` method results in a merged list.
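
A minimal side-by-side sketch of the difference, using two short lists created only for this comparison:

```python
# Two identical starting lists (illustrative names)
orders_appended = ['Gelisols', 'Histosols']
orders_extended = ['Gelisols', 'Histosols']
new_orders = ['Oxisols', 'Aridisols']

orders_appended.append(new_orders)  # adds the whole list as a single nested item
orders_extended.extend(new_orders)  # adds each item of the list individually

print(orders_appended)
print(orders_extended)
print(len(orders_appended), len(orders_extended))
```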

In [45]:
# Delete an element by position (pop() without an argument deletes the last element)
soil_orders.pop(1) # Eliminate the second element of the list, then print the remaining elements
print(soil_orders)


['Gelisols', 'Spodosols', 'Andisols', 'Oxisols', 'Aridisols', 'Vertisols', 'Ultisols']


In [46]:
# An alternative method to delete one or more elements of the list.
del soil_orders[1:3]
print(soil_orders)
print(type(soil_orders))

['Gelisols', 'Oxisols', 'Aridisols', 'Vertisols', 'Ultisols']
<class 'list'>


## Tuples

Tuples are another convenient data structure in Python. They are well suited for grouping pieces of information that belong together. For instance, a point in a two-dimensional plane is defined by its `x` and `y` coordinates, so a tuple `(x,y)` makes sense, and multiple points can be stored as tuples within a list. Both pieces of information are critical to define a point on the two-dimensional plane. Another example that requires three pieces of information is color in the RGB space: each color can be stored as an `(r,g,b)` triplet. Tuples have an important property: they are `immutable`, which means that their items cannot be changed after the tuple is created.


In [27]:
# Geographic coordinates 
mauna_loa = (19.536111, -155.576111, 3397) # Mauna Loa Observatory in Hawaii
konza_prairie = (39.106704, -96.608968, 320) # Konza Prairie in Kansas

coords = [mauna_loa,konza_prairie]
print(coords)

[(19.536111, -155.576111, 3397), (39.106704, -96.608968, 320)]
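
Items in a tuple can be read by position, just like items in a list, or unpacked into separate variables in a single statement. The sketch below assumes that the third value is elevation in meters, which the example above does not state explicitly.

```python
# Latitude, longitude, and elevation (elevation assumed to be in meters)
konza_prairie = (39.106704, -96.608968, 320)

# Access a single item by position
latitude = konza_prairie[0]

# Unpack all items into separate variables in one statement
lat, lon, elevation = konza_prairie
print(latitude, lon, elevation)
```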


In [40]:
# A list of tuples
colors = [(0,0,0), (255,255,255), (0,255,0)] # Each tuple refers to black, white, and green.
print(colors)
print(type(colors[0]))

[(0, 0, 0), (255, 255, 255), (0, 255, 0)]
<class 'tuple'>


What happens if we want to change the first element of the third tuple from `0` to `255`?
```
colors[2][0] = 255
```
Remember that tuples are `immutable`, so this statement raises a `TypeError`. This is a good thing if we want to prevent accidentally changing the value of a color.
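
A minimal sketch of what happens when the assignment is attempted, followed by one possible way to make the change on purpose: since the list that holds the tuples is mutable, the whole tuple can be replaced with a new one.

```python
colors = [(0, 0, 0), (255, 255, 255), (0, 255, 0)]

# Item assignment on a tuple raises a TypeError
try:
    colors[2][0] = 255
except TypeError as error:
    print('TypeError:', error)

# The list itself is mutable, so the third tuple can be replaced by a new tuple
colors[2] = (255,) + colors[2][1:]
print(colors)
```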

## Dictionaries

Dictionaries are extremely versatile and one of the most popular Python data structures. A powerful feature of dictionaries is the ability to store and retrieve information using names. Dictionaries are convenient when the data is not a square matrix or table and has multiple features that need to be stored together. A common example is weather data.

>Dictionaries are defined using `{}` and data inside dictionaries is associated with names using `name:value` pairs

In [39]:
D = {'city':'Manhattan',
     'state':'Kansas',
     'coords': (39.208722, -96.592248, 350),
     'data': [{'date' : '20220101', 
              'precipitation' : {'value':12.5, 'unit':'mm', 'instrument':'TE525'},
              'air_temperature' : {'value':5.6, 'unit':'Celsius', 'instrument':'ATMOS14'}
              },
              {'date' : '20220102', 
              'precipitation' : {'value':0, 'unit':'mm', 'instrument':'TE525'},
              'air_temperature' : {'value':1.3, 'unit':'Celsius', 'instrument':'ATMOS14'}
              }]
    }

print(D)
print(type(D))

{'city': 'Manhattan', 'state': 'Kansas', 'coords': (39.208722, -96.592248, 350), 'data': [{'date': '20220101', 'precipitation': {'value': 12.5, 'unit': 'mm', 'instrument': 'TE525'}, 'air_temperature': {'value': 5.6, 'unit': 'Celsius', 'instrument': 'ATMOS14'}}, {'date': '20220102', 'precipitation': {'value': 0, 'unit': 'mm', 'instrument': 'TE525'}, 'air_temperature': {'value': 1.3, 'unit': 'Celsius', 'instrument': 'ATMOS14'}}]}
<class 'dict'>


The example above has several interesting features:
- The city and state names are ordinary strings.
- The geographic coordinates (latitude, longitude, and elevation) are grouped using a tuple.
- Weather data are stored as a list of dictionaries, one per day, and each variable within a day is itself a dictionary.

>Note how dictionaries also allow us to add useful metadata such as variable units and instrument model.
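
Retrieving information by name can be sketched as follows. To keep the example self-contained it uses a shortened version of the dictionary above, and the `country` entry added at the end is a hypothetical field used only for illustration.

```python
# A shortened version of the weather dictionary
D = {'city': 'Manhattan',
     'state': 'Kansas',
     'coords': (39.208722, -96.592248, 350),
     'data': [{'date': '20220101',
               'precipitation': {'value': 12.5, 'unit': 'mm', 'instrument': 'TE525'}}]
    }

# Retrieve values by name
city = D['city']
latitude = D['coords'][0]

# Chain names and positional indexes to reach nested values
first_day = D['data'][0]
rain = first_day['precipitation']['value']
rain_unit = first_day['precipitation']['unit']
print(city, latitude, first_day['date'], rain, rain_unit)

# Dictionaries are mutable: assigning to a new name adds a name:value pair
D['country'] = 'USA'  # hypothetical extra field
print(list(D.keys()))
```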

Keep in mind that the organization of the dictionary above depends on programmer preferences. For instance, rather than grouping all three coordinates into a tuple, a different programmer may prefer to store each value under its own `name:value` pair.
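
The two layouts can be compared side by side in a short sketch. The key names `latitude`, `longitude`, and `elevation` are assumptions chosen for illustration.

```python
# Option 1: coordinates grouped in a tuple
site_tuple = {'city': 'Manhattan',
              'coords': (39.208722, -96.592248, 350)}

# Option 2: one name:value pair per coordinate (hypothetical key names)
site_named = {'city': 'Manhattan',
              'latitude': 39.208722,
              'longitude': -96.592248,
              'elevation': 350}

# Both layouts hold the same information; only the way we retrieve it differs
print(site_tuple['coords'][2])
print(site_named['elevation'])
```

The tuple keeps the coordinates together and protects them from accidental changes, while the named layout makes each value self-describing.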

## Sets

Sets are the fourth built-in Python data structure and are used to store multiple items in a single variable. Unlike lists, Python sets do not allow duplicate items, existing items cannot be modified in place (although items can be added and removed), and items are not indexed.

>Sets are represented by curly braces `{}`. They are similar to lists in the sense that they hold a collection of elements, and similar to dictionaries in the sense that we represent them using curly braces. Note that sets do not contain `name:value` pairs, and that an empty pair of braces `{}` creates a dictionary, so an empty set must be created with `set()`. Sets do not seem to be as widely used as dictionaries, lists, and tuples.
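
The properties listed above can be checked with a short sketch. The set of bases is made up for illustration.

```python
# Duplicates are discarded when a set is created
bases = {'A', 'T', 'T', 'G', 'A'}
print(len(bases))

# Items can be added and removed
bases.add('C')
bases.discard('G')
print(sorted(bases))  # sorted() returns a list in a predictable order

# An empty pair of braces creates a dictionary; use set() for an empty set
empty_set = set()
print(type({}), type(empty_set))

# Sets are not indexed, so positional access raises a TypeError
try:
    bases[0]
except TypeError as error:
    print('TypeError:', error)
```

Because sets are unordered, the order in which items are printed can differ from the order in which they were typed.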

In [38]:
dna_1 = set('ATTTGAATTA') # DNA sequence 1
dna_2 = set('GGATTCGCGT') # DNA sequence 2

# Print unique bases in each DNA sequence
print(dna_1)
print(dna_2)


{'A', 'G', 'T'}
{'A', 'G', 'C', 'T'}


In [37]:
# Check data type
print(type(dna_1))

<class 'set'>


## Practice

1. Create a list with the scientific names of four common grasses in the US Great Plains: big bluestem, switchgrass, Indian grass, and little bluestem.

2. Using a periodic table, store in a dictionary the name, symbol, atomic mass, melting point, and boiling point of oxygen, nitrogen, phosphorus, and hydrogen. Then, write two separate Python statements to retrieve the boiling point of oxygen and hydrogen. Combined, these two elements can form water, which has a boiling point of 100 degrees Celsius. How does this value compare to the boiling point of the individual elements?

3. Without retyping the dictionary that you created in the previous exercise, add the properties of a new element: carbon.

4. Create a list of tuples encoding the latitude, longitude, and altitude of three national parks of your choice.

5. Create a list of tuples containing the two matching DNA base pairs.