# Data Structures

Information lives in data structures. You can view these data structures as containers in which we store information for later use or processing. The selection of the right data structure depends on the structure, format, and level of complexity of the information that we want to store. Often the same information can be stored in more than one data structure. The choice of the right data structure will depend on the application and, to a great extent, on your experience and coding preferences.


## Lists

Lists are great for storing short or long sequences of elements. One of their most distinctive features is that we access the stored information using either a positional index (to retrieve a single item) or slicing (to retrieve one or more items). Lists play the role that arrays play in many other languages. It is worth mentioning that standard Python lists do not support element-wise operations. For this we need to use NumPy arrays, which we will cover later.
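As a minimal sketch of the ideas above, the cell below retrieves one item with a positional index, several items with a slice, and shows that multiplying a standard list repeats it instead of multiplying each element. The list name and its values are only illustrative.

```python
# Illustrative list (values chosen only for this sketch)
sand_content = [10, 25, 40, 55, 70]

first_item = sand_content[0]     # Positional index: single item
last_item = sand_content[-1]     # Negative indices count from the end
middle = sand_content[1:4]       # Slice: items in positions 1, 2, and 3

# Multiplication repeats the list; it is not an element-wise operation
repeated = sand_content[:2] * 2

print(first_item, last_item)
print(middle)
print(repeated)
```

Note that the slice `1:4` stops before position 4, so it returns three items.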

Lists can have mixed data types and are `mutable`, which means that we can overwrite the information they contain. This `mutable` capability can be good and bad. If we accidentally change the contents of a list, it could lead to some serious issues. On the other hand, if we want to update the stored values then this feature is convenient.
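To make the idea of mutability concrete, here is a small sketch that overwrites items of a list in place. The list of sampling depths is hypothetical and only serves as an example.

```python
# Hypothetical list of sampling depths in cm
depths = [5, 10, 20, 50]

depths[0] = 2            # Overwrite a single item in place
depths[1:3] = [15, 30]   # Overwrite a slice in place

print(depths)
```

The variable name never changes, yet the list now holds different values. This is the behavior that can be convenient or risky depending on the situation.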

In [7]:
# Simple list
list_primes = [2, 3, 5, 7, 11, 13, 17]  # A row vector
print(list_primes)
type(list_primes)

[2, 3, 5, 7, 11, 13, 17]


list

In [9]:
# Re-define the list (the variable name now points to a new list)
list_primes = [31, 37, 41, 43]
print(list_primes)

[31, 37, 41, 43]


In [10]:
# Determine the number of elements in a list
len(list_primes)

4

In [11]:
mixed_list = [1, 2, 3, 4, 'five', 'six']
print(mixed_list)

[1, 2, 3, 4, 'five', 'six']


In [18]:
# Lists can be nested
nested_list = [1980, 1990, 2000, 2010,['sand','silt','clay']]

print(nested_list[0])         # Access the element in the first position
print(nested_list[4])         # Access the nested list in the fifth position
print(nested_list[4][1])      # Access 'silt'
print(nested_list[4][2][0])   # Access the 'c' in 'clay'

1980
['sand', 'silt', 'clay']
silt
c


In [22]:
# Extract and save a nested list in a different variable
particles = nested_list[4]
print(particles)    # Same inner list under a second name (no copy is made)
print(nested_list)  # This list remains untouched

['sand', 'silt', 'clay']
[1980, 1990, 2000, 2010, ['sand', 'silt', 'clay']]
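One detail worth keeping in mind: assigning a nested list to a new variable does not duplicate it. Both names refer to the same list object, so a change made through one name is visible through the other. The sketch below uses a fresh list with hypothetical names, so it can run on its own, and contrasts plain assignment with the `copy()` method.

```python
records = [1980, 1990, ['sand', 'silt', 'clay']]

alias = records[2]                # Same list object, no copy is made
independent = records[2].copy()   # New list holding the same items

alias.append('gravel')            # Also changes the list inside records

print(records[2])
print(independent)
print(alias is records[2])
```

If the extracted list needs to be edited without touching the original, `copy()` is one simple option.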


In [24]:
# Count method
nested_list.count(2000) # Count specific element. Takes only one argument


1

In [32]:
# Index of specific entry
nested_list.index(2010) 

3

**Will the following command work?**

```python
nested_list.index('sand')
```

**Solution**: No, it raises a `ValueError`. Search inside the nested list instead:
```python
nested_list[4].index('sand')
```
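The first command fails with a `ValueError` because `index()` compares only against the top-level items of the list, and the string `'sand'` sits one level deeper. A small sketch, using a stand-alone list with a hypothetical name, that catches the error and then searches the nested list:

```python
layers = [1980, 1990, 2000, 2010, ['sand', 'silt', 'clay']]

try:
    position = layers.index('sand')   # Searches top-level items only
except ValueError:
    position = None                   # 'sand' is not a top-level item

print(position)
print(layers[4].index('sand'))        # Search inside the nested list
```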


In [29]:
nested_list.append([2020,2030])
print(nested_list)

[1980, 1990, 2000, 2010, ['sand', 'silt', 'clay'], [2020, 2030]]


In [33]:
# Clear list and check that list was cleared (it should print empty brackets)
nested_list.clear()
print(nested_list)

[]


In [2]:
# Here is a more challenging example:
new_list = [ [1,2,3] , [4,5,6, ['a','b','c']] ]
new_list[1][3][0]

'a'

**What letter will the following Python command return?**
```python
new_list[1][3][0]
```

**Solution**
```python
'a'
```

### Creating a 2D matrix


In [17]:
M = [[1, 4, 5],
    [-5, 8, 9]]
print(M)

[[1, 4, 5], [-5, 8, 9]]
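A matrix built this way is simply a list of rows, so rows and individual elements are retrieved with the same indexing used for nested lists. The sketch below is one possible way to inspect the matrix; the list comprehension used to collect a column is a convenience and not the only option.

```python
M = [[1, 4, 5],
     [-5, 8, 9]]

n_rows = len(M)          # Number of rows
n_cols = len(M[0])       # Number of columns (length of the first row)

second_row = M[1]        # Entire second row
element = M[1][2]        # Second row, third column

# A column has to be assembled by taking one item from each row
first_column = [row[0] for row in M]

print(n_rows, n_cols)
print(second_row, element)
print(first_column)
```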


### Append single element to a list

In [15]:
# Append a single element
soil_orders = ['Gelisols','Histosols','Spodosols']
soil_orders.append('Andisols') # Add new item to the existing list
print(soil_orders)


['Gelisols', 'Histosols', 'Spodosols', 'Andisols']


### Append multiple elements to a list

In [35]:
# Append multiple elements
soil_orders = ['Gelisols','Histosols','Spodosols']
extra_soil_orders = ['Oxisols','Aridisols','Vertisols','Ultisols']
soil_orders.append(extra_soil_orders)
print(soil_orders)


['Gelisols', 'Histosols', 'Spodosols', ['Oxisols', 'Aridisols', 'Vertisols', 'Ultisols']]


>Note the difference in the output list between `append()` in the previous code cell and `extend()` in the following code cell. In the previous cell the appended list appears nested, while the `extend()` method truly merges both lists to make a single, larger list.

### Merge lists

In [12]:
# Extend list
soil_orders = ['Gelisols','Histosols','Spodosols']
extra_soil_orders = ['Oxisols','Aridisols','Vertisols']

soil_orders.extend(extra_soil_orders)
print(soil_orders)


['Gelisols', 'Histosols', 'Spodosols', 'Oxisols', 'Aridisols', 'Vertisols']


### Delete element from list

In [13]:
# Eliminate an element by its position
print(soil_orders) # Print original list
soil_orders.pop(1) # Eliminate the second element of the list and print remaining elements
print(soil_orders)


['Gelisols', 'Histosols', 'Spodosols', 'Oxisols', 'Aridisols', 'Vertisols']
['Gelisols', 'Spodosols', 'Oxisols', 'Aridisols', 'Vertisols']


In [14]:
# An alternative method to delete one or more elements of the list.
del soil_orders[1:3]
print(soil_orders)

['Gelisols', 'Aridisols', 'Vertisols']
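Two related tools are worth a quick sketch: calling `pop()` without an argument removes the last item and returns it, and `remove()` deletes an item by value instead of by position. The list of crops is hypothetical.

```python
crops = ['wheat', 'corn', 'soybean', 'sorghum']  # Hypothetical list

removed = crops.pop()     # No argument: removes and returns the last item
crops.remove('corn')      # Removes the first item equal to 'corn'

print(removed)
print(crops)
```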


## Tuples

Tuples are another data structure in Python. Tuples are convenient when storing data defined by one or more values. For instance, a point in a two-dimensional plane is defined by its `x` and `y` coordinates, so a tuple `point = (x,y)` would make sense. Both pieces of information are critical to define a point. Then many points could be stored within a list, such as: `many_points = [(x1,y1),(x2,y2),(x3,y3),(x4,y4)]`. Another example that requires three pieces of information is color in the RGB space. The three bands of the RGB space could be stored in a tuple `(r,g,b)`. Some plotting libraries, like HoloViews, use tuples to pass pairs of `x` and `y` variables and plot options.

Tuples have an important property: they are `immutable`. Let's define a tuple and try to change its content to see what happens.


In [27]:
# Tuples (immutable)
point = (2.5,4.7) # (x,y) coordinates
type(point)


tuple

In [None]:
point[0] = 1.3 # This raises a TypeError. We can't change the value of a tuple (immutable).
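A minimal sketch of what happens, written so that the cell keeps running after the error: the assignment is wrapped in `try`/`except`, and the coordinate is then "changed" by building a new tuple and re-assigning the name.

```python
point = (2.5, 4.7)  # (x, y) coordinates

try:
    point[0] = 1.3
except TypeError as err:
    print('Cannot modify a tuple:', err)

# Build a new tuple instead of editing the existing one
point = (1.3, point[1])
print(point)
```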

Similar to Cartesian coordinates, tuples would be great for storing geographic information such as latitude, longitude, and altitude.

In [33]:
# Geographic coordinates 
mauna_loa = (19.536111, -155.576111, 3397) # Mauna Loa Observatory in Hawaii
konza_prairie = (39.106704, -96.608968, 320) # Konza Prairie in Kansas

coords = [mauna_loa,konza_prairie]
coords

[(19.536111, -155.576111, 3397), (39.106704, -96.608968, 320)]
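Because each tuple always stores its values in the same order, the values can be unpacked into separate variables in a single statement. The sketch below repeats the two tuples so it runs on its own; the variable names on the left-hand side are just one possible choice.

```python
mauna_loa = (19.536111, -155.576111, 3397)
konza_prairie = (39.106704, -96.608968, 320)
coords = [mauna_loa, konza_prairie]

# Unpack in the same order in which the values were stored
lat, lon, alt = mauna_loa
print(lat, lon, alt)

# Collect the altitude of every site in the list
altitudes = []
for site in coords:
    site_lat, site_lon, site_alt = site
    altitudes.append(site_alt)

print(altitudes)
```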

In [29]:
# A list of tuples
colors = [(0,0,0), (255,255,255), (0,255,0)] # Each tuple refers to RGB colors: black, white, and green.
colors

[(0, 0, 0), (255, 255, 255), (0, 255, 0)]

Storing data as tuples within a list makes it much easier to access all the values of an item at once. For instance, to retrieve the color of the third element:

In [30]:
colors[2]

(0, 255, 0)

## Dictionaries

Dictionaries are extremely versatile and one of the most popular Python data structures. One of their most powerful features is the ability to store and retrieve information using names. When we use lists we need to know the content in each position of the list to retrieve the right piece of information. If we don't, then we need to do some sort of matching process. With `dictionaries` this type of operation is easier and much more intuitive. Often our datasets contain names of cities, laboratories, treatments, etc. that we can easily remember and use to access data.

```{note}
Dictionaries are defined using `{}` and data inside dictionaries is associated with names using `name:value` pairs.
```
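Before moving to a larger example, here is a minimal sketch of the `name:value` syntax. The dictionary of sample counts per state is hypothetical and the values are illustrative only.

```python
# Hypothetical sample counts per state (illustrative values only)
samples = {'Kansas': 12, 'Texas': 8}

samples['Iowa'] = 5                 # Add a new name:value pair
kansas_count = samples['Kansas']    # Retrieve a value by name
has_alaska = 'Alaska' in samples    # Check whether a name exists

print(kansas_count)
print(has_alaska)
print(len(samples))
```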

For example, the dictionary below stores information about different soil hydraulic properties and fitting parameters for pedotransfer functions for different soil textural classes obtained from the ROSETTA model (Schaap et al., 2001).

Each list contains values for:

- Residual water content ($cm^3/cm^3$)
- Saturated water content ($cm^3/cm^3$)
- Fitting parameter alpha (1/cm)
- Fitting parameter n (unitless)
- Saturated hydraulic conductivity (cm/hr)


Schaap, M.G., Leij, F.J. and Van Genuchten, M.T., 2001. Rosetta: A computer program for estimating soil hydraulic parameters with hierarchical pedotransfer functions. Journal of Hydrology, 251(3-4), pp.163-176.

In [21]:
parameters = {'clay' : [0.098,0.459,0.015,1.25,0.615],
              'clay loam' : [0.079,0.442,0.0158,1.42,0.341],
              'loam' : [0.061,0.399,0.0111,1.47,0.502],
              'loamy sand' : [0.049,0.390, 0.0348, 1.75,4.383],
              'sand' : [0.053,0.375,0.0352,3.18,26.779],
              'sandy clay' : [0.117,0.385,0.0334,1.21,0.473],
              'sandy clay loam' : [0.063,0.384,0.0211,1.33,0.549],
              'sandy loam' : [0.039,0.387,0.0267,1.45,1.595],
              'silt' : [0.05,0.489,0.0066,1.68,1.823],
              'silty clay' : [0.111,0.481,0.0162,1.32,0.401],
              'silty clay loam' : [0.090,0.482,0.0084,1.52,0.463],
              'silty loam' : [0.065,0.439,0.0051,1.66,0.760]
            }

In [22]:
# Obtain all parameters for a single textural class
parameters['silty loam']

[0.065, 0.439, 0.0051, 1.66, 0.76]

In [23]:
# Obtain specific parameter for a single textural class
parameters['silty loam'][1] # Saturated water content

0.439

Sometimes we don't know all the keys in a dictionary, so the `keys()` method can be used to return them. The result is a `dict_keys` view that can be stored, converted to a regular list with `list()`, or iterated over to access a specific property for each key.

In [24]:
# Return all keys
parameters.keys() # alternatively K = parameters.keys(); print(K)

dict_keys(['clay', 'clay loam', 'loam', 'loamy sand', 'sand', 'sandy clay', 'sandy clay loam', 'sandy loam', 'silt', 'silty clay', 'silty clay loam', 'silty loam'])
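The sketch below shows one way to iterate over the keys to retrieve the same property for every textural class. To keep the cell short and self-contained it uses a three-item subset of the dictionary, stored under the hypothetical name `subset`.

```python
# Subset of the dictionary above, kept short so the cell runs on its own
subset = {'clay': [0.098, 0.459, 0.015, 1.25, 0.615],
          'loam': [0.061, 0.399, 0.0111, 1.47, 0.502],
          'sand': [0.053, 0.375, 0.0352, 3.18, 26.779]}

textures = list(subset.keys())   # Convert the view into a regular list
print(textures)

# Saturated water content is stored in position 1 of each list
theta_sat = []
for texture in textures:
    theta_sat.append(subset[texture][1])
    print(texture, subset[texture][1])
```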

The values in each list follow a fixed order, which means that we need to know that order in advance to access the right property. In this particular case, the dictionary is useful for accessing hydraulic properties by textural class, but it does not explicitly expose the individual properties. A more refined version of the dictionary could be:

In [25]:
parameters = {'clay' : {'theta_res':0.098, 'theta_sat':0.459, 'alpha':0.015, 'n':1.25, 'Ksat':0.615},
              'clay loam' : {'theta_res':0.079, 'theta_sat':0.442, 'alpha':0.0158, 'n':1.42, 'Ksat':0.341},
              'loam' : {'theta_res':0.061, 'theta_sat':0.399, 'alpha':0.0111, 'n':1.47, 'Ksat':0.502},
              'loamy sand' : {'theta_res':0.049, 'theta_sat':0.390, 'alpha':0.0348, 'n':1.75, 'Ksat':4.383},
              'sand' : {'theta_res':0.053, 'theta_sat':0.375, 'alpha':0.0352, 'n':3.18, 'Ksat':26.779},
              'sandy clay' : {'theta_res':0.117, 'theta_sat':0.385, 'alpha':0.0334, 'n':1.21, 'Ksat':0.473},
              'sandy clay loam' : {'theta_res':0.063, 'theta_sat':0.384, 'alpha':0.0211, 'n':1.33, 'Ksat':0.549},
              'sandy loam' : {'theta_res':0.039, 'theta_sat':0.387, 'alpha':0.0267, 'n':1.45, 'Ksat':1.595},
              'silt' : {'theta_res':0.05, 'theta_sat':0.489, 'alpha':0.0066, 'n':1.68, 'Ksat':1.823},
              'silty clay' : {'theta_res':0.111, 'theta_sat':0.481, 'alpha':0.0162, 'n':1.32, 'Ksat':0.401},
              'silty clay loam' : {'theta_res':0.090, 'theta_sat':0.482, 'alpha':0.0084, 'n':1.52, 'Ksat':0.463},
              'silty loam' : {'theta_res':0.065, 'theta_sat':0.439, 'alpha':0.0051, 'n':1.66, 'Ksat':0.760}
            }

In [26]:
parameters['silty loam']['theta_sat']

0.439

The new dictionary is more verbose, but it is harder to mistake one physical property for another. Depending on the application and your preferences as a programmer, you can opt for different ways of storing data.
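With named properties, pulling the same property out of every textural class becomes quite readable. The following sketch uses the `items()` method on a three-item subset of the nested dictionary (stored under the hypothetical name `subset`) and then finds the class with the highest saturated hydraulic conductivity.

```python
subset = {'clay': {'theta_res': 0.098, 'theta_sat': 0.459, 'alpha': 0.015, 'n': 1.25, 'Ksat': 0.615},
          'loam': {'theta_res': 0.061, 'theta_sat': 0.399, 'alpha': 0.0111, 'n': 1.47, 'Ksat': 0.502},
          'sand': {'theta_res': 0.053, 'theta_sat': 0.375, 'alpha': 0.0352, 'n': 3.18, 'Ksat': 26.779}}

# Collect one property for every textural class
ksat_by_texture = {}
for texture, properties in subset.items():
    ksat_by_texture[texture] = properties['Ksat']

print(ksat_by_texture)

# Textural class with the highest saturated hydraulic conductivity
most_permeable = max(ksat_by_texture, key=ksat_by_texture.get)
print(most_permeable)
```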

Much of the data on the internet is stored and transferred in a format called JSON (JavaScript Object Notation). JSON structures are very much like Python dictionaries. Below is an example JSON retrieved from the [OpenWeatherMap.org](https://openweathermap.org/) documentation page for the city of London.


In [22]:
# Here is a more concrete example using actual weather data obtained from OpenWeatherMap:

data = {"coord":{"lon":-0.13,"lat":51.51},
        "weather":[{"description":"light intensity drizzle"}],
        "base":"stations",
        "main":{"temp":280.32,"pressure":1012,"humidity":81},
        "visibility":10000,
        "wind":{"speed":4.1,"deg":80},
        "clouds":{"all":90},
        "dt":1485789600,
        "sys":{"country":"GB","sunrise":1485762037},
        "id":2643743,
        "name":"London",
        "cod":200}

print(data['coord']['lon'])
print(data['wind']['speed']) # Using apostrophes


-0.13
4.1


In [43]:
# Get all the keys in a list
print([*data])

['coord', 'weather', 'base', 'main', 'visibility', 'wind', 'clouds', 'dt', 'sys', 'id', 'name', 'cod']
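In practice JSON usually arrives as text, and Python's built-in `json` module converts between that text and dictionaries. The sketch below assumes a shortened version of the record above written as a string; the string itself is made up for illustration and is not an actual API response.

```python
import json

# JSON text (shortened, hand-written version of the record above)
json_text = '{"name": "London", "main": {"temp": 280.32, "humidity": 81}}'

record = json.loads(json_text)     # JSON text -> Python dictionary
print(type(record))
print(record['main']['temp'])

round_trip = json.dumps(record)    # Python dictionary -> JSON text
print(type(round_trip))
```

Once the text is converted, the result behaves like any other dictionary, including nested access by name.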


## Sets

In [44]:
# Sets (set of unique values, test membership)

states = ['Kansas', 'Texas', 'California', 'Texas', 'Alaska', 'Kansas']
uniqueStates = set(states) # Duplicates are removed
print(uniqueStates)

print('Kansas' in uniqueStates) # Testing membership (True)
print('Iowa' in uniqueStates)   # Testing membership (False)

{'California', 'Alaska', 'Kansas', 'Texas'}
True
False


In [54]:
dna1 = set('ATTTGAATTA') # DNA sequence 1
dna2 = set('GGATTCGCGT') # DNA sequence 2

# Print unique bases in each DNA sequence
print(dna1)
print(dna2)

{'A', 'G', 'T'}
{'A', 'G', 'C', 'T'}
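Sets also support operations such as intersection, union, and difference, which makes it easy to compare two collections of unique values. A short sketch using the same two sequences follows; `sorted()` is used when printing because sets have no guaranteed display order.

```python
dna1 = set('ATTTGAATTA')  # DNA sequence 1
dna2 = set('GGATTCGCGT')  # DNA sequence 2

shared = dna1 & dna2         # Intersection: bases present in both
combined = dna1 | dna2       # Union: bases present in either
only_in_dna2 = dna2 - dna1   # Difference: bases only in sequence 2

print(sorted(shared))
print(sorted(combined))
print(sorted(only_in_dna2))
```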


## Practice

- Create a list with the scientific names of four common grasses in the US Great Plains: big bluestem, switchgrass, indian grass, and little bluestem.

- Using a periodic table, store in a dictionary the name, symbol, atomic mass, melting point, and boiling point of oxygen, nitrogen, phosphorus, potassium, and hydrogen. Then, write two separate Python statements to retrieve the boiling point of oxygen and hydrogen. Combined, these two elements can form water, which has a boiling point of 100 degrees Celsius. How does this value compare to the boiling point of the individual elements?

- Without editing the dictionary that you created earlier, append the properties for a new element: carbon.

- Create a list of tuples encoding the latitude, longitude, and altitude of five national parks of your choice.

- Create a short list of tuples for the two DNA base pairs.