# Python Data Structures

A data structure is something that stores combinations of fundamental data types. For example, it could be a list of numbers or a dictionary of key-value pairs. Data structures are useful for storing and manipulating complex sets of data.

For example, if you wanted to store information about the location of an RTD train at a certain time, you might want to store:

- Train identifier
- Time and date
- Longitude
- Latitude

The RTD train data above mixes different data types: text for the train identifier, a date and time, and decimal numbers for the coordinates. We'll return to this example later.

## Strings as Data Structures

We've already seen one Python data structure: the **string**. A string can be thought of as an *array* of individual characters. The word array here means an ordered sequence of items or elements. Strings are arrays because they are made up of characters in a particular order.

We'll start by playing with strings, before moving on to some other Python data structures: tuples, lists, sets and dictionaries.

In [None]:
# Create a string and store it in the variable 'example'
example = "Data is fun!"
# Print the length of the string
print(len(example))
# Print the first character of the string. Remember indexing starts at 0.
print(example[0])
# Print the 12th (so last) character
print(example[11])
# Print the 3rd and 4th characters (indices 2 and 3; the end index 4 is excluded)
print(example[2:4])

## Slicing

Slicing refers to getting part of a data structure. In the last line of code above, we sliced the string to get the third and fourth characters. The general syntax is:

```
my_str[start:end]
```
Here *start* is the index of the first element you want and *end* is one more than the index of the last element you want.
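Either index can be left out. Leaving out *start* means "from the beginning", and leaving out *end* means "up to and including the last character". Here is a short sketch reusing the string from the cell above. The comments show what each line should print.

```python
example = "Data is fun!"

# Indices 0, 1, 2 and 3 (the end index 4 is excluded)
print(example[0:4])        # Data
# Leaving out start is the same as starting at index 0
print(example[:4])         # Data
# Leaving out end runs the slice to the end of the string
print(example[8:])         # fun!
# A slice from start to end contains end - start characters
print(len(example[2:7]))   # 5
```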

## Exercise

Create a string called s1 that is equal to "All the world's a stage."
Use indexing to print each of the following on a separate line.

1. The third character from the left
2. The fifth character from the right
3. The second to ninth characters from the left



In [None]:
# Type your Python code here

In [None]:
# @title Solution
s1 = "All the world's a stage."
print(s1[2])
print(s1[19])
print(s1[1:9])

## Negative Indexing
One of the great things about Python is the variety of ways you can reference different parts of data structures. An example is negative indexing. Negative indexing allows you to reference parts of a string starting from the right end rather than the left end.

Suppose you have a string
```
st = "Hello"
```

- You can reference the last element with st[4] or with st[-1]
- You can reference the second to last element with st[3] or with st[-2]
- You can reference the first element with st[0] or st[-5]

So instead of starting at the left with index 0, you can start at the right with index -1. As we will see, this applies to data structures other than strings.

In [None]:
st = 'Hello'
# Each line below should print the same character twice
print(st[4], st[-1])
print(st[3], st[-2])
print(st[0], st[-5])
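Negative indices also work inside slices. This is handy when you want the end of a string without knowing its length. Here is a small sketch using the same string:

```python
st = 'Hello'

# The last three characters: start at -3 and leave out end
print(st[-3:])    # llo
# Everything except the last character
print(st[:-1])    # Hell
# The same last three characters using positive indices
print(st[2:5])    # llo
```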


## Tuples

Now that we've taken a look at strings, we can look at some other data structures. Many of the principles we have already seen will apply.

A **tuple** is a fixed, ordered sequence of elements. Tuples are immutable (and so are strings). This means that once you have created a tuple, you can't change, add or remove its elements. Tuples are written with round brackets (parentheses), and elements are separated by commas.

Format:
```
(element 1, element 2, element 3, ...)
```

Tuples can contain any data type, including mixtures of data types and other tuples. Run the code below to see some examples.


In [None]:
# A tuple giving approximate latitude and longitude of Denver.
den = (39.7, -105)
# A tuple giving approximate lat/long of London, UK.
lon = (51.5, -0.14)
# Print the first element of the tuple den using indexing
print(den[0])
# Print the second element of the tuple lon using indexing
print(lon[1])
# If you add tuples, you get a new tuple with all the elements
print(den+lon)
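Here is a sketch of a tuple that mixes data types and contains another tuple. The variable names and values are made up for illustration. It also shows a common pitfall. Parentheses are also used for grouping in Python, so a tuple with a single element needs a trailing comma.

```python
# A made-up tuple mixing a string, two numbers and a nested tuple (year, month, day)
place = ("Denver", 39.7, -105, (2025, 12, 10))
print(len(place))       # 4
print(place[3])         # (2025, 12, 10)
print(place[3][0])      # 2025

# One-element tuple: the trailing comma is what makes it a tuple
one = (5,)
not_a_tuple = (5)
print(type(one))          # <class 'tuple'>
print(type(not_a_tuple))  # <class 'int'>
```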

## Exercise
1. Create a tuple *t* that contains the following elements in order:
    - The number 3.14
    - The boolean value True
    - The string "Wow!"
    - The tuple (1, 2)
2. Use the len function to find the length of the tuple.
3. Use indexing to print the third element of the tuple.
4. Challenge: Use indexing to print the second element of the last element of the tuple t.

In [None]:
# Type your code here

In [None]:
# @title Solution
t = (3.14, True, "Wow!", (1, 2))
print(len(t))
print(t[2])
print(t[3][1])

## Lists

A **list** is similar to a tuple: it is an ordered sequence of elements. You use square brackets \[ and \] to create a list. Like tuples, lists can contain elements of any data type.

Format:
```
my_list = [element 1, element 2, element 3, ...]
```

Unlike tuples, lists are mutable, so you can change, add or remove elements. This makes them very flexible and powerful.

In [None]:
# The following list contains the test scores of 6 students
my_list = [6, 4, 7, 8, 10, 4]
# Print the second test score
print(my_list[1])
# Print the fourth, fifth and sixth scores in the list
print(my_list[3:6])
# Change the third test score to 5
my_list[2] = 5
print(my_list)
# Add a new test score of 9 to the end of the list
my_list.append(9)
print(my_list)
# Sort the list from smallest to largest
my_list.sort()
print(my_list)

In [None]:
# Create a new list called 'new_list' that is initially empty
new_list = []
print(new_list)
# Add the element 'howdy' to the list
new_list.append('howdy')
print(new_list)
# Add the contents of the list my_list above to the list new_list
new_list.extend(my_list)
print(new_list)
# Delete the first three elements of the list
del new_list[0:3]
print(new_list)
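The cell above uses the *extend* method rather than *append*. The difference matters for step 5 of the exercise below. *append* always adds its argument as one new element, even when that argument is itself a list. *extend* adds each item of its argument separately. Here is a minimal sketch with made-up lists *a* and *b*:

```python
# append adds its argument as ONE new element, even if it is a list
a = [1, 2]
a.append([3, 4])
print(a)        # [1, 2, [3, 4]]
print(len(a))   # 3

# extend adds each item of its argument as a separate element
b = [1, 2]
b.extend([3, 4])
print(b)        # [1, 2, 3, 4]
print(len(b))   # 4
```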

## Exercise

1. Create a list called LS containing the elements 'p', 3.14, and False.
2. Add the number 47 as an element to the end of the list.
3. Print the second and third elements in the list.
4. Change the third element in the list to True.
5. Add the list \[0, -6, 5, 0\] as another element in the list.
6. Use indexing to print the second element of the last element of the list.

In [None]:
# Type your code here

In [None]:
# @title Solution
LS = ['p', 3.14, False]
LS.append(47)
print(LS[1:3])
LS[2] = True
print(LS)
LS.append([0, -6, 5, 0])
print(LS[-1][1])

## Sets

A **set** is similar to a tuple or list. However:

- Order doesn't matter. A set has no positions, so you can't use indexing such as my_set[0] on it.
- Each element appears at most once. Any duplicates are discarded.

We use curly brackets { and } for sets.

Format:
```
my_set = {element 1, element 2, element 3, ...}
```
If you've come across the mathematical notion of a set, this is the same idea. Sets are about membership: an element is either in a set or not. Look at the two sets below. How many elements does each have?
```
S = {1, 2, 3}
T = {3, 4, 5, 4, 5}
```
The set S above has three elements: 1, 2 and 3.
The set T also has three elements: 3, 4 and 5. The repeated 4 and 5 are only counted once.

In [36]:
# Run this code to see that both sets have exactly 3 elements.
S = {1, 2, 3}
T = {3, 4, 5, 4, 5}
print("The set S", S)
print("The set T", T)

The set S {1, 2, 3}
The set T {3, 4, 5}
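Since sets are about membership, the natural question to ask a set is whether something is in it. The sketch below uses Python's *in* operator to ask that question. It also shows how converting a list to a set removes duplicates, using the same made-up test scores from the lists section. Finally, it notes how to create an empty set, because empty curly brackets create an empty dictionary, a structure introduced below.

```python
S = {1, 2, 3}

# 'in' checks whether an element belongs to the set
print(2 in S)    # True
print(7 in S)    # False

# Converting a list to a set keeps one copy of each value
scores = [6, 4, 7, 8, 10, 4]
unique_scores = set(scores)
print(len(scores), len(unique_scores))   # 6 5

# Empty curly brackets create an empty dictionary, so use set() for an empty set
empty = set()
print(len(empty))    # 0
print(type({}))      # <class 'dict'>
```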


Like lists, you can add elements to a set or remove elements from a set.

Given two sets, you can find the elements they have in common (intersection) or the elements in one set but not the other (difference).

In [37]:
# Add the element 4 to the set S
S.add(4)
print(S)

{1, 2, 3, 4}


In [38]:
# Remove the element 2 from the set
S.remove(2)
print(S)

{1, 3, 4}


In [39]:
# Print the set of elements that are in both S and T
print(S.intersection(T))
# Print the set of elements that are in S but not in T
print(S.difference(T))


{3, 4}
{1}
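Python also provides operators for these methods: & gives the intersection and - gives the difference. Here is a short sketch that uses the values S and T have at this point in the notebook:

```python
S = {1, 3, 4}
T = {3, 4, 5}

# & is the intersection operator, - is the difference operator
print(S & T == S.intersection(T))   # True
print(S - T == S.difference(T))     # True

# Swapping the sets does not change an intersection, but it does change a difference
print(S & T == T & S)   # True
print(S - T)            # {1}
print(T - S)            # {5}
```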


## Exercise

1. Create a set A containing the elements 0 and 1.
2. Add the elements 2 and 3 to the set A and print the whole set.
3. Create a set B containing the elements 3, 4, 5 and 6.
4. Remove the element 6 from the set B and print the whole set.
5. Print the set of elements that are in both A and B.
6. Print the set of elements that are in B but not in A.

In [None]:
# Type your code here

In [40]:
# @title Solution
A = {0, 1}
A.add(2)
A.add(3)
print(A)
B = {3, 4, 5, 6}
B.remove(6)
print(B)
print(B.intersection(A)) # or print(A.intersection(B))
print(B.difference(A))

{0, 1, 2, 3}
{3, 4, 5}
{3}
{4, 5}


## Dictionaries

A **dictionary** is a collection of key-value pairs, where each key is unique and maps to a value. Dictionaries are widely used in internet applications that share data through APIs (application programming interfaces). For this reason, they are very important in data science.

Let's go back to the train example from the start. If you wanted to store information about the location of an RTD train at a certain time, you might need to store:

- Train identifier
- Time and date
- Longitude
- Latitude

Here is one train (which we will call train_1):

- Identifier: A0213
- Time and date: 13:53:23 2025-12-10
- Latitude: 39.7
- Longitude: -105

We could store all this in a tuple or list, as long as the order remains consistent. Alternatively, we can use a dictionary with key-value pairs. Each key is like an index that you can use to look up a particular item.

train_1 = {'ID': 'A0213', 'DateTime': '13:53:23 2025-12-10', 'Lat': 39.7, 'Long': -105}

In [None]:
# Create a dictionary containing information about train 1
train_1 = {'ID': 'A0213', 'DateTime':'13:53:23 2025-12-10', 'Lat':39.7, 'Long':-105}

In [None]:
# Get the ID of train 1
train_1['ID']

In [None]:
# Get the latitude of train 1
train_1['Lat']
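To see the trade-off between tuples and dictionaries, here is a sketch that stores the same train 1 data in both forms. It works on a separate copy called record, so the train_1 used in the next cell is not changed. The new latitude and the 'Line' key are made up for illustration.

```python
# The train 1 data as a tuple: we have to remember that latitude is at index 2
train_1_tuple = ('A0213', '13:53:23 2025-12-10', 39.7, -105)
# The same data as a dictionary (a separate copy, so train_1 is untouched)
record = {'ID': 'A0213', 'DateTime': '13:53:23 2025-12-10', 'Lat': 39.7, 'Long': -105}

print(train_1_tuple[2])   # 39.7
print(record['Lat'])      # 39.7

# Assigning to an existing key updates its value (made-up new position)
record['Lat'] = 39.8
print(record['Lat'])      # 39.8

# Assigning to a new key adds a key-value pair (hypothetical 'Line' key)
record['Line'] = 'W'
print(len(record))        # 5

# 'in' checks whether a key is present
print('Long' in record)   # True
```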

In [None]:
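# Create a dictionary for train 2
train_2 = {'ID': 'A0314', 'DateTime': '13:53:23 2025-12-10', 'Lat': 40.1, 'Long': -103.2}
# A larger latitude means further North
if train_2['Lat'] > train_1['Lat']:
    print("Train 2 is North of train 1")
# A larger (less negative) longitude means further East.
# Here -105 < -103.2, so this condition is False and nothing is printed.
if train_1['Long'] > train_2['Long']:
    print("Train 1 is East of train 2")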

## Exercise
Create a dictionary for train 3 with the following properties:

- ID = 'A0415'
- DateTime = '13:53:23 2025-12-10'
- Lat = 40.2
- Long = -104.2

Write an if-statement to check if train 3 is North of train 1, and a separate if-statement to check if train 3 is North of train 2.

Challenge: Can you write some code that tells you which of the 3 trains is furthest North?
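Here is one possible solution, written as a sketch. It is not the only correct answer. It repeats the three dictionaries so the cell runs on its own. For the challenge, it walks through the trains with a for loop and keeps the northernmost train seen so far. If two trains had exactly the same latitude, the one listed first would be kept. A chain of if-statements comparing the three latitudes would work just as well.

```python
train_1 = {'ID': 'A0213', 'DateTime': '13:53:23 2025-12-10', 'Lat': 39.7, 'Long': -105}
train_2 = {'ID': 'A0314', 'DateTime': '13:53:23 2025-12-10', 'Lat': 40.1, 'Long': -103.2}
train_3 = {'ID': 'A0415', 'DateTime': '13:53:23 2025-12-10', 'Lat': 40.2, 'Long': -104.2}

# A larger latitude means further North
if train_3['Lat'] > train_1['Lat']:
    print("Train 3 is North of train 1")
if train_3['Lat'] > train_2['Lat']:
    print("Train 3 is North of train 2")

# Challenge: keep track of the northernmost train seen so far
northernmost = train_1
for train in [train_2, train_3]:
    if train['Lat'] > northernmost['Lat']:
        northernmost = train
print(northernmost['ID'], "is furthest North")   # A0415 is furthest North
```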

## Extension

We noted that tuples and strings are immutable. This means you can't change them once they have been created. For example, the following code raises an error:
```
my_tuple = (4, -3, 2)
my_tuple[1] = 7
```
On the other hand, lists, sets and dictionaries are mutable.
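The sketch below shows the difference in practice. It catches the TypeError that Python raises when you try to assign to a position in a tuple or a string. It then makes the same change to a list without any problem. The variable names are only for illustration.

```python
my_tuple = (4, -3, 2)
try:
    my_tuple[1] = 7                # not allowed: tuples are immutable
except TypeError as err:
    print("TypeError:", err)       # TypeError: 'tuple' object does not support item assignment

word = 'Karma'
try:
    word[1] = 'o'                  # not allowed: strings are immutable too
except TypeError as err:
    print("TypeError:", err)

# A list holding the same values can be changed in place
my_list = [4, -3, 2]
my_list[1] = 7
print(my_list)                     # [4, 7, 2]
```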

1. Do some research to learn about the pros and cons of data structures being mutable or immutable.
2. Suppose the variable mystr stores the string 'Karma'. How could you change the second character to an 'o' and store the new string in the original variable? Your solution should work in general with any string: given a string, a new character and an index value, replace the character at the index position with the new character and store the result in the same variable.

In [None]:
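# @title Solution to 2
mystr = 'Karma'
new_char = 'o'  # the new character
i = 1           # index of the character to replace
mystr = mystr[:i] + new_char + mystr[i + 1:]
print(mystr)    # Korma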