# Data Structures

We have seen that individual data elements can have different types (`int`, `float`, `str`, etc.). Collections of data elements can be stored in different data structures.

Let's have a look at some of the most commonly used data structures, and how we can manipulate them.

## Lists

Lists are instantiated with `[]`, and elements in a list are separated with commas.

Let's create a list of the planets and check the length of the list:

In [1]:
planets = ["earth", "jupiter", "neptune", "pluto", "venus", "mars", "mercury", "uranus", "saturn"]

print(planets)

['earth', 'jupiter', 'neptune', 'pluto', 'venus', 'mars', 'mercury', 'uranus', 'saturn']


In [2]:
print(len(planets))

9


Lists are flexible: items can be moved around, removed, or inserted, and more. List objects have many built-in methods for performing such operations. The full set of these methods is described here: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Let's remove an item from this list.

In [3]:
planets.remove("pluto")

print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn']


In [4]:
print(len(planets))

8


Let's add an item as well:

In [5]:
planets.append("planet_9")

print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
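Beyond `remove` and `append`, a few of the other list methods from the documentation linked above can be sketched as follows. This is a minimal illustration on a separate, hypothetical list called `inner_planets`, so the `planets` list used in the rest of this section is left unchanged.

```python
# Hypothetical example list, separate from `planets`
inner_planets = ["venus", "earth", "mars"]

# insert(index, item) places an item at the given position
inner_planets.insert(0, "mercury")
print(inner_planets)   # ['mercury', 'venus', 'earth', 'mars']

# index(item) returns the position of the first matching item
print(inner_planets.index("earth"))   # 2

# pop() removes the last item and returns it
last = inner_planets.pop()
print(last)   # mars

# sort() reorders the list in place (alphabetically for strings)
inner_planets.sort()
print(inner_planets)   # ['earth', 'mercury', 'venus']
```

Note that `insert`, `pop` and `sort` all modify the list itself rather than returning a new list.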


### Indexing

Indices point to the position of individual elements, or chunks, of a data structure. They work with lists, and also with strings.

Each item in a list can be indexed by its position in the list, with the first item at index `0`.

In [6]:
print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']


In [7]:
print(planets[0])

earth


In [8]:
print(planets[3])

venus


In [9]:
print(planets[7])

saturn


If we try to index with a number that is greater than the maximum possible index (in this case 8, because there are 9 items in the list), then we will receive an `IndexError`.

```python
print(planets[9])  # raises IndexError: list index out of range
```
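As a hedged sketch of how the largest valid index relates to the length of a list, the block below rebuilds the same `planets` list and computes the last index with `len`. The `try`/`except` construct is used here only to show the error message without stopping the program; the variable names `last_index` and `err` are illustrative.

```python
planets = ["earth", "jupiter", "neptune", "venus", "mars",
           "mercury", "uranus", "saturn", "planet_9"]

# The largest valid index is always len(planets) - 1
last_index = len(planets) - 1
print(last_index, planets[last_index])

# Indexing one position past the end raises an IndexError
try:
    print(planets[last_index + 1])
except IndexError as err:
    print("IndexError:", err)
```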

We can also use a `-` to count backwards from the end of the list, beginning with `-1`:

In [10]:
print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']


In [11]:
print(planets[-1])

planet_9


In [12]:
print(planets[-4])

mercury


We can also use indices to slice a list using a `:`, with the starting index (inclusive) on the left of the colon and the ending index (exclusive) on the right:

In [13]:
print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']


In [14]:
print(planets[2:5])

['neptune', 'venus', 'mars']


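Leaving out one of the two indices when using `:` extends the slice to the end of the list on that side. Omitting the ending index returns every item from the starting index onwards, and omitting the starting index returns every item before the ending index: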

In [15]:
print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']


In [16]:
print(planets[2:])

['neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']


In [17]:
print(planets[:2])

['earth', 'jupiter']
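The same indexing and slicing syntax also works on strings, as mentioned at the start of this section. A short sketch, using a hypothetical variable `name`; the optional third slice value (the step) is not covered above and is included only as an aside.

```python
name = "jupiter"

print(name[0])     # first character: j
print(name[-1])    # last character: r
print(name[:3])    # first three characters: jup
print(name[::2])   # every second character: jptr
```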


## Dictionaries

Dictionaries are instantiated with `{}`, and contain pairs of keys and values. A key is a label for its corresponding value, and keys are used like indices:

In [18]:
dictionary = {"planet": "earth", "moons": 1}

print(dictionary)

{'planet': 'earth', 'moons': 1}


In [19]:
print(dictionary["moons"])

1


We can add a new entry to a dictionary by assigning a value to a new key:

In [20]:
dictionary["avg orbital radius (AU)"] = 1 

print(dictionary)

{'planet': 'earth', 'moons': 1, 'avg orbital radius (AU)': 1}
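A few other common dictionary operations can be sketched as follows. This is a minimal illustration rather than a full overview; the key `"rings"` and the default value `"unknown"` are made up for the example.

```python
dictionary = {"planet": "earth", "moons": 1}

# `in` checks whether a key exists
print("moons" in dictionary)   # True

# get() returns a default value instead of raising a KeyError for a missing key
print(dictionary.get("rings", "unknown"))   # unknown

# items() yields each key together with its value
for key, value in dictionary.items():
    print(key, "->", value)
```

Accessing a missing key with square brackets, such as `dictionary["rings"]`, raises a `KeyError`, which is why `get` can be convenient when a key may be absent.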


## NumPy Arrays

NumPy is a Python library. A library (sometimes loosely called a module or package) is a collection of pre-written code, usually open-source and published online, that a user can install and import into their own project to make use of its functions. Most libraries do not come with Python by default, because they are often specific to a particular kind of task.

NumPy is one of the most commonly used libraries, and it is used for projects requiring mathematical operations on data.

Let's import NumPy:

In [21]:
import numpy as np

Here we have imported NumPy so that our code can use its functions, and we have imported it as `np`, which means whenever we want to reference a NumPy function, we use `np` as an abbreviation.

NumPy's utility comes in its arrays. They are effectively tables of data, but we aren't limited to just one or two dimensions.

In [22]:
data = [1,2,3,4,5]

array = np.array(data)

print(array)

[1 2 3 4 5]


NumPy can perform most mathematical operations you can think of, and it applies these operations to all elements in the array, unless specified otherwise.

In [23]:
array = array ** 2

print(array)

[ 1  4  9 16 25]


In [24]:
array = np.sin(array)

print(array)

[ 0.84147098 -0.7568025   0.41211849 -0.28790332 -0.13235175]
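To make the element-wise behaviour concrete, the sketch below contrasts a plain list with a NumPy array built from the same values. The small values are chosen only for illustration.

```python
import numpy as np

data = [1, 2, 3]
array = np.array(data)

# For a list, * repeats the contents
print(data * 2)        # [1, 2, 3, 1, 2, 3]

# For an array, * multiplies every element
print(array * 2)       # [2 4 6]

# Operations between two arrays of the same shape pair up elements by position
print(array + array)   # [2 4 6]
```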


The line below creates a 3-dimensional array of random numbers with shape 2x3x4, meaning the first dimension has 2 entries, the second has 3, and the third has 4.

NumPy attempts to represent this when printing the array. Each pair of `[]` surrounds one entry along a dimension, working from the first dimension (outermost brackets) to the last (innermost brackets).

In [25]:
array_3d = np.random.rand(2,3,4)

print(array_3d) 

[[[0.79067982 0.11147306 0.2459798  0.88161649]
  [0.54848688 0.23778649 0.44357064 0.28387273]
  [0.8676022  0.77007072 0.2416081  0.18653272]]

 [[0.37584916 0.53139559 0.06735646 0.17601957]
  [0.43967221 0.58625294 0.09281619 0.01059077]
  [0.25304321 0.10094245 0.77054619 0.53373265]]]


In [26]:
print(array_3d.shape)

(2, 3, 4)


Indexing an array is similar to indexing a list, except that we can index along every dimension at once, using comma-separated indices that correspond to each of the dimensions.

A single index selects the first item along the first dimension, which is a 3x4 array:

In [27]:
print(array_3d[0])

[[0.79067982 0.11147306 0.2459798  0.88161649]
 [0.54848688 0.23778649 0.44357064 0.28387273]
 [0.8676022  0.77007072 0.2416081  0.18653272]]


Two indices select the first item along both the first and second dimensions, which is a row of 4 values:

In [31]:
print(array_3d[0, 0])

[0.79067982 0.11147306 0.2459798  0.88161649]


Three indices select the first item along all three dimensions, which is a single number:

In [29]:
print(array_3d[0, 0, 0])

0.7906798219001611
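Slices can be combined with the comma-separated indices as well. The sketch below uses a hypothetical array called `example`, built with `np.arange` and `reshape` so that its values are predictable, as a stand-in for the random `array_3d`.

```python
import numpy as np

# Deterministic stand-in for array_3d: the numbers 0 to 23 arranged as 2x3x4
example = np.arange(24).reshape(2, 3, 4)

# A bare `:` keeps every entry along that dimension
print(example[:, 0, :].shape)   # (2, 4)

# Slices can be mixed with single indices:
# second block, rows from index 1 onwards, first two columns
print(example[1, 1:, :2])
```

Each dimension indexed with a single number is dropped from the result, while each dimension indexed with a slice is kept.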


# Exercises

### Lists & indexing 📋

1. Look at the list of `planets` in the cell below. Use a `for`-loop and `if` statement to create a list of planets with names longer than 5 characters. Print your resulting list.

    _(Hint: start with an empty list `[]` and add to it)_

In [30]:
planets = ["earth", "jupiter", "neptune", "venus", "mars", "mercury", "uranus", "saturn"]

2. Returning to the original list `planets`, print the second, third and fourth characters of each planet name.

### Dictionaries 📖

1. Create a dictionary called `mars_info` with keys:
    - "planet" set to "mars"
    - "moons" set to 2
    - "has_rings" set to False

2. Add a new key "diameter_km" with value 6792 to your `mars_info` dictionary.

3. Print the sentence `Mars has <> moons and a diameter of <> km`, where each `<>` is replaced with the corresponding value from the dictionary.

### NumPy Arrays 🧮

You will need to use Google to help with how to use some of these functions! Or, directly use the NumPy documentation: https://numpy.org/doc/2.3/

1a. Create a NumPy array with the numbers 0 to 9 using `np.linspace()`.

1b. Multiply all elements by 3 and print the result.

1c. Create an array of random numbers between 0 and 1 of size 2x3 using `np.random.rand()`.

2a. Create two NumPy arrays of size 3 representing 3D vectors using `np.random.randint()`.

2b. Compute their:
    - Dot product (`np.dot()`)
    - Cross product (`np.cross()`)
    - Angle between them in degrees (use `np.arccos()` and `np.linalg.norm()`)