# Data Structures

We have seen that individual data elements can have different types (`int`, `float`, `str`, etc.). Collections of elements can in turn be stored in different data structures.

Let's have a look at some of the most commonly used data structures, and how we can manipulate them.

## Lists

Lists are instantiated with `[]`, and elements in a list are separated with commas.

Let's create a list of the planets and check the length of the list:

In [36]:
planets = ["earth", "jupiter", "neptune", "pluto", "venus", "mars", "mercury", "uranus", "saturn"]

print(planets)
print(len(planets))

['earth', 'jupiter', 'neptune', 'pluto', 'venus', 'mars', 'mercury', 'uranus', 'saturn']
9


Lists are flexible: items can be moved, removed, inserted, and more. List objects provide many methods for these operations, and the full set is documented here: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists

Let's remove an item from this list.

In [37]:
planets.remove("pluto")

print(planets)
print(len(planets))

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn']
8


Let's add an item as well:

In [38]:
planets.append("planet_9")

print(planets)

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
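
Besides `remove` and `append`, a few other list methods are worth knowing. The sketch below uses a separate list, so that `planets` stays unchanged for the cells that follow; the name `inner_planets` is chosen purely for illustration.

```python
# A separate, illustrative list; `planets` is left untouched.
inner_planets = ["venus", "earth", "mars"]

# insert(index, item) places an item before the given position.
inner_planets.insert(0, "mercury")

# pop() removes the last item and returns it.
last = inner_planets.pop()

# index(item) returns the position of the first matching item.
position = inner_planets.index("earth")

# sort() reorders the list in place (alphabetically for strings).
inner_planets.sort()

print(inner_planets)
print(last)
print(position)
```

Note that `insert`, `pop`, and `sort` all change the list itself rather than returning a new list.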


### Indexing

Indices point to the position of an element, or a range of elements, in a data structure. They work on lists and also on strings.

Each item in a list can be indexed by its position, with the first item at index `0`.

In [39]:
print(planets)

print(planets[0])

print(planets[3])

print(planets[7])

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
earth
venus
saturn


If we try to index with a number greater than the largest valid index (in this case 8, because there are 9 items in the list), we will receive an `IndexError`.
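
As a minimal sketch of what happens with an out-of-range index, the example below rebuilds the same list and catches the resulting error with `try`/`except`. The variable name `max_index` is introduced here only for illustration.

```python
planets = ["earth", "jupiter", "neptune", "venus", "mars",
           "mercury", "uranus", "saturn", "planet_9"]

# The largest valid index is always len(list) - 1.
max_index = len(planets) - 1
print(max_index)

try:
    # One position past the end of the list.
    print(planets[max_index + 1])
except IndexError as error:
    # Python raises IndexError for an out-of-range position.
    print("IndexError:", error)
```

Catching the error is optional; without the `try`/`except`, the cell would simply stop with an `IndexError` traceback.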

We can also use negative indices to count backwards from the end of the list, beginning with `-1`:

In [40]:
print(planets)

print(planets[-1])

print(planets[-4])

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
planet_9
mercury


We can also use indices to slice a list with `:`, placing the starting index (inclusive) to the left of the colon and the ending index (exclusive) to the right:

In [41]:
print(planets)

print(planets[2:5])

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
['neptune', 'venus', 'mars']


Omitting the ending index returns every item from the starting index to the end of the list, and omitting the starting index returns every item before the ending index:

In [42]:
print(planets)

print(planets[2:])

print(planets[:2])

['earth', 'jupiter', 'neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
['neptune', 'venus', 'mars', 'mercury', 'uranus', 'saturn', 'planet_9']
['earth', 'jupiter']
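
Two further points can be sketched briefly: slices accept an optional step, and the same indexing syntax works on strings. The variable names below are illustrative only.

```python
planets = ["earth", "jupiter", "neptune", "venus", "mars",
           "mercury", "uranus", "saturn", "planet_9"]

# A third number, after a second colon, sets the step size.
every_other = planets[::2]

# A negative step walks through the list backwards.
reversed_planets = planets[::-1]

# Strings can be indexed and sliced just like lists.
name = planets[0]
first_letter = name[0]
last_three = name[-3:]

print(every_other)
print(reversed_planets)
print(first_letter, last_three)
```

Slicing returns a new list (or string), so `planets` itself is not modified by any of these lines.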


## Dictionaries

Dictionaries are instantiated with `{}` and contain key-value pairs. A key is a label for its corresponding value, and keys are used like indices to look values up:

In [43]:
dictionary = {"planet": "earth", "moons": 1}

print(dictionary)

print(dictionary["moons"])

{'planet': 'earth', 'moons': 1}
1


We can add a new entry to a dictionary by assigning a value to a new key:

In [44]:
dictionary["avg orbital radius (AU)"] = 1 

print(dictionary)

{'planet': 'earth', 'moons': 1, 'avg orbital radius (AU)': 1}
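
A few other common dictionary operations are sketched below on a separate dictionary, here named `earth_info` for illustration. The `"rings"` key is a made-up example of a key that is not present.

```python
# A separate, illustrative dictionary with the same entries as above.
earth_info = {"planet": "earth", "moons": 1, "avg orbital radius (AU)": 1}

# `in` checks whether a key exists.
has_rings = "rings" in earth_info

# get() returns a fallback value instead of raising KeyError
# when the key is missing.
rings = earth_info.get("rings", "unknown")

# Assigning to an existing key overwrites its value.
earth_info["planet"] = "Earth"

# items() yields each key together with its value.
for key, value in earth_info.items():
    print(key, "->", value)
```

Looking up a missing key with square brackets, such as `earth_info["rings"]`, would raise a `KeyError`, which is why `get` can be convenient.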


## NumPy Arrays

NumPy is a Python library. A library (sometimes called a module) is a collection of pre-written code, usually open source and published online, that users can install and import into their own projects to make use of its functions. Most libraries do not come with Python by default, because they are often specific to a particular kind of task.

NumPy is one of the most commonly used libraries, particularly in projects that perform mathematical operations on data.

Let's import NumPy:


In [45]:
import numpy as np

This makes NumPy's functions available to our code. Importing it `as np` means we use the abbreviation `np` whenever we reference a NumPy function.

Much of NumPy's utility comes from its arrays. They are effectively tables of data, but they are not limited to one or two dimensions.

In [46]:
data = [1,2,3,4,5]

array = np.array(data)

print(array)

[1 2 3 4 5]


NumPy supports a wide range of mathematical operations and, by default, applies them element-wise, that is, to every element in the array.

In [47]:
array = array ** 2

print(array)

array = np.sin(array)

print(array)

[ 1  4  9 16 25]
[ 0.84147098 -0.7568025   0.41211849 -0.28790332 -0.13235175]
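
To contrast element-wise operations with ones that combine elements, here is a small sketch. It assumes one reading of "applies to all elements by default": arithmetic acts on each element separately, while aggregating methods such as `sum` reduce the array and accept an `axis` argument to restrict the reduction to one dimension. The names `values` and `table` are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Arithmetic is applied to each element separately.
doubled = values * 2
summed = values + values

# Aggregations combine all elements into a single number.
total = values.sum()
mean = values.mean()

# On a 2D array, `axis` limits the aggregation to one dimension.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
column_sums = table.sum(axis=0)  # adds down each column
row_sums = table.sum(axis=1)     # adds across each row

print(doubled, summed)
print(total, mean)
print(column_sums, row_sums)
```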


The code below creates a three-dimensional array of random numbers with shape 2x3x4, meaning the first dimension has 2 entries, the second has 3, and the third has 4. NumPy reflects this structure when printing the array: each pair of `[]` encloses one entry along a dimension, with the outermost brackets corresponding to the first dimension and the innermost to the last.

In [48]:
array_3d = np.random.rand(2,3,4)

print(array_3d) 
print(array_3d.shape)

[[[0.20031178 0.03349002 0.7228885  0.05751897]
  [0.44758227 0.89655885 0.59733246 0.14661288]
  [0.62160706 0.74552643 0.92853073 0.4520389 ]]

 [[0.60866282 0.88911482 0.4209507  0.48343762]
  [0.82685782 0.50397262 0.57778416 0.55281113]
  [0.48578802 0.87102564 0.25670402 0.6936361 ]]]
(2, 3, 4)
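
Because `np.random.rand` produces different numbers on each run, a deterministic array can be easier to inspect. The sketch below builds an array of the same shape from the integers 0 to 23 and queries a few of its attributes; the name `grid` is illustrative.

```python
import numpy as np

# 24 consecutive integers arranged into the same 2x3x4 shape.
grid = np.arange(24).reshape(2, 3, 4)

print(grid.shape)  # number of entries along each dimension
print(grid.ndim)   # number of dimensions
print(grid.size)   # total number of elements
print(len(grid))   # number of entries along the first dimension
```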


Indexing an array is similar to indexing a list, except that we can index several dimensions at once, using comma-separated indices, one per dimension.

In [49]:
print(array_3d[0]) # Prints the first item in the first dimension
print(array_3d[0, 0]) # Prints the first item in the first and second dimensions
print(array_3d[0, 0, 0]) # Prints the first item in all three dimensions

[[0.20031178 0.03349002 0.7228885  0.05751897]
 [0.44758227 0.89655885 0.59733246 0.14661288]
 [0.62160706 0.74552643 0.92853073 0.4520389 ]]
[0.20031178 0.03349002 0.7228885  0.05751897]
0.20031177865998406
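
Slices can be combined with comma-separated indices in the same way. The following sketch uses a deterministic 2x3x4 array of the integers 0 to 23 (named `grid` here for illustration) so that the selected values are easy to follow.

```python
import numpy as np

grid = np.arange(24).reshape(2, 3, 4)

# A bare `:` keeps every entry along that dimension.
first_rows = grid[:, 0, :]    # first row of each 3x4 block

# Negative indices work per dimension too.
last_column = grid[0, :, -1]  # last column of the first block

# Ranges can be mixed with single indices.
block = grid[1, 1:, :2]       # rows 1-2, columns 0-1 of the second block

print(first_rows)
print(last_column)
print(block)
```

Each dimension indexed with a single number is dropped from the result, while each dimension indexed with a slice is kept, which is why `first_rows` is two-dimensional and `last_column` is one-dimensional.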


# Exercises