# Collections (data structures) basics
### Data 765 tutoring

Collections, or data structures, are ways of storing data. As in real life, computers need to store data in different ways depending on how that data will be accessed.

In real life, we might keep papers in a small folder if we need easy access and order doesn't matter. If we need to divide papers by subject, such as Math, Science, Literature, et cetera, a binder with tabs works better.

A binder or a folder is unsuitable for some objects. Unless you're weird, you probably wouldn't try to store a frying pan in a folder.

Data structures in computers are analogous to real-life storage in that we choose different structures depending on what we need to accomplish. For example, a dictionary (a.k.a. hash map) links keys to values. A `list` (vector) allows efficient access to any element by its position as well as efficient insertion at the end.

Each data structure has trade-offs, and which ones matter depends on your workload. A dresser can hold a lot of clothes, but digging something out from the bottom is harder than pulling a specific article off the top. Similarly, some data structures are great at random access (lists/vectors) but slow at inserting into the middle, because every element after the insertion point has to shift over to make room.
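To make that trade-off concrete, here is a rough timing sketch. It assumes CPython, where a `list` is backed by a contiguous array; the function names and the size `N` are made up for illustration, and the exact timings will vary from machine to machine.

```python
import timeit

N = 10_000  # arbitrary size, big enough to make the gap visible

def build_by_append(n):
    items = []
    for i in range(n):
        items.append(i)     # tail insertion: amortized O(1), nothing has to move
    return items

def build_by_front_insert(n):
    items = []
    for i in range(n):
        items.insert(0, i)  # head insertion: every existing element shifts right, O(len(items))
    return items

append_seconds = timeit.timeit(lambda: build_by_append(N), number=1)
insert_seconds = timeit.timeit(lambda: build_by_front_insert(N), number=1)
print(f"append: {append_seconds:.4f}s  insert(0, ...): {insert_seconds:.4f}s")

# Random access is fast either way: indexing jumps straight to the position
print(build_by_append(N)[N // 2])  # 5000
```

On a typical machine the front-insert version should be noticeably slower, and the gap grows as `N` grows.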

Some of these concepts aren't too important for Data 765, but a working understanding of different data structures will greatly improve your programming. Even setting data science aside, nearly everything we do as programmers is manipulate data in some way.

# Lists (vectors)

Python `list`s store elements of arbitrary types in order. In other words, a `list` may hold integers, floats, instances of different classes, other `list`s, `dict`s, or any other type desired. A `list` may also consist of only a single type.

Items stored in a `list` are referred to as "elements."

Lists are created with brackets (`[]`), the `list()` function, or comprehensions.

```python
# All elements are strings
animals = ["cat", "giraffe", "bunny", "duck", "elephant", "dolphin"]

# Different types: a str, an int, and a float
grabbag = ["42", 42, 42.0]
```
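The example above only uses brackets. Here's a small sketch of all three creation styles side by side; the variable names are just for illustration.

```python
# Brackets: a list literal
letters = ["a", "b", "c"]

# list(): builds a list from any iterable; iterating a string yields its characters
from_string = list("abc")

# A list comprehension: an expression evaluated for each item of an iterable
squares = [n * n for n in range(5)]

# Empty lists can be made either way
empty_brackets = []
empty_call = list()

print(letters == from_string)        # True
print(squares)                       # [0, 1, 4, 9, 16]
print(empty_brackets == empty_call)  # True
```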

Here's a more complex `list` that stores other collections.

```python
# Each element is itself a collection: a list, a dict, and a tuple
bag_of_holding = [["sword", "shield"], {"gold": 100}, ("rope", "torch")]
```

Internally, CPython (the standard Python implementation) stores a `list` as an array of pointers to `PyObject`s, which is what allows a `list` to hold any type. Don't worry if that doesn't make sense yet. The gist is that a `list` doesn't store the actual objects but values that refer to where each object is located in memory. Think of a real-life list of house addresses: we don't carry the actual houses in our pockets (unless you're a giant, I guess), only their locations.

This concept will be more important when your class reaches [NumPy](https://numpy.org/) and [ndarrays](https://numpy.org/doc/stable/reference/arrays.ndarray.html).
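Because a `list` holds references rather than the objects themselves, two elements can refer to the very same object. A minimal sketch; the names `inventory` and `party_bags` are made up for illustration.

```python
inventory = ["sword", "shield"]
party_bags = [inventory, inventory]  # both elements refer to the same list object

inventory.append("rope")             # mutate the original list...
print(party_bags)                    # ...and the change is visible through both references
# [['sword', 'shield', 'rope'], ['sword', 'shield', 'rope']]
print(party_bags[0] is inventory)    # True: the same object, not an equal copy
```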

## Indexing

Like most programming languages, Python indexes `list`s starting from 0, not 1. This comes from how arrays are represented in memory: an index is the offset from the first element, so the first element is zero steps away.

Valid non-negative indices are in the half-open interval [0, n), where n is the length of the `list`. In other words, the **last valid index is len(your_list) - 1**. The index of the final element of `animals` is 5 because there are six elements. You know, math. 🤓😸

```python
print(animals[0])
print(animals[1])
```

```
cat
giraffe
```
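That cell reads from the front of the `list`. Here's a small sketch that checks the upper bound instead, redefining `animals` so it runs on its own. It also shows that Python accepts negative indices, which count backward from the end, so -1 is the last element.

```python
animals = ["cat", "giraffe", "bunny", "duck", "elephant", "dolphin"]

last_index = len(animals) - 1
print(last_index)           # 5
print(animals[last_index])  # dolphin
print(animals[-1])          # dolphin: negative indices count backward from the end

try:
    animals[len(animals)]   # index 6 is one past the last element
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```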


Besides single-element indexing, Python supports retrieving a range of elements through slicing. A [slice](https://docs.python.org/3/library/functions.html#slice) object is constructed from `start`, `stop`, and `step` values, any of which may be left out in slice syntax. We usually don't have to construct a `slice` directly because Python has [syntactic sugar](https://en.wikipedia.org/wiki/Syntactic_sugar) for slices.

The parameters' default arguments are:
* `start`: 0
* `stop`: to the end of the sliced object
* `step`: step by one element
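For the curious, here's a sketch of building a `slice` object by hand to see what the bracket syntax is sugar for. The variable name `middle` is just for illustration.

```python
animals = ["cat", "giraffe", "bunny", "duck", "elephant", "dolphin"]

# Build a slice object explicitly; animals[2:5] is syntactic sugar for this
middle = slice(2, 5)
print(middle.start, middle.stop, middle.step)  # 2 5 None
print(animals[middle])                         # ['bunny', 'duck', 'elephant']
print(animals[middle] == animals[2:5])         # True

# Omitted parameters are stored as None and filled in with the defaults listed above
print(animals[slice(None, 2)])                 # ['cat', 'giraffe']
```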

Let's retrieve "giraffe", "bunny", and "duck" using slices:

```python
best_friend_fav = animals[1:4]
print(best_friend_fav)
```

```
['giraffe', 'bunny', 'duck']
```


I skipped the optional `step` parameter because "giraffe", "bunny", and "duck" are contiguous within the `list`.

The first number in the slice is where it starts: the zeroth element is a reference to "cat", while "giraffe" is at index 1. The second number is where the slice stops, and the element at that index is excluded. Slices cover the half-open interval [start, stop), so the element at index 4, "elephant", isn't included. We're left with the three elements we sought as a new `list`.
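One handy consequence of the half-open interval is that the arithmetic works out cleanly. A quick sketch; `start`, `stop`, and `k` are just illustrative variable names.

```python
animals = ["cat", "giraffe", "bunny", "duck", "elephant", "dolphin"]

start, stop = 1, 4
chunk = animals[start:stop]
print(chunk)                       # ['giraffe', 'bunny', 'duck']
print(len(chunk) == stop - start)  # True: a half-open slice holds stop - start elements

# Splitting at any index k neither loses nor duplicates an element
k = 3
print(animals[:k] + animals[k:] == animals)  # True
```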

As I mentioned above, `start` is optional. We can omit it by leaving out the first number of the slice. In fact, each of the three parameters takes its default if we leave it out of the slice. You'll learn more about default arguments later when the class reaches functions (my blog also covers them!).

```python
animals[:4]
```

```
['cat', 'giraffe', 'bunny', 'duck']
```

By leaving out `start` and `step`, we're effectively writing the slice as `animals[0:4:1]`.

What if we don't pass in any arguments?

```python
animals[:]
```

```
['cat', 'giraffe', 'bunny', 'duck', 'elephant', 'dolphin']
```

With every argument left at its default, the slice runs from the first element to the last without skipping any. In other words, `your_list[:]` is a shallow copy of the `list`: a new `list` object that refers to the same elements.
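Here's a short sketch of what "shallow copy" means in practice. The names `copy_of_animals`, `nested`, and `nested_copy` are made up for illustration.

```python
animals = ["cat", "giraffe", "bunny", "duck", "elephant", "dolphin"]
copy_of_animals = animals[:]

print(copy_of_animals == animals)  # True: same elements
print(copy_of_animals is animals)  # False: a brand-new list object

copy_of_animals.append("hamster")  # changing the copy...
print(len(animals))                # 6: ...leaves the original alone

# "Shallow" means nested objects are shared, not copied
nested = [["a", "b"], ["c"]]
nested_copy = nested[:]
nested_copy[0].append("z")         # mutates the inner list both outer lists share
print(nested)                      # [['a', 'b', 'z'], ['c']]
```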

Indexing returns a single element, while slicing returns a new `list`. Since the result is itself an object, we can index into it right away by chaining indices.

```python
# Retrieve one element, then index that element
print(animals[0][0])

# Retrieve multiple elements, index into the new list,
# and finally index into the string
print(animals[:4][1][3])
```

```
c
a
```


`animals[0][0]` first indexes the `list` for its first element, the string "cat", which can in turn be indexed. Index 0 of "cat" is 'c'.

The second example looks more complex, but when you break it down it's similar to the first. `animals[:4]` retrieves the first four elements (indices 0, 1, 2, and 3) as a new `list`. The next index, `[1]`, selects the second element of that `list`, the string "giraffe". The final index, `[3]`, pulls the fourth character from that string, which is 'a'.
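Chained indexing works the same way on a `list` of `list`s, which is a common way to store grid-like data. A small sketch; `grid` and `row` are just illustrative names.

```python
grid = [
    [1, 2, 3],
    [4, 5, 6],
]

row = grid[1]       # the second inner list: [4, 5, 6]
print(row[2])       # 6
print(grid[1][2])   # 6: the same lookup, chained
```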

The third number in a slice is the step. For example, we can take every other element by setting the step to 2.

```python
a = list(range(26))

a[::2]
```

```
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24]
```
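A few more step variations on the same `a`, as a quick sketch. Note that the defaults listed earlier assume a positive step; with a negative step, Python walks backward, so an omitted `start` begins at the end instead.

```python
a = list(range(26))

print(a[1::2][:5])  # [1, 3, 5, 7, 9]: start at index 1, then take every other element
print(a[::5])       # [0, 5, 10, 15, 20, 25]
print(a[20:5:-5])   # [20, 15, 10]: a negative step walks backward; stop is still excluded
print(a[::-1][:3])  # [25, 24, 23]: the whole list reversed, first three shown
```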

As an aside, [range](https://docs.python.org/3/library/functions.html#func-range) objects are constructed similarly to slices. `range(1, 15, 2)` creates a range object that starts at 1 and counts up by 2, stopping before 15 (1, 3, 5, ..., 13). `range` objects are _not_ `list`s. They're lazy sequences that produce their values on demand instead of storing them, so to get a `list` we have to build one from the range, or else iterate over the range directly. The code above does the former by passing `range(26)` to `list()`.
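A quick sketch poking at `range(1, 15, 2)` to see how it behaves; the name `r` is just for illustration.

```python
r = range(1, 15, 2)

print(list(r))              # [1, 3, 5, 7, 9, 11, 13]: 15 is excluded, like a slice's stop
print(len(r))               # 7
print(r[2])                 # 5: ranges support indexing without building a list
print(isinstance(r, list))  # False
```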

## Mutating `list`s

Python's `list` is mutable, meaning we can modify a `list` in place by appending and removing elements.

```python
animals.append("hamster")

animals
```

```
['cat', 'giraffe', 'bunny', 'duck', 'elephant', 'dolphin', 'hamster']
```
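Beyond `append`, here's a sketch of a few other common ways to add and remove elements. It starts from a fresh copy of the original six animals so it runs on its own; "otter" is just an example value.

```python
animals = ["cat", "giraffe", "bunny", "duck", "elephant", "dolphin"]

animals.append("hamster")   # add to the end
animals.insert(1, "otter")  # add at index 1; later elements shift right
animals.remove("duck")      # delete the first element equal to "duck"
last = animals.pop()        # remove and return the last element
del animals[0]              # delete by index

print(last)     # hamster
print(animals)  # ['otter', 'giraffe', 'bunny', 'elephant', 'dolphin']
```

Keep in mind that `insert`, `remove`, and `del` in the middle of a `list` all shift the elements after that position, which is the middle-insertion cost mentioned at the top of these notes.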

`list`s implement many other useful methods, which you can [find in the official tutorial](https://docs.python.org/3/tutorial/datastructures.html).