# Collections of Data: Sequences

Now that we know a decent amount about individual data values, it makes sense to figure out how we can work with more than one value of data at a time -- and start heading towards the notion of a 'data set'. We can start to achieve this by creating collections of values.

## Lists

In Python, the simplest way to make a collection is by creating a list. A list is created by surrounding a group of items with square brackets, `[ ]`, and separating each item with a comma `,`. You can put any data type inside a list!

In [None]:
my_amazing_list = [1, True, 'three', 5.0 - 1.0]
my_amazing_list

In [None]:
type(my_amazing_list)

The end result is an ordered sequence of items that we can read from beginning to end.

Now that your amazing list is created, you could even add, remove, reorder, and search for items in the list.

```{margin}
If we can alter the contents of a collection, then the collection is called 'mutable' (it can mutate!)
```
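As a minimal sketch of those four operations, the cell below uses a made-up list called `fruits` (a hypothetical name, not one used elsewhere in this chapter) along with the list methods `append`, `remove`, `sort`, and `index`.

```python
# A hypothetical list used only for this illustration.
fruits = ['pear', 'apple', 'fig']

fruits.append('kiwi')            # add an item to the end of the list
fruits.remove('pear')            # remove the first item equal to 'pear'
fruits.sort()                    # reorder the items in place (alphabetically here)
position = fruits.index('fig')   # search for 'fig'; positions are counted from 0

print(fruits)     # ['apple', 'fig', 'kiwi']
print(position)   # 1
```

Each of these methods changes or inspects the existing list rather than building a new one, which is what being mutable looks like in practice.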

### Functions on lists

Collections are handy because they let us perform calculations that need several pieces of data at once.

For instance, if you've been keeping track of the hours of sleep you get each night for the past week, you might be interested in the shortest and longest amounts of time you slept. Neither calculation is easy to do with simple expressions, but both are easy using *call expressions* on a list!

Let's start by creating our collection of data and assigning it to a meaningful name.

In [None]:
hours_slept = [8, 7, 7, 8, 5, 8, 9]

Some functions are specifically designed to perform calculations on collections of values. For example, `max` and `min` are functions that will find the largest or smallest item in a list, respectively.

If we're curious about the greatest number of hours you slept in a single night last week, we can pass our list into the `max` function.

In [None]:
max([8, 7, 7, 8, 5, 8, 9])

And if we want to find the fewest hours you slept in a single night, we can use the `min` function on our list. It was tedious to type out all the hours when we calculated the max, so let's take the smarter route and just use the variable name of our collection instead.

In [None]:
min(hours_slept)

We can also calculate the total number of hours we slept last week by using `sum`, and we can find out the number of items in a list by using `len`. Feel free to play with these functions in the interactive notebook!
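Here is a small sketch of both functions applied to the sleep data. The names `total_hours` and `nights` are chosen just for this example.

```python
hours_slept = [8, 7, 7, 8, 5, 8, 9]

total_hours = sum(hours_slept)   # adds up every item in the list
nights = len(hours_slept)        # counts how many items the list holds

print(total_hours)   # 52
print(nights)        # 7
```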

````{hiddenanswer}
---
question: Using `sum` and `len`, how would you write an expression that calculates the average number of hours you slept last week?
answer: |
    ```
    sum(hours_slept) / len(hours_slept)
    ```
    7.428571428571429
````

```todo
Finally, lists also have a handful of their own methods, which you can call by 

.count

Include (?)
```

## Arrays

Python’s lists are useful and easy to work with, but can be slow. As data scientists, we will eventually be working with sequences of millions, if not billions, of entries -- so speed is of the essence. Additionally, a lot of the calculations we've seen above aren't going to work if our list contains mixed types! After all, have you ever tried calculating the sum of `2` and `'orange'`?
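To see the mixed-type problem for yourself, here is a minimal sketch that tries to sum a made-up list holding a number and a string. The names `mixed_list` and `total` are only for illustration, and the `try`/`except` wrapper is there just so the cell prints the error instead of stopping.

```python
mixed_list = [2, 'orange']   # hypothetical list mixing an int and a string

try:
    total = sum(mixed_list)
except TypeError as error:
    # Python cannot add an int and a str, so the sum fails
    total = None
    print('TypeError:', error)
```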

The numerical analysis library `numpy` fixes this by introducing a new type of sequence: *arrays*. Let's import the `numpy` library (calling it `np` for short) and get started!

In [None]:
import numpy as np

Arrays are stricter than lists -- they can only contain one type of data, and their size is fixed once they're created, so we cannot add or remove items in place -- but in exchange they offer a lot of power and speed. Because of this, NumPy arrays are one of the most commonly used collections when working with data.

A lot of the same concepts from lists apply to arrays. In fact, we create an array by passing a list to the `np.array()` function.

In [None]:
hours_slept_array = np.array([8, 7, 7, 8, 5, 8, 9])
hours_slept_array

Arrays can also contain other types of data, like strings, but remember that a single array can only contain a single type of data.

In [None]:
np.array(['this', 'is', 'also', 'fine'])

Recall what happened when we evaluated expressions that contained both ints and floats: the ints were cast to floats. The same thing happens if we try to make an array containing ints and floats. Interestingly, NumPy will also cast other types so that everything in the array ends up as a single type.

Try making an array of numbers and strings and see what happens!

Every array keeps track of the single type of data it holds in its `dtype` attribute (short for 'data type').
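As a rough sketch of how NumPy settles on one type per array, the cell below builds three small arrays and checks the `dtype` of each. The array names and values are made up for illustration, and the exact integer and string type names printed can vary by platform.

```python
import numpy as np

whole_numbers = np.array([8, 7, 5])
mixed_numbers = np.array([8, 7.5, 5])              # one float among the ints
words = np.array(['this', 'is', 'also', 'fine'])

print(whole_numbers.dtype)   # an integer type (the exact size depends on your platform)
print(mixed_numbers.dtype)   # float64, because the ints were cast to floats
print(words.dtype)           # a Unicode string type
```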

### Functions on Arrays

The functions we used on lists also work on arrays, and NumPy comes with many additional functions that perform useful calculations on arrays.

```todo
### Array operations
```

```todo
## Ranges
```