# Collections of Data - Sequences

Now we have a decent understanding of individual data values in Python. But as a data scientist, you'll need to work with an entire set of data values -- not just one!

We can start heading towards the notion of a 'data set' by learning how to create collections of values.

## Lists

In Python, the simplest way to make a collection is by creating a list. A list is created by surrounding a group of items with square brackets, `[ ]`, and separating each item with a comma `,`. You can put any data type inside a list (including other lists)!

In [None]:
my_amazing_list = [1, True, 'three', [5.0 - 1.0]]
my_amazing_list

In [None]:
type(my_amazing_list)

The end result is an ordered sequence of items that we can read from beginning to end.

Now that your amazing list is created, you could even add, remove, reorder, and search for items in the list.

```{margin}
If we can alter the contents of a collection, then the collection is called 'mutable' (it can mutate!)
```
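As a rough sketch of what adding, removing, reordering, and searching can look like, the example below uses a made-up list called `fruits` (the name and its contents are purely illustrative):

```python
# A hypothetical list used only for illustration
fruits = ['pear', 'fig', 'apple']

fruits.append('kiwi')            # add an item to the end
fruits.remove('pear')            # remove the first matching item
fruits.sort()                    # reorder the items alphabetically, in place
position = fruits.index('fig')   # search: where is 'fig'? (counting from 0)
has_apple = 'apple' in fruits    # search: is 'apple' in the list at all?

print(fruits)      # ['apple', 'fig', 'kiwi']
print(position)    # 1
print(has_apple)   # True
```

Every step changes the same list rather than creating a new one, which is what being mutable means in practice.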

### Functions on lists

Collections are handy because they allow us to perform calculations that require multiple pieces of data to be computed.

For instance, if you've been keeping track of the hours of sleep you get each night for the past week, you might be interested in the shortest and longest amount of time you slept. Neither of these calculations can be done easily with simple expressions, but they can be done easily by calling a function on a list!

Let's start by creating our collection of data and assigning it to a meaningful name.

In [None]:
hours_slept = [8, 7, 7, 8, 5, 8, 9]

Some functions are specifically designed to perform calculations on collections of values. For example, `max` and `min` are functions that will find the largest or smallest item in a list, respectively.

If we're curious what the longest number of hours you slept last week was, we can pass our list into the `max` function.

In [None]:
max([8, 7, 7, 8, 5, 8, 9])

And if we want to find the shortest number of hours you slept, then we can use the `min` function on our list. It was tedious to type out all the hours when we calculated the max, so let's take the smarter route and just use the variable name of our collection instead.

In [None]:
min(hours_slept)

We can also calculate the total number of hours we slept last week by using `sum`, and we can find out the number of items in a list by using `len`. Feel free to play with these functions in the interactive notebook!
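Here is a small sketch of both functions on the sleep data. The names `total_hours` and `nights` are just illustrative choices:

```python
hours_slept = [8, 7, 7, 8, 5, 8, 9]

total_hours = sum(hours_slept)   # adds up every item in the list
nights = len(hours_slept)        # counts how many items the list holds

print(total_hours)   # 52
print(nights)        # 7
```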

````{hiddenanswer}
---
question: Using `sum` and `len` how would you write an expression that calculates the average number of hours you slept last week?
answer: |
    ```
    sum(hours_slept) / len(hours_slept)
    ```
    7.428571428571429
````

Finally, just like strings had their own set of special functions (called methods), lists have their own set of methods.

These methods allow us to do things like add (append) an item to the end of a list, or find and remove an item from a list. Just like with string methods, list methods are called by placing a dot after a list or the variable name of a list, and then calling a method name as a function.

A lot of these methods don't return any value, but instead just modify the list directly -- so, we need to show the list again to see what the effect of the method was. If this is confusing, that's fine! We'll go over these methods in a later chapter when we learn about 'loops', but for now feel free to just play around with list methods.

In [None]:
hours_slept = [8, 7, 7, 8, 5, 8, 9]
hours_slept

In [None]:
hours_slept.append(3)

In [None]:
hours_slept

In [None]:
hours_slept.remove(8)

In [None]:
hours_slept
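To make the "no return value" idea concrete, here is a minimal sketch that stores whatever `append` hands back in a variable (called `result` here just for illustration) and prints it:

```python
hours_slept = [8, 7, 7, 8, 5, 8, 9]

result = hours_slept.append(3)   # append changes the list itself...
print(result)                    # None  (...and gives nothing back)
print(hours_slept)               # [8, 7, 7, 8, 5, 8, 9, 3]

hours_slept.remove(8)            # removes only the first 8 it finds
print(hours_slept)               # [7, 7, 8, 5, 8, 9, 3]
```

Notice that `remove` only takes out the first matching item; the other two 8s are still in the list.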

## Arrays

Python’s lists are useful and easy to work with, but can be slow. As data scientists, we will eventually be working with sequences of millions, if not billions, of entries -- so speed is of the essence. Additionally, a lot of the calculations we've seen above aren't going to work if our list contains mixed types! After all, have you ever tried calculating the sum of `2` and `'orange'`?
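If you're curious, here is a small sketch of what happens. The list `mixed_list` is made up for this example, and the `try`/`except` is only there so the error message gets printed instead of stopping the program:

```python
mixed_list = [2, 'orange']   # hypothetical list mixing an int and a string

try:
    total = sum(mixed_list)
except TypeError as error:
    # Python does not know how to add a number and a string together
    total = None
    print('Cannot add these up:', error)
```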

The numerical analysis library `numpy` fixes this by introducing a new type of sequence: *arrays*. Let's import the `numpy` library (calling it `np` for short) and get started!

In [None]:
import numpy as np

```{tip}
Avoid the embarrassment -- it's pronounced **"num-pie"** <small>(not "num-pee")</small>
```

Arrays are stricter than lists, with two main restrictions that are instantly noticeable:

1. They can only contain one type of data
2. Once created, we cannot directly add or remove items

However, NumPy arrays offer incredible power and speed which have made them one of the most commonly used collections in data science.

A lot of the same concepts from lists carry over when trying to understand arrays. In fact, to create an array we pass a list to the `np.array()` function.

In [None]:
hours_slept_array = np.array([8, 7, 7, 8, 5, 8, 9])
hours_slept_array
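As a quick sketch of the second restriction mentioned earlier: arrays don't have an `append` method the way lists do. NumPy does provide an `np.append` function, but it builds a new array rather than changing the original. The names `hours` and `longer` are just illustrative:

```python
import numpy as np

hours = np.array([8, 7, 7])

# Arrays have no .append method like lists do
print(hasattr(hours, 'append'))   # False

# np.append returns a brand-new array and leaves the original alone
longer = np.append(hours, 5)
print(hours)    # [8 7 7]
print(longer)   # [8 7 7 5]
```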

Arrays can also contain other types of data, like strings or bools or other arrays, *but* remember that a single array can only contain a single type of data.

In [None]:
np.array(['this', 'is', 'also', 'fine'])

Remember what happened when we evaluated expressions that contained both ints and floats? The result was always a float. The same thing will happen if we try to make an array containing ints and floats -- remember, only one data type can be contained by an array.

In [None]:
np.array([1, 2, 3.0])

If possible, NumPy will always try to convert everything you give it to the same type. That means if you give it strings and numbers, it'll turn everything into strings!

Why is this? Because you can always convert a number into a string (just place quotes around it!), but only a handful of strings can be reliably converted into a number. For the sake of consistency, NumPy turns it all into strings.
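Plain Python shows the same one-way street. In this sketch (the variable names are illustrative), converting a number to a string always works, while converting a string to a number only works when the string happens to look like a number:

```python
as_text = str(42)        # any number can become a string
as_number = int('42')    # works only because '42' looks like a number

print(as_text)     # 42  (a string now, even though it prints the same)
print(as_number)   # 42

try:
    int('orange')
except ValueError as error:
    # 'orange' has no sensible numeric value, so Python refuses
    print('Not a number:', error)
```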

```{margin}
No need to be worried about the weird looking "dtype" -- that just tells us that the data it contains is stored as unicode strings (`U`) with a maximum possible length of 21 characters (`21`).
```

In [None]:
np.array([1, 2, '3'])
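If you'd like to check which single type NumPy settled on, one option is to look at an array's `dtype`. The sketch below uses its `kind` attribute, a one-letter code for the general category of data; the variable names are illustrative:

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.0])
numbers_and_strings = np.array([1, 2, '3'])

print(ints_and_floats.dtype.kind)       # f  (floating point numbers)
print(numbers_and_strings.dtype.kind)   # U  (unicode strings)
print(numbers_and_strings.tolist())     # ['1', '2', '3']
```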

### Functions on Arrays

The functions we looked at that can be used on lists can also be used on arrays, but NumPy also comes packed with additional functions and methods that perform a wide range of useful calculations on arrays.
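As a small taste, the sketch below runs a couple of the built-in functions from earlier on an array, followed by a few of NumPy's own. This is only a sample of what's available, not a complete tour:

```python
import numpy as np

hours_slept_array = np.array([8, 7, 7, 8, 5, 8, 9])

# Built-in functions work just like they did on lists
print(max(hours_slept_array))   # 9
print(len(hours_slept_array))   # 7

# NumPy adds its own functions and methods
total = np.sum(hours_slept_array)       # add up every item
average = hours_slept_array.mean()      # average, called as a method
middle = np.median(hours_slept_array)   # the middle value when sorted

print(total)     # 52
print(average)
print(middle)    # 8.0
```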

### Array operations

## Ranges

## Selecting items from a sequence