# Indexing and Slicing: Getting Items from Ordered Data Collections

## Indexing

Getting a single item from a collection is called **indexing** the collection.  To do this, you pass the item's "index" (its position) to Python's get-item operator: the square brackets.  Note: Python is a 0-indexed language, so counting starts from zero.

```python
data = [10, 20, 30]
data[0]  # "Get the first item"
```

Counting from the end of a sequence can be done using a negative index:

```python
data = [10, 20, 30]
data[-1]  # "Get the last item"
```
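Strings are ordered sequences too, so the same indexing rules apply to them. The sketch below uses the list above plus a made-up string (`word`, not part of the exercises) to show that a negative index `-k` refers to position `len(data) - k`, and that an index past the end raises an `IndexError`:

```python
data = [10, 20, 30]
word = "python"  # illustrative string, not from the exercises

first = data[0]           # 10
last = data[-1]           # 30, same position as data[len(data) - 1]
assert last == data[len(data) - 1]
print(word[0], word[-1])  # p n

# Valid indices for a 3-item list run from -3 to 2; anything else fails.
try:
    data[3]
except IndexError as err:
    print("IndexError:", err)  # IndexError: list index out of range
```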

**Exercises**

**Example**: What is the first letter in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

In [None]:
letters[0]

What is the fifth letter in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

What is the third letter in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

What is the last letter in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

What is the third-to-last letter in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

## Slicing

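Slicing gets multiple items from a collection.  In sequences, this is done by specifying the start and stop indices, separated by a colon.  The item at the start index is included, but the item at the stop index is not.  Leaving out the start means "from the beginning", and leaving out the stop means "to the end":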

```python
data = [10, 20, 30, 40, 50, 60, 70, 80]
data[0:2]  # Gets [10, 20]
data[:2]   # Also gets [10, 20]
data[2:4]  # Gets [30, 40]
data[-2:]  # Gets [70, 80]
```
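Because the stop index is excluded, `data[start:stop]` holds `stop - start` items. A short sketch using the same list:

```python
data = [10, 20, 30, 40, 50, 60, 70, 80]

middle = data[2:5]        # indices 2, 3, 4 -> [30, 40, 50]; index 5 is excluded
print(len(middle))        # 3, which is 5 - 2

# Splitting at the same index loses nothing and duplicates nothing.
print(data[:3] + data[3:] == data)  # True

# Unlike single indices, slice bounds past the end are clipped, not errors.
print(data[6:100])        # [70, 80]
```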



**Example**: What are the first five letters in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

In [None]:
letters[:5]

What are the second through sixth letters in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

What are the third through fifth letters in the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

What are the last three letters of the English alphabet?

In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'

What are the second through fifth days of the week?

In [None]:
days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

Get all the days of the week except Monday.

In [None]:
days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

Get all the days of the week except the weekend days.

In [None]:
days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Filtering Data With Logical Indexing

Sometimes you want to remove certain values from your dataset.  In NumPy, this can be done with **logical indexing**; in plain Python, it is done with an **if statement**.
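To compare the two approaches, here is a minimal sketch that keeps only the values below 3, first with a `for` loop and an `if` statement, then with a NumPy logical index. The list `values` is made up for illustration:

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Plain Python: visit each item and keep it only if the condition holds.
small = []
for v in values:
    if v < 3:
        small.append(v)

# NumPy: build a True/False mask once, then use it as the index.
arr = np.array(values)
small_np = arr[arr < 3]

print(small)              # [1, 2]
print(small_np.tolist())  # [1, 2]
```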

### Step 1: Create a Logical NumPy Array

We can apply a single logical expression to every value in an array at once.  This is broadcasting, the same mechanism used by the math operations we saw earlier:

```python
>>> import numpy as np
>>> data = np.array([1, 2, 3, 4, 5])
>>> data < 3
array([ True,  True, False, False, False])
```
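The mask is itself a NumPy array of booleans, and it can be passed straight back into the square brackets. One detail matters for the exercise data below: any comparison involving `np.nan` evaluates to `False`, so `NaN` entries are dropped by every mask built this way (`np.isnan` can locate them explicitly). A small sketch with made-up arrays:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])
mask = data < 3
print(mask.dtype)          # bool
print(data[mask])          # [1 2]

# NaN compares as False against everything, including itself.
with_nan = np.array([1.0, np.nan, 5.0])
print((with_nan > 0).tolist())      # [True, False, True]
print(np.isnan(with_nan).tolist())  # [False, True, False]
```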

**Exercises**: Make arrays of True/False values that answer the following questions about the dataset below for each element.

In [None]:
import numpy as np

list_of_values = [3, 7, 10, 2, 1, 7, np.nan, 20, -5]
data = np.array(list_of_values)
data

*Example*: Where are the values that are greater than zero?

In [None]:
mask = data > 0
data[mask]

Where are the values that are less than four?

Where are the values that are greater than 7?

### Statistics on Filtered Data
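A filtered array is an ordinary NumPy array, so `len`, `np.mean`, `np.median` and similar functions work on it directly. Because `True` counts as 1 when summed, `np.sum(mask)` is another way to count matches. A sketch on a made-up array (not the exercise data):

```python
import numpy as np

values = np.array([4, -1, 9, 0, 6, -3])
mask = values > 0
positives = values[mask]       # [4, 9, 6]

print(len(positives))          # 3
print(np.sum(mask))            # 3, since each True counts as 1
print(np.mean(positives))      # 6.333333333333333
print(np.median(positives))    # 6.0
```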


**Exercises**: Using the following dataset, have Python calculate the answers to the questions below:

In [None]:
data = np.array([3, 1, -6, 8, 20, 2, 7, 1, 9, 7, 7, -7])
data

*Example*: How many values are greater than 4?  

In [None]:
len(data[data > 4])

How many values are equal to 7?

What is the mean value of the positive numbers?

What is the mean value of the negative numbers?

What is the median of the values that are greater than 5?

## Modifying Data Using Logical Indexing

Just like normal indexing, logical indexing can be used to *set* values in an array, in addition to *getting* values from it:

| Example | Description |
| :-- | :-- |
| **`data[data > 5] = 10`** | Set all values greater than 5 to 10  |
| **`data[data > 5] = data[data > 5] * 10`** | Set the values greater than 5 to themselves times 10  |
| **`data[data > 5] *= 10`** | Multiply all the values greater than 5 by 10, setting them in-place.  |
| **`data2 = data.copy()`** | Copy an array to a new variable |

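The `.copy()` row matters because logical-index assignment changes the array in place, and a plain `=` only creates a second name for the same array. A minimal sketch with made-up values:

```python
import numpy as np

data = np.array([2, 7, 4, 9])
alias = data          # same array object, not a copy
backup = data.copy()  # independent copy

data[data > 5] *= 10  # in-place update of the values greater than 5

print(data.tolist())    # [2, 70, 4, 90]
print(alias.tolist())   # [2, 70, 4, 90]  (changed too)
print(backup.tolist())  # [2, 7, 4, 9]    (unchanged)
```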
*Example*: Set all positive values in `data` to 100

In [None]:
data = np.arange(-4, 5)
data

In [None]:
data[data > 0] = 100
data

Set all negative values in `data` to 0

In [None]:
data = np.array([ 1, -2,  2, -1,  1,  4, -3,  1,  2, -1])
data

Add 100 to all values in `data` less than 100.

In [None]:
data = np.array([0, 101, 2, 3, 104, 105, 6, 107, 8])
data


## Using Logical Indexing to Link Two Different Variables in a Dataset

Logical indexing works as long as the mask of `True`/`False` values has the same shape as the array it is applied to.  This means we can use one array to build a mask and apply it to another array:

| Syntax | Description |
| :--  | :-- |
| **`data2[data1 > 0]`** | Get the values in `data2` at the positions where `data1` is positive |

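A small sketch of the table's pattern, with made-up `data1` and `data2` arrays of equal length. The mask is built from `data1` and applied to `data2`; a mask whose length does not match the indexed array is rejected (recent NumPy versions raise an `IndexError`):

```python
import numpy as np

data1 = np.array([-3, 5, 0, 8])
data2 = np.array([10, 20, 30, 40])

mask = data1 > 0               # [False, True, False, True]
print(data2[mask].tolist())    # [20, 40]

# A mask of the wrong length cannot line up with data2.
try:
    data2[np.array([True, False])]
except IndexError as err:
    print("IndexError:", err)
```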
*Example*: Get the temperatures of the "Treatment" group.

In [None]:
temp = np.array([35, 38, 32, 35, 39, 37, 36, 38, 39])
group = np.array(['Control', 'Treatment', 'Treatment', 'Control', 'Treatment', 'Control', 'Control', 'Treatment', 'Control'])

In [None]:
mask = group == 'Treatment'
temp[mask]

Calculate the mean temperature of the "Control" group

In [None]:
temp = np.array([35, 38, 32, 35, 39, 37, 36, 38, 39])
group = np.array(['Control', 'Treatment', 'Treatment', 'Control', 'Treatment', 'Control', 'Control', 'Treatment', 'Control'])

What is the minimum temperature in the control group?

What is the minimum temperature in the treatment group?