# Selecting Subsets of Data from a Series

Selecting subsets of data from a Series is accomplished similarly to how it's done with DataFrames. 

## Series indexer rules

The same three indexers, `[]`, `loc`, and `iloc`, are available. Because there are no columns in a Series, the rules for each indexer are slightly different than they are for a DataFrame. Let's begin by reading in the movie dataset and setting the index to the title.

In [None]:
import pandas as pd
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(3)

Let's select a single column of data so that we can have access to a Series. Here, we select the `imdb_score` column.

In [None]:
imdb = movie['imdb_score']
imdb.head(3)

### Series subset selection with just the brackets

For DataFrames, we learned that *just the brackets* accepted either a single label or a list of labels and used this input to select one or more DataFrame columns. For a Series, *just the brackets* has different rules that you must follow to use it correctly. It allows selection by index label. For instance, we can select the `imdb_score` for the movie Avatar like this:

In [None]:
imdb['Avatar']

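Interestingly enough, *just the brackets* has also historically accepted an integer location when the index labels are not integers. The movie Avatar is at integer location 0, so the following duplicates our previous result. Note that pandas 2.1 and later emit a `FutureWarning` for this positional fallback, which is one more reason to prefer `iloc` for integer location.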

In [None]:
imdb[0]

## Series subset selection with `loc`

The `loc` indexer selects by **label** just as it does with a DataFrame. Since there are no columns, it only accepts a single selection object which can be any of the following:

* A single label
* A list of labels
* A slice with labels
* A boolean Series (covered in a later chapter)
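
The four kinds of input listed above can be sketched on a tiny Series. The titles and scores here are invented for illustration and do not come from `movie.csv`; the variable names are likewise assumptions.

```python
import pandas as pd

# Hypothetical scores, made up for this sketch (not read from movie.csv)
scores = pd.Series(
    [7.9, 7.1, 6.8, 8.5],
    index=["Avatar", "Spectre", "Tangled", "Inception"],
    name="imdb_score",
)

single = scores.loc["Tangled"]                 # single label -> one value
several = scores.loc[["Inception", "Avatar"]]  # list of labels -> Series, in list order
span = scores.loc["Spectre":"Inception"]       # label slice -> Series, stop label included
high = scores.loc[scores > 7.5]                # boolean Series -> entries where True
```

Only the single-label form returns a bare value; the other three return a Series.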

### Select a single value with `loc`

Select a single value by providing the `loc` indexer with a single index label. Here, we select the `imdb_score` of the movie Forrest Gump.

In [None]:
imdb.loc['Forrest Gump']

### Select multiple values using a list with `loc`

Provide the `loc` indexer a list of index labels to select multiple values.

In [None]:
names = ['Good Will Hunting', 'Home Alone', 'Meet the Parents']
imdb.loc[names]

### Select multiple values using slice notation with `loc`

Provide the `loc` indexer index labels for the start and stop components of slice notation to select all the values between those two labels. The results are **inclusive** of the stop label.

In [None]:
imdb.loc['Home Alone':'Top Gun']

As with any slice notation, all components are optional. Here, we select every `imdb_score` from the movie Twins to the end.

In [None]:
imdb.loc['Twins':].head()

In this example, we select every 300th `imdb_score` beginning at the movie Twins to the end.

In [None]:
imdb.loc['Twins'::300]
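
Because the movie dataset is large, the effect of the optional slice components is easier to see on a small Series. The following is a minimal sketch with invented labels and values, not data from `movie.csv`.

```python
import pandas as pd

# Hypothetical durations with short labels, made up for this sketch
durations = pd.Series(
    [90, 105, 120, 95, 130, 88],
    index=["A", "B", "C", "D", "E", "F"],
)

from_c = durations.loc["C":]         # start label only: C through the end
up_to_c = durations.loc[:"C"]        # stop label only: beginning through C (inclusive)
every_other = durations.loc["B"::2]  # start label and step: B, D, F
```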

## Series subset selection with `iloc`

The Series `iloc` indexer is analogous to `loc` except that it selects only by integer location, using one of the following:

* A single integer location
* A list of integer locations
* A slice with integer locations

### Select a single value with `iloc`

Let's select the `imdb_score` for the movie with integer location 499.

In [None]:
imdb.iloc[499]

Selecting with a single integer always returns the value by itself and not within a Series. If we want to return a one-item Series, so that we can see the index label, we can use a one-item list as our selection.

In [None]:
imdb.iloc[[499]]
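
The difference between the two return types can be checked directly. This is a small sketch on invented data; the Series and variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical scores, made up for this sketch
scores = pd.Series([7.9, 7.1, 6.8], index=["Avatar", "Spectre", "Tangled"])

value = scores.iloc[1]       # single integer -> the bare value
one_item = scores.iloc[[1]]  # one-item list -> a Series of length 1

print(type(value).__name__)     # a scalar type, not Series
print(type(one_item).__name__)  # Series
print(one_item.index[0])        # the label survives in the one-item Series
```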

### Select multiple values using a list with `iloc`

Provide `iloc` a list of integer locations to select multiple values.

In [None]:
ints = [499, 599, 699]
imdb.iloc[ints]

### Select multiple values using slice notation with `iloc`

Provide `iloc` with slice notation using integers as the start and stop components to select all the values between those two locations. The results are **exclusive** of the stop integer. Here, we select integer locations 145 through 147.

In [None]:
imdb.iloc[145:148]

Let's select the last 3 values using slice notation.

In [None]:
imdb.iloc[-3:]

Let's select every 200th value from integer location 1,000 up to, but not including, integer location 2,000.

In [None]:
imdb.iloc[1000:2000:200]
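
To make the exclusive stop of `iloc` concrete, and to contrast it with the inclusive stop of `loc`, here is a minimal sketch on a ten-item Series. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical Series: values 10..19 labeled 'a'..'j'
values = pd.Series(range(10, 20), index=list("abcdefghij"))

middle = values.iloc[2:5]       # integer locations 2, 3, 4 (5 is excluded)
last_three = values.iloc[-3:]   # the final three values
stepped = values.iloc[1:8:3]    # integer locations 1, 4, 7
by_label = values.loc["c":"e"]  # labels c, d, e (stop label is included)

# Both selections cover the same three items
print(middle.equals(by_label))
```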

### Use `loc` and `iloc` instead of just the brackets

For a Series, *just the brackets* is flexible and can take either a label or an integer location. This might make it seem like `loc` and `iloc` are unnecessary, but the opposite is actually the case. Using *just the brackets* for a Series is ambiguous: it's not clear whether a label or an integer location is being used.

I suggest only using `loc` and `iloc` for clarity. Whenever the `loc` indexer is used, we are certain it selects by label. Likewise, whenever the `iloc` indexer is used, we are certain it selects by integer location.
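
The ambiguity is easiest to see when the index labels are themselves integers that do not match the integer locations. The following sketch uses an invented Series to show that `loc` and `iloc` give different answers for the same integer, and that *just the brackets* sides with the labels in this case.

```python
import pandas as pd

# Hypothetical Series whose integer labels are deliberately out of order
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # the value whose label is 0
by_position = s.iloc[0]  # the value at the first integer location
bracket = s[0]           # with an integer index, brackets select by label

print(by_label, by_position, bracket)
```

A reader looking only at `s[0]` cannot tell which of the two meanings was intended without first inspecting the index, whereas `s.loc[0]` and `s.iloc[0]` state it outright.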

## Summary of Series subset selection

The three indexers, `[]`, `loc`, and `iloc`, are available to make subset selections on a Series. They work much as they do on DataFrames:

* The `loc` indexer makes selections by label using a:
    * single label
    * list of labels
    * slice of labels
    * boolean Series
* The `iloc` indexer makes selections by integer location using a:
    * single integer location
    * list of integer locations
    * slice of integer locations
* Use `loc` and `iloc` instead of *just the brackets* to be explicit
* There are no columns in a Series, so selection is only based on the index

## Exercises

Execute the cell below to select the `duration` column (length of movie in minutes) as a Series and use it for the first few exercises.

In [None]:
duration = movie['duration']
duration.head()

### Exercise 1

<span  style="color:green; font-size:16px">How long was the movie Titanic?</span>

### Exercise 2

<span  style="color:green; font-size:16px">How long was the movie at the 999th integer location?</span>

### Exercise 3

<span  style="color:green; font-size:16px">Select the duration for the movies Hulk, Toy Story, and Cars.</span>

### Exercise 4

<span  style="color:green; font-size:16px">Select the duration for every 100th movie from Hulk to Cars.</span>

### Exercise 5

<span  style="color:green; font-size:16px">Select the duration for every 10th movie beginning from the 100th from the end.</span>

### Read in bikes dataset

Execute the cell below to read in the bikes dataset and select the `wind_speed` column, then use it for the rest of the exercises. Notice that the index labels are integers, meaning that when you use `loc` you will be using integers.

In [None]:
bikes = pd.read_csv('../data/bikes.csv')
wind = bikes['wind_speed']
wind.head()

### Exercise 6

<span  style="color:green; font-size:16px">What kind of index does the `wind` Series have?</span>

### Exercise 7

<span  style="color:green; font-size:16px">From the `wind` Series, select the integer locations 4 through, but not including, 10.</span>

### Exercise 8

<span  style="color:green; font-size:16px">Copy and paste your answer to Exercise 7 below but use `loc` instead. Do you get the same result? Why not?</span>