<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#The-Pandas-Series" data-toc-modified-id="The-Pandas-Series-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>The Pandas Series</a></span><ul class="toc-item"><li><span><a href="#What-Is-a-Pandas-Series?" data-toc-modified-id="What-Is-a-Pandas-Series?-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span><strong><font color="red">What Is a Pandas Series?</font></strong></a></span></li><li><span><a href="#So-What's-So-Great-About-a-Pandas-Series?" data-toc-modified-id="So-What's-So-Great-About-a-Pandas-Series?-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span><strong><font color="orange">So What's So Great About a Pandas Series?</font></strong></a></span></li></ul></li><li><span><a href="#Methods" data-toc-modified-id="Methods-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Methods</a></span><ul class="toc-item"><li><span><a href="#.head()-and-.tail()" data-toc-modified-id=".head()-and-.tail()-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span><code>.head()</code> and <code>.tail()</code></a></span></li><li><span><a href="#.value_counts()" data-toc-modified-id=".value_counts()-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span><code>.value_counts()</code></a></span></li><li><span><a href="#.isin()" data-toc-modified-id=".isin()-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span><code>.isin()</code></a></span></li></ul></li></ul></div>

In [3]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

### The Pandas Series

#### **<font color=red>What Is a Pandas Series?</font>**

A pandas Series object is a one-dimensional, labeled array. It pairs an index, which is auto-generated and starts at 0 unless you supply your own labels, with data of a single data type. You can think of a pandas Series as a single column from a table in Excel or Google Sheets. Pandas also has a structure like a whole table in Excel: the DataFrame object.
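To make the idea of a labeled array concrete, here is a minimal sketch contrasting the default index with custom labels. The variable names and the labels `'a'`, `'b'`, `'c'` are made up for illustration.

```python
import pandas as pd

# With no index argument, pandas generates the labels 0, 1, 2, ...
default_index = pd.Series(['red', 'yellow', 'green'])

# Passing index= replaces the auto-generated labels (these labels are hypothetical).
labeled = pd.Series(['red', 'yellow', 'green'], index=['a', 'b', 'c'])

# Values are looked up by their label in both cases.
first_default = default_index[0]
first_labeled = labeled['a']
```

Both lookups return the same value; only the labels used to reach it differ.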

**A pandas Series is a single column in a pandas DataFrame. It can be created on its own from a Python list,**

In [41]:
colors = ['red', 'yellow', 'green', 'blue', 'orange', 'red', 'violet', 'indigo']
c_series = pd.Series(colors)
c_series

0       red
1    yellow
2     green
3      blue
4    orange
5       red
6    violet
7    indigo
dtype: object

In [42]:
type(c_series)

pandas.core.series.Series

**from a NumPy array,**

In [43]:
arr = np.array([5, 10, 15, 20, 25, 30, 35, 40])
a_series = pd.Series(arr)
a_series

0     5
1    10
2    15
3    20
4    25
5    30
6    35
7    40
dtype: int64

In [44]:
type(a_series)

pandas.core.series.Series

**or it can be pulled as a single column from a pandas DataFrame. The Series will retain the same index as the DataFrame, but I'm jumping ahead; we'll learn more about pandas DataFrames in the near future.**

**For now, all you need to know is that a Series can be pulled from a DataFrame using dot notation**

```python
df.series
```
**or bracket notation.**

```python
df['series']
```
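As a rough sketch of both notations, the example below builds a small DataFrame and pulls a column out of it. The DataFrame, its column names, and its index labels are all made up for illustration.

```python
import pandas as pd

# Hypothetical DataFrame; the column names and index labels are assumptions.
df = pd.DataFrame(
    {"color": ["red", "yellow", "green"], "count": [2, 1, 1]},
    index=["a", "b", "c"],
)

# Dot notation works when the column name is a valid Python identifier
# and does not clash with an existing DataFrame attribute or method.
dot_series = df.color

# Bracket notation works for any column name.
bracket_series = df["color"]

# "count" is also a DataFrame method, so df.count is the method, not the column.
# Bracket notation is the safe way to reach this column.
count_series = df["count"]
```

Both `dot_series` and `bracket_series` are the same Series and keep the DataFrame's index labels. Bracket notation is the more robust choice because it never collides with built-in names.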

#### **<font color=orange>So What's So Great About a Pandas Series?</font>**

A Series can hold any data type and comes with many useful attributes and methods that add a wide range of functionality to this pandas object.

### Methods

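**Methods** are functions that belong to an object. You call them with dot notation followed by parentheses, and they act on the data the object holds.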

#### `.head()` and `.tail()`

The `.head(n)` method returns the first n rows in the Series; n = 5 by default. This method returns a new Series with the same index labels as the original Series.

The `.tail(n)` method returns the last n rows; n = 5 by default here as well. Increase or decrease your value for n to return more or fewer than 5 rows.

In [45]:
# The default is the first 5 rows.

c_series.head()

0       red
1    yellow
2     green
3      blue
4    orange
dtype: object

In [46]:
# Calling the .head() or .tail() methods on our Series returns another Series.

type(c_series.head())

pandas.core.series.Series

In [47]:
# Calling the .tail() method with our n = 2 returns a Series with the last two rows

c_series.tail(2)

6    violet
7    indigo
dtype: object

#### `.value_counts()`

The `.value_counts()` method returns a Series whose index holds the unique values from the original `c_series` and whose values are the counts of each unique value. This is an extremely useful method that you will find yourself using often with Series containing object and category data types.

Below you can see the default settings for the method's parameters. In the [Chaining Methods](#chain) section, we will see more powerful uses for this method.

```python
series.value_counts(
    normalize=False,
    sort=True,
    ascending=False,
    bins=None,
    dropna=True,
)
```

In [48]:
# Default

c_series.value_counts()

red       2
blue      1
yellow    1
violet    1
indigo    1
orange    1
green     1
dtype: int64

In [49]:
# normalize=True returns the relative frequency of the unique values

c_series.value_counts(normalize=True)

red       0.250
blue      0.125
yellow    0.125
violet    0.125
indigo    0.125
orange    0.125
green     0.125
dtype: float64

In [51]:
# normalize=True and ascending=True to display the largest value last

c_series.value_counts(normalize=True, ascending=True)

green     0.125
orange    0.125
indigo    0.125
violet    0.125
yellow    0.125
blue      0.125
red       0.250
dtype: float64
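The `dropna` and `bins` parameters from the signature above are not exercised by the cells so far. The following is a minimal sketch of both; the `shades` and `numbers` Series are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value, used only for illustration.
shades = pd.Series(["red", "blue", "red", np.nan])

# Missing values are ignored by default (dropna=True).
default_counts = shades.value_counts()

# dropna=False adds a row that counts the missing values.
with_missing = shades.value_counts(dropna=False)

# bins groups numeric data into equal-width intervals before counting.
numbers = pd.Series([5, 10, 15, 20, 25, 30, 35, 40])
binned = numbers.value_counts(bins=2)
```

Here `default_counts` covers three values while `with_missing` covers all four, and `binned` has one row per interval rather than one row per unique number.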

#### `.isin()`

The `.isin()` method returns a Series of boolean values with the same index as the original Series. `True` indicates that the Series value at that index is in the list or set passed to the method; `False` indicates that it is not.

```python
series.isin(values)
```

In [53]:
# The values at index 0 and 5 are in the my_colors list. I assigned the new Series to bools

my_colors = ['black', 'white', 'red']

bools = c_series.isin(my_colors)
bools

0     True
1    False
2    False
3    False
4    False
5     True
6    False
7    False
dtype: bool

**What if I want just the rows that return True for being in the `my_colors` list?**

I can pass my `bools` Series to `c_series` as a selector so that only the rows where `bools` is True are returned. This is Boolean indexing, a common way to filter data in a Series as well as in a DataFrame.

Think of `c_series[bools]` as translating to "return the rows in my Series where the value is True or **where my condition is True**." In this case, our condition for our `bools` Series is that the color is in our `my_colors` list. 

**So, "return the rows in `c_series` *where* the value is either black, white, or red."**

In [54]:
c_series[bools]

0    red
5    red
dtype: object
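The intermediate `bools` variable is optional. As a small sketch, the mask can be built directly inside the brackets, and the `~` operator inverts it to select the rows that are *not* in the list. The names `in_my_colors` and `not_in_my_colors` are made up for illustration.

```python
import pandas as pd

colors = ['red', 'yellow', 'green', 'blue', 'orange', 'red', 'violet', 'indigo']
c_series = pd.Series(colors)
my_colors = ['black', 'white', 'red']

# Build the boolean mask inline instead of storing it first.
in_my_colors = c_series[c_series.isin(my_colors)]

# ~ flips every True to False and vice versa, keeping the other rows.
not_in_my_colors = c_series[~c_series.isin(my_colors)]
```

The two results together account for every row in `c_series`, and each keeps the index labels of the rows it selected.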

### Attributes

**Attributes** are accessed with dot notation like methods, but without parentheses. Instead of transforming the variable or data they are called on, they return useful information about the object. Jupyter Notebook allows you to quickly access a list of available attributes and methods by pressing the tab key after the Series name followed by a period or dot.

Here, we will look at some of the most commonly used attributes and methods for Series.

In [37]:
# Our Series of color name strings has the object data type

c_series.dtype

dtype('O')

In [39]:
# Our Series of numbers has the int64 data type

a_series.dtype

dtype('int64')
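Beyond `.dtype`, a few other attributes are handy for a quick look at a Series. The sketch below rebuilds `a_series` so it runs on its own; the variable names on the left are made up for illustration.

```python
import numpy as np
import pandas as pd

a_series = pd.Series(np.array([5, 10, 15, 20, 25, 30, 35, 40]))

# .shape is a one-element tuple holding the number of rows.
series_shape = a_series.shape

# .size is the number of values as a plain integer.
series_size = a_series.size

# .index holds the row labels; here it is the auto-generated range 0 through 7.
series_index = a_series.index
```

None of these take parentheses, which is the quickest way to tell an attribute from a method.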