# Series Introduction

A Series is used to model one-dimensional data. The Series object also has a few more bits of data, including an index and a name. A common idea throughout pandas is the notion of an axis. Because a series is one-dimensional, it has a single axis: the index.

## Conceptualizing a Series in Plain Python

| artist_id | data |
| --- | --- |
| 0 | 145 |
| 1 | 142 |
| 2 |  38 |
| 3 |  13 |


In [1]:
series = {
    'index':[0, 1, 2, 3],
    'data': [145, 142, 38, 13],
    'name': 'songs'
}

In [2]:
series

{'index': [0, 1, 2, 3], 'data': [145, 142, 38, 13], 'name': 'songs'}

The get function defined below can pull items out of this data structure based on the index:

In [3]:
def get(series, idx):
    value_idx = series['index'].index(idx)
    return series['data'][value_idx]

In [4]:
get(series, 1)

142

## 4.1 The index abstraction

This double abstraction of the index seems unnecessary at first glance, since a list already has integer indexes. But pandas has a trick up its sleeve. By allowing non-integer values, the data structure supports other index types such as strings and dates, as well as arbitrarily ordered indices or even duplicate index values.

Below is an example that has string values for the index:

In [5]:
songs = {
    'index': ['Paul', 'John', 'George', 'Ringo'],
    'data':[145, 142, 38, 13],
    'name':'counts'
}

get(songs, 'John')

142

Many of the operations performed on a Series operate directly on the index or by index lookup.
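The plain-Python model can also illustrate duplicate index values. The sketch below is only an illustration: `get_all` is a hypothetical helper (not part of pandas), and the repeated label and its values are made up for the example.

```python
def get_all(series, idx):
    """Return every value whose index label equals idx (hypothetical helper)."""
    return [
        value
        for label, value in zip(series['index'], series['data'])
        if label == idx
    ]


# Toy data with a repeated label; the numbers are illustrative only.
dupes = {
    'index': ['Paul', 'John', 'Paul'],
    'data': [145, 142, 7],
    'name': 'counts',
}

matches = get_all(dupes, 'Paul')
print(matches)  # [145, 7]
```

Unlike the `get` function, which stops at the first matching label, this version returns a list so that repeated labels are not silently dropped.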

## The pandas Series

With that background in mind, let’s look at how to create a Series in pandas. It is easy to create a Series object from a list:

In [6]:
import pandas as pd

In [7]:
songs2 = pd.Series([145, 142, 38, 13], name='counts')

In [8]:
songs2

0    145
1    142
2     38
3     13
Name: counts, dtype: int64

It is easy to inspect the index of a series (or data frame), as it is an attribute of the object:

In [9]:
songs2.index

RangeIndex(start=0, stop=4, step=1)

A series index can be string-based as well:

In [10]:
songs3 = pd.Series(
    [145, 142, 38, 13],
    name='counts',
    index=['Paul', 'John', 'George', 'Ringo']
)

In [11]:
songs3

Paul      145
John      142
George     38
Ringo      13
Name: counts, dtype: int64

In [12]:
songs3.index

Index(['Paul', 'John', 'George', 'Ringo'], dtype='object')
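A Series can also be built from a dictionary, in which case the keys become the index labels. This is a minimal sketch that reuses the same counts; the variable names are chosen here for illustration.

```python
import pandas as pd

# Dictionary keys become index labels, values become the data.
counts = {'Paul': 145, 'John': 142, 'George': 38, 'Ringo': 13}
songs_from_dict = pd.Series(counts, name='counts')

print(songs_from_dict.index.tolist())  # ['Paul', 'John', 'George', 'Ringo']
print(songs_from_dict['John'])         # 142
```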

The actual data (or values) for a series does not have to be numeric or homogeneous. We can insert Python objects into a series:

In [13]:
class Foo:
    pass

ringo = pd.Series(
    ['Richard', 'Starkey', 13, Foo()],
    name='ringo'
)

In [14]:
ringo

0                                 Richard
1                                 Starkey
2                                      13
3    <__main__.Foo object at 0x12ca78dc0>
Name: ringo, dtype: object

Side note: if time data has dtype object, we're probably working with string data and will need to convert it. This is shown later.
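As a rough preview of that conversion, `pd.to_datetime` can parse a series of date strings. The dates below are made-up sample values, and the exact dtype names printed may vary between pandas versions.

```python
import pandas as pd

# Illustrative date strings; before conversion they are plain text.
raw_dates = pd.Series(['2021-01-01', '2021-01-02', '2021-01-03'])
dates = pd.to_datetime(raw_dates)

print(raw_dates.dtype)  # a string/object dtype, depending on pandas version
print(dates.dtype)      # a datetime64 dtype

# Once converted, date-specific accessors become available.
print(dates.dt.year.tolist())  # [2021, 2021, 2021]
```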

## 4.3 The NaN value
A value that may be familiar to NumPy users, but not Python users in general, is NaN. When pandas determines that a series holds numeric values but cannot find a number to represent an entry, it will use NaN. This value stands for Not a Number. Aggregations such as `sum()` and `count()` skip it by default, while element-wise arithmetic propagates it (similar to NULL in SQL).

In [15]:
import numpy as np
nan_series = pd.Series(
    [2, np.nan],
    index=['Ono', 'Clapton']
)

In [16]:
nan_series

Ono        2.0
Clapton    NaN
dtype: float64

You'll notice that pandas ignores `NaN` at times. One example is the `count()` method, which counts only non-missing values:

In [17]:
nan_series.count()

1
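The same skipping behavior applies to other aggregations, while element-wise arithmetic keeps the missing entry. A small sketch, rebuilding the series under the assumed name `nan_demo` so the block is self-contained:

```python
import numpy as np
import pandas as pd

nan_demo = pd.Series([2, np.nan], index=['Ono', 'Clapton'])

# Aggregations skip NaN by default (skipna=True).
total = nan_demo.sum()
average = nan_demo.mean()
print(total, average)  # 2.0 2.0

# Element-wise arithmetic propagates NaN instead of skipping it.
plus_one = nan_demo + 1
print(plus_one.isna().tolist())  # [False, True]

# Opting out of the skipping makes the aggregate itself NaN.
strict_total = nan_demo.sum(skipna=False)
```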

The `.size` attribute returns the total number of entries, including missing ones. Subtracting the result of `count()` from it gives the number of missing values.

In [18]:
nan_series.size

2
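To count missing values directly, one option is to sum the boolean result of `.isna()`. The sketch below rebuilds the series as `nan_demo` (a name assumed here) and compares that with the difference between `.size` and `.count()`.

```python
import numpy as np
import pandas as pd

nan_demo = pd.Series([2, np.nan], index=['Ono', 'Clapton'])

# .isna() marks missing entries with True; summing counts the Trues.
missing = nan_demo.isna().sum()

# Equivalent: total entries minus non-missing entries.
also_missing = nan_demo.size - nan_demo.count()

print(missing, also_missing)  # 1 1
```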

## 4.4 Optional Integer Support for NaN
If you really want integers alongside missing values, pandas offers the nullable integer type 'Int64' (note the capital I):

In [19]:
nan_series2 = pd.Series(
    [2, None],
    index=['Ono', 'Clapton'],
    dtype='Int64'               # the magic sauce of nullable integer type
)

In [20]:
nan_series2

Ono           2
Clapton    <NA>
dtype: Int64

In [21]:
nan_series2.count()

1

You can also convert a series to a nullable integer type after the fact:

In [22]:
nan_series3 = nan_series.astype('Int64')

In [23]:
nan_series3

Ono           2
Clapton    <NA>
dtype: Int64

Generally though, you can ignore 'Int64', as it is usually better to clean up missing data. Also, when you ingest data in pandas, most functions use 'int64' (in lowercase) by default.
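What "cleaning up" means depends on the data. As one possible sketch, assuming that a missing count can reasonably be treated as zero (an assumption made only for this example), the gap can be filled and the series cast back to plain 'int64':

```python
import numpy as np
import pandas as pd

nan_demo = pd.Series([2, np.nan], index=['Ono', 'Clapton'])

# Assumption for illustration: a missing entry means zero.
cleaned = nan_demo.fillna(0).astype('int64')

print(cleaned.tolist())  # [2, 0]
print(cleaned.dtype)     # int64
```

Dropping the missing rows with `.dropna()` is another option when a fill value would be misleading.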

## 4.5 Similar to NumPy

The Series object behaves similarly to a NumPy array. As shown below, both types respond to index operations:

In [24]:
numpy_ser = np.array([145, 142, 38, 13])

In [25]:
songs3.iloc[1]    # position-based lookup; songs3[1] on a string index is deprecated in recent pandas

142

In [26]:
numpy_ser[1]

142

There are also methods in common:

In [27]:
songs3.mean()

84.5

In [28]:
numpy_ser.mean()

84.5

Boolean arrays can be used as masks in both NumPy arrays and pandas series:

In [29]:
mask = songs3 > songs3.median()
mask

Paul       True
John       True
George    False
Ringo     False
Name: counts, dtype: bool

In [30]:
songs3[mask]    # Passing the mask into an index operation with pandas

Paul    145
John    142
Name: counts, dtype: int64

In [31]:
numpy_ser[numpy_ser > np.median(numpy_ser)]    # NumPy arrays have no .median method, so call np.median

array([145, 142])
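The similarity goes further: NumPy functions can be applied to a Series, and a Series can hand back its underlying array. A brief sketch, rebuilding the data under the assumed name `songs_demo`:

```python
import numpy as np
import pandas as pd

songs_demo = pd.Series(
    [145, 142, 38, 13],
    name='counts',
    index=['Paul', 'John', 'George', 'Ringo']
)

# Extract the values as a plain NumPy array (the index is dropped).
as_array = songs_demo.to_numpy()
print(type(as_array).__name__)  # ndarray

# NumPy functions applied to a Series return a Series with the same index.
roots = np.sqrt(songs_demo)
print(roots.index.tolist())  # ['Paul', 'John', 'George', 'Ringo']

# Arithmetic is vectorized, as with NumPy arrays.
doubled = songs_demo * 2
print(doubled.tolist())  # [290, 284, 76, 26]
```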

## 4.6 Categorical Data

When you load data, you can indicate that the data is categorical. If we know that our data is limited to a few values, we might want to use categorical data. Categorical values have a few benefits:
- Use less memory than strings
- Improve performance
- Can have an ordering
- Can perform operations on categories
- Enforce membership on values
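The memory benefit can be checked with `memory_usage(deep=True)`. The sketch below repeats a handful of size labels many times (the repetition count is arbitrary); the exact byte counts depend on the pandas version, so none are shown.

```python
import pandas as pd

# Few distinct values repeated many times: the case where categories help.
sizes = pd.Series(['m', 'l', 'xs', 's', 'xl'] * 10_000)
as_category = sizes.astype('category')

# deep=True includes the memory held by the string objects themselves.
string_bytes = sizes.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(string_bytes, category_bytes)
print(category_bytes < string_bytes)  # True
```

The saving comes from storing each distinct label once and keeping only small integer codes per row.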

There's no reason why categorical data _has_ to be string-based.

Anyway, here's how you might designate series values as categorical:

In [32]:
s = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')

In [33]:
s

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['l', 'm', 's', 'xl', 'xs']

In [34]:
# Notice that categories don't have ordering by default
s.cat.ordered

False

To convert a non-categorical series to an ordered category, we can create a type with the CategoricalDtype constructor and the appropriate parameters. Then we pass this type into the .astype method. Values that are missing from the category list (here 'xs' and 'xl') become NaN:

In [35]:
s2 = pd.Series(['m', 'l', 'xs', 's', 'xl'])
size_type = pd.api.types.CategoricalDtype(
    categories=['s', 'm', 'l'], ordered=True
)

In [36]:
s2

0     m
1     l
2    xs
3     s
4    xl
dtype: object

In [37]:
s3 = s2.astype(size_type)

In [38]:
s3

0      m
1      l
2    NaN
3      s
4    NaN
dtype: category
Categories (3, object): ['s' < 'm' < 'l']

In [39]:
s3 > 's'

0     True
1     True
2    False
3    False
4    False
dtype: bool

In [40]:
s.cat.reorder_categories(['xs','s','m','l', 'xl'], ordered=True)

0     m
1     l
2    xs
3     s
4    xl
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']
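Once a category is ordered, sorting and comparisons follow the category order rather than alphabetical order. A short sketch, rebuilding the series under the assumed name `sizes`:

```python
import pandas as pd

sizes = pd.Series(['m', 'l', 'xs', 's', 'xl'], dtype='category')
ordered = sizes.cat.reorder_categories(
    ['xs', 's', 'm', 'l', 'xl'], ordered=True
)

# Sorting uses the declared order, not alphabetical order.
print(ordered.sort_values().tolist())  # ['xs', 's', 'm', 'l', 'xl']

# min and max are defined for ordered categories.
print(ordered.min(), ordered.max())  # xs xl

# Each value is stored as an integer code pointing into the category list.
print(ordered.cat.codes.tolist())  # [2, 3, 0, 1, 4]
```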

## 4.7 Summary

The Series object is a one-dimensional data structure. It can hold numerical data, time data, strings, or arbitrary Python objects. If you are dealing with numeric data, using pandas rather than a Python list will benefit you: pandas is faster, consumes less memory, and comes with built-in methods that are very useful for manipulating the data. Also, the index abstraction allows for accessing values by position or label. A Series can also have empty values and has some similarities to NumPy arrays. It is the primary workhorse of pandas; mastering it will pay dividends.

## 4.8 Exercises

Using Jupyter, create a series with the temperature values for the last seven days. Filter out the values below the mean.

In [41]:
temps = pd.Series([61, 54, 48, 69, 68])

In [42]:
temps

0    61
1    54
2    48
3    69
4    68
dtype: int64

In [43]:
mask = temps > temps.mean()

In [44]:
temps[mask]

0    61
3    69
4    68
dtype: int64

In [45]:
temps.mean()

60.0

Using Jupyter, create a series with your favorite colors. Use a categorical type.

In [46]:
colors = pd.Series(['blue', 'red', 'purple'], dtype='category')

In [47]:
colors

0      blue
1       red
2    purple
dtype: category
Categories (3, object): ['blue', 'purple', 'red']