# Working with Pandas Series

The *Pandas* package offers two key data structures that are optimised for data analysis and manipulation: *Series* and *DataFrame*. In this notebook we will start off by looking at the *Series*, which is a one-dimensional structure holding data of any type.

To start off, we import the Pandas package. We can import it as *pd* for shorthand.

In [None]:
import pandas as pd

## Creating Pandas Series

To create a new Series, we can use the `Series` constructor. The simplest (but least useful) approach is to pass in a Python list. By default, the Series will have a numeric index, counting from 0.

In [None]:
values = [2, 101, 45, 232, 45, 67]
# create the Series
s1 = pd.Series(values)
s1

In [None]:
# how many values in the series?
len(s1)

We can also explicitly pass a list of index labels to *Series()* to get a more meaningful index (in this case, strings containing country names). Note that the number of values and the number of labels must match.

In [None]:
life_exp_values = [75.77, 82.09, 73.12, 80.99]
countries = ["Argentina", "Australia", "Brazil", "Canada"]
# create the Series
s2 = pd.Series(life_exp_values, countries)
s2

In [None]:
# how many values in the series?
len(s2)
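
To illustrate the length requirement, the sketch below deliberately passes three values but only two labels. The variable names are made up for this example, and the exact wording of the error message may differ between Pandas versions.

```python
import pandas as pd

too_many_values = [75.77, 82.09, 73.12]
too_few_labels = ["Argentina", "Australia"]

# pandas refuses to build a Series when the lengths differ
try:
    pd.Series(too_many_values, too_few_labels)
    mismatch_error = None
except ValueError as err:
    mismatch_error = err
    print("Could not create the Series:", err)
```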

This use of index labels is very similar to a Python dictionary. In fact, we can create a Pandas Series directly from a Python dictionary:

In [None]:
d_life_exp = {"Argentina": 75.77, "Australia": 82.09, "Brazil": 73.12, "Canada": 80.99}
s3 = pd.Series(d_life_exp)
s3

Let's create a Series with a larger number of values: 

In [None]:
values = [75.77, 82.09, 73.12, 80.99, 49.81, 74.87, 70.48, 80.24, 80.15, 
          84.36, 75.05, 80.67, 55.13, 51.3, 76.99, 80.68, 83.23, 83.49, 82.5, 80.09, 78.51]
labels = ["Argentina", "Australia", "Brazil", "Canada", "Chad", "China", "Egypt", "Germany", "Ireland", 
          "Japan", "Mexico", "New Zealand", "Niger", "Nigeria", "Paraguay", "Portugal", "South Korea", 
          "Spain", "Switzerland", "United Kingdom", "United States"]

In [None]:
life_exp = pd.Series(values, labels)
len(life_exp)

We can display the first *n* values in the Series by calling the associated *head()* function:

In [None]:
# show the first 10 values
life_exp.head(10)

A Series has an associated `index` attribute, which allows us to access the index values alone:

In [None]:
life_exp.index

We can use the Python *in* operator to check whether a particular index label exists in a Series:

In [None]:
"Canada" in life_exp.index

In [None]:
"France" in life_exp.index

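One detail worth knowing: applying *in* to the Series itself (rather than to its `index`) also checks the index labels, not the values. The small Series below is a cut-down example made up for illustration. To test for a value, we can check against the `values` attribute instead.

```python
import pandas as pd

demo = pd.Series([75.77, 82.09, 73.12], ["Argentina", "Australia", "Brazil"])

label_in_series = "Brazil" in demo        # checks the index labels
value_in_series = 73.12 in demo           # also checks the labels, so no match
value_in_values = 73.12 in demo.values    # checks the stored values
print(label_in_series, value_in_series, value_in_values)
```
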
## Accessing Values by Position

A Pandas Series offers a number of different ways to access values. We can use simple position numbers with the `iloc[]` operator, counting from 0 as with standard Python lists:

In [None]:
life_exp.iloc[0]

In [None]:
life_exp.iloc[4]

We can use negative indexing to count from the last position backwards:

In [None]:
# get the last value in the Series
life_exp.iloc[-1]

In [None]:
# get the third last value
life_exp.iloc[-3]

Just like lists, we can also use slicing via the `[i:j]` operation. Remember this includes the elements from position *i* up to but not including position *j*:

In [None]:
# start at position 0, end before position 2
life_exp[0:2]

In [None]:
# start at position 3, end before position 7
life_exp[3:7]

In [None]:
# start at the beginning of the Series, end before position 5
life_exp[:5]

In [None]:
# start at position 8, go to the end of the Series
life_exp[8:]

The `iloc[]` operator accepts both single positions and slices. Using it rather than plain `[]` makes it explicit that we mean positions rather than numeric index labels.

In [None]:
# get the value at position 3
life_exp.iloc[3]

In [None]:
# start at position 3, end before position 7
life_exp.iloc[3:7]
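
The distinction matters most when the index labels are themselves integers. In the sketch below (a made-up Series, not part of the life expectancy data), the labels are deliberately out of order, so position 0 and label 0 refer to different elements.

```python
import pandas as pd

# integer labels that do not match the positions
scores = pd.Series([10.0, 20.0, 30.0], index=[2, 0, 1])

by_position = scores.iloc[0]  # first element in the Series
by_label = scores.loc[0]      # element whose index label is 0
print(by_position, by_label)
```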

We can return multiple specific values by passing in a list of numeric positions to `iloc[]`:

In [None]:
life_exp.iloc[[1, 3, 5, 7]]

In [None]:
life_exp.iloc[[8, 11, 3, 14, 0]]

## Accessing Values by Index Label

We can also access values by their associated index labels defined at creation using the `loc[]` operator:

In [None]:
life_exp.loc["Ireland"]

In [None]:
life_exp.loc["Japan"]

We can return multiple values by passing in a list of index labels to *loc[]*:

In [None]:
life_exp.loc[["Ireland", "Germany", "United Kingdom"]]

In [None]:
life_exp.loc[["Japan", "United Kingdom", "China", "Australia"]]
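
If we ask `loc[]` for a label that is not in the index, Pandas raises a `KeyError`. A minimal sketch, using a made-up three-country Series, of guarding a lookup with the *in* check shown earlier:

```python
import pandas as pd

demo = pd.Series([80.15, 84.36, 74.87], ["Ireland", "Japan", "China"])

label = "France"  # not one of the index labels
if label in demo.index:
    result = demo.loc[label]
else:
    result = None
    print(label, "is not in the index")

# without the check, the lookup fails
try:
    demo.loc[label]
    raised_key_error = False
except KeyError:
    raised_key_error = True
```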

## Applying Conditions to Series

We might want to filter the values in a Pandas Series, to reduce it to a subset of the original values based on some condition applied to the values. We can do this by indexing with a boolean expression.

In [None]:
# check which values match the specified condition
life_exp > 80

To actually apply the filter to the Series, we use the `loc` operator:

In [None]:
# create a new series, with only values > 80
life_exp.loc[life_exp > 80]

In [None]:
# create a new series, with only values <= 80
life_exp.loc[life_exp <= 80]

We can combine several different conditions using a boolean operator like AND (indicated by the character `&`) or OR (indicated by the character `|`). Note that each condition is surrounded by parentheses:

In [None]:
# values > 75 AND < 80
life_exp.loc[(life_exp > 75) & (life_exp < 80)]

In [None]:
# values < 70 OR values > 83
life_exp.loc[(life_exp < 70) | (life_exp > 83)]
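
The parentheses are needed because `&` and `|` bind more tightly than comparison operators such as `>` in Python. Without them, Python first tries to evaluate `75 & life_exp`, which is not what we intended. The sketch below uses a small made-up Series. The exact exception type raised may depend on the Pandas version, so both likely types are caught.

```python
import pandas as pd

demo = pd.Series([73.12, 76.99, 80.99, 84.36],
                 ["Brazil", "Paraguay", "Canada", "Japan"])

# with parentheses: each comparison is evaluated first, then combined
in_range = demo.loc[(demo > 75) & (demo < 80)]
print(in_range)

# without parentheses: parsed as demo > (75 & demo) < 80, which fails
try:
    demo.loc[demo > 75 & demo < 80]
    failed = False
except (TypeError, ValueError) as err:
    failed = True
    print("Error without parentheses:", type(err).__name__)
```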

## Modifying a Series

The easiest way to change elements in an existing Pandas Series is to use the index label and `loc`:

In [None]:
# modify the existing value
life_exp.loc["Chad"] = 50.3

In [None]:
# check the value has changed
life_exp["Chad"]

We can also use position numbers to modify elements, via the *iloc[]* operator:

In [None]:
life_exp.iloc[0] = 75.90
life_exp.head()

We can also add a new element to the Series, by assigning a value to a label that does not yet exist in the index:

In [None]:
life_exp["Norway"] = 82.91
life_exp

## Series Statistics

A Series has associated functions for a range of simple statistical analyses.

In [None]:
# average of the values
life_exp.mean()

In [None]:
# median of the values (the middle value)
life_exp.median()

In [None]:
# the standard deviation of the values (the spread)
life_exp.std()

In [None]:
# range of the values (the minimum and maximum)
life_exp.min(), life_exp.max()

The associated `describe` function gives a useful statistical summary of a Series:

In [None]:
life_exp.describe()
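
The result of `describe` is itself a Series, indexed by the name of each statistic, so individual entries can be pulled out with `loc[]`. A brief sketch on a made-up set of four values:

```python
import pandas as pd

demo = pd.Series([70.0, 75.0, 80.0, 85.0])
summary = demo.describe()

# the summary's index holds the statistic names
print(list(summary.index))

# the "50%" entry is the median, and "mean" matches mean()
print(summary.loc["50%"], demo.median())
print(summary.loc["mean"], demo.mean())
```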

## Sorting Series

To sort a Series, we call its associated `sort_values` function. Note that this returns a new, sorted Series and leaves the original Series unchanged.

In [None]:
# sort lowest to highest
life_exp.sort_values()
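
Because `sort_values` returns a new Series, the original keeps its order unless we store the result. A minimal sketch with a made-up Series:

```python
import pandas as pd

demo = pd.Series([82.09, 49.81, 75.77], ["Australia", "Chad", "Argentina"])

sorted_demo = demo.sort_values()  # keep the sorted copy in a new variable

print(list(demo.index))         # original order is unchanged
print(list(sorted_demo.index))  # ordered from lowest to highest value
```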

By default values are ordered in ascending order. We can sort in descending order, by specifying the argument `ascending=False`:

In [None]:
# sort highest to lowest
life_exp.sort_values(ascending=False)

We can get the 10 highest values by combining `sort_values` and `head`:

In [None]:
life_exp.sort_values(ascending=False).head(10)
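
As an alternative not used in this notebook, a Series also has an `nlargest` function that returns the *n* highest values directly. A sketch on a made-up Series, showing that it gives the same result as sorting and then taking the head:

```python
import pandas as pd

demo = pd.Series([82.09, 49.81, 75.77, 84.36],
                 ["Australia", "Chad", "Argentina", "Japan"])

top_sorted = demo.sort_values(ascending=False).head(2)
top_direct = demo.nlargest(2)
print(top_direct)
```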

We can also sort a Series based on its index labels, by calling `sort_index`:

In [None]:
life_exp.sort_index()