# Lecture 3: Pandas [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/series.html) 2

* How to use masking

We begin with the Pandas import and the same `Series` as in the previous Jupyter notebook (JNB):

In [1]:
import pandas as pd

In [2]:
s = pd.Series(range(6), index=list('abcdef'))
s

a    0
b    1
c    2
d    3
e    4
f    5
dtype: int64

## Series Read Access

### `Series[bitmask]`

A comparison on its own already creates the mask: a `bool` `Series` with one entry per row:

In [3]:
s == 3

a    False
b    False
c    False
d     True
e    False
f    False
dtype: bool

In [4]:
type(s == 3)

pandas.core.series.Series

Now let's use it for selection:

In [5]:
s[s == 3]

d    3
dtype: int64

`Series[bitmask]` is the standard way to apply a bitmask to a `Series`.

You can use all the standard comparison operators: `==`, `!=`, `<`, …
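A short sketch with a few more comparison operators (the variable names `not_three`, `at_least_four` and `mask` are illustrative). Because a mask is an ordinary `bool` `Series`, it can also be stored in a variable and reused:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Each comparison returns a bool Series with the same index as s
not_three = s[s != 3]      # every row except 'd'
at_least_four = s[s >= 4]  # rows 'e' and 'f'

# A mask is an ordinary Series, so it can be stored and reused
mask = s <= 1
print(s[mask])             # rows 'a' and 'b'
```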

### `Series.loc[bitmask]`

Masking also works on `.loc[]`:

In [6]:
s.loc[s == 3]

d    3
dtype: int64
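One detail worth knowing: a boolean `Series` used as a mask is matched to `s` by index label, not by position. In the sketch below (`mask_reversed` is an illustrative name) the mask's order is reversed, yet `.loc[]` still selects only `'d'`, because pandas aligns the mask's labels with the labels of `s` before selecting:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))
mask = s == 3

# Reverse the mask's order: its labels now run 'f', 'e', ..., 'a'
mask_reversed = mask.iloc[::-1]

# .loc aligns the mask to s by label before selecting,
# so the reversed order makes no difference to the result
selected = s.loc[mask_reversed]
print(selected)  # only row 'd'
```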

### ~`Series.iloc[bitmask]`~

`.iloc[]` does not accept a boolean `Series` as a mask, since `.iloc[]` is purely position-based and the mask carries index labels:

In [7]:
s.iloc[s == 3]

ValueError: iLocation based boolean indexing cannot use an indexable as a mask
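If you do want to mask with `.iloc[]`, one option is to strip the labels first, for example with `Series.to_numpy()`. A plain boolean NumPy array is matched to the rows by position (`positional_mask` and `result` are illustrative names):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# to_numpy() turns the bool Series into a plain NumPy array without labels
positional_mask = (s == 3).to_numpy()

# .iloc accepts the plain array and applies it row by row, by position
result = s.iloc[positional_mask]
print(result)  # only row 'd'
```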

## Modifying `Series` with Masks

As with other forms of indexing, masks can also appear on the left-hand side of an assignment to modify existing data:

In [8]:
s[s < 3] = 0
s

a    0
b    0
c    0
d    3
e    4
f    5
dtype: int64
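The same modification can also be written with `.loc[]`. The sketch additionally shows that the right-hand side may be computed from the selected values themselves; the assigned `Series` is matched to the masked rows by label:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Same as s[s < 3] = 0, written with .loc
s.loc[s < 3] = 0

# Multiply the values above 3 by ten, in place
s.loc[s > 3] = s[s > 3] * 10
print(s)  # a..c -> 0, d -> 3, e -> 40, f -> 50
```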

To delete observations from a `Series`, we usually also use masking, keeping only the rows we want:

In [9]:
s = s[s < 3]
s

a    0
b    0
c    0
dtype: int64

Here we simply overwrite the original `Series` `s` with the masked copy. There are other ways to delete data from a `Series`, but for deleting by condition, masking is the most direct and usually also the fastest.
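For comparison, a sketch of one such alternative: `Series.drop()` with the labels of the unwanted rows. Both versions return a new `Series` and leave `s` itself unchanged (`kept` and `dropped` are illustrative names):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Approach 1: keep only the rows that satisfy the condition
kept = s[s < 3]

# Approach 2: collect the labels of the unwanted rows and drop them
dropped = s.drop(s[s >= 3].index)

print(kept.equals(dropped))  # both contain rows 'a', 'b', 'c'
```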

## Advanced Masking

Let's quickly re-create our original `Series`:

In [10]:
s = pd.Series(range(6), index=list('abcdef'))
s

a    0
b    1
c    2
d    3
e    4
f    5
dtype: int64

### Combining Conditions

Every comparison (e.g., `<`, `==`) must be wrapped in parentheses when it is combined with other conditions, because in Python the combining operators `&`, `|` and `^` bind more tightly than comparisons. For example:

In [11]:
s[(s < 3) | (s > 4)]

a    0
b    1
c    2
f    5
dtype: int64
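Leaving the parentheses out is not just a matter of style: `s < 3 | s > 4` is parsed as the chained comparison `s < (3 | s) > 4`, which needs the truth value of a whole `Series` and therefore raises a `ValueError`. A small sketch to check this (`ok` and `error_raised` are illustrative names):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Correct: each comparison is evaluated first, then combined element-wise
ok = s[(s < 3) | (s > 4)]

# Without parentheses, Python computes 3 | s first and then tries a
# chained comparison, which asks for the truth value of a whole Series
try:
    s[s < 3 | s > 4]
    error_raised = False
except ValueError as exc:
    error_raised = True
    print(f"ValueError: {exc}")
```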

You can use the following bitwise operators to combine conditions:
* And: `&`
* Or: `|`
* Xor: `^`
* Not (negation): `~`

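(These are Python's bitwise operators, analogous to Java's `&`, `|`, `^` and `~`; on `bool` `Series` they work element-wise. The keywords `and`, `or` and `not` cannot be used here, because they try to reduce the whole `Series` to a single `bool` and raise a `ValueError`.)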
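A sketch of all four operators on two masks (`low` and `even` are illustrative names), plus a check that the keyword `and` fails on a `Series`:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

low = s < 3          # True for 'a', 'b', 'c'
even = s % 2 == 0    # True for 'a', 'c', 'e'

both = s[low & even]         # 'a', 'c'
either = s[low | even]       # 'a', 'b', 'c', 'e'
exactly_one = s[low ^ even]  # 'b', 'e'
not_low = s[~low]            # 'd', 'e', 'f'

# The keyword 'and' needs a single bool, so it cannot combine two Series
try:
    s[low and even]
    and_failed = False
except ValueError:
    and_failed = True
```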

### Masking on Indices

For selecting by index, we can generally also use normal indexing and slicing:

In [12]:
s[:3]

a    0
b    1
c    2
dtype: int64

In [13]:
s[3:5]

d    3
e    4
dtype: int64

In [14]:
s['c':'e']

c    2
d    3
e    4
dtype: int64
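Note the end points in the outputs above: the positional slice `s[3:5]` stops before position 5, while the label slice `s['c':'e']` includes `'e'`. Writing the same slices with `.iloc[]` and `.loc[]` makes the intent explicit (`by_position` and `by_label` are illustrative names):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Positional slice: the end position 5 is excluded
by_position = s.iloc[3:5]    # 'd', 'e'

# Label slice: the end label 'e' is included
by_label = s.loc['c':'e']    # 'c', 'd', 'e'

print(by_position)
print(by_label)
```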

To build a mask from the explicit index, we can use `Series.index`, which returns the `Index` object. Comparisons on an `Index` produce a boolean array that works as a mask just like before:

In [15]:
s[(s.index == 'a') | (s.index == 'e')]

a    0
e    4
dtype: int64
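For longer lists of labels, `Index.isin()` builds the same kind of mask without chaining `|`. Index masks can also be combined with value masks; the sketch below (`index_mask`, `selected` and `combined` are illustrative names) keeps the rows whose label sorts after `'b'` and whose value is above 3:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# isin() returns a plain NumPy bool array, one entry per label
index_mask = s.index.isin(['a', 'e'])
selected = s[index_mask]     # rows 'a' and 'e'

# Value mask & index mask, combined element-wise
combined = s[(s > 3) & (s.index > 'b')]   # rows 'e' and 'f'
print(selected)
print(combined)
```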

In this case, this is equivalent to selecting multiple labels with a list:

In [16]:
s[['a', 'e']]

a    0
e    4
dtype: int64
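The two forms do differ in edge cases: a mask always keeps the row order of `s` and simply matches nothing for a label that does not exist, while a label list returns rows in list order and raises a `KeyError` for a missing label. A sketch (the label `'z'` is a deliberately missing, hypothetical label):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Label list: rows come back in the order of the list
by_list = s[['e', 'a']]                            # 'e', then 'a'

# Mask: rows keep the order of s
by_mask = s[(s.index == 'e') | (s.index == 'a')]   # 'a', then 'e'

# A missing label is an error for the list...
try:
    s[['a', 'z']]
    missing_raises = False
except KeyError:
    missing_raises = True

# ...but in a mask it just matches no row
partial = s[(s.index == 'a') | (s.index == 'z')]   # only 'a'
```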

### [`Series.where()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html)
Similar to masking, but instead of returning only the matching rows, it returns all rows and replaces mismatches with `NaN` (missing value, from `numpy.nan`). Because `NaN` is a floating-point value, an integer `Series` is converted to `float64`, as the output below shows:

In [17]:
s.where(s == 3)

a    NaN
b    NaN
c    NaN
d    3.0
e    NaN
f    NaN
dtype: float64
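Two related options, sketched below: the `other` argument of `Series.where()` sets a replacement value to use instead of `NaN`, and `Series.mask()` is the inverse of `where()`, replacing the rows where the condition holds (`kept_or_zero` and `masked` are illustrative names):

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# Keep rows where s == 3, replace all others with 0 instead of NaN
kept_or_zero = s.where(s == 3, other=0)

# Inverse: replace the rows where s == 3 (with NaN by default)
masked = s.mask(s == 3)

print(kept_or_zero)
print(masked)
```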

© 2023 Philipp Cornelius