# Lecture 3: Pandas [`Series`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html) 1

* How to create a `Series`
* How to access a `Series`' elements
* How to modify a `Series`
* `Series` data types

In [1]:
import pandas as pd

## How to Create a `Series`

Often you will work with `Series` that come directly from your data, but to practice we will create a `Series` from scratch.

In this case, we use `range(...)` to generate elements and we don't supply an *explicit index*:

In [2]:
s = pd.Series(range(6))
s

0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64

```
0    0
1    1
2    2
3    3
4    4
5    5
dtype: int64
^    ^
|    values
explicit index
```


Note how the index is simply a sequence of integers starting at 0, as we're used to from Python lists. This is the *implicit index*. Since we did not specify an explicit index, the implicit index and the explicit index are the same.
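To see what pandas builds when no index is given, here is a small check. It assumes a reasonably recent pandas version, where the default index is a `RangeIndex`:

```python
import pandas as pd

s = pd.Series(range(6))

# Without an explicit index, pandas creates a RangeIndex 0, 1, ..., 5
print(type(s.index).__name__)   # RangeIndex
print(list(s.index))            # [0, 1, 2, 3, 4, 5]

# Here, label 3 and position 3 refer to the same element
print(s.loc[3], s.iloc[3])      # 3 3
```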

Next, we supply an *explicit index*, `['a', 'b', 'c', 'd', 'e', 'f']`, through the optional `index` argument:

In [3]:
s = pd.Series(range(6), index=list('abcdef'))
s

a    0
b    1
c    2
d    3
e    4
f    5
dtype: int64

Now we see the explicit index. Nonetheless, the implicit index still exists, and we will see shortly how to access it.

Of course:
* `list('abcdef') == ['a', 'b', 'c', 'd', 'e', 'f']`, as we learned in the previous two lectures;
* there must be exactly as many index labels as there are elements.
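A quick sketch of what happens when the counts do not match: pandas refuses to build the `Series` and raises a `ValueError` (the exact message wording may differ between versions):

```python
import pandas as pd

error = None
try:
    # 6 values, but only 3 labels
    pd.Series(range(6), index=list('abc'))
except ValueError as err:
    error = err

print(type(error).__name__)   # ValueError
```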

We can access the NumPy array that a `Series` uses to store its data through the `.values` attribute:

In [4]:
s.values

array([0, 1, 2, 3, 4, 5], dtype=int64)
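The explicit index is available through the `.index` attribute. For the values, the pandas documentation recommends `.to_numpy()` over `.values` in newer code; both return the same data for this `Series`. A minimal sketch:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

labels = s.index.tolist()   # the explicit index as a plain Python list
values = s.to_numpy()       # the stored values as a NumPy array

print(labels)           # ['a', 'b', 'c', 'd', 'e', 'f']
print(values.tolist())  # [0, 1, 2, 3, 4, 5]
```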

## Accessing a `Series`' Elements

### `Series[i]` and `Series[i:j]`
#### Explicit index
Single item:

In [5]:
s['c']

2

Slicing (unlike with Python lists, the end label `j` is *included*):

In [6]:
s['b':'c']

b    1
c    2
dtype: int64

Multiple items:

In [7]:
s[['a', 'c']]

a    0
c    2
dtype: int64

#### Implicit index

Single item:

In [8]:
s[2]

2

Slicing (the end position `j` is excluded, as with Python lists):

In [9]:
s[1:3]

b    1
c    2
dtype: int64

Multiple items:

In [10]:
s[[0, 2]]

a    0
c    2
dtype: int64
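The integer lookups above work only because pandas falls back to positions when the index holds strings. In recent pandas versions (2.1 onwards), `s[2]` and `s[[0, 2]]` on a non-integer index emit a `FutureWarning` saying that integers will be treated as labels in the future, so the `.iloc` accessor described below is the more robust choice. A sketch of the same three lookups with `.iloc`:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# .iloc always interprets integers as positions, whatever the labels are
print(s.iloc[2])                 # 2
print(s.iloc[1:3].tolist())      # [1, 2]
print(s.iloc[[0, 2]].tolist())   # [0, 2]
```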

### `Series.loc[...]`

`.loc` accesses elements by their explicit index (labels) only.

Single item:

In [11]:
s.loc['e']

4

Slicing (again, the end label `j` is included):

In [12]:
s.loc['e':'f']

e    4
f    5
dtype: int64

When slicing, the bounds do not have to exist in the index, as long as the index is sorted:

In [13]:
s.loc['e':'g']

e    4
f    5
dtype: int64
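This works because the labels `'a'` to `'f'` are sorted, so pandas can work out where `'g'` would fall. The sketch below uses two made-up `Series` to contrast this with an unsorted index, where a missing slice bound raises a `KeyError`:

```python
import pandas as pd

sorted_s = pd.Series(range(3), index=['a', 'b', 'c'])
unsorted_s = pd.Series(range(3), index=['c', 'a', 'b'])

# Sorted index: 'z' lies past the end, so the slice simply stops there
print(sorted_s.loc['b':'z'].tolist())   # [1, 2]

# Unsorted index: pandas cannot place 'z', so it raises KeyError
try:
    unsorted_s.loc['a':'z']
    raised = False
except KeyError:
    raised = True
print(raised)   # True
```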

Multiple items:

In [14]:
s.loc[list('ef')]

e    4
f    5
dtype: int64
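Unlike slicing, a list of labels must not contain missing labels: since pandas 1.0, `.loc` raises a `KeyError` if any label in the list is absent. If missing labels should become `NaN` instead, `reindex` does that. A minimal sketch:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

# 'g' is not a label of s, so .loc raises KeyError
try:
    s.loc[['e', 'g']]
    raised = False
except KeyError:
    raised = True
print(raised)   # True

# reindex fills missing labels with NaN (the dtype becomes float64)
r = s.reindex(['e', 'g'])
print(r.tolist())   # [4.0, nan]
```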

Positions (the implicit index) do not work with `.loc`, because it only looks up labels:

In [15]:
s.loc[1]

KeyError: 1

### `Series.iloc[...]`

`.iloc` accesses elements by their implicit index (position) only.

Single item:

In [None]:
s.iloc[3]

3

Slicing (excludes `j`; as with Python lists, the end position may be out of range):

In [16]:
s.iloc[3:10]

d    3
e    4
f    5
dtype: int64

Multiple items:

In [17]:
s.iloc[[0, 4]]

a    0
e    4
dtype: int64
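The difference between `.loc` and `.iloc` matters most when the explicit index itself consists of integers. In the hypothetical `Series` `t` below, the labels run backwards, so label `0` and position `0` point at different elements, and plain `t[0]` looks up the label:

```python
import pandas as pd

t = pd.Series([10, 20, 30], index=[2, 1, 0])

print(t.loc[0])    # 30 (label 0 is the last element)
print(t.iloc[0])   # 10 (position 0 is the first element)
print(t[0])        # 30 (with an integer index, [] looks up labels)
```

This ambiguity of plain `[]` is a good reason to prefer `.loc` and `.iloc` whenever the intent is not obvious.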

## Modifying a `Series`

We can use the same indexing syntax to modify existing elements of a `Series` or to insert new ones.

This is what the `Series` looks like at the moment:

In [18]:
s

a    0
b    1
c    2
d    3
e    4
f    5
dtype: int64

Modify ("set") one element:

In [19]:
s['a'] = 10
s

a    10
b     1
c     2
d     3
e     4
f     5
dtype: int64

Set all elements:

In [20]:
s[:] = 10
s

a    10
b    10
c    10
d    10
e    10
f    10
dtype: int64

Set some elements using slicing:

In [21]:
s[:2] = 20
s

a    20
b    20
c    10
d    10
e    10
f    10
dtype: int64
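`.loc` and `.iloc` work for assignment too. A short sketch on a fresh `Series`, so the running example above is not affected:

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

s.loc['b':'d'] = 7   # label slice: 'b', 'c' and 'd' are all set
s.iloc[-1] = 99      # position: the last element
print(s.tolist())    # [0, 7, 7, 7, 4, 99]
```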

Insert a new element by assigning to a label that does not exist yet:

In [22]:
s['g'] = 0
s

a    20
b    20
c    10
d    10
e    10
f    10
g     0
dtype: int64

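Delete one element. `drop` returns a new `Series` and leaves the original untouched, which is why we assign the result back to `s`: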

In [23]:
s = s.drop('d')
s

a    20
b    20
c    10
e    10
f    10
g     0
dtype: int64
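`drop` also accepts a list of labels. A sketch on a fresh `Series` that also checks that the original is left unchanged:

```python
import pandas as pd

s = pd.Series(range(4), index=list('abcd'))

smaller = s.drop(['a', 'c'])    # drop several labels at once
print(smaller.index.tolist())   # ['b', 'd']
print(len(s))                   # 4: the original is unchanged
```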

Often, it is better to use masking for deletion; we will see how that works next.

## `Series` Data Types

When we display a `Series` in a Jupyter notebook, its data type is shown automatically:

```
dtype: int64
```

This means that all elements of the `Series` share the same data type (or `dtype`), here a 64-bit integer.

We can also check a `Series`' data type directly:

In [24]:
s.dtype

dtype('int64')
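`int64` is only one of several common dtypes. The sketch below shows a few of them and how to request a dtype explicitly with the `dtype` argument or convert one with `astype`:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([1, 2, 3], dtype='float64')   # request a dtype explicitly
flags = pd.Series([True, False])

print(ints.dtype, floats.dtype, flags.dtype)   # int64 float64 bool
print(ints.astype('float64').dtype)            # float64
```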

What happens if we add a string?

In [25]:
s['h'] = 'Hello!'
s

a        20
b        20
c        10
e        10
f        10
g         0
h    Hello!
dtype: object

Now the `Series` is of type `object`. An `object` `Series` can store any Python object, but it does so by keeping a reference to a separate Python object for each element instead of storing the values in a compact NumPy array. This costs considerable memory and makes computations much slower. Since fast computation matters most for numerical data (such as integers and floats), we **never mix numerical data types with other data types**.
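A rough way to see the memory cost is `memory_usage(deep=True)`, which also counts the Python objects an `object` `Series` points to. The sketch below compares the same 1000 integers stored as `int64` and as `object`; the exact byte counts depend on the platform and pandas version, so none are shown:

```python
import pandas as pd

numbers = pd.Series(range(1000))
as_objects = numbers.astype(object)   # same values, stored as Python objects

# deep=True includes the size of each referenced Python object
print(numbers.memory_usage(deep=True))
print(as_objects.memory_usage(deep=True))
```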

© 2023 Philipp Cornelius