# Series Introduction


In [53]:
import pandas as pd
import random

print(pd.__version__)


1.5.1


In [66]:
def generate_random_series(length):
    return pd.Series(data=[random.randint(0, 100) for _ in range(length)])


In [2]:
list_of_names = ['a', 'b', 'c']


Series can be created from a list


In [3]:
pd.Series(list_of_names)


0    a
1    b
2    c
dtype: object

> Notice how values of different types can be stored in the same Series; pandas then falls back to the generic `object` dtype.


In [4]:
mixed = [True, 'true', 11, 11.332, {'key': 'value'}]
print(pd.Series(mixed))


0                True
1                true
2                  11
3              11.332
4    {'key': 'value'}
dtype: object
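
As a small sketch of what the `object` dtype means in practice, the snippet below checks the Python type of each stored element. The variable names are illustrative only and are not part of the original notebook.

```python
import pandas as pd

# A Series holding values of several unrelated Python types.
mixed_series = pd.Series([True, "true", 11, 11.332, {"key": "value"}])

# With the object dtype, each element keeps its original Python type.
element_types = [type(value).__name__ for value in mixed_series]
print(mixed_series.dtype)
print(element_types)
```

Because the elements are plain Python objects, operations on such a Series are generally slower than on a Series with a single numeric dtype.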


## Using lists and dictionaries to create a series


In [5]:
list_1 = [random.randint(0, 100) for _ in range(5)]
list_series = pd.Series(list_1)

# Duplicate random keys overwrite each other, so the dict may hold fewer than 5 entries.
dict_1 = {random.randint(0, 100): random.randint(0, 100) for _ in range(5)}
dict_series = pd.Series(dict_1)

print(f"list_series: \n{list_series}\n")
print(f"dict_series: \n{dict_series}\n")


list_series: 
0    92
1    47
2    23
3    19
4    17
dtype: int64

dict_series: 
15    86
80    79
19    56
18    90
36    57
dtype: int64
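
The dictionary case can be made easier to follow with fixed keys. The sketch below uses a hypothetical `ages` dictionary (not from the notebook) to show that keys become index labels, and that passing `index` selects keys and fills missing ones with `NaN`.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
ages = {"pota": 9, "django": 10}

# Dictionary keys become the index labels, values become the data.
ages_series = pd.Series(ages)

# Passing an index selects those keys; labels missing from the dict become NaN,
# which also turns the dtype into float64.
subset = pd.Series(ages, index=["django", "mikki"])

print(ages_series)
print(subset)
```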



## `dtype` attribute

- The type inference in pandas is usually good enough, so this attribute rarely needs to be specified.

> If a list of strings is passed in, the `dtype` would be inferred as _object_.
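
For the cases where inference is not what we want, `dtype` can be passed explicitly. This is a minimal sketch with made-up values; the dtypes chosen here are just examples.

```python
import pandas as pd

# Inferred: a list of Python ints becomes int64.
inferred_ints = pd.Series([1, 2, 3])

# Explicit: request a specific dtype at construction time.
as_float = pd.Series([1, 2, 3], dtype="float64")
as_category = pd.Series(["a", "b", "a"], dtype="category")

print(inferred_ints.dtype, as_float.dtype, as_category.dtype)
```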

## Index

We can specify the index explicitly instead of relying on the default integer index of a series.


In [6]:
index_pd = pd.Series(data=list_1, index=[
                     "one", "two", "three", "four", "five"])
print(index_pd.index, type(index_pd.index), sep="\n")


Index(['one', 'two', 'three', 'four', 'five'], dtype='object')
<class 'pandas.core.indexes.base.Index'>


In [7]:
customIndex = pd.RangeIndex(start=0, stop=10, step=2)
print(customIndex, type(customIndex))


RangeIndex(start=0, stop=10, step=2) <class 'pandas.core.indexes.range.RangeIndex'>


In [8]:
customIndexSeries = pd.Series(data=list(range(5)), index=customIndex)
print(customIndexSeries)


0    0
2    1
4    2
6    3
8    4
dtype: int64


In [9]:
list_names = ["pota", "django", "kitty", "mikki"]
list_ages = [random.randint(5, 10) for _ in range(4)]


In [10]:
series_names = pd.Series(data=list_names, index=list_ages, name="Cats")
print(series_names)


9       pota
10    django
8      kitty
6      mikki
Name: Cats, dtype: object


## `head()` and `tail()` methods


In [11]:
series_names.head(n=1), series_names.tail(n=1)


(9    pota
 Name: Cats, dtype: object,
 6    mikki
 Name: Cats, dtype: object)

## `size` to get the length of the series

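When a series is too long to print in full, pandas truncates the output. The number of rows shown in that truncated view can be set with
`pd.options.display.min_rows = 40`

This would display 40 rows whenever the output is truncated, that is, when the series has more rows than `pd.options.display.max_rows`.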


In [12]:
series_names.size


4
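
The sketch below compares `size` with two other ways of counting elements, and shows one way to change display options temporarily with `pd.option_context`. The series contents and option values are arbitrary.

```python
import pandas as pd

cats = pd.Series(["pota", "django", "kitty", "mikki"], name="Cats")

# Three ways to count the elements of a Series.
print(cats.size, len(cats), cats.shape)

# Change display options only inside the with-block,
# leaving the global settings untouched afterwards.
with pd.option_context("display.max_rows", 10, "display.min_rows", 4):
    print(pd.Series(range(100)))
```

Using `option_context` avoids having to reset `pd.options.display` by hand after experimenting.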

## Extracting by index position

Elements in a series can be accessed by integer position, the same way we index a list, as well as by label.


In [19]:
from string import ascii_lowercase, ascii_uppercase

letters_lowercase = pd.Series(list(ascii_lowercase))
print(letters_lowercase.head(5))


0    a
1    b
2    c
3    d
4    e
dtype: object


In [24]:
labeled_letters = pd.Series(
    data=list(ascii_uppercase), index=letters_lowercase)
print(labeled_letters.head(4))


a    A
b    B
c    C
d    D
dtype: object


In [36]:
# first letter
print(labeled_letters[0] == labeled_letters.get(
    key='a') == labeled_letters['a'])

# 11th letter
print(labeled_letters[10] == labeled_letters.get(
    key='k') == labeled_letters['k'])

# first 2 letters by position vs. first 3 letters by label
# notice how a label slice includes the end label, while a positional slice excludes the end position
print(labeled_letters[:2], labeled_letters[:'c'])

# last 6 letters
print(labeled_letters['u':])


True
True
a    A
b    B
dtype: object a    A
b    B
c    C
dtype: object
u    U
v    V
w    W
x    X
y    Y
z    Z
dtype: object


## `add_prefix()` and `add_suffix()`

These methods can be used to add a prefix or a suffix to the _labels_ of a Series/DataFrame.

> These are not in-place operations; they return a new object and leave the original unchanged.


In [39]:
print(letters_lowercase.add_prefix('index_').head(3))
print(letters_lowercase.head(3))


index_0    a
index_1    b
index_2    c
dtype: object
0    a
1    b
2    c
dtype: object
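
`add_suffix()` works the same way. A short sketch, using a small throwaway series rather than the one from the notebook:

```python
import pandas as pd

letters = pd.Series(["a", "b", "c"])

# add_suffix returns a new Series whose labels are strings with the suffix appended.
suffixed = letters.add_suffix("_row")

print(suffixed.index.tolist())
print(letters.index.tolist())  # the original labels are unchanged
```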


## `loc` and `iloc`

- `loc` can be used to locate elements in a Series/DataFrame. It is _label based_, which means we need to specify the rows and columns by their labels.

- `iloc` is _integer based_ so we need to specify the **integer position values**.

- Syntax for `loc` - `loc[row_label, col_label]`
- Syntax for `iloc` - `iloc[row_position, col_position]`

Example - `labeled_letters.loc['d'] == labeled_letters.iloc[3]` _(returns True)_
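
The `[row, col]` form applies to a DataFrame; a Series has a single axis, so only the row part is used. Here is a minimal sketch with a hypothetical two-column DataFrame, where the column names are made up for illustration:

```python
import pandas as pd

# Hypothetical DataFrame with string row labels.
df = pd.DataFrame(
    {"upper": ["A", "B", "C"], "code": [65, 66, 67]},
    index=["a", "b", "c"],
)

# loc takes labels, iloc takes integer positions.
by_label = df.loc["b", "code"]
by_position = df.iloc[1, 1]

print(by_label, by_position)
```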

### Boolean mask

- A boolean mask is a sequence of `True`/`False` values used with `loc` and `iloc` to select many items at once
- The mask needs to be of the same length as the series

- Besides a boolean mask, `loc` and `iloc` also accept a callable, a slice, or a list of values.


In [67]:
random_mask = [random.randint(0, 2) == 0 for _ in range(26)]
print(letters_lowercase.loc[random_mask])


0     a
2     c
3     d
4     e
7     h
8     i
13    n
16    q
17    r
22    w
24    y
dtype: object
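
In practice a mask is usually derived from the data itself rather than generated at random. The sketch below builds one from a comparison; the `squares` series is an assumption made for this example. Note that `iloc` does not accept a boolean Series, so the mask is converted to a NumPy array first.

```python
import pandas as pd

squares = pd.Series([i**2 for i in range(10)])

# A comparison produces a boolean Series with the same length and index.
is_even = squares % 2 == 0

# loc accepts the boolean Series directly.
evens = squares.loc[is_even]

# iloc rejects a boolean Series, so pass the underlying array instead.
evens_by_position = squares.iloc[is_even.to_numpy()]

print(evens)
```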


In [80]:
labeled_letters.loc['d'] == labeled_letters.iloc[3]


True

### Passing a list of values to `loc` and `iloc`


In [84]:
print(labeled_letters.loc[['a', 'b', 'f']],
      labeled_letters.iloc[[3, 5, 13]], sep="\n")


a    A
b    B
f    F
dtype: object
d    D
f    F
n    N
dtype: object


### Selecting using a slice


In [87]:
print(labeled_letters.loc['a':'f'], labeled_letters.iloc[2:4], sep="\n")


a    A
b    B
c    C
d    D
e    E
f    F
dtype: object
c    C
d    D
dtype: object


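> `loc` and `iloc` are interchangeable for single-element access when the labels are the default 0-based integers. Slices still differ: `loc` includes the end label, while `iloc` excludes the end position.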
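
A quick sketch of how `loc` and `iloc` behave on a series with default 0-based labels (the series here is a throwaway example):

```python
import pandas as pd

letters = pd.Series(list("abcde"))  # default labels 0..4

# Single-element access gives the same result.
same_element = letters.loc[1] == letters.iloc[1]

# Slices differ: loc includes the end label, iloc excludes the end position.
label_slice = letters.loc[1:3]
position_slice = letters.iloc[1:3]

print(same_element)
print(label_slice.tolist(), position_slice.tolist())
```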

### Using a callable


In [91]:
labeled_letters.loc[lambda x: [i < 10 for i in range(x.size)]]


a    A
b    B
c    C
d    D
e    E
f    F
g    G
h    H
i    I
j    J
dtype: object
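
The callable receives the series itself, so it can also compute the selection from the values or labels. The following is a sketch under that assumption; `labeled` mirrors the notebook's `labeled_letters` but is rebuilt here so the block is self-contained.

```python
import pandas as pd
from string import ascii_lowercase, ascii_uppercase

labeled = pd.Series(list(ascii_uppercase), index=list(ascii_lowercase))

# The callable gets the Series and returns a boolean mask built from its values.
vowels = labeled.loc[lambda s: s.isin(list("AEIOU"))]

# A callable may also return labels; here, the first ten index labels.
first_ten = labeled.loc[lambda s: s.index[:10]]

print(vowels)
print(first_ten.tolist())
```

Callables are mostly useful in method chains, where the intermediate series has no name to refer to.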

## Selection of data in a nutshell

### Selection by label

| **Approach**              | **Example**                      | **Comments**                                                                       |
| ------------------------- | -------------------------------- | ---------------------------------------------------------------------------------- |
| indexing                  | `series['label_name']`           | slices, callables, boolean masks                                                   |
| `loc`                     | `series.loc['label']`            | slices, callables, boolean masks                                                   |
| direct access using _dot_ | `series.label`                   | no slice or boolean mask support                                                   |
| `.get()`                  | `series.get('label', default=0)` | no slice support, provides default value, can exit gracefully if label not present |

### Selection by Position

| **Approach** | **Example**                      | **Comments**                                                                       |
| ------------ | -------------------------------- | ---------------------------------------------------------------------------------- |
| indexing     | `series[index]`                  | slices, callables, boolean masks                                                   |
| `iloc`       | `series.iloc[index]`             | slices, callables, boolean masks                                                   |
| `.get()`     | `series.get(index, default=0)`   | no slice support, provides default value, can exit gracefully if position not present |
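
To illustrate the difference between `.get()`, dot access, and plain indexing when a label is missing, here is a small sketch with a two-element series invented for the example:

```python
import pandas as pd

s = pd.Series({"a": "A", "b": "B"})

present = s.get("a")                     # label exists
missing = s.get("z")                     # label absent: returns None
with_default = s.get("z", default="?")   # label absent: returns the default
via_dot = s.a                            # dot access works for simple string labels

# Plain indexing raises instead of returning a default.
try:
    s["z"]
    raised = False
except KeyError:
    raised = True

print(present, missing, with_default, via_dot, raised)
```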


In [97]:
# exercise
squares = pd.Series(data=[i**2 for i in range(100)])
print(squares.tail(3), squares[-3:], squares[-3:] == squares.tail(3), sep="\n")

97    9409
98    9604
99    9801
dtype: int64
97    9409
98    9604
99    9801
dtype: int64
97    True
98    True
99    True
dtype: bool
