Let's import the pandas library

In [1]:
import pandas as pd

Let's perform element-wise operations and broadcasting on Series

In [2]:
series1 = pd.Series([10, 20, 30])
series2 = pd.Series([1, 2, 3])

In [3]:
series1 + series2

0    11
1    22
2    33
dtype: int64

In [4]:
series1.add(series2)

0    11
1    22
2    33
dtype: int64

In [5]:
series1 * series2

0    10
1    40
2    90
dtype: int64

In [6]:
series1.mul(series2)

0    10
1    40
2    90
dtype: int64

In [7]:
series1 + 5

0    15
1    25
2    35
dtype: int64

In [8]:
series1.add(5)

0    15
1    25
2    35
dtype: int64
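The Series above share the same default index, so the element-wise results look positional. As a minimal sketch, using two illustrative Series that are not part of the cells above, we can see that pandas actually pairs values by index label:

```python
import pandas as pd

# Illustrative Series (hypothetical data): same labels, different order.
left = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
right = pd.Series([1, 2, 3], index=['c', 'b', 'a'])

# Values are paired by index label, not by position,
# so 10 (label 'a') is added to 3 (label 'a').
aligned_sum = left + right
print(aligned_sum.loc['a'], aligned_sum.loc['b'], aligned_sum.loc['c'])
```

The operator form and the method form (`left.add(right)`) align labels in the same way.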

Let's imagine you have two devices which count the number of steps you take: a phone and a smart watch. You typically only use one at a time, and these devices are not synced with each other. You would like to tally the total steps that you walked per day.

In [9]:
phone_steps = pd.Series(data = [6000, 7400, 6750, 8000, 5500], index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'], name = 'Phone')
phone_steps

Mon    6000
Tue    7400
Wed    6750
Thu    8000
Fri    5500
Name: Phone, dtype: int64

In [10]:
smartwatch_steps = pd.Series(data = [5500, 4750, 6250], index = ['Mon', 'Wed', 'Fri'], name = 'Smart Watch')
smartwatch_steps

Mon    5500
Wed    4750
Fri    6250
Name: Smart Watch, dtype: int64

In [None]:
total_steps = phone_steps + smartwatch_steps
total_steps

Fri    11750.0
Mon    11500.0
Thu        NaN
Tue        NaN
Wed    11500.0
dtype: float64
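The `NaN` entries appear because Tue and Thu exist only in the phone data, so there is nothing to add them to. A small sketch, rebuilding the same two Series so it runs on its own, lists the unmatched labels and checks the dtype of the result:

```python
import pandas as pd

phone_steps = pd.Series([6000, 7400, 6750, 8000, 5500],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'], name='Phone')
smartwatch_steps = pd.Series([5500, 4750, 6250],
                             index=['Mon', 'Wed', 'Fri'], name='Smart Watch')

# Labels present in the phone data but absent from the smart watch data.
unmatched_days = phone_steps.index.difference(smartwatch_steps.index)
print(list(unmatched_days))

total_steps = phone_steps + smartwatch_steps
# NaN is a floating-point value, so the integer inputs are upcast to float64.
print(total_steps.dtype)
```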

In [15]:
phone_steps.add(smartwatch_steps, fill_value = 0)  # set the missing values to 0 before adding

Fri    11750.0
Mon    11500.0
Thu     8000.0
Tue     7400.0
Wed    11500.0
dtype: float64

In [16]:
phone_steps.add(smartwatch_steps, fill_value = 100)  # set the missing values to 100 before adding

Fri    11750.0
Mon    11500.0
Thu     8100.0
Tue     7500.0
Wed    11500.0
dtype: float64
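One detail worth knowing: `fill_value` only substitutes a value when exactly one side is missing. The sketch below uses hypothetical readings (a phone that recorded `NaN` on Thu, and a watch with no Thu entry at all) to show that a label missing on both sides stays `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the phone logged NaN on Thu, the watch has no Thu entry.
phone = pd.Series([6000, np.nan], index=['Mon', 'Thu'])
watch = pd.Series([5500, 4000], index=['Mon', 'Tue'])

combined = phone.add(watch, fill_value=0)

# Mon: both present, so the values are added.
# Tue: missing on the phone only, so 0 is used for the phone.
# Thu: missing on both sides, so the result stays NaN.
print(combined)
```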

We can fill missing values with `.fillna()`

In [17]:
total_steps.fillna(99999)

Fri    11750.0
Mon    11500.0
Thu    99999.0
Tue    99999.0
Wed    11500.0
dtype: float64

In [18]:
total_steps

Fri    11750.0
Mon    11500.0
Thu        NaN
Tue        NaN
Wed    11500.0
dtype: float64

We can check for the location of null values using the `.isna()` method

In [19]:
total_steps.isna()

Fri    False
Mon    False
Thu     True
Tue     True
Wed    False
dtype: bool
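Because `.isna()` returns a boolean Series, it can be summed to count missing values or used as a mask to select them. A short sketch, rebuilding the same data so it runs on its own:

```python
import pandas as pd

phone_steps = pd.Series([6000, 7400, 6750, 8000, 5500],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
smartwatch_steps = pd.Series([5500, 4750, 6250], index=['Mon', 'Wed', 'Fri'])
total_steps = phone_steps + smartwatch_steps

# True counts as 1 and False as 0, so the sum is the number of missing values.
missing_count = int(total_steps.isna().sum())

# Use the boolean Series as a mask to keep only the missing entries.
missing_days = total_steps[total_steps.isna()].index.tolist()

print(missing_count, missing_days)
```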

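We can use the `inplace` parameter to modify a Series directly instead of reassigning the result to a variable; several Series methods, such as `.fillna()`, `.dropna()`, and `.sort_values()`, accept this parameter. With `inplace = True`, the method returns `None`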

In [20]:
tmp = total_steps.fillna(12345, inplace = True)  # modifies total_steps in place and returns None
tmp

In [21]:
type(tmp)

NoneType

In [22]:
total_steps

Fri    11750.0
Mon    11500.0
Thu    12345.0
Tue    12345.0
Wed    11500.0
dtype: float64
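Since an `inplace` call returns `None`, it cannot be followed by another method call. As a sketch of the alternative, the version below keeps the original Series untouched and chains a second step; the `.astype('int64')` conversion is only an illustrative follow-up:

```python
import pandas as pd

phone_steps = pd.Series([6000, 7400, 6750, 8000, 5500],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
smartwatch_steps = pd.Series([5500, 4750, 6250], index=['Mon', 'Wed', 'Fri'])
total_steps = phone_steps + smartwatch_steps

# fillna() returns a new Series, so another method can be chained onto it.
filled = total_steps.fillna(0).astype('int64')

print(filled)
print(total_steps.isna().sum())  # the original still has its missing values
```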

We can drop missing values in a Series with `.dropna()`

In [23]:
total_steps = phone_steps + smartwatch_steps
total_steps

Fri    11750.0
Mon    11500.0
Thu        NaN
Tue        NaN
Wed    11500.0
dtype: float64

In [24]:
total_steps.dropna()

Fri    11750.0
Mon    11500.0
Wed    11500.0
dtype: float64

In [25]:
total_steps

Fri    11750.0
Mon    11500.0
Thu        NaN
Tue        NaN
Wed    11500.0
dtype: float64

In [26]:
total_steps.dropna(inplace = True)

In [27]:
total_steps

Fri    11750.0
Mon    11500.0
Wed    11500.0
dtype: float64

We can perform mathematical operations on Series; most NumPy methods and functions work on Series

In [28]:
phone_steps

Mon    6000
Tue    7400
Wed    6750
Thu    8000
Fri    5500
Name: Phone, dtype: int64

In [29]:
phone_steps.sum()

np.int64(33650)

In [30]:
phone_steps.mean()

np.float64(6730.0)

In [31]:
phone_steps.count()  # counts non-null values

np.int64(5)
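The cells above use Series methods; NumPy's element-wise functions can also be applied to a Series directly. A minimal sketch with `np.log10` and `np.maximum`, where the 6500-step goal is a made-up number for illustration:

```python
import numpy as np
import pandas as pd

phone_steps = pd.Series([6000, 7400, 6750, 8000, 5500],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'], name='Phone')

# NumPy ufuncs work element-wise and return a Series with the same index.
log_steps = np.log10(phone_steps)

# Element-wise maximum against a scalar (6500 is a hypothetical daily goal).
at_least_goal = np.maximum(phone_steps, 6500)

print(type(log_steps))
print(at_least_goal)
```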

We can sort a Series in ascending or descending order, either by its index or by its values

In [32]:
total_steps = phone_steps.add(smartwatch_steps, fill_value = 0)
total_steps

Fri    11750.0
Mon    11500.0
Thu     8000.0
Tue     7400.0
Wed    11500.0
dtype: float64

In [30]:
# total_steps.sort()  # raises AttributeError: Series has no .sort() method; use .sort_values() or .sort_index()

In [33]:
total_steps.sort_values()  # not inplace

Tue     7400.0
Thu     8000.0
Mon    11500.0
Wed    11500.0
Fri    11750.0
dtype: float64

In [34]:
total_steps.sort_index()  # not inplace

Fri    11750.0
Mon    11500.0
Thu     8000.0
Tue     7400.0
Wed    11500.0
dtype: float64
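Both calls above sort in ascending order, which is the default. For descending order, a sketch using the `ascending` parameter (the data is rebuilt here so the block runs on its own):

```python
import pandas as pd

phone_steps = pd.Series([6000, 7400, 6750, 8000, 5500],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
smartwatch_steps = pd.Series([5500, 4750, 6250], index=['Mon', 'Wed', 'Fri'])
total_steps = phone_steps.add(smartwatch_steps, fill_value=0)

# Largest step counts first.
by_steps_desc = total_steps.sort_values(ascending=False)

# Index labels in reverse alphabetical order.
by_day_desc = total_steps.sort_index(ascending=False)

print(by_steps_desc)
print(by_day_desc)
```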

In [35]:
total_steps.rank()  # ranks values in ascending order; if there's a tie, an average rank is returned; not inplace

Fri    5.0
Mon    3.5
Thu    2.0
Tue    1.0
Wed    3.5
dtype: float64

We can reset the index of a Series using `.reset_index()`

In [36]:
total_steps

Fri    11750.0
Mon    11500.0
Thu     8000.0
Tue     7400.0
Wed    11500.0
dtype: float64

In [37]:
what_is_this_structure = total_steps.reset_index()
what_is_this_structure

  index        0
0   Fri  11750.0
1   Mon  11500.0
2   Thu   8000.0
3   Tue   7400.0
4   Wed  11500.0


In [38]:
type(what_is_this_structure)

pandas.core.frame.DataFrame
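The resulting DataFrame gets the generic column names `index` and `0`. One way to get more descriptive names, sketched here with the illustrative labels `day` and `steps`, is to name the index first and pass `name` to `.reset_index()`:

```python
import pandas as pd

phone_steps = pd.Series([6000, 7400, 6750, 8000, 5500],
                        index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'])
smartwatch_steps = pd.Series([5500, 4750, 6250], index=['Mon', 'Wed', 'Fri'])
total_steps = phone_steps.add(smartwatch_steps, fill_value=0)

# rename_axis() names the index; name= labels the column holding the values.
# 'day' and 'steps' are illustrative names chosen for this sketch.
tidy = total_steps.rename_axis('day').reset_index(name='steps')

print(tidy)
```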

In [37]:
# use the drop parameter to prevent .reset_index() from making a column with the old index
still_a_series = total_steps.reset_index(drop = True)
still_a_series

0    11750.0
1    11500.0
2     8000.0
3     7400.0
4    11500.0
dtype: float64


In [38]:
type(still_a_series)