In [1]:
import pandas as pd

df = pd.DataFrame({'popularity': [50, 60, 70, 80, 90]})
df

   popularity
0          50
1          60
2          70
3          80
4          90


### 1. expanding() → Expanding Window Calculations

Think of expanding() as a growing window: at each row it includes every row from the start up to and including the current row, then applies an aggregation such as .mean() or .sum().

In [13]:
df['exp_mean'] = df['popularity'].expanding().mean()
df['exp_sum'] = df['popularity'].expanding().sum()

df['exp_mean']

0    50.0
1    55.0
2    60.0
3    65.0
4    70.0
Name: exp_mean, dtype: float64

In [14]:
df['exp_sum']

0     50.0
1    110.0
2    180.0
3    260.0
4    350.0
Name: exp_sum, dtype: float64

Interpretation:

At row 3, exp_mean = average of [50,60,70,80] = 65.

At row 3, exp_sum = sum of [50,60,70,80] = 260.

#### How expanding() Works

expanding() always starts at the first row.

At each step, the window covers every earlier row plus the current one.

So each row's window holds one more value than the window of the row above it.
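
One way to check the growing-window idea is to compare expanding() with pandas' cumulative methods. The following is a minimal sketch, assuming a standalone Series named `s` (a name introduced here for illustration) that has no missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical Series mirroring the popularity column used above
s = pd.Series([50, 60, 70, 80, 90], name='popularity')

# With no missing values, the expanding sum matches the cumulative sum
exp_sum = s.expanding().sum()
cum_sum = s.cumsum()

# The expanding mean is the running sum divided by the number of rows seen so far
exp_mean = s.expanding().mean()
rows_seen = pd.Series(range(1, len(s) + 1), index=s.index)
manual_mean = cum_sum / rows_seen

print(np.allclose(exp_sum, cum_sum))
print(np.allclose(exp_mean, manual_mean))
```

Both checks should print True for this data, since the window at each row is simply "everything so far".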

In [24]:
import pandas as pd

df = pd.DataFrame({
    'song': ['A', 'B', 'C', 'D', 'E'],
    'popularity': [50, 60, 70, 80, 90]
})

df

  song  popularity
0    A          50
1    B          60
2    C          70
3    D          80
4    E          90


In [26]:
df['exp_mean'] = df['popularity'].expanding().mean()
df['exp_sum'] = df['popularity'].expanding().sum()

df['exp_mean']

0    50.0
1    55.0
2    60.0
3    65.0
4    70.0
Name: exp_mean, dtype: float64

In [27]:
df['exp_sum']

0     50.0
1    110.0
2    180.0
3    260.0
4    350.0
Name: exp_sum, dtype: float64

What Happened Step by Step?

Row 0 (A) → Only [50] → mean = 50, sum = 50

Row 1 (B) → Values [50,60] → mean = 55, sum = 110

Row 2 (C) → Values [50,60,70] → mean = 60, sum = 180

Row 3 (D) → Values [50,60,70,80] → mean = 65, sum = 260

Row 4 (E) → Values [50,60,70,80,90] → mean = 70, sum = 350

👉 Each row’s expanding calculation includes all the rows above it plus the row itself.
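
The step-by-step walk above can be reproduced by slicing the column by hand. This is an illustrative sketch rather than how pandas implements expanding() internally; the names `popularity` and `rows` are assumptions introduced for the example:

```python
import pandas as pd

# Hypothetical Series mirroring the song/popularity table above
popularity = pd.Series([50, 60, 70, 80, 90], index=['A', 'B', 'C', 'D', 'E'])

rows = []
for i in range(len(popularity)):
    # Window = every row from the start through row i (inclusive)
    window = popularity.iloc[: i + 1]
    rows.append((
        popularity.index[i],
        window.tolist(),
        float(window.mean()),
        int(window.sum()),
    ))

# Print one line per song: label, window contents, mean, sum
for label, values, mean, total in rows:
    print(label, values, mean, total)
```

Each printed line corresponds to one row of the walk-through: the window grows by one value per row, and the mean and sum are taken over that window only.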

### 2. interpolate() → Fill Missing Values

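👉 interpolate() fills missing values by estimating them from the known values around them.
The method argument selects how the estimate is made: 'linear' (the default), 'time', 'polynomial' (which also needs an order argument), and others.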

In [29]:
df = pd.DataFrame({'year': [2018, 2019, 2020, 2021],
                   'popularity': [80, None, None, 95]})

df

   year  popularity
0  2018        80.0
1  2019         NaN
2  2020         NaN
3  2021        95.0


In [30]:
df_popularity_linear = df['popularity'].interpolate(method='linear')
df_popularity_linear

0    80.0
1    85.0
2    90.0
3    95.0
Name: popularity, dtype: float64
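
The filled values 85 and 90 come from spacing the gap evenly between the two known endpoints. The arithmetic can be sketched by hand as follows; the names `s`, `step`, and `manual` are assumptions introduced for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical Series mirroring the popularity column above
s = pd.Series([80, np.nan, np.nan, 95], name='popularity')

# Known endpoints and the number of intervals between them (row 0 -> row 3)
start, end = 80.0, 95.0
n_steps = 3

# Linear interpolation moves the same amount at every step
step = (end - start) / n_steps
manual = [start + step * k for k in range(n_steps + 1)]

# pandas result for comparison
filled = s.interpolate(method='linear')

print(manual)
print(filled.tolist())
```

Note that method='linear' ignores the index and treats the rows as equally spaced, which is why the step size depends only on the row count between the known values.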