# Pandas Fun Problems - Final Boss

Welcome to the ultimate challenge in our Pandas series! This notebook is the pinnacle of our journey through data manipulation with Pandas, where you'll face the most demanding problems yet. Designed for those who seek to test their limits, this set of problems is dubbed the "Final Boss" for good reason.

Before attempting these challenges, make sure you have worked through the previous notebooks in this series. It is particularly crucial that you've completed the **[Pandas Fun Problems - Advanced](https://www.kaggle.com/code/matinmahmoudi/pandas-fun-problems-advanced/)** notebook. The advanced notebook covers sophisticated techniques and operations that will be essential for conquering the challenges ahead.

- **Practice Problems Source:** [Final Boss Problems](https://www.practiceprobs.com/problemsets/python-pandas/final-boss/)
- **YouTube Tutorial:** [Video Guide](https://www.youtube.com/watch?v=8xUgesdShE8)


As always, we encourage you to share your solutions, thoughts, and questions in the comments. Your engagement not only enriches your learning experience but also supports others on their path to mastering Pandas.

### **Found this notebook challenging and rewarding? Show your support with an upvote!**


# Q1 - COVID Tracing Problem

https://www.practiceprobs.com/problemsets/python-pandas/final-boss/covid-tracing/

You track the whereabouts of 10 individuals in a DataFrame called whereabouts. Each person has a corresponding list of place ids indicating the places they’ve visited in the past week. You also track which places have been exposed to COVID-19 in a list called exposed 😷.


In [1]:
import numpy as np
import pandas as pd

# exposed places
exposed = [0,5,9]

# whereabouts of each person
generator = np.random.default_rng(2468)
Nplaces = 10
Npersons = 10
place_ids = np.arange(Nplaces)
visits = generator.choice(place_ids, size=3*Nplaces, replace=True)
split_idxs = np.sort(generator.choice(len(visits), size=9, replace=True))
whereabouts = pd.DataFrame({
    'person_id': range(Npersons),
    'places': [np.unique(x).tolist() for x in np.array_split(visits, split_idxs)]
})

print(whereabouts)
#    person_id              places
# 0          0        [3, 4, 5, 6]
# 1          1                  []
# 2          2                 [3]
# 3          3           [6, 8, 9]
# 4          4                 [3]
# 5          5  [0, 2, 5, 6, 7, 8]
# 6          6              [2, 7]
# 7          7        [0, 5, 8, 9]
# 8          8           [2, 7, 9]
# 9          9        [0, 5, 8, 9]

   person_id              places
0          0        [3, 4, 5, 6]
1          1                  []
2          2                 [3]
3          3           [6, 8, 9]
4          4                 [3]
5          5  [0, 2, 5, 6, 7, 8]
6          6              [2, 7]
7          7        [0, 5, 8, 9]
8          8           [2, 7, 9]
9          9        [0, 5, 8, 9]


For each person, identify the places they visited which have been exposed. Make this a new list-column in whereabouts called exposures.



## Solution 1

In [2]:
whereabouts['exposures'] = whereabouts['places'].apply(lambda x: list(set(x) & set(exposed)))
whereabouts

   person_id              places  exposures
0          0        [3, 4, 5, 6]        [5]
1          1                  []         []
2          2                 [3]         []
3          3           [6, 8, 9]        [9]
4          4                 [3]         []
5          5  [0, 2, 5, 6, 7, 8]     [0, 5]
6          6              [2, 7]         []
7          7        [0, 5, 8, 9]  [0, 9, 5]
8          8           [2, 7, 9]        [9]
9          9        [0, 5, 8, 9]  [0, 9, 5]
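Note that the set intersection does not guarantee element order, which is why persons 7 and 9 show `[0, 9, 5]` rather than `[0, 5, 9]`. If the order of the original `places` list matters, one possible variation is sketched below. It uses a small three-row sample instead of the full `whereabouts` frame, and the helper name `exposed_set` is an assumption introduced only for this illustration.

```python
import pandas as pd

exposed = [0, 5, 9]

# Small illustrative sample (not the full dataset from the problem)
whereabouts = pd.DataFrame({
    'person_id': [5, 6, 7],
    'places': [[0, 2, 5, 6, 7, 8], [2, 7], [0, 5, 8, 9]],
})

# Build the set once so each membership test is O(1)
exposed_set = set(exposed)

# Filter each list in place order, so the result keeps the order of `places`
whereabouts['exposures'] = [
    [p for p in places if p in exposed_set]
    for places in whereabouts['places']
]
print(whereabouts)
```

Because each list is filtered rather than intersected, the output order follows `places`, and building the set once avoids recreating it for every row.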


## Solution 2

In [3]:
whereabouts['exposures'] = [list(set(x) & set(exposed)) for x in whereabouts['places']]
whereabouts

   person_id              places  exposures
0          0        [3, 4, 5, 6]        [5]
1          1                  []         []
2          2                 [3]         []
3          3           [6, 8, 9]        [9]
4          4                 [3]         []
5          5  [0, 2, 5, 6, 7, 8]     [0, 5]
6          6              [2, 7]         []
7          7        [0, 5, 8, 9]  [0, 9, 5]
8          8           [2, 7, 9]        [9]
9          9        [0, 5, 8, 9]  [0, 9, 5]
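For comparison, the same result can probably also be reached with pandas' own reshaping tools instead of Python sets. The sketch below is one possible approach, not the reference solution: it explodes the list-column, filters with `isin`, and regroups by the original row index. The column name `exposures_alt` and the three-row sample are assumptions made for this illustration.

```python
import pandas as pd

exposed = [0, 5, 9]

# Small illustrative sample (not the full dataset from the problem)
whereabouts = pd.DataFrame({
    'person_id': [0, 1, 7],
    'places': [[3, 4, 5, 6], [], [0, 5, 8, 9]],
})

# One row per (person, place); an empty list becomes a single NaN row
exploded = whereabouts['places'].explode()

# Keep only the exposed places, then collect them back into lists per original row
hits = exploded[exploded.isin(exposed)]
grouped = hits.groupby(level=0).agg(list)

# Rows without any exposed place are missing after the groupby; give them an empty list
whereabouts['exposures_alt'] = [
    v if isinstance(v, list) else []
    for v in grouped.reindex(whereabouts.index)
]
print(whereabouts)
```

This version needs an extra step to restore empty lists, so for list-columns of this size the comprehension in Solution 2 is likely the simpler choice.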


# Q2 - Pickle Problem

Given a Series called pickle, replace NaNs using the following algorithm.


In [4]:
'''
for each NaN:
  get the nearest non NaN value before and after it
  if both of those values exist:
    replace NaN with the minimum of those two non NaN values
  else:
    replace NaN with the nearest non NaN value
'''


In [5]:
import numpy as np
import pandas as pd

pickle = pd.Series([1.5, np.nan, 2.3, np.nan, np.nan, -3.9, np.nan, 4.5, np.nan, np.nan, np.nan, 1.9, np.nan])

print(pickle)
# 0     1.5
# 1     NaN
# 2     2.3
# 3     NaN
# 4     NaN
# 5    -3.9
# 6     NaN
# 7     4.5
# 8     NaN
# 9     NaN
# 10    NaN
# 11    1.9
# 12    NaN
# dtype: float64

0     1.5
1     NaN
2     2.3
3     NaN
4     NaN
5    -3.9
6     NaN
7     4.5
8     NaN
9     NaN
10    NaN
11    1.9
12    NaN
dtype: float64


## Solution 

In [6]:
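def fillna_custom(series):
    # Nearest non-NaN value before each position (NaN if none exists)
    forward_filled = series.ffill()
    # Nearest non-NaN value after each position (NaN if none exists)
    backward_filled = series.bfill()
    # Row-wise min skips NaN, so a one-sided neighbor is used as-is
    min_filled = pd.concat([forward_filled, backward_filled], axis=1).min(axis=1)
    return min_filled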

pickle_filled = fillna_custom(pickle)
print(pickle_filled)

0     1.5
1     1.5
2     2.3
3    -3.9
4    -3.9
5    -3.9
6    -3.9
7     4.5
8     1.9
9     1.9
10    1.9
11    1.9
12    1.9
dtype: float64
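To check that the vectorized solution really matches the algorithm as stated, it can help to compare it against a literal, loop-based translation. The sketch below is such a cross-check; the function name `fillna_reference` is an assumption introduced here, and the loop is written for readability rather than speed.

```python
import numpy as np
import pandas as pd

def fillna_reference(series):
    """Literal translation of the stated algorithm (illustrative, O(n^2) worst case)."""
    values = series.to_numpy(dtype=float)
    result = values.copy()
    n = len(values)
    for i in range(n):
        if not np.isnan(values[i]):
            continue
        # Nearest non-NaN value before and after position i (None if absent)
        before = next((values[j] for j in range(i - 1, -1, -1) if not np.isnan(values[j])), None)
        after = next((values[j] for j in range(i + 1, n) if not np.isnan(values[j])), None)
        if before is not None and after is not None:
            result[i] = min(before, after)
        elif before is not None:
            result[i] = before
        elif after is not None:
            result[i] = after
        # If neither exists, the value stays NaN
    return pd.Series(result, index=series.index)

pickle = pd.Series([1.5, np.nan, 2.3, np.nan, np.nan, -3.9, np.nan,
                    4.5, np.nan, np.nan, np.nan, 1.9, np.nan])

reference = fillna_reference(pickle)
vectorized = pd.concat([pickle.ffill(), pickle.bfill()], axis=1).min(axis=1)

# Both approaches should agree on every position
assert reference.equals(vectorized)
print(reference.tolist())
```

The vectorized version works because `min(axis=1)` skips NaN by default, so a leading or trailing NaN with only one neighbor simply takes that neighbor's value.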


