# What to watch after Great British Bake Off?

I've run out of Great British Bake Off seasons, and I need more of that wholesome, tasty, British reality TV. What should I watch?

In [None]:
import numpy as np
import pandas as pd

In [None]:
content = pd.read_csv('/kaggle/input/netflix-shows/netflix_titles.csv')
content.shape

## What fields are missing?
It looks like the director and cast are the fields missing from the most entries. This could be important because the director and cast have a large influence on the overall feel of a title, and a director's work often shares a similar 'feel' across movies and TV. Also, a favorite actor might be one of the main reasons you pick a show or movie.

We can't simply drop the items missing these fields, because that would cut our dataset down by about a third.

In [None]:
# count missing values per field, most-missing first
content.isna().sum().sort_values(ascending=False)

In [None]:
# left commented out: we keep items even when the director or cast is missing
# content = content.dropna(how='any', subset=['director', 'cast'])
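
To check an estimate like "about a third" before deciding whether to drop rows, a small helper along these lines could be used. This is a minimal sketch: `fraction_missing_any` is a hypothetical name, and the toy frame below is made-up data standing in for `content`.

```python
import pandas as pd

def fraction_missing_any(df, columns):
    """Return the share of rows with a missing value in at least one of `columns`."""
    return df[columns].isna().any(axis=1).mean()

# Toy frame standing in for the Netflix data (values invented for illustration).
toy = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "director": ["x", None, "y", None],
    "cast": ["p", "q", None, None],
})

# Three of the four toy rows lack a director, a cast, or both.
share = fraction_missing_any(toy, ["director", "cast"])
print(share)
```

On the real data, the equivalent call would be `fraction_missing_any(content, ['director', 'cast'])`.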

## What data do we have for the Great British Bake Off?

Let's get some keywords and the important people on the show.

In [None]:
# find the GBBO entry on Netflix (titles are lowercased to make matching easier)
content['title'] = content['title'].str.lower()
gbbo = content[content.title == 'the great british baking show'].iloc[0]
gbbo

In [None]:
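# who is involved in the show: the listed cast, a few names added by hand,
# and the director
people = gbbo.cast.split(',')
people = people + [
    'Noel Fielding',
    'Matt Lucas',
    'Sandi Toksvig',
    'Prue Leith',
    gbbo.director
]

# strip whitespace; skip non-string entries (e.g. NaN if the director is missing)
people = pd.Series([p.strip() for p in people if isinstance(p, str)])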
people
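
The `people` list is collected here, but the keyword search further down does not use it. One possible way to put it to work is to look for names shared between a title's cast or director and that list. The sketch below is an assumption about how that could look, not part of the original search: `shared_people` is a hypothetical helper and the toy item is invented.

```python
import pandas as pd

def shared_people(item, people):
    """Return the names in `people` that appear in the item's cast or director."""
    names = set()
    for field in ("cast", "director"):
        value = item.get(field)
        if isinstance(value, str):  # skip missing values (NaN is a float)
            names.update(name.strip() for name in value.split(","))
    return names & set(people)

# Toy item standing in for one row of `content` (invented for illustration).
toy_item = pd.Series({"cast": "Noel Fielding, Someone Else", "director": float("nan")})
overlap = shared_people(toy_item, ["Noel Fielding", "Prue Leith"])
print(overlap)
```

Applied row by row, for example with `content.apply(lambda x: len(shared_people(x, people)), axis=1)`, this would give a count of shared names per title.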

In [None]:
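# note: these need the NLTK stop-word list and tokenizer data to be available
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.snowball import SnowballStemmer

stop = set(stopwords.words('english'))  # a set makes membership checks fast
stemmer = SnowballStemmer("english")

def remove_stop_words(words):
    """Drop stop words and non-alphabetic tokens, then stem what is left."""
    return [stemmer.stem(word) for word in words if word.lower() not in stop and word.isalpha()]

def get_keywords(item):
    """Return the unique stemmed keywords from an item's description and title."""
    keywords = word_tokenize(item.description.lower())
    keywords.extend(word_tokenize(item.title.lower()))
    keywords = remove_stop_words(keywords)
    return pd.Series(keywords).unique()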

# grab the keywords
keywords = pd.Series(get_keywords(gbbo))
keywords

## Finding similar shows

In [None]:
def is_similar(item, matches=3):
    """Return True if the item shares at least `matches` keywords with gbbo."""
    similar_keywords = set(get_keywords(item)) & set(keywords)
    return len(similar_keywords) >= matches

content[content.apply(lambda x: is_similar(x, matches=3), axis=1)]

Not bad for such a simple search!
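
The threshold in `is_similar` gives a yes/no answer for each title. A possible extension, sketched here as an assumption rather than as part of the original search, is to score each title by the number of shared keywords and sort by that score. `overlap_score` is a hypothetical helper, and the stemmed keywords below are invented for illustration.

```python
import pandas as pd

def overlap_score(item_keywords, reference_keywords):
    """Count how many keywords the item shares with the reference set."""
    return len(set(item_keywords) & set(reference_keywords))

# Hypothetical stemmed keywords, invented for illustration only.
reference = {"bake", "british", "amateur", "compet"}
toy = pd.DataFrame({
    "title": ["show a", "show b", "show c"],
    "keywords": [
        ["bake", "compet", "famili"],
        ["crime", "detect"],
        ["british", "bake", "amateur", "tent"],
    ],
})

# Score every title, then list the closest matches first.
toy["score"] = toy["keywords"].apply(lambda kw: overlap_score(kw, reference))
ranked = toy.sort_values("score", ascending=False).reset_index(drop=True)
print(ranked[["title", "score"]])
```

With the real data, the score could be computed as `content.apply(lambda x: overlap_score(get_keywords(x), keywords), axis=1)`, which keeps the closest matches at the top instead of returning an unordered set of hits.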