<div class="alert alert-info">
    
‚û°Ô∏è Before you start, make sure that you are familiar with the **[study guide](https://liu-nlp.ai/text-mining/logistics/)**, in particular the rules around **cheating and plagiarism** (found in the course memo).

‚û°Ô∏è If you use code from external sources (e.g. StackOverflow, ChatGPT, ...) as part of your solutions, don't forget to add a reference to these source(s) (for example as a comment above your code).

‚û°Ô∏è Make sure you fill in all cells that say **`YOUR CODE HERE`** or **YOUR ANSWER HERE**.  You normally shouldn't need to modify any of the other cells.

</div>

# L1: Information Retrieval

In this lab you will apply basic techniques from information retrieval to implement the core of a minimalistic search engine. The data for this lab consists of a collection of app descriptions scraped from the [Google Play Store](https://play.google.com/store/apps?hl=en). From this collection, your search engine should retrieve those apps whose descriptions best match a given query under the vector space model.

In [1]:
# Define some helper functions that are used in this notebook
from IPython.display import display, HTML

def success():
    display(HTML('<div class="alert alert-success"><strong>Checks have passed!</strong></div>'))

## Dataset

The app descriptions come in the form of a compressed [JSON](https://en.wikipedia.org/wiki/JSON) file. Start by loading this file into a [Pandas](https://pandas.pydata.org) [DataFrame](https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#dataframe).

In [2]:
import bz2
import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', 500)

with bz2.open('app-descriptions.json.bz2', mode='rt', encoding='utf-8') as source:
    df = pd.read_json(source, encoding='utf-8')

In Pandas, a DataFrame is a table with indexed rows and labelled columns of potentially different types. You can access data in a DataFrame in various ways, including by row and column. To give an example, the code in the next cell shows rows 200‚Äì204:

In [3]:
df.loc[200:205]

Unnamed: 0,name,description
200,Brick Breaker Star: Space King,"Introducing the best Brick Breaker game that everyone can enjoy.\nEnjoy various missions and addictively simple play control.\n\n[Features]\n- Hundreds of stages and various missions\n- No limit to play such as Heart, play as much as you can!\n- 5 kinds of various items and items reinforcement system\n- No network required\n- game file is as low as 20M, light-weight download!\n- supports tablet screen\n- supports Google Play Leaderboards, Achievement, Multiplay\n- supports 14 languages\n\nHo..."
201,Brick Classic - Brick Game,"Classic Brick Game!\n\nBrick Classic is a popular and addictive puzzle game!\n\nHow to play?\n- Simply drag the bricks to move them.\n- Create full lines on the grid vertically or horizontally to break bricks.\n\nTips:\n- Classic brick game without time limits.\n- Place the bricks in a reasonable position.\n- The more brick break, the more scores you have.\n- Bricks can't be rotated.\n\nWho's the best brick breaker? Challenge it now!!!"
202,Bricks Breaker - Glow Balls,"Bricks Breaker - Glow Balls is a addictive and challenging brick game.\nJust play it to relax your brain. Be focus on breaking bricks and you will find it more funny and exciting.\n\nHow to play\n- Hold the screen with your finger and move to aim.\n- Find best positions and angles to hit all bricks.\n- When the durability of brick reaches 0, destroyed.\n- Never let bricks reach the bottom or game is over.\n\nFeatures\n- Colorful glow skins.\n- Free to play.\n- Easy game controls with one fin..."
203,Bricks Breaker Quest,"How to play\n- The ball flies to wherever you touched.\n- Clear the stages by removing bricks on the board.\n- Break the bricks and never let them hit the bottom.\n- Find best positions and angles to hit every brick.\n\nFeature\n- Free to play\n- Tons of stages\n- Various types of balls\n- Easy to play, Simplest game system, Designed for one handheld gameplay.\n- Off-line (without internet connection) gameplay supported \n- Multi-play supported\n- Tablet device supported\n- Achievement & lea..."
204,Brothers in Arms¬Æ 3,"Fight brave soldiers from around the globe on the frenzied multiplayer battlegrounds of World War 2 or become Sergeant Wright and experience a dramatic, life-changing single-player journey, in the aftermath of the D-Day invasion.\n\nCLIMB THE ARMY RANKS IN MULTIPLAYER \n> 4 maps to master and enjoy. \n> 2 gameplay modes to begin with: Free For All and Team Deathmatch.\n> Unlock game-changing perks by playing with each weapon class!\n> A soldier‚Äôs only as deadly as his weapon. Be sure to upgr..."
205,Brown Dust - Tactical RPG,"The Empire has fallen, and the Age of Great Mercenaries Now Begins!\nCreate Your Ultimate Team And Strike Down Your Enemies!\n\nCAPTIVATING AND STUNNING ARTWORK\n- Experience the high-quality anime illustrations you have never seen before.\n- Meet Brown Dust's charming Mercenaries now.\n\nASSEMBLE LEGENDARY MERCENARIES\n- Over 300 Mercenaries and a Variety of Skills.\n- Discover the Unique Mercenaries, 6 Devils and Dominus Octo.\n- All Mercenaries can reach max level and the highest rank.\n\..."


As you can see, there are two labelled columns: `name` (the name of the app) and `description` (a textual description). The next cell shows how to access only the description field from row 200:

In [4]:
df.loc[200, 'description']

'Introducing the best Brick Breaker game that everyone can enjoy.\nEnjoy various missions and addictively simple play control.\n\n[Features]\n- Hundreds of stages and various missions\n- No limit to play such as Heart, play as much as you can!\n- 5 kinds of various items and items reinforcement system\n- No network required\n- game file is as low as 20M, light-weight download!\n- supports tablet screen\n- supports Google Play Leaderboards, Achievement, Multiplay\n- supports 14 languages\n\nHomepage:\nhttps://play.google.com/store/apps/dev?id=4931745640662708567\n\nFacebook: \nhttps://www.facebook.com/spcomesgames/'

## Problem 1: What's in a vector?

We start by vectorising the data ‚Äî more specifically, we map each app description to a tf‚Äìidf vector. This is very simple with a library like [scikit-learn](https://scikit-learn.org/stable/), which provides a [TfidfVectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html) class for exactly this purpose.  If we instantiate this class, and call `fit_transform()` on all of our app descriptions, scikit-learn will preprocess and tokenize each app description, compute tf‚Äìidf values for each of them, and return a vectorised representation:

In [5]:
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['description'])
X

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 267110 stored elements and shape (1614, 27877)>

Let‚Äôs pick the app "Pancake Tower", which has a rather short description text, to see how it has been vectorised:

In [6]:
# We can use 'toarray' to convert the sparse matrix object into a "normal" array
vec = X[1032].toarray()[0]

# The app description & its corresponding vector
df.loc[1032, 'description'], vec

("Let's see how many pancakes you can pile up!!",
 array([0., 0., 0., ..., 0., 0., 0.], shape=(27877,)))

That's not very informative yet.  We know that the vector contains tf‚Äìidf values, and that each dimension of the vector corresponds to a token in the vectorizer‚Äôs vocabulary; let's extract these for this specific example.

Your **first task** is to find out how to access the `vectorizer`‚Äôs vocabulary, for example by [checking the documentation of `TfidfVectorizer`](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html), and print all the tokens that are represented in the vector with a tf‚Äìidf value greater than zero (i.e., only the tokens that are actually part of this app‚Äôs description) _in descending order of the tf‚Äìidf values_.  In other words, the token with the highest tf‚Äìidf value should be at the top of your output, and the token with the lowest tf‚Äìidf value at the bottom.   Before you implement this, think about what you would expect the output look like, for example which words you would expect to have the highest/lowest tf‚Äìidf values in this example.

Your final output should look something like this:

```
<token 1>: <tf-idf value 1>
<token 2>: <tf-idf value 2>
...
```

In [None]:
"""Print the tokens and their tf‚Äìidf values, in descending order."""

# YOUR CODE HERE
# encoded length is equal to the length of vocabulary
# but the value is encoded differently from one-hot, not only position info
# TF-IDF = TF √ó IDF
# TF: term frequency in present documentation, fre_in_doc(token, doc)/token_length(doc)
# IDF: term frequency in the documents, log((length(docs)+1) / length(sum(token in docs)) + 1)
# IDF is fixed but not IF

# sort vocab based on their id
# return a list consisting of tuples(token, token_id)
vocab = vectorizer.vocabulary_
sorted_vocab = sorted(vocab.items(), key=lambda x: x[1])
# get the non zero vector 
nonzero_vec_indices = np.where(vec > 0)[0]
nonzero_vec = vec[nonzero_vec_indices]
# get the corresponding tokens
nonzero_tokens = [sorted_vocab[index][0] for index in nonzero_vec_indices]
# create a vector to token dict and sort it
vec2token = {token: float(vector) for token, vector in zip(nonzero_tokens, nonzero_vec)}
sorted_items = sorted(vec2token.items(), key=lambda x: x[1], reverse=True)
sorted_items


[('pancakes', 0.6539332651185913),
 ('pile', 0.5304701435508047),
 ('let', 0.2615287714771797),
 ('see', 0.2557630827415271),
 ('many', 0.23491959669849022),
 ('how', 0.21153246225085887),
 ('up', 0.17216837691451817),
 ('can', 0.13047602895910532),
 ('you', 0.10276923239718011)]

## Problem 2: Finding the nearest vectors

To build a small search engine, we need to be able to turn _queries_ (for example the string "pile up pancakes") into _query vectors_, and then find out which of our app description vectors are closest to the query vector.

For the first part (turning queries into query vectors), we can simply re-use the `vectorizer` that we used for the app descriptions. For the second part, an easy way to find the closest vectors is to use scikit-learn‚Äôs [NearestNeighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html) class. This class needs to be _fit_ on a set of vectors (the "training set"; in our case the app descriptions) and can then be used with any vector to find its _nearest neighbors_ in the vector space.

**First,** instantiate and fit a class that returns the _ten (10)_ nearest neighbors:

In [8]:
"""Instantiate and fit a class that returns the 10 nearest neighboring vectors."""
# YOUR CODE HERE
from sklearn.neighbors import NearestNeighbors
n = 10

nn = NearestNeighbors(n_neighbors=n, metric='cosine')  
nn.fit(X)

0,1,2
,n_neighbors,10
,radius,1.0
,algorithm,'auto'
,leaf_size,30
,metric,'cosine'
,p,2
,metric_params,
,n_jobs,


**Second,** implement a function that uses the vectorizer and the fitted class to find the nearest neighbours for a given query string:

In [9]:
def search(query):
    """Find the nearest neighbors in `df` for a query string.

    Arguments:
      query (str): A query string.

    Returns:
      The 10 apps (with name and description) most similar (in terms of
      cosine similarity) to the given query as a Pandas DataFrame.
    """
    # YOUR CODE HERE
    # Iterable parameter required
    # why it would compress itself......
    encoded_query = vectorizer.transform([query]).toarray()
    # indices gets a 2d list, don't know why 
    _, indices = nn.kneighbors(encoded_query)
    indices = indices.ravel()
    result_pd = pd.DataFrame({"name": df['name'].loc[indices], "description": df['description'].loc[indices]})
    return result_pd.reset_index(drop=True)

### ü§û Test your code

Test your implementation by running the following cell, which will sanity-check your return value and show the 10 best search results for the query _"pile up pancakes"_:

In [10]:
"""Check that searching for "pile up pancakes" returns a DataFrame with ten results,
   and that the top result is "Pancake Tower"."""

result = search('pile up pancakes')
display(result)
assert isinstance(result, pd.DataFrame), "search() function should return a Pandas DataFrame"
assert len(result) == 10, "search() function should return 10 search results"
assert result.iloc[0]["name"] == "Pancake Tower", "Top search result should be 'Pancake Tower'"
success()

Unnamed: 0,name,description
0,Pancake Tower,Let's see how many pancakes you can pile up!!
1,Cooking School: Games for Girls,"Children like to help their parents. They especially like to help with cooking . When there is a cooking in the kitchen, it is no way to play. But cooking is a complicated process and often it ends up with a huge mess in the kitchen. But what if you are so eager to cook pancakes, cake or cupcakes? How to cook all that without doing a cleaning after? We have a solution! Home Cooking School with our curious Hippo has opened especially for parents and children! We do not only cook food here. We..."
2,"Hell‚Äôs Cooking ‚Äî crazy chef burger, kitchen fever","‚≠ê ‚≠ê ‚≠ê ‚≠ê ‚≠ê New world of crazy cooking is here. Feel what it means to be a master chef who prepares fantastic fast food in a prominent king kitchen! If you haven't ever tried yourself as a hamburger chef cook, it's possibly the best time for making diner. Download and launch Hell's Cooking ‚Äî crazy chef burger, kitchen fever HD game and get prepared to jump into a fever and adventurous perfect world of burgers.\n\nNew girls game Hell's Cooking gives you lots of opportunities for your crazy cafe..."
3,Solitaire,"Solitaire Free by Solitaire Card Games is the #1 klondike solitaire games on android. The solitaire Free is popular and classic card games you know and love.\n\nWe carefully designed a fresh solitaire free modern look, woven into the wonderful solitaire classic feel that everyone loves. \n\nExperience the crisp, clear, and easy to read cards, simple and quick animations, and subtle sounds, in either landscape or portrait views. \n\nYou can move cards with a single tap or drag them to their d..."
4,Rummy - Free,"Play the famous Rummy card game on your Android Smartphone or Tablet !! \n\nPlay rummy with 2, 3, or 4 players against simulated opponents playing with high-level artificial intelligence. \nThere are a number of rules that can be modified, making this game very faithful to the original. \n\n*** MANY VARIATIONS INCLUDED *** \n\nMany rummy variations are included in the application: \n\n- From 2 to 4 players. \n- Choose the AI level of opponents. \n- Number of cards dealt to each player (from ..."
5,Sago Mini Trucks and Diggers,"Drive a dump truck with Rosie the hamster! Pile dirt high and dig deep in the ground with diggers, cranes and bulldozers. Build a home for a new friend! Choose a barn, a castle or even a cupcake-house. Don‚Äôt forget to add the finishing touches for the proud owner.\n\nOn this construction site, kids love being the boss. With six mighty machines and piles of dirt, you can build all day! Part of the award-winning suite of Sago Mini apps, this app puts kids in charge.\n\nSago Mini apps have no i..."
6,Dr. Panda's Ice Cream Truck,"Chocolate? Vanilla? Strawberry? All three!? You decide! In Dr. Panda‚Äôs Ice Cream Truck you can mix up all sorts of different flavors with cookies, chocolate, nuts and more to make the perfect ice cream‚Äîhundreds of combinations in all.\n\nScoop it!\nThese animals love ice cream, and will eat as much (or little) as you want to serve them. You can make scoops big or small and pile them as high as you want‚Äîusing any of the ice cream you‚Äôve created!\n\nToppings galore!\nUse chocolate syrup, cooki..."
7,Turbo Dismount‚Ñ¢,"The legendary crash simulator is now on Google Play!\n\nPerform death-defying motor stunts, crash into walls, create traffic pile-ups of epic scale - and share the fun!\n\nTurbo Dismount‚Ñ¢ is a kinetic tragedy about Mr. Dismount and the cars who love him. It is the official sequel to the wildly popular and immensely successful personal impact simulator - Stair Dismount‚Ñ¢. \n\nFEATURES:\n* Flinch-inducing crash physics\n* Crunchy sound effects\n* Delicious slow-mo replay system\n* Multiple vehi..."
8,UNO!‚Ñ¢,"Play the world‚Äôs number one card game like never before. UNO!‚Ñ¢ has all-new rules, tournaments, adventures and so much more! At home or on the move, jump into games instantly. Whether an UNO!‚Ñ¢ veteran or completely new, take on challenges and reap the rewards. UNO!‚Ñ¢ is the ultimate competitive family-friendly card game.\n- Play classic UNO!‚Ñ¢ or use tons of popular house rules!\n- Connect anytime, anywhere with friends from around the world! \n- Two heads are better than one in 2v2 mode. Use t..."
9,TO-FU Oh!SUSHI,"You are the veritable sushi master! Prepare your own fun sushi with ‚ÄúDaizu‚Äù the skunk!\n\nThis app is designed to allow children to be creative by decorating their original sushi.\n\nServe your delicious, mysterious or impossible sushi to the people of ‚ÄúTofu Island‚Äù! \n\nHow about creating sushi that is totally original and serve it to your beloved guests? Spice it up with tons of wasabi or even sprinkle chocolate and gummy bears for those sweet lovers.\nFeel free to make any kind of sushi y..."


Before continuing with the next problem, play around a bit with this simple search functionality by trying out different search queries, and see if the results look like what you would expect:

In [11]:
# Example ‚Äî try out your own queries!
search("dodge trains")

Unnamed: 0,name,description
0,Train Conductor World,"Master and manage the chaos of international railway traffic as the ultimate railroad tycoon. Build the rail network of your dreams; lay rails and solve the railroad puzzle with branching and forking roads at every turn. Become the richest manager and pick your path, do you optimise to the micro level, planning routes and managing the timetable, or sit idle letting your business keep earning while you sleep! \n\nGet in the driver's seat and take passengers to their destinations, dropping the..."
1,Subway Surfers,"DASH as fast as you can! \nDODGE the oncoming trains! \n\nHelp Jake, Tricky & Fresh escape from the grumpy Inspector and his dog. \n\n‚òÖ Grind trains with your cool crew! \n‚òÖ Colorful and vivid HD graphics! \n‚òÖ Hoverboard Surfing! \n‚òÖ Paint powered jetpack! \n‚òÖ Lightning fast swipe acrobatics! \n‚òÖ Challenge and help your friends! \n\nJoin the most daring chase! \n\nA Universal App with HD optimized graphics.\n\nBy Kiloo and Sybo."
2,Subway Princess Runner,"Subway princess runner, Bus run, forest rush with addictive endless running game!\nRush as fast as you can, dodge the oncoming trains and buses. Careful the rolling wood in the forest! Intuitive controls to run left or right, jump in the sky to obtain more coins, excited slide to safety!\n\nHelp your loved beautiful princess to escape the police! Use skateboard after double tapping, experience the unique board in the subway. Challenge the highest score of the rank with the world players or s..."
3,No Humanity - The Hardest Game,"2M+ Downloads All Over The World!\n\n* IGN Nominated Best Aussie/NZ game *\n* Top 5 indie games at PAX 2015 Australia ‚Äì Mashable *\n* Global Game Jam ""Best Game"" Sydney 2015 *\n* Global Game Jam ""Best Audio"" Sydney 2015 *\n\nIt's the end of the world and you are the lone survivor in a tiny spaceship. Get ready to dodge everything that is trying to kill you! Your reaction time and precision is key! No Humanity is the hardest bullet hell dodge game. Compare your score with friends and watch as..."
4,Bus Rush 2,"Bus Rush 2 is one of the most complete multiplayer runners for Android. \nRun along Rio de Janeiro and other scenarios. Drag to jump or slide and to move left or right, avoid hitting obstacles like trucks, buses and subway trains among others!\nPlay races with other users around the world in the multiplayer mode. Run around and gather all the coins you can in different scenarios from Rio city like downtown, subway, sewer, forest, different beaches, and an amazing jungle!\n\nIn Bus Rush 2, yo..."
5,Virus War - Space Shooting Game,"Warning! Virus invasion! Destroy them with your fingertip! \nis a free casual shooting game. Using only your fingertip, destroy all sorts of viruses. Remember to dodge, don‚Äôt let those filthy things hit your ship!\n*Simple and engaging gameplay. Play Virus War anywhere and anytime; get the most fun out of your breaks!\n*Equip your ship with different weapons and blast through swarms of enemies!\n*Surpass your friends in the ranking; set new records!"
6,Dancing Road: Color Ball Run!,Try out the most exciting Running - Sliding - Matching Music Game!\n\nThe rolling ball starts simply and ramps up shortly. \n\n‚òÖ Hold and drag your rolling ball to match other balls of the same color!\n‚òÖ Dodge different color balls!\n‚òÖ Try to collect all the coins and Gift Boxes on the dancing road!\n\nEnjoy the catchy music and challenges designed for each dancing road. \n\nLet's roll the ball and feel the beat in this Color Matching Game!
7,Bob - jigsaw puzzles free games for kids & parents,"Free jigsaw puzzles for kids, hundreds of puzzles for toddlers to assemble. Try now kids puzzle games for toddlers.\nJigsaw puzzles are great game for your toddler to play in waiting room or anywhere while you have to wait.\n\nFeatures:\n- kid puzzle game for free\n- unlimited number of pictures, many colorful pictures to choose from by your children\n- Perfect kid game when you are waiting in line with your children\n- 4 to 100 puzzles, various difficulty levels of jigsaw puzzles \n- jigsaw..."
8,Blocky Highway: Traffic Racing,"Blocky Highway is about racing traffic, avoiding trains, collecting cars and most importantly having fun. Collect coins, open prize boxes to get new cars and complete collections! Drive at full speed to score big and be the #1. \n\nCrash time! Control your car after crash, hit traffic cars for extra score!\n\nKey Features\n- Gorgeous voxel art graphics\n- 4 worlds to choose from\n- 55 different vehicles to drive : Taxi, Tank, Ufo, Police Car, Army 4x4, Dragster, Monster, Space Shuttle, Motor..."
9,Cat Runner: Decorate Home,"Cat runner is the best cat running game. Decorate your home for free! From the Living to bedroom or many other rooms, you can design and decorate everything with you loving!\n\nEnjoy hours of fun with your loved cat, run to collect gold coins after being robbed in this endless runner game! Explore new worlds, only racing with fast speed. go on a running adventure, dodge fast cars and trains as you go after the robber.\n\nIt is very easy to control, run as fast as you can, rush in the endless..."


## Problem 3: Custom preprocessing & tokenization

In Problem 1, you should have seen that `TfidfVectorizer` already performs some preprocessing by default and also does its own tokenization of the input data. This is great for getting started, but often we want to have more control over these steps. We can customize some aspects of the preprocessing through arguments when instantiating `TfidfVectorizer`, but for this exercise, we want to do _all_ of our preprocessing & tokenizing outside of scikit-learn.

Concretely, we want to use [spaCy](https://spacy.io), a library that we will make use of in later labs as well.  Here is a brief example of how to load and use a spaCy model:

In [12]:
import spacy
# Load the small English model, disabling some components that we don't need right now
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner', 'textcat'])

# Take an example sentence and print every token from it separately
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    print(token.text)

Apple
is
looking
at
buying
U.K.
startup
for
$
1
billion


**Your task** is to write a preprocessing function that uses spaCy to perform the following steps:
- tokenization
- lemmatization
- stop word removal
- removing tokens containing non-alphabetical characters

We recommend that you go through the [Linguistic annotations](https://spacy.io/usage/spacy-101#annotations) section of the spaCy&nbsp;101, which demonstrates how you can get the relevant kind of information via the spaCy library.

Implement your preprocessor by completing the following function:

In [13]:
def preprocess(text):
    """Preprocess the given text by tokenising it, removing any stop words, 
    replacing each remaining token with its lemma (base form), and discarding 
    all lemmas that contain non-alphabetical characters.

    Arguments:
      text (str): The text to preprocess.

    Returns:
      The list of remaining lemmas after preprocessing (represented as strings).
    """
    # YOUR CODE HERE
    # text to tokens
    tokens = nlp(text)
    # the functions below cannot apply on str
    # print(type(tokens[0]))
    # lemma_ -> lemmatization
    # is_stop -> stop word removal
    # is_alpha -> remove non-letter words
    processed_tokens = [token.lemma_ for token in tokens if token.is_alpha and not token.is_stop]
    return processed_tokens
  

### ü§û Test your code

Test your implementation by running the following cell:

In [14]:
"""Check that the preprocessing returns the correct output for a number of test cases."""

assert (
    preprocess('Apple is looking at buying U.K. startup for $1 billion') ==
    ['Apple', 'look', 'buy', 'startup', 'billion']
)
assert (
    preprocess('"Love Story" is a country pop song written and sung by Taylor Swift.') ==
    ['Love', 'Story', 'country', 'pop', 'song', 'write', 'sing', 'Taylor', 'Swift']
)
success()

## Problem 4: The effect of preprocessing

To make use of the new `preprocess` function from Problem 3, we need to make sure that we incorporate it into `TfidfVectorizer` and disable all preprocessing & tokenization that `TfidfVectorizer` performs by default. Afterwards, we also need to re-fit the vectorizer and the nearest-neighbors class. To make this a bit easier to handle, let‚Äôs take everything we have done so far and put it in a single class `AppSearcher`.

### Task 4.1

**Your first task** is to complete the stub of the `AppSearcher` class given below. Keep in mind:
- The `fit()` function should fit both the vectorizer (from Problem 1) and the nearest-neighbors class (from Problem 2).  Make sure to modify the call to `TfidfVectorizer` to _disable all preprocessing & tokenization_ that it would do by default, and replace it with a call to the `preprocess()` function _defined in `AppSearcher`_.
- For the `preprocess()` function, you can start by copying your solution from Problem 3.
- For the `search()` function, you can copy your solution from Problem 2.
- Make sure to adapt your code to store the everything (data, vectorizer, nearest-neighbors class) within the `AppSearcher` class, so that your solution is independent of the code you wrote above!

In [None]:
class AppSearcher:
    def fit(self, df, reults_num=10):
        """Instantiate and fit all the classes required for the search engine (cf. Problems 1 and 2)."""
        self.df = df
        # fit vectorizer
        vectorizer = TfidfVectorizer()
        feature_matrix = vectorizer.fit_transform(df['description'])
        self.vectorizer = vectorizer
        # fit nearest neighbor
        nn = NearestNeighbors(n_neighbors=reults_num, metric='cosine')  
        nn.fit(feature_matrix)
        self.nn = nn

    def preprocess(self, text, lemma=True, lowercase=True, stopwords_removal=True, nonletter_words=True):
        """
        Preprocess the given text (cf. Problem 3).
        Tasks include:
            -  
            - lowercasing all characters
            - removing stop words
            - removing tokens containing non-alphabetical characters
        """
        # YOUR CODE HERE
        # load the model
        nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner', 'textcat'])
        doc = nlp(text)
        # to prevent the type(outcome) from last round becomes str
        # cope each token with full process
        tokens = []
        for token in doc:
            if nonletter_words and not token.is_alpha:
                continue
            if stopwords_removal and token.is_stop:
                continue

            tok = token.lemma_ if lemma else token.text
            if lowercase:
                tok = tok.lower()

            tokens.append(tok)

        result = " ".join(tokens)
        return result

    def search(self, query, lemma=True, lowercase=True, stopwords_removal=True, nonletter_words=True):
        """Find the nearest neighbors in `df` for a query string (cf. Problem 2)."""
        # YOUR CODE HERE
        tokens = self.preprocess(query, lemma, lowercase, stopwords_removal, nonletter_words)
        encoded_query = self.vectorizer.transform([tokens]).toarray()
        _, indices = self.nn.kneighbors(encoded_query)
        indices = indices.ravel()
        result_pd = pd.DataFrame({"name": df['name'].loc[indices], "description": df['description'].loc[indices]})
        return result_pd.reset_index(drop=True)


#### ü§û Test your code

The following cell demonstrates how your class should be used. Note that it can take a bit longer to train it on the data as before, since we‚Äôre now calling spaCy for the preprocessing.

In [16]:
apps = AppSearcher()
apps.fit(df)
apps.search("pile up pancakes")

Unnamed: 0,name,description
0,Pancake Tower,Let's see how many pancakes you can pile up!!
1,Solitaire,"Solitaire Free by Solitaire Card Games is the #1 klondike solitaire games on android. The solitaire Free is popular and classic card games you know and love.\n\nWe carefully designed a fresh solitaire free modern look, woven into the wonderful solitaire classic feel that everyone loves. \n\nExperience the crisp, clear, and easy to read cards, simple and quick animations, and subtle sounds, in either landscape or portrait views. \n\nYou can move cards with a single tap or drag them to their d..."
2,Rummy - Free,"Play the famous Rummy card game on your Android Smartphone or Tablet !! \n\nPlay rummy with 2, 3, or 4 players against simulated opponents playing with high-level artificial intelligence. \nThere are a number of rules that can be modified, making this game very faithful to the original. \n\n*** MANY VARIATIONS INCLUDED *** \n\nMany rummy variations are included in the application: \n\n- From 2 to 4 players. \n- Choose the AI level of opponents. \n- Number of cards dealt to each player (from ..."
3,Sago Mini Trucks and Diggers,"Drive a dump truck with Rosie the hamster! Pile dirt high and dig deep in the ground with diggers, cranes and bulldozers. Build a home for a new friend! Choose a barn, a castle or even a cupcake-house. Don‚Äôt forget to add the finishing touches for the proud owner.\n\nOn this construction site, kids love being the boss. With six mighty machines and piles of dirt, you can build all day! Part of the award-winning suite of Sago Mini apps, this app puts kids in charge.\n\nSago Mini apps have no i..."
4,Turbo Dismount‚Ñ¢,"The legendary crash simulator is now on Google Play!\n\nPerform death-defying motor stunts, crash into walls, create traffic pile-ups of epic scale - and share the fun!\n\nTurbo Dismount‚Ñ¢ is a kinetic tragedy about Mr. Dismount and the cars who love him. It is the official sequel to the wildly popular and immensely successful personal impact simulator - Stair Dismount‚Ñ¢. \n\nFEATURES:\n* Flinch-inducing crash physics\n* Crunchy sound effects\n* Delicious slow-mo replay system\n* Multiple vehi..."
5,Dr. Panda's Ice Cream Truck,"Chocolate? Vanilla? Strawberry? All three!? You decide! In Dr. Panda‚Äôs Ice Cream Truck you can mix up all sorts of different flavors with cookies, chocolate, nuts and more to make the perfect ice cream‚Äîhundreds of combinations in all.\n\nScoop it!\nThese animals love ice cream, and will eat as much (or little) as you want to serve them. You can make scoops big or small and pile them as high as you want‚Äîusing any of the ice cream you‚Äôve created!\n\nToppings galore!\nUse chocolate syrup, cooki..."
6,Solitaire Free,"Solitaire by Gemego is the card game you know and love for your phone and tablet. Our Solitaire is beautifully designed with a simple interface to help you enjoy this classic game. \n\nOur Solitaire has the best card movement on the market. You don't need to select a specific card in a pile unlike other Solitaire games. \n\nFeatures\n‚òÖ Instructions - an overview of the rules of Solitaire\n‚òÖ Winning deals (random) - unlike any other Solitaire! \n‚òÖ One Card, Three Card and Vegas style games\n‚òÖ..."
7,Dr. Panda Ice Cream Truck Free,"Dr. Panda Ice Cream Truck is FREE for you to play!\n\nChocolate? Vanilla? Strawberry? All three!? You decide! In Dr. Panda Ice Cream Truck you can mix up all sorts of different flavors with cookies, chocolate, nuts and more to make the perfect ice cream‚Äîhundreds of combinations in all.\n\nScoop it!\nThese animals love ice cream, and will eat as much (or little) as you want to serve them. You can make scoops big or small and pile them as high as you want‚Äîusing any of the ice cream you‚Äôve crea..."
8,TO-FU Oh!SUSHI,"You are the veritable sushi master! Prepare your own fun sushi with ‚ÄúDaizu‚Äù the skunk!\n\nThis app is designed to allow children to be creative by decorating their original sushi.\n\nServe your delicious, mysterious or impossible sushi to the people of ‚ÄúTofu Island‚Äù! \n\nHow about creating sushi that is totally original and serve it to your beloved guests? Spice it up with tons of wasabi or even sprinkle chocolate and gummy bears for those sweet lovers.\nFeel free to make any kind of sushi y..."
9,UNO!‚Ñ¢,"Play the world‚Äôs number one card game like never before. UNO!‚Ñ¢ has all-new rules, tournaments, adventures and so much more! At home or on the move, jump into games instantly. Whether an UNO!‚Ñ¢ veteran or completely new, take on challenges and reap the rewards. UNO!‚Ñ¢ is the ultimate competitive family-friendly card game.\n- Play classic UNO!‚Ñ¢ or use tons of popular house rules!\n- Connect anytime, anywhere with friends from around the world! \n- Two heads are better than one in 2v2 mode. Use t..."


### Task 4.2

**Your second task** is to experiment with the effect of using (or not using) different preprocessing steps.  We always need to _tokenize_ the text, but other preprocessing steps are optional and require a conscious decision whether to use them or not, such as:
- lemmatization
- lowercasing all characters
- removing stop words
- removing tokens containing non-alphabetical characters

**Modify the definition of the `preprocess()` function** of `AppSearcher` to include/exclude individual preprocessing steps, run some searches, and observe if and how the results change.  Which search queries you try out is up to you ‚Äî you could compare searching for "pile up pancakes" with "pancake piling", for example; or you could try entirely different search queries aimed at different kinds of apps.  (You can modify the class directly by changing the cell above under Task 4.1, or copy the definitions to the cells below, whichever you prefer; there is no separate code to show for this task, but you will use your observations here for the individual reflection.)

In [17]:
# if I use the previous query, whose reult is existed in the list
# probably hardy detect difference 
query = "Track the Best AI-based Budget planner for Students v2.0!"
apps.fit(df, reults_num=5)
apps.search(query, lemma=True, lowercase=True, stopwords_removal=True, nonletter_words=True)


Unnamed: 0,name,description
0,"Lifesum - Diet Plan, Macro Calculator & Food Diary","Diet plan, food diary, macro calculator, calorie counter & healthy recipes, all in one convenient place. Reach your goals with food tracker Lifesum!\n\nKeto diet, high protein or maybe vegan? We‚Äôll help you find a diet plan best suited to you. Need a health tracker to keep you going? Fear not. We‚Äôve got an intuitive food diary, macro calculator & diet tips to give you a helping hand.\n\nMeal planner & macro tracker - TOP Lifesum features:\n‚óè Diet plan & diet tips for any goal - lose weight &..."
1,Instant Match For Tinder,"What is ""Instant Match""?\nAutomated tinder match machine!\nInterested in Filter\nAi auto like.\nAi auto super like.\nAuto first message. ( optional )\nView people as a list. Not 1 by 1.\nLocation modifier.\n\nYou need the ""Instant Match"" right now!\nThis is why:\n- See tinder users as a list. Not 1 by 1.\n- Skip the inactive profiles.\n- Stop wasting your limited likes.\n- Well trained AI for auto-likes.\n- Optional auto-message to start the conversation.\n- Auto-super like on most probable ..."
2,2D Strike,"This is a fun shooter with a view from above. Use grenades, machine guns, RPGs or flamethrowers against your enemies!\n- Multiplayer Online or via Wi-Fi.\n- 12 game modes: Team game, Battle Royale, Ghost, Base defense, Flag capture, Bugs attack, Zombie - infection and others.\n- Cards with a variety of destructible objects.\n- More than 20 kinds of different weapons.\n- Put the turrets and the barricade system to strengthen your base.\n- Up to 20 players online on a single map (up to 50 in B..."
3,Complete Anatomy Platform 2020,"*** TRY IT FOR FREE! ***\n\nRequires 1.5 GB storage. For updates, it's recommended to have at least 3 GB storage available.\n\nThe world‚Äôs most accurate, most advanced and best-selling 3D anatomy platform with groundbreaking new technology, models and content. Not just an atlas, but an anatomy learning platform with unique collaboration and learning tools.\n\nUsed by 250 of the world‚Äôs top universities, including 6 US Ivy League schools, 20 of the world‚Äôs top 25 ranked medical schools, 7 of ..."
4,Durak | –î—É—Ä–∞–∫,"""Durak"" or ""–î—É—Ä–∞–∫"" is the most popular Russian card game. This game allows to play with 1 to 3 AI players. It's also been know as ""v duraka"" or "" –í –¥—É—Ä–∞–∫–∞"". \n\nFeatures:\n- sounds\n- vibration (e.g. it will notify you when it's your turn)\n- sort cards by rank or/and by suit\n- buy full version to disable ads\n- big all small symbols on cards\n- animations (can be turned off)\n- 1 to 3 AI players to play against you\n- and more...\n\nWe are still working on the development and will and more..."


In [18]:
apps.search(query, lemma=False, lowercase=True, stopwords_removal=True, nonletter_words=True)


Unnamed: 0,name,description
0,Durak | –î—É—Ä–∞–∫,"""Durak"" or ""–î—É—Ä–∞–∫"" is the most popular Russian card game. This game allows to play with 1 to 3 AI players. It's also been know as ""v duraka"" or "" –í –¥—É—Ä–∞–∫–∞"". \n\nFeatures:\n- sounds\n- vibration (e.g. it will notify you when it's your turn)\n- sort cards by rank or/and by suit\n- buy full version to disable ads\n- big all small symbols on cards\n- animations (can be turned off)\n- 1 to 3 AI players to play against you\n- and more...\n\nWe are still working on the development and will and more..."
1,"Lifesum - Diet Plan, Macro Calculator & Food Diary","Diet plan, food diary, macro calculator, calorie counter & healthy recipes, all in one convenient place. Reach your goals with food tracker Lifesum!\n\nKeto diet, high protein or maybe vegan? We‚Äôll help you find a diet plan best suited to you. Need a health tracker to keep you going? Fear not. We‚Äôve got an intuitive food diary, macro calculator & diet tips to give you a helping hand.\n\nMeal planner & macro tracker - TOP Lifesum features:\n‚óè Diet plan & diet tips for any goal - lose weight &..."
2,Instant Match For Tinder,"What is ""Instant Match""?\nAutomated tinder match machine!\nInterested in Filter\nAi auto like.\nAi auto super like.\nAuto first message. ( optional )\nView people as a list. Not 1 by 1.\nLocation modifier.\n\nYou need the ""Instant Match"" right now!\nThis is why:\n- See tinder users as a list. Not 1 by 1.\n- Skip the inactive profiles.\n- Stop wasting your limited likes.\n- Well trained AI for auto-likes.\n- Optional auto-message to start the conversation.\n- Auto-super like on most probable ..."
3,"Todoist: To-Do List, Tasks & Reminders","üèÜ 2019 Editor's Choice by Google\nü•á ""Todoist is the best to-do list right now"" ‚Äî The Verge\n\nTodoist is a to-do list that helps 20 million people and teams organize, plan, and collaborate on projects, both big and small.\n\n‚ú® The Work and Life Organizer:\nTodoist will help you stay focused and organized with:\n‚Üí Project management: Collaborate with colleagues. Assign tasks to stay on track with projects.\n‚Üí Lists: Use projects for a new to-do list, checklist, reading list, bucket list, wish..."
4,Tic Tac Toe Glow,Play Tic Tac Toe on your Android phone. No need waste paper to play puzzle games! Now you can play Tic Tac Toe on your Android device for free. Our new modern version appears in a cool glow design. \n\nThe AI for this puzzle game is one of the best you will see. It adapts to your play style and is highly unpredictable. Unlike other Tic Tac Toe games on the market you will always find Glow Tic Tac Toe AI to be fresh and entertaining. If that is not all the AI skill can be adjusted on the fly ...


In [19]:
apps.search(query, lemma=True, lowercase=False, stopwords_removal=True, nonletter_words=True)


Unnamed: 0,name,description
0,"Lifesum - Diet Plan, Macro Calculator & Food Diary","Diet plan, food diary, macro calculator, calorie counter & healthy recipes, all in one convenient place. Reach your goals with food tracker Lifesum!\n\nKeto diet, high protein or maybe vegan? We‚Äôll help you find a diet plan best suited to you. Need a health tracker to keep you going? Fear not. We‚Äôve got an intuitive food diary, macro calculator & diet tips to give you a helping hand.\n\nMeal planner & macro tracker - TOP Lifesum features:\n‚óè Diet plan & diet tips for any goal - lose weight &..."
1,Instant Match For Tinder,"What is ""Instant Match""?\nAutomated tinder match machine!\nInterested in Filter\nAi auto like.\nAi auto super like.\nAuto first message. ( optional )\nView people as a list. Not 1 by 1.\nLocation modifier.\n\nYou need the ""Instant Match"" right now!\nThis is why:\n- See tinder users as a list. Not 1 by 1.\n- Skip the inactive profiles.\n- Stop wasting your limited likes.\n- Well trained AI for auto-likes.\n- Optional auto-message to start the conversation.\n- Auto-super like on most probable ..."
2,2D Strike,"This is a fun shooter with a view from above. Use grenades, machine guns, RPGs or flamethrowers against your enemies!\n- Multiplayer Online or via Wi-Fi.\n- 12 game modes: Team game, Battle Royale, Ghost, Base defense, Flag capture, Bugs attack, Zombie - infection and others.\n- Cards with a variety of destructible objects.\n- More than 20 kinds of different weapons.\n- Put the turrets and the barricade system to strengthen your base.\n- Up to 20 players online on a single map (up to 50 in B..."
3,Complete Anatomy Platform 2020,"*** TRY IT FOR FREE! ***\n\nRequires 1.5 GB storage. For updates, it's recommended to have at least 3 GB storage available.\n\nThe world‚Äôs most accurate, most advanced and best-selling 3D anatomy platform with groundbreaking new technology, models and content. Not just an atlas, but an anatomy learning platform with unique collaboration and learning tools.\n\nUsed by 250 of the world‚Äôs top universities, including 6 US Ivy League schools, 20 of the world‚Äôs top 25 ranked medical schools, 7 of ..."
4,Durak | –î—É—Ä–∞–∫,"""Durak"" or ""–î—É—Ä–∞–∫"" is the most popular Russian card game. This game allows to play with 1 to 3 AI players. It's also been know as ""v duraka"" or "" –í –¥—É—Ä–∞–∫–∞"". \n\nFeatures:\n- sounds\n- vibration (e.g. it will notify you when it's your turn)\n- sort cards by rank or/and by suit\n- buy full version to disable ads\n- big all small symbols on cards\n- animations (can be turned off)\n- 1 to 3 AI players to play against you\n- and more...\n\nWe are still working on the development and will and more..."


In [20]:
apps.search(query, lemma=True, lowercase=True, stopwords_removal=False, nonletter_words=True)


Unnamed: 0,name,description
0,Last Shelter: Survival,"Tons of Rewards, Limited Packages, Don't wait, ACT!\n\nThey‚Äôre Here‚Ä¶..\n\nYour Mission, Survive\n\nThe war ravages\nThe virus has spread on a global scale, and most of the humans are now zombies.In this dire time, we have to ensure our survival!\n‚ÄúCommander, build your base, and protect the people from zombies!‚Äù\nKeep the humanity‚Äôs flame lit!\n\nGame content\n\nBuild your city: To survive, you have to build your base\nYou start small, but you will grow fast, you have to be strong, so you ca..."
1,World on Fire,"In the fight for power and glory, a global war between opposing armies is ongoing with no end in sight.\n\nWith debris falling and fire blazing across the skies, a crumbling world has been left in utter ruins. But the war has only just begun.\n\nRuined bases to be rebuilt! Armies are waiting for your command! Lost territories to be recovered!\n\nForm alliances with other united commanders from around the world. Research new technologies. Destroy the rebels. Exterminate your enemies. Become t..."
2,Complete Anatomy Platform 2020,"*** TRY IT FOR FREE! ***\n\nRequires 1.5 GB storage. For updates, it's recommended to have at least 3 GB storage available.\n\nThe world‚Äôs most accurate, most advanced and best-selling 3D anatomy platform with groundbreaking new technology, models and content. Not just an atlas, but an anatomy learning platform with unique collaboration and learning tools.\n\nUsed by 250 of the world‚Äôs top universities, including 6 US Ivy League schools, 20 of the world‚Äôs top 25 ranked medical schools, 7 of ..."
3,"Lifesum - Diet Plan, Macro Calculator & Food Diary","Diet plan, food diary, macro calculator, calorie counter & healthy recipes, all in one convenient place. Reach your goals with food tracker Lifesum!\n\nKeto diet, high protein or maybe vegan? We‚Äôll help you find a diet plan best suited to you. Need a health tracker to keep you going? Fear not. We‚Äôve got an intuitive food diary, macro calculator & diet tips to give you a helping hand.\n\nMeal planner & macro tracker - TOP Lifesum features:\n‚óè Diet plan & diet tips for any goal - lose weight &..."
4,Rocket War: Clash in the Fog,"A choice for those who love massive firepower!Feel the joy of bombing in‚ÄúRocket War: Clash in the Fog‚Äù!\n\n[Introduction]\nThe enemy base is covered in Dark Fog when the battle starts. \nYour first strike might decide the outcome of the battle,\nor your strategic maneuver or bombing might turn the tide!\n\nEnjoy exciting battles in the fog with your fingertips!\n\n‚ñ∂ Upgrade your weapons and units and execute your plans!\nThere are plenty of awesome weapons and units, including Laser Girl, Dr..."


In [21]:
apps.search(query, lemma=True, lowercase=True, stopwords_removal=True, nonletter_words=False)

Unnamed: 0,name,description
0,"Lifesum - Diet Plan, Macro Calculator & Food Diary","Diet plan, food diary, macro calculator, calorie counter & healthy recipes, all in one convenient place. Reach your goals with food tracker Lifesum!\n\nKeto diet, high protein or maybe vegan? We‚Äôll help you find a diet plan best suited to you. Need a health tracker to keep you going? Fear not. We‚Äôve got an intuitive food diary, macro calculator & diet tips to give you a helping hand.\n\nMeal planner & macro tracker - TOP Lifesum features:\n‚óè Diet plan & diet tips for any goal - lose weight &..."
1,Instant Match For Tinder,"What is ""Instant Match""?\nAutomated tinder match machine!\nInterested in Filter\nAi auto like.\nAi auto super like.\nAuto first message. ( optional )\nView people as a list. Not 1 by 1.\nLocation modifier.\n\nYou need the ""Instant Match"" right now!\nThis is why:\n- See tinder users as a list. Not 1 by 1.\n- Skip the inactive profiles.\n- Stop wasting your limited likes.\n- Well trained AI for auto-likes.\n- Optional auto-message to start the conversation.\n- Auto-super like on most probable ..."
2,2D Strike,"This is a fun shooter with a view from above. Use grenades, machine guns, RPGs or flamethrowers against your enemies!\n- Multiplayer Online or via Wi-Fi.\n- 12 game modes: Team game, Battle Royale, Ghost, Base defense, Flag capture, Bugs attack, Zombie - infection and others.\n- Cards with a variety of destructible objects.\n- More than 20 kinds of different weapons.\n- Put the turrets and the barricade system to strengthen your base.\n- Up to 20 players online on a single map (up to 50 in B..."
3,Complete Anatomy Platform 2020,"*** TRY IT FOR FREE! ***\n\nRequires 1.5 GB storage. For updates, it's recommended to have at least 3 GB storage available.\n\nThe world‚Äôs most accurate, most advanced and best-selling 3D anatomy platform with groundbreaking new technology, models and content. Not just an atlas, but an anatomy learning platform with unique collaboration and learning tools.\n\nUsed by 250 of the world‚Äôs top universities, including 6 US Ivy League schools, 20 of the world‚Äôs top 25 ranked medical schools, 7 of ..."
4,Durak | –î—É—Ä–∞–∫,"""Durak"" or ""–î—É—Ä–∞–∫"" is the most popular Russian card game. This game allows to play with 1 to 3 AI players. It's also been know as ""v duraka"" or "" –í –¥—É—Ä–∞–∫–∞"". \n\nFeatures:\n- sounds\n- vibration (e.g. it will notify you when it's your turn)\n- sort cards by rank or/and by suit\n- buy full version to disable ads\n- big all small symbols on cards\n- animations (can be turned off)\n- 1 to 3 AI players to play against you\n- and more...\n\nWe are still working on the development and will and more..."


In [None]:
# code here for supporting reflection
# question 1
# for description: "Let's see how many pancakes you can pile up!!"
max_ = np.argmax(vec)
min_ = np.argmin(vec)

max_token = sorted_vocab[max_][0]
min_token = sorted_vocab[min_][0]

print(max_token)
print(min_token)



pancakes
00
tsuki
applikation


## Individual reflection

<div class="alert alert-info">
    <strong>After you have solved the lab,</strong> write a <em>brief</em> reflection (max. one A4 page) on the question(s) below.  Remember:
    <ul>
        <li>You are encouraged to discuss this part with your lab partner, but you should each write up your reflection <strong>individually</strong>.</li>
        <li><strong>Do not put your answers in the notebook</strong>; upload them in the separate submission opportunity for the reflections on Lisam.</li>
    </ul>
</div>

1. In Problem 1, which token had the highest tf‚Äìidf score, which the lowest?  Based on your knowledge of how tf‚Äìidf works, how would you explain this result?
2. Based on your observations in Problem 4, which preprocessing steps do you think are the most appropriate for this "search engine" example?  Why?

**Congratulations on finishing this lab! üëç**

<div class="alert alert-info">
    
‚û°Ô∏è Before you submit, **make sure the notebook can be run from start to finish** without errors.  For this, _restart the kernel_ and _run all cells_ from top to bottom. In Jupyter Notebook version 7 or higher, you can do this via "Run$\rightarrow$Restart Kernel and Run All Cells..." in the menu (or the "‚è©" button in the toolbar).

</div>