<h1 id="tocheading">Table of Contents</h1>
<br />
<div id="toc"><ul class="toc"><li><a href="#1.-Quick-look-at-the-dataset">1. Quick look at the dataset</a><a class="anchor-link" href="#1.-Quick-look-at-the-dataset">¶</a></li><ul class="toc"><li><a href="#1.1.-First-few-rows">1.1. First few rows</a><a class="anchor-link" href="#1.1.-First-few-rows">¶</a></li><li><a href="#1.2.-Data-formats">1.2. Data formats</a><a class="anchor-link" href="#1.2.-Data-formats">¶</a></li></ul><li><a href="#2.-Cleaning-the-data">2. Cleaning the data</a><a class="anchor-link" href="#2.-Cleaning-the-data">¶</a></li><ul class="toc"><li><a href="#2.1.-Column-names">2.1. Column names</a><a class="anchor-link" href="#2.1.-Column-names">¶</a></li><li><a href="#2.2.-Normalise-columns">2.2. Normalise columns</a><a class="anchor-link" href="#2.2.-Normalise-columns">¶</a></li><ul class="toc"><li><a href="#2.2.1.-Question,-Answer-and-Category--columns">2.2.1. <code>Question</code>, <code>Answer</code> and <code>Category</code>  columns</a><a class="anchor-link" href="#2.2.1.-Question,-Answer-and-Category--columns">¶</a></li><li><a href="#2.2.2.-Value-column">2.2.2. <code>Value</code> column</a><a class="anchor-link" href="#2.2.2.-Value-column">¶</a></li><li><a href="#2.2.3.-Air-Date-column">2.2.3. <code>Air Date</code> column</a><a class="anchor-link" href="#2.2.3.-Air-Date-column">¶</a></li></ul></ul><li><a href="#3.-What-could-you-potentially-focus-on-to-prepare?">3. What could you potentially focus on to prepare?</a><a class="anchor-link" href="#3.-What-could-you-potentially-focus-on-to-prepare?">¶</a></li><ul class="toc"><li><a href="#3.1.-Words-appearing-in-questions">3.1. Words appearing in questions</a><a class="anchor-link" href="#3.1.-Words-appearing-in-questions">¶</a></li><ul class="toc"><li><a href="#3.1.1.-How-often-is-the-answer-deducible-from-the-question?">3.1.1. How often is the answer deducible from the question?</a><a class="anchor-link" href="#3.1.1.-How-often-is-the-answer-deducible-from-the-question?">¶</a></li><li><a href="#3.1.2.-What-are-the-words-which-recur-in-the-questions-most-often?">3.1.2. What are the words which recur in the questions most often?</a><a class="anchor-link" href="#3.1.2.-What-are-the-words-which-recur-in-the-questions-most-often?">¶</a></li><li><a href="#3.1.3.-Which-words-in-each-question-will-lead-to-better-chance-of-winning-bigger-reward?">3.1.3. Which words in each question will lead to better chance of winning bigger reward?</a><a class="anchor-link" href="#3.1.3.-Which-words-in-each-question-will-lead-to-better-chance-of-winning-bigger-reward?">¶</a></li></ul><li><a href="#3.2.-Categories">3.2. Categories</a><a class="anchor-link" href="#3.2.-Categories">¶</a></li><ul class="toc"><li><a href="#3.2.1.-What-are-the-most-recurring-categories?">3.2.1. What are the most recurring categories?</a><a class="anchor-link" href="#3.2.1.-What-are-the-most-recurring-categories?">¶</a></li><li><a href="#3.2.2.-Which-categories-will-lead-to-better-chance-of-winning-bigger-reward?">3.2.2. Which categories will lead to better chance of winning bigger reward?</a><a class="anchor-link" href="#3.2.2.-Which-categories-will-lead-to-better-chance-of-winning-bigger-reward?">¶</a></li></ul></ul><li><a href="#4.-So,-should-you-prepare-for-it-if-you-are-to-participate-in-Jeopardy!-?">4. So, should you prepare for it if you are to participate in <em><code>Jeopardy!</code></em> ?</a><a class="anchor-link" href="#4.-So,-should-you-prepare-for-it-if-you-are-to-participate-in-Jeopardy!-?">¶</a></li><li><a href="#5.-Tasks-for-later">5. Tasks for later</a><a class="anchor-link" href="#5.-Tasks-for-later">¶</a></li></ul></div>

Project guide: https://www.dataquest.io/m/210/guided-project%3A-winning-jeopardy

Solution by DataQuest: https://github.com/dataquestio/solutions/blob/master/Mission210Solution.ipynb



This project uses data on American game show [Jeopardy!](https://en.wikipedia.org/wiki/Jeopardy!). The big questions I will ask is whether it is feasible to prepare for it and, if so, what to focus on.

`jeopardy.csv` dataset will be used which has been downloaded from DataQuest. It is a sample (19999 entries) from the whole dataset (216930 entries) at https://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/.

The sample contains only about 9% of the whole dataset. Therefore, any observation made here cannot be safely applied to the whole dataset.

# 1. Quick look at the dataset

## 1.1. First few rows

Below are the first five rows of dataset.

In [1]:
from IPython.display import display
import pandas as pd

# read in dataset
jeopardy = pd.read_csv("jeopardy.csv")

# display first 5 rows of dataset
display(jeopardy.head(5))

print("Number of rows:", jeopardy.shape[0])

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was ...",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,No. 2: 1912 Olympian; football star at Carlisl...,Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,The city of Yuma in this state has a record av...,Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", th...",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Co...",John Adams


Number of rows: 19999


## 1.2. Data formats
Most columns have object (string) data type except `Show Number` column which contains integers.

In [2]:
print("Data types for each column")

for c in jeopardy.columns:
    print("{}: {}".format(c, jeopardy[c].dtype))

Data types for each column
Show Number: int64
 Air Date: object
 Round: object
 Category: object
 Value: object
 Question: object
 Answer: object


# 2. Cleaning the data
## 2.1. Column names

Following are the description for all columns (as shown on DataQuest).

*   `Show Number` \-\- the Jeopardy episode number of the show this question was in.
*   `Air Date` \-\- the date the episode aired.
*   `Round` \-\- the round of Jeopardy that the question was asked in. Jeopardy has several rounds as each episode progresses.
*   `Category` \-\- the category of the question.
*   `Value` \-\- the number of dollars answering the question correctly is worth.
*   `Question` \-\- the text of the question.
*   `Answer` \-\- the text of the answer.

Some column names have leading whitespaces, so they will be removed now.

In [3]:
# show original column names
print("Original column names:\n", jeopardy.columns.tolist())
print()

# remove leading white spaces
jeopardy.columns = jeopardy.columns.str.lstrip()

# show updated column names
print("Updated column names:\n", jeopardy.columns.tolist())
print("\n")

Original column names:
 ['Show Number', ' Air Date', ' Round', ' Category', ' Value', ' Question', ' Answer']

Updated column names:
 ['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question', 'Answer']




## 2.2. Normalise columns

Now, the data will be converted so that I can work with them.

### 2.2.1. `Question`, `Answer` and `Category`  columns

Items in `Category` column will only be capitalised because infomration will be lost if thoroughly processed.

For `Question` and `Answer` columns, HTML tags, punctuation and [stop words](https://en.wikipedia.org/wiki/Stop_words) will be removed. Also, all characters will be lowercased.

Note that unfortunately the stopwords list from [NLTK library](http://www.nltk.org/) filtering is not perfect, I have not added missing words due to time contraint. Words such as `They've` will not be filtered, but converted into `theyve`.

In [4]:
from nltk.corpus import stopwords
from pprint import pprint
from string import punctuation
from bs4 import BeautifulSoup

import re

def normalise_text(text, thorough=True):
    """
    Strip html tags from text,
    and lowercase text.
    
    If "thorough" is set to be True, also
    remove double qoutes from text,
    exclude stopwords then
    remove punctuation from remaining text.
    Lastly, remove repeated words.
    
    Return output as a list of words.
    """

    # remove HTML tags from text
    t = BeautifulSoup(text, "lxml").get_text()

    # lowercase text
    t = t.lower() if thorough else t.upper()
    
    if thorough:
    
        # remove double quotation marks from text
        t = re.sub("\"", " ", t)

        # load stopwords
        sw = stopwords.words('english')

        # set up punctuation remover
        translator = str.maketrans("", "", punctuation)

        # initiate normalised word list
        t_norm = []

        # for each word which is not a stopword
        for word in t.split(" "):
            if word not in sw:

                # remove punctuation
                w = word.translate(translator)

                # if word is not null and still not a stopword
                if (w != "") and (w not in sw):

                    # add it to normalised list
                    t_norm.append(w)

    # either return a lower-cased string
    # or return a list of further processed text with repeated words removed
    return list(set(t_norm)) if thorough else t

# show example of original text
print("Original text examples")
print(jeopardy["Question"].iloc[1])
print(jeopardy["Answer"].iloc[1])
print(jeopardy["Category"].iloc[1])
print()

# normalise text
jeopardy["clean_question"] = jeopardy["Question"].apply(normalise_text, args=(True,))
jeopardy["clean_answer"] = jeopardy["Answer"].apply(normalise_text, args=(True,))
jeopardy["clean_category"] = jeopardy["Category"].apply(normalise_text, args=(False,))

# show example of updated text
print("Updated text example")
print(jeopardy["clean_question"].iloc[1])
print(jeopardy["clean_answer"].iloc[1])
print(jeopardy["clean_category"].iloc[1])

Original text examples
No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves
Jim Thorpe
ESPN's TOP 10 ALL-TIME ATHLETES

Updated text example
['reds', 'carlisle', 'indian', '2', 'school', 'olympian', 'mlb', '1912', 'football', 'star', 'braves', 'giants', '6', 'seasons']
['thorpe', 'jim']
ESPN'S TOP 10 ALL-TIME ATHLETES


### 2.2.2. `Value` column

`$` symbol will be removed. Then, its format will be converted from string to integer. "None" will be converted to 0.

In [5]:
import re

def normalise_value(value):
    """
    Remove "$" and "," from value, and
    convert it it into integer.
    
    If value is "None",
    convert it into 0.
    """
    
    # remove "," and "$"
    output_1 = re.sub(",|\$", "", value)
    
    # Change "None" to "0"
    output_2 = re.sub("None", "0", output_1)
    
    return int(output_2)


# show examples of original value
print("Original value examples")
print(jeopardy["Value"].loc[22])
print(jeopardy["Value"].loc[19987])
print()

# noremalise value
jeopardy["clean_value"] = jeopardy["Value"].apply(normalise_value)

# show example of updated text
print("Updated value examples")
print(jeopardy["clean_value"].loc[22])
print(jeopardy["clean_value"].loc[19987])
print()

Original value examples
$2,000
None

Updated value examples
2000
0



### 2.2.3. `Air Date` column

Strings will be converted to `datetime64` format.

This column has no use in this project, but I will do it just to demonstrate it.

In [6]:
# convert string to datetime64
jeopardy["Air Date"] = pd.to_datetime(jeopardy["Air Date"])

# display first 5 rows of converted version
print(jeopardy["Air Date"].head(5))

0   2004-12-31
1   2004-12-31
2   2004-12-31
3   2004-12-31
4   2004-12-31
Name: Air Date, dtype: datetime64[ns]


# 3. What could you potentially focus on to prepare?

If you are to participate in the show, perhaps you could prepare. Or perhaps not?

I will tackle this question by first checking whether any of two potential foci will be useful: **words appearing in questions** or **categories**.

## 3.1. Words appearing in questions

### 3.1.1. How often is the answer deducible from the question?

Could it be that the questions are so obvious no preparation is needed at all? Perhaps part of the answer is given in the question? To answer this, I will do the following for each question and answer pair.

* Count the words which appear both in each question and its answer pair.
* Divide it by the number of words in the answer.

The higher this ratio, the more obvious the question may be.

In [7]:
def get_answer_in_question(series):
    """
    For each row of series (pandas series),
    count the number of words which appear
    in both question and answer.
    
    Divide this number by the number of words in answer.
    
    Return the output.
    Return 0 if answer contains no words.
    """

    # load question and answer
    q_ns = set(series["clean_question"])
    a_ns = set(series["clean_answer"])
    
    # return 0 if answer contains no words
    if len(a_ns) == 0:
        return 0
    
    # number of words appearing in both question and answer
    overlap = len(a_ns.intersection(q_ns))
    
    # return measure of how many words in answer also appears in question
    return overlap / len(a_ns)

jeopardy["answer_in_question"] = jeopardy.apply(get_answer_in_question, axis=1)
print("Mean ratio of words in answers which also appear in questions is {:.3%}".\
      format(jeopardy["answer_in_question"].mean()))

Mean ratio of words in answers which also appear in questions is 3.838%


So, only a handful of words in answers also appeared in questions. As the questions are not that obvious, you cannot just walk in and expect to answer the questions.

### 3.1.2. What are the words which recur in the questions most often?

If you are to study for it, the words in questions might be used as a guideline.

I will first check the words used again and again across questions. The most recurring words will be displayed with the number of questions they appear in.

In [8]:
from pprint import pprint
from collections import OrderedDict

import operator

def most_recurring_items(col, item):
    """
    Display a table showing item and its frequency in col,
    where col is a column name in pandas dataframe.
    """
    
    def print_repeated_ratio(counter, item):
        """
        item: string label
        counter: a dictionary whose keys are different types of the item
        
        Print ratio of repeated instances of an item.
        """
        
        # get ratio of repeated words
        counter_len = len(counter)
        tot_items = sum(counter.values())

        repeat_ratio = (tot_items - counter_len) / tot_items
        
        print("Total instances of {}:".format(item))
        print(tot_items)
        print()

        print("Ratio of repeat instances:")
        print("({} - {}) / {}) = {}".format(tot_items, counter_len, tot_items, repeat_ratio))
        print()

        print("Around {:.3%} were repeated instances\n\
        across all instances of {},\n\
        (i.e. appearing in more than one question)"\
              .format(repeat_ratio, item))
        print("\n\n")

    # create word counter (key is word, value is count)
    counter = {}
    
    # update counter
    count(counter, col, update_counter)

    # print total number of items
    print("Total number of {}:".format(item))
    print(len(counter))
    print()
    
    # print ratio of repeated instances if item is "words"
    if item == "words":
        print_repeated_ratio(counter, item)

    print("100 most repeated {}".format(item))
    display(pd.DataFrame(sorted(counter.items(), key=operator.itemgetter(1), reverse=True)[:100],
                columns=[item, "Frequency"]))


def count(counter, col, update_counter_func, high_val=False):
    """
    For each row in col (column label in pandas dataframe),
    count number of item (if col == "clean_category")
        or its elements (if col == "clean question")
    and put it into counter (dictionary)
    using (update_counter_func).

    If high_val is True,
    define h_val as the value in "high_value" column, and
    h_val_len as number of entries in "high_value" column.
    """
    
    # number of entries in "high_value" column
    h_val_len = jeopardy["high_value"].shape[0] if high_val else None
    
    for idx, row in jeopardy.iterrows():
        h_val = None if high_val == False else row["high_value"]
        if col == "clean_question":
            for i in row[col]:
                update_counter_func(i, counter, h_val, h_val_len)            
        elif col == "clean_category":
            update_counter_func(row[col], counter, h_val, h_val_len)            
    
    return counter

def update_counter(key, counter, high_val=None, high_val_len=None):
    """
    Update frequencies of key
    into counter (dictionary)

    If high_val and high_val_len are not False,
    count separately for different
    values in "high_value" column.
    """

    if (high_val is None) and (high_val_len is None):

        # count item
        if key in counter:
            counter[key] += 1
        else:
            counter[key] = 1
    else:

        # add item to counter if not already present
        if key not in counter:
            counter[key] = [0] * high_val_len

        # increment count
        counter[key][high_val] += 1


pd.options.display.max_rows = 100
most_recurring_items("clean_question", "words")

Total number of words:
31664

Total instances of words:
166473

Ratio of repeat instances:
(166473 - 31664) / 166473) = 0.8097949817688154

Around 80.979% were repeated instances
        across all instances of words,
        (i.e. appearing in more than one question)



100 most repeated words


Unnamed: 0,words,Frequency
0,one,1092
1,name,996
2,first,922
3,city,558
4,us,558
5,called,514
6,2,508
7,named,502
8,like,479
9,country,473


OK. Here they are, but they are unimpressive. "One" or "name" is not really a study topic. So, the frequency of a *word* does not seem useful.

One alternative would be use *topics*. Extracting *topics* from questions would require a more fine-tuned method. Unfortunately, though, this cannot be impelemnted due to time constraint and it is currently beyond my ability.

So, a feasible alternative would use *categories* in the `Category` column. This will be done in a later in  [section 3.2.](#3.2.-Categories).

For now, however, I will do one more anlaysis with words. Although this will be just for the sake of demonstration, the same method will be used on categories later on.

### 3.1.3. Which words in each question will lead to better chance of winning bigger reward?

Some questions give more money when answered correctly. If we divide the questions into high- and low-value questions, we can find the words which appear more often in the high-value questions than in low-value ones.

I will use [Chi-squared test](https://en.wikipedia.org/wiki/Chi-squared_test) to find the words whose frequencies in high- and low-value questions are significantly different (p < 0.05). Then, I will select words which appear more often in high-value questions than in low-value ones.

For each selected words, its probability to appear in high-value questions will be calculated (` = (number of selected words) / (number of words in high-value questions)`). This probability will be multiplied by Chi-squared value, which will yield a **score** for each word. Words with highest scores will be considered to be potentially more profitable.

After that, I will sort the selected words by **score** and then by chi-square value, and display the top 100 in the output. It is probably unnecessary to sort output by chi-square value, but I will do so in case words share same scores.

Each row of the output will contain the word, chi-square value, p-value and score.

In [9]:
from scipy.stats import chisquare

def high_value_assign(series, cutoff):
    """
    Return 1 if "Value" is above cutoff.
    Return 0 otherwise.
    """
    
    return 1 if series["clean_value"] > cutoff else 0

def count_high_low_val(col):
    """
    Returns a dictionary where
        each key is
            either a whole item at col
            or an element of the item
            (both will be referred to 
            'item' for simplicity) and
        each value is
            number of occurences of the key
            in low or high value questions.
    
    e.g. {item: [low_count, high_count)]}
    """

    # initiate counter
    counter = {}

    # update counter
    count(counter, col, update_counter, high_val=True)
            
    return counter

def get_highLowVal_chisq(counter, item):
    """
    Return item, chi-squared value, p-value and score
        for each item
            (1) in counter and
            (2) which appears more often in high group questions
                than low-value questions.

    Output structure:
    [(item, chi-square_value, p-value, score), ...]
    
    "score" shows potential fruitfulness of item.
    score = (probability of word appearing in high-value question) * (chi-square_value)
    """
    
    # set number of items with highest significance to display
    disp_num = 100

    # get numbers of high and low value questions
    high_len = jeopardy[jeopardy["high_value"] == 1].shape[0]
    low_len = jeopardy[jeopardy["high_value"] == 0].shape[0]

    # get total number of questions
    all_len = jeopardy.shape[0]

    # initiate list for chi-squared statistics output
    c_chisq = []

    # iterate over each item in item counter
    high_over_c = 0
    low_over_c = 0

    for i in counter:

        # get counts
        high_obs = counter[i][0]
        low_obs = counter[i][1]

        # get total observed counts
        all_obs = low_obs + high_obs

        # get prevelance in dataset
        prev = all_obs / all_len

        # get expected count of a item appearing in high value question
        high_exp = prev * high_len

        # get expected count of a item appearing in low value question
        low_exp = prev * low_len

        # get chi-squared value and its p value
        chisq, p = chisquare([low_obs, high_obs], [low_exp, high_exp])


        # if chi-square statistics is significant
        if p < 0.05:

            # if items in high-value questions
            # appear more frequently than expected
            if high_obs - high_exp > 0:
                
                # get probability of word appearing in high-value questions
                high_prob = high_obs / high_len
                
                # get score (this is an overal measure of
                # potential fruitfulness of the word)
                score = chisq * high_prob
                
                # update list
                # (last item in the list is a score)
                c_chisq.append((i, chisq, p, score))
    
    # display top disp_num items (sorted by )
    pd.options.display.max_rows = 100
    display(pd.DataFrame(sorted(c_chisq, key=operator.itemgetter(3,1), \
                        reverse=True)[:disp_num], \
                        columns = [item, "Chi-sqaured value", "P-value", "Score"]))
    

# 1. categorise questions as low or high value
cutoff = 800
jeopardy["high_value"] = jeopardy.apply(high_value_assign, axis=1, args=(cutoff,))

# 2. count number of times a word appears in high or low value questions
word_counter = count_high_low_val("clean_question")

# 3. display 100 words which are most heavily prevalent in high-value questions
# (more prevalent ones appear nearer top)
# Output structure is [(item, chi-square_value, p-value, score), ...]
# where higher score being better associated with greater reward
get_highLowVal_chisq(word_counter, "Word")

# delete counter to clear memory
del word_counter

Unnamed: 0,Word,Chi-sqaured value,P-value,Score
0,one,1057.237256,6.505652e-232,147.319946
1,first,817.647311,7.857238000000001e-180,93.685784
2,name,710.530998,1.5335350000000002e-156,82.527667
3,us,539.017905,3.083795e-119,38.353559
4,city,534.679984,2.7089510000000003e-118,37.951649
5,2,438.132112,2.760403e-97,27.431013
6,like,431.784884,6.643699e-96,25.828778
7,seen,415.082964,2.868878e-92,24.178185
8,country,398.693069,1.060323e-88,23.084426
9,called,375.325915,1.295983e-83,22.647849


Again, this list of *words* do not seem so helpful.

Let me finally analyse categories, which should be more useful.

## 3.2. Categories

### 3.2.1. What are the most recurring categories?

In [10]:
most_recurring_items("clean_category", "Categories")

Total number of Categories:
3579

100 most repeated Categories


Unnamed: 0,Categories,Frequency
0,TELEVISION,51
1,U.S. GEOGRAPHY,50
2,LITERATURE,45
3,HISTORY,40
4,AMERICAN HISTORY,40
5,BEFORE & AFTER,40
6,AUTHORS,39
7,WORD ORIGINS,38
8,WORLD CAPITALS,37
9,SPORTS,36


OK, the good news is that categories seem more informative than words in questions.

The bad news, however, is that the categories are still rather too broad to be considered foci. Besides, the questions span across the categories, none of which show obvious dominance.

This means that there will be **a lot** of categories to study.

But, I will do continue with the analysis to at least find the most potentially profitable category.

### 3.2.2. Which categories will lead to better chance of winning bigger reward?

This is equivalent to section [3.1.3](#3.1.3.-Which-words-in-each-question-will-lead-to-better-chance-of-winning-bigger-reward?) where reasoning and procedure are explained.

In [11]:
# count number of times a category appears in high or low value questions
category_counter = count_high_low_val("clean_category")

# display 100 categories which are most heavily prevalent in high-value questions
# (more prevalent ones appear nearer top)
get_highLowVal_chisq(category_counter, "clean_category")

# delete counter to clear memory
del category_counter

Unnamed: 0,clean_category,Chi-sqaured value,P-value,Score
0,TELEVISION,88.475484,5.146882e-21,0.694349
1,AUTHORS,59.684007,1.11378e-14,0.34349
2,HISTORY,56.672545,5.147728e-14,0.326159
3,BIRDS,63.801323,1.376211e-15,0.322678
4,BODIES OF WATER,58.078258,2.518949e-14,0.313991
5,U.S. GEOGRAPHY,45.899266,1.244926e-11,0.288171
6,SPORTS,52.596766,4.095631e-13,0.275184
7,TRAVEL & TOURISM,55.173937,1.103219e-13,0.259801
8,MAGAZINES,50.248681,1.354457e-12,0.254135
9,FOOD FACTS,55.414809,9.75984e-14,0.231942


# 4. So, should you prepare for it if you are to participate in *`Jeopardy!`* ?

My answer is a **no** There is simply too much ground to cover.


# 5. Tasks for later

It will not be worth studying for it no matter what, but the method could be fine-tuned to better identify categories which are potentially more profitable.

I used high- and low-value categorisation to judge potential profit, but it would be better to use the `Value` column.