# Functions

In [3]:
import numpy as np

In [6]:
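def bool_DCG(judgements):
    """Discounted cumulative gain of graded judgements listed in ranked order."""
    # The top result is undiscounted, since log2(0 + 2) == 1.
    dcgsum = judgements[0]
    for i in range(1, len(judgements)):
        # The result at 0-based index i is discounted by log2(i + 2).
        dcgsum += judgements[i] / np.log2(i + 2)
    return dcgsum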

In [8]:
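def bool_nDCG(judgements, standard):
    """DCG of `judgements` divided by the DCG of the ideal ranking `standard`."""
    # Assumes both lists have the same length and `standard` has a non-zero DCG.
    dcgsum = judgements[0]
    stdsum = standard[0]
    for i in range(1, len(judgements)):
        dcgsum += judgements[i] / np.log2(i + 2)
        stdsum += standard[i] / np.log2(i + 2)
    return dcgsum / stdsum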
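
The two loops above can also be written without explicit iteration. The sketch below is a vectorized variant, not the notebook's own code: `dcg` and `ndcg` are hypothetical names, and `ndcg` assumes the ideal ranking is simply the judgements sorted in descending order, which holds for every `standard` list used in the tests in this notebook.

```python
import numpy as np

def dcg(judgements):
    """Vectorized DCG: sum of judgements[i] / log2(i + 2) over 0-based ranks i."""
    gains = np.asarray(judgements, dtype=float)
    discounts = np.log2(np.arange(gains.size) + 2)
    return float(np.sum(gains / discounts))

def ndcg(judgements):
    """DCG divided by the DCG of the same judgements sorted best-first.

    Assumption: the ideal ranking is the descending sort of the judgements.
    Returns 0.0 when no result is relevant, to avoid dividing by zero.
    """
    ideal = dcg(sorted(judgements, reverse=True))
    return dcg(judgements) / ideal if ideal > 0 else 0.0

print(dcg([1, 2, 2, 2, 2, 1, 0, 0, 0, 0]))
print(ndcg([1, 2, 2, 2, 2, 1, 0, 0, 0, 0]))
```

For the "Avengers" judgements, the two printed values should agree with the DCG and nDCG reported in the tests, up to floating-point rounding.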

# Tests

## "Avengers" 
Here, we expect vague results, but the intention is to show the most recent Avengers films at the top. The results include all the expected recent MCU Avengers films, but the top result was the older Avengers movie from 1998. Because the ranking differs from the ideal one, the nDCG score is less than 1.



In [46]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [1,2,2,2,2,1,0,0,0,0]
standard = [2,2,2,2,1,1,0,0,0,0]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(0.6),"|")


| DCG | nDCG | Recall |
| 5.2531254248668064 | 0.8954792535685231 | 0.6 |
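
To see where the nDCG loss comes from, it can help to look at each rank's discounted gain rather than only the total. The following is a small sketch: `discounted_gains` is a hypothetical helper, and it uses the same `log2(i + 2)` discount as `bool_DCG`.

```python
import numpy as np

def discounted_gains(judgements):
    """Per-rank contribution judgements[i] / log2(i + 2) for 0-based rank i."""
    gains = np.asarray(judgements, dtype=float)
    return gains / np.log2(np.arange(gains.size) + 2)

judgement = [1, 2, 2, 2, 2, 1, 0, 0, 0, 0]
standard = [2, 2, 2, 2, 1, 1, 0, 0, 0, 0]

# Positive entries: the actual result is worth less than the ideal one at that rank.
# Negative entries: the actual result is worth more than the ideal one at that rank.
loss = discounted_gains(standard) - discounted_gains(judgement)
for rank, value in enumerate(loss, start=1):
    print(rank, round(float(value), 4))
```

Only ranks 1 and 5 differ. The older film in the top spot costs a full point, which is only partly offset by the more relevant film at rank 5.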


## "Avengers Age of Ultron"
Here we expect a single title at the top because the query is very specific. Since we expect only one title and it was returned as the top result, recall is perfect.



In [53]:

# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [2]
standard = [2]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(1),"|")

| DCG | nDCG | Recall |
| 2 | 1.0 | 1 |


## "Avengers" - filtered (Year 2000-2020)
Here we expect the top results to be the MCU Avengers films from 2000-2020. The results show these films in the top 4, so recall is 1 and nDCG is 1 as well.


In [55]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [2,2,2,2]
standard = [2,2,2,2]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(1),"|")

| DCG | nDCG | Recall |
| 5.123212623289701 | 1.0 | 1 |


## "Michael"
For this scenario, as a user, I want to look for Michael Bay films, but I get lazy and just type 'michael'.

Since the algorithm prioritizes searching titles first, the titles that include 'michael' are returned first. We get bad scores for this search.

nDCG returns 1 because we still consider the results somewhat relevant and we don't care about the rank, but recall is 0 because the results don't include what we're looking for.

In [56]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [1,1,1,1]
standard = [1,1,1,1]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(0),"|")

| DCG | nDCG | Recall |
| 2.5616063116448506 | 1.0 | 0 |


## "Michael Bay" filter - Action
Now we use a filter and more specific search terms, and the result is better: 7 of the 10 results are relevant. In terms of ranking, the relevant results do not all sit at the top (the first result is irrelevant), so nDCG is less than 1.

In [57]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [0,2,2,2,0,2,2,2,2,0]
standard = [2,2,2,2,2,2,2,0,0,0]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(0.7),"|")

| DCG | nDCG | Recall |
| 5.735283409071832 | 0.788246835854919 | 0.7 |


## "Anthony Hopkins"
Now the intention is to search for films by actor name while adding the Horror filter. As a user, I expect the top results to be horror films featuring Anthony Hopkins.

This time, the top 2 results featured Anthony Hopkins as director and actor, but the rest of the results only had "anthony" in the title. This has a perfect nDCG score since the ranking is correct, but the recall is bad.

In [58]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [2,2,0,0,0,0,0,0,0,0,0]
standard = [2,2,0,0,0,0,0,0,0,0,0]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(0.2),"|")

| DCG | nDCG | Recall |
| 3.261859507142915 | 1.0 | 0.2 |


## "French Comedy" 
The intention of this query is to search for French comedy movies. The search returns films that have "french" in the title but are not necessarily in the French language, so some results are irrelevant to what we want. Recall is not perfect.

In [61]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [2,1,2,0,0,0,1,0,0,1]
standard = [2,2,1,1,1,0,0,0,0,0]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(1),"|")

| DCG | nDCG | Recall |
| 4.253327913222679 | 0.9287981500785571 | 1 |


## "Surprise Me" - filter French/comedy
For this, it seems that if a user is looking for general recommendations, it is better to use the filter options and the "Surprise Me" button. Since the 'french' and 'comedy' filters are applied, almost all results are relevant. Rank is irrelevant here since the system uses random multipliers for the score of each result.

In [60]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [2,2,2,2,2,2,2,2,2,2]
standard = [2,2,2,2,2,2,2,2,2,2]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(1),"|")

| DCG | nDCG | Recall |
| 9.087118676176692 | 1.0 | 1 |
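
Why rank carries no meaning here can be illustrated with a toy model. The notebook does not show how "Surprise Me" is implemented, so everything below is an assumption for illustration: the `surprise_me` function, the base scores, and the multiplier range are all hypothetical. The idea is only that multiplying each filtered result's score by a random factor reshuffles the order without changing which results are returned.

```python
import random

def surprise_me(scored_results, seed=None):
    """Re-rank (title, score) pairs after scaling each score by a random factor.

    Hypothetical model: the multiplier range [0.0, 1.0) is an assumption.
    """
    rng = random.Random(seed)
    rescored = [(title, score * rng.random()) for title, score in scored_results]
    # Highest randomized score first.
    ranked = sorted(rescored, key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in ranked]

# Placeholder titles standing in for results that already passed the filters.
filtered = [("film A", 1.0), ("film B", 1.0), ("film C", 1.0), ("film D", 1.0)]
print(surprise_me(filtered, seed=1))
print(surprise_me(filtered, seed=2))
```

Every call returns the same set of films in a seed-dependent order, so when all of them are relevant the judgement list is all 2s and nDCG is 1 regardless of the order.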


## "Surprise Me" filter - pre-2000 - Western - English
The user intention for this one is to look for Western films made before 2000 that are in English. "Surprise Me" returns random films that match the applied filters. The rank is irrelevant, so we get a perfect nDCG score. Since these are recommendations, recall is perfect as all results are relevant.

In [62]:
# 0 for irrelevant
# 1 for somewhat relevant
# 2 for relevant
judgement = [2,2,2,2,2,2,2,2,2,2]
standard = [2,2,2,2,2,2,2,2,2,2]
print("| DCG | nDCG | Recall |")
print("|",bool_DCG(judgement), "|",bool_nDCG(judgement,standard),"|",str(1),"|")

| DCG | nDCG | Recall |
| 9.087118676176692 | 1.0 | 1 |


# Overall Results


| Query | Type | DCG | nDCG | Recall | 
|-------|------|-----|------|--------|
|  "Avengers"     |  Title - generic    | 5.2531254248668064 | 0.8954792535685231 | 0.6 |
|   "Avengers Age of Ultron"    |  Title - specific    | 2 | 1.0 | 1 |
|     "Avengers" - 2000-2020 |   Title - filtered   |   5.123212623289701 | 1.0 | 1 |
|     "Michael"  |   Director - generic   | 2.5616063116448506 | 1.0 | 0 |
|     "Michael Bay" - action  |   Director - filtered   |  5.735283409071832 | 0.788246835854919 | 0.7 |
|     "Anthony Hopkins"  |   Actor - specific   | 3.261859507142915 | 1.0 | 0.2 |
|     "French Comedy"  |   Genre - specific   | 4.253327913222679 | 0.9287981500785571 | 1 |
|     "Surprise Me!" - French - Comedy |   filtered   |9.087118676176692 | 1.0 | 1 |
|     "Surprise Me!" - pre-2000 - Western - English |   filtered   |   9.087118676176692 | 1.0 | 1 |

From these overall results, we can generalize that a user who wants a specific movie should type the whole movie title, as in the 'Avengers Age of Ultron' example. Generally, more specific search queries result in higher recall, as more relevant results are retrieved.
If a user wishes to get recommendations with filters applied, hitting the "Surprise Me" button is a fun way to get them. Recall over the whole database is not calculated because it needs false negatives, and it would be impossible to know which relevant movies were not retrieved since there are millions of titles in the database. The Recall column is therefore restricted to the top 10 results: in this application, it is the fraction of relevant movies in the top 10, which is closer to what is usually called precision at 10. But for queries where we expect a fixed number of results (e.g. one specific movie title), recall is 1 as long as the movie is shown in the top spot.
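
The top-10 measure described above can be sketched as a small helper. This is an assumption about how the hand-entered values could be derived, not the notebook's own method: `relevant_fraction_at_k` is a hypothetical name, and it counts any non-zero judgement as relevant. It reproduces the reported values for the "Avengers", "Michael Bay", "Anthony Hopkins", and "Surprise Me" queries; the other queries use hand-entered values that this rule does not reproduce.

```python
def relevant_fraction_at_k(judgements, k=10):
    """Fraction of the top-k slots holding a result with a non-zero judgement.

    Assumption: any judgement above 0 counts as relevant, and the
    denominator is always k, even if fewer than k results were returned.
    """
    top_k = judgements[:k]
    return sum(1 for j in top_k if j > 0) / k

print(relevant_fraction_at_k([1, 2, 2, 2, 2, 1, 0, 0, 0, 0]))     # "Avengers"
print(relevant_fraction_at_k([0, 2, 2, 2, 0, 2, 2, 2, 2, 0]))     # "Michael Bay"
print(relevant_fraction_at_k([2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]))  # "Anthony Hopkins"
```

The three calls print 0.6, 0.7, and 0.2, matching the Recall column for those queries.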

