## Information Retrieval lab5

- Martyna Stasiak id.156071
- Maria Musiał id.156062
----

The purpose of the exercise is to implement a recommendation system for a movie search engine.

When we think about selecting a video that our user will like, let's first consider what data we have available? First of all, we have information in the database about how our user rated the movies he once watched. It's worth noting here that this is absolutely not all of the movies in our database given, and most often it's a heavily limited subset of a huge set of movies. So we can find out which movies our user liked and which ones he didn't. 

Is this all the data available? Well, no! We also have information about the preferences of other users! So we can find in the data a sample of users who have similar movie taste to our user. Note that virtually every such other user has watched some movies that our user has never watched before! The idea behind collaborative filtering is very simple: if another user with similar tastes rated a movie highly, our user will probably rate it highly too! Let's recommend movies that users with similar tastes have rated highly!


Let's formalize some ideas:
 - how to count the similarity between users' tastes? 
 
 Just calculate the correlation between their movie ratings. Users with a strongly positive correlation have similar tastes, and those with a strongly negative correlation have opposite tastes;) 
 
 - Having found similar users, how to count the predicted rating of the video by our user?
 
 We count the weighted average of ratings of users with similar tastes where the weight is the measure of similarity (correlation). The closer a user's tastes are to us, the more weight his rating has for us. (slide 27, http://www.mmds.org/mmds/v2.1/ch09-recsys1.pdf)


In [64]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import random
# import ace_tools as tools

pd.set_option("display.max_columns", None) 
pd.set_option("display.width", 1000)
pd.set_option("display.max_colwidth", None)
from IPython.core.display import display, HTML
from sklearn.metrics import mean_absolute_error, mean_squared_error

df = pd.read_csv('./ratings.csv')
df

  from IPython.core.display import display, HTML


Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


In [65]:
random.seed(0)

-----

### <b>Task 1
Modify the dataframe to have moveID as index, userID as column and rating as values

In [66]:
dfTask = df.pivot(index='movieId', columns='userId', values='rating')
dfTask.head()

userId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1,Unnamed: 171_level_1,Unnamed: 172_level_1,Unnamed: 173_level_1,Unnamed: 174_level_1,Unnamed: 175_level_1,Unnamed: 176_level_1,Unnamed: 177_level_1,Unnamed: 178_level_1,Unnamed: 179_level_1,Unnamed: 180_level_1,Unnamed: 181_level_1,Unnamed: 182_level_1,Unnamed: 183_level_1,Unnamed: 184_level_1,Unnamed: 185_level_1,Unnamed: 186_level_1,Unnamed: 187_level_1,Unnamed: 188_level_1,Unnamed: 189_level_1,Unnamed: 190_level_1,Unnamed: 191_level_1,Unnamed: 192_level_1,Unnamed: 193_level_1,Unnamed: 194_level_1,Unnamed: 195_level_1,Unnamed: 196_level_1,Unnamed: 197_level_1,Unnamed: 198_level_1,Unnamed: 199_level_1,Unnamed: 200_level_1,Unnamed: 201_level_1,Unnamed: 202_level_1,Unnamed: 203_level_1,Unnamed: 204_level_1,Unnamed: 205_level_1,Unnamed: 206_level_1,Unnamed: 207_level_1,Unnamed: 208_level_1,Unnamed: 209_level_1,Unnamed: 210_level_1,Unnamed: 211_level_1,Unnamed: 212_level_1,Unnamed: 213_level_1,Unnamed: 214_level_1,Unnamed: 215_level_1,Unnamed: 216_level_1,Unnamed: 217_level_1,Unnamed: 218_level_1,Unnamed: 219_level_1,Unnamed: 220_level_1,Unnamed: 221_level_1,Unnamed: 222_level_1,Unnamed: 223_level_1,Unnamed: 224_level_1,Unnamed: 225_level_1,Unnamed: 226_level_1,Unnamed: 227_level_1,Unnamed: 228_level_1,Unnamed: 229_level_1,Unnamed: 230_level_1,Unnamed: 231_level_1,Unnamed: 232_level_1,Unnamed: 233_level_1,Unnamed: 234_level_1,Unnamed: 235_level_1,Unnamed: 236_level_1,Unnamed: 237_level_1,Unnamed: 238_level_1,Unnamed: 239_level_1,Unnamed: 240_level_1,Unnamed: 241_level_1,Unnamed: 242_level_1,Unnamed: 243_level_1,Unnamed: 244_level_1,Unnamed: 245_level_1,Unnamed: 246_level_1,Unnamed: 247_level_1,Unnamed: 248_level_1,Unnamed: 249_level_1,Unnamed: 250_level_1,Unnamed: 251_level_1,Unnamed: 252_level_1,Unnamed: 253_level_1,Unnamed: 254_level_1,Unnamed: 255_level_1,Unnamed: 256_level_1,Unnamed: 257_level_1,Unnamed: 258_level_1,Unnamed: 259_level_1,Unnamed: 260_level_1,Unnamed: 261_level_1,Unnamed: 262_level_1,Unnamed: 263_level_1,Unnamed: 264_level_1,Unnamed: 265_level_1,Unnamed: 266_level_1,Unnamed: 267_level_1,Unnamed: 268_level_1,Unnamed: 269_level_1,Unnamed: 270_level_1,Unnamed: 271_level_1,Unnamed: 272_level_1,Unnamed: 273_level_1,Unnamed: 274_level_1,Unnamed: 275_level_1,Unnamed: 276_level_1,Unnamed: 277_level_1,Unnamed: 278_level_1,Unnamed: 279_level_1,Unnamed: 280_level_1,Unnamed: 281_level_1,Unnamed: 282_level_1,Unnamed: 283_level_1,Unnamed: 284_level_1,Unnamed: 285_level_1,Unnamed: 286_level_1,Unnamed: 287_level_1,Unnamed: 288_level_1,Unnamed: 289_level_1,Unnamed: 290_level_1,Unnamed: 291_level_1,Unnamed: 292_level_1,Unnamed: 293_level_1,Unnamed: 294_level_1,Unnamed: 295_level_1,Unnamed: 296_level_1,Unnamed: 297_level_1,Unnamed: 298_level_1,Unnamed: 299_level_1,Unnamed: 300_level_1,Unnamed: 301_level_1,Unnamed: 302_level_1,Unnamed: 303_level_1,Unnamed: 304_level_1,Unnamed: 305_level_1,Unnamed: 306_level_1,Unnamed: 307_level_1,Unnamed: 308_level_1,Unnamed: 309_level_1,Unnamed: 310_level_1,Unnamed: 311_level_1,Unnamed: 312_level_1,Unnamed: 313_level_1,Unnamed: 314_level_1,Unnamed: 315_level_1,Unnamed: 316_level_1,Unnamed: 317_level_1,Unnamed: 318_level_1,Unnamed: 319_level_1,Unnamed: 320_level_1,Unnamed: 321_level_1,Unnamed: 322_level_1,Unnamed: 323_level_1,Unnamed: 324_level_1,Unnamed: 325_level_1,Unnamed: 326_level_1,Unnamed: 327_level_1,Unnamed: 328_level_1,Unnamed: 329_level_1,Unnamed: 330_level_1,Unnamed: 331_level_1,Unnamed: 332_level_1,Unnamed: 333_level_1,Unnamed: 334_level_1,Unnamed: 335_level_1,Unnamed: 336_level_1,Unnamed: 337_level_1,Unnamed: 338_level_1,Unnamed: 339_level_1,Unnamed: 340_level_1,Unnamed: 341_level_1,Unnamed: 342_level_1,Unnamed: 343_level_1,Unnamed: 344_level_1,Unnamed: 345_level_1,Unnamed: 346_level_1,Unnamed: 347_level_1,Unnamed: 348_level_1,Unnamed: 349_level_1,Unnamed: 350_level_1,Unnamed: 351_level_1,Unnamed: 352_level_1,Unnamed: 353_level_1,Unnamed: 354_level_1,Unnamed: 355_level_1,Unnamed: 356_level_1,Unnamed: 357_level_1,Unnamed: 358_level_1,Unnamed: 359_level_1,Unnamed: 360_level_1,Unnamed: 361_level_1,Unnamed: 362_level_1,Unnamed: 363_level_1,Unnamed: 364_level_1,Unnamed: 365_level_1,Unnamed: 366_level_1,Unnamed: 367_level_1,Unnamed: 368_level_1,Unnamed: 369_level_1,Unnamed: 370_level_1,Unnamed: 371_level_1,Unnamed: 372_level_1,Unnamed: 373_level_1,Unnamed: 374_level_1,Unnamed: 375_level_1,Unnamed: 376_level_1,Unnamed: 377_level_1,Unnamed: 378_level_1,Unnamed: 379_level_1,Unnamed: 380_level_1,Unnamed: 381_level_1,Unnamed: 382_level_1,Unnamed: 383_level_1,Unnamed: 384_level_1,Unnamed: 385_level_1,Unnamed: 386_level_1,Unnamed: 387_level_1,Unnamed: 388_level_1,Unnamed: 389_level_1,Unnamed: 390_level_1,Unnamed: 391_level_1,Unnamed: 392_level_1,Unnamed: 393_level_1,Unnamed: 394_level_1,Unnamed: 395_level_1,Unnamed: 396_level_1,Unnamed: 397_level_1,Unnamed: 398_level_1,Unnamed: 399_level_1,Unnamed: 400_level_1,Unnamed: 401_level_1,Unnamed: 402_level_1,Unnamed: 403_level_1,Unnamed: 404_level_1,Unnamed: 405_level_1,Unnamed: 406_level_1,Unnamed: 407_level_1,Unnamed: 408_level_1,Unnamed: 409_level_1,Unnamed: 410_level_1,Unnamed: 411_level_1,Unnamed: 412_level_1,Unnamed: 413_level_1,Unnamed: 414_level_1,Unnamed: 415_level_1,Unnamed: 416_level_1,Unnamed: 417_level_1,Unnamed: 418_level_1,Unnamed: 419_level_1,Unnamed: 420_level_1,Unnamed: 421_level_1,Unnamed: 422_level_1,Unnamed: 423_level_1,Unnamed: 424_level_1,Unnamed: 425_level_1,Unnamed: 426_level_1,Unnamed: 427_level_1,Unnamed: 428_level_1,Unnamed: 429_level_1,Unnamed: 430_level_1,Unnamed: 431_level_1,Unnamed: 432_level_1,Unnamed: 433_level_1,Unnamed: 434_level_1,Unnamed: 435_level_1,Unnamed: 436_level_1,Unnamed: 437_level_1,Unnamed: 438_level_1,Unnamed: 439_level_1,Unnamed: 440_level_1,Unnamed: 441_level_1,Unnamed: 442_level_1,Unnamed: 443_level_1,Unnamed: 444_level_1,Unnamed: 445_level_1,Unnamed: 446_level_1,Unnamed: 447_level_1,Unnamed: 448_level_1,Unnamed: 449_level_1,Unnamed: 450_level_1,Unnamed: 451_level_1,Unnamed: 452_level_1,Unnamed: 453_level_1,Unnamed: 454_level_1,Unnamed: 455_level_1,Unnamed: 456_level_1,Unnamed: 457_level_1,Unnamed: 458_level_1,Unnamed: 459_level_1,Unnamed: 460_level_1,Unnamed: 461_level_1,Unnamed: 462_level_1,Unnamed: 463_level_1,Unnamed: 464_level_1,Unnamed: 465_level_1,Unnamed: 466_level_1,Unnamed: 467_level_1,Unnamed: 468_level_1,Unnamed: 469_level_1,Unnamed: 470_level_1,Unnamed: 471_level_1,Unnamed: 472_level_1,Unnamed: 473_level_1,Unnamed: 474_level_1,Unnamed: 475_level_1,Unnamed: 476_level_1,Unnamed: 477_level_1,Unnamed: 478_level_1,Unnamed: 479_level_1,Unnamed: 480_level_1,Unnamed: 481_level_1,Unnamed: 482_level_1,Unnamed: 483_level_1,Unnamed: 484_level_1,Unnamed: 485_level_1,Unnamed: 486_level_1,Unnamed: 487_level_1,Unnamed: 488_level_1,Unnamed: 489_level_1,Unnamed: 490_level_1,Unnamed: 491_level_1,Unnamed: 492_level_1,Unnamed: 493_level_1,Unnamed: 494_level_1,Unnamed: 495_level_1,Unnamed: 496_level_1,Unnamed: 497_level_1,Unnamed: 498_level_1,Unnamed: 499_level_1,Unnamed: 500_level_1,Unnamed: 501_level_1,Unnamed: 502_level_1,Unnamed: 503_level_1,Unnamed: 504_level_1,Unnamed: 505_level_1,Unnamed: 506_level_1,Unnamed: 507_level_1,Unnamed: 508_level_1,Unnamed: 509_level_1,Unnamed: 510_level_1,Unnamed: 511_level_1,Unnamed: 512_level_1,Unnamed: 513_level_1,Unnamed: 514_level_1,Unnamed: 515_level_1,Unnamed: 516_level_1,Unnamed: 517_level_1,Unnamed: 518_level_1,Unnamed: 519_level_1,Unnamed: 520_level_1,Unnamed: 521_level_1,Unnamed: 522_level_1,Unnamed: 523_level_1,Unnamed: 524_level_1,Unnamed: 525_level_1,Unnamed: 526_level_1,Unnamed: 527_level_1,Unnamed: 528_level_1,Unnamed: 529_level_1,Unnamed: 530_level_1,Unnamed: 531_level_1,Unnamed: 532_level_1,Unnamed: 533_level_1,Unnamed: 534_level_1,Unnamed: 535_level_1,Unnamed: 536_level_1,Unnamed: 537_level_1,Unnamed: 538_level_1,Unnamed: 539_level_1,Unnamed: 540_level_1,Unnamed: 541_level_1,Unnamed: 542_level_1,Unnamed: 543_level_1,Unnamed: 544_level_1,Unnamed: 545_level_1,Unnamed: 546_level_1,Unnamed: 547_level_1,Unnamed: 548_level_1,Unnamed: 549_level_1,Unnamed: 550_level_1,Unnamed: 551_level_1,Unnamed: 552_level_1,Unnamed: 553_level_1,Unnamed: 554_level_1,Unnamed: 555_level_1,Unnamed: 556_level_1,Unnamed: 557_level_1,Unnamed: 558_level_1,Unnamed: 559_level_1,Unnamed: 560_level_1,Unnamed: 561_level_1,Unnamed: 562_level_1,Unnamed: 563_level_1,Unnamed: 564_level_1,Unnamed: 565_level_1,Unnamed: 566_level_1,Unnamed: 567_level_1,Unnamed: 568_level_1,Unnamed: 569_level_1,Unnamed: 570_level_1,Unnamed: 571_level_1,Unnamed: 572_level_1,Unnamed: 573_level_1,Unnamed: 574_level_1,Unnamed: 575_level_1,Unnamed: 576_level_1,Unnamed: 577_level_1,Unnamed: 578_level_1,Unnamed: 579_level_1,Unnamed: 580_level_1,Unnamed: 581_level_1,Unnamed: 582_level_1,Unnamed: 583_level_1,Unnamed: 584_level_1,Unnamed: 585_level_1,Unnamed: 586_level_1,Unnamed: 587_level_1,Unnamed: 588_level_1,Unnamed: 589_level_1,Unnamed: 590_level_1,Unnamed: 591_level_1,Unnamed: 592_level_1,Unnamed: 593_level_1,Unnamed: 594_level_1,Unnamed: 595_level_1,Unnamed: 596_level_1,Unnamed: 597_level_1,Unnamed: 598_level_1,Unnamed: 599_level_1,Unnamed: 600_level_1,Unnamed: 601_level_1,Unnamed: 602_level_1,Unnamed: 603_level_1,Unnamed: 604_level_1,Unnamed: 605_level_1,Unnamed: 606_level_1,Unnamed: 607_level_1,Unnamed: 608_level_1,Unnamed: 609_level_1,Unnamed: 610_level_1
1,4.0,,,,4.0,,4.5,,,,,,,,2.5,,4.5,3.5,4.0,,3.5,,,,,,3.0,,,,5.0,3.0,3.0,,,,,,,5.0,,,5.0,3.0,4.0,5.0,,,,3.0,,,,3.0,,,5.0,,,,,,5.0,4.0,,4.0,,2.5,,,5.0,,4.5,,,0.5,,4.0,,,,2.5,,,,4.0,,,3.0,3.0,4.0,,3.0,,,5.0,,4.5,,,,,4.0,,,,4.0,,,,,3.0,,,,,,,3.5,,4.0,,,4.0,,,,,,3.0,,2.0,,3.0,4.0,,4.0,,,3.0,4.0,,,3.5,5.0,,,,,,5.0,,2.0,,3.0,4.0,,,4.5,4.0,4.0,,,,,5.0,3.5,,4.5,,5.0,,,,,,5.0,4.0,4.0,,,4.0,,,4.0,4.0,,,,,4.0,,2.0,,,,,,,3.5,5.0,4.0,,,,5.0,,,,,,,3.5,3.0,,3.0,4.0,,3.5,5.0,,,3.5,,,3.5,,,5.0,,,3.5,3.0,5.0,,,,,4.0,5.0,,,,,,,5.0,,4.0,,,4.5,,4.5,,,,,,,,,4.0,4.0,,2.0,,,5.0,5.0,,,5.0,4.0,5.0,4.0,4.0,,3.0,4.5,,4.5,3.0,,,,,4.5,,4.0,4.0,4.0,3.0,,,,,2.0,,,,,,5.0,,,4.0,,,,,,,3.0,,,,,,,,3.5,3.5,,,,,5.0,,4.0,,4.0,,3.5,,4.0,4.0,,4.0,,5.0,,,,,,5.0,,,4.0,,,5.0,,,,5.0,,4.0,,,,,5.0,,,5.0,,,,,3.0,3.0,,,,,4.5,,5.0,3.5,4.5,,,4.0,,,,5.0,,3.0,,,,,5.0,,,4.0,,3.5,,,,,,,,,,5.0,2.0,,4.0,,,,,,4.0,,4.0,,,,,,,,,,2.5,,4.0,,4.0,,4.5,,,,,4.0,,,,,5.0,,,5.0,,5.0,,,5.0,,,,4.5,,1.5,,,,,,4.0,4.0,4.0,5.0,,,4.0,,4.0,4.0,,,3.0,,,4.0,4.5,,,,4.5,,3.5,,4.0,,,,,,,,4.0,,,,4.0,,,,,4.0,,,,,4.0,,,4.0,,,,,3.0,,4.0,4.0,,,2.5,3.0,,,,5.0,4.0,,,,,,,3.0,,,3.0,,,,,,4.0,,,,,4.0,,,,5.0,3.0,4.0,4.5,,,,,3.5,,,4.0,,4.0,5.0,,,,,,4.0,3.0,,,,5.0,,,5.0,,,4.0,,,,,,4.0,4.0,,3.0,2.5,4.0,,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,,,,,,4.0,,4.0,,,,,,,,,,3.0,3.0,3.0,3.5,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,4.5,,,,,,,,,,,4.0,,,,,,2.5,,,,,,,,,,,,,,3.0,,,,,,,,,3.0,,5.0,4.0,,,,,,,,,4.0,3.0,,,5.0,,,,,1.5,,,,,3.0,,,,,4.0,,,4.0,,,,,,,,,,3.0,,,,,3.5,,,,3.0,,,,,1.0,,,,2.0,,,,,,,4.0,,,,,,,,,4.0,,,,,,,,3.5,,,,,,,,,4.0,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,2.0,,2.5,,3.5,2.5,,,,3.0,,,,2.5,,4.0,,,,,,,,5.0,,,,,,,,,4.0,,,,,,,,,,2.0,,,,,,,,,,,,,,,3.5,,4.0,,,,,,,,4.0,,,,2.0,,,,,,3.0,,,,0.5,3.0,,,,,4.0,3.5,,2.5,3.0,,,,,,,,,,3.5,,,5.0,3.0,4.0,,,,,,,1.5,,,,,,,,,,,,,,,,,3.0,,,,,,4.0,,,,3.0,,3.5,,,,,,,,,,,,,,3.0,,,,,,,5.0,4.0,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,4.0,,,3.0,,,,,,,,,,,3.0,4.5,,,,,,4.0,,2.5,,4.0,,,,,,,,,,3.0,5.0,3.0,,,,,,,,,,4.0,,,,,,,,,,,,3.0,,,,3.0,4.5,4.0,4.0,,,3.0,,4.5,4.0,2.5,,,,,2.5,,,,,,,,2.5,,,,3.0,,,,,,,,,,,3.0,,,,,3.0,,,,,,4.5,,3.5,,4.0,,,,,,,4.5,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,4.0,,2.5,,,4.0,,,,3.5,,,4.5,,,,,,,,,,,,,4.0,,,,2.5,,4.0,,4.0,,,,,2.5,4.0,,4.0,,5.0,3.5,,,2.0,,
3,4.0,,,,,5.0,,,,,,,,,,,,,3.0,,,,,,,,,,,,,3.0,,,,,,,,,,4.0,5.0,3.0,,,,,,,4.0,,,,,,,3.0,,,,,,3.5,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,3.5,,5.0,,,,,,,,,,,,,,3.5,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,3.5,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,3.0,,,,,,,,,,,,,,,,,,4.0,2.5,,,,,1.0,,,,,,,,3.0,,,,,3.5,0.5,,,,,,,,,,,,,3.0,,,,,,,,,3.0,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,3.0,,,,,,,,,,,,,,3.0,,,,,,,3.0,,,2.5,,,,,,,,,,,,4.0,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,1.0,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,3.0,,,,4.0,,,,,1.5,,,,,,,,,2.0,,
4,,,,,,3.0,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.5,,,,,,,,,,
5,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,5.0,,3.0,,,,,,,,,,,,,4.0,,,,,,,,4.0,,2.0,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,4.0,,,,4.0,,,,3.5,,,,,,3.0,,,4.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,4.5,,,3.0,,,,,,,,,,,,,,,,,,,5.0,3.0,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,,,,,,3.0,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,3.0,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,3.0,,,3.0,,,,,3.0,,3.0,,,,,,,,,,,,3.0,,,,1.5,,,,,,,,,2.5,,,,,,2.0,0.5,,3.0,,,,,,,,,3.0,,,,,,,,1.5,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,2.5,,,,3.0,,,,,,


#### Now let's also see some stats about our movie database

In [67]:
numMovies = dfTask.shape[0]
numUsers = dfTask.shape[1]

numNonNan = dfTask.notna().sum().sum()
numNan = dfTask.isna().sum().sum()

#most and least watched movies:
movieWatchCount = dfTask.count(axis=1)
mostWatchedMovie = movieWatchCount.idxmax()
mostWatchedMovieWatchCount = movieWatchCount.max()
leastWatchedMovie = movieWatchCount.idxmin()
leastWatchedMovieWatchCount = movieWatchCount.min()

#most and least active users:
userWatchCount = dfTask.count(axis=0)
mostActiveUser = userWatchCount.idxmax()
mostActiveUserWatchCount = userWatchCount.max()
leastActiveUser = userWatchCount.idxmin()
leastActiveUserWatchCount = userWatchCount.min()


print(f"Dataset summary:")
print(f"Number of movies in the dataset: {numMovies}")
print(f"Number of users in the dataset: {numUsers}")
print(f"Number of non-NaN values in the dataset: {numNonNan}")
print(f"Number of NaN values in the dataset: {numNan}\n")

print(f"Most watched movie: {mostWatchedMovie} ({mostWatchedMovieWatchCount} watches)")
print(f"Least watched movie: {leastWatchedMovie} ({leastWatchedMovieWatchCount} watches)\n")

print(f"Most active user: {mostActiveUser} ({mostActiveUserWatchCount} movies rated)")
print(f"Least active user: {leastActiveUser} ({leastActiveUserWatchCount} movies rated)\n")


Dataset summary:
Number of movies in the dataset: 9724
Number of users in the dataset: 610
Number of non-NaN values in the dataset: 100836
Number of NaN values in the dataset: 5830804

Most watched movie: 356 (329 watches)
Least watched movie: 49 (1 watches)

Most active user: 414 (2698 movies rated)
Least active user: 53 (20 movies rated)



Small remark: <br>
Those stats for the most/least active user and watched movie might be different since there are different movies that might have the same 'watch count' (same with the users) and we print only one of them :)

--------

### <b>Task 2
Let's try to recommend movies for user 610. Calculate the correlation between this user and the remaining ones.

In [68]:
user = 610
user
userRatings = dfTask[user]
userRatings
print(f"User {user} has rated {userRatings.count()} movies")
print(f"Ratings of user {user}:\n {userRatings.dropna()}")


User 610 has rated 1302 movies
Ratings of user 610:
 movieId
1         5.0
6         5.0
16        4.5
32        4.5
47        5.0
         ... 
166534    4.0
168248    5.0
168250    5.0
168252    5.0
170875    3.0
Name: 610, Length: 1302, dtype: float64


In [69]:
def CalculatetCorrelations(user, commonMovies=2, moviesdf=dfTask):
    correlations = {}
    userRatings = moviesdf[user].dropna()
    
    for otherUser in moviesdf.columns:
        
        if otherUser != user:
            otherUserRatings = moviesdf[otherUser].dropna()
            commonRatings = userRatings.index.intersection(otherUserRatings.index)
            
            if len(commonRatings) >= commonMovies: 
                correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]

                
    valid_correlations = {k: v for k, v in correlations.items() if not np.isnan(v)} #getting rid of Nan correlations since we get some of that
                
    sorted_correlations = sorted(valid_correlations.items(), key=lambda x: x[1], reverse=True)
    return sorted_correlations

In [70]:
user610Correlations = CalculatetCorrelations(user=610)
print(f"Top correlated users with the user {user} are:")
for user, corr in user610Correlations[:10]:
    print(f"User {user} with correlation {corr:.2f}")

  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Top correlated users with the user 610 are:
User 442 with correlation 1.00
User 545 with correlation 1.00
User 576 with correlation 1.00
User 158 with correlation 0.91
User 92 with correlation 0.90
User 595 with correlation 0.89
User 120 with correlation 0.88
User 463 with correlation 0.82
User 138 with correlation 0.82
User 494 with correlation 0.81


---------

### <b>Task 2b
There are a few users with the perfect match. Isn't it suspicious? Check it

In [71]:
user610Ratings = dfTask[610].dropna()
print(f"User 610 has rated {user610Ratings.count()} movies")
user442Ratings = dfTask[442].dropna()
print(f"User 442 has rated {user442Ratings.count()} movies")
commonMoviesRated = user610Ratings.index.intersection(user442Ratings.index)
print(f"User 610 and User 442 have rated {len(commonMoviesRated)} common movies")

User 610 has rated 1302 movies
User 442 has rated 20 movies
User 610 and User 442 have rated 2 common movies


In [72]:
user610Ratings = dfTask[610].dropna()
print(f"User 610 has rated {user610Ratings.count()} movies")
user545Ratings = dfTask[545].dropna()
print(f"User 545 has rated {user545Ratings.count()} movies")
commonMoviesRated = user610Ratings.index.intersection(user545Ratings.index)
print(f"User 610 and User 545 have rated {len(commonMoviesRated)} common movies")

User 610 has rated 1302 movies
User 545 has rated 23 movies
User 610 and User 545 have rated 2 common movies


In [73]:
user610Ratings = dfTask[610].dropna()
print(f"User 610 has rated {user610Ratings.count()} movies")
user576Ratings = dfTask[576].dropna()
print(f"User 576 has rated {user576Ratings.count()} movies")
commonMoviesRated = user610Ratings.index.intersection(user576Ratings.index)
print(f"User 610 and User 576 have rated {len(commonMoviesRated)} common movies")

User 610 has rated 1302 movies
User 576 has rated 20 movies
User 610 and User 576 have rated 2 common movies


and on th other hand:

In [74]:
user610Ratings = dfTask[610].dropna()
print(f"User 610 has rated {user610Ratings.count()} movies")
user158Ratings = dfTask[158].dropna()
print(f"User 158 has rated {user158Ratings.count()} movies")
commonMoviesRated = user610Ratings.index.intersection(user158Ratings.index)
print(f"User 610 and User 158 have rated {len(commonMoviesRated)} common movies")

User 610 has rated 1302 movies
User 158 has rated 26 movies
User 610 and User 158 have rated 4 common movies


In [75]:
user610Ratings = dfTask[610].dropna()
print(f"User 610 has rated {user610Ratings.count()} movies")
user92Ratings = dfTask[92].dropna()
print(f"User 92 has rated {user92Ratings.count()} movies")
commonMoviesRated = user610Ratings.index.intersection(user92Ratings.index)
print(f"User 610 and User 92 have rated {len(commonMoviesRated)} common movies")

User 610 has rated 1302 movies
User 92 has rated 24 movies
User 610 and User 92 have rated 5 common movies


In [76]:
user610Ratings = dfTask[610].dropna()
print(f"User 610 has rated {user610Ratings.count()} movies")
user494Ratings = dfTask[494].dropna()
print(f"User 494 has rated {user494Ratings.count()} movies")
commonMoviesRated = user610Ratings.index.intersection(user494Ratings.index)
print(f"User 610 and User 494 have rated {len(commonMoviesRated)} common movies")

User 610 has rated 1302 movies
User 494 has rated 22 movies
User 610 and User 494 have rated 19 common movies


The perfect match occurs in those 3 cases because the users have small number of movies in common (only 2), so it is not quite reliable.

----

### <b>Task 3
Find 5 users with at least 5 common movies with user=610 and the highest correlation with that user

In [77]:
numTopUsers = 5
commonMovies = 5
user610Correlations = CalculatetCorrelations(user = 610, commonMovies = commonMovies)
Best5CorrelatedUsers = user610Correlations[:numTopUsers]

print(f"Top {numTopUsers} correlated users with the user {user}, who have wathced at least {commonMovies} same movies are:")
for otherUser, correlation in Best5CorrelatedUsers:
    print(f"User {otherUser} with correlation {correlation:.2f}")


Top 5 correlated users with the user 494, who have wathced at least 5 same movies are:
User 92 with correlation 0.90
User 120 with correlation 0.88
User 463 with correlation 0.82
User 138 with correlation 0.82
User 494 with correlation 0.81


-------

### <b> Task 4
Predict scores for each movie based on the most correlated users. Use weighted average with correlation coefficient as weights.
$$\hat{y_j} = \frac{\sum_{i \in U} w_iy_{ij}}{\sum_{i \in U} w_i}$$

$U$ is a set of those users that also watched $j$th moveie, $w$ denotes the correlation between our user and $i$th user, $y_{ij}$ is a score given by $i$th user to $j$th movie
Use only movies watched by at least two users from the considered set

In [78]:
def predictScores(user, moviesdf = dfTask, commonMovies = 5, topUsers = 5, negativeUsers=0, sortBy='Predicted Score'):
    userXCorrelations = CalculatetCorrelations(user, commonMovies)
    topNUsers = userXCorrelations[:topUsers]
    worstNUsers = userXCorrelations[-negativeUsers:]
    predictedScores = []
    
    for movie in moviesdf.index:
        if np.isnan(moviesdf.loc[movie, user]):
            predictedScore = 0
            sumCorr = 0
            otherUserDetails = []
            contibutingUsers = 0
            
            for otherUser, correlation in topNUsers:
                otherUserRating = moviesdf.loc[movie, otherUser]
                if not np.isnan(otherUserRating):
                    predictedScore += otherUserRating * correlation
                    sumCorr += correlation
                    contibutingUsers += 1
                    otherUserDetails.append(f"User: {otherUser}, Rating: {otherUserRating}, Correlation: {correlation:.2f}")
                    
            if negativeUsers != 0:
                for otherUser, correlation in worstNUsers:
                    otherUserRating = moviesdf.loc[movie, otherUser]
                    if not np.isnan(otherUserRating):
                        predictedScore += abs((abs(6 - otherUserRating)) * correlation)
                        sumCorr += abs(correlation)
                        contibutingUsers += 1
                        otherUserDetails.append(f"User: {otherUser}, Rating: {otherUserRating}, Correlation: {correlation:.2f}")
                    
            if sumCorr != 0:
                predictedScore /= sumCorr
                otherUserDetailsStr = '<br>'.join(otherUserDetails)
                predictedScores.append({'Movie': movie, 'Predicted Score': predictedScore, 'Count of users contributing to prediction': contibutingUsers, 'Users on based on which prediction was made details': otherUserDetailsStr})
                
    if sortBy == 'User':
        predictedScoresdf = pd.DataFrame(predictedScores).sort_values(by=['Count of users contributing to prediction', 'Predicted Score'], ascending=[False, False])
    else:
        predictedScoresdf = pd.DataFrame(predictedScores).sort_values(by=['Predicted Score', 'Count of users contributing to prediction'], ascending=[False, False])

                
                
    # predictedScoresdf = pd.DataFrame(predictedScores).sort_values(by='Predicted Score', ascending=False)         
    return predictedScoresdf



In [79]:
predictedRatings610 = predictScores(610)
display(HTML(predictedRatings610.head(10).to_html(escape=False)))

Unnamed: 0,Movie,Predicted Score,Count of users contributing to prediction,Users on based on which prediction was made details
3,107,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
5,222,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
21,837,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
23,898,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
25,1019,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
49,2150,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
55,2572,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
10,616,4.5,1,"User: 138, Rating: 4.5, Correlation: 0.82"
34,1552,4.5,1,"User: 463, Rating: 4.5, Correlation: 0.82"
45,2087,4.5,1,"User: 92, Rating: 4.5, Correlation: 0.90"


In [80]:
predictedRatings610 = predictScores(610, sortBy='User')
display(HTML(predictedRatings610.head(10).to_html(escape=False)))

Unnamed: 0,Movie,Predicted Score,Count of users contributing to prediction,Users on based on which prediction was made details
32,1367,3.493226,2,"User: 92, Rating: 3.0, Correlation: 0.90 User: 120, Rating: 4.0, Correlation: 0.88"
3,107,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
5,222,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
21,837,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
23,898,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
25,1019,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
49,2150,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
55,2572,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
10,616,4.5,1,"User: 138, Rating: 4.5, Correlation: 0.82"
34,1552,4.5,1,"User: 463, Rating: 4.5, Correlation: 0.82"


In [81]:
predictedRatings610withNegative = predictScores(610, negativeUsers=5)
display(HTML(predictedRatings610withNegative.head(10).to_html(escape=False)))

Unnamed: 0,Movie,Predicted Score,Count of users contributing to prediction,Users on based on which prediction was made details
8,107,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
15,196,5.0,1,"User: 536, Rating: 1.0, Correlation: -0.64"
22,253,5.0,1,"User: 536, Rating: 1.0, Correlation: -0.64"
47,431,5.0,1,"User: 536, Rating: 1.0, Correlation: -0.64"
80,837,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
83,898,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
86,1019,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
120,2150,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
67,616,4.5,1,"User: 138, Rating: 4.5, Correlation: 0.82"
101,1552,4.5,1,"User: 463, Rating: 4.5, Correlation: 0.82"


In [82]:
predictedRatings610withMoreNegative = predictScores(610, negativeUsers=15)
display(HTML(predictedRatings610withNegative.head(10).to_html(escape=False)))

Unnamed: 0,Movie,Predicted Score,Count of users contributing to prediction,Users on based on which prediction was made details
8,107,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
15,196,5.0,1,"User: 536, Rating: 1.0, Correlation: -0.64"
22,253,5.0,1,"User: 536, Rating: 1.0, Correlation: -0.64"
47,431,5.0,1,"User: 536, Rating: 1.0, Correlation: -0.64"
80,837,5.0,1,"User: 92, Rating: 5.0, Correlation: 0.90"
83,898,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
86,1019,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
120,2150,5.0,1,"User: 138, Rating: 5.0, Correlation: 0.82"
67,616,4.5,1,"User: 138, Rating: 4.5, Correlation: 0.82"
101,1552,4.5,1,"User: 463, Rating: 4.5, Correlation: 0.82"


-----

### <b> Task 5
How to check the quality of our recommendations? 

We have to remove a few scores from the dataset and then compare predictions with the real ones.

First to not permamently change the dataframe that we work on I hve created the copy of it, so that when we remove some ratings our dfTask will not be affected and we will be able to compare the changes.

In [83]:
dfTaskComparing = dfTask.copy()
dfTaskComparing.head()

userId,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474,475,476,477,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,494,495,496,497,498,499,500,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,524,525,526,527,528,529,530,531,532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,559,560,561,562,563,564,565,566,567,568,569,570,571,572,573,574,575,576,577,578,579,580,581,582,583,584,585,586,587,588,589,590,591,592,593,594,595,596,597,598,599,600,601,602,603,604,605,606,607,608,609,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1,Unnamed: 171_level_1,Unnamed: 172_level_1,Unnamed: 173_level_1,Unnamed: 174_level_1,Unnamed: 175_level_1,Unnamed: 176_level_1,Unnamed: 177_level_1,Unnamed: 178_level_1,Unnamed: 179_level_1,Unnamed: 180_level_1,Unnamed: 181_level_1,Unnamed: 182_level_1,Unnamed: 183_level_1,Unnamed: 184_level_1,Unnamed: 185_level_1,Unnamed: 186_level_1,Unnamed: 187_level_1,Unnamed: 188_level_1,Unnamed: 189_level_1,Unnamed: 190_level_1,Unnamed: 191_level_1,Unnamed: 192_level_1,Unnamed: 193_level_1,Unnamed: 194_level_1,Unnamed: 195_level_1,Unnamed: 196_level_1,Unnamed: 197_level_1,Unnamed: 198_level_1,Unnamed: 199_level_1,Unnamed: 200_level_1,Unnamed: 201_level_1,Unnamed: 202_level_1,Unnamed: 203_level_1,Unnamed: 204_level_1,Unnamed: 205_level_1,Unnamed: 206_level_1,Unnamed: 207_level_1,Unnamed: 208_level_1,Unnamed: 209_level_1,Unnamed: 210_level_1,Unnamed: 211_level_1,Unnamed: 212_level_1,Unnamed: 213_level_1,Unnamed: 214_level_1,Unnamed: 215_level_1,Unnamed: 216_level_1,Unnamed: 217_level_1,Unnamed: 218_level_1,Unnamed: 219_level_1,Unnamed: 220_level_1,Unnamed: 221_level_1,Unnamed: 222_level_1,Unnamed: 223_level_1,Unnamed: 224_level_1,Unnamed: 225_level_1,Unnamed: 226_level_1,Unnamed: 227_level_1,Unnamed: 228_level_1,Unnamed: 229_level_1,Unnamed: 230_level_1,Unnamed: 231_level_1,Unnamed: 232_level_1,Unnamed: 233_level_1,Unnamed: 234_level_1,Unnamed: 235_level_1,Unnamed: 236_level_1,Unnamed: 237_level_1,Unnamed: 238_level_1,Unnamed: 239_level_1,Unnamed: 240_level_1,Unnamed: 241_level_1,Unnamed: 242_level_1,Unnamed: 243_level_1,Unnamed: 244_level_1,Unnamed: 245_level_1,Unnamed: 246_level_1,Unnamed: 247_level_1,Unnamed: 248_level_1,Unnamed: 249_level_1,Unnamed: 250_level_1,Unnamed: 251_level_1,Unnamed: 252_level_1,Unnamed: 253_level_1,Unnamed: 254_level_1,Unnamed: 255_level_1,Unnamed: 256_level_1,Unnamed: 257_level_1,Unnamed: 258_level_1,Unnamed: 259_level_1,Unnamed: 260_level_1,Unnamed: 261_level_1,Unnamed: 262_level_1,Unnamed: 263_level_1,Unnamed: 264_level_1,Unnamed: 265_level_1,Unnamed: 266_level_1,Unnamed: 267_level_1,Unnamed: 268_level_1,Unnamed: 269_level_1,Unnamed: 270_level_1,Unnamed: 271_level_1,Unnamed: 272_level_1,Unnamed: 273_level_1,Unnamed: 274_level_1,Unnamed: 275_level_1,Unnamed: 276_level_1,Unnamed: 277_level_1,Unnamed: 278_level_1,Unnamed: 279_level_1,Unnamed: 280_level_1,Unnamed: 281_level_1,Unnamed: 282_level_1,Unnamed: 283_level_1,Unnamed: 284_level_1,Unnamed: 285_level_1,Unnamed: 286_level_1,Unnamed: 287_level_1,Unnamed: 288_level_1,Unnamed: 289_level_1,Unnamed: 290_level_1,Unnamed: 291_level_1,Unnamed: 292_level_1,Unnamed: 293_level_1,Unnamed: 294_level_1,Unnamed: 295_level_1,Unnamed: 296_level_1,Unnamed: 297_level_1,Unnamed: 298_level_1,Unnamed: 299_level_1,Unnamed: 300_level_1,Unnamed: 301_level_1,Unnamed: 302_level_1,Unnamed: 303_level_1,Unnamed: 304_level_1,Unnamed: 305_level_1,Unnamed: 306_level_1,Unnamed: 307_level_1,Unnamed: 308_level_1,Unnamed: 309_level_1,Unnamed: 310_level_1,Unnamed: 311_level_1,Unnamed: 312_level_1,Unnamed: 313_level_1,Unnamed: 314_level_1,Unnamed: 315_level_1,Unnamed: 316_level_1,Unnamed: 317_level_1,Unnamed: 318_level_1,Unnamed: 319_level_1,Unnamed: 320_level_1,Unnamed: 321_level_1,Unnamed: 322_level_1,Unnamed: 323_level_1,Unnamed: 324_level_1,Unnamed: 325_level_1,Unnamed: 326_level_1,Unnamed: 327_level_1,Unnamed: 328_level_1,Unnamed: 329_level_1,Unnamed: 330_level_1,Unnamed: 331_level_1,Unnamed: 332_level_1,Unnamed: 333_level_1,Unnamed: 334_level_1,Unnamed: 335_level_1,Unnamed: 336_level_1,Unnamed: 337_level_1,Unnamed: 338_level_1,Unnamed: 339_level_1,Unnamed: 340_level_1,Unnamed: 341_level_1,Unnamed: 342_level_1,Unnamed: 343_level_1,Unnamed: 344_level_1,Unnamed: 345_level_1,Unnamed: 346_level_1,Unnamed: 347_level_1,Unnamed: 348_level_1,Unnamed: 349_level_1,Unnamed: 350_level_1,Unnamed: 351_level_1,Unnamed: 352_level_1,Unnamed: 353_level_1,Unnamed: 354_level_1,Unnamed: 355_level_1,Unnamed: 356_level_1,Unnamed: 357_level_1,Unnamed: 358_level_1,Unnamed: 359_level_1,Unnamed: 360_level_1,Unnamed: 361_level_1,Unnamed: 362_level_1,Unnamed: 363_level_1,Unnamed: 364_level_1,Unnamed: 365_level_1,Unnamed: 366_level_1,Unnamed: 367_level_1,Unnamed: 368_level_1,Unnamed: 369_level_1,Unnamed: 370_level_1,Unnamed: 371_level_1,Unnamed: 372_level_1,Unnamed: 373_level_1,Unnamed: 374_level_1,Unnamed: 375_level_1,Unnamed: 376_level_1,Unnamed: 377_level_1,Unnamed: 378_level_1,Unnamed: 379_level_1,Unnamed: 380_level_1,Unnamed: 381_level_1,Unnamed: 382_level_1,Unnamed: 383_level_1,Unnamed: 384_level_1,Unnamed: 385_level_1,Unnamed: 386_level_1,Unnamed: 387_level_1,Unnamed: 388_level_1,Unnamed: 389_level_1,Unnamed: 390_level_1,Unnamed: 391_level_1,Unnamed: 392_level_1,Unnamed: 393_level_1,Unnamed: 394_level_1,Unnamed: 395_level_1,Unnamed: 396_level_1,Unnamed: 397_level_1,Unnamed: 398_level_1,Unnamed: 399_level_1,Unnamed: 400_level_1,Unnamed: 401_level_1,Unnamed: 402_level_1,Unnamed: 403_level_1,Unnamed: 404_level_1,Unnamed: 405_level_1,Unnamed: 406_level_1,Unnamed: 407_level_1,Unnamed: 408_level_1,Unnamed: 409_level_1,Unnamed: 410_level_1,Unnamed: 411_level_1,Unnamed: 412_level_1,Unnamed: 413_level_1,Unnamed: 414_level_1,Unnamed: 415_level_1,Unnamed: 416_level_1,Unnamed: 417_level_1,Unnamed: 418_level_1,Unnamed: 419_level_1,Unnamed: 420_level_1,Unnamed: 421_level_1,Unnamed: 422_level_1,Unnamed: 423_level_1,Unnamed: 424_level_1,Unnamed: 425_level_1,Unnamed: 426_level_1,Unnamed: 427_level_1,Unnamed: 428_level_1,Unnamed: 429_level_1,Unnamed: 430_level_1,Unnamed: 431_level_1,Unnamed: 432_level_1,Unnamed: 433_level_1,Unnamed: 434_level_1,Unnamed: 435_level_1,Unnamed: 436_level_1,Unnamed: 437_level_1,Unnamed: 438_level_1,Unnamed: 439_level_1,Unnamed: 440_level_1,Unnamed: 441_level_1,Unnamed: 442_level_1,Unnamed: 443_level_1,Unnamed: 444_level_1,Unnamed: 445_level_1,Unnamed: 446_level_1,Unnamed: 447_level_1,Unnamed: 448_level_1,Unnamed: 449_level_1,Unnamed: 450_level_1,Unnamed: 451_level_1,Unnamed: 452_level_1,Unnamed: 453_level_1,Unnamed: 454_level_1,Unnamed: 455_level_1,Unnamed: 456_level_1,Unnamed: 457_level_1,Unnamed: 458_level_1,Unnamed: 459_level_1,Unnamed: 460_level_1,Unnamed: 461_level_1,Unnamed: 462_level_1,Unnamed: 463_level_1,Unnamed: 464_level_1,Unnamed: 465_level_1,Unnamed: 466_level_1,Unnamed: 467_level_1,Unnamed: 468_level_1,Unnamed: 469_level_1,Unnamed: 470_level_1,Unnamed: 471_level_1,Unnamed: 472_level_1,Unnamed: 473_level_1,Unnamed: 474_level_1,Unnamed: 475_level_1,Unnamed: 476_level_1,Unnamed: 477_level_1,Unnamed: 478_level_1,Unnamed: 479_level_1,Unnamed: 480_level_1,Unnamed: 481_level_1,Unnamed: 482_level_1,Unnamed: 483_level_1,Unnamed: 484_level_1,Unnamed: 485_level_1,Unnamed: 486_level_1,Unnamed: 487_level_1,Unnamed: 488_level_1,Unnamed: 489_level_1,Unnamed: 490_level_1,Unnamed: 491_level_1,Unnamed: 492_level_1,Unnamed: 493_level_1,Unnamed: 494_level_1,Unnamed: 495_level_1,Unnamed: 496_level_1,Unnamed: 497_level_1,Unnamed: 498_level_1,Unnamed: 499_level_1,Unnamed: 500_level_1,Unnamed: 501_level_1,Unnamed: 502_level_1,Unnamed: 503_level_1,Unnamed: 504_level_1,Unnamed: 505_level_1,Unnamed: 506_level_1,Unnamed: 507_level_1,Unnamed: 508_level_1,Unnamed: 509_level_1,Unnamed: 510_level_1,Unnamed: 511_level_1,Unnamed: 512_level_1,Unnamed: 513_level_1,Unnamed: 514_level_1,Unnamed: 515_level_1,Unnamed: 516_level_1,Unnamed: 517_level_1,Unnamed: 518_level_1,Unnamed: 519_level_1,Unnamed: 520_level_1,Unnamed: 521_level_1,Unnamed: 522_level_1,Unnamed: 523_level_1,Unnamed: 524_level_1,Unnamed: 525_level_1,Unnamed: 526_level_1,Unnamed: 527_level_1,Unnamed: 528_level_1,Unnamed: 529_level_1,Unnamed: 530_level_1,Unnamed: 531_level_1,Unnamed: 532_level_1,Unnamed: 533_level_1,Unnamed: 534_level_1,Unnamed: 535_level_1,Unnamed: 536_level_1,Unnamed: 537_level_1,Unnamed: 538_level_1,Unnamed: 539_level_1,Unnamed: 540_level_1,Unnamed: 541_level_1,Unnamed: 542_level_1,Unnamed: 543_level_1,Unnamed: 544_level_1,Unnamed: 545_level_1,Unnamed: 546_level_1,Unnamed: 547_level_1,Unnamed: 548_level_1,Unnamed: 549_level_1,Unnamed: 550_level_1,Unnamed: 551_level_1,Unnamed: 552_level_1,Unnamed: 553_level_1,Unnamed: 554_level_1,Unnamed: 555_level_1,Unnamed: 556_level_1,Unnamed: 557_level_1,Unnamed: 558_level_1,Unnamed: 559_level_1,Unnamed: 560_level_1,Unnamed: 561_level_1,Unnamed: 562_level_1,Unnamed: 563_level_1,Unnamed: 564_level_1,Unnamed: 565_level_1,Unnamed: 566_level_1,Unnamed: 567_level_1,Unnamed: 568_level_1,Unnamed: 569_level_1,Unnamed: 570_level_1,Unnamed: 571_level_1,Unnamed: 572_level_1,Unnamed: 573_level_1,Unnamed: 574_level_1,Unnamed: 575_level_1,Unnamed: 576_level_1,Unnamed: 577_level_1,Unnamed: 578_level_1,Unnamed: 579_level_1,Unnamed: 580_level_1,Unnamed: 581_level_1,Unnamed: 582_level_1,Unnamed: 583_level_1,Unnamed: 584_level_1,Unnamed: 585_level_1,Unnamed: 586_level_1,Unnamed: 587_level_1,Unnamed: 588_level_1,Unnamed: 589_level_1,Unnamed: 590_level_1,Unnamed: 591_level_1,Unnamed: 592_level_1,Unnamed: 593_level_1,Unnamed: 594_level_1,Unnamed: 595_level_1,Unnamed: 596_level_1,Unnamed: 597_level_1,Unnamed: 598_level_1,Unnamed: 599_level_1,Unnamed: 600_level_1,Unnamed: 601_level_1,Unnamed: 602_level_1,Unnamed: 603_level_1,Unnamed: 604_level_1,Unnamed: 605_level_1,Unnamed: 606_level_1,Unnamed: 607_level_1,Unnamed: 608_level_1,Unnamed: 609_level_1,Unnamed: 610_level_1
1,4.0,,,,4.0,,4.5,,,,,,,,2.5,,4.5,3.5,4.0,,3.5,,,,,,3.0,,,,5.0,3.0,3.0,,,,,,,5.0,,,5.0,3.0,4.0,5.0,,,,3.0,,,,3.0,,,5.0,,,,,,5.0,4.0,,4.0,,2.5,,,5.0,,4.5,,,0.5,,4.0,,,,2.5,,,,4.0,,,3.0,3.0,4.0,,3.0,,,5.0,,4.5,,,,,4.0,,,,4.0,,,,,3.0,,,,,,,3.5,,4.0,,,4.0,,,,,,3.0,,2.0,,3.0,4.0,,4.0,,,3.0,4.0,,,3.5,5.0,,,,,,5.0,,2.0,,3.0,4.0,,,4.5,4.0,4.0,,,,,5.0,3.5,,4.5,,5.0,,,,,,5.0,4.0,4.0,,,4.0,,,4.0,4.0,,,,,4.0,,2.0,,,,,,,3.5,5.0,4.0,,,,5.0,,,,,,,3.5,3.0,,3.0,4.0,,3.5,5.0,,,3.5,,,3.5,,,5.0,,,3.5,3.0,5.0,,,,,4.0,5.0,,,,,,,5.0,,4.0,,,4.5,,4.5,,,,,,,,,4.0,4.0,,2.0,,,5.0,5.0,,,5.0,4.0,5.0,4.0,4.0,,3.0,4.5,,4.5,3.0,,,,,4.5,,4.0,4.0,4.0,3.0,,,,,2.0,,,,,,5.0,,,4.0,,,,,,,3.0,,,,,,,,3.5,3.5,,,,,5.0,,4.0,,4.0,,3.5,,4.0,4.0,,4.0,,5.0,,,,,,5.0,,,4.0,,,5.0,,,,5.0,,4.0,,,,,5.0,,,5.0,,,,,3.0,3.0,,,,,4.5,,5.0,3.5,4.5,,,4.0,,,,5.0,,3.0,,,,,5.0,,,4.0,,3.5,,,,,,,,,,5.0,2.0,,4.0,,,,,,4.0,,4.0,,,,,,,,,,2.5,,4.0,,4.0,,4.5,,,,,4.0,,,,,5.0,,,5.0,,5.0,,,5.0,,,,4.5,,1.5,,,,,,4.0,4.0,4.0,5.0,,,4.0,,4.0,4.0,,,3.0,,,4.0,4.5,,,,4.5,,3.5,,4.0,,,,,,,,4.0,,,,4.0,,,,,4.0,,,,,4.0,,,4.0,,,,,3.0,,4.0,4.0,,,2.5,3.0,,,,5.0,4.0,,,,,,,3.0,,,3.0,,,,,,4.0,,,,,4.0,,,,5.0,3.0,4.0,4.5,,,,,3.5,,,4.0,,4.0,5.0,,,,,,4.0,3.0,,,,5.0,,,5.0,,,4.0,,,,,,4.0,4.0,,3.0,2.5,4.0,,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,,,,,,4.0,,4.0,,,,,,,,,,3.0,3.0,3.0,3.5,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,4.5,,,,,,,,,,,4.0,,,,,,2.5,,,,,,,,,,,,,,3.0,,,,,,,,,3.0,,5.0,4.0,,,,,,,,,4.0,3.0,,,5.0,,,,,1.5,,,,,3.0,,,,,4.0,,,4.0,,,,,,,,,,3.0,,,,,3.5,,,,3.0,,,,,1.0,,,,2.0,,,,,,,4.0,,,,,,,,,4.0,,,,,,,,3.5,,,,,,,,,4.0,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,2.0,,2.5,,3.5,2.5,,,,3.0,,,,2.5,,4.0,,,,,,,,5.0,,,,,,,,,4.0,,,,,,,,,,2.0,,,,,,,,,,,,,,,3.5,,4.0,,,,,,,,4.0,,,,2.0,,,,,,3.0,,,,0.5,3.0,,,,,4.0,3.5,,2.5,3.0,,,,,,,,,,3.5,,,5.0,3.0,4.0,,,,,,,1.5,,,,,,,,,,,,,,,,,3.0,,,,,,4.0,,,,3.0,,3.5,,,,,,,,,,,,,,3.0,,,,,,,5.0,4.0,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,4.0,,,3.0,,,,,,,,,,,3.0,4.5,,,,,,4.0,,2.5,,4.0,,,,,,,,,,3.0,5.0,3.0,,,,,,,,,,4.0,,,,,,,,,,,,3.0,,,,3.0,4.5,4.0,4.0,,,3.0,,4.5,4.0,2.5,,,,,2.5,,,,,,,,2.5,,,,3.0,,,,,,,,,,,3.0,,,,,3.0,,,,,,4.5,,3.5,,4.0,,,,,,,4.5,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,4.0,,2.5,,,4.0,,,,3.5,,,4.5,,,,,,,,,,,,,4.0,,,,2.5,,4.0,,4.0,,,,,2.5,4.0,,4.0,,5.0,3.5,,,2.0,,
3,4.0,,,,,5.0,,,,,,,,,,,,,3.0,,,,,,,,,,,,,3.0,,,,,,,,,,4.0,5.0,3.0,,,,,,,4.0,,,,,,,3.0,,,,,,3.5,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,3.5,,5.0,,,,,,,,,,,,,,3.5,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,,,,,,,,,,,,,,,,,,5.0,,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,3.5,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,3.0,,,,,,,,,,,,,,,,,,4.0,2.5,,,,,1.0,,,,,,,,3.0,,,,,3.5,0.5,,,,,,,,,,,,,3.0,,,,,,,,,3.0,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,3.0,,,,,,,,,,,,,,3.0,,,,,,,3.0,,,2.5,,,,,,,,,,,,4.0,,,,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,1.0,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,3.0,,,,4.0,,,,,1.5,,,,,,,,,2.0,,
4,,,,,,3.0,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.5,,,,,,,,,,
5,,,,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,5.0,,3.0,,,,,,,,,,,,,4.0,,,,,,,,4.0,,2.0,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,4.0,,,,4.0,,,,3.5,,,,,,3.0,,,4.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,4.5,,,3.0,,,,,,,,,,,,,,,,,,,5.0,3.0,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,3.0,,,,,,3.0,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,4.0,,,,,,,,,,,,,3.0,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,3.0,,,3.0,,,,,3.0,,3.0,,,,,,,,,,,,3.0,,,,1.5,,,,,,,,,2.5,,,,,,2.0,0.5,,3.0,,,,,,,,,3.0,,,,,,,,1.5,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,2.5,,,,3.0,,,,,,


Now let's create the function that removes some ratings for a provided user.

In [84]:
def removeRatings(user, numScoresToRemove, moviesdf = dfTaskComparing):
    userRatings = moviesdf[user].dropna()
    if len(userRatings) > numScoresToRemove:
        
        moviesToRemove = random.sample(list(userRatings.index), numScoresToRemove)
        moviesdf.loc[moviesToRemove, user] = np.nan
        return moviesdf, moviesToRemove
    else:
        return "You want to remove more ratings for a user thathey actually have!"

In [85]:
def compareRemovedRatings(user, numScoresToRemove, moviesdf = dfTaskComparing, topUsers=5):
    moviesdfChanged, removedMovies = removeRatings(user, numScoresToRemove, moviesdf)
    print(f"Removed ratings for user {user} for movies: {removedMovies}\n")
    
    predictedRatings = predictScores(user, moviesdfChanged, topUsers=topUsers)
    trueRatingsList=[]
    predictedScoresList=[]
    
    for movie in removedMovies:
        trueRating=dfTask.loc[movie, user]
        print(f"True rating for movie {movie}: {trueRating}")
        
        prediction = predictedRatings[predictedRatings['Movie'] == movie]
        if not prediction.empty:
            predictedScore = prediction['Predicted Score'].values[0]
            trueRatingsList.append(trueRating)
            predictedScoresList.append(predictedScore)
            print(f"Predicted rating for movie {movie}: {predictedScore}\n")
        else:
            print(f"Predicted rating for movie {movie}: No prediction possible\n")
            
    print(f"With provide parameters the {len(predictedScoresList)} predictions were made out of {numScoresToRemove} possible ones.")
    if trueRatingsList and predictedScoresList:
        mae = mean_absolute_error(trueRatingsList, predictedScoresList)
        rmse = np.sqrt(mean_squared_error(trueRatingsList, predictedScoresList))
        print(f"Evaluation metrics:")
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        print(f"Root Mean Squared Error (EMSE): {rmse:.2f}")
    else:
        print(f"Not enough data to calculate evaluation metrics.")
    return predictedRatings, removedMovies

In [86]:
predictedRatings, removedMovies = compareRemovedRatings(user = 26, numScoresToRemove = 5, topUsers = 20)

Removed ratings for user 26 for movies: [344, 349, 34, 225, 434]



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


True rating for movie 344: 3.0
Predicted rating for movie 344: 3.0391460985272163

True rating for movie 349: 3.0
Predicted rating for movie 349: 3.498522572881962

True rating for movie 34: 3.0
Predicted rating for movie 34: 2.690996153634065

True rating for movie 225: 3.0
Predicted rating for movie 225: 3.0

True rating for movie 434: 2.0
Predicted rating for movie 434: 2.1333853928847653

With provide parameters the 5 predictions were made out of 5 possible ones.
Evaluation metrics:
Mean Absolute Error (MAE): 0.20
Root Mean Squared Error (EMSE): 0.27


In [87]:
predictedRatings, removedMovies = compareRemovedRatings(user = 92, numScoresToRemove = 15, topUsers = 20)

Removed ratings for user 92 for movies: [2572, 2398, 2087, 69757, 2139, 6383, 1619, 2664, 1021, 51662, 837, 55282, 327, 54190, 2501]

True rating for movie 2572: 5.0
Predicted rating for movie 2572: 3.8625152116197548

True rating for movie 2398: 4.0
Predicted rating for movie 2398: 3.300118495249798

True rating for movie 2087: 4.5
Predicted rating for movie 2087: 3.7759183024597562

True rating for movie 69757: 4.0
Predicted rating for movie 69757: 3.579838876655824

True rating for movie 2139: 4.0
Predicted rating for movie 2139: 3.427914091064946

True rating for movie 6383: 3.0
Predicted rating for movie 6383: 2.81470443014388

True rating for movie 1619: 4.0
Predicted rating for movie 1619: 3.122148621330286

True rating for movie 2664: 3.5
Predicted rating for movie 2664: 4.0

True rating for movie 1021: 3.0
Predicted rating for movie 1021: 1.7914783240709486

True rating for movie 51662: 4.5
Predicted rating for movie 51662: 4.019197579878815

True rating for movie 837: 5.0
Pre

In [88]:
predictedRatings, removedMovies = compareRemovedRatings(user = 30, numScoresToRemove = 2, topUsers = 5)

Removed ratings for user 30 for movies: [1240, 68358]



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


True rating for movie 1240: 3.5
Predicted rating for movie 1240: 2.2616791649274317

True rating for movie 68358: 5.0
Predicted rating for movie 68358: 5.0

With provide parameters the 2 predictions were made out of 2 possible ones.
Evaluation metrics:
Mean Absolute Error (MAE): 0.62
Root Mean Squared Error (EMSE): 0.88


In [89]:
predictedRatings, removedMovies = compareRemovedRatings(user = 106, numScoresToRemove = 3, topUsers = 12)

Removed ratings for user 106 for movies: [5349, 4993, 72998]



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


True rating for movie 5349: 3.5
Predicted rating for movie 5349: 2.729517710786499

True rating for movie 4993: 5.0
Predicted rating for movie 4993: 4.677031014863373

True rating for movie 72998: 5.0
Predicted rating for movie 72998: 5.0

With provide parameters the 3 predictions were made out of 3 possible ones.
Evaluation metrics:
Mean Absolute Error (MAE): 0.36
Root Mean Squared Error (EMSE): 0.48


In [90]:
predictedRatings, removedMovies = compareRemovedRatings(user = 600, numScoresToRemove = 20, topUsers = 20)

Removed ratings for user 600 for movies: [3450, 4873, 588, 2410, 2959, 2140, 6350, 7323, 1288, 4720, 3478, 3052, 4235, 1772, 344, 4677, 52, 539, 46723, 2706]



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


True rating for movie 3450: 2.5
Predicted rating for movie 3450: No prediction possible

True rating for movie 4873: 4.0
Predicted rating for movie 4873: No prediction possible

True rating for movie 588: 3.5
Predicted rating for movie 588: 5.0

True rating for movie 2410: 3.0
Predicted rating for movie 2410: No prediction possible

True rating for movie 2959: 4.5
Predicted rating for movie 2959: 4.814412772177467

True rating for movie 2140: 4.5
Predicted rating for movie 2140: No prediction possible

True rating for movie 6350: 4.5
Predicted rating for movie 6350: No prediction possible

True rating for movie 7323: 2.5
Predicted rating for movie 7323: 4.0

True rating for movie 1288: 4.5
Predicted rating for movie 1288: 5.0

True rating for movie 4720: 3.0
Predicted rating for movie 4720: No prediction possible

True rating for movie 3478: 3.0
Predicted rating for movie 3478: No prediction possible

True rating for movie 3052: 3.5
Predicted rating for movie 3052: No prediction possib

In [91]:
predictedRatings, removedMovies = compareRemovedRatings(user = 300, numScoresToRemove = 30, topUsers = 20)

Removed ratings for user 300 for movies: [318, 8950, 5995, 99114, 109487, 2762, 2324, 79132, 112183, 63082, 527, 2028, 7361, 92259, 112552, 1172, 4973, 356, 112290, 1704, 2329, 6711, 4848, 6016, 2858, 2959, 112556, 81591, 56174, 593]



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


True rating for movie 318: 4.0
Predicted rating for movie 318: 4.314137543799283

True rating for movie 8950: 3.0
Predicted rating for movie 8950: 2.955421022732718

True rating for movie 5995: 4.5
Predicted rating for movie 5995: 4.369992438777789

True rating for movie 99114: 3.5
Predicted rating for movie 99114: 4.285135620482318

True rating for movie 109487: 5.0
Predicted rating for movie 109487: 4.0

True rating for movie 2762: 3.5
Predicted rating for movie 2762: 3.3258260106774493

True rating for movie 2324: 5.0
Predicted rating for movie 2324: 4.7321354806705935

True rating for movie 79132: 4.0
Predicted rating for movie 79132: 4.031039508798588

True rating for movie 112183: 4.0
Predicted rating for movie 112183: No prediction possible

True rating for movie 63082: 4.0
Predicted rating for movie 63082: 3.5

True rating for movie 527: 5.0
Predicted rating for movie 527: 4.570364455036845

True rating for movie 2028: 4.0
Predicted rating for movie 2028: 3.942991462009687

Tru

Now let's improve the general function to easily compare the results, for different numbers of least number of movies to be watched by the users to get the ratings, and the numbers of top corelated users that we predict rating based on.

In [92]:
def compareRemovedRatingsGiga(user, numScoresToRemove, moviesdf = dfTaskComparing, topUsers=5, commonMovies = 5, negativeUsers=0):
    moviesdfChanged, removedMovies = removeRatings(user, numScoresToRemove, moviesdf)
    
    predictedRatings = predictScores(user, moviesdfChanged, topUsers=topUsers, commonMovies=commonMovies, negativeUsers=negativeUsers)
    trueRatingsList=[]
    predictedScoresList=[]
    
    for movie in removedMovies:
        trueRating=dfTask.loc[movie, user]
        
        prediction = predictedRatings[predictedRatings['Movie'] == movie]
        if not prediction.empty:
            predictedScore = prediction['Predicted Score'].values[0]
            trueRatingsList.append(trueRating)
            predictedScoresList.append(predictedScore)
        else:
            continue            
    if trueRatingsList and predictedScoresList:
        mae = mean_absolute_error(trueRatingsList, predictedScoresList)
        rmse = np.sqrt(mean_squared_error(trueRatingsList, predictedScoresList))

    else:
        mae = None
        rmse = None
        print(f"Not enough data to calculate evaluation metrics.")
    return predictedRatings, removedMovies, mae, rmse, len(predictedScoresList)

In [93]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 10
topUsers=5
commonMovies = 5
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU5commonM5 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 100:
Mean Absolute Error (MAE): 0.33
Root Mean Squared Error (RMSE): 0.58
Percentage of predictions made: 30.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.25
Root Mean Squared Error (RMSE): 0.46
Percentage of predictions made: 50.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Not enough data to calculate evaluation metrics.
Prediction for user 240:
Percentage of predictions made: 0.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.13
Root Mean Squared Error (RMSE): 0.24
Percentage of predictions made: 60.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Not enough data to calculate evaluation metrics.
Prediction for user 380:
Percentage of predictions made: 0.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 500:
Mean Absolute Error (MAE): 0.51
Root Mean Squared Error (RMSE): 0.71
Percentage of predictions made: 40.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 90:
Mean Absolute Error (MAE): 1.00
Root Mean Squared Error (RMSE): 1.00
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 210:
Mean Absolute Error (MAE): 0.33
Root Mean Squared Error (RMSE): 0.40
Percentage of predictions made: 30.0%

Sum of the mean absolute error: 2.5549303450800545
Average of the means absolute errors: 0.4258217241800091


In [94]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 10
topUsers=15
commonMovies = 5
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU15commonM5 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 100:
Mean Absolute Error (MAE): 0.83
Root Mean Squared Error (RMSE): 1.08
Percentage of predictions made: 60.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.49
Root Mean Squared Error (RMSE): 0.57
Percentage of predictions made: 90.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 240:
Mean Absolute Error (MAE): 0.74
Root Mean Squared Error (RMSE): 1.01
Percentage of predictions made: 60.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.50
Root Mean Squared Error (RMSE): 0.55
Percentage of predictions made: 90.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 380:
Mean Absolute Error (MAE): 2.00
Root Mean Squared Error (RMSE): 2.00
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 500:
Mean Absolute Error (MAE): 1.00
Root Mean Squared Error (RMSE): 1.22
Percentage of predictions made: 40.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 90:
Mean Absolute Error (MAE): 0.21
Root Mean Squared Error (RMSE): 0.27
Percentage of predictions made: 50.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 210:
Mean Absolute Error (MAE): 0.81
Root Mean Squared Error (RMSE): 1.17
Percentage of predictions made: 30.0%

Sum of the mean absolute error: 6.574698277142042
Average of the means absolute errors: 0.8218372846427553


In [95]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 10
topUsers=25
commonMovies = 5
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU25commonM5 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 100:
Mean Absolute Error (MAE): 0.52
Root Mean Squared Error (RMSE): 0.66
Percentage of predictions made: 30.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.69
Root Mean Squared Error (RMSE): 0.80
Percentage of predictions made: 100.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 240:
Mean Absolute Error (MAE): 0.94
Root Mean Squared Error (RMSE): 1.16
Percentage of predictions made: 50.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.84
Root Mean Squared Error (RMSE): 0.98
Percentage of predictions made: 80.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 380:
Mean Absolute Error (MAE): 1.00
Root Mean Squared Error (RMSE): 1.00
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 500:
Mean Absolute Error (MAE): 1.16
Root Mean Squared Error (RMSE): 1.38
Percentage of predictions made: 40.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 90:
Mean Absolute Error (MAE): 0.86
Root Mean Squared Error (RMSE): 1.08
Percentage of predictions made: 40.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 210:
Mean Absolute Error (MAE): 0.67
Root Mean Squared Error (RMSE): 0.95
Percentage of predictions made: 50.0%

Sum of the mean absolute error: 6.664096452184446
Average of the means absolute errors: 0.8330120565230558


In [96]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 10
topUsers=5
commonMovies = 3
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU5commonM3 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 100:
Mean Absolute Error (MAE): 1.00
Root Mean Squared Error (RMSE): 1.00
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.38
Root Mean Squared Error (RMSE): 0.46
Percentage of predictions made: 30.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Not enough data to calculate evaluation metrics.
Prediction for user 240:
Percentage of predictions made: 0.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.05
Root Mean Squared Error (RMSE): 0.07
Percentage of predictions made: 20.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 380:
Mean Absolute Error (MAE): 0.00
Root Mean Squared Error (RMSE): 0.00
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 500:
Mean Absolute Error (MAE): 0.75
Root Mean Squared Error (RMSE): 0.79
Percentage of predictions made: 20.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 90:
Mean Absolute Error (MAE): 0.42
Root Mean Squared Error (RMSE): 0.60
Percentage of predictions made: 40.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Not enough data to calculate evaluation metrics.
Prediction for user 210:
Percentage of predictions made: 0.0%

Sum of the mean absolute error: 2.5908966388275343
Average of the means absolute errors: 0.43181610647125573


In [97]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 10
topUsers=5
commonMovies = 7
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU5commonM7 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

Prediction for user 100:
Mean Absolute Error (MAE): 0.50
Root Mean Squared Error (RMSE): 0.50
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.55
Root Mean Squared Error (RMSE): 0.68
Percentage of predictions made: 80.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 240:
Mean Absolute Error (MAE): 1.32
Root Mean Squared Error (RMSE): 1.36
Percentage of predictions made: 20.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.33
Root Mean Squared Error (RMSE): 0.44
Percentage of predictions made: 50.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Not enough data to calculate evaluation metrics.
Prediction for user 380:
Percentage of predictions made: 0.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 500:
Mean Absolute Error (MAE): 0.84
Root Mean Squared Error (RMSE): 1.18
Percentage of predictions made: 20.0%

Prediction for user 90:
Mean Absolute Error (MAE): 0.50
Root Mean Squared Error (RMSE): 0.50
Percentage of predictions made: 20.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 210:
Mean Absolute Error (MAE): 0.27
Root Mean Squared Error (RMSE): 0.36
Percentage of predictions made: 20.0%

Sum of the mean absolute error: 4.299270254459483
Average of the means absolute errors: 0.6141814649227832


In [98]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 10
topUsers=5
commonMovies = 10
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU5commonM10 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

Prediction for user 100:
Mean Absolute Error (MAE): 0.50
Root Mean Squared Error (RMSE): 0.50
Percentage of predictions made: 10.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.90
Root Mean Squared Error (RMSE): 1.06
Percentage of predictions made: 80.0%

Prediction for user 240:
Mean Absolute Error (MAE): 0.64
Root Mean Squared Error (RMSE): 0.78
Percentage of predictions made: 30.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.49
Root Mean Squared Error (RMSE): 0.71
Percentage of predictions made: 50.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 380:
Mean Absolute Error (MAE): 1.00
Root Mean Squared Error (RMSE): 1.00
Percentage of predictions made: 10.0%

Prediction for user 500:
Mean Absolute Error (MAE): 1.87
Root Mean Squared Error (RMSE): 2.02
Percentage of predictions made: 50.0%

Prediction for user 90:
Mean Absolute Error (MAE): 0.75
Root Mean Squared Error (RMSE): 0.79
Percentage of predictions made: 40.0%

Prediction for user 210:
Mean Absolute Error (MAE): 0.83
Root Mean Squared Error (RMSE): 1.07
Percentage of predictions made: 50.0%

Sum of the mean absolute error: 6.98060465173064
Average of the means absolute errors: 0.87257558146633


------------
negtive correlation (still working)

In [99]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 5
topUsers=5
commonMovies = 10
negativeUsers = 5
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies, negativeUsers=negativeUsers)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU5wor5 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

Prediction for user 100:
Mean Absolute Error (MAE): 1.00
Root Mean Squared Error (RMSE): 1.00
Percentage of predictions made: 20.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 1.17
Root Mean Squared Error (RMSE): 1.30
Percentage of predictions made: 100.0%

Prediction for user 240:
Mean Absolute Error (MAE): 1.58
Root Mean Squared Error (RMSE): 1.58
Percentage of predictions made: 20.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 1.84
Root Mean Squared Error (RMSE): 1.94
Percentage of predictions made: 100.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Not enough data to calculate evaluation metrics.
Prediction for user 380:
Percentage of predictions made: 0.0%

Not enough data to calculate evaluation metrics.
Prediction for user 500:
Percentage of predictions made: 0.0%

Prediction for user 90:
Mean Absolute Error (MAE): 0.64
Root Mean Squared Error (RMSE): 0.67
Percentage of predictions made: 80.0%

Prediction for user 210:
Mean Absolute Error (MAE): 0.96
Root Mean Squared Error (RMSE): 1.09
Percentage of predictions made: 60.0%

Sum of the mean absolute error: 7.1845880565965174
Average of the means absolute errors: 1.1974313427660863


-------------

In [100]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 5
topUsers=25
commonMovies = 10
negativeUsers = 5
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies, negativeUsers=negativeUsers)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtop2U5wor5 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

Prediction for user 100:
Mean Absolute Error (MAE): 0.88
Root Mean Squared Error (RMSE): 1.04
Percentage of predictions made: 80.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.56
Root Mean Squared Error (RMSE): 0.57
Percentage of predictions made: 100.0%

Prediction for user 240:
Mean Absolute Error (MAE): 0.99
Root Mean Squared Error (RMSE): 1.25
Percentage of predictions made: 40.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.95
Root Mean Squared Error (RMSE): 1.03
Percentage of predictions made: 100.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 380:
Mean Absolute Error (MAE): 0.59
Root Mean Squared Error (RMSE): 0.59
Percentage of predictions made: 20.0%

Prediction for user 500:
Mean Absolute Error (MAE): 0.44
Root Mean Squared Error (RMSE): 0.54
Percentage of predictions made: 100.0%

Prediction for user 90:
Mean Absolute Error (MAE): 1.08
Root Mean Squared Error (RMSE): 1.27
Percentage of predictions made: 60.0%

Prediction for user 210:
Mean Absolute Error (MAE): 0.96
Root Mean Squared Error (RMSE): 1.01
Percentage of predictions made: 60.0%

Sum of the mean absolute error: 6.449616463481673
Average of the means absolute errors: 0.8062020579352092


In [101]:
users = [100,106,240,30,380,500,90,210]
numScoresToRemove = 5
topUsers=30
commonMovies = 10
negativeUsers = 10
maeLIST = []
for user in users:
    sparklingNewdf = dfTask.copy()
    predictedRatings, removedMovies, mae, rmse, numOfPredictions = compareRemovedRatingsGiga(user = user, moviesdf=sparklingNewdf,numScoresToRemove = numScoresToRemove, topUsers = topUsers, commonMovies = commonMovies, negativeUsers=negativeUsers)
    print(f"Prediction for user {user}:")
    if mae != None:
        print(f"Mean Absolute Error (MAE): {mae:.2f}")
        maeLIST.append(mae)
        print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
    print(f"Percentage of predictions made: {(numOfPredictions / numScoresToRemove) * 100}%\n")
    maeSUMtopU30wor10 = sum(maeLIST)
print(f"Sum of the mean absolute error: {sum(maeLIST)}")
print(f"Average of the means absolute errors: {sum(maeLIST) / len(maeLIST)}")

Prediction for user 100:
Mean Absolute Error (MAE): 1.16
Root Mean Squared Error (RMSE): 1.27
Percentage of predictions made: 80.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 106:
Mean Absolute Error (MAE): 0.75
Root Mean Squared Error (RMSE): 0.90
Percentage of predictions made: 100.0%

Prediction for user 240:
Mean Absolute Error (MAE): 0.76
Root Mean Squared Error (RMSE): 1.03
Percentage of predictions made: 100.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 30:
Mean Absolute Error (MAE): 0.82
Root Mean Squared Error (RMSE): 0.88
Percentage of predictions made: 100.0%



  correlations[otherUser] = pearsonr(userRatings[commonRatings], otherUserRatings[commonRatings])[0]


Prediction for user 380:
Mean Absolute Error (MAE): 0.58
Root Mean Squared Error (RMSE): 0.62
Percentage of predictions made: 60.0%

Prediction for user 500:
Mean Absolute Error (MAE): 1.26
Root Mean Squared Error (RMSE): 1.42
Percentage of predictions made: 60.0%

Prediction for user 90:
Mean Absolute Error (MAE): 1.16
Root Mean Squared Error (RMSE): 1.62
Percentage of predictions made: 80.0%

Prediction for user 210:
Mean Absolute Error (MAE): 0.40
Root Mean Squared Error (RMSE): 0.47
Percentage of predictions made: 80.0%

Sum of the mean absolute error: 6.884119184205533
Average of the means absolute errors: 0.8605148980256916


Try to improve the system, you can use the following ideas:
 - Can we use more users (e.g. with negative correlation)?
 - Which difference is more important predicting 5 when a real score is 4 or predicting 3 instead of 2?
 - Did we use the best value for the minimal number of common movies?
 - Is prediction for a movie seen by just one user trustworthy?
 
 
Describe your approach, its strengths and weaknesses, and analyze the results. Send the report (notebook with comments/markdown) within 144 hours after the class to gmiebs@cs.put.poznan.pl, start the subject with [IR]

Credits to F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872 and Mateusz Lango