# Date Night Movie

#### Grading:


- Code: 90 pts
- Markdown Documentation: 10 pts


In this assignment we use pandas to answer one question: what is the best **date-night movie**?

This assignment uses:
- Joining
- Groupby
- Sorting
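
As a quick orientation, the three operations can be sketched on a tiny made-up dataset. The `toy_ratings` and `toy_movies` frames below are hypothetical stand-ins, not the assignment data.

```python
import pandas as pd

# Toy stand-ins for the real tables (hypothetical values, not the assignment data)
toy_ratings = pd.DataFrame({
    'user_id':  [1, 1, 2, 2, 3],
    'movie_id': [10, 20, 10, 20, 10],
    'rating':   [5, 3, 4, 2, 5],
})
toy_movies = pd.DataFrame({
    'movie_id': [10, 20],
    'title':    ['Movie A', 'Movie B'],
})

# Joining: attach titles to ratings through the shared movie_id column
joined = pd.merge(toy_ratings, toy_movies, on='movie_id')

# Groupby: one value per title, the mean of its ratings
mean_by_title = joined.groupby('title')['rating'].mean()

# Sorting: best-rated title first
ranked = mean_by_title.sort_values(ascending=False)
print(ranked)
```

The rest of the notebook applies the same join, group, sort pattern to the full tables.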


In [1]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [2]:
def get_movie_data():
    
    # The files use a multi-character '::' separator, which only the Python
    # parsing engine supports; naming it explicitly avoids a ParserWarning.
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(os.path.join('../data','users.dat'), 
                          sep='::', header=None, names=unames, engine='python')
    
    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../data', 'ratings.dat'), 
                            sep='::', header=None, names=rnames, engine='python')
    
    mnames = ['movie_id', 'title','genres']
    movies = pd.read_table(os.path.join('../data', 'movies.dat'), 
                           sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies
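
A separator longer than one character is interpreted by pandas as a regular expression and is handled by the Python parsing engine. The sketch below shows this on an in-memory sample built from the first two user rows shown later in this notebook; the `sample` and `toy_users` names are illustrative only.

```python
import io
import pandas as pd

# Two rows in the same '::'-delimited layout as users.dat
sample = io.StringIO("1::F::1::10::48067\n2::M::56::16::70072\n")

# engine='python' is requested explicitly because the C engine
# cannot split on a multi-character separator
toy_users = pd.read_csv(
    sample,
    sep='::',
    header=None,
    names=['user_id', 'gender', 'age', 'occupation', 'zip'],
    engine='python',
)
print(toy_users)
```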

In [3]:
users, ratings, movies = get_movie_data()


In [4]:
users.head()

Unnamed: 0,user_id,gender,age,occupation,zip
0,1,F,1,10,48067
1,2,M,56,16,70072
2,3,M,25,15,55117
3,4,M,45,7,2460
4,5,M,25,20,55455


In [5]:
ratings.head()

Unnamed: 0,user_id,movie_id,rating,timestamp
0,1,1193,5,978300760
1,1,661,3,978302109
2,1,914,3,978301968
3,1,3408,4,978300275
4,1,2355,5,978824291


In [6]:
movies.head()

Unnamed: 0,movie_id,title,genres
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


##### Clean up the `movies`

- Get the `year`
- Shorten the `title`
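
One way to sketch both clean-up steps is `Series.str.extract` with named groups, so the resulting columns are labelled directly. The `toy_titles` series and the group names below are assumptions made for illustration; the third entry is invented to show what happens when a title has no year.

```python
import pandas as pd

toy_titles = pd.Series(['Toy Story (1995)', 'Contender, The (2000)', 'No Year Here'])

# Named groups become column names; rows that do not match the pattern get NaN
parts = toy_titles.str.extract(r'(?P<short_title>.*) \((?P<year>[0-9]{4})\)')
print(parts)
```

The extracted year is a string, so any later comparison has to use strings or convert the column first.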


In [7]:
# Split "Title (1995)" into the title text (column 0) and the year (column 1)
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
print(tmp)

                                0     1
0                       Toy Story  1995
1                         Jumanji  1995
2                Grumpier Old Men  1995
3               Waiting to Exhale  1995
4     Father of the Bride Part II  1995
...                           ...   ...
3878             Meet the Parents  2000
3879          Requiem for a Dream  2000
3880                    Tigerland  2000
3881             Two Family House  2000
3882               Contender, The  2000

[3883 rows x 2 columns]


In [8]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]

In [9]:
movies.head()

Unnamed: 0,movie_id,title,genres,year,short_title
0,1,Toy Story (1995),Animation|Children's|Comedy,1995,Toy Story
1,2,Jumanji (1995),Adventure|Children's|Fantasy,1995,Jumanji
2,3,Grumpier Old Men (1995),Comedy|Romance,1995,Grumpier Old Men
3,4,Waiting to Exhale (1995),Comedy|Drama,1995,Waiting to Exhale
4,5,Father of the Bride Part II (1995),Comedy,1995,Father of the Bride Part II


##### Join the tables with `pd.merge` (20 pts)

In [119]:
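# Name the join keys explicitly instead of relying on shared column names
userRatings = pd.merge(users, ratings, on='user_id')
userMovieRatings = pd.merge(userRatings, movies, on='movie_id')
# userMovieRatings.to_csv('../data/userMovieRatingsMergedData.csv',index=False)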

Unnamed: 0,user_id,gender,age,occupation,zip,movie_id,rating,timestamp,title,genres,year,short_title
2886,1,F,1,10,48067,3408,4,978300275,Erin Brockovich (2000),Drama,2000,Erin Brockovich
2887,5,M,25,20,55455,3408,3,978242323,Erin Brockovich (2000),Drama,2000,Erin Brockovich
2888,6,F,50,9,55117,3408,5,978238230,Erin Brockovich (2000),Drama,2000,Erin Brockovich
2889,9,M,25,17,61614,3408,4,978225570,Erin Brockovich (2000),Drama,2000,Erin Brockovich
2890,10,F,35,1,95370,3408,4,978225070,Erin Brockovich (2000),Drama,2000,Erin Brockovich
...,...,...,...,...,...,...,...,...,...,...,...,...
1000203,5556,M,45,6,92103,2198,3,959445515,Modulations (1998),Documentary,1998,Modulations
1000204,5949,M,18,17,47901,2198,5,958846401,Modulations (1998),Documentary,1998,Modulations
1000205,5675,M,35,14,30030,2703,3,976029116,Broken Vessels (1998),Drama,1998,Broken Vessels
1000206,5780,M,18,17,92886,2845,1,958153068,White Boys (1999),Drama,1999,White Boys
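
To guard against accidental row duplication in joins like these, `pd.merge` accepts a `validate` argument that checks the expected key relationship. This is a minimal sketch on hypothetical toy frames, assuming each user and each movie appears once in its own table.

```python
import pandas as pd

# Hypothetical toy tables with the same key columns as the real ones
toy_users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
toy_ratings = pd.DataFrame({
    'user_id':  [1, 1, 2],
    'movie_id': [10, 20, 10],
    'rating':   [5, 3, 4],
})
toy_movies = pd.DataFrame({'movie_id': [10, 20], 'title': ['Movie A', 'Movie B']})

# One user has many ratings; merge raises MergeError if user_id repeats in toy_users
user_ratings = pd.merge(toy_users, toy_ratings, on='user_id', validate='one_to_many')

# Many ratings point at one movie; merge raises if movie_id repeats in toy_movies
full = pd.merge(user_ratings, toy_movies, on='movie_id', validate='many_to_one')

# An inner join on complete keys keeps exactly one row per rating
print(len(full) == len(toy_ratings))
```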


##### What's the highest rated movie? (20 pts)

In [197]:
# Group by movie_id and calculate how many people rated each movie and the mean of those ratings
highRatedMovie = userMovieRatings.groupby('movie_id')['rating'].agg(['count','mean']).reset_index()
highRatedMovie

Unnamed: 0,movie_id,count,mean
0,1,2077,4.146846
1,2,701,3.201141
2,3,478,3.016736
3,4,170,2.729412
4,5,296,3.006757
...,...,...,...
3701,3948,862,3.635731
3702,3949,304,4.115132
3703,3950,54,3.666667
3704,3951,40,3.900000


In [None]:
# count * mean equals the sum of all ratings a movie received, so this score
# favours movies that are both widely rated and well liked
highRatedMovie['Weighted_Average_Rating'] = highRatedMovie['count'] * highRatedMovie['mean']
highestRatedMovie = highRatedMovie.sort_values('Weighted_Average_Rating',ascending=False)[['movie_id','Weighted_Average_Rating']]
highestRatedMovie

In [187]:
# Display the movie_ids of the top 10 movies and their weighted rating scores
highestRatedMovie.head(10)

Unnamed: 0,movie_id,Weighted_Average_Rating
2651,2858,14800.0
253,260,13321.0
1106,1196,12836.0
1120,1210,11598.0
1848,2028,11507.0
1108,1198,11257.0
579,593,11219.0
2374,2571,11178.0
2557,2762,10835.0
575,589,10751.0
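
Because the score above is the product of count and mean, it leans heavily toward the most-rated movies. A different approach, not the one used in this notebook, is to require a minimum number of ratings and then rank by mean alone. The sketch below uses made-up numbers and a hypothetical `min_ratings` threshold.

```python
import pandas as pd

# Made-up per-movie statistics in the same shape as highRatedMovie
toy_stats = pd.DataFrame({
    'movie_id': [1, 2, 3],
    'count':    [2000, 3, 500],
    'mean':     [4.1, 5.0, 4.4],
})

min_ratings = 100  # hypothetical cut-off; tune to the dataset

# Drop movies with too few ratings for the mean to be trustworthy
popular = toy_stats[toy_stats['count'] >= min_ratings]

# Rank the remaining movies by mean rating alone
ranked = popular.sort_values('mean', ascending=False)
print(ranked)
```

Here movie 2 has a perfect mean but only three ratings, so it is excluded, and movie 3 ranks above movie 1.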


In [193]:
# Mean rating and number of ratings for movie_id 589 (10th in the list above)
userMovieRatings[userMovieRatings['movie_id'] == 589]['rating'].agg(['mean','count'])

mean        4.058513
count    2649.000000
Name: rating, dtype: float64

In [194]:
userMovieRatings[userMovieRatings['movie_id'] == 593]['rating'].agg(['mean','count'])

mean        4.351823
count    2578.000000
Name: rating, dtype: float64

##### What is a good movie for date night? (60 pts)

- Hint: pick a movie rated highly by
    - both partners (might be the same gender or not),
    - based on genre preferences,
    - age group can also be combined
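
The "both partners" hint can be sketched by computing a mean rating per gender and ranking by the lower of the two, so a movie only scores well if both groups like it. This is one possible reading of the hint on hypothetical toy data, not the approach taken in the cells that follow.

```python
import pandas as pd

# Hypothetical ratings: movie A is liked by both groups, movie B by only one
toy = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'gender': ['F', 'M', 'F', 'M', 'F', 'M', 'F', 'M'],
    'rating': [5, 4, 4, 5, 5, 2, 4, 1],
})

# One row per title, one column per gender, holding the mean rating
by_gender = toy.pivot_table(values='rating', index='title',
                            columns='gender', aggfunc='mean')

# Score each movie by the less enthusiastic group's mean
by_gender['lowest'] = by_gender[['F', 'M']].min(axis=1)

ranked = by_gender.sort_values('lowest', ascending=False)
print(ranked)
```

The same pivot could be restricted to a genre or an age group first, which is how the other two hints would combine with it.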

### Preferences for watching a movie on a date night
- We would like to watch a romantic comedy (rom-com)
- The movie should be popular with (frequently rated by) viewers whose `age` value is 25 or 30
- The movie should have been released between 1997 and 2000
- The movie should be highly rated (we choose from the top 3)


In [132]:
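# year was extracted as a string, so compare against strings; between() includes both endpoints by default
filterdata = userMovieRatings[userMovieRatings['year'].between('1997','2000')]
# Exact match: keeps only movies tagged with precisely these two genres
filterdata = filterdata[filterdata['genres'] == 'Comedy|Romance']
filterdata = filterdata[filterdata['age'].isin([25,30])]
# Rank by number of ratings first, then by mean rating
highRatedRomCom = filterdata.groupby('movie_id')['rating'].agg(['count','mean']).reset_index().sort_values(['count','mean'],ascending=[False,False])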
highRatedRomCom

Unnamed: 0,movie_id,count,mean
22,2396,866,4.081986
11,1777,409,3.559902
28,2502,392,3.760204
7,1569,327,3.348624
23,2424,307,3.250814
35,2671,295,3.549153
31,2572,261,3.43295
37,2724,252,2.97619
32,2581,251,3.155378
51,3536,207,3.748792
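
The equality test on `genres` keeps only movies tagged exactly `Comedy|Romance`, so a title tagged with an additional genre is left out. If a broader definition of rom-com is wanted, one option is to split the genre string and test for both tags. The sketch below uses hypothetical titles and is an alternative, not what the cell above does.

```python
import pandas as pd

# Hypothetical movies covering the different genre combinations
toy_movies = pd.DataFrame({
    'title':  ['A', 'B', 'C', 'D'],
    'genres': ['Comedy|Romance', 'Comedy|Drama|Romance', 'Comedy', 'Drama|Romance'],
})

# Turn "Comedy|Romance" into ['Comedy', 'Romance'] for each movie
genre_lists = toy_movies['genres'].str.split('|')

# Keep a movie when both required tags are present, whatever else it carries
is_romcom = genre_lists.apply(lambda g: {'Comedy', 'Romance'}.issubset(g))

romcoms = toy_movies[is_romcom]
print(romcoms)
```

With this filter, titles A and B both qualify, while the exact-match filter would keep only A.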


In [138]:
movies[movies['movie_id'] == highRatedRomCom['movie_id'].iloc[0]]

Unnamed: 0,movie_id,title,genres,year,short_title
2327,2396,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love


In [137]:
movies[movies['movie_id'].isin( highRatedRomCom['movie_id'].head(3))]

Unnamed: 0,movie_id,title,genres,year,short_title
1720,1777,"Wedding Singer, The (1998)",Comedy|Romance,1998,"Wedding Singer, The"
2327,2396,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
2433,2502,Office Space (1999),Comedy|Romance,1999,Office Space


In [127]:
userMovieRatings[(userMovieRatings['short_title']=='Shakespeare in Love') & (userMovieRatings['age'].isin([25,30]))]

Unnamed: 0,user_id,gender,age,occupation,zip,movie_id,rating,timestamp,title,genres,year,short_title
144107,8,M,25,12,11413,2396,5,978229524,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
144109,11,F,25,1,04093,2396,2,978902561,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
144111,15,M,25,7,22903,2396,4,978196817,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
144115,24,F,25,7,10023,2396,4,978134987,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
144116,28,F,25,1,14607,2396,5,978125846,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
...,...,...,...,...,...,...,...,...,...,...,...,...
146467,6024,M,25,12,53705,2396,4,956749422,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
146468,6025,F,25,1,32607,2396,4,956731134,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
146469,6035,F,25,1,78734,2396,4,956712860,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love
146470,6036,F,25,15,32603,2396,2,956710191,Shakespeare in Love (1998),Comedy|Romance,1998,Shakespeare in Love


### The date-night movie chosen is "Shakespeare in Love"
- It is a rom-com released in 1998. Among viewers with age 25 or 30 it is the most frequently rated title in the filtered list (866 ratings) and has a mean rating of about 4.08, so it satisfies all the preferences.