Let's experiment with the [MovieLens 100K Dataset](https://grouplens.org/datasets/movielens/). Released in 1998, it contains 100,000 ratings from 943 users on 1,682 movies (often rounded to 1000 users and 1700 movies). Download the zip file [here](https://grouplens.org/datasets/movielens/100k/).

In [1]:
import pandas as pd
import zipfile
from pathlib import Path

In [2]:
# The archive data/ml-100k.zip unpacks into the folder data/ml-100k
path = Path('data/ml-100k')
with zipfile.ZipFile(path.with_suffix('.zip'), 'r') as zip_ref:
    zip_ref.extractall(path.parent)
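
Re-running the extraction cell unpacks the archive again each time. A minimal sketch of a guard that skips the work when the target folder already exists is shown below; `extract_once` and its `marker` parameter are hypothetical names introduced here for illustration, not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_once(zip_path, dest_dir, marker):
    """Extract zip_path into dest_dir unless marker already exists.

    Returns True if extraction ran, False if it was skipped.
    (Hypothetical helper, not part of the original notebook.)
    """
    zip_path, dest_dir, marker = Path(zip_path), Path(dest_dir), Path(marker)
    if marker.exists():
        return False
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(dest_dir)
    return True


# Usage with the notebook's paths; runs only if the archive is present
path = Path('data/ml-100k')
if path.with_suffix('.zip').exists():
    extract_once(path.with_suffix('.zip'), path.parent, marker=path)
```

Checking for the extracted folder is a cheap heuristic: it does not detect a partially extracted archive, so delete the folder to force a fresh extraction.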

In [3]:
!ls {path}

README       u.genre      u.user       u2.test      u4.test      ua.test
allbut.pl    u.info       u1.base      u3.base      u5.base      ub.base
mku.sh       u.item       u1.test      u3.test      u5.test      ub.test
u.data       u.occupation u2.base      u4.base      ua.base


### Loading the DataFrames

In [6]:
user_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
user_df = pd.read_csv(path/'u.user', sep='|', names=user_cols,
                     encoding='latin-1')
user_df.head()

| | user_id | age | sex | occupation | zip_code |
|---|---|---|---|---|---|
| 0 | 1 | 24 | M | technician | 85711 |
| 1 | 2 | 53 | F | other | 94043 |
| 2 | 3 | 23 | M | writer | 32067 |
| 3 | 4 | 24 | M | technician | 43537 |
| 4 | 5 | 33 | F | other | 15213 |


In [7]:
rating_cols = ['user_id', 'movie_id', 'ratings', 'unix_timestamp']
rating_df = pd.read_csv(path/'u.data', sep='\t', names=rating_cols,
                       encoding='latin-1')
rating_df.head()

| | user_id | movie_id | ratings | unix_timestamp |
|---|---|---|---|---|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |
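
The `unix_timestamp` column holds seconds since 1970-01-01 UTC, which is hard to read as a raw integer. As a rough sketch, it can be converted to datetimes with `pd.to_datetime(..., unit='s')`. The small frame below copies the first rows shown above so the snippet runs without the dataset, and the `rated_at` column name is an assumption made for illustration.

```python
import pandas as pd

# First rows of u.data as displayed above (illustrative sample, not the full file)
sample = pd.DataFrame({
    'user_id': [196, 186, 22],
    'movie_id': [242, 302, 377],
    'ratings': [3, 3, 1],
    'unix_timestamp': [881250949, 891717742, 878887116],
})

# Interpret the integers as seconds since the Unix epoch;
# 'rated_at' is a hypothetical column name, not from the original notebook
sample['rated_at'] = pd.to_datetime(sample['unix_timestamp'], unit='s')
print(sample[['unix_timestamp', 'rated_at']])
```

The same call should work on the full frame, for example `rating_df['unix_timestamp']`, once the data is loaded.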


In [8]:
movie_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url']
# Load only the first 5 columns; the remaining 19 are 0/1 genre flags
movie_df = pd.read_csv(path/'u.item', sep='|', names=movie_cols, 
                       usecols=range(5), encoding='latin-1')
movie_df.head()

| | movie_id | title | release_date | video_release_date | imdb_url |
|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Toy%20Story%2... |
| 1 | 2 | GoldenEye (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?GoldenEye%20(... |
| 2 | 3 | Four Rooms (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Four%20Rooms%... |
| 3 | 4 | Get Shorty (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Get%20Shorty%... |
| 4 | 5 | Copycat (1995) | 01-Jan-1995 | | http://us.imdb.com/M/title-exact?Copycat%20(1995) |
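
Because `read_csv` is called without date parsing, `release_date` is loaded as plain strings such as `01-Jan-1995`. A minimal sketch of converting that format to datetimes follows. It uses a small hand-built series (values taken from the output above, plus one missing entry added for illustration) and assumes an English locale for the month abbreviation.

```python
import pandas as pd

# Release dates as displayed above, plus one missing value for illustration
release = pd.Series(['01-Jan-1995', '01-Jan-1995', None], name='release_date')

# %d-%b-%Y matches day, abbreviated month name, and four-digit year;
# errors='coerce' turns missing or unparseable entries into NaT
parsed = pd.to_datetime(release, format='%d-%b-%Y', errors='coerce')
print(parsed.dt.year)
```

Passing an explicit `format` avoids pandas guessing the layout row by row, and `errors='coerce'` keeps the conversion from failing on blank dates.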
