# Import the Data Set

## The dataset structure 


The <a href="http://files.grouplens.org/datasets/movielens/ml-latest-small-README.html" target="new">README</a> distributed with the dataset describes the structure of each data file:

- ***Ratings***:

    - `userId` & `movieId`: the user id and the movie id
    - `rating`: a rating on a 5-star scale, in 0.5-star increments
    - `timestamp`: the time in epoch format (seconds since midnight UTC on January 1, 1970)

- ***Tags***:

    - `userId` & `movieId`: the user id and the movie id
    - `tag`: user-generated textual metadata
    - `timestamp`: the time in epoch format (seconds since midnight UTC on January 1, 1970)

- ***Movies***:

    - `movieId`: the movie id
    - `title`: the full movie title, which may include the year of release
    - `genres`: a pipe-separated list of genres associated with the movie

- ***Links***:

    - `movieId`: the movie id
    - `imdbId`: can be used to generate a link to the ***`IMDb`*** site
    - `tmdbId`: can be used to generate a link to the ***`The Movie DB`*** site
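The field formats listed above can be illustrated with a few small Python helpers. This is only a sketch: the function names are hypothetical, and the URL patterns are the ones given as examples in the dataset README. Note that once `imdbId` is stored as an integer, its leading zeros are lost, so the helper pads it back to at least 7 digits.

```python
import re
from datetime import datetime, timezone


def parse_timestamp(epoch_seconds):
    """Convert an epoch timestamp (seconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def split_genres(genres):
    """Split the pipe-separated genres field into a list."""
    return genres.split("|")


def split_title(title):
    """Return (title, year); year is None when the title has no '(YYYY)' suffix."""
    match = re.match(r"^(.*)\s+\((\d{4})\)\s*$", title)
    if match:
        return match.group(1), int(match.group(2))
    return title, None


def imdb_url(imdb_id):
    """Build an IMDb link, restoring the leading zeros dropped by an integer column."""
    return f"http://www.imdb.com/title/tt{int(imdb_id):07d}/"


def tmdb_url(tmdb_id):
    """Build a link to The Movie DB."""
    return f"https://www.themoviedb.org/movie/{int(tmdb_id)}"
```

For example, `split_title("Toy Story (1995)")` separates the title from its release year, and `imdb_url(114709)` rebuilds the identifier as `tt0114709`.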


### **Import SQLAlchemy and load the IPython-SQL magic**

In [1]:
import sqlalchemy, os
from sqlalchemy import create_engine

%reload_ext sql
%config SqlMagic.displaylimit = 5
%config SqlMagic.feedback = False
%config SqlMagic.autopandas = True

### **Define the connection string and the target schema**

In [2]:
hxe_connection = 'hana://ML_USER:Welcome18@hxehost:39015'
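
If you would rather not hardcode the credentials in the notebook, one possible approach is to build the same connection string from environment variables. This is a minimal sketch: the helper `build_hana_url` and the variable names `HXE_USER`, `HXE_PASSWORD`, `HXE_HOST` and `HXE_PORT` are assumptions made for illustration, with the tutorial values used as fallbacks.

```python
import os
from urllib.parse import quote_plus


def build_hana_url(user, password, host, port):
    """Assemble a hana:// URL, percent-encoding the user name and password."""
    return f"hana://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


# Hypothetical environment variable names; the defaults are the tutorial values.
hxe_connection = build_hana_url(
    os.environ.get("HXE_USER", "ML_USER"),
    os.environ.get("HXE_PASSWORD", "Welcome18"),
    os.environ.get("HXE_HOST", "hxehost"),
    os.environ.get("HXE_PORT", "39015"),
)
```

Percent-encoding matters when the password contains characters such as `@` or `:`, which would otherwise be read as URL separators.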

### **Initialize the connection and set the schema**

In [3]:
%sql $hxe_connection

'Connected: @None'

### **Drop the tables if they exist**

In [4]:
%%sql
drop table movielens_links;
drop table movielens_movies;
drop table movielens_ratings;
drop table movielens_tags;

 * hana://ML_USER:Welcome18@hxehost:39015
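
The `drop table` statements above are unconditional, so they raise an error when a table is not there yet (for example on the very first run). A possible way to tolerate that is to drop the tables one by one from Python and skip the failures. The sketch below is an assumption, not part of the tutorial: the helper `drop_tables` is hypothetical, and it uses an in-memory SQLite database purely as a stand-in so that it runs anywhere. With SAP HANA you would pass your own DB-API connection and the error class of your driver.

```python
import sqlite3

TABLES = ["movielens_links", "movielens_movies", "movielens_ratings", "movielens_tags"]


def drop_tables(conn, tables, error_type=Exception):
    """Drop each table, skipping the ones that fail; return the names dropped."""
    dropped = []
    for name in tables:
        cursor = conn.cursor()
        try:
            cursor.execute(f"drop table {name}")
            dropped.append(name)
        except error_type:
            # Most likely the table does not exist yet: carry on with the next one.
            pass
        finally:
            cursor.close()
    return dropped


# Stand-in database (SQLite, not SAP HANA) holding only one of the four tables.
conn = sqlite3.connect(":memory:")
conn.execute("create table movielens_links (movieid integer)")

first_run = drop_tables(conn, TABLES, sqlite3.OperationalError)
second_run = drop_tables(conn, TABLES, sqlite3.OperationalError)
```

Passing a narrow `error_type` keeps unrelated failures (such as a lost connection) visible instead of silently swallowing them.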


### **Create the MovieLens tables**

In [5]:
%%sql 
create column table movielens_links(
  movieid integer not null,
  imdbid  integer,
  tmdbid  integer,
  primary key (
    movieid
  )
);

create column table movielens_movies(
  movieid integer not null,
  title   nvarchar(255),
  genres  nvarchar(255),
  primary key (
    movieid
  )
);

create column table movielens_ratings(
  userid    integer not null,
  movieid   integer not null,
  rating    decimal,
  timestamp integer,
  primary key (
    userid,
    movieid
  )
);

create column table movielens_tags(
  userid    integer not null,
  movieid   integer not null,
  tag       nvarchar(255)  not null,
  timestamp integer,
  primary key (
    userid,
    movieid,
    tag
  )
);

 * hana://ML_USER:Welcome18@hxehost:39015
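
The primary keys declared above mean that a row with a duplicate key would be rejected (for example, two ratings by the same user for the same movie). As a sketch, you could check a CSV file for duplicate keys before importing it. The helper `duplicate_keys` is hypothetical, and the sample rows are made up for illustration, with a duplicate added on purpose.

```python
import csv
import io
from collections import Counter


def duplicate_keys(csv_text, key_columns):
    """Return the sorted key tuples that appear more than once in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(tuple(row[column] for column in key_columns) for row in reader)
    return sorted(key for key, count in counts.items() if count > 1)


# Made-up sample: the third row repeats the (userId, movieId) pair of the first.
sample = (
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
    "1,31,4.0,1260759200\n"
)

duplicates = duplicate_keys(sample, ["userId", "movieId"])
```

For a real file, you could read its content with `open(path, newline="").read()` and pass `["userId", "movieId", "tag"]` as the key for the tags file.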


### **Import the data into the MovieLens tables**

In [6]:
%%sql 
import from csv file '/usr/sap/HXE/HDB90/work/data/movielens/links.csv' into movielens_links
with
   record delimited by '\n'
   field delimited by ','
   optionally enclosed by '"'
   skip first 1 row
   fail on invalid data
   error log '/home/jupyteradm/log/links.csv.err'
;

import from csv file '/usr/sap/HXE/HDB90/work/data/movielens/movies.csv' into movielens_movies
with
   record delimited by '\n'
   field delimited by ','
   optionally enclosed by '"'
   skip first 1 row
   fail on invalid data
   error log '/home/jupyteradm/log/movies.csv.err'
;

import from csv file '/usr/sap/HXE/HDB90/work/data/movielens/ratings.csv' into movielens_ratings
with
   record delimited by '\n'
   field delimited by ','
   optionally enclosed by '"'
   skip first 1 row
   fail on invalid data
   error log '/home/jupyteradm/log/ratings.csv.err'
;

import from csv file '/usr/sap/HXE/HDB90/work/data/movielens/tags.csv' into movielens_tags
with
   record delimited by '\n'
   field delimited by ','
   optionally enclosed by '"'
   skip first 1 row
   fail on invalid data
   error log '/home/jupyteradm/log/tags.csv.err'
;

 * hana://ML_USER:Welcome18@hxehost:39015
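
The four import statements above differ only in the file and table name. As a sketch, they could be generated from a list of names instead of being written out by hand. The helper `build_import_statement` is hypothetical; the paths and options are copied from the statements above.

```python
DATA_DIR = "/usr/sap/HXE/HDB90/work/data/movielens"
LOG_DIR = "/home/jupyteradm/log"


def build_import_statement(name, data_dir=DATA_DIR, log_dir=LOG_DIR):
    """Return the import statement for '<name>.csv' into 'movielens_<name>'."""
    return (
        f"import from csv file '{data_dir}/{name}.csv' into movielens_{name}\n"
        "with\n"
        "   record delimited by '\\n'\n"
        "   field delimited by ','\n"
        "   optionally enclosed by '\"'\n"
        "   skip first 1 row\n"
        "   fail on invalid data\n"
        f"   error log '{log_dir}/{name}.csv.err'"
    )


statements = [
    build_import_statement(name)
    for name in ("links", "movies", "ratings", "tags")
]
```

Each generated string can then be executed through whichever connection you use, which keeps the import options in a single place if you need to change them.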


### **Count the number of rows loaded**

In [7]:
%%sql 
select 'links'   as table, count(1) as count from movielens_links
union all
select 'movies'  as table, count(1) as count from movielens_movies
union all
select 'ratings' as table, count(1) as count from movielens_ratings
union all
select 'tags'    as table, count(1) as count from movielens_tags;

 * hana://ML_USER:Welcome18@hxehost:39015


|   | table | count |
|---|---|---|
| 0 | links | 9125 |
| 1 | movies | 9125 |
| 2 | ratings | 100004 |
| 3 | tags | 1296 |
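
To make this check repeatable, you could compare the counts against the values you expect. The sketch below uses the counts shown above as the expected values; the helper `count_mismatches` is hypothetical and takes a plain dictionary, which you could build from the query result (for example with `dict(zip(df["table"], df["count"]))` if the result is a pandas DataFrame named `df`).

```python
# Expected row counts, taken from the query output above.
EXPECTED_COUNTS = {"links": 9125, "movies": 9125, "ratings": 100004, "tags": 1296}


def count_mismatches(actual, expected=EXPECTED_COUNTS):
    """Return {table: (expected, actual)} for every table whose count differs."""
    return {
        table: (expected[table], actual.get(table))
        for table in expected
        if actual.get(table) != expected[table]
    }


# Example with a deliberately wrong ratings count.
mismatches = count_mismatches(
    {"links": 9125, "movies": 9125, "ratings": 100000, "tags": 1296}
)
```

An empty dictionary means that every table holds the expected number of rows.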
