![BTS](https://github.com/vfp1/bts-mbds-data-science-foundations-2019/raw/master/sessions/img/Logo-BTS.jpg)

# Session 09: 6thAssignmentRecommendationSystem
### Lenin Escobar - Data-driven Business
Open this notebook in Google Colaboratory: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lveagithub/bts-advanced-data-analysis-2020/blob/master/Assigments/AssociationRulesForMarketBasketAnalysis/AssociationRulesForMarketBasketAnalysis.ipynb)

<p>Create a recommendation system using the same data as in module 9. It should be a hybrid of content-based, item-based collaborative, and user-based collaborative filtering. The expected output is the 10 best movie recommendations for user "2".

Required submissions are the Python code and two summary slides.</p>

<h1 style="background-color:powderblue;">Setting Virtual Env</h1>

In [None]:
# https://developers.google.com/api-client-library
# https://github.com/googleapis/google-api-python-client

In [1]:
!pip install virtualenv

Collecting virtualenv
  Downloading virtualenv-20.4.6-py2.py3-none-any.whl (7.2 MB)
Collecting distlib<1,>=0.3.1
  Downloading distlib-0.3.1-py2.py3-none-any.whl (335 kB)
Collecting filelock<4,>=3.0.0
  Downloading filelock-3.0.12-py3-none-any.whl (7.6 kB)
Installing collected packages: filelock, distlib, virtualenv
Successfully installed distlib-0.3.1 filelock-3.0.12 virtualenv-20.4.6


In [2]:
!mkdir python-virtual-environments

In [3]:
!cd python-virtual-environments && python3 -m venv env

In [6]:
!ls -ltr python-virtual-environments/

total 4
drwxr-xr-x 5 jovyan users 4096 May 10 17:38 env


In [7]:
!source python-virtual-environments/env/bin/activate
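
In a notebook, each `!` command runs in its own subshell, so the `source .../activate` call above does not carry over to later cells; a following `!pip install` still targets the kernel's environment rather than the venv. One possible workaround, sketched below, is to invoke the venv's own interpreter directly. The helper names `venv_python`, `pip_install_command`, and `pip_install` are hypothetical, and the path assumes the directory layout created in the cells above.

```python
import os
import subprocess

# Assumption: the environment created above lives at this relative path.
ENV_DIR = os.path.join("python-virtual-environments", "env")


def venv_python(env_dir=ENV_DIR):
    """Return the path to the interpreter inside a venv (hypothetical helper)."""
    if os.name == "nt":  # Windows layout
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")  # POSIX layout


def pip_install_command(packages, env_dir=ENV_DIR):
    """Build a pip command that targets the venv, not the kernel's environment."""
    return [venv_python(env_dir), "-m", "pip", "install", "--upgrade", *packages]


def pip_install(packages, env_dir=ENV_DIR):
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.run(pip_install_command(packages, env_dir), check=True)


# Only builds and prints the command; call pip_install(...) to actually run it.
print(pip_install_command(["google-api-python-client"]))
```

Packages installed this way are visible only to code run by that interpreter. The notebook kernel keeps using its own environment unless a separate kernel is registered for the venv.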

In [9]:
!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

Collecting google-api-python-client
  Downloading google_api_python_client-2.3.0-py2.py3-none-any.whl (7.1 MB)
Collecting google-auth-httplib2
  Downloading google_auth_httplib2-0.1.0-py2.py3-none-any.whl (9.3 kB)
Collecting uritemplate<4dev,>=3.0.0
  Downloading uritemplate-3.0.1-py2.py3-none-any.whl (15 kB)
Collecting httplib2<1dev,>=0.15.0
  Downloading httplib2-0.19.1-py3-none-any.whl (95 kB)
Collecting google-api-core<2dev,>=1.21.0
  Downloading google_api_core-1.26.3-py2.py3-none-any.whl (93 kB)
Collecting googleapis-common-protos<2.0dev,>=1.6.0
  Downloading googleapis_common_protos-1.53.0-py2.py3-none-any.whl (198 kB)
Installing collected packages: httplib2, googleapis-common-protos, uritemplate

<h1 style="background-color:powderblue;">Google drive Conn Class</h1>

In [None]:
# According to the GitHub documentation on large files
# (https://docs.github.com/en/github/managing-large-files/working-with-large-files),
# GitHub limits the size of files allowed in repositories
# and blocks a push if any file exceeds the maximum file size.
# The data files are therefore hosted on Google Drive instead.

In [54]:
from googleapiclient.discovery import build
from googleapiclient.http import MediaIoBaseDownload
from google.oauth2 import service_account
import io
import os

In [58]:
DATA_PATH = 'Data/'
AUTHENTICATION_PATH = 'Authentication/'

In [59]:
# Create the directories if they do not already exist
os.makedirs(DATA_PATH, exist_ok=True)
os.makedirs(AUTHENTICATION_PATH, exist_ok=True)

In [28]:
# Following the Python naming conventions (PEP 8): https://www.python.org/dev/peps/pep-0008/#method-names-and-instance-variables

In [51]:
class GoogleDriveFiles:
    """Download files from Google Drive using service-account credentials."""
    def __init__(self, credentials_file_name):
        #https://console.cloud.google.com/iam-admin/serviceaccounts/details/113530738473514992506/keys?project=feisty-dolphin-313318
        #https://console.cloud.google.com/apis/credentials?organizationId=0&project=feisty-dolphin-313318&supportedpurview=project
        #https://console.cloud.google.com/apis/library/drive.googleapis.com?project=feisty-dolphin-313318
        self.credentials_file_name = credentials_file_name # Path to the service-account JSON key file
        self.credentials = service_account.Credentials.from_service_account_file(self.credentials_file_name) # Service-account credentials

    def download_file_from_gdrive(self, file_id, downloaded_file_name, verbose=False):
        """Download a Google Drive file to disk.
        :param file_id: ID of the file in Google Drive
        :param downloaded_file_name: local path to write the file to
        :param verbose: print progress after each chunk
        :return: True once the download has completed
        """
        drive_service = build('drive', 'v3', credentials=self.credentials)

        request = drive_service.files().get_media(fileId=file_id)
        # io.BytesIO() could be used instead to keep the file in memory
        with io.FileIO(downloaded_file_name, 'wb') as fh: # write to disk; the handle is closed on exit
            downloader = MediaIoBaseDownload(fh, request)
            done = False
            while not done:
                status, done = downloader.next_chunk()
                if verbose:
                    print(f'%{int(status.progress() * 100)} downloaded file: {downloaded_file_name}')
        return done



In [52]:
#Google Drive files

#Path to the service-account key file
credentials_file_name = AUTHENTICATION_PATH + 'python_service.json'

#Initializing Class
google_drive = GoogleDriveFiles(credentials_file_name)

#Downloading movie file
file_id = '1vohRvV1h0_t3HAom1s7DYPktphisFSzD'
movie_file_path = DATA_PATH + 'movie.csv'
download_status = google_drive.download_file_from_gdrive(file_id, movie_file_path, verbose = True)
print(download_status)

#Downloading movies_metadata file
file_id = '1ivsaFIMHqKviywyGJGVQhXVrXkuDOGUo'
movies_metadata_file_path = DATA_PATH + 'movies_metadata.csv'
download_status = google_drive.download_file_from_gdrive(file_id, movies_metadata_file_path, verbose = True)
print(download_status)

#Downloading rating file
file_id = '1vXPUmeZIrbrBJi8FJ7IDSrgTIeqO5skT'
rating_file_path = DATA_PATH + 'rating.csv'
download_status = google_drive.download_file_from_gdrive(file_id, rating_file_path, verbose = True)
print(download_status)


%100 downloaded file: Data/movie.csv
True
%100 downloaded file: Data/movies_metadata.csv
True
%15 downloaded file: Data/rating.csv
%30 downloaded file: Data/rating.csv
%45 downloaded file: Data/rating.csv
%60 downloaded file: Data/rating.csv
%75 downloaded file: Data/rating.csv
%91 downloaded file: Data/rating.csv
%100 downloaded file: Data/rating.csv
True
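
The three downloads above repeat the same steps with different IDs and file names. A minimal sketch of a table-driven version follows. The names `FILES` and `download_all` are hypothetical; the file IDs are copied from the cell above; `download` is assumed to be any callable with the same signature as `download_file_from_gdrive`, and the target directory is assumed to exist already.

```python
import os

# File IDs copied from the cell above, keyed by local file name.
FILES = {
    "movie.csv": "1vohRvV1h0_t3HAom1s7DYPktphisFSzD",
    "movies_metadata.csv": "1ivsaFIMHqKviywyGJGVQhXVrXkuDOGUo",
    "rating.csv": "1vXPUmeZIrbrBJi8FJ7IDSrgTIeqO5skT",
}


def download_all(download, files=FILES, data_path="Data/", verbose=False):
    """Call download(file_id, local_path, verbose=...) once per entry.

    Returns a dict mapping each local path to the value the callable returned.
    """
    results = {}
    for file_name, file_id in files.items():
        local_path = os.path.join(data_path, file_name)
        results[local_path] = download(file_id, local_path, verbose=verbose)
    return results
```

With the class instance from this notebook, the call would look like `download_all(google_drive.download_file_from_gdrive, verbose=True)`. Passing the downloader in as a callable keeps the loop independent of the Drive client, so it can be exercised with a stub.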


In [27]:
!ls -ltr

total 717700
-rwxr-xr-x 1 jovyan  1000 690353377 Apr 20 22:55  rating.csv
-rw-rw-r-- 1 jovyan  1000   1493648 May 10 15:17  movie_tmp.csv
-rw-rw-r-- 1 jovyan  1000  34445126 May 10 15:18  movies_metadata.csv
-rw-rw-r-- 1 jovyan  1000   6953818 May 10 15:18 '6- Data_Driven_Business_21th_04_2021_Recommendation05Load.pdf'
-rw-rw-r-- 1 jovyan  1000     38449 May 10 16:05  Recommendation03Load_Tmp.ipynb
-rw-r--r-- 1 jovyan users      1806 May 10 17:06  6thAssignmentRecommendationSystem_Tmp.ipynb
-rw-rw-r-- 1 jovyan  1000     99807 May 10 17:08  Recommendation03Load.ipynb
drwxr-xr-x 3 jovyan users      4096 May 10 17:38  python-virtual-environments
-rw-rw-r-- 1 jovyan  1000      2346 May 10 18:26  python_service.json
-rw-r--r-- 1 jovyan users     12651 May 10 18:44  6thAssignmentRecommendationSystem.ipynb
-rw-r--r-- 1 jovyan users   1493648 May 10 18:45  movie.csv
