# Project 7: Deploying a recommendation system with Watson

In this project we put the model trained in Project 5 into production: we upload it to IBM Cloud and then access it through calls to the Watson API to make predictions.

In [1]:
import warnings
warnings.filterwarnings("ignore")
import sklearn
from sklearn.datasets import load_files

# load_files treats each subfolder of the dataset directory as one class label.
moviedir = r'./dataset/movie_reviews'
movie_reviews = load_files(moviedir, shuffle=True)

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    movie_reviews.data, movie_reviews.target,
    test_size=0.20, stratify=movie_reviews.target, random_state=12)
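
The `stratify` argument keeps the class proportions the same in the training and test sets, which is why the classification report at the end of this notebook shows 200 test reviews per class. The sketch below illustrates this on a toy stand-in for the corpus; the names `docs` and `labels` are made up for the example and are not part of the project data.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Toy stand-in for the review corpus: 100 documents with balanced labels.
docs = [f"review {i}" for i in range(100)]
labels = [0] * 50 + [1] * 50

docs_train, docs_test, labels_train, labels_test = train_test_split(
    docs, labels, test_size=0.20, stratify=labels, random_state=12)

# With stratification, the 20-document test set keeps the 50/50 class balance.
test_counts = Counter(labels_test)
print(test_counts)
```

Without `stratify`, the split is purely random and the per-class counts in the test set can drift away from the corpus proportions.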

In [3]:
movie_reviews.data[0]

b"arnold schwarzenegger has been an icon for action enthusiasts , since the late 80's , but lately his films have been very sloppy and the one-liners are getting worse . \nit's hard seeing arnold as mr . freeze in batman and robin , especially when he says tons of ice jokes , but hey he got 15 million , what's it matter to him ? \nonce again arnold has signed to do another expensive blockbuster , that can't compare with the likes of the terminator series , true lies and even eraser . \nin this so called dark thriller , the devil ( gabriel byrne ) has come upon earth , to impregnate a woman ( robin tunney ) which happens every 1000 years , and basically destroy the world , but apparently god has chosen one man , and that one man is jericho cane ( arnold himself ) . \nwith the help of a trusty sidekick ( kevin pollack ) , they will stop at nothing to let the devil take over the world ! \nparts of this are actually so absurd , that they would fit right in with dogma . \nyes , the film is 

In [4]:
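# sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in 0.23;
# importing joblib directly works in both cases.
import joblib
eclf = joblib.load('sentiment.pkl')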

## IBM Watson

**1) Load** the `WatsonMachineLearningAPIClient` library.

In [5]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

**2) Create** a variable with the credentials that `Watson` needs: `url`, `access_key`, `username`, `password` and `instance_id`.

In [6]:
wml_credentials = {
    # Fill in: url, access_key, username, password, instance_id
}
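
Since the credentials should not be committed with the notebook, one option is to read them from environment variables. The following is a minimal sketch under that assumption: the environment variable names (`WML_URL` and so on) and the helper `load_wml_credentials` are hypothetical, chosen only for this example; the dictionary keys are the ones listed in step 2.

```python
import os

# Hypothetical environment variable names; adapt them to your own setup.
CREDENTIAL_ENV_VARS = {
    "url": "WML_URL",
    "access_key": "WML_ACCESS_KEY",
    "username": "WML_USERNAME",
    "password": "WML_PASSWORD",
    "instance_id": "WML_INSTANCE_ID",
}


def load_wml_credentials(environ=os.environ):
    """Build the credentials dict from environment variables.

    Raises KeyError early if any expected variable is missing.
    """
    missing = [var for var in CREDENTIAL_ENV_VARS.values() if var not in environ]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: environ[var] for key, var in CREDENTIAL_ENV_VARS.items()}
```

With the variables exported in the shell, `wml_credentials = load_wml_credentials()` would produce a dictionary with the five keys above.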

**3) Declare** the variable `client` and store in it a `WatsonMachineLearningAPIClient` object created with the credentials as its parameter.

In [7]:
client = WatsonMachineLearningAPIClient(wml_credentials)

**4) Create** a variable that holds the model properties: author details and project name.

In [8]:
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: "Guille Lencina", 
               client.repository.ModelMetaNames.NAME: "Entrega 7. Clasificación de peliculas",
               client.repository.DefinitionMetaNames.RUNTIME_NAME: 'python',
               client.repository.DefinitionMetaNames.RUNTIME_VERSION: '3.6'}

**5) Build** a pipeline whose first step is a `TfidfVectorizer` and whose second step is the best model you obtained in Project 5. **Train** with this pipeline.

In [9]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()

In [10]:
from sklearn.pipeline import Pipeline
from sklearn.pipeline import make_pipeline



In [11]:
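pipeline = Pipeline([('tfidf', tfidf), ('clf', eclf)])
# fit() refits both steps on the training text, including the classifier loaded from sentiment.pkl.
pipeline.fit(X_train, y_train)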

Pipeline(memory=None,
         steps=[('tfidf',
                 TfidfVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.float64'>,
                                 encoding='utf-8', input='content',
                                 lowercase=True, max_df=1.0, max_features=None,
                                 min_df=1, ngram_range=(1, 1), norm='l2',
                                 preprocessor=None, smooth_idf=True,
                                 stop_words=None, strip_accents=None,
                                 sublinear_tf=False,
                                 token_pattern='(?u)\\b\\w\\w+\\b',
                                 tokenizer=None, use_idf=True,
                                 vocabulary=None)),
                ('clf',
                 LinearSVC(C=1.0, class_weight=None, dual=True,
                           fit_intercept=True, intercept_scaling=1,
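
Bundling the vectorizer and the classifier in one `Pipeline` means the stored model accepts raw review text, so whoever calls it does not need to reproduce the TF-IDF step. Here is a minimal sketch on a toy corpus; the texts, labels and the `toy_` names are invented for illustration, and `LinearSVC` is used only because it is the classifier shown in the output above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus (illustrative only); 1 = positive review, 0 = negative review.
toy_texts = [
    "a wonderful and moving film",
    "great acting and a wonderful script",
    "a boring and sloppy mess",
    "terrible one-liners and a boring plot",
]
toy_labels = [1, 1, 0, 0]

toy_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LinearSVC()),
])
toy_pipeline.fit(toy_texts, toy_labels)

# The fitted pipeline takes raw strings; vectorization happens inside predict().
toy_predictions = toy_pipeline.predict(["a wonderful film", "a boring mess"])
print(toy_predictions)
```

The same holds for the real `pipeline` object: after `fit`, `pipeline.predict(X_test)` can be called directly on the raw reviews.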
               

**6) Upload** the model to IBM Cloud using `client.repository.store_model` with the correct parameters.

In [12]:
published_model = client.repository.store_model(model=pipeline, 
                                                meta_props=model_props, 
                                                training_data=X_train, 
                                                training_target=y_train)

**7) Get** the model's `uid` and store it in a variable.

In [13]:
published_model_uid = client.repository.get_model_uid(published_model)


In [14]:
model_details = client.repository.get_details(published_model_uid)

In [15]:
models_details = client.repository.list_models()

------------------------------------  -------------------------------------  ------------------------  -----------------
GUID                                  NAME                                   CREATED                   FRAMEWORK
33298522-ad04-4cf0-9793-e39d1cb80a58  Entrega 7. Clasificación de peliculas  2020-08-22T00:29:34.166Z  scikit-learn-0.22
f9fc583e-8c37-46ad-90af-d2ce3bb8c6c9  Film Reviews classification nuevo      2020-08-22T00:08:47.894Z  scikit-learn-0.22
08ef1fed-feb5-4163-9d69-ed65c0854238  Film Reviews classification nuevo      2020-08-22T00:02:13.252Z  scikit-learn-0.22
4312ba5f-a933-49c5-8388-0aa665a30027  Film Reviews classification nuevo      2020-08-21T23:56:34.523Z  scikit-learn-0.22
424515b0-7444-476f-bd08-a08d4e998042  Film Reviews classification nuevo      2020-08-21T23:53:56.635Z  scikit-learn-0.22
a46f8afb-d068-4f2b-9639-55d3e3f74baf  Film Reviews classification nuevo      2020-08-21T23:47:01.519Z  scikit-learn-0.22
d4aee3ce-ccf0-43cb-83c1-a854cd6ded05  Re

**8) Load** the model by its `uid` and use it to predict on the test set.

In [16]:
loaded_model = client.repository.load(published_model_uid)

In [17]:
test_predictions = loaded_model.predict(X_test)

**9) Show** the resulting `classification_report`.

In [18]:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import classification_report

print("roc_auc score: ", roc_auc_score(y_test, test_predictions))
print(classification_report(y_test, test_predictions))

roc_auc score:  0.85
              precision    recall  f1-score   support

           0       0.87      0.82      0.85       200
           1       0.83      0.88      0.85       200

    accuracy                           0.85       400
   macro avg       0.85      0.85      0.85       400
weighted avg       0.85      0.85      0.85       400
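
Note that the `roc_auc score` above is computed from hard class labels. For a binary problem this equals the mean of the two recalls, here (0.82 + 0.88) / 2 = 0.85, so it carries the same information as balanced accuracy. A threshold-independent ROC AUC needs continuous scores instead. The sketch below shows the difference on made-up labels and scores; with the real model, and assuming its final estimator exposes `decision_function` (as `LinearSVC` does), the scores could come from `loaded_model.decision_function(X_test)`.

```python
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

# Made-up labels and decision scores, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [-2.0, -0.5, 0.3, -1.2, 1.5, -0.1, 0.8, 2.1]

# Hard predictions obtained by thresholding the scores at zero.
hard_predictions = [1 if s > 0 else 0 for s in scores]

auc_from_labels = roc_auc_score(y_true, hard_predictions)
auc_from_scores = roc_auc_score(y_true, scores)
balanced_acc = balanced_accuracy_score(y_true, hard_predictions)

print("ROC AUC from hard labels:", auc_from_labels)
print("Balanced accuracy:       ", balanced_acc)
print("ROC AUC from scores:     ", auc_from_scores)
```

The first two numbers coincide, while the score-based AUC also reflects how well the model ranks reviews it would misclassify at the default threshold.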

