# Project 7: Deploying a recommendation system with Watson

In this project we put the model trained in Project 5 into production: we upload it to the IBM cloud and then access it through calls to the Watson API to make predictions.

In [1]:
import warnings
warnings.filterwarnings("ignore")  # silence library deprecation warnings
from sklearn.datasets import load_files

# Each subfolder of moviedir (here neg/ and pos/) is treated as one class
moviedir = r'./dataset/movie_reviews'
movie_reviews = load_files(moviedir, shuffle=True)
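`load_files` turns a folder tree into a labelled dataset: the subfolder names, sorted alphabetically, become `target_names`, and each document's `target` is the index of its folder. With the default `encoding=None` the documents are returned as `bytes`, which `TfidfVectorizer` decodes itself (UTF-8 by default). A minimal sketch on a throwaway directory, assuming two hypothetical classes `neg` and `pos` with a couple of made-up reviews each:

```python
import os
import tempfile

from sklearn.datasets import load_files

# Build a tiny stand-in for ./dataset/movie_reviews: one subfolder per class
with tempfile.TemporaryDirectory() as root:
    samples = {
        "neg": ["a dull, lifeless movie", "this movie sucks"],
        "pos": ["a brilliant film", "so much heart"],
    }
    for label, texts in samples.items():
        os.makedirs(os.path.join(root, label))
        for i, text in enumerate(texts):
            with open(os.path.join(root, label, f"{i}.txt"), "w", encoding="utf-8") as fh:
                fh.write(text)

    # Contents are read into memory here, so the bunch survives the temp dir cleanup
    bunch = load_files(root, shuffle=True, random_state=0)

# Folder names become class names; targets are indices into target_names
print(bunch.target_names)  # ['neg', 'pos']
print(len(bunch.data), sorted(bunch.target.tolist()))  # 4 [0, 0, 1, 1]
```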


In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    movie_reviews.data, movie_reviews.target, test_size = 0.20, stratify=movie_reviews.target, random_state = 12)
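`stratify=movie_reviews.target` keeps the class proportions of the full dataset in both splits, which is why the classification report further down shows the same support for each class. A small check on hypothetical balanced labels (1000 per class, an assumption made only for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical balanced labels: 1000 negatives (0) and 1000 positives (1)
y = np.array([0] * 1000 + [1] * 1000)
X = np.arange(len(y)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(X, y, test_size=0.20, stratify=y, random_state=12)

# With stratify=y both splits keep the original 50/50 class proportions
print(len(y_tr), len(y_te))  # 1600 400
print(np.bincount(y_te))  # [200 200]
```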

In [3]:
import joblib  # sklearn.externals.joblib was deprecated in 0.21 and removed in 0.23
eclf = joblib.load('/home/lbenitez/Escritorio/DS-Acamica/entrega_7/entrega+7/dataset/model/sentiment.pkl') 
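`joblib.load` restores an estimator previously saved with `joblib.dump`; the file should be loaded with the same scikit-learn version that wrote it. A minimal round-trip sketch, assuming a small `LogisticRegression` in place of the project-5 model and a hypothetical file name:

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Fit a tiny model, then persist and restore it the same way sentiment.pkl is read
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored estimator makes the same predictions as the original
print((restored.predict(X) == model.predict(X)).all())  # True
```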

## IBM Watson

**1) Import** `WatsonMachineLearningAPIClient` from the `watson_machine_learning_client` library

In [4]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient



**2) Create** a variable holding the credentials that `Watson` requires.

In [5]:
# Fill in with the values from your Watson Machine Learning service credentials
wml_credentials = {
  "apikey": "",
  "iam_apikey_description": "",
  "iam_apikey_name": "Credenciales de servicio-1",
  "iam_role_crn": "",
  "iam_serviceid_crn": "",
  "instance_id": "",
  "url": ""
}
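Hard-coding the API key in a notebook makes it easy to leak when the notebook is shared. One alternative is to read the values from environment variables. A minimal sketch, where the helper `load_wml_credentials` and the variable names `WML_APIKEY`, `WML_INSTANCE_ID` and `WML_URL` are hypothetical, and where it is an assumption that `apikey`, `instance_id` and `url` are the fields the client needs (check against the credentials IBM Cloud generates):

```python
import os

# Hypothetical mapping: credentials key -> environment variable holding its value
ENV_KEYS = {
    "apikey": "WML_APIKEY",
    "instance_id": "WML_INSTANCE_ID",
    "url": "WML_URL",
}


def load_wml_credentials(env=os.environ):
    """Hypothetical helper: build the credentials dict from environment variables."""
    missing = [var for var in ENV_KEYS.values() if not env.get(var)]
    if missing:
        # Fail loudly instead of sending empty strings to the service
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in ENV_KEYS.items()}
```

Usage: export the variables before starting Jupyter and then call `wml_credentials = load_wml_credentials()`.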

**3) Declare** a `client` variable and store in it a `WatsonMachineLearningAPIClient` object, passing the credentials as its parameter.

In [6]:
client = WatsonMachineLearningAPIClient(wml_credentials)

**4) Create** a variable holding the model properties: author details and the project name.

In [7]:
model_props = {client.repository.ModelMetaNames.AUTHOR_NAME: "Lucia Benitez", 
               client.repository.ModelMetaNames.AUTHOR_EMAIL: "mail@gmail.com",
               client.repository.ModelMetaNames.NAME: "Reviews classification"}

**5) Build** a pipeline whose first step is a `TfidfVectorizer` and whose second step is the best model you obtained in Project 5. **Train** this pipeline.

In [8]:
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer

In [9]:
# Build the vectorizer and chain it with the model loaded from Project 5;
# the pipeline vectorizes the training and test texts automatically
vect = TfidfVectorizer(strip_accents='unicode',
                       stop_words='english',
                       token_pattern=r'\w+')

pipeline = make_pipeline(vect, eclf)
pipeline.fit(X_train, y_train)

Pipeline(memory=None,
         steps=[('tfidfvectorizer',
                 TfidfVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.float64'>,
                                 encoding='utf-8', input='content',
                                 lowercase=True, max_df=1.0, max_features=None,
                                 min_df=1, ngram_range=(1, 1), norm='l2',
                                 preprocessor=None, smooth_idf=True,
                                 stop_words='english', strip_accents='unicode',
                                 sublinear_tf=F...
                                                             n_iter_no_change=10,
                                                             nesterovs_momentum=True,
                                                             power_t=0.5,
                                                             random_state=0,
                 

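Bundling the vectorizer with the classifier means the stored model takes raw review text as input, so the same object can be uploaded and later called directly on new strings. A minimal sketch on a made-up toy corpus, assuming `LogisticRegression` as a stand-in for the Project 5 model (`eclf`), whose exact type is not shown here:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus; LogisticRegression is a stand-in for the Project 5 model
docs = ["a brilliant film", "so much heart", "this movie sucks", "a dull lifeless movie"]
labels = [1, 1, 0, 0]

pipe = make_pipeline(
    TfidfVectorizer(strip_accents="unicode", stop_words="english", token_pattern=r"\w+"),
    LogisticRegression(),
)
pipe.fit(docs, labels)

# make_pipeline names each step after its lowercased class name
print([name for name, _ in pipe.steps])  # ['tfidfvectorizer', 'logisticregression']
# Raw strings go in; the vectorizer runs automatically before the classifier
print(pipe.predict(["what a brilliant film"]))
```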
**6) Upload** the model to IBM Cloud using `client.repository.store_model` with the appropriate parameters.

In [10]:
published_model = client.repository.store_model(model=pipeline, 
                                                meta_props=model_props, 
                                                training_data=X_train, 
                                                training_target=y_train)

**7) Get** the model's `uid` and store it in a variable.

In [11]:
published_model_uid = client.repository.get_model_uid(published_model)
model_details = client.repository.get_details(published_model_uid)

In [12]:
client.repository.list_models()  # prints a table of the stored models

------------------------------------  ----------------------  ------------------------  -----------------
GUID                                  NAME                    CREATED                   FRAMEWORK
82ec1497-cabe-49f3-85b7-090a753ec170  Reviews classification  2020-03-12T23:14:27.571Z  scikit-learn-0.22
0649c85e-e953-4178-8b84-37f5d2a8a18c  Reviews classification  2020-03-12T23:09:02.591Z  scikit-learn-0.22
0fff0bc7-bb22-4a10-bb42-4de88852ffd5  Reviews classification  2020-03-12T22:59:03.817Z  scikit-learn-0.22
aa05ca88-ca93-4282-99b3-2d3d80267885  Reviews classification  2020-03-12T22:42:08.818Z  scikit-learn-0.22
99740988-8f37-4a03-b960-44c7ccf03735  Reviews classification  2020-03-12T22:22:15.408Z  scikit-learn-0.22
------------------------------------  ----------------------  ------------------------  -----------------


**8) Load** the model by its `uid` and use it to predict on the test set.

In [13]:
loaded_model = client.repository.load(published_model_uid)
test_predictions = loaded_model.predict(X_test) 

**9) Display** the resulting `classification_report`.

In [14]:
# Show the classification_report in this cell
from sklearn.metrics import classification_report

print('classification report:', classification_report(y_test, test_predictions))

classification report:               precision    recall  f1-score   support

           0       0.84      0.90      0.87       200
           1       0.89      0.83      0.86       200

    accuracy                           0.86       400
   macro avg       0.87      0.86      0.86       400
weighted avg       0.87      0.86      0.86       400
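The report can also be returned as a dictionary with `output_dict=True`, which is handy for checking individual metrics in code. ROC AUC, by contrast, needs scores or probabilities rather than hard 0/1 labels; if the restored pipeline exposes `predict_proba` (an assumption, it depends on the Project 5 model), the scores would be its second column. A minimal sketch on hypothetical labels and scores for eight reviews:

```python
from sklearn.metrics import classification_report, roc_auc_score

# Hypothetical labels and scores for 8 reviews (0 = neg, 1 = pos)
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
y_score = [0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.4]  # predicted probability of class 1

report = classification_report(y_true, y_pred, target_names=["neg", "pos"], output_dict=True)
print(report["neg"]["precision"], report["pos"]["recall"])  # 0.75 0.75
print(report["accuracy"])  # 0.75

# 15 of the 16 (positive, negative) pairs are ranked correctly
print(roc_auc_score(y_true, y_score))  # 0.9375
```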



In [15]:
reviews_new = ["Stallone creates credible villains worthy of his heroic character.",
               "Another brilliant Rocky film, probably my favourite one out of the lot",
               "Yeah, this movie sucks.",
               "My favourite rocky film! So good so much heart. Slightly better than 2",
               "What has this got to do with boxing. Also everyone looked like dolls. Also if you are a real true boxing fan (not casuals), you would understand that this stupidity is no description of boxing!!",
               "The new film's narrative is stripped down to essentials, which gives it an emblematic quality.",
               "Absurdly ridiculous, this just isn't a good movie at all", 
               "Very basic and predictable but still an okay movie. No special music to save this one.", 
               "Rocky 4 is an extremely ambitious movie that is definitely worth watching.",
               "Highly beautiful",
               "If it wasn't for the robots (WTF????), and the painfully overwritten lines of an absurdly dogmatic persuasion, then this would otherwise be nothing more than an interminable series of mildly rousing montages. There are some unintentionally funny bits though, and Dolph's Ivan showcases the best and worst of all Rocky's opponents.",
               "While all aspects of realism is thrown out the window, ROCKY IV is an adrenaline rush of action and excitment, with an incredible soundtrack and arguably the best movie fight in history between Balboa and Drago",
               "Just like the songs, exercise routines and repetitive clips, it seems redundant to add another installment in this already falling franchise when you clearly lack material. Rocky IV is petty, childish and seems overlong despite of its 91 minutes of run time for it merely has an idea of a TV drama episode which is stretched to a point of exhaustion. Its painful to watch Sylvester Stallone go through this enormous amount of training and hardly make a point out there. He fails on all the levels here; writer, director and actor, to deliver any loose end of the thread for the audience to hang on to. Rocky IV is predictable, loosely written and choreographed and executed unsupervised."]


In [16]:
predictions = loaded_model.predict(reviews_new)
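The model returns integer class indices, and `movie_reviews.target_names` turns each index back into its folder name (`neg` or `pos`). The same lookup can be done for all predictions at once with NumPy fancy indexing. A minimal sketch with hypothetical stand-ins for `target_names` and the predictions:

```python
import numpy as np

# Hypothetical stand-ins for movie_reviews.target_names and loaded_model.predict(...)
target_names = ["neg", "pos"]
predictions = np.array([1, 0, 0, 1])

# Each predicted class index selects its label in a single indexing step
labels = np.asarray(target_names)[predictions]
print(labels.tolist())  # ['pos', 'neg', 'neg', 'pos']
```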


In [17]:
for review, category in zip(reviews_new, predictions):
    print('%r => %s \n' % (review, movie_reviews.target_names[category]))

'Stallone creates credible villains worthy of his heroic character.' => neg 

'Another brilliant Rocky film, probably my favourite one out of the lot' => pos 

'Yeah, this movie sucks.' => neg 

'My favourite rocky film! So good so much heart. Slightly better than 2' => pos 

'What has this got to do with boxing. Also everyone looked like dolls. Also if you are a real true boxing fan (not casuals), you would understand that this stupidity is no description of boxing!!' => neg 

"The new film's narrative is stripped down to essentials, which gives it an emblematic quality." => pos 

"Absurdly ridiculous, this just isn't a good movie at all" => neg 

'Very basic and predictable but still an okay movie. No special music to save this one.' => neg 

'Rocky 4 is an extremely ambitious movie that is definitely worth watching.' => pos 

'Highly beautiful' => pos 

"If it wasn't for the robots (WTF????), and the painfully overwritten lines of an absurdly dogmatic persuasion, then this would oth