# Training and evaluation of the comments model

This notebook trains a model that classifies a comment on the platform as positive or negative, and then evaluates the model's performance.

- Created by: Juan Fernandez
- Created on: 2/Jan/2019
- Modified by: William Alexander
- Modified on: 16/March/2019

In [1]:
import requests

import numpy as np
import pandas as pd

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

In [2]:
COMMENTS_URL = 'https://jsonplaceholder.typicode.com/comments'

### Data preprocessing

In [3]:
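def get_comments(comments_url):
    """Download the comments and return them as a DataFrame without the 'id' column."""
    response = requests.get(comments_url, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors
    return pd.DataFrame(response.json()).drop(columns=['id'])

def get_labeled_comments(comments_url):
    """Attach a placeholder binary 'sentiment' label to each comment."""
    comments = get_comments(comments_url)
    # Labels are drawn uniformly at random from {0, 1}; they are not real annotations.
    comments['sentiment'] = np.random.randint(0, 2, size=len(comments))
    return comments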

In [4]:
comments = get_labeled_comments(COMMENTS_URL)
comments.head()

|   | body | email | name | postId | sentiment |
|---|------|-------|------|--------|-----------|
| 0 | laudantium enim quasi est quidem magnam volupt... | Eliseo@gardner.biz | id labore ex et quam laborum | 1 | 1 |
| 1 | est natus enim nihil est dolore omnis voluptat... | Jayne_Kuhic@sydney.com | quo vero reiciendis velit similique earum | 1 | 1 |
| 2 | quia molestiae reprehenderit quasi aspernatur\... | Nikita@garfield.biz | odio adipisci rerum aut animi | 1 | 0 |
| 3 | non et atque\noccaecati deserunt quas accusant... | Lew@alysha.tv | alias odio sit | 1 | 1 |
| 4 | harum non quasi et ratione\ntempore iure ex vo... | Hayden@althea.biz | vero eaque aliquid doloribus et culpa | 1 | 1 |


In [5]:
X_train, X_test, y_train, y_test = train_test_split(comments['body'], comments['sentiment'])
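
The split above relies on the default settings of `train_test_split`, so the rows assigned to each side change from run to run. A minimal sketch of a reproducible, stratified split is shown here on toy data. The `toy_texts` and `toy_labels` arrays and the seed `42` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the comment bodies and labels.
toy_texts = np.array([f"comment {i}" for i in range(20)])
toy_labels = np.array([0, 1] * 10)

def reproducible_split(texts, labels, seed=42):
    """Split with a fixed seed and keep the class balance in both parts."""
    return train_test_split(
        texts, labels, test_size=0.25, random_state=seed, stratify=labels
    )

X_train_a, X_test_a, y_train_a, y_test_a = reproducible_split(toy_texts, toy_labels)
X_train_b, X_test_b, y_train_b, y_test_b = reproducible_split(toy_texts, toy_labels)

# Both calls select the same test rows because the seed is fixed.
print((X_test_a == X_test_b).all())
```

Fixing `random_state` makes later accuracy figures comparable between runs, and `stratify` keeps the share of each label roughly equal in the training and test parts.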

### Feature engineering

In [6]:
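vectorizer = CountVectorizer()
# Fit the vocabulary on the training split only, so no test data leaks into the features
X = vectorizer.fit_transform(X_train)
# Sparsity is the fraction of zero entries in the document-term matrix
print('Sparsity:', (np.prod(X.shape) - X.count_nonzero()) / np.prod(X.shape))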

Sparsity: 0.879572744014733
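
To make the sparsity figure concrete, the following sketch repeats the same calculation on a three-document toy corpus that is small enough to verify by hand. The corpus and the `toy_` names are assumptions introduced for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical corpus: the vocabulary is {bad, comment, good, post}.
toy_corpus = ["good comment", "bad comment", "good good post"]
toy_counts = CountVectorizer().fit_transform(toy_corpus)

# Each document uses 2 of the 4 vocabulary words, so 6 of the 12 cells are non-zero.
n_cells = toy_counts.shape[0] * toy_counts.shape[1]
toy_sparsity = 1.0 - toy_counts.count_nonzero() / n_cells

print('Shape:', toy_counts.shape)
print('Sparsity:', toy_sparsity)
```

Note that the repeated word in the third document raises a count but does not add a non-zero cell, so it has no effect on sparsity.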


### Training

In [7]:
logreg = LogisticRegression()
logreg = logreg.fit(X, y_train)
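
The vectorizer and the classifier above are fitted as two separate steps. One possible alternative, sketched here with the `make_pipeline` helper that the notebook already imports, chains both into a single estimator. The toy texts and labels are hypothetical and serve only to keep the example self-contained.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data with clearly separable vocabulary.
toy_texts = ["great post", "love this", "awful post", "hate this"]
toy_labels = [1, 1, 0, 0]

# The pipeline vectorizes raw text and then feeds the counts to the classifier.
toy_model = make_pipeline(CountVectorizer(), LogisticRegression())
toy_model.fit(toy_texts, toy_labels)

# Raw strings go straight in; no separate vectorizer.transform call is needed.
toy_predictions = toy_model.predict(["great love", "awful hate"])
print(toy_predictions)
```

With this design, the same object handles both training and prediction, which removes the risk of forgetting to transform new text with the fitted vectorizer.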



### Evaluation

In [8]:
accuracy_score(y_test, logreg.predict(vectorizer.transform(X_test)))

0.464
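
Because `get_labeled_comments` assigns the sentiment labels at random, an accuracy close to 0.5 is what chance alone would produce. The sketch below estimates how far 0.464 lies from chance. It assumes a test set of 125 comments, which is a hypothetical figure (for example, 25% of 500 comments), since the notebook does not state the dataset size.

```python
import math

n_test = 125              # assumed test-set size, not stated in the notebook
chance_accuracy = 0.5     # expected accuracy against uniformly random binary labels
observed_accuracy = 0.464

# Standard error of an accuracy estimate under the binomial model.
std_error = math.sqrt(chance_accuracy * (1 - chance_accuracy) / n_test)

# Distance from chance, measured in standard errors.
z_score = (observed_accuracy - chance_accuracy) / std_error

print('Standard error:', round(std_error, 4))
print('z-score:', round(z_score, 2))
```

Under that assumption the observed accuracy sits less than one standard error below 0.5, so it is indistinguishable from random guessing.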