### IMDB Movie Review Sentiment Analysis

- GridSearchCV with a Pipeline
- TfidfVectorizer + NaiveBayes, LogisticRegression

In [25]:
import numpy as np
import pandas as pd

In [26]:
df = pd.read_csv('../data/labeledTrainData.tsv', sep='\t', quoting=3)   # quoting=3 is csv.QUOTE_NONE
df.review = df.review.str.replace('<br />', ' ', regex=False)   # literal match on the HTML line break
df.review = df.review.str.replace('[^A-Za-z]', ' ', regex=True)
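
To make the effect of the two `replace` calls concrete, here is a minimal sketch that applies the same cleaning to a single made-up string using only the standard `re` module. The helper name `clean_review` and the sample text are illustrative assumptions, not part of the original notebook.

```python
import re

def clean_review(text):
    """Mirror the notebook's cleaning on one string (hypothetical helper)."""
    # Replace the literal HTML line break with a space.
    text = text.replace('<br />', ' ')
    # Replace every character that is not an ASCII letter with a space.
    return re.sub('[^A-Za-z]', ' ', text)

sample = "It's great!<br />10/10 stars"
cleaned = clean_review(sample)
print(cleaned.split())
```

Digits, punctuation, and apostrophes all become spaces, so a contraction such as `It's` is split into `It` and `s`. The default `TfidfVectorizer` token pattern then ignores single-character tokens.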

In [27]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df.review.values, df.sentiment.values, stratify=df.sentiment.values, 
    test_size=0.2, random_state=2023
)

##### Pipelining

In [28]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

In [29]:
tvect = TfidfVectorizer(ngram_range=(1,2), stop_words='english')
# svc = SVC(random_state=2023)
# pipeline = Pipeline([('TVECT', tvect), ('SVC', svc)])

In [30]:
# Training takes a long time.
# %time pipeline.fit(X_train, y_train)

In [31]:
from sklearn.naive_bayes import MultinomialNB

In [32]:
nb = MultinomialNB()
pline = Pipeline([('TVECT', tvect), ('NB', nb)])
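
As a rough illustration of what the pipeline does, the sketch below fits the same two-step structure on a tiny made-up corpus. The toy documents, labels, and the `toy_pipe` name are assumptions for illustration only. The point is that raw strings go in, and the vectorizer and classifier are fitted and applied together.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up toy data: 1 = positive, 0 = negative.
toy_docs = [
    'a wonderful and moving film',
    'wonderful acting great plot',
    'a dull and boring film',
    'boring plot terrible acting',
]
toy_labels = [1, 1, 0, 0]

# Same step names as the notebook, default vectorizer settings for brevity.
toy_pipe = Pipeline([('TVECT', TfidfVectorizer()), ('NB', MultinomialNB())])
toy_pipe.fit(toy_docs, toy_labels)

# predict() vectorizes the raw text with the fitted TVECT step, then classifies.
pred = toy_pipe.predict(['wonderful film', 'terrible and boring'])
print(pred)

# Each step stays reachable by name, e.g. to inspect the learned vocabulary.
vocab_size = len(toy_pipe.named_steps['TVECT'].vocabulary_)
print(vocab_size)
```

Because the vectorizer is fitted inside the pipeline, it only ever sees the data passed to `fit`, which also keeps cross-validation folds from leaking vocabulary statistics into one another.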

In [33]:
# Train
%time pline.fit(X_train, y_train)

CPU times: total: 12.6 s
Wall time: 13.7 s


In [34]:
pline.score(X_test, y_test)

0.8804

In [35]:
from sklearn.linear_model import LogisticRegression
lrc = LogisticRegression(random_state=2023)
pline = Pipeline([('TVECT', tvect), ('LRC', lrc)])
%time pline.fit(X_train, y_train)

CPU times: total: 32.7 s
Wall time: 31.5 s


In [36]:
pline.score(X_test, y_test)

0.8818

##### Finding the best parameters (time-consuming)

In [37]:
from sklearn.model_selection import GridSearchCV
params = { 
    'TVECT__max_df': [100, 500],
    'LRC__C': [1, 10]
}
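
Note that the `max_df` values in the grid are integers, so `TfidfVectorizer` reads them as absolute document counts: a term is dropped when it appears in more than that many documents. A float would instead be read as a proportion of documents. The sketch below shows the difference on a four-document corpus made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus: 'movie' appears in 3 documents, 'plot' in 2.
docs = ['good movie', 'bad movie', 'great movie plot', 'dull plot']

# Integer max_df: drop terms found in more than 2 documents.
vocab_int = sorted(TfidfVectorizer(max_df=2).fit(docs).vocabulary_)

# Float max_df: drop terms found in more than 50% of the documents (2 of 4 here).
vocab_float = sorted(TfidfVectorizer(max_df=0.5).fit(docs).vocabulary_)

print(vocab_int)
print(vocab_float)
```

In both cases `movie` is removed while `plot` survives, because the threshold is exclusive: only terms whose document frequency is strictly higher than the limit are ignored.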

In [38]:
grid_pipe = GridSearchCV(
    pline, params, scoring='accuracy', cv=3
)

%time grid_pipe.fit(X_train, y_train)

CPU times: total: 4min 31s
Wall time: 4min 34s
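
To see where the time goes, it helps to count the model fits. The sketch below enumerates the parameter combinations with `itertools.product` and assumes the default `refit=True`, which adds one final fit on the full training set. It is a back-of-the-envelope illustration, not how `GridSearchCV` is implemented internally.

```python
from itertools import product

# Same grid and fold count as in the cells above.
params = {
    'TVECT__max_df': [100, 500],
    'LRC__C': [1, 10],
}
cv = 3

# Every combination of one value per parameter is a candidate.
names = sorted(params)
candidates = [
    dict(zip(names, values))
    for values in product(*(params[name] for name in names))
]

# Each candidate is fitted once per fold; refit=True (assumed) adds one more fit.
n_fits = len(candidates) * cv + 1

for candidate in candidates:
    print(candidate)
print('total fits:', n_fits)
```

With 4 candidates and 3 folds this comes to 13 fits, so adding a third value to either parameter would raise the total to 19.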


In [44]:
grid_pipe.best_params_

{'LRC__C': 10, 'TVECT__max_df': 500}

In [45]:
best_pipe = grid_pipe.best_estimator_
best_pipe.score(X_test, y_test)

0.89

- Apply to real data

In [46]:
review = ['''
I was very much disappointed by this flat action movie and its predictable ending. I am a fan of the old Mission:Impossible series of the 60s and 80s and therefore I think the plot is ridiculous at best. Why should Jim Phelps do what he did? He was always loyal through the many episodes of the series and there he could have gotten much more money if he had betrayed his team.

The reason why Peter Graves (Jim Phelps) did not star in this movie is because he did not agree with what I have just said.

Anyway this movie is NOT for fans of the series because there is nothing left of the teamwork spirit of the series. It is a one man show for Tom Cruise.
''',
'''
This is, without a doubt, one of my favorite films of all time! I'll never forget watching this film for the first time with a good buddy of mine, afterward we couldn't stop talking about it and spent a great deal of time explaining plot points to each other. We finally decided that we just had to see it again, so we did and all of our questions were answered and our theories proven correct.

The story is nothing less than superb! Every time you think you have the movie figured out they throw you for another loop, but not too much as to get you irritated trying to figure out the plot. This is most definitely a film that deserves at least two viewings before you can truly understand and appreciate the story. The characters are all excellent as well, although I was sad to see Jack Harmen (Emilio Estevez) get killed off so quickly, I liked his character.

The cast is extraordinary! Tom Cruise plays Ethan Hunt perfectly! Jon Voight was the perfect choice for Jim Phelps. Emmanuelle Beart was very good in her role. Henry Czerny was superb as Kittridge. Jean Reno was an excellent addition to the cast. Ving Rhames was a very nice touch and really added a lot to the film. Kristin Scott Thomas was lovely as always, although played a somewhat minor role in the greater scheme of things. Vanessa Redgrave was another nice addition to the cast. And finally, Emilio Estevez (as I mentioned above), played a small role, and played it quite well.

I can see why some of the big fans of the show wouldn't like this film due to certain plot points that I can't give away, so if you are a big fan of the show, be forewarned, you may have some issues with the film. Personally, I've never seen a single episode of the old television show, so I had absolutely no frame of reference. Which, I believe, put me in a better position to appreciate the story.

I feel that I have to mention the action scenes in this film! SPECTACULAR!!! The scene where Kittridge and Hunt are talking in the restaurant...just AWESOME! The entire last 20 minutes of the film...UNBELIEVABLE!!! The filming, the action, the special effects and stunts alone make this film worth watching (but luckily, there is so much more to appreciate).

If you are a fan of Tom Cruise, or just crime/mystery/action films in general, be sure to check this one out (at least twice). This is honestly one of my top 20 films of all time, I truly hope that you will enjoy this film. Thanks for reading,

-Chris

'''          
]

In [47]:
import re
# Same letter-only cleaning as the training data; a list, unlike a map iterator, can be reused.
review = [re.sub('[^A-Za-z]', ' ', x) for x in review]

In [48]:
best_pipe.predict(review)

array([1, 1], dtype=int64)