# Random Forest?
## Bagging Model
Bootstrap aggregation, or bagging, involves taking multiple samples from your training dataset (with replacement) and training a model on each sample.

The final prediction combines the predictions of all the sub-models: an average for regression, or a majority vote for classification.
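To make the idea concrete, here is a minimal sketch of bagging done by hand. It assumes a synthetic binary dataset from `make_classification` and plain decision trees as the sub-models; the names `n_models`, `all_preds`, and `bagged_pred` are made up for this illustration and are not part of the notebook.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data stands in for a real training set (illustration only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

rng = np.random.default_rng(0)
n_models = 25  # odd number, so a binary majority vote cannot tie
models = []
for _ in range(n_models):
    # Bootstrap sample: draw row indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Collect every sub-model's predictions: shape (n_models, n_samples).
all_preds = np.array([m.predict(X) for m in models])

# Aggregate by majority vote over the 0/1 labels
# (for regression you would take the mean instead).
bagged_pred = (all_preds.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the bagged ensemble:", (bagged_pred == y).mean())
```

In practice scikit-learn's `BaggingClassifier` wraps this loop for you; the manual version is only meant to show where the "with replacement" sampling and the aggregation step happen.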

<b>Three common bagging models are:</b><br>
1) Bagged Decision Trees<br>
2) Random Forest<br>
3) Extra Trees<br>

# Random Forest
Random forest is an extension of bagged decision trees.

Samples of the training dataset are taken with replacement, but the trees are constructed in a way that reduces the correlation between the individual classifiers. Specifically, rather than greedily choosing the best split point among all features when building a tree, only a random subset of the features is considered for each split.

You can construct a Random Forest model for classification using scikit-learn's RandomForestClassifier class.
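As a quick illustration before moving to the text data, the sketch below fits a `RandomForestClassifier` on a synthetic dataset (an assumption for this example, not the notebook's `data.csv`). The `max_features="sqrt"` argument is what limits each split to a random subset of features, as described above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data, used only to demonstrate the API.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Each split considers about sqrt(20) randomly chosen features.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)

print("number of trees:", len(rf.estimators_))
print("test accuracy:", rf.score(X_te, y_te))
```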


# Data Processing

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
df=pd.read_csv('data.csv')

In [None]:
df.head()

Unnamed: 0,Sentence,Sentiment
0,The GeoSolutions technology will leverage Bene...,positive
1,"$ESI on lows, down $1.50 to $2.50 BK a real po...",negative
2,"For the last quarter of 2010 , Componenta 's n...",positive
3,According to the Finnish-Russian Chamber of Co...,neutral
4,The Swedish buyout firm has sold its remaining...,neutral


In [None]:
df.describe()  # summarizes numeric columns by default; with only text columns it reports count, unique, top and freq

Unnamed: 0,Sentence,Sentiment
count,5842,5842
unique,5322,3
top,Managing Director 's comments : `` Net sales f...,neutral
freq,2,3130


In [None]:
df['Sentiment'].value_counts()

neutral     3130
positive    1852
negative     860
Name: Sentiment, dtype: int64

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# hold out 25% of the rows for testing; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(
    df['Sentence'], df['Sentiment'], test_size=0.25, random_state=0, shuffle=True
)

# Building the Model (Random Forest)

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

In [None]:
# create a pipeline that always runs the same steps in order:
# TF-IDF vectorization first, then the random forest classifier
pipeliner = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", RandomForestClassifier(n_estimators=100)),
])

In [None]:
pipeliner.fit(x_train,y_train)

Pipeline(steps=[('tfidf', TfidfVectorizer()),
                ('classifier', RandomForestClassifier())])
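To show what the pipeline does internally when `fit` and `predict` are called, here is a rough sketch of the same two steps performed by hand. The tiny corpus and the names `texts`, `labels`, `X_demo`, and `new_X` are made up for this illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus, for illustration only (not the notebook's data.csv).
texts = [
    "profits rose sharply",
    "sales fell again",
    "the company is based in Helsinki",
    "profits rose again",
    "sales fell sharply",
]
labels = ["positive", "negative", "neutral", "positive", "negative"]

# Step 1: learn the vocabulary and IDF weights, then convert text to a sparse matrix.
tfidf = TfidfVectorizer()
X_demo = tfidf.fit_transform(texts)

# Step 2: train the forest on the TF-IDF features.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_demo, labels)

# At prediction time, new text must pass through the same fitted vectorizer
# (transform, not fit_transform) so the columns line up.
new_X = tfidf.transform(["profits rose"])
print(X_demo.shape, clf.predict(new_X))
```

Wrapping both steps in a `Pipeline` avoids having to remember the `transform` call yourself, and lets the whole thing be saved and reloaded as a single object.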

## The model is now created and trained

# Predicting with the Model (Random Forest)

In [None]:
# the fitted pipeline's predict method vectorizes the text and classifies it
y_pred = pipeliner.predict(x_test)

In [None]:
# metrics for comparing the predicted labels against the true labels
from sklearn.metrics import classification_report,accuracy_score,confusion_matrix

In [None]:
accuracy_score(y_test,y_pred)

In [None]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

    negative       0.20      0.11      0.15       235
     neutral       0.66      0.84      0.74       789
    positive       0.76      0.58      0.66       437

    accuracy                           0.64      1461
   macro avg       0.54      0.51      0.51      1461
weighted avg       0.62      0.64      0.62      1461
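The report shows low recall for the `negative` class, but not which classes those sentences were assigned to instead. The imported `confusion_matrix` function can show that. Below is a small sketch on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical); in the notebook you would pass `y_test` and `y_pred`.

```python
from sklearn.metrics import confusion_matrix

labels = ["negative", "neutral", "positive"]

# Made-up labels for illustration only.
y_true_demo = ["negative", "negative", "neutral", "neutral", "neutral", "positive", "positive"]
y_pred_demo = ["neutral", "negative", "neutral", "neutral", "positive", "positive", "neutral"]

# Rows are true classes, columns are predicted classes, both in `labels` order.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)
print(cm)
# [[1 1 0]
#  [0 2 1]
#  [0 1 1]]
```

Reading along the first row, one of the two true `negative` examples was predicted as `neutral`. Off-diagonal counts like that are what pull a class's recall down.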



# Now test the model on new sentences to check whether each is positive, negative or neutral

In [None]:
test1 = ["The GeoSolutions technology will leverage Benefon 's GPS solutions by providing Location Based Search Technology , a Communities Platform , location relevant multimedia content and a new and powerful commercial model "]
test2 = ["$ESI on lows, down $1.50 to $2.50 BK a real possibility"]
test3 = ["According to the Finnish-Russian Chamber of Commerce , all the major construction companies of Finland are operating in Russia ."]
test = ["Netflix has won the best  selection of films","Hulu has a great UI","I dislike like the new crime series","I hate waiting for the next series to come out"]

In [None]:
print(pipeliner.predict(test1))
print(pipeliner.predict(test2))
print(pipeliner.predict(test3))
print(pipeliner.predict(test))

['positive']
['negative']
['neutral']
['positive' 'neutral' 'neutral' 'neutral']


In [None]:
import pickle
with open('Sentiment_analysis_RF.pkl', 'wb') as handle:
  pickle.dump(pipeliner, handle, protocol=pickle.HIGHEST_PROTOCOL)
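To use the saved model later, it has to be loaded back with `pickle.load`. The sketch below shows the round trip on a small stand-in pipeline trained on made-up sentences and written to a temporary file. The names `demo`, `path`, and `restored` are assumptions for this example. For the notebook's model you would open `'Sentiment_analysis_RF.pkl'` in `'rb'` mode instead.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Small stand-in pipeline trained on made-up sentences (illustration only).
demo = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
demo.fit(
    ["good results", "bad results", "good growth", "bad losses"],
    ["positive", "negative", "positive", "negative"],
)

# Save to a temporary location, then load it back.
path = os.path.join(tempfile.mkdtemp(), "demo_pipeline.pkl")
with open(path, "wb") as handle:
    pickle.dump(demo, handle, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as handle:
    restored = pickle.load(handle)

# The restored pipeline should predict exactly like the original.
sample = ["good growth this quarter"]
print(restored.predict(sample), demo.predict(sample))
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. It is also safest to load the model with the same scikit-learn version that saved it.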