# PROJECT | Natural Language Processing Challenge

- **dataset/data.csv**: dataset containing news articles with the following columns:
    - `label`: 0 if the news is fake, 1 if the news is real.
    - `title`: The headline of the news article.
    - `text`: The full content of the article.
    - `subject`: The category or topic of the news.
    - `date`: The publication date of the article.


## Phase 1: Data Loading and Exploration

### Import Libraries

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report,confusion_matrix,accuracy_score

### Load and Read dataset

In [81]:
df=pd.read_csv('./dataset/data.csv', encoding = "ISO-8859-1")


In [3]:
df.head()

Unnamed: 0,label,title,text,subject,date
0,1,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,1,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,1,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,1,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"


In [21]:
print("Dataset shape: ", df.shape)
print("\nColumns: ", df.columns.tolist())
df.info()


Dataset shape:  (39942, 5)

Columns:  ['label', 'title', 'text', 'subject', 'date']
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39942 entries, 0 to 39941
Data columns (total 5 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   label    39942 non-null  int64 
 1   title    39942 non-null  object
 2   text     39942 non-null  object
 3   subject  39942 non-null  object
 4   date     39942 non-null  object
dtypes: int64(1), object(4)
memory usage: 1.5+ MB


The `df.info()` output already shows that no column has missing values.
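
`df.info()` counts only NaN values, so an article whose `text` is an empty or whitespace-only string would still be reported as non-null. A minimal sketch of an extra check, using a small made-up frame (`toy`) in place of `df`:

```python
import pandas as pd

# Toy stand-in for df (made-up rows, not taken from data.csv).
toy = pd.DataFrame({
    "title": ["Headline A", "Headline B", "Headline C"],
    "text": ["Some body text.", "   ", ""],
})

# NaN-based check: this is what info() / isnull() report.
nan_counts = toy.isnull().sum()

# Blank check: strip whitespace, then count empty strings per column.
blank_counts = toy.apply(lambda col: col.str.strip().eq("").sum())

print(nan_counts.to_dict())
print(blank_counts.to_dict())
```

On the real data the same two lines could be run on `df[['title', 'text']]`; whether any blank rows exist there is not known from the outputs shown here.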

In [66]:
print("Examples of titles:")
print(df[['label','title']].sample(5, random_state=42))

print("Examples of subjects:")
print(df['subject'].unique())


Examples of titles:
       label                                              title
6524       1  Oil business seen in strong position as Trump ...
30902      0  WHOA! COLLEGE SNOWFLAKE FREAKS OUT: Screams Fo...
36459      0  CRONY CORRUPT POLITICS: Obama Admin BLOCKED FB...
9801       1  Cruz campaign vetting Fiorina as a possible VP...
25638      0   Minnesota Woman Writes Amazing F*ck Off Lette...
Examples of subjects:
['politicsNews' 'worldnews' 'News' 'politics' 'Government News'
 'left-news']


In [67]:
df[df['subject'] == 'left-news'][['label', 'title']].head(5)

Unnamed: 0,label,title
37460,0,BARBRA STREISAND Gives Up On Dream Of Impeachi...
37461,0,WATCH: SENATOR LINDSEY GRAHAM DROPS BOMBSHELLâ...
37462,0,âCONSERVATIVE GAY GUYâ BLASTS Penceâs As...
37463,0,WHITE COLLEGE SNOWFLAKES Can âIdentifyâ As...
37464,0,BILL NYE The FAKE Science Guy THREATENS Conser...
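
The stray `â` characters in these titles are consistent with UTF-8 punctuation (such as curly quotes) being decoded as ISO-8859-1, the encoding passed to `pd.read_csv`. That is an assumption about the file, not something the notebook confirms. A small sketch of the effect on a made-up string:

```python
# A title with curly quotes, encoded the way a UTF-8 file would store it.
curly = "\u201cCONSERVATIVE GAY GUY\u201d"
raw = curly.encode("utf-8")

# Decoding those bytes as ISO-8859-1 turns each quote into 'â' plus two
# invisible control characters.
wrong = raw.decode("iso-8859-1")

# Decoding with the matching codec restores the original text.
right = raw.decode("utf-8")

print(repr(wrong[:3]))
print(right)
```

If the file really is UTF-8, reading it with `encoding="utf-8"` would be the natural thing to try; if that raises a decode error, the file has mixed or different encoding and the assumption does not hold.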


Next, we check whether any article with the 'left-news' subject is labeled as real (1).

In [68]:
df[(df['subject'] == 'left-news') & (df['label'] == 1)][['label', 'subject', 'title']].head()

Unnamed: 0,label,subject,title


We then compute, for each subject, the proportion of fake (0) and real (1) articles.

In [69]:
pd.crosstab(df['subject'], df['label'], normalize='index')

label,0,1
subject,Unnamed: 1_level_1,Unnamed: 2_level_1
Government News,1.0,0.0
News,1.0,0.0
left-news,1.0,0.0
politics,1.0,0.0
politicsNews,0.0,1.0
worldnews,0.0,1.0
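
In the table above every subject maps to exactly one label, so `subject` alone determines the class. A minimal sketch, on made-up rows that mirror this pattern, of how a plain lookup on `subject` already reproduces the label. It suggests that using `subject` as a feature may leak the answer to the model:

```python
import pandas as pd

# Made-up rows that follow the same subject-to-label pattern as the crosstab.
toy = pd.DataFrame({
    "subject": ["politicsNews", "worldnews", "News", "politics",
                "left-news", "Government News", "News", "worldnews"],
    "label":   [1, 1, 0, 0, 0, 0, 0, 1],
})

# A subject is "pure" when all of its rows share a single label.
labels_per_subject = toy.groupby("subject")["label"].nunique()
all_pure = bool((labels_per_subject == 1).all())

# Lookup "classifier": map each subject to the one label it carries.
subject_to_label = toy.groupby("subject")["label"].first()
predicted = toy["subject"].map(subject_to_label)
lookup_accuracy = float((predicted == toy["label"]).mean())

print(all_pure, lookup_accuracy)
```

A model trained with the one-hot `subject` columns can learn this same lookup, so a comparison run without those columns would give a more honest picture of what the text features contribute.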


Planned next steps:

- one-hot encoding of `subject`
- lemmatization or stemming of the text
- a random forest classifier

In [82]:
df

Unnamed: 0,label,title,text,subject,date
0,1,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews,"December 31, 2017"
1,1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews,"December 29, 2017"
2,1,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews,"December 31, 2017"
3,1,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews,"December 30, 2017"
4,1,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews,"December 29, 2017"
...,...,...,...,...,...
39937,0,THIS IS NOT A JOKE! Soros-Linked Group Has Pla...,"The Left has been organizing for decades, and ...",left-news,"Sep 22, 2016"
39938,0,THE SMARTEST WOMAN In Politics: âHow Trump C...,Monica Crowley offers some of the most brillia...,left-news,"Sep 22, 2016"
39939,0,BREAKING! SHOCKING VIDEO FROM CHARLOTTE RIOTS:...,Protest underway in Charlotte: Things got com...,left-news,"Sep 21, 2016"
39940,0,BREAKING! Charlotte News Station Reports Cops ...,"Local Charlotte, NC news station WSOCTV is rep...",left-news,"Sep 21, 2016"


## Phase 2: Text Preprocessing

In [5]:
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

stop_words = set(stopwords.words('english'))
stemmer = SnowballStemmer("english")

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Lain\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Lain\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [83]:
df_label = df['label']

In [84]:
df.drop(columns=['label','date'], inplace=True)

In [85]:
df

Unnamed: 0,title,text,subject
0,"As U.S. budget fight looms, Republicans flip t...",WASHINGTON (Reuters) - The head of a conservat...,politicsNews
1,U.S. military to accept transgender recruits o...,WASHINGTON (Reuters) - Transgender people will...,politicsNews
2,Senior U.S. Republican senator: 'Let Mr. Muell...,WASHINGTON (Reuters) - The special counsel inv...,politicsNews
3,FBI Russia probe helped by Australian diplomat...,WASHINGTON (Reuters) - Trump campaign adviser ...,politicsNews
4,Trump wants Postal Service to charge 'much mor...,SEATTLE/WASHINGTON (Reuters) - President Donal...,politicsNews
...,...,...,...
39937,THIS IS NOT A JOKE! Soros-Linked Group Has Pla...,"The Left has been organizing for decades, and ...",left-news
39938,THE SMARTEST WOMAN In Politics: âHow Trump C...,Monica Crowley offers some of the most brillia...,left-news
39939,BREAKING! SHOCKING VIDEO FROM CHARLOTTE RIOTS:...,Protest underway in Charlotte: Things got com...,left-news
39940,BREAKING! Charlotte News Station Reports Cops ...,"Local Charlotte, NC news station WSOCTV is rep...",left-news


In [86]:
df.isnull().values.any()

False

In [87]:
# Split train & test
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(df, df_label, test_size=0.2, random_state=42)

In [88]:
print("X_train Shape ", X_train.shape)
print("y_train Shape ", y_train.shape)
print("X_test Shape ", X_test.shape)
print("y_test Shape ", y_test.shape)


X_train Shape  (31953, 3)
y_train Shape  (31953,)
X_test Shape  (7989, 3)
y_test Shape  (7989,)


In [89]:
X_train.isnull().values.any()

False

In [90]:
X_train

Unnamed: 0,title,text,subject
38232,MN: Mayoral Candidate Wants To DISARM COPS Aft...,Minneapolis mayoral candidate Raymond Dehn pr...,left-news
17455,"China confirms will amend party constitution, ...",BEIJING (Reuters) - China s ruling Communist P...,worldnews
15433,Saudi mass arrests jolt markets but many see o...,RIYADH (Reuters) - All major Gulf stock market...,worldnews
30412,WATCH: STONE-FACED ANDERSON COOPER Gets School...,President Trump s deputy assistant Sebastian G...,politics
22452,âLetâs Not Be Fake Newsâ: Trump Trolled...,Amateur president Donald Trump routinely propa...,News
...,...,...,...
6265,New York protesters camp out at Goldman Sachs ...,NEW YORK (Reuters) - Dozens of protesters gath...,politicsNews
11284,Nigeria says U.S. agrees delayed $593 million ...,ABUJA (Reuters) - The United States has formal...,worldnews
38158,BREAKING: CAR PLOWS INTO #Charlottesville Prot...,A car plowed into a group of counter-protester...,left-news
860,Republican tax plan would deal financial hit t...,WASHINGTON (Reuters) - The Republican tax plan...,politicsNews


In [52]:
X_test

Unnamed: 0,title,text,subject
6524,Oil business seen in strong position as Trump ...,(This January 3 story was corrected to remove...,politicsNews
30902,WHOA! COLLEGE SNOWFLAKE FREAKS OUT: Screams Fo...,So much for healthy debate on college campus I...,politics
36459,CRONY CORRUPT POLITICS: Obama Admin BLOCKED FB...,The information is spilling out little by litt...,Government News
9801,Cruz campaign vetting Fiorina as a possible VP...,WASHINGTON (Reuters) - U.S. Republican preside...,politicsNews
25638,Minnesota Woman Writes Amazing F*ck Off Lette...,"Attention, conservative men. This one is for y...",News
...,...,...,...
32688,GROSS! MADONNA OFFERS UP Sexual Services If Yo...,[Warning: Story contains graphic language]Pop ...,politics
39811,ANONYMOUS VIDEO Of Bill Clinton Raping 13 Yr O...,Anonymous is a loosely associated internationa...,left-news
3798,Treasury unit to share records with Senate for...,WASHINGTON (Reuters) - A unit of the U.S. Trea...,politicsNews
22854,Scientists Planning A March On Washington To ...,Now that the Women s March successfully outnum...,News


### Define cleaning function

In [31]:
def preprocess_text(text):
    # to lower case
    text = text.lower()

    # remove special characters, digits and punctuation
    text = re.sub(r'[^a-z\s]', '', text)

    # tokenize
    tokens = word_tokenize(text)

    # stopwords
    tokens = [w for w in tokens if w not in stop_words]

    # apply stemming 
    tokens = [stemmer.stem(w) for w in tokens]
    return ' '.join(tokens)
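
One side effect of the `re.sub(r'[^a-z\s]', '', text)` step is that deleting punctuation outright fuses the words on either side of it, which is how a title containing `STONE-FACED` ends up as the single stem `stonefac` in the cleaned output. A small sketch comparing deletion with replacement by a space; the sample string and the helper names `strip_delete` and `strip_space` are illustrative only:

```python
import re

def strip_delete(text):
    # Same pattern as preprocess_text: non-letters are removed outright.
    return re.sub(r'[^a-z\s]', '', text.lower())

def strip_space(text):
    # Alternative: replace non-letters with a space, then collapse whitespace.
    spaced = re.sub(r'[^a-z\s]', ' ', text.lower())
    return re.sub(r'\s+', ' ', spaced).strip()

sample = "STONE-FACED Anderson Cooper; Trump-Russia probe"
deleted = strip_delete(sample)
spaced = strip_space(sample)

print(deleted)
print(spaced)
```

Neither choice is free: replacing with a space keeps `stone` and `faced` apart but also splits `U.S.` into `u s`, so which variant works better here would need to be checked on the data.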

In [34]:
print(X_train['title']+ " " + X_train['text'])

38232    MN: Mayoral Candidate Wants To DISARM COPS Aft...
17455    China confirms will amend party constitution, ...
15433    Saudi mass arrests jolt markets but many see o...
30412    WATCH: STONE-FACED ANDERSON COOPER Gets School...
22452     âLetâs Not Be Fake Newsâ: Trump Trolled...
                               ...                        
6265     New York protesters camp out at Goldman Sachs ...
11284    Nigeria says U.S. agrees delayed $593 million ...
38158    BREAKING: CAR PLOWS INTO #Charlottesville Prot...
860      Republican tax plan would deal financial hit t...
15795    U.N. refugee commissioner says Australia must ...
Length: 31953, dtype: object


In [91]:
X_train['combined'] = X_train['title'] + " " + X_train['text']
X_train['clean_combined'] = X_train['combined'].apply(preprocess_text)  # the same cleaning is applied to X_test later

In [92]:
X_train

Unnamed: 0,title,text,subject,combined,clean_combined
38232,MN: Mayoral Candidate Wants To DISARM COPS Aft...,Minneapolis mayoral candidate Raymond Dehn pr...,left-news,MN: Mayoral Candidate Wants To DISARM COPS Aft...,mn mayor candid want disarm cop muslim cop kil...
17455,"China confirms will amend party constitution, ...",BEIJING (Reuters) - China s ruling Communist P...,worldnews,"China confirms will amend party constitution, ...",china confirm amend parti constitut like inclu...
15433,Saudi mass arrests jolt markets but many see o...,RIYADH (Reuters) - All major Gulf stock market...,worldnews,Saudi mass arrests jolt markets but many see o...,saudi mass arrest jolt market mani see overdu ...
30412,WATCH: STONE-FACED ANDERSON COOPER Gets School...,President Trump s deputy assistant Sebastian G...,politics,WATCH: STONE-FACED ANDERSON COOPER Gets School...,watch stonefac anderson cooper get school trum...
22452,âLetâs Not Be Fake Newsâ: Trump Trolled...,Amateur president Donald Trump routinely propa...,News,âLetâs Not Be Fake Newsâ: Trump Trolled...,let fake news trump troll hard swedish newspap...
...,...,...,...,...,...
6265,New York protesters camp out at Goldman Sachs ...,NEW YORK (Reuters) - Dozens of protesters gath...,politicsNews,New York protesters camp out at Goldman Sachs ...,new york protest camp goldman sach oppos trump...
11284,Nigeria says U.S. agrees delayed $593 million ...,ABUJA (Reuters) - The United States has formal...,worldnews,Nigeria says U.S. agrees delayed $593 million ...,nigeria say us agre delay million fighter plan...
38158,BREAKING: CAR PLOWS INTO #Charlottesville Prot...,A car plowed into a group of counter-protester...,left-news,BREAKING: CAR PLOWS INTO #Charlottesville Prot...,break car plow charlottesvill protesterssever ...
860,Republican tax plan would deal financial hit t...,WASHINGTON (Reuters) - The Republican tax plan...,politicsNews,Republican tax plan would deal financial hit t...,republican tax plan would deal financi hit us ...


In [93]:
X_train.isnull().values.any()

False

In [8]:
X_train['clean_combined']

0        us budget fight loom republican flip fiscal sc...
1        us militari accept transgend recruit monday pe...
2        senior us republican senat let mr mueller job ...
3        fbi russia probe help australian diplomat tipo...
4        trump want postal servic charg much amazon shi...
                               ...                        
39937    joke soroslink group plan destroy trumpwil reg...
39938    smartest woman polit trump knock hillari first...
39939    break shock video charlott riot situat control...
39940    break charlott news station report cop dash ca...
39941    big mistak hillari prove america shes commit k...
Name: clean_combined, Length: 39942, dtype: object

In [None]:
X_train.drop(columns=['title','text', 'combined'], inplace=True)


In [95]:
X_train

Unnamed: 0,subject,clean_combined
38232,left-news,mn mayor candid want disarm cop muslim cop kil...
17455,worldnews,china confirm amend parti constitut like inclu...
15433,worldnews,saudi mass arrest jolt market mani see overdu ...
30412,politics,watch stonefac anderson cooper get school trum...
22452,News,let fake news trump troll hard swedish newspap...
...,...,...
6265,politicsNews,new york protest camp goldman sach oppos trump...
11284,worldnews,nigeria say us agre delay million fighter plan...
38158,left-news,break car plow charlottesvill protesterssever ...
860,politicsNews,republican tax plan would deal financi hit us ...


In [96]:
X_train.isnull().values.any()

False

### Subject one-hot encoding

In [73]:
X_train['subject'].isnull().values.any()

False

In [97]:
subject_hot = pd.get_dummies(X_train['subject'], prefix='subject', dtype=int)


In [98]:
subject_hot.isnull().values.any()

False

In [99]:
print('subject ',subject_hot.shape)
print('X_train ',X_train.shape)

subject  (31953, 6)
X_train  (31953, 2)


In [100]:
X_train = pd.concat([X_train, subject_hot], axis=1)

In [101]:
X_train.isnull().values.any()

False

In [103]:
X_train.drop(columns=['subject'], inplace=True)

In [104]:
X_train

Unnamed: 0,clean_combined,subject_Government News,subject_News,subject_left-news,subject_politics,subject_politicsNews,subject_worldnews
38232,mn mayor candid want disarm cop muslim cop kil...,0,0,1,0,0,0
17455,china confirm amend parti constitut like inclu...,0,0,0,0,0,1
15433,saudi mass arrest jolt market mani see overdu ...,0,0,0,0,0,1
30412,watch stonefac anderson cooper get school trum...,0,0,0,1,0,0
22452,let fake news trump troll hard swedish newspap...,0,1,0,0,0,0
...,...,...,...,...,...,...,...
6265,new york protest camp goldman sach oppos trump...,0,0,0,0,1,0
11284,nigeria say us agre delay million fighter plan...,0,0,0,0,0,1
38158,break car plow charlottesvill protesterssever ...,0,0,1,0,0,0
860,republican tax plan would deal financi hit us ...,0,0,0,0,1,0


### X_test data preprocessing

In [66]:
X_test.isnull().values.any()

False

In [105]:
X_test['combined'] = X_test['title'] + " " + X_test['text']
X_test['clean_combined'] = X_test['combined'].apply(preprocess_text)  # same cleaning as for X_train

In [106]:
X_test

Unnamed: 0,title,text,subject,combined,clean_combined
6524,Oil business seen in strong position as Trump ...,(This January 3 story was corrected to remove...,politicsNews,Oil business seen in strong position as Trump ...,oil busi seen strong posit trump tackl tax ref...
30902,WHOA! COLLEGE SNOWFLAKE FREAKS OUT: Screams Fo...,So much for healthy debate on college campus I...,politics,WHOA! COLLEGE SNOWFLAKE FREAKS OUT: Screams Fo...,whoa colleg snowflak freak scream two minut tr...
36459,CRONY CORRUPT POLITICS: Obama Admin BLOCKED FB...,The information is spilling out little by litt...,Government News,CRONY CORRUPT POLITICS: Obama Admin BLOCKED FB...,croni corrupt polit obama admin block fbi clin...
9801,Cruz campaign vetting Fiorina as a possible VP...,WASHINGTON (Reuters) - U.S. Republican preside...,politicsNews,Cruz campaign vetting Fiorina as a possible VP...,cruz campaign vet fiorina possibl vp pick abc ...
25638,Minnesota Woman Writes Amazing F*ck Off Lette...,"Attention, conservative men. This one is for y...",News,Minnesota Woman Writes Amazing F*ck Off Lette...,minnesota woman write amaz fck letter men want...
...,...,...,...,...,...
32688,GROSS! MADONNA OFFERS UP Sexual Services If Yo...,[Warning: Story contains graphic language]Pop ...,politics,GROSS! MADONNA OFFERS UP Sexual Services If Yo...,gross madonna offer sexual servic vote hillari...
39811,ANONYMOUS VIDEO Of Bill Clinton Raping 13 Yr O...,Anonymous is a loosely associated internationa...,left-news,ANONYMOUS VIDEO Of Bill Clinton Raping 13 Yr O...,anonym video bill clinton rape yr old could en...
3798,Treasury unit to share records with Senate for...,WASHINGTON (Reuters) - A unit of the U.S. Trea...,politicsNews,Treasury unit to share records with Senate for...,treasuri unit share record senat trumprussia p...
22854,Scientists Planning A March On Washington To ...,Now that the Women s March successfully outnum...,News,Scientists Planning A March On Washington To ...,scientist plan march washington protest trump ...


In [107]:
X_test.drop(columns=['title','text', 'combined'], inplace=True)


In [108]:
X_test

Unnamed: 0,subject,clean_combined
6524,politicsNews,oil busi seen strong posit trump tackl tax ref...
30902,politics,whoa colleg snowflak freak scream two minut tr...
36459,Government News,croni corrupt polit obama admin block fbi clin...
9801,politicsNews,cruz campaign vet fiorina possibl vp pick abc ...
25638,News,minnesota woman write amaz fck letter men want...
...,...,...
32688,politics,gross madonna offer sexual servic vote hillari...
39811,left-news,anonym video bill clinton rape yr old could en...
3798,politicsNews,treasuri unit share record senat trumprussia p...
22854,News,scientist plan march washington protest trump ...


In [109]:
subject_hot_test = pd.get_dummies(X_test['subject'], prefix='subject', dtype=int)


In [110]:
X_test = pd.concat([X_test, subject_hot_test], axis=1)

In [111]:
X_test.isnull().values.any()

False

In [112]:
X_test.drop(columns=['subject'], inplace=True)

In [113]:
X_test

Unnamed: 0,clean_combined,subject_Government News,subject_News,subject_left-news,subject_politics,subject_politicsNews,subject_worldnews
6524,oil busi seen strong posit trump tackl tax ref...,0,0,0,0,1,0
30902,whoa colleg snowflak freak scream two minut tr...,0,0,0,1,0,0
36459,croni corrupt polit obama admin block fbi clin...,1,0,0,0,0,0
9801,cruz campaign vet fiorina possibl vp pick abc ...,0,0,0,0,1,0
25638,minnesota woman write amaz fck letter men want...,0,1,0,0,0,0
...,...,...,...,...,...,...,...
32688,gross madonna offer sexual servic vote hillari...,0,0,0,1,0,0
39811,anonym video bill clinton rape yr old could en...,0,0,1,0,0,0
3798,treasuri unit share record senat trumprussia p...,0,0,0,0,1,0
22854,scientist plan march washington protest trump ...,0,1,0,0,0,0
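
Calling `pd.get_dummies` separately on `X_train` and `X_test` works here because both splits contain all six subjects. If a category were absent from one split, the two frames would end up with different columns. A sketch of aligning the test columns to the training columns with `reindex`, on made-up data:

```python
import pandas as pd

# Made-up subject columns; 'politics' is missing from the test split.
train_subject = pd.Series(["News", "politics", "worldnews"], name="subject")
test_subject = pd.Series(["News", "worldnews"], name="subject")

train_hot = pd.get_dummies(train_subject, prefix="subject", dtype=int)
test_hot = pd.get_dummies(test_subject, prefix="subject", dtype=int)

# Force the test frame onto the training columns; absent categories become 0.
test_hot_aligned = test_hot.reindex(columns=train_hot.columns, fill_value=0)

print(test_hot.columns.tolist())
print(test_hot_aligned.columns.tolist())
```

An alternative with the same goal would be scikit-learn's `OneHotEncoder` fitted on the training split, which remembers the training categories.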


## Phase 3: Feature Engineering

In [None]:
# tfid = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
# X_tfids = tfid.fit_transform(X_train['clean_combined']).toarray()
# traindataset = pd.DataFrame(X_tfids, columns=tfid.get_feature_names_out())

In [143]:
tfidfvector=TfidfVectorizer(max_features=5000, ngram_range=(1,2))
X_train_text =tfidfvector.fit_transform(X_train['clean_combined'])
X_test_text = tfidfvector.transform(X_test['clean_combined'])

In [144]:
X_train_other = X_train.drop(columns=['clean_combined'])
X_test_other = X_test.drop(columns=['clean_combined'])


In [145]:
from scipy.sparse import hstack

X_train_final = hstack([X_train_text, X_train_other.values])
X_test_final = hstack([X_test_text, X_test_other.values])
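
The vectorizer is fitted on the training text only and then reused to transform the test text, so both matrices share one vocabulary; `hstack` then appends the one-hot columns. A toy sketch of the same pattern with made-up documents and made-up one-hot columns. The `.tocsr()` call is an optional addition that yields a row-sliceable matrix:

```python
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already-cleaned documents.
train_docs = ["trump tax plan", "china parti constitut", "tax plan senat"]
test_docs = ["senat tax vote"]

# Made-up one-hot columns standing in for the subject dummies.
train_other = np.array([[1, 0], [0, 1], [1, 0]])
test_other = np.array([[1, 0]])

vec = TfidfVectorizer(ngram_range=(1, 2))
train_text = vec.fit_transform(train_docs)  # learns the vocabulary
test_text = vec.transform(test_docs)        # reuses it; unseen terms are ignored

# Append the dense one-hot columns to the sparse TF-IDF columns.
train_final = hstack([train_text, train_other]).tocsr()
test_final = hstack([test_text, test_other]).tocsr()

print(train_final.shape, test_final.shape)
```

Both matrices end up with the same number of columns (vocabulary size plus the extra columns), which is what the classifier needs at prediction time.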

In [146]:
# implement RandomForest Classifier
model = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1, random_state=42)
model.fit(X_train_final, y_train)

In [147]:
predictions = model.predict(X_test_final)

print("Confusion Matrix:\n", confusion_matrix(y_test, predictions))
print("\nAccuracy:", accuracy_score(y_test, predictions))
print("\nClassification Report:\n", classification_report(y_test, predictions))


Confusion Matrix:
 [[3992    4]
 [   2 3991]]

Accuracy: 0.9992489673300788

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      3996
           1       1.00      1.00      1.00      3993

    accuracy                           1.00      7989
   macro avg       1.00      1.00      1.00      7989
weighted avg       1.00      1.00      1.00      7989



In [136]:
randomclassifier.fit(traindataset,y_train)

[Parallel(n_jobs=1)]: Done  49 tasks      | elapsed:    9.9s
[Parallel(n_jobs=1)]: Done 199 tasks      | elapsed:   41.3s
[Parallel(n_jobs=1)]: Done 449 tasks      | elapsed:  1.6min


In [137]:
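## Predict for the Test Dataset
# NOTE: X_test now holds clean_combined plus six one-hot subject columns, so
# iloc[row, 2:27] joins one-hot flags (e.g. "0 0 1 0 0"), not article text.
# The vectorizer finds no known tokens in such strings, which would explain
# the constant class-0 predictions reported below.
test_transform= []
for row in range(0,len(X_test.index)):
    test_transform.append(' '.join(str(x) for x in X_test.iloc[row,2:27]))
test_dataset = tfidfvector.transform(test_transform)
predictions = randomclassifier.predict(test_dataset)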

[Parallel(n_jobs=1)]: Done  49 tasks      | elapsed:    0.5s
[Parallel(n_jobs=1)]: Done 199 tasks      | elapsed:    2.2s
[Parallel(n_jobs=1)]: Done 449 tasks      | elapsed:    5.0s


In [138]:
matrix=confusion_matrix(y_test,predictions)
print(matrix)
score=accuracy_score(y_test,predictions)
print('accuracy ', score)
report=classification_report(y_test,predictions)
print(report)

[[3996    0]
 [3993    0]]
accuracy  0.5001877581674803
              precision    recall  f1-score   support

           0       0.50      1.00      0.67      3996
           1       0.00      0.00      0.00      3993

    accuracy                           0.50      7989
   macro avg       0.25      0.50      0.33      7989
weighted avg       0.25      0.50      0.33      7989





## Phase 4: Model Training (Random Forest)

In [None]:
# implement RandomForest Classifier
randomclassifier = RandomForestClassifier(n_estimators=200, criterion='entropy')
# Fit on the combined feature matrix and training labels built in the earlier phases.
randomclassifier.fit(X_train_final, y_train)