# Disaster Tweets - NLP Beginner

Predict which Tweets are about real disasters and which ones are not.

![](https://storage.googleapis.com/kaggle-media/competitions/nlp1-cover.jpg)

<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list" style='color:white; background:#FA497A; border:0' role="tab" aria-controls="home"><center>Quick Navigation</center></h3>

* [Data Loading](#1)
* [TF-IDF Preprocessing](#2)
* [SVM Training](#3)
* [Submission](#100)

</div>

In [None]:
import numpy as np
import pandas as pd
import os

from sklearn import model_selection as sk_model_selection
from sklearn.feature_extraction import text as sk_fe_text
from sklearn import svm as sk_svm
from sklearn import metrics as sk_metrics

<a id="1"></a>
<h2 style='background:#FA497A; border:0; color:white'><center>Data Loading</center></h2>

In [None]:
base_dir = '../input/nlp-getting-started/'
df_train = pd.read_csv(os.path.join(base_dir, 'train.csv'))
df_test = pd.read_csv(os.path.join(base_dir, 'test.csv'))
df_submission = pd.read_csv(os.path.join(base_dir, 'sample_submission.csv'))

In [None]:
print(f'df_train shape: {df_train.shape}')
df_train.head()

In [None]:
df_train.isna().sum()

<a id="2"></a>
<h2 style='background:#FA497A; border:0; color:white'><center>TF-IDF Preprocessing</center></h2>


In [None]:
X_train = df_train["text"]
y_train = df_train["target"].values

In [None]:
# Learn the vocabulary and IDF weights from the training text, then vectorize it.
tfidf = sk_fe_text.TfidfVectorizer(stop_words='english')
X_train = tfidf.fit_transform(X_train)
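
To make the vectorization step concrete, here is a minimal sketch on a made-up three-tweet corpus (the `toy_*` names and the sentences are illustrative assumptions, not competition data). It shows that the result is a sparse matrix with one row per tweet and one column per learned token, and that each row is L2-normalized by default. Because scikit-learn's built-in `'english'` stop-word list has documented quirks, the sketch also prints whether a few disaster-related words would be filtered out, rather than assuming the answer.

```python
import numpy as np
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer

# Toy corpus, made up for illustration only.
toy_tweets = [
    "Forest fire near the town",
    "I love this song, it is fire",
    "Flood warning issued for the town",
]

toy_tfidf = TfidfVectorizer(stop_words="english")
# fit_transform learns the vocabulary and IDF weights, then vectorizes in one call.
toy_matrix = toy_tfidf.fit_transform(toy_tweets)

print(toy_matrix.shape)               # (number of tweets, vocabulary size)
print(sorted(toy_tfidf.vocabulary_))  # learned tokens, stop words removed

# Each row has unit L2 norm because the default is norm='l2'.
row_norms = np.sqrt(toy_matrix.multiply(toy_matrix).sum(axis=1))
print(np.asarray(row_norms).ravel())

# Check which domain words the built-in stop-word list would drop.
for word in ("fire", "flood", "storm"):
    print(word, word in ENGLISH_STOP_WORDS)
```

If a word that matters for this task turns out to be in the list, passing a custom list to `stop_words` (or `None`) is one option to consider.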

<a id="3"></a>
<h2 style='background:#FA497A; border:0; color:white'><center>SVM Training</center></h2>


Use `GridSearchCV` with 5-fold cross-validation to find the `C` and `gamma` values that give an RBF-kernel SVM the best F1 score.

In [None]:
# Candidate values for the regularization strength C and the RBF kernel coefficient gamma.
parameters = {
    'C': [0.01, 0.1, 1],
    'gamma': [0.7, 1, 'auto', 'scale'],
}

model = sk_svm.SVC(
    kernel='rbf', 
    class_weight='balanced',
    random_state=42,
)

model = sk_model_selection.GridSearchCV(
    model, 
    parameters, 
    cv=5,
    scoring='f1',
    n_jobs=-1,
)

model.fit(X_train, y_train)

print(f'Best parameters: {model.best_params_}')
print(f'Mean cross-validated F1 score of the best_estimator: {model.best_score_:.3f}')
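
The grid search above runs on features from a vectorizer that was fitted on the full training set, so every validation fold has already influenced the vocabulary and IDF weights. The reported F1 score may therefore be slightly optimistic. One possible alternative, sketched below on a small made-up dataset, is to wrap the vectorizer and the SVM in a `Pipeline` so that TF-IDF is re-fitted on the training part of each fold. The `toy_*` data, the reduced grid, and `cv=3` are assumptions chosen only to keep the example fast. In the notebook the inputs would be `df_train["text"]` and `df_train["target"]`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy stand-in data (hypothetical, not from the competition).
toy_texts = [
    "wildfire spreading near the highway", "earthquake damages city bridge",
    "flood waters rising downtown", "storm destroys homes on the coast",
    "evacuation ordered after explosion", "tornado hits small village",
    "this new album is amazing", "my exam was a total mess",
    "traffic was slow this morning", "loving the sunny weather today",
    "that movie blew my mind", "dinner with friends tonight",
]
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# The vectorizer is a pipeline step, so it is fitted inside each CV fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svc", SVC(kernel="rbf", class_weight="balanced", random_state=42)),
])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {"svc__C": [0.1, 1], "svc__gamma": ["scale", 1]}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="f1")
search.fit(toy_texts, toy_labels)

print(search.best_params_)
# The fitted search accepts raw text, because the pipeline vectorizes it itself.
print(search.predict(["bridge collapsed after earthquake"]))
```

With this layout the test tweets can be passed to `predict` as raw strings, so no separate `tfidf.transform` call is needed.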

<a id="100"></a>
<h2 style='background:#FA497A; border:0; color:white'><center>Submission</center></h2>

In [None]:
X_test = df_test["text"]
X_test = tfidf.transform(X_test)
y_test_pred = model.predict(X_test)

In [None]:
df_submission["target"] = y_test_pred
df_submission.to_csv("submission.csv", index=False)
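
Before uploading, it can help to sanity-check the submission frame. The sketch below uses a hypothetical helper, `check_submission`, which is not part of the original notebook. It assumes the expected format is an `id` column plus a binary `target` column with one row per test tweet, mirroring `sample_submission.csv`. The toy frame is made up for illustration.

```python
import pandas as pd

def check_submission(df: pd.DataFrame, expected_rows: int) -> None:
    """Raise ValueError if the frame does not look like a valid submission.

    Hypothetical helper: assumes 'id' and 'target' columns with 0/1 targets.
    """
    missing = {"id", "target"} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if len(df) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(df)}")
    if df["target"].isna().any():
        raise ValueError("target contains missing values")
    if not df["target"].isin([0, 1]).all():
        raise ValueError("target must contain only 0 and 1")

# Toy frame standing in for df_submission.
toy_submission = pd.DataFrame({"id": [0, 2, 3], "target": [1, 0, 1]})
check_submission(toy_submission, expected_rows=3)
print("submission looks well-formed")
```

In the notebook, the equivalent call would be `check_submission(df_submission, expected_rows=len(df_test))`.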

In [None]:
df_submission