### Support Vector Machines
Using a support vector machine (SVM) to classify sentences by their speech act

In [64]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, classification_report

from preprocessing import get_sentences_labels

Loading the sentences and their labels from the Excel sheet

In [65]:
# file_path = "data/interrater_data.xlsx"
file_path = "data/combined_data_set.xlsx"
# file_path = "data/"

sentences, labels = get_sentences_labels(file_path)

I have sentences:  84
Correct Labels:  ['Request for Situation', 'Statement of Situation', 'Statement of Situation', 'Request for Situation', 'Statement of Situation', 'Statement of Situation', 'Not Classified', 'Statement of Situation', 'Statement of Situation', 'Statement of Situation', 'Statement of Action', 'Statement of Intent', 'Request for Situation', 'Statement of Situation', 'Statement of Situation', 'Not Classified', 'Statement of Situation', 'Statement of Situation', 'Not Classified', 'Not Classified', 'Statement of Intent', 'Statement of Prediction', 'Statement of Situation', 'Not Classified', 'Statement of Intent', 'Statement of Prediction', 'Not Classified', 'Statement of Intent', 'Statement of Intent', 'Statement of Prediction', 'Statement of Prediction', 'Not Classified', 'Not Classified', 'Statement of Intent', 'Statement of Intent', 'Statement of Prediction', 'Statement of Intent', 'Request for Action', 'Statement of Intent', 'Statement of Prediction', 'Statement of P

## Preprocessing
Vectorising the sentences using TF-IDF weights computed over the whole data set

In [66]:
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(sentences)

Creating the train-test split (80% training, 20% test)

In [67]:
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.20)
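
The call above draws a different random split on every run and does not balance the classes between the two parts. A minimal sketch of a seeded, stratified split on made-up data follows. The value `random_state=42` is an arbitrary choice, and `stratify` requires at least two examples of every label, which may not hold for the rarest speech acts in this data set.

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# Made-up labelled sentences for illustration only.
toy_sentences = [f"statement number {i}" for i in range(5)] + [
    f"request number {i}?" for i in range(5)
]
toy_labels = ["Statement"] * 5 + ["Request"] * 5

toy_train, toy_test, toy_y_train, toy_y_test = train_test_split(
    toy_sentences,
    toy_labels,
    test_size=0.20,
    stratify=toy_labels,  # keep label proportions equal in both parts
    random_state=42,      # arbitrary seed so the split is reproducible
)

print(Counter(toy_y_train))
print(Counter(toy_y_test))
```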

Selecting the Linear Support Vector Classification model

In [68]:
classifier = LinearSVC()

Training the model

In [69]:
classifier.fit(X_train, y_train)



Evaluating the model by predicting labels for the test set

In [70]:
y_pred = classifier.predict(X_test)

Printing the accuracy and the classification report

In [71]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")

print("Classification Report:\n", classification_report(y_test, y_pred, zero_division=0))

Accuracy: 0.47
Classification Report:
                          precision    recall  f1-score   support

         Not Classified       1.00      0.50      0.67         2
     Request for Action       0.00      0.00      0.00         0
    Statement of Action       0.00      0.00      0.00         0
    Statement of Intent       0.25      0.17      0.20         6
Statement of Prediction       0.00      0.00      0.00         3
 Statement of Situation       0.60      1.00      0.75         6

               accuracy                           0.47        17
              macro avg       0.31      0.28      0.27        17
           weighted avg       0.42      0.47      0.41        17
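
With only 17 test sentences, a single split gives a noisy estimate. One option, sketched here on made-up data, is stratified k-fold cross-validation over a `Pipeline`, so that the TF-IDF vocabulary is refitted on each training fold rather than on the full data set. The toy sentences, the fold count of 3 and the seed are assumptions for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up labelled sentences for illustration only.
toy_sentences = [
    "the fire is out", "the barrier is up", "truck one is at the lake",
    "the wind is from the north", "the house is burning", "the road is clear",
    "can you confirm the fire is out", "where is truck one",
    "can you see the barrier", "is the road clear",
    "where is the wind from", "can you confirm the house is burning",
]
toy_labels = ["Statement"] * 6 + ["Request"] * 6

# Vectoriser and classifier are fitted together inside each fold.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(model, toy_sentences, toy_labels, cv=folds, scoring="accuracy")

print(scores)
print(f"Mean accuracy: {scores.mean():.2f}")
```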


Predicting the speech acts of new sentences with the trained model
Expected speech acts by sentence index: 0 = statement, 1 = request, 2 = request

In [72]:
new_sentences = ["i think you're on it rather next to it. you need to be next to it.",
                 "yeah, i'm putting up northwesterly barrier. you see where firetruck_one, firetruck_two and "
                 "firetruck_three are?",
                 "yeah. confirm all fires extinguished?"]
new_X = vectorizer.transform(new_sentences)
new_predictions = classifier.predict(new_X)

Print predictions for new sentences

In [73]:
for sentence, prediction in zip(new_sentences, new_predictions):
    print(f"Sentence: '{sentence}'\t Predicted Speech Act: {prediction}")

Sentence: 'i think you're on it rather next to it. you need to be next to it.'	 Predicted Speech Act: Statement of Situation
Sentence: 'yeah, i'm putting up northwesterly barrier. you see where firetruck_one, firetruck_two and firetruck_three are?'	 Predicted Speech Act: Statement of Action
Sentence: 'yeah. confirm all fires extinguished?'	 Predicted Speech Act: Request for Situation
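
`predict` returns only the winning label. To see how close the alternatives were, `decision_function` on a `LinearSVC` gives one score per class, and with more than two classes the predicted label is the class with the highest score. A minimal sketch on made-up data (the sentences and the three labels below are illustrative, not taken from the data set):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up labelled sentences for illustration only.
toy_sentences = [
    "the fire is out", "the barrier is up",
    "where is truck one", "can you see the barrier",
    "i will move truck two", "i will put up a barrier",
]
toy_labels = [
    "Statement of Situation", "Statement of Situation",
    "Request for Situation", "Request for Situation",
    "Statement of Intent", "Statement of Intent",
]

toy_model = make_pipeline(TfidfVectorizer(), LinearSVC())
toy_model.fit(toy_sentences, toy_labels)

toy_new = ["where is the fire", "i will go to the lake"]
toy_scores = toy_model.decision_function(toy_new)  # shape: (n_sentences, n_classes)
toy_predictions = toy_model.predict(toy_new)

# List every class with its score, highest first, for each new sentence.
classes = toy_model.classes_
for sentence, row, label in zip(toy_new, toy_scores, toy_predictions):
    ranked = sorted(zip(classes, row), key=lambda pair: pair[1], reverse=True)
    print(f"{sentence!r} -> {label}")
    for name, score in ranked:
        print(f"    {name}: {score:+.2f}")
```

The scores are uncalibrated margins, not probabilities, so they are best used to compare classes within one sentence.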
