### Import Libraries

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import fasttext

import warnings
warnings.filterwarnings("ignore")
```

### FastText Supervised Model
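- Hyperparameters are chosen with fastText's **automatic hyperparameter optimization** (autotune). For `autotuneDuration = 1200` seconds (20 minutes), autotune tries configurations, scores each one on `valid_sentiment.txt`, and keeps the configuration with the best `autotuneMetric`, here precision at 30% recall.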

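```python
Model_Automatic_Hyperparameter_optimization = fasttext.train_supervised(
    input="train_sentiment.txt",
    autotuneValidationFile="valid_sentiment.txt",  # file used to score each candidate configuration
    autotuneDuration=1200,                         # search budget in seconds (20 minutes)
    autotuneMetric="precisionAtRecall:30",         # maximize precision at 30% recall
)
```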
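`autotuneMetric` defaults to F1; here it is set to precision at 30% recall. A minimal sketch of that metric's generic definition follows, assuming predictions are ranked by confidence and every prefix of the ranking counts as one candidate threshold. It illustrates the idea only and is not fastText's internal implementation; the function name and toy data are hypothetical.

```python
from typing import List, Tuple


def precision_at_recall(scored: List[Tuple[float, bool]], recall_target: float) -> float:
    """Best precision over all score thresholds whose recall is >= recall_target.

    `scored` holds (score, is_correct) pairs, one per candidate prediction.
    Generic definition for illustration, not fastText's internal code.
    """
    total_positives = sum(1 for _, correct in scored if correct)
    if total_positives == 0:
        return 0.0
    best = 0.0
    true_pos = 0
    # Walk predictions from most to least confident; each prefix acts as one threshold.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            true_pos += 1
        precision = true_pos / rank
        recall = true_pos / total_positives
        if recall >= recall_target:
            best = max(best, precision)
    return best


# Hypothetical scores: (model confidence, was the prediction correct?)
example = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
print(precision_at_recall(example, 0.30))  # 1.0
```

Optimizing this metric favors models whose most confident predictions are reliable, even if overall accuracy is not maximal.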

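#### Test with Valid_Sentiment & Test_Sentiment
- Pass the path of a **.txt** file in the same labeled format as the training data. `test()` returns a tuple `(number of examples, precision@1, recall@1)`.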
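The data files themselves are not shown. fastText's supervised format puts one example per line: the label (prefixed with `__label__`, the default prefix) followed by the text. A minimal sketch for producing such a file from `(label, text)` pairs; the helper names and the label values `positive`/`negative` are hypothetical, inferred from the `__label__positive` and `__label__negative` predictions in this notebook.

```python
import re
from pathlib import Path
from typing import Iterable, Tuple


def to_fasttext_line(label: str, text: str) -> str:
    """Format one example as '__label__<label> <text>' on a single line."""
    # fastText reads one example per line, so collapse newlines and repeated spaces.
    clean_text = re.sub(r"\s+", " ", text).strip()
    return f"__label__{label} {clean_text}"


def write_fasttext_file(rows: Iterable[Tuple[str, str]], path: str) -> None:
    """Write hypothetical (label, text) pairs to `path`, one example per line."""
    lines = [to_fasttext_line(label, text) for label, text in rows]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


print(to_fasttext_line("positive", "thought provoke insightful\ndocumentary"))
```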

```python
Model_Automatic_Hyperparameter_optimization.test("valid_sentiment.txt")
```

```
(1311, 0.7048054919908466, 0.7048054919908466)
```

- **1311**: the number of examples in `valid_sentiment.txt`.
- **0.7048054919908466** (second value): precision at 1 (P@1), the fraction of top-1 predictions that match the true label.
- **0.7048054919908466** (third value): recall at 1 (R@1), the fraction of true labels recovered by the top-1 predictions.

`test()` uses `k=1` by default, and each example appears to carry exactly one label, so P@1 and R@1 coincide and both equal plain accuracy: about 70.5% of the validation reviews are classified correctly. The `autotuneMetric="precisionAtRecall:30"` setting only controls which hyperparameters autotune selects; it does not change what `test()` reports.
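To see why the two numbers are identical, here is a minimal sketch of the `(N, P@1, R@1)` computation, assuming exactly one gold label and one prediction per example. The function name and labels are hypothetical; recent `fasttext` releases also offer `test_label()` for per-label precision, recall and F1.

```python
from typing import List, Tuple


def precision_recall_at_1(gold: List[str], predicted: List[str]) -> Tuple[int, float, float]:
    """Mimic test()'s (N, P@1, R@1) for single-label data with k=1."""
    n = len(gold)
    correct = sum(g == p for g, p in zip(gold, predicted))
    precision = correct / len(predicted)  # one prediction per example
    recall = correct / len(gold)          # one gold label per example
    return n, precision, recall


# Both denominators equal N, so precision == recall == accuracy.
print(precision_recall_at_1(["pos", "neg", "pos", "neg"], ["pos", "pos", "pos", "neg"]))
```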

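```python
Model_Automatic_Hyperparameter_optimization.test("test_sentiment.txt")
```

```
(3278, 0.697986577181208, 0.697986577181208)
```

On the 3278 test examples, P@1 and R@1 are both about 0.698, i.e. roughly 69.8% accuracy, close to the validation result.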

#### Model Predicts Based on Movie Info & Critics Consensus
- Use the trained autotuned model to predict the sentiment of a given text. `predict()` returns a tuple: the predicted label(s) **(e.g., `__label__positive`)** and a NumPy array with the matching **probability score(s)**. By default (`k=1`) only the top label is returned.

```python
Model_Automatic_Hyperparameter_optimization.predict("end line major documentary imminent peril face world ocean narrate ted danson base book charles clover film explore devastating effect fishing have fish stock health ocean scientist predict continue fish current rate planet completely run fish doomsday warning end line offer real practical solution simple doable include advocate control fishing engender specie protect network marine reserve limit fishing educate consumer choice purchase fish sustainable fishery thought provoke insightful documentary danger commercial fishing")
```

```
(('__label__positive',), array([0.99992275]))
```

```python
Model_Automatic_Hyperparameter_optimization.predict("bill heterosexual movie gregg araki doom generation director self style bad taste teen film amy blue rose mcgowan obnoxious teenage speed freak boyfriend jordan white james duval passive slow witted poseur will sex terrify aids claim virgin day run xavier red johnathon schaech charming enigmatic drifter bad habit kill people join young couple seemingly endless road trip xavier verbally challenge jordan insist call prove threatening repulsive strangely alluring companion presence raise issue loyalty sexual identity doom generation dot variety eccentric cameo appearance include comic margaret cho actress parker posey musician perry farrell hollywood madame heidi fleiss onetime brady bunch star christopher knight middle installment araki teen apocalypse trilogy include totally mark deming rovi unknown")
```

```
(('__label__negative',), array([0.86603749]))
```

```python
Model_Automatic_Hyperparameter_optimization.predict("A young outcast from a primitive tribe is forced to defend his people from a brutal onslaught in Independence Day director Roland Emmerich's fast-paced period adventure. Despite the fact that he is low man on the totem pole in his tribe of fearless hunters, a brave young boy (Steven Strait) longs to win the heart of a beautiful princess (Camilla Belle) who is well above his station in life. When an overwhelming horde of powerful invaders forces the hunters into slavery and abducts the princess, the once-aimless boy suddenly finds his destiny taking an unexpected turn. Now, if he has any hope of saving his tribe from certain extinction, this young boy will have to fight for the future to his dying breath. ~ Jason Buchanan, Rovi")
```

```
(('__label__negative',), array([0.60171354]))
```

This synopsis is passed as raw text (capitalized, punctuated, with stop words), unlike the lowercased, lemmatized inputs in the other cells, which may partly explain the low-confidence negative prediction (about 0.60).
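The preprocessing used to build the training files is not shown. A rough sketch of bringing raw text closer to the form of the other inputs follows, assuming lowercasing, dropping punctuation and digits, and removing stop words is a reasonable approximation. The stop-word list and `normalize` are hypothetical, and no lemmatization is done (so "forced" stays "forced").

```python
import re

# Hypothetical, deliberately small stop-word list; the notebook's real list is unknown.
STOP_WORDS = {
    "a", "an", "the", "is", "in", "of", "to", "his", "he", "from", "who",
    "and", "if", "has", "will", "have", "for", "into", "when", "now", "this",
}


def normalize(text: str) -> str:
    """Lowercase, keep alphabetic tokens only, and drop stop words (no lemmatization)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


print(normalize("A young outcast from a primitive tribe is forced to defend his people."))
```

The normalized string could then be passed to `Model_Automatic_Hyperparameter_optimization.predict(...)`; whether this changes the prediction for the synopsis above has not been checked.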

```python
Model_Automatic_Hyperparameter_optimization.predict("great buster celebrate life career america influential celebrate filmmaker comedian buster keaton singular style fertile output silent era create legacy true cinematic visionary fill stunningly restore archival keaton film cohen film classics library great buster direct peter bogdanovich filmmaker cinema historian landmark writing film renowned director john ford orson welles standard study measure keaton beginning vaudeville circuit chronicle great buster development trademark physical comedy deadpan expression earn lifelong moniker great stone face lead career high year director writer producer star short film feature intersperse interview nearly dozen collaborator filmmaker performer friend include mel brooks quentin tarantino werner herzog dick van dyke johnny knoxville discuss keaton influence modern comedy cinema loss artistic independence career decline mark later year cover bogdanovich cast close eye keaton extraordinary output yield remarkable feature film include general steamboat bill immortalize great actor filmmaker history cinema great buster celebration breathlessly entertaining filmography inspire long overdue primer close essential")
```

```
(('__label__positive',), array([0.99178857]))
```
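The raw `(labels, probabilities)` tuples above are easier to read as a dictionary. A minimal sketch, assuming the default `__label__` prefix; `readable_prediction` is a hypothetical helper, and it accepts the NumPy array returned by `predict()` because it only iterates over it. Calling `predict(text, k=-1)` returns every label with its probability instead of just the top one.

```python
from typing import Dict, Sequence

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def readable_prediction(labels: Sequence[str], probs: Sequence[float]) -> Dict[str, float]:
    """Turn predict()'s (labels, probabilities) pair into {label_name: probability}."""
    return {
        (label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label): float(p)
        for label, p in zip(labels, probs)
    }


# Values copied from the first prediction in this notebook.
print(readable_prediction(("__label__positive",), [0.99992275]))
```

Usage with the trained model: `labels, probs = Model_Automatic_Hyperparameter_optimization.predict(text, k=-1)`, then `readable_prediction(labels, probs)`.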

### FastText Unsupervised Model

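```python
# Learn word vectors from the text with CBOW: 18 epochs, learning rate 0.02, 300-dimensional vectors.
model_unsupervised = fasttext.train_unsupervised("train_sentiment.txt", "cbow", epoch=18, lr=0.02, dim=300)
```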
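The `"cbow"` option (instead of the default `"skipgram"`) trains vectors by predicting each word from the words around it. A minimal sketch of how those (context, target) training pairs are framed follows. It is a simplification: fastText samples the window size uniformly between 1 and `ws` (default 5) at each position and also uses subword vectors, whereas this sketch uses a fixed window and plain words. `cbow_pairs` is a hypothetical helper.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 5) -> List[Tuple[List[str], str]]:
    """Build (context words, target word) pairs the way CBOW frames its prediction task."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # a lone token has nothing to predict it from
            pairs.append((context, target))
    return pairs


for context, target in cbow_pairs(["great", "stone", "face"], window=1):
    print(context, "->", target)
```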

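#### Finding Similar Words with the Unsupervised Model
- `get_nearest_neighbors(word)` returns the words in the model's vocabulary whose vectors are most similar, by cosine similarity, to the vector of the input **word**, as a list of `(score, word)` tuples (the top `k=10` by default). Because fastText also builds vectors from character n-grams, the query word does not have to be in the vocabulary.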

```python
model_unsupervised.get_nearest_neighbors('alien')
```

```
[(0.8837223649024963, 'revenge'),
 (0.8814675807952881, 'survivor'),
 (0.8803136348724365, 'slavery'),
 (0.8730306029319763, 'terrain'),
 (0.8727587461471558, 'scavenge'),
 (0.8726358413696289, 'treat'),
 (0.8716277480125427, 'evil'),
 (0.8713346719741821, 'avenge'),
 (0.8634105920791626, 'savagery'),
 (0.861932635307312, 'ghost')]
```

```python
model_unsupervised.get_nearest_neighbors('heart')
```

```
[(0.964957594871521, 'hearts'),
 (0.9475719928741455, 'heartache'),
 (0.9292012453079224, 'hearty'),
 (0.9243552684783936, 'heartwarme'),
 (0.9219622611999512, 'lift'),
 (0.9112288355827332, 'heartbroken'),
 (0.9054802656173706, 'heartfelt'),
 (0.8925439119338989, 'heartbreak'),
 (0.8909814953804016, 'dreams'),
 (0.8782211542129517, 'dreamy')]
```

```python
model_unsupervised.get_nearest_neighbors('comedy')
```

```
[(0.9100439548492432, 'dramedy'),
 (0.9018700122833252, 'comedic'),
 (0.890731155872345, 'seriocomedy'),
 (0.8893684148788452, 'quirky'),
 (0.8655805587768555, 'comedienne'),
 (0.8475091457366943, 'sharp'),
 (0.8398624658584595, 'chemistry'),
 (0.8391808271408081, 'flashy'),
 (0.8381854891777039, 'bittersweet'),
 (0.836584210395813, 'ladd')]
```

```python
model_unsupervised.get_nearest_neighbors('director')
```

```
[(0.9864038825035095, 'directors'),
 (0.9774737358093262, 'direct'),
 (0.9485283493995667, 'directorial'),
 (0.9075263142585754, 'dire'),
 (0.90326327085495, 'screenwriter'),
 (0.8812114000320435, 'scriptwriter'),
 (0.873555600643158, 'actor'),
 (0.8642226457595825, 'debut'),
 (0.8519041538238525, 'dirk'),
 (0.8476818799972534, 'writer')]
```

- **(similarity_score, word):** each tuple is one word judged similar to the query word (e.g. "alien").
- **similarity_score:** the cosine similarity between the query word's vector and the neighboring word's vector. Values closer to 1 mean the two vectors point in more similar directions; 0.8837223649024963 for "revenge" is a high score.
- **word:** the neighboring word itself.
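The ranking can be sketched in a few lines of plain Python, assuming cosine similarity over a small dictionary of word vectors. The toy 2-dimensional vectors are made up for illustration (the real model has 300 dimensions and a much larger vocabulary), and `nearest_neighbors` is a hypothetical stand-in, not fastText's implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query: str, vectors: Dict[str, List[float]], k: int = 10) -> List[Tuple[float, str]]:
    """Return the k words most cosine-similar to `query`, as (score, word), best first."""
    q = vectors[query]
    scored = [(cosine(q, v), w) for w, v in vectors.items() if w != query]
    scored.sort(reverse=True)
    return scored[:k]


# Made-up toy vectors, for illustration only.
toy_vectors = {
    "heart": [1.0, 0.0],
    "hearts": [0.9, 0.1],
    "lift": [0.5, 0.5],
    "director": [0.0, 1.0],
}
print(nearest_neighbors("heart", toy_vectors, k=2))
```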

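The **model_unsupervised** suggests that words like **_"revenge," "survivor," "slavery," "terrain,"_** and others are close to **_"alien"_** in the vector space learned from the training text, most likely because they appear in similar plot descriptions. Spelling matters too: fastText builds word vectors partly from character n-grams, which is why the lists for "heart" and "director" are dominated by words sharing a stem or prefix (**_"hearts," "heartache"_**; **_"directors," "dire," "dirk"_**).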
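A minimal sketch of the character n-gram idea, assuming fastText's unsupervised defaults `minn=3` and `maxn=6` and its `<`/`>` word-boundary markers. It only shows which n-grams two words share; fastText itself also hashes the n-grams into buckets and combines their vectors with the word's own vector. `char_ngrams` is a hypothetical helper.

```python
from typing import Set


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> Set[str]:
    """Character n-grams of '<word>' for lengths minn..maxn (fastText-style boundaries)."""
    padded = f"<{word}>"
    grams = set()
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return grams


# "director" and "dire" share several prefix n-grams, pulling their vectors together.
shared = char_ngrams("director") & char_ngrams("dire")
print(sorted(shared))
```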