### ENSF 612 Final Project
#### IMDB-Sentiment Analysis of 2022 Movies based on User-Reviews

##### Group Members:
- Samuel Sofela
- Sam Rainbow
- Christopher DiMattia

This document shows the NLP machine learning workflow for evaluating models on movie reviews that were featurized using bag of words.

In [0]:
# Download modules
%pip install nltk

Python interpreter will be restarted.


### Import modules

In [0]:
import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize
from pyspark.ml.feature import CountVectorizer, VectorAssembler
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat, concat_ws, split, udf
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Download the NLTK resources used for stop-word removal, tokenization,
# POS tagging, and lemmatization
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

stop = stopwords.words('english')
spark = SparkSession.builder.appName("612_Proj").config("spark.task.cpus", "2").getOrCreate()


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


### Loading the csv file containing the movie review data.

In [0]:
# Load a CSV file from DBFS into a Spark DataFrame
def csv_to_df(fname):

    # Location of the file in DBFS
    dbfs_loc = "dbfs:/FileStore/shared_uploads/samuel.sofela@ucalgary.ca/"
    filename_complete = dbfs_loc + fname
    filetype = "csv"

    # Options for loading
    inf_sch = "true"      # infer column types from the data
    is_header = "true"    # first row holds the column names
    delim = ","           # field separator
    multiline = "true"    # allow quoted fields that span several lines
    escape = "\""         # a quote inside a quoted field is escaped with another quote

    # Load the data into a dataframe using the options above
    df = spark.read.format(filetype)\
                .option("header", is_header)\
                .option("inferSchema", inf_sch)\
                .option("sep", delim)\
                .option("multiLine", multiline)\
                .option("escape", escape).load(filename_complete)

    return df

In [0]:
# Load CSVs
all_reviews = csv_to_df("IMDB_dataset_rated_final.csv")
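
The `multiline` and `escape` options matter because review text can contain doubled quotes and, possibly, line breaks inside a quoted field. The sketch below illustrates the same quoting rules with Python's standard `csv` module rather than Spark; the sample row is a shortened, hypothetical variant of a review from the dataset, and the embedded line break is an assumption added for illustration.

```python
import csv
import io

# A quoted field containing doubled quotes ("") and an embedded newline.
# The row is illustrative only; it is not copied verbatim from the dataset.
raw = (
    'Review Title,Review Content\n'
    '"An okay product","""Lightyear"" is a decent\n'
    'space adventure movie"\n'
)

rows = list(csv.reader(io.StringIO(raw)))

header = rows[0]
title, content = rows[1]

# The doubled quotes collapse to single quotes, and the newline stays
# inside the field instead of starting a new record.
print(header)
print(repr(content))
```

A parser that treated every line break as a record boundary would split this review into two malformed rows, which is the situation the loading options are meant to avoid.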

In [0]:
# Concatenate the review title and review content with a space in between
all_reviews = all_reviews.withColumn("Complete_Content", concat_ws(" ", "Review Title", "Review Content"))

### Data Preprocessing

In [0]:
# Function to clean text and convert all letters to lower case
def clean_text(text):
    text = ''.join(c for c in text if c.isalpha() or c.isspace())
    text = text.lower()
    return text

# Function to tokenize the review and remove stop words
def remove_stopwords(text):
    stop = set(nltk.corpus.stopwords.words('english'))
    tokens = word_tokenize(text)
    filtered_tokens = [token for token in tokens if token.lower() not in stop]
    
    return filtered_tokens

# Function to lemmatize tokens with POS
def lemmatize_text_pos(filtered_tokens):
    lemmatizer = WordNetLemmatizer()
    pos_tagged = nltk.pos_tag(filtered_tokens)
    lemmatized_tokens = [lemmatizer.lemmatize(token, get_wordnet_pos(pos)) if get_wordnet_pos(pos) else token for token, pos in pos_tagged]
    return lemmatized_tokens

# Function to apply Snowball stemming
def stemmer_snowball(filtered_tokens):
    snowball = SnowballStemmer(language = 'english')
    stemmed_words = [snowball.stem(word) for word in filtered_tokens]
    return stemmed_words

# Helper function to get the WordNet POS tag from the NLTK POS tag
def get_wordnet_pos(nltk_pos_tag):
    if nltk_pos_tag.startswith('J'):
        return 'a'
    elif nltk_pos_tag.startswith('V'):
        return 'v'
    elif nltk_pos_tag.startswith('N'):
        return 'n'
    elif nltk_pos_tag.startswith('R'):
        return 'r'
    else:
        return None

# Function that applies the cleaning, stop word removal, and lemmatization functions
def preprocess_text(text):
    cleaned = clean_text(text)
    filtered = remove_stopwords(cleaned)
    lemmatized = lemmatize_text_pos(filtered)
    return lemmatized

#Function applies cleaning, stop word removal and snowball stemming functions
def preprocess_text_stemming(text):
    cleaned = clean_text(text)
    filtered = remove_stopwords(cleaned)
    stemmed = stemmer_snowball(filtered)
    return stemmed

# Creating UDF for Preprocessing text using lemmatization
preprocess_text_udf = udf(preprocess_text, ArrayType(StringType()))

# Creating UDF Preprocessing text using snowball stemming
preprocess_text_udf_stemmed = udf(preprocess_text_stemming, ArrayType(StringType()))

# Apply the UDFs to the "Complete_Content" column (review title + review content)
# and create new columns for the lemmatized and stemmed tokens
df = all_reviews.withColumn("lemmatized_text", preprocess_text_udf("Complete_Content"))
df = df.withColumn("stemmed_text", preprocess_text_udf_stemmed("Complete_Content"))
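
As a quick illustration of the cleaning step, the sketch below runs the same `clean_text` logic on a review title from the dataset. A plain whitespace split stands in for `word_tokenize` here (an assumption made so the example runs without NLTK), and no stop-word removal is applied.

```python
def clean_text(text):
    # Keep only letters and whitespace, then lower-case (same logic as the cell above)
    text = ''.join(c for c in text if c.isalpha() or c.isspace())
    return text.lower()

title = "Really don't get the hate"
cleaned = clean_text(title)

# Whitespace split is a stand-in for nltk's word_tokenize in this sketch
tokens = cleaned.split()

print(cleaned)
print(tokens)
print(clean_text("open-mind"))
```

Because apostrophes and hyphens are dropped rather than replaced by a space, contractions and hyphenated words are fused into single tokens, which is why tokens such as `dont`, `wasnt`, and `openmind` appear in the processed columns.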


In [0]:
# create a new column "Month" by splitting the "Review Date" column on dash and accessing the second element
df = df.withColumn("Month", split(col("Review Date"), "-")[1])
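
The same extraction can be checked in plain Python on a date string in the format the dataset uses (day-month-year separated by dashes). This is only an illustration of the indexing, not the Spark call itself.

```python
# "Review Date" values look like "17-Jul-22"
review_date = "17-Jul-22"

# Splitting on "-" gives ["17", "Jul", "22"]; index 1 is the month abbreviation
parts = review_date.split("-")
month = parts[1]

print(parts)
print(month)
```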

In [0]:
# Applying Lemmatization and Stemming to genre column
df = df.withColumn("genre_lemmatized", preprocess_text_udf("Movie Genre"))
df = df.withColumn("genre_stemmed", preprocess_text_udf_stemmed("Movie Genre"))

### Feature Engineering

A count vectorizer was used to engineer the text features into a Bag of Words representation, which was then used to train and evaluate the machine learning models.
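
To make the representation concrete, here is a minimal pure-Python sketch of a bag-of-words encoding. It is not Spark's implementation: the helper names `fit_vocabulary` and `to_sparse_counts` are hypothetical, and the alphabetical tie-breaking is an assumption made only to keep the output deterministic. Like Spark's `CountVectorizer`, it orders the vocabulary by corpus term frequency, so frequent tokens receive low indices.

```python
from collections import Counter

def fit_vocabulary(docs):
    """Map each token to an index, most frequent token first."""
    counts = Counter(token for doc in docs for token in doc)
    # Ties are broken alphabetically purely to make this sketch deterministic
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {token: i for i, token in enumerate(ordered)}

def to_sparse_counts(doc, vocab):
    """Return {index: count} for the tokens of one document."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return dict(sorted(counts.items()))

# Two tiny tokenized "reviews" (illustrative data)
docs = [
    ["movie", "toy", "toy", "movie", "movie"],
    ["film", "toy", "good"],
]

vocab = fit_vocabulary(docs)
vectors = [to_sparse_counts(d, vocab) for d in docs]

print(vocab)
print(vectors)
```

Each document becomes a sparse mapping from vocabulary index to count, which mirrors the `indices` and `values` lists of the sparse vectors produced in the cells below.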

#### Generating vector of reviews

In [0]:
# Using Count Vectorizer for feature engineering on the lemmatized review text
cv = CountVectorizer(inputCol="lemmatized_text", outputCol="lemmatized_text_vector")
cv_model = cv.fit(df)
count_vectorized_df = cv_model.transform(df)

In [0]:
# Using Count Vectorizer for feature engineering on the stemmed review text
from pyspark.ml.feature import IDF
cv = CountVectorizer(inputCol="stemmed_text", outputCol="stemmed_text_vector")
cv_model = cv.fit(count_vectorized_df)
count_vectorized_df = cv_model.transform(count_vectorized_df)

#### Generating vectors of genres

In [0]:
# Using Count Vectorizer for feature engineering on the lemmatized genre
cv = CountVectorizer(inputCol="genre_lemmatized", outputCol="genre_lemmatized_vector")
cv_model = cv.fit(count_vectorized_df)
count_vectorized_df = cv_model.transform(count_vectorized_df)

In [0]:
# Using Count Vectorizer for feature engineering on the stemmed genre
cv = CountVectorizer(inputCol="genre_stemmed", outputCol="genre_stemmed_vector")
cv_model = cv.fit(count_vectorized_df)
count_vectorized_df = cv_model.transform(count_vectorized_df)

#### Generating vectors for month

In [0]:
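# Wrap each month string in a single-element array, since CountVectorizer expects an array of strings
count_vectorized_df = count_vectorized_df.withColumn("Month_Vector", split(count_vectorized_df["Month"], ","))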

In [0]:
# Using Count Vectorizer for feature engineering on the month text
cv = CountVectorizer(inputCol="Month_Vector", outputCol="Month_vectorized")
cv_model = cv.fit(count_vectorized_df)
count_vectorized_df = cv_model.transform(count_vectorized_df)

#### Combining vectors of review, month and genre

The generated vectors for the review, month, and genre are combined into one vector using a VectorAssembler.

In [0]:

# Create two VectorAssembler objects: one for the lemmatized text and one for the stemmed text,
# each combined with the month and genre vectors.
# Note: both assemblers use genre_lemmatized_vector for the genre.
assembler = VectorAssembler(inputCols=["lemmatized_text_vector", "Month_vectorized", "genre_lemmatized_vector"], outputCol="combined_vectors")
assembler1 = VectorAssembler(inputCols=["stemmed_text_vector", "Month_vectorized", "genre_lemmatized_vector"], outputCol="combined_vectors_stem")

# Apply the created VectorAssembler objects to the DataFrame
assembled_df = assembler.transform(count_vectorized_df)
assembled_df = assembler1.transform(assembled_df)
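
The concatenation performed by the assembler can be illustrated with a small pure-Python sketch. The helper `concat_sparse` is hypothetical and is not part of PySpark; the lengths (11604, 12, 17) and the month and genre indices are taken from the first row of the output below, while the review vector is abbreviated to three of its entries.

```python
def concat_sparse(parts):
    """Concatenate sparse vectors given as (length, {index: value}) pairs."""
    offset = 0
    combined = {}
    for length, entries in parts:
        for index, value in entries.items():
            # Shift each index by the total length of the vectors before it
            combined[offset + index] = value
        offset += length
    return offset, combined

review = (11604, {1: 1.0, 4: 1.0, 1054: 1.0})  # abbreviated lemmatized_text_vector
month = (12, {4: 1.0})                         # Month_vectorized
genre = (17, {1: 1.0, 4: 1.0, 7: 1.0})         # genre_lemmatized_vector

length, combined = concat_sparse([review, month, genre])

print(length)
print(sorted(combined))
```

The month entry lands at 11604 + 4 = 11608 and the genre entries at 11604 + 12 + {1, 4, 7} = 11617, 11620, 11623, which matches the trailing indices of `combined_vectors` for that row.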


In [0]:
#assembled_df.display()

_c0,Unnamed: 0,Movie Title,Movie Rating,Movie Genre,Review Date,Review Title,Review Content,ReviewerRating,Manual_Combined,Rating_scaled,Complete_Content,lemmatized_text,stemmed_text,Month,genre_lemmatized,genre_stemmed,lemmatized_text_vector,stemmed_text_vector,genre_lemmatized_vector,genre_stemmed_vector,Month_Vector,Month_vectorized,combined_vectors,combined_vectors_stem
0,0,Lightyear,6.1,Animation Action Adventure,17-Jul-22,Really don't get the hate,It wasn't the best Pixar film but it definitely wasn't the worst. Why it gets such a low score on here I'll never know. I'll admit the trailer did make it look better than it actually was but it was still decent and funny.,7,0,1,Really don't get the hate It wasn't the best Pixar film but it definitely wasn't the worst. Why it gets such a low score on here I'll never know. I'll admit the trailer did make it look better than it actually was but it was still decent and funny.,"List(really, dont, get, hate, wasnt, best, pixar, film, definitely, wasnt, bad, get, low, score, ill, never, know, ill, admit, trailer, make, look, good, actually, still, decent, funny)","List(realli, dont, get, hate, wasnt, best, pixar, film, definit, wasnt, worst, get, low, score, ill, never, know, ill, admit, trailer, make, look, better, actual, still, decent, funni)",Jul,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(1, 4, 6, 11, 14, 21, 27, 33, 40, 42, 59, 75, 106, 117, 140, 151, 154, 203, 236, 360, 396, 518, 693, 1054), values -> List(1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(1, 9, 10, 11, 25, 38, 39, 43, 45, 63, 81, 83, 137, 139, 161, 175, 233, 273, 335, 397, 403, 523, 552, 884), values -> List(1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jul),"Map(vectorType -> sparse, length -> 12, indices -> List(4), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(1, 4, 6, 11, 14, 21, 27, 33, 40, 
42, 59, 75, 106, 117, 140, 151, 154, 203, 236, 360, 396, 518, 693, 1054, 11608, 11617, 11620, 11623), values -> List(1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(1, 9, 10, 11, 25, 38, 39, 43, 45, 63, 81, 83, 137, 139, 161, 175, 233, 273, 335, 397, 403, 523, 552, 884, 9582, 9591, 9594, 9597), values -> List(1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
1,1,Lightyear,6.1,Animation Action Adventure,19-Nov-22,"It's an okay product, but wrong I.P.","""In 1995, Andy got a toy. That toy was based on a movie. This is that movie. "" I think I would've gotten more out of this if they'd left out that opening text. ""Lightyear"" is a decent space adventure movie, the visuals are pretty great and Chris Evans gives a strong performance. But the idea that the Tim Allen Buzz we all know and love came from this movie . . . Eh, I just couldn't get there. There's a bigger ""Top Gun"" vibe going on here than anything ""Toy Story"" related and ""Maverick"" did a much better job of living with failure. I admit I walked into this with low expectations (really, was anyone asking for this movie?) but the Toy Story association is totally forced.",6,0,0,"It's an okay product, but wrong I.P. ""In 1995, Andy got a toy. That toy was based on a movie. This is that movie. "" I think I would've gotten more out of this if they'd left out that opening text. ""Lightyear"" is a decent space adventure movie, the visuals are pretty great and Chris Evans gives a strong performance. But the idea that the Tim Allen Buzz we all know and love came from this movie . . . Eh, I just couldn't get there. There's a bigger ""Top Gun"" vibe going on here than anything ""Toy Story"" related and ""Maverick"" did a much better job of living with failure. I admit I walked into this with low expectations (really, was anyone asking for this movie?) 
but the Toy Story association is totally forced.","List(okay, product, wrong, ip, andy, get, toy, toy, base, movie, movie, think, wouldve, gotten, theyd, leave, open, text, lightyear, decent, space, adventure, movie, visuals, pretty, great, chris, evans, give, strong, performance, idea, tim, allen, buzz, know, love, come, movie, eh, couldnt, get, theres, big, top, gun, vibe, go, anything, toy, story, relate, maverick, much, good, job, living, failure, admit, walk, low, expectation, really, anyone, ask, movie, toy, story, association, totally, force)","List(okay, product, wrong, ip, andi, got, toy, toy, base, movi, movi, think, wouldv, gotten, theyd, left, open, text, lightyear, decent, space, adventur, movi, visual, pretti, great, chris, evan, give, strong, perform, idea, tim, allen, buzz, know, love, came, movi, eh, couldnt, get, there, bigger, top, gun, vibe, go, anyth, toy, stori, relat, maverick, much, better, job, live, failur, admit, walk, low, expect, realli, anyon, ask, movi, toy, stori, associ, total, forc)",Nov,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(0, 4, 5, 11, 12, 14, 15, 16, 20, 26, 29, 33, 37, 52, 85, 95, 118, 128, 150, 151, 167, 236, 241, 255, 262, 279, 296, 318, 332, 346, 347, 349, 363, 398, 403, 449, 481, 488, 570, 573, 702, 835, 877, 970, 1004, 1027, 1054, 1076, 1184, 1326, 1436, 1556, 1727, 2381, 2628, 2797, 3371, 4217, 4673, 4902, 5537, 8659), values -> List(5.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(0, 4, 10, 11, 12, 15, 20, 22, 31, 39, 43, 46, 50, 56, 103, 113, 130, 132, 155, 161, 169, 191, 198, 211, 239, 271, 273, 279, 284, 308, 343, 
362, 374, 375, 384, 392, 394, 395, 414, 492, 543, 544, 574, 616, 698, 884, 901, 1048, 1082, 1111, 1259, 1412, 1559, 1566, 1641, 2254, 2314, 2663, 3228, 3863, 4441, 4551, 4584), values -> List(5.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Nov),"Map(vectorType -> sparse, length -> 12, indices -> List(7), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(0, 4, 5, 11, 12, 14, 15, 16, 20, 26, 29, 33, 37, 52, 85, 95, 118, 128, 150, 151, 167, 236, 241, 255, 262, 279, 296, 318, 332, 346, 347, 349, 363, 398, 403, 449, 481, 488, 570, 573, 702, 835, 877, 970, 1004, 1027, 1054, 1076, 1184, 1326, 1436, 1556, 1727, 2381, 2628, 2797, 3371, 4217, 4673, 4902, 5537, 8659, 11611, 11617, 11620, 11623), values -> List(5.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(0, 4, 10, 11, 12, 15, 20, 22, 31, 39, 43, 46, 50, 56, 103, 113, 130, 132, 155, 161, 169, 191, 198, 211, 239, 271, 273, 279, 284, 308, 343, 362, 374, 375, 384, 392, 394, 395, 414, 492, 543, 544, 574, 616, 698, 884, 901, 1048, 1082, 1111, 1259, 1412, 1559, 1566, 1641, 2254, 2314, 2663, 3228, 3863, 4441, 4551, 4584, 9585, 9591, 9594, 9597), values -> List(5.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
2,2,Lightyear,6.1,Animation Action Adventure,16-Jun-22,No signs of intelligent life anywhere....,"6.4/10 The nostalgia was there for ten minutes and then it went away as the story trails away into something that just didn't sit right. While we never got firm info about Buzz's origins in the ""Toy Story"" films. This isn't what I expected for his story. I expected this to be an emotional journey, but there were maybe three moments where a tear was shed. The rest was just ""empty"" space. The voice acting was good. Chris Evans sounded like Buzz and brought the character to life. The surrounding characters were just as enjoyable. Definitely some funny moments amongst them. I did get a few laughs out, but it felt forced sometimes to be able to enjoy the film. The film definitely looked, and sounded, great, but it didn't do enough to grasp a love for the character, or the story. Honestly, I don't think this film was for adults to reach back to the childhood nostalgia, but rather its for a new generation to make the character it's own meaning. Flip side to Buzz was Zurg. Zurg was perhaps the biggest disappointment with how they portrayed him. Who he was in ""Toy Story"" is not the same as who is here, and again, it didn't sit right. I think the biggest thing is to forget the trailer. Go in with an open-mind and make of it what you will. The kids in the theatre seemed to have loved. Overall, not the outstanding film I was expecting even with all the great visual and sound. The story lacked expectations for a nostalgic character, but wasn't necessarily terrible, so again, go in with an open-mind. Thank you for reading my review. Until next time.... Enjoy the show!",6,-1,0,"No signs of intelligent life anywhere.... 6.4/10 The nostalgia was there for ten minutes and then it went away as the story trails away into something that just didn't sit right. While we never got firm info about Buzz's origins in the ""Toy Story"" films. This isn't what I expected for his story. 
I expected this to be an emotional journey, but there were maybe three moments where a tear was shed. The rest was just ""empty"" space. The voice acting was good. Chris Evans sounded like Buzz and brought the character to life. The surrounding characters were just as enjoyable. Definitely some funny moments amongst them. I did get a few laughs out, but it felt forced sometimes to be able to enjoy the film. The film definitely looked, and sounded, great, but it didn't do enough to grasp a love for the character, or the story. Honestly, I don't think this film was for adults to reach back to the childhood nostalgia, but rather its for a new generation to make the character it's own meaning. Flip side to Buzz was Zurg. Zurg was perhaps the biggest disappointment with how they portrayed him. Who he was in ""Toy Story"" is not the same as who is here, and again, it didn't sit right. I think the biggest thing is to forget the trailer. Go in with an open-mind and make of it what you will. The kids in the theatre seemed to have loved. Overall, not the outstanding film I was expecting even with all the great visual and sound. The story lacked expectations for a nostalgic character, but wasn't necessarily terrible, so again, go in with an open-mind. Thank you for reading my review. Until next time.... 
Enjoy the show!","List(sign, intelligent, life, anywhere, nostalgia, ten, minute, go, away, story, trail, away, something, didnt, sit, right, never, get, firm, info, buzz, origins, toy, story, film, isnt, expect, story, expect, emotional, journey, maybe, three, moment, tear, shed, rest, empty, space, voice, act, good, chris, evans, sound, like, buzz, bring, character, life, surround, character, enjoyable, definitely, funny, moment, amongst, get, laugh, felt, force, sometimes, able, enjoy, film, film, definitely, look, sounded, great, didnt, enough, grasp, love, character, story, honestly, dont, think, film, adult, reach, back, childhood, nostalgia, rather, new, generation, make, character, meaning, flip, side, buzz, zurg, zurg, perhaps, big, disappointment, portray, toy, story, didnt, sit, right, think, big, thing, forget, trailer, go, openmind, make, kid, theatre, seem, loved, overall, outstanding, film, expect, even, great, visual, sound, story, lack, expectation, nostalgic, character, wasnt, necessarily, terrible, go, openmind, thank, read, review, next, time, enjoy, show)","List(sign, intellig, life, anywher, nostalgia, ten, minut, went, away, stori, trail, away, someth, didnt, sit, right, never, got, firm, info, buzz, origin, toy, stori, film, isnt, expect, stori, expect, emot, journey, mayb, three, moment, tear, shed, rest, empti, space, voic, act, good, chris, evan, sound, like, buzz, brought, charact, life, surround, charact, enjoy, definit, funni, moment, amongst, get, laugh, felt, forc, sometim, abl, enjoy, film, film, definit, look, sound, great, didnt, enough, grasp, love, charact, stori, honest, dont, think, film, adult, reach, back, childhood, nostalgia, rather, new, generat, make, charact, mean, flip, side, buzz, zurg, zurg, perhap, biggest, disappoint, portray, toy, stori, didnt, sit, right, think, biggest, thing, forget, trailer, go, openmind, make, kid, theatr, seem, love, overal, outstand, film, expect, even, great, visual, sound, stori, lack, 
expect, nostalg, charact, wasnt, necessarili, terribl, go, openmind, thank, read, review, next, time, enjoy, show)",Jun,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(1, 2, 4, 5, 6, 9, 10, 11, 12, 16, 17, 20, 24, 27, 29, 32, 40, 49, 60, 64, 65, 67, 69, 70, 73, 74, 75, 78, 85, 87, 91, 93, 101, 109, 111, 116, 117, 140, 153, 154, 168, 179, 189, 190, 191, 208, 218, 220, 229, 231, 235, 243, 249, 265, 317, 319, 346, 347, 360, 363, 393, 429, 445, 449, 453, 481, 492, 493, 535, 544, 570, 587, 603, 698, 741, 797, 847, 910, 927, 934, 1076, 1088, 1138, 1153, 1169, 1191, 1240, 1296, 1391, 1770, 1870, 1875, 2122, 2392, 2646, 2692, 2751, 2915, 3192, 3303, 3536, 4884, 4939, 5022), values -> List(5.0, 1.0, 1.0, 6.0, 2.0, 1.0, 5.0, 2.0, 3.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(1, 2, 4, 5, 7, 8, 9, 11, 14, 15, 20, 21, 22, 25, 27, 31, 35, 38, 53, 56, 62, 67, 68, 73, 74, 80, 81, 82, 84, 85, 92, 97, 105, 113, 117, 119, 124, 125, 137, 139, 155, 168, 175, 190, 200, 206, 210, 217, 220, 226, 236, 237, 248, 258, 260, 261, 270, 326, 338, 342, 353, 359, 374, 384, 403, 414, 433, 461, 492, 529, 579, 616, 627, 644, 666, 735, 857, 860, 891, 943, 955, 960, 974, 1056, 1081, 1095, 1111, 1213, 1219, 1349, 1427, 1745, 1917, 1994, 2274, 2446, 2546, 2848, 3052, 3441, 3788, 5116), values -> List(5.0, 1.0, 6.0, 1.0, 1.0, 5.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 
1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jun),"Map(vectorType -> sparse, length -> 12, indices -> List(8), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(1, 2, 4, 5, 6, 9, 10, 11, 12, 16, 17, 20, 24, 27, 29, 32, 40, 49, 60, 64, 65, 67, 69, 70, 73, 74, 75, 78, 85, 87, 91, 93, 101, 109, 111, 116, 117, 140, 153, 154, 168, 179, 189, 190, 191, 208, 218, 220, 229, 231, 235, 243, 249, 265, 317, 319, 346, 347, 360, 363, 393, 429, 445, 449, 453, 481, 492, 493, 535, 544, 570, 587, 603, 698, 741, 797, 847, 910, 927, 934, 1076, 1088, 1138, 1153, 1169, 1191, 1240, 1296, 1391, 1770, 1870, 1875, 2122, 2392, 2646, 2692, 2751, 2915, 3192, 3303, 3536, 4884, 4939, 5022, 11612, 11617, 11620, 11623), values -> List(5.0, 1.0, 1.0, 6.0, 2.0, 1.0, 5.0, 2.0, 3.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(1, 2, 4, 5, 7, 8, 9, 11, 14, 15, 20, 21, 22, 25, 27, 31, 35, 38, 53, 56, 62, 67, 68, 73, 74, 80, 81, 82, 84, 85, 92, 97, 105, 
113, 117, 119, 124, 125, 137, 139, 155, 168, 175, 190, 200, 206, 210, 217, 220, 226, 236, 237, 248, 258, 260, 261, 270, 326, 338, 342, 353, 359, 374, 384, 403, 414, 433, 461, 492, 529, 579, 616, 627, 644, 666, 735, 857, 860, 891, 943, 955, 960, 974, 1056, 1081, 1095, 1111, 1213, 1219, 1349, 1427, 1745, 1917, 1994, 2274, 2446, 2546, 2848, 3052, 3441, 3788, 5116, 9586, 9591, 9594, 9597), values -> List(5.0, 1.0, 6.0, 1.0, 1.0, 5.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 1.0, 1.0, 2.0, 3.0, 1.0, 1.0, 4.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
3,3,Lightyear,6.1,Animation Action Adventure,30-Jun-22,The movie is ok. Many of this film's reviewers seem nuts,"This film seems to spark a lot of people to say that this film is completely terrible or that it's completely fabulous because it included a gay relationship in the periphery of the main story, and to an even lesser extent, a convict character who is sympathetic. So. I'll address that quickly: None of that seemed like a big selling-point or issue with the movie to me. And those reviewer who make the movie all about those things seem to say more about themself in their review than the movie itself. Overall I found the movie better than what I was expecting based on it's IMDB score, but still far from being a Pixar classic. The animation is very good, the story is solid even if it feels like a lot of the ideas have been done before in different ways and feels like a few ideas didn't quite work as well as possible. In some ways it reminded me of an inferior ""Up"". The main protagonist's circumstance were less sympathetic and more a result of his actual decisions. Overall I wonder how the movie would have looked if it had explored a B storyline for another character during some of the time-jumps. I also think the twist could have been established a little better than it was. The voice acting for the film is fine. I didn't think a lot of the comedy landed well for me. But it wasn't awful like a lot of kids movies either and didn't detract much from the drama of the main story. Overall it will be acceptable to most audiences, with the robot cat being a likely favorite character for many viewers. Most kids will probably like it a bit more than adults and not notice or care much about the politically controversial content unless their guardians have already primed them to have an major opinion about it like them. I'd say it's a good time at the theater and better than the average score right now of 5.8. 
But don't expect it to be a classic Pixar film like Coco, Up, Walle, or Toy Story 1-3.",7,0,1,"The movie is ok. Many of this film's reviewers seem nuts This film seems to spark a lot of people to say that this film is completely terrible or that it's completely fabulous because it included a gay relationship in the periphery of the main story, and to an even lesser extent, a convict character who is sympathetic. So. I'll address that quickly: None of that seemed like a big selling-point or issue with the movie to me. And those reviewer who make the movie all about those things seem to say more about themself in their review than the movie itself. Overall I found the movie better than what I was expecting based on it's IMDB score, but still far from being a Pixar classic. The animation is very good, the story is solid even if it feels like a lot of the ideas have been done before in different ways and feels like a few ideas didn't quite work as well as possible. In some ways it reminded me of an inferior ""Up"". The main protagonist's circumstance were less sympathetic and more a result of his actual decisions. Overall I wonder how the movie would have looked if it had explored a B storyline for another character during some of the time-jumps. I also think the twist could have been established a little better than it was. The voice acting for the film is fine. I didn't think a lot of the comedy landed well for me. But it wasn't awful like a lot of kids movies either and didn't detract much from the drama of the main story. Overall it will be acceptable to most audiences, with the robot cat being a likely favorite character for many viewers. Most kids will probably like it a bit more than adults and not notice or care much about the politically controversial content unless their guardians have already primed them to have an major opinion about it like them. I'd say it's a good time at the theater and better than the average score right now of 5.8. 
But don't expect it to be a classic Pixar film like Coco, Up, Walle, or Toy Story 1-3.","List(movie, ok, many, film, reviewer, seem, nuts, film, seem, spark, lot, people, say, film, completely, terrible, completely, fabulous, include, gay, relationship, periphery, main, story, even, lesser, extent, convict, character, sympathetic, ill, address, quickly, none, seem, like, big, sellingpoint, issue, movie, reviewer, make, movie, thing, seem, say, themself, review, movie, overall, find, movie, well, expect, base, imdb, score, still, far, pixar, classic, animation, good, story, solid, even, feel, like, lot, idea, do, different, way, feel, like, idea, didnt, quite, work, well, possible, way, remind, inferior, main, protagonist, circumstance, less, sympathetic, result, actual, decision, overall, wonder, movie, would, look, explored, b, storyline, another, character, timejumps, also, think, twist, could, establish, little, good, voice, act, film, fine, didnt, think, lot, comedy, land, well, wasnt, awful, like, lot, kid, movie, either, didnt, detract, much, drama, main, story, overall, acceptable, audience, robot, cat, likely, favorite, character, many, viewer, kid, probably, like, bit, adult, notice, care, much, politically, controversial, content, unless, guardian, already, prim, major, opinion, like, id, say, good, time, theater, well, average, score, right, dont, expect, classic, pixar, film, like, coco, walle, toy, story)","List(movi, ok, mani, film, review, seem, nut, film, seem, spark, lot, peopl, say, film, complet, terribl, complet, fabul, includ, gay, relationship, peripheri, main, stori, even, lesser, extent, convict, charact, sympathet, ill, address, quick, none, seem, like, big, sellingpoint, issu, movi, review, make, movi, thing, seem, say, themself, review, movi, overal, found, movi, better, expect, base, imdb, score, still, far, pixar, classic, anim, good, stori, solid, even, feel, like, lot, idea, done, differ, way, feel, like, idea, didnt, quit, work, 
well, possibl, way, remind, inferior, main, protagonist, circumst, less, sympathet, result, actual, decis, overal, wonder, movi, would, look, explor, b, storylin, anoth, charact, timejump, also, think, twist, could, establish, littl, better, voic, act, film, fine, didnt, think, lot, comedi, land, well, wasnt, aw, like, lot, kid, movi, either, didnt, detract, much, drama, main, stori, overal, accept, audienc, robot, cat, like, favorit, charact, mani, viewer, kid, probabl, like, bit, adult, notic, care, much, polit, controversi, content, unless, guardian, alreadi, prime, major, opinion, like, id, say, good, time, theater, better, averag, score, right, dont, expect, classic, pixar, film, like, coco, wall, toy, stori)",Jun,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(0, 1, 2, 4, 5, 6, 9, 10, 13, 15, 17, 18, 20, 22, 23, 24, 25, 27, 32, 34, 36, 40, 41, 42, 43, 45, 47, 50, 55, 64, 69, 70, 84, 85, 88, 90, 93, 99, 111, 113, 114, 116, 122, 133, 141, 154, 156, 163, 167, 168, 189, 202, 203, 209, 221, 226, 234, 241, 249, 250, 277, 284, 288, 293, 305, 316, 336, 340, 363, 372, 390, 396, 401, 407, 409, 478, 491, 492, 504, 518, 523, 553, 582, 591, 593, 595, 608, 632, 731, 743, 751, 805, 806, 860, 917, 960, 973, 997, 1109, 1194, 1335, 1412, 1753, 1814, 1827, 1844, 1867, 2003, 2413, 2522, 2830, 3020, 4915, 5835, 5836, 6035, 6181, 6414, 6741, 6907, 7002, 9390, 10552, 10553, 10850), values -> List(7.0, 5.0, 7.0, 3.0, 4.0, 1.0, 1.0, 3.0, 4.0, 2.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 2.0, 1.0, 1.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(0, 1, 2, 4, 5, 7, 8, 9, 12, 14, 16, 17, 19, 21, 24, 25, 27, 29, 31, 32, 33, 38, 41, 42, 43, 45, 51, 55, 56, 58, 62, 73, 76, 82, 83, 89, 90, 96, 104, 110, 119, 122, 123, 125, 128, 131, 150, 171, 175, 179, 183, 186, 190, 191, 210, 223, 228, 233, 250, 260, 262, 271, 272, 275, 277, 292, 312, 324, 356, 366, 387, 397, 414, 415, 422, 427, 442, 457, 464, 523, 528, 529, 531, 535, 554, 578, 592, 603, 634, 638, 645, 656, 692, 734, 797, 870, 874, 910, 924, 939, 1020, 1028, 1053, 1139, 1140, 1398, 1452, 1724, 1737, 1895, 1943, 2408, 2497, 2604, 2653, 2832, 3249, 3774, 4382, 5907, 6397, 7812, 7897, 8561), values -> List(7.0, 5.0, 8.0, 4.0, 2.0, 1.0, 3.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 1.0, 1.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0, 1.0, 4.0, 3.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jun),"Map(vectorType -> sparse, length -> 12, indices -> List(8), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(0, 1, 2, 4, 5, 6, 9, 10, 13, 15, 17, 18, 20, 22, 23, 24, 25, 27, 32, 34, 36, 40, 41, 42, 43, 45, 47, 50, 55, 64, 69, 70, 84, 85, 88, 90, 93, 99, 111, 113, 114, 116, 122, 133, 141, 154, 156, 163, 167, 168, 189, 202, 203, 209, 221, 226, 234, 
241, 249, 250, 277, 284, 288, 293, 305, 316, 336, 340, 363, 372, 390, 396, 401, 407, 409, 478, 491, 492, 504, 518, 523, 553, 582, 591, 593, 595, 608, 632, 731, 743, 751, 805, 806, 860, 917, 960, 973, 997, 1109, 1194, 1335, 1412, 1753, 1814, 1827, 1844, 1867, 2003, 2413, 2522, 2830, 3020, 4915, 5835, 5836, 6035, 6181, 6414, 6741, 6907, 7002, 9390, 10552, 10553, 10850, 11612, 11617, 11620, 11623), values -> List(7.0, 5.0, 7.0, 3.0, 4.0, 1.0, 1.0, 3.0, 4.0, 2.0, 2.0, 1.0, 2.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 4.0, 1.0, 2.0, 1.0, 1.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(0, 1, 2, 4, 5, 7, 8, 9, 12, 14, 16, 17, 19, 21, 24, 25, 27, 29, 31, 32, 33, 38, 41, 42, 43, 45, 51, 55, 56, 58, 62, 73, 76, 82, 83, 89, 90, 96, 104, 110, 119, 122, 123, 125, 128, 131, 150, 171, 175, 179, 183, 186, 190, 191, 210, 223, 228, 233, 250, 260, 262, 271, 272, 275, 277, 292, 312, 324, 356, 366, 387, 397, 414, 415, 422, 427, 442, 457, 464, 523, 528, 529, 531, 535, 554, 578, 592, 603, 634, 638, 645, 656, 692, 734, 797, 870, 874, 910, 924, 939, 1020, 1028, 1053, 1139, 1140, 1398, 1452, 1724, 1737, 1895, 1943, 2408, 2497, 2604, 2653, 2832, 3249, 3774, 4382, 5907, 6397, 7812, 7897, 8561, 9586, 9591, 9594, 9597), values -> List(7.0, 5.0, 8.0, 4.0, 2.0, 1.0, 3.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 3.0, 1.0, 1.0, 4.0, 3.0, 1.0, 2.0, 1.0, 2.0, 1.0, 4.0, 3.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 1.0, 
1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
4,4,Lightyear,6.1,Animation Action Adventure,16-Jun-22,This isn't as bad as everyone is claiming!,"I don't get the bad reviews! The score at the time of writing is pretty horrible, and I don't understand. This isn't the best from Disney and Pixar, but it is a great addition and a clever way to tell a new story set in the Toy Story universe. It brings great animation and the array of emotion that we have come to love from the studio. The movie lightyear is the movie that got Andy excited for the new Buzz lightyear toy in the universe of toy story. A great sci-fi action flick for both children and adults with a story of friendship, acceptance and the importance of taking a step back from time to time. Chris Evans as Buzz does a great job of capturing the essence of the iconic voice delivered by Tim Allen in the original. The side characters are great and have some fun dialogue and one liners, especially the cat Sox is amazing and next to every single joke made the theater laugh. The animation and story is great, but it does feel a bit safe. The animation is stunning but a bit bland at times, and it isn't as colorful compared to other movies, and it doesn't have a distinct style, but stays inside a boundary. The story is good and does have quite a few tear pressers, but it is something you will have seen in other movies before. Lightyear feels safe, not as exploratory and daring as previous Pixar films, but it is a good movie worth watching, especially if you love the original movies. All around it's a fun and entertaining movie, with great humor and a few heart wrenching moments. 7/10.",7,0,1,"This isn't as bad as everyone is claiming! I don't get the bad reviews! The score at the time of writing is pretty horrible, and I don't understand. This isn't the best from Disney and Pixar, but it is a great addition and a clever way to tell a new story set in the Toy Story universe. It brings great animation and the array of emotion that we have come to love from the studio. 
The movie lightyear is the movie that got Andy excited for the new Buzz lightyear toy in the universe of toy story. A great sci-fi action flick for both children and adults with a story of friendship, acceptance and the importance of taking a step back from time to time. Chris Evans as Buzz does a great job of capturing the essence of the iconic voice delivered by Tim Allen in the original. The side characters are great and have some fun dialogue and one liners, especially the cat Sox is amazing and next to every single joke made the theater laugh. The animation and story is great, but it does feel a bit safe. The animation is stunning but a bit bland at times, and it isn't as colorful compared to other movies, and it doesn't have a distinct style, but stays inside a boundary. The story is good and does have quite a few tear pressers, but it is something you will have seen in other movies before. Lightyear feels safe, not as exploratory and daring as previous Pixar films, but it is a good movie worth watching, especially if you love the original movies. All around it's a fun and entertaining movie, with great humor and a few heart wrenching moments. 
7/10.","List(isnt, bad, everyone, claim, dont, get, bad, review, score, time, write, pretty, horrible, dont, understand, isnt, best, disney, pixar, great, addition, clever, way, tell, new, story, set, toy, story, universe, bring, great, animation, array, emotion, come, love, studio, movie, lightyear, movie, get, andy, excite, new, buzz, lightyear, toy, universe, toy, story, great, scifi, action, flick, child, adult, story, friendship, acceptance, importance, take, step, back, time, time, chris, evans, buzz, great, job, capture, essence, iconic, voice, deliver, tim, allen, original, side, character, great, fun, dialogue, one, liners, especially, cat, sox, amaze, next, every, single, joke, make, theater, laugh, animation, story, great, feel, bit, safe, animation, stun, bit, bland, time, isnt, colorful, compare, movie, doesnt, distinct, style, stay, inside, boundary, story, good, quite, tear, pressers, something, see, movie, lightyear, feel, safe, exploratory, dare, previous, pixar, film, good, movie, worth, watch, especially, love, original, movie, around, fun, entertain, movie, great, humor, heart, wrench, moment)","List(isnt, bad, everyon, claim, dont, get, bad, review, score, time, write, pretti, horribl, dont, understand, isnt, best, disney, pixar, great, addit, clever, way, tell, new, stori, set, toy, stori, univers, bring, great, anim, array, emot, come, love, studio, movi, lightyear, movi, got, andi, excit, new, buzz, lightyear, toy, univers, toy, stori, great, scifi, action, flick, children, adult, stori, friendship, accept, import, take, step, back, time, time, chris, evan, buzz, great, job, captur, essenc, icon, voic, deliv, tim, allen, origin, side, charact, great, fun, dialogu, one, liner, especi, cat, sox, amaz, next, everi, singl, joke, made, theater, laugh, anim, stori, great, feel, bit, safe, anim, stun, bit, bland, time, isnt, color, compar, movi, doesnt, distinct, style, stay, insid, boundari, stori, good, quit, tear, presser, someth, seen, movi, 
lightyear, feel, safe, exploratori, dare, previous, pixar, film, good, movi, worth, watch, especi, love, origin, movi, around, fun, entertain, movi, great, humor, heart, wrench, moment)",Jun,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 21, 23, 27, 29, 31, 34, 35, 37, 38, 49, 59, 63, 67, 74, 76, 77, 83, 84, 86, 87, 93, 95, 97, 99, 109, 110, 119, 137, 144, 150, 152, 153, 158, 162, 163, 186, 189, 198, 203, 224, 229, 274, 284, 317, 319, 327, 344, 346, 347, 363, 374, 385, 395, 398, 428, 431, 446, 492, 518, 531, 541, 568, 592, 597, 635, 654, 714, 745, 751, 770, 834, 847, 874, 941, 1076, 1084, 1089, 1236, 1287, 1344, 1436, 1556, 1707, 1727, 1793, 1838, 2028, 2100, 2383, 2835, 3312, 4066, 4098, 6940, 6944, 11142), values -> List(7.0, 1.0, 1.0, 2.0, 6.0, 1.0, 1.0, 1.0, 4.0, 1.0, 2.0, 7.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(0, 1, 3, 4, 5, 6, 7, 8, 11, 15, 17, 22, 25, 28, 29, 30, 34, 37, 40, 44, 53, 63, 66, 67, 70, 74, 75, 76, 77, 80, 82, 88, 90, 92, 103, 104, 113, 114, 117, 135, 144, 151, 159, 166, 168, 169, 185, 193, 210, 215, 216, 227, 233, 244, 248, 263, 300, 312, 340, 353, 359, 374, 384, 392, 412, 414, 419, 420, 435, 485, 486, 500, 523, 529, 553, 555, 568, 588, 653, 661, 687, 711, 731, 742, 748, 756, 771, 797, 821, 857, 883, 910, 1111, 1162, 1260, 1333, 1412, 1512, 1566, 1641, 1760, 2005, 2296, 2496, 2698, 3073, 3496, 5960, 9483), values -> List(7.0, 1.0, 
1.0, 6.0, 2.0, 1.0, 4.0, 1.0, 1.0, 7.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jun),"Map(vectorType -> sparse, length -> 12, indices -> List(8), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 21, 23, 27, 29, 31, 34, 35, 37, 38, 49, 59, 63, 67, 74, 76, 77, 83, 84, 86, 87, 93, 95, 97, 99, 109, 110, 119, 137, 144, 150, 152, 153, 158, 162, 163, 186, 189, 198, 203, 224, 229, 274, 284, 317, 319, 327, 344, 346, 347, 363, 374, 385, 395, 398, 428, 431, 446, 492, 518, 531, 541, 568, 592, 597, 635, 654, 714, 745, 751, 770, 834, 847, 874, 941, 1076, 1084, 1089, 1236, 1287, 1344, 1436, 1556, 1707, 1727, 1793, 1838, 2028, 2100, 2383, 2835, 3312, 4066, 4098, 6940, 6944, 11142, 11612, 11617, 11620, 11623), values -> List(7.0, 1.0, 1.0, 2.0, 6.0, 1.0, 1.0, 1.0, 4.0, 1.0, 2.0, 7.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 
1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(0, 1, 3, 4, 5, 6, 7, 8, 11, 15, 17, 22, 25, 28, 29, 30, 34, 37, 40, 44, 53, 63, 66, 67, 70, 74, 75, 76, 77, 80, 82, 88, 90, 92, 103, 104, 113, 114, 117, 135, 144, 151, 159, 166, 168, 169, 185, 193, 210, 215, 216, 227, 233, 244, 248, 263, 300, 312, 340, 353, 359, 374, 384, 392, 412, 414, 419, 420, 435, 485, 486, 500, 523, 529, 553, 555, 568, 588, 653, 661, 687, 711, 731, 742, 748, 756, 771, 797, 821, 857, 883, 910, 1111, 1162, 1260, 1333, 1412, 1512, 1566, 1641, 1760, 2005, 2296, 2496, 2698, 3073, 3496, 5960, 9483, 9586, 9591, 9594, 9597), values -> List(7.0, 1.0, 1.0, 6.0, 2.0, 1.0, 4.0, 1.0, 1.0, 7.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
5,5,Lightyear,6.1,Animation Action Adventure,16-Jun-22,Not good,"This movie isn't terrible. But it's not good. As a comedy it fails. I laughed only a few times, almost exclusively at the cat. Most of the jokes were crickets in my audience and a bunch made me roll my eyes. As a sci-fi movie it fails. It gives us the most basic child version of several sci-fi concepts stolen from other movies like Interstellar. And if you think for more than a few seconds, so much of it doesn't make sense. There was a cool fight scene and a couple of good action moments, but for the most part I found myself not caring about the action. The movie does this thing where it keeps throwing a wrench in the plan, creating obstacles that the characters must overcome. This is normally a good strategy and makes any later success feel earned and satisfying. But every obstacle is a result of dumb characters, silly mistakes, or simply being clumsy. This makes it hard to stay invested. When this strategy is executed better, there is a much better reason for the obstacles, such as a really smart antagonist. Late in the movie, it goes in a direction that I wasn't a fan of. And we get a few ridiculously cheesy moments, one that made me cringe so hard it hurt. The great animation doesn't make this worth watching. P. S. The mid- and post-credits scenes are absolutely not worth the wait. (1 viewing, early screening IMAX 6/15/2022)",4,-1,-1,"Not good This movie isn't terrible. But it's not good. As a comedy it fails. I laughed only a few times, almost exclusively at the cat. Most of the jokes were crickets in my audience and a bunch made me roll my eyes. As a sci-fi movie it fails. It gives us the most basic child version of several sci-fi concepts stolen from other movies like Interstellar. And if you think for more than a few seconds, so much of it doesn't make sense. There was a cool fight scene and a couple of good action moments, but for the most part I found myself not caring about the action. 
The movie does this thing where it keeps throwing a wrench in the plan, creating obstacles that the characters must overcome. This is normally a good strategy and makes any later success feel earned and satisfying. But every obstacle is a result of dumb characters, silly mistakes, or simply being clumsy. This makes it hard to stay invested. When this strategy is executed better, there is a much better reason for the obstacles, such as a really smart antagonist. Late in the movie, it goes in a direction that I wasn't a fan of. And we get a few ridiculously cheesy moments, one that made me cringe so hard it hurt. The great animation doesn't make this worth watching. P. S. The mid- and post-credits scenes are absolutely not worth the wait. (1 viewing, early screening IMAX 6/15/2022)","List(good, movie, isnt, terrible, good, comedy, fails, laugh, time, almost, exclusively, cat, joke, cricket, audience, bunch, make, roll, eye, scifi, movie, fail, give, us, basic, child, version, several, scifi, concept, steal, movie, like, interstellar, think, second, much, doesnt, make, sense, cool, fight, scene, couple, good, action, moment, part, find, care, action, movie, thing, keep, throw, wrench, plan, create, obstacles, character, must, overcome, normally, good, strategy, make, later, success, feel, earn, satisfy, every, obstacle, result, dumb, character, silly, mistake, simply, clumsy, make, hard, stay, invest, strategy, execute, well, much, good, reason, obstacle, really, smart, antagonist, late, movie, go, direction, wasnt, fan, get, ridiculously, cheesy, moment, one, make, cringe, hard, hurt, great, animation, doesnt, make, worth, watch, p, mid, postcredits, scene, absolutely, worth, wait, view, early, screening, imax)","List(good, movi, isnt, terribl, good, comedi, fail, laugh, time, almost, exclus, cat, joke, cricket, audienc, bunch, made, roll, eye, scifi, movi, fail, give, us, basic, child, version, sever, scifi, concept, stolen, movi, like, interstellar, think, second, 
much, doesnt, make, sens, cool, fight, scene, coupl, good, action, moment, part, found, care, action, movi, thing, keep, throw, wrench, plan, creat, obstacl, charact, must, overcom, normal, good, strategi, make, later, success, feel, earn, satisfi, everi, obstacl, result, dumb, charact, silli, mistak, simpli, clumsi, make, hard, stay, invest, strategi, execut, better, much, better, reason, obstacl, realli, smart, antagonist, late, movi, goe, direct, wasnt, fan, get, ridicul, cheesi, moment, one, made, cring, hard, hurt, great, anim, doesnt, make, worth, watch, p, mid, postcredit, scene, absolut, worth, wait, view, earli, screen, imax)",Jun,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(0, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 23, 26, 32, 35, 45, 56, 63, 74, 83, 88, 89, 104, 109, 112, 114, 132, 137, 138, 154, 163, 165, 183, 204, 205, 211, 224, 229, 230, 239, 248, 249, 259, 263, 274, 281, 305, 330, 353, 387, 388, 446, 469, 502, 511, 519, 553, 581, 596, 616, 655, 669, 714, 747, 751, 828, 843, 889, 892, 976, 1009, 1035, 1047, 1212, 1226, 1231, 1302, 1416, 1515, 1572, 1575, 1704, 1785, 1855, 1925, 2193, 2298, 2373, 2803, 2835, 2938, 2948, 3223, 3512, 7967, 8094, 8230), values -> List(5.0, 1.0, 1.0, 5.0, 6.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 15, 17, 18, 27, 30, 31, 37, 43, 50, 59, 66, 80, 88, 89, 90, 94, 98, 117, 120, 123, 127, 151, 153, 
156, 172, 175, 177, 186, 199, 201, 219, 232, 240, 241, 248, 255, 257, 260, 278, 292, 294, 300, 364, 377, 379, 407, 426, 500, 536, 549, 557, 576, 583, 590, 591, 598, 602, 603, 609, 673, 691, 718, 731, 740, 792, 797, 830, 833, 1022, 1037, 1100, 1201, 1207, 1251, 1346, 1463, 1522, 1636, 1667, 2165, 2335, 2349, 2414, 2698, 2712, 2856, 3464, 4586, 4883, 9067), values -> List(5.0, 1.0, 1.0, 4.0, 1.0, 1.0, 2.0, 4.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jun),"Map(vectorType -> sparse, length -> 12, indices -> List(8), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(0, 2, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16, 19, 20, 23, 26, 32, 35, 45, 56, 63, 74, 83, 88, 89, 104, 109, 112, 114, 132, 137, 138, 154, 163, 165, 183, 204, 205, 211, 224, 229, 230, 239, 248, 249, 259, 263, 274, 281, 305, 330, 353, 387, 388, 446, 469, 502, 511, 519, 553, 581, 596, 616, 655, 669, 714, 747, 751, 828, 843, 889, 892, 976, 1009, 1035, 1047, 1212, 1226, 1231, 1302, 1416, 1515, 1572, 1575, 1704, 1785, 1855, 1925, 2193, 2298, 2373, 2803, 2835, 2938, 2948, 3223, 3512, 7967, 8094, 8230, 11612, 11617, 11620, 11623), values -> List(5.0, 1.0, 1.0, 5.0, 6.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 15, 17, 18, 27, 30, 31, 37, 43, 50, 59, 66, 80, 88, 89, 90, 94, 98, 117, 120, 123, 127, 151, 153, 156, 172, 175, 177, 186, 199, 201, 219, 232, 240, 241, 248, 255, 257, 260, 278, 292, 294, 300, 364, 377, 379, 407, 426, 500, 536, 549, 557, 576, 583, 590, 591, 598, 602, 603, 609, 673, 691, 718, 731, 740, 792, 797, 830, 833, 1022, 1037, 1100, 1201, 1207, 1251, 1346, 1463, 1522, 1636, 1667, 2165, 2335, 2349, 2414, 2698, 2712, 2856, 3464, 4586, 4883, 9067, 9586, 9591, 9594, 9597), values -> List(5.0, 1.0, 1.0, 4.0, 1.0, 1.0, 2.0, 4.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
6,6,Lightyear,6.1,Animation Action Adventure,24-Jul-22,Pretty good,Wow I just can't believe the bad reviews. I use imdb scores and written reviews religiously and 95% of the time u think they are spot on. I watched this with my family and we were all very entertained throughout the entire movie. That should say enough. It was pretty well done and interesting.,8,1,1,Pretty good Wow I just can't believe the bad reviews. I use imdb scores and written reviews religiously and 95% of the time u think they are spot on. I watched this with my family and we were all very entertained throughout the entire movie. That should say enough. It was pretty well done and interesting.,"List(pretty, good, wow, cant, believe, bad, review, use, imdb, score, write, review, religiously, time, u, think, spot, watch, family, entertain, throughout, entire, movie, say, enough, pretty, well, do, interesting)","List(pretti, good, wow, cant, believ, bad, review, use, imdb, score, written, review, religi, time, u, think, spot, watch, famili, entertain, throughout, entir, movi, say, enough, pretti, well, done, interest)",Jul,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(0, 4, 7, 9, 13, 20, 21, 22, 72, 77, 78, 90, 93, 95, 115, 152, 182, 203, 214, 253, 299, 376, 752, 885, 1109, 1652, 5791), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(0, 5, 6, 7, 19, 28, 31, 33, 69, 72, 77, 82, 84, 96, 103, 126, 173, 205, 233, 245, 274, 286, 799, 929, 1140, 1500, 1674), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> 
List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jul),"Map(vectorType -> sparse, length -> 12, indices -> List(4), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(0, 4, 7, 9, 13, 20, 21, 22, 72, 77, 78, 90, 93, 95, 115, 152, 182, 203, 214, 253, 299, 376, 752, 885, 1109, 1652, 5791, 11608, 11617, 11620, 11623), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(0, 5, 6, 7, 19, 28, 31, 33, 69, 72, 77, 82, 84, 96, 103, 126, 173, 205, 233, 245, 274, 286, 799, 929, 1140, 1500, 1674, 9582, 9591, 9594, 9597), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
7,7,Lightyear,6.1,Animation Action Adventure,20-Jun-22,Mediocre,"I enjoyed the first thirty minutes or so of Lightyear. The pacing was decent and the humor/heart was in the right place. Once the twist in the trailer happens the story becomes a beautifully predictable entry for Pixar, and my five year old became incredibly antsy to leave the theater. Socks steals the show but that isn't enough to enliven a boring plot. Wait for Lightyear to stream on Disney+.",4,0,-1,"Mediocre I enjoyed the first thirty minutes or so of Lightyear. The pacing was decent and the humor/heart was in the right place. Once the twist in the trailer happens the story becomes a beautifully predictable entry for Pixar, and my five year old became incredibly antsy to leave the theater. Socks steals the show but that isn't enough to enliven a boring plot. Wait for Lightyear to stream on Disney+.","List(mediocre, enjoy, first, thirty, minute, lightyear, pace, decent, humorheart, right, place, twist, trailer, happen, story, becomes, beautifully, predictable, entry, pixar, five, year, old, become, incredibly, antsy, leave, theater, sock, steal, show, isnt, enough, enliven, bore, plot, wait, lightyear, stream, disney)","List(mediocr, enjoy, first, thirti, minut, lightyear, pace, decent, humorheart, right, place, twist, trailer, happen, stori, becom, beauti, predict, entri, pixar, five, year, old, becam, incred, antsi, leav, theater, sock, steal, show, isnt, enough, enliven, bore, plot, wait, lightyear, stream, disney)",Jun,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(5, 28, 44, 57, 60, 65, 78, 79, 91, 96, 109, 110, 111, 123, 128, 131, 151, 156, 201, 284, 335, 353, 360, 398, 414, 518, 612, 674, 697, 966, 1083, 1338, 1575, 1787, 3231, 4409, 7990, 9900, 11381), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 
1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(4, 23, 35, 48, 60, 68, 84, 91, 97, 108, 114, 117, 119, 121, 136, 161, 162, 171, 196, 204, 247, 298, 312, 377, 392, 403, 404, 523, 733, 794, 1044, 1403, 1803, 1810, 2897, 4160, 6747, 6897, 9222), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jun),"Map(vectorType -> sparse, length -> 12, indices -> List(8), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(5, 28, 44, 57, 60, 65, 78, 79, 91, 96, 109, 110, 111, 123, 128, 131, 151, 156, 201, 284, 335, 353, 360, 398, 414, 518, 612, 674, 697, 966, 1083, 1338, 1575, 1787, 3231, 4409, 7990, 9900, 11381, 11612, 11617, 11620, 11623), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(4, 23, 35, 48, 60, 68, 84, 91, 97, 108, 114, 117, 119, 121, 136, 161, 162, 171, 196, 204, 247, 298, 312, 377, 392, 403, 404, 523, 733, 794, 1044, 1403, 1803, 1810, 2897, 4160, 6747, 6897, 9222, 9586, 9591, 9594, 9597), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
8,8,Lightyear,6.1,Animation Action Adventure,17-Jun-22,Great sci-fi adventure,"It may be Pixar's most straightforward film so far but that doesn't stop Lightyear from being another gorgeously animated installment and a great sci-fi adventure that's thrilling, funny and emotional, with the odd twist and heartfelt message up its sleeve as well. Chris Evans gives an incredible lead performance as Buzz, honouring what's come before but still finding his own interpretation. Keke Palmer is full of likeable innocence and Taika is great as himself. The standout by far is Peter Sohn as Sox, endlessly adorable and always funny. The animation is absolutely phenomenal with a lot of it bordering on photo real. Angus MacLane's direction is excellent, there's a clear love for sci-fi present and the world is satisfyingly clunky in the technology. The music by Michael Giacchino is really good, nothing too memorable but still suitably epic and futuristic.",8,1,1,"Great sci-fi adventure It may be Pixar's most straightforward film so far but that doesn't stop Lightyear from being another gorgeously animated installment and a great sci-fi adventure that's thrilling, funny and emotional, with the odd twist and heartfelt message up its sleeve as well. Chris Evans gives an incredible lead performance as Buzz, honouring what's come before but still finding his own interpretation. Keke Palmer is full of likeable innocence and Taika is great as himself. The standout by far is Peter Sohn as Sox, endlessly adorable and always funny. The animation is absolutely phenomenal with a lot of it bordering on photo real. Angus MacLane's direction is excellent, there's a clear love for sci-fi present and the world is satisfyingly clunky in the technology. 
The music by Michael Giacchino is really good, nothing too memorable but still suitably epic and futuristic.","List(great, scifi, adventure, may, pixars, straightforward, film, far, doesnt, stop, lightyear, another, gorgeously, animate, installment, great, scifi, adventure, thats, thrill, funny, emotional, odd, twist, heartfelt, message, sleeve, well, chris, evans, give, incredible, lead, performance, buzz, honour, whats, come, still, find, interpretation, keke, palmer, full, likeable, innocence, taika, great, standout, far, peter, sohn, sox, endlessly, adorable, always, funny, animation, absolutely, phenomenal, lot, border, photo, real, angus, maclanes, direction, excellent, there, clear, love, scifi, present, world, satisfyingly, clunky, technology, music, michael, giacchino, really, good, nothing, memorable, still, suitably, epic, futuristic)","List(great, scifi, adventur, may, pixar, straightforward, film, far, doesnt, stop, lightyear, anoth, gorgeous, anim, instal, great, scifi, adventur, that, thrill, funni, emot, odd, twist, heartfelt, messag, sleev, well, chris, evan, give, incred, lead, perform, buzz, honour, what, come, still, find, interpret, keke, palmer, full, likeabl, innoc, taika, great, standout, far, peter, sohn, sox, endless, ador, alway, funni, anim, absolut, phenomen, lot, border, photo, real, angus, maclan, direct, excel, there, clear, love, scifi, present, world, satisfi, clunki, technolog, music, michael, giacchino, realli, good, noth, memor, still, suitabl, epic, futurist)",Jun,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(1, 4, 13, 14, 16, 26, 29, 37, 42, 43, 45, 52, 63, 68, 80, 94, 107, 113, 117, 121, 125, 133, 135, 156, 157, 163, 170, 206, 228, 239, 246, 248, 264, 332, 346, 347, 392, 398, 429, 446, 460, 467, 513, 584, 647, 705, 712, 716, 794, 985, 1055, 1076, 1170, 1328, 1329, 1336, 1516, 1520, 1686, 1690, 1707, 2170, 2254, 2526, 2594, 2667, 2689, 2858, 
3086, 3203, 3300, 3678, 3756, 4174, 4654, 5396, 7159, 9387, 9507, 9887), values -> List(1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(1, 5, 10, 15, 19, 22, 42, 44, 45, 46, 50, 66, 71, 86, 90, 98, 101, 107, 115, 118, 122, 130, 133, 137, 150, 152, 168, 171, 172, 192, 213, 235, 264, 265, 298, 299, 343, 374, 384, 392, 436, 496, 500, 523, 540, 590, 679, 695, 751, 782, 949, 1009, 1087, 1111, 1138, 1175, 1178, 1215, 1265, 1381, 1382, 1457, 1477, 1733, 1760, 1771, 1776, 2140, 2142, 2439, 2500, 2617, 3141, 3200, 3661, 3778, 4354, 5465, 6843), values -> List(1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Jun),"Map(vectorType -> sparse, length -> 12, indices -> List(8), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(1, 4, 13, 14, 16, 26, 29, 37, 42, 43, 45, 52, 63, 68, 80, 94, 107, 113, 117, 121, 125, 133, 135, 156, 157, 163, 170, 206, 228, 239, 246, 248, 264, 332, 346, 347, 392, 398, 429, 446, 460, 467, 513, 584, 647, 705, 712, 716, 794, 985, 1055, 1076, 1170, 1328, 1329, 1336, 1516, 1520, 
1686, 1690, 1707, 2170, 2254, 2526, 2594, 2667, 2689, 2858, 3086, 3203, 3300, 3678, 3756, 4174, 4654, 5396, 7159, 9387, 9507, 9887, 11612, 11617, 11620, 11623), values -> List(1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(1, 5, 10, 15, 19, 22, 42, 44, 45, 46, 50, 66, 71, 86, 90, 98, 101, 107, 115, 118, 122, 130, 133, 137, 150, 152, 168, 171, 172, 192, 213, 235, 264, 265, 298, 299, 343, 374, 384, 392, 436, 496, 500, 523, 540, 590, 679, 695, 751, 782, 949, 1009, 1087, 1111, 1138, 1175, 1178, 1215, 1265, 1381, 1382, 1457, 1477, 1733, 1760, 1771, 1776, 2140, 2142, 2439, 2500, 2617, 3141, 3200, 3661, 3778, 4354, 5465, 6843, 9586, 9591, 9594, 9597), values -> List(1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"
9,9,Lightyear,6.1,Animation Action Adventure,18-Nov-22,"An ok, generic, kids sci-fi movie","First off, for better or worse, this has nothing to do with Toy Story - absolutely nothing. That out the way this is a well made animated movie about Buzz trying to get home. There are some really clever ideas here - the main aspect here is that Buzz experiences time dilation every time he attempts a test flight - flinging him into the future. There are some nice characters and some touching moments as Buzz lives his life outside time. For the first third of this film I was really engaged and settled down to enjoy this. Unfortunately, outside of the time dilation, this film is very empty. The characters run around waving their hands in the air screaming, not really doing much. When it all comes to the crunch the ending is predictable (I can forgive that) but makes no sense - on any level really. All that running around is revealed as even more pointless than it first appeared. This is fine. You can stick it on and watch it with the kids. There are a few jokes that land and the cat is cute. But in a months time you will probably struggle to remember if you watched this.",6,0,0,"An ok, generic, kids sci-fi movie First off, for better or worse, this has nothing to do with Toy Story - absolutely nothing. That out the way this is a well made animated movie about Buzz trying to get home. There are some really clever ideas here - the main aspect here is that Buzz experiences time dilation every time he attempts a test flight - flinging him into the future. There are some nice characters and some touching moments as Buzz lives his life outside time. For the first third of this film I was really engaged and settled down to enjoy this. Unfortunately, outside of the time dilation, this film is very empty. The characters run around waving their hands in the air screaming, not really doing much. 
When it all comes to the crunch the ending is predictable (I can forgive that) but makes no sense - on any level really. All that running around is revealed as even more pointless than it first appeared. This is fine. You can stick it on and watch it with the kids. There are a few jokes that land and the cat is cute. But in a months time you will probably struggle to remember if you watched this.","List(ok, generic, kid, scifi, movie, first, well, bad, nothing, toy, story, absolutely, nothing, way, well, make, animated, movie, buzz, try, get, home, really, clever, idea, main, aspect, buzz, experience, time, dilation, every, time, attempt, test, flight, fling, future, nice, character, touch, moment, buzz, live, life, outside, time, first, third, film, really, engage, settle, enjoy, unfortunately, outside, time, dilation, film, empty, character, run, around, wave, hand, air, scream, really, much, come, crunch, end, predictable, forgive, make, sense, level, really, run, around, reveal, even, pointless, first, appear, fine, stick, watch, kid, joke, land, cat, cute, month, time, probably, struggle, remember, watched)","List(ok, generic, kid, scifi, movi, first, better, wors, noth, toy, stori, absolut, noth, way, well, made, anim, movi, buzz, tri, get, home, realli, clever, idea, main, aspect, buzz, experi, time, dilat, everi, time, attempt, test, flight, fling, futur, nice, charact, touch, moment, buzz, live, life, outsid, time, first, third, film, realli, engag, settl, enjoy, unfortun, outsid, time, dilat, film, empti, charact, run, around, wave, hand, air, scream, realli, much, come, crunch, end, predict, forgiv, make, sens, level, realli, run, around, reveal, even, pointless, first, appear, fine, stick, watch, kid, joke, land, cat, cute, month, time, probabl, struggl, rememb, watch)",Nov,"List(animation, action, adventure)","List(anim, action, adventur)","Map(vectorType -> sparse, length -> 11604, indices -> List(0, 1, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 21, 28, 30, 
34, 37, 58, 65, 68, 73, 74, 83, 141, 162, 167, 168, 176, 183, 194, 195, 216, 221, 222, 239, 250, 270, 274, 334, 347, 363, 382, 412, 414, 424, 425, 430, 446, 459, 480, 483, 486, 487, 504, 624, 744, 751, 753, 820, 941, 973, 998, 1058, 1240, 1378, 1508, 1547, 1552, 1771, 1781, 2722, 2831, 3488, 3529, 4111, 4696, 11188), values -> List(2.0, 2.0, 1.0, 2.0, 1.0, 5.0, 2.0, 1.0, 2.0, 4.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9578, indices -> List(0, 1, 4, 6, 7, 8, 9, 10, 11, 12, 14, 19, 23, 26, 29, 35, 37, 43, 44, 61, 71, 80, 85, 88, 90, 131, 163, 172, 185, 190, 191, 198, 201, 221, 234, 250, 256, 275, 300, 307, 374, 378, 404, 405, 413, 414, 416, 448, 452, 477, 484, 498, 500, 513, 542, 554, 565, 681, 689, 756, 797, 856, 939, 1034, 1038, 1097, 1219, 1379, 1623, 1632, 1764, 2487, 2685, 3357, 3844, 3942, 7423, 8911), values -> List(2.0, 2.0, 1.0, 2.0, 5.0, 2.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 17, indices -> List(1, 4, 7), values -> List(1.0, 1.0, 1.0))",List(Nov),"Map(vectorType -> sparse, length -> 12, indices -> List(7), values -> List(1.0))","Map(vectorType -> sparse, length -> 11633, indices -> List(0, 1, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 21, 28, 30, 34, 37, 58, 65, 68, 73, 74, 83, 
141, 162, 167, 168, 176, 183, 194, 195, 216, 221, 222, 239, 250, 270, 274, 334, 347, 363, 382, 412, 414, 424, 425, 430, 446, 459, 480, 483, 486, 487, 504, 624, 744, 751, 753, 820, 941, 973, 998, 1058, 1240, 1378, 1508, 1547, 1552, 1771, 1781, 2722, 2831, 3488, 3529, 4111, 4696, 11188, 11611, 11617, 11620, 11623), values -> List(2.0, 2.0, 1.0, 2.0, 1.0, 5.0, 2.0, 1.0, 2.0, 4.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> sparse, length -> 9607, indices -> List(0, 1, 4, 6, 7, 8, 9, 10, 11, 12, 14, 19, 23, 26, 29, 35, 37, 43, 44, 61, 71, 80, 85, 88, 90, 131, 163, 172, 185, 190, 191, 198, 201, 221, 234, 250, 256, 275, 300, 307, 374, 378, 404, 405, 413, 414, 416, 448, 452, 477, 484, 498, 500, 513, 542, 554, 565, 681, 689, 756, 797, 856, 939, 1034, 1038, 1097, 1219, 1379, 1623, 1632, 1764, 2487, 2685, 3357, 3844, 3942, 7423, 8911, 9585, 9591, 9594, 9597), values -> List(2.0, 2.0, 1.0, 2.0, 5.0, 2.0, 1.0, 4.0, 1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))"


In [0]:
# Create a CountVectorizer object for the Manual_Combined column
assembled_df = assembled_df.withColumn("Manual_Combined_array", split(assembled_df["Manual_Combined"], ","))
cv = CountVectorizer(inputCol="Manual_Combined_array", outputCol="Manual_Combined_vector")

# Fit the CountVectorizer model on the dataframe
cv_model = cv.fit(assembled_df)

# Transform the dataframe to add the count vectorized features column
count_vectorized_df = cv_model.transform(assembled_df)
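
The sparse vectors shown in the output above (a length, a list of indices and a list of values) are what count vectorization produces. The following is a minimal pure-Python sketch of the idea, not Spark's implementation. The helper names `fit_vocabulary` and `to_sparse` are hypothetical, the two-document corpus is a toy example, and alphabetical tie-breaking between equally frequent terms is an assumption made only so the result is deterministic.

```python
from collections import Counter

def fit_vocabulary(docs):
    """Map each term to an index, most frequent term first.
    Ties are broken alphabetically (an assumption for this sketch)."""
    counts = Counter(token for doc in docs for token in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term: index for index, (term, _) in enumerate(ordered)}

def to_sparse(doc, vocab):
    """Return (length, indices, values) for one tokenized document."""
    term_counts = Counter(token for token in doc if token in vocab)
    pairs = sorted((vocab[term], float(count)) for term, count in term_counts.items())
    indices = [index for index, _ in pairs]
    values = [value for _, value in pairs]
    return len(vocab), indices, values

# Toy corpus of already tokenized reviews (hypothetical).
docs = [
    ["great", "scifi", "adventure", "great"],
    ["mediocre", "plot", "scifi"],
]
vocab = fit_vocabulary(docs)
sparse = to_sparse(docs[0], vocab)
print(sparse)
```

Only the non-zero counts are stored, which is why a review of a few dozen words fits in a short index list even though the vocabulary has thousands of entries.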

In [0]:
# Creating the test dataframe which contains only the relevant columns for the ML model
df_test = count_vectorized_df.select(count_vectorized_df["combined_vectors"],count_vectorized_df["combined_vectors_stem"],count_vectorized_df["Manual_Combined"])


In [0]:
#from pyspark.sql.functions import col
# Function to increase the values of a column by 1. 
def increment_column(df, column_name):
    incremented_col = col(column_name) + 1
    new_df = df.withColumn(column_name, incremented_col) 
    return new_df

In [0]:
# Applying the column incrementer to shift the Manual_Combined labels by 1 (-1, 0, 1 become 0, 1, 2).
df_test2 = increment_column(df_test, "Manual_Combined")

In [0]:
# Create a sample dataset containing the first 10 rows.
# This sample dataset allows the models to be tested quickly to be sure there are no errors.
df_random = df_test2.limit(10)
(training1, testing1) = df_random.randomSplit([0.7, 0.3])
df_random.show()

+--------------------+---------------------+---------------+
|    combined_vectors|combined_vectors_stem|Manual_Combined|
+--------------------+---------------------+---------------+
|(11633,[1,4,6,11,...| (9607,[1,9,10,11,...|              1|
|(11633,[0,4,5,11,...| (9607,[0,4,10,11,...|              1|
|(11633,[1,2,4,5,6...| (9607,[1,2,4,5,7,...|              0|
|(11633,[0,1,2,4,5...| (9607,[0,1,2,4,5,...|              1|
|(11633,[0,1,3,4,5...| (9607,[0,1,3,4,5,...|              1|
|(11633,[0,2,3,4,6...| (9607,[0,2,3,5,6,...|              0|
|(11633,[0,4,7,9,1...| (9607,[0,5,6,7,19...|              2|
|(11633,[5,28,44,5...| (9607,[4,23,35,48...|              1|
|(11633,[1,4,13,14...| (9607,[1,5,10,15,...|              2|
|(11633,[0,1,5,6,7...| (9607,[0,1,4,6,7,...|              1|
+--------------------+---------------------+---------------+



In [0]:
# Splitting the data into train and test datasets.
(training, testing) = df_test2.randomSplit([0.7, 0.3])
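
The split above is random, so a rerun can produce different training and testing sets and therefore slightly different scores. The sketch below illustrates the idea in plain Python under two assumptions: that each row is assigned to a side independently (so the 70/30 proportion is only approximate), and that fixing a seed makes the split repeatable. The function `random_split` is a hypothetical helper, not a Spark API; Spark's `randomSplit` accepts an optional `seed` argument for the same purpose.

```python
import random

def random_split(rows, train_fraction, seed):
    """Assign each row independently to train or test using a seeded generator."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        if rng.random() < train_fraction:
            train.append(row)
        else:
            test.append(row)
    return train, test

rows = list(range(1000))  # stand-in for dataframe rows
train_a, test_a = random_split(rows, 0.7, seed=42)
train_b, test_b = random_split(rows, 0.7, seed=42)
print(len(train_a), len(test_a))
```

With the same seed the two calls return identical splits, which makes the reported metrics reproducible.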

## Models

Three machine learning classification models were used:
- Naive Bayes
- Logistic Regression
- Random Forest Classifier

All models are evaluated using the weighted F1-score, weighted recall and weighted precision. The F1-score is the primary metric.
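
To make the weighted metrics concrete, here is a minimal pure-Python sketch. It assumes the usual definition in which each per-class score is weighted by the share of true examples in that class; the function `weighted_metrics` and the toy predictions are hypothetical and are not taken from the models in this notebook.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return (precision, recall, f1), each averaged over classes and
    weighted by the share of true examples in that class."""
    support = Counter(y_true)
    n = len(y_true)
    w_precision = w_recall = w_f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / n
        w_precision += weight * precision
        w_recall += weight * recall
        w_f1 += weight * f1
    return w_precision, w_recall, w_f1

# Toy labels using the 0/1/2 encoding of this notebook (made-up predictions).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
precision, recall, f1 = weighted_metrics(y_true, y_pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

Under this definition the weighted recall equals the overall accuracy, because the class weights cancel the per-class denominators.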

### Naive Bayes
The Naive Bayes model was investigated for lemmatized and stemmed vectors.
The smoothing hyperparameter was tuned as follows:

smoothing = [0.00, 0.10, 0.15, 0.20, 0.50, 1.00, 10.00, 20.00]

Because we have 3 classes of labels (positive, neutral and negative), we use the MulticlassClassificationEvaluator.
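
As an illustration of what the smoothing hyperparameter controls, the sketch below uses the textbook additive (Laplace) smoothing formula for a multinomial model. We assume Spark's Naive Bayes follows the same form. The function names and the counts (a class with 1000 tokens, a term seen 50 times, a vocabulary of 11633 terms) are hypothetical values chosen only for the example.

```python
def smoothed_likelihood(term_count, total_count, vocab_size, smoothing):
    """P(term | class) with additive smoothing."""
    denominator = total_count + smoothing * vocab_size
    if denominator == 0:
        return 0.0
    return (term_count + smoothing) / denominator

def likelihood_ratio(seen_count, total_count, vocab_size, smoothing):
    """How much more likely a seen term is than an unseen one (smoothing > 0)."""
    seen = smoothed_likelihood(seen_count, total_count, vocab_size, smoothing)
    unseen = smoothed_likelihood(0, total_count, vocab_size, smoothing)
    return seen / unseen

total_count, vocab_size, seen_count = 1000, 11633, 50  # hypothetical counts

# With no smoothing, a term never seen in a class gets probability zero.
print(smoothed_likelihood(0, total_count, vocab_size, 0.0))

# Larger smoothing values shrink the gap between seen and unseen terms.
for alpha in [0.1, 0.5, 1.0, 20.0]:
    print(alpha, likelihood_ratio(seen_count, total_count, vocab_size, alpha))
```

Very small values let a single unseen word dominate the prediction, while very large values flatten the word distributions of all classes towards each other, so intermediate values are the ones worth tuning.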

##### Naive Bayes with Lemmatized Text

In [0]:
from pyspark.ml.classification import NaiveBayes
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import MulticlassClassificationEvaluator


# Create Naive Bayes object
nb = NaiveBayes(labelCol="Manual_Combined", featuresCol="combined_vectors", modelType="multinomial")

# Create MulticlassClassificationEvaluator
nbevaluator = MulticlassClassificationEvaluator(labelCol="Manual_Combined", metricName="weightedFMeasure")

# set hyperparameters
params_nb = [0.00, 0.10, 0.15, 0.20, 0.50, 1.00, 10.00, 20.00]

# create list variables to store evaluation metrics
f1_scores_nb =[]
recall_scores_nb = []
precision_scores_nb = []


for i in params_nb:
    # Creating the  ParamGrid for Cross Validation
    nbparamGrid = (ParamGridBuilder()
                   .addGrid(nb.smoothing, [i])
                   .build())


    # Creating and applying 5 fold cross-validation
    nbcv = CrossValidator(estimator = nb,
                        estimatorParamMaps = nbparamGrid,
                        evaluator = nbevaluator,
                        numFolds = 5)
    nbcvModel = nbcv.fit(training)
    prediction = nbcvModel.transform(testing).cache()


    # Evaluating the models using defined metrics (F1 score, precision and recall)
    evaluator_f1_nb = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedFMeasure")
    f1_score = evaluator_f1_nb.evaluate(prediction)
    f1_scores_nb.append(f1_score)

    evaluator_recall_nb = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedRecall")
    recall_score = evaluator_recall_nb.evaluate(prediction, {evaluator_recall_nb.metricName: "weightedRecall"})
    recall_scores_nb.append(recall_score)

    evaluator_precision_nb = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedPrecision")
    precision_score = evaluator_precision_nb.evaluate(prediction, {evaluator_precision_nb.metricName: "weightedPrecision"})
    precision_scores_nb.append(precision_score)



In [0]:
# Storing the metrics in a dataframe and displaying the dataframe
nb_total_result = []
for i in range(len(params_nb)):
    templist = []
    templist.append(params_nb[i])
    templist.append(f1_scores_nb[i])
    templist.append(recall_scores_nb[i])
    templist.append(precision_scores_nb[i])
    nb_total_result.append(templist)

schema = StructType([
    StructField("Smoothing", StringType(), True),
    StructField("f1_score", StringType(), True),
    StructField("recall_score", StringType(), True),
    StructField("precision_score", StringType(), True)
])

df_nb_results = spark.createDataFrame(data=nb_total_result, schema=schema)
df_nb_results.display()

Smoothing,f1_score,recall_score,precision_score
0.0,0.2091387177594074,0.3385579937304075,0.4415374779485487
0.1,0.6348950270573637,0.6394984326018809,0.6340436589313605
0.15,0.6412603501037722,0.6489028213166144,0.6415334926306713
0.2,0.6489959761427255,0.658307210031348,0.6478869572854639
0.5,0.66389340512258,0.6927899686520376,0.6520336443600991
1.0,0.6373555450634558,0.6959247648902821,0.653460382218013
10.0,0.3314252569950992,0.4921630094043887,0.5604659671884735
20.0,0.317778395037234,0.4858934169278996,0.2360924126138697


##### Naive Bayes with Stemmed Text

In [0]:
# Create Naive Bayes object
nb = NaiveBayes(labelCol="Manual_Combined", featuresCol="combined_vectors_stem", modelType="multinomial")

# Create MulticlassClassificationEvaluator
nbevaluator = MulticlassClassificationEvaluator(labelCol="Manual_Combined", metricName="weightedFMeasure")
# set hyperparameters
params_nb = [0.00, 0.10, 0.15, 0.20, 0.50, 1.00, 10.00, 20.00]
# create list variables to store evaluation metrics
f1_scores_nb =[]
recall_scores_nb = []
precision_scores_nb = []


for i in params_nb:
    # Creating the  ParamGrid for Cross Validation
    nbparamGrid = (ParamGridBuilder()
                   .addGrid(nb.smoothing, [i])
                   .build())


    # Create and apply 5 fold cross-validation
    nbcv = CrossValidator(estimator = nb,
                        estimatorParamMaps = nbparamGrid,
                        evaluator = nbevaluator,
                        numFolds = 5)
    nbcvModel = nbcv.fit(training)
    prediction = nbcvModel.transform(testing).cache()

    # Evaluating the models using defined metrics (F1 score, precision and recall)
    evaluator_f1_nb = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedFMeasure")
    f1_score = evaluator_f1_nb.evaluate(prediction)
    f1_scores_nb.append(f1_score)

    evaluator_recall_nb = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedRecall")
    recall_score = evaluator_recall_nb.evaluate(prediction, {evaluator_recall_nb.metricName: "weightedRecall"})
    recall_scores_nb.append(recall_score)

    evaluator_precision_nb = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedPrecision")
    precision_score = evaluator_precision_nb.evaluate(prediction, {evaluator_precision_nb.metricName: "weightedPrecision"})
    precision_scores_nb.append(precision_score)

In [0]:
# Storing the metrics in a dataframe and displaying the dataframe
nb_total_result = []
for i in range(len(params_nb)):
    templist = []
    templist.append(params_nb[i])
    templist.append(f1_scores_nb[i])
    templist.append(recall_scores_nb[i])
    templist.append(precision_scores_nb[i])
    nb_total_result.append(templist)

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("Smoothing", StringType(), True),
    StructField("f1_score", StringType(), True),
    StructField("recall_score", StringType(), True),
    StructField("precision_score", StringType(), True)
])

df_nb_results = spark.createDataFrame(data=nb_total_result, schema=schema)
df_nb_results.display()

Smoothing,f1_score,recall_score,precision_score
0.0,0.2393391330711868,0.3573667711598746,0.5077755240027045
0.1,0.6359807569530576,0.6426332288401254,0.6328574062853981
0.15,0.6400759903354278,0.6489028213166144,0.6356871292002383
0.2,0.6524918405442343,0.664576802507837,0.645924394784469
0.5,0.649781388962519,0.6802507836990596,0.6362276530176258
1.0,0.6284662070922841,0.6833855799373041,0.6177341006771211
10.0,0.357765379083336,0.5047021943573667,0.5181470828022552
20.0,0.3246595389009911,0.4890282131661442,0.5597188541235386


The best F1-score was 0.6639 for smoothing = 0.5 using lemmatized text. The corresponding recall and precision were 0.6928 and 0.6520 respectively. Using stemmed text, the best F1-score was 0.6525 for smoothing = 0.2. The corresponding recall and precision were 0.6646 and 0.6459 respectively.

Overall, the model performed better with review data featurized using lemmatization than with snowball stemming.
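
The best setting can also be read off programmatically. The sketch below copies the F1-scores from the two Naive Bayes result tables above (rounded to four decimals) into plain Python lists and picks the row with the highest F1-score. The variable names are hypothetical and the snippet does not rerun any model.

```python
# (smoothing, F1-score) pairs copied from the result tables above, rounded to 4 decimals.
nb_f1 = {
    "lemmatized": [(0.0, 0.2091), (0.1, 0.6349), (0.15, 0.6413), (0.2, 0.6490),
                   (0.5, 0.6639), (1.0, 0.6374), (10.0, 0.3314), (20.0, 0.3178)],
    "stemmed": [(0.0, 0.2393), (0.1, 0.6360), (0.15, 0.6401), (0.2, 0.6525),
                (0.5, 0.6498), (1.0, 0.6285), (10.0, 0.3578), (20.0, 0.3247)],
}

# For each featurization, keep the row with the highest F1-score.
best = {name: max(rows, key=lambda row: row[1]) for name, rows in nb_f1.items()}

for name, (smoothing, f1) in best.items():
    print(f"{name}: best F1 = {f1} at smoothing = {smoothing}")
```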

### Logistic Regression
The Logistic Regression model was investigated for lemmatized and stemmed vectors. The regParam hyperparameter was tuned while elasticNetParam was kept at the default value of zero.

regParam = [0.0001, 0.001, 0.005, 0.01, 0.1, 1.0, 10.0]

Because we have 3 classes of labels (positive, neutral and negative), we use the MulticlassClassificationEvaluator.
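
To show what regParam and elasticNetParam control, here is a small sketch of the elastic-net penalty that is added to the training loss. It assumes the form given in Spark's linear-methods documentation, regParam * (elasticNetParam * L1 + (1 - elasticNetParam) / 2 * L2 squared). The function name and the example weights are hypothetical.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param=0.0):
    """Penalty added to the loss for a given coefficient vector."""
    l1 = sum(abs(w) for w in weights)          # L1 norm
    l2_squared = sum(w * w for w in weights)   # squared L2 norm
    return reg_param * (elastic_net_param * l1
                        + (1.0 - elastic_net_param) / 2.0 * l2_squared)

weights = [3.0, -4.0]  # hypothetical coefficients

# elasticNetParam = 0 (the setting used here) gives a pure L2 (ridge) penalty.
ridge = elastic_net_penalty(weights, reg_param=0.1)
# elasticNetParam = 1 would give a pure L1 (lasso) penalty instead.
lasso = elastic_net_penalty(weights, reg_param=0.1, elastic_net_param=1.0)
print(ridge, lasso)
```

With elasticNetParam at zero, increasing regParam only strengthens the L2 shrinkage of the coefficients, which is the single knob tuned in this section.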

##### Logistic Regression with Lemmatized Text

In [0]:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Create LogisticRegression object
lr = LogisticRegression(labelCol="Manual_Combined", featuresCol="combined_vectors")

# Create MulticlassClassificationEvaluator
lrevaluator = MulticlassClassificationEvaluator(labelCol="Manual_Combined", metricName="weightedFMeasure")

# set hyperparameters
params_lr = [0.0001, 0.001, 0.005, 0.01, 0.1, 1.0, 10.0]
# create list variables to store evaluation metrics
f1_scores_lr =[]
recall_scores_lr = []
precision_scores_lr = []


for i in params_lr:
    # Creating the ParamGrid for Cross Validation
    lrparamGrid = (ParamGridBuilder()
                   .addGrid(lr.regParam, [i])
                   .build())


    # Create and apply 5 fold cross-validation
    lrcv = CrossValidator(estimator = lr,
                        estimatorParamMaps = lrparamGrid,
                        evaluator = lrevaluator,
                        numFolds = 5)
    lrcvModel = lrcv.fit(training)

    prediction = lrcvModel.transform(testing).cache()

    # Evaluating the models using defined metrics (F1 score, precision and recall)
    evaluator_f1_lr = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedFMeasure")
    f1_score = evaluator_f1_lr.evaluate(prediction)
    f1_scores_lr.append(f1_score)

    evaluator_recall_lr = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedRecall")
    recall_score = evaluator_recall_lr.evaluate(prediction, {evaluator_recall_lr.metricName: "weightedRecall"})
    recall_scores_lr.append(recall_score)

    evaluator_precision_lr = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedPrecision")
    precision_score = evaluator_precision_lr.evaluate(prediction, {evaluator_precision_lr.metricName: "weightedPrecision"})
    precision_scores_lr.append(precision_score)


In [0]:
# Storing the metrics in a dataframe and displaying the dataframe
lr_total_result = []
for i in range(len(params_lr)):
    templist = []
    templist.append(params_lr[i])
    templist.append(f1_scores_lr[i])
    templist.append(recall_scores_lr[i])
    templist.append(precision_scores_lr[i])
    lr_total_result.append(templist)

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("Alpha", StringType(), True),
    StructField("f1_score", StringType(), True),
    StructField("recall_score", StringType(), True),
    StructField("precision_score", StringType(), True)
])

df_lr_results = spark.createDataFrame(data=lr_total_result, schema=schema)
df_lr_results.display()

Alpha,f1_score,recall_score,precision_score
0.0001,0.6675331242182356,0.7021943573667712,0.6743944059473297
0.001,0.6587970466864819,0.6990595611285266,0.6639461256712749
0.005,0.6606767404234214,0.7021943573667712,0.6688037049947837
0.01,0.6606767404234214,0.7021943573667712,0.6688037049947837
0.1,0.6698721556438328,0.7178683385579937,0.682706220869837
1.0,0.6597611737711275,0.7147335423197492,0.7079347473598623
10.0,0.4541024178911041,0.561128526645768,0.575231124807396


##### Logistic Regression with Stemmed Text

In [0]:
# Create LogisticRegression object
lr = LogisticRegression(labelCol="Manual_Combined", featuresCol="combined_vectors_stem")

# Create MulticlassClassificationEvaluator
lrevaluator = MulticlassClassificationEvaluator(labelCol="Manual_Combined", metricName="weightedFMeasure")

# set hyperparameters
params_lr = [0.0001, 0.001, 0.005, 0.01, 0.1, 1.0, 10.0]


# create list variables to store evaluation metrics
f1_scores_lr =[]
recall_scores_lr = []
precision_scores_lr = []


for i in params_lr:
    # Creating the  ParamGrid for Cross Validation
    lrparamGrid = (ParamGridBuilder()
                   .addGrid(lr.regParam, [i])
                   .build())


    # Create and apply 5 fold cross-validation
    lrcv = CrossValidator(estimator = lr,
                        estimatorParamMaps = lrparamGrid,
                        evaluator = lrevaluator,
                        numFolds = 5)
    lrcvModel = lrcv.fit(training)

    prediction = lrcvModel.transform(testing).cache()

    # Evaluating the models using defined metrics (F1 score, precision and recall)
    evaluator_f1_lr = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedFMeasure")
    f1_score = evaluator_f1_lr.evaluate(prediction)
    f1_scores_lr.append(f1_score)

    evaluator_recall_lr = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedRecall")
    recall_score = evaluator_recall_lr.evaluate(prediction, {evaluator_recall_lr.metricName: "weightedRecall"})
    recall_scores_lr.append(recall_score)

    evaluator_precision_lr = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedPrecision")
    precision_score = evaluator_precision_lr.evaluate(prediction, {evaluator_precision_lr.metricName: "weightedPrecision"})
    precision_scores_lr.append(precision_score)

In [0]:
# Storing the metrics in a dataframe and displaying the dataframe
lr_total_result = []
for i in range(len(params_lr)):
    templist = []
    templist.append(params_lr[i])
    templist.append(f1_scores_lr[i])
    templist.append(recall_scores_lr[i])
    templist.append(precision_scores_lr[i])
    lr_total_result.append(templist)

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField("Alpha", StringType(), True),
    StructField("f1_score", StringType(), True),
    StructField("recall_score", StringType(), True),
    StructField("precision_score", StringType(), True)
])

df_lr_results = spark.createDataFrame(data=lr_total_result, schema=schema)
df_lr_results.display()

Alpha,f1_score,recall_score,precision_score
0.0001,0.625734683329322,0.658307210031348,0.6312730582111626
0.001,0.6247058907157429,0.6614420062695925,0.6264637919017191
0.005,0.6346392681187767,0.677115987460815,0.6410321441556098
0.01,0.640840291625489,0.6833855799373041,0.6475013228557814
0.1,0.6581156254694192,0.7021943573667712,0.6633075437934373
1.0,0.6479756218056709,0.6990595611285266,0.6606867113196228
10.0,0.4642657781551972,0.567398119122257,0.5804479738221059


The best F1-score was 0.6699 for regParam = 0.1 using lemmatized text. The corresponding recall and precision were 0.7179 and 0.6827 respectively. Using stemmed text, the best F1-score was 0.6581 for regParam = 0.1. The corresponding recall and precision were 0.7022 and 0.6633 respectively.

Overall, the model performed better with review data featurized using lemmatization than with Snowball stemming.
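
The best row of the stemmed-text table above can be picked programmatically rather than by eye. The following is a minimal sketch, assuming the rows are copied by hand from the displayed output; the names `rows` and `best` are illustrative and are not part of the notebook.

```python
# Rows copied from the stemmed-text results table above:
# (smoothing, f1_score, recall_score, precision_score)
rows = [
    (0.0001, 0.625734683329322, 0.658307210031348, 0.6312730582111626),
    (0.001, 0.6247058907157429, 0.6614420062695925, 0.6264637919017191),
    (0.005, 0.6346392681187767, 0.677115987460815, 0.6410321441556098),
    (0.01, 0.640840291625489, 0.6833855799373041, 0.6475013228557814),
    (0.1, 0.6581156254694192, 0.7021943573667712, 0.6633075437934373),
    (1.0, 0.6479756218056709, 0.6990595611285266, 0.6606867113196228),
    (10.0, 0.4642657781551972, 0.567398119122257, 0.5804479738221059),
]

# Select the row with the highest F1-score (second column)
best = max(rows, key=lambda row: row[1])
smoothing, f1, recall, precision = best

print(f"smoothing={smoothing}: F1={f1:.4f}, recall={recall:.4f}, precision={precision:.4f}")
```

Selecting on F1 alone mirrors how the results are summarized in the text; ties or a different priority (for example recall) would need a different `key`.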

### RandomForestClassifier
A RandomForestClassifier model was investigated for lemmatized and stemmed vectors. Two hyperparameters were tuned, using the following values:

- numTrees = [5, 10, 30, 50]
- maxDepth = [5, 10, 15, 20, 30]

Because we have 3 classes of labels (positive, neutral and negative), we use the MulticlassClassificationEvaluator.
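
To make the multiclass metric concrete, the sketch below computes a support-weighted F1 in plain Python: each class's F1 is averaged with a weight equal to that class's share of the true labels, which is how the weighted F-measure is commonly defined. The function name `weighted_f1`, the label encoding (0 = negative, 1 = neutral, 2 = positive) and the toy labels are assumptions for illustration only; they are not taken from the notebook's data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's true-label count."""
    total = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    score = 0.0
    for label, n_true in support.items():
        # True positives: examples of this class that were predicted as this class
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += f1 * n_true / total
    return score


# Hypothetical labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a model that does well on the majority class can score reasonably even when a minority class is poorly predicted.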

##### RandomForestClassifier with Lemmatized Text

In [0]:
from pyspark.sql.types import IntegerType
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Create RandomForestClassifier object
rf = RandomForestClassifier(labelCol="Manual_Combined", featuresCol="combined_vectors")

# Create MulticlassClassificationEvaluator
rf_evaluator = MulticlassClassificationEvaluator(labelCol="Manual_Combined", metricName="weightedFMeasure")

# set hyperparameters
params_rf_trees = [5, 10, 30, 50]
params_rf_maxdepth = [5, 10, 15, 20, 30]

# create list variables to store evaluation metrics
i=0
f1_scores_rf =[]
recall_scores_rf = []
precision_scores_rf = []

#iterating through max depth
for depth in params_rf_maxdepth:
    j=0
    #iterating through number of trees
    for trees in params_rf_trees:
        # Creating the ParamGrid for Cross Validation
        rfparamGrid = (ParamGridBuilder()
                       .addGrid(rf.numTrees, [trees])
                       .addGrid(rf.maxDepth, [depth])
                       .build())


        # Create and apply 5 fold cross-validation
        rfcv = CrossValidator(estimator = rf,
                            estimatorParamMaps = rfparamGrid,
                            evaluator = rf_evaluator,
                            numFolds = 5)
        rfcvModel = rfcv.fit(training)

        prediction = rfcvModel.transform(testing).cache()

        #Evaluating F1 score
        evaluator_f1_rf = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedFMeasure")
        f1_score = evaluator_f1_rf.evaluate(prediction)
        f1_scores_rf.append(f1_score)
        
        # Evaluating the remaining metrics (recall and precision)
        # Evaluating recall score
        evaluator_recall_rf = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedRecall")
        recall_score = evaluator_recall_rf.evaluate(prediction, {evaluator_recall_rf.metricName: "weightedRecall"})
        recall_scores_rf.append(recall_score)

        #evaluating precision score
        evaluator_precision_rf = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedPrecision")
        precision_score = evaluator_precision_rf.evaluate(prediction, {evaluator_precision_rf.metricName: "weightedPrecision"})
        precision_scores_rf.append(precision_score)



In [0]:
# Printing the metrics for each (maxDepth, numTrees) combination
x = 0
for depth in params_rf_maxdepth:
    for trees in params_rf_trees:
        print("\n\nMax depth is : " + str(depth))
        print("Number of trees is: " + str(trees))
        print("f1 is: " + str(f1_scores_rf[x]))
        print("recall is: " + str(recall_scores_rf[x]))
        print("precision is: " + str(precision_scores_rf[x]))
        x += 1



Max depth is : 5
Number of trees is: 5
f1 is: 0.4364316358451198
recall is: 0.5397923875432525
precision is: 0.4907203523120478


Max depth is : 5
Number of trees is: 10
f1 is: 0.41653858427414653
recall is: 0.5432525951557093
precision is: 0.5466877287999599


Max depth is : 5
Number of trees is: 30
f1 is: 0.36269080687303384
recall is: 0.5190311418685121
precision is: 0.48415483017559136


Max depth is : 5
Number of trees is: 50
f1 is: 0.3620261998277072
recall is: 0.5190311418685121
precision is: 0.5928046972016927


Max depth is : 10
Number of trees is: 5
f1 is: 0.4746413685076859
recall is: 0.5501730103806228
precision is: 0.4806128696939924


Max depth is : 10
Number of trees is: 10
f1 is: 0.5099986683663995
recall is: 0.5916955017301038
precision is: 0.5925236447520185


Max depth is : 10
Number of trees is: 30
f1 is: 0.45034320171058295
recall is: 0.560553633217993
precision is: 0.5097579212059032


Max depth is : 10
Number of trees is: 50
f1 is: 0.4473206770784625
recall is:

##### RandomForestClassifier with Stemmed Text

In [0]:
from pyspark.sql.types import IntegerType
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator


# Create RandomForestClassifier object
rf = RandomForestClassifier(labelCol="Manual_Combined", featuresCol="combined_vectors_stem")

# Create MulticlassClassificationEvaluator
rf_evaluator = MulticlassClassificationEvaluator(labelCol="Manual_Combined", metricName="weightedFMeasure")

# set hyperparameters
params_rf_trees1 = [5, 10, 30, 50]
params_rf_maxdepth1 = [5, 10, 15, 20, 30]

# create list variables to store evaluation metrics
i=0
f1_scores_rf1 =[]
recall_scores_rf1 = []
precision_scores_rf1 = []

#iterating through max depth
for depth in params_rf_maxdepth1:
    j=0
    #iterating through number of trees
    for trees in params_rf_trees1:
        # Creating the ParamGrid for Cross Validation
        rfparamGrid = (ParamGridBuilder()
                       .addGrid(rf.numTrees, [trees])
                       .addGrid(rf.maxDepth, [depth])
                       .build())


        # Create and apply 5 fold cross-validation
        rfcv = CrossValidator(estimator = rf,
                            estimatorParamMaps = rfparamGrid,
                            evaluator = rf_evaluator,
                            numFolds = 5)
        rfcvModel = rfcv.fit(training)

        prediction = rfcvModel.transform(testing).cache()
        
        # Evaluating the models using the defined metrics (F1 score, recall and precision)
        #Evaluating F1 score
        evaluator_f1_rf = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedFMeasure")
        f1_score = evaluator_f1_rf.evaluate(prediction)
        f1_scores_rf1.append(f1_score)
        
        #evaluating recall score
        evaluator_recall_rf = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedRecall")
        recall_score = evaluator_recall_rf.evaluate(prediction, {evaluator_recall_rf.metricName: "weightedRecall"})
        recall_scores_rf1.append(recall_score)

        #evaluating precision score
        evaluator_precision_rf = MulticlassClassificationEvaluator(labelCol="Manual_Combined", predictionCol="prediction", metricName="weightedPrecision")
        precision_score = evaluator_precision_rf.evaluate(prediction, {evaluator_precision_rf.metricName: "weightedPrecision"})
        precision_scores_rf1.append(precision_score)



In [0]:
# Printing the metrics for each (maxDepth, numTrees) combination
x = 0
for depth in params_rf_maxdepth1:
    for trees in params_rf_trees1:
        print("\n\nMax depth is : " + str(depth))
        print("Number of trees is: " + str(trees))
        print("f1 is: " + str(f1_scores_rf1[x]))
        print("recall is: " + str(recall_scores_rf1[x]))
        print("precision is: " + str(precision_scores_rf1[x]))
        x += 1



Max depth is : 5
Number of trees is: 5
f1 is: 0.45850004749518114
recall is: 0.552901023890785
precision is: 0.4590273604481081


Max depth is : 5
Number of trees is: 10
f1 is: 0.4536799824607166
recall is: 0.5631399317406144
precision is: 0.6514716809604637


Max depth is : 5
Number of trees is: 30
f1 is: 0.46156528340828673
recall is: 0.5733788395904437
precision is: 0.5702539214231471


Max depth is : 5
Number of trees is: 50
f1 is: 0.45232717660443006
recall is: 0.5699658703071673
precision is: 0.5867004293735549


Max depth is : 10
Number of trees is: 5
f1 is: 0.5149920711858278
recall is: 0.590443686006826
precision is: 0.4880174075447132


Max depth is : 10
Number of trees is: 10
f1 is: 0.5454026447879204
recall is: 0.6177474402730376
precision is: 0.693188282138794


Max depth is : 10
Number of trees is: 30
f1 is: 0.5555769122456392
recall is: 0.6382252559726962
precision is: 0.5820253040698609


Max depth is : 10
Number of trees is: 50
f1 is: 0.5486077252288856
recall is: 0.

Below are the best-performing hyperparameters for lemmatized text:
- Hyperparameters
  - Max depth is : 30
  - Number of trees is: 50
- Metric
  - f1 is: 0.6055
  - recall is: 0.6712
  - precision is: 0.5974

Below are the best-performing hyperparameters for stemmed text:
- Hyperparameters
  - Max depth is : 30
  - Number of trees is: 50
- Metric
  - f1 is: 0.6315
  - recall is: 0.7031
  - precision is: 0.5822

Overall, the model performed better with review data featurized using Snowball stemming than with lemmatization.
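
The metric lists in the RandomForestClassifier cells are flat, filled in loop order (maxDepth outer, numTrees inner), so each position has to be mapped back to its hyperparameter pair before a best combination can be chosen. The sketch below shows one way this could be done. It uses only a subset of the grid, with F1 values rounded from the first lemmatized outputs printed earlier, so the pair it selects is the best of that subset and not the overall best reported above. The names `grid`, `results` and `best_params` are illustrative.

```python
from itertools import product

# Subset of the hyperparameter grid, for illustration only
max_depths = [5, 10]
num_trees = [5, 10, 30]

# F1-scores in loop order (depth outer, trees inner), rounded to 4 decimals
# from the lemmatized-text output printed in the notebook
f1_scores = [0.4364, 0.4165, 0.3627, 0.4746, 0.5100, 0.4503]

# product() yields pairs in the same order as the nested for-loops
grid = list(product(max_depths, num_trees))
results = dict(zip(grid, f1_scores))

# Pick the (maxDepth, numTrees) pair with the highest F1-score
best_params = max(results, key=results.get)
best_depth, best_trees = best_params

print(f"Best of subset: maxDepth={best_depth}, numTrees={best_trees}, F1={results[best_params]}")
```

Keying the scores by the parameter pair avoids the manual counter used when printing and makes it harder to misalign a score with its settings.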

### Conclusion

Across all models, the Naive Bayes model outperformed all other models, with an F1-score of 0.6639 for regParam = 0.5 using lemmatized text. The corresponding recall and precision were 0.6928 and 0.65203 respectively.

For both Naive Bayes and LogisticRegression, models trained on review data featurized using lemmatization outperformed those trained on data featurized using stemming.

It is worth noting that random seeds were not set when running the models. As such, subsequent runs will yield slightly different values.
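
A minimal sketch of how seeds could be fixed for the random forest runs, assuming the `seed` parameter accepted by `RandomForestClassifier`, `CrossValidator` and `DataFrame.randomSplit`. The value 42, the names `SEED`, `rf_kwargs` and `cv_kwargs`, and the `df` and split ratio in the comments are hypothetical and do not come from the notebook. The Spark calls are shown as comments because they need an active Spark session.

```python
# Illustrative seed value; any fixed integer would do
SEED = 42

# Keyword arguments for the random forest, with a fixed seed
rf_kwargs = {
    "labelCol": "Manual_Combined",
    "featuresCol": "combined_vectors",
    "seed": SEED,
}

# Keyword arguments for cross-validation, so that fold assignment is repeatable
cv_kwargs = {"numFolds": 5, "seed": SEED}

# Intended use inside the notebook (requires an active Spark session):
#   rf = RandomForestClassifier(**rf_kwargs)
#   rfcv = CrossValidator(estimator=rf, estimatorParamMaps=rfparamGrid,
#                         evaluator=rf_evaluator, **cv_kwargs)
#   train, test = df.randomSplit([0.8, 0.2], seed=SEED)  # hypothetical df and ratio
```

Even with fixed seeds, results may still vary if the data partitioning changes between runs, so this should be treated as reducing run-to-run variation and not as a guarantee of identical numbers.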