## Lemmatize Guide Documents

In this section, we lemmatize the guide document (found at [this site](https://medium.com/@m.g.jasper/10-essential-elements-for-movie-reviews-921230d7fb1e)).
Each section of the guide document is lemmatized and written to its own file.

In [5]:
import shared_functions.cleaning as cleaning
import glob
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

# get original filenames of guide docs (sorted, since glob returns them in arbitrary order)
guide_filenames = sorted(glob.glob("../data/topics/raw_topics/*.txt"))

# read each file, append content to list
guide_documents = []
for file in guide_filenames:
    with open(file, encoding='utf-8') as f:
        guide_documents.append(' '.join(f.read().splitlines()))
    
# lemmatize each document 
lemm_guide_docs = [cleaning.lemmatize(cleaning.clean_text(doc), lemmatizer) for doc in guide_documents]

# for each lemmatized document, write to the matching path under lemm_topics
for filename, lemm_doc in zip(guide_filenames, lemm_guide_docs):
    new_filepath = '../data/topics/lemm_topics' + filename.split('raw_topics')[1]
    with open(new_filepath, 'w', encoding='utf-8') as f:
        f.write(lemm_doc)
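
The `shared_functions.cleaning` module is not shown in this notebook, so the following is only a minimal sketch of what a helper like `cleaning.lemmatize` might do, assuming it splits the text on whitespace and lemmatizes each token. The names `lemmatize_text` and `SuffixStripper` are hypothetical; `SuffixStripper` is a stub that stands in for `WordNetLemmatizer` so the example runs without the WordNet data.

```python
class SuffixStripper:
    """Stub lemmatizer for illustration only (not part of NLTK).

    It exposes the same lemmatize(word) method as WordNetLemmatizer,
    but simply drops a trailing 's' from words longer than three letters.
    """

    def lemmatize(self, word):
        if len(word) > 3 and word.endswith('s'):
            return word[:-1]
        return word


def lemmatize_text(text, lemmatizer):
    """Hypothetical stand-in for cleaning.lemmatize.

    Assumes the real helper tokenizes on whitespace, lemmatizes each
    token, and returns the tokens joined by single spaces.
    """
    return ' '.join(lemmatizer.lemmatize(token) for token in text.split())


print(lemmatize_text('camera angles and long shots', SuffixStripper()))
# prints: camera angle and long shot
```

Passing the lemmatizer in as an argument, as the cell above does, means a single `WordNetLemmatizer` instance can be reused across all documents.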

## Lemmatize Guide Words

Here, I lemmatize the guide words, which were manually extracted from the guide document. The lemmatized
forms of the guide words are written to a file, one topic per line.

In [17]:
import os 

basic_guide_words = [['acting', 'actor', 'actress', 'character', 'performance', 'convincing', 'multidimensional', 'authentic', 'portray'], # acting
                    ['attraction', 'premise', 'entertainment', 'interesting', 'pitch', 'amuse', 'enjoy', 'fun'], # attraction
                    ['cinematography', 'visual', 'lighting', 'setting', 'wardrobe', 'camera', 'angles', 'view', 'frame', 'shot', 'aesthetic'], # cinematography
                    ['dialogue', 'storytelling', 'context', 'story', 'monologue', 'speech', 'express'], # dialogue
                    ['directing', 'style', 'execution', 'vision', 'creativity', 'perfect'], # directing
                    ['editing', 'effects', 'tone', 'vfx', 'sfx', 'animation', 'cgi'], # editing and effects
                    ['original', 'innovative', 'best', 'beyond', 'amazing', 'memorable', 'unique', 'special', 'experience'], # it factor
                    ['plot', 'story', 'arc', 'plausibility', 'structure', 'world', 'pace'], # plot 
                    ['sound', 'music', 'harmony', 'mood', 'song', 'soul', 'volume', 'mix'], # sound and music
                    ['theme', 'identity', 'intrigue', 'message', 'powerful', 'meaning', 'emotional', 'thoughtful', 'bond']] # theme

# lemmatize all guide words 
lemm_basic_guide_words = [cleaning.lemmatize(' '.join(row), lemmatizer) for row in basic_guide_words]

# write lemmatized guide words to text file, one topic per line
# ('w' mode overwrites any existing file, so no explicit removal is needed)
lemm_guide_words_filepath = '../data/topics/lemm_guide_words.txt'

with open(lemm_guide_words_filepath, 'w', encoding='utf-8') as f:
    for topic_words in lemm_basic_guide_words:
        f.write(topic_words + '\n')
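
As a rough sketch of how the file written above could be read back later, the snippet below maps each line to a topic label. It assumes each line holds one topic's words separated by spaces, which matches how the rows are joined before lemmatizing. The function name `read_guide_words` and the `TOPICS` list are hypothetical; the labels are copied from the comments beside each row of `basic_guide_words`. The demo writes a throwaway file rather than touching `../data/topics/lemm_guide_words.txt`.

```python
import os
import tempfile

# Topic labels, in the same order as the rows of basic_guide_words.
TOPICS = ['acting', 'attraction', 'cinematography', 'dialogue', 'directing',
          'editing and effects', 'it factor', 'plot', 'sound and music', 'theme']


def read_guide_words(filepath, topics):
    """Return a dict mapping each topic label to its list of guide words."""
    with open(filepath, encoding='utf-8') as f:
        # Skip blank lines; split each remaining line on whitespace.
        rows = [line.split() for line in f if line.strip()]
    if len(rows) != len(topics):
        raise ValueError(f'expected {len(topics)} lines, found {len(rows)}')
    return dict(zip(topics, rows))


# Demo on a temporary two-line file with made-up contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'lemm_guide_words.txt')
    with open(demo_path, 'w', encoding='utf-8') as f:
        f.write('acting actor actress\nattraction premise\n')
    demo = read_guide_words(demo_path, TOPICS[:2])

print(demo['attraction'])
# prints: ['attraction', 'premise']
```

The line-count check is there because the topic labels are matched to lines purely by position, so a missing or extra line would silently shift every label.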
