# Word Cloud Project

This notebook will create a "word cloud" from a text file, which can be very useful for understanding the emphasis on certain words within a resume.  You will need to provide a simple text file using the `_upload` function.  For resumes, it's best to include only the paragraphs/job descriptions (titles should be self-explanatory).  The `calculate_frequencies` function will remove punctuation, skip words that are not made up entirely of letters, ignore uninteresting or irrelevant words, and count the frequency of each remaining word in a dictionary.  The `wordcloud` module will then generate a word cloud image from that dictionary.

## Installs and Imports

Run the following cells to perform all the installs and imports for your word cloud script and uploader widget.  It may take a minute for all of this to run, and there will be a lot of output messages.  You should get this as the final output (it might be highlighted in red):
<br><br>
**Enabling notebook extension fileupload/extension...**
<br>
**- Validating: <font color =green>OK</font>**

### Installs

In [None]:
!pip install wordcloud
!pip install fileupload
!pip install ipywidgets
!jupyter nbextension install --py --user fileupload
!jupyter nbextension enable --py fileupload

**IMPORTANT!** If this was your first time running the above cell containing the installs, you will need to save and restart the notebook. Save your work, then under the File menu above, select Close and Halt. This will close the notebook.  When the notebook has completely shut down (grayed out), reopen it.

### Imports

In [None]:
import wordcloud
import numpy as np
from matplotlib import pyplot as plt
from IPython.display import display
import fileupload
import io
import string
import sys
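
If an import fails after the restart, it can help to check which packages the kernel can actually find. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are hypothetical names introduced here for illustration, not part of the notebook.

```python
import importlib.util

# Packages the notebook imports; adjust the list if you change the Imports cell.
REQUIRED = ["wordcloud", "numpy", "matplotlib", "IPython", "fileupload"]

def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All required packages were found.")
```

Any name reported as not found likely needs the Installs cell to be run again, followed by a restart.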

## File Upload

To upload your text file, you will need an uploader widget.  Run the following cell that contains all the code for a custom uploader widget. Once you run this cell, a "Browse" button should appear below it. Click this button and navigate the window to locate your saved text file.

In [None]:
def _upload():

    _upload_widget = fileupload.FileUploadWidget()

    def _cb(change):
        global file_contents
        data = change['owner'].data
        filename = change['owner'].filename
        # decode the uploaded bytes into text for later processing
        file_contents = data.decode('utf-8')
        # report the upload size in kB (1 kB = 1024 bytes)
        print('Uploaded `{}` ({:.2f} kB)'.format(filename, len(data) / 2 ** 10))

    _upload_widget.observe(_cb, names='data')
    display(_upload_widget)

_upload()
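
The decoding step inside the widget callback can be tried on its own, without the widget. The sketch below assumes the uploaded data arrives as UTF-8 encoded bytes; `decode_upload` is a hypothetical helper name used only for this illustration, and it measures the size from the raw bytes rather than from the decoded text.

```python
def decode_upload(data, encoding="utf-8"):
    """Decode uploaded bytes and report the size in kB (1 kB = 1024 bytes here)."""
    text = data.decode(encoding)
    size_kb = len(data) / 2 ** 10
    return text, size_kb

# Stand-in for the bytes a real upload would provide.
sample = "Managed a team of five engineers.".encode("utf-8")
text, size_kb = decode_upload(sample)
print("Decoded {} characters ({:.2f} kB)".format(len(text), size_kb))
```

If your file is saved in a different encoding, decoding as UTF-8 may raise a `UnicodeDecodeError`; re-saving the file as UTF-8 is usually the simplest fix.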

## Calculate the Word Frequencies

The function in the cell below iterates through the words in *file_contents*, removes punctuation, and counts the frequency of each word.  Because punctuation is removed before counting, a contraction is treated as its own word (i.e. "you've" will become "youve").  Be sure to update the `uninteresting_words` list to fit your needs (for example, by adding "Monday" or "January").

In [None]:
def calculate_frequencies(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text.
    # You can add additional words at the end of the uninteresting_words list if you want.
    # raw string so the backslash is kept literally and is not read as an escape sequence
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    uninteresting_words = ['a', 'all', 'am', 'an', 'and', 'any', 'are', 'as', 'at', 'be', 'been', 'being', 'both', 'but', \
                           'by', 'can', 'did', 'do', 'does', 'dozens', 'each', 'few', 'from', 'for', 'had', 'has', 'have', \
                           'he', 'her', 'here', 'hers', 'him', 'his', 'how', 'i', 'if', 'in', 'is', 'it', 'its', 'just', \
                           'me', 'more', 'my', 'no', 'nor', 'of', 'on', 'or', 'our', 'ours', 'she', 'some', 'such', 'that', \
                           'the', 'their', 'them', 'they', 'this', 'through', 'to', 'too', 'very', 'was', 'we', 'were', \
                           'what', 'when', 'where', 'which', 'who', 'whom', 'will', 'with', 'within', 'you', 'your', \
                           'yours', 'youve']
    
    allwords = {}
    for word in file_contents.split():
        # iterate through word to strip punctuation
        no_punct = ""
        for letter in word:
            if letter not in punctuations:
                no_punct = no_punct + letter
        
        word = no_punct.lower()

        # skip empty tokens, words containing non-letters, and uninteresting words
        if not word.isalpha() or word in uninteresting_words:
            continue

        # add or increment the value in the dictionary
        allwords[word] = allwords.get(word, 0) + 1
            
    # generate the word cloud image from the allwords frequencies and return it as an array
    cloud = wordcloud.WordCloud(background_color ='white', 
                max_words=1000,
                min_font_size = 10)
    cloud.generate_from_frequencies(allwords)
    return cloud.to_array()
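
The counting step can also be written more compactly with the standard library. This is a sketch rather than a drop-in replacement: `count_words` is a hypothetical name, it takes the uninteresting words as an argument, and it uses `string.punctuation`, which covers a slightly larger set of characters than the `punctuations` string in the cell above.

```python
import string
from collections import Counter

def count_words(text, stop_words):
    """Count alphabetic, non-stop words in text, ignoring case and punctuation."""
    # str.maketrans with a third argument maps each listed character to None (deleted).
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = Counter()
    for token in text.split():
        word = token.translate(strip_punct).lower()
        # Keep only words made up entirely of letters that are not stop words.
        if word.isalpha() and word not in stop_words:
            counts[word] += 1
    return dict(counts)

sample = "You've led projects; led teams, and led 3 launches."
print(count_words(sample, {"and", "youve"}))
# prints {'led': 3, 'projects': 1, 'teams': 1, 'launches': 1}
```

Here "You've" becomes "youve" and is dropped as a stop word, and "3" is dropped because it is not alphabetic.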

## Generate the Word Cloud

Your word cloud image should appear after running the cell below.

In [None]:
# Display your wordcloud image

myimage = calculate_frequencies(file_contents)
plt.figure(figsize = (12, 12), facecolor = None) 
plt.imshow(myimage, interpolation = 'nearest')
plt.axis('off')
plt.show()
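
If the image is dominated by words you did not expect, listing the most frequent words can show what to add to the uninteresting words list. The sketch below assumes you have a dictionary of word counts available; `top_words` and the `example` dictionary are hypothetical and only illustrate the idea.

```python
def top_words(frequencies, n=10):
    """Return the n most frequent (word, count) pairs, highest count first."""
    # Ties are broken alphabetically so the order is stable.
    return sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))[:n]

example = {"managed": 4, "python": 7, "budget": 2, "team": 4}
for word, count in top_words(example, n=3):
    print("{:<10} {}".format(word, count))
```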

You can rerun this Notebook by clicking "Kernel", then selecting "Restart & Clear Output".  You will have to run the "Installs" cell only once, but you will need to run the "Imports" cell every time you start/restart the Notebook.