# Word Cloud

This project creates a "word cloud" from a text file with a short script.  The script needs to process the text: remove punctuation, ignore case, skip words that are not entirely alphabetic, skip uninteresting or irrelevant words, and count the frequency of each remaining word.  The `calculate_frequencies` function stores these counts in a dictionary, and the `wordcloud` module then generates the image from that dictionary.

As input for the script, you will need to provide a file that contains only text.
<br><br>
Next, upload your input file here so that the script can process it.  The upload requires an uploader widget.  Run the following cell to perform all the installs and imports for the word cloud script and the uploader widget.  It may take a minute to run and will produce a lot of output messages, so be patient. Once you see the following final lines of output, the code has finished executing and you can continue with the rest of this notebook.
<br><br>
**Enabling notebook extension fileupload/extension...**
<br>
**- Validating: <font color =green>OK</font>**

In [None]:
# Here are all the installs and imports you will need for your word cloud script and uploader widget

# Install the packages first so that the imports below succeed
%pip install wordcloud
%pip install fileupload
%pip install ipywidgets
!jupyter nbextension install --py --user fileupload
!jupyter nbextension enable --py fileupload

import sys
import io
import fileupload
from IPython.display import display
from matplotlib import pyplot as plt
import numpy as np
import wordcloud


That was a lot. All of the installs and imports for your word cloud script and uploader widget have been completed.
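
If you want to confirm that the packages are available before moving on, a check along these lines may help. It is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `missing_packages` and the `required` list are illustrative and not part of the original notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Top-level packages this notebook relies on (an assumed list, adjust as needed)
required = ["wordcloud", "fileupload", "ipywidgets", "matplotlib", "numpy"]
print("Missing packages:", missing_packages(required) or "none")
```

If any names are reported as missing, rerunning the install cell is usually the first thing to try.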
<br><br>
To upload your text file, run the following cell that contains all the code for a custom uploader widget. Once you run this cell, a "Browse" button should appear below it. Click this button and navigate the window to locate your saved text file.

In [None]:
# This is the uploader widget

def _upload():

    _upload_widget = fileupload.FileUploadWidget()

    def _cb(change):
        global file_contents
        decoded = io.StringIO(change['owner'].data.decode('utf-8'))
        filename = change['owner'].filename
        print('Uploaded `{}` ({:.2f} kB)'.format(
            filename, len(decoded.read()) / 2 ** 10))
        file_contents = decoded.getvalue()

    _upload_widget.observe(_cb, names='data')
    display(_upload_widget)


_upload()


The uploader widget saved the contents of your uploaded file into a string object named *file_contents* that the word cloud script can process.
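
If the uploader widget does not appear in your environment, one possible fallback is to read the file directly from disk into the same variable. The sketch below assumes the text file is UTF-8 encoded and reachable from the notebook's working directory; `read_text_file` and the path `my_text.txt` are hypothetical names used only for illustration.

```python
def read_text_file(path, encoding="utf-8"):
    """Return the full contents of a text file as a single string."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical usage; replace the path with the location of your own file:
# file_contents = read_text_file("my_text.txt")
```

Either way, the rest of the notebook only needs *file_contents* to be a string.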

The function in the cell below iterates through the words in *file_contents*, removes punctuation, and counts the frequency of each word. It ignores case, skips words that are not entirely alphabetic, and skips boring words like "and" or "the".  It then uses the `generate_from_frequencies` function to generate the word cloud!
<br><br>
The results of the iteration are stored in a dictionary before being passed to wordcloud via the `generate_from_frequencies` function.
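
To make the cleaning and counting steps concrete, here is a small sketch that applies them to a single sentence without generating an image. The helper `count_words`, the shortened punctuation string, and the two-word stop list are illustrative assumptions, not the notebook's own definitions.

```python
def count_words(text, punctuation, stop_words):
    """Count lowercase, purely alphabetic words that are not stop words."""
    counts = {}
    for raw in text.split():
        # Lowercase the token and drop any punctuation characters
        word = "".join(ch for ch in raw.lower() if ch not in punctuation)
        # Skip empty or non-alphabetic tokens and uninteresting words
        if word.isalpha() and word not in stop_words:
            counts[word] = counts.get(word, 0) + 1
    return counts


sample = "The cat saw the dog, and the dog saw 2 cats!"
print(count_words(sample, "!,.", {"the", "and"}))
# → {'cat': 1, 'saw': 2, 'dog': 2, 'cats': 1}
```

A dictionary of this shape, mapping each word to its count, is what `generate_from_frequencies` expects.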

In [7]:
def calculate_frequencies(file_contents):
    # Here is a list of punctuations and uninteresting words you can use to process your text
    punctuations = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''
    uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my",
                           "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them",
                           "their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being",
                           "have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how",
                           "all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

    # LEARNER CODE START HERE
    word_counter = {}
    finalized_text = []

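    for word in file_contents.split():
        # Lowercase the word and strip punctuation characters
        text = "".join(letter for letter in word.lower() if letter not in punctuations)
        # Keep only purely alphabetic words that are not in the uninteresting list
        if text.isalpha() and text not in uninteresting_words:
            finalized_text.append(text)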

    for word in finalized_text:
        if word not in word_counter:
            word_counter[word] = 0
        word_counter[word] += 1

    # Generate the word cloud image from the frequency dictionary
    cloud = wordcloud.WordCloud()
    cloud.generate_from_frequencies(word_counter)
    return cloud.to_array()


In [None]:
# Display your wordcloud image

myimage = calculate_frequencies(file_contents)
plt.imshow(myimage, interpolation='nearest')
plt.axis('off')
plt.show()
