<a href="https://colab.research.google.com/github/tproffen/ORCSGirlsPython/blob/master/IntroPython/Activity%203.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://github.com/tproffen/ORCSGirlsPython/blob/master/Images/Logo.png?raw=1" width="10%" align="right" hspace="50">

# Introduction to Python

## Activity 3 - Creating word clouds

### As our final *Python* project, we want to create word clouds from articles downloaded from __[Wikipedia](https://www.wikipedia.org)__.

This sounds complicated, and if you had to program it without any extensions, it would be. Luckily, one of the best features of *Python* is its large developer community, which has written extensions for nearly everything you can think of. If you want to solve a problem, search the web and you will very likely find a *Python* extension that makes your life easier.

Remember to use `shift+enter` to execute the code in a cell. **Since in some cases code depends on cells above, make sure all cells above the one you are working on have been executed.**

### Loading extensions

We load three extensions: one to retrieve Wikipedia articles (`wikipedia`), one to create the word cloud (`wordcloud`) and one to plot the image (`matplotlib`). Make sure you execute both cells below using `shift+enter`.

In [None]:
!pip install wordcloud
!pip install wikipedia

In [None]:
import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt 

%matplotlib inline

### Step 1: Retrieve the Wikipedia article text

We will be using the `wikipedia` Python extension, which we installed and imported in the cells above. Simply execute the next cell using `shift+enter` to retrieve the Wikipedia entry for Oak Ridge, TN.

In [None]:
article = wikipedia.page("Oak Ridge, TN")
print(article.url)

Look at the code above. The `import wikipedia` line in the earlier cell made the Wikipedia commands available in this notebook. Here, the Wikipedia article for **Oak Ridge, TN** is retrieved and stored in the variable `article`. This variable has several components: `article.url` gives you the link to the Wikipedia page (click on it and check), and `article.content`, which we use later, contains the actual text for the word cloud.
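Since `article.content` is an ordinary Python string, the usual string tools work on it. The small sketch below uses a short stand-in text (an assumption, so that it runs without downloading anything); in the notebook you could set `text = article.content` instead.

```python
# Stand-in for article.content so this runs without an internet connection.
# In the notebook you could write: text = article.content
text = "Oak Ridge is a city in Tennessee. Oak Ridge was established in 1942."

char_count = len(text)          # number of characters in the text
word_count = len(text.split())  # number of words (split at whitespace)
preview = text[:33]             # the first 33 characters

print(char_count)
print(word_count)
print(preview)
```

Peeking at the first few characters like this is a quick way to check that you retrieved the article you expected.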

### Step 2: Creating the word cloud

Next we use the extension `wordcloud` to create the word cloud from the article we retrieved (remember the text is in `article.content`). Run the next cell by pressing `shift+enter`.

In [None]:
wc = WordCloud().generate(article.content)
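A word cloud draws frequent words larger, so at its heart is a word count. The sketch below illustrates that idea with `collections.Counter` on a short stand-in text. It is a simplified illustration, not the actual code inside `wordcloud`, which does additional processing.

```python
from collections import Counter

# Stand-in text; in the notebook you could use article.content instead.
text = "the city of Oak Ridge is the secret city of the Manhattan Project"

# Lowercase everything so that "The" and "the" count as the same word.
words = text.lower().split()

# Count how often each word appears.
counts = Counter(words)

# The most frequent words would be drawn largest in a word cloud.
top_three = counts.most_common(3)
print(top_three)
```

Notice that the most frequent word here is not a very interesting one, which is why word clouds usually leave such common words out.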

### Step 3: Plot the word cloud on screen

As the final step, we need to display the word cloud image we just created and stored in the variable `wc`. Run the next cell to finally see your word cloud.

In [None]:
plt.imshow(wc)

**Awesome - you made your first word cloud in Python**

## Customize and make it your own

The cell at the bottom of this notebook contains all the code to create and plot the word cloud in one place. We do not need to repeat the `import` commands, since the extensions are already loaded. Execute the cell as is and see if you get the same picture. Then read about customizing and keep modifying and executing the cell until you have your perfect word cloud. You can save the image by right-clicking on it with the mouse and selecting save image.

### Changing plot appearance

The following two lines will make the image larger and remove the axes labels. Add them just before `plt.imshow(wc)`.

> `plt.figure(figsize=(15, 15))`

> `plt.axis("off")`

Add these and see what changes. The next improvement makes the image smoother by changing `plt.imshow(wc)` to

> `plt.imshow(wc, interpolation="bilinear")`

Try it.

### Adding options to the word cloud command

In the line `wc = WordCloud().generate(article.content)` you can add options between the first `()`. You can combine several options by separating them with commas. Here are some examples. Modify the command and observe the differences in the resulting word cloud.

> `wc = WordCloud(max_words=15).generate(article.content)`

> `wc = WordCloud(width=1000, height=1000).generate(article.content)`

> `wc = WordCloud(width=1000, height=1000, background_color="white").generate(article.content)`

> `wc = WordCloud(width=1000, height=1000, background_color="white", stopwords=STOPWORDS).generate(article.content)`
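The `stopwords` option tells the word cloud which words to leave out, typically very common words such as "the" or "of". The sketch below shows the idea with plain Python on a short stand-in text. The set `my_stopwords` is a tiny, made-up list for illustration only; the real `STOPWORDS` set is much longer.

```python
from collections import Counter

# Stand-in text; in the notebook you could use article.content instead.
text = "the city of Oak Ridge is the secret city of the Manhattan Project"

# A tiny, made-up stop word list (an assumption for this example).
my_stopwords = {"the", "of", "is"}

# Keep only the words that are not in the stop word list.
words = [word for word in text.lower().split() if word not in my_stopwords]

# Count the remaining words; these are the ones a word cloud would show.
counts = Counter(words)
print(counts.most_common(3))
```

Because `STOPWORDS` is a Python set, it should be possible to add your own words to it in the notebook, for example with `stopwords=STOPWORDS | {"oak", "ridge"}`. Try it and check whether those words disappear from your word cloud.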

### Change the article

Finally, you can change **Oak Ridge, TN** in `article = wikipedia.page("Oak Ridge, TN")`. Try, for example, **Mildred Dresselhaus**. If your search term does not match exactly one article (no article is found, or several articles fit), you will see an error message.
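If you would rather see a friendly hint than a long error message, you can catch the error with `try` and `except`. The sketch below is one possible way to do this, not the only one. The function name `load_article` is made up for this example, and passing the module in as `wiki` is just a choice to keep the sketch self-contained. It relies on the `DisambiguationError` and `PageError` exceptions of the `wikipedia` extension.

```python
def load_article(wiki, term):
    """Return the Wikipedia page for term, or None if it cannot be loaded.

    wiki is expected to be the imported wikipedia module.
    """
    try:
        return wiki.page(term)
    except wiki.exceptions.DisambiguationError as error:
        # The term fits several articles; error.options lists the candidates.
        print("Please be more specific. Did you mean one of:", error.options[:5])
    except wiki.exceptions.PageError:
        # No article matches the term at all.
        print("No article found for:", term)
    return None
```

In the notebook this could be called as `article = load_article(wikipedia, "Mildred Dresselhaus")`. If the result is `None`, pick a different search term before creating the word cloud.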

## Have fun.

In [None]:
# As you go on, modify the code in this cell to customize your word cloud

article = wikipedia.page("Oak Ridge High School, TN")
print(article.url)

wc = WordCloud(width=1000, height=1000, background_color="white", stopwords=STOPWORDS).generate(article.content)

plt.figure(figsize=(10, 10))
plt.axis("off")
plt.imshow(wc)

#### Remember you can save your word cloud image by right-clicking on it and selecting *save image*.