# Monopoly Topic Modeling: Word Clouds

In our period of focus, the Restoration Period in Europe, *monopolies* become an important concept. But what is actually being discussed when the term "monopoly" is used? As a first step, we create word clouds of the most common words in texts in which "monopoly" (or one of its variant spellings) appears. These can serve as a basis for finding other relevant research directions. The code below uses pandas, NLTK, and the Python wordcloud package.

*Code adapted from work by the 2021 Data+ Team: Rubenstein Library's Card Catalogue*: https://github.com/hsmith221/Data--Rubenstein-Library-Card-Catalog/blob/main/word_cloud.ipynb

In [None]:
# Install relevant packages, skip if already installed
!pip install wordcloud nltk

In [6]:
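# Import needed packages
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

from collections import Counter

# Download NLTK data used below (skipped if already present);
# newer NLTK releases need "punkt_tab" for word_tokenize, older ones "punkt"
nltk.download("stopwords")
nltk.download("punkt")
nltk.download("punkt_tab")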

In [None]:
# Select dataframe
df = pd.read_csv(...)
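The selection step leaves the CSV path elided and does not show how rows mentioning "monopoly" are picked out. Below is a minimal sketch of one way to do it, assuming the text lives in a column named `Text` (as in the preprocessing cell) and assuming that a single case-insensitive regular expression on the stem `monopol` is an acceptable proxy for the variant spellings. The sample rows are invented for illustration and are not drawn from the corpus.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded with pd.read_csv(...)
df = pd.DataFrame({
    "Text": [
        "A petition against the monopolie of salt",
        "Grievances concerning Monopolies and patents",
        "An act for the better ordering of the militia",
        None,
    ]
})

# Assumption: every spelling of interest starts with the stem "monopol"
# (monopoly, monopolie, monopolies, monopolist, ...); \b anchors it at a word start
pattern = r"\bmonopol\w*"

# case=False makes the match case-insensitive; na=False treats missing text as no match
mask = df["Text"].str.contains(pattern, case=False, na=False, regex=True)
monopoly_df = df[mask]

print(len(monopoly_df))  # 2
```

Reassigning `df = monopoly_df` would let the later cells run unchanged on the filtered rows. A stem-based pattern is deliberately broad; an explicit list of spellings joined with `|` would be stricter.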

In [None]:
# Preprocess texts

# A set gives fast membership tests
stop_words = set(stopwords.words('english'))

# Combine full texts of all relevant rows
# CHANGE TEXT TO COLUMN NAME
full_text = " ".join(str(row) for row in df.Text)

# Tokenize and lowercase, so capitalized stopwords such as "The" are also removed
tokenized_text = [word.lower() for word in word_tokenize(full_text)]

# Remove unwanted tokens: punctuation and numbers, stopwords, and words of two or fewer letters
filtered_text = [word for word in tokenized_text if word.isalpha() and word not in stop_words and len(word) > 2]

final_text = " ".join(filtered_text)
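The preprocessing cell depends on downloaded NLTK data. The sketch below applies the same filtering rules without NLTK, which makes it easy to check why lowercasing matters. It assumes a tiny hand-written stopword set (a hypothetical stand-in for NLTK's English list) and a regex tokenizer (a stand-in for `word_tokenize`); the helper name `preprocess` is illustrative only.

```python
import re

# Hypothetical, tiny stand-in for nltk's English stopword list
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "for"}

def preprocess(text):
    """Lowercase, split into runs of letters, then drop stopwords and short words."""
    # [^\W\d_]+ matches runs of Unicode letters only, so punctuation and digits are dropped
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

print(preprocess("The King granted a Monopoly of Salt to the Company in 1661."))
# ['king', 'granted', 'monopoly', 'salt', 'company']
```

Without the `.lower()` call, the capitalized "The" would not match the lowercase stopword and would survive into the counts.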

In [None]:
# Count the most common words
# Count the token list, not the joined string (which would count individual characters)
word_counts = Counter(filtered_text)
print(word_counts.most_common(50))
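`Counter` tallies whatever items it iterates over, so passing it a string counts characters rather than words. The short sketch below, using made-up tokens, shows the difference.

```python
from collections import Counter

tokens = ["monopoly", "patent", "monopoly", "crown"]

# Counting a list of tokens tallies whole words
word_counts = Counter(tokens)
print(word_counts.most_common(2))  # [('monopoly', 2), ('patent', 1)]

# Counting a joined string tallies individual characters instead
char_counts = Counter(" ".join(tokens))
print(sorted(char_counts)[:3])  # single characters, e.g. ' ', 'a', 'c'
```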

In [None]:
# General word cloud for all texts containing some form of "monopoly" in the Restoration Period
# TODO: tweak details of wc
# Stopwords were already removed above; generate_from_frequencies ignores the stopwords
# and collocations settings, which only apply when generating from raw text
word_cloud = WordCloud(background_color="white", width=3000, height=2000, max_words=500).generate_from_frequencies(word_counts)
plt.figure(figsize=(20, 10))
plt.imshow(word_cloud, interpolation="bilinear")
plt.axis("off")
plt.show()
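Because `generate_from_frequencies` skips the collocation detection that `collocations=True` would trigger on raw text, two-word phrases have to be counted separately if they are wanted. A minimal sketch, assuming plain adjacent-pair counting is an acceptable substitute for wordcloud's own collocation scoring; the helper `bigram_counts` and the sample tokens are hypothetical.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent word pairs, joined with a space so they can serve as word-cloud keys."""
    # zip pairs each token with the one that follows it
    return Counter(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))

tokens = ["east", "india", "company", "east", "india", "trade"]
pairs = bigram_counts(tokens)
print(pairs.most_common(1))  # [('east india', 2)]
```

The resulting `Counter` could be passed to `WordCloud(...).generate_from_frequencies(pairs)` in place of `word_counts`. Since the notebook joins all texts into one string, some pairs will span the boundary between two documents; counting per row and summing the counters would avoid that.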