<img width="8%" alt="Matplotlib.png" src="https://raw.githubusercontent.com/jupyter-naas/awesome-notebooks/master/.github/assets/logos/Matplotlib.png" style="border-radius: 15%">

# Matplotlib - Create content word cloud

**Tags:** #linkedin #wordcloud #content #analytics #dependency

**Author:** [Florent Ravenel](https://www.linkedin.com/in/florent-ravenel/)

**Description:** This notebook creates a word cloud from text content stored in a Google Sheets spreadsheet, using the `wordcloud` library and Matplotlib for display. The size of each word in the resulting image reflects how often it appears in the text, which makes high-frequency words easy to spot.

## Input

### Import libraries

In [None]:
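try:
    from wordcloud import WordCloud
except ImportError:
    # Install the package only when it is missing, then retry the import
    !pip install wordcloud --user
    from wordcloud import WordCloud
import matplotlib.pyplot as plt
import pandas as pd
import os
from datetime import date, datetime, timedelta
import naas  # needed for naas.asset.add in the Output section
import naas_data_product
from naas_drivers import gsheet
# Note: `pload` and `remove_emojis` are used below without an explicit import;
# they are assumed to be made available by `naas_data_product`.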

### Setup variables
**Inputs**
- `entity_dir`: Entity directory.
- `entity_name`: Entity name.
- `input_dir`: Input directory to retrieve file from.
- `input_file`: Input file.
- `spreadsheet_url`: Google Sheets spreadsheet URL.
- `sheet_name`: Google Sheets sheet name.
- `column_text`: Name of the column that contains the text used to build the word cloud.

**Outputs**
- `output_dir`: Directory where the output files are saved.
- `output_name`: Name of the output image file, without extension.

In [None]:
# Inputs
entity_dir = pload(os.path.join(naas_data_product.OUTPUTS_PATH, "entities", "0"), "entity_dir")
entity_name = pload(os.path.join(naas_data_product.OUTPUTS_PATH, "entities", "0"), "entity_name")
input_dir = os.path.join(entity_dir, "opendata-engine", date.today().isoformat())
input_file = "opendata"
spreadsheet_url = pload(os.path.join(naas_data_product.OUTPUTS_PATH, "entities", "0"), "abi_spreadsheet")
sheet_name = "EVENTS"
column_text = "CONTENT"

# Outputs
output_dir = os.path.join(entity_dir, "opendata-engine", date.today().isoformat())
os.makedirs(output_dir, exist_ok=True)
output_name = "opendata_worldcloud"
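
The output location is built from the entity directory, a fixed `opendata-engine` folder, and today's date in ISO format. The sketch below illustrates the resulting path under stated assumptions: the temporary `example-entity` directory and the `build_image_path` helper are hypothetical stand-ins, since the real `entity_dir` is loaded with `pload`.

```python
import os
import tempfile
from datetime import date

# Hypothetical stand-in for the entity directory that the notebook loads with pload
entity_dir = os.path.join(tempfile.gettempdir(), "example-entity")


def build_image_path(base_dir, output_name, day=None):
    """Create the dated output directory and return the PNG path inside it."""
    day = day or date.today()
    output_dir = os.path.join(base_dir, "opendata-engine", day.isoformat())
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f"{output_name}.png")


# A fixed date is used here so the resulting path is predictable
image_path = build_image_path(entity_dir, "opendata_worldcloud", date(2024, 1, 15))
print(image_path)
```

Because the date is part of the path, each daily run writes to its own folder instead of overwriting the previous image.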

## Model

### Set outputs

In [None]:
image_output = os.path.join(output_dir, f"{output_name}.png")

### Get data from spreadsheet

In [None]:
df_init = gsheet.connect(spreadsheet_url).get(sheet_name=sheet_name)
print("- Data fetched:", len(df_init))
df_init.head(1)
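
Rows with an empty or missing `CONTENT` cell add nothing to the cloud, and depending on the pandas version a cast to `str` can turn missing values into literal strings such as `nan`. The sketch below shows one possible way to filter such rows out. The sample DataFrame and the `clean_text_column` helper are hypothetical and are not part of the notebook or of `naas_drivers`.

```python
import pandas as pd

# Hypothetical sample standing in for the sheet returned by gsheet
df_sample = pd.DataFrame({"CONTENT": ["first post", None, "", "  second post "]})


def clean_text_column(df, column):
    """Return the non-empty, stripped text values of `column` as a list of strings."""
    values = df[column].dropna().astype(str).str.strip()
    return values[values != ""].tolist()


texts = clean_text_column(df_sample, "CONTENT")
print(" ".join(texts))
```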

### Extract text from CONTENT

In [None]:
# Join every value of the text column into a single space-separated string
text = " ".join(df_init[column_text].astype(str))
# text
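
Since word size in the cloud tracks word frequency, a quick frequency count can serve as a preview of which words are likely to dominate. The sketch below uses only the standard library; the `top_words` helper and the sample text are hypothetical, and the simple `\w+` tokenization is an approximation that does not reproduce the tokenization or stopword filtering applied by `WordCloud`.

```python
import re
from collections import Counter


def top_words(text, n=5):
    """Return the n most frequent lowercase words in text with their counts."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text, not taken from the spreadsheet
sample_text = "data product data engine open data engine"
print(top_words(sample_text, n=2))
```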

### Create word cloud

In [None]:
# Build the word cloud from the text, with emojis stripped by the remove_emojis helper
wordcloud = WordCloud(
    collocations=False,
    background_color="white",
    width=1200,
    height=600
).generate(remove_emojis(text))

# Display the generated image
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
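
The cell above calls `remove_emojis`, which is not defined in this notebook. For readers who run the code without that helper, the sketch below shows one possible regex-based replacement. It is an assumption about what such a helper might do, not the actual implementation, and the Unicode ranges cover common emoji blocks rather than every emoji. The name `remove_emojis_sketch` is hypothetical.

```python
import re

# Character class covering common emoji code points (not exhaustive)
_EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator letters used for flags
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0000FE0F"             # variation selector-16
    "\U0000200D"             # zero-width joiner
    "]+"
)


def remove_emojis_sketch(text):
    """Replace emoji runs with a space, then collapse repeated whitespace."""
    without_emojis = _EMOJI_PATTERN.sub(" ", text)
    return " ".join(without_emojis.split())


cleaned = remove_emojis_sketch("Great launch 🚀 today 🌍")
print(cleaned)
```

Replacing emojis with a space rather than an empty string keeps neighboring words from being glued together when an emoji sits between them.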

## Output

### Save and share your graph as an image

In [None]:
# Save your image in PNG
wordcloud.to_file(image_output)

# Share output with naas
naas.asset.add(image_output)

# -> Uncomment the line below to remove your asset
# naas.asset.delete(image_output)