# **Women's Economic Autonomy**

### *An AI-driven approach to achieving greater gender equality in Mexico City*




For an introduction to Jupyter Dash, see [Introducing JupyterDash](https://medium.com/plotly/introducing-jupyterdash-811f1f57c02).

In [None]:
# install dependencies
!pip install fiona==1.8.13
!pip install jupyter-dash
!pip install dash-bootstrap-components
!pip install dash-bootstrap-templates
!pip install dash-extensions
# load libraries
import pandas as pd
import plotly.express as px
from jupyter_dash import JupyterDash
from dash import dcc, html
from dash.dependencies import Input, Output
import dash_bootstrap_components as dbc
from dash_bootstrap_templates import load_figure_template
import base64
from io import BytesIO
from wordcloud import WordCloud
import matplotlib.pyplot as plt
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE
import gensim
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS
from gensim import corpora
import functools
from matplotlib.colors import LinearSegmentedColormap

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting fiona==1.8.13
  Downloading Fiona-1.8.13.tar.gz (1.2 MB)
  Preparing metadata (setup.py) ... done
Collecting click<8,>=4.0 (from fiona==1.8.13)
  Downloading click-7.1.2-py2.py3-none-any.whl (82 kB)
Collecting cligj>=0.5 (from fiona==1.8.13)
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting click-plugins>=1.0 (from fiona==1.8.13)
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting munch (from fiona==1.8.13)
  Downloading munch-3.0.0-py2.py3-none-any.whl (10 kB)
Building wheels for collected packages: fiona
  error: subprocess-exited-with-error

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


In [None]:
# Generate random data
import random
import numpy as np

# Define municipality names
municipalities = [
    "Álvaro Obregón",
    "Azcapotzalco",
    "Benito Juárez",
    "Coyoacán",
    "Cuajimalpa",
    "Cuauhtémoc",
    "Gustavo A. Madero",
    "Iztacalco",
    "Iztapalapa",
    "La Magdalena Contreras",
    "Miguel Hidalgo",
    "Milpa Alta",
    "Tláhuac",
    "Tlalpan",
    "Venustiano Carranza",
    "Xochimilco",
]

# Define age categories
age_categories = ["16-19", "20-24", "25-34", "35-44", "45-54", "55-64", ">65"]

# Define responses
responses = [
    "better public transport",
    "more financial support from government",
    "increased job opportunities",
    "improved access to job training programs",
    "greater investment in small business development",
    "expansion of apprenticeship and internship programs",
    "creation of tax incentives for companies that hire women",
    "more flexible work arrangements",
    "better childcare options for working parents",
    "increased access to affordable healthcare",
    "higher minimum wage",
    "more opportunities for remote work",
    "improved infrastructure for cycling and walking",
    "investment in renewable energy industries",
    "better access to affordable housing",
    "reduction of income inequality",
    "expanded educational opportunities",
    "increased funding for public education",
    "protection against sexual harassment in the workplace",
    "reduced work hours for new mothers",
    "sick leave for caretakers of family members",
    "equal access to promotions and leadership positions",
    "paid paternity leave for fathers",
    "financial support for single mothers",
    "workplace mentoring and coaching programs",
    "affordable and accessible elder care",
    "gender-neutral bathrooms",
    "health and wellness programs for employees",
    "increased access to mental health services",
    "support for survivors of domestic violence",
    "affordable and safe transportation for night shifts",
    "subsidized child care for low-income families",
    "paid time off for community service",
    "mandatory implicit bias training for hiring managers",
    "community-led job training and apprenticeship programs",
    "guaranteed minimum wage for informal workers",
    "legal aid and support for workplace discrimination cases",
    "flexible scheduling for caregivers of disabled family members",
    "access to affordable and nutritious food options",
    "free or low-cost continuing education courses",
    "increased availability of job-sharing opportunities",
    "affordable and accessible menstrual products in the workplace",
    "support for women-owned businesses",
    "gender-responsive budgeting in government agencies",
    "safe and accessible public restrooms",
    "training and resources for domestic workers",
    "better enforcement of labor laws and protections",
    "support for women in STEM fields",
    "protection against pregnancy discrimination",
    "affordable and accessible child care for all",
    "equal access to opportunities for women in the informal sector",
    "resources and support for women entrepreneurs",
    "free or low-cost legal advice for workers",
    "safe and confidential reporting mechanisms for workplace harassment",
    "expanded affordable housing options for women and families",
    "equal access to paid leave for all parents",
    "culturally sensitive workplace policies and practices",
    "incentives for companies to hire and retain women",
    "increased availability of job training in high-growth industries",
    "access to affordable and safe transportation for domestic workers",
    "parenting support and education programs",
    "guaranteed sick leave for all employees",
    "protection against gender-based violence in the workplace",
    "increased opportunities for skills training and professional development",
    "protection of women's rights in the workplace",
    "implementation of gender-neutral hiring practices",
    "greater support for women entrepreneurs",
    "introduction of paid sick leave",
    "increased support for women in male-dominated industries",
    "implementation of flexible scheduling policies",
    "improved access to affordable and high-quality education",
    "creation of workplace support groups for women",
    "promotion of gender diversity in the workplace",
    "introduction of job shadowing and mentorship programs",
    "increased funding for women's health research",
    "improved access to affordable and high-quality reproductive healthcare",
    "provision of lactation support and breastfeeding education programs",
    "creation of women-only training programs",
    "greater support for women pursuing advanced degrees",
    "implementation of diversity and inclusion training programs",
    "improved access to affordable and high-quality child and eldercare",
    "promotion of work-life integration",
    "introduction of flextime policies",
    "increased support for women in non-traditional jobs",
    "implementation of job rotation programs",
    "provision of on-site healthcare facilities for working women",
    "improved access to affordable and high-quality dental care",
    "creation of a supportive and inclusive work environment",
    "promotion of women's entrepreneurship",
    "introduction of job retraining programs",
    "increased representation of women on boards of directors",
    "implementation of fair and transparent pay policies",
    "greater support for women in the gig economy",
    "expansion of telecommuting options",
    "improved access to lactation rooms and nursing facilities",
    "creation of women's leadership development programs",
    "greater support for women returning to work after having children",
    "provision of affordable and accessible eldercare services for working women"
]

# Define categories
categories = [
    "Transport",
    "Financial support",
    "Facilities",
    "Job training",
    "Small business",
    "Apprenticeships",
    "Tax incentives",
    "Equal pay",
    "Flexible work",
    "Remote work"
]

# Define latitude and longitude coordinates
latitude = [
    19.363889,
    19.488056,
    19.398611,
    19.349444,
    19.349444,
    19.431389,
    19.479722,
    19.408611,
    19.355556,
    19.338333,
    19.424722,
    19.129722,
    19.283611,
    19.286667,
    19.442778,
    19.243889,
]

longitude = [
    -99.165833,
    -99.182222,
    -99.169167,
    -99.165278,
    -99.253889,
    -99.137778,
    -99.095278,
    -99.083056,
    -99.011111,
    -99.265833,
    -99.227778,
    -99.202222,
    -99.046944,
    -99.162222,
    -99.109722,
    -99.133333,
]

# Collect the rows in a list and build the dataframe in one step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for i in range(100):
    mun = random.choice(municipalities)
    index = municipalities.index(mun)
    rows.append(
        {
            "municipality": mun,
            "lat": latitude[index],
            "long": longitude[index],
            "gender": random.choice(["Woman", "Prefer not to say", "Diverse"]),
            "age": random.choice(age_categories),
            "employed": random.choice(["yes", "no"]),
            "social security": random.choice(["yes", "no"]),
            "response": random.choice(responses),
            "category": random.choice(categories),
            "confidence": round(random.uniform(0, 1), 2),
        }
    )

df = pd.DataFrame(
    rows,
    columns=[
        "municipality",
        "lat",
        "long",
        "gender",
        "age",
        "employed",
        "social security",
        "response",
        "category",
        "confidence",
    ],
)

# Show the dataframe
df.head()


| | municipality | lat | long | gender | age | employed | social security | response | category | confidence |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Xochimilco | 19.243889 | -99.133333 | Diverse | >65 | no | no | gender-neutral bathrooms | Flexible work | 0.32 |
| 1 | Azcapotzalco | 19.488056 | -99.182222 | Prefer not to say | >65 | no | yes | health and wellness programs for employees | Facilities | 0.86 |
| 2 | Coyoacán | 19.349444 | -99.165278 | Diverse | 16-19 | no | no | improved access to affordable and high-quality... | Facilities | 0.7 |
| 3 | Iztapalapa | 19.355556 | -99.011111 | Woman | 20-24 | no | yes | implementation of gender-neutral hiring practices | Tax incentives | 0.13 |
| 4 | La Magdalena Contreras | 19.338333 | -99.265833 | Prefer not to say | 35-44 | no | yes | creation of women-only training programs | Remote work | 0.17 |


# Dashboard 

Build a Dash app on the dummy dataset that shows a histogram of responses by age group (x-axis), colored by gender. A dropdown filters the data by municipality.

Filters (dropdowns):

- facility
- category (barriers / enablers), per municipality (to guide policies)
- profession
- municipality (map)
- words (word cloud)
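
The municipality filter described above can be sketched without any Dash machinery. The following is a minimal illustration, assuming the `"all municipalities"` sentinel value and the `municipality`, `age`, and `gender` column names used in this notebook; the helper names `filter_by_municipality` and `age_gender_counts` and the `toy` dataframe are hypothetical and exist only for this example.

```python
import pandas as pd

# Sentinel value the dropdown uses for "no filter" (assumed to match the app)
ALL_MUNICIPALITIES = "all municipalities"


def filter_by_municipality(data: pd.DataFrame, municipality: str) -> pd.DataFrame:
    """Return every row for the sentinel, otherwise only the chosen municipality."""
    if municipality == ALL_MUNICIPALITIES:
        return data
    return data[data["municipality"] == municipality]


def age_gender_counts(data: pd.DataFrame, municipality: str) -> pd.DataFrame:
    """Count responses per (age, gender) pair, i.e. the numbers behind the histogram."""
    subset = filter_by_municipality(data, municipality)
    return subset.groupby(["age", "gender"]).size().reset_index(name="count")


# Hypothetical toy data with the same column names as the dummy dataset
toy = pd.DataFrame(
    {
        "municipality": ["Tlalpan", "Tlalpan", "Coyoacán"],
        "age": ["20-24", "20-24", "25-34"],
        "gender": ["Woman", "Woman", "Diverse"],
    }
)

print(age_gender_counts(toy, "Tlalpan"))
```

In the app, a callback would apply the same filter to `df` and pass the subset to `px.histogram`, which does the counting itself.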

In [None]:
import pandas as pd
import numpy as np
import random
import nltk
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE
import gensim
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS
from gensim import corpora
import plotly.express as px

# Define a set of stopwords
stop_words = set(stopwords.words('english'))

# Preprocess the text data: tokenize, drop stopwords and short tokens, lemmatize.
# Bigrams are not built here; CountVectorizer(ngram_range=(1, 2)) below adds them.
def preprocess(text):
    result = []
    lemmatizer = WordNetLemmatizer()
    for token in simple_preprocess(text):
        if token not in STOPWORDS and len(token) > 3 and token not in stop_words:
            result.append(lemmatizer.lemmatize(token, pos='v'))
    return ' '.join(result)

df['response'] = df['response'].apply(preprocess)

# Convert text data to a document-term matrix
cv = CountVectorizer(max_df=0.8, min_df=2, stop_words='english', ngram_range=(1, 2))
doc_term_matrix = cv.fit_transform(df['response'])

# Extract feature names and create a DataFrame of terms (words and bigrams) and their frequencies
feature_names = cv.get_feature_names_out()
total_counts = doc_term_matrix.sum(axis=0).tolist()[0]
bigram_freq = [(word, total_counts[idx]) for word, idx in cv.vocabulary_.items()]
bigram_freq = sorted(bigram_freq, key=lambda x: x[1], reverse=True)
bigram_df = pd.DataFrame(bigram_freq, columns=['bigram', 'count'])

# Plot term counts as horizontal bars, most frequent term at the top
fig = px.histogram(bigram_df, x='count', y='bigram', title='Words and Bigrams in Descending Order')
fig.update_layout(xaxis_title='Count', yaxis={'title': 'Bigram', 'categoryorder': 'total ascending'})
fig.show()

# Fit the LDA model
lda_model = LatentDirichletAllocation(n_components=6, random_state=42)
lda_model.fit(doc_term_matrix)

# Transform the document-term matrix to a document-topic matrix
lda_output = lda_model.transform(doc_term_matrix)

# Assign the most probable topic to each document
df['topic'] = lda_output.argmax(axis=1)

# Collect the top 5 words of each topic
terms = cv.get_feature_names_out()
topics = {}
for idx, topic in enumerate(lda_model.components_):
    topics["Topic #%d" % (idx)] = [" ".join([terms[i]
                        for i in topic.argsort()[:-5 - 1:-1]])]

topics_df = pd.DataFrame.from_dict(topics, orient='index', columns=['Words'])
topics_df = topics_df.reset_index().rename(columns={'index': 'Topic'})

fig = px.bar(topics_df, x='Topic', y='Words', title='Topic Distribution', color='Topic')
fig.show()

# Visualize the clusters in a 2D space using t-SNE algorithm
tsne_model = TSNE(n_components=2, verbose=1, random_state=42, n_iter=500)
tsne_lda = tsne_model.fit_transform(lda_output)

# Plot the clusters using Plotly scatter plot
df['x'] = tsne_lda[:, 0]
df['y'] = tsne_lda[:, 1]

# Create a new column with the top 5 words for each topic
df['topic_words'] = df['topic'].apply(lambda x: ' '.join(topics["Topic #%d" % (x)]))
df['topic'] = df['topic'].astype('category')
df.head()

# # Define the colors for each topic
# fig = px.scatter(df, x="x", y="y", color="topic", 
#                  hover_name='response')

# fig.show()


In [None]:
############# ANALYSIS #####################################
############################################################

# Define a set of stopwords
stop_words = set(stopwords.words('english'))

# Generate the word frequency table and remove stopwords
df_words = (
    df["response"]
    .str.lower()
    .str.split(expand=True)
    .stack()
    .reset_index(level=1, drop=True)
    .reset_index(name="word")
)
# Join each word back to its source row: the "index" column holds the original row label
df_words = pd.merge(df_words, df, left_on="index", right_index=True)
df_words['word'] = df_words['word'].apply(lambda x: x.strip())
df_words = df_words[~df_words['word'].isin(stop_words)]
word_frequency = (
    df_words.groupby(["municipality", "word"]).size().reset_index(name="count")
)

# Preprocess the text data: tokenize, drop stopwords and short tokens, lemmatize.
# Bigrams are not built here; CountVectorizer(ngram_range=(1, 2)) below adds them.
def preprocess(text):
    result = []
    lemmatizer = WordNetLemmatizer()
    for token in simple_preprocess(text):
        if token not in STOPWORDS and len(token) > 3 and token not in stop_words:
            result.append(lemmatizer.lemmatize(token, pos='v'))
    return ' '.join(result)

df['response'] = df['response'].apply(preprocess)

# Convert text data to a document-term matrix
cv = CountVectorizer(max_df=0.8, min_df=2, stop_words='english', ngram_range=(1, 2))
doc_term_matrix = cv.fit_transform(df['response'])

# Extract feature names and create a DataFrame of bigrams and their frequencies
feature_names = cv.get_feature_names_out()
total_counts = doc_term_matrix.sum(axis=0).tolist()[0]
bigram_freq = [(word, total_counts[idx]) for word, idx in cv.vocabulary_.items()]
bigram_freq = sorted(bigram_freq, key=lambda x: x[1], reverse=True)
bigram_df = pd.DataFrame(bigram_freq, columns=['bigram', 'count'])

# Keep only the terms that occur at least `threshold` times
threshold = 5
bigram_df = bigram_df[bigram_df['count'] >= threshold]

# Fit the LDA model
lda_model = LatentDirichletAllocation(n_components=5, random_state=42)
lda_model.fit(doc_term_matrix)

# Transform the document-term matrix to a document-topic matrix
lda_output = lda_model.transform(doc_term_matrix)

# Assign the most probable topic to each document
df['topic'] = lda_output.argmax(axis=1)

# Visualize the clusters in a 2D space using t-SNE algorithm
tsne_model = TSNE(n_components=2, verbose=1, random_state=42, n_iter=500)
tsne_lda = tsne_model.fit_transform(lda_output)

# Plot the clusters using Plotly scatter plot
df['x'] = tsne_lda[:, 0]
df['y'] = tsne_lda[:, 1]

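# Create a new column with the top 5 words for each topic, taken from the
# 5-topic model fitted in this cell
terms = cv.get_feature_names_out()
topics = {
    "Topic #%d" % idx: [" ".join(terms[i] for i in topic.argsort()[:-5 - 1:-1])]
    for idx, topic in enumerate(lda_model.components_)
}
df['topic_words'] = df['topic'].apply(lambda x: ' '.join(topics["Topic #%d" % (x)]))
df['topic'] = df['topic'].astype('category')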

# WORD CLOUD 
# Define a function to generate the word cloud for a given municipality or all municipalities
@functools.lru_cache(maxsize=128)
def generate_wordcloud(municipality):
    if municipality == "all municipalities":
        # Combine the words and their counts for all municipalities into a dictionary
        word_counts = dict(word_frequency.groupby("word")["count"].sum())
    else:
        # Filter the word frequency table for the selected municipality
        words = word_frequency[word_frequency["municipality"] == municipality]["word"]
        counts = word_frequency[word_frequency["municipality"] == municipality]["count"]

        # Combine the words and their counts into a dictionary
        word_counts = dict(zip(words, counts))

    # Generate the word cloud using the WordCloud library
    wordcloud = WordCloud(
        width=1500, height=800, background_color="white", colormap=spacelab_cmap
    ).generate_from_frequencies(word_counts)

    return wordcloud

########### LAYOUT AND TEMPLATE ###############################
###############################################################

# loads the "spacelab" template and sets it as the default
load_figure_template("spacelab")

# Define the spacelab color palette
spacelab_palette = [
    "#F9573E",
    "#CF5E53",
    "#B06163",
    "#956472",
    "#80677C",
    "#6C6887",
    "#536D93",
]

# Define the colormap
spacelab_cmap = LinearSegmentedColormap.from_list("spacelab", spacelab_palette)

# Subset unique values for dropdowns
municipality = df.municipality.unique()
gender = df.gender.unique()
category = df.category.unique()

# Count responses per municipality, keeping its coordinates
df_count = df.groupby(["municipality", "long", "lat"])["response"].count().reset_index()

# Import logos
logo1 = "https://raw.githubusercontent.com/marianigcr/dashboard_chatbot/main/logos/Logo_CDMX.png"
logo2 = "https://raw.githubusercontent.com/marianigcr/dashboard_chatbot/main/logos/WRA_logo.png"
logo3 = "https://raw.githubusercontent.com/marianigcr/dashboard_chatbot/main/logos/giz_logo.png"

# Styles for the sidebar content
SIDEBAR_STYLE = {
    "position": "fixed",
    "top": 0,
    "left": 0,
    "bottom": 0,
    "width": "20rem",
    "padding": "20px 10px",
    "background-color": "#f8f9fa",
    "box-shadow": "2px 0px 4px rgba(0, 0, 0, 0.1)",
    "border-radius": "0px 10px 10px 0px",
}

# Styles for the main content
CONTENT_STYLE = {"margin-left": "20rem", "margin-right": "1rem", "padding": "1rem 1rem"}

SIDEBAR = html.Div(
    [
        html.H3("Empowering Women* in Mexico City"),
        html.Hr(className="my-2"),
        html.P(
            "An AI-Driven Approach to greater Economic Independence and Gender Equality",
            className="lead",
            style={"font-size": "16px"},
        ),
        html.Br(),
        dbc.Nav(
            [
                html.Label("Select municipality"),
                dcc.Dropdown(
                    id="municipality-dropdown",
                    options=[
                        {"label": "All municipalities", "value": "all municipalities"}
                    ]
                    + [
                        {"label": municipality, "value": municipality}
                        for municipality in df["municipality"].unique()
                    ],
                    value="all municipalities",
                    clearable=False,
                ),
                html.Br(),
                html.Label("Select gender"),
                dcc.Dropdown(
                    id="gender-dropdown",
                    options=gender,
                    value="Woman",
                    clearable=False,
                ),
                html.Br(),
                html.Label("Select category"),
                dcc.Dropdown(
                    id="category-dropdown",
                    options=category,
                    value="Transport",
                    clearable=False,
                ),
            ],
            vertical=True,
            pills=True,
        ),
        html.Div(
            [
                html.Img(src=logo1, height="30", style={"margin-right": "10px"}),
                html.Img(src=logo2, height="30", style={"margin-right": "10px"}),
                html.Img(src=logo3, height="30", style={"margin-right": "10px"}),
            ],
            style={
                "position": "absolute",
                "bottom": 10,
                "left": 0,
                "right": 0,
                "margin-bottom": "10px",
                "display": "flex",
                "justify-content": "center",
            },
        ),
    ],
    style=SIDEBAR_STYLE,
)


first_card = dbc.Card(
    dbc.CardBody(
        [
            dcc.Graph(id="scatter-map"),
        ]
    ),
    style={"box-shadow": "0px 0px 5px 0px rgba(0, 0, 0, 0.1)"},
)

second_card = dbc.Card(
    dbc.CardBody(
        [
            dcc.Graph(id="graph-age"),
        ]
    ),
    style={"box-shadow": "0px 0px 5px 0px rgba(0, 0, 0, 0.1)"},
)

third_card = dbc.Card(
    dbc.CardBody(
        [
            dcc.Graph(id="graph-category"),
        ]
    ),
    style={"box-shadow": "0px 0px 5px 0px rgba(0, 0, 0, 0.1)", "height": "520px"},
)

fourth_card = dbc.Card(
    dbc.CardBody(
        [
            dbc.Tabs(
                [
                    dbc.Tab(label="Word Cloud", tab_id="word_cloud"),
                    dbc.Tab(label="Bar Chart", tab_id="bar_chart"),
                    dbc.Tab(label="LDA plot", tab_id="lda_chart")

                ],
                id="tabs",
                active_tab="word_cloud",
            ),
            html.Div(id="tab-content"),
        ]
    ),
    style={"box-shadow": "0px 0px 5px 0px rgba(0, 0, 0, 0.1)", "height": "520px"},
)

BODY = html.Div(
    children=[
        dbc.Row(
            [
                dbc.Col(first_card, width=6),
                dbc.Col(second_card, width=6),
            ],
            style=CONTENT_STYLE,
        ),
        dbc.Row(
            [
                dbc.Col(third_card, width=6),
                dbc.Col(fourth_card, width=6),
            ],
            style=CONTENT_STYLE,
        ),
    ]
)

###################### APP #############################
########################################################

# Build app and import bootstrap stylesheet
app = JupyterDash(
    __name__,
    suppress_callback_exceptions=True,
    external_stylesheets=[dbc.themes.SPACELAB],
)

# Define the layout of the app
app.layout = html.Div(children=[SIDEBAR, BODY])

# Build callback that returns scatter map
@app.callback(
    Output("scatter-map", "figure"), [Input("municipality-dropdown", "value")]
)
def update_scatter_map(municipality):
    if municipality == "all municipalities":
        filtered_df = df_count
    else:
        filtered_df = df_count[df_count["municipality"] == municipality]

    # Center the map on the mean coordinates of the displayed points
    lat = filtered_df["lat"].mean()
    lon = filtered_df["long"].mean()
    zoom = 8.5

    scatter_map = px.scatter_mapbox(
        filtered_df,
        lat="lat",
        lon="long",
        color="response",
        size="response",
        hover_name="municipality",
        zoom=zoom,
        center={"lat": lat, "lon": lon},
        mapbox_style="carto-positron",
    )
    scatter_map.update_layout(
        title="Responses for {}".format(municipality),
        margin=dict(l=40, r=40, t=60, b=40),
    )

    return scatter_map


# Build callback that returns two output graphs
@app.callback(
    Output("graph-age", "figure"),
    Output("graph-category", "figure"),
    Input("municipality-dropdown", "value"),
)
def update_graph(municipality):
    if municipality == "all municipalities":
        mask = df["municipality"] != ""
    else:
        mask = df["municipality"] == municipality
    # Build age stacked bar chart
    fig_age = px.histogram(df[mask], x="age", color="gender", barmode="stack")
    # Update the graph so that x axis values are in desc order
    fig_age.update_layout(
        title="Responses by Gender and Age for {}".format(municipality),
        barmode="stack",
        xaxis={"categoryorder": "total descending"},
    )

    # Build category bar chart
    fig_cat = px.histogram(df[mask], y="category")
    # Update the graph so that y axis values are in desc order
    fig_cat.update_layout(
        title="Responses by Category for {}".format(municipality),
        yaxis={"categoryorder": "total ascending"},
        height=500
    )

    return fig_age, fig_cat

@app.callback(
    Output("tab-content", "children"),
    [Input("tabs", "active_tab"), 
     Input("municipality-dropdown", "value")],
)
def render_tab_content(active_tab, municipality):
    if active_tab == "word_cloud":
        # Generate the word cloud for the selected municipality or all municipalities
        wordcloud = generate_wordcloud(municipality)

        # Use Plotly Express to display the word cloud image in the app
        fig = px.imshow(wordcloud.to_array())
        fig.update_layout(
            title="Word Cloud for {}".format(municipality),
            margin=dict(l=40, r=40, t=60, b=40),
            xaxis=dict(visible=False),
            yaxis=dict(visible=False),
            hovermode=False,
        )

        return dcc.Graph(id="wordcloud", figure=fig)

    elif active_tab == "bar_chart":
        # bigram_df holds counts over all responses; it is not broken down
        # by municipality, so this chart is the same for every selection
        hist_word = px.histogram(
            bigram_df,
            x="count",
            y="bigram",
            orientation="h",
            title="Most frequent Bigrams (all municipalities)",
            labels={"bigram": "Bigram", "count": "Count"},
        )

        hist_word.update_layout(yaxis={"categoryorder": "total ascending", 
                                       "tickmode": "linear",
                                       "dtick": 1,
                                       "tickfont": {"size": 10},
                                       },
                                margin=dict(l=80, r=20, t=60, b=20),
        )
        return dcc.Graph(id="bar-chart", figure=hist_word)

    elif active_tab == "lda_chart":
        if municipality == "all municipalities":
            mask = df["municipality"] != ""
        else:
            mask = df["municipality"] == municipality

        lda_plot = px.scatter(
            df[mask],
            x="x",
            y="y",
            color="topic",
            hover_name='response'            
            )
        
        lda_plot.update_layout(
            title="Topic Keywords for {}".format(municipality)        
        )
        return dcc.Graph(id="lda_chart", figure=lda_plot)

# Run app
app.run_server(port=8050)


Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead.


The default initialization in TSNE will change from 'random' to 'pca' in 1.2.


The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.



[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Indexed 100 samples in 0.000s...
[t-SNE] Computed neighbors for 100 samples in 0.006s...
[t-SNE] Computed conditional probabilities for sample 100 / 100
[t-SNE] Mean sigma: 0.561171
[t-SNE] KL divergence after 250 iterations with early exaggeration: 55.292812
[t-SNE] KL divergence after 500 iterations: 0.055229
Dash app running on: