# Meno Topic Modeling: Basic Workflow

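This notebook walks through a basic topic modeling workflow with Meno: preprocessing text, generating embeddings, discovering topics without supervision, matching documents to predefined topics, visualizing the results, and exporting them.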

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
from meno.meno import MenoTopicModeler
import plotly.express as px
import matplotlib.pyplot as plt

# Set up plotting
%matplotlib inline

## Load sample data

For this example, we'll use a small sample of 15 short insurance claim descriptions. In a real application, you would load your own text data.

In [None]:
# Sample data - in a real application, load your own data here
data = [
    "Customer's vehicle was damaged in a parking lot by a shopping cart. Front bumper has scratches.",
    "Claimant's home flooded due to heavy rain. Water damage to first floor and basement.",
    "Vehicle collided with another car at an intersection. Front-end damage and airbag deployment.",
    "Tree fell on roof during storm causing damage to shingles and gutters.",
    "Insured slipped on ice in parking lot and broke wrist requiring medical treatment.",
    "Customer's laptop was stolen from car. Window was broken to gain entry.",
    "Kitchen fire caused smoke damage throughout home. Fire started from unattended cooking.",
    "Rear-end collision at stoplight. Minor bumper damage to insured vehicle.",
    "Hail damaged roof and required full replacement of shingles.",
    "Burst pipe in bathroom caused water damage to flooring and walls.",
    "Dog bit visitor to home requiring stitches and antibiotics.",
    "Vandalism to vehicle in parking garage. Scratches on multiple panels.",
    "Cyclist hit by insured's vehicle at crosswalk. Minor injuries reported.",
    "Lightning strike caused electrical surge damaging home appliances and electronics.",
    "Fell on wet floor at grocery store resulting in back injury and ongoing physical therapy.",
]

# Convert to DataFrame
df = pd.DataFrame({"claim_text": data})
df.head()

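Before modeling, it can help to check the raw texts for empty or duplicated entries, since both can distort topic assignments on a small corpus. The sketch below is one way to do that with the standard library only; `summarize_texts` is a hypothetical helper written for this notebook, not part of Meno.

```python
from collections import Counter

def summarize_texts(texts):
    """Return basic quality stats for a list of raw text documents.

    Hypothetical helper for illustration; not part of the Meno API.
    """
    stripped = [t.strip() for t in texts]
    non_empty = [t for t in stripped if t]
    counts = Counter(non_empty)
    word_counts = [len(t.split()) for t in non_empty]
    return {
        "n_docs": len(texts),
        "n_empty": len(stripped) - len(non_empty),
        # Each extra copy of a non-empty text counts as one duplicate
        "n_duplicates": sum(c - 1 for c in counts.values() if c > 1),
        "mean_words": sum(word_counts) / len(word_counts) if word_counts else 0.0,
    }

# Made-up sample with one duplicate and one empty entry
sample = ["Hail damaged roof.", "Hail damaged roof.", "", "Burst pipe in bathroom."]
stats = summarize_texts(sample)
print(stats)
```

On the claims above, you could call `summarize_texts(data)` before building the DataFrame.
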
## Initialize Topic Modeler

We'll create a topic modeler instance using the default configuration.

In [None]:
# Create topic modeler with default configuration
modeler = MenoTopicModeler()

# Check embedding model being used
print(f"Using embedding model: {modeler.config.modeling.embeddings.model_name}")

## Preprocess Text

Now we'll preprocess the text data before modeling.

In [None]:
# Preprocess documents
processed_docs = modeler.preprocess(
    df,
    text_column="claim_text"
)

# View original and processed text
processed_docs[["text", "processed_text"]].head(3)

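To give a feel for what a preprocessing step typically does, here is a minimal sketch that lowercases text, replaces punctuation with spaces, and drops a few stop words. This is an assumption about common practice, not a description of Meno's actual pipeline; `simple_preprocess` and `STOP_WORDS` are hypothetical names introduced here.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use larger ones
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "to", "of", "and", "by", "was", "has"}

def simple_preprocess(text):
    """Lowercase, strip punctuation, and drop stop words (illustrative only)."""
    text = text.lower()
    # Replace anything that is not a letter, digit, or whitespace with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)

example = "Hail damaged roof and required full replacement of shingles."
print(simple_preprocess(example))
# hail damaged roof required full replacement shingles
```

Comparing output like this against the `processed_text` column is a quick way to see which extra steps the configured preprocessing applies.
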
## Generate Embeddings

We'll create document embeddings using the configured model.

In [None]:
# Generate embeddings
embeddings = modeler.embed_documents()

# Check the shape of the embeddings
print(f"Embeddings shape: {embeddings.shape}")

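Embeddings are useful because documents with similar meaning end up with vectors that point in similar directions. A common way to measure this is cosine similarity. The sketch below uses made-up 3-dimensional vectors in place of real embeddings, which usually have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Toy vectors standing in for document embeddings (values are made up)
doc_a = [1.0, 0.0, 1.0]
doc_b = [1.0, 0.0, 0.0]
doc_c = [0.0, 1.0, 0.0]

print(round(cosine_similarity(doc_a, doc_b), 3))  # 0.707
print(cosine_similarity(doc_a, doc_c))            # 0.0
```
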
## Unsupervised Topic Discovery

First, let's try unsupervised topic discovery using embedding clustering.

In [None]:
# Discover topics using embedding clustering
topics_df = modeler.discover_topics(
    method="embedding_cluster",
    num_topics=5  # Specify number of topics, or leave as None to use config default
)

# View the topic assignments
topics_df[["text", "topic"]].head(10)

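Clustering in embedding space groups documents whose vectors lie close together. As a rough illustration, the sketch below shows the assignment step used by centroid-based clustering: each point is labeled with its nearest centroid. Whether the `embedding_cluster` method works this way internally is an assumption; the centroids and points are made-up 2-dimensional values, and `nearest_centroid` is a hypothetical helper.

```python
import math

def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    distances = [math.dist(vec, c) for c in centroids]
    return distances.index(min(distances))

# Made-up cluster centers and points in a 2-dimensional space
centroids = [[0.0, 0.0], [5.0, 5.0]]
points = [[0.5, 0.2], [4.8, 5.1], [0.1, 0.9]]

labels = [nearest_centroid(p, centroids) for p in points]
print(labels)  # [0, 1, 0]
```
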
## Visualize Document Embeddings

Let's visualize the document embeddings colored by topic.

In [None]:
# Create UMAP visualization of documents colored by topic
fig = modeler.visualize_embeddings()
fig.show()

## Supervised Topic Matching

Now let's try supervised topic matching with predefined topics.

In [None]:
# Define topics and their descriptions (the two lists are paired by position)
predefined_topics = [
    "Vehicle Damage",
    "Water Damage",
    "Personal Injury",
    "Property Damage",
    "Theft/Vandalism"
]

topic_descriptions = [
    "Damage to vehicles from collisions, parking incidents, or natural events",
    "Damage from water including floods, leaks, and burst pipes",
    "Injuries to people including slips, falls, and accidents",
    "Damage to property from fire, storms, or other causes",
    "Theft of property or intentional damage"
]

# Match documents to predefined topics
matched_df = modeler.match_topics(
    topics=predefined_topics,
    descriptions=topic_descriptions,
    threshold=0.5  # Similarity threshold
)

# View the topic assignments
matched_df[["text", "topic", "topic_probability"]].head(10)

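One plausible reading of the similarity threshold is: compare each document vector with each topic vector, pick the best-scoring topic, and fall back to a catch-all label when the best score is below the threshold. The sketch below illustrates that idea with made-up 2-dimensional vectors. It is not Meno's implementation; `match_topic` and the `"Unknown"` fallback label are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_topic(doc_vec, topic_vecs, threshold=0.5, fallback="Unknown"):
    """Return (label, score) for the best topic, or the fallback label
    when the best similarity is below the threshold."""
    scores = {name: cosine_similarity(doc_vec, vec) for name, vec in topic_vecs.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return fallback, scores[best]

# Toy topic vectors standing in for embedded topic descriptions (made up)
topic_vecs = {"Vehicle Damage": [1.0, 0.0], "Water Damage": [0.0, 1.0]}

print(match_topic([0.9, 0.1], topic_vecs)[0])    # Vehicle Damage
print(match_topic([-1.0, -1.0], topic_vecs)[0])  # Unknown
```

Raising the threshold in a scheme like this leaves more documents unmatched, while lowering it assigns more documents to topics they only loosely resemble.
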
## Visualize Topic Distribution

Let's see the distribution of topics in our dataset.

In [None]:
# Create topic distribution visualization
fig = modeler.visualize_topic_distribution()
fig.show()

## Generate HTML Report

Finally, let's generate an HTML report with our findings.

In [None]:
# Generate HTML report
report_path = modeler.generate_report(
    output_path="insurance_claims_topics.html",
    include_interactive=True
)

print(f"Report generated at {report_path}")

## Export Results

Let's export the results to CSV and JSON formats.

In [None]:
# Export results
export_paths = modeler.export_results(
    output_path="export_results",
    formats=["csv", "json"],
    include_embeddings=False
)

print("Results exported to:")
for fmt, path in export_paths.items():
    print(f"  - {fmt}: {path}")