# Interactive Topic Feedback in Meno

This notebook demonstrates how to use Meno's new feedback system to improve topic modeling results. The feedback system allows you to:

1. Review topic assignments for individual documents
2. Correct any misclassifications
3. Incorporate your feedback into the model
4. Export and import your feedback for collaboration

This approach doesn't require a web app and works directly in Jupyter notebooks.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import sys

# Add parent directory to path if running in the examples directory
try:
    import meno
except ImportError:
    # Add parent directory to path
    parent_dir = str(Path().absolute().parent)
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)

# Import Meno components
from meno import MenoTopicModeler
from meno.active_learning.simple_feedback import TopicFeedbackManager

## 1. Create Sample Data

First, we'll create some sample data with three distinct topic categories, plus a few deliberately ambiguous documents that span two categories.

In [None]:
def create_sample_data():
    """Create sample data for demonstration."""
    # Create sample data with 3 topic categories
    np.random.seed(42)
    
    # Technology documents
    tech_docs = [
        "Machine learning algorithms can analyze large datasets to identify patterns.",
        "Neural networks are computing systems inspired by biological brains.",
        "Deep learning uses multiple layers to progressively extract higher level features.",
        "Artificial intelligence is intelligence demonstrated by machines.",
        "Data science combines domain expertise, programming, and math to extract insights."
    ]
    
    # Healthcare documents
    health_docs = [
        "Medical imaging techniques like MRI help diagnose conditions non-invasively.",
        "Electronic health records store patient information digitally.",
        "Telemedicine allows healthcare providers to consult with patients remotely.",
        "Preventive medicine focuses on maintaining health and preventing disease.",
        "Personalized medicine tailors treatments based on individual genetic profiles."
    ]
    
    # Environmental documents
    env_docs = [
        "Climate change refers to long-term shifts in global weather patterns.",
        "Renewable energy sources like solar and wind power help reduce emissions.",
        "Conservation efforts focus on protecting biodiversity and natural habitats.",
        "Sustainable development balances economic growth with environmental protection.",
        "Electric vehicles reduce reliance on fossil fuels and lower carbon emissions."
    ]
    
    # Deliberately add some documents that could be in multiple categories
    mixed_docs = [
        "AI is being used to analyze medical images for more accurate diagnoses.",  # tech + health
        "Machine learning helps predict climate patterns and environmental changes.",  # tech + env
        "Electronic health records are stored in data centers that consume significant energy.",  # health + env
        "Wearable technology monitors health metrics and sends data to healthcare providers.",  # tech + health
        "Smart grid technology optimizes energy distribution and reduces waste."  # tech + env
    ]
    
    # Combine all documents
    all_docs = tech_docs + health_docs + env_docs + mixed_docs
    
    # Create a true category label for each document (for evaluation)
    true_categories = (["Technology"] * 5 + ["Healthcare"] * 5 + 
                       ["Environment"] * 5 + ["Mixed"] * 5)
    
    # Create a dataframe
    df = pd.DataFrame({
        "text": all_docs,
        "true_category": true_categories
    })
    
    print(f"Created sample dataset with {len(df)} documents")
    return df

# Create the sample data
df = create_sample_data()
df.head()

## 2. Run Initial Topic Modeling

Now we'll run topic modeling on our sample data. The data has three main categories (each mixed document straddles two of them), so we'll set `num_topics=3`.

In [None]:
# Initialize topic modeler
modeler = MenoTopicModeler()

# Preprocess the data
modeler.preprocess(df, text_column="text")

# Discover topics
topic_df = modeler.discover_topics(
    method="embedding_cluster",
    num_topics=3,
    random_state=42
)

# Print topic information
topic_info = modeler.get_topic_info()
print("\nDiscovered Topics:")
for _, row in topic_info.iterrows():
    print(f"Topic {row['Topic']}: {row['Name']}")

# Show document assignments
doc_info = modeler.get_document_info()
result_df = pd.DataFrame({
    "text": df["text"],
    "true_category": df["true_category"],
    "assigned_topic": doc_info["Topic"]
})

# Display the results
result_df.head(10)

## 3. Evaluate Initial Topic Assignments

Let's see how well our initial topic assignments match the true categories.

In [None]:
# Count documents by topic and true category
topic_category_counts = pd.crosstab(
    doc_info["Topic"],
    df["true_category"],
    margins=True,
    margins_name="Total"
)

# Display the counts
print("Topic-Category Distribution:")
topic_category_counts

In [None]:
# Calculate metrics: what percentage of each topic comes from each true category
topic_composition = topic_category_counts.div(topic_category_counts["Total"], axis=0) * 100
topic_composition = topic_composition.drop("Total", axis=1)

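# Plot topic composition. DataFrame.plot() creates its own figure, so the size is
# passed via figsize; a separate plt.figure() call would leave an empty figure behind.
topic_composition.drop("Total", axis=0).plot(
    kind="bar",
    figsize=(12, 6),
    title="Topic Composition (Before Feedback)"
)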
plt.ylabel("Percentage")
plt.xticks(rotation=0)
plt.legend(title="True Category")
plt.tight_layout()
plt.show()
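
The cross-tabulation above can also be condensed into a single number. The sketch below computes cluster purity, the fraction of documents that fall in the majority true category of their assigned topic. It is an illustration only and not part of Meno: the function name `cluster_purity` and the example assignments are hypothetical, and it uses only the standard library so it runs without the model.

```python
from collections import Counter, defaultdict


def cluster_purity(assigned_topics, true_categories):
    """Fraction of documents in the majority true category of their assigned topic."""
    if len(assigned_topics) != len(true_categories):
        raise ValueError("inputs must have the same length")
    if not assigned_topics:
        raise ValueError("inputs must not be empty")

    # Count true categories separately within each assigned topic
    by_topic = defaultdict(Counter)
    for topic, category in zip(assigned_topics, true_categories):
        by_topic[topic][category] += 1

    # Sum the size of the largest category in every topic
    majority_total = sum(max(counts.values()) for counts in by_topic.values())
    return majority_total / len(assigned_topics)


# Hypothetical assignments for six documents (not output from Meno)
example_topics = [0, 0, 0, 1, 1, 2]
example_categories = [
    "Technology", "Technology", "Healthcare",
    "Healthcare", "Healthcare", "Environment",
]
purity = cluster_purity(example_topics, example_categories)
print(f"Purity: {purity:.2f}")
```

With the notebook's data, the same function could be called on `doc_info["Topic"].tolist()` and `df["true_category"].tolist()`, assuming both are in the same document order. Because the "Mixed" documents have no single correct topic, purity on this dataset cannot be expected to reach 1.0.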

## 4. Set up the Feedback System

Now we'll set up the feedback system to help refine our topic assignments.

In [None]:
# Create a feedback manager
feedback_manager = TopicFeedbackManager(modeler)

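# Create topic descriptions to help with feedback.
# Clustering assigns topic IDs arbitrarily, so check the topic order printed below
# and reorder the descriptions to match before using them.
topic_names = modeler.get_topic_info()["Name"].tolist()
print("Topic order:", topic_names)
topic_descriptions = [
    "Documents related to technology, computing, and data science",
    "Documents related to healthcare, medicine, and patient care",
    "Documents related to environment, climate, and sustainability"
]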

## 5. Begin the Feedback Process

Now we'll start the feedback process. This will:
1. Identify documents that are most uncertain or diverse
2. Create an interactive interface to review and correct topic assignments
3. Allow us to save our feedback for later use

In [None]:
# Set up the feedback system
feedback_system = feedback_manager.setup_feedback(
    n_samples=10,  # Review 10 documents
    uncertainty_ratio=0.7,  # 70% uncertain, 30% diverse documents
    uncertainty_method="entropy",  # Use entropy to measure uncertainty
    topic_descriptions=topic_descriptions  # Add descriptions for clarity
)
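
To give a feel for what entropy-based selection means, here is a minimal sketch. It assumes each document has a probability distribution over topics and ranks documents by Shannon entropy, so that documents with near-uniform probabilities come first. This is not Meno's implementation: the helper names (`entropy`, `split_sample_budget`, `most_uncertain`) and the example probabilities are hypothetical, and the selection of "diverse" documents is not shown.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_sample_budget(n_samples, uncertainty_ratio):
    """Split the review budget into (uncertain, diverse) document counts."""
    n_uncertain = round(n_samples * uncertainty_ratio)
    return n_uncertain, n_samples - n_uncertain


def most_uncertain(topic_probs, k):
    """Return the indices of the k documents with the highest entropy."""
    ranked = sorted(
        range(len(topic_probs)),
        key=lambda i: entropy(topic_probs[i]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical per-document topic probabilities (not output from Meno)
example_probs = [
    [0.98, 0.01, 0.01],  # confident assignment
    [0.34, 0.33, 0.33],  # nearly uniform: highest entropy
    [0.60, 0.30, 0.10],  # spread over three topics
    [0.50, 0.50, 0.00],  # torn between two topics
]

n_uncertain, n_diverse = split_sample_budget(10, 0.7)
print(n_uncertain, n_diverse)
print(most_uncertain(example_probs, 2))
```

Under this reading, `n_samples=10` with `uncertainty_ratio=0.7` corresponds to seven documents chosen for uncertainty and three chosen for diversity.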

In [None]:
# Start the review process
feedback_manager.start_review()

## 6. Review Feedback Summary

After providing feedback, we can see a summary of the changes we made.

In [None]:
# Display the feedback summary
feedback_system.display_summary()

## 7. Apply Feedback and Update the Model

Now we'll apply our feedback to update the model.

In [None]:
# Apply the feedback to update the model
feedback_system.apply_updates()

# Get the updated model
updated_modeler = feedback_manager.get_updated_model()

# Check the updated document assignments
updated_doc_info = updated_modeler.get_document_info()
updated_result_df = pd.DataFrame({
    "text": df["text"],
    "true_category": df["true_category"],
    "original_topic": doc_info["Topic"],
    "updated_topic": updated_doc_info["Topic"]
})

# Display documents where the topic changed
changed_df = updated_result_df[updated_result_df["original_topic"] != updated_result_df["updated_topic"]]
changed_df.head()
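
Conceptually, the simplest effect of applying feedback is that each reviewed document's topic is overridden by the reviewer's choice. The sketch below shows only that override step on plain lists. It is an assumption about the general idea, not a description of what `apply_updates()` does internally, which may also update the model itself. The names `apply_corrections` and `changed_indices` are hypothetical.

```python
def apply_corrections(assigned_topics, corrections):
    """Return a copy of assigned_topics with reviewer corrections applied.

    corrections maps a document index to the topic chosen by the reviewer.
    """
    updated = list(assigned_topics)
    for doc_index, new_topic in corrections.items():
        if not 0 <= doc_index < len(updated):
            raise IndexError(f"document index {doc_index} is out of range")
        updated[doc_index] = new_topic
    return updated


def changed_indices(before, after):
    """Indices of documents whose topic differs between the two lists."""
    return [i for i, (old, new) in enumerate(zip(before, after)) if old != new]


# Hypothetical assignments and feedback (not output from Meno)
original = [0, 0, 1, 2, 1]
corrections = {1: 1, 3: 2}  # document 3 was reviewed and confirmed as topic 2

updated = apply_corrections(original, corrections)
print(updated)
print(changed_indices(original, updated))
```

Note that a reviewed document only counts as changed when the reviewer picked a different topic; confirming the existing assignment leaves it out of the changed list.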

## 8. Evaluate Updated Topic Assignments

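Let's see whether our topic assignments improved after incorporating feedback. Documents labeled "Mixed" have no single correct topic, so the comparison is most informative for the Technology, Healthcare, and Environment documents.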

In [None]:
# Count documents by topic and true category after feedback
updated_topic_counts = pd.crosstab(
    updated_doc_info["Topic"],
    df["true_category"],
    margins=True,
    margins_name="Total"
)

# Display the counts
print("Topic-Category Distribution (After Feedback):")
updated_topic_counts

In [None]:
# Calculate metrics: what percentage of each topic comes from each true category
updated_composition = updated_topic_counts.div(updated_topic_counts["Total"], axis=0) * 100
updated_composition = updated_composition.drop("Total", axis=1)

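# Plot topic composition. DataFrame.plot() creates its own figure, so the size is
# passed via figsize; a separate plt.figure() call would leave an empty figure behind.
updated_composition.drop("Total", axis=0).plot(
    kind="bar",
    figsize=(12, 6),
    title="Topic Composition (After Feedback)"
)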
plt.ylabel("Percentage")
plt.xticks(rotation=0)
plt.legend(title="True Category")
plt.tight_layout()
plt.show()

## 9. Export Feedback for Collaboration

We can export our feedback to a CSV file to share with colleagues or save for later.

In [None]:
# Export feedback to CSV
feedback_system.export_to_csv("topic_feedback.csv")

## 10. Import Feedback (Future Use)

In a future session, we could import the feedback from a CSV file.

In [None]:
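# Example of importing feedback in a future session
# (after re-creating the modeler and re-running topic discovery):
# new_feedback_manager = TopicFeedbackManager(modeler)
# new_feedback_system = new_feedback_manager.setup_feedback(...)
# new_feedback_system.import_from_csv("topic_feedback.csv")
# new_feedback_system.apply_updates()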
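
For collaboration, feedback from several reviewers eventually has to be combined. The sketch below round-trips a small feedback table through CSV text and merges two reviewers' corrections, with later reviewers overriding earlier ones on the same document. The column names (`doc_index`, `original_topic`, `corrected_topic`) and helper functions are hypothetical; the actual layout of the file written by `export_to_csv` may differ, so inspect it before adapting this.

```python
import csv
import io

# Hypothetical column layout; the file Meno writes may use different columns
FIELDS = ["doc_index", "original_topic", "corrected_topic"]


def feedback_to_csv(records):
    """Serialize a list of feedback records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def feedback_from_csv(text):
    """Parse CSV text back into feedback records with integer fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [{field: int(row[field]) for field in FIELDS} for row in reader]


def merge_feedback(*reviewer_records):
    """Merge feedback lists; later reviewers win on the same document."""
    merged = {}
    for records in reviewer_records:
        for record in records:
            merged[record["doc_index"]] = record
    return [merged[index] for index in sorted(merged)]


# Hypothetical feedback from two reviewers
mine = [
    {"doc_index": 15, "original_topic": 0, "corrected_topic": 1},
    {"doc_index": 16, "original_topic": 0, "corrected_topic": 2},
]
colleague = [
    {"doc_index": 16, "original_topic": 0, "corrected_topic": 0},
    {"doc_index": 18, "original_topic": 1, "corrected_topic": 2},
]

restored = feedback_from_csv(feedback_to_csv(mine))
merged = merge_feedback(restored, colleague)
print([(r["doc_index"], r["corrected_topic"]) for r in merged])
```

"Last reviewer wins" is only one possible conflict policy. A team might instead flag conflicting corrections for discussion.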

## Conclusion

In this notebook, we've demonstrated how to use Meno's feedback system to improve topic modeling results:

1. We ran initial topic modeling on our data
2. We set up the feedback system to identify uncertain documents
3. We reviewed and corrected topic assignments interactively
4. We applied our feedback to update the model
5. We evaluated the improvement in topic assignments
6. We exported our feedback for future use or collaboration

This approach provides a simple way to incorporate domain expertise into topic modeling without requiring a web app or complex infrastructure.