# 🚀 AI Insights with IBM Watson Natural Language Understanding (NLU)

Welcome to the enhanced version of the Watson NLU workshop! This notebook is designed for students and junior developers who want to understand how to turn raw text into actionable insights using AI.

## 🧠 Understanding the NLU Pipeline

Before we dive into the code, let's look at how data flows through our application:

![NLU Workflow](nlu_workflow.png)

1.  **Text Data**: We start with customer complaints from a CSV file.
2.  **Watson NLU**: This is our "AI Engine." It processes the text and extracts key features.
3.  **Insights**: We get back structured data (JSON) such as Sentiment (Positive/Negative/Neutral) and Emotions (Anger, Joy, Sadness, etc.).
4.  **Analysis**: We use Pandas and Matplotlib to visualize these trends.
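
The four steps can be sketched as plain Python functions. This is a toy, stdlib-only illustration of the data flow, not the notebook's code: `analyze_stub` is a hypothetical stand-in for Watson NLU that only checks a tiny word list, and the two-row CSV is invented for the example.

```python
import csv
import io

# Step 1 - Text Data: a tiny invented CSV standing in for the complaints file
RAW_CSV = """Product,Consumer complaint narrative
Credit card,I am furious about the late fee
Mortgage,The payment went through fine
"""

def load_texts(raw_csv):
    """Step 1: read the CSV into one dict per complaint."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def analyze_stub(text):
    """Step 2: hypothetical stand-in for Watson NLU. It only checks a word list;
    the real service uses trained models and returns much richer JSON."""
    angry_words = {"furious", "angry", "outraged"}
    is_angry = bool(set(text.lower().split()) & angry_words)
    return {"emotion": {"anger": 1.0 if is_angry else 0.0}}

def build_insights(rows):
    """Step 3: keep structured results (product + anger score) per complaint."""
    return [
        {"product": row["Product"],
         "anger": analyze_stub(row["Consumer complaint narrative"])["emotion"]["anger"]}
        for row in rows
    ]

def count_angry_by_product(insights, threshold=0.5):
    """Step 4: a simple analysis, counting angry complaints per product."""
    counts = {}
    for item in insights:
        if item["anger"] > threshold:
            counts[item["product"]] = counts.get(item["product"], 0) + 1
    return counts

print(count_angry_by_product(build_insights(load_texts(RAW_CSV))))  # {'Credit card': 1}
```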

## 1.0 Setup - Installing the "Waiter"

In programming, an **SDK (Software Development Kit)** acts like a waiter in a restaurant. You (the client) give it an order, and it goes to the kitchen (IBM Cloud) and brings back the information you need.
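
To make the analogy concrete: under the hood, the SDK turns a call like `nlu.analyze(url=..., features=...)` into an HTTPS POST to the service's `/v1/analyze` endpoint, with the API version as a query parameter and the requested features as JSON. The stdlib sketch below only *builds* such a request so you can inspect it; it does not send it, and it leaves out authentication, which the SDK handles for you (`IAMAuthenticator` exchanges your API key for a bearer token). `EXAMPLE_SERVICE_URL` is a made-up placeholder, not a real endpoint.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder; your real value is the SERVICE_URL from your .env file
EXAMPLE_SERVICE_URL = "https://example.com/nlu-instance"
API_VERSION = "2022-04-07"

def build_analyze_request(service_url, version, body):
    """Build (but do not send) the kind of POST request the SDK makes for analyze()."""
    query = urllib.parse.urlencode({"version": version})
    return urllib.request.Request(
        url=f"{service_url}/v1/analyze?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The "order" we hand to the waiter: analyze a URL and return the top 3 categories
order = {"url": "https://www.ibm.com", "features": {"categories": {"limit": 3}}}
request = build_analyze_request(EXAMPLE_SERVICE_URL, API_VERSION, order)

print(request.get_method())  # POST
print(request.full_url)
print(request.data.decode("utf-8"))
```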

In [None]:
# Install the necessary libraries (%pip installs into the environment of the running kernel)
%pip install --upgrade ibm-watson ibm-cloud-sdk-core PyJWT python-dotenv pandas matplotlib

### 🗝️ Authentication & Security

We use the `python-dotenv` library to keep our API keys out of the notebook. The keys live in a file named `.env`, which defines `IAM_KEY` and `SERVICE_URL`; the next cell loads them into environment variables.

> **Student Tip**: Never hardcode your API keys directly in the notebook! If you share the notebook, others could use your credits.
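
Conceptually, `load_dotenv` reads `KEY=value` lines from `.env` and copies them into the process environment, where `os.getenv` can find them; `override=True` (used in the next cell) lets values from the file replace variables that are already set. The sketch below is a deliberately simplified, stdlib-only parser written for illustration. It is not python-dotenv's implementation and ignores quoting rules, `export` prefixes, and variable expansion. It writes into a plain dict instead of `os.environ` so it is safe to run, and the key values are fake.

```python
def load_env_text(text, environ, override=False):
    """Simplified .env loader: parse KEY=value lines into `environ` (a dict).

    Blank lines and lines starting with '#' are skipped. Existing keys are
    only replaced when override=True, mirroring the idea behind
    load_dotenv(override=True).
    """
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if override or key not in environ:
            environ[key] = value
    return environ

# Fake contents of a .env file (never share the real one!)
ENV_FILE = """
# Watson NLU credentials
IAM_KEY=fake-api-key-123
SERVICE_URL=https://example.com/nlu-instance
"""

env = {"IAM_KEY": "stale-key"}
load_env_text(ENV_FILE, env)                 # keeps the existing IAM_KEY
print(env["IAM_KEY"])                        # stale-key
load_env_text(ENV_FILE, env, override=True)  # the file's value wins
print(env["IAM_KEY"])                        # fake-api-key-123
```

If you use git, adding `.env` to your `.gitignore` keeps the real file out of version control.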

In [None]:
import os
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson.natural_language_understanding_v1 import Features, CategoriesOptions, EmotionOptions, KeywordsOptions

# Load environment variables and force override to pick up any recent .env changes
load_dotenv(override=True)

IAM_KEY = os.getenv('IAM_KEY')
SERVICE_URL = os.getenv('SERVICE_URL')

if not IAM_KEY or not SERVICE_URL:
    print("❌ Error: API Key or Service URL missing. Please check your .env file!")
else:
    print("✅ Credentials loaded successfully.")

## 2.0 Testing the Service

Let's make sure everything is working by analyzing a simple URL. We're asking Watson to look at `www.ibm.com` and give us the top 3 **Categories** (what is the site about?).

In [None]:
authenticator = IAMAuthenticator(IAM_KEY)
nlu = NaturalLanguageUnderstandingV1(
    version='2022-04-07', # API version date: the service responds using its behavior as of this release
    authenticator=authenticator
)

nlu.set_service_url(SERVICE_URL)

try:
    response = nlu.analyze(
        url='https://www.ibm.com',
        features=Features(categories=CategoriesOptions(limit=3))
    ).get_result()

    print(json.dumps(response, indent=2))
except Exception as e:
    print(f"❌ NLU Analysis failed: {e}")
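
Each entry in the `categories` list of the response has a `label`, which is a hierarchical path such as `/technology and computing/software`, and a relevance `score`. A small helper can split those paths into levels for easier reading. The `sample` dict below is hypothetical, written by hand to mirror the response shape; it is not real output from the call above.

```python
def summarize_categories(response):
    """Turn NLU category results into (levels, score) tuples, highest score first.

    '/technology and computing/software' -> ['technology and computing', 'software']
    """
    rows = []
    for category in response.get("categories", []):
        levels = [part for part in category["label"].split("/") if part]
        rows.append((levels, category["score"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hand-written example shaped like an NLU response (hypothetical values)
sample = {
    "language": "en",
    "categories": [
        {"score": 0.71, "label": "/business and industrial"},
        {"score": 0.93, "label": "/technology and computing/software"},
    ],
}

for levels, score in summarize_categories(sample):
    print(f"{score:.2f}  {' > '.join(levels)}")
```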

## 3.0 Working with Real Data

We'll load a dataset of consumer complaints about a bank. This is where NLU really pays off: the same analysis can be run over hundreds of text entries in a simple loop.

In [None]:
data_url = 'https://raw.githubusercontent.com/IBM/python-and-analytics/master/data/cfpbciti.csv'
df = pd.read_csv(data_url)
print(f"Dataset loaded: {df.shape[0]} rows found.")
df.head(3)

## 4.0 Data Cleaning 🧹

AI is like a chef: if you give it bad ingredients (noisy data), you'll get a bad result. 

The complaint text contains placeholders such as "XXXX" and "XX/XX/XXXX" where names or dates were removed for privacy. We should clean these out so Watson can focus on the important words.

In [None]:
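# 1. Drop rows where there is no text to analyze (.copy() gives us an independent DataFrame to modify)
df_clean = df.dropna(subset=['Consumer complaint narrative']).copy()

# 2. Remove the privacy masks ("XXXX", "XX/XX/XXXX", ...) using a Regular Expression (Regex).
#    Only the narrative column is cleaned, and only runs of 2+ X's are removed,
#    so ordinary words and other columns (like 'Product') are left untouched.
df_clean['Consumer complaint narrative'] = df_clean['Consumer complaint narrative'].str.replace(
    r'X{2,}(?:/X{2,})*', '', regex=True
)

# 3. Reset the index so our row numbers are sequential (0, 1, 2...)
df_clean = df_clean.reset_index(drop=True)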

print(f"Cleaned data: {df_clean.shape[0]} rows ready for analysis.")
df_clean['Consumer complaint narrative'].head(3)
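
The choice of pattern matters here. A bare `X+` removes *every* capital X, including ones inside ordinary words, while requiring runs of two or more X's (optionally joined by slashes, as in masked dates) only targets the masks. This stdlib `re` sketch compares the two on an invented sentence; the exact masking formats in the dataset may vary, so treat the stricter pattern as a starting point rather than a complete rule.

```python
import re

BROAD = re.compile(r"X+")                # matches any run of capital X
MASKS = re.compile(r"X{2,}(?:/X{2,})*")  # matches XXXX, XX/XX/XXXX, ...

def clean(text, pattern):
    """Remove pattern matches, then collapse the leftover double spaces."""
    return re.sub(r"\s{2,}", " ", pattern.sub("", text)).strip()

sample = "On XX/XX/XXXX my Xfinity bill was charged twice by XXXX bank."
print(clean(sample, BROAD))  # On // my finity bill was charged twice by bank.
print(clean(sample, MASKS))  # On my Xfinity bill was charged twice by bank.
```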

## 5.0 Advanced NLU Analysis

Now we'll do something cool: we'll take the first 20 complaints and ask Watson to find the **top keywords** AND the **emotions** associated with those keywords.

In [None]:
num_to_analyze = min(20, len(df_clean))  # never ask for more rows than we have
results = []

for i in range(num_to_analyze):
    text = df_clean.loc[i, 'Consumer complaint narrative']

    try:
        # Analyze Keywords, with emotion scores attached to each keyword
        response = nlu.analyze(
            text=text,
            features=Features(keywords=KeywordsOptions(emotion=True, limit=2))
        ).get_result()

        # Keep the emotion scores of the first (most relevant) keyword, if there is one
        keywords = response.get('keywords', [])
        if keywords and 'emotion' in keywords[0]:
            top_keyword = keywords[0]
            results.append({
                'id': i,
                'keyword': top_keyword['text'],
                'anger': top_keyword['emotion']['anger'],
                'joy': top_keyword['emotion']['joy'],
                'sadness': top_keyword['emotion']['sadness'],
                'Product': df_clean.loc[i, 'Product'],
                'Sub-product': df_clean.loc[i, 'Sub-product']
            })
    except Exception as e:
        print(f"Skipping row {i} due to analysis error: {e}")

results_df = pd.DataFrame(results)
results_df.head()
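
The loop above records the `anger`, `joy`, and `sadness` scores of the top keyword. If you would rather label each complaint with its single strongest emotion, `max` over the emotion dict does it. With `emotion=True`, each keyword in the NLU response carries an `emotion` object with `anger`, `disgust`, `fear`, `joy`, and `sadness` scores; the `keyword` dict below is a hypothetical, hand-written example of that shape.

```python
def dominant_emotion(keyword):
    """Return (emotion_name, score) for the strongest emotion on one keyword result.

    Returns (None, 0.0) when the keyword carries no emotion scores.
    """
    emotions = keyword.get("emotion", {})
    if not emotions:
        return None, 0.0
    name = max(emotions, key=emotions.get)
    return name, emotions[name]

# Hypothetical keyword entry shaped like NLU output (values invented)
keyword = {
    "text": "late fee",
    "relevance": 0.95,
    "emotion": {"anger": 0.62, "disgust": 0.11, "fear": 0.08, "joy": 0.03, "sadness": 0.40},
}
print(dominant_emotion(keyword))  # ('anger', 0.62)
```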

## 6.0 Data Visualization 📊

Finally, we'll plot our findings in a 3D chart. This lets us see if certain products (like Credit Cards) have higher "Anger" scores than others.
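
Matplotlib needs numbers on each axis, so the cell below maps each distinct `Product` and `Sub-product` string to an integer position with `np.unique(..., return_inverse=True)`. The idea in stdlib form: sort the distinct labels, then replace each value with its label's index. One catch is that missing sub-products come back from pandas as `NaN`, and sorting a mix of strings and floats raises a `TypeError`, so missing values get a placeholder label first. The `'(none)'` placeholder and the sample values are arbitrary choices for this sketch.

```python
import math

def encode_labels(values, missing="(none)"):
    """Map category strings to integer positions, like np.unique(..., return_inverse=True).

    None and NaN are replaced by `missing` first, because sorting a mix of
    str and float raises TypeError.
    """
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    cleaned = [missing if is_missing(v) else v for v in values]
    labels = sorted(set(cleaned))
    position = {label: i for i, label in enumerate(labels)}
    return labels, [position[v] for v in cleaned]

# Invented sample: one complaint has no sub-product
sub_products = ["Checking account", float("nan"), "Checking account", "Savings account"]
labels, indices = encode_labels(sub_products)
print(labels)   # ['(none)', 'Checking account', 'Savings account']
print(indices)  # [1, 0, 1, 2]
```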

In [None]:
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

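# Convert categories to numbers for the axes.
# Missing sub-products are NaN, which np.unique cannot sort alongside strings,
# so give them a placeholder label first.
x_labels, x_indices = np.unique(results_df['Sub-product'].fillna('(none)'), return_inverse=True)
y_labels, y_indices = np.unique(results_df['Product'].fillna('(none)'), return_inverse=True)
z_data = results_df['anger']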

sc = ax.scatter(x_indices, y_indices, z_data, c=z_data, cmap='Reds', s=100)

ax.set_xticks(range(len(x_labels)))
ax.set_xticklabels(x_labels, rotation=45, ha='right')
ax.set_yticks(range(len(y_labels)))
ax.set_yticklabels(y_labels)
ax.set_zlabel('Anger Score')
ax.set_title('Anger Analysis by Product Category')

plt.colorbar(sc, label='Level of Anger')
plt.show()
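
With only 20 complaints, individual points in a 3D scatter can be hard to compare. A simpler check for "do some products draw more anger?" is the mean anger score per product. In pandas this would be `results_df.groupby('Product')['anger'].mean()`; the stdlib sketch below shows the same computation on hypothetical rows. Keep in mind that averages over a handful of complaints are noisy.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(rows, group_key, value_key):
    """Average `value_key` within each distinct `group_key` (like groupby().mean())."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    averages = {group: mean(values) for group, values in groups.items()}
    # Sort so the highest-scoring group comes first
    return dict(sorted(averages.items(), key=lambda item: item[1], reverse=True))

# Hypothetical rows shaped like results_df records (scores invented)
rows = [
    {"Product": "Credit card", "anger": 0.6},
    {"Product": "Credit card", "anger": 0.4},
    {"Product": "Mortgage", "anger": 0.2},
]
print(mean_by_group(rows, "Product", "anger"))  # {'Credit card': 0.5, 'Mortgage': 0.2}
```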

## 🏁 Conclusion

Congratulations! You've learned how to:
1.  Securely authenticate with IBM Watson.
2.  Clean text data for AI analysis.
3.  Extract Keywords and Emotions from consumer complaints.
4.  Visualize emotional trends in 3D.

Try changing the `num_to_analyze` variable, or add more features such as `entities=EntitiesOptions()` or `sentiment=SentimentOptions()` to the `Features(...)` call to explore further!