# Analysis: Counting Keyword Mentions in Text

## Purpose

A common technique when analyzing financial reports is to track how often specific keywords appear over time. This can help identify trends, measure focus on strategic initiatives (like "AI" or "ESG"), or flag risk-related terms (like "inflation" or "supply chain disruption").

This notebook provides a simple, reusable script to count a list of keywords within a text file and present the results in a clean table using `pandas`.

## 1. Setup

We'll import the `pandas` library to organize our results, Python's `re` module for regular expressions, which lets us match whole words only, and `Counter` from `collections` to tally the matches.

In [None]:
import pandas as pd
import re
from collections import Counter

## 2. Define Keywords and Load Data

First, we define the list of keywords we want to search for. Our search will be case-insensitive.

Then, we'll load the `filing_snippet.txt` file included in this directory.

In [None]:
# Define the keywords you are interested in (will be treated as case-insensitive)
keywords_to_track = [
    'ESG',
    'sustainability',
    'inflation',
    'supply chain',
    'AI',
    'risk'
]

# Load the sample text
with open('filing_snippet.txt', 'r', encoding='utf-8') as f:
    sample_text = f.read()

print("--- Filing Snippet ---")
print(sample_text[:500] + '...') # Print first 500 chars

## 3. Perform the Count

We'll scan the text for whole-word matches of our keywords, so `risk` is counted but `risks` and `brisk` are not. A `Counter` object then tallies how many times each matched keyword appears.
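To make the whole-word behavior concrete, here is a minimal sketch that uses the same pattern shape on a made-up sentence. The names `sample`, `demo_keywords`, `demo_pattern`, and `demo_counts` are illustrative assumptions and do not come from `filing_snippet.txt`.

```python
import re
from collections import Counter

# Hypothetical sample sentence, made up for illustration only.
sample = "Risk remains elevated. Supply chain risks and brisk AI spending add risk."

# Same pattern shape as the notebook: \b(kw1|kw2|...)\b with re.IGNORECASE.
demo_keywords = ["risk", "supply chain", "AI"]
demo_pattern = r"\b(" + "|".join(re.escape(k) for k in demo_keywords) + r")\b"

# Lowercase each match so 'Risk' and 'risk' land in the same bucket.
demo_matches = [m.lower() for m in re.findall(demo_pattern, sample, re.IGNORECASE)]
demo_counts = Counter(demo_matches)

# 'risks' and 'brisk' are skipped because \b requires a word boundary
# on both sides of the keyword.
print(demo_counts)
```

If plural or inflected forms should count too, they would need to be listed as separate keywords or handled with a different pattern; the whole-word pattern above deliberately excludes them.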

In [None]:
def count_keywords(text, keywords):
    """Counts occurrences of a list of keywords in a given text, case-insensitively."""
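    # Build a single regex pattern: \b(word1|word2|...)\b for whole-word matching.
    # Longer keywords go first so a phrase such as 'supply chain' is not
    # shadowed by a shorter keyword that is a prefix of it.
    # Case-insensitivity comes from re.IGNORECASE in the findall call below.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = r'\b(' + '|'.join(re.escape(k) for k in ordered) + r')\b'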
    
    # Find all non-overlapping matches in the string
    matches = re.findall(pattern, text, re.IGNORECASE)
    
    # The findall returns a list of strings that matched. We need to standardize them
    # to lowercase to ensure 'Risk' and 'risk' are counted together.
    standardized_matches = [match.lower() for match in matches]
    
    # Count the occurrences of each standardized match
    counts = Counter(standardized_matches)
    
    # Ensure all original keywords are in the final result, even if their count is 0
    final_counts = {kw.lower(): counts.get(kw.lower(), 0) for kw in keywords}
    
    return final_counts

# Run the counting function
keyword_counts = count_keywords(sample_text, keywords_to_track)

print("Raw counts:", keyword_counts)

## 4. Display Results

Finally, we load our dictionary of counts into a `pandas` DataFrame and sort it from most to least frequent, which gives a readable table.

In [None]:
# Convert the dictionary to a pandas DataFrame for better visualization
df_counts = pd.DataFrame(list(keyword_counts.items()), columns=['Keyword', 'Count'])
df_counts = df_counts.sort_values(by='Count', ascending=False).reset_index(drop=True)

display(df_counts)
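
The Purpose section mentions tracking keywords over time, while the cells above count a single file. One possible extension, sketched here under stated assumptions, applies the same counting logic to several texts keyed by period. The helper `count_terms`, the period labels, and the sample sentences are all hypothetical and made up for illustration.

```python
import re
from collections import Counter

import pandas as pd

def count_terms(text, keywords):
    """Whole-word, case-insensitive counts, keyed by lowercased keyword."""
    # Longer keywords first so multi-word phrases are matched before prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b"
    found = Counter(m.lower() for m in re.findall(pattern, text, re.IGNORECASE))
    return {k.lower(): found.get(k.lower(), 0) for k in keywords}

# Hypothetical period labels and texts, not taken from any real filing.
texts_by_period = {
    "2022": "Inflation raised costs. Inflation and supply chain delays remain a risk.",
    "2023": "AI investment grew. Inflation eased, and AI tools reduced risk.",
}
tracked = ["inflation", "supply chain", "AI", "risk"]

# Build a table with one row per period and one column per keyword.
trend = pd.DataFrame(
    {period: count_terms(text, tracked) for period, text in texts_by_period.items()}
).T
trend.index.name = "Period"

print(trend)
```

In practice each entry of `texts_by_period` would be read from its own file, in the same way `filing_snippet.txt` is loaded above.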