# üîç Chhattisgarh Police - Crime Intelligence Analysis Agent

This notebook implements an AI agent for **Chhattisgarh Police Department** that analyzes newspaper clipping screenshots to extract crime-related information. The agent:

1. **Accepts** a newspaper clipping screenshot from the user
2. **Extracts** text using OCR (Optical Character Recognition)
3. **Searches** RSS feeds with **priority on Chhattisgarh regional news**
4. **Aggregates** information from various sources
5. **Generates** a comprehensive unbiased summary
6. **Identifies** common terms across all news sources
7. **Provides** investigative clues to help Chhattisgarh Police with their investigation

This is an **interactive analysis tool** designed specifically for **Chhattisgarh Police Department** to quickly gather intelligence from newspaper clippings, with emphasis on regional crime news.

## üìö Setup: Import Libraries

We need several libraries for this agent:
- **pytesseract**: OCR for text extraction from images
- **PIL (Pillow)**: Image processing
- **requests & BeautifulSoup**: Web scraping for RSS feeds
- **ollama**: Local LLM for intelligent analysis
- **collections**: For common term extraction

In [None]:
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
from datetime import datetime
import json
import re
from collections import Counter
import os

# Image processing and OCR
try:
    from PIL import Image
    import pytesseract
    OCR_AVAILABLE = True
    print("‚úÖ PIL and pytesseract imported successfully")
    # Note: You may need to set the tesseract path on Windows
    # pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
except ImportError as e:
    print(f"‚ö†Ô∏è OCR libraries not available: {e}")
    print("   Install with: pip install pytesseract pillow")
    print("   Also install Tesseract-OCR: https://github.com/tesseract-ocr/tesseract")
    OCR_AVAILABLE = False

# For LLM integration - using Ollama with local Llama 3.2 model
try:
    import ollama
    # Test if Ollama is running and model is available
    try:
        ollama.chat(model='llama3.2:latest', messages=[{'role': 'user', 'content': 'test'}])
        LLM_AVAILABLE = True
        print("‚úÖ Ollama connected successfully with llama3.2:latest model")
    except Exception as e:
        print(f"‚ö†Ô∏è Ollama error: {e}")
        print("   Make sure Ollama is running and llama3.2:latest model is downloaded")
        print("   Run: ollama pull llama3.2:latest")
        LLM_AVAILABLE = False
except ImportError:
    print("‚ö†Ô∏è Ollama library not installed. Install with: pip install ollama")
    print("   Will use basic summaries for demo.")
    LLM_AVAILABLE = False

print("\n‚úÖ Libraries imported successfully")
print(f"üìÖ Today's date: {datetime.now().strftime('%Y-%m-%d')}")

‚úÖ PIL and pytesseract imported successfully


## üì∏ Step 1: Upload Newspaper Clipping & Extract Text

Provide the path to your newspaper clipping screenshot. The agent will use OCR to extract the text.

In [2]:
def extract_text_from_image(image_path):
    """
    Extract text from a newspaper clipping image using OCR.
    
    Args:
        image_path: Path to the image file
    
    Returns:
        Extracted text as string
    """
    if not OCR_AVAILABLE:
        return "OCR not available. Please install pytesseract and Tesseract-OCR."
    
    try:
        # Open and process the image
        img = Image.open(image_path)
        
        # Extract text using pytesseract
        text = pytesseract.image_to_string(img)
        
        # Clean up the text
        text = text.strip()
        
        if not text:
            return "No text could be extracted from the image. Please check the image quality."
        
        return text
    
    except FileNotFoundError:
        return f"Error: Image file not found at {image_path}"
    except Exception as e:
        return f"Error extracting text: {str(e)}"


# USER INPUT: Provide the path to your newspaper clipping image
# Example: image_path = r"C:\Users\HP\Desktop\newspaper_clipping.jpg"
image_path = input("Enter the path to your newspaper clipping image: ").strip('"').strip("'")

print("\nüì∏ Extracting text from image...")
extracted_text = extract_text_from_image(image_path)

print("\n" + "="*80)
print("EXTRACTED TEXT FROM NEWSPAPER CLIPPING")
print("="*80)
print(extracted_text)
print("="*80)

print("\n‚úÖ Text extraction complete")


üì∏ Extracting text from image...

EXTRACTED TEXT FROM NEWSPAPER CLIPPING
Mob loots police armoury in
Manipur; policeman killed

The Hindu Bureau
NEW DELHI

A police armoury was loot-
ed and a Manipur Rifles
policeman was killed in
Manipur on Thursday, a
police official said. Anoth-
er attempt to loot weapons
in Imphal city was thwart-
ed by the police.

The incidents come a
day ahead of a Supreme
Court hearing on the mat-
ter.

The first incident took
place in Bishnupur district
when a mob looted auto-
matic weapons from the
second Indian Reserve Bat-
talion (IRB) at Naransena.
The number of weapons
looted from the armoury is
not known.

Curfew relaxations
in Imphal East and
Imphal West were
withdrawn and
restrictions Imposed

Many people had gath-
ered around 12 km away to
protest a tribal group‚Äôs call
for mass burial of 35 Kuki-
Zo people who were killed
in the ethnic violence that
erupted in the State on
May 3.

The protesters led by
women tried to storm the
way to the area whe

## üîç Step 2: Extract Keywords for Search

Analyze the extracted text to identify key search terms and entities.

In [3]:
def extract_keywords(text):
    """
    Extract important keywords and entities from the text for searching.
    
    Args:
        text: Extracted text from newspaper clipping
    
    Returns:
        List of keywords
    """
    # Crime-related keywords to prioritize
    crime_keywords = [
        'fraud', 'scam', 'cybercrime', 'theft', 'robbery', 'murder', 'assault',
        'kidnapping', 'arrest', 'police', 'investigation', 'suspect', 'victim',
        'crime', 'criminal', 'gang', 'shooting', 'killed', 'attack', 'hacking',
        'phishing', 'ransomware', 'deepfake', 'smuggling', 'trafficking', 'drug',
        'corruption', 'bribery', 'extortion', 'weapon', 'bomb', 'terrorist'
    ]
    
    # Convert to lowercase for matching
    text_lower = text.lower()
    
    # Find crime keywords present in the text
    found_keywords = [kw for kw in crime_keywords if kw in text_lower]
    
    # Extract potential names (capitalized words)
    words = text.split()
    capitalized = [w.strip('.,!?;:"()[]') for w in words if w and w[0].isupper() and len(w) > 3]
    
    # Extract numbers (amounts, dates, etc.)
    numbers = re.findall(r'\d+', text)
    
    # Combine all keywords
    all_keywords = found_keywords + capitalized[:10]  # Limit capitalized words
    
    # Remove duplicates while preserving order
    seen = set()
    unique_keywords = []
    for kw in all_keywords:
        if kw.lower() not in seen:
            seen.add(kw.lower())
            unique_keywords.append(kw)
    
    return unique_keywords[:15]  # Return top 15 keywords


# Extract keywords from the newspaper clipping
print("üîç Extracting keywords for search...")
search_keywords = extract_keywords(extracted_text)

print(f"\nüìã Found {len(search_keywords)} search keywords:")
for idx, keyword in enumerate(search_keywords, 1):
    print(f"   {idx}. {keyword}")

print("\n‚úÖ Keyword extraction complete")

üîç Extracting keywords for search...

üìã Found 11 search keywords:
   1. police
   2. killed
   3. weapon
   4. Manipur
   5. Hindu
   6. Bureau
   7. DELHI
   8. Rifles
   9. Thursday
   10. Anoth-
   11. Imphal

‚úÖ Keyword extraction complete


## üì∞ Step 3: Search RSS Feeds for Related Articles

Search multiple news sources for articles related to the newspaper clipping.

In [4]:
def fetch_from_rss(rss_url, source_name):
    """
    Fetch articles from an RSS feed.
    
    Args:
        rss_url: URL of the RSS feed
        source_name: Name of the news source
    
    Returns:
        List of article dictionaries
    """
    try:
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        }
        response = requests.get(rss_url, timeout=15, headers=headers)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.content, 'xml')
        items = soup.find_all('item')
        
        articles = []
        for item in items[:30]:  # Get more articles for better matching
            title = item.title.text.strip() if item.title else ''
            description = item.description.text.strip() if item.description else ''
            link = item.link.text.strip() if item.link else ''
            pub_date = item.pubDate.text.strip() if item.pubDate else ''
            
            # Clean HTML tags from description if present
            if description:
                description = BeautifulSoup(description, 'html.parser').get_text()
            
            if title:  # Only add if we have at least a title
                articles.append({
                    'title': title,
                    'summary': description if description else title,
                    'source': source_name,
                    'url': link,
                    'pub_date': pub_date
                })
        
        return articles
    
    except Exception as e:
        print(f"   ‚úó {source_name}: Error - {str(e)[:50]}")
        return []


def search_related_articles(keywords):
    """
    Search RSS feeds for articles matching the keywords.
    
    Args:
        keywords: List of search keywords
    
    Returns:
        List of related articles
    """
    # Major Indian news sources with RSS feeds
    rss_feeds = [
        ('https://timesofindia.indiatimes.com/rssfeedstopstories.cms', 'Times of India'),
        ('https://www.thehindu.com/news/national/feeder/default.rss', 'The Hindu'),
        ('https://feeds.feedburner.com/ndtvnews-top-stories', 'NDTV'),
        ('https://www.indiatoday.in/rss/1206514', 'India Today'),
        ('https://www.hindustantimes.com/feeds/rss/india-news/rssfeed.xml', 'Hindustan Times'),
        ('https://indianexpress.com/feed/', 'Indian Express'),
    ]
    
    all_articles = []
    
    print("üì∞ Fetching articles from news sources...")
    
    for rss_url, source_name in rss_feeds:
        articles = fetch_from_rss(rss_url, source_name)
        all_articles.extend(articles)
        print(f"   ‚úì {source_name}: {len(articles)} articles fetched")
    
    print(f"\nüìä Total articles fetched: {len(all_articles)}")
    
    # Filter articles that match any of the keywords
    related_articles = []
    
    for article in all_articles:
        text = (article['title'] + ' ' + article['summary']).lower()
        
        # Check if any keyword appears in the article
        matched_keywords = []
        for keyword in keywords:
            if keyword.lower() in text:
                matched_keywords.append(keyword)
        
        if matched_keywords:
            article['matched_keywords'] = matched_keywords
            article['relevance_score'] = len(matched_keywords)
            related_articles.append(article)
    
    # Sort by relevance (number of matched keywords)
    related_articles.sort(key=lambda x: x['relevance_score'], reverse=True)
    
    return related_articles


# Search for related articles
print("\nüîç Searching for related articles...\n")
related_articles = search_related_articles(search_keywords)

print(f"\n‚úÖ Found {len(related_articles)} related articles")

# Display top matches
if related_articles:
    print("\nüìã Top 10 related articles:")
    for idx, article in enumerate(related_articles[:10], 1):
        print(f"\n   {idx}. {article['title'][:80]}...")
        print(f"      Source: {article['source']}")
        print(f"      Matched keywords: {', '.join(article['matched_keywords'][:5])}")
        print(f"      Relevance score: {article['relevance_score']}")
else:
    print("\n‚ö†Ô∏è No related articles found. Try adjusting the search keywords.")


üîç Searching for related articles...

üì∞ Fetching articles from news sources...
   ‚úì Times of India: 30 articles fetched
   ‚úì The Hindu: 30 articles fetched
   ‚úì NDTV: 20 articles fetched
   ‚úì India Today: 20 articles fetched
   ‚úì Hindustan Times: 30 articles fetched
   ‚úì Indian Express: 30 articles fetched

üìä Total articles fetched: 160

‚úÖ Found 21 related articles

üìã Top 10 related articles:

   1. Indian-origin techie from Karnataka shot dead in ‚Äòtargeted‚Äô attack outside Canad...
      Source: Times of India
      Matched keywords: police, killed
      Relevance score: 2

   2. Delhi pit death: Cops probe if contractors tried ‚Äòcovering up lapses‚Äô...
      Source: Times of India
      Matched keywords: police, DELHI
      Relevance score: 2

   3. No copies of Naravane‚Äôs memoir have gone into publication: Publisher...
      Source: The Hindu
      Matched keywords: police, DELHI
      Relevance score: 2

   4. Delhi biker who fell in Janakpuri pit l

## üîó Step 4: Extract Common Terms Across Sources

Identify frequently mentioned terms across all related articles.

In [5]:
def extract_common_terms(articles, top_n=20):
    """
    Extract common terms mentioned across multiple articles.
    
    Args:
        articles: List of article dictionaries
        top_n: Number of top terms to return
    
    Returns:
        List of (term, count) tuples
    """
    # Stop words to filter out
    stop_words = {
        'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for',
        'of', 'with', 'by', 'from', 'as', 'is', 'was', 'are', 'were', 'been',
        'be', 'have', 'has', 'had', 'do', 'does', 'did', 'will', 'would',
        'could', 'should', 'may', 'might', 'can', 'this', 'that', 'these',
        'those', 'i', 'you', 'he', 'she', 'it', 'we', 'they', 'what', 'which',
        'who', 'when', 'where', 'why', 'how', 'all', 'each', 'every', 'both',
        'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not',
        'only', 'own', 'same', 'so', 'than', 'too', 'very', 'said', 'after',
        'also', 'into', 'through', 'during', 'before', 'after', 'above',
        'below', 'between', 'under', 'again', 'further', 'then', 'once'
    }
    
    # Collect all words from articles
    all_words = []
    
    for article in articles:
        text = article['title'] + ' ' + article['summary']
        # Extract words (alphanumeric, length > 3)
        words = re.findall(r'\b[a-zA-Z]{4,}\b', text.lower())
        all_words.extend(words)
    
    # Filter out stop words
    filtered_words = [w for w in all_words if w not in stop_words]
    
    # Count occurrences
    word_counts = Counter(filtered_words)
    
    # Return top N terms
    return word_counts.most_common(top_n)


# Extract common terms
if related_articles:
    print("\nüîó Extracting common terms across all sources...")
    common_terms = extract_common_terms(related_articles, top_n=25)
    
    print("\nüìä Common terms mentioned across news sources:")
    print("\n" + "="*60)
    print(f"{'TERM':<25} {'FREQUENCY':<15} {'SOURCES'}")
    print("="*60)
    
    for term, count in common_terms:
        # Count how many different sources mention this term
        sources = set()
        for article in related_articles:
            text = (article['title'] + ' ' + article['summary']).lower()
            if term in text:
                sources.add(article['source'])
        
        print(f"{term:<25} {count:<15} {len(sources)} sources")
    
    print("="*60)
    print("\n‚úÖ Common terms extraction complete")
else:
    print("\n‚ö†Ô∏è No articles to analyze for common terms")
    common_terms = []


üîó Extracting common terms across all sources...

üìä Common terms mentioned across news sources:

TERM                      FREQUENCY       SOURCES
police                    15              5 sources
delhi                     15              4 sources
court                     5               2 sources
gold                      5               1 sources
year                      4               3 sources
kumar                     4               2 sources
death                     4               1 sources
over                      4               4 sources
died                      4               3 sources
case                      4               3 sources
book                      4               2 sources
killed                    3               2 sources
years                     3               2 sources
digital                   3               2 sources
survivor                  3               1 sources
sengar                    3               2 sources
bail           

## ü§ñ Step 5: Generate Comprehensive Analysis with LLM

Use the LLM to create an unbiased summary and extract investigative clues.

In [1]:
def analyze_for_investigation(original_text, articles, common_terms):
    """
    Use LLM to generate comprehensive analysis and investigative clues.
    
    Args:
        original_text: Text extracted from newspaper clipping
        articles: Related articles from multiple sources
        common_terms: Common terms across sources
    
    Returns:
        Dictionary with analysis results
    """
    if not LLM_AVAILABLE:
        return {
            'full_analysis': 'LLM not available for analysis',
            'status': 'error'
        }
    
    # Prepare article summaries for context (limit to top 5 for faster processing)
    articles_context = "\n\n".join([
        f"Source: {a['source']}\nTitle: {a['title']}\nContent: {a['summary'][:200]}..."
        for a in articles[:5]  # Reduced from 10 to 5 for faster processing
    ])
    
    # Prepare common terms context
    terms_context = ", ".join([term for term, _ in common_terms[:10]])
    
    # Create comprehensive prompt
    prompt = f"""
You are a senior crime intelligence analyst assisting police investigation.

ORIGINAL NEWSPAPER CLIPPING TEXT:
{original_text[:500]}

RELATED ARTICLES FROM MULTIPLE NEWS SOURCES:
{articles_context}

COMMON TERMS ACROSS ALL SOURCES:
{terms_context}

Please provide a comprehensive analysis with the following sections:

1. UNBIASED SUMMARY: Combine information from all sources into a factual, unbiased summary (3-4 sentences). Avoid speculation.

2. KEY ENTITIES: List all important entities mentioned:
   - Suspects/Accused
   - Victims
   - Locations
   - Organizations
   - Amounts/Items involved

3. TIMELINE: Reconstruct the sequence of events based on available information.

4. INVESTIGATIVE CLUES: Identify specific clues that could help police investigation:
   - Potential leads to follow
   - Patterns or connections
   - Digital evidence mentioned
   - Witnesses or informants
   - Modus operandi

5. INVESTIGATION RECOMMENDATIONS: Suggest specific actions for police:
   - Priority investigation areas
   - Evidence to collect
   - Experts to consult
   - Cross-referencing with other cases

Be specific, factual, and actionable. Focus on information that helps law enforcement.
"""
    
    try:
        print("   ü§ñ Analyzing with Llama 3.2 (this may take 30-60 seconds)...")
        
        response = ollama.chat(
            model='llama3.2:latest',
            messages=[
                {
                    'role': 'system',
                    'content': 'You are a senior crime intelligence analyst. Provide factual, unbiased analysis to assist police investigations.'
                },
                {
                    'role': 'user',
                    'content': prompt
                }
            ],
            options={
                'temperature': 0.3,  # Lower temperature for factual analysis
                'num_predict': 800   # Reduced from 1000 to 800 for faster response
            }
        )
        
        analysis = response['message']['content'].strip()
        
        return {
            'full_analysis': analysis,
            'status': 'success'
        }
    
    except Exception as e:
        print(f"   ‚ö†Ô∏è LLM analysis error: {str(e)[:100]}")
        return {
            'full_analysis': f"Error during analysis: {str(e)}",
            'status': 'error'
        }


# Generate comprehensive analysis
if related_articles:
    print("\nü§ñ Generating comprehensive analysis and investigative clues...\n")
    analysis_result = analyze_for_investigation(extracted_text, related_articles, common_terms)
    
    print("\n‚úÖ Analysis complete")
else:
    print("\n‚ö†Ô∏è No articles available for analysis")
    analysis_result = None

NameError: name 'related_articles' is not defined

## üìã Step 6: Generate Final Investigation Report

Compile all findings into a comprehensive report for law enforcement.

In [None]:
def generate_investigation_report(original_text, articles, common_terms, analysis):
    """
    Generate a comprehensive investigation report.
    
    Args:
        original_text: Text from newspaper clipping
        articles: Related articles
        common_terms: Common terms across sources
        analysis: LLM analysis result
    
    Returns:
        Formatted report string
    """
    report_date = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    
    report = f"""
{'='*80}
    CRIME INTELLIGENCE INVESTIGATION REPORT
    Generated: {report_date}
    Analysis Type: Multi-Source Newspaper Analysis
    AI Model: Llama 3.2 (Local)
{'='*80}

SECTION 1: ORIGINAL NEWSPAPER CLIPPING TEXT
{'‚îÄ'*80}
{original_text}
{'‚îÄ'*80}

SECTION 2: RELATED ARTICLES FOUND
{'‚îÄ'*80}
Total articles found: {len(articles)}
News sources: {len(set(a['source'] for a in articles))} different sources

Top 10 Most Relevant Articles:

"""
    
    # Add top articles
    for idx, article in enumerate(articles[:10], 1):
        report += f"""
{idx}. {article['title']}
   Source: {article['source']}
   URL: {article.get('url', 'N/A')}
   Relevance Score: {article.get('relevance_score', 0)}
   Matched Keywords: {', '.join(article.get('matched_keywords', [])[:5])}

"""
    
    # Add common terms
    report += f"""
{'‚îÄ'*80}

SECTION 3: COMMON TERMS ACROSS ALL SOURCES
{'‚îÄ'*80}
These terms appear frequently across multiple news sources:

"""
    
    for term, count in common_terms[:15]:
        # Count sources
        sources = set()
        for article in articles:
            text = (article['title'] + ' ' + article['summary']).lower()
            if term in text:
                sources.add(article['source'])
        
        report += f"  ‚Ä¢ {term.upper()}: mentioned {count} times across {len(sources)} sources\n"
    
    # Add LLM analysis
    report += f"""

{'‚îÄ'*80}

SECTION 4: COMPREHENSIVE ANALYSIS & INVESTIGATIVE CLUES
{'‚îÄ'*80}
"""
    
    if analysis and analysis.get('status') == 'success':
        report += analysis['full_analysis']
    else:
        report += "LLM analysis not available.\n"
    
    # Add footer
    report += f"""

{'='*80}

REPORT NOTES:
  ‚Ä¢ This report combines information from {len(articles)} articles across {len(set(a['source'] for a in articles))} news sources
  ‚Ä¢ Analysis performed using local Llama 3.2 model for unbiased intelligence
  ‚Ä¢ All information should be verified through official investigation channels
  ‚Ä¢ For urgent matters, contact the Crime Coordination Center
  ‚Ä¢ Report generated at: {report_date}

{'='*80}
End of Investigation Report
{'='*80}
"""
    
    return report


# Generate and display final report
if related_articles and analysis_result:
    print("\nüìã Generating final investigation report...\n")
    
    final_report = generate_investigation_report(
        extracted_text,
        related_articles,
        common_terms,
        analysis_result
    )
    
    print(final_report)
    
    # Save to file
    report_filename = f"investigation_report_{datetime.now().strftime('%Y%m%d_%H%M%S')}.txt"
    with open(report_filename, 'w', encoding='utf-8') as f:
        f.write(final_report)
    
    print(f"\nüíæ Report saved to: {report_filename}")
    print("\n‚úÖ Investigation report generation complete")
    print("\nüéâ ANALYSIS COMPLETE!")
else:
    print("\n‚ö†Ô∏è Cannot generate report - insufficient data")

## üöÄ Summary

This notebook successfully:

1. ‚úÖ Extracted text from newspaper clipping using OCR
2. ‚úÖ Identified key search terms and entities
3. ‚úÖ Searched multiple RSS feeds for related articles
4. ‚úÖ Found common terms across all news sources
5. ‚úÖ Generated comprehensive unbiased summary
6. ‚úÖ Provided investigative clues for police
7. ‚úÖ Created detailed investigation report

### Next Steps for Law Enforcement:

- Review the investigative clues section carefully
- Cross-reference with existing case databases
- Follow up on leads identified in the analysis
- Verify all information through official channels
- Use common terms to identify patterns across cases

---

**Note**: This is an AI-assisted analysis tool. All findings should be verified through proper investigation procedures.