# REE Patent Citation Analysis for PATLIB Community

## 🎯 Executive Summary

This notebook demonstrates **AI-enhanced patent analysis** using the EPO Technology Intelligence Platform (TIP) for **Rare Earth Elements (REE) technology intelligence**. 

**Target Audience**: Patent Information Experts at German and European PATLIBs  
**Business Goal**: Transform static patent searches into **strategic business intelligence**  
**Value Proposition**: Cost-effective alternative to commercial patent analytics platforms

### Key Deliverables
- ✅ **High-quality REE patent dataset** (keyword + classification intersection)
- ✅ **Forward & backward citation analysis** with quality scoring
- ✅ **Geographic intelligence** showing technology transfer patterns
- ✅ **Interactive visualizations** for stakeholder presentations
- ✅ **Professional exports** (Excel, CSV, JSON) for further analysis

---

## 📚 1. Setup & Dependencies

Import our tested components and essential libraries for patent analysis.

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.offline as pyo
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set up Plotly for Jupyter
pyo.init_notebook_mode(connected=True)

print("📚 Libraries imported successfully!")
print(f"🕐 Analysis started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 🔗 2. Database Connection & Verification

Connect to the EPO Technology Intelligence Platform (TIP) using our tested connection pattern.

In [None]:
# Import our tested database connection component
from database_connection import get_database_connection

# Establish connection to PATSTAT PROD environment
print("🔗 Connecting to EPO TIP...")
db = get_database_connection()

if db:
    print("✅ Connection successful! Ready for patent analysis.")
else:
    print("❌ Connection failed. Please check your TIP access credentials.")
    raise ConnectionError("Database connection required for analysis")

## 🔍 3. REE Patent Dataset Construction

Build a high-quality REE patent dataset using our **dual-strategy approach**:
1. **Keyword-based identification** (title + abstract search)
2. **Classification-based filtering** (CPC codes)
3. **Quality intersection** for maximum relevance
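
The dual-strategy idea can be sketched in plain SQL. This is a minimal illustration, not the actual `build_ree_dataset` implementation: it runs against an in-memory SQLite database with invented rows, borrows PATSTAT-style table and column names (`tls201_appln`, `tls202_appln_title`, `tls224_appln_cpc`), and the helper `find_ree_candidates` is hypothetical. The keywords and CPC prefixes are examples only.

```python
import sqlite3

# Toy stand-in for PATSTAT. Names follow PATSTAT conventions; rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_filing_year INTEGER);
CREATE TABLE tls202_appln_title (appln_id INTEGER, appln_title TEXT);
CREATE TABLE tls224_appln_cpc (appln_id INTEGER, cpc_class_symbol TEXT);
INSERT INTO tls201_appln VALUES (1, 2016), (2, 2019), (3, 2021), (4, 2010);
INSERT INTO tls202_appln_title VALUES
    (1, 'Recovery of rare earth elements from magnet scrap'),
    (2, 'Lithium battery electrode'),
    (3, 'Neodymium magnet coating'),
    (4, 'Rare earth extraction process');
INSERT INTO tls224_appln_cpc VALUES
    (1, 'C22B 7/00'), (2, 'H01M 10/05'), (3, 'H01F 1/05'), (4, 'C22B 59/00');
""")


def find_ree_candidates(conn, keywords, cpc_prefixes, year_from, year_to):
    """Return appln_ids matching at least one keyword AND at least one CPC prefix."""
    kw_clause = " OR ".join("LOWER(t.appln_title) LIKE ?" for _ in keywords)
    cpc_clause = " OR ".join("c.cpc_class_symbol LIKE ?" for _ in cpc_prefixes)
    sql = f"""
        SELECT DISTINCT a.appln_id
        FROM tls201_appln a
        JOIN tls202_appln_title t ON t.appln_id = a.appln_id
        JOIN tls224_appln_cpc c ON c.appln_id = a.appln_id
        WHERE a.appln_filing_year BETWEEN ? AND ?
          AND ({kw_clause})
          AND ({cpc_clause})
        ORDER BY a.appln_id
    """
    # Parameter order mirrors the placeholder order in the SQL above.
    params = [year_from, year_to]
    params += [f"%{kw.lower()}%" for kw in keywords]
    params += [f"{prefix}%" for prefix in cpc_prefixes]
    return [row[0] for row in conn.execute(sql, params)]


ree_ids = find_ree_candidates(conn, ["rare earth", "neodymium"], ["C22B", "Y02W"], 2014, 2024)
print(ree_ids)
```

Application 3 matches a keyword but no CPC prefix, and application 4 falls outside the date range, so only the intersection survives. On real PATSTAT data the CPC symbols may contain padding spaces, so prefixes would need to follow the stored format.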

In [None]:
# Import our tested REE dataset builder
from ree_dataset_builder import build_ree_dataset, validate_ree_dataset

print("🔍 Building REE patent dataset...")
print("📋 Search Strategy:")
print("   • Keywords: rare earth, neodymium, dysprosium, lanthanide, REE recovery")
print("   • CPC Codes: C22B7/19/25, Y02W30, Y02P10, H01M, C09K11")
print("   • Date Range: 2014-2024 (10-year window)")
print("   • Quality Filter: Keyword + Classification intersection")

# Build dataset (set test_mode=False for full analysis)
ree_dataset = build_ree_dataset(db, test_mode=True)  # Change to False for production

if not ree_dataset.empty:
    validated_dataset = validate_ree_dataset(ree_dataset)
    print("\n✅ REE dataset ready for analysis!")
else:
    print("❌ No REE patents found. Check search criteria.")
    raise ValueError("REE dataset required for analysis")

## 📊 4. Citation Network Analysis

Analyze **forward and backward citations** to understand:
- **Technology impact** (who cites REE patents?)
- **Knowledge foundation** (what REE patents cite)
- **Innovation networks** (citation quality and patterns)
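
The difference between forward and backward citations comes down to which side of the citation link is filtered. The sketch below assumes PATSTAT-style tables (`tls211_pat_publn` mapping publications to applications, `tls212_citation` linking citing to cited publications) with invented rows in an in-memory SQLite database. The helper `citation_pairs` is hypothetical and is not the notebook's `citation_analyzer`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, citn_id INTEGER, cited_pat_publn_id INTEGER);
-- publications 10, 20, 30 belong to applications 1, 2, 3
INSERT INTO tls211_pat_publn VALUES (10, 1), (20, 2), (30, 3);
-- publication 20 cites 10; publication 30 cites 10 and 20
INSERT INTO tls212_citation VALUES (20, 1, 10), (30, 1, 10), (30, 2, 20);
""")

# Resolve both ends of each citation from publication level to application level.
CITATION_PAIRS_SQL = """
    SELECT citing.appln_id AS citing_appln_id, cited.appln_id AS cited_appln_id
    FROM tls212_citation c
    JOIN tls211_pat_publn citing ON citing.pat_publn_id = c.pat_publn_id
    JOIN tls211_pat_publn cited  ON cited.pat_publn_id = c.cited_pat_publn_id
"""


def citation_pairs(conn, appln_ids, direction):
    """'forward': who cites the given applications; 'backward': what they cite."""
    column = {"forward": "cited.appln_id", "backward": "citing.appln_id"}[direction]
    placeholders = ", ".join("?" for _ in appln_ids)
    sql = f"{CITATION_PAIRS_SQL} WHERE {column} IN ({placeholders}) ORDER BY 1, 2"
    return conn.execute(sql, list(appln_ids)).fetchall()


forward = citation_pairs(conn, [2], "forward")    # later filings citing application 2
backward = citation_pairs(conn, [2], "backward")  # prior art cited by application 2
print(forward, backward)
```

A family-level variant would follow the same pattern with `docdb_family_id` in place of `appln_id`, which avoids counting the same invention once per filing jurisdiction.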

In [None]:
# Import our tested citation analyzer
from citation_analyzer import (
    get_forward_citations, get_backward_citations, 
    get_family_level_citations, analyze_citation_patterns
)

print("📊 Analyzing citation networks...")

# Extract IDs for citation analysis
appln_ids = validated_dataset['appln_id'].tolist()
# Deduplicate family IDs: several applications can share one DOCDB family
family_ids = validated_dataset['docdb_family_id'].dropna().unique().tolist()

print(f"   • Processing {len(appln_ids):,} applications")
print(f"   • Processing {len(family_ids):,} patent families")

# Perform citation analysis
forward_citations = get_forward_citations(db, appln_ids, test_mode=True)
backward_citations = get_backward_citations(db, appln_ids, test_mode=True)
family_citations = get_family_level_citations(db, family_ids, test_mode=True)

# Analyze patterns
citation_insights = analyze_citation_patterns(forward_citations, backward_citations)

print("\n✅ Citation analysis completed!")

## 🌍 5. Geographic Intelligence

Add **country-level analysis** to understand:
- **Global innovation hubs** for REE technology
- **Technology transfer patterns** between countries
- **Competitive landscape** and market opportunities
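
Technology transfer patterns are, at their simplest, counts of citations flowing from one country to another. The sketch below shows that aggregation on invented data. The pair list and the helper `country_flows` are hypothetical and are not how `analyze_country_citations` is implemented.

```python
from collections import Counter

# Invented (citing_country, cited_country) pairs. In the notebook these would
# come from citation records joined to applicant or filing-authority data.
citation_pairs = [
    ("US", "JP"), ("US", "JP"), ("DE", "JP"),
    ("CN", "US"), ("JP", "JP"), ("US", "CN"),
]


def country_flows(pairs, include_domestic=False):
    """Count citations per (citing, cited) country pair, most frequent first."""
    flows = Counter(
        (citing, cited)
        for citing, cited in pairs
        if include_domestic or citing != cited
    )
    return flows.most_common()


flows = country_flows(citation_pairs)
for (citing, cited), n in flows:
    print(f"{citing} -> {cited}: {n}")
```

Domestic citations (here JP citing JP) are excluded by default because they say little about cross-border transfer. Pass `include_domestic=True` to keep them.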

In [None]:
# Import our tested geographic enricher
from geographic_enricher import (
    enrich_with_geographic_data, analyze_country_citations, 
    create_country_summary, analyze_geographic_trends, get_country_coordinates
)

print("🌍 Enriching with geographic intelligence...")

# Add geographic data to REE dataset
enriched_ree_dataset = enrich_with_geographic_data(db, validated_dataset)

# Analyze country-level citation patterns
citation_flows, top_citing_countries = analyze_country_citations(forward_citations, backward_citations)

# Create country summary
country_summary = create_country_summary(enriched_ree_dataset)

# Analyze geographic trends over time
geographic_trends = analyze_geographic_trends(enriched_ree_dataset)

# Get coordinates for mapping
country_coords = get_country_coordinates()

print("\n✅ Geographic analysis completed!")

## ✅ 6. Data Validation & Quality Assessment

Comprehensive **quality checks** to ensure reliable business intelligence.
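
As a rough idea of what such checks can look like, the sketch below computes two basic indicators on a toy DataFrame: duplicate application IDs and the share of missing values in required columns. The helper `basic_quality_metrics` and its metric names are hypothetical. The notebook's `data_validator` module may measure different things.

```python
import pandas as pd


def basic_quality_metrics(df, key="appln_id",
                          required=("appln_filing_year", "docdb_family_id")):
    """Hypothetical helper: row count, duplicate keys, and missing-value shares."""
    metrics = {
        "rows": len(df),
        "duplicate_keys": int(df[key].duplicated().sum()),
    }
    for col in required:
        # isna().mean() is the fraction of missing values in the column
        metrics[f"missing_{col}_pct"] = round(float(df[col].isna().mean()) * 100, 1)
    return metrics


# Invented sample: one duplicated appln_id and one missing filing year
sample = pd.DataFrame({
    "appln_id": [1, 2, 2, 4],
    "appln_filing_year": [2016, 2019, 2019, None],
    "docdb_family_id": [100, 200, 200, 400],
})
metrics = basic_quality_metrics(sample)
print(metrics)
```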

In [None]:
# Import our tested data validator
from data_validator import (
    validate_dataset_quality, generate_summary_report, 
    check_data_consistency, export_validation_report
)

print("✅ Validating data quality...")

# Comprehensive quality assessment
quality_metrics = validate_dataset_quality(
    enriched_ree_dataset, forward_citations, backward_citations
)

# Generate business summary
summary_report = generate_summary_report(
    enriched_ree_dataset, forward_citations, backward_citations, quality_metrics
)

# Check data consistency
consistency_issues = check_data_consistency(
    enriched_ree_dataset, forward_citations, backward_citations
)

print("\n🎯 Ready for visualization and business intelligence!")

## 📈 7. Executive Dashboard

**Interactive dashboard** with key metrics for stakeholder presentations.

In [None]:
# Create executive dashboard with key metrics
fig_dashboard = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'REE Patent Filings Over Time',
        'Top Countries by Patent Count',
        'Citation Impact Analysis',
        'Filing Authority Distribution'
    ),
    specs=[[{"secondary_y": False}, {"type": "bar"}],
           [{"type": "scatter"}, {"type": "pie"}]]
)

# 1. Timeline of REE patent filings
if 'appln_filing_year' in enriched_ree_dataset.columns:
    yearly_data = enriched_ree_dataset['appln_filing_year'].value_counts().sort_index()
    fig_dashboard.add_trace(
        go.Scatter(
            x=yearly_data.index, 
            y=yearly_data.values,
            mode='lines+markers',
            name='Patent Filings',
            line=dict(color='#1f77b4', width=3)
        ),
        row=1, col=1
    )

# 2. Top countries by patent count
if not country_summary.empty:
    top_countries = country_summary.head(8)
    fig_dashboard.add_trace(
        go.Bar(
            x=top_countries['total_applications'],
            y=top_countries.index,
            orientation='h',
            name='Patent Count',
            marker_color='#ff7f0e'
        ),
        row=1, col=2
    )

# 3. Citation impact (if we have citation data)
if not forward_citations.empty and 'citing_country' in forward_citations.columns:
    citation_summary = forward_citations['citing_country'].value_counts().head(10)
    fig_dashboard.add_trace(
        go.Bar(
            x=citation_summary.index,
            y=citation_summary.values,
            name='Citations Received',
            marker_color='#2ca02c'
        ),
        row=2, col=1
    )

# 4. Filing authority distribution
if 'appln_auth' in enriched_ree_dataset.columns:
    auth_data = enriched_ree_dataset['appln_auth'].value_counts().head(6)
    fig_dashboard.add_trace(
        go.Pie(
            labels=auth_data.index,
            values=auth_data.values,
            name="Filing Authority"
        ),
        row=2, col=2
    )

# Update layout
fig_dashboard.update_layout(
    height=800,
    title_text="REE Patent Intelligence Dashboard - Executive Summary",
    title_x=0.5,
    showlegend=False,
    font=dict(size=12)
)

fig_dashboard.show()

print("📊 Executive dashboard created!")

## 🗺️ 8. Global Patent Activity Map

**Interactive world map** showing REE patent activity by country.

In [None]:
# Create global patent activity map
if not country_summary.empty:
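    # Prepare data for mapping: copy the country labels out of the index
    # before resetting it (after reset_index() the index is just 0..n-1).
    # Assumes the index holds full country names, as locationmode='country names' expects.
    map_data = country_summary.copy()
    map_data['country_name'] = map_data.index
    map_data = map_data.reset_index(drop=True)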
    
    # Create choropleth map
    fig_map = px.choropleth(
        map_data,
        locations='country_name',
        locationmode='country names',
        color='total_applications',
        hover_name='country_name',
        hover_data={
            'total_applications': ':,',
            'unique_families': ':,',
            'first_year': True,
            'last_year': True
        },
        color_continuous_scale='Viridis',
        title='Global REE Patent Activity (2014-2024)'
    )
    
    fig_map.update_layout(
        title_x=0.5,
        geo=dict(
            showframe=False,
            showcoastlines=True,
            projection_type='natural earth'
        ),
        font=dict(size=12)
    )
    
    fig_map.show()
    print("🗺️ Global patent activity map created!")
else:
    print("⚠️ No country data available for mapping")

## 🔗 9. Citation Network Visualization

**Interactive scatter plot** approximating the citation network: each point is a country, plotted by its number of citing patents against the number of REE patents it cites.

In [None]:
# Create citation network visualization
if not citation_flows.empty:
    # Create network-style visualization using scatter plot
    fig_network = px.scatter(
        citation_flows,
        x='citing_patents',
        y='cited_ree_patents',
        size='citing_patents',
        hover_name='country',
        title='REE Patent Citation Network - Country Analysis',
        labels={
            'citing_patents': 'Number of Citing Patents',
            'cited_ree_patents': 'Number of REE Patents Cited'
        },
        color='citing_patents',
        color_continuous_scale='Plasma'
    )
    
    fig_network.update_layout(
        title_x=0.5,
        height=600,
        font=dict(size=12)
    )
    
    fig_network.show()
    print("🔗 Citation network visualization created!")
else:
    print("⚠️ No citation flow data available for network visualization")

# Alternative: Simple citation summary chart
if not forward_citations.empty and 'citing_country' in forward_citations.columns:
    citation_summary = forward_citations['citing_country'].value_counts().head(10)
    
    fig_citations = px.bar(
        x=citation_summary.values,
        y=citation_summary.index,
        orientation='h',
        title='Top Countries Citing REE Patents',
        labels={'x': 'Number of Citations', 'y': 'Country'},
        color=citation_summary.values,
        color_continuous_scale='Viridis'
    )
    
    fig_citations.update_layout(
        title_x=0.5,
        height=500,
        font=dict(size=12)
    )
    
    fig_citations.show()
    print("📊 Citation summary chart created!")

## 📋 10. Business Intelligence Summary

**Executive summary** with key findings and strategic recommendations.

In [None]:
# Display comprehensive business summary
print("\n" + "="*80)
print("REE PATENT LANDSCAPE - STRATEGIC BUSINESS INTELLIGENCE")
print("="*80)

# Key metrics
total_apps = len(enriched_ree_dataset)
total_families = enriched_ree_dataset['docdb_family_id'].nunique()
total_countries = enriched_ree_dataset['appln_auth'].nunique() if 'appln_auth' in enriched_ree_dataset.columns else 0
forward_cites = len(forward_citations)
backward_cites = len(backward_citations)

print("\n📊 PORTFOLIO OVERVIEW:")
print(f"   • Total REE Applications: {total_apps:,}")
print(f"   • Unique Patent Families: {total_families:,}")
print(f"   • Filing Jurisdictions: {total_countries}")
print(f"   • Forward Citations: {forward_cites:,}")
print(f"   • Backward Citations: {backward_cites:,}")

# Geographic insights
if not country_summary.empty:
    top_3_countries = country_summary.head(3)
    print("\n🌍 GEOGRAPHIC LEADERS:")
    for i, (country, data) in enumerate(top_3_countries.iterrows(), 1):
        print(f"   {i}. {country}: {data['total_applications']:,} applications ({data['unique_families']:,} families)")

# Citation insights
if citation_insights:
    print("\n📈 CITATION IMPACT:")
    if 'forward_citations_total' in citation_insights:
        print(f"   • Patents receiving citations: {citation_insights.get('unique_cited_ree_patents', 'N/A')}")
        print(f"   • Total forward citations: {citation_insights['forward_citations_total']:,}")
    if 'top_citing_countries' in citation_insights:
        top_citing = list(citation_insights['top_citing_countries'].keys())[:3]
        print(f"   • Top citing countries: {', '.join(top_citing)}")

# Temporal insights
if 'appln_filing_year' in enriched_ree_dataset.columns:
    recent_years = enriched_ree_dataset[enriched_ree_dataset['appln_filing_year'] >= 2020]
    recent_count = len(recent_years)
    recent_pct = (recent_count / total_apps * 100) if total_apps > 0 else 0
    
    print("\n📅 TEMPORAL TRENDS:")
    print(f"   • Recent activity (2020-2024): {recent_count:,} applications ({recent_pct:.1f}%)")
    
    yearly_data = enriched_ree_dataset['appln_filing_year'].value_counts().sort_index()
    peak_year = yearly_data.idxmax()
    peak_count = yearly_data.max()
    print(f"   • Peak filing year: {peak_year} ({peak_count:,} applications)")

print("\n💼 BUSINESS RECOMMENDATIONS:")
print("   • Monitor top 3 countries for competitive intelligence")
print("   • Analyze citation patterns for technology transfer opportunities")
print("   • Focus on recent filings for emerging technology trends")
print("   • Leverage family-level analysis for freedom-to-operate studies")

print("\n🎯 VALUE DELIVERED:")
print("   • Professional patent landscape analysis")
print("   • Cost-effective alternative to commercial tools")
print("   • Direct access to the authoritative PATSTAT database")
print("   • Customizable analysis for specific business needs")

print("="*80)

## 💾 11. Export Results for Business Use

Export analysis results in **multiple formats** for stakeholder sharing and further analysis.
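
If the pipeline exporter is unavailable, a manual fallback with pandas is straightforward. The sketch below is one possible shape for it, under the assumption that results are either DataFrames (written as CSV) or plain dictionaries (written as JSON). The helper `export_results` and the demo data are hypothetical and are not part of `integrated_pipeline`.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def export_results(results, export_dir):
    """Write DataFrames to CSV and other objects to JSON; return the paths written."""
    export_dir = Path(export_dir)
    export_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, obj in results.items():
        if isinstance(obj, pd.DataFrame):
            path = export_dir / f"{name}.csv"
            obj.to_csv(path, index=False)
        else:
            path = export_dir / f"{name}.json"
            # default=str keeps non-JSON types (e.g. timestamps) from raising
            path.write_text(json.dumps(obj, indent=2, default=str), encoding="utf-8")
        written.append(path)
    return written


# Invented demo data standing in for the notebook's analysis results
demo_results = {
    "country_summary": pd.DataFrame({"country": ["JP", "US"], "total_applications": [12, 9]}),
    "quality_metrics": {"rows": 21, "duplicate_keys": 0},
}
with tempfile.TemporaryDirectory() as tmp:
    files = export_results(demo_results, tmp)
    exported_names = sorted(p.name for p in files)
    print(exported_names)
```

`index=False` drops the DataFrame index. For a table whose index carries data, as `country_summary` appears to in this notebook, `index=True` would be the safer choice.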

In [None]:
# Import integrated pipeline for exports
from integrated_pipeline import export_pipeline_results, create_visualization_data
import os

# Prepare results dictionary
analysis_results = {
    'enriched_ree_dataset': enriched_ree_dataset,
    'forward_citations': forward_citations,
    'backward_citations': backward_citations,
    'family_citations': family_citations,
    'citation_flows': citation_flows,
    'country_summary': country_summary,
    'summary_report': summary_report,
    'quality_metrics': quality_metrics
}

print("💾 Exporting analysis results...")

# Create exports directory
export_dir = "./exports/"
os.makedirs(export_dir, exist_ok=True)

# Export results
try:
    exported_files = export_pipeline_results(analysis_results, export_dir)
    
    # Create visualization data
    viz_data = create_visualization_data(analysis_results)
    
    print("\n✅ Export completed successfully!")
    print(f"📁 Files created: {len(exported_files)}")
    print(f"📊 Visualization datasets: {len(viz_data)}")
    
    # List exported files
    if exported_files:
        print("\n📋 Exported files:")
        for file in exported_files:
            file_size = os.path.getsize(file) / 1024  # KB
            print(f"   • {os.path.basename(file)} ({file_size:.1f} KB)")
    
except Exception as e:
    print(f"❌ Export failed: {e}")
    print("Manual export available - use pandas .to_csv() methods")

## 🎓 12. Methodology & Technical Notes

### Data Sources & Quality
- **Database**: EPO PATSTAT (Production environment)
- **Coverage**: Global patent applications (2014-2024)
- **Quality Control**: Keyword + Classification intersection
- **Citation Source**: Search report citations (TLS212_CITATION)
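
Restricting the analysis to search report citations amounts to filtering citation rows by their origin code. The sketch below illustrates this on invented rows. It assumes the `citn_origin` column of TLS212_CITATION with codes such as `SEA` (search report), `ISR` (international search report), `APP` (applicant) and `EXA` (examination). Which codes the notebook's analyzer actually keeps is not stated, so the set used here is an assumption.

```python
from collections import Counter

# Invented rows shaped like (pat_publn_id, cited_pat_publn_id, citn_origin)
citations = [
    (20, 10, "SEA"), (30, 10, "APP"), (30, 20, "SEA"),
    (40, 10, "ISR"), (40, 20, "EXA"),
]

# Assumption: these origin codes are treated as "search report" citations
SEARCH_REPORT_ORIGINS = {"SEA", "ISR"}


def search_report_citations(rows, origins=SEARCH_REPORT_ORIGINS):
    """Keep only citations whose origin code is in the accepted set."""
    return [row for row in rows if row[2] in origins]


kept = search_report_citations(citations)
origin_counts = Counter(row[2] for row in citations)
print(f"kept {len(kept)} of {len(citations)} citations; origins: {dict(origin_counts)}")
```

Examiner-added citations are often preferred for impact analysis because they are applied more consistently than applicant citations, whose volume varies widely by jurisdiction.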

### Search Strategy
**Keywords**: rare earth, neodymium, dysprosium, terbium, europium, yttrium, lanthanide, REE recovery, REE recycling  
**CPC Classifications**: C22B7/19/25 (extraction), Y02W30 (recycling), Y02P10 (processing), H01M (batteries), C09K11 (phosphors)

### Technical Implementation
- **Approach**: Modular Python components with direct SQL queries
- **Database**: Direct PATSTAT access (avoiding ORM complications)
- **Visualization**: Plotly for interactive charts
- **Exports**: Multiple formats (CSV, JSON, Excel-compatible)

### Business Value
- **Cost Savings**: Free alternative to commercial patent analytics (€10,000+ annually)
- **Customization**: Adaptable to any technology domain
- **Direct access**: Queries run against authoritative PATSTAT data, which is published in periodic editions rather than updated in real time
- **Professional**: Publication-ready visualizations and reports

---

## 🎯 Conclusion

This notebook demonstrates how **Claude Code + EPO TIP** can transform patent information work from static searches to **strategic business intelligence**. 

**Next Steps for PATLIB Professionals**:
1. **Adapt methodology** to other technology domains
2. **Integrate with local workflows** and client services
3. **Scale analysis** for comprehensive landscape studies
4. **Develop consulting services** using this foundation

**Contact**: This analysis demonstrates capabilities available to the PATLIB community for enhancing patent information services.

---
*Generated with Claude Code for EPO PATLIB 2025 - Showcasing AI-enhanced patent analytics*