# 🧬 CenanoInk Nanomaterials Database & Pattern Recognition

## 🎯 Purpose
This notebook contains the comprehensive nanomaterials database and pattern recognition system for the CenanoInk project. It handles:

- **Nanomaterial categories** with specialized patterns (the current config defines 6 categories and 64 patterns)
- **Regex-based extraction** for rapid pattern matching
- **Specialized keywords** for coatings and paints
- **Relevance scoring** based on content analysis
- **Category classification** for nanomaterials

## 🔬 Nanomaterials Coverage
- Metal oxides (TiO2, ZnO, SiO2, etc.)
- Carbon-based materials (graphene, CNTs, etc.)
- Metal nanoparticles (Ag, Au, Cu, etc.)
- Polymeric materials
- Composite materials
- And many more specialized categories

## 🎨 Paint & Coating Specialization
- Types of coatings and paints
- Functional properties
- Application techniques
- Industrial applications
- Characterization methods

## 📝 Note
This notebook loads the existing nanomaterials database from `config_sistema.json` instead of recreating it.

In [1]:
# 📦 DEPENDENCIES AND IMPORTS
import pandas as pd
import numpy as np
import re
import json
from pathlib import Path
from typing import Dict, List, Set, Optional, Tuple, Any
from collections import defaultdict
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

print("🧬 CenanoInk Nanomaterials Database & Pattern Recognition")
print("=" * 60)
print("🔬 Comprehensive nanomaterials pattern recognition system")
print("🎨 Specialized for coatings, paints, and nanocomposites")
print("⚡ High-performance regex-based extraction")

🧬 CenanoInk Nanomaterials Database & Pattern Recognition
🔬 Comprehensive nanomaterials pattern recognition system
🎨 Specialized for coatings, paints, and nanocomposites
⚡ High-performance regex-based extraction


In [2]:
# 🔧 Load nanomaterials and keywords from config_sistema.json
from pathlib import Path
import json

config_path = Path.cwd() / 'config_sistema.json'  # Expected in the current working directory
with open(config_path, 'r', encoding='utf-8') as f:
    config_data = json.load(f)

NANOMATERIALS_DATABASE = config_data.get('nanomateriais', {})
COATING_PAINT_KEYWORDS = config_data.get('palavras_chave', {})
print(f"🔍 Loaded {len(NANOMATERIALS_DATABASE)} nanomaterial categories from config")
print(f"🔍 Loaded {len(COATING_PAINT_KEYWORDS)} keyword categories from config")

# Count categories and materials
total_categories = len(NANOMATERIALS_DATABASE)  
total_materials = sum(len(materials) for materials in NANOMATERIALS_DATABASE.values())  
print(f"✅ Loaded {total_categories} nanomaterial categories with {total_materials} total materials/patterns from config")  

# Show details per category
print("\n📋 Categories and pattern counts:")
for category, materials in NANOMATERIALS_DATABASE.items():
    print(f"  🧪 {category}: {len(materials)} patterns")
    
print(f"\n🧬 Nanomaterials Database Loaded")
print(f"📊 Categories: {len(NANOMATERIALS_DATABASE)}")
total_materials = sum(len(materials) for materials in NANOMATERIALS_DATABASE.values())
print(f"🔬 Total materials/patterns: {total_materials}")

# Display category overview
print("\n📋 Categories Overview:")
for category, materials in NANOMATERIALS_DATABASE.items():
    print(f"  🧪 {category}: {len(materials)} materials")

🔍 Loaded 6 nanomaterial categories from config
🔍 Loaded 4 keyword categories from config
✅ Loaded 6 nanomaterial categories with 64 total materials/patterns from config

📋 Categories and pattern counts:
  🧪 oxidos_metalicos: 14 patterns
  🧪 carbono: 13 patterns
  🧪 metais_nobres: 12 patterns
  🧪 polimeros: 12 patterns
  🧪 ceramicos: 7 patterns
  🧪 compositos: 6 patterns

🧬 Nanomaterials Database Loaded
📊 Categories: 6
🔬 Total materials/patterns: 64

📋 Categories Overview:
  🧪 oxidos_metalicos: 14 materials
  🧪 carbono: 13 materials
  🧪 metais_nobres: 12 materials
  🧪 polimeros: 12 materials
  🧪 ceramicos: 7 materials
  🧪 compositos: 6 materials
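
The loader above relies on `config_sistema.json` exposing `nanomateriais` and `palavras_chave` as mappings from a category name to a list of strings. The following is a minimal sketch of a shape check under that assumption. `validate_config_section` is a hypothetical helper that is not part of this notebook, and the sample dictionary is illustrative data, not the real file contents.

```python
from typing import Any, Dict, List


def validate_config_section(section: Any, name: str) -> Dict[str, List[str]]:
    """Return the section unchanged if it maps category names to lists of strings."""
    if not isinstance(section, dict):
        raise TypeError(f"'{name}' must be a dict, got {type(section).__name__}")
    for category, items in section.items():
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise TypeError(f"'{name}.{category}' must be a list of strings")
    return section


# Illustrative data only; the real config file is not reproduced here.
sample_config = {
    "nanomateriais": {"oxidos_metalicos": ["TiO2", "ZnO"]},
    "palavras_chave": {"coating_types": ["coating", "paint"]},
}

nanomaterials = validate_config_section(sample_config.get("nanomateriais", {}), "nanomateriais")
keywords = validate_config_section(sample_config.get("palavras_chave", {}), "palavras_chave")
print(len(nanomaterials), len(keywords))
```

A check like this would fail early on a malformed file, before the pattern-compilation cell reaches `re.escape`.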


In [3]:
# 🎨 LOADING SPECIALIZED KEYWORDS FOR COATINGS AND PAINTS
# Comprehensive database optimized for CenanoInk project focus

# Reuse the config loaded in the previous cell and the same key it reads from
KEYWORDS_KEY = 'palavras_chave'
COATING_PAINT_KEYWORDS = {}
if KEYWORDS_KEY in config_data:
    # Explicitly load all keyword categories
    COATING_PAINT_KEYWORDS = config_data[KEYWORDS_KEY]
    print(f"✅ Successfully loaded coating & paint keywords")
    
    # Verify we're getting all the keyword categories
    print(f"   Verification: JSON has {len(COATING_PAINT_KEYWORDS)} keyword categories")
    json_total_keywords = sum(len(keywords) for keywords in COATING_PAINT_KEYWORDS.values())
    print(f"   Verification: JSON contains {json_total_keywords} total keywords")
else:
    print(f"⚠️ Warning: '{KEYWORDS_KEY}' key not found in the JSON file")

print("\n🎨 Coating & Paint Keywords Database Loaded")
total_keywords = sum(len(keywords) for keywords in COATING_PAINT_KEYWORDS.values())
print(f"📊 Categories: {len(COATING_PAINT_KEYWORDS)}")
print(f"🔍 Total keywords: {total_keywords}")

# Display keyword categories
print("\n📋 Keyword Categories:")
for category, keywords in COATING_PAINT_KEYWORDS.items():
    print(f"  🎯 {category}: {len(keywords)} keywords")

In [None]:
# 🔍 REGEX PATTERN GENERATION
# High-performance pattern compilation for fast text analysis

def create_nanomaterial_patterns() -> Dict[str, re.Pattern]:
    """
    Create compiled regex patterns for nanomaterials detection
    """
    patterns = {}
    
    for category, materials in NANOMATERIALS_DATABASE.items():
        if materials:
            # Escape special regex characters and create pattern
            escaped_materials = [re.escape(material) for material in materials]
            pattern_str = r'\b(' + '|'.join(escaped_materials) + r')\b'
            patterns[category] = re.compile(pattern_str, re.IGNORECASE)
    
    return patterns

def create_keyword_patterns() -> Dict[str, re.Pattern]:
    """
    Create compiled regex patterns for coating/paint keywords
    """
    patterns = {}
    
    for category, keywords in COATING_PAINT_KEYWORDS.items():
        if keywords:
            # Escape special regex characters and create pattern
            escaped_keywords = [re.escape(keyword) for keyword in keywords]
            pattern_str = r'\b(' + '|'.join(escaped_keywords) + r')\b'
            patterns[category] = re.compile(pattern_str, re.IGNORECASE)
    
    return patterns

# Compile all patterns
print("\n🔍 Compiling Regex Patterns...")
NANOMATERIAL_PATTERNS = create_nanomaterial_patterns()
KEYWORD_PATTERNS = create_keyword_patterns()

print(f"✅ Nanomaterial patterns compiled: {len(NANOMATERIAL_PATTERNS)}")
print(f"✅ Keyword patterns compiled: {len(KEYWORD_PATTERNS)}")
print(f"⚡ Ready for high-speed pattern matching!")

# Test pattern compilation
test_text = "TiO2 nanoparticles were used to create an antimicrobial coating with self-cleaning properties."
print(f"\n🧪 Pattern Test with: '{test_text}'")

# Test nanomaterial detection
for category, pattern in NANOMATERIAL_PATTERNS.items():
    matches = pattern.findall(test_text)
    if matches:
        print(f"  🧬 {category}: {matches}")

# Test keyword detection
for category, pattern in KEYWORD_PATTERNS.items():
    matches = pattern.findall(test_text)
    if matches:
        print(f"  🎨 {category}: {matches}")


🔍 Compiling Regex Patterns...
✅ Nanomaterial patterns compiled: 6
✅ Keyword patterns compiled: 6
⚡ Ready for high-speed pattern matching!

🧪 Pattern Test with: 'TiO2 nanoparticles were used to create an antimicrobial coating with self-cleaning properties.'
  🧬 oxidos_metalicos: ['TiO2']
  🎨 coating_types: ['coating']
  🎨 functional_properties: ['antimicrobial', 'self-cleaning']
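
One property of the alternation built above is worth illustrating: Python's `re` tries alternatives left to right, so a short term listed before a longer term that starts with it wins the match. The sketch below assumes an illustrative two-term list (whether the real database contains both terms is not shown here), and `compile_longest_first` is a hypothetical helper, not a function from this notebook.

```python
import re
from typing import List


def compile_longest_first(terms: List[str]) -> re.Pattern:
    """Compile an alternation that tries longer terms before their prefixes."""
    ordered = sorted(terms, key=len, reverse=True)
    escaped = [re.escape(term) for term in ordered]
    return re.compile(r'\b(' + '|'.join(escaped) + r')\b', re.IGNORECASE)


terms = ["graphene", "graphene oxide"]  # illustrative list, shortest term first
text = "Graphene oxide was incorporated into marine paints."

# Same construction as create_nanomaterial_patterns(), keeping the listed order
as_listed = re.compile(r'\b(' + '|'.join(map(re.escape, terms)) + r')\b', re.IGNORECASE)

print(as_listed.findall(text))
print(compile_longest_first(terms).findall(text))
```

In this sketch the as-listed pattern reports only `Graphene`, while the longest-first pattern reports `Graphene oxide`. Sorting by length before joining is one possible way to prefer the more specific term.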


In [None]:
# 🧪 NANOMATERIALS EXTRACTION FUNCTIONS

def extract_nanomaterials_from_text(text: str, 
                                   patterns: Dict[str, re.Pattern]) -> Dict[str, Set[str]]:
    """
    Extract nanomaterials from text using compiled patterns
    
    Args:
        text: Input text to analyze
        patterns: Compiled regex patterns
    
    Returns:
        Dictionary with categories as keys and sets of found materials as values
    """
    if pd.isna(text) or not str(text).strip():
        return {category: set() for category in patterns.keys()}
    
    text_str = str(text)
    results = {}
    
    for category, pattern in patterns.items():
        matches = pattern.findall(text_str)
        # Handle tuple matches (from grouped patterns)
        normalized_matches = set()
        for match in matches:
            if isinstance(match, tuple):
                match = match[0]  # Take first group
            normalized_matches.add(match.strip())
        
        results[category] = normalized_matches
    
    return results

def extract_keywords_from_text(text: str, 
                              patterns: Dict[str, re.Pattern]) -> Dict[str, Set[str]]:
    """
    Extract coating/paint keywords from text using compiled patterns
    """
    if pd.isna(text) or not str(text).strip():
        return {category: set() for category in patterns.keys()}
    
    text_str = str(text)
    results = {}
    
    for category, pattern in patterns.items():
        matches = pattern.findall(text_str)
        normalized_matches = set()
        for match in matches:
            if isinstance(match, tuple):
                match = match[0]
            normalized_matches.add(match.strip())
        
        results[category] = normalized_matches
    
    return results

def format_material_results(materials_dict: Dict[str, Set[str]]) -> str:
    """
    Format nanomaterials extraction results for display
    """
    all_materials = []
    for category, materials in materials_dict.items():
        for material in materials:
            all_materials.append(f"{material} ({category})")
    
    return '; '.join(all_materials) if all_materials else 'None detected'

def format_keyword_results(keywords_dict: Dict[str, Set[str]]) -> str:
    """
    Format keyword extraction results for display
    """
    all_keywords = []
    for category, keywords in keywords_dict.items():
        all_keywords.extend(list(keywords))
    
    return '; '.join(sorted(set(all_keywords))) if all_keywords else 'None detected'

print("🧪 Nanomaterials Extraction Functions Ready")
print("✅ extract_nanomaterials_from_text()")
print("✅ extract_keywords_from_text()")
print("✅ format_material_results()")
print("✅ format_keyword_results()")

🧪 Nanomaterials Extraction Functions Ready
✅ extract_nanomaterials_from_text()
✅ extract_keywords_from_text()
✅ format_material_results()
✅ format_keyword_results()
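
Because the patterns are compiled with `re.IGNORECASE` and the extraction functions store matches exactly as they appear in the text, case variants such as `Coatings` and `coatings` end up as separate set members. The sketch below shows one possible normalization step; `normalize_case` is a hypothetical helper and the input is illustrative. Lower-casing is probably better suited to keywords than to chemical formulas such as `TiO2`, where the casing carries meaning.

```python
from typing import Dict, Set


def normalize_case(results: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Lower-case every match so case variants collapse into one entry."""
    return {
        category: {match.lower() for match in matches}
        for category, matches in results.items()
    }


raw = {"coating_types": {"Coatings", "coatings", "nanocoating"}}  # illustrative matches
print(normalize_case(raw))
```

Since `calculate_relevance_score()` counts set members, collapsing case variants also prevents the same keyword from being scored twice.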


In [None]:
# 📊 RELEVANCE SCORING SYSTEM

def calculate_relevance_score(nanomaterials: Dict[str, Set[str]], 
                            keywords: Dict[str, Set[str]]) -> float:
    """
    Calculate relevance score based on nanomaterials and keywords found
    
    Scoring system:
    - Each nanomaterial: 2 points
    - Each keyword: the weight of its category (1.0 if the category is unlisted)
    - Final score: raw points divided by 10, capped at 10.0
    """
    # Points for nanomaterials (each material = 2 points)
    nano_points = sum(len(materials) for materials in nanomaterials.values()) * 2
    
    # Weighted points for keywords by category importance
    keyword_weights = {
        'coating_types': 3.0,           # Most important
        'functional_properties': 2.5,   # Very important
        'application_methods': 2.0,     # Important
        'applications': 1.5,            # Moderately important
        'characterization': 1.0,        # Basic importance
        'sustainability': 1.2           # Growing importance
    }
    
    keyword_points = 0
    for category, words in keywords.items():
        weight = keyword_weights.get(category, 1.0)
        keyword_points += len(words) * weight
    
    # Total score: raw points scaled by 1/10 and capped at 10.0
    total_score = (nano_points + keyword_points) / 10
    return min(total_score, 10.0)

def classify_relevance(score: float) -> str:
    """
    Classify relevance level based on score
    """
    if score >= 7.0:
        return "Very High"
    elif score >= 5.0:
        return "High"
    elif score >= 3.0:
        return "Medium"
    elif score >= 1.0:
        return "Low"
    else:
        return "Very Low"

def get_active_categories(nanomaterials: Dict[str, Set[str]], 
                         keywords: Dict[str, Set[str]]) -> List[str]:
    """
    Get list of categories that have matches
    """
    active = []
    
    # Nanomaterial categories
    for category, materials in nanomaterials.items():
        if materials:
            active.append(f"nano_{category}")
    
    # Keyword categories
    for category, words in keywords.items():
        if words:
            active.append(f"key_{category}")
    
    return active

print("📊 Relevance Scoring System Ready")
print("✅ calculate_relevance_score()")
print("✅ classify_relevance()")
print("✅ get_active_categories()")

# Test scoring system
test_nano = {
    'metal_oxides': {'TiO2', 'ZnO'},
    'carbon_materials': set(),
    'metal_nanoparticles': set()
}

test_keywords = {
    'coating_types': {'nanocoating', 'coating'},
    'functional_properties': {'antimicrobial', 'self-cleaning'},
    'application_methods': set()
}

test_score = calculate_relevance_score(test_nano, test_keywords)
test_class = classify_relevance(test_score)
test_categories = get_active_categories(test_nano, test_keywords)

print(f"\n🧪 Test Results:")
print(f"  Score: {test_score:.2f}")
print(f"  Classification: {test_class}")
print(f"  Active categories: {test_categories}")

📊 Relevance Scoring System Ready
✅ calculate_relevance_score()
✅ classify_relevance()
✅ get_active_categories()

🧪 Test Results:
  Score: 1.50
  Classification: Low
  Active categories: ['nano_metal_oxides', 'key_coating_types', 'key_functional_properties']
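
The test score of 1.50 can be reproduced by hand from the weights in `calculate_relevance_score()`. The sketch below repeats that arithmetic and also derives how many raw points each class threshold in `classify_relevance()` implies under the divide-by-10 rule. The variable names are illustrative.

```python
# Recompute the test score by hand, using the weights from calculate_relevance_score().
nano_points = 2 * 2                  # TiO2 and ZnO, 2 points each
keyword_points = 2 * 3.0 + 2 * 2.5   # two coating_types plus two functional_properties
raw_points = nano_points + keyword_points
score = min(raw_points / 10, 10.0)
print(raw_points, score)

# Raw points needed to reach each class threshold under the divide-by-10 rule.
thresholds = {"Very High": 7.0, "High": 5.0, "Medium": 3.0, "Low": 1.0}
required_raw = {label: cutoff * 10 for label, cutoff in thresholds.items()}
print(required_raw)
```

Under this rule a text needs 70 raw points to be classed as "Very High", which is consistent with short abstracts landing in the "Low" class.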


In [None]:
# 🔍 COMPREHENSIVE TEXT ANALYSIS FUNCTION

def analyze_text_comprehensive(text: str, 
                             include_title: Optional[str] = None) -> Dict[str, Any]:
    """
    Comprehensive analysis of scientific text for nanomaterials and coatings
    
    Args:
        text: Main text to analyze (usually abstract)
        include_title: Optional title text to include in analysis
    
    Returns:
        Dictionary with complete analysis results
    """
    # Combine text sources; treat None/NaN (common in DataFrame rows) as empty
    full_text = "" if pd.isna(text) else str(text)
    if not pd.isna(include_title) and str(include_title).strip():
        full_text = f"{include_title} {full_text}"
    
    if not full_text.strip():
        return {
            'nanomaterials': {cat: set() for cat in NANOMATERIALS_DATABASE.keys()},
            'keywords': {cat: set() for cat in COATING_PAINT_KEYWORDS.keys()},
            'relevance_score': 0.0,
            'relevance_class': 'Very Low',
            'active_categories': [],
            'formatted_materials': 'None detected',
            'formatted_keywords': 'None detected',
            'has_content': False
        }
    
    # Extract nanomaterials and keywords
    nanomaterials = extract_nanomaterials_from_text(full_text, NANOMATERIAL_PATTERNS)
    keywords = extract_keywords_from_text(full_text, KEYWORD_PATTERNS)
    
    # Calculate relevance
    score = calculate_relevance_score(nanomaterials, keywords)
    classification = classify_relevance(score)
    active_cats = get_active_categories(nanomaterials, keywords)
    
    # Format results
    formatted_materials = format_material_results(nanomaterials)
    formatted_keywords = format_keyword_results(keywords)
    
    return {
        'nanomaterials': nanomaterials,
        'keywords': keywords,
        'relevance_score': score,
        'relevance_class': classification,
        'active_categories': active_cats,
        'formatted_materials': formatted_materials,
        'formatted_keywords': formatted_keywords,
        'has_content': True
    }

def analyze_dataframe_batch(df: pd.DataFrame, 
                          text_column: str = 'Abstract',
                          title_column: str = 'Title',
                          batch_size: int = 1000) -> pd.DataFrame:
    """
    Analyze a DataFrame in batches for better performance
    
    Args:
        df: Input DataFrame
        text_column: Column containing main text (abstracts)
        title_column: Column containing titles
        batch_size: Number of rows to process at once
    
    Returns:
        DataFrame with analysis results added
    """
    print(f"🔍 Starting batch analysis of {len(df)} records...")
    print(f"📊 Batch size: {batch_size}")
    
    # Initialize result columns
    result_columns = {
        'Nanomaterials_Detected': [],
        'Keywords_Detected': [],
        'Relevance_Score': [],
        'Relevance_Class': [],
        'Active_Categories': [],
        'Material_Categories': []
    }
    
    # Process in batches
    total_batches = (len(df) + batch_size - 1) // batch_size
    
    for batch_num in range(total_batches):
        start_idx = batch_num * batch_size
        end_idx = min((batch_num + 1) * batch_size, len(df))
        
        print(f"📦 Processing batch {batch_num + 1}/{total_batches} (rows {start_idx}-{end_idx-1})")
        
        batch_df = df.iloc[start_idx:end_idx]
        
        for idx, row in batch_df.iterrows():
            # Get text content
            text = row.get(text_column, '') if text_column in df.columns else ''
            title = row.get(title_column, '') if title_column in df.columns else None
            
            # Analyze
            analysis = analyze_text_comprehensive(text, title)
            
            # Store results
            result_columns['Nanomaterials_Detected'].append(analysis['formatted_materials'])
            result_columns['Keywords_Detected'].append(analysis['formatted_keywords'])
            result_columns['Relevance_Score'].append(analysis['relevance_score'])
            result_columns['Relevance_Class'].append(analysis['relevance_class'])
            result_columns['Active_Categories'].append('; '.join(analysis['active_categories']))
            
            # Identify material categories with hits
            material_cats = [cat for cat, materials in analysis['nanomaterials'].items() if materials]
            result_columns['Material_Categories'].append('; '.join(material_cats))
        
        # Progress update
        if batch_num % 5 == 0 or batch_num == total_batches - 1:
            progress = ((batch_num + 1) / total_batches) * 100
            print(f"  ⚡ Progress: {progress:.1f}%")
    
    # Add results to DataFrame
    df_result = df.copy()
    for col_name, col_data in result_columns.items():
        df_result[col_name] = col_data
    
    print(f"\n✅ Batch analysis complete!")
    print(f"📊 Added {len(result_columns)} analysis columns")
    
    # Quick statistics
    if 'Relevance_Class' in df_result.columns:
        relevance_stats = df_result['Relevance_Class'].value_counts()
        print(f"\n📈 Relevance Distribution:")
        for level, count in relevance_stats.items():
            percentage = (count / len(df_result)) * 100
            print(f"  {level}: {count} ({percentage:.1f}%)")
    
    return df_result

print("🔍 Comprehensive Analysis Functions Ready")
print("✅ analyze_text_comprehensive()")
print("✅ analyze_dataframe_batch()")
print("⚡ Optimized for high-performance batch processing")

🔍 Comprehensive Analysis Functions Ready
✅ analyze_text_comprehensive()
✅ analyze_dataframe_batch()
⚡ Optimized for high-performance batch processing
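
The batch loop in `analyze_dataframe_batch()` uses ceiling division to count batches and half-open slices to select rows. The sketch below isolates that index arithmetic. `batch_bounds` is a hypothetical helper introduced only for illustration.

```python
from typing import Iterator, Tuple


def batch_bounds(n_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) half-open index pairs covering n_rows in order."""
    total_batches = (n_rows + batch_size - 1) // batch_size  # ceiling division
    for batch_num in range(total_batches):
        start = batch_num * batch_size
        yield start, min(start + batch_size, n_rows)


# Five rows with batch_size=2, as in the sample DataFrame test
print(list(batch_bounds(5, 2)))
```

Rows inside each batch are still analyzed one at a time, so in the function as written the batch size mainly controls how often progress is reported.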


In [None]:
# 🧪 DEMO AND TESTING

def demo_pattern_recognition():
    """
    Demonstrate the pattern recognition capabilities
    """
    print("🧪 PATTERN RECOGNITION DEMONSTRATION")
    print("=" * 40)
    
    # Sample texts for testing
    test_samples = [
        {
            'title': "TiO2 Nanoparticles for Self-Cleaning Automotive Coatings",
            'abstract': "This study investigates the use of titanium dioxide nanoparticles in developing self-cleaning and antimicrobial automotive coatings. The sol-gel method was used to apply the nanocoating, resulting in superhydrophobic surfaces with excellent UV resistance."
        },
        {
            'title': "Graphene-Enhanced Marine Paint Performance", 
            'abstract': "Graphene oxide was incorporated into marine paints to improve corrosion resistance and anti-fouling properties. The spray coating technique provided uniform coverage with enhanced mechanical properties."
        },
        {
            'title': "Silver Nanoparticles in Antibacterial Textile Coatings",
            'abstract': "Silver nanoparticles were embedded in polymer coatings for textile applications. The resulting fabric showed strong antimicrobial activity and maintained durability after multiple washing cycles."
        }
    ]
    
    for i, sample in enumerate(test_samples, 1):
        print(f"\n📄 Sample {i}: {sample['title']}")
        print(f"📝 Abstract: {sample['abstract'][:100]}...")
        
        # Analyze
        analysis = analyze_text_comprehensive(sample['abstract'], sample['title'])
        
        print(f"\n🔍 Analysis Results:")
        print(f"  🧬 Nanomaterials: {analysis['formatted_materials']}")
        print(f"  🎨 Keywords: {analysis['formatted_keywords'][:100]}{'...' if len(analysis['formatted_keywords']) > 100 else ''}")
        print(f"  📊 Relevance Score: {analysis['relevance_score']:.2f}")
        print(f"  🎯 Classification: {analysis['relevance_class']}")
        print(f"  📋 Active Categories: {len(analysis['active_categories'])} categories")
        
        print("  " + "-" * 50)

def create_sample_dataframe() -> pd.DataFrame:
    """
    Create a sample DataFrame for testing
    """
    sample_data = {
        'Title': [
            "TiO2 Nanoparticles for Self-Cleaning Automotive Coatings",
            "Graphene-Enhanced Marine Paint Performance",
            "Silver Nanoparticles in Antibacterial Textile Coatings",
            "ZnO Nanocoatings for UV Protection in Building Materials",
            "Polymer Synthesis and Characterization Studies"
        ],
        'Abstract': [
            "This study investigates the use of titanium dioxide nanoparticles in developing self-cleaning and antimicrobial automotive coatings. The sol-gel method was used to apply the nanocoating, resulting in superhydrophobic surfaces with excellent UV resistance.",
            "Graphene oxide was incorporated into marine paints to improve corrosion resistance and anti-fouling properties. The spray coating technique provided uniform coverage with enhanced mechanical properties.",
            "Silver nanoparticles were embedded in polymer coatings for textile applications. The resulting fabric showed strong antimicrobial activity and maintained durability after multiple washing cycles.",
            "Zinc oxide nanoparticles were used to create UV-resistant coatings for architectural applications. The dip coating process yielded uniform films with excellent weathering resistance.",
            "This research focuses on the synthesis of novel polymers using traditional chemical methods. Various characterization techniques were employed to study the molecular structure."
        ],
        'Year': [2023, 2022, 2023, 2021, 2020],
        'Authors': ['Smith A., Jones B.', 'Chen L., Wang M.', 'Brown C., Davis R.', 'Garcia P., Miller S.', 'Wilson T., Moore K.']
    }
    
    return pd.DataFrame(sample_data)

# Run demonstrations
demo_pattern_recognition()

print(f"\n🧪 Creating sample DataFrame for batch testing...")
sample_df = create_sample_dataframe()
print(f"📊 Sample DataFrame created: {len(sample_df)} rows")

# Test batch analysis
print(f"\n🔍 Testing batch analysis...")
analyzed_df = analyze_dataframe_batch(sample_df, batch_size=2)
print(f"\n✅ Batch analysis complete!")
print(f"📋 Columns added: {len(analyzed_df.columns) - len(sample_df.columns)}")

# Show results
print(f"\n📊 SAMPLE RESULTS:")
for idx, row in analyzed_df.iterrows():
    print(f"\n{idx+1}. {row['Title'][:50]}...")
    print(f"   Relevance: {row['Relevance_Class']} (Score: {row['Relevance_Score']:.2f})")
    print(f"   Materials: {row['Nanomaterials_Detected'][:80]}{'...' if len(str(row['Nanomaterials_Detected'])) > 80 else ''}")

🧪 PATTERN RECOGNITION DEMONSTRATION

📄 Sample 1: TiO2 Nanoparticles for Self-Cleaning Automotive Coatings
📝 Abstract: This study investigates the use of titanium dioxide nanoparticles in developing self-cleaning and an...

🔍 Analysis Results:
  🧬 Nanomaterials: titanium dioxide (oxidos_metalicos); TiO2 (oxidos_metalicos)
  🎨 Keywords: Coatings; Self-Cleaning; antimicrobial; coatings; nanocoating; self-cleaning; sol-gel; superhydropho...
  📊 Relevance Score: 2.50
  🎯 Classification: Low
  📋 Active Categories: 4 categories
  --------------------------------------------------

📄 Sample 2: Graphene-Enhanced Marine Paint Performance
📝 Abstract: Graphene oxide was incorporated into marine paints to improve corrosion resistance and anti-fouling ...

🔍 Analysis Results:
  🧬 Nanomaterials: Graphene (carbono)
  🎨 Keywords: Paint; anti-fouling; coating; mechanical properties; paints; spray coating
  📊 Relevance Score: 1.65
  🎯 Classification: Low
  📋 Active Categories: 5 categories
  ------------

In [None]:
# 💾 EXPORT FUNCTIONS FOR INTEGRATION
import os
def save_analysis_results(df: pd.DataFrame, 
                         output_path: Optional[str] = None,
                         include_timestamp: bool = True) -> str:
    """
    Save analysis results to CSV with optional timestamp
    """
    if output_path is None:
        # Add the underscore separator only when a timestamp is requested
        suffix = f"_{datetime.now().strftime('%Y%m%d_%H%M%S')}" if include_timestamp else ''
        output_path = f'cenanoink_pattern_analysis{suffix}.csv'
    
    df.to_csv(output_path, index=False, encoding='utf-8')
    
    file_size = os.path.getsize(output_path) / (1024 * 1024)  # MB
    print(f"💾 Analysis results saved: {output_path}")
    print(f"📊 File size: {file_size:.1f} MB")
    print(f"📋 Records: {len(df):,}")
    
    return output_path

def export_patterns_database(output_dir: str = './') -> Dict[str, str]:
    """
    Export the patterns database for use in other notebooks
    """
    import json
    import pickle
    from datetime import datetime
    
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    
    # Export as JSON (human-readable)
    json_path = os.path.join(output_dir, f'nanomaterials_database_{timestamp}.json')
    json_data = {
        'nanomaterials': {k: list(v) for k, v in NANOMATERIALS_DATABASE.items()},
        'keywords': {k: list(v) for k, v in COATING_PAINT_KEYWORDS.items()},
        'export_timestamp': datetime.now().isoformat(),
        'total_categories': len(NANOMATERIALS_DATABASE) + len(COATING_PAINT_KEYWORDS),
        'total_patterns': sum(len(v) for v in NANOMATERIALS_DATABASE.values()) + sum(len(v) for v in COATING_PAINT_KEYWORDS.values())
    }
    
    with open(json_path, 'w', encoding='utf-8') as f:
        json.dump(json_data, f, indent=2, ensure_ascii=False)
    
    # Export compiled patterns as pickle (for performance)
    pickle_path = os.path.join(output_dir, f'compiled_patterns_{timestamp}.pkl')
    pickle_data = {
        'nanomaterial_patterns': NANOMATERIAL_PATTERNS,
        'keyword_patterns': KEYWORD_PATTERNS,
        'database_version': timestamp
    }
    
    with open(pickle_path, 'wb') as f:
        pickle.dump(pickle_data, f)
    
    print(f"📚 Database updated and exported:")
    print(f"  📄 JSON: {json_path}")
    print(f"  🔧 Pickle: {pickle_path}")
    
    return {'json': json_path, 'pickle': pickle_path}

# Export database for integration
print("\n💾 EXPORTING PATTERNS DATABASE")
print("=" * 35)
exported_files = export_patterns_database()

print(f"\n🎯 INTEGRATION READY")
print(f"✅ Patterns compiled and tested")
print(f"✅ Analysis functions ready")
print(f"✅ Database loaded from existing JSON and exported for other notebooks")
print(f"\n📋 Next Steps:")
print(f"  1. Use this notebook's functions in main analysis pipeline")
print(f"  2. Import patterns in '03_gemini_analysis.ipynb'")
print(f"  3. Generate reports with '04_reporting_system.ipynb'")

print(f"\n🧬 CenanoInk Nanomaterials Database - READY FOR PRODUCTION!")


💾 EXPORTING PATTERNS DATABASE
📚 Database updated and exported:
  📄 JSON: ./nanomaterials_database_20250610_112402.json
  🔧 Pickle: ./compiled_patterns_20250610_112402.pkl

🎯 INTEGRATION READY
✅ Patterns compiled and tested
✅ Analysis functions ready
✅ Database loaded from existing JSON and exported for other notebooks

📋 Next Steps:
  1. Use this notebook's functions in main analysis pipeline
  2. Import patterns in '03_gemini_analysis.ipynb'
  3. Generate reports with '04_reporting_system.ipynb'

🧬 CenanoInk Nanomaterials Database - READY FOR PRODUCTION!
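
For the downstream notebooks, the exported pickle can be read back with `pickle.load`. The sketch below shows the round trip in memory, using an illustrative pattern dictionary rather than the real exported file. The `nanomaterial_patterns` key mirrors the one written by `export_patterns_database()`.

```python
import io
import pickle
import re

# Illustrative stand-in for NANOMATERIAL_PATTERNS
patterns = {"oxidos_metalicos": re.compile(r'\b(TiO2|ZnO)\b', re.IGNORECASE)}

# Write to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
pickle.dump({"nanomaterial_patterns": patterns}, buffer)

# Read the patterns back, as a consuming notebook would
buffer.seek(0)
restored = pickle.load(buffer)["nanomaterial_patterns"]
print(restored["oxidos_metalicos"].findall("ZnO and tio2 films"))
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code. The JSON export is the safer interchange format when the consumer can recompile the patterns itself.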
