# LangChain Output Parsers - Environmental Data Processing

This notebook demonstrates various LangChain output parsers for processing environmental and climate data, using live calls to OpenAI's GPT-4o-mini model.

## What You'll Learn:
- **StrOutputParser**: Basic text processing
- **JsonOutputParser**: Structured data extraction
- **CommaSeparatedListOutputParser**: List generation
- **XMLOutputParser**: Parsing XML into nested Python structures
- **MarkdownListOutputParser**: Markdown formatting
- **Pydantic Models**: Type-safe validation

## Prerequisites:
- OpenAI API key set as the `OPENAI_API_KEY` environment variable (or run the API key setup cell at the end of this notebook first)
- LangChain packages installed
- Virtual environment activated

In [None]:
# Install all required packages for this notebook (run this cell if packages are not installed)
!pip install --upgrade pip
!pip install langchain langchain-core langchain-openai pydantic openai defusedxml python-dotenv tiktoken ipywidgets rich

## Import Dependencies

Let's start by importing all necessary libraries for our LangChain output parser demonstrations.

In [None]:
import os
import json
import re
from typing import List, Dict, Any
from datetime import datetime
from pydantic import BaseModel, Field, validator

# LangChain imports
from langchain_core.output_parsers import (
    StrOutputParser,
    JsonOutputParser,
    CommaSeparatedListOutputParser,
    XMLOutputParser,
    MarkdownListOutputParser,
)
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

print("✅ All dependencies imported successfully!")

## Setup OpenAI LLM

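Configure the OpenAI GPT-4o-mini model for our demonstrations. Make sure your OpenAI API key is set as an environment variable; if it is not (for example in Google Colab), run the API key setup cell at the end of this notebook before this one.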

In [None]:
def create_openai_llm():
    """Create and return an OpenAI LLM instance"""
    api_key = os.getenv("OPENAI_API_KEY")
    if not api_key:
        print("⚠️ Warning: OPENAI_API_KEY environment variable not set.")
        print("Please set it with: export OPENAI_API_KEY='your-api-key-here'")
        print("Using gpt-4o-mini model with temperature=0.3 for consistent outputs")
        return None

    return ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0.3,  # Lower temperature for more consistent outputs
        api_key=api_key
    )

# Test the LLM setup
llm = create_openai_llm()
if llm:
    print("✅ OpenAI LLM configured successfully!")
else:
    print("❌ OpenAI LLM setup failed. Please check your API key.")

## Pydantic Models for Data Validation

Define structured data models for environmental data validation using Pydantic.

In [None]:
from pydantic import field_validator  # Pydantic v2 replacement for the deprecated @validator

class EmissionData(BaseModel):
    """Model for CO2 emission data with validation"""
    co2_levels: float = Field(description="CO2 levels in parts per million")
    temperature_rise: float = Field(description="Temperature rise in Celsius")
    sea_level_rise_mm_per_year: float = Field(description="Sea level rise in mm per year")
    arctic_ice_loss_percent: float = Field(description="Arctic ice loss percentage")
    renewable_energy_percent: float = Field(description="Renewable energy percentage")
    deforestation_rate_hectares_per_year: int = Field(description="Deforestation rate in hectares per year")

    @field_validator('co2_levels')
    @classmethod
    def validate_co2_levels(cls, v):
        if v < 280 or v > 500:
            raise ValueError('CO2 levels should be between 280-500 ppm for realistic values')
        return v

    @field_validator('temperature_rise')
    @classmethod
    def validate_temperature(cls, v):
        if v < 0 or v > 5:
            raise ValueError('Temperature rise should be between 0-5°C')
        return v

class SustainabilityReport(BaseModel):
    """Comprehensive sustainability report model"""
    report_title: str = Field(description="Title of the sustainability report")
    assessment_date: datetime = Field(default_factory=datetime.now, description="Date of assessment")
    carbon_footprint_tons: float = Field(description="Carbon footprint in tons CO2 equivalent")
    renewable_percentage: float = Field(ge=0, le=100, description="Percentage of renewable energy use")
    waste_reduction_percent: float = Field(description="Waste reduction percentage")
    water_conservation_liters: int = Field(description="Water conserved in liters")
    sustainability_score: int = Field(ge=0, le=100, description="Overall sustainability score")
    recommendations: List[str] = Field(description="List of sustainability recommendations")

print("✅ Pydantic models defined successfully!")

## 1. StrOutputParser - Basic String Output

The **StrOutputParser** is the simplest parser: it takes the chat model's reply message and returns its text content as a plain Python `str`. Perfect for generating human-readable text summaries.
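Chat models such as `ChatOpenAI` return a message object rather than a bare string, so a string parser's job is to pull out the text. The offline sketch below imitates that step with a hypothetical `FakeAIMessage` dataclass (a stand-in for illustration, not LangChain's own message class) and a hypothetical `parse_to_str` helper. It also shows why calling `str()` on a message object is not the same as reading its `content`.

```python
from dataclasses import dataclass


@dataclass
class FakeAIMessage:
    """Hypothetical stand-in for a chat model's reply message."""
    content: str


def parse_to_str(message: FakeAIMessage) -> str:
    """Return only the text content, the way a string output parser would."""
    return message.content


reply = FakeAIMessage(content="Rising temperatures are intensifying heatwaves and droughts.")
text = parse_to_str(reply)
print(text)        # Rising temperatures are intensifying heatwaves and droughts.
print(str(reply))  # FakeAIMessage(content='Rising temperatures are intensifying heatwaves and droughts.')
```

The second `print` is the trap to avoid: stringifying the whole message object yields a representation that wraps the text, which is not valid input for `json.loads` or any other downstream parser.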

In [None]:
def demonstrate_str_output_parser():
    """Demonstrate StrOutputParser for basic text processing"""
    print("=" * 60)
    print("1. StrOutputParser - Basic String Output")
    print("=" * 60)

    # Create parser and prompt
    parser = StrOutputParser()
    prompt = PromptTemplate(
        template="Provide a brief summary of current climate change impacts: {topic}",
        input_variables=["topic"]
    )

    # Create chain with real OpenAI LLM
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot proceed without OpenAI LLM")
        return

    chain = prompt | llm | parser

    # Execute
    result = chain.invoke({"topic": "global warming effects"})
    print(f"Input: Global warming effects summary")
    print(f"Output: {result}")
    print(f"Output Type: {type(result)}")
    print()

# Run the demonstration
demonstrate_str_output_parser()

## 2. JsonOutputParser - Structured Environmental Data

The **JsonOutputParser** parses the model's JSON reply into Python objects (typically a `dict`). Ideal for creating structured environmental metrics and data records.
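When the model's reply is not strict JSON, a small repair step can rescue it before parsing. The sketch below is a hypothetical helper, `repair_and_parse_json`, that strips optional ```` ```json ```` fences and removes underscores used as digit separators (`10_000_000`). It also shows why the underscore pattern should use lookarounds: a capture-group pattern like `(\d+)_(\d+)` consumes the digits it matches, so it cannot fix the second underscore in `10_000_000`.

```python
import json
import re


def repair_and_parse_json(raw_text: str):
    """Strip optional ```json fences and digit-separator underscores, then parse."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw_text.strip())
    # Lookarounds match the underscore only, so every digit-separator is removed
    cleaned = re.sub(r"(?<=\d)_(?=\d)", "", cleaned)
    return json.loads(cleaned)


raw = '```json\n{"deforestation": 10_000_000, "co2": 415}\n```'
print(repair_and_parse_json(raw))  # {'deforestation': 10000000, 'co2': 415}

# The capture-group pattern consumes "10_000" in one match, leaving the second underscore
print(re.sub(r"(\d+)_(\d+)", r"\1\2", "10_000_000"))  # 10000_000
```

One caveat of this design: the underscore rule is applied to the whole text, so it would also alter a string value such as `"site_1_2"` if digits surround an underscore. For metric-only payloads that trade-off is usually acceptable.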

In [None]:
def demonstrate_json_output_parser():
    """Demonstrate JSONOutputParser for structured environmental data"""
    print("=" * 60)
    print("2. JSONOutputParser - Environmental Metrics")
    print("=" * 60)

    # Create parser with format instructions
    parser = JsonOutputParser()
    prompt = PromptTemplate(
        template="""Provide environmental metrics in JSON format: {query}

        CRITICAL: Return ONLY valid JSON. NO underscores in numbers (use 10000000 not 10_000_000).
        Example: {{"deforestation": 10000000, "co2": 415}}

        {format_instructions}""",
        input_variables=["query"],
        partial_variables={"format_instructions": parser.get_format_instructions()}
    )

    # Create chain with real OpenAI LLM
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot proceed without OpenAI LLM")
        return

    chain = prompt | llm | parser

    # Execute with error handling
    try:
        result = chain.invoke({"query": "current global environmental metrics"})
        print(f"Input: Current global environmental metrics")
        print(f"Output: {json.dumps(result, indent=2)}")
        print(f"Output Type: {type(result)}")
        print(f"CO2 Levels: {result.get('co2_levels', result.get('co2', 'N/A'))}")
    except Exception as e:
        print(f"JSON Parser Error: {str(e)[:100]}...")
        print("Attempting fallback with fixed JSON...")

        # Fallback: Get raw response and fix it
        llm_only_chain = prompt | llm
        raw_result = llm_only_chain.invoke({"query": "current global environmental metrics"})

        # Fix numeric underscores and parse
        try:
            fixed_json = re.sub(r'(\d+)_(\d+)', r'\1\2', str(raw_result))
            parsed_result = json.loads(fixed_json)
            print(f"✅ Fixed JSON Output: {json.dumps(parsed_result, indent=2)}")
            print(f"Output Type: {type(parsed_result)}")
        except Exception as parse_error:
            print(f"❌ Could not fix JSON: {parse_error}")
            print(f"Raw output: {raw_result[:200]}...")

    print()

# Run the demonstration
demonstrate_json_output_parser()

## 3. CommaSeparatedListOutputParser - Climate Data Lists

The **CommaSeparatedListOutputParser** asks the model (through its format instructions) for a comma-separated reply and splits that reply into a Python list of strings. Useful for lists of climate indicators, recommendations, or categories.
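The splitting step can be sketched offline with the standard library. The hypothetical `parse_comma_list` helper below uses the `csv` module so that a quoted item containing a comma stays whole; it is an illustration of the idea, not a copy of LangChain's internal implementation.

```python
import csv
from typing import List


def parse_comma_list(text: str) -> List[str]:
    """Split a comma-separated reply into trimmed, non-empty items."""
    # skipinitialspace ignores the space after each comma, so quoted items are recognised
    row = next(csv.reader([text.strip()], skipinitialspace=True), [])
    return [item.strip() for item in row if item.strip()]


reply = 'Global temperature, CO2 concentration, "Sea level, global mean", Arctic sea ice extent'
print(parse_comma_list(reply))
# ['Global temperature', 'CO2 concentration', 'Sea level, global mean', 'Arctic sea ice extent']
```

A plain `text.split(",")` would break `"Sea level, global mean"` into two items, which is why quote-aware splitting is the safer choice when items may contain commas.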

In [None]:
def demonstrate_csv_output_parser():
    """Demonstrate CommaSeparatedListOutputParser for climate data"""
    print("=" * 60)
    print("3. CommaSeparatedListOutputParser - Climate Data List")
    print("=" * 60)

    # Create parser
    parser = CommaSeparatedListOutputParser()
    prompt = PromptTemplate(
        template="""Provide a comma-separated list of climate indicators: {request}

        {format_instructions}""",
        input_variables=["request"],
        partial_variables={"format_instructions": parser.get_format_instructions()}
    )

    # Create chain with real OpenAI LLM
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot proceed without OpenAI LLM")
        return

    chain = prompt | llm | parser

    # Execute
    result = chain.invoke({"request": "key climate indicators for 2024"})
    print(f"Input: Key climate indicators for 2024")
    print(f"Output: {result}")
    print(f"Output Type: {type(result)}")
    print(f"Number of items: {len(result)}")
    print()

# Run the demonstration
demonstrate_csv_output_parser()

## 4. XMLOutputParser - Environmental Reports

The **XMLOutputParser** parses the model's XML reply into nested Python dictionaries and lists. Excellent for formal environmental reports with hierarchical data structures.
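To see roughly what shape the parsed XML takes, here is an offline sketch with the standard library's `xml.etree.ElementTree`. The hypothetical `element_to_dict` helper maps each element to `{tag: text}` when it has no children and to `{tag: [child dicts]}` otherwise, which is the shape the title-lookup code in the demo below walks through. For untrusted model output, `defusedxml.ElementTree.fromstring` (from the `defusedxml` package installed above) is the safer drop-in parser.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem: ET.Element) -> dict:
    """Map leaf elements to {tag: text} and parent elements to {tag: [children]}."""
    children = list(elem)
    if not children:
        return {elem.tag: (elem.text or "").strip()}
    return {elem.tag: [element_to_dict(child) for child in children]}


xml_text = """<environmental_report>
  <title>Global Environmental Status</title>
  <findings>
    <finding>CO2 concentrations keep rising</finding>
    <finding>Arctic sea ice is shrinking</finding>
  </findings>
  <urgency_level>high</urgency_level>
</environmental_report>"""

report = element_to_dict(ET.fromstring(xml_text))

# Children are stored as a list of single-key dicts, so look the title up by key
title = next((child["title"] for child in report["environmental_report"] if "title" in child), None)
print(title)  # Global Environmental Status
```

Storing children as a list of single-key dicts preserves document order and allows repeated tags such as the two `<finding>` elements, at the cost of needing a search to find a particular child.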

In [None]:
def demonstrate_xml_output_parser():
    """Demonstrate XMLOutputParser for structured environmental reports"""
    print("=" * 60)
    print("4. XMLOutputParser - Environmental Report")
    print("=" * 60)

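    # Create parser; passing the expected tags lets get_format_instructions() list them for the model
    parser = XMLOutputParser(tags=["environmental_report", "title", "summary", "findings", "urgency_level"])
    prompt = PromptTemplate(
        template="""Generate an environmental report in XML format: {topic}

        Wrap the whole response in a single <environmental_report> root element containing
        title, summary, findings, and urgency_level elements.

        {format_instructions}""",
        input_variables=["topic"],
        partial_variables={"format_instructions": parser.get_format_instructions()}
    )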

    # Create chain with real OpenAI LLM
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot proceed without OpenAI LLM")
        return

    chain = prompt | llm | parser

    # Execute
    result = chain.invoke({"topic": "global environmental status"})
    print(f"Input: Global environmental status report")
    print(f"Output: {result}")
    print(f"Output Type: {type(result)}")

    # Extract key information if available
    if isinstance(result, dict):
        if 'environmental_report' in result:
            report_data = result['environmental_report']
            if isinstance(report_data, list) and len(report_data) > 0:
                for item in report_data:
                    if isinstance(item, dict) and 'title' in item:
                        print(f"📊 Report Title: {item['title']}")
                        break
    print()

# Run the demonstration
demonstrate_xml_output_parser()

## 5. Simple JSON Validation - Environmental Data

Parse environmental data with JsonOutputParser and check that every requested key is present, a basic completeness check before the data is used.
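Key presence is only half of a quality check; values can also be out of range or non-numeric. The sketch below is a hypothetical `validate_climate_metrics` helper that assumes the keys `temperature_rise`, `co2_levels` and `sea_level_rise`, and borrows its numeric bounds from the `EmissionData` validators defined earlier (280-500 ppm, 0-5 °C). The sample values are made up for illustration.

```python
REQUIRED_KEYS = ("temperature_rise", "co2_levels", "sea_level_rise")
# Bounds mirror the EmissionData validators defined earlier in this notebook
RANGES = {"co2_levels": (280, 500), "temperature_rise": (0, 5)}


def validate_climate_metrics(data) -> list:
    """Return a list of problems; an empty list means the record passed."""
    if not isinstance(data, dict):
        return ["expected a JSON object"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]
    for key, (low, high) in RANGES.items():
        if key not in data:
            continue  # already reported as missing above
        value = data[key]
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} is not numeric: {value!r}")
        elif not low <= value <= high:
            problems.append(f"{key}={value} outside {low}-{high}")
    return problems


good = {"temperature_rise": 1.2, "co2_levels": 421, "sea_level_rise": 3.4}
bad = {"co2_levels": 900}
print(validate_climate_metrics(good))  # []
print(validate_climate_metrics(bad))
# ['missing key: temperature_rise', 'missing key: sea_level_rise', 'co2_levels=900 outside 280-500']
```

Returning every problem at once, rather than raising on the first one, makes it easier to log what the model got wrong and to decide whether a retry is worthwhile.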

In [None]:
def demonstrate_simple_json_validation():
    """Demonstrate simple JSON validation for environmental data"""
    print("=" * 60)
    print("5. Simple JSON Validation - Environmental Data")
    print("=" * 60)

    # Create parser
    parser = JsonOutputParser()
    prompt = PromptTemplate(
        template="""Provide environmental data in JSON format: {request}

        Please return valid JSON with keys: temperature_rise, co2_levels, sea_level_rise""",
        input_variables=["request"]
    )

    # Create chain with real OpenAI LLM
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot proceed without OpenAI LLM")
        return

    chain = prompt | llm | parser

    # Execute
    try:
        result = chain.invoke({"request": "current climate metrics"})
        print("Input: Current climate metrics")
        print(f"Output: {json.dumps(result, indent=2)}")
        print(f"Output Type: {type(result)}")

        # Basic validation: check exactly the keys requested in the prompt
        required_keys = ['temperature_rise', 'co2_levels', 'sea_level_rise']
        if not isinstance(result, dict):
            print("⚠️ Expected a JSON object")
        else:
            missing_keys = [key for key in required_keys if key not in result]
            if missing_keys:
                print(f"⚠️ Missing keys: {missing_keys}")
            else:
                print("✅ JSON validation: PASSED")
    except Exception as e:
        print(f"❌ Parsing Error: {e}")
    print()

# Run the demonstration
demonstrate_simple_json_validation()

## 6. MarkdownListOutputParser - Environmental Recommendations

The **MarkdownListOutputParser** parses a Markdown bullet list in the model's reply into a Python list of strings. Perfect for creating structured recommendations and action items.
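The core idea can be sketched offline with one regular expression. The hypothetical `parse_markdown_list` helper below keeps only lines that start with a `-`, `*` or `+` bullet followed by whitespace, so an introductory sentence or a closing remark from the model is ignored. It does not handle numbered lists or nested bullets, and it is not LangChain's own implementation.

```python
import re
from typing import List

# A bullet marker, at least one space, then the item text (trailing spaces dropped)
BULLET = re.compile(r"^\s*[-*+]\s+(.+?)\s*$")


def parse_markdown_list(text: str) -> List[str]:
    """Return the text of each Markdown bullet line, in order."""
    return [m.group(1) for line in text.splitlines() if (m := BULLET.match(line))]


reply = """Here are some ideas:
- Switch to renewable power
* Cut packaging waste

+ Track Scope 3 emissions
Thanks!"""
print(parse_markdown_list(reply))
# ['Switch to renewable power', 'Cut packaging waste', 'Track Scope 3 emissions']
```

Requiring whitespace after the marker keeps lines such as `**Bold heading**` or a `---` rule from being mistaken for list items.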

In [None]:
def demonstrate_markdown_list_parser():
    """Demonstrate MarkdownListOutputParser for environmental recommendations"""
    print("=" * 60)
    print("6. MarkdownListOutputParser - Environmental Recommendations")
    print("=" * 60)

    # Create parser
    parser = MarkdownListOutputParser()
    prompt = PromptTemplate(
        template="""Provide environmental recommendations as a markdown list: {topic}

        Format your response as a markdown list with - bullets.""",
        input_variables=["topic"]
    )

    # Create chain with real OpenAI LLM
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot proceed without OpenAI LLM")
        return

    chain = prompt | llm | parser

    try:
        result = chain.invoke({"topic": "corporate sustainability initiatives"})
        print(f"Input: Corporate sustainability initiatives")
        print(f"Output: {result}")
        print(f"Output Type: {type(result)}")
        print(f"Number of recommendations: {len(result) if isinstance(result, list) else 'N/A'}")

        # Display first few recommendations in a readable format
        if isinstance(result, list) and len(result) > 0:
            print("\n📋 Top 5 Recommendations:")
            for i, rec in enumerate(result[:5], 1):
                print(f"   {i}. {rec}")
    except Exception as e:
        print(f"❌ Parsing Error: {e}")
    print()

# Run the demonstration
demonstrate_markdown_list_parser()

## Run All Demonstrations

Execute all output parser demonstrations in sequence to see the complete functionality.

In [None]:
def run_all_demonstrations():
    """Run all LangChain output parser demonstrations"""
    print("🌍 LangChain Output Parsers Environmental Data Demo")
    print("=" * 60)
    print("Recommended LangChain version: 0.1.0+")
    print("Install: pip install langchain>=0.1.0 langchain-core pydantic openai")
    print()

    # Check if LLM is available
    llm = create_openai_llm()
    if not llm:
        print("❌ Cannot run demonstrations without OpenAI LLM setup")
        return

    # Run all demonstrations
    demonstrate_str_output_parser()
    demonstrate_json_output_parser()
    demonstrate_csv_output_parser()
    demonstrate_xml_output_parser()
    demonstrate_simple_json_validation()
    demonstrate_markdown_list_parser()

    print("=" * 60)
    print("🎉 Demo Complete!")
    print("=" * 60)
    print("\n📚 Key Takeaways:")
    print("• StrOutputParser: Simple text processing")
    print("• JsonOutputParser: Structured data with flexible schema")
    print("• CSVOutputParser: Tabular data for analysis")
    print("• XMLOutputParser: Hierarchical document structure")
    print("• MarkdownListOutputParser: Formatted list generation")
    print("\n🔬 Next Steps:")
    print("• Experiment with different prompts and topics")
    print("• Add custom error handling for production use")
    print("• Customize Pydantic models for your specific use case")
    print("• Integrate with real environmental data sources")

# Run all demonstrations
run_all_demonstrations()

## Practical Examples and Use Cases

### Real-World Applications

1. **Environmental Monitoring Systems**
   - Use JsonOutputParser for structured sensor data
   - Use XMLOutputParser for regulatory reports
   - Use CommaSeparatedListOutputParser for simple lists of indicators or tags

2. **Sustainability Reporting**
   - Use StrOutputParser for executive summaries
   - Use MarkdownListOutputParser for action items
   - Use Pydantic models for data validation

3. **Climate Data Analysis**
   - Parse various data formats from scientific papers
   - Convert unstructured text to structured datasets
   - Validate data quality and completeness

### Best Practices

- Always include error handling for production use
- Use a low temperature for more consistent outputs (this notebook uses 0.3)
- Validate output data with Pydantic models
- Implement fallback strategies for parsing failures
- Test with various input formats and edge cases
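A fallback strategy can be as simple as trying parsers in order and keeping the first success. The sketch below uses hypothetical helpers: `parse_with_fallbacks` runs each parser and collects the error messages, `strict_json` is plain `json.loads`, and `embedded_json` handles a common failure where the model wraps an otherwise valid JSON object in extra prose.

```python
import json
from typing import Any, Callable, Sequence


def parse_with_fallbacks(text: str, parsers: Sequence[Callable[[str], Any]]) -> Any:
    """Return the result of the first parser that succeeds; raise if all fail."""
    errors = []
    for parse in parsers:
        try:
            return parse(text)
        except Exception as exc:  # record why this parser failed and try the next one
            errors.append(f"{getattr(parse, '__name__', repr(parse))}: {exc}")
    raise ValueError("all parsers failed: " + "; ".join(errors))


def strict_json(text: str) -> Any:
    return json.loads(text)


def embedded_json(text: str) -> Any:
    """Parse the span from the first '{' to the last '}' in the text."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    return json.loads(text[start:end + 1])


reply = 'Here are the metrics: {"co2_levels": 421, "temperature_rise": 1.2} Let me know if you need more.'
print(parse_with_fallbacks(reply, [strict_json, embedded_json]))
# {'co2_levels': 421, 'temperature_rise': 1.2}
```

Ordering the parsers from strictest to most lenient means clean replies are never altered, while the collected error messages show which stage failed when nothing works.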

## Exercise: Create Your Own Parser

Try modifying the examples above to:

1. **Create a custom environmental report** using XMLOutputParser
2. **Generate a list of renewable energy sources** using CommaSeparatedListOutputParser
3. **Extract carbon footprint data** using JsonOutputParser with validation
4. **Create sustainability goals** using MarkdownListOutputParser

### Challenge
Combine multiple parsers to create a comprehensive environmental dashboard that includes:
- Text summary (StrOutputParser)
- Structured metrics (JsonOutputParser)
- Action items (MarkdownListOutputParser)
- Detailed report (XMLOutputParser)

## Conclusion

This notebook demonstrated the power and versatility of LangChain output parsers for environmental data processing. Key benefits include:

- ✅ **Structured Data Extraction**: Convert unstructured AI responses into usable formats
- ✅ **Type Safety**: Use Pydantic models for validation and error prevention
- ✅ **Multiple Formats**: Support JSON, XML, comma-separated lists, Markdown lists, and plain text
- ✅ **Error Handling**: Robust parsing with fallback strategies
- ✅ **Real-World Applications**: Environmental monitoring, sustainability reporting, climate analysis

### Next Steps:
- Explore advanced parsing techniques
- Integrate with environmental APIs and databases
- Build production-ready data processing pipelines
- Implement custom parsers for specialized formats

Happy parsing! 🌱

In [None]:
# --- OpenAI API Key Setup (Colab & Local) ---
import os
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    from google.colab import userdata
    try:
        openai_api_key = userdata.get('OPENAI_API_KEY')
        if openai_api_key:
            os.environ["OPENAI_API_KEY"] = openai_api_key
            print("✅ API key loaded from Google Colab secrets!")
        else:
            from getpass import getpass
            print("OpenAI API key not found in Colab secrets.")
            os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
            print("✅ API key set from input")
    except Exception as e:
        print(f"Note: {e}")
        print("Enter your OpenAI API key below:")
        from getpass import getpass
        os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")
else:
    try:
        from dotenv import load_dotenv
        load_dotenv()  # Load variables from a local .env file, if one exists
    except ImportError:
        print("Note: python-dotenv is not installed; skipping .env loading")
    if os.getenv("OPENAI_API_KEY"):
        print("✅ API key loaded from .env or environment variable!")
    else:
        from getpass import getpass
        os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
        print("✅ API key set from input")