**Notice**: The `semantica.kg_qa` module is temporarily unavailable and will be reintroduced in a future release. Any quality assessment examples in this notebook are disabled.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/use_cases/renewable_energy/02_Environmental_Impact.ipynb)

# Environmental Impact Analysis Pipeline

## Overview

This notebook demonstrates a complete environmental impact analysis pipeline: ingest environmental data from multiple sources (EPA APIs, climate databases, sustainability feeds), extract environmental entities, build an impact knowledge graph, analyze relationships, and assess environmental impact.


**Documentation**: [API Reference](https://semantica.readthedocs.io/use-cases/)

### Modules Used (20+)

- **Ingestion**: FileIngestor, WebIngestor, FeedIngestor, StreamIngestor, DBIngestor, RepoIngestor, EmailIngestor, MCPIngestor
- **Parsing**: JSONParser, CSVParser, StructuredDataParser, DocumentParser
- **Extraction**: NERExtractor, RelationExtractor, EventDetector, SemanticAnalyzer
- **KG**: GraphBuilder, GraphAnalyzer, CentralityCalculator, CommunityDetector
- **Analytics**: ConnectivityAnalyzer, TemporalGraphQuery, TemporalPatternDetector
- **Ontology**: OntologyGenerator, ClassInferrer, PropertyGenerator, OntologyValidator
- **Reasoning**: InferenceEngine, RuleManager, ExplanationGenerator
- **Quality**: KGQualityAssessor (temporarily unavailable, see the notice above), ConflictDetector
- **Export**: JSONExporter, CSVExporter, RDFExporter, OWLExporter, ReportGenerator
- **Visualization**: KGVisualizer, OntologyVisualizer, AnalyticsVisualizer

### Pipeline

**Environmental Data Sources → Parse → Extract Entities → Build Impact KG → Analyze Relationships → Assess Impact → Generate Ontology → Reports → Visualize**
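Each arrow above hands the output of one stage to the next. A minimal sketch of that chaining in plain Python, assuming a hypothetical stage signature (a function that takes and returns a shared state dict); the toy stages stand in for the real Semantica modules and are not how the library orchestrates work:

```python
from typing import Any, Callable

# Hypothetical stage signature: take the pipeline state, return it updated.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(stages: list[tuple[str, Stage]], state: dict[str, Any]) -> dict[str, Any]:
    """Run stages in order, recording each completed stage name in state['log']."""
    state.setdefault("log", [])
    for name, stage in stages:
        state = stage(state)
        state["log"].append(name)
    return state


# Toy stages standing in for Parse -> Extract Entities -> Assess Impact.
def parse(state):
    state["records"] = [{"project_id": "PROJ-001", "co2_reduction_tons": 5000}]
    return state


def extract(state):
    state["entities"] = [r["project_id"] for r in state["records"]]
    return state


def assess(state):
    state["total_co2"] = sum(r["co2_reduction_tons"] for r in state["records"])
    return state


result = run_pipeline([("parse", parse), ("extract", extract), ("assess", assess)], {})
print(result["log"], result["total_co2"])  # ['parse', 'extract', 'assess'] 5000
```

Keeping every stage on one signature makes it easy to reorder, skip, or log stages without touching their internals.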

## Installation

Install Semantica from PyPI:

```bash
pip install semantica
# Or with all optional dependencies:
pip install semantica[all]
```

---

## Step 1: Ingest Environmental Data from Multiple Sources

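Ingest a sample environmental impact file, an EPA API endpoint, and environmental news feeds. A PostgreSQL connection string and query for a climate database are also defined, but they are not executed in this notebook.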
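Remote sources fail routinely (timeouts, moved endpoints, retired feeds). A minimal sketch of collecting successes and failures separately, assuming a `fetch` callable that returns content or raises; in the notebook it could wrap `web_ingestor.ingest_url`, but the stub `fake_fetch` (hypothetical) keeps the example offline:

```python
from typing import Callable, Optional


def ingest_sources(urls: list[str], fetch: Callable[[str], Optional[object]]) -> tuple[list, list]:
    """Fetch each URL; collect successes and record failures instead of aborting."""
    ingested, failed = [], []
    for url in urls:
        try:
            content = fetch(url)
        except Exception as exc:  # network errors, timeouts, HTTP errors, etc.
            failed.append((url, str(exc)))
            continue
        if content:
            ingested.append(content)
        else:
            failed.append((url, "empty response"))
    return ingested, failed


# Stub fetcher: any URL containing "bad" simulates an unreachable host.
def fake_fetch(url: str):
    if "bad" in url:
        raise ConnectionError("unreachable")
    return {"url": url}


ok, errors = ingest_sources(["https://example.org/a", "https://bad.example.org"], fake_fetch)
print(len(ok), errors)  # 1 [('https://bad.example.org', 'unreachable')]
```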


In [None]:
from semantica.ingest import FileIngestor, WebIngestor, DBIngestor, FeedIngestor
from semantica.parse import JSONParser, CSVParser, StructuredDataParser, DocumentParser
from semantica.semantic_extract import NERExtractor, RelationExtractor, EventDetector, SemanticAnalyzer
from semantica.kg import GraphBuilder, GraphAnalyzer, CentralityCalculator, CommunityDetector
from semantica.kg import ConnectivityAnalyzer, TemporalGraphQuery, TemporalPatternDetector
from semantica.ontology import OntologyGenerator, ClassInferrer, PropertyGenerator, OntologyValidator
from semantica.reasoning import InferenceEngine, RuleManager, ExplanationGenerator
from semantica.conflicts import ConflictDetector
from semantica.export import JSONExporter, CSVExporter, RDFExporter, OWLExporter, ReportGenerator
from semantica.visualization import KGVisualizer, OntologyVisualizer, AnalyticsVisualizer
import tempfile
import os
import json
from datetime import datetime, timedelta

file_ingestor = FileIngestor()
web_ingestor = WebIngestor()
db_ingestor = DBIngestor()
feed_ingestor = FeedIngestor()

json_parser = JSONParser()
csv_parser = CSVParser()
structured_parser = StructuredDataParser()
document_parser = DocumentParser()

# Real environmental data sources
environmental_apis = [
    "https://www.epa.gov/enviro/facts-service",  # EPA Environmental Facts Service
    "https://www.epa.gov/airdata",  # EPA Air Data
    "https://api.github.com/repos/climate-data/aggregator"  # Climate data aggregator
]

environmental_feeds = [
    "https://www.epa.gov/rss",  # EPA RSS Feed
    "https://www.energy.gov/rss",  # US Energy Department RSS
    "https://feeds.reuters.com/reuters/environment"  # Reuters Environment News
]

# Example database connection (placeholder credentials; the query below is defined but not executed in this notebook)
db_connection_string = "postgresql://user:password@localhost:5432/environmental_db"
db_query = "SELECT project_id, energy_type, co2_reduction, water_saved, land_impact, timestamp FROM environmental_impact WHERE timestamp > CURRENT_DATE - INTERVAL '1 year' ORDER BY timestamp DESC"

temp_dir = tempfile.mkdtemp()

# Sample environmental impact data
environmental_file = os.path.join(temp_dir, "environmental_impact.json")
environmental_data = [
    {
        "project_id": "PROJ-001",
        "energy_type": "Solar",
        "co2_reduction_tons": 5000,
        "water_saved_gallons": 1000000,
        "land_impact_acres": 50,
        "carbon_offset": 5000,
        "timestamp": (datetime.now() - timedelta(days=60)).isoformat(),
        "region": "California"
    },
    {
        "project_id": "PROJ-002",
        "energy_type": "Wind",
        "co2_reduction_tons": 8000,
        "water_saved_gallons": 2000000,
        "land_impact_acres": 100,
        "carbon_offset": 8000,
        "timestamp": (datetime.now() - timedelta(days=30)).isoformat(),
        "region": "Texas"
    },
    {
        "project_id": "PROJ-003",
        "energy_type": "Hydroelectric",
        "co2_reduction_tons": 3000,
        "water_saved_gallons": 500000,
        "land_impact_acres": 200,
        "carbon_offset": 3000,
        "timestamp": datetime.now().isoformat(),
        "region": "Pacific Northwest"
    }
]

with open(environmental_file, 'w') as f:
    json.dump(environmental_data, f, indent=2)

file_objects = file_ingestor.ingest_file(environmental_file, read_content=True)
parsed_data = structured_parser.parse_json(environmental_file)

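# Ingest from environmental APIs (only the first endpoint, to keep the demo fast)
environmental_api_list = []
for api_url in environmental_apis[:1]:
    try:
        api_content = web_ingestor.ingest_url(api_url)
    except Exception as exc:  # a failing endpoint should not abort the notebook
        print(f"  Skipped environmental API {api_url}: {exc}")
        continue
    if api_content:
        environmental_api_list.append(api_content)
        print(f"  Ingested environmental API: {api_url}")

# Ingest from environmental feeds
environmental_feed_list = []
for feed_url in environmental_feeds:
    try:
        feed_data = feed_ingestor.ingest_feed(feed_url)
    except Exception as exc:  # retired or unreachable feeds are skipped
        print(f"  Skipped feed {feed_url}: {exc}")
        continue
    if feed_data:
        environmental_feed_list.append(feed_data)
        print(f"  Ingested feed: {feed_url}")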

print("\n📊 Environmental Data Ingestion Summary:")
print(f"  Environmental data files: {1 if file_objects else 0}")
print(f"  Environmental APIs: {len(environmental_api_list)}")
print(f"  Environmental feeds: {len(environmental_feed_list)}")
print("  Database sources: 0 (connection configured but not queried)")


## Step 2: Extract Environmental Entities and Build Impact Knowledge Graph

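Map each project record to `Project`, `Energy_Source`, and `Environmental_Impact` entities connected by `uses`, `reduces`, and `saves` relationships, then build the impact knowledge graph and compute its density, degree centrality, and communities.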
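The mapping from records to graph elements is plain data transformation. A condensed sketch (CO2 only, water omitted for brevity) that also deduplicates entities by `id`, so two Solar projects share one `Energy_Source` node; `records_to_graph` is a hypothetical helper, not a Semantica API:

```python
def records_to_graph(records):
    """Map project records to entity/relationship dicts, adding each entity id once."""
    entities, relationships, seen = [], [], set()

    def add_entity(entity):
        if entity["id"] not in seen:  # skip duplicates, e.g. a second Solar project
            seen.add(entity["id"])
            entities.append(entity)

    for r in records:
        pid, etype = r["project_id"], r["energy_type"]
        add_entity({"id": pid, "type": "Project", "name": pid,
                    "properties": {"region": r.get("region", "")}})
        add_entity({"id": etype, "type": "Energy_Source", "name": etype, "properties": {}})
        add_entity({"id": f"{pid}_co2_reduction", "type": "Environmental_Impact",
                    "name": "CO2 Reduction",
                    "properties": {"value": r.get("co2_reduction_tons", 0), "unit": "tons"}})
        relationships.append({"source": pid, "target": etype, "type": "uses", "properties": {}})
        relationships.append({"source": pid, "target": f"{pid}_co2_reduction",
                              "type": "reduces", "properties": {}})
    return entities, relationships


records = [
    {"project_id": "P1", "energy_type": "Solar", "co2_reduction_tons": 5000},
    {"project_id": "P2", "energy_type": "Solar", "co2_reduction_tons": 3000},
]
ents, rels = records_to_graph(records)
print(len(ents), len(rels))  # 5 4
```

Relationships are not deduplicated because each one carries a distinct project as its source.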


In [None]:
ner_extractor = NERExtractor()
relation_extractor = RelationExtractor()
event_detector = EventDetector()
semantic_analyzer = SemanticAnalyzer()

environmental_entities = []
environmental_relationships = []

# Extract from environmental data
if parsed_data and parsed_data.data:
    for entry in parsed_data.data if isinstance(parsed_data.data, list) else [parsed_data.data]:
        if isinstance(entry, dict):
            project_id = entry.get("project_id", "")
            energy_type = entry.get("energy_type", "")
            region = entry.get("region", "")
            
            environmental_entities.append({
                "id": project_id,
                "type": "Project",
                "name": project_id,
                "properties": {
                    "energy_type": energy_type,
                    "region": region,
                    "timestamp": entry.get("timestamp", "")
                }
            })
            
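            # Add each energy source only once, even when several projects share it
            if not any(e["id"] == energy_type and e["type"] == "Energy_Source" for e in environmental_entities):
                environmental_entities.append({
                    "id": energy_type,
                    "type": "Energy_Source",
                    "name": energy_type,
                    "properties": {}
                })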
            
            environmental_entities.append({
                "id": f"{project_id}_co2_reduction",
                "type": "Environmental_Impact",
                "name": "CO2 Reduction",
                "properties": {
                    "metric": "CO2",
                    "value": entry.get("co2_reduction_tons", 0),
                    "unit": "tons"
                }
            })
            
            environmental_entities.append({
                "id": f"{project_id}_water_saved",
                "type": "Environmental_Impact",
                "name": "Water Saved",
                "properties": {
                    "metric": "Water",
                    "value": entry.get("water_saved_gallons", 0),
                    "unit": "gallons"
                }
            })
            
            environmental_relationships.append({
                "source": project_id,
                "target": energy_type,
                "type": "uses",
                "properties": {}
            })
            
            environmental_relationships.append({
                "source": project_id,
                "target": f"{project_id}_co2_reduction",
                "type": "reduces",
                "properties": {}
            })
            
            environmental_relationships.append({
                "source": project_id,
                "target": f"{project_id}_water_saved",
                "type": "saves",
                "properties": {}
            })

builder = GraphBuilder()
graph_analyzer = GraphAnalyzer()
centrality_calculator = CentralityCalculator()
community_detector = CommunityDetector()

impact_kg = builder.build(environmental_entities, environmental_relationships)

metrics = graph_analyzer.compute_metrics(impact_kg)
centrality_result = centrality_calculator.calculate_degree_centrality(impact_kg)
centrality_scores = centrality_result.get('centrality', {})
communities = community_detector.detect_communities(impact_kg)

print(f"Extracted {len(environmental_entities)} environmental entities")
print(f"Extracted {len(environmental_relationships)} relationships")
print(f"Built impact knowledge graph with {len(impact_kg.get('entities', []))} entities")
print(f"Graph density: {metrics.get('density', 0):.3f}")


## Step 3: Analyze Environmental Relationships

Analyze graph connectivity, query the last year of data, detect temporal trends, and aggregate CO2 reduction and water savings by energy type.
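Connected components and degree centrality are simple to state on the relationship lists built in Step 2. A sketch with a small hypothetical edge list, treating relationships as undirected (an assumption; the notebook does not say how `ConnectivityAnalyzer` or `CentralityCalculator` handle edge direction), and normalizing degree by n - 1 as NetworkX's `degree_centrality` does:

```python
from collections import defaultdict, deque


def build_adjacency(relationships):
    """Undirected adjacency sets from source/target relationship dicts."""
    adj = defaultdict(set)
    for rel in relationships:
        adj[rel["source"]].add(rel["target"])
        adj[rel["target"]].add(rel["source"])
    return adj


def connected_components(adj):
    """Breadth-first search from each unvisited node; returns a list of node sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components


def degree_centrality(adj):
    """Degree divided by (n - 1), the usual normalization for simple graphs."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()} if n > 1 else {}


rels = [
    {"source": "PROJ-001", "target": "Solar"},
    {"source": "PROJ-001", "target": "PROJ-001_co2_reduction"},
    {"source": "PROJ-002", "target": "Wind"},
]
adj = build_adjacency(rels)
print(len(connected_components(adj)))      # 2
print(degree_centrality(adj)["PROJ-001"])  # 0.5
```

With the three sample projects, which use distinct energy types, each project and its impact nodes form a separate component, so three components would be expected there.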


In [None]:
connectivity_analyzer = ConnectivityAnalyzer()
temporal_query = TemporalGraphQuery()
temporal_pattern_detector = TemporalPatternDetector()

connectivity = connectivity_analyzer.analyze_connectivity(impact_kg)

start_time = (datetime.now() - timedelta(days=365)).isoformat()
end_time = datetime.now().isoformat()

temporal_results = temporal_query.query_time_range(
    graph=impact_kg,
    query="Find environmental impacts in the last year",
    start_time=start_time,
    end_time=end_time
)

temporal_patterns = temporal_pattern_detector.detect_temporal_patterns(
    impact_kg,
    pattern_type="trend",
    min_frequency=1
)

# Analyze impact by energy type
impact_analysis = {}
if parsed_data and parsed_data.data:
    for entry in parsed_data.data if isinstance(parsed_data.data, list) else [parsed_data.data]:
        if isinstance(entry, dict):
            energy_type = entry.get("energy_type", "")
            
            if energy_type not in impact_analysis:
                impact_analysis[energy_type] = {
                    "co2_reduction": [],
                    "water_saved": [],
                    "projects": []
                }
            
            impact_analysis[energy_type]["co2_reduction"].append(entry.get("co2_reduction_tons", 0))
            impact_analysis[energy_type]["water_saved"].append(entry.get("water_saved_gallons", 0))
            impact_analysis[energy_type]["projects"].append(entry.get("project_id", ""))

# Calculate totals
for energy_type, data in impact_analysis.items():
    impact_analysis[energy_type]["total_co2_reduction"] = sum(data["co2_reduction"])
    impact_analysis[energy_type]["total_water_saved"] = sum(data["water_saved"])
    impact_analysis[energy_type]["project_count"] = len(data["projects"])

print("Environmental relationships analyzed")
print(f"  Connected components: {len(connectivity.get('components', []))}")
print(f"  Temporal patterns: {len(temporal_patterns)}")
print(f"  Energy types analyzed: {len(impact_analysis)}")
for energy_type, data in impact_analysis.items():
    print(f"    {energy_type}: {data['total_co2_reduction']:,} tons CO2 reduced, {data['total_water_saved']:,} gallons saved, {data['project_count']} projects")

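The `query_time_range` call above restricts analysis to the last year. A plain-Python sketch of the same idea on raw records, assuming ISO-8601 timestamps as produced in Step 1; `OLD-001` is a hypothetical out-of-window record and `now` is fixed so the result is reproducible. This filters records rather than graph nodes and is not how `TemporalGraphQuery` is implemented:

```python
from datetime import datetime, timedelta


def within_window(records, start, end, key="timestamp"):
    """Keep records whose ISO-8601 timestamp falls in the closed interval [start, end]."""
    return [r for r in records if start <= datetime.fromisoformat(r[key]) <= end]


now = datetime(2024, 6, 1)
records = [
    {"project_id": "PROJ-001", "timestamp": (now - timedelta(days=60)).isoformat()},
    {"project_id": "OLD-001", "timestamp": (now - timedelta(days=400)).isoformat()},
]
recent = within_window(records, now - timedelta(days=365), now)
print([r["project_id"] for r in recent])  # ['PROJ-001']
```

Both the bounds and the parsed timestamps are naive datetimes here; mixing naive and timezone-aware values would raise a `TypeError` on comparison.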

## Step 4: Assess Environmental Impact

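Score each project's environmental impact, apply rule-based inference with the inference engine, and generate and validate an ontology from the extracted entities and relationships.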
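The scoring in the cell below awards one point per 1,000 t of CO2 reduced and one point per 100,000 gallons of water saved, then buckets the sum (high above 10, medium above 5, otherwise low). This is a notebook heuristic rather than a standard metric. Applied to the Step 1 sample records:

```python
def impact_score(co2_tons: float, water_gallons: float) -> float:
    """Weighting used in the Step 4 cell: 1 point per 1,000 t CO2 + 1 point per 100,000 gal water."""
    return co2_tons / 1000 + water_gallons / 100000


def impact_level(score: float) -> str:
    """Bucket a score with the same strict thresholds as the Step 4 cell."""
    return "high" if score > 10 else "medium" if score > 5 else "low"


# (CO2 tons, water gallons) from the sample records in Step 1.
samples = {"PROJ-001": (5000, 1000000), "PROJ-002": (8000, 2000000), "PROJ-003": (3000, 500000)}
for pid, (co2, water) in samples.items():
    s = impact_score(co2, water)
    print(pid, s, impact_level(s))
# PROJ-001 15.0 high
# PROJ-002 28.0 high
# PROJ-003 8.0 medium
```

Two projects land in the `high` tier, so the cell below should report 2 high impact projects for the sample data.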


In [None]:
inference_engine = InferenceEngine()
rule_manager = RuleManager()
explanation_generator = ExplanationGenerator()
ontology_generator = OntologyGenerator()
class_inferrer = ClassInferrer()
property_generator = PropertyGenerator()
ontology_validator = OntologyValidator()

# Environmental impact assessment rules
inference_engine.add_rule("IF co2_reduction > 5000 AND water_saved > 1000000 THEN high_impact_project")
inference_engine.add_rule("IF energy_type is Solar AND co2_reduction > 3000 THEN sustainable_solar")
inference_engine.add_rule("IF multiple projects use same energy_type THEN scalable_solution")

# Assess impact
impact_assessments = []
if parsed_data and parsed_data.data:
    for entry in parsed_data.data if isinstance(parsed_data.data, list) else [parsed_data.data]:
        if isinstance(entry, dict):
            co2_reduction = entry.get("co2_reduction_tons", 0)
            water_saved = entry.get("water_saved_gallons", 0)
            
            # Heuristic score: 1 point per 1,000 t CO2 reduced + 1 point per 100,000 gal water saved
            impact_score = (co2_reduction / 1000) + (water_saved / 100000)
            impact_level = "high" if impact_score > 10 else "medium" if impact_score > 5 else "low"
            
            assessment = {
                "project_id": entry.get("project_id", ""),
                "energy_type": entry.get("energy_type", ""),
                "co2_reduction": co2_reduction,
                "water_saved": water_saved,
                "impact_score": impact_score,
                "impact_level": impact_level
            }
            impact_assessments.append(assessment)
            
            inference_engine.add_fact({
                "project_id": entry.get("project_id", ""),
                "energy_type": entry.get("energy_type", ""),
                "co2_reduction": co2_reduction,
                "water_saved": water_saved
            })

impact_insights = inference_engine.forward_chain()

# Generate environmental ontology
impact_ontology = ontology_generator.generate_ontology({
    "entities": environmental_entities,
    "relationships": environmental_relationships
})
classes = class_inferrer.infer_classes(environmental_entities)
properties = property_generator.infer_properties(environmental_entities, environmental_relationships, classes)
validation_result = ontology_validator.validate_ontology(impact_ontology)

print("Environmental impact assessment complete")
print(f"  Projects assessed: {len(impact_assessments)}")
print(f"  High impact projects: {len([a for a in impact_assessments if a.get('impact_level') == 'high'])}")
print(f"  Generated {len(impact_insights)} impact insights")
print(f"  Ontology valid: {validation_result.valid}")

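To see what the three rules would conclude on the sample data, here is a hedged plain-Python sketch that encodes them as predicates. It is a single pass rather than full forward chaining (which would repeat until no new facts appear), and it does not reflect how `InferenceEngine` parses rule strings:

```python
from collections import Counter

facts = [
    {"project_id": "PROJ-001", "energy_type": "Solar", "co2_reduction": 5000, "water_saved": 1000000},
    {"project_id": "PROJ-002", "energy_type": "Wind", "co2_reduction": 8000, "water_saved": 2000000},
    {"project_id": "PROJ-003", "energy_type": "Hydroelectric", "co2_reduction": 3000, "water_saved": 500000},
]

# Per-fact rules as (conclusion, predicate); thresholds copied from the add_rule strings.
rules = [
    ("high_impact_project", lambda f: f["co2_reduction"] > 5000 and f["water_saved"] > 1000000),
    ("sustainable_solar", lambda f: f["energy_type"] == "Solar" and f["co2_reduction"] > 3000),
]


def evaluate(facts, rules):
    """Single pass: derive (subject, conclusion) pairs whose premises hold."""
    derived = [(f["project_id"], name) for f in facts for name, pred in rules if pred(f)]
    # Aggregate rule: an energy type used by more than one project is a scalable solution.
    counts = Counter(f["energy_type"] for f in facts)
    derived += [(etype, "scalable_solution") for etype, n in counts.items() if n > 1]
    return derived


print(evaluate(facts, rules))
# [('PROJ-001', 'sustainable_solar'), ('PROJ-002', 'high_impact_project')]
```

Because the thresholds use strict `>`, PROJ-001 (exactly 5,000 t and 1,000,000 gal) does not fire `high_impact_project`, even though its heuristic score of 15 puts it in the `high` tier. No energy type repeats in the sample, so `scalable_solution` never fires.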

## Step 5: Generate Reports and Visualize

Export the knowledge graph and ontology (JSON, CSV, RDF, OWL), generate a Markdown impact report, and visualize the graph, ontology, and analytics.
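The report's headline numbers are sums over the per-project assessments. A sketch that renders those assessments as a Markdown table with a totals row, using the Step 1 sample values and the impact levels from the Step 4 scoring; `markdown_report` is a hypothetical helper, independent of `ReportGenerator`:

```python
def markdown_report(assessments):
    """Summarize per-project assessments as a Markdown table with a totals row."""
    lines = ["| Project | Energy type | CO2 reduced (t) | Water saved (gal) | Level |",
             "|---|---|---:|---:|---|"]
    for a in assessments:
        lines.append(f"| {a['project_id']} | {a['energy_type']} | {a['co2_reduction']:,} | "
                     f"{a['water_saved']:,} | {a['impact_level']} |")
    total_co2 = sum(a["co2_reduction"] for a in assessments)
    total_water = sum(a["water_saved"] for a in assessments)
    lines.append(f"| **Total** | | {total_co2:,} | {total_water:,} | |")
    return "\n".join(lines)


assessments = [
    {"project_id": "PROJ-001", "energy_type": "Solar", "co2_reduction": 5000,
     "water_saved": 1000000, "impact_level": "high"},
    {"project_id": "PROJ-002", "energy_type": "Wind", "co2_reduction": 8000,
     "water_saved": 2000000, "impact_level": "high"},
    {"project_id": "PROJ-003", "energy_type": "Hydroelectric", "co2_reduction": 3000,
     "water_saved": 500000, "impact_level": "medium"},
]
report_md = markdown_report(assessments)
print(report_md)
```

For the sample data the totals row reads 16,000 t CO2 and 3,500,000 gallons, matching `total_co2_reduction` and `total_water_saved` in the cell below.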


In [None]:
# Quality assessment is disabled while semantica.kg_qa (KGQualityAssessor) is temporarily unavailable.
# quality_assessor = KGQualityAssessor()
json_exporter = JSONExporter()
csv_exporter = CSVExporter()
rdf_exporter = RDFExporter()
owl_exporter = OWLExporter()
report_generator = ReportGenerator()

# quality_score = quality_assessor.assess_overall_quality(impact_kg)

json_exporter.export_knowledge_graph(impact_kg, os.path.join(temp_dir, "environmental_impact_kg.json"))
csv_exporter.export_entities(environmental_entities, os.path.join(temp_dir, "environmental_entities.csv"))
rdf_exporter.export_knowledge_graph(impact_kg, os.path.join(temp_dir, "environmental_impact_kg.rdf"))
owl_exporter.export(impact_ontology, os.path.join(temp_dir, "environmental_ontology.owl"))

total_co2_reduction = sum(a.get("co2_reduction", 0) for a in impact_assessments)
total_water_saved = sum(a.get("water_saved", 0) for a in impact_assessments)

report_data = {
    "summary": f"Environmental impact analysis identified {len(impact_assessments)} projects with {total_co2_reduction:,} tons CO2 reduction and {total_water_saved:,} gallons water saved",
    "projects_analyzed": len(impact_assessments),
    "total_co2_reduction": total_co2_reduction,
    "total_water_saved": total_water_saved,
    "high_impact_projects": len([a for a in impact_assessments if a.get("impact_level") == "high"]),
    "insights": len(impact_insights),
    "quality_score": None  # unavailable until semantica.kg_qa is reintroduced
}

report = report_generator.generate_report(report_data, format="markdown")

kg_visualizer = KGVisualizer()
ontology_visualizer = OntologyVisualizer()
analytics_visualizer = AnalyticsVisualizer()

kg_viz = kg_visualizer.visualize_network(impact_kg, output="interactive")
ontology_viz = ontology_visualizer.visualize_hierarchy(impact_ontology, output="interactive")
analytics_viz = analytics_visualizer.visualize_analytics(impact_kg, output="interactive")

print("Total modules used: 20+")
print("Pipeline complete: Environmental Data → Parse → Extract → Build Impact KG → Analyze Relationships → Assess Impact → Generate Ontology → Reports → Visualize")

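`csv_exporter.export_entities` handles the entity export in the cell above. If a dependency-free fallback is useful, here is a sketch with the standard `csv` module that flattens each entity's `properties` dict into `prop_`-prefixed columns (the prefix is an arbitrary choice for this example) and takes the union of columns so heterogeneous entities share one header:

```python
import csv
import io


def entities_to_csv(entities) -> str:
    """Write entities to CSV text, flattening each 'properties' dict into prefixed columns."""
    rows = []
    for e in entities:
        row = {"id": e["id"], "type": e["type"], "name": e["name"]}
        row.update({f"prop_{k}": v for k, v in e.get("properties", {}).items()})
        rows.append(row)
    # Union of all column names in first-seen order; missing values become "".
    fieldnames = list(dict.fromkeys(k for row in rows for k in row))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


entities = [
    {"id": "PROJ-001", "type": "Project", "name": "PROJ-001", "properties": {"region": "California"}},
    {"id": "Solar", "type": "Energy_Source", "name": "Solar", "properties": {}},
]
csv_text = entities_to_csv(entities)
print(csv_text)
```

To write a file instead, pass an `open(path, "w", newline="")` handle to `csv.DictWriter` in place of the `StringIO` buffer.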