# From CSV to Simulation: A Guide to Importing Real-World Data

Welcome to the second thematic notebook for the Smart Water Platform. In the first notebook, *Anatomy of a Simulation*, we learned about the core engine. Now, we will take a critical step towards building a true Digital Twin: **driving a simulation with external, real-world data**.

A simulation is only as good as its inputs. Instead of using hard-coded or mathematically generated data, we often need to feed our models with observed data from sensors, weather forecasts, or historical records. This notebook introduces the `CsvDataSourceAgent`, a new component designed for this exact purpose.

## The `CsvDataSourceAgent`: A Bridge to the Real World

We have developed a new agent, `CsvDataSourceAgent`, located in `swp.data_access.csv_data_source`. Its job is simple but powerful:

1. It reads a standard CSV (Comma-Separated Values) file containing time-series data. The file must have a `timestamp` column.
2. At each step of the simulation, it checks the current simulation time (`current_time`).
3. If it finds data in the CSV corresponding to the current time, it **publishes** that data as a message to the `MessageBus`.

This allows any other component in the simulation to subscribe to this data feed and react to it as the simulation advances.
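The three steps above can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: `SimpleBus`, `CsvSourceSketch`, and the `step` method are hypothetical stand-ins, and the message payload (a dict such as `{'inflow': ...}`) is an assumption about what the real agent publishes.

```python
import io
from collections import defaultdict

import pandas as pd


class SimpleBus:
    """Hypothetical stand-in for the platform's MessageBus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


class CsvSourceSketch:
    """Hypothetical sketch of the read / check / publish cycle."""

    def __init__(self, bus, csv_source, publish_topic):
        self.bus = bus
        self.publish_topic = publish_topic
        # Step 1: read the CSV and index it by its required `timestamp` column.
        self.data = pd.read_csv(csv_source).set_index("timestamp")

    def step(self, current_time):
        # Step 2: look for a row matching the current simulation time.
        if current_time in self.data.index:
            # Step 3: publish the matching row as a plain dict.
            row = self.data.loc[current_time]
            self.bus.publish(self.publish_topic, row.to_dict())


# Usage with made-up values (timestamps in seconds, inflow in m^3/s).
csv_text = "timestamp,inflow\n0,10.0\n3600,12.5\n"
bus = SimpleBus()
received = []
bus.subscribe("data.inflow.observed", received.append)

agent = CsvSourceSketch(bus, io.StringIO(csv_text), "data.inflow.observed")
for t in (0, 1800, 3600):
    agent.step(t)

print(received)
```

In this sketch the step at `t = 1800` publishes nothing because no CSV row carries that timestamp, so the subscriber receives two messages rather than three.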

## Making Components Data-Aware

For this to work, our physical components need to be able to listen to the `MessageBus`. We have modified the `Reservoir` component to do just that. Its `__init__` method can now accept a `message_bus` and an `inflow_topic`. 

When the `CsvDataSourceAgent` publishes a new inflow value to the topic, the `Reservoir` receives the message and adds the value to its total inflow for that time step. This allows a single component to be influenced by both physical upstream connections and external data sources simultaneously.
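To make the accumulation concrete, here is a minimal sketch of a component that sums inflow from both sources within one time step. `ReservoirSketch`, its method names, and the `{'inflow': ...}` message shape are hypothetical, and the real `Reservoir` may handle this differently. The level update assumes a constant surface area, so that water level equals volume divided by surface area.

```python
class ReservoirSketch:
    """Hypothetical data-aware component; not the platform's Reservoir."""

    def __init__(self, volume, surface_area):
        self.volume = volume              # m^3
        self.surface_area = surface_area  # m^2
        self._data_inflow = 0.0           # m^3/s, accumulated from messages

    def handle_inflow_message(self, message):
        # Callback a message bus would invoke for each published value.
        self._data_inflow += message["inflow"]

    def step(self, upstream_inflow, outflow, dt):
        # Total inflow = physical upstream connection + external data feed.
        total_inflow = upstream_inflow + self._data_inflow
        self.volume += (total_inflow - outflow) * dt
        self._data_inflow = 0.0  # reset so each value is counted only once
        return self.volume / self.surface_area  # water level in m


res = ReservoirSketch(volume=5e6, surface_area=5e5)
res.handle_inflow_message({"inflow": 20.0})  # made-up value from the data feed
level = res.step(upstream_inflow=5.0, outflow=10.0, dt=3600.0)
print(f"{level:.3f} m")
```

With these made-up numbers, the net flow of 15 m^3/s over one hour adds 54,000 m^3, raising the level from 10.0 m to about 10.108 m.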

## Case Study: A Data-Driven Reservoir

The code below demonstrates this new feature in action. We will:
1. Create a `CsvDataSourceAgent` and point it to a file with two days of hourly inflow data (`data/observed_inflow.csv`).
2. Create a `Reservoir` and subscribe it to the agent's data topic.
3. Run the simulation and observe how the reservoir's state responds to inflow that comes *only* from the CSV file, with a fixed-opening gate providing the outflow.
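Because the agent publishes only when a CSV timestamp matches the simulation time, it can help to check the file against the time grid before running. The sketch below assumes that `timestamp` holds seconds since the start of the simulation and that the value column is named `inflow` (both consistent with the plotting code later in this notebook). The sample rows are made up for illustration, and `check_alignment` is a hypothetical helper, not part of the platform.

```python
import io

import pandas as pd


def check_alignment(df, dt, duration):
    """Return timestamps that would never match a simulation step.

    Assumes steps occur at 0, dt, 2*dt, ... up to `duration`; this is an
    assumption about the harness, not a documented guarantee.
    """
    ts = df["timestamp"]
    off_grid = (ts % dt != 0) | (ts < 0) | (ts > duration)
    return ts[off_grid].tolist()


# Made-up sample in the expected layout; the third row is deliberately off-grid.
sample = io.StringIO(
    "timestamp,inflow\n0,10.0\n3600,12.5\n5400,11.0\n7200,9.0\n"
)
df = pd.read_csv(sample)

unmatched = check_alignment(df, dt=3600, duration=172800)
print(unmatched)
```

To check the real file, pass `'data/observed_inflow.csv'` to `pd.read_csv` in place of the in-memory sample. An empty result means every row falls on a simulation step under the stated assumptions.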

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
from swp.core_engine.testing.simulation_harness import SimulationHarness
from swp.simulation_identification.physical_objects.reservoir import Reservoir
from swp.simulation_identification.physical_objects.gate import Gate
from swp.data_access.csv_data_source import CsvDataSourceAgent

# 1. Simulation and Communication Setup
simulation_config = {'duration': 172800, 'dt': 3600.0} # 2 days, hourly steps
harness = SimulationHarness(config=simulation_config)
message_bus = harness.message_bus

# 2. Define the data topic and create the agent
DATA_TOPIC = "data.inflow.observed"
csv_agent = CsvDataSourceAgent(
    agent_id="csv_inflow_agent",
    message_bus=message_bus,
    csv_filepath="data/observed_inflow.csv",
    publish_topic=DATA_TOPIC
)

# 3. Create the data-aware Reservoir
reservoir = Reservoir(
    name="data_driven_reservoir",
    initial_state={'volume': 5e6, 'water_level': 10.0},
    parameters={'surface_area': 5e5},
    message_bus=message_bus,
    inflow_topic=DATA_TOPIC
)

# 4. Create a downstream gate to allow outflow
gate = Gate(
    name="outflow_gate",
    initial_state={'opening': 0.5}, # Fixed opening
    parameters={'width': 10, 'discharge_coefficient': 0.6}
)

# 5. Add components and run
harness.add_component(reservoir)
harness.add_component(gate)
harness.add_agent(csv_agent)
harness.add_connection("data_driven_reservoir", "outflow_gate")
harness.build()

# Redirect stdout to a log file to keep the notebook output clean.
# redirect_stdout restores sys.stdout even if the simulation raises.
from contextlib import redirect_stdout

with open('simulation_log.txt', 'w') as f, redirect_stdout(f):
    harness.run_mas_simulation()

print("Data-driven simulation complete.")

## Results and Visualization

The plot below shows the simulation results: the `inflow` series from the CSV file (right axis, dashed) alongside the resulting reservoir `water_level` (left axis). The reservoir level rises and falls in response to the external data feed.

In [None]:
# Load the original data for comparison
source_data = pd.read_csv('data/observed_inflow.csv')

# Extract data from simulation history
time_hours = [h['time'] / 3600 for h in harness.history]
water_levels = [h['data_driven_reservoir']['water_level'] for h in harness.history]

# Create a DataFrame
df = pd.DataFrame({
    'Time (hours)': time_hours,
    'Reservoir Level (m)': water_levels
})

# Plot the results
fig, ax1 = plt.subplots(figsize=(14, 7))

# Plot Reservoir Level on the left y-axis
ax1.plot(df['Time (hours)'], df['Reservoir Level (m)'], label='Reservoir Water Level', color='purple')
ax1.set_xlabel('Time (hours)')
ax1.set_ylabel('Water Level (m)', color='purple')
ax1.tick_params(axis='y', labelcolor='purple')
ax1.grid(True)

# Create a second y-axis for the inflow data
ax2 = ax1.twinx()
ax2.plot(
    source_data['timestamp'] / 3600,  # convert seconds to hours
    source_data['inflow'],
    label='Observed Inflow (from CSV)',
    color='blue',
    linestyle='--',
    drawstyle='steps-post',
)
ax2.set_ylabel('Inflow (m^3/s)', color='blue')
ax2.tick_params(axis='y', labelcolor='blue')

fig.suptitle('Reservoir Response to Data-Driven Inflow', fontsize=16)
fig.legend(loc='upper right', bbox_to_anchor=(0.9, 0.9))
plt.show()