## Agent-Based Simulations

An agent-based simulation allows you to model complex phenomena by specifying the behaviours and population characteristics of agents at a lower level of complexity. If you are interested in how human cooperation gives rise to settlements, you might model a small population of humans and give them basic rules of behaviour about how they exploit or share resources. Then you run the model thousands of times, changing its settings each time, so that you end up with data from many runs of every possible combination of settings.

This is called 'sweeping the behaviour space', and it helps you validate or demonstrate that your model is simulating the phenomenon correctly (e.g., that the emergent effects you spot aren't the product of some error in your code). Then you can fit the archaeological evidence against your model to see what the model says about that combination of evidence: "I've got data that looks like this; the model reproduces that distribution only when 'altruism' is high and 'resources' are high, which therefore suggests..."
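A behaviour-space sweep boils down to two loops: one over every combination of settings, and one over repeated runs of each combination (because a stochastic model gives a different answer every time). Here is a minimal sketch in R. The toy model `run_toy_model()` and its `altruism` and `resources` parameters are hypothetical stand-ins, not the model built later in this notebook; any function that takes settings and returns a summary number would slot in the same way.

```r
# Hypothetical toy model: the chance that a household settles rises with
# altruism and resources. It stands in for a real simulation run.
run_toy_model <- function(altruism, resources) {
  p_settle <- altruism * resources
  # 50 households each decide independently whether to settle
  sum(runif(50) < p_settle)
}

# Every combination of settings we want to explore
settings <- expand.grid(altruism = c(0.2, 0.5, 0.8),
                        resources = c(0.2, 0.5, 0.8))

n_repeats <- 20  # repeated runs per combination, since the model is stochastic

set.seed(42)  # make the sweep reproducible
results <- do.call(rbind, lapply(seq_len(nrow(settings)), function(i) {
  outcomes <- replicate(n_repeats,
                        run_toy_model(settings$altruism[i], settings$resources[i]))
  data.frame(altruism = settings$altruism[i],
             resources = settings$resources[i],
             run = seq_len(n_repeats),
             settled = outcomes)
}))

# Average outcome for each combination of settings
summary_table <- aggregate(settled ~ altruism + resources, data = results, FUN = mean)
print(summary_table)
```

Keeping every individual run in `results` (rather than only the averages) lets you look at the spread of outcomes as well as the mean, which is often where the interesting behaviour is.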

In this notebook we build the model in R. There is an R package, NetLogoR, that translates NetLogo's framework and functions into R; the model below, though, is written in plain R with the igraph network package, following the same pattern as a NetLogo model, in which agents act and then the clock ticks forward. NetLogo comes with its own programming environment; you can check it out [behind this link](https://netlogoweb.org/). You might want to use the web version just to play around with that interface and explore some of the demonstration models. Are there any that might usefully be 'reconceptualized' for archaeological or historical situations? It's often easier to start with an existing model and then modify it for yourself.

NetLogo code is pretty readable. Here's the main code chunk (the `go` procedure) for a model about a virus spreading on a network:

```netlogo
to go
  if all? turtles [not infected?]
    [ stop ]
  ask turtles
  [
     set virus-check-timer virus-check-timer + 1
     if virus-check-timer >= virus-check-frequency
       [ set virus-check-timer 0 ]
  ]
  spread-virus
  do-virus-checks
  tick
end
```

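In that code, `spread-virus` and `do-virus-checks` are procedures defined elsewhere in the model, each calling smaller chunks of code of its own, while `ask`, `set`, and `tick` are built-in NetLogo commands. It's a relatively easy language to get started with, and you can go on to do interesting and complicated things with it! I once wrote a model about the [triggers of civil violence in the Roman world](https://www.digitalstudies.org/article/id/7198/). The argument of the simulation is expressed in its code.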
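Because the rest of this notebook is in R, it may help to see how the logic of that `go` procedure could look there. This is a rough sketch, not a translation of the full NetLogo model: the data frame `turtles`, its columns, and the function `go_step()` are hypothetical, and `spread-virus` and `do-virus-checks` are left out. `ask turtles [...]` becomes an operation on a whole column at once, and `tick` becomes incrementing a counter.

```r
# Hypothetical stand-in for NetLogo's turtles: one row per agent
turtles <- data.frame(
  infected = c(TRUE, FALSE, FALSE),
  virus_check_timer = c(0, 3, 4)
)
virus_check_frequency <- 5
ticks <- 0

# One pass of the go procedure; returns NULL when the model should stop
go_step <- function(turtles, virus_check_frequency) {
  # if all? turtles [not infected?] [ stop ]
  if (all(!turtles$infected)) return(NULL)
  # ask turtles [ set virus-check-timer virus-check-timer + 1 ... ]
  turtles$virus_check_timer <- turtles$virus_check_timer + 1
  # if virus-check-timer >= virus-check-frequency [ set virus-check-timer 0 ]
  turtles$virus_check_timer[turtles$virus_check_timer >= virus_check_frequency] <- 0
  # spread-virus and do-virus-checks would be called here
  turtles
}

turtles <- go_step(turtles, virus_check_frequency)
ticks <- ticks + 1  # tick
print(turtles)
```

NetLogo's `ask` visits turtles one at a time in random order; updating the whole column at once gives the same result here only because each turtle's timer update doesn't depend on any other turtle.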

Now, I didn't want to have you downloading and installing another platform (NetLogo depends on Java, and I just didn't want to face installing that on all of your different machines), so we'll stay in R as I build a simple model. Read it carefully to see what's going on. At the end, I'll suggest some changes you could make to explore what the model might mean.

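The simulation is a simple model of information diffusion along the connections between cities in the Roman Empire: agents travel between cities along the network, and when they meet, an agent carrying a message may pass it on, sometimes in an altered form.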

Before we go any further, what do you think will happen when it runs? Read through it and try to get a sense of what might happen. What might we learn from this model?


In [None]:
install.packages("NetLogoR")

In [None]:
# Load necessary libraries
# If you haven't installed them, run this line once: install.packages(c("readr", "igraph", "ggplot2", "scales"))
library(readr)
library(igraph)
library(ggplot2)
library(scales)

cat("Loading network data...\n")

# Load edge list (connections between nodes)
# Make sure "edges.csv" is in your working directory.
edges <- read_csv("edges.csv")
print("Edge data structure:")
print(head(edges))
print(paste("Number of edges:", nrow(edges)))

# Load node list (settlement information)
# Make sure "nodes-extended.csv" is in your working directory.
nodes <- read_csv("nodes-extended.csv")
print("Node data structure:")
print(head(nodes))
print(paste("Number of nodes:", nrow(nodes)))

In [None]:
cat("\nCreating network object...\n")
# Create igraph network object from the edge list (node coordinates are attached when plotting)
network <- graph_from_data_frame(d = edges, directed = FALSE)

# Initialize network vertex attributes
V(network)$has_been_reached <- FALSE
V(network)$time_reached <- NA_integer_
V(network)$variants_reached <- vector("list", vcount(network))
for(i in 1:vcount(network)) {
  V(network)$variants_reached[[i]] <- integer(0)
}

# Add edge weights (we'll use 'days' as travel time)
E(network)$weight <- edges$days

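# Pre-compute travel times for efficiency. Agents store vertex *indices*, so the
# lookup keys are built from the indices at each end of every edge ("from-to" and
# "to-from"), not from the IDs in edges.csv. (Note that pasting edges[,1] directly
# would not work: read_csv() returns a tibble, and edges[,1] is still a tibble.)
edge_ends <- ends(network, E(network), names = FALSE)
edge_travel_times <- setNames(E(network)$weight, paste(edge_ends[, 1], edge_ends[, 2], sep = "-"))
edge_travel_times_rev <- setNames(E(network)$weight, paste(edge_ends[, 2], edge_ends[, 1], sep = "-"))
all_travel_times <- c(edge_travel_times, edge_travel_times_rev)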

# Print network summary
print("Network summary:")
print(network)
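The travel-time lookup stores each edge's time twice, once per direction, so an agent can look up a journey whichever way it travels. The keys only work if they are built from the same identifiers the agents carry: agents hold vertex *indices* (positions in the graph), while edges.csv identifies nodes by its own IDs, and the two only coincide by accident. Here is a self-contained sketch on a hypothetical three-edge network (the city names and `days` values are invented for illustration) that builds the keys from vertex indices with igraph's `ends()`.

```r
library(igraph)

# Hypothetical mini-network; names and travel days are invented
toy_edges <- data.frame(
  from = c("Roma", "Ostia", "Roma"),
  to   = c("Ostia", "Portus", "Tibur"),
  days = c(1, 0.5, 2)
)
# Extra columns (here, days) become edge attributes
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

# Vertex indices (not names) at each end of every edge
ids <- ends(toy, E(toy), names = FALSE)

# Store each travel time under both "from-to" and "to-from" keys
toy_times <- c(
  setNames(E(toy)$days, paste(ids[, 1], ids[, 2], sep = "-")),
  setNames(E(toy)$days, paste(ids[, 2], ids[, 1], sep = "-"))
)

# Look up a journey using vertex indices, as the agents do
roma <- which(V(toy)$name == "Roma")
tibur <- which(V(toy)$name == "Tibur")
print(toy_times[paste(tibur, roma, sep = "-")])
```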

In [None]:
cat("\nVisualizing network...\n")
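# Use node coordinates from your file for a spatial layout.
# graph_from_data_frame() orders vertices by their first appearance in edges.csv,
# so match each vertex to its row in nodes-extended.csv by ID rather than relying
# on row order. (This assumes the first column of nodes-extended.csv holds the
# same node IDs used in edges.csv.)
layout_coords <- as.matrix(nodes[match(V(network)$name, as.character(nodes[[1]])), c("x", "y")])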

# Create network plot
plot(network,
     vertex.size = 8,
     vertex.color = "lightblue",
     vertex.label = V(network)$name,
     vertex.label.cex = 0.7,
     edge.width = 1,
     edge.color = "gray",
     layout = layout_coords,
     main = "Initial Network Structure")


In [None]:
cat("\nSetting up ABM parameters...\n")
# Model parameters
n_agents <- 101
initial_message <- "Hello World"
mutation_rate <- 0.1
transmission_rate <- 0.3
max_time_steps <- 1000
target_coverage <- 0.95

In [None]:
cat("\nCreating world and agents...\n")
# Create a data frame to represent agents
agents <- data.frame(
  id = 1:n_agents,
  has_message = FALSE,
  variant_id = NA_integer_,
  time_received = NA_integer_,
  current_node = as.numeric(sample(V(network), n_agents, replace = TRUE)),
  target_node = NA_integer_,
  travel_time = 0L,
  stringsAsFactors = FALSE
)

# Initially, agents are stationary. Set their target to their current location.
agents$target_node <- agents$current_node

# Give the first agent the initial message
agents$has_message[1] <- TRUE
agents$variant_id[1] <- 0L
agents$time_received[1] <- 0L

# Update the network to mark the starting node of Agent 1 as reached
initial_node <- agents$current_node[1]
V(network)$has_been_reached[initial_node] <- TRUE
V(network)$time_reached[initial_node] <- 0L
V(network)$variants_reached[[initial_node]] <- c(V(network)$variants_reached[[initial_node]], 0L)

cat("Agent 1 has the initial message (Variant ID: 0)\n")
cat(paste("Node", initial_node, "is the starting point and is marked as reached.\n")) 


In [None]:
cat("\nDefining helper functions...\n")

# Function to handle variant message creation
generate_variant_id <- function(sender_variant_id, current_time_step, rate = 0.1) {
  if (runif(1) < rate) {
    return(current_time_step)
  } else {
    return(sender_variant_id)
  }
}

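# Function to get travel time between nodes efficiently.
# Times are rounded up to whole time steps (minimum 1) because agents count
# down one step at a time: a 0-day journey would never be completed, and
# fractional times would not line up with the step counter.
get_travel_time <- function(from_node, to_node, default_time = 1) {
  key <- paste(from_node, to_node, sep = "-")
  time <- all_travel_times[key]
  if (is.na(time)) {
    return(default_time)
  }
  return(max(1, ceiling(unname(time))))
}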

# Function to handle message transmission between agents
transmit_messages <- function(agents, network, time_step) {
  # Find agents who are stationary (at nodes, not traveling)
  stationary_agents <- which(agents$travel_time == 0)
  
  if (length(stationary_agents) < 2) {
    return(list(agents = agents, network = network))
  }
  
  # Group agents by their current node
  stationary_df <- data.frame(
    agent_idx = stationary_agents,
    node = agents$current_node[stationary_agents],
    has_message = agents$has_message[stationary_agents],
    variant_id = agents$variant_id[stationary_agents]
  )
  
  # Process each node that has multiple agents
  node_groups <- split(stationary_df, stationary_df$node)
  
  for (node_id in names(node_groups)) {
    node_agents <- node_groups[[node_id]]
    
    if (nrow(node_agents) > 1) {
      senders <- node_agents[node_agents$has_message & !is.na(node_agents$variant_id), ]
      receivers <- node_agents[!node_agents$has_message, ]
      
      if (nrow(senders) > 0 && nrow(receivers) > 0) {
        for (s in 1:nrow(senders)) {
          sender_idx <- senders$agent_idx[s]
          sender_variant <- senders$variant_id[s]
          
          for (r in 1:nrow(receivers)) {
            receiver_idx <- receivers$agent_idx[r]
            
            # Attempt transmission
            if (runif(1) < transmission_rate) {
              # Generate variant (might mutate)
              new_variant_id <- generate_variant_id(sender_variant, time_step, mutation_rate)
              
              # Update receiver agent
              agents$has_message[receiver_idx] <- TRUE
              agents$variant_id[receiver_idx] <- new_variant_id
              agents$time_received[receiver_idx] <- time_step
              
              # Update network node tracking
              node_num <- as.numeric(node_id)
              if (!new_variant_id %in% V(network)$variants_reached[[node_num]]) {
                V(network)$variants_reached[[node_num]] <- c(V(network)$variants_reached[[node_num]], new_variant_id)
              }
              
              cat(paste("Time", time_step, ": Agent", receiver_idx, "received variant", new_variant_id, 
                       "at node", node_id, "\n"))
            }
          }
        }
      }
    }
  }
  
  return(list(agents = agents, network = network))
}

# Enhanced function to move agents along the network
move_agents <- function(agents, network, time_step) {
  # Process all agents in a single pass
  for (i in 1:nrow(agents)) {
    if (agents$travel_time[i] > 0) {
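      # Agent is traveling - decrement travel time (clamped at 0 so that
      # fractional travel times still end in an arrival rather than going negative)
      agents$travel_time[i] <- max(0, agents$travel_time[i] - 1)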
      
      # Check if agent has just arrived
      if (agents$travel_time[i] == 0) {
        destination_node_id <- agents$target_node[i]
        agents$current_node[i] <- destination_node_id
        
        # Check for first contact with informed agent
        if (agents$has_message[i] && !V(network)$has_been_reached[destination_node_id]) {
          V(network)$has_been_reached[destination_node_id] <- TRUE
          V(network)$time_reached[destination_node_id] <- time_step
          
          # Add this variant to the node's variant list
          variant_id <- agents$variant_id[i]
          if (!is.na(variant_id) && !variant_id %in% V(network)$variants_reached[[destination_node_id]]) {
            V(network)$variants_reached[[destination_node_id]] <- c(V(network)$variants_reached[[destination_node_id]], variant_id)
          }
          
          cat(paste("!!! Time", time_step, ": Node", destination_node_id, 
                   "reached by informed agent with variant", variant_id, "!\n"))
        }
      }
    } else {
      # Agent is stationary - assign new destination
      current <- agents$current_node[i]
      neighbors_list <- neighbors(network, current)
      
      if (length(neighbors_list) > 0) {
        # Movement rule: each neighbour starts with a weight of 1, plus a slight
        # preference for higher-degree (better-connected) nodes such as major cities
        neighbor_ids <- as.integer(neighbors_list)
        degrees <- degree(network)[neighbor_ids]
        neighbor_weights <- 1 + degrees / max(degree(network))
        
        # Sample target based on weights. Pick a position with sample.int() and look
        # up the ID: calling sample() on a single number x draws from 1:x, which would
        # send an agent with only one neighbour to a random, possibly unconnected node.
        target <- neighbor_ids[sample.int(length(neighbor_ids), 1, prob = neighbor_weights)]
        agents$target_node[i] <- target
        
        # Get travel time efficiently
        travel_time <- get_travel_time(current, target)
        agents$travel_time[i] <- travel_time
      }
    }
  }
  
  return(list(agents = agents, network = network))
}
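Two details of the movement rule are worth seeing in isolation. First, the weights: each neighbour gets 1 plus its degree divided by the network's maximum degree, so weights run from just above 1 up to 2, and the best-connected neighbour is never more than twice as likely to be chosen as any other. Second, a well-known R pitfall: `sample(x, 1)` on a single number `x` draws from `1:x` rather than returning `x`. The helper `pick_neighbour()` below is hypothetical; it sidesteps the pitfall by sampling a position with `sample.int()` and then looking up the ID.

```r
# Hypothetical helper: choose one neighbour, weighting by degree
pick_neighbour <- function(neighbour_ids, neighbour_degrees, max_degree) {
  weights <- 1 + neighbour_degrees / max_degree
  # Sample a position, then look up the ID: safe even with a single neighbour
  neighbour_ids[sample.int(length(neighbour_ids), 1, prob = weights)]
}

set.seed(1)
# The pitfall: with one candidate (node 7), sample() draws from 1:7
print(sample(7, 1))
# The helper always returns node 7 when it is the only neighbour
print(pick_neighbour(7, neighbour_degrees = 3, max_degree = 10))

# Weights for neighbours with degrees 1 and 10 in a network whose max degree is 10
weights <- 1 + c(1, 10) / 10
print(weights / sum(weights))  # choice probabilities
```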

A note for the future: it's worth studying the transmission function and the message mutation function closely. A future, more complicated version of this model would take the distance between nodes into account.
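One way to study the mutation function is to call it many times and check how often it mutates. The sketch below repeats `generate_variant_id()` from the helper-functions cell so that it runs on its own; with `rate = 0.1`, roughly one transmission in ten should return the current time step as a new variant ID. The sender variant `0`, time step `25`, and number of trials are arbitrary choices for illustration. Notice one consequence of this design: two mutations in the same time step receive the same ID, so they are counted as a single variant.

```r
# Copy of generate_variant_id() from the helper-functions cell
generate_variant_id <- function(sender_variant_id, current_time_step, rate = 0.1) {
  if (runif(1) < rate) {
    return(current_time_step)
  } else {
    return(sender_variant_id)
  }
}

set.seed(123)
n_trials <- 10000
# Sender carries variant 0; every transmission happens at time step 25
outcomes <- replicate(n_trials, generate_variant_id(0L, 25L, rate = 0.1))

# Share of transmissions that produced a new variant (should be close to 0.1)
mutation_share <- mean(outcomes == 25L)
print(mutation_share)
```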

In [None]:
cat("\n=== RUNNING SIMULATION ===\n")
# Initialize tracking variables
time_step <- 0L
coverage_history <- numeric()
variant_ids_found <- integer()
nodes_reached_history <- numeric()

# Main simulation loop
repeat {
  time_step <- time_step + 1L
  
  # Move agents
  simulation_state <- move_agents(agents, network, time_step)
  agents <- simulation_state$agents
  network <- simulation_state$network
  
  # Handle message transmission
  transmission_state <- transmit_messages(agents, network, time_step)
  agents <- transmission_state$agents
  network <- transmission_state$network
  
  # Record statistics
  coverage <- sum(agents$has_message) / n_agents
  coverage_history <- c(coverage_history, coverage)
  nodes_reached <- sum(V(network)$has_been_reached)
  nodes_reached_history <- c(nodes_reached_history, nodes_reached)
  
  # Track unique variants
  current_variants <- unique(agents$variant_id[agents$has_message & !is.na(agents$variant_id)])
  variant_ids_found <- unique(c(variant_ids_found, current_variants))
  
  # Progress report
  if (time_step %% 50 == 0) {
    cat(paste("--- Time step:", time_step, "| Coverage:",
             round(coverage * 100, 1), "%",
             "| Nodes Reached:", nodes_reached,
             "| Variants:", length(variant_ids_found), "---\n"))
  }
  
  # Check stopping conditions
  if (coverage >= target_coverage) {
    cat(paste("\n*** SIMULATION COMPLETE: Target coverage of", target_coverage * 100, 
             "% reached at time step:", time_step, "***\n"))
    break
  }
  
  if (time_step >= max_time_steps) {
    cat(paste("\n*** SIMULATION TIMEOUT: Maximum time steps reached. Final coverage:", 
             round(coverage * 100, 1), "% ***\n"))
    break
  }
}

cat("\n*** SIMULATION FINISHED ***\n")

In [None]:
cat("\n\n=== DETAILED SIMULATION RESULTS ===\n")
final_coverage <- sum(agents$has_message) / n_agents
final_nodes_reached <- sum(V(network)$has_been_reached)
total_nodes <- vcount(network)

cat(paste("Final coverage:", round(final_coverage * 100, 1), "% (", sum(agents$has_message), "/", n_agents, "agents )\n"))
cat(paste("Simulation duration:", time_step, "time steps\n"))
cat(paste("Nodes reached:", final_nodes_reached, "/", total_nodes, "(", round(final_nodes_reached/total_nodes * 100, 1), "%)\n"))
cat(paste("Number of unique variants:", length(variant_ids_found), "\n"))

# Variant analysis
cat("\n--- Variant Distribution ---\n")
if (length(variant_ids_found) > 0) {
  variant_counts <- table(agents$variant_id[agents$has_message])
  sorted_variants <- sort(as.numeric(names(variant_counts)))
  
  for (v_id in sorted_variants) {
    count <- variant_counts[as.character(v_id)]
    percentage <- round(count / sum(agents$has_message) * 100, 1)
    
    if (v_id == 0) {
      cat(paste("  - Variant 0 (Original):", count, "agents (", percentage, "%)\n"))
    } else {
      cat(paste("  - Variant", v_id, "(Created at t=", v_id, "):", count, "agents (", percentage, "%)\n"))
    }
  }
} else {
  cat("No messages were transmitted.\n")
}

# Node-level variant analysis
cat("\n--- Node Variant Penetration ---\n")
nodes_with_variants <- which(V(network)$has_been_reached)
if (length(nodes_with_variants) > 0) {
  variant_penetration <- sapply(nodes_with_variants, function(node_idx) {
    length(V(network)$variants_reached[[node_idx]])
  })
  
  cat(paste("Average variants per reached node:", round(mean(variant_penetration), 2), "\n"))
  cat(paste("Max variants in a single node:", max(variant_penetration), "\n"))
  cat(paste("Nodes with multiple variants:", sum(variant_penetration > 1), "\n"))
}

# Time-to-reach statistics
reached_times <- V(network)$time_reached[!is.na(V(network)$time_reached)]
if (length(reached_times) > 0) {
  cat("\n--- Time-to-Reach Statistics ---\n")
  cat(paste("First node reached at time:", min(reached_times), "\n"))
  cat(paste("Last node reached at time:", max(reached_times), "\n"))
  cat(paste("Median time to reach:", median(reached_times), "\n"))
  cat(paste("25th percentile:", quantile(reached_times, 0.25), "\n"))
  cat(paste("75th percentile:", quantile(reached_times, 0.75), "\n"))
}
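The coverage history recorded in the main loop can also answer questions like "how long did it take for half the agents to hear the message?". Because the loop appends one value per time step, starting at step 1, a value's position in the vector is its time step. A minimal sketch: the helper `time_to_coverage()` is hypothetical, and the short `example_history` vector is invented so the block runs on its own; in the notebook you would pass `coverage_history` instead.

```r
# Hypothetical helper: first time step at which coverage reaches a threshold
time_to_coverage <- function(history, threshold) {
  hit <- which(history >= threshold)
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# Invented coverage history for illustration (proportion of agents informed)
example_history <- c(0.01, 0.01, 0.05, 0.12, 0.30, 0.52, 0.75, 0.90, 0.96)

for (threshold in c(0.25, 0.5, 0.95)) {
  cat("Coverage", threshold * 100, "% first reached at time step",
      time_to_coverage(example_history, threshold), "\n")
}
```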


In [None]:
cat("\n=== CREATING VISUALIZATIONS ===\n")

# Plot 1: Coverage over time
coverage_plot <- ggplot(data.frame(time = seq_along(coverage_history), coverage = coverage_history), 
                       aes(x = time, y = coverage)) +
  # `linewidth` replaces `size` for lines (deprecated since ggplot2 3.4.0)
  geom_line(color = "blue", linewidth = 1) +
  geom_hline(yintercept = target_coverage, linetype = "dashed", color = "red") +
  labs(title = "Information Coverage Over Time", 
       x = "Time Steps", 
       y = "Proportion of Agents with Message") +
  scale_y_continuous(labels = percent_format()) +
  theme_minimal()

print(coverage_plot)
ggsave("time-curve.png", plot = coverage_plot)

# Plot 2: Network visualization with variant information
node_colors <- rep("lightgray", vcount(network))
node_colors[V(network)$has_been_reached] <- "lightblue"

# Color nodes by number of variants
if (length(nodes_with_variants) > 0) {
  max_variants <- max(variant_penetration)
  if (max_variants > 1) {
    color_scale <- colorRampPalette(c("lightblue", "red"))(max_variants)
    for (i in seq_along(nodes_with_variants)) {
      node_idx <- nodes_with_variants[i]
      n_variants <- variant_penetration[i]
      node_colors[node_idx] <- color_scale[n_variants]
    }
  }
}

# Save the map to a file. This plot is drawn with igraph's base-R plot() method
# rather than ggplot2, so we can't use ggsave(); instead we open a graphics
# device, choosing pdf() over png() for better quality.
pdf("map-dispersion.pdf", width = 12, height = 8)

plot(network,
     vertex.size = 2,
     vertex.color = node_colors,
     vertex.label = ifelse(V(network)$has_been_reached, 
                          paste0("N", 1:vcount(network)), ""),
     vertex.label.cex = 0.5,
     edge.width = 1,
     edge.color = "gray",
     layout = layout_coords,
     main = "Final Network State\n(Blue = Reached, Red intensity = More variants)")

legend("topright", 
       legend = c("Not reached", "Reached", "Multiple variants"),
       fill = c("lightgray", "lightblue", "red"),
       cex = 0.8)

dev.off()

cat("Network visualization also saved as 'map-dispersion.pdf'\n")


cat("\n=== SIMULATION ANALYSIS COMPLETE ===\n")

## Now...

What does this mean? Do you know of any phenomena in the ancient Roman world that might map onto the simple model of diffusion we created? This model is similar to the [[virus-on-a-network.ipynb]] model, though with a different mechanic in mind when it comes to 'infection'. How might you extend this model to better capture economic change? Social change? Cultural phenomena? Disease?

One thing that springs to mind is that I might want to know what difference the pattern of interconnections makes. I might want to run these dynamics on simple network structures (rings, stars, small worlds, random networks), measure what emerges, and then compare the results to this Roman network. Alternatively, or additionally, maybe the SIR (susceptible, infected, recovered) model of disease infection resembles how ideas spread: people are open to an idea, people are resistant to an idea, people are lit on fire by an idea. Maybe an agent with an 'idea' could be made to wander not randomly, but deliberately seek people out.
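If you wanted to try the comparison of network structures, igraph can generate the simple ones directly. The sketch below builds a ring, a star, a small-world, and a random network of the same size and compares a few structural measures that shape how quickly something can spread. The size of 100 nodes and the rewiring and connection probabilities are arbitrary choices for illustration, not values taken from the Roman network. To run the diffusion model on one of these, you would also need to give its edges travel times (for example, one day each) and supply x/y coordinates for plotting.

```r
library(igraph)

set.seed(2024)
n <- 100  # arbitrary network size for comparison

structures <- list(
  ring        = make_ring(n),
  star        = make_star(n, mode = "undirected"),
  small_world = sample_smallworld(dim = 1, size = n, nei = 2, p = 0.05),
  random      = sample_gnp(n, p = 0.05)
)

# A few measures that shape how quickly something can spread
comparison <- data.frame(
  structure   = names(structures),
  edges       = sapply(structures, ecount),
  mean_degree = sapply(structures, function(g) mean(degree(g))),
  mean_path   = sapply(structures, mean_distance),
  clustering  = sapply(structures, transitivity, type = "global"),
  connected   = sapply(structures, is_connected)
)
print(comparison, row.names = FALSE)
```

Computing the same measures for the Roman `network` object would give you a row to set alongside these.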

...there's no end of trouble you can get into with simulation modeling! If any of this really captures your fancy, go see [Romanowska, Wren, and Crabtree](https://santafeinstitute.github.io/ABMA/), who'll show you how to build models from scratch in NetLogo.