# Agent Based Simulation
An agent-based simulation allows you to model complex phenomena by specifying the behaviours and population characteristics of the agents at a lower level of complexity. If you are interested in how human cooperation gives rise to settlements, you might try to model a small population of humans and give them basic rules of behaviour about how they exploit or share resources. Then you run the model thousands of times, varying the settings so that you have data from multiple runs of every possible combination of settings.

This is called 'sweeping the behaviour space', and it helps you validate or demonstrate that your model is simulating the phenomenon correctly (e.g., that you haven't made a coding error that is affecting any emergent effects you spot). Then you can fit the archaeological evidence against your model to see what the model says about that combination of evidence, i.e., 'I've got data that looks like this; the model reproduces that distribution only when "altruism" is high and "resources" are high, which therefore suggests...'

In this notebook we are building this model using a version of the NetLogo programming language ported to R. NetLogo comes with its own programming environment; you can check it out [behind this link](https://netlogoweb.org/). You might want to select the web version just to play around with that interface and explore some of the demonstration models - are there any that might usefully be 'reconceptualized' for archaeological or historical situations? It's often easier to start with an existing model and then modify it for yourself.

NetLogo code is pretty readable - here's the main code chunk for a model about a spreading virus on a network:

```netlogo
to go
  if all? turtles [not infected?]
    [ stop ]
  ask turtles
  [
     set virus-check-timer virus-check-timer + 1
     if virus-check-timer >= virus-check-frequency
       [ set virus-check-timer 0 ]
  ]
  spread-virus
  do-virus-checks
  tick
end
```

Each one of those commands calls smaller chunks of code. It's a relatively easy language to get started with and then do interesting/complicated things! I wrote a model once about the [triggers of civil violence in the Roman world](https://www.digitalstudies.org/article/id/7198/). The argument of the simulation is expressed in its code. 

Now, I didn't want to have you downloading and installing another platform; we'll use the version of NetLogo for R here as I build a simple model. Read it carefully to see what's going on. At the end of this, I show you how you can set up an experiment. The code here is created in a similar way to the NetLogo example model, '[Virus on a Network](https://netlogoweb.org/launch#https://netlogoweb.org/assets/modelslib/Sample%20Models/Networks/Virus%20on%20a%20Network.nlogo)':

> This model demonstrates the spread of a virus through a network. Although the model is somewhat abstract, one interpretation is that each node represents a computer, and we are modeling the progress of a computer virus (or worm) through this network. Each node may be in one of three states: susceptible, infected, or resistant. In the academic literature such a model is sometimes referred to as an SIR model for epidemics.

Let's imagine it's a network of individuals in a city, infected by the [Plague of Justinian](https://en.wikipedia.org/wiki/Plague_of_Justinian). Examine the code, and then in the experiment section, adjust the parameters to match your idea of that plague's dynamics. What do you change? What is missing? How does the model behave? What breaks? Why? What does the simulation suggest about the events of 541-544?


In [None]:
# 1. preliminaries: gotta import the R libraries we'll use
library(NetLogoR)
library(ggplot2)
library(igraph)
library(tidyr)

## Global Variables

We set some global parameters that will govern our simulation.

In [None]:
# 2. Global parameters
num_nodes <- 200          # Number of people in the network
initial_infected <- 3     # Number of people initially infected
infection_prob <- 0.05    # Probability of transmitting the virus to a neighbor
recovery_prob <- 0.08     # Probability of an infected person recovering in a tick
max_ticks <- 150          # How many time steps to run the simulation
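
One way to read `recovery_prob`: if each infected person recovers with a fixed probability on every tick, the number of ticks they stay infected follows a geometric distribution with mean `1 / recovery_prob`. The sketch below is not part of the model (the helper name `expected_infectious_ticks` is made up for illustration); it just checks that arithmetic against a quick simulation, which may help when you decide what one tick should represent for a historical plague.

```r
# Hypothetical helper: mean number of ticks an agent stays infected when it
# recovers with probability `recovery_prob` on every tick.
expected_infectious_ticks <- function(recovery_prob) {
  1 / recovery_prob
}

recovery_prob <- 0.08
expected_infectious_ticks(recovery_prob)   # 12.5 ticks on average

# Check by simulation: rgeom() counts the failures before the first success,
# so add 1 to include the tick on which recovery happens.
set.seed(42)
simulated_durations <- rgeom(10000, prob = recovery_prob) + 1
mean(simulated_durations)                  # should land close to 12.5
```

With the default of 0.08, an infection lasts about 12.5 ticks on average, so if you decide that a tick is a day, that is roughly a twelve-day illness.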

In [None]:
# 3. Create the agents as a standard R data.frame
agents <- data.frame(
  who = 0:(num_nodes - 1),  # 0-based ID to match igraph/links later
  status = "susceptible",
  color = "blue",
  stringsAsFactors = FALSE
)

# Create a small-world network using igraph
g <- sample_smallworld(dim = 1, size = num_nodes, nei = 4, p = 0.1)

# Manually create the links data frame from the igraph object
# This gives us a two-column data frame of connections
edge_list <- as_edgelist(g, names = FALSE)
links <- as.data.frame(edge_list)
colnames(links) <- c("end1", "end2")

# IMPORTANT: Adjust from igraph's 1-based index to our 0-based 'who' ID
links$end1 <- links$end1 - 1
links$end2 <- links$end2 - 1

# Quick visualization of the initial network structure
plot(g, vertex.label = NA, vertex.size = 5, vertex.color = agents$color)
title("Initial Network Structure")

In [None]:
# 4. Randomly select 'who' IDs to be infected
infected_ids <- sample(agents$who, size = initial_infected)

# Update the status and color for those agents using standard R subsetting
agents$status[agents$who %in% infected_ids] <- "infected"
agents$color[agents$who %in% infected_ids] <- "red"
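
Because `sample()` draws at random, each run of the notebook infects different people. If you want a run you can repeat exactly, you can fix R's random seed first. A minimal sketch, using a toy vector of IDs rather than the real `agents` table (the seed value 541 is arbitrary):

```r
ids <- 0:199   # toy stand-in for agents$who

set.seed(541)
first_draw <- sample(ids, size = 3)

set.seed(541)  # same seed, so the same "random" choice
second_draw <- sample(ids, size = 3)

identical(first_draw, second_draw)  # TRUE
```

To make a whole run repeatable, the seed would need to be set once, before the first random step in the notebook.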

In [None]:
# 5. Define the simulation
go <- function() {
  
  # 1. SPREAD THE VIRUS
  #--------------------
  
  # Get 'who' IDs of currently infected agents
  infected_nodes_who <- agents$who[agents$status == "infected"]
  
  if (length(infected_nodes_who) > 0) {
    # Find all neighbors of the infected agents by querying the 'links' data frame
    neighbor_ids_end1 <- links$end2[links$end1 %in% infected_nodes_who]
    neighbor_ids_end2 <- links$end1[links$end2 %in% infected_nodes_who]
    all_neighbors <- unique(c(neighbor_ids_end1, neighbor_ids_end2))
    
    # Find which of these neighbors are susceptible.
    # match() looks up each neighbor's row so the statuses line up with all_neighbors
    is_susceptible <- agents$status[match(all_neighbors, agents$who)] == "susceptible"
    susceptible_neighbors_who <- all_neighbors[is_susceptible]
    
    if (length(susceptible_neighbors_who) > 0) {
      # Each susceptible neighbor has a chance to become infected
      infection_roll <- runif(length(susceptible_neighbors_who))
      newly_infected_ids <- susceptible_neighbors_who[infection_roll < infection_prob]
      
      # Update the main 'agents' data frame for the newly infected
      if (length(newly_infected_ids) > 0) {
        agents$status[agents$who %in% newly_infected_ids] <<- "infected"
        agents$color[agents$who %in% newly_infected_ids] <<- "red"
      }
    }
  }

  # 2. RECOVERY
  #------------
  
  # All infected nodes (including newly infected ones) have a chance to recover
  all_infected_who <- agents$who[agents$status == "infected"]
  
  if (length(all_infected_who) > 0) {
    recovery_roll <- runif(length(all_infected_who))
    recovered_ids <- all_infected_who[recovery_roll < recovery_prob]
    
    # Update the main 'agents' data frame for the recovered
    if (length(recovered_ids) > 0) {
      agents$status[agents$who %in% recovered_ids] <<- "recovered"
      agents$color[agents$who %in% recovered_ids] <<- "gray"
    }
  }
}
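
Note one modelling choice in `go()`: a susceptible person gets a single infection roll per tick, however many infected neighbours they have. An alternative you might consider is one roll per infected neighbour, so that someone with `k` infected contacts escapes infection only with probability `(1 - p)^k`. The sketch below compares the two; `prob_infected_given_k` is a hypothetical helper, not something the model above uses.

```r
# Hypothetical helper: chance of catching the infection in one tick when each
# of `k` infected neighbours independently transmits with probability `p`.
prob_infected_given_k <- function(k, p) {
  1 - (1 - p)^k
}

infection_prob <- 0.05
exposures <- c(1, 2, 4, 8)   # number of infected neighbours

comparison <- data.frame(
  infected_neighbours = exposures,
  single_roll = infection_prob,   # what go() does: one roll regardless of k
  roll_per_neighbour = prob_infected_given_k(exposures, infection_prob)
)
comparison
```

In this small-world network each person starts with roughly eight contacts (`nei = 4` on either side), so the gap between the two rules widens as an infection fills up a neighbourhood. Which rule is more plausible for a historical plague is a question about the disease, not the code.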

In [None]:
# 6. RUN THE SIMULATION
# Prepare a dataframe to store the results
results <- data.frame(tick = 0:max_ticks, S = 0, I = 0, R = 0)

# Record initial state
results[1, "S"] <- sum(agents$status == "susceptible")
results[1, "I"] <- sum(agents$status == "infected")
results[1, "R"] <- sum(agents$status == "recovered")

# Run the simulation loop
for (tick in 1:max_ticks) {
  go() # Run one step
  
  # Record the new counts
  results[tick + 1, "S"] <- sum(agents$status == "susceptible")
  results[tick + 1, "I"] <- sum(agents$status == "infected")
  results[tick + 1, "R"] <- sum(agents$status == "recovered")
  
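  # Stop if the infection has died out
  if (results[tick + 1, "I"] == 0) {
    # Carry the final counts forward to any remaining ticks
    if (tick < max_ticks) {
      remaining_rows <- (tick + 2):(max_ticks + 1)
      results[remaining_rows, c("S", "I", "R")] <- results[tick + 1, c("S", "I", "R")]
    }
    break
  }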
}
head(results)

In [None]:
# 7. VISUALIZE THE RESULTS

# 1. Plot the final state of the network
plot(g, 
     vertex.label = NA, 
     vertex.size = 5, 
     vertex.color = agents$color) # Use the final colors from our data frame
title("Final Network State")
legend("bottomleft", 
       legend = c("Susceptible", "Infected", "Recovered"), 
       fill = c("blue", "red", "gray"), 
       bty = "n")

# 2. Plot the epidemic curve (S-I-R over time)
results_long <- pivot_longer(
  results,
  cols = c("S", "I", "R"),
  names_to = "Status",
  values_to = "Count"
)

ggplot(results_long, aes(x = tick, y = Count, color = Status, group = Status)) +
  geom_line(linewidth = 1.2) +
  scale_color_manual(values = c("S" = "blue", "I" = "red", "R" = "gray")) +
  labs(
    title = "SIR Epidemic Curve on a Network",
    x = "Time (Ticks)",
    y = "Number of Individuals",
    color = "Status"
  ) +
  theme_minimal()

# Doing an experiment

We don't generally just run a simulation once. In order for us to explore the different situations under which the virus will spread, we need to take all of that code we just defined and move it into a single function we can run with different parameters.

In [None]:
# we use dplyr for data manipulation
library(dplyr)

# This function runs ONE full simulation and returns the results.
run_sir_network_simulation <- function(num_nodes = 200,   ## you might want to change this later
                                       initial_infected = 3,  ## you might want to change this later
                                       infection_prob = 0.05, 
                                       recovery_prob = 0.08, 
                                       max_ticks = 150) {
  
  # --- Setup within the function ---
  agents <- data.frame(
    who = 0:(num_nodes - 1),
    status = "susceptible",
    color = "blue",
    stringsAsFactors = FALSE
  )
  
  g <- sample_smallworld(dim = 1, size = num_nodes, nei = 4, p = 0.1)
  edge_list <- as_edgelist(g, names = FALSE)
  links <- as.data.frame(edge_list)
  colnames(links) <- c("end1", "end2")
  links$end1 <- links$end1 - 1
  links$end2 <- links$end2 - 1
  
  # --- Initialization ---
  infected_ids <- sample(agents$who, size = initial_infected)
  agents$status[agents$who %in% infected_ids] <- "infected"
  agents$color[agents$who %in% infected_ids] <- "red"
  
  # --- "Go" function (defined inside to access agents/links) ---
  go <- function() {
    # (The go function logic remains the same as before)
    infected_nodes_who <- agents$who[agents$status == "infected"]
    if (length(infected_nodes_who) > 0) {
      neighbor_ids_end1 <- links$end2[links$end1 %in% infected_nodes_who]
      neighbor_ids_end2 <- links$end1[links$end2 %in% infected_nodes_who]
      all_neighbors <- unique(c(neighbor_ids_end1, neighbor_ids_end2))
      # match() keeps the statuses in the same order as all_neighbors
      is_susceptible <- agents$status[match(all_neighbors, agents$who)] == "susceptible"
      susceptible_neighbors_who <- all_neighbors[is_susceptible]
      if (length(susceptible_neighbors_who) > 0) {
        infection_roll <- runif(length(susceptible_neighbors_who))
        newly_infected_ids <- susceptible_neighbors_who[infection_roll < infection_prob]
        if (length(newly_infected_ids) > 0) {
          agents$status[agents$who %in% newly_infected_ids] <<- "infected"
          agents$color[agents$who %in% newly_infected_ids] <<- "red"
        }
      }
    }
    all_infected_who <- agents$who[agents$status == "infected"]
    if (length(all_infected_who) > 0) {
      recovery_roll <- runif(length(all_infected_who))
      recovered_ids <- all_infected_who[recovery_roll < recovery_prob]
      if (length(recovered_ids) > 0) {
        agents$status[agents$who %in% recovered_ids] <<- "recovered"
        agents$color[agents$who %in% recovered_ids] <<- "gray"
      }
    }
  }
  
  # --- Simulation Loop ---
  results <- data.frame(tick = 0:max_ticks, S = 0, I = 0, R = 0)
  results[1, "S"] <- sum(agents$status == "susceptible")
  results[1, "I"] <- sum(agents$status == "infected")
  results[1, "R"] <- sum(agents$status == "recovered")
  
  for (tick in 1:max_ticks) {
    go()
    results[tick + 1, "S"] <- sum(agents$status == "susceptible")
    results[tick + 1, "I"] <- sum(agents$status == "infected")
    results[tick + 1, "R"] <- sum(agents$status == "recovered")
    if (results[tick + 1, "I"] == 0) {
      # Carry the final counts forward to any remaining ticks
      if (tick < max_ticks) {
        remaining_rows <- (tick + 2):(max_ticks + 1)
        results[remaining_rows, c("S", "I", "R")] <- results[tick + 1, c("S", "I", "R")]
      }
      break
    }
  }
  
  return(results)
}

In [None]:
# This function runs experiments with different parameter sets, n_runs times each.
run_experiments <- function(parameter_sets, n_runs) {
  
  # A list to store all results
  all_results <- list()
  
  # Loop through each row of the parameter data frame
  for (i in seq_len(nrow(parameter_sets))) {
    params <- parameter_sets[i, ]
    
    cat(paste0("Running setting ", i, ": infection_prob = ", params$infection_prob, 
               ", recovery_prob = ", params$recovery_prob, "...\n"))
    
    # Run the simulation n_runs times for this parameter set
    for (run in 1:n_runs) {
      
      single_run_results <- run_sir_network_simulation(
        infection_prob = params$infection_prob,
        recovery_prob = params$recovery_prob,
        max_ticks = params$max_ticks
        # You can add other parameters here if you vary them
      )
      
      # Add columns to identify this specific run and its parameters
      single_run_results$run_id <- run
      single_run_results$infection_prob <- params$infection_prob
      single_run_results$recovery_prob <- params$recovery_prob
      
      all_results[[length(all_results) + 1]] <- single_run_results
    }
  }
  
  # Combine the list of data frames into one big data frame
  return(dplyr::bind_rows(all_results))
}

In [None]:
# This function visualizes the aggregated results from the experiments
plot_experiment_results <- function(experiment_data) {
  
  # Reshape data to long format for ggplot
  data_long <- tidyr::pivot_longer(
    experiment_data,
    cols = c("S", "I", "R"),
    names_to = "Status",
    values_to = "Count"
  )
  
  # Calculate summary statistics (mean and sd) for each tick, status, and parameter set
  summary_data <- data_long %>%
    group_by(tick, Status, infection_prob, recovery_prob) %>%
    summarise(
      mean_count = mean(Count),
      sd_count = sd(Count),
      .groups = "drop"
    ) %>%
    # Calculate the upper and lower bounds for the ribbon
    mutate(
      ribbon_min = pmax(0, mean_count - sd_count), # prevent negative counts
      ribbon_max = mean_count + sd_count
    )

  # Create the plot
  ggplot(summary_data, aes(x = tick, group = Status, color = Status, fill = Status)) +
    # The ribbon for variability (mean +/- 1 standard deviation)
    geom_ribbon(aes(ymin = ribbon_min, ymax = ribbon_max), alpha = 0.2, linetype = 0) +
    # The line for the mean trend
    geom_line(aes(y = mean_count), linewidth = 1) +
    # Facet by the parameters you varied
    facet_grid(infection_prob ~ recovery_prob, labeller = label_both) +
    scale_color_manual(values = c("S" = "blue", "I" = "red", "R" = "gray")) +
    scale_fill_manual(values = c("S" = "blue", "I" = "red", "R" = "gray")) +
    labs(
      title = "Aggregated SIR Simulation Results",
      subtitle = "Lines show mean count; ribbons show ±1 standard deviation across runs",
      x = "Time (Ticks)",
      y = "Average Number of Individuals"
    ) +
    theme_minimal() +
    theme(legend.position = "bottom")
}

## And now we define and run the experiment!

Run this, then read the Wikipedia article about [the Plague of Justinian](https://en.wikipedia.org/wiki/Plague_of_Justinian) and make your observations in a new markdown document.

BTW, if you want to change the population size or the number of people initially infected, go up to the cell where `run_sir_network_simulation` is defined and change the defaults for `num_nodes` and `initial_infected`, if you decide that those variables matter. Whatever you set there will stay constant for every run you set in this next block.

In [None]:
# 1. Define the parameter sets you want to test
# expand.grid is great for creating all combinations of parameters
parameter_sets <- expand.grid(
  infection_prob = c(0.02, 0.05, 0.10), # i.e., first 2%, then 5%, then 10%
  recovery_prob = c(0.02), # Keeping recovery constant for this experiment
  max_ticks = 500
)

# 

# 2. Run the experiments (e.g., 20 runs for each setting)
# This might take a moment to run.
experiment_data <- run_experiments(parameter_sets = parameter_sets, n_runs = 20)

# 3. Visualize the results
plot_experiment_results(experiment_data)
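
The faceted plot shows averages, but it can also help to reduce each run to a couple of numbers: the peak number of people infected at once, and the share of the population ever infected by the end. A minimal sketch in base R follows. It assumes a data frame with the same columns as `experiment_data` (`tick`, `S`, `I`, `R`, `run_id`, `infection_prob`, `recovery_prob`); `summarise_runs` is a made-up name, and the tiny `toy` table is invented purely to show the shape of the output, not real results.

```r
# Hypothetical helper: one row per run with its peak infections and the
# share of the population that was ever infected (final I plus final R).
summarise_runs <- function(experiment_data) {
  groups <- split(
    experiment_data,
    list(experiment_data$infection_prob,
         experiment_data$recovery_prob,
         experiment_data$run_id),
    drop = TRUE
  )
  rows <- lapply(groups, function(run) {
    last <- run[which.max(run$tick), ]          # final tick of this run
    population <- last$S + last$I + last$R
    data.frame(
      infection_prob = run$infection_prob[1],
      recovery_prob = run$recovery_prob[1],
      run_id = run$run_id[1],
      peak_infected = max(run$I),
      ever_infected_share = (last$I + last$R) / population
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Invented toy data: two short runs of a 10-person population.
toy <- data.frame(
  tick = rep(0:2, times = 2),
  S = c(9, 7, 6, 9, 9, 9),
  I = c(1, 3, 2, 1, 0, 0),
  R = c(0, 0, 2, 0, 1, 1),
  run_id = rep(1:2, each = 3),
  infection_prob = 0.05,
  recovery_prob = 0.02
)

run_summary <- summarise_runs(toy)
run_summary
```

On the real output you would call `summarise_runs(experiment_data)` and then compare, say, how `ever_infected_share` varies across the three infection probabilities.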

Ok - now here's a more complicated ABM: [[abm-on-roman-network.ipynb]], which uses a similar idea of 'contagion', though implemented differently, on top of the Roman communications network we looked at earlier. And for something rather different, try the [[abm-foraging.ipynb]] simulation, which, in a way, is about the emergence of social networks in the first place, as a way of fostering cooperation and survival.