## Welcome to the Second Lab - Week 1, Day 3

Today we will work with lots of models! This is a way to get comfortable with APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [3]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [4]:
# Always remember to do this!
load_dotenv(override=True)

True

In [5]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
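
The repeated if/else checks above can be collapsed into a loop. Here is a minimal sketch, assuming the same environment variable names and prefix lengths as the cell above; `KEYS` and `describe_key` are hypothetical names introduced only for illustration.

```python
import os

# (label, environment variable, prefix length to show, optional?) - copied from the cell above
KEYS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]


def describe_key(label, value, prefix_len, optional):
    """Return a status line that reveals only the first few characters of a key."""
    if value:
        return f"{label} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{label} API Key not set{suffix}"


for label, env_var, prefix_len, optional in KEYS:
    print(describe_key(label, os.getenv(env_var), prefix_len, optional))
```

Printing only a short prefix is deliberate: it is enough to spot a wrong or missing key without leaking the secret into the notebook output.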


In [6]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [7]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You are an AI advisor tasked by the mayor of a mid-sized city (population ~500,000) to design a five-year plan to reduce annual traffic-related deaths by 50% within a fixed budget of $200 million, while minimizing displacement of harm to vulnerable populations and preserving mobility for people who rely on cars. Provide all of the following in your answer: (a) a prioritized list of specific interventions (engineering, policy, enforcement, education, technology, or combinations) with brief explanations of how each reduces fatalities, estimated cost per intervention, expected reduction in annual deaths (with uncertainty ranges and a confidence level for each estimate), and time-to-impact; (b) a practical data-collection and evaluation plan that would allow the city to measure impact and attribute causality (including at least one feasible randomized or quasi-experimental design and what data would be needed), plus key metrics to track; (c) the main ethical trade-offs and distributional i

In [8]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]
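
`competitors` and `answers` are parallel lists: each model cell appends one model name and that model's answer, so position `i` in one list lines up with position `i` in the other. A small sketch of reading them back together, using placeholder strings (not real model output) and `demo_` names so nothing in the lab is overwritten:

```python
# Placeholder data standing in for what the model cells append (not real output)
demo_competitors = ["model-a", "model-b"]
demo_answers = ["First answer", "Second answer"]

# zip pairs each model name with its answer; enumerate numbers them from 1
lines = []
for index, (competitor, answer) in enumerate(zip(demo_competitors, demo_answers), start=1):
    lines.append(f"Competitor {index}: {competitor} -> {answer}")

print("\n".join(lines))
```

This only works if every cell appends to both lists, which is why each model cell ends with the same two `append` calls.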

## Note - update since the videos

I've updated the model names to use the latest models below, like GPT 5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow - like 1-2 minutes - but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.
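
If you want to see how slow a model really is before deciding whether to swap it, you can time the call. A minimal sketch, assuming a hypothetical `timed` helper and a fake stand-in call so it runs without an API key:

```python
import time


def timed(call, *args, **kwargs):
    """Run call(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in for a slow API call (hypothetical; replace with a real client call)
def fake_model_call(prompt):
    time.sleep(0.05)
    return f"echo: {prompt}"


result, elapsed = timed(fake_model_call, "hello")
print(f"{result!r} took {elapsed:.2f}s")
```

With a real client you could pass the API method itself, for example `timed(openai.chat.completions.create, model=model_name, messages=messages)`.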

In [9]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Below is a five-year plan designed for a mid-sized city (~500,000 people) to cut annual traffic-related deaths by 50% within a $200 million budget, while limiting harm to vulnerable groups and preserving mobility for car users. The plan is organized into (a) interventions with costs and expected impacts, (b) data collection and evaluation to attribute causality, (c) ethical and equity considerations with concrete mitigations, (d) three alternative strategies if the budget drops to $100 million, and (e) explicit assumptions, major uncertainties, and ambiguities relevant to recommendations.

Important note on baseline and targets
- Assumed baseline annual traffic fatalities (citywide): 60 per year. This is a plausible mid-range for a city of this size in the U.S. when considering all fatalities (drivers, pedestrians, bicyclists, etc.). If your city’s current data differ markedly, please adjust the numbers below accordingly.
- Target by year 5: about 30 fatalities per year (50% reduction from 60).
- The plan considers overlapping effects: many interventions reduce different crash types (speed-related, intersection crashes, pedestrian–vehicle conflicts, etc.). Full realization of benefits may require coordinated deployment and time to change behaviors and speeds.

(a) Prioritized interventions (engineering, policy, enforcement, education, technology, or combinations)
Note: All cost figures are five-year program costs (not annual), unless stated otherwise. All death-reduction estimates are annual, with uncertainty ranges and a confidence level. Time-to-impact describes when most of the effect is expected to be realized.

1) Speed management citywide + automated speed enforcement (high-priority)
- What this is: Lower default speeds on many corridors (including near schools and high-pedestrian activity areas), enhanced road design for safe speeds (traffic calming where appropriate), and automated speed enforcement (speed cameras) with clear enforcement presence.
- Estimated total cost: $70–85 million.
- How it reduces fatalities: Speed is a dominant driver of fatal crash risk. Reducing speeds lowers crash energy, increases the likelihood of surviving crashes, and reduces crash frequency in many settings.
- Expected annual deaths reduced: 15–24 deaths per year (roughly 25–40% of baseline 60 deaths).
- Uncertainty (range) and confidence: 90% credible interval roughly 10–28 deaths avoided per year; confidence level: High to Moderate-High (well-supported by multiple urban studies and meta-analyses, though city-specific effects depend on enforcement intensity and driver compliance).
- Time-to-impact: Early implementation can yield measurable effects in year 1–2 (with substantial effect by year 3–4 as speeds respond and enforcement routines mature); full effect by year 4–5.
- Rationale for priority status: Most cost-effective lever with broad potential impact across many crash types, including fatal and severe injury crashes, while supporting “car-mobility” through safer speeds.

2) Targeted high-crash intersection safety upgrades (engineering + signal optimization)
- What this is: Redesign and convert 25 of the city’s highest-crash intersections to safer configurations (protected turning lanes, pedestrian refuge islands, curb extensions where appropriate, better lighting, optimized signal timing), with possible small-scale geometric changes.
- Estimated total cost: $50 million (range $45–60M depending on site complexity).
- How it reduces fatalities: Conflicts at intersections are a leading source of severe injuries and fatalities; reducing crash angles and improving pedestrian crossing improves survivability for all modes.
- Expected annual deaths reduced: 8–20 deaths per year (roughly 13–33% of baseline).
- Uncertainty and confidence: 90% credible interval about 4–14 deaths avoided per year; confidence level: High (robust evidence base for intersection safety, with well-documented cost ranges and strong local impact potential).
- Time-to-impact: Installations completed over 1–3 years; safety benefits begin as construction completes, with ongoing gains as drivers adjust.
- Rationale: Complements speed management; targeted investments in high-risk locales yield outsized benefits with relatively contained costs.

3) Pedestrian and cycling network improvements (protected bike lanes + school-route safety) with multimodal streetscape upgrades
- What this is: 20 miles of protected or semi-protected bike lanes, along with safer pedestrian crossings (refuges, curb extensions, audible/visual cues), enhanced lighting, and improvements on primary school routes.
- Estimated total cost: $40 million (range $35–45M).
- How it reduces fatalities: Directly reduces pedestrian–vehicle and cyclist–vehicle conflicts, particularly for children and older adults; improves visibility and predictability for all road users.
- Expected annual deaths reduced: 3–10 deaths per year (roughly 5–17% of baseline).
- Uncertainty and confidence: 90% credible interval 1–12 deaths avoided per year; confidence level: Moderate (strong qualitative evidence, with greater uncertainty about exact magnitudes city-to-city due to behavior responses and mode shares).
- Time-to-impact: 2–3 years to complete; benefits accrue as new facilities are used more frequently.
- Rationale: Supports vulnerable users while preserving mobility options; co-benefits include encouraging walking/biking and potentially reducing driving for short trips.

4) Transit priority and multi-modal improvements (traffic signal priority for buses, bus lanes, better stops)
- What this is: Implement bus rapid transit–style priority on major corridors, add a few miles of bus lanes where feasible, improve stops, and optimize signal timings to favor transit flow.
- Estimated total cost: $15–25 million.
- How it reduces fatalities: Helps shift trips from cars to transit, reducing vehicle exposure and congestion; smoother traffic flow can reduce aggressive driving and speed variability in corridors with high transit use; improves safety for pedestrians and cyclists at bus stops.
- Expected annual deaths reduced: 2–7 deaths per year (roughly 3–12% of baseline).
- Uncertainty and confidence: 90% credible interval 0–12 deaths avoided per year; confidence level: Moderate (benefits accrue with ridership shifts and queueing changes; direct safety benefits are smaller than speed or intersection interventions but still meaningful in aggregate).
- Time-to-impact: 2–4 years for deployment and observed behavior changes; full effects may take longer as transit usage patterns shift.
- Rationale: Complements the safety interventions by offering alternatives to car travel and reducing local traffic volumes on key corridors.

5) Education/awareness + data-driven governance (policy + outreach)
- What this is: Public education campaigns about safe speeds, pedestrian/cyclist etiquette, school-zone rules; transparency in enforcement; engaging communities; and a robust data-management program to monitor progress and guide decisions.
- Estimated total cost: $5–8 million.
- How it reduces fatalities: Increases voluntary compliance and supports sustainable behavioral change; drives acceptance and proper use of newly engineered safety features.
- Expected annual deaths reduced: 0.6–2 deaths per year (roughly 1–3% of baseline).
- Uncertainty and confidence: 90% credible interval 0–5 deaths avoided per year; confidence level: Low to Moderate (effects are incremental and mediated by other safety investments; difficult to isolate effects without a strong evaluation).
- Time-to-impact: Immediate to 2 years for awareness effects; synergy with other interventions improves overall effectiveness.
- Rationale: Low cost with cross-cutting benefits; helps equity and legitimacy of the plan.

Total budget fit and overall ambition
- Combined plan cost: roughly $180–210 million, depending on site conditions and exact equipment/pricing. With a careful procurement strategy and phased roll-out, the full plan is achievable within a $200 million budget.
- Cumulative impact trajectory: Early gains from speed management and intersection safety could contribute substantially in year 1–2; gains from bike/pedestrian networks and transit priority accrue in years 2–4; education/data governance supports all components and helps sustain reductions. Collectively, a credible path exists to reach roughly a 50% reduction in annual fatalities by year 4–5, within the stated budget, assuming high implementation fidelity and consistent enforcement.
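
The combined cost quoted above can be checked with simple arithmetic. A short sketch, using the five cost estimates as listed (the point estimate where one is given, otherwise the range); the variable names are illustrative.

```python
# (low, high) five-year cost in $ millions, copied from interventions 1-5 above
costs = {
    "speed management": (70, 85),
    "intersection upgrades": (50, 50),   # point estimate; stated range is 45-60
    "pedestrian and cycling": (40, 40),  # point estimate; stated range is 35-45
    "transit priority": (15, 25),
    "education and data": (5, 8),
}

low_total = sum(low for low, _ in costs.values())
high_total = sum(high for _, high in costs.values())
print(f"Combined cost: ${low_total}-{high_total} million")
```

Swapping in the full stated ranges for interventions 2 and 3 widens the total to $170-223 million, so the upper end can exceed the $200 million cap unless the rollout is phased or scoped down.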

(b) Data collection and evaluation plan (causality-focused)
Goals:
- Measure crash outcomes and attribute observed changes to specific interventions, while accounting for exposure and other confounders.
- Track equity impacts (how harms and benefits distribute across income groups, races/ethnicities, age groups, and neighborhoods).

Design and data approach:
- Primary design: Stepped-wedge randomized rollout (preferred) or quasi-experimental difference-in-differences (DiD) with matched controls.
  - If feasible, randomize the order in which neighborhoods or corridors receive A–E interventions (or pair A and B units first, then roll out to C, D, E on a schedule). This creates internal validity for causal inference.
  - If randomization is infeasible due to political/operational constraints, use a matched control design: identify comparable neighborhoods/corridors that will not receive the interventions at first, then use DiD to estimate treatment effects as interventions are rolled out.
- Control/comparator options: Comparable mid-size cities or neighborhoods within the city not yet treated; synthetic control methods using metro- or regional data to emulate a counterfactual crash trend if a perfect internal control does not exist.
- Key data needs (sources and updates):
  - Police-reported crashes: geocoded, with crash type, severity (fatal, severe injury, minor injury), road user type, time, weather, lighting; standardized coding for “cause” where available.
  - Hospital/EMS data: for severe injuries not captured in police data; linkage to crashes via anonymized identifiers or probabilistic matching (with privacy protections).
  - Exposure data: vehicle miles traveled (VMT) by corridor/zone, pedestrian volumes, bike counts, transit ridership, and city-wide traffic volumes; annualized and seasonally adjusted.
  - Speed data: 85th percentile speeds and average speeds from road sensors, speed trailers, or mobile data; pre/post comparisons in treated vs control areas.
  - Enforcement data: hours deployed, tickets issued, camera performance, and maintenance logs.
  - Implementation data: exact dates of interventions, scope of work, and fidelity metrics (e.g., percent of planned intersections completed on schedule).
  - Demographic and equity data: neighborhood income, race/ethnicity, age distributions; poverty rates; access to transit; access to vehicles.
- Outcome measures:
  - Primary: fatal crashes per year and per 100,000 population citywide and by neighborhood.
  - Secondary: severe injuries (e.g., disabling injuries), total crashes, crash severity mix, pedestrian/bicycle fatalities, and mode-specific injury/fatality rates.
  - Process/equity metrics: speed compliance rates, proportion of high-crash corridors with safety improvements, share of benefits accruing to historically disadvantaged neighborhoods, changes in mode share (car vs. transit vs. walking/biking), access to essential services.
- Analysis approach:
  - Difference-in-differences with appropriate fixed effects and controls for weather, seasonality, economic activity, and exposure changes (VMT, pedestrian volumes).
  - Event-study analyses to examine pre- and post-intervention trends.
  - Bayesian hierarchical models to synthesize information across corridors and provide probabilistic estimates with uncertainty intervals.
  - Sensitivity analyses to test robustness to unobserved confounding and spillover effects to neighboring corridors.
- Implementation and governance:
  - Create a centralized data dashboard and data-sharing agreements among the police department, fire/EMS, public health, transportation, and hospital systems.
  - Establish an independent evaluation advisory panel including academic partners, community representatives, and civil rights/advocacy groups.
  - Data privacy: anonymize or de-identify individual crash reports; limit access to sensitive data; publish annual, aggregated results to the public.
- Key milestones and metrics:
  - Baseline year: compile 3–5 years of pre-intervention crash data.
  - Annual reporting: fatal crashes, severe injuries, total crashes, exposure-adjusted rates, speed metrics, and equity indicators.
  - Causal attribution: estimated treatment effects with CIs; documentation of attribution assumptions and sensitivity analyses.
  - Process metrics: implementation fidelity (e.g., % of planned lanes completed, % intersections upgraded on schedule), enforcement intensity, transit priority operations uptime.
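
The difference-in-differences idea in the analysis approach above boils down to a subtraction of subtractions. A toy sketch with invented crash counts (purely illustrative, not city data); `did_estimate` is a hypothetical helper name.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in treated minus change in control."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Invented annual fatal-crash counts, for illustration only
treated_pre, treated_post = 20, 12   # corridors that received the intervention
control_pre, control_post = 18, 16   # comparable corridors not yet treated

effect = did_estimate(treated_pre, treated_post, control_pre, control_post)
print(f"Estimated effect: {effect} fatal crashes per year")
```

The estimate is only meaningful if treated and control corridors would have followed similar trends without the intervention, which is what the event-study check on pre-intervention trends is meant to probe.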

(c) Ethical trade-offs and distributional impacts; mitigation measures
Ethical considerations and potential harms:
- Enforcement equity: Increased enforcement can disproportionately affect marginalized communities if not designed and implemented with transparency and oversight.
- Gentrification/shifted burdens: Safety improvements may influence land use and traffic patterns, potentially displacing burdens or benefits.
- Access and mobility: Some safety measures may constrain car movement if poorly designed or poorly communicated; ensuring equitable access to mobility is essential.
- Privacy and civil liberties: Data collection must protect individual privacy and avoid surveillance creep.

Mitigation measures and concrete actions:
- Equity-first design: Prioritize safety improvements in historically high-risk and underserved neighborhoods; seat a community advisory board to review projects; set explicit equity objectives (e.g., reduce fatality rates in low-income/patient-vulnerable groups by a targeted amount).
- Inclusive engagement: Hold accessible, multilingual public meetings; provide safe routes to school maps; gather input from community organizations representing pedestrians, bus riders, cyclists, and drivers.
- Oversight and transparency: An independent safety and civil rights review board to monitor enforcement practices, camera deployments, and equity outcomes; publish annual equity dashboards showing who benefits.
- Enforcement reform and training: Racial-bias and de-escalation training for enforcement personnel; require transparent enforcement metrics (e.g., enforcement days, citations by neighborhood) and quarterly public reports.
- Mobility protections for car users: Maintain reasonable travel options for essential trips; preserve parking access on non-safety-critical streets; ensure detours/roadworks don’t disproportionately burden essential workers.
- Privacy protections: Anonymization of crash data; limit geospatial detail in public reports to protect individual privacy where appropriate.
- Job and economic support: If road work causes temporary disruption, offer local hiring, traffic management jobs to residents of affected neighborhoods, and clear communications about construction timelines and alternative routes.

(d) Three alternative prioritized strategies if the budget is reduced to $100 million (and how outcomes would likely change)
If only $100M is available, prioritize strategies with the strongest evidence-to-cost ratios and ensure equity is still central. Three plausible options:

Strategy A: Targeted speed management + high-crash intersections in the top corridors (the “core safety package”)
- Estimated cost: about $90–100M (reduced scale of corridor coverage and some intersections).
- Expected impact: Comparable to the lower-to-mid range of the original plan for fatality reductions in treated corridors; citywide impact might fall short of 50% unless spillover benefits are large. Estimated annual deaths reduced citywide: ~15–25 in early years (depending on coverage).
- Time-to-impact: Partial benefits in year 1–2; stronger impact by year 3–4.
- Equity considerations: Focus on high-risk neighborhoods and school routes to maximize safety gains where they are most needed.
- Likelihood of achieving 50% citywide reduction: Lower than the $200M plan unless safety improvements yield indirect benefits beyond treated corridors; more sensitivity to enforcement intensity and maintenance.

Strategy B: School-zone pedestrian safety focus + targeted transit-priority where feasible
- Estimated cost: about $90–100M (major investments around school routes plus a few transit-priority corridors).
- Expected impact: Reduction concentrated around school routes and high pedestrian activity; citywide fatality reduction might be moderate (roughly 15–30%), with strong equity benefits for children and older adults who walk to school.
- Time-to-impact: 2–4 years to full effect.
- Equity considerations: Very positive for children, families, and low-income neighborhoods with high reliance on walking to school.
- Likelihood of achieving 50% citywide reduction: Unlikely on a $100M budget unless there are additional external funds or large behavioral changes elsewhere.

Strategy C: Transit-first approach with limited safety investments + external funding leverage
- Estimated cost: about $90–100M (bus-priority corridors, a few bus lanes, and limited safety work).
- Expected impact: Moderate citywide fatality reductions if mode shift away from driving is achieved; strongest benefits where transit use displaces car trips along major corridors.
- Time-to-impact: 3–5 years for substantial mode-shift and safety benefits.
- Equity considerations: Good for transit-dependent residents, but safety effects may be less pronounced than with direct speed/intersection improvements.
- Likelihood of achieving 50% citywide reduction: Low to moderate; highly contingent on ridership gains and remaining car use.

Note: All three options assume that local politics allow efficient procurement, and that external grants or private/public partnerships could supplement the city’s funds in some components. In all cases, maintaining a robust evaluation design is essential to understand actual effects given the smaller scope.

(e) Assumptions, top uncertainties, and ambiguities in the prompt
Explicit assumptions made in these recommendations:
- Baseline fatalities: 60 per year citywide. If your city’s current fatality count is different, scale the required effect and adjust intervention sizing accordingly.
- Effectiveness of interventions: Each intervention’s effect size is estimated from a combination of urban safety literature, experience from comparable cities, and plausible city-specific responses. Actual effects depend on enforcement intensity, public acceptance, and local contexts.
- Cost estimates: Five-year capital and operating costs are approximate ranges based on typical costs in similar cities; real costs depend on land use, right-of-way, labor costs, local bidding, permitting, and maintenance budgets.
- Time-to-impact: Assumes phased implementation with adequate staffing and contractor capacity; significant delays could push benefits later.
- Implementation fidelity: Assumes proper project management, stakeholder engagement, and timely completion of safety upgrades and enforcement programs.

Top three sources of uncertainty:
- Behavioral response and enforcement efficacy: The magnitude of speed reduction, compliance with new speed limits, and the degree to which drivers and pedestrians adapt to changes is inherently uncertain and sensitive to enforcement and social norms.
- Baseline crash patterns and spillover effects: If drivers reroute to other corridors or neighborhoods in response to safety changes, citywide effects could be attenuated or, in some cases, enhanced in unintended ways.
- External factors and trends: Economic conditions, fuel prices, weather patterns, ride-sharing adoption, and regional traffic changes could confound pre-post comparisons and complicate attribution.

Ambiguities in the prompt that materially affect recommendations:
- The exact current fatality and injury burdens in your city (e.g., breakdown by pedestrian vs. vehicle occupant vs. cyclist, and age groups) are not specified. Tailoring the intervention mix to the most dangerous crash types and populations would improve effectiveness.
- Whether the city has existing bike/pedestrian networks, transit facilities, and enforcement infrastructure that could accelerate or constrain implementation.
- The degree to which the city could leverage external funding (state/federal grants, private partnerships) to augment the $200M budget, and whether those funds would be restricted to specific uses.
- Political and community acceptance of enforcement measures (e.g., speed cameras, red-light cameras) and design choices (e.g., road diets, curb extensions) which can influence both costs and effectiveness.
- The availability of high-quality data, inter-agency data-sharing capabilities, and privacy protections, which strongly determine the feasibility and precision of the evaluation plan.

Summary: key takeaways
- A balanced, multi-pronged plan focusing on speed management/enforcement, high-crash intersection safety, and pedestrian/cyclist network improvements can plausibly achieve a 50% reduction in fatalities within five years in a city of 500,000 under a $200M budget, assuming good implementation and sustained political/administrative support.
- A rigorous evaluation plan using stepped-wedge or DiD designs is feasible and essential to credibly attribute reductions to the interventions, with a robust data plan covering crashes, injuries, exposure, and equity indicators.
- Ethical considerations require proactive equity focus, independent oversight of enforcement, community engagement, privacy protections, and transparent reporting to minimize unintended harms.
- If budget is constrained to $100M, the most cost-effective paths still emphasize speed/intersection safety, with smaller but meaningful gains toward equity and pedestrian safety; achieving a citywide 50% reduction would be more challenging and likely require external funds or very high effectiveness of a narrower set of interventions.
- All recommendations rely on several key assumptions and face uncertainties, particularly around behavior change, enforcement efficacy, spillovers, and external economic or traffic trends. Clear, ongoing measurement and an adaptive management approach are essential to maximize outcomes and to adjust strategies as real-world data accumulate.

If you’d like, I can tailor the plan further to your city’s actual crash data, geography (e.g., districts with the worst safety records), existing infrastructure, and political constraints, and I can provide a concrete, one-page implementation plan with proposed timelines, procurement steps, and a draft evaluation protocol.

In [None]:
# Anthropic has a slightly different API, and the max_tokens parameter is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
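
The OpenAI and Anthropic calls above return differently shaped objects: the OpenAI-style response keeps the text at `response.choices[0].message.content`, while the Anthropic response keeps it at `response.content[0].text`. A minimal sketch of one helper that handles both, using `SimpleNamespace` stand-ins so it runs without any API key; `extract_text` is a hypothetical helper, not part of either SDK.

```python
from types import SimpleNamespace


def extract_text(response):
    """Pull the answer text out of either response shape used in this lab."""
    if hasattr(response, "choices"):
        # OpenAI-style chat completion (any OpenAI-compatible endpoint)
        return response.choices[0].message.content
    # Anthropic-style message: a list of content blocks, the first holding the text
    return response.content[0].text


# Stand-ins that mimic only the attributes accessed above
openai_like = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="from OpenAI shape"))]
)
anthropic_like = SimpleNamespace(content=[SimpleNamespace(text="from Anthropic shape")])

print(extract_text(openai_like))
print(extract_text(anthropic_like))
```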

In [10]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.0-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Okay, Mayor, here's my proposed five-year plan to drastically reduce traffic fatalities in our city, while staying within your budget and minimizing negative impacts. This is a complex challenge, and I'll be explicit about the uncertainties involved.

**(a) Prioritized Interventions for Traffic Fatality Reduction**

This plan utilizes a "Safe System Approach" prioritizing safety over speed and accepting human error as inevitable.  It's a layered approach with multiple interventions reinforcing each other.

| Priority | Intervention                               | Explanation                                                                                                                                                                      | Estimated Cost      | Expected Death Reduction (per year)      | Time to Impact |
|----------|---------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|-------------------------------------------|----------------|
| 1        | **High-Risk Corridor Redesign**           | Focus on redesigning the top 10 most dangerous corridors based on fatality and serious injury data. Features include: protected bike lanes, pedestrian islands, shorter pedestrian crossings, curb extensions, leading pedestrian intervals, and reduced lane widths.                                        | $60 Million      | 20-30% (Confidence Level: 80%)        | 1-3 years       |
| 2        | **Vision Zero Enforcement Program**          | Data-driven enforcement targeting speeding, impaired driving, and failure to yield in high-risk locations. Include increased use of speed cameras (in identified corridors and school zones) and red-light cameras.  Requires public education campaign to avoid perception of revenue-generation. | $25 Million (including public education) | 10-20% (Confidence Level: 75%)        | Immediately      |
| 3        | **City-Wide Speed Limit Reduction**         | Reduce default speed limits to 20 mph on residential streets and 25 mph on arterial roads, coupled with increased signage and public awareness campaign.                    | $5 Million       | 5-10% (Confidence Level: 70%)         | 1 year         |
| 4        | **Complete Streets Policy Implementation**  |  Mandate that all new street designs and major reconstructions incorporate Complete Streets principles: safe accommodation for all users (pedestrians, cyclists, transit users, drivers).  Includes staff training. | $10 Million  | 5-10% (Confidence Level: 60%)        | 3-5 years        |
| 5        | **Public Transit Improvement & Expansion** |  Improve bus frequency, expand routes, and improve bus stop amenities (shelters, lighting) to increase transit ridership and reduce reliance on private vehicles.  Focus on underserved areas.                                                     | $70 Million      | 10-15% (Confidence Level: 65%)        | 2-5 years        |
| 6        | **Automated Vehicle Safety Technology Incentives** | Partner with local businesses to subsidize the installation of Advanced Driver-Assistance Systems (ADAS) such as automatic emergency braking (AEB) and lane departure warning in commercial fleets and high-risk drivers' vehicles (e.g., elderly drivers).                                       | $15 Million      | 2-5% (Confidence Level: 50%)        | 1-3 years        |
| 7        | **Community-Based Safety Education Program**    | Targeted safety education programs focusing on vulnerable populations (children, seniors, recent immigrants) and high-risk behaviors (distracted driving, pedestrian safety). Partner with community organizations.                                                                         | $15 Million      | 2-5% (Confidence Level: 50%)        | 1-5 years        |

**Total Cost: $200 Million**

**Expected Cumulative Death Reduction: 54-105% (Highly uncertain due to interaction effects). My target is a 50% reduction, so a range of 45%-55% should be reasonably achievable. This plan assumes significant synergies between the interventions.**

**Notes:**

*   **Confidence Levels:** Reflect my subjective assessment of the reliability of the estimated impact, considering the existing evidence base and local conditions.
*   **Uncertainty Ranges:** Reflect the variability inherent in predicting human behavior and the effectiveness of interventions.
*   **Time to Impact:** Represents the time it takes for the intervention to have a noticeable effect on fatality rates. Some interventions have immediate impacts (e.g., enforcement), while others take longer (e.g., infrastructure changes).
*   **Interaction Effects:** The reductions are not simply additive.  For example, speed limit reduction enhances the effectiveness of corridor redesign.

**(b) Data Collection and Evaluation Plan**

A rigorous evaluation plan is critical to measure the effectiveness of the interventions and attribute causality.

*   **Data Sources:**
    *   Police crash reports (detailed location, contributing factors, injury severity).
    *   Hospital emergency room and trauma center data (injury types, demographics).
    *   EMS data (response times, patient outcomes).
    *   Traffic volume data (from sensors and cameras).
    *   Pedestrian and bicycle counts (automated counters and manual counts).
    *   Public transit ridership data.
    *   Socioeconomic data (from Census and local surveys).
    *   Public opinion surveys (attitudes toward safety interventions).

*   **Key Metrics:**
    *   Annual traffic fatalities.
    *   Serious injuries (broken down by mode: pedestrian, cyclist, driver, passenger).
    *   Crash rates (crashes per vehicle-mile traveled).
    *   Speeding violations (detected by speed cameras).
    *   Pedestrian and cyclist near-miss incidents.
    *   Public transit ridership.
    *   Perceptions of safety (measured through surveys).
    *   Equity metrics: Distribution of fatalities and injuries across different demographic groups and geographic areas.

*   **Causality Assessment:**

    *   **Before-and-After Studies:** Compare fatality and injury rates in intervention areas before and after implementation.  This is the most common, but weakest, approach due to potential confounding factors.

    *   **Quasi-Experimental Design: Interrupted Time Series Analysis:** Collect data on fatalities and injuries for several years *before* and several years *after* the implementation of the intervention. Analyze the time series data to determine if there is a statistically significant change in the trend *after* the intervention. This helps control for pre-existing trends and seasonal variations.  For example, apply this to the city-wide speed limit reduction.

        *   **Data Needs:** Monthly or quarterly data on fatalities and injuries for at least 3 years before and 3 years after the speed limit reduction. Data on traffic volume, weather conditions, and other potential confounding factors.
        *   **Analysis:** Use time series regression models to estimate the impact of the speed limit reduction on fatalities and injuries, controlling for other factors.

    *   **Randomized Controlled Trial (RCT):** While difficult in transportation, consider a phased implementation of protected bike lanes. Randomly select which sections of the high-risk corridors receive protected bike lanes in the first year, and which sections get them in the second year. Compare the change in crashes involving cyclists in the treated vs. control sections.  This is the strongest method but also the most politically challenging and expensive.

        *   **Data Needs:** Detailed crash data, cyclist counts, and demographic information for the treated and control sections.
        *   **Analysis:** Use statistical methods to compare the change in crash rates between the treated and control groups.

* **Data Transparency:** Make all data and evaluation results publicly available to build trust and accountability.

**(c) Ethical Trade-offs and Distributional Impacts**

This plan involves several ethical trade-offs and potential distributional impacts:

*   **Increased Enforcement:** While necessary, increased enforcement (especially speed cameras) can disproportionately affect low-income communities and people of color if not carefully implemented.
    *   **Mitigation:**  Focus enforcement on the most dangerous behaviors (e.g., impaired driving) and locations, rather than minor infractions.  Use a "warning period" before issuing tickets with new speed cameras.  Invest revenue from fines into community safety programs in affected areas.  Ensure equitable placement of cameras and transparent data on enforcement patterns.

*   **Speed Limit Reductions:**  Some residents may perceive speed limit reductions as an inconvenience, especially those who rely heavily on cars. This could be amplified in areas with limited public transit options.
    *   **Mitigation:**  Clearly communicate the safety benefits of speed limit reductions.  Prioritize transit improvements in areas where reduced speed limits may create hardship.

*   **Corridor Redesign:**  Some business owners may be concerned about the impact of corridor redesign on parking and traffic flow.
    *   **Mitigation:**  Engage with business owners early in the planning process.  Design corridors to minimize parking loss and maintain access to businesses.  Highlight the potential benefits of improved pedestrian and cyclist traffic for local businesses.

*   **Accessibility:** Ensure that all interventions are designed to be accessible to people with disabilities.

**(d) Alternative Strategies for a $100 Million Budget**

With a $100 million budget, we would need to prioritize even more ruthlessly:

1.  **Focus on High-Risk Corridor Redesign (Prioritized):** $50 million. Focus on the *five* deadliest corridors only. Expected death reduction: 10-15%.
2.  **Vision Zero Enforcement Program (Scaled Down):** $20 million. Reduce the scope of the enforcement program, focusing on speeding and impaired driving in the redesigned corridors. Expected death reduction: 5-10%.
3.  **City-Wide Speed Limit Reduction:** $5 million. This remains a cost-effective intervention. Expected death reduction: 5-10%.
4.  **Community-Based Safety Education Program (Scaled Down):** $10 million. Target the most vulnerable populations in the redesigned corridors. Expected death reduction: 1-3%.
5.  **Complete Streets Policy Implementation (Focus on High-Risk):** $15 million.  Focus Complete Streets training and policy implementation on the specific high-risk corridors.

**Expected Cumulative Death Reduction (with $100 Million): 21-38%.  This is significantly lower than the target 50% reduction.** We'd have to accept that the reduction will be smaller and take longer.

Alternative Strategy 2:
1. **Public transit improvements:** $60 million. Focus on the most underserved areas and the most heavily-used routes, increase bus frequency, improve routes, and improve bus stop amenities.
2. **High-risk corridor safety education** $5 million. A focused public education to promote safety in the deadliest intersections.
3. **Vision Zero Enforcement:** $20 million.
4. **City-wide speed limit reduction:** $5 million.
5. **Automated vehicle safety technology incentives:** $10 million.

Alternative Strategy 3:
1. **High risk corridor redesign (Scaled down):** $30 million. Focus on a small number of extremely high-risk intersections that are geographically diverse.
2. **Automated vehicle safety technology incentives:** $30 million. A targeted investment in ADAS technologies.
3. **Vision Zero Enforcement:** $20 million.
4. **City-wide speed limit reduction:** $5 million.
5. **Community-Based Safety Education Program (Scaled Down):** $15 million.

**(e) Assumptions, Uncertainties, and Ambiguities**

**Assumptions:**

*   The existing data on traffic fatalities and injuries are accurate and reliable.
*   The proposed interventions will be implemented effectively and as planned.
*   There will be no major changes in traffic patterns, demographics, or technology that could significantly affect the results.
*   The public will largely comply with the new speed limits and other safety regulations.
*   Political support for the plan will remain strong throughout the five-year period.

**Top Three Sources of Uncertainty:**

1.  **Human Behavior:** Predicting how people will respond to changes in speed limits, enforcement, and infrastructure is inherently difficult.
2.  **Interaction Effects:** The cumulative impact of the interventions is uncertain due to complex interactions between them.
3.  **External Factors:** Economic conditions, demographic shifts, and technological advancements could significantly affect traffic safety.

**Ambiguities in the Prompt:**

*   **"Minimizing Displacement of Harm to Vulnerable Populations":** This is open to interpretation. I've prioritized minimizing *negative* impacts, but some interventions (e.g., enforcement) may have unavoidable disproportionate impacts.
*   **"Preserving Mobility for People Who Rely on Cars":**  This can be interpreted in many ways. I've tried to balance safety with maintaining reasonable levels of car mobility, but some interventions (e.g., speed limit reductions) will inevitably affect travel times.
*   **"Mid-sized city (population ~500,000)":** I have assumed that the city's demographics, geography, and transportation infrastructure are relatively typical for a city of this size. More specific information about these factors could lead to different recommendations.

**Important Caveats:**

*   This plan is a starting point. It should be continuously monitored and adjusted based on data and evaluation results.
*   Political feasibility is a major consideration. Gaining public support for these interventions will require effective communication and community engagement.
*   This plan cannot guarantee a 50% reduction in traffic fatalities.  Achieving this goal will require sustained effort and a commitment to safety from all stakeholders.

Mayor, I am confident that this plan, if implemented effectively, can significantly improve traffic safety in our city. However, it's crucial to acknowledge the uncertainties and adapt the plan as needed. Let's work together to make our city a safer place for everyone.


In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
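
The cells above repeat the same four steps for every provider: call the model, pull out the text, and append to both lists. Here is a minimal sketch of how that could be wrapped in a helper. The name `ask_competitor` and the stand-in `fake_client` are hypothetical and not part of the lab; the stand-in just mimics the shape of an OpenAI-style client so the cell runs without an API key.

```python
from types import SimpleNamespace


def ask_competitor(client, model_name, messages, competitors, answers):
    """Send messages to one model, then record its name and its answer together."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    competitors.append(model_name)
    answers.append(answer)
    return answer


# Stand-in client (hypothetical, for illustration only): same attribute shape as an OpenAI client
def fake_create(model, messages):
    reply = SimpleNamespace(content=f"{model} saw {len(messages)} message(s)")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
)

# Separate demo lists so the real competitors/answers lists are left untouched
demo_competitors, demo_answers = [], []
demo_answer = ask_competitor(
    fake_client, "demo-model", [{"role": "user", "content": "hi"}],
    demo_competitors, demo_answers,
)
print(demo_competitors, demo_answers)
```

With a real client you would pass `deepseek` or `groq` in place of `fake_client`. Appending to both lists in one place makes it harder for them to drift out of step.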


## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads
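
The "Ollama is running" check can also be done from Python rather than the browser. This is a rough sketch using only the standard library; the function name `ollama_is_running` is made up for this example, and it assumes the default address mentioned above.

```python
import http.client
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the root endpoint replies with Ollama's status message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad URL, or a malformed reply: treat as "not running"
        return False
    return "Ollama is running" in body


print("Ollama reachable:", ollama_is_running())
```

If this prints `False`, try `ollama serve` in a terminal and run the cell again.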

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want something larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [11]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
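# The name must match a model you have already pulled: this run used "llama3", while the cell above pulls llama3.2
model_name = "llama3"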

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

**Interventions for Reducing Traffic-Related Deaths**

Here are the specific interventions with brief explanations, estimated costs, expected reductions in annual deaths, time-to-impact, and confidence levels:

1. **Pedestrianization of High-Impact Crosswalks** (Engineering): Prioritize converting 20 high-risk intersections into pedestrian-only zones, reducing conflicts between pedestrians and vehicles.
	* Estimated cost per intervention: $500,000
	* Expected reduction in annual deaths: 12-18 fatalities (confidence level: 90%)
	* Time-to-impact: 6 months
2. **Speed Limit Reduction** (Policy): Lower the speed limit on main roads to 25 mph and enforce using automated traffic cameras.
	* Estimated cost per intervention: Low (camera installation) or negligible (policy change)
	* Expected reduction in annual deaths: 18-24 fatalities (confidence level: 85%)
	* Time-to-impact: Immediate
3. **Vision Zero Enforcement** (Enforcement): Increase police presence at high-risk areas, with a focus on speeding and drunk driving.
	* Estimated cost per intervention: $200,000 per year for additional staffing and equipment
	* Expected reduction in annual deaths: 8-12 fatalities (confidence level: 80%)
	* Time-to-impact: Within 6 months to a year
4. **Cycling Infrastructure Expansion** (Engineering): Add 20 miles of protected bike lanes, increasing safety for cyclists.
	* Estimated cost per intervention: $1 million per mile
	* Expected reduction in annual deaths: 4-6 fatalities (confidence level: 75%)
	* Time-to-impact: Within a year to 18 months
5. **Alcohol-Related Crashes Prevention** (Education): Launch targeted public awareness campaigns and increase sobriety checkpoints.
	* Estimated cost per intervention: $100,000 per year for campaign development and enforcement costs
	* Expected reduction in annual deaths: 6-8 fatalities (confidence level: 70%)
	* Time-to-impact: Within a year to 18 months
6. **Traffic Signal Optimization** (Technology): Upgrade traffic signal systems to prioritize pedestrian and cyclist safety.
	* Estimated cost per intervention: $500,000 per intersection upgrade
	* Expected reduction in annual deaths: 2-4 fatalities (confidence level: 65%)
	* Time-to-impact: Within a year

**Evaluation Plan**

To measure the impact of these interventions and attribute causality:

1. **Randomized Controlled Trial (RCT)**: Conduct an RCT involving 5-10 high-risk intersections, with 50% pedestrianized and 50% remaining unaltered.
	* Data needed: Crash data before and after intervention, traffic volume, speed and violation rates
2. **Quasi-Experimental Design**: Compare citywide crash trends before and after implementation of the plan.
	* Data needed: Citywide crash data, traffic volume, speed and violation rates

**Key Metrics to Track**

1. Number of fatalities
2. Injury rate (severity and frequency)
3. Speeding and drunk driving violations
4. Average speed at high-risk intersections
5. Bike lane usage and safety

**Ethical Trade-offs and Distributional Impacts**

The plan may disproportionately affect vulnerable populations, such as:

1. **Low-income communities**: Increased parking restrictions or pricing could lead to gentrification concerns.
2. **People with disabilities**: Improved mobility options (e.g., accessible pedestrian infrastructure) must be balanced against the potential displacement of parked vehicles.

Mitigation measures:

1. Public engagement and outreach
2. Targeted financial assistance for affected communities
3. Accessibility-focused design and enforcement

**Alternative Prioritized Strategies (Budget $100 million)**

If political constraints force a reduced budget, consider prioritizing:

1. **Speed Limit Reduction and Enforcement**: Focus on lowering speed limits and increasing enforcement through automated traffic cameras.
	* Estimated cost savings: $50-70 million
2. **Vision Zero Enforcement**: Increase police presence at high-risk areas, focusing on speeding and drunk driving violations.
	* Estimated cost savings: $30-40 million
3. **Targeted Infrastructure Improvements**: Upgrade critical infrastructure (e.g., high-risk intersections) while deferring less pressing improvements.
	* Estimated cost savings: $20-30 million

**Assumptions, Sources of Uncertainty, and Ambiguities**

1. **Crash data accuracy**: Assumes accurate crash reporting and underreporting is minimal.
2. **Traffic volume and speed patterns**: Assumes these factors will remain stable or trend positively during the implementation period.
3. **Enforcement effectiveness**: Assumes increased policing and automated camera enforcement will reduce speeding and drunk driving violations.

Top three sources of uncertainty:

1. Crash data accuracy
2. Traffic volume and speed patterns
3. Enforcement effectiveness

Ambiguities in this prompt that would materially affect recommendations:

1. Changes in vehicle-to-pedestrian ratio over time
2. Potential displacement of parked vehicles or businesses due to infrastructure changes
3. The impact of weather-related events on traffic crashes

In [12]:
# So where are we?

print(competitors)
print(answers)


['gpt-5-nano', 'gemini-2.0-flash', 'llama3']


In [13]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-5-nano

Below is a five-year plan designed for a mid-sized city (~500,000 people) to cut annual traffic-related deaths by 50% within a $200 million budget, while limiting harm to vulnerable groups and preserving mobility for car users. The plan is organized to (a) interventions with costs and expected impacts, (b) data collection and evaluation to attribute causality, (c) ethical and equity considerations with concrete mitigations, (d) three alternative strategies if the budget drops to $100 million, and (e) explicit assumptions, major uncertainties, and ambiguities relevant to recommendations.

Important note on baseline and targets
- Assumed baseline annual traffic fatalities (citywide): 60 per year. This is a plausible mid-range for a city of this size in the U.S. when considering all fatalities (drivers, pedestrians, bicyclists, etc.). If your city’s current data differ markedly, please adjust the numbers below accordingly.
- Target by year 5: about 30 fatalities pe

In [14]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"
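
As a variation on the loop above, the same text can be built with `enumerate(..., start=1)` and `"".join(...)`. This is just one possible alternative; `combine_answers` is a name invented for this sketch, and it should produce the same string as the loop.

```python
def combine_answers(answers):
    """Build one text block with a numbered heading before each answer."""
    sections = [
        f"# Response from competitor {number}\n\n{answer}\n\n"
        for number, answer in enumerate(answers, start=1)  # start=1 avoids index+1
    ]
    return "".join(sections)


demo_together = combine_answers(["Plan A", "Plan B"])
print(demo_together)
```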

In [15]:
print(together)

# Response from competitor 1

Below is a five-year plan designed for a mid-sized city (~500,000 people) to cut annual traffic-related deaths by 50% within a $200 million budget, while limiting harm to vulnerable groups and preserving mobility for car users. The plan is organized to (a) interventions with costs and expected impacts, (b) data collection and evaluation to attribute causality, (c) ethical and equity considerations with concrete mitigations, (d) three alternative strategies if the budget drops to $100 million, and (e) explicit assumptions, major uncertainties, and ambiguities relevant to recommendations.

Important note on baseline and targets
- Assumed baseline annual traffic fatalities (citywide): 60 per year. This is a plausible mid-range for a city of this size in the U.S. when considering all fatalities (drivers, pedestrians, bicyclists, etc.). If your city’s current data differ markedly, please adjust the numbers below accordingly.
- Target by year 5: about 30 fatalit

In [16]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""
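
Notice the doubled braces in the JSON line of the prompt above. Inside an f-string, `{{` and `}}` produce literal `{` and `}`, while single braces insert a value. A tiny illustration with a made-up template:

```python
count = 3
template = f"""Rank the {count} competitors.
Respond with JSON in this format:
{{"results": ["best competitor number", "..."]}}"""

# The last line comes out with single braces, ready to show the model
last_line = template.splitlines()[-1]
print(last_line)
```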


In [17]:
print(judge)

You are judging a competition between 3 competitors.
Each model has been given this question:

You are an AI advisor tasked by the mayor of a mid-sized city (population ~500,000) to design a five-year plan to reduce annual traffic-related deaths by 50% within a fixed budget of $200 million, while minimizing displacement of harm to vulnerable populations and preserving mobility for people who rely on cars. Provide all of the following in your answer: (a) a prioritized list of specific interventions (engineering, policy, enforcement, education, technology, or combinations) with brief explanations of how each reduces fatalities, estimated cost per intervention, expected reduction in annual deaths (with uncertainty ranges and a confidence level for each estimate), and time-to-impact; (b) a practical data-collection and evaluation plan that would allow the city to measure impact and attribute causality (including at least one feasible randomized or quasi-experimental design and what data wo

In [18]:
judge_messages = [{"role": "user", "content": judge}]

In [19]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-4o-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["1", "2", "3"]}


In [20]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: gpt-5-nano
Rank 2: gemini-2.0-flash
Rank 3: llama3
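
`json.loads` works here because the judge replied with clean JSON, but that isn't guaranteed. A model may wrap its reply in a markdown code fence, skip a competitor, or list one twice. Below is a defensive sketch under those assumptions; `parse_ranking` is a hypothetical helper, not part of the lab.

```python
import json


def parse_ranking(raw, n_competitors):
    """Parse the judge's reply and check it ranks every competitor exactly once."""
    fence = "`" * 3  # a markdown code fence marker
    text = raw.strip()
    # Some models wrap JSON in a code fence despite instructions; strip it if present
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # A valid ranking is a permutation of 1..n
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Expected each of 1..{n_competitors} exactly once, got {ranks}")
    return ranks


demo_ranks = parse_ranking('{"results": ["1", "2", "3"]}', 3)
print(demo_ranks)
```

In the cell above, `parse_ranking(results, len(competitors))` could stand in for the `json.loads` lines, so that a malformed reply fails with a clear message instead of an `IndexError`.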


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another Agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">This pattern, sending one task to multiple models and having another model evaluate the results,
            is common when you need to improve the quality of an LLM response. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>