## Welcome to the Second Lab - Week 1, Day 3

Today we will call lots of models from different providers! This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use it to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [None]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [None]:
# Always remember to do this!
load_dotenv(override=True)

True

In [None]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key not set (and this is optional)
DeepSeek API Key not set (and this is optional)
Groq API Key not set (and this is optional)
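The five near-identical checks above could be collapsed into one loop. Here's a minimal sketch, not part of the original lab: `PROVIDERS` and `describe_key` are hypothetical names, and the prefix lengths simply mirror the ones used in the cell above.

```python
import os

# Hypothetical table: (display name, environment variable, prefix length, optional?)
PROVIDERS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
]

def describe_key(name, value, prefix_len, optional):
    """Return a status line that reveals only the first few characters of a key."""
    if value:
        return f"{name} API Key exists and begins {value[:prefix_len]}"
    suffix = " (and this is optional)" if optional else ""
    return f"{name} API Key not set{suffix}"

for name, env_var, prefix_len, optional in PROVIDERS:
    print(describe_key(name, os.getenv(env_var), prefix_len, optional))
```

Adding another provider then only needs one more row in the table.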


In [None]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [None]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]

In [None]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


You are an independent advisor to the government of a mid-sized coastal city (population 600,000) that faces an accelerating combination of sea-level rise, more frequent storm surges, aging infrastructure, a large low-income waterfront community, and an annual budget constraint that allows only a limited number of major projects over the next decade. Within a 20-year planning horizon, propose a prioritized, actionable adaptation strategy that balances technical feasibility, cost-effectiveness, equity, political acceptability, and legal/ethical constraints: for each top-level action (no more than six), specify the expected timeline, a low/medium/high cost estimate with numeric ranges, the groups who benefit and who are likely to lose out, key uncertainties and failure modes, three measurable indicators to monitor success, a contingency trigger that would cause you to escalate or reverse the action, and one feasible fallback option if it proves unworkable; then describe three plausible f

In [None]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated the model names below to use the latest models, like GPT-5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow, taking 1-2 minutes to respond, but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.

In [None]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Below is a compact, actionable adaptation strategy tailored for a mid-sized coastal city of about 600,000 people. The plan is designed for a 20-year horizon, balances technical feasibility, cost-effectiveness, equity, political acceptability, and legal/ethical constraints, and includes clear triggers, fallbacks, and three future scenarios to test robustness.

Executive framing
- Vision: Protect lives and essential services, reduce flood and storm-risk exposure, advance equity for the waterfront community, and tighten long-run fiscal resilience with a limited set of major actions.
- Core design principles: prioritize low-income and politically vulnerable communities, deploy a mix of gray (hard) and green-blue (nature-based) solutions, anchor with resilient core infrastructure, finance plans that blend public funds with grants/markets, and build flexible, modular programs that can be scaled up or down.

Top-level actions (six)

1) Integrated coastal protection and nature-based flood defense
- Objective: Significantly reduce storm surge and tidal flooding exposure in prioritized high-risk, low-income waterfront neighborhoods through a phased mix of engineered protections and ecosystem-based interventions (living shorelines, wetlands, dunes, green infrastructure).
- Timeline: 0–20 years (phased; milestones at years 3, 7, 12, 18–20).
- Cost (present-value, USD):
  - Low: 250–400 million
  - Medium: 400–900 million
  - High: 1.0–2.0 billion
- Beneficiaries vs losers:
  - Beneficiaries: waterfront residents and small businesses in protected zones; citywide property and infrastructure resilience; reduced evacuation risk.
  - Losers: some development outside protection envelope; landowners pricing pressure in unprotected segments; possible environmental permitting constraints.
- Key uncertainties and failure modes:
  - Uncertain local sea-level rise and surge magnitudes; sediment supply and long-term sediment management; permitting timelines; long-term maintenance costs; interagency coordination.
  - Failure modes: design underestimates surge, materials/contractor delays, cost overruns, public opposition to siting.
- Three measurable indicators:
  1) Percent of protected shoreline length and critical nodes reached (km protected vs planned).
  2) Modeled flood depth reduction in target 100-year surge scenarios (target: 40–60% reduction in protected zones).
  3) Annual avoided flood damages and insurance-claims reductions in protected zones (dollar value, year-over-year trend).
- Contingency trigger to escalate/reverse:
  - If after five years documented flood reduction in protected zones is less than 40% of target and capital costs exceed baseline by more than 25% due to permitting or supply constraints, escalate to re-scoping or phased retreat/retrofit of limited segments.
- Feasible fallback option if unworkable:
  - Pivot to a concentrated program of structural “hardening plus targeted relocation” in the most exposed pockets, while continuing ecosystem restoration in adjacent areas; reallocate funds toward retrofit of housing and critical facilities elsewhere.
  
2) Equitable housing resilience program for waterfront communities
- Objective: Protect and improve housing security for low-income residents in the waterfront zone through a spectrum of retrofits (elevations, floodproofing, and weatherization), rental protections, and, where necessary, buyouts/relocation with strong tenant-inclusive mechanisms.
- Timeline: 0–15 years (design/tenant agreements 0–5; retrofits 5–10; relocations/buyouts or long-term leases 10–15).
- Cost:
  - Low: 80–200 million
  - Medium: 200–500 million
  - High: 600–1,000 million
- Beneficiaries vs losers:
  - Beneficiaries: waterfront homeowners and renters in vulnerable sectors; neighborhood stability; reduced displacement risk.
  - Losers: broader property market cross-subsidization opponents; some non-waterfront homeowners if resources are reprioritized; potential disruptions during relocations.
- Key uncertainties and failure modes:
  - Funding stability; tenant consent and rights; displacement approvals; long-term affordability of replacement housing.
  - Failure modes: insufficient funds, legal challenges from residents or landlords, inadequate coordination with relocation/transit/education services.
- Three measurable indicators:
  1) Share of target waterfront housing with completed retrofits/elevations (percentage of units).
  2) Number of tenants/homeowners represented in protection agreements and legally protected rents.
  3) Percentage of residents in waterfront zones with stabilized or improved housing affordability (income-cost burden metric).
- Contingency trigger to escalate/reverse:
  - If program uptake stalls and demolition/new build costs exceed targets by >40% for two consecutive years, escalate to a scaled retrofit-first approach and extend rental assistance; consider incremental buyouts only for the most exposed blocks.
- Feasible fallback option:
  - Expand retrofit-centric approach for a broader set of housing units citywide (not only waterfront), couple with enhanced transit access and in-situ resilience grants; maintain buyout option as a reserve.

3) Critical infrastructure resilience and utilities hardening
- Objective: Ensure continuity of essential services (power, water, wastewater, drainage) during climate events through infrastructure hardening, redundancy, and local energy resilience (microgrids where feasible) tied to a robust maintenance program.
- Timeline: 0–15 years (0–3 planning/design; 3–7 upgrades; 7–15 full resilience enhancement).
- Cost:
  - Low: 100–250 million
  - Medium: 250–700 million
  - High: 1.0–2.5 billion
- Beneficiaries vs losers:
  - Beneficiaries: entire city; critical facilities (hospitals, emergency services, water/wastewater plants); neighborhoods repeatedly hit by outages.
  - Losers: none in principle, though up-front budgets may crowd out other programs in tight years.
- Key uncertainties and failure modes:
  - Regulatory approvals; supply chain and procurement delays; cybersecurity and reliability of microgrids; long-term maintenance costs.
  - Failure modes: project delays, insufficient interconnections, reliability failures, higher-than-anticipated O&M costs.
- Three measurable indicators:
  1) Time-to-restoration for critical facilities after storms (target: restore essential loads within 24–48 hours).
  2) Percentage of critical facilities with independent backup power and microgrid capability.
  3) Reduction in days of service disruption for water/wastewater and drainage during storm events.
- Contingency trigger to escalate/reverse:
  - If post-installation resilience metrics underperform (e.g., restoration times exceed 72 hours for multiple events) or cyber/operational risks materialize, escalate to a regional coordination approach or pause non-critical upgrades to preserve core system resilience.
- Feasible fallback option:
  - Implement decentralized, mobile-asset solutions (temporary generators, portable pumps) and regional mutual-aid agreements; defer noncritical upgrades until funding and regulatory clarity improve.

4) Strategic land-use planning and managed retreat framework
- Objective: Reduce exposure by updating zoning, protecting essential services, and offering a staged, voluntary, rights-respecting managed retreat framework with protections for tenants and low-income residents.
- Timeline: 3–20 years (planning 3–7; zoning/tooling 7–12; implementation/retreat 12–20).
- Cost:
  - Low: 20–60 million
  - Medium: 60–200 million
  - High: 400–800 million
- Beneficiaries vs losers:
  - Beneficiaries: city-wide risk reduction; residents in high-risk zones who participate; protection of critical facilities.
  - Losers: property owners in high-risk zones; potential effects on property values and local tax bases; displaced households if buyouts occur.
- Key uncertainties and failure modes:
  - Political acceptability; legal challenges to takings or re-zoning; property rights and compensation adequacy; displacement burdens and social cohesion.
  - Failure modes: lawsuits delaying zoning changes; insufficient relocation options; inequitable outcomes if not co-designed with communities.
- Three measurable indicators:
  1) Number of zones re-zoned or protected against high-risk development; 2) Number of successful buyouts/relocation agreements and time-to-completion; 3) Percentage of essential city services relocated away from zones at high flood risk.
- Contingency trigger to escalate/reverse:
  - If two consecutive legal challenges or cost overruns delay implementation by more than two years, escalate to a revised, voluntary, community-led planning process with enhanced protections and targeted, smaller-scale retreats.
- Feasible fallback option:
  - Implement tighter land-use restrictions and improved elevation standards in risk zones with a phased, non-coercive approach, while maintaining current density and avoiding eminent domain in the near term.

5) Financing, governance, and risk-transfer mechanisms
- Objective: Create a resilient, diversified financing framework to fund adaptation with transparency, equity, and fiscal discipline; deploy blended finance, grants, insurance tools, and public-private partnerships where appropriate.
- Timeline: 0–10 years
- Cost:
  - Low: 25–50 million (design, governance, and startup)
  - Medium: 50–150 million (program setup, staffing, early pilots)
  - High: 200–500 million (full fund capitalization and market instruments)
- Beneficiaries vs losers:
  - Beneficiaries: city-wide resilience financing; taxpayers; ability to mobilize external funds; improved credit standing if managed well.
  - Losers: those exposed to financing risk if markets misprice risk; some interest-rate sensitivity in debt.
- Key uncertainties and failure modes:
  - Market appetite for municipal bonds; grant cycles; valuation of blended instruments; governance risk and fiduciary oversight.
  - Failure modes: poor credit rating impact, insufficient reserve buffers, misaligned incentives with private partners.
- Three measurable indicators:
  1) Size of resilience fund and cumulative financing secured; 2) Bond rating trajectory and debt-service coverage ratio; 3) Number of grant/funding agreements awarded and drawn down.
- Contingency trigger to escalate/reverse:
  - If debt-service coverage dips below a predefined threshold for two consecutive years or if market access deteriorates, scale back non-critical programs and shift toward pay-as-you-go or grant-driven pilots.
- Feasible fallback option:
  - Focus on targeted, grant-supported pilots and facility-sharing arrangements; gradually regionalize risk and leverage municipal partnerships to preserve core projects while reducing leverage.

6) Transportation resilience and evacuation planning
- Objective: Ensure safe, reliable evacuation routes and multi-modal mobility during extreme events; upgrade critical transit corridors and improve community-based evacuation readiness.
- Timeline: 0–15 years
- Cost:
  - Low: 60–150 million
  - Medium: 150–500 million
  - High: 600–1,200 million
- Beneficiaries vs losers:
  - Beneficiaries: all residents; improved access to essential services and jobs; enhanced regional coordination.
  - Losers: minimal; potential disruption during construction; land-use changes if corridors are widened.
- Key uncertainties and failure modes:
  - Changes in travel demand; weather disruptions during construction; coordination with neighboring jurisdictions; evacuation compliance and shelter capacity.
  - Failure modes: insufficient evacuation capacity within target timeframes; bottlenecks at critical intersections; inadequate shelter planning.
- Three measurable indicators:
  1) Evacuation time to safety for high-risk populations (target: within 72 hours for 90% of at-risk residents).
  2) Percentage of transit network designed or upgraded to withstand flood/ surge events.
  3) On-time performance and reliability during severe-weather periods.
- Contingency trigger to escalate/reverse:
  - If evacuation modeling shows <80% reach within 72 hours for high-risk groups in multiple scenarios, escalate to supplemental regional sheltering and alternative evacuation strategies; pause non-critical corridor upgrades.
- Feasible fallback option:
  - Prioritize critical corridors with robust shelter and shelter-adjacent services; leverage regional routes and community-based evacuation centers while refining demand-based transit options.

Three plausible future scenarios (climate, economy, social stability) and robustness of proposed actions

Scenario A — Best-case (climate, economy, social stability all favorable)
- Climate: modest sea-level rise, fewer extreme surge events; storms less frequent; sediment dynamics favorable.
- Economy: robust growth, strong tax base, abundant federal grants; low interest rates; high political capital for long-term investments.
- Social stability: high civic trust; strong community-participation; effective governance.
- Robustness of actions:
  - Actions 1, 3, and 6 are highly robust due to broad benefits (infrastructure and evacuation reliability and shoreline protection) and modular funding. Action 2 remains robust because it directly protects vulnerable households, a clearly valued equity objective. Action 5 (financing/governance) is robust due to strong market access and grant opportunities. Action 4 (land-use planning) remains robust but relies on political buy-in, which is expected in this scenario.
- Why these hold up: ample funds, predictable hazard reductions, and strong legitimacy for equity-focused investments.

Scenario B — Expected-case (moderate climate risk, modest economy, mixed politics)
- Climate: gradual SLR and increased storm risk but within model expectations; some variability year-to-year.
- Economy: steady but constrained revenue with selective federal funding; local debt appetite cautious.
- Social stability: moderate polarization, but public appetite for resilience investments remains strong if equity benefits are clear.
- Robustness of actions:
  - Core resilience suite (Actions 1, 3, and 6) remains robust due to diversified benefits and modular funding possibilities; Action 5 (financing/governance) remains crucial to unlock funding; Action 2 (housing resilience) remains essential to avoid displacement and preserve social cohesion, though its pace may be slower. Action 4 (land-use planning) is feasible but contingent on ongoing political consensus; fallback options are available if consensus delays occur.
- Why these hold up: modular, staged investments with clear co-benefits respect revenue constraints; equity-focused components help maintain legitimacy.

Scenario C — Worst-case (accelerated climate risk, tight economy, social instability)
- Climate: higher SLR, more frequent extreme surges, higher uncertainty.
- Economy: persistent budget constraints, higher debt-service costs possible due to risk pricing; less federal support.
- Social stability: rising inequality, susceptibility to political volatility; potential protests over funding choices.
- Robustness of actions:
  - Actions with strongest intrinsic risk reduction and equity emphasis (Actions 1, 2, and 3) remain essential and are designed to be modular and scalable; Action 5 (financing/governance) is critical to maintain capital access, but may need to rely more on local resources and phased implementation. Action 4 (land-use) may be the most challenging but remains necessary for long-term risk reduction; fallback options (smaller scope, voluntary, and community-led) increase resilience in uncertainty.
- Why these hold up: resilience is more likely when core services and housing security are protected and when funding is diversified and modular; however, success depends on pragmatic governance and strong community partnerships.

Three biggest trade-offs and the most critical plan assumptions

Top three trade-offs you’ll be choosing among:
- Upfront capital expenditure vs long-term risk reduction: here we trade immediate budget hits for multi-decade resilience gains. The plan prioritizes actions with durable hazard reduction and co-benefits, accepting higher early costs supported by blended financing.
- Equity protection vs political/real estate incentives: ensuring protection for low-income waterfront residents (housing resilience, buyouts, refuge in place) may require difficult siting decisions, relocations, or subsidies that could upset some groups or property owners outside the priority zones. The plan seeks to mitigate this with transparent processes and community engagement, but it remains a political trade-off.
- Uniform city-wide resilience vs targeted front-loaded investments: some actions (like watershed-wide flood protection) may be more robust if deployed city-wide, but limited budgets force prioritization toward high-risk, high-impact areas and critical services first, with scalable expansion if funds allow.

Key assumptions critical to plan success:
- Availability and predictability of funding: federal grants, state programs, and interest-rate environments must align with the planned phasing; financing relies on willingness of markets to price municipal resilience risk favorably.
- Hazard projections and engineering performance: the estimated rainfall, surge, SLR trajectories, sediment dynamics, and climate resilience performance must align with the models used to define protection levels and retrofit requirements.
- Legal and regulatory feasibility: buyouts, zoning changes, and eminent-domain decisions require legal feasibility and community acceptance; without stable legal pathways, timelines can slip.
- Community engagement and equity outcomes: sustained trust and inclusive engagement are essential for acceptance of managed retreat and housing resilience; without robust engagement, implementation risk increases.
- Intergovernmental coordination: success depends on effective alignment with state and regional authorities, emergency management agencies, and neighboring jurisdictions for evacuation and regional financing.

Implementation structure and governance notes
- Phasing and sequencing: begin with Actions 1, 3, and 6 to establish core protection, critical service reliability, and evacuation readiness; layer in Actions 2, 4, and 5 as funding and political momentum allow, maintaining flexibility to scale up/down.
- Monitoring framework: for each action, track the three indicators, with annual reporting to city council and the public, including a mid-course adjustment protocol.
- Equity safeguards: establish a dedicated community advisory council for waterfront residents, a transparent grievance/appeals process for relocations, and guardrails ensuring rental protections and affordable housing commitments are enforceable.
- Legal/ethical guardrails: ensure rights of tenants and property owners are respected; use fair compensation for buyouts; avoid coercive displacement; maintain strong environmental justice considerations.

In sum
- The six actions present a balanced package of structural protection, housing resilience, critical infrastructure hardening, rational land-use planning, financing and governance, and transportation/evacuation resilience.
- The plan is designed to be scalable, modular, and resilient to different future conditions, with explicit triggers, fallbacks, and measurable indicators.
- The three most robust actions across scenarios are Actions 1 (coastal protection), 3 (infrastructure resilience), and 6 (evacuation/transport resilience), with Action 2 (housing resilience) and Action 5 (financing/governance) providing essential equity and fiscal stability, and Action 4 (land-use planning) ensuring long-term exposure reduction.

If you’d like, I can convert this into a compact one-page action memo for policymakers, or tailor cost assumptions and indicators to your city’s current budgets and procurement rules.

In [None]:
# Anthropic has a slightly different API, and the max_tokens argument is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
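The Anthropic response holds a list of content blocks rather than a single string, which is why the cell above reads `response.content[0].text`. A slightly more defensive sketch joins every text block instead of assuming the first one is the whole answer. `text_from_blocks` is a hypothetical helper, and the stand-in objects below only imitate the `type` and `text` attributes so the snippet runs without an API key.

```python
from types import SimpleNamespace

def text_from_blocks(blocks):
    """Join the text of every block whose type is 'text', skipping other block types."""
    return "".join(b.text for b in blocks if getattr(b, "type", None) == "text")

# Stand-in blocks, for illustration only (not real API output)
fake_content = [
    SimpleNamespace(type="text", text="Hello, "),
    SimpleNamespace(type="other", data="ignored"),
    SimpleNamespace(type="text", text="world"),
]
print(text_from_blocks(fake_content))
```

With a real response you would call it as `text_from_blocks(response.content)`.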

In [None]:
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")
model_name = "gemini-2.5-flash"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
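The Gemini, DeepSeek and Groq cells above all repeat the same steps against an OpenAI-compatible client, so they could share one function. This is only a sketch under that assumption: `ask_model` is a hypothetical helper, and `fake_client` is a stand-in that imitates the `chat.completions.create` call so the example runs offline.

```python
from types import SimpleNamespace

def ask_model(client, model_name, messages, competitors, answers):
    """Send messages to an OpenAI-compatible client and record the answer."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    # Append only after a successful call so the two lists stay aligned
    competitors.append(model_name)
    answers.append(answer)
    return answer

# Stand-in client, for illustration only
def fake_create(model, messages):
    reply = SimpleNamespace(content=f"[{model}] echo: {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
)

demo_competitors, demo_answers = [], []
ask_model(fake_client, "demo-model", [{"role": "user", "content": "Hi"}],
          demo_competitors, demo_answers)
print(demo_competitors, demo_answers)
```

In the notebook you would pass `gemini`, `deepseek` or `groq` as the client, along with the real `competitors` and `answers` lists.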


## For the next cell, we will use Ollama

Ollama runs a local web service that gives an OpenAI compatible endpoint,  
and runs models locally using high performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running".
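You can run the same check from a notebook cell instead of the browser. Here's a minimal sketch using only the standard library; `is_ollama_running` is a hypothetical helper, and it assumes the default address mentioned above.

```python
import http.client
import urllib.request

def is_ollama_running(base_url="http://localhost:11434", timeout=2):
    """Return True if the URL answers with the 'Ollama is running' message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return "Ollama is running" in body
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, bad response or malformed URL: not running
        return False

print(is_ollama_running())
```

If this prints `False`, start the service and try again.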

If that message doesn't appear, you might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [None]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

In [None]:
# So where are we?

print(competitors)
print(answers)


In [None]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


In [None]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"

In [None]:
print(together)
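The same string can also be built in one expression with `str.join` and `enumerate(..., start=1)`, which avoids the `index+1` arithmetic. This is just an alternative sketch: `build_together` and `demo_answers` are hypothetical names used for illustration.

```python
def build_together(answers):
    """Concatenate answers under numbered headings, matching the loop version."""
    return "".join(
        f"# Response from competitor {number}\n\n{answer}\n\n"
        for number, answer in enumerate(answers, start=1)
    )

# Stand-in answers, for illustration only
demo_answers = ["First answer", "Second answer"]
print(build_together(demo_answers))
```

Note that the judge only sees competitor numbers, not model names, so it can't favour a model it recognises.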

In [None]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [None]:
print(judge)

In [None]:
judge_messages = [{"role": "user", "content": judge}]

In [None]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


In [None]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")
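The cell above trusts the judge to return clean JSON with valid competitor numbers. Here's a more cautious sketch: `parse_ranking` is a hypothetical helper, and the code-fence handling is an assumption about how a model might ignore the "no code blocks" instruction, not something observed in this lab.

```python
import json

def parse_ranking(raw, n_competitors):
    """Parse the judge's JSON reply into a list of 1-based competitor numbers."""
    fence = "`" * 3
    text = raw.strip()
    # Assumption: a model may wrap the JSON in a Markdown code fence anyway
    if text.startswith(fence):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranks = [int(r) for r in json.loads(text)["results"]]
    # Every competitor number should appear exactly once
    if sorted(ranks) != list(range(1, n_competitors + 1)):
        raise ValueError(f"Expected each of 1..{n_competitors} once, got {ranks}")
    return ranks

print(parse_ranking('{"results": ["2", "1", "3"]}', 3))  # prints [2, 1, 3]
```

In the notebook you would call `parse_ranking(results, len(competitors))` and then look up each name with `competitors[rank - 1]`.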

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which agentic design pattern(s) did this lab use? Try updating it to add another agentic design pattern.
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">These kinds of patterns - sending a task to multiple models and evaluating the results -
            are common where you need to improve the quality of your LLM response. This approach can be applied
            to many business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>