## Welcome to the Second Lab - Week 1, Day 3

Today we will call several models from different providers! This is a good way to get comfortable with their APIs.

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Important point - please read</h2>
            <span style="color:#ff7800;">The way I collaborate with you may be different to other courses you've taken. I prefer not to type code while you watch. Rather, I execute Jupyter Labs, like this, and give you an intuition for what's going on. My suggestion is that you carefully execute this yourself, <b>after</b> watching the lecture. Add print statements to understand what's going on, and then come up with your own variations.<br/><br/>If you have time, I'd love it if you submit a PR for changes in the community_contributions folder - instructions in the resources. Also, if you have a GitHub account, use this to showcase your variations. Not only is this essential practice, but it demonstrates your skills to others, including perhaps future clients or employers...
            </span>
        </td>
    </tr>
</table>

In [1]:
# Start with imports - ask ChatGPT to explain any package that you don't know

import os
import json
from dotenv import load_dotenv
from openai import OpenAI
from anthropic import Anthropic
from IPython.display import Markdown, display

In [2]:
# Always remember to do this!
load_dotenv(override=True)

True

In [3]:
# Print the key prefixes to help with any debugging

openai_api_key = os.getenv('OPENAI_API_KEY')
anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
google_api_key = os.getenv('GOOGLE_API_KEY')
deepseek_api_key = os.getenv('DEEPSEEK_API_KEY')
groq_api_key = os.getenv('GROQ_API_KEY')
openrouter_api_key = os.getenv('OPENROUTER_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set")
    
if anthropic_api_key:
    print(f"Anthropic API Key exists and begins {anthropic_api_key[:7]}")
else:
    print("Anthropic API Key not set (and this is optional)")

if google_api_key:
    print(f"Google API Key exists and begins {google_api_key[:2]}")
else:
    print("Google API Key not set (and this is optional)")

if deepseek_api_key:
    print(f"DeepSeek API Key exists and begins {deepseek_api_key[:3]}")
else:
    print("DeepSeek API Key not set (and this is optional)")

if groq_api_key:
    print(f"Groq API Key exists and begins {groq_api_key[:4]}")
else:
    print("Groq API Key not set (and this is optional)")

if openrouter_api_key:
    print(f"OpenRouter API Key exists and begins {openrouter_api_key[:8]}")
else:
    print("OpenRouter API Key not set (and this is optional)")
    

OpenAI API Key exists and begins sk-proj-
Anthropic API Key not set (and this is optional)
Google API Key exists and begins AI
DeepSeek API Key exists and begins sk-
Groq API Key not set (and this is optional)
OpenRouter API Key exists and begins sk-or-v1
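
The six if/else blocks above all follow the same pattern, so here is a minimal sketch of the same check written as a loop. `KEY_SPECS` and `describe_keys` are hypothetical names I've made up for illustration; the environment variable names, prefix lengths and messages are copied from the cell above.

```python
import os

# (provider name, environment variable, prefix length to show, optional?)
# Values copied from the key-check cell above.
KEY_SPECS = [
    ("OpenAI", "OPENAI_API_KEY", 8, False),
    ("Anthropic", "ANTHROPIC_API_KEY", 7, True),
    ("Google", "GOOGLE_API_KEY", 2, True),
    ("DeepSeek", "DEEPSEEK_API_KEY", 3, True),
    ("Groq", "GROQ_API_KEY", 4, True),
    ("OpenRouter", "OPENROUTER_API_KEY", 8, True),
]


def describe_keys(env=None):
    """Return one status line per provider (hypothetical helper, not part of the lab)."""
    env = os.environ if env is None else env
    lines = []
    for name, var, prefix_len, optional in KEY_SPECS:
        key = env.get(var)
        if key:
            # Only a short prefix is shown, never the full key
            lines.append(f"{name} API Key exists and begins {key[:prefix_len]}")
        else:
            suffix = " (and this is optional)" if optional else ""
            lines.append(f"{name} API Key not set{suffix}")
    return lines


for line in describe_keys():
    print(line)
```

Passing a plain dict as `env` makes it easy to try the helper without touching your real environment.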


In [4]:
request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = [{"role": "user", "content": request}]

In [5]:
messages

[{'role': 'user',
  'content': 'Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.'}]
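
The output above shows the shape the chat APIs expect: a list of dicts, each with a `role` and a `content`. As a rough sketch, the same list could be built with a small helper. `build_messages` is a hypothetical name, and the optional system prompt is my addition for illustration; this lab only ever sends a single user message.

```python
def build_messages(user_prompt, system_prompt=None):
    """Build an OpenAI-style messages list (hypothetical helper, not part of the lab)."""
    messages = []
    if system_prompt:
        # Assumption: a system message goes first when one is supplied
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


request = "Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. "
request += "Answer only with the question, no explanation."
messages = build_messages(request)
print(messages)
```

Note that this shape is for the OpenAI-style endpoints; the Anthropic client takes its system prompt as a separate `system` parameter rather than as a message.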

In [6]:
openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=messages,
)
question = response.choices[0].message.content
print(question)


Read this incident report: "At 08:12 a delivery van arrived at the warehouse and was parked near the chemical storage; by 08:45 three employees reported headache and nausea; at 09:10 the chemical-storage alarm triggered; at 09:40 one employee was found unconscious; CCTV shows no one entering or leaving the chemical storage between 08:10–09:00; the van's manifest lists only innocuous office supplies." Based only on this report, what is the most likely explanation? For your answer: (A) state your single best hypothesis and give a concise causal timeline with a numeric probability estimate; (B) give three alternative hypotheses with probability estimates and concise supporting and contradicting evidence for each; (C) list and rank the top five pieces of additional data or tests that would best distinguish among these hypotheses and explain why; (D) identify potential cognitive biases or failure modes that could mislead an analyst here; and (E) propose a 24‑hour priority response plan that

In [7]:
competitors = []
answers = []
messages = [{"role": "user", "content": question}]

## Note - update since the videos

I've updated the model names to use the latest models below, like GPT 5 and Claude Sonnet 4.5. It's worth noting that these models can be quite slow - like 1-2 minutes - but they do a great job! Feel free to switch them for faster models if you'd prefer, like the ones I use in the video.
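
If you'd like to see for yourself how slow each model is before deciding whether to switch, here is a minimal timing sketch. `timed_call` is a hypothetical helper name; it wraps any function call and returns the result together with the elapsed seconds.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds). Hypothetical helper for experiments."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Quick local demonstration that needs no API key
total, seconds = timed_call(sum, [1, 2, 3])
print(f"Result {total} took {seconds:.4f} seconds")
```

In the cells that follow you could wrap a model call in the same way, for example `timed_call(openai.chat.completions.create, model=model_name, messages=messages)`, and compare the timings across models.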

In [8]:
# The API we know well
# I've updated this with the latest model, but it can take some time because it likes to think!
# Replace the model with gpt-4.1-mini if you'd prefer not to wait 1-2 mins

model_name = "gpt-5-nano"

response = openai.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Below is a structured analytic response based strictly on the incident report provided. It treats this as a hypothetical exercise for reasoning and planning, not as field instructions.

A. Best hypothesis, concise causal timeline, and numeric probability

- Best hypothesis (single best): Accidental release of a volatile toxic chemical from the chemical storage area, contaminating nearby workspace and causing inhalation exposure to employees.

- Rationale (concise causal timeline with probabilities):
  - 08:12: Delivery van arrives near chemical storage. No loading of chemicals described; manifest lists innocuous office supplies.
  - 08:45: Three employees report headache and nausea, consistent with inhalation exposure to a volatile chemical.
  - 09:10: Chemical-storage alarm triggers, suggesting a rise in a detectable contaminant or concentration within storage or connected space.
  - 09:40: One employee unconscious, indicating potential higher exposure or a more dangerous agent present.
  - 08:10–09:00 CCTV window shows no entry/exit to the chemical storage, implying the source is internal to storage or a passive release (e.g., vapor leakage) rather than an intruder entering and handling material during that window.
  - 10:00+ (implied by sequence): responders begin life-safety actions and containment, consistent with a hazardous chemical release scenario.

- Numeric probability estimate for this best hypothesis: 60%
  - Why: The timing (symptoms about 30–50 minutes after arrival near storage), the subsequent alarm, and the rapid progression to unconsciousness fit a release or leakage of a volatile chemical from storage rather than a simple vehicle exhaust or a misdelivered cargo. The CCTV data reduce—but do not eliminate—the likelihood of an intruder entering during 08:10–09:00, making an internal storage release the most coherent explanation given the data.

B. Three alternative hypotheses (with probability estimates and supporting/contradicting evidence)

- H2. Insider/malicious release of chemicals from storage
  - Probability: 25%
  - Supporting evidence:
    - Three employees affected in quick succession could result from a deliberate release with a contained source inside storage.
    - The van’s manifest lists innocuous office supplies, which lowers suspicion on the van as a chemical source but does not prove absence of an insider with access to storage.
  - Contradicting evidence:
    - CCTV shows no one entering/leaving chemical storage between 08:10–09:00; insider release would require prior presence and exit within or before that window.
    - No explicit signs of tampering or unusual access noted in the report.
  - Summary: Plausible but requires an insider to interact with storage before 08:10 or during a window not captured by the provided CCTV frame; less supported by the available data but not excluded.

- H3. Unlisted hazardous cargo delivered in the van or mislabeling
  - Probability: 15%
  - Supporting evidence:
    - Manifest lists innocuous office supplies; if a hazardous material was mislabeled, mispacked, or not documented, it could cause a rapid exposure once near the storage or ventilation paths.
  - Contradicting evidence:
    - The van is reported to carry innocuous items; no documented offloading of hazardous materials.
    - The alarm is a chemical-storage alarm, not a vehicle-specific alarm; ambiguity remains about whether the cargo ever contacted storage or was released remotely.
  - Summary: Possible in principle, but the manifest discrepancy and lack of offloading details make this less likely without additional corroborating data.

- H4. Vehicle exhaust or external chemical gas infiltrating via ventilation (e.g., CO or irritant gas)
  - Probability: 10%
  - Supporting evidence:
    - A van parked near storage could contribute exhaust or ambient contaminants if the building’s ventilation draws in outside air or if exhaust leaks into the workspace. Headache and nausea align with some irritants and CO exposure; progression to unconsciousness is also plausible with certain agents or concentrations.
    - The absence of entry/exit to storage does not exclude a dispersion mechanism from outside or from the vehicle’s vicinity into occupied spaces.
  - Contradicting evidence:
    - The incident is described as a chemical-storage alarm event; if the contaminant were primarily from vehicle exhaust, spell-out ventilation or exposure patterns would need to align with HVAC behavior and outdoor conditions, which are not provided.
  - Summary: Credible as a secondary pathway but less likely as the primary cause given the data; would benefit from targeted outdoor/indoor air testing and HVAC assessment.

C. Top five data points or tests to distinguish among hypotheses (and why)

1) Real-time and final ambient air monitoring around storage and adjacent work areas
   - Why: Detects the presence and concentration of toxic gases or volatile chemicals; can distinguish internal storage leak (H1) from external infiltration (H4) and potential misdelivery (H3) by identifying specific agents.

2) Immediate HVAC and ventilation system assessment (air intake locations, filtration status, and whether HVAC was running or isolated during the event)
   - Why: Determines if external air or vehicle exhaust could have been drawn into the workspaces; helps support or refute H4 and informs containment strategies.

3) Expanded inventory reconciliation and MSDS cross-checks; traceability of all items in the warehouse (past 24–72 hours)
   - Why: Assesses the possibility of mislabeling, misdelivery, or a hidden hazardous substance being present (H2 or H3). Identifies potential chemicals stored near the storage area that could volatilize.

4) Extended surveillance analysis and access logs (previous 24 hours, including doors, equipment rooms, and any off-hours access)
   - Why: Tests the insider-release hypothesis (H2) by confirming or denying prior access to storage and potential tampering; helps close the timeline gaps that CCTV 08:10–09:00 may not resolve.

5) Vehicle assessment and external environment data (van inspection for leaks, exhaust, recent maintenance; wind direction/speed and outdoor pollutant sources)
   - Why: Evaluates whether the van could have contributed to an external contaminant source (H4) and whether atmospheric conditions would drive dispersion into occupied spaces.

D. Potential cognitive biases or failure modes to watch for

- Premature closure: Stopping analysis after identifying a storage leak as the default hypothesis without testing alternatives.
- Anchoring: Focusing on the alarm at 09:10 as the defining moment and filtering data to fit a leak narrative.
- Confirmation bias: Giving more weight to evidence that supports an internal leak while downplaying external sources (e.g., vehicle exhaust, HVAC pathways).
- Availability heuristic: Assuming “chemical storage release” because it’s the most straightforward explanation given a chemical environment, rather than thoroughly evaluating other plausible pathways (CO or misdelivery).
- Illusory correlation: Concluding causation from temporally proximate events (van arrival, symptoms) without sufficient cross-checks.
- Evidence fragmentation bias: Relying on the provided CCTV window (08:10–09:00) without requesting longer video or access logs that could reveal prior entries/activities.
- Hindsight bias: Framing the interpretation around the alarm trigger (09:10) after the fact, possibly underemphasizing earlier signals or alternative explanations.

E. 24-hour priority response plan (assumptions, uncertainties clearly stated)

Assumptions and uncertainties
- Assumptions:
  - The incident involves a hazardous chemical exposure affecting multiple workers in or near the chemical storage area.
  - The exact chemical is unknown from the report; alarms indicate a detectable contaminant in storage or adjacent spaces.
  - Life-safety is the immediate priority; containment and evidence preservation are essential to determine cause.
- Uncertainties:
  - Whether the hazard is due to an internal storage leak, vehicle exhaust, misdelivered/unlisted chemical, or an outside source.
  - Exact chemical identity, concentration, dispersion pattern, and potential for ongoing exposure.
  - The completeness of CCTV coverage and access logs within 08:00–09:30.

24-hour priority response plan (high level)
- Immediate life-safety actions (within minutes)
  - Evacuate affected and nearby personnel from the affected zones; establish a muster point away from the building and fresh air intake.
  - Provide immediate medical assessment and first aid; activate EMS for symptomatic individuals (headache, nausea, unconsciousness).
  - If available, begin administering 100% fresh air ventilation in the affected zone only after ensuring that this will not worsen exposure or spread contaminants to uninvolved areas; avoid creating drafts that could spread vapors.
  - Secure ignition sources and shut down nonessential electrical equipment if safe to do so; avoid creating sparks in unknown chemical environments.
  - Notify internal incident command, facilities, safety, and the local emergency response if the hazard is suspected to be toxic industrial chemicals or involves potential HazMat risk.

- Containment and isolation (within 0–4 hours)
  - Isolate chemical storage area and the surrounding zones; implement unintended-dispersion controls per site protocol.
  - Begin stabilization procedures for the storage area (valve closure if safe, container integrity checks) and isolate HVAC to prevent cross-zone contamination if directed by HazMat or safety leadership.
  - Establish a hot zone perimeter with clear access control; restrict entry to trained responders only.
  - Deploy initial air monitoring and sampling teams to characterize contaminants without compromising safety.

- Evidence preservation and documentation (ongoing)
  - Preserve CT (chain of custody) for any samples collected; do not reset alarms or tamper with the scene; photograph and document the scene before any intervention that could alter evidence (except life-safety actions).
  - Capture and preserve CCTV footage beyond 08:10–09:00 window; collect access logs, door sensors, and maintenance logs.
  - Collect air and surface samples from storage, adjacent work areas, and ventilation intakes/outlets as soon as safely possible.

- Medical triage and monitoring (0–24 hours)
  - Triage exposed workers; provide decontamination if a hazardous surface exposure is suspected.
  - Obtain health history, exposure timelines, and symptom progression for all affected workers; coordinate with occupational health for follow-up.
  - Monitor for delayed effects; have a plan for escalation if symptoms worsen or new cases appear.

- Investigation planning and communication (0–24 hours)
  - Establish a formal incident command structure (IC, safety officer, public information, liaison with HazMat/Fire, and facilities).
  - Prioritize rapid, targeted air testing and inventory reconciliation to distinguish among hypotheses (H1–H4).
  - Communicate with building occupants and nearby facilities about safety status, without causing undue alarm; provide instructions on when it is safe to re-enter and what to do if symptoms appear.

- 0–4 hours tasks (immediate)
  - Initiate real-time air monitoring in and around storage and adjacent spaces; begin external environmental checks (wind, ambient air quality).
  - Start inventory/MSDS review; request any available supplier data for recent deliveries.
  - Retrieve and review all access logs and CCTV, extend monitoring window if possible.

- 4–12 hours tasks (early containment and assessment)
  - Collect and analyze air samples; determine likely contaminants and concentrations.
  - Inspect storage equipment, seals, valves, and containment systems for leaks or failures.
  - Evaluate the van and building exhaust/ventilation pathways for potential cross-contamination.
  - Convene a joint field team with safety, facilities, and potentially HazMat to interpret data and adjust containment.

- 12–24 hours tasks (risk assessment and next steps)
  - Finalize identification of the contaminant(s) and confirm the exposure pathway(s).
  - Decide on temporary or permanent re-entry conditions, decontamination requirements, and remediation steps.
  - Prepare a concise incident report with causal hypotheses, data collected, and recommended corrective actions.
  - Update training, inventory controls, and security protocols to mitigate recurrence.

Notes on communicating uncertainties
- Clearly label the current hypothesis status (e.g., “best-supported hypothesis: internal storage leak”) and track alternative hypotheses with ongoing data.
- Provide daily briefings with confidence levels for each hypothesis as new data arrive.

If you’d like, I can tailor the plan to a specific site layout, available incident command structure, and the types of chemicals typically stored in your warehouse, or adapt the data-gathering steps to align with your standard operating procedures.
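
The cell above follows a pattern that repeats for every OpenAI-compatible model in this lab: call the model, pull out the answer, then append to `competitors` and `answers`. Here is a minimal sketch of that pattern as a function. `ask_and_record` is a hypothetical name, and it assumes the client exposes `chat.completions.create` the way the OpenAI client does.

```python
def ask_and_record(client, model_name, messages, competitors, answers):
    """Call an OpenAI-compatible client and record the answer (hypothetical helper)."""
    response = client.chat.completions.create(model=model_name, messages=messages)
    answer = response.choices[0].message.content
    # Keep the two lists in step: index i of each refers to the same model
    competitors.append(model_name)
    answers.append(answer)
    return answer
```

With this in place, a cell could shrink to something like `answer = ask_and_record(openai, "gpt-5-nano", messages, competitors, answers)` followed by `display(Markdown(answer))`. Appending to both lists in one place makes it harder for them to drift out of sync if a cell fails halfway.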

In [None]:
# Anthropic has a slightly different API, and the max_tokens parameter is required

model_name = "claude-sonnet-4-5"

claude = Anthropic()
response = claude.messages.create(model=model_name, messages=messages, max_tokens=1000)
answer = response.content[0].text

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)
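
Two things are worth noticing in the Anthropic cell above: the response holds a list of content blocks rather than a single message, and with `max_tokens=1000` a long answer may be cut short. Here is a minimal sketch of handling both. `extract_text` and `was_truncated` are hypothetical helper names; they rely on the `type` and `text` attributes of text blocks and on `stop_reason` being `"max_tokens"` when the limit is hit.

```python
def extract_text(response):
    """Join the text of every text block in the response (hypothetical helper)."""
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )


def was_truncated(response):
    """True when generation stopped because the max_tokens limit was reached."""
    return getattr(response, "stop_reason", None) == "max_tokens"
```

For a plain text reply, `extract_text(response)` gives the same result as `response.content[0].text`. If `was_truncated(response)` comes back `True`, try raising `max_tokens` so the comparison with the other models stays fair.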

In [33]:
print(request)

Please come up with a challenging, nuanced question that I can ask a number of LLMs to evaluate their intelligence. Answer only with the question, no explanation.


In [9]:
# Accessing Claude models through openrouter.ai
# OpenAI was already imported in the first cell, so we can reuse it here
model_name = "anthropic/claude-sonnet-4.5"
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=openrouter_api_key,
)

completion = client.chat.completions.create(
    extra_headers={
        "HTTP-Referer": "https://openrouter.ai",  # Optional. Site URL for rankings on openrouter.ai.
        "X-Title": "OpenRouter",  # Optional. Site title for rankings on openrouter.ai.
    },
    model=model_name,
    messages=messages,
)
answer = completion.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


# Incident Analysis Report

## A. Primary Hypothesis

**Best Hypothesis:** Van exhaust fumes containing carbon monoxide (CO) entered the warehouse ventilation system when parked near chemical storage intake vents.

**Probability Estimate:** 65%

**Causal Timeline:**
1. 08:12 - Van parks near chemical storage with engine idling or poor ventilation positioning
2. 08:12-08:45 - CO/exhaust accumulates in chemical storage area via HVAC intake or building envelope gaps
3. 08:45 - Three employees in/near chemical storage exhibit classic CO poisoning symptoms (headache, nausea)
4. 09:10 - Chemical storage alarm triggers (possibly oxygen depletion or CO sensor)
5. 09:40 - One employee loses consciousness from continued exposure (severe CO poisoning)

---

## B. Alternative Hypotheses

### Alternative 1: Legitimate Chemical Leak from Storage (20% probability)

**Supporting Evidence:**
- Chemical storage alarm activation at 09:10
- Progressive symptom severity (headache→unconsciousness)
- Symptoms consistent with many toxic gas exposures

**Contradicting Evidence:**
- Timing correlation with van arrival is unlikely to be coincidental
- No visible breach mentioned
- CCTV shows no human activity that might disturb containers
- Alarm triggered 58 minutes after first symptoms (unusual delay for direct leak)

### Alternative 2: Concealed Hazardous Material in Van Delivery (10% probability)

**Supporting Evidence:**
- Symptoms began shortly after van arrival
- Manifest discrepancy could indicate smuggling/mislabeling
- Proximity to chemical storage area
- Progressive exposure pattern

**Contradicting Evidence:**
- Office supplies rarely off-gas severely toxic substances
- Would require leak/breach during specific 30-minute window
- Chemical storage alarm (not general area alarm) triggered
- No mention of van inspection revealing damage

### Alternative 3: Deliberate Chemical Release/Sabotage (5% probability)

**Supporting Evidence:**
- Controlled timing of events
- Chemical storage specifically affected
- No visible entry could suggest pre-positioned device

**Contradicting Evidence:**
- No evidence of entry or tampering
- Van arrival timing would be elaborate misdirection
- Multiple employees affected suggests area contamination, not targeted attack
- Unconscious employee suggests accidental exposure pattern rather than targeted attack

---

## C. Top Five Investigative Priorities (Ranked)

### 1. **Atmospheric Testing of Chemical Storage Area and Van**
**Why:** Directly identifies contaminant(s). Test for CO, CO₂, volatile organic compounds (VOCs), and known warehouse chemicals.
**Discriminatory Power:** Will immediately distinguish between exhaust-related (Alternative 1) vs. chemical leak (Alternative 2) vs. van cargo (Alternative 3). If elevated CO correlates with van position, confirms primary hypothesis.

### 2. **Blood/Biomarker Analysis of Affected Employees**
**Why:** Carboxyhemoglobin levels confirm CO poisoning; other biomarkers identify chemical exposure.
**Discriminatory Power:** CO-hemoglobin >10% strongly supports primary hypothesis and Alternative 1; other toxicology results point to Alternatives 2-4. Time-sensitive (CO-hemoglobin has ~4-6 hour half-life).

### 3. **HVAC System Inspection and Airflow Mapping**
**Why:** Identifies pathway for contaminant distribution and explains why chemical storage alarm triggered.
**Discriminatory Power:** If intake vents near van parking position feed chemical storage area, strongly supports primary hypothesis. Recirculation patterns explain employee exposure locations.

### 4. **Van Inspection: Engine Status, Cargo Examination, Thermal Imaging**
**Why:** Determines if engine was running, fuel type, exhaust system integrity, and verifies cargo contents.
**Discriminatory Power:** Engine temperature residue/witnesses confirm idling (supports primary); cargo discrepancies support Alternative 3; van damage supports Alternative 2.

### 5. **Chemical Storage Inventory Audit and Container Integrity Check**
**Why:** Identifies any breached containers, missing materials, or unusual configurations.
**Discriminatory Power:** Confirmed leak supports Alternative 2; inventory discrepancies suggest Alternative 4; intact storage contradicts Alternative 2.

---

## D. Cognitive Biases and Failure Modes

### 1. **Anchoring Bias**
Fixating on "chemical storage" in the report may cause overweighting of chemical leak scenarios while undervaluing mundane explanations (vehicle exhaust).

### 2. **Availability Heuristic**
Recent incidents or dramatic scenarios (sabotage, exotic chemical exposure) may seem more probable than they statistically are.

### 3. **Confirmation Bias**
Once hypothesis is selected, tendency to interpret ambiguous evidence (alarm timing, van presence) as supporting rather than testing that hypothesis.

### 4. **Correlation-Causation Confusion**
Van arrival and symptoms correlate temporally but multiple causal pathways exist. The coincidence could be genuine rather than causal.

### 5. **Outcome Bias**
Severe outcome (unconsciousness) may drive assumption of exotic/dramatic cause rather than common hazards (CO is leading cause of poisoning deaths).

### 6. **Neglect of Base Rates**
Industrial CO poisoning from vehicles is relatively common; deliberate chemical sabotage is extremely rare. Prior probabilities should weight accordingly.

### 7. **Search Satisfaction**
Finding one explanation (chemical leak) may terminate search prematurely before considering mechanical/environmental factors.

---

## E. 24-Hour Priority Response Plan

### Assumptions
- One unconscious employee is receiving emergency medical care
- Facility is operational but area is accessible
- Standard industrial safety equipment available
- Incident occurred today, ~2-4 hours ago based on report timing
- No confirmation yet of contaminant type

### Uncertainties
- Current atmospheric conditions in affected area
- Number of additional exposed employees
- Van current location and status
- Whether contaminant release is ongoing
- Building ventilation system status

---

### IMMEDIATE ACTIONS (Hour 0-2)

**Life Safety (Priority 1):**
- ✓ Evacuate chemical storage area and adjacent zones (100m radius)
- ✓ Account for all personnel; medical screening for anyone in building 08:00-10:00
- ✓ Deploy first responders with SCBA and multi-gas detectors
- ✓ Rush blood samples from all symptomatic employees for CO-hemoglobin and comprehensive toxicology panel
- ✓ Obtain clinical histories: exact timelines, locations, activities

**Containment (Priority 2):**
- ✓ Shut down HVAC system serving chemical storage area to prevent further distribution
- ✓ Secure van in place; prohibit engine start
- ✓ Establish hot/warm/cold zones with access control
- ✓ Deploy atmospheric monitoring at chemical storage, van location, HVAC intakes, and employee exposure locations

**Evidence Preservation (Priority 3):**
- ✓ Photograph van position, chemical storage layout, and HVAC configuration
- ✓ Secure CCTV footage (expand window to 07:00-11:00)
- ✓ Preserve van manifest and delivery documentation
- ✓ Document atmospheric readings with timestamps and GPS coordinates

---

### INVESTIGATIVE PHASE (Hour 2-8)

**Testing & Analysis:**
- Hour 2-3: Atmospheric survey results analyzed; safe entry determined
- Hour 3-4: If CO detected, inspect van exhaust system and position relative to air intakes
- Hour 3-4: If other chemicals detected, begin chemical storage inventory
- Hour 4-6: HVAC system tracer gas study to map airflow patterns
- Hour 5-8: Van cargo detailed inspection (X-ray if discrepancies suspected)
- Hour 6-8: Preliminary toxicology results from employees

**Interviews:**
- Van driver: arrival time, parking duration, engine status, observations
- Three symptomatic employees: precise locations, timeline, symptom progression
- Warehouse staff: HVAC operation, van parking protocols, chemical storage access
- Delivery company: cargo verification, vehicle maintenance records

---

### ANALYSIS PHASE (Hour 8-16)

**Data Integration:**
- Correlate atmospheric data with employee locations and symptom timing
- Map airflow patterns to contamination distribution
- Reconcile toxicology results with detected atmospheric contaminants
- Evaluate van inspection findings against manifest

**Hypothesis Refinement:**
- Calculate probability updates based on test results
- Identify remaining explanatory gaps
- Design targeted tests for unresolved questions

---

### REMEDIATION PLANNING (Hour 16-24)

**If Primary Hypothesis Confirmed (CO from van):**
- Implement vehicle idling policies near building air intakes
- Mark exclusion zones for delivery vehicles
- Install CO monitors in vulnerable areas
- Ventilate affected areas and confirm clearance
- Return to operations protocol

**If Alternative Hypothesis Confirmed:**
- Chemical leak: Engage hazmat remediation, inventory security protocols
- Van cargo issue: Regulatory notification, delivery process review, driver/company investigation
- Sabotage: Law enforcement notification, security enhancement, facility lockdown

**Documentation:**
- Incident reconstruction report with timeline
- Root cause analysis
- Corrective and preventive actions (CAPA)
- Regulatory notifications as required (OSHA, EPA, DOT depending on findings)

---

### Key Metrics for 24-Hour Success
1. All personnel medically cleared or under appropriate treatment
2. Contaminant identified with >90% confidence
3. Source controlled or confirmed inactive
4. Atmospheric clearance confirmed
5. Evidence chain of custody maintained for all samples
6. Preliminary root cause identified
7. Interim controls implemented to prevent recurrence

**Critical Decision Point (Hour 4):** If atmospheric/toxicology data is inconclusive, expand testing scope and extend evacuation until positive identification achieved. Life safety trumps operational continuity.
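
The OpenRouter cell above and the Gemini cell that follows both reuse the `OpenAI` client and only change the `base_url` and the API key. Here is a minimal sketch of keeping those settings in one place. `OPENAI_COMPATIBLE` and `client_kwargs` are hypothetical names; the two base URLs and environment variable names are the ones used in this notebook.

```python
import os

# Base URLs and key names copied from the cells in this notebook
OPENAI_COMPATIBLE = {
    "openrouter": {
        "base_url": "https://openrouter.ai/api/v1",
        "env_var": "OPENROUTER_API_KEY",
    },
    "gemini": {
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "env_var": "GOOGLE_API_KEY",
    },
}


def client_kwargs(provider, env=None):
    """Return the keyword arguments for OpenAI(...) for a provider (hypothetical helper)."""
    env = os.environ if env is None else env
    spec = OPENAI_COMPATIBLE[provider]
    api_key = env.get(spec["env_var"])
    if not api_key:
        # Fail early with a clear message instead of a confusing auth error later
        raise ValueError(f"{spec['env_var']} is not set")
    return {"base_url": spec["base_url"], "api_key": api_key}
```

A client could then be created with `OpenAI(**client_kwargs("openrouter"))`. Other providers that offer an OpenAI-compatible endpoint could be added to the dict in the same way, once you've checked their base URL in their documentation.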

In [22]:
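# Gemini exposes an OpenAI-compatible endpoint, so we reuse the OpenAI client with a different base_url
gemini = OpenAI(api_key=google_api_key, base_url="https://generativelanguage.googleapis.com/v1beta/openai/")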
model_name = "gemini-3-pro-preview"

response = gemini.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Here is the analysis based on the provided incident report.

### (A) Best Hypothesis: Carbon Monoxide (CO) Infiltration from Idling Vehicle
**Probability:** 65%

The most likely explanation is that the delivery van, left idling near the chemical storage area (likely near an air intake or loading dock door), released exhaust fumes containing Carbon Monoxide into the facility.

**Causal Timeline:**
1.  **08:12:** Van arrives and parks near the chemical storage/air intake; driver leaves engine idling (or faulty exhaust system present).
2.  **08:12–08:45:** Exhaust fumes are drawn into the building ventilation or drift through gaps, accumulating in the workspace.
3.  **08:45:** CO levels reach symptomatic thresholds (approx. 70–100 ppm); three sensitive employees report early symptoms (headache, nausea).
4.  **09:10:** CO gas migrates into the chemical storage room or concentrations rise high enough to trigger cross-sensitive electrochemical sensors in the chemical alarm system.
5.  **09:40:** Continued exposure raises blood carboxyhemoglobin levels, causing the remaining employee (likely closer to the source or with higher exertion) to lose consciousness.

***

### (B) Alternative Hypotheses

**1. Spontaneous Failure of Chemical Containment (Internal Leak)**
*   **Probability:** 20%
*   **Hypothesis:** A container inside the chemical storage failed due to age, heat, or pressure, releasing volatile organic compounds (VOCs).
*   **Supporting Evidence:** The chemical storage alarm triggered; symptoms match toxic inhalation.
*   **Contradicting Evidence:** Symptoms appeared (08:45) *before* the alarm triggered (09:10), suggesting the source was external to the alarmed room or the leak was too slow to trigger the alarm immediately but fast enough to cause symptoms (unlikely combination); CCTV proves no human caused the spill.

**2. Undeclared Hazardous Material Leak (Van Cargo)**
*   **Probability:** 10%
*   **Hypothesis:** The van’s manifest is fraudulent or incorrect; it is carrying volatile chemicals that began leaking upon arrival.
*   **Supporting Evidence:** High temporal correlation between van arrival and symptom onset; proximity to the affected area.
*   **Contradicting Evidence:** Manifest lists "office supplies"; requires two failures (leaking cargo + false documentation); implies the gas penetrated the building envelope rapidly without being seen (smoke/vapor).

**3. HVAC Failure / "Sick Building" Event (Coincidental)**
*   **Probability:** 5%
*   **Hypothesis:** A coincidental failure in the building’s HVAC system recirculated sewer gas or stagnant air, unrelated to the van.
*   **Supporting Evidence:** Explains symptoms in multiple employees; explains why no one entered the storage room.
*   **Contradicting Evidence:** Does not explain the specific triggering of the chemical storage alarm; the severity (unconsciousness) is high for typical HVAC failures; ignores the strong correlation with the van's arrival.

***

### (C) Top Five Additional Data/Tests

1.  **Blood Gas Analysis (Carboxyhemoglobin levels) of Victims:**
    *   *Why:* This is the definitive biological differentiator. High COHb confirms Hypothesis A (Exhaust). Presence of specific toxins confirms Hypothesis B or 2.
2.  **Specific Telemetry/Logs from the Chemical Alarm:**
    *   *Why:* Did it detect a specific substance (e.g., Chlorine, Ammonia) or simply "Low Oxygen" / "Combustible Gas"? A generic or cross-sensitive sensor supports the CO theory; a specific chemical match supports the internal leak.
3.  **Atmospheric Monitoring (Multi-gas Meter) at the Intake/Loading Bay:**
    *   *Why:* Testing the air immediately outside the storage room near the van for CO, O2 levels, and VOCs will confirm if the source is external (the van) or internal.
4.  **Visual Inspection of the Delivery Van:**
    *   *Why:* Check if the engine is running/warm, if the tailpipe is directed toward a vent, or if there are liquids dripping from the cargo area (checking Hypothesis 2).
5.  **Inventory Audit of Chemical Storage (via Remote/Hazmat Entry):**
    *   *Why:* To visually confirm if any internal drums/tanks have burst. This definitively rules Hypothesis B in or out.

***

### (D) Cognitive Biases and Failure Modes

*   **Anchoring Bias:** The term "Chemical Storage" and the "Chemical Alarm" strongly anchor the analyst to an internal chemical spill. This may lead to overlooking "mundane" external threats like vehicle exhaust.
*   **Correlation vs. Causation:** Assuming the van is the cause solely because of the timeline (Post hoc ergo propter hoc). While likely, it could distract from a coincidental but deadly internal pipe rupture.
*   **Confirmation Bias:** If the analyst believes the manifest is true ("innocuous supplies"), they may ignore the van entirely. Conversely, if they distrust the driver, they may assume a chemical attack/smuggling without evidence.
*   **Representation Heuristic:** "Unconscious employee" + "Alarm" typically represents a toxic spill in mental models, potentially delaying the diagnosis of simple asphyxiation or CO poisoning, which requires different medical treatment (pure Oxygen vs. specific antidotes).

***

### (E) 24-Hour Priority Response Plan

**Assumptions:** The environment is IDLH (Immediately Dangerous to Life and Health); the unconscious employee is alive but critical; the nature of the gas is currently unknown but suspected to be an inhalation hazard.

**Phase 1: Immediate Safety & Rescue (Hours 0–2)**
1.  **Evacuation:** Immediate total building evacuation upwind of the van/storage area.
2.  **Rescue Entry:** Fire/EMS to perform rescue of the unconscious employee using **SCBA (Self-Contained Breathing Apparatus)** gear. *Do not allow entry without respiratory protection.*
3.  **Medical Triage:** Administer high-flow oxygen to all symptomatic employees immediately (treating for potential CO or hypoxia) while awaiting tox screen.
4.  **Source Control (Provisional):** If the van is running, instruct emergency responders to shut off the engine (wearing PPE). If the van is off, seal the building air intakes.

**Phase 2: Containment & Investigation (Hours 2–6)**
5.  **Hazmat Sweep:** Hazmat team to enter with multi-gas detectors. Priority: Measure CO, O2, and VOCs in the storage room and near the loading dock.
6.  **Secure the Van:** Cordon off the delivery van. Do not open the cargo hold without Hazmat backup (to rule out Hypothesis 2).
7.  **Data Retrieval:** Pull the alarm logs to identify exactly *what* triggered the sensor at 09:10.

**Phase 3: Remediation & Preservation (Hours 6–24)**
8.  **Ventilation:** Once the source is identified and stopped, mechanically ventilate the building.
9.  **Evidence Preservation:** Secure CCTV footage, the van driver’s log, and the chemical storage inventory records for legal/insurance review.
10. **Debrief:** Match medical findings (blood work) with atmospheric readings to confirm the hypothesis. If CO is confirmed, install CO detectors at air intakes to prevent recurrence. If chemical leak is confirmed, review storage maintenance records.

In [11]:
deepseek = OpenAI(api_key=deepseek_api_key, base_url="https://api.deepseek.com/v1")
model_name = "deepseek-chat"

response = deepseek.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

Based solely on the provided report, here is a structured analysis.

---

### **(A) Single Best Hypothesis**

**Hypothesis:** The delivery van was carrying or emitting a toxic substance (e.g., a gas, vapor, or aerosol) that drifted from its parking location near the chemical storage, causing the initial symptoms. The chemical-storage alarm later triggered due to either a coincidental leak from within (possibly unrelated) or due to the same external contaminant being drawn into the storage ventilation/sensors. The unconscious employee suggests progressive or concentrated exposure.

**Causal Timeline:**
1. **08:12** – Van arrives, parks near chemical storage. Unknown toxic substance begins to disperse (e.g., from its cargo, exhaust, or a spill/leak on the van).
2. **08:12–08:45** – Employees in vicinity inhale contaminant; symptoms develop.
3. **08:45** – Three employees report headache/nausea.
4. **09:10** – Chemical-storage alarm triggers, possibly due to contaminant infiltrating storage air intake or a separate internal leak.
5. **09:40** – One employee found unconscious (worsening exposure or individual susceptibility).

**Probability Estimate:** 45% — Plausible and parsimonious, but relies on an unconfirmed contaminant from the van.

---

### **(B) Three Alternative Hypotheses**

**1. Hypothesis: Coincidental leak from chemical storage starting before 08:12, with delayed alarm.**
- **Probability:** 30%
- **Supporting Evidence:** Symptoms align with chemical exposure; alarm eventually triggered; van’s manifest shows innocuous supplies.
- **Contradicting Evidence:** CCTV shows no one entered storage (so leak would be spontaneous or from earlier activity); symptoms appeared *after* van arrival (temporal link); van parked “near” storage could be red herring.

**2. Hypothesis: “Innocuous” van cargo underwent a hazardous reaction (e.g., off-gassing, decomposition) or was mislabeled/hidden.**
- **Probability:** 15%
- **Supporting Evidence:** Symptoms began after van arrival; manifest could be inaccurate or incomplete; office supplies could include aerosols, adhesives, or batteries that release toxins if damaged.
- **Contradicting Evidence:** No report of visible spill or odor from van; unconsciousness suggests strong exposure possibly requiring more than typical office supply off-gassing.

**3. Hypothesis: Mass psychogenic illness or unrelated medical event coinciding with a false chemical-storage alarm.**
- **Probability:** 5%
- **Supporting Evidence:** No one seen entering storage; alarm could be faulty or triggered by dust/other; symptoms (headache, nausea) are nonspecific.
- **Contradicting Evidence:** Unconscious employee is a serious objective finding that strongly suggests physiological toxin exposure; temporal clustering around van arrival and alarm is unlikely to be purely coincidental.

**Remaining 5%** to other possibilities (e.g., van exhaust fumes directed into HVAC intake, unrelated food poisoning plus independent alarm).

---

### **(C) Top Five Additional Data/Tests (Ranked)**

1. **Immediate air monitoring data** (from portable gas detectors) around the van, chemical storage, and affected employees’ work areas — identifies specific toxins and gradient.
2. **Medical evaluation/toxicology** of affected employees (especially unconscious one) — confirms chemical exposure vs. other illness, identifies toxin class.
3. **Review of van interior/cargo** and driver interview — checks for hidden chemicals, spills, or unusual conditions.
4. **HVAC system logs and intake locations** relative to van parking — determines if airborne contaminants could have been drawn into building or storage.
5. **Chemical storage inventory logs and recent activity** (pre-08:10) — checks for missing items, unstable compounds, or leaks that began earlier.

---

### **(D) Cognitive Biases & Failure Modes**

- **Confirmation bias:** Focusing on the chemical storage because the alarm went off, ignoring the van as source.
- **Temporal proximity bias:** Assuming the van caused everything simply because it arrived before symptoms.
- **Anchoring:** Fixating on the first hypothesis (e.g., “chemical storage leak”) and not updating given CCTV evidence.
- **Overlooking negative evidence:** Dismissing the “innocuous” manifest too quickly or too completely.
- **Single-cause fallacy:** Assuming all events (symptoms, alarm, unconsciousness) must have one cause; could be two separate incidents (van fumes + unrelated alarm).
- **Availability bias:** Previous incidents involving chemical storage might lead to over-weighting that hypothesis.

---

### **(E) 24‑Hour Priority Response Plan**

**Assumptions:**  
- Unconscious employee has been removed and received emergency medical care.  
- Area around van and warehouse is currently evacuated but not yet secured.  
- Chemical storage alarm may still be active; storage integrity unknown.  
- Authorities (HAZMAT, OSHA, police) have been notified but may not be on scene yet.

**Uncertainties:**  
- Identity and concentration of toxic substance(s).  
- Whether source is ongoing.  
- Whether contamination is localized or spread through HVAC.  
- Accuracy of van manifest and driver knowledge.

**Plan:**

**First Hour:**  
1. **Life‑Safety:** Ensure all employees accounted for; isolate symptomatic individuals for decontamination triage and medical transport. Provide first responders with timeline and locations.  
2. **Containment:** Establish hot/warm/cold zones based on initial air monitoring if possible. Keep van and chemical storage area isolated; shut down HVAC to prevent spread.  
3. **Evidence Preservation:** Secure CCTV footage, van keys, shipping documents, employee shift logs. Restrict access to entire scene.

**Hours 2–6:**  
4. **HAZMAT entry** to conduct air sampling in storage and around van. Check van cargo vs. manifest.  
5. **Medical liaison** obtains preliminary diagnosis/toxicology clues from hospital.  
6. **Interview driver and affected employees** (if medically able) about odors, symptoms onset, etc.

**Hours 7–24:**  
7. **Forensic analysis** of samples; verify chemical storage integrity and inventory.  
8. **Regulatory/legal reporting** initiated.  
9. **Preliminary root‑cause analysis** with all gathered data to decide on reopening facility or further investigation.  
10. **Communicate** findings to stakeholders with emphasis on facts vs. speculation.
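
The cells above repeat one pattern: call `chat.completions.create` on an OpenAI-compatible client, then read `choices[0].message.content`. A minimal sketch of a helper that wraps this pattern follows. The name `ask_model` and the stub client are hypothetical and not part of the original notebook; the stub exists only so the snippet runs without an API key.

```python
from types import SimpleNamespace


def ask_model(client, model_name, messages):
    """Send messages to an OpenAI-compatible client and return the reply text.

    `client` is assumed to expose chat.completions.create(), as the
    OpenAI client objects used in this notebook do.
    """
    response = client.chat.completions.create(model=model_name, messages=messages)
    return response.choices[0].message.content


class StubCompletions:
    """Stand-in for client.chat.completions, for offline illustration only."""

    def create(self, model, messages):
        text = f"[{model}] echo: {messages[-1]['content']}"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Hypothetical stub that mimics the shape of a real client
stub_client = SimpleNamespace(chat=SimpleNamespace(completions=StubCompletions()))

demo_messages = [{"role": "user", "content": "ping"}]
demo_answer = ask_model(stub_client, "demo-model", demo_messages)
print(demo_answer)
```

With a real client the call would look like `answer = ask_model(deepseek, "deepseek-chat", messages)`, followed by the same `competitors.append` and `answers.append` lines as before.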

In [None]:
# Updated with the latest Open Source model from OpenAI

groq = OpenAI(api_key=groq_api_key, base_url="https://api.groq.com/openai/v1")
model_name = "openai/gpt-oss-120b"

response = groq.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


In [12]:
# Accessing OpenAI's open-source GPT model through openrouter.ai
from openai import OpenAI
model_name = "openai/gpt-oss-120b"
client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key= openrouter_api_key,
)

completion = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "https://openrouter.ai", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "OpenRouter", # Optional. Site title for rankings on openrouter.ai.
  },
  model=model_name,
  messages=messages
)
answer = completion.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)

**A. Single‑best hypothesis (≈ 65 % probability)**  

| Time (hh:mm) | Event (causal link) |
|--------------|----------------------|
| 08:12 – 08:15 | Delivery van parks next to chemical‑storage area. The van’s trailer is sealed but its engine or an attached “auxiliary heater” is running, venting a volatile organic compound (VOC) or low‑level toxic gas (e.g., carbon monoxide, hydrogen sulfide, or a solvent vapour) into the adjacent warehouse air. |
| 08:20 – 08:40 | Gas diffuses under the slight negative pressure of the storage building and accumulates at floor level (heavier‑than‑air gases settle). Three employees entering the aisle near the storage notice headache & nausea – classic early symptoms of inhalation exposure. |
| 08:45 – 09:00 | Concentration climbs; the warehouse’s fixed‑point chemical‑storage alarm (set to trigger at a predefined vapour‑density threshold) finally activates at 09:10. |
| 09:10 – 09:35 | Personnel evacuate but one worker, already compromised, collapses unconscious at 09:40. |
| 09:40 – 10:00 | No door activity is seen on CCTV because the source is external (the van) and the storage doors remain closed. |

*Why this fits the data*  

- **Van presence** is the only new element in the timeline; its manifest of “innocuous office supplies” does not preclude a hidden gas source.  
- **Rapid onset of symptoms** (within ~30 min) is typical of acute inhalation, not of a slow chemical leak from sealed containers.  
- **No entry/exit** to the storage area eliminates an internal breach; the alarm triggers **after** symptoms, consistent with a rising external concentration.  

*Uncertainties*  

- Exact nature of the emitting device (engine, heater, hidden container) is unknown.  
- The chemical‑storage alarm type (CO, H₂S, VOC) is not specified, limiting precise toxicology.  

---

**B. Three alternative hypotheses**

| # | Hypothesis | Probability | Supporting evidence (from report) | Contradicting evidence |
|---|------------|-------------|-----------------------------------|------------------------|
| 1 | **Intentional release of a toxic agent hidden in the delivery (e.g., a covert “sick‑bag” device).** | 15 % | • Van arrives exactly when symptoms start.<br>• Manifest lists “innocuous” items – could be a cover. | • CCTV shows no unloading activity inside storage; a concealed device would likely be placed inside or near the storage, not just parked outside. |
| 2 | **Spontaneous leak from the chemical‑storage itself (e.g., a valve failure) that was not captured on CCTV because the leak was through a vent or pipe.** | 12 % | • Alarm triggered (indicates an on‑site hazard).<br>• Symptoms match exposure to the stored chemicals. | • No door traffic; however, a leak could occur without entry, but the timeline shows symptoms **before** the alarm, which is atypical for a sealed leak that usually triggers the alarm first. |
| 3 | **Coincidental unrelated medical event (e.g., food‑borne illness) that happened to coincide with the van’s arrival.** | 8 % | • Headache and nausea are non‑specific symptoms.<br>• The van’s manifest truly contains only office supplies. | • The clustering of symptoms within minutes of each other, the alarm activation, and an unconscious employee strongly point to an environmental trigger rather than a random illness. |

*All probabilities sum to ~100 % (65 + 15 + 12 + 8 = 100).*

---

**C. Top five data points / tests to discriminate among the hypotheses**

| Rank | Data / Test | Rationale |
|------|------------|-----------|
| 1 | **Portable multi‑gas detector sweep of the van exterior, parking area, and immediate vicinity of the storage building** (CO, H₂S, VOCs, NH₃, etc.) | Directly confirms or rules out an *external* gas source (Hypothesis A). A positive reading that matches the alarm sensor type would strongly support A; a negative reading would shift weight to internal leak (B) or concealed device (1). |
| 2 | **Analysis of the chemical‑storage alarm log** (type of sensor, threshold, exact time of trigger, alarm‑history) | Determines *what* the alarm sensed (CO vs. H₂S vs. VOC). If it matches a known toxic agent that can be emitted by a vehicle (e.g., CO), it backs A; if it matches a stored chemical (e.g., chlorine), it backs B. |
| 3 | **Inspection of the delivery van** (engine off/on, fuel‑type, presence of portable heater, any sealed containers in cargo area) and review of the driver’s manifest & CCTV from loading dock | Looks for hidden or inadvertent sources (covert device or faulty equipment). Finding a running engine/ heater would confirm A; a concealed canister would point to 1. |
| 4 | **Air‑sampling and toxicology of the three affected employees** (blood COHb, H₂S metabolites, VOC metabolites) | Provides *biological* evidence of the inhaled agent. A high COHb would seal A; presence of chlorine metabolites would support B; unusual toxins would raise suspicion of 1. |
| 5 | **Review of storage‑area ventilation and pressure differentials** (duct flow, negative pressure status) | If the building is under negative pressure, external gases can be drawn in, reinforcing A. If the system is neutral or positive, an internal leak (B) becomes more plausible. |

---

**D. Potential cognitive biases / failure modes**

| Bias / Failure mode | How it could mislead the analyst |
|---------------------|---------------------------------|
| **Anchoring** – fixing on the first obvious element (the van) and neglecting other possibilities. | May cause over‑confidence in hypothesis A before testing the alarm’s nature. |
| **Confirmation bias** – seeking evidence that supports the preferred hypothesis (e.g., “the van must be the cause”). | Could lead to ignoring contradictory signs such as a known leak in the storage system. |
| **Availability heuristic** – recent high‑profile chemical‑release incidents may make the analyst over‑estimate intentional sabotage (hypothesis 1). | May inflate the probability of a covert attack despite limited evidence. |
| **Groupthink / diffusion of responsibility** – if multiple responders assume “the alarm will handle it,” vital evidence (e.g., gas samples) may not be collected. | Reduces thoroughness of data collection. |
| **Premature closure** – deciding the case is solved once symptoms are explained, preventing further investigation (e.g., missing a secondary source). | Could leave a lingering hazard unaddressed. |
| **Risk‑homeostasis** – assuming that because workers evacuated after the alarm, the situation is safe, leading to delayed containment actions. | May allow continued exposure if the source persists. |

---

**E. 24‑hour priority response plan (life‑safety > containment > evidence‑preservation)**  

*Assumptions*  
- The warehouse has a functional fire‑suppression system but no dedicated hazmat ventilation.  
- Emergency services (EMS, HazMat) are within 30 min response time.  
- The chemical‑storage alarm is known to be a *fixed‑point* CO/H₂S/VOC detector (type to be confirmed).  
- The van is still on site (or can be quickly located).

*Uncertainties*  
- Exact toxic agent.  
- Whether the source continues to emit after engine shutdown.  

| Timeframe | Action | Responsible party | Rationale / Expected Outcome |
|-----------|--------|-------------------|------------------------------|
| **0–1 h** | **Immediate evacuation** of all personnel from the warehouse and adjacent zones; establish a safety perimeter (≥50 m). | Site Safety Officer & Facility Manager | Prevent further exposure; protect life. |
|           | **EMS triage** of the three symptomatic employees on‑site; transport the unconscious worker to nearest hospital with “possible toxic inhalation” pre‑alert. | On‑site First‑Aid Team & EMS | Life‑saving medical care. |
|           | **Shut off** all ventilation fans & HVAC that could distribute airborne contaminants; set the building to “seal” mode. | Facilities Engineer | Contain any airborne plume. |
| **1–3 h** | **Deploy portable multi‑gas detector** team (trained HazMat technicians) to sweep the van, parking area, and building interior. Record concentrations and sensor type. | External HazMat Unit | Identify the agent (supports A vs B vs 1). |
|           | **Secure the delivery van**: stop engine, disconnect any auxiliary power, and, if safe, collect the vehicle for forensic examination. Photograph and video all exterior components. | Security Lead & HazMat Unit | Preserve potential evidence; stop ongoing emission. |
| **3–6 h** | **Collect and submit** air‑samples (canisters) from inside the storage area and the adjacent aisle for laboratory analysis. | Environmental Health & Safety (EHS) Lab | Definitive chemical identification. |
|           | **Obtain** the alarm’s event log and sensor specification (type, threshold). | Facility Control Room Operator | Correlate alarm trigger with agent detected. |
| **6–12 h** | **Medical evaluation** of all workers (blood COHb, H₂S biomarkers, VOC metabolites). | Occupational Health Provider | Confirm exposure and guide treatment. |
|           | **Inspect** the chemical‑storage system (valves, vent lines, pressure gauges) for integrity; look for any signs of a leak or pressure loss. | Maintenance Team | Rule out internal leak (hypothesis B). |
| **12–18 h** | **If gas is confirmed external** (e.g., CO), order the van’s removal to a secure, ventilated area and conduct a full forensic search for hidden containers or tampering. | Logistics & HazMat | Preserve evidence for potential criminal investigation. |
|           | **If internal leak** is identified, isolate the storage area, engage containment crews to stop the leak, and begin de‑contamination. | HazMat Containment Team | Prevent further release. |
| **18–24 h** | **Brief senior management** with findings, probability updates, and recommended next steps (e.g., law‑enforcement involvement, long‑term monitoring). | Incident Commander | Align decision‑making with updated risk picture. |
|           | **Document** all actions, timestamps, photographs, and chain‑of‑custody for evidence; store data in a secure, read‑only repository. | Incident Documentation Officer | Preserve evidentiary integrity for later investigation. |
|           | **Restore** limited safe access to unaffected portions of the warehouse only after clearance from HazMat and medical teams. | Facility Operations | Resume essential business while maintaining safety. |

*Key priorities*  

1. **Life‑safety** – rapid evacuation, medical care, and isolation of the potential plume.  
2. **Containment** – stop any continuing emission (engine off, ventilation shut) and secure the source.  
3. **Evidence preservation** – photograph, video, and lock‑down the van, alarm panel, and storage area before any de‑contamination or repair work.  

*Contingency*: If portable detection indicates a highly lethal agent (e.g., chlorine, phosgene), immediately **declare a full hazardous‑materials emergency**, request specialized hazmat teams, and consider **large‑scale shelter‑in‑place** for neighboring facilities.  

---  

**Bottom line:** The most parsimonious explanation, given the timeline and the lack of internal movement, is that a toxic gas emitted from the delivery van (or a device attached to it) infiltrated the warehouse, causing acute symptoms and the alarm. The outlined data collection, bias awareness, and 24‑hour response plan provide a systematic path to confirm or refute this hypothesis while safeguarding personnel, limiting the hazard, and protecting critical evidence.

## For the next cell, we will use Ollama

Ollama runs a local web service that exposes an OpenAI-compatible endpoint,  
and runs models locally using high-performance C++ code.

If you don't have Ollama, install it by visiting https://ollama.com, pressing Download and following the instructions.

After it's installed, you should be able to visit http://localhost:11434 and see the message "Ollama is running"
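
The same check can be done from Python rather than the browser. This is a minimal sketch using only the standard library; the function name `is_ollama_running` is hypothetical, and it assumes the default address shown above.

```python
import http.client
import urllib.request


def is_ollama_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if the Ollama root endpoint replies with its status message."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed URL or a non-HTTP reply:
        # treat all of these as "not running"
        return False
    return "Ollama is running" in body


ollama_ok = is_ollama_running()
print("Ollama is running" if ollama_ok else "Ollama is not reachable; try `ollama serve`")
```

Returning `False` instead of raising keeps the check safe to run in a notebook cell before the Ollama call.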

You might need to restart Cursor (and maybe reboot). Then open a Terminal (control+\`) and run `ollama serve`

Useful Ollama commands (run these in the terminal, or with an exclamation mark in this notebook):

`ollama pull <model_name>` downloads a model locally  
`ollama ls` lists all the models you've downloaded  
`ollama rm <model_name>` deletes the specified model from your downloads

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Super important - ignore me at your peril!</h2>
            <span style="color:#ff7800;">The model called <b>llama3.3</b> is FAR too large for home computers - it's not intended for personal computing and will consume all your resources! Stick with the nicely sized <b>llama3.2</b> or <b>llama3.2:1b</b> and if you want larger, try llama3.1 or smaller variants of Qwen, Gemma, Phi or DeepSeek. See <a href="https://ollama.com/models">the Ollama models page</a> for a full list of models and sizes.
            </span>
        </td>
    </tr>
</table>

In [None]:
!ollama pull llama3.2

In [None]:
ollama = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')
model_name = "llama3.2"

response = ollama.chat.completions.create(model=model_name, messages=messages)
answer = response.choices[0].message.content

display(Markdown(answer))
competitors.append(model_name)
answers.append(answer)


In [23]:
# So where are we?

print(competitors)
print(answers)


['gpt-5-nano', 'anthropic/claude-sonnet-4.5', 'gemini-2.5-flash', 'deepseek-chat', 'openai/gpt-oss-120b', 'gemini-3-pro-preview']
['Below is a structured analytic response based strictly on the incident report provided. It treats this as a hypothetical exercise for reasoning and planning, not as field instructions.\n\nA. Best hypothesis, concise causal timeline, and numeric probability\n\n- Best hypothesis (single best): Accidental release of a volatile toxic chemical from the chemical storage area, contaminating nearby workspace and causing inhalation exposure to employees.\n\n- Rationale (concise causal timeline with probabilities):\n  - 08:12: Delivery van arrives near chemical storage. No loading of chemicals described; manifest lists innocuous office supplies.\n  - 08:45: Three employees report headache and nausea, consistent with inhalation exposure to a volatile chemical.\n  - 09:10: Chemical-storage alarm triggers, suggesting a rise in a detectable contaminant or concentration 

In [24]:
# It's nice to know how to use "zip"
for competitor, answer in zip(competitors, answers):
    print(f"Competitor: {competitor}\n\n{answer}")


Competitor: gpt-5-nano

Below is a structured analytic response based strictly on the incident report provided. It treats this as a hypothetical exercise for reasoning and planning, not as field instructions.

A. Best hypothesis, concise causal timeline, and numeric probability

- Best hypothesis (single best): Accidental release of a volatile toxic chemical from the chemical storage area, contaminating nearby workspace and causing inhalation exposure to employees.

- Rationale (concise causal timeline with probabilities):
  - 08:12: Delivery van arrives near chemical storage. No loading of chemicals described; manifest lists innocuous office supplies.
  - 08:45: Three employees report headache and nausea, consistent with inhalation exposure to a volatile chemical.
  - 09:10: Chemical-storage alarm triggers, suggesting a rise in a detectable contaminant or concentration within storage or connected space.
  - 09:40: One employee unconscious, indicating potential higher exposure or a mor

In [25]:
# Let's bring this together - note the use of "enumerate"

together = ""
for index, answer in enumerate(answers):
    together += f"# Response from competitor {index+1}\n\n"
    together += answer + "\n\n"
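
An equivalent way to build the same string is to pass `start=1` to `enumerate` and assemble the pieces with `str.join`. The sketch below compares both approaches on placeholder data; `sample_answers` is a hypothetical stand-in for the notebook's `answers` list.

```python
# Placeholder answers standing in for the notebook's `answers` list
sample_answers = ["First reply", "Second reply", "Third reply"]

# Original approach: repeated string concatenation
together_concat = ""
for index, answer in enumerate(sample_answers):
    together_concat += f"# Response from competitor {index+1}\n\n"
    together_concat += answer + "\n\n"

# Alternative: enumerate(..., start=1) removes the index+1,
# and join builds the final string in one pass
together_joined = "".join(
    f"# Response from competitor {number}\n\n{answer}\n\n"
    for number, answer in enumerate(sample_answers, start=1)
)

print(together_joined == together_concat)
```

Both versions produce identical text, so the choice is a matter of style; `join` tends to read better once the per-item template grows.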

In [26]:
print(together)

# Response from competitor 1

Below is a structured analytic response based strictly on the incident report provided. It treats this as a hypothetical exercise for reasoning and planning, not as field instructions.

A. Best hypothesis, concise causal timeline, and numeric probability

- Best hypothesis (single best): Accidental release of a volatile toxic chemical from the chemical storage area, contaminating nearby workspace and causing inhalation exposure to employees.

- Rationale (concise causal timeline with probabilities):
  - 08:12: Delivery van arrives near chemical storage. No loading of chemicals described; manifest lists innocuous office supplies.
  - 08:45: Three employees report headache and nausea, consistent with inhalation exposure to a volatile chemical.
  - 09:10: Chemical-storage alarm triggers, suggesting a rise in a detectable contaminant or concentration within storage or connected space.
  - 09:40: One employee unconscious, indicating potential higher exposure or

In [27]:
judge = f"""You are judging a competition between {len(competitors)} competitors.
Each model has been given this question:

{question}

Your job is to evaluate each response for clarity and strength of argument, and rank them in order of best to worst.
Respond with JSON, and only JSON, with the following format:
{{"results": ["best competitor number", "second best competitor number", "third best competitor number", ...]}}

Here are the responses from each competitor:

{together}

Now respond with the JSON with the ranked order of the competitors, nothing else. Do not include markdown formatting or code blocks."""


In [28]:
print(judge)

You are judging a competition between 6 competitors.
Each model has been given this question:

Read this incident report: "At 08:12 a delivery van arrived at the warehouse and was parked near the chemical storage; by 08:45 three employees reported headache and nausea; at 09:10 the chemical-storage alarm triggered; at 09:40 one employee was found unconscious; CCTV shows no one entering or leaving the chemical storage between 08:10–09:00; the van's manifest lists only innocuous office supplies." Based only on this report, what is the most likely explanation? For your answer: (A) state your single best hypothesis and give a concise causal timeline with a numeric probability estimate; (B) give three alternative hypotheses with probability estimates and concise supporting and contradicting evidence for each; (C) list and rank the top five pieces of additional data or tests that would best distinguish among these hypotheses and explain why; (D) identify potential cognitive biases or failure 

In [29]:
judge_messages = [{"role": "user", "content": judge}]

In [30]:
# Judgement time!

openai = OpenAI()
response = openai.chat.completions.create(
    model="gpt-5-mini",
    messages=judge_messages,
)
results = response.choices[0].message.content
print(results)


{"results": ["2", "5", "6", "1", "3", "4"]}


In [31]:
# OK let's turn this into results!

results_dict = json.loads(results)
ranks = results_dict["results"]
for index, result in enumerate(ranks):
    # The judge returns 1-based competitor numbers as strings
    competitor = competitors[int(result)-1]
    print(f"Rank {index+1}: {competitor}")

Rank 1: anthropic/claude-sonnet-4.5
Rank 2: openai/gpt-oss-120b
Rank 3: gemini-3-pro-preview
Rank 4: gpt-5-nano
Rank 5: gemini-2.5-flash
Rank 6: deepseek-chat


In [32]:
# Accessing Claude models through openrouter.ai
from openai import OpenAI
model_name = "anthropic/claude-sonnet-4.5"
client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=openrouter_api_key,
)

completion = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "https://openrouter.ai", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "OpenRouter", # Optional. Site title for rankings on openrouter.ai.
  },
  model=model_name,
  messages=judge_messages
)
result = completion.choices[0].message.content
print(result)


```json
{"results": ["1", "2", "5", "3", "6", "4"]}
```
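
The Claude reply above came back wrapped in a markdown code fence even though the prompt asked for bare JSON, so passing it straight to `json.loads` would raise an error. Here is a minimal sketch of a more forgiving parser, assuming the reply contains exactly one JSON object with a `results` key. The helper name `parse_judge_reply` is hypothetical and not part of the lab.

```python
import json


def parse_judge_reply(reply: str) -> list[str]:
    """Extract the ranked competitor numbers from a judge reply.

    Hypothetical helper: assumes the reply holds a single JSON object with a
    "results" key, possibly surrounded by markdown fences or stray text.
    """
    # Keep only the text between the first "{" and the last "}"
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in judge reply")
    data = json.loads(reply[start:end + 1])
    # Normalise to strings so "2" and 2 are treated the same downstream
    return [str(item) for item in data["results"]]


fence = "`" * 3  # build the markdown fence without typing it literally
fenced = f'{fence}json\n{{"results": ["1", "2", "5", "3", "6", "4"]}}\n{fence}'
plain = '{"results": ["2", "5", "6", "1", "3", "4"]}'

print(parse_judge_reply(fenced))
print(parse_judge_reply(plain))
```

The same function handles both the fenced and the unfenced reply, so it could be used for every judge in place of a bare `json.loads` call.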


In [33]:
# Accessing Gemini models through openrouter.ai
from openai import OpenAI
model_name = "google/gemini-3-pro-preview"
client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key=openrouter_api_key,
)

completion = client.chat.completions.create(
  extra_headers={
    "HTTP-Referer": "https://openrouter.ai", # Optional. Site URL for rankings on openrouter.ai.
    "X-Title": "OpenRouter", # Optional. Site title for rankings on openrouter.ai.
  },
  model=model_name,
  messages=judge_messages
)
result = completion.choices[0].message.content
print(result)

{"results": ["2", "5", "6", "3", "4", "1"]}
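
The three judges above (gpt-5-mini, Claude and Gemini) disagree on the order, which is a reminder that a single LLM judge is noisy. One possible way to combine them is a Borda count: each judge gives a competitor points according to its position, and the totals produce a consensus ranking. The sketch below is an illustration rather than part of the original lab. `borda_ranking` is a hypothetical helper, the tie-break by competitor number is an arbitrary assumption, and the input lists are the three rankings printed above.

```python
from collections import defaultdict


def borda_ranking(rankings: list[list[str]]) -> list[str]:
    """Combine several rankings into one using a simple Borda count.

    Hypothetical helper: with n competitors, first place earns n-1 points,
    second place n-2, and so on down to 0 for last place.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, competitor in enumerate(ranking):
            scores[competitor] += n - 1 - position
    # Highest score first; ties broken by competitor number (an assumption)
    return sorted(scores, key=lambda c: (-scores[c], int(c)))


judge_rankings = [
    ["2", "5", "6", "1", "3", "4"],  # gpt-5-mini
    ["1", "2", "5", "3", "6", "4"],  # Claude via OpenRouter
    ["2", "5", "6", "3", "4", "1"],  # Gemini via OpenRouter
]

consensus = borda_ranking(judge_rankings)
print(consensus)
```

With these inputs competitor 2 comes out on top, matching two of the three judges, while competitors 1 and 6 tie on points and are separated only by the tie-break rule.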


<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Which pattern(s) did this use? Try updating this to add another agentic design pattern.
            </span>
        </td>
    </tr>
</table>

ORCHESTRATOR-WORKER and EVALUATOR-OPTIMIZER (partially)

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/business.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">Commercial implications</h2>
            <span style="color:#00bfff;">Patterns like this, where you send a task to multiple models and have another model evaluate the results,
            are common when you need to improve the quality of your LLM response. The approach applies broadly
            to business projects where accuracy is critical.
            </span>
        </td>
    </tr>
</table>