Working on a world model for rare road conditions: hailstorms, unusual surface scatter, edge-case visibility. The consistent failure we're hitting: the model receives image context for one of these scenarios and produces a description that's technically plausible but loses scene specificity. "Reduced visibility conditions" covers hail, fog, wet road, and dusk equally, which is not useful when the downstream task requires distinguishing them.
The failure isn't dramatic hallucination. It's conflation: the model fills in from base-rate expectations rather than grounding strictly in what's visible. For common scenes this is fine. For rare hazards it's the whole problem.
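One rough way to flag this kind of conflation is to check whether a description names any distinguishing condition at all. The sketch below is only an illustration: `CONDITION_TERMS`, `named_conditions`, and `is_conflated` are hypothetical names, and the keyword lists are assumptions, not part of any dataset or model described here.

```python
import re

# Hypothetical keyword lists for the four conditions named above (assumption).
CONDITION_TERMS = {
    "hail": {"hail", "hailstones", "hailstorm"},
    "fog": {"fog", "foggy", "mist"},
    "wet_road": {"wet", "puddle", "standing water"},
    "dusk": {"dusk", "twilight"},
}


def named_conditions(description):
    """Return the condition labels whose keywords appear as whole words."""
    text = description.lower()
    found = set()
    for label, terms in CONDITION_TERMS.items():
        if any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms):
            found.add(label)
    return found


def is_conflated(description):
    """True when the description names none of the distinguishing conditions."""
    return len(named_conditions(description)) == 0
```

A keyword check like this only catches descriptions that name nothing specific. It says nothing about whether a named condition is actually supported by the image, so it is a coarse filter at best.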
Built training data specifically for this. One row from DinoDS Lane 26 (Image Context Understanding):
{"sample_id": "lane_26_spec_example_01", "mode": "quick", "tone": "professional", "intent_family": "info_retrieval", "intent_subtype": "object_description", "image_context": {"mode": "photo_upload", "summary": "A dining table with a phone and two bowls."}, "user_message": "What do you see here?", "assistant_response": "It looks like a dining table with a smartphone and two bowls on it."}
The output contract: describe exactly what image_context supports, mention specific objects and attributes only if present, ask a minimal clarifying question if context is insufficient. No invented detail, no filling the gap from priors.
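The "no invented detail" part of the contract can be approximated with a crude lexical check: every content word in assistant_response should trace back to image_context.summary. This is a minimal sketch under loud assumptions. `STOPWORDS`, `SYNONYMS`, `content_words`, and `unsupported_terms` are hypothetical helpers, not part of DinoDS, and word overlap is a weak stand-in for real grounding evaluation.

```python
import re

# Hypothetical stopword list and synonym map (assumptions, not from the dataset).
STOPWORDS = {
    "a", "an", "the", "it", "looks", "like", "with", "and", "on", "of",
    "in", "is", "there", "i", "see", "to", "be", "appears",
}
SYNONYMS = {"smartphone": "phone"}


def content_words(text):
    """Lowercase, drop stopwords, strip a trailing plural 's', map synonyms."""
    words = set()
    for w in re.findall(r"[a-z]+", text.lower()):
        if w in STOPWORDS:
            continue
        if w.endswith("s") and len(w) > 3:
            w = w[:-1]  # crude plural stripping, applied to both sides equally
        words.add(SYNONYMS.get(w, w))
    return words


def unsupported_terms(row):
    """Content words in the response that the image_context summary lacks."""
    supported = content_words(row["image_context"]["summary"])
    mentioned = content_words(row["assistant_response"])
    return mentioned - supported


row = {
    "image_context": {"summary": "A dining table with a phone and two bowls."},
    "assistant_response": "It looks like a dining table with a smartphone and two bowls on it.",
}
invented = dict(row, assistant_response="A dining table with a phone, two bowls, and a candle.")
```

On the example row this returns an empty set, while the `invented` variant surfaces the added "candle". It would not cover the clarifying-question branch of the contract, which needs a separate check.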
The example is a straightforward scene. The rare-hazard rows in the full bundle operate under the same constraint but with sparser, more ambiguous image_context inputs. Still working out how well coverage on common scenes generalizes to edge-case visual conditions vs. needing direct rare-hazard training rows. Probably the latter.
Dataset exists. Full bundle in DinoLab.