# Prototyping with LLMs
Before building with large language models, most of my experience with them came from direct interaction: asking questions, refining prompts, and getting surprisingly fluent responses. In those moments, LLMs felt powerful and flexible. However, when I started thinking about using an LLM as part of a real product rather than as a personal assistant, my expectations had to change.
A product needs consistency, predictability, and clear boundaries. It also needs to support users who may not know how to prompt well or even what to ask. My goal was to prototype an AI-supported learning workflow where the model would help users reflect on their work, generate study guidance, or scaffold thinking rather than simply provide answers. On paper, this seemed like a natural fit for an LLM.
The challenge was to move from “this works when I talk to it” to “this works reliably for someone else.”


### Testing the Idea: Prototyping with LLMs in a Workflow

To test this, I began embedding LLM outputs into structured workflows. Instead of open-ended chat, I constrained the model with specific roles, instructions, and expected formats. For example, I asked it to analyze student responses, identify misconceptions, or generate targeted reflection questions based on predefined learning goals.
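
A minimal sketch of what this kind of constraint might look like in Python. The role text, the JSON keys (`misconceptions`, `reflection_questions`), and the function names are illustrative assumptions, not the prompts used in the prototype. The point is that an expected format only helps when the reply is checked against it before the rest of the workflow uses it.

```python
import json

# Hypothetical output contract: these field names are illustrative assumptions.
REQUIRED_FIELDS = ("misconceptions", "reflection_questions")


def build_prompt(learning_goal: str, student_response: str) -> str:
    """Assemble a constrained prompt: role, instructions, expected format."""
    return (
        "Role: You are a tutor who supports reflection and never gives the final answer.\n"
        f"Learning goal: {learning_goal}\n"
        f"Student response: {student_response}\n"
        "Instructions: Identify likely misconceptions and write up to three "
        "reflection questions tied to the learning goal.\n"
        'Format: Reply with JSON only, using the keys "misconceptions" and '
        '"reflection_questions", each a list of strings.'
    )


def parse_reply(raw_reply: str) -> dict:
    """Validate a model reply against the contract; raise ValueError if it drifts."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"missing or malformed field: {field}")
    return data
```

A reply that opens with conversational filler, or that drops one of the keys, raises `ValueError` here instead of reaching the user, which makes format drift visible early.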

I also tested the same prompt across multiple inputs to see how stable the outputs were. In theory, if the instructions were clear enough, the model should behave consistently. I treated these prompts almost like product logic: if the input looks like this, the output should follow a predictable pattern.
As I iterated, I realized that many prompts that worked perfectly in isolation started to break when placed inside a system.
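
One rough way to put a number on this kind of stability check is sketched below. It assumes a hypothetical `generate` callable that wraps whatever model API is in use, and it scores surface similarity with the standard library's `difflib`. That is a crude proxy: two replies can share wording and still differ in meaning, so treat the score as a signal for which inputs to read by hand.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List


def stability_score(outputs: List[str]) -> float:
    """Mean pairwise similarity in [0, 1]; 1.0 means every run matched exactly."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0  # zero or one output: nothing to compare
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


def stability_by_input(
    generate: Callable[[str], str],     # hypothetical wrapper around a model call
    make_prompt: Callable[[str], str],  # turns one input into the full prompt
    inputs: List[str],
    runs: int = 3,
) -> Dict[str, float]:
    """Run the same prompt template several times per input and score each input."""
    return {
        text: stability_score([generate(make_prompt(text)) for _ in range(runs)])
        for text in inputs
    }
```

Inputs with a low score are the ones where the prompt is behaving least like product logic.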


![AI Workflow](product.png)

### The Limits I Ran Into

The first thing that broke was consistency. Even with carefully written prompts, the model’s responses varied more than I expected. Sometimes the tone shifted. Sometimes the depth of explanation changed. For a human reader, these differences might feel minor, but in a product context, they created uneven user experiences.

The second issue was over-helpfulness. Even when the model was instructed to support learning, it often gave too much away. Instead of scaffolding thinking, it sometimes jumped straight to polished explanations. This undermined the original learning goal and made it difficult to control cognitive load. What felt helpful in a demo became problematic in a real educational workflow.

The third issue was misaligned assumptions. The model frequently made assumptions about user intent that were not always correct. If a student response was ambiguous, the AI might confidently interpret it one way, even when multiple interpretations were possible. In a product, this kind of confident guessing can mislead users rather than support them.

Finally, prompt fragility became clear. Small changes in wording, input length, or user tone sometimes led to disproportionately different outputs. This meant the system was more sensitive than expected, which made it hard to design robust guardrails.
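
Fragility of this kind can be probed with a small perturbation test, sketched below under stated assumptions. The `generate` callable is a hypothetical wrapper around a model call, the four variants are arbitrary examples of changes that should not alter the meaning, and the 0.8 threshold is a placeholder rather than a recommended value.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def make_variants(text: str) -> Dict[str, str]:
    """Small perturbations of one input that should not change its meaning."""
    return {
        "lowercase": text.lower(),
        "extra_whitespace": "  " + text.replace(" ", "  ") + "  ",
        "polite_prefix": "Please help: " + text,
        "no_final_punctuation": text.rstrip(".!?"),
    }


def fragility_report(
    generate: Callable[[str], str],  # hypothetical wrapper around a model call
    text: str,
    threshold: float = 0.8,          # placeholder cut-off, tune per use case
) -> Dict[str, float]:
    """Return the variants whose output drifts from the baseline output."""
    baseline = generate(text)
    report = {}
    for name, variant in make_variants(text).items():
        score = SequenceMatcher(None, baseline, generate(variant)).ratio()
        if score < threshold:
            report[name] = score
    return report
```

An empty report does not prove the prompt is robust. It only says these particular perturbations did not move the output much, and because the model itself is non-deterministic, some drift can appear even between identical inputs.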


### Reflection

This experience changed how I think about LLMs in products. I no longer see them as plug-and-play intelligence. Instead, they feel more like probabilistic components that require careful framing, monitoring, and constraint. What works in a chat interface does not automatically translate into a reliable system.

I also realized that many of the failures were not technical but conceptual. I initially treated the model as if it understood my design intentions. In reality, it only responds to what is explicitly encoded in prompts and structure. Any ambiguity I left behind showed up later as unexpected behavior.

Perhaps most importantly, this process highlighted the need to design not just for what the AI can do, but for what users need. In learning contexts especially, more fluent output is not always better. Sometimes the most valuable role for AI is to slow things down, ask better questions, or surface uncertainty rather than resolve it.

In the end, what broke was my assumption that intelligence alone makes a good product. Building with LLMs forced me to confront the gap between impressive demos and meaningful, responsible design. That gap is not a failure of the technology, but a reminder that products are systems, not conversations.
