Rewrite the Cookbook introduction to make it more accessible #2163
Open: paytonison wants to merge 8 commits into openai:main from paytonison:intro-rewrite
+203 −170

8 commits:
- 509cc2b Delete old article, add new article (paytonison)
- 20fd37c Update registry and author list (paytonison)
- 286ef54 Update registry.yaml (paytonison)
- bbe0233 Update authors.yaml (paytonison)
- b88361c Change authors contact link from Twitter to a more reputable site. (paytonison)
- bed61ff m (paytonison)
- 2dc7ba5 Revert "Update avatar and website url for my authors.yaml entry. (#21… (paytonison)
- 80e79fe Update contact info and clean up stuff (paytonison)
This file was deleted.
      
              | Original file line number | Diff line number | Diff line change | 
|---|---|---|
| @@ -0,0 +1,184 @@ | ||
| --- | ||
| title: "LLMs 101: A Practical Introduction" | ||
| description: "A hands-on, code-first introduction to large language models for Cookbook readers." | ||
| last_updated: "2025-08-24" | ||
| --- | ||
|  | ||
| # LLMs 101: A Practical Introduction | ||
|  | ||
| > **Who this is for.** Developers who want a fast, working understanding of large language models and the knobs that matter in real apps. | ||
|  | ||
| ## At a glance | ||
|  | ||
| ``` | ||
| Text prompt | ||
|   ↓ tokenization | ||
| Tokens → Embeddings → [Transformer layers × N] → Next‑token probabilities | ||
|   ↓ sampling (temperature/top_p), repeated until a stop condition | ||
| Output tokens → detokenization → Output text | ||
| ``` | ||
|  | ||
| - **LLMs** are neural networks (usually **transformers**) trained on large text corpora to predict the next token. | ||
| - **Tokenization** splits text into subword units; **embeddings** map tokens to vectors; transformer layers build context‑aware representations. | ||
| - Generation repeats next‑token sampling until a stop condition (length or stop sequences) is met. | ||
|  | ||
| --- | ||
|  | ||
| ## Quick start: generate text | ||
|  | ||
| ### Python | ||
|  | ||
| ```python | ||
| from openai import OpenAI | ||
|  | ||
| client = OpenAI() | ||
| resp = client.responses.create( | ||
|     model="gpt-4o", | ||
|     instructions="You are a concise technical explainer.", | ||
|     input="In one paragraph, explain what a token is in an LLM.", | ||
| ) | ||
| print(resp.output_text) | ||
| ``` | ||
|  | ||
| ### JavaScript / TypeScript | ||
|  | ||
| ```js | ||
| import OpenAI from "openai"; | ||
| const client = new OpenAI(); | ||
|  | ||
| const resp = await client.chat.completions.create({ | ||
|   model: "gpt-4o", | ||
|   messages: [ | ||
|     { role: "system", content: "You are a concise technical explainer." }, | ||
|     { role: "user", content: "In one paragraph, explain what a token is in an LLM." }, | ||
|   ], | ||
| }); | ||
| console.log(resp.choices[0].message.content); | ||
| ``` | ||
|  | ||
| > **Tip.** Model names evolve; check your Models list before shipping. Prefer streaming for chat‑like UIs (see below). | ||
|  | ||
| --- | ||
|  | ||
| ## What can LLMs do? | ||
|  | ||
| Despite the name, many LLMs are **multi‑modal**: depending on the model, they accept text and code, and sometimes images or audio. Core text tasks: | ||
|  | ||
| - **Generate**: draft, rewrite, continue, or brainstorm. | ||
| - **Transform**: translate, rephrase, format, classify, extract. | ||
| - **Analyze**: summarize, compare, tag, or answer questions. | ||
| - **Tool use / agents**: call functions or APIs as part of a loop to act. | ||
|  | ||
| These patterns compose into search, assistants, form‑fillers, data extraction, QA, and more. | ||
|  | ||
| --- | ||
|  | ||
| ## How LLMs work (just enough to be dangerous) | ||
|  | ||
| 1. **Tokenization.** Input text → tokens (IDs). Whitespace and punctuation affect the token count, so budgeting tokens is a real constraint. | ||
| 2. **Embeddings.** Each token ID becomes a vector; positions are encoded so order matters. | ||
| 3. **Transformer layers.** Self‑attention mixes information across positions so each token’s representation becomes **contextual** (richer than the raw embedding). | ||
| 4. **Decoding.** The model outputs a probability distribution over the next token. | ||
| 5. **Sampling.** Choose how “adventurous” generation is (see knobs below), append the token, and repeat until done. | ||
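The five steps above can be sketched as a toy loop. This is a minimal illustration, not how a real model is implemented: the vocabulary, the `NEXT_PROBS` bigram table, and the helper names are all hypothetical stand-ins, and steps 2 to 4 (embeddings, transformer layers, decoding) are collapsed into a table lookup.

```python
import random

# Toy vocabulary and a hypothetical bigram table standing in for a trained model.
VOCAB = ["<eos>", "the", "cat", "sat", "down"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

# NEXT_PROBS[current_id] is a probability distribution over the next token ID.
NEXT_PROBS = {
    1: [0.0, 0.0, 1.0, 0.0, 0.0],  # "the"  -> "cat"
    2: [0.0, 0.0, 0.0, 1.0, 0.0],  # "cat"  -> "sat"
    3: [0.2, 0.0, 0.0, 0.0, 0.8],  # "sat"  -> usually "down", sometimes stop
    4: [1.0, 0.0, 0.0, 0.0, 0.0],  # "down" -> stop
}


def tokenize(text):
    """Step 1: split on whitespace (real tokenizers use subword units)."""
    return [TOKEN_TO_ID[word] for word in text.split()]


def detokenize(ids):
    """Map token IDs back to text."""
    return " ".join(VOCAB[i] for i in ids)


def generate(prompt, max_new_tokens=10, seed=0):
    rng = random.Random(seed)
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):              # stop condition: length
        probs = NEXT_PROBS[ids[-1]]              # steps 2-4 collapsed into a lookup
        next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]  # step 5
        if next_id == TOKEN_TO_ID["<eos>"]:      # stop condition: end-of-sequence
            break
        ids.append(next_id)
    return detokenize(ids)


print(generate("the"))
```

A real model conditions on the whole context rather than only the last token, but the append-and-repeat shape of the loop is the same.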
|  | ||
| --- | ||
|  | ||
| ## The knobs you’ll touch most | ||
|  | ||
| - **Temperature** *(0.0–2.0)*: Lower values make output more deterministic and repetitive; higher values make it more diverse. | ||
| - **Top‑p (nucleus)** *(0–1)*: Sample only from the smallest set of most‑likely tokens whose cumulative probability reaches at least *p*. | ||
| - **Max output tokens**: Hard limit on output length; bounds latency and cost. | ||
| - **System / instructions**: Up‑front role, constraints, and style to steer behavior. | ||
| - **Stop sequences**: Cleanly cut off output at known boundaries. | ||
| - **Streaming**: Receive tokens as they’re generated; improves perceived latency. | ||
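To make temperature and top‑p concrete, here is a small sketch of how they are commonly described as reshaping a next‑token distribution. The function names and the example logits are illustrative assumptions; hosted APIs apply these controls server‑side and their exact implementation may differ.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=1.0):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.1]                     # made-up scores for three tokens
cool = softmax_with_temperature(logits, temperature=0.2)
warm = softmax_with_temperature(logits, temperature=1.5)
print("top token probability:", round(cool[0], 3), "vs", round(warm[0], 3))
print("nucleus at top_p=0.9:", top_p_filter(softmax_with_temperature(logits), top_p=0.9))
```

At the lower temperature nearly all probability mass sits on the top token; with `top_p=0.9` the least likely token is dropped before sampling.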
|  | ||
| **Practical defaults:** `temperature=0.2–0.7`, `top_p=1.0`, set a **max output** that fits your UI, and **stream** by default for chat UX. | ||
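Stop sequences are normally applied by the API, but the behavior is easy to picture client‑side. The sketch below uses a hypothetical helper, `apply_stop_sequences`, and assumes the stop text itself is excluded from the result.

```python
def apply_stop_sequences(text, stop_sequences):
    """Cut text at the earliest occurrence of any stop sequence.

    The matched stop sequence itself is not included in the result.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:                 # ignore empty strings, which would match at index 0
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "Answer: 42\n###\nAnswer: 7"
print(repr(apply_stop_sequences(raw, ["###", "\n\n"])))
```

Choosing a delimiter that cannot appear in legitimate output (such as `###` here) keeps the cut unambiguous.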
|  | ||
| --- | ||
|  | ||
| ## Make context do the heavy lifting | ||
|  | ||
| - **Context window.** Inputs + outputs share a finite token budget; plan prompts and retrieval to fit. | ||
| - **Ground with your data (RAG).** Retrieve relevant snippets and include them in the prompt to improve factuality. | ||
| - **Structured outputs.** Ask for JSON (and validate) when you need machine‑readable results. | ||
| - **Few‑shot examples.** Provide 1–3 compact exemplars to stabilize format and tone. | ||
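As a rough sketch of fitting retrieved context into a token budget, the example below packs the most relevant snippets into a prompt while reserving room for the output. Everything here is an assumption for illustration: `estimate_tokens` uses a crude 4‑characters‑per‑token guess instead of a real tokenizer, `score` is toy keyword overlap rather than embedding search, and the window sizes are deliberately tiny.

```python
def estimate_tokens(text):
    """Rough stand-in for a tokenizer: assumes about 4 characters per token."""
    return max(1, len(text) // 4)


def score(query, snippet):
    """Toy relevance: number of lowercase words shared by query and snippet."""
    return len(set(query.lower().split()) & set(snippet.lower().split()))


def build_prompt(query, snippets, context_window=60, reserved_for_output=20):
    """Pack the highest-scoring snippets that fit the remaining input budget.

    The fixed template text is ignored in the budget to keep the sketch short.
    """
    budget = context_window - reserved_for_output - estimate_tokens(query)
    chosen = []
    for snippet in sorted(snippets, key=lambda s: score(query, s), reverse=True):
        cost = estimate_tokens(snippet)
        if score(query, snippet) > 0 and cost <= budget:
            chosen.append(snippet)
            budget -= cost
    context = "\n".join(f"- {s}" for s in chosen)
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


SNIPPETS = [
    "Refunds are issued within 14 days of purchase.",
    "Our office is closed on public holidays.",
    "Refunds require the original receipt.",
]
print(build_prompt("How do refunds work?", SNIPPETS))
```

Shrinking `context_window` drops lower‑ranked snippets first, which is the trade‑off to plan for when prompts and retrieval share one budget with the output.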
|  | ||
| --- | ||
|  | ||
| ## Minimal streaming example | ||
|  | ||
| ### Python | ||
|  | ||
| ```python | ||
| from openai import OpenAI | ||
| client = OpenAI() | ||
|  | ||
| with client.responses.stream( | ||
|     model="gpt-4o", | ||
|     input="Stream a two-sentence explanation of context windows.", | ||
| ) as stream: | ||
|     for event in stream: | ||
|         # Print each text delta as soon as it arrives. | ||
|         if event.type == "response.output_text.delta": | ||
|             print(event.delta, end="", flush=True) | ||
| ``` | ||
|  | ||
| ### JavaScript | ||
|  | ||
| ```js | ||
| import OpenAI from "openai"; | ||
| const client = new OpenAI(); | ||
|  | ||
| const stream = await client.responses.stream({ | ||
|   model: "gpt-4o", | ||
|   input: "Stream a two-sentence explanation of context windows.", | ||
| }); | ||
|  | ||
| for await (const event of stream) { | ||
|   if (event.type === "response.output_text.delta") { | ||
|     process.stdout.write(event.delta); | ||
|   } | ||
| } | ||
| ``` | ||
|  | ||
| --- | ||
|  | ||
| ## Limitations (design around these) | ||
|  | ||
| - **Hallucinations.** Models can generate plausible but false statements. Ground with citations/RAG; validate critical outputs. | ||
| - **Recency.** Models don’t inherently know the latest facts; retrieve or provide current data. | ||
| - **Ambiguity.** Vague prompts → vague answers; specify domain, audience, length, and format. | ||
| - **Determinism.** Even at `temperature=0`, responses may vary across runs and environments. Don’t promise bit‑for‑bit reproducibility. | ||
| - **Cost & latency.** Longer prompts and bigger models are slower and costlier; iterate toward the smallest model that meets quality. | ||
|  | ||
| --- | ||
|  | ||
| ## Common gotchas | ||
|  | ||
| - **Characters ≠ tokens.** Limits are counted in tokens, not characters; budget both input and output in tokens to avoid truncation. | ||
| - **Over‑prompting.** Prefer simple, testable instructions; add examples sparingly. | ||
| - **Leaky formats.** If you need JSON, enforce it (schema + validators) and add a repair step. | ||
| - **One prompt for everything.** Separate prompts per task/endpoint; keep them versioned and testable. | ||
| - **Skipping evaluation.** Keep a tiny dataset of real tasks; score changes whenever you tweak prompts, models, or retrieval. | ||
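For the "leaky formats" point, a validate‑then‑repair step might look like the sketch below. The schema in `REQUIRED_FIELDS` and the helper names are hypothetical, and the only repair shown is stripping a Markdown code fence around the JSON; a production setup would more likely use a schema library and the API's structured‑output features.

```python
import json

FENCE = "`" * 3                                   # Markdown code-fence marker
REQUIRED_FIELDS = {"name": str, "priority": int}  # hypothetical schema for illustration


def strip_code_fence(text):
    """Repair step: drop a Markdown code fence sometimes wrapped around JSON."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def parse_and_validate(raw):
    """Return a validated dict, or raise ValueError describing the first problem."""
    try:
        data = json.loads(strip_code_fence(raw))
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not {expected_type.__name__}")
    return data


fenced = FENCE + 'json\n{"name": "fix login", "priority": 2}\n' + FENCE
print(parse_and_validate(fenced))
```

When validation fails, the error message can be fed back to the model in a retry prompt, which is one simple form of repair loop.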
|  | ||
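The "tiny dataset" habit from the list above can start as small as this sketch. The dataset, the `baseline` stand‑in for a model call, and the exact‑match metric are all illustrative assumptions; real evaluations usually need task‑specific scoring.

```python
def exact_match_score(predict, dataset):
    """Fraction of cases where predict(input) equals the expected answer,
    compared case-insensitively after trimming whitespace."""
    hits = 0
    for case in dataset:
        answer = predict(case["input"]).strip().lower()
        hits += answer == case["expected"].strip().lower()
    return hits / len(dataset)


# Toy cases; in practice these come from real tasks your app handles.
DATASET = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "opposite of hot", "expected": "cold"},
]


def baseline(prompt):
    """Hypothetical stand-in for a model call, using canned answers."""
    canned = {"2 + 2": "4", "capital of France": "paris"}
    return canned.get(prompt, "")


print(f"exact match: {exact_match_score(baseline, DATASET):.2f}")
```

Swapping `baseline` for a function that calls your model lets you rerun the same score after every prompt, model, or retrieval change.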
| --- | ||
|  | ||
| ## Glossary | ||
|  | ||
| - **Token** — Small unit of text (≈ subword) used by models. | ||
| - **Embedding** — Vector representation of a token or text span. | ||
| - **Context window** — Max tokens the model can attend to at once (prompt + output). | ||
| - **Temperature / top‑p** — Randomness controls during sampling. | ||
| - **System / instructions** — Up‑front guidance that shapes responses. | ||
| - **RAG** — Retrieval‑Augmented Generation; retrieve data and include it in the prompt. | ||
|  | ||
| --- | ||
|  | ||
| ## Where to go next | ||
|  | ||
| - Prompt patterns for **structured outputs** | ||
| - **Retrieval‑augmented generation (RAG)** basics | ||
| - **Evaluating** LLM quality (offline + online) | ||
| - **Streaming UX** patterns and backpressure handling | ||
| - **Safety** and policy‑aware prompting | ||
|  | ||
| > Adapted from a shorter draft and expanded with code-first guidance. | ||
  
    