
dspy/README.md at main · stanfordnlp/dspy #734

Open
irthomasthomas opened this issue Mar 16, 2024 · 1 comment
Labels
AI-Chatbots, ai-leaderboards, base-model, chat-templates, few-shot-learning, finetuning, Knowledge-Dataset, llm, llm-applications, llm-benchmarks, llm-completions, llm-evaluation, llm-experiments, llm-function-calling, llm-inference-engines, llm-serving-optimisations, Models, multimodal-llm, openai, Papers, programming-languages, prompt, prompt-engineering

Comments

@irthomasthomas (Owner)

dspy/README.md at main · stanfordnlp/dspy

DSPy: Programming—not prompting—Foundation Models

[Oct'23] DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
[Jan'24] In-Context Learning for Extreme Multi-Label Classification
[Dec'23] DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines
[Dec'22] Demonstrate-Search-Predict: Composing Retrieval & Language Models for Knowledge-Intensive NLP

Getting Started:  

Documentation: DSPy Docs


DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used one or more times within a pipeline. To use LMs to build a complex system without DSPy, you generally have to: (1) break the problem down into steps, (2) prompt your LM well until each step works well in isolation, (3) tweak the steps to work well together, (4) generate synthetic examples to tune each step, and (5) use these examples to finetune smaller LMs to cut costs. Currently, this is hard and messy: every time you change your pipeline, your LM, or your data, all prompts (or finetuning steps) may need to change.

To make this more systematic and much more powerful, DSPy does two things. First, it separates the flow of your program (modules) from the parameters (LM prompts and weights) of each step. Second, DSPy introduces new optimizers, which are LM-driven algorithms that can tune the prompts and/or the weights of your LM calls, given a metric you want to maximize.
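The first idea can be pictured with a tiny framework-free sketch (all names here are hypothetical, and this is not the real DSPy API): the program's control flow is fixed Python, while each step's instruction is a plain parameter that an optimizer could rewrite without touching the flow.

```python
def stub_lm(prompt: str) -> str:
    # Stand-in for a real LM call: echoes the last line of the prompt, uppercased.
    return prompt.strip().splitlines()[-1].upper()

class Step:
    def __init__(self, instruction: str):
        self.instruction = instruction  # a tunable parameter, not hard-coded flow

    def __call__(self, text: str) -> str:
        return stub_lm(f"{self.instruction}\n{text}")

class Pipeline:
    def __init__(self):
        self.summarize = Step("Summarize the passage.")
        self.answer = Step("Answer using the summary.")

    def __call__(self, passage: str) -> str:
        # The flow (summarize, then answer) never changes...
        return self.answer(self.summarize(passage))

pipeline = Pipeline()
# ...but an optimizer may swap out the instruction of any step:
pipeline.summarize.instruction = "Summarize in one sentence."
print(pipeline("the cat sat"))  # → "THE CAT SAT"
```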

DSPy can routinely teach powerful models like GPT-3.5 or GPT-4 and local models like T5-base or Llama2-13b to be much more reliable at tasks, i.e., having higher quality and/or avoiding specific failure patterns. DSPy optimizers will "compile" the same program into different instructions, few-shot prompts, and/or weight updates (finetunes) for each LM. This is a new paradigm in which LMs and their prompts fade into the background as optimizable pieces of a larger system that can learn from data. tl;dr: less prompting, higher scores, and a more systematic approach to solving hard tasks with LMs.

Table of Contents

If you need help thinking about your task, we recently created a Discord server for the community.

  1. Installation
  2. Tutorials & Documentation
  3. Framework Syntax
  4. Compiling: Two Powerful Concepts
  5. Pydantic Types
  6. FAQ: Is DSPy right for me?

Analogy to Neural Networks

When we build neural networks, we don't write manual for-loops over lists of hand-tuned floats. Instead, we use a framework like PyTorch to compose declarative layers (e.g., Convolution or Dropout) and then use optimizers (e.g., SGD or Adam) to learn the parameters of the network.

Ditto! DSPy gives you the right general-purpose modules (e.g., ChainOfThought, ReAct, etc.), which replace string-based prompting tricks. To replace prompt hacking and one-off synthetic data generators, DSPy also gives you general optimizers (BootstrapFewShotWithRandomSearch or BayesianSignatureOptimizer), which are algorithms that update parameters in your program. Whenever you modify your code, your data, your assertions, or your metric, you can compile your program again and DSPy will create new effective prompts that fit your changes.

Mini-FAQs

What do DSPy optimizers tune? Each optimizer is different, but they all seek to maximize a metric on your program by updating prompts or LM weights. Current DSPy optimizers can inspect your data, simulate traces through your program to generate good/bad examples of each step, propose or refine instructions for each step based on past results, finetune the weights of your LM on self-generated examples, or combine several of these to improve quality or cut cost. We'd love to merge new optimizers that explore a richer space: most manual steps you currently go through for prompt engineering, "synthetic data" generation, or self-improvement can probably be generalized into a DSPy optimizer that acts on arbitrary LM programs.
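The "propose instructions and score them" part of this can be illustrated with a toy search loop (a hedged sketch with hypothetical names and a stubbed LM, not DSPy's actual optimizer API): try candidate instructions and keep the one that maximizes the metric on a small dev set.

```python
def stub_lm(prompt: str, text: str) -> str:
    # Stand-in LM: "answers correctly" only when the instruction asks it to shout.
    return text.upper() if "SHOUT" in prompt else text

def metric(prediction: str, gold: str) -> bool:
    return prediction == gold

def optimize(candidates: list[str], devset: list[tuple[str, str]]) -> str:
    # Score every candidate instruction on the dev set; keep the best scorer.
    scored = []
    for instruction in candidates:
        score = sum(metric(stub_lm(instruction, x), y) for x, y in devset)
        scored.append((score, instruction))
    return max(scored)[1]

devset = [("hi", "HI"), ("ok", "OK")]
best = optimize(["Please answer.", "SHOUT the answer."], devset)
print(best)  # → "SHOUT the answer."
```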

How should I use DSPy for my task? Using DSPy is an iterative process. You first define your task and the metrics you want to maximize, and prepare a few example inputs — typically without labels (or only with labels for the final outputs, if your metric requires them). Then, you build your pipeline by selecting built-in layers (modules) to use, giving each layer a signature (input/output spec), and then calling your modules freely in your Python code. Lastly, you use a DSPy optimizer to compile your code into high-quality instructions, automatic few-shot examples, or updated LM weights for your LM.
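The "automatic few-shot examples" step above can be pictured with a toy bootstrap loop (hypothetical names and a stubbed LM; not the real DSPy compile API): run the unoptimized program on example inputs, keep the traces your metric accepts, and reuse them as demonstrations.

```python
def stub_lm(prompt: str) -> str:
    # Stand-in LM: "answers" by reversing the last line of the prompt.
    return prompt.strip().splitlines()[-1][::-1]

def program(x: str, demos: list[tuple[str, str]]) -> str:
    # Prepend accepted demonstrations to the prompt, few-shot style.
    demo_text = "\n".join(f"{i} -> {o}" for i, o in demos)
    return stub_lm(f"{demo_text}\n{x}")

def metric(x: str, prediction: str) -> bool:
    return prediction == x[::-1]  # here, "correct" means a proper reversal

# Bootstrap: every input/output trace the metric accepts becomes a demo.
inputs = ["abc", "dspy"]
demos = [(x, program(x, [])) for x in inputs if metric(x, program(x, []))]
print(demos)  # each accepted trace is now a few-shot demonstration
```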

What if I have a better idea for prompting or synthetic data generation? Perfect. We encourage you to think if it's best expressed as a module or an optimizer, and we'd love to merge it in DSPy so everyone can use it. DSPy is not a complete project; it's an ongoing effort to create structure (modules and optimizers) in place of hacky prompt and pipeline engineering tricks.

What does DSPy stand for? It's a long story but the backronym now is Declarative Self-improving Language Programs, pythonically.

1) Installation

All you need is:

pip install dspy-ai

Or open our intro notebook in Google Colab:

By default, DSPy installs the latest openai from pip. However, if you have an older version installed from before OpenAI changed their API (openai~=0.28.1), the library will use that just fine. Both are supported.

For the optional (alphabetically sorted) Chromadb, Marqo, MongoDB, Pinecone, Qdrant, or Weaviate retrieval integration(s), include the extra(s) below:

pip install dspy-ai[chromadb]  # or [qdrant] or [marqo] or [mongodb] or [pinecone] or [weaviate]

2) Documentation

The DSPy documentation is divided into tutorials (step-by-step illustration of solving a task in DSPy), guides (how to use specific parts of the API), and examples (self-contained programs that illustrate usage).

A) Tutorials

| Level | Tutorial | Run in Colab | Description |
| --- | --- | --- | --- |
| Beginner | Getting Started | | Introduces the basic building blocks in DSPy. Tackles the task of complex question answering with HotPotQA. |
| Beginner | Minimal Working Example | N/A | Builds and optimizes a very simple chain-of-thought program in DSPy for math question answering. Very short. |
| Beginner | Compiling for Tricky Tasks | N/A | Teaches LMs to reason about logical statements and negation. Uses GPT-4 to bootstrap few-shot CoT demonstrations for GPT-3.5. Establishes a state-of-the-art result on ScoNe. Contributed by Chris Potts. |
| Beginner | Local Models & Custom Datasets | | Illustrates two different things together: how to use local models (Llama-2-13B in particular) and how to use your own data examples for training and development. |
| Intermediate | The DSPy Paper | N/A | Sections 3, 5, 6, and 7 of the DSPy paper can be consumed as a tutorial. They include explained code snippets, results, and discussions of the abstractions and API. |
| Intermediate | DSPy Assertions | | Introduces an example of applying DSPy Assertions while generating long-form responses to questions with citations. Presents comparative evaluation in both zero-shot and compiled settings. |
| Intermediate | Finetuning for Complex Programs | | Teaches a local T5 model (770M) to do exceptionally well on HotPotQA. Uses only 200 labeled answers. Uses no hand-written prompts, no calls to OpenAI, and no labels for retrieval or reasoning. |
| Advanced | Information Extraction | | Tackles extracting information from long articles (biomedical research papers). Combines in-context learning and retrieval to set SOTA on BioDEX. Contributed by Karel D'Oosterlinck. |

Other resources people find useful:

B) Guides

If you're new to DSPy, it's probably best to go in sequential order. You will probably refer to these guides frequently after that, e.g. to copy/paste snippets that you can edit for your own DSPy programs.

  1. Language Models
  2. Signatures
  3. Modules
  4. Data
  5. Metrics
  6. Optimizers (formerly Teleprompters)
  7. DSPy Assertions

C) Examples

The DSPy team believes complexity has to be justified. We take this seriously: we never release a complex tutorial (above) or example (below) unless we can demonstrate empirically that this complexity has generally led to improved quality or cost. This kind of rule is rarely enforced by other frameworks or docs, but you can count on it in DSPy examples.

There are a bunch of examples in the examples/ directory and in the top-level directory. We welcome contributions!

You can find other examples tweeted by @lateinteraction on Twitter/X.

Some other examples (not exhaustive, feel free to add more via PR):

There are also recent cool examples at Weaviate's DSPy cookbook by Connor Shorten. See tutorial on YouTube.

3) Syntax: You're in charge of the workflow—it's free-form Python code!

DSPy hides tedious prompt engineering, but it cleanly exposes the important decisions you need to make: [1] what's your

@irthomasthomas added the labels above on Mar 16, 2024
@irthomasthomas (Owner, Author)

Related content

- #706 (similarity score: 0.91)
- #660 (similarity score: 0.89)
- #626 (similarity score: 0.89)
- #494 (similarity score: 0.89)
- #546 (similarity score: 0.88)
- #324 (similarity score: 0.88)
