This is a proposed breaking API change for Genkit that streamlines the most common scenarios while preserving the current level of flexibility and capability. The changes can be broken down into three components:
- Encouraging default model configurations
- Streamlining generation to return data directly instead of returning a wrapping response
- Separating out multi-turn and single-turn use cases
Default Model Configurations
While one of the strengths of Genkit is the ability to easily swap between multiple models, we find in practice that most people use a single model as their "go-to" with other models swapped in as needed. The same goes for model configuration -- most of the time you're going to want the same settings.
The proposal is to encourage setting a default model (now just called model) when initializing Genkit, and to allow model settings to be defined when instantiating a reference to a model.
import { genkit } from "genkit";
import { vertexAI } from "@genkit-ai/vertexai";
const ai = genkit({
plugins: [vertexAI()],
// sets a default model with configuration
model: vertexAI.geminiModel('gemini-1.5-flash', {safetySettings: [...]}),
});
const claude = vertexAI.anthropicModel('claude-3.5-sonnet', {...claudeSettings});
Both model and configuration can still be overridden at call time, but this makes it easier to set a common reusable baseline.
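One way to picture the override behavior is as a small merge step that runs on every call. The sketch below is a minimal illustration under assumed semantics, not Genkit's implementation: `ModelRef`, `CallOptions`, and `resolveCall` are hypothetical names, and the rule that call-time config keys win over the model reference's own config is an assumption.

```typescript
// Hypothetical shapes for illustration only; not Genkit's real types.
interface ModelRef {
  name: string;
  config: Record<string, unknown>;
}

interface CallOptions {
  model?: ModelRef;
  config?: Record<string, unknown>;
}

// Resolve what a single call would use: a call-time model (if given) replaces
// the default, and call-time config keys override that model's own config.
function resolveCall(defaultModel: ModelRef, options: CallOptions = {}): ModelRef {
  const base = options.model ?? defaultModel;
  return {
    name: base.name,
    config: { ...base.config, ...options.config },
  };
}

const defaultModel: ModelRef = { name: "gemini-1.5-flash", config: { temperature: 0.7 } };
const claude: ModelRef = { name: "claude-3.5-sonnet", config: { maxOutputTokens: 1024 } };

console.log(resolveCall(defaultModel)); // default model, default config
console.log(resolveCall(defaultModel, { config: { temperature: 0 } })); // same model, one setting overridden
console.log(resolveCall(defaultModel, { model: claude })); // different model with its own config
```

A shallow merge keeps the baseline intact while letting a single call adjust only the keys it cares about; whether nested settings should merge deeply is left open here.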
Streamlining Generation
Most of the time, what you want from a generate() call is the data being generated. Today this requires a two-line "get response, get output from response" pattern, which gets tedious in, for example, multi-step processes.
The proposal is to simplify to a generate API that returns text or structured data depending on the call configuration:
const jokeText = await ai.generate("Tell a funny joke.");
const fakePerson = await ai.generate({
prompt: "Generate the information for an imaginary person named Annaka",
schema: z.object({name: z.string(), job: z.string(), hobbies: z.array(z.string())}),
});
This can get more complex if you want it to:
const jokeAdvanced = await ai.generate({
model: gpt,
config: {...},
prompt: {role: "user", content: [{text: "Tell a funny joke."}]},
});
When developers do want to dig into the metadata of the response, they can use a new generateResponse method, which will be equivalent to generate today:
const jokeResponse = await ai.generateResponse("Tell a funny joke.");
console.log(jokeResponse.text());
console.log(jokeResponse.stopReason);
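The relationship between the two methods can be sketched as a thin wrapper: the data-returning call delegates to the response-returning call and unwraps the result. This is only an illustration under assumptions. The `SketchResponse` and `SketchModel` types, the explicit `model` argument, and the `parse` callback (standing in for schema-driven structured output) are all hypothetical and not part of the proposal.

```typescript
// Hypothetical shapes for illustration only; not Genkit's real types.
interface SketchResponse {
  text(): string;
  stopReason: string;
}

// Stand-in for a model call so the sketch runs without network access.
type SketchModel = (prompt: string) => Promise<string>;

// Plays the role of the proposed generateResponse(): returns the full wrapper.
async function generateResponse(model: SketchModel, prompt: string): Promise<SketchResponse> {
  const raw = await model(prompt);
  return { text: () => raw, stopReason: "stop" };
}

// Plays the role of the proposed generate(): unwraps the response for the caller.
// Without a parser the caller gets text; with one, the parsed structured value.
function generate(model: SketchModel, prompt: string): Promise<string>;
function generate<T>(model: SketchModel, prompt: string, parse: (raw: string) => T): Promise<T>;
async function generate<T>(
  model: SketchModel,
  prompt: string,
  parse?: (raw: string) => T,
): Promise<string | T> {
  const response = await generateResponse(model, prompt);
  return parse ? parse(response.text()) : response.text();
}

// A fake model that always answers with a small JSON document.
const fakeModel: SketchModel = async () => '{"name":"Annaka","job":"pilot"}';

async function demoGenerate(): Promise<void> {
  const text = await generate(fakeModel, "Describe Annaka");
  const person = await generate(
    fakeModel,
    "Describe Annaka",
    (raw) => JSON.parse(raw) as { name: string; job: string },
  );
  console.log(text);
  console.log(person.name, person.job);
}

void demoGenerate();
```

The overloads let the return type follow the call configuration, which is the property the proposal relies on to avoid the two-line unwrap.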
Streaming will be supported through streamGenerate and streamGenerateResponse. With streamGenerate, the emitted chunks will be in output form (either a partial data response or a string chunk):
const {stream, data} = ai.streamGenerate("Tell a really long joke with at least 5 paragraphs.");
for await (const chunk of stream) {
console.log(chunk); // chunk is just a string
}
console.log(await data); // this is the full result, equivalent to `generate()`
const {stream: responseStream, response} = ai.streamGenerateResponse(...);
for await (const chunk of responseStream) {
console.log(chunk.text()); // chunk is a Chunk instance
}
console.log((await response).usage);
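The `{stream, data}` pair shown above can be built from any asynchronous source of chunks. The following is a minimal sketch of that pattern for string chunks, not a description of how Genkit does it: `toStreamAndData` and `fakeChunks` are hypothetical names. In this simplified version `data` settles only once the stream has been fully consumed.

```typescript
// Hypothetical helper: wraps an async source of string chunks into a
// { stream, data } pair. `data` resolves with the concatenated text after the
// consumer has iterated the whole stream (a simplification of this sketch).
function toStreamAndData(source: AsyncIterable<string>): {
  stream: AsyncIterable<string>;
  data: Promise<string>;
} {
  let resolveData!: (full: string) => void;
  let rejectData!: (err: unknown) => void;
  const data = new Promise<string>((resolve, reject) => {
    resolveData = resolve;
    rejectData = reject;
  });

  async function* relay(): AsyncGenerator<string> {
    let full = "";
    try {
      for await (const chunk of source) {
        full += chunk; // accumulate the full result while forwarding chunks
        yield chunk;
      }
      resolveData(full);
    } catch (err) {
      rejectData(err); // surface source errors on the promise as well
      throw err;
    }
  }

  return { stream: relay(), data };
}

// Fake chunk source standing in for a streaming model.
async function* fakeChunks(): AsyncGenerator<string> {
  yield "Why did ";
  yield "the chicken ";
  yield "cross the road?";
}

async function demoStream(): Promise<string> {
  const { stream, data } = toStreamAndData(fakeChunks());
  for await (const chunk of stream) {
    console.log(chunk); // each chunk is a plain string
  }
  return data; // the full text, assembled from the chunks
}

void demoStream();
```

A production version would likely start consuming the source eagerly so that `data` resolves even if the caller never iterates `stream`; that is omitted here for brevity.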
Multi-Turn Generation
All of the above works well for single-turn generation, but it doesn't help much in a chatbot scenario. Multi-turn use cases are fundamentally different and deserve dedicated attention in the API surface.
The proposal is a new Chat class and a new send() method that let you explicitly opt in to multi-turn conversational use cases.
const chat = ai.chat({
system: "You are a pirate.",
});
const reply = await chat.send("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"
const {stream, data} = await chat.streamSend("Tell me a long story, ye scurvy sea dog!");
chat.messages(); // equivalent to `toHistory()` in current Genkit
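What sets the multi-turn case apart is that the object itself owns the conversation history. A minimal sketch of that idea follows, assuming a model is simply a function from history to reply text. `SketchChat`, `SketchMessage`, and `SketchChatModel` are hypothetical names, and the real Chat class may be shaped quite differently.

```typescript
// Hypothetical message and model shapes for illustration only.
interface SketchMessage {
  role: "system" | "user" | "model";
  text: string;
}

type SketchChatModel = (history: SketchMessage[]) => Promise<string>;

class SketchChat {
  private model: SketchChatModel;
  private history: SketchMessage[] = [];

  constructor(model: SketchChatModel, options: { system?: string } = {}) {
    this.model = model;
    if (options.system) {
      this.history.push({ role: "system", text: options.system });
    }
  }

  // Append the user turn, call the model with the whole history, and record
  // the model's reply so the next send() sees it.
  async send(text: string): Promise<string> {
    this.history.push({ role: "user", text });
    const reply = await this.model([...this.history]);
    this.history.push({ role: "model", text: reply });
    return reply;
  }

  // Return a copy so callers cannot mutate the internal history.
  messages(): SketchMessage[] {
    return [...this.history];
  }
}

// Fake model that reports how many messages it was shown.
const countingModel: SketchChatModel = async (history) => `I have seen ${history.length} messages.`;

async function demoChat(): Promise<void> {
  const chat = new SketchChat(countingModel, { system: "You are a pirate." });
  console.log(await chat.send("How are you today?"));
  console.log(await chat.send("Tell me a story."));
  console.log(chat.messages().length);
}

void demoChat();
```

Keeping history inside the object is what lets send() take only the new user text, in contrast to the single-turn calls above, which carry no state between invocations.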