- 🌎 Multimodal: Interact with images, videos, audio, and more, built into the model.
- 🌐 Contextual Conversations: Chat with Gemini, built in.
- 🧪 Simple Parameters: Easily modify temperature, topP, and more
- ⛓️ Streaming: Get AI output the second it's available.
- 🔒 Typesafe: Types built in. Gemini AI is written with TypeScript.
Why should I use this instead of Google's own API?
It's all about simplicity. Gemini AI lets you make requests to Gemini with about a quarter of the code that Google's API requires.
Don't believe me? Take a look.
Google's own API (CommonJS):
```javascript
const { GoogleGenerativeAI } = require("@google/generative-ai");
const genAI = new GoogleGenerativeAI(API_KEY);
async function run() {
  const model = genAI.getGenerativeModel({ model: "gemini-1.5-pro-latest" });
  const prompt = "Hi!";
  const result = await model.generateContent(prompt);
  const response = await result.response;
  const text = response.text();
  console.log(text);
}
run();
```
Gemini AI (ES6 Modules):
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(await gemini.ask("Hi!"));
```
That's nearly 4 times less code!
There's also more...
- ⚡ Native REST API: Simplicity without compromise.
- 📝 Optimized File Uploads: Automatically uses Google's File API when necessary.
- 📁 Automatic File Type Detection: Gemini AI will detect the MIME types of files automatically.
- 🧩 Automatic Request Creation: Auto-formats your requests, so you don't have to.
Install with the following command, or the command for your favorite package manager.
```shell
npm install gemini-ai
```
Gemini AI is a pure ES module, which means you have to load it with import. It is recommended that your project also uses ES modules, but see the FAQ for a CommonJS (require()) workaround.
- Go to Google AI Studio's API keys tab
- Follow the steps to get an API key
- Copy this key, and use it below wherever API_KEY is mentioned.
Warning
Do not share this key with other people! It is recommended to store it in a .env file.
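One way to follow this advice, sketched under assumptions: the variable name GEMINI_API_KEY and the getApiKey helper are hypothetical, and the .env file has to be loaded into process.env first (for example with node --env-file=.env app.js on Node.js 20.6 or later, or with a package such as dotenv).

```javascript
// Hypothetical helper: read the API key from environment variables
// (e.g. populated from a .env file) instead of hard-coding it in source.
function getApiKey(env = process.env) {
  // GEMINI_API_KEY is an assumed name; use whatever your .env file defines.
  const key = env.GEMINI_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("GEMINI_API_KEY is not set; add it to your .env file.");
  }
  return key.trim();
}

// Usage (assumes the variable is set):
// const gemini = new Gemini(getApiKey());
```

Failing loudly when the key is missing surfaces a misconfigured environment at startup rather than as an authentication error on the first request.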
Make a text request:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(await gemini.ask("Hi!"));
```
Make a streaming text request:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
gemini.ask("Hi!", {
  stream: console.log,
});
```
Chat with Gemini:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
const chat = gemini.createChat();
console.log(await chat.ask("Hi!"));
console.log(await chat.ask("What's the last thing I said?"));
```
Make a text request with images:
```javascript
import fs from "fs";
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(
  await gemini.ask(["What do you see?", fs.readFileSync("./cat.png")])
);
```
Make a text request with custom parameters:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(
  await gemini.ask("Hello!", {
    temperature: 0.5,
    topP: 1,
  })
);
```
Embed text:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(await gemini.embed("Hi!"));
```
Here's a quick demo:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
gemini.ask("Write an essay", {
  stream: (x) => process.stdout.write(x),
});
```
Let's walk through what this code is doing. As always, we first initialize Gemini. Then, we call the ask function and provide a stream config. This callback will be invoked whenever new content comes in from Gemini!
Note that this automatically switches to the streamGenerateContent command, so you don't have to worry about that!
Tip
You don't need to await ask if you're handling the stream yourself. If you also want the final answer, it is still returned by the method, so you can await it as usual.
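To make the split between the stream callback and the return value concrete, here is a dependency-free simulation. fakeChunks and consumeStream are hypothetical stand-ins written for illustration: they mimic the described behaviour (one callback call per chunk, full text returned at the end) but are not Gemini AI's actual implementation.

```javascript
// Simulated model output: an async generator standing in for chunks
// arriving from the network (no request is made here).
async function* fakeChunks(parts) {
  for (const part of parts) {
    yield part;
  }
}

// Forward each chunk to the caller's callback as it arrives, and
// resolve with the full text once the stream ends.
async function consumeStream(source, onChunk) {
  let full = "";
  for await (const chunk of source) {
    onChunk(chunk); // e.g. (x) => process.stdout.write(x)
    full += chunk;
  }
  return full;
}

// Fire-and-forget: the callback alone handles the output.
consumeStream(fakeChunks(["Once ", "upon ", "a time\n"]), (x) => process.stdout.write(x));

// Or await the promise to also get the final answer.
consumeStream(fakeChunks(["Hello", "!"]), () => {}).then((full) => {
  console.log("Final:", full);
});
```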
Gemini AI v2 is written entirely in TypeScript, which means that all parameters, and more importantly the configuration, have type hints.
Furthermore, return types are conditional on the format you set in the configuration, to guarantee great DX.
Google requires large files to be sent through their dedicated File API instead of being included directly in the POST request.
With Gemini AI, your uploads are automatically optimized: when necessary, your files are routed through the File API; otherwise, they are sent inline for peak performance.
Here's how Gemini AI decides which files to send through the File API:
- All videos are automatically uploaded via the File API (because Gemini wouldn't accept them otherwise)
- If all of your files combined are under 20MB (Google-set limit), all non-videos will be included as inline_data, which is the faster method
- If all of your files combined are over 20MB, all files will be uploaded via the File API
This ensures the fastest file upload experience, while ensuring all your files are safely included.
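The three rules above can be expressed as a small decision function. This is a sketch under assumptions: the { name, mimeType, size } file records and the planUploads name are hypothetical, 20MB is taken to mean 20 * 1024 * 1024 bytes, and the combined size is assumed to count every file, videos included.

```javascript
// Assumed interpretation of Google's 20MB inline limit.
const INLINE_LIMIT_BYTES = 20 * 1024 * 1024;

// Decide, per file, whether it is sent inline or through the File API.
function planUploads(files) {
  // Combined size of every file in the request (videos included, by assumption).
  const total = files.reduce((sum, f) => sum + f.size, 0);
  return files.map((file) => {
    // Rule 1: videos always go through the File API.
    if (file.mimeType.startsWith("video/")) {
      return { name: file.name, method: "fileApi" };
    }
    // Rules 2 and 3: inline unless the combined size exceeds the limit.
    return { name: file.name, method: total > INLINE_LIMIT_BYTES ? "fileApi" : "inline" };
  });
}

console.log(planUploads([{ name: "cat.png", mimeType: "image/png", size: 120000 }]));
```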
Gemini AI also automatically detects the MIME type of each file to pass to the server, so you don't need to worry about it.
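MIME detection is commonly done by checking a file's leading "magic number" bytes. The sketch below shows that technique for a handful of formats; sniffMimeType is a hypothetical helper, and the library may detect types differently or support many more formats.

```javascript
// Leading byte signatures for a few common formats.
const SIGNATURES = [
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/gif", bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
];

// Return the first MIME type whose signature matches the start of the buffer.
function sniffMimeType(buffer) {
  for (const { mime, bytes } of SIGNATURES) {
    if (buffer.length >= bytes.length && bytes.every((b, i) => buffer[i] === b)) {
      return mime;
    }
  }
  return "application/octet-stream"; // unknown: fall back to a generic type
}

console.log(sniffMimeType(Buffer.from("%PDF-1.7")));
```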
You can use a proxy when fetching from Gemini. To keep the package size down and adhere to the SRP (single-responsibility principle), the actual proxy handling is delegated to the undici library.
Here's how to add a proxy:
Install undici:
```shell
npm i undici
```
Initialize it with Gemini AI:
```javascript
import { ProxyAgent } from "undici";
import Gemini from "gemini-ai";

// Route Gemini AI's requests through the proxy at PROXY_URL
const gemini = new Gemini(API_KEY, {
  dispatcher: new ProxyAgent(PROXY_URL),
});
```
And use as normal!
To start any project, include the following lines:
```javascript
// Import Gemini AI
import Gemini from "gemini-ai";
// Initialize your key
const gemini = new Gemini(API_KEY);
```
Note
Under the hood, we are just running the Gemini REST API, so there's no fancy authentication going on! Just pure, simple web requests.
Learn how to add a fetch polyfill for the browser in the FAQ below.
All model-calling methods take a main parameter first (typically the text input) and a config object second. A detailed list of all config options can be found with each method. An example call might look like this:
```javascript
await gemini.ask("Hi!", {
  // Config
  temperature: 0.5,
  topP: 1,
});
```
Tip
All methods (except Gemini.createChat()) are async! This means you should call them like this: await gemini.ask(...)
You have the option to set format to Gemini.JSON:
```javascript
await gemini.ask("Hi!", {
  format: Gemini.JSON,
});
```
This gives you the full response from Gemini's REST API.
Note that the output of Gemini.JSON varies depending on the model and command, and is not documented here in detail because it is unnecessary in most scenarios. You can find more information about the raw output in Google's REST API documentation.
If you are using TypeScript, you get type annotations for all the responses, so autocomplete away.
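For orientation, the generated text in a raw generateContent response lives under candidates[0].content.parts. The helper below (hypothetical, not part of Gemini AI) pulls it out of a hand-written sample object shaped like Google's REST output; real responses carry additional fields such as safety ratings and usage metadata.

```javascript
// Join the text of every part in the first candidate; returns "" if absent.
function extractText(raw) {
  const parts = raw?.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}

// Made-up sample shaped like a generateContent response.
const sample = {
  candidates: [
    {
      content: { role: "model", parts: [{ text: "Hello! " }, { text: "How can I help?" }] },
      finishReason: "STOP",
    },
  ],
};
console.log(extractText(sample));
```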
The ask() method uses the generateContent command to get Gemini's response to your input.
The first parameter of the ask() method can take 3 different forms:
String: This is simply a text query to Gemini.
Example:
```javascript
await gemini.ask("Hi!");
```
Array of strings and Buffers: In this array, which represents ordered "parts", you can put strings or Buffers (these are what you get directly from fs.readFileSync()!). These will be fed to Gemini in order.
Gemini accepts most major file formats, so you shouldn't have to worry about what format you give it. However, check Google's documentation for a comprehensive list of supported formats.
There are lots of optimizations under the hood for file uploads too, but you don't have to worry about them! Learn more in the file upload section above.
Example:
```javascript
import fs from "fs";
await gemini.ask([
  "Between these two cookies, which one appears to be home-made, and which one looks store-bought? Cookie 1:",
  fs.readFileSync("./cookie1.png"),
  "Cookie 2:",
  fs.readFileSync("./cookie2.png"),
]);
```
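An ordered array like this maps naturally onto the REST API's parts list: text becomes a text part, and a Buffer becomes base64-encoded inline_data. The sketch below illustrates that mapping; toParts is a hypothetical name, and it takes the MIME type as a parameter instead of detecting it, as the real library does.

```javascript
// Convert an ordered array of strings and Buffers into REST-style parts.
function toParts(message, mimeType = "image/png") {
  return message.map((item) => {
    if (typeof item === "string") {
      return { text: item };
    }
    if (Buffer.isBuffer(item)) {
      // Binary data is sent inline as base64.
      return { inline_data: { mime_type: mimeType, data: item.toString("base64") } };
    }
    throw new TypeError("Each part must be a string or a Buffer");
  });
}

const parts = toParts(["Cookie 1:", Buffer.from([1, 2, 3])]);
console.log(JSON.stringify(parts));
```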
Note
You can also place buffers in the data field in the config (this is the v1 method, but it still works). These buffers will be placed, in order, directly after the content in the main message.
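In other words, the legacy data field behaves like appending the buffers to the end of the message array. A hypothetical mergeLegacyData helper makes that ordering explicit; it is an illustration of the described behaviour, not the library's code.

```javascript
// Append config.data buffers, in order, after the main message content.
function mergeLegacyData(message, data = []) {
  const parts = Array.isArray(message) ? [...message] : [message];
  return [...parts, ...data];
}

const merged = mergeLegacyData("Compare these:", [Buffer.from("a"), Buffer.from("b")]);
console.log(merged.length);
```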
Message: This is the raw message format. It is not meant to be used directly, but it can be useful when you need raw control over file uploads, and it is also used internally by the Chat class.
Please check src/types.ts for more information about what is accepted in the Message field.
Note
These are Google REST API defaults.
Field Name | Description | Default Value |
---|---|---|
format | Whether to return the detailed, raw JSON output. Typically not recommended, unless you are an expert. Can be either Gemini.JSON or Gemini.TEXT | Gemini.TEXT |
topP | See Google's parameter explanations | 0.94 |
topK | See Google's parameter explanations. Note that this field is not available on v1.5 models. | 32 |
temperature | See Google's parameter explanations | 1 |
model | Which model to use | gemini-1.5-flash-latest |
maxOutputTokens | Max tokens to output | 2048 |
messages | Array of [userInput, modelOutput] pairs to show how the bot is supposed to behave | [] |
data | An array of Buffers to input to the model. In v2, it is recommended that you pass data directly through the message instead. | [] |
stream | A function that is called with every new chunk of JSON or text (depending on the format) that the model receives | undefined |
safetySettings | An object that specifies the blocking threshold for each safety rating dimension | An object representing Google's defaults |
systemInstruction | Instruct what the model should act like (i.e. a persona, output format, style/tone, goals/rules, and additional context) | "" |
jsonSchema | Make Gemini always output in a set JSON schema. Set to true to let Gemini pick the schema, or give a specific schema. | undefined |
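As an illustration of how these defaults might combine with your own options, the sketch below merges a config over the documented defaults and groups the sampling fields the way Google's REST API expects them (under generationConfig). buildGenerationConfig is hypothetical and handles only a subset of the options.

```javascript
// Defaults copied from the table above.
const DEFAULTS = {
  model: "gemini-1.5-flash-latest",
  temperature: 1,
  topP: 0.94,
  topK: 32,
  maxOutputTokens: 2048,
};

// Merge caller options over the defaults and build a REST-style generationConfig.
function buildGenerationConfig(options = {}) {
  const config = { ...DEFAULTS, ...options };
  const generationConfig = {
    temperature: config.temperature,
    topP: config.topP,
    maxOutputTokens: config.maxOutputTokens,
  };
  // Per the table, topK is not available on 1.5 models, so omit it there.
  if (!config.model.includes("1.5")) {
    generationConfig.topK = config.topK;
  }
  return { model: config.model, generationConfig };
}

console.log(buildGenerationConfig({ temperature: 0.5, topP: 1 }));
```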
Example Usage:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(
  await gemini.ask("Hello!", {
    temperature: 0.5,
    topP: 1,
  })
);
```
Google rates Gemini's responses in 4 main safety categories. Here's an overview:
Category Name | Description | Gemini AI Field name |
---|---|---|
Harassment | Negative/Harmful comments towards someone | harassment |
Hate Speech | Rude/Disrespectful/Profane | hate |
Sexually Explicit | General sexual/lewd content | sexual |
Dangerous | Encourages harmful acts | dangerous |
Learn more in Google's official safety settings documentation.
To set each category, pass an object like this into the safetySettings configuration option:
```javascript
await gemini.ask("Hello!", {
  safetySettings: {
    hate: Gemini.SafetyThreshold.BLOCK_SOME,
    sexual: Gemini.SafetyThreshold.BLOCK_SOME,
    harassment: Gemini.SafetyThreshold.BLOCK_SOME,
    dangerous: Gemini.SafetyThreshold.BLOCK_SOME,
  },
});
```
Note that the category names have been shortened; the table above lists the Gemini AI field name for each.
You can assign 4 different blocking thresholds (available in the Gemini.SafetyThreshold enum), listed here from strictest to least strict:
Enum Name | Google Internal Name | Simple Description (In Specified Safety Category) |
---|---|---|
Gemini.SafetyThreshold.BLOCK_MOST | BLOCK_LOW_AND_ABOVE | Blocks everything that is potentially unsafe |
Gemini.SafetyThreshold.BLOCK_SOME | BLOCK_MEDIUM_AND_ABOVE | Blocks moderately unsafe content (Default) |
Gemini.SafetyThreshold.BLOCK_FEW | BLOCK_ONLY_HIGH | Blocks only highly unsafe content |
Gemini.SafetyThreshold.BLOCK_NONE | BLOCK_NONE | Blocks nothing |
By Google's default, all categories are set to BLOCK_SOME. Google also states that "Adjusting to lower safety settings will trigger a more indepth review process of your application."
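The REST API itself expects safety settings as a list of category/threshold pairs using Google's full names. This sketch shows that translation using the mappings from the two tables above; toRestSafetySettings is hypothetical, and it takes threshold names as plain strings rather than the real Gemini.SafetyThreshold enum values, whose underlying representation is not documented here.

```javascript
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Gemini AI field name -> Google's full category name.
const CATEGORY_NAMES = {
  harassment: "HARM_CATEGORY_HARASSMENT",
  hate: "HARM_CATEGORY_HATE_SPEECH",
  sexual: "HARM_CATEGORY_SEXUALLY_EXPLICIT",
  dangerous: "HARM_CATEGORY_DANGEROUS_CONTENT",
};

// Enum name -> Google's internal threshold name.
const THRESHOLDS = {
  BLOCK_MOST: "BLOCK_LOW_AND_ABOVE",
  BLOCK_SOME: "BLOCK_MEDIUM_AND_ABOVE",
  BLOCK_FEW: "BLOCK_ONLY_HIGH",
  BLOCK_NONE: "BLOCK_NONE",
};

function toRestSafetySettings(settings) {
  return Object.entries(settings).map(([field, level]) => {
    if (!has(CATEGORY_NAMES, field)) throw new Error(`Unknown category: ${field}`);
    if (!has(THRESHOLDS, level)) throw new Error(`Unknown threshold: ${level}`);
    return { category: CATEGORY_NAMES[field], threshold: THRESHOLDS[level] };
  });
}

console.log(toRestSafetySettings({ hate: "BLOCK_FEW", dangerous: "BLOCK_NONE" }));
```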
Google allows you to force Gemini to reply in a specific JSON schema. Note that Gemini AI requires you to JSON.parse() this output manually.
The following examples generate cookie recipes, with Gemini returning an array of objects, each with a recipe_name field.
This feature can be enabled simply by setting the jsonSchema config to true. Ideally, you should also describe the desired JSON shape in your prompt, but it is not necessary.
```javascript
await gemini.ask(
  "List 5 popular cookie recipes. Give them as an array of objects, each with a recipe_name field.",
  {
    jsonSchema: true,
  }
);
```
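Because the output arrives as a JSON string, it is worth parsing it defensively. The hypothetical parseRecipes helper below parses the text and checks it against the shape requested in the prompt; the commented-out lines show where it would fit, assuming gemini is initialized as earlier and the response comes back as a string.

```javascript
// Parse model output and verify it is an array of { recipe_name: string }.
function parseRecipes(rawText) {
  let data;
  try {
    data = JSON.parse(rawText);
  } catch (err) {
    throw new Error(`Model output was not valid JSON: ${err.message}`);
  }
  if (!Array.isArray(data) || !data.every((r) => r && typeof r.recipe_name === "string")) {
    throw new Error("Model output did not match the expected schema");
  }
  return data.map((r) => r.recipe_name);
}

// const raw = await gemini.ask("List 5 popular cookie recipes...", { jsonSchema: true });
// console.log(parseRecipes(raw));
console.log(parseRecipes('[{"recipe_name":"Snickerdoodles"},{"recipe_name":"Oatmeal Raisin"}]'));
```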
However, you can also set a specific JSON schema in code. Pass a JSON schema object into jsonSchema as follows:
```javascript
await gemini.ask(
  "List 5 popular cookie recipes. Give them as an array of objects, each with a recipe_name field.",
  {
    jsonSchema: {
      type: Gemini.SchemaType.ARRAY,
      items: {
        type: Gemini.SchemaType.OBJECT,
        properties: {
          recipe_name: {
            type: Gemini.SchemaType.STRING,
          },
        },
      },
    },
  }
);
```
The available types are ARRAY, OBJECT, STRING, NUMBER, INTEGER, and BOOLEAN, accessible in the Gemini.SchemaType enum. Learn more about this syntax in Google's documentation.
Note
When you pass this schema object into gemini-1.5-flash-latest, it is included directly as text after your prompt, wrapped in <JSONSchema> tags. However, with gemini-1.5-pro-latest, Gemini uses controlled generation (constrained decoding) to force the output to match your JSON schema. In other words, Gemini 1.5 Flash should be able to reasonably infer what you want, but if it still deviates from your schema, use Gemini 1.5 Pro to enforce it.
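For Flash, the behaviour described above amounts to prompt augmentation. The sketch below shows the idea; appendSchemaToPrompt is hypothetical, the exact serialization and whitespace the library uses are assumptions, and plain strings stand in for Gemini.SchemaType so the example has no dependencies.

```javascript
// Append a serialized schema to the prompt, wrapped in <JSONSchema> tags.
function appendSchemaToPrompt(prompt, schema) {
  return `${prompt}\n<JSONSchema>${JSON.stringify(schema)}</JSONSchema>`;
}

const schema = {
  type: "ARRAY",
  items: { type: "OBJECT", properties: { recipe_name: { type: "STRING" } } },
};
console.log(appendSchemaToPrompt("List 5 popular cookie recipes.", schema));
```

With constrained decoding (the Pro path), the schema is enforced by the model's decoding rather than merely suggested in the text, which is why the note above recommends Pro when strict conformance matters.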
The count() method uses the countTokens command to figure out the number of tokens in your input.
Config available:
Field Name | Description | Default Value |
---|---|---|
model | Which model to use. Can be any model Google has available, but in practice should be gemini-pro | Automatic based on Context |
Example Usage:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(await gemini.count("Hello!"));
```
The embed() method uses the embedContent command (currently only on embedding-001) to generate an embedding vector for your input.
Config available:
Field Name | Description | Default Value |
---|---|---|
model | Which model to use. Can be any model Google has available, but in practice should be embedding-001 | embedding-001 |
Example Usage:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(await gemini.embed("Hello!"));
```
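A common next step with embeddings is to compare them. The sketch below computes cosine similarity between two vectors; it assumes gemini.embed() resolves to a plain array of numbers, which you should confirm for your version before relying on it.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assuming `gemini` is initialized as shown above:
// const [a, b] = await Promise.all([gemini.embed("cat"), gemini.embed("kitten")]);
// console.log(cosineSimilarity(a, b));
console.log(cosineSimilarity([1, 0], [0, 1]));
```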
Gemini.createChat() is a unique method. For one, it isn't called asynchronously. Additionally, it returns a brand new Chat object. The Chat object has only one method, Chat.ask(), which has exactly the same syntax as the Gemini.ask() method documented above. The only small difference is that most parameters are passed into the Chat through createChat() and cannot be overridden by the ask() method. The only parameters that can be overridden are format, stream, data, and jsonSchema.
All important data in the Chat object is stored in the Chat.messages variable, which can be used to create a new Chat that "continues" the conversation, as demonstrated in the example usage section.
Config available for createChat:
Field Name | Description | Default Value |
---|---|---|
topP | See Google's parameter explanations | 0.94 |
topK | See Google's parameter explanations. Note that this field is not available on v1.5 models. | 10 |
temperature | See Google's parameter explanations | 1 |
model | Which model to use | gemini-1.5-flash-latest |
maxOutputTokens | Max tokens to output | 2048 |
messages | Array of [userInput, modelOutput] pairs to show how the bot is supposed to behave (or to continue a conversation), or an array of type Message[], so you can directly input a previous chat.messages | [] |
systemInstruction | Instruct what the model should act like (i.e. a persona, output format, style/tone, goals/rules, and additional context) | "" |
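The [userInput, modelOutput] pairs correspond to alternating user/model turns in the REST API's contents array. This hypothetical pairsToContents helper sketches that conversion; Gemini AI's internal Message format may carry more information (such as file parts).

```javascript
// Expand each [userInput, modelOutput] pair into a user turn and a model turn.
function pairsToContents(pairs) {
  return pairs.flatMap(([userInput, modelOutput]) => [
    { role: "user", parts: [{ text: userInput }] },
    { role: "model", parts: [{ text: modelOutput }] },
  ]);
}

const history = pairsToContents([
  ["Hi!", "Hello! How can I help you today?"],
  ["What's 2 + 2?", "4"],
]);
console.log(JSON.stringify(history, null, 2));
```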
Example Usage:
```javascript
// Simple example:
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
const chat = gemini.createChat();
console.log(await chat.ask("Hi!"));
// Now, you can start a conversation
console.log(await chat.ask("What's the last thing I said?"));
```
```javascript
// "Continuing" a conversation:
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
const chat = gemini.createChat();
console.log(await chat.ask("Hi!"));
// Creating a new chat, with existing messages
const newChat = gemini.createChat({
  messages: chat.messages,
});
console.log(await newChat.ask("What's the last thing I said?"));
```
Common Questions:
- What's the difference between the data config and directly passing buffers in the message?
- What do I need to update for Gemini AI v2?
- What is the default model?/Why is it the default model?
- How do I change the API version?
- How do I polyfill fetch?
- How do I use Gemini AI in CJS?/Cannot require() ESM Module
What's the difference between the data config and directly passing buffers in the message? Are they the same thing?
data was the old way to pass media data. It is no longer recommended, but it is kept for backwards compatibility. The new method is to simply pass an array of strings/buffers into the first parameter of ask(). The major benefit is that you can now include strings between buffers, which you couldn't do before. Here's a quick demo of how to migrate:
With data:
```javascript
import fs from "fs";
await gemini.ask(
  "Between these two cookies, which one appears to be home-made, and which one looks store-bought?",
  {
    data: [fs.readFileSync("./cookie1.png"), fs.readFileSync("./cookie2.png")],
  }
);
```
New version:
```javascript
import fs from "fs";
await gemini.ask([
  "Between these two cookies, which one appears to be home-made, and which one looks store-bought?",
  fs.readFileSync("./cookie1.png"),
  fs.readFileSync("./cookie2.png"),
]);
```
Learn more in the dedicated section.
What do I need to update for Gemini AI v2? Does everything still work?
Yes! Gemini AI v2 should be completely backward-compatible. Most changes are under the hood, so your DX should be much smoother, especially for TS developers!
The only thing you may want to change is to use the new array message format instead of the old data buffer format. See the dedicated question to learn more.
What is the default model? And, by extension, why is it the default model?
By default, Gemini AI uses gemini-1.5-flash-latest, Google's leading efficiency-focused model. It is the default for two main reasons regarding DX:
- 📈 Higher Rate Limits: Gemini 1.5 Pro is limited to 2 requests per minute, versus 15 for Flash, so we choose the one with the higher rate limit, which is especially useful during development.
- ⚡ Faster Response Time: Gemini 1.5 Pro is significantly slower, so we use the faster model by default.
But, of course, should you need to change the model, it's as easy as passing it into the configuration of your request. For example:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY);
console.log(
  await gemini.ask("Hello!", {
    model: "gemini-1.5-pro-latest",
  })
);
```
How do I change the API version? What if I want to use a deprecated command?
When initializing Gemini, you can pass in an API version. This feature mainly exists for future-proofing, as the current recommended API version (and the one used by default) is v1beta. Note that some modern models (including the default Gemini 1.5 Flash) may not work on other API versions.
Here's how you can change it to, say, v1:
```javascript
import Gemini from "gemini-ai";
const gemini = new Gemini(API_KEY, {
  apiVersion: "v1",
});
```
How do I polyfill fetch? I'm in a browser environment! What do I do?
Everything is optimized to work in both browsers and Node.js. Files are passed as Buffers, so you decide how to get them, and adding a fetch polyfill is as easy as:
```javascript
import Gemini from "gemini-ai";
import fetch from "node-fetch";
const gemini = new Gemini(API_KEY, {
  fetch: fetch,
});
```
Nearly all fetch polyfills should work as of Gemini AI v2.2, as streaming is now done mainly through response.body.getReader().read(), with an AsyncIterator fallback, so nearly all environments should be covered.
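The getReader()-first strategy with an async-iterator fallback can be sketched as follows. readBody is a hypothetical helper, not Gemini AI's code; it decodes UTF-8 incrementally so that multi-byte characters split across chunks are handled, and the usage relies on the ReadableStream, TextEncoder and TextDecoder globals available in modern browsers and Node.js 18+.

```javascript
// Read a response body chunk by chunk, preferring getReader() and falling
// back to async iteration; each decoded chunk is passed to onChunk.
async function readBody(body, onChunk = () => {}) {
  const decoder = new TextDecoder();
  let text = "";
  const handle = (bytes) => {
    const chunk = decoder.decode(bytes, { stream: true });
    text += chunk;
    onChunk(chunk);
  };
  if (typeof body.getReader === "function") {
    const reader = body.getReader();
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      handle(value);
    }
  } else {
    for await (const value of body) handle(value);
  }
  const tail = decoder.decode(); // flush any buffered partial character
  if (tail) {
    text += tail;
    onChunk(tail);
  }
  return text;
}

const encoder = new TextEncoder();
const webStream = new ReadableStream({
  start(controller) {
    controller.enqueue(encoder.encode("Hello, "));
    controller.enqueue(encoder.encode("world"));
    controller.close();
  },
});
readBody(webStream).then((text) => console.log(text));
```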
How do I use Gemini AI in CJS? I got Error [ERR_REQUIRE_ESM]: require() of ES Module, what can I do?
Gemini AI is an ESM-only (import) module. It is recommended that you use ESM in your projects too. However, if you must use CommonJS, you can use a dynamic import(). Here's an example:
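```javascript
// CommonJS: load the ES module with a dynamic import()
import("gemini-ai").then(async ({ default: Gemini }) => {
  const gemini = new Gemini(API_KEY);
  // Use gemini here as usual, e.g. await gemini.ask("Hi!")
});
```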
A special shoutout to the developers of and contributors to the bard-ai and palm-api libraries. Gemini AI's interface is heavily based on what we developed in these two projects.