Support New Bedrock Embeddings (Cohere) #3315
🤖 Hello @metaskills, nice to meet you! I'm Dosu, a bot here to assist you. I can help you tackle bugs, answer questions, and guide you in becoming a contributor. While we wait for a human maintainer, feel free to ask me any questions. I'm here to help!

Sure, you can integrate the `input_type` parameter into the `CohereEmbeddings` class. You can add an `inputType` property to the `CohereEmbeddingsParams` interface and include `input_type` in the requests made by `embedDocuments` and `embedQuery`. Here is how you can modify the code:

```typescript
export interface CohereEmbeddingsParams extends EmbeddingsParams {
  modelName: string;
  inputType?: string; // Add this line
  /**
   * The maximum number of documents to embed in a single request. This is
   * limited by the Cohere API to a maximum of 96.
   */
  batchSize?: number;
}

// ...

export class CohereEmbeddings
  extends Embeddings
  implements CohereEmbeddingsParams
{
  modelName = "small";

  // Add this line. Cohere documents "search_document", "search_query",
  // "classification", and "clustering" as input_type values.
  inputType = "search_document";

  // ...

  constructor(
    fields?: Partial<CohereEmbeddingsParams> & {
      verbose?: boolean;
      apiKey?: string;
    }
  ) {
    // ...
    this.inputType = fieldsWithDefaults?.inputType ?? this.inputType; // Add this line
  }

  // ...

  async embedDocuments(texts: string[]): Promise<number[][]> {
    // ...
    const batchRequests = batches.map((batch) =>
      this.embeddingWithRetry({
        model: this.modelName,
        texts: batch,
        input_type: this.inputType, // Add this line
      })
    );
    // ...
  }

  async embedQuery(text: string): Promise<number[]> {
    // ...
    const { body } = await this.embeddingWithRetry({
      model: this.modelName,
      texts: [text],
      input_type: "search_query", // Add this line: queries use the query-side input type
    });
    // ...
  }

  // ...
}
```

This way, you can specify the `input_type` when creating a `CohereEmbeddings` instance. Please note that the current implementation of the

I hope this helps! Let me know if you have any other questions.
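The `inputType` field in the suggestion above is typed as a plain `string`, so an unsupported value only surfaces as an API error at request time. A minimal sketch of narrowing it, assuming the four `input_type` values Cohere documents for its v3 embed models (`search_document`, `search_query`, `classification`, `clustering`); `COHERE_INPUT_TYPES` and `parseInputType` are hypothetical names, not part of langchainjs:

```typescript
// Hypothetical list: the input_type values Cohere documents for its v3 embed models.
export const COHERE_INPUT_TYPES = [
  "search_document",
  "search_query",
  "classification",
  "clustering",
] as const;

// Union type derived from the array above, so the two cannot drift apart.
export type CohereInputType = (typeof COHERE_INPUT_TYPES)[number];

// Narrow an untyped value (e.g. read from config or an env var) to
// CohereInputType, failing fast instead of waiting for the API to reject it.
export function parseInputType(value: string): CohereInputType {
  if ((COHERE_INPUT_TYPES as readonly string[]).includes(value)) {
    return value as CohereInputType;
  }
  throw new Error(
    `Unsupported input_type "${value}"; expected one of: ${COHERE_INPUT_TYPES.join(", ")}`
  );
}
```

Deriving the union type from the `as const` array keeps the runtime check and the compile-time type in sync from a single list.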
As supplemental information from my investigation, it appears that `embeddings/bedrock.ts#L88-L95` should be revised to accommodate the new Cohere embedding schema on AWS Bedrock:
AWS Bedrock Documentation on Cohere Embed `input_type`

Here's a sample of the current error:

EDIT: I don't know how the contribution process works or whether this is a design the LangChain devs would accept, but I found a solution that worked for me, so I thought I'd contribute it back for anyone else stuck on this. Here's a patched `BedrockEmbeddings` module that can be used in place of `BedrockEmbeddings` when the new Cohere embeddings are needed:

```typescript
import {
  BedrockRuntimeClient,
  InvokeModelCommand,
} from "@aws-sdk/client-bedrock-runtime";
import { AwsCredentialIdentity, Provider } from "@aws-sdk/types";
import { Embeddings } from "langchain/embeddings/base";
import { BedrockEmbeddingsParams } from "langchain/embeddings/bedrock";

export type CredentialType =
  | AwsCredentialIdentity
  | Provider<AwsCredentialIdentity>;

export class BedrockCohereEmbeddings
  extends Embeddings
  implements BedrockEmbeddingsParams
{
  model: string;

  client: BedrockRuntimeClient;

  // Cohere Embed on Bedrock accepts at most 96 texts per request.
  batchSize = 96;

  constructor(fields?: BedrockEmbeddingsParams) {
    super(fields ?? {});
    this.model = fields?.model ?? "cohere.embed-english-v3";
    this.client =
      fields?.client ??
      new BedrockRuntimeClient({
        region: fields?.region,
        credentials: fields?.credentials,
      });
  }

  /**
   * Embeds an array of documents using the Bedrock model, sending at most
   * `batchSize` texts per request.
   * @param documents The array of documents to be embedded.
   * @param inputType The Cohere input type, e.g. "search_document" or "search_query".
   * @returns A promise that resolves to a 2D array of embeddings.
   * @throws If an error occurs while embedding documents with Bedrock.
   */
  protected async _embedDocuments(
    documents: string[],
    inputType: string
  ): Promise<number[][]> {
    const embeddings: number[][] = [];
    for (let i = 0; i < documents.length; i += this.batchSize) {
      const batch = documents.slice(i, i + this.batchSize);
      const batchEmbeddings: number[][] = await this.caller.call(async () => {
        try {
          const res = await this.client.send(
            new InvokeModelCommand({
              modelId: this.model,
              body: JSON.stringify({
                // Collapse runs of newlines into single spaces before embedding.
                texts: batch.map((doc) => doc.replace(/\n+/g, " ")),
                input_type: inputType,
              }),
              contentType: "application/json",
              accept: "application/json",
            })
          );
          const body = new TextDecoder().decode(res.body);
          return JSON.parse(body).embeddings;
        } catch (e) {
          console.error({ error: e });
          if (e instanceof Error) {
            throw new Error(
              `An error occurred while embedding documents with Bedrock: ${e.message}`
            );
          }
          throw new Error(
            "An error occurred while embedding documents with Bedrock"
          );
        }
      });
      embeddings.push(...batchEmbeddings);
    }
    return embeddings;
  }

  /**
   * Method that takes a document as input and returns a promise that
   * resolves to an embedding for the document.
   * @param document Document for which to generate an embedding.
   * @returns Promise that resolves to an embedding for the input document.
   */
  async embedQuery(document: string): Promise<number[]> {
    return this._embedDocuments([document], "search_query").then(
      (embeddings) => embeddings[0]
    );
  }

  /**
   * Method to generate embeddings for an array of texts.
   * @param documents Array of texts for which to generate embeddings.
   * @returns Promise that resolves to a 2D array of embeddings for each input document.
   */
  async embedDocuments(documents: string[]): Promise<number[][]> {
    return this._embedDocuments(documents, "search_document");
  }
}
```
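Splitting documents into request-sized batches and reassembling the results in order is easy to get subtly wrong, and it can be exercised without AWS credentials if the Bedrock call is injected. A minimal sketch, assuming a hypothetical `invoke` callback that stands in for one `InvokeModelCommand` round trip, and the 96-texts-per-call limit Cohere documents for its embed models; `chunk` and `embedInBatches` are illustrative names, not langchainjs APIs:

```typescript
// Hypothetical stand-in for one Bedrock InvokeModel call that embeds a batch of texts.
type InvokeBatch = (texts: string[]) => Promise<number[][]>;

// Split `items` into consecutive batches of at most `batchSize` elements.
export function chunk<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed every text, sending at most `batchSize` texts per request and
// returning one embedding per input text, in the original order.
export async function embedInBatches(
  texts: string[],
  invoke: InvokeBatch,
  batchSize = 96
): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    const result = await invoke(batch);
    // Guard against a response that silently drops or adds vectors.
    if (result.length !== batch.length) {
      throw new Error(`Expected ${batch.length} embeddings, got ${result.length}`);
    }
    embeddings.push(...result);
  }
  return embeddings;
}
```

Batches are sent one at a time here, which bounds concurrency; mapping them through `Promise.all` would be faster but may run into Bedrock request throttling on large inputs.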
🤖 Hi, @metaskills, I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, the issue you opened requested support for the new Bedrock embeddings from Cohere, specifically the `input_type` argument. Could you please confirm whether this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
I'd like to test the Cohere embeddings available on Bedrock and think this helper class would be really beneficial. Is it possible to get @BrianErikson's contribution merged into community extensions?
Is there any update on this?
New embedding are. Seems these use an `input_type` arg which I would like to use too.
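For reference when revising `embeddings/bedrock.ts`, the Cohere models use a different request and response shape from the Amazon Titan text embedding model. A minimal sketch of branching on the model ID's provider prefix, assuming Titan takes `{ inputText }` and returns `{ embedding }` while Cohere takes `{ texts, input_type }` and returns `{ embeddings }`; `buildEmbeddingRequests` and `readEmbeddings` are hypothetical helpers, not the actual langchainjs code:

```typescript
// Hypothetical helpers. Assumed shapes:
//   Titan:  { inputText }          -> { embedding }
//   Cohere: { texts, input_type }  -> { embeddings }
type EmbeddingProvider = "amazon" | "cohere";
type InputType = "search_document" | "search_query";

// Bedrock model IDs start with the provider name, e.g. "amazon.titan-embed-text-v1"
// or "cohere.embed-english-v3".
function providerOf(modelId: string): EmbeddingProvider {
  const prefix = modelId.split(".")[0];
  if (prefix === "amazon" || prefix === "cohere") {
    return prefix;
  }
  throw new Error(`Unsupported embedding provider in model ID "${modelId}"`);
}

// Build the JSON request bodies needed to embed `texts` with `modelId`.
// Titan embeds one text per request and has no input_type equivalent, so
// `inputType` is ignored there; Cohere takes an array plus input_type.
export function buildEmbeddingRequests(
  modelId: string,
  texts: string[],
  inputType: InputType
): string[] {
  if (providerOf(modelId) === "amazon") {
    return texts.map((text) => JSON.stringify({ inputText: text }));
  }
  return [JSON.stringify({ texts, input_type: inputType })];
}

// Extract embeddings from one decoded JSON response body.
export function readEmbeddings(modelId: string, responseBody: string): number[][] {
  const parsed = JSON.parse(responseBody);
  return providerOf(modelId) === "amazon" ? [parsed.embedding] : parsed.embeddings;
}
```

Keying on the provider prefix keeps one embeddings class usable for both model families, at the cost of hard-coding each provider's schema.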