Merge branch 'main' into himself65/20240422/http-support
himself65 committed Apr 24, 2024
2 parents 619d834 + aeefc77 commit eac554a
Showing 24 changed files with 407 additions and 93 deletions.
5 changes: 5 additions & 0 deletions .changeset/curly-shoes-give.md
@@ -0,0 +1,5 @@
---
"llamaindex": patch
---

feat: support jina ai embedding and reranker
6 changes: 3 additions & 3 deletions .github/workflows/test.yml
@@ -17,7 +17,7 @@ jobs:
strategy:
fail-fast: false
matrix:
node-version: [18.x, 20.x, 21.x]
node-version: [18.x, 20.x, 22.x]
name: E2E on Node.js ${{ matrix.node-version }}
runs-on: ubuntu-latest
steps:
@@ -26,7 +26,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version-file: ".nvmrc"
node-version: ${{ matrix.node-version }}
cache: "pnpm"
- name: Install dependencies
run: pnpm install
@@ -37,7 +37,7 @@ jobs:
strategy:
fail-fast: false
matrix:
node-version: [18.x, 20.x, 21.x]
node-version: [18.x, 20.x, 22.x]
name: Test on Node.js ${{ matrix.node-version }}
runs-on: ubuntu-latest

19 changes: 13 additions & 6 deletions README.md
@@ -114,14 +114,21 @@ Add the following config to your `next.config.js` to ignore specific packages in
/** @type {import('next').NextConfig} */
const nextConfig = {
experimental: {
serverComponentsExternalPackages: ["pdf2json", "@zilliz/milvus2-sdk-node"],
serverComponentsExternalPackages: [
"pdf2json",
"@zilliz/milvus2-sdk-node",
"sharp",
"onnxruntime-node",
],
},
webpack: (config) => {
config.resolve.alias = {
...config.resolve.alias,
sharp$: false,
"onnxruntime-node$": false,
};
config.externals.push({
pdf2json: "commonjs pdf2json",
"@zilliz/milvus2-sdk-node": "commonjs @zilliz/milvus2-sdk-node",
sharp: "commonjs sharp",
"onnxruntime-node": "commonjs onnxruntime-node",
});

return config;
},
};
21 changes: 21 additions & 0 deletions apps/docs/docs/modules/embeddings/available_embeddings/jinaai.md
@@ -0,0 +1,21 @@
# Jina AI

To use Jina AI embeddings, you need to import `JinaAIEmbedding` from `llamaindex`.

```ts
import {
Document,
JinaAIEmbedding,
Settings,
VectorStoreIndex,
} from "llamaindex";

Settings.embedModel = new JinaAIEmbedding();

const document = new Document({ text: essay, id_: "essay" });

const index = await VectorStoreIndex.fromDocuments([document]);

const queryEngine = index.asQueryEngine();

const query = "What is the meaning of life?";

const results = await queryEngine.query({
query,
});
```
71 changes: 71 additions & 0 deletions apps/docs/docs/modules/node_postprocessors/jinaai_reranker.md
@@ -0,0 +1,71 @@
# Jina AI Reranker

The Jina AI Reranker is a postprocessor that uses the Jina AI Reranker API to rerank the results of a search query.

## Setup

Firstly, you will need to install the `llamaindex` package.

```bash
pnpm install llamaindex
```

Now, you will need to sign up for an API key at [Jina AI](https://jina.ai/reranker). Once you have your API key you can import the necessary modules and create a new instance of the `JinaAIReranker` class.

```ts
import {
JinaAIReranker,
Document,
OpenAI,
VectorStoreIndex,
Settings,
} from "llamaindex";
```

## Load and index documents

For this example, we will use a single document. In a real-world scenario, you would have multiple documents to index.

```ts
const document = new Document({ text: essay, id_: "essay" });

Settings.llm = new OpenAI({ model: "gpt-3.5-turbo", temperature: 0.1 });

const index = await VectorStoreIndex.fromDocuments([document]);
```

## Increase similarity topK to retrieve more results

The default value for `similarityTopK` is 2, so only the two most similar results are returned. To retrieve more results, increase the value of `similarityTopK`.

```ts
const retriever = index.asRetriever();
retriever.similarityTopK = 5;
```

## Create a new instance of the JinaAIReranker class

Then you can create a new instance of the `JinaAIReranker` class and pass in the number of results you want to return.
The Jina AI Reranker API key is set in the `JINAAI_API_KEY` environment variable.

```bash
export JINAAI_API_KEY=<YOUR API KEY>
```

```ts
const nodePostprocessor = new JinaAIReranker({
topN: 5,
});
```

## Create a query engine with the retriever and node postprocessor

```ts
const queryEngine = index.asQueryEngine({
retriever,
nodePostprocessors: [nodePostprocessor],
});

// log the response
const response = await queryEngine.query({
query: "Where did the author grow up?",
});
console.log(response.toString());
```
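
The relationship between `similarityTopK` and `topN` above can be illustrated with a small local sketch: the retriever returns a wider candidate set, and the reranker rescores it and keeps the best few. Everything here is hypothetical and for illustration only. The `Candidate` shape, `overlapScore`, and `rerank` are not part of `llamaindex`, and the toy word-overlap scorer merely stands in for the call to the Jina AI Reranker API.

```typescript
// Hypothetical shape for a retrieved result; not a llamaindex type.
interface Candidate {
  text: string;
  score: number;
}

// Toy relevance scorer standing in for the reranker API call:
// counts how many query words also appear in the text.
function overlapScore(query: string, text: string): number {
  const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  return query
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 0 && words.has(w)).length;
}

// Rescore every candidate against the query, sort by the new score,
// and keep only the topN best results.
function rerank(query: string, candidates: Candidate[], topN: number): Candidate[] {
  return candidates
    .map((c) => ({ text: c.text, score: overlapScore(query, c.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

const reranked = rerank(
  "Where did the author grow up?",
  [
    { text: "The weather is nice", score: 0.9 },
    { text: "The author grew up in England", score: 0.5 },
    { text: "Author biography", score: 0.4 },
  ],
  2,
);
console.log(reranked.map((c) => c.text));
```

Note that the original retrieval scores are discarded once the reranker has produced its own, which is why retrieving more candidates than `topN` gives the reranker room to promote results the retriever ranked low.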
2 changes: 1 addition & 1 deletion apps/docs/docusaurus.config.js
@@ -163,7 +163,7 @@ const config = {
"docusaurus-plugin-typedoc",
{
entryPoints: ["../../packages/core/src/index.ts"],
tsconfig: "../../packages/core/tsconfig.json",
tsconfig: "../../tsconfig.json",
readme: "none",
sourceLinkTemplate:
"https://github.com/run-llama/LlamaIndexTS/blob/{gitRevision}/{path}#L{line}",
1 change: 1 addition & 0 deletions apps/docs/package.json
@@ -20,6 +20,7 @@
"@llamaindex/examples": "workspace:*",
"@mdx-js/react": "^3.0.1",
"clsx": "^2.1.0",
"llamaindex": "workspace:*",
"postcss": "^8.4.38",
"prism-react-renderer": "^2.3.1",
"raw-loader": "^4.0.2",
2 changes: 0 additions & 2 deletions examples/multimodal/load.ts
@@ -4,7 +4,6 @@ import {
VectorStoreIndex,
storageContextFromDefaults,
} from "llamaindex";
import { DocStoreStrategy } from "llamaindex/ingestion/strategies/index";

import * as path from "path";

@@ -32,7 +31,6 @@ async function generateDatasource() {
});
await VectorStoreIndex.fromDocuments(documents, {
storageContext,
docStoreStrategy: DocStoreStrategy.NONE,
});
});
console.log(`Storage successfully generated in ${ms / 1000}s.`);
19 changes: 10 additions & 9 deletions examples/readers/package.json
@@ -3,20 +3,21 @@
"private": true,
"type": "module",
"scripts": {
"start": "node --loader ts-node/esm ./src/simple-directory-reader.ts",
"start:csv": "node --loader ts-node/esm ./src/csv.ts",
"start:docx": "node --loader ts-node/esm ./src/docx.ts",
"start:html": "node --loader ts-node/esm ./src/html.ts",
"start:markdown": "node --loader ts-node/esm ./src/markdown.ts",
"start:pdf": "node --loader ts-node/esm ./src/pdf.ts",
"start:llamaparse": "node --loader ts-node/esm ./src/llamaparse.ts"
"start": "node --import tsx ./src/simple-directory-reader.ts",
"start:csv": "node --import tsx ./src/csv.ts",
"start:docx": "node --import tsx ./src/docx.ts",
"start:html": "node --import tsx ./src/html.ts",
"start:markdown": "node --import tsx ./src/markdown.ts",
"start:pdf": "node --import tsx ./src/pdf.ts",
"start:llamaparse": "node --import tsx ./src/llamaparse.ts",
"start:notion": "node --import tsx ./src/notion.ts"
},
"dependencies": {
"llamaindex": "*"
},
"devDependencies": {
"@types/node": "^20.12.7",
"ts-node": "^10.9.2",
"typescript": "^5.4.3"
"tsx": "^4.7.2",
"typescript": "^5.4.5"
}
}
4 changes: 2 additions & 2 deletions examples/readers/src/notion.ts
@@ -7,7 +7,7 @@ import { createInterface } from "node:readline/promises";

program
.argument("[page]", "Notion page id (must be provided)")
.action(async (page, _options, command) => {
.action(async (page, _options) => {
// Initializing a client

if (!process.env.NOTION_TOKEN) {
@@ -55,7 +55,7 @@ program
.filter((page) => page !== null);
console.log("Found pages:");
console.table(pages);
console.log(`To run, run ts-node ${command.name()} [page id]`);
console.log(`To run, run with [page id]`);
return;
}
}
7 changes: 5 additions & 2 deletions packages/core/package.json
@@ -12,7 +12,6 @@
"@llamaindex/cloud": "0.0.5",
"@llamaindex/env": "workspace:*",
"@mistralai/mistralai": "^0.1.3",
"@notionhq/client": "^2.2.15",
"@pinecone-database/pinecone": "^2.2.0",
"@qdrant/js-client-rest": "^1.8.2",
"@types/lodash": "^4.17.0",
@@ -31,7 +30,7 @@
"mammoth": "^1.7.1",
"md-utils-ts": "^2.0.0",
"mongodb": "^6.5.0",
"notion-md-crawler": "^0.0.2",
"notion-md-crawler": "^1.0.0",
"ollama": "^0.5.0",
"openai": "^4.38.0",
"papaparse": "^5.4.1",
@@ -45,7 +44,11 @@
"wikipedia": "^2.1.2",
"wink-nlp": "^1.14.3"
},
"peerDependencies": {
"@notionhq/client": "^2.2.15"
},
"devDependencies": {
"@notionhq/client": "^2.2.15",
"@swc/cli": "^0.3.12",
"@swc/core": "^1.4.16",
"concurrently": "^8.2.2",
31 changes: 31 additions & 0 deletions packages/core/src/Node.ts
@@ -326,6 +326,37 @@ export class ImageNode<T extends Metadata = Metadata> extends TextNode<T> {
const absPath = path.resolve(this.id_);
return new URL(`file://${absPath}`);
}

// Calculates the image part of the hash
private generateImageHash() {
const hashFunction = createSHA256();

if (this.image instanceof Blob) {
// TODO: ideally we should use the blob's content to calculate the hash:
// hashFunction.update(new Uint8Array(await this.image.arrayBuffer()));
// as this is async, we're using the node's ID for the time being
hashFunction.update(this.id_);
} else if (this.image instanceof URL) {
hashFunction.update(this.image.toString());
} else if (typeof this.image === "string") {
hashFunction.update(this.image);
} else {
throw new Error(
`Unknown image type: ${typeof this.image}. Can't calculate hash`,
);
}

return hashFunction.digest();
}

generateHash() {
const hashFunction = createSHA256();
// calculates hash based on hash of both components (image and text)
hashFunction.update(super.generateHash());
hashFunction.update(this.generateImageHash());

return hashFunction.digest();
}
}

export class ImageDocument<T extends Metadata = Metadata> extends ImageNode<T> {
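
The hashing scheme added to `ImageNode` above (hash the image part separately, then hash the text hash together with the image hash) can be sketched in isolation. This is a minimal illustration under stated assumptions: it uses Node's `node:crypto` `createHash` in place of the repository's `createSHA256` helper, emits hex digests (the real digest encoding is not shown in the diff), and the names `sha256` and `imageNodeHash` are hypothetical. The `Blob` branch is omitted because the diff falls back to the node ID there.

```typescript
import { createHash } from "node:crypto";

// Assumption: only string and URL image sources are handled in this sketch.
type ImageSource = string | URL;

// Hash one or more string parts with SHA-256 and return a hex digest.
function sha256(...parts: string[]): string {
  const hash = createHash("sha256");
  for (const part of parts) {
    hash.update(part);
  }
  return hash.digest("hex");
}

// Combine the hash of the text component with the hash of the image
// component, mirroring the two-step structure in the diff.
function imageNodeHash(textHash: string, image: ImageSource): string {
  const imageHash = sha256(image instanceof URL ? image.toString() : image);
  return sha256(textHash, imageHash);
}

const first = imageNodeHash("text-hash", "images/cat.png");
const second = imageNodeHash("text-hash", "images/dog.png");
console.log(first === second); // the image differs, so the combined hashes differ
```

Hashing the two components separately before combining them means a change to either the text or the image reference changes the final hash, which is what allows an ingestion pipeline to detect modified image nodes.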
29 changes: 29 additions & 0 deletions packages/core/src/embeddings/JinaAIEmbedding.ts
@@ -0,0 +1,29 @@
import { getEnv } from "@llamaindex/env";
import { OpenAIEmbedding } from "./OpenAIEmbedding.js";

export class JinaAIEmbedding extends OpenAIEmbedding {
constructor(init?: Partial<OpenAIEmbedding>) {
const {
apiKey = getEnv("JINAAI_API_KEY"),
additionalSessionOptions = {},
model = "jina-embeddings-v2-base-en",
...rest
} = init ?? {};

if (!apiKey) {
throw new Error(
"Set Jina AI API Key in JINAAI_API_KEY env variable. Get one for free or top up your key at https://jina.ai/embeddings",
);
}

additionalSessionOptions.baseURL =
additionalSessionOptions.baseURL ?? "https://api.jina.ai/v1";

super({
apiKey,
additionalSessionOptions,
model,
...rest,
});
}
}
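
The option handling in the `JinaAIEmbedding` constructor above (explicit value first, then the environment variable or built-in default) can be sketched as a standalone function. This is a simplified illustration, not the class itself: `SessionOptions`, `EmbeddingInit`, and `resolveJinaOptions` are hypothetical names, and the environment is passed in as a plain object rather than read through `getEnv`.

```typescript
// Hypothetical, simplified stand-ins for the real option types.
interface SessionOptions {
  baseURL?: string;
}

interface EmbeddingInit {
  apiKey?: string;
  model?: string;
  additionalSessionOptions?: SessionOptions;
}

// Resolve constructor options the way the diff does: explicit values win,
// then the JINAAI_API_KEY environment variable and the built-in defaults.
function resolveJinaOptions(
  init: EmbeddingInit = {},
  env: Record<string, string | undefined> = {},
): Required<EmbeddingInit> {
  const apiKey = init.apiKey ?? env["JINAAI_API_KEY"];
  if (!apiKey) {
    throw new Error("Set Jina AI API Key in JINAAI_API_KEY env variable.");
  }
  return {
    apiKey,
    model: init.model ?? "jina-embeddings-v2-base-en",
    additionalSessionOptions: {
      ...init.additionalSessionOptions,
      baseURL: init.additionalSessionOptions?.baseURL ?? "https://api.jina.ai/v1",
    },
  };
}

const resolved = resolveJinaOptions({}, { JINAAI_API_KEY: "example-key" });
console.log(resolved.model, resolved.additionalSessionOptions.baseURL);
```

One difference from the diff is deliberate: this sketch copies the session options instead of assigning `baseURL` onto the object the caller passed in, which avoids mutating the caller's input.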
1 change: 1 addition & 0 deletions packages/core/src/embeddings/index.ts
@@ -1,5 +1,6 @@
export * from "./ClipEmbedding.js";
export * from "./HuggingFaceEmbedding.js";
export * from "./JinaAIEmbedding.js";
export * from "./MistralAIEmbedding.js";
export * from "./MultiModalEmbedding.js";
export { OllamaEmbedding } from "./OllamaEmbedding.js";
12 changes: 6 additions & 6 deletions packages/core/src/ingestion/IngestionPipeline.ts
@@ -94,20 +94,20 @@ export class IngestionPipeline {
documents?: Document[],
nodes?: BaseNode[],
): Promise<BaseNode[]> {
const inputNodes: BaseNode[] = [];
const inputNodes: BaseNode[][] = [];
if (documents) {
inputNodes.push(...documents);
inputNodes.push(documents);
}
if (nodes) {
inputNodes.push(...nodes);
inputNodes.push(nodes);
}
if (this.documents) {
inputNodes.push(...this.documents);
inputNodes.push(this.documents);
}
if (this.reader) {
inputNodes.push(...(await this.reader.loadData()));
inputNodes.push(await this.reader.loadData());
}
return inputNodes;
return inputNodes.flat();
}

async run(
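
The change above replaces `push(...items)` with pushing whole arrays and calling `flat()` once at the end. The commit does not state a motivation; one plausible reason (an assumption here) is that spreading a very large array into `push` passes each element as a separate function argument, which can throw a `RangeError` in JavaScript engines. A minimal sketch of the batch-then-flatten pattern, using a hypothetical `NodeLike` type and `collectBatches` helper:

```typescript
// Hypothetical stand-in for BaseNode; only used for illustration.
interface NodeLike {
  id: string;
}

// Collect optional batches as whole arrays and flatten once at the end,
// instead of spreading each batch into push().
function collectBatches(batches: Array<NodeLike[] | undefined>): NodeLike[] {
  const collected: NodeLike[][] = [];
  for (const batch of batches) {
    if (batch) {
      collected.push(batch);
    }
  }
  return collected.flat();
}

// A large batch is handled without passing every element as an argument.
const large: NodeLike[] = Array.from({ length: 200_000 }, (_, i) => ({ id: `n${i}` }));
const all = collectBatches([large, undefined, [{ id: "extra" }]]);
console.log(all.length);
```

The order of the inputs is preserved, so the result is the same as with the spread version for small inputs.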
