feat: vector embedding demo #2
Caution: Review failed. The pull request is closed.
Walkthrough
This pull request introduces several new configuration files, build scripts, documentation, and source code modules to support a Google Apps Script project centered on vector embeddings. It adds settings for VSCode formatting, project configuration for clasp, TypeScript, and the Apps Script runtime, and a comprehensive README guide that details semantic search and text similarity functions. Additionally, new source files implement core vector operations, embedding generation via an API call, and compatibility fixes with a polyfill.
Sequence Diagram(s)
sequenceDiagram
    participant U as User
    participant SS as SEMANTIC_SEARCH
    participant BE as batchedEmbeddings_
    participant GC as Google Cloud Vertex AI
    participant VT as Vector Tools
    U->>SS: Initiate semantic search with query & corpus
    SS->>BE: Request text embeddings
    BE->>GC: Call AI API for embeddings
    GC-->>BE: Return embeddings
    BE-->>SS: Deliver embeddings
    SS->>VT: Compute similarity scores
    VT-->>SS: Return similarity data
    SS->>U: Return sorted results
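The `batchedEmbeddings_` → Vertex AI step in the diagram comes down to one authenticated POST per batch. The sketch below keeps request building and response parsing as pure functions, so it can run outside Apps Script. The following are assumptions and are not taken from the PR:

- the publisher-model `:predict` REST endpoint,
- the `us-central1` location,
- the `text-embedding-005` model name,
- the helper names `buildEmbeddingRequest_` and `parseEmbeddingResponse_`, which are hypothetical.

```javascript
// Sketch of the batchedEmbeddings_ -> Vertex AI call from the diagram.
// ASSUMPTIONS (not from the PR): endpoint shape, location, and model name below.
const LOCATION = "us-central1"; // hypothetical default region
const MODEL = "text-embedding-005"; // hypothetical default model

/**
 * Builds the URL and UrlFetchApp-style options for one embeddings request.
 * @param {string} projectId Google Cloud project ID
 * @param {string} token OAuth token, e.g. from ScriptApp.getOAuthToken()
 * @param {string[]} texts inputs to embed in this request
 */
function buildEmbeddingRequest_(projectId, token, texts) {
  const url =
    `https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${projectId}` +
    `/locations/${LOCATION}/publishers/google/models/${MODEL}:predict`;
  const options = {
    method: "post",
    contentType: "application/json",
    headers: { Authorization: `Bearer ${token}` },
    // One instance per input text, in the same order as `texts`.
    payload: JSON.stringify({ instances: texts.map((content) => ({ content })) }),
    // Return error responses instead of throwing, so the caller can inspect them.
    muteHttpExceptions: true,
  };
  return { url, options };
}

/**
 * Extracts one embedding vector per input from a :predict response body.
 * @param {string} body raw JSON response text
 * @returns {number[][]}
 */
function parseEmbeddingResponse_(body) {
  const { predictions } = JSON.parse(body);
  if (!Array.isArray(predictions)) {
    throw new Error("Unexpected response: missing predictions array");
  }
  return predictions.map((p) => p.embeddings.values);
}
```

In Apps Script, these functions would plug together roughly as `UrlFetchApp.fetch(url, options)` followed by `parseEmbeddingResponse_(response.getContentText())`. Check the response field names (`predictions[].embeddings.values`) against the model version you actually use.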
📜 Recent review details
Configuration used: CodeRabbit UI
⛔ Files ignored due to path filters (2)
📒 Files selected for processing (13)
Hello @jpoehnelt, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
This pull request introduces a new project, vector-embeddings, which demonstrates how to use Google's Vertex AI to generate vector embeddings within Google Apps Script. The project includes code for generating embeddings, calculating similarity between embeddings, and building a simple semantic search function. It also provides examples of how to use vector embeddings in Google Sheets and for document classification. The changes include adding necessary dependencies, build scripts, and configuration files for the new project.
Highlights
- New Project: vector-embeddings: Introduces a new project demonstrating vector embeddings with Google Apps Script and Vertex AI.
- Embedding Generation: Implements functions to generate vector embeddings from text using Vertex AI.
- Similarity Calculation: Includes functions to calculate the cosine similarity between two embeddings.
- Semantic Search: Provides a basic semantic search function that leverages vector embeddings.
- Google Sheets Integration: Demonstrates how to use vector embeddings in Google Sheets with a custom function.
- Document Classification: Shows how to use embeddings to automatically categorize documents.
- Build Process: Adds a build script using esbuild to bundle the Apps Script project.
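The similarity highlight above comes down to cosine similarity: cos(a, b) = (a · b) / (|a| |b|). The minimal sketch below covers the three operations the PR attributes to `tools.js`: dot product, magnitude, and cosine similarity. The function names are hypothetical and may not match the PR's actual helpers. The zero-vector guard is a design choice assumed here, not something the PR is known to do.

```javascript
// Sketch of vector helpers: dot product, magnitude, cosine similarity.
// Function names are hypothetical; the PR's tools.js may differ.

/** Sum of element-wise products of two equal-length vectors. */
function dotProduct_(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Length mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

/** Euclidean (L2) norm of a vector. */
function magnitude_(v) {
  return Math.sqrt(dotProduct_(v, v));
}

/**
 * Cosine similarity in [-1, 1]: dot(a, b) / (|a| * |b|).
 * Returns 0 when either vector has zero magnitude, to avoid dividing by zero.
 */
function cosineSimilarity_(a, b) {
  const denom = magnitude_(a) * magnitude_(b);
  return denom === 0 ? 0 : dotProduct_(a, b) / denom;
}
```

Orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0, and opposite vectors score -1. Parallel vectors score 1, up to floating-point rounding.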
Changelog
- .vscode/settings.json
  - Configures the default formatter to be esbenp.prettier-vscode for JavaScript files.
- pnpm-lock.yaml
  - Adds dependencies for the vector-embeddings project, including @google/clasp, @types/google-apps-script, and esbuild.
- projects/vector-embeddings/.clasp.json
  - Adds a clasp configuration file for the vector-embeddings project, specifying the script ID, root directory, project ID, and file extensions.
- projects/vector-embeddings/README.md
  - Creates a comprehensive README file explaining how to use vector embeddings in Google Apps Script with Vertex AI, including code examples and real-world applications.
- projects/vector-embeddings/build.js
  - Adds a build script using esbuild to bundle the Apps Script project, including minification and banner injection.
- projects/vector-embeddings/package.json
  - Creates a package.json file for the vector-embeddings project, specifying dependencies, scripts, and other metadata.
- projects/vector-embeddings/polyfill.js
  - Adds a polyfill to define globalThis.window for compatibility with certain libraries.
- projects/vector-embeddings/src/appsscript.json
  - Adds an appsscript.json file specifying the project's time zone, dependencies, exception logging, runtime version, and OAuth scopes.
- projects/vector-embeddings/src/examples.js
  - Adds example functions for semantic search, Google Sheets integration, and document classification.
- projects/vector-embeddings/src/index.ts
  - Creates an empty index.ts file, likely serving as an entry point for the TypeScript build process.
- projects/vector-embeddings/src/internal.d.ts
  - Adds a declaration file for internal functions used in the project, such as batchedEmbeddings_, similarity_, and others.
- projects/vector-embeddings/src/main.js
  - Implements the core logic for generating embeddings and performing semantic search using Vertex AI.
- projects/vector-embeddings/src/tools.js
  - Adds utility functions for calculating dot products, magnitudes, and cosine similarity.
- projects/vector-embeddings/tsconfig.json
  - Creates a tsconfig.json file to configure the TypeScript compiler options for the project.
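The document-classification example listed for `examples.js` can be read as "pick the label whose embedding is closest to the document's embedding." The minimal sketch below follows that reading under a few assumptions:

- each category is represented by one precomputed embedding,
- the similarity function is passed in (for example, a cosine-similarity helper),
- `classifyDocument_` is a hypothetical name and not the PR's function.

```javascript
// Sketch of nearest-label classification with embeddings.
// ASSUMPTIONS: one precomputed embedding per category; similarity is injected.

/**
 * @param {number[]} docEmbedding embedding of the document to label
 * @param {{label: string, embedding: number[]}[]} categories candidate labels
 * @param {(a: number[], b: number[]) => number} similarity higher means more alike
 * @returns {{label: string, score: number}} best-matching category and its score
 */
function classifyDocument_(docEmbedding, categories, similarity) {
  if (categories.length === 0) throw new Error("No categories provided");
  let best = { label: categories[0].label, score: -Infinity };
  for (const { label, embedding } of categories) {
    const score = similarity(docEmbedding, embedding);
    // Keep the highest-scoring label; ties go to the earlier category.
    if (score > best.score) best = { label, score };
  }
  return best;
}
```

Comparing against a single embedding per label is the simplest option. Averaging several example embeddings per label, or returning the top few labels with their scores, are common variations.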
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Did you know?
The concept of word embeddings, which vector embeddings are based on, gained prominence with the introduction of word2vec in 2013 by a team at Google, led by Tomas Mikolov.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces vector embedding functionality to Google Apps Script using Vertex AI. The code is well-structured and includes a comprehensive README with implementation details and examples. However, there are a few areas that could be improved for clarity and efficiency.
Summary of Findings
- Missing Error Handling in semanticSearch: The semanticSearch function in examples.js does not handle potential errors during the embedding generation, which could lead to unexpected behavior.
- Potential Performance Bottleneck in Similarity Calculations: The similarity calculations are performed without caching, which could lead to performance issues when dealing with large datasets.
- Inconsistent Variable Naming: There are inconsistencies in variable naming conventions across different files (e.g., PROJECT_ID vs. ProjectId).
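The caching point above can be addressed by remembering embeddings by their input text, so that repeated searches over the same corpus do not re-embed it. The minimal sketch below makes two assumptions:

- it uses an in-memory `Map`, which in Apps Script lasts only for one script execution,
- `embedBatch(texts)` is an injected stand-in for `batchedEmbeddings_`.

`createEmbeddingCache_` is a hypothetical name.

```javascript
// Sketch of per-text embedding memoization.
// ASSUMPTIONS: in-memory Map (one execution only); embedBatch is injected.

/**
 * Wraps embedBatch so each distinct text is embedded at most once.
 * @param {(texts: string[]) => number[][]} embedBatch embeds texts in order
 * @returns {(texts: string[]) => number[][]} cached lookup, same order as input
 */
function createEmbeddingCache_(embedBatch) {
  const cache = new Map();
  return function getEmbeddings(texts) {
    // Deduplicate misses so a repeated text is only sent once.
    const misses = [...new Set(texts.filter((t) => !cache.has(t)))];
    if (misses.length > 0) {
      const vectors = embedBatch(misses);
      if (vectors.length !== misses.length) {
        throw new Error("embedBatch returned the wrong number of embeddings");
      }
      misses.forEach((t, i) => cache.set(t, vectors[i]));
    }
    return texts.map((t) => cache.get(t));
  };
}
```

To reuse embeddings across executions, Apps Script's CacheService is an option, subject to its per-value size limits. Storing vectors next to the source rows in a Sheet is another.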
Merge Readiness
The pull request introduces valuable functionality and is generally well-structured. However, addressing the error handling in semanticSearch and considering caching for similarity calculations would significantly improve the robustness and performance of the code. I am unable to approve this pull request and recommend that others review and approve this code before merging. At a minimum, the high-severity issues should be addressed before merging.
function semanticSearch(query, corpus) {
  // Generate embedding for the query
  const queryEmbedding = batchedEmbeddings_([query])[0];

  // Create or use existing index
  const index = corpus.map((text) => ({
    text,
    embedding: batchedEmbeddings_([text])[0],
  }));

  // Calculate similarities
  const results = index.map(({ text, embedding }) => ({
    text,
    similarity: similarity_(embedding, queryEmbedding),
  }));

  // Sort by similarity (highest first)
  return results.sort((a, b) => b.similarity - a.similarity);
}
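The function above calls `batchedEmbeddings_` once for the query and once per corpus entry, and it has no error handling. The minimal sketch below addresses both review points. It embeds the query and the whole corpus in a single batched call, and it fails with a descriptive message when embedding goes wrong. `embedBatch` and `similarity` are injected so the sketch runs outside Apps Script; in the PR, `batchedEmbeddings_` and `similarity_` play those roles. `semanticSearchSketch_` is a hypothetical name.

```javascript
// Sketch: one batched embedding call plus explicit error handling.
// ASSUMPTION: embedBatch(texts) returns one vector per text, in order.

function semanticSearchSketch_(query, corpus, embedBatch, similarity) {
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("semanticSearch: query must be a non-empty string");
  }
  if (!Array.isArray(corpus) || corpus.length === 0) return [];

  let embeddings;
  try {
    // Index 0 is the query; indices 1..n line up with corpus.
    embeddings = embedBatch([query, ...corpus]);
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(`semanticSearch: embedding request failed: ${reason}`);
  }
  if (!Array.isArray(embeddings) || embeddings.length !== corpus.length + 1) {
    throw new Error("semanticSearch: unexpected number of embeddings returned");
  }

  const [queryEmbedding, ...corpusEmbeddings] = embeddings;
  return corpus
    .map((text, i) => ({
      text,
      similarity: similarity(corpusEmbeddings[i], queryEmbedding),
    }))
    // Highest similarity first, matching the original sort order.
    .sort((a, b) => b.similarity - a.similarity);
}
```

If the embedding API limits how many inputs one request may carry, the single call would still need to be split inside `embedBatch`.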
const token = ScriptApp.getOAuthToken();

// TODO chunk in instances of 5
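The TODO above asks for inputs to be sent in groups of five. The minimal sketch below does that: it splits the texts into groups of at most five, embeds each group, and concatenates the results in their original order. The group size comes from the TODO. `embedChunk` is an injected stand-in for one Vertex AI request, and `chunk_` and `embedInChunks_` are hypothetical names.

```javascript
// Sketch for "TODO chunk in instances of 5".
// ASSUMPTION: embedChunk(texts) returns one vector per text, in order.

/** Splits items into consecutive groups of at most `size`. */
function chunk_(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

/** Embeds texts group by group; result[i] corresponds to texts[i]. */
function embedInChunks_(texts, embedChunk, size = 5) {
  // flatMap flattens one level, turning [[v1, v2], [v3]] into [v1, v2, v3].
  return chunk_(texts, size).flatMap((group) => embedChunk(group));
}
```

In Apps Script, `UrlFetchApp.fetchAll` can send one request per chunk in a single call instead of fetching them sequentially.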
function truncate_(text, maxLength) {
  return text.slice(0, maxLength) + (text.length > maxLength ? "..." : "");
}
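As written, `truncate_` keeps `maxLength` characters and then appends `"..."`, so truncated output can be up to `maxLength + 3` characters long. If a caller needs a hard cap, for example a fixed-width Sheets column, the hypothetical variant below counts the ellipsis within the limit. The name `truncateBounded_` and the `ellipsis` parameter are illustrative assumptions.

```javascript
// Hypothetical variant of truncate_ whose output never exceeds maxLength,
// counting the ellipsis toward the limit.
function truncateBounded_(text, maxLength, ellipsis = "...") {
  if (text.length <= maxLength) return text;
  // Too little room for any text plus the ellipsis: hard-cut instead.
  if (maxLength <= ellipsis.length) return text.slice(0, maxLength);
  return text.slice(0, maxLength - ellipsis.length) + ellipsis;
}
```

For example, `truncateBounded_("hello world", 8)` yields `"hello..."`, which is exactly 8 characters long.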
Deploying apps-script with

| Latest commit: | df60a44 |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://3f0ed13c.apps-script.pages.dev |
| Branch Preview URL: | https://feat-embeddings.apps-script.pages.dev |