feat: text embeddings #163
fun preprocess(input: String): Array<LongArray> {
    val inputIds = tokenizer.encode(input).map { it.toLong() }.toLongArray()
    val attentionMask = inputIds.map { if (it != 0L) 1L else 0L }.toLongArray()
    return arrayOf(inputIds, attentionMask) // Shape: [2, max_length]
Where is max_length specified? I think mentioning it here would be nice
max_length is specified inside tokenizer.json
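As a rough illustration of the mask logic in the snippet above, here is a minimal TypeScript sketch. It assumes the tokenizer has already padded the ids to max_length and that the padding id is 0, as the quoted Kotlin code implies; `buildAttentionMask` and the sample ids are hypothetical and not part of the PR.

```typescript
// Hypothetical helper: mark real tokens with 1 and padding with 0.
// Assumption: the padding token id is 0, mirroring the `it != 0L` check above.
function buildAttentionMask(inputIds: number[], padId: number = 0): number[] {
  return inputIds.map((id) => (id !== padId ? 1 : 0));
}

// Made-up ids, already padded to a length of 6 by the tokenizer.
const inputIds: number[] = [101, 2023, 2003, 102, 0, 0];
const attentionMask: number[] = buildAttentionMask(inputIds);

// Both rows have the same length, matching the [2, max_length] shape comment.
const modelInput: number[][] = [inputIds, attentionMask];
```

One caveat with this approach: a real token whose id equals the padding id would also be masked out, so it only works when the padding id is reserved for padding.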
modelSource: string | number;
tokenizerSource: string | number;
use ResourceSource instead of string | number
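A sketch of what this suggestion might look like, assuming `ResourceSource` is an alias for `string | number` as the comment implies. The alias below is a local stand-in, and the library's real definition may differ; `TextEmbeddingsProps` and `describeSource` are hypothetical names used only for illustration.

```typescript
// Local stand-in for the library's ResourceSource type.
// Assumption: it covers a path/URL string or a numeric asset id from require().
type ResourceSource = string | number;

// Hypothetical props shape using the alias instead of repeating the union.
interface TextEmbeddingsProps {
  modelSource: ResourceSource;
  tokenizerSource: ResourceSource;
}

// Hypothetical helper showing how a consumer can narrow the union.
function describeSource(source: ResourceSource): string {
  return typeof source === 'number'
    ? `bundled asset #${source}`
    : `path or URL: ${source}`;
}

const exampleProps: TextEmbeddingsProps = {
  modelSource: 'https://example.com/model.pte',
  tokenizerSource: 7,
};
```

Using one named alias keeps every module signature in sync if the set of accepted source kinds changes later.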
src/modules/natural_language_processing/TextEmbeddingsModule.ts
ios/RnExecutorch/models/BaseModel.mm
- (NSArray *)forwardMultiple:(NSArray *)inputs {
  NSMutableArray *shapes = [NSMutableArray new];
  NSMutableArray *inputTypes = [NSMutableArray new];
  NSNumber *numberOfInputs = [module getNumberOfInputs];

  for (NSUInteger i = 0; i < [numberOfInputs intValue]; i++) {
    [shapes addObject:[module getInputShape:[NSNumber numberWithInt:i]]];
    [inputTypes addObject:[module getInputType:[NSNumber numberWithInt:i]]];
  }

  NSArray *result = [module forward:inputs shapes:shapes inputTypes:inputTypes];
This does not look correct. Wouldn't it be better to check whether the length of inputs equals numberOfInputs and abort if it does not? Then we don't need another method that copies 99% of the code. You can either add an if statement here that checks whether the input is an array of arrays, or change the method to work only with an array of inputs
and then change the name
This will be changed in another PR as mentioned in BaseModel.h. This change requires changes in multiple files and I decided to not include them here.
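The reviewer's suggestion (one forward method that validates the number of inputs instead of a near-duplicate forwardMultiple) could be sketched as follows. This is written in TypeScript rather than Objective-C purely for illustration; `ModelLike`, `forwardChecked`, and `fakeModule` are hypothetical names, and the numeric shape and type encodings are assumptions.

```typescript
// Hypothetical stand-in for the native module used in the Objective-C snippet.
interface ModelLike {
  getNumberOfInputs(): number;
  getInputShape(index: number): number[];
  getInputType(index: number): number;
  forward(inputs: number[][], shapes: number[][], inputTypes: number[]): number[][];
}

// Single entry point: always takes an array of inputs and aborts on a count mismatch.
function forwardChecked(module: ModelLike, inputs: number[][]): number[][] {
  const numberOfInputs = module.getNumberOfInputs();
  if (inputs.length !== numberOfInputs) {
    throw new Error(`Expected ${numberOfInputs} inputs, got ${inputs.length}`);
  }
  const shapes: number[][] = [];
  const inputTypes: number[] = [];
  for (let i = 0; i < numberOfInputs; i++) {
    shapes.push(module.getInputShape(i));
    inputTypes.push(module.getInputType(i));
  }
  return module.forward(inputs, shapes, inputTypes);
}

// Fake two-input module that echoes its inputs, for demonstration only.
const fakeModule: ModelLike = {
  getNumberOfInputs: () => 2,
  getInputShape: () => [1, 4],
  getInputType: () => 0,
  forward: (inputs) => inputs,
};

const echoed = forwardChecked(fakeModule, [[101, 7, 102, 0], [1, 1, 1, 0]]);
```

With this shape a single-input model simply passes a one-element array, so no second method is needed.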
src/native/RnExecutorchModules.ts
const TextEmbeddingsSpec = require('./NativeTextEmbeddings').default;
const TextEmbeddings = TextEmbeddingsSpec
  ? TextEmbeddingsSpec
  : new Proxy(
      {},
      {
        get() {
          throw new Error(LINKING_ERROR);
        },
      }
    );
Add a class for this, as for the other modules, e.g.
class _ClassificationModule {
  async forward(input: string): ReturnType<ClassificationInterface['forward']> {
    return await Classification.forward(input);
  }
  async loadModule(
    modelSource: string | number
  ): ReturnType<ClassificationInterface['loadModule']> {
    return await Classification.loadModule(modelSource);
  }
}
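Applied to text embeddings, the requested wrapper might look roughly like the sketch below. The `TextEmbeddingsInterface` signatures, the return types, the `orLinkingProxy` helper, and the stub spec are all assumptions made so the example is self-contained; the real native spec in the PR may differ.

```typescript
// Placeholder message; the real LINKING_ERROR text is defined elsewhere in the library.
const LINKING_ERROR = 'The native module is not linked (placeholder message).';

// Assumed native interface; method names follow the snippets above, types are guesses.
interface TextEmbeddingsInterface {
  loadModule(modelSource: string | number, tokenizerSource: string | number): Promise<number>;
  forward(input: string): Promise<number[]>;
}

// Hypothetical helper: return the spec, or a Proxy that throws on any property access.
function orLinkingProxy<T extends object>(spec: T | null | undefined): T {
  return spec
    ? spec
    : new Proxy({} as T, {
        get() {
          throw new Error(LINKING_ERROR);
        },
      });
}

// Stub standing in for require('./NativeTextEmbeddings').default.
const stubSpec: TextEmbeddingsInterface = {
  loadModule: async () => 0,
  forward: async (input) => [input.length],
};
const TextEmbeddings = orLinkingProxy<TextEmbeddingsInterface>(stubSpec);

// Wrapper class in the style of _ClassificationModule.
class _TextEmbeddingsModule {
  async forward(input: string): Promise<number[]> {
    return await TextEmbeddings.forward(input);
  }
  async loadModule(
    modelSource: string | number,
    tokenizerSource: string | number
  ): Promise<number> {
    return await TextEmbeddings.loadModule(modelSource, tokenizerSource);
  }
}
```

The class gives callers a typed surface, while the Proxy keeps the failure mode explicit when the native module is missing.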
src/modules/natural_language_processing/TextEmbeddingsModule.ts
"devDependencies": {
  "@babel/core": "^7.25.2",
  "@types/react": "~18.3.12",
  "react-native-executorch": "file:../../react-native-executorch-20250401133031.tgz",
cleanup
This will be cleaned up once v0.4.0 is released on npm
title: TextEmbeddingsModule
sidebar_position: 8
---
Also add the description and the caution banner, as on the useTextEmbeddings page
Co-authored-by: Mateusz Kopcinski <120639731+mkopcins@users.noreply.github.com>
Remove Llama Export, refactor docs
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
---------
Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>

Modified forward function inside BaseModel to accept multiple inputs. Removed unnecessary includes/imports. Made the model loading synchronous.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [x] iOS
- [ ] Android
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

Code refactor (example)
- [x] iOS
- [x] Android
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
---------
Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>

Add tokenizer documentation
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing documentation)
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings
---------
Co-authored-by: kopcion <mati3111@gmail.com>
## Description
Add text embeddings

### Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [x] Documentation update (improves or adds clarity to existing documentation)

### Tested on
- [x] iOS
- [x] Android

### Checklist
- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have updated the documentation accordingly
- [x] My changes generate no new warnings

---------
Co-authored-by: Mateusz Kopcinski <120639731+mkopcins@users.noreply.github.com>
Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>
Co-authored-by: kopcion <mati3111@gmail.com>