Image Search with Subindices #606
@davidmezzetti This code seems to work without the "columns" parameter in the Embeddings config.
Hello, that is correct. Since you are using the default object field name of `object`, that parameter isn't needed; it is equivalent to setting:

```json
"columns": {
    "object": "object"
}
```
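For reference, here is a minimal sketch of what that default mapping amounts to. This just spells out txtai's documented defaults for a content-enabled index; it is not code from this thread:

```python
from txtai import Embeddings

# With no "columns" override, txtai falls back to the default field names:
# "text" for text content and "object" for binary/object content, so this
# explicit mapping behaves the same as omitting "columns" entirely
embeddings = Embeddings(
    content=True,
    columns={
        "text": "text",
        "object": "object"
    }
)

embeddings.index([{"text": "machine learning"}])
```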
@davidmezzetti My apologies, I mistyped the code in the question. I got it to work now. It seemed to have been confused between the columns in the subindexes. However, changing … I'm using … Could you provide an example of how you would do an image search?
Another question I have: is there a way to weight the subindexes during a combined search?
Would it be possible to provide a few notional example records? It would be easier for me to follow exactly what you're trying to accomplish.
We want to leverage subindexes to perform both text and image search. We have 3 more subindexes beyond what is shown in the config above. To run a search on … Could you provide an example of how we could run a search on only image data (the `image` subindex)?
Did you try?

```python
embeddings.search('select object, score from txtai where similar("machine learning", "image")')
```

or

```python
embeddings.search('select object, score from txtai where similar("machine learning")', index="image")
```
I figured out what the issue is. When I run … the output for the first 3 items is … The problem is with the similarity scores. Typically, text-to-image similarity scores are lower than text-to-text scores. The algorithm seems to return the scores for all subindexes, not just … Is there a workaround you know of to only return objects from the `image` subindex? I have tried …
I will try to find time to come up with an example that does what you want to do with the appropriate configuration. I think some of the configuration you have could be throwing something off.
With the current configuration, I noticed that it only calculates similarity using the … subindex.

The configuration with subindexes returns the following when I run …

The configuration with subindexes returns: …

This is what I have set up for subindexes: …
The following code should give you what you're looking for.

```python
from PIL import Image

from txtai import Embeddings

embeddings = Embeddings(
    content=True,
    objects="image",
    defaults=False,
    indexes={
        "text-data": {
            "columns": {
                "text": "body",
                "object": "content"
            }
        },
        "image": {
            "method": "sentence-transformers",
            "path": "sentence-transformers/clip-ViT-B-32",
            "columns": {
                "text": "image-text"
            }
        }
    }
)

embeddings.index([{"object": Image.open("books.jpg")}, {"body": "machine learning"}])
```

And it supports the following queries.

```python
embeddings.search("select id, score from txtai where similar(:x, :y)", parameters={"x": "books", "y": "text-data"})
# [{'id': '1', 'score': 0.3759748339653015}]

embeddings.search("select id, score from txtai where similar(:x, :y)", parameters={"x": "books", "y": "image"})
# [{'id': '0', 'score': 0.2725779414176941}]

embeddings.search("select id, object, score from txtai where similar(:x, :y)", parameters={"x": Image.open("books.jpg"), "y": "image"})
# [{'id': '0', 'object': <Image>, 'score': 0.9999999403953552}]

embeddings.search("select id, score from txtai where similar(:x, :y) or similar(:x, :z)", parameters={"x": "books", "y": "image", "z": "text-data"})
# [{'id': '1', 'score': 0.3759748339653015}, {'id': '0', 'score': 0.2725779414176941}]

embeddings.search("select id, score from txtai where similar(:x, :y) and similar(:x, :z)", parameters={"x": "books", "y": "image", "z": "text-data"})
# []
```

The tricky thing is that the CLIP model encodes both text and images, so the config needs to be set up to skip text records. Note how both indexes have non-standard names for the text column.
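To make the column routing explicit, here is a hedged sketch that reuses the `embeddings` instance from the snippet above; the `"image-text"` caption record is a hypothetical illustration, not something from this thread:

```python
from PIL import Image

# Records are routed by which declared column they contain:
#   {"body": ...}       -> "text-data" subindex (its text column is "body")
#   {"image-text": ...} -> "image" subindex (its text column is "image-text")
#   {"object": ...}     -> encoded as an image by the CLIP "image" subindex
embeddings.index([
    {"object": Image.open("books.jpg")},   # image record, encoded by CLIP
    {"body": "machine learning"},          # text record, skipped by the CLIP index
    {"image-text": "a stack of books"}     # hypothetical caption routed to the CLIP index
])
```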
Thank you for your help!
Closing this issue. If there are further questions, please re-open or open a new issue.
@davidmezzetti
This is the code to ingest images: …

The only difference between the local version that works and the deployed version that does not is the …

Any insights you can provide on this issue would be greatly appreciated!
It looks like in dev you're using Python directly and prod is using the API?
@davidmezzetti Yes. We're using the API in prod. Do you have any insights into what might be causing this issue?
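For context, a hedged sketch of the two paths being compared. The `/add` and `/index` routes are txtai's standard API endpoints, the URL is a placeholder, and the key point (JSON payloads cannot carry a PIL image) is what the replies below confirm:

```python
import requests
from PIL import Image
from txtai import Embeddings

# Dev: direct Python, where a PIL image can be passed as the object
embeddings = Embeddings(
    method="sentence-transformers",
    path="sentence-transformers/clip-ViT-B-32",
    content=True,
    objects="image"
)
embeddings.index([{"object": Image.open("books.jpg")}])

# Prod: the API's /add endpoint accepts JSON documents; JSON has no way
# to represent a PIL image or raw bytes, so image upserts fail over the API
requests.post("http://localhost:8000/add", json=[{"text": "machine learning"}])
requests.get("http://localhost:8000/index")
```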
@davidmezzetti Do you have insights into why it's throwing an error with the API but working perfectly when calling …?
The API does not currently support multipart form submissions for embeddings inputs. I will try to work on an example that demonstrates how to do that, but it will involve adding a custom API endpoint. This article has an example of that.
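In the meantime, a custom endpoint along those lines might look like this hedged sketch. The `Extension`/`application` usage follows txtai's documented API extension mechanism, but the route name, the document shape passed to `add()`, and the handler details are assumptions, not code confirmed by this thread:

```python
from io import BytesIO

from fastapi import APIRouter, File, Form, UploadFile
from PIL import Image

from txtai.api import application, Extension

router = APIRouter()


@router.post("/addimage")
async def addimage(uid: str = Form(...), file: UploadFile = File(...)):
    # Decode the uploaded multipart bytes into a PIL image
    image = Image.open(BytesIO(await file.read()))

    # Queue the document and run an upsert against the running application
    application.get().add([{"id": uid, "object": image}])
    application.get().upsert()

    return {"id": uid}


class ImageUpload(Extension):
    # Enabled by listing this class in the API configuration's extensions setting
    def __call__(self, app):
        app.include_router(router)
```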
@davidmezzetti Do you have any updates on this issue?
Sending binary data through the API isn't currently supported. It's something I want to add, and showing how to do it in the meantime with a custom endpoint is on my list.
Closing here and tracking this in #606. |
Could you provide an example of how this image search can be used with other data using subindices?
I get this error when I try to upsert the vector database with image data: …
https://github.com/neuml/txtai/blob/master/examples/13_Similarity_search_with_images.ipynb
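For reference, a hedged sketch of the local upsert path that works, mirroring the image subindex configuration shown earlier in this thread; the file names and ids are placeholders:

```python
from PIL import Image
from txtai import Embeddings

# Mirrors the image subindex configuration from earlier in the thread
embeddings = Embeddings(
    content=True,
    objects="image",
    defaults=False,
    indexes={
        "image": {
            "method": "sentence-transformers",
            "path": "sentence-transformers/clip-ViT-B-32",
            "columns": {"text": "image-text"}
        }
    }
)

# Build the initial index, then upsert another image as (id, data, tags)
embeddings.index([("0", {"object": Image.open("books.jpg")}, None)])
embeddings.upsert([("1", {"object": Image.open("chart.png")}, None)])
```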