
Add support for image dataset #571

Merged
merged 17 commits into from
Oct 4, 2023

Conversation

rupeshbansal
Contributor

@rupeshbansal rupeshbansal commented Sep 6, 2023

Description

This adds support for images as a new data type in embedchain. Users will be able to feed an image, or a directory containing a set of images, into embedchain and then run text queries against the content of those images.

NOTE: Docs are yet to be updated. I am unsure about the repository's conventions, but I plan to add them in a follow-up!

Setup

Run poetry install -E images to install all the additional dependencies

Preparation of the data

Create a new folder and put into it all the images that embedchain should index

Add the Images to Embedchain

from embedchain import App
from embedchain.config.llm.base_llm_config import BaseLlmConfig

app = App()
app.add(<PATH_TO_THE_LOCAL_IMAGES>, data_type="images")

Query on the context

app.query(<QUERY>, config = BaseLlmConfig(query_type = "Images"))
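For illustration, a complete flow might look like the following (the folder path and query string are placeholders; as described later in this thread, the query returns the paths of the most relevant images):

from embedchain import App
from embedchain.config.llm.base_llm_config import BaseLlmConfig

app = App()

# Index every image in the local folder (placeholder path)
app.add("./my_images", data_type="images")

# Search the indexed images by describing their content
results = app.query("A lush green set of trees", config=BaseLlmConfig(query_type="Images"))
print(results)  # paths of the images most relevant to the query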

Test

You can test this feature by following the instructions in https://github.com/rupeshbansal/embedchain_imagetest/

Fixes #511

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Please delete options that are not relevant.

  • Unit Test
  • Test Script (please provide)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

@rupeshbansal
Contributor Author

@taranjeet @cachho Can you please take a look?

@rupeshbansal
Contributor Author

@taranjeet Can you please take a look? New merge conflicts are coming into this every few days.

@cachho
Contributor

cachho commented Sep 11, 2023

I'm sure this is a great feature! I couldn't test it yet or do a thorough review, so I hope you can just answer a few questions here.

What's happening behind the scenes? Is your image embedded as an image, or is it transformed to text and that text is embedded? After adding it, can you only query it as an image (is that what query_type is for) or also in other contexts? Will it return the image or a text answer? Does it work with all LLMs?

We're definitely going to need documentation for the query_type, especially since it's user-facing. What does it do? Can it be used with other data_types as well?

Can you provide a full test script (with Creative Commons images)? What kind of question can you ask?

@rupeshbansal
Contributor Author

rupeshbansal commented Sep 15, 2023

Thanks @cachho for taking a look. Please find the responses inline.

What's happening behind the scenes? Is your image embedded as an image, or is it transformed to text and that text is embedded?

We create an embedding for each image, which is stored in the DB (ChromaDB, ES, or whatever is configured). By embedding, I mean an array of floating-point numbers. This conversion is done using https://github.com/openai/CLIP. You should be able to find how exactly this is done in embedchain/models/clip_processor.py.
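For reference, here is a minimal sketch of how an image can be turned into such an embedding with the openai/CLIP package (an illustration only, not the exact code in embedchain/models/clip_processor.py; the model variant and file name are assumptions):

import clip
import torch
from PIL import Image

# Load a pretrained CLIP model together with its image preprocessing pipeline
model, preprocess = clip.load("ViT-B/32", device="cpu")

# Preprocess a single image and compute its embedding (a vector of floats)
image = preprocess(Image.open("forest.jpg")).unsqueeze(0)
with torch.no_grad():
    image_features = model.encode_image(image)

embedding = image_features[0].tolist()  # this list of floats is what gets stored in the vector DB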

After adding it, can you only query it as an image (is that what query_type is for) or also in other contexts?

After adding the images, you query them using text that describes their content. For example, if you add images of a mountain, a beach, a forest, and a boy to embedchain, you should then be able to search using queries like "A lush green set of trees", which should return the image of the forest.

Will it return the image or a text answer?
It will return a text answer: an array of the image paths that are most relevant to the query.

Does it work with all LLMs?
There is no LLM involved here. We find the nearest neighbour of the query in the DB, relying on the DB's default algorithm (generally cosine similarity) to do so.
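To illustrate that retrieval step, here is a hand-rolled sketch of embedding the text query with CLIP and ranking stored image embeddings by cosine similarity (in practice the nearest-neighbour search is delegated to the configured vector DB; the stored vectors below are placeholders):

import clip
import torch

model, _ = clip.load("ViT-B/32", device="cpu")

# Placeholder for the image embeddings already persisted in the DB (path -> vector)
stored_embeddings = {
    "forest.jpg": torch.randn(512),
    "beach.jpg": torch.randn(512),
}

# Embed the text query with the same CLIP model used for the images
tokens = clip.tokenize(["A lush green set of trees"])
with torch.no_grad():
    text_features = model.encode_text(tokens)
query_vec = text_features[0] / text_features[0].norm()

# Cosine similarity = dot product of L2-normalised vectors; the highest score wins
scores = {path: float(query_vec @ (vec / vec.norm())) for path, vec in stored_embeddings.items()}
best_match = max(scores, key=scores.get)
print(best_match)  # expected: the path of the most relevant image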

We're definitely going to need documentation for the query_type, especially since it's user facing. What does it do?
I completely agree. My plan was to add it once this gets an LGTM; that will help reduce the iterations. Let me know if you feel otherwise.

Can it be used with other data_types as well?
No, it cannot.

Can you provide a full test script (with creative commons images)? What kind of question can you ask?

Nice suggestion. I have created https://github.com/rupeshbansal/embedchain_imagetest/, which has the steps to test it. I will add it to the PR description as well.

Collaborator

@deshraj deshraj left a comment


Generally looks good to me. Great work @rupeshbansal ❤️

Please resolve the comments and incorporate the minor suggestions, and we are good to go here.

Resolved review threads (now outdated): embedchain/embedchain.py, pyproject.toml (two threads)
@codecov

codecov bot commented Oct 2, 2023

Codecov Report

Attention: 25 lines in your changes are missing coverage. Please review.

Files Coverage Δ
embedchain/chunkers/base_chunker.py 100.00% <100.00%> (ø)
embedchain/chunkers/images.py 100.00% <100.00%> (ø)
embedchain/config/llm/base_llm_config.py 94.73% <100.00%> (+0.14%) ⬆️
embedchain/data_formatter/data_formatter.py 86.79% <100.00%> (+0.51%) ⬆️
embedchain/models/data_type.py 100.00% <100.00%> (ø)
embedchain/vectordb/elasticsearch.py 67.53% <100.00%> (+33.74%) ⬆️
embedchain/llm/base.py 71.42% <50.00%> (-0.37%) ⬇️
embedchain/vectordb/chroma.py 82.79% <87.50%> (+6.39%) ⬆️
embedchain/embedchain.py 73.89% <50.00%> (-0.44%) ⬇️
embedchain/models/clip_processor.py 84.21% <84.21%> (ø)
... and 1 more


@taranjeet taranjeet merged commit d0af018 into mem0ai:main Oct 4, 2023
5 checks passed
@LuciAkirami
Contributor

LuciAkirami commented Oct 4, 2023

Hey @rupeshbansal, glad to know that image search has come to embedchain. It would be great if there were a default query_type set to text (non-image). I mean, after these commits the user has to explicitly pass a config with query_type to app.query(), else the code throws AttributeError: 'NoneType' object has no attribute 'query_type'

-> Before this commit, the typical flow of creating a bot would be like this

from embedchain import App
from embedchain.llm.openai import OpenAILlm


app = App()

app.add("https://www.youtube.com/watch?v=ZnEgvGPMRXA")

response = app.query("Is Co-Pilot in Windows update?")
print(response)

-> After this commit, we need to explicitly provide the query type, as shown below

from embedchain import App
from embedchain.config import LlmConfig

app = App()

app.add("https://www.youtube.com/watch?v=ZnEgvGPMRXA")

response = app.query("Is Co-Pilot in Windows update?",config=LlmConfig(query_type='text'))
print(response)

-> If the old style is followed, it throws an error:
AttributeError: 'NoneType' object has no attribute 'query_type'

-> One workaround would be to check whether a config is provided to app.query() and, if not, assume the query_type is text (i.e., set text as the default). For this, the following changes have to be made inside the query() function in embedchain.py

def query(self, input_query: str, config: BaseLlmConfig = None, dry_run=False, where: Optional[Dict] = None) -> str:
    # Fall back to a plain text query when no config is given
    if config is None:
        config = BaseLlmConfig(query_type='text')
    ......
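With a default like that in place, the pre-commit call style shown above (response = app.query("Is Co-Pilot in Windows update?") with no config) would presumably work again unchanged.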

-> And as @cachho mentioned, this needs to be in the documentation. I myself was confused when I did a git pull and the code suddenly stopped working; I had to backtrack before I finally found this.


Successfully merging this pull request may close these issues.

feature request: add support for Image(s) as a new DataType