
Multimodal search #276

Merged
iulusoy merged 9 commits into ssciwr:main from DimasfromLavoisier:multimodal_search on Jan 5, 2026

DimasfromLavoisier (Contributor) commented Dec 15, 2025

This PR adds a new version of the multimodal search module. A few things are left before its final version:

  1. Tests
  2. A method for multi-query search
  3. Small fixes according to Copilot and other AI reviews
  4. Update the demo notebook

Copilot AI (Contributor) left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

DimasfromLavoisier force-pushed the multimodal_search branch 3 times, most recently from f56ea29 to 6781827 (December 16, 2025 16:05)
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.



DimasfromLavoisier force-pushed the multimodal_search branch 3 times, most recently from 08228b3 to fcc6934 (December 17, 2025 15:14)
iulusoy (Member) left a comment

I'm having some issues with GPU memory. Is it possible that there is a memory leak in the image encoding when building the FAISS index?
To reproduce the issue, you could either occupy most of your local GPU memory with something else, so that less is available to ammico, or use much more data.


@pytest.mark.long
def test_multimodal_search_combined_query(get_path):
    model = MultimodalEmbeddingsModel()

Here my GPU runs out of memory and the test fails with:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 464.00 MiB. GPU 0 has a total capacity of 5.55 GiB of which 337.88 MiB is free. Including non-PyTorch memory, this process has 5.20 GiB memory in use. Of the allocated memory 4.57 GiB is allocated by PyTorch, and 553.08 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Maybe we can add a try/except block here to catch this error and fall back to the CPU?
The problem starts in:

ammico/multimodal_search.py:237: in index_images
    embeddings = self.model.encode_image(
../../../miniforge3/envs/ammico/lib/python3.13/site-packages/torch/utils/_contextlib.py:120: in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
ammico/model.py:285: in encode_image
    embeddings = self.model.encode(
../../../miniforge3/envs/ammico/lib/python3.13/site-packages/torch/utils/_contextlib.py:120: in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
../../../miniforge3/envs/ammico/lib/python3.13/site-packages/sentence_transformers/SentenceTransformer.py:1094: in encode
    out_features = self.forward(features, **kwargs)
    ...
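
The try/except fallback suggested above could look roughly like the following sketch. It is not taken from ammico: `encode_with_cpu_fallback`, its `encode` callable, and the `oom_errors` parameter are hypothetical names chosen for illustration. With PyTorch, one would presumably pass `oom_errors=(torch.OutOfMemoryError,)`, the exception type shown in the error message above.

```python
from typing import Callable, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def encode_with_cpu_fallback(
    encode: Callable[[Sequence[str], str], T],
    images: Sequence[str],
    device: str = "cuda",
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> Tuple[T, str]:
    """Run encode(images, device); on an out-of-memory error, retry once on CPU.

    Returns the result together with the device that actually produced it.
    All names here are hypothetical; this is not the ammico API.
    """
    try:
        return encode(images, device), device
    except oom_errors:
        if device == "cpu":
            # Nothing left to fall back to, so surface the original error.
            raise
        return encode(images, "cpu"), "cpu"
```

A caller would probably also want to log that the fallback happened and release cached GPU memory (for example with `torch.cuda.empty_cache()`) before retrying on the CPU.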


If I set

def test_multimodal_search_combined_query(get_path):
    model = MultimodalEmbeddingsModel(device="cpu")
    mms = MultimodalSearch(model=model)

the test runs fine.

"metadata": {},
"outputs": [],
"source": [
"multim_s_model.index_images(\n",

Here the kernel also crashes; I assume it is the same memory problem. I suspect a memory leak, with torch perhaps not releasing memory when it should. Otherwise, why does memory use accumulate so much during the run? (The first few encodings are usually fine.)
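
One way to narrow this down, sketched under assumptions: encode in small batches and record GPU memory after each batch. If the recorded values climb steadily, that would suggest tensors are being kept alive between batches. `encode_in_batches`, `encode_batch`, and `probe` are hypothetical names, not part of ammico; with PyTorch, `probe` could be `torch.cuda.memory_allocated`.

```python
from typing import Callable, List, Sequence, Tuple


def encode_in_batches(
    encode_batch: Callable[[Sequence[str]], List[List[float]]],
    paths: Sequence[str],
    batch_size: int = 8,
    probe: Callable[[], int] = lambda: 0,
) -> Tuple[List[List[float]], List[int]]:
    """Encode paths batch by batch and log a memory reading after each batch.

    encode_batch and probe are supplied by the caller (hypothetical hooks).
    Returns all embeddings plus one probe() reading per processed batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    embeddings: List[List[float]] = []
    memory_log: List[int] = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        embeddings.extend(encode_batch(batch))
        # Record memory use right after this batch has been encoded.
        memory_log.append(probe())
    return embeddings, memory_log
```

Comparing the first and last entries of the returned log gives a quick indication of whether memory use grows with the number of encoded images.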

iulusoy (Member) commented Dec 18, 2025

Also, the code coverage is somehow not showing; I assume this is because the PR was opened from a branch.
Other than the memory issue, the implementation looks very good.

iulusoy (Member) left a comment

With the other faiss library version, I also could not get it to run locally. But since it runs fine on the CPU, I would postpone this to the testing stage and merge the PR now.

iulusoy merged commit a408b99 into ssciwr:main (Jan 5, 2026)
4 checks passed
iulusoy deleted the multimodal_search branch (January 5, 2026 11:51)

3 participants