Preprocess image right after downloading and pycurl to download images #849

Merged
merged 19 commits into from
Jun 3, 2024

wanliAlex
Collaborator

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    feature

  • What is the current behavior? (You can also link to an open issue here)
    In add_documents, Marqo downloads and caches all the images and keeps them in memory until they are all sent to be vectorised.
    This can cause memory issues because Marqo holds too many full-sized images in memory at once.

  • What is the new behavior (if this is a feature change)?
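    We preprocess each image into a tensor right after it is downloaded, so Marqo no longer caches the full-sized images.
    Because only the preprocessed tensors are kept in memory, this strategy significantly improves Marqo's memory efficiency. Images are also now downloaded with pycurl.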

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
    no

  • Have unit tests been run against this PR? (Has there also been any additional testing?)
    no

  • Related Python client changes (link commit/PR here)
    no

  • Related documentation changes (link commit/PR here)
    no

  • Other information:
    no

  • Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes/features)
  • Docs have been added / updated (for bug fixes / features)
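The current and new behavior described above contrast two strategies. Below is a minimal sketch of that contrast in plain Python, not Marqo's actual implementation. `RawImage`, `fake_download`, `fake_preprocess`, `cache_then_preprocess`, `preprocess_on_download` and `peak_raw_images` are all hypothetical. The "tensor" is just a list of floats, and the peak counts assume CPython, which frees an object as soon as its last reference is dropped.

```python
from typing import Callable, Dict, Iterable, List


class RawImage:
    """Hypothetical stand-in for a full-sized downloaded image.

    Class-level counters record how many instances are alive at once.
    """

    live = 0
    peak = 0

    def __init__(self, payload: bytes) -> None:
        self.payload = payload
        RawImage.live += 1
        RawImage.peak = max(RawImage.peak, RawImage.live)

    def __del__(self) -> None:
        RawImage.live -= 1


Download = Callable[[str], RawImage]
Preprocess = Callable[[RawImage], List[float]]


def cache_then_preprocess(urls: Iterable[str], download: Download,
                          preprocess: Preprocess) -> Dict[str, List[float]]:
    # Old behaviour: every full-sized image stays cached until all are downloaded.
    cache = {url: download(url) for url in urls}
    return {url: preprocess(img) for url, img in cache.items()}


def preprocess_on_download(urls: Iterable[str], download: Download,
                           preprocess: Preprocess) -> Dict[str, List[float]]:
    # New behaviour: convert each image as soon as it arrives and keep only
    # the small preprocessed result.
    tensors: Dict[str, List[float]] = {}
    for url in urls:
        img = download(url)
        tensors[url] = preprocess(img)
        del img  # free the full-sized image before the next download
    return tensors


def fake_download(url: str) -> RawImage:
    return RawImage(b"\x00" * 1_000_000)  # pretend every image is 1 MB


def fake_preprocess(img: RawImage) -> List[float]:
    return [float(len(img.payload))] * 4  # pretend a small fixed-size tensor


def peak_raw_images(strategy, urls: List[str]) -> int:
    # Reset the counters, run one strategy, report the most images alive at once.
    RawImage.live = RawImage.peak = 0
    strategy(urls, fake_download, fake_preprocess)
    return RawImage.peak


urls = [f"https://example.com/{i}.jpg" for i in range(10)]
print(peak_raw_images(cache_then_preprocess, urls))   # 10 on CPython
print(peak_raw_images(preprocess_on_download, urls))  # 1 on CPython
```

The first strategy holds all ten full-sized images in memory at the same time, while the second never holds more than one, which is the kind of saving this PR is after. A real pipeline would replace `fake_download` with an actual HTTP client (the PR title mentions pycurl) and `fake_preprocess` with the model's image transform.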

wanliAlex changed the title from "[WIP] Preprocess image right after downloading" to "Preprocess image right after downloading" May 30, 2024
wanliAlex requested a review from farshidz May 30, 2024 00:20
wanliAlex changed the title from "Preprocess image right after downloading" to "Preprocess image right after downloading and pycurl to download images" May 31, 2024
Two resolved (outdated) review threads on src/marqo/s2_inference/s2_inference.py
wanliAlex merged commit 19fe1dd into mainline Jun 3, 2024
5 checks passed
wanliAlex deleted the li/preprocess-image branch June 3, 2024 05:32