
Text Compression Transform #2225

Merged
merged 43 commits into main on May 6, 2024

Conversation

WaelKarkoub
Collaborator

@WaelKarkoub WaelKarkoub commented Mar 31, 2024

Why are these changes needed?

This PR introduces text compression by leveraging the LLMLingua library. This addition enhances processing efficiency and response speed by reducing token usage in large language models.
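The idea described above — a pluggable compressor applied to messages before they reach the LLM — can be sketched in plain Python. This is a hypothetical, self-contained illustration, not the PR's actual implementation: the `TextCompressor` protocol shape, the `TruncatingCompressor` stand-in, and `compress_messages` are assumed names for illustration (a real compressor would call LLMLingua's model-based compression instead of truncating).

```python
from typing import Any, Dict, List, Protocol


class TextCompressor(Protocol):
    """Anything that can compress a piece of text and report the result."""

    def compress_text(self, text: str, **compression_params: Any) -> Dict[str, Any]:
        ...


class TruncatingCompressor:
    """Stand-in compressor: keeps the first `max_chars` characters.

    A real implementation would delegate to LLMLingua's model-based
    prompt compression rather than naive truncation.
    """

    def __init__(self, max_chars: int = 100):
        self.max_chars = max_chars

    def compress_text(self, text: str, **compression_params: Any) -> Dict[str, Any]:
        compressed = text[: self.max_chars]
        return {
            "compressed_prompt": compressed,
            "origin_tokens": len(text.split()),
            "compressed_tokens": len(compressed.split()),
        }


def compress_messages(
    messages: List[Dict[str, str]], compressor: TextCompressor
) -> List[Dict[str, str]]:
    """Apply the compressor to each message's text content, mirroring a
    pre-processing transform that shrinks context before an LLM call."""
    out = []
    for msg in messages:
        result = compressor.compress_text(msg["content"])
        out.append({**msg, "content": result["compressed_prompt"]})
    return out
```

The protocol-based design means any compressor (LLMLingua-backed or otherwise) can be swapped in without changing the transform itself.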

NOTE: LLMLingua uses locally hosted models, so caching might be important here.
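Because compression runs a locally hosted model, caching the results avoids repeating expensive calls for identical inputs. A minimal sketch, assuming a cache keyed on a hash of the text plus the compression parameters (so changing either invalidates the entry); `compress_with_cache` and `cache_key` are hypothetical names for illustration:

```python
import hashlib
import json
from typing import Any, Callable, Dict

# In-memory cache; a real setup might persist this to disk.
_cache: Dict[str, Dict[str, Any]] = {}


def cache_key(text: str, **params: Any) -> str:
    """Derive a stable key from the text and compression parameters."""
    payload = json.dumps({"text": text, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compress_with_cache(
    text: str,
    compress_fn: Callable[..., Dict[str, Any]],
    **params: Any,
) -> Dict[str, Any]:
    """Only invoke the (expensive, local-model) compressor on a cache miss."""
    key = cache_key(text, **params)
    if key not in _cache:
        _cache[key] = compress_fn(text, **params)
    return _cache[key]
```

Sorting the JSON keys makes the key insensitive to parameter ordering, and hashing keeps keys short even for very long prompts.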

Future work:

  • Image Compression
  • Video Compression

Related issue number

Closes #2538

Checks

@codecov-commenter

codecov-commenter commented Mar 31, 2024

Codecov Report

Attention: Patch coverage is 25.96154%, with 77 lines in your changes missing coverage. Please review.

Project coverage is 45.11%. Comparing base (ded2d61) to head (ec6fe57).
Report is 35 commits behind head on main.

Files Patch % Lines
...togen/agentchat/contrib/capabilities/transforms.py 20.23% 67 Missing ⚠️
...agentchat/contrib/capabilities/text_compressors.py 50.00% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2225       +/-   ##
===========================================
+ Coverage   33.33%   45.11%   +11.77%     
===========================================
  Files          83       86        +3     
  Lines        8636     9108      +472     
  Branches     1835     2090      +255     
===========================================
+ Hits         2879     4109     +1230     
+ Misses       5516     4651      -865     
- Partials      241      348      +107     
Flag        Coverage Δ
unittest    12.61% <25.96%> (?)
unittests   44.36% <0.00%> (+11.03%) ⬆️


@sonichi sonichi added this pull request to the merge queue May 6, 2024
Merged via the queue into main with commit 372ac1e May 6, 2024
77 of 91 checks passed
@sonichi sonichi deleted the llm-lingua-transform branch May 6, 2024 14:22
jayralencar pushed a commit to jayralencar/autogen that referenced this pull request May 28, 2024
* adds implementation

* handles optional import

* cleanup

* updates github workflows

* skip test if dependencies not installed

* skip test if dependencies not installed

* use cpu

* skip openai

* unskip openai

* adds protocol

* better docstr

* minor fixes

* updates optional dependencies docs

* wip

* update docstrings

* wip

* adds back llmlingua requirement

* finalized protocol

* improve docstr

* guide complete

* improve docstr

* fix FAQ

* added cache support

* improve cache key

* cache key fix + faq fix

* improve docs

* improve guide

* args -> params

* spelling
Labels
enhancement (New feature or request), long context handling (Compression to handle long context)
Development

Successfully merging this pull request may close these issues.

[Issue]: Warning from the def _num_token_from_messages is verbose and hard to silence
7 participants