As mentioned in the paper, key concepts might get omitted or corrupted by the compression, to the point that GPT can't process the compressed prompt.
You also mention that there is an approach to optimize around this issue; could you share details on the corresponding configuration options in the Python implementation?
In the attached image, I measured how GPT's confidence degrades under compression on the qasper_e subset of the LongBench benchmark.

Share of wrong answers or "no answer possible" responses:
- Regular GPT-4, i.e. without prompt compression: 45.36% (GPT-4 seems to "give up" frequently on longer queries)
- Prompt compressed by LLMLingua, target_token=200: 63.93%
- Prompt compressed by LLMLingua, target_token=400: 60.66%
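
To put the numbers side by side, here is a minimal sketch that computes the gap to the uncompressed baseline in percentage points. It uses only the three figures listed above; the dictionary keys and the helper name `gap_vs_baseline` are illustrative and not part of any library.

```python
# Failure rates (wrong answer or "no answer possible") from the list above, in percent.
failure_rate = {
    "uncompressed": 45.36,
    "target_token=200": 63.93,
    "target_token=400": 60.66,
}


def gap_vs_baseline(rates, baseline="uncompressed"):
    """Return each compressed run's increase over the baseline, in percentage points."""
    base = rates[baseline]
    return {
        name: round(rate - base, 2)
        for name, rate in rates.items()
        if name != baseline
    }


for name, gap in gap_vs_baseline(failure_rate).items():
    print(f"{name}: +{gap:.2f} percentage points")
```

In these runs, doubling the token budget from 200 to 400 recovers only a small part of the gap.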