
Compress continuation tokens #2279

Merged: 2 commits merged into main from personal/bkowitz/compress-ct on Oct 15, 2021
Conversation

@brendankowitz (Member) commented Oct 14, 2021

Description

To offset some of the size overhead introduced by Base64 encoding, this PR applies a fast Deflate pass to reduce the overall size of the continuation token (see the sketch after the findings below).

Findings for a selection of cases show the continuation token (CT) ending up near or under the originally requested limit:

Original 2044, Continuation token Base64: 2728, Compressed+Base64: 1792
Original 2021, Continuation token Base64: 2696, Compressed+Base64: 1788
Original 621, Continuation token Base64: 828, Compressed+Base64: 464
Original 1929, Continuation token Base64: 2572, Compressed+Base64: 1040
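
For illustration, here is a minimal sketch of the general shape of this approach in .NET (Deflate followed by Base64) using DeflateStream and Convert.ToBase64String; the class and method names here are assumptions for clarity, not the exact code in this PR:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ContinuationTokenCompressor
{
    // Compress the raw continuation token with Deflate, then Base64-encode the result.
    public static string Encode(string continuationToken)
    {
        byte[] raw = Encoding.UTF8.GetBytes(continuationToken);
        using var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionLevel.Fastest))
        {
            deflate.Write(raw, 0, raw.Length);
        }

        return Convert.ToBase64String(output.ToArray());
    }

    // Reverse the process: Base64-decode, then inflate back to the original token string.
    public static string Decode(string encodedToken)
    {
        byte[] compressed = Convert.FromBase64String(encodedToken);
        using var input = new MemoryStream(compressed);
        using var deflate = new DeflateStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(deflate, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

Because continuation tokens are largely repetitive JSON, Deflate typically more than makes up for the roughly 33% expansion that Base64 adds, which is consistent with the numbers above.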

Related issues

Addresses #2250.

Testing

  • Existing tests should continue to pass.
  • Adds new tests

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 50 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Dependencies, Enhancement, or New-Feature
  • Tag the PR with Azure API for FHIR if this will release to the Azure API for FHIR managed service (CosmosDB or common code related to service)
  • Tag the PR with Azure Healthcare APIs if this will release to the Azure Healthcare APIs managed service (Sql server or common code related to service)
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

@brendankowitz requested a review from a team as a code owner October 14, 2021 17:06
@feordin (Contributor) commented Oct 14, 2021

This would only impact results where we generate continuation tokens, but would there be any noticeable impact on performance?

@brendankowitz (Member, Author) commented Oct 14, 2021

> This would only impact results where we generate continuation tokens, but would there be any noticeable impact on performance?

The performance impact of compressing a small string appears to be fairly minimal. Below is the average of 10 compress/decompress operations at each character size (a rough timing harness along these lines is sketched after the numbers):

501 chars, encode: 00:00:00.0000118, decode: 00:00:00.0000136
1001 chars, encode: 00:00:00.0000094, decode: 00:00:00.0000059
1501 chars, encode: 00:00:00.0000087, decode: 00:00:00.0000055
2001 chars, encode: 00:00:00.0000109, decode: 00:00:00.0000079
2501 chars, encode: 00:00:00.0000105, decode: 00:00:00.0000064
3001 chars, encode: 00:00:00.0000114, decode: 00:00:00.0000093
3501 chars, encode: 00:00:00.0000118, decode: 00:00:00.0000095
4001 chars, encode: 00:00:00.0000125, decode: 00:00:00.0000136
4501 chars, encode: 00:00:00.0000130, decode: 00:00:00.0000281
5001 chars, encode: 00:00:00.0000165, decode: 00:00:00.0000346
5501 chars, encode: 00:00:00.0000132, decode: 00:00:00.0000270
6001 chars, encode: 00:00:00.0000133, decode: 00:00:00.0000291
6501 chars, encode: 00:00:00.0000135, decode: 00:00:00.0000302
7001 chars, encode: 00:00:00.0000145, decode: 00:00:00.0001254
7501 chars, encode: 00:00:00.0000147, decode: 00:00:00.0000114
8001 chars, encode: 00:00:00.0000135, decode: 00:00:00.0000113
8501 chars, encode: 00:00:00.0000141, decode: 00:00:00.0000124
9001 chars, encode: 00:00:00.0000146, decode: 00:00:00.0000126
9501 chars, encode: 00:00:00.0000149, decode: 00:00:00.0000135
10001 chars, encode: 00:00:00.0000156, decode: 00:00:00.0000355
10501 chars, encode: 00:00:00.0000154, decode: 00:00:00.0000136
11001 chars, encode: 00:00:00.0000156, decode: 00:00:00.0000139
11501 chars, encode: 00:00:00.0000160, decode: 00:00:00.0000143
12001 chars, encode: 00:00:00.0000164, decode: 00:00:00.0000146
12501 chars, encode: 00:00:00.0000173, decode: 00:00:00.0000316
13001 chars, encode: 00:00:00.0000171, decode: 00:00:00.0000154
13501 chars, encode: 00:00:00.0000176, decode: 00:00:00.0000160
14001 chars, encode: 00:00:00.0000177, decode: 00:00:00.0000163
14501 chars, encode: 00:00:00.0000392, decode: 00:00:00.0000192
15001 chars, encode: 00:00:00.0000194, decode: 00:00:00.0000187
15501 chars, encode: 00:00:00.0000188, decode: 00:00:00.0000193
16001 chars, encode: 00:00:00.0000191, decode: 00:00:00.0000194
16501 chars, encode: 00:00:00.0000424, decode: 00:00:00.0000213
17001 chars, encode: 00:00:00.0000287, decode: 00:00:00.0000314
17501 chars, encode: 00:00:00.0000386, decode: 00:00:00.0000390
18001 chars, encode: 00:00:00.0000338, decode: 00:00:00.0000765
18501 chars, encode: 00:00:00.0000358, decode: 00:00:00.0000353
19001 chars, encode: 00:00:00.0000521, decode: 00:00:00.0000553
19501 chars, encode: 00:00:00.0000450, decode: 00:00:00.0000440
20001 chars, encode: 00:00:00.0000298, decode: 00:00:00.0000318
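
A hypothetical harness along these lines could produce numbers like the above; the loop bounds, the synthetic token contents, and the ContinuationTokenCompressor helper from the earlier sketch are assumptions, not the actual benchmark used here:

```csharp
using System;
using System.Diagnostics;

public static class CompressionTimings
{
    public static void Run()
    {
        for (int size = 501; size <= 20001; size += 500)
        {
            // Synthetic payload; real continuation tokens are JSON, so ratios and timings will differ.
            string token = new string('a', size);
            TimeSpan encodeTotal = TimeSpan.Zero, decodeTotal = TimeSpan.Zero;

            for (int i = 0; i < 10; i++)
            {
                var sw = Stopwatch.StartNew();
                string encoded = ContinuationTokenCompressor.Encode(token);
                encodeTotal += sw.Elapsed;

                sw.Restart();
                ContinuationTokenCompressor.Decode(encoded);
                decodeTotal += sw.Elapsed;
            }

            // Report the average of the 10 runs for each size.
            Console.WriteLine(
                $"{size} chars, encode: {TimeSpan.FromTicks(encodeTotal.Ticks / 10)}, decode: {TimeSpan.FromTicks(decodeTotal.Ticks / 10)}");
        }
    }
}
```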

@brendankowitz added the Bug, Azure API for FHIR, and Azure Healthcare APIs labels Oct 14, 2021
@brendankowitz merged commit 3293f75 into main Oct 15, 2021
@brendankowitz deleted the personal/bkowitz/compress-ct branch October 15, 2021 17:03