
Compress continuation tokens #2279

Merged: 2 commits merged into main from personal/bkowitz/compress-ct on Oct 15, 2021
Conversation

@brendankowitz (Member) commented Oct 14, 2021

Description

To offset some of the size overhead introduced by Base64 encoding, this PR applies a fast Deflate pass to reduce the overall size of the continuation token (see the sketch after the findings below).

Findings for a selection of cases show the continuation token (CT) ending up near or under the originally requested limit:

Original 2044, Continuation token Base64: 2728, Compressed+Base64: 1792
Original 2021, Continuation token Base64: 2696, Compressed+Base64: 1788
Original 621, Continuation token Base64: 828, Compressed+Base64: 464
Original 1929, Continuation token Base64: 2572, Compressed+Base64: 1040
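
For illustration, here is a minimal sketch of the general shape of this approach in .NET (Deflate followed by Base64) using DeflateStream and Convert.ToBase64String; the class and method names here are assumptions for clarity, not the exact code in this PR:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class ContinuationTokenCompressor
{
    // Compress the raw continuation token with Deflate, then Base64-encode the result.
    public static string Encode(string continuationToken)
    {
        byte[] raw = Encoding.UTF8.GetBytes(continuationToken);
        using var output = new MemoryStream();
        using (var deflate = new DeflateStream(output, CompressionLevel.Fastest))
        {
            deflate.Write(raw, 0, raw.Length);
        }

        return Convert.ToBase64String(output.ToArray());
    }

    // Reverse the process: Base64-decode, then inflate back to the original token string.
    public static string Decode(string encodedToken)
    {
        byte[] compressed = Convert.FromBase64String(encodedToken);
        using var input = new MemoryStream(compressed);
        using var deflate = new DeflateStream(input, CompressionMode.Decompress);
        using var reader = new StreamReader(deflate, Encoding.UTF8);
        return reader.ReadToEnd();
    }
}
```

Because continuation tokens are largely repetitive JSON, Deflate typically more than makes up for the roughly 33% expansion that Base64 adds, which is consistent with the numbers above.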

Related issues

Addresses #2250.

Testing

  • Existing tests should continue to pass.
  • Adds new tests

FHIR Team Checklist

  • Update the title of the PR to be succinct and less than 50 characters
  • Add a milestone to the PR for the sprint that it is merged (i.e. add S47)
  • Tag the PR with the type of update: Bug, Dependencies, Enhancement, or New-Feature
  • Tag the PR with Azure API for FHIR if this will release to the Azure API for FHIR managed service (CosmosDB or common code related to service)
  • Tag the PR with Azure Healthcare APIs if this will release to the Azure Healthcare APIs managed service (Sql server or common code related to service)
  • Review squash-merge requirements

Semver Change (docs)

Patch|Skip|Feature|Breaking (reason)

@brendankowitz requested a review from a team as a code owner October 14, 2021 17:06
@feordin (Contributor) commented Oct 14, 2021

This would only impact results where we generate continuation tokens, but would there be any noticeable impact on performance?

@brendankowitz (Member, Author) commented Oct 14, 2021

> This would only impact results where we generate continuation tokens, but would there be any noticeable impact on performance?

The performance impact of compressing a small string appears to be fairly minimal. Below is the average of 10 compress/decompress operations at each character size (a rough timing harness along these lines is sketched after the numbers):

501 chars, encode: 00:00:00.0000118, decode: 00:00:00.0000136
1001 chars, encode: 00:00:00.0000094, decode: 00:00:00.0000059
1501 chars, encode: 00:00:00.0000087, decode: 00:00:00.0000055
2001 chars, encode: 00:00:00.0000109, decode: 00:00:00.0000079
2501 chars, encode: 00:00:00.0000105, decode: 00:00:00.0000064
3001 chars, encode: 00:00:00.0000114, decode: 00:00:00.0000093
3501 chars, encode: 00:00:00.0000118, decode: 00:00:00.0000095
4001 chars, encode: 00:00:00.0000125, decode: 00:00:00.0000136
4501 chars, encode: 00:00:00.0000130, decode: 00:00:00.0000281
5001 chars, encode: 00:00:00.0000165, decode: 00:00:00.0000346
5501 chars, encode: 00:00:00.0000132, decode: 00:00:00.0000270
6001 chars, encode: 00:00:00.0000133, decode: 00:00:00.0000291
6501 chars, encode: 00:00:00.0000135, decode: 00:00:00.0000302
7001 chars, encode: 00:00:00.0000145, decode: 00:00:00.0001254
7501 chars, encode: 00:00:00.0000147, decode: 00:00:00.0000114
8001 chars, encode: 00:00:00.0000135, decode: 00:00:00.0000113
8501 chars, encode: 00:00:00.0000141, decode: 00:00:00.0000124
9001 chars, encode: 00:00:00.0000146, decode: 00:00:00.0000126
9501 chars, encode: 00:00:00.0000149, decode: 00:00:00.0000135
10001 chars, encode: 00:00:00.0000156, decode: 00:00:00.0000355
10501 chars, encode: 00:00:00.0000154, decode: 00:00:00.0000136
11001 chars, encode: 00:00:00.0000156, decode: 00:00:00.0000139
11501 chars, encode: 00:00:00.0000160, decode: 00:00:00.0000143
12001 chars, encode: 00:00:00.0000164, decode: 00:00:00.0000146
12501 chars, encode: 00:00:00.0000173, decode: 00:00:00.0000316
13001 chars, encode: 00:00:00.0000171, decode: 00:00:00.0000154
13501 chars, encode: 00:00:00.0000176, decode: 00:00:00.0000160
14001 chars, encode: 00:00:00.0000177, decode: 00:00:00.0000163
14501 chars, encode: 00:00:00.0000392, decode: 00:00:00.0000192
15001 chars, encode: 00:00:00.0000194, decode: 00:00:00.0000187
15501 chars, encode: 00:00:00.0000188, decode: 00:00:00.0000193
16001 chars, encode: 00:00:00.0000191, decode: 00:00:00.0000194
16501 chars, encode: 00:00:00.0000424, decode: 00:00:00.0000213
17001 chars, encode: 00:00:00.0000287, decode: 00:00:00.0000314
17501 chars, encode: 00:00:00.0000386, decode: 00:00:00.0000390
18001 chars, encode: 00:00:00.0000338, decode: 00:00:00.0000765
18501 chars, encode: 00:00:00.0000358, decode: 00:00:00.0000353
19001 chars, encode: 00:00:00.0000521, decode: 00:00:00.0000553
19501 chars, encode: 00:00:00.0000450, decode: 00:00:00.0000440
20001 chars, encode: 00:00:00.0000298, decode: 00:00:00.0000318
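
A hypothetical harness along these lines could produce numbers like the above; the loop bounds, the synthetic token contents, and the ContinuationTokenCompressor helper from the earlier sketch are assumptions, not the actual benchmark used here:

```csharp
using System;
using System.Diagnostics;

public static class CompressionTimings
{
    public static void Run()
    {
        for (int size = 501; size <= 20001; size += 500)
        {
            // Synthetic payload; real continuation tokens are JSON, so ratios and timings will differ.
            string token = new string('a', size);
            TimeSpan encodeTotal = TimeSpan.Zero, decodeTotal = TimeSpan.Zero;

            for (int i = 0; i < 10; i++)
            {
                var sw = Stopwatch.StartNew();
                string encoded = ContinuationTokenCompressor.Encode(token);
                encodeTotal += sw.Elapsed;

                sw.Restart();
                ContinuationTokenCompressor.Decode(encoded);
                decodeTotal += sw.Elapsed;
            }

            // Report the average of the 10 runs for each size.
            Console.WriteLine(
                $"{size} chars, encode: {TimeSpan.FromTicks(encodeTotal.Ticks / 10)}, decode: {TimeSpan.FromTicks(decodeTotal.Ticks / 10)}");
        }
    }
}
```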

@brendankowitz added the Bug, Azure API for FHIR, and Azure Healthcare APIs labels Oct 14, 2021
@brendankowitz merged commit 3293f75 into main Oct 15, 2021
@brendankowitz deleted the personal/bkowitz/compress-ct branch October 15, 2021 17:03