Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduction of reactivity to lint reports #2792

Merged
merged 17 commits into from
May 22, 2024
Merged

Conversation

martin-lysk
Copy link
Contributor

@martin-lysk martin-lysk commented May 15, 2024

This PR reintroduces reactivity to lint reports:

  • replaces async calls to getAll() and get() in createMessageLintReportsQuery.ts with Subscribable signals.
  • getAll() subscribers no longer receive an event for every message update. Instead it reports get scheduled as promises and the results of the last promise scheduled will update the reactive map and trigger corresponding signals -> inform subscribers.
implementation details

Each report that gets scheduled (from any crud operation within the messagequery) is put on the reports stack:
https://github.com/opral/monorepo/blob/b7675b5920299d3d25e5d66556f6cd2bcaaa776e/inlang/source-code/sdk/src/createMessageLintReportsQuery.ts#L109C4-L109C61
A scheduled report consists of two promises:

  1. One promise, that actual run of the lint rule - that genarates the reports
  2. One promise that (if it is currently the last scheduled report) updates the reactive reports map with the results of the current reports.

We update the last scheduled report whenever we schedule another report

currentBatchEnd = updateOutstandingReportsOnLast

This scheduling logic is an interims solution to make the reports reactive while not producing thousands of signals - and will be iterated with https://linear.app/opral/issue/MESDK-108/requirements-for-lint-rule-scheduling
  • adds a new async settled() method that allows non reactive callees like inlang/source-code/badge/src/badge.ts to await all currently queued reports to be finished.
  • Reverts all changes of this PR by
    • replacing async calls of get and getAll with reactive pattern like in fink
    • using settled in non reactive envs like cli

To make the reactivity work as expected this PR also:

  • Fixes a bug where loadProject.ts triggered effects with corrupt configurations when settings where changed because of missing reset of resolvedModules (compare)
  • reduces signal noise by comparing the results of a fresh lint report run with known lint report results to avoid unneeded signals (see)
  • Batching of lint reports to trigger signals only once (see)

@NiklasBuchfink, @felixhaeberle please also test with respect to performance in fink and sherlock
@jldec please also check regarding performance in load test

Copy link

changeset-bot bot commented May 15, 2024

🦋 Changeset detected

Latest commit: b7675b5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 30 packages
Name Type
@inlang/paraglide-sveltekit Patch
@inlang/github-lint-action Patch
vs-code-extension Patch
@inlang/badge Patch
@inlang/cli Patch
@inlang/sdk Patch
@inlang/paraglide-sveltekit-example Patch
@inlang/server Patch
next-js-testapp Patch
@inlang/editor Patch
@inlang/plugin-i18next Patch
@inlang/plugin-json Patch
@inlang/plugin-m-function-matcher Patch
@inlang/plugin-next-intl Patch
@inlang/plugin-t-function-matcher Patch
@inlang/sdk-load-test Patch
@inlang/sdk-multi-project-test Patch
@inlang/doc-layout-component Patch
@inlang/message-bundle-component Patch
@inlang/rpc Patch
@inlang/settings-component Patch
@inlang/telemetry Patch
@inlang/cross-sell-ninja Patch
@inlang/paraglide-unplugin Patch
@inlang/paraglide-js-e2e Patch
@inlang/paraglide-next-e2e Patch
@inlang/paraglide-rollup Patch
@inlang/paraglide-vite Patch
@inlang/paraglide-webpack Patch
@inlang/paraglide-astro Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

# Conflicts:
#	inlang/source-code/github-lint-action/src/main.ts
#	inlang/source-code/sdk/src/createMessageLintReportsQuery.ts
@martin-lysk martin-lysk marked this pull request as ready for review May 16, 2024 12:27
Copy link
Member

@NiklasBuchfink NiklasBuchfink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared the performance of Fink to the live version with the cal.com repo and it feels the same. Great job 👍

@felixhaeberle
Copy link
Contributor

lgtm 👍


was there a communication issue between @martin-lysk & @jldec resulting in this? how can we prevent such reintroductions in the future?

@martin-lysk
Copy link
Contributor Author

martin-lysk commented May 16, 2024

was there a communication issue between @martin-lysk & @jldec resulting in this? how can we prevent such reintroductions in the future?

No - we just didn't knew better. The LoadProject's/MessaageQuery/LintReport reactivity is a beast. After moving to delegate pattern the signal flow became way less convoluted and I could finally nail down the root cause(es) of our performance issues - those outlined in this pr's description like batching and other fixes we recently applied like #2738 (merged within another pr)

Edit: Regarding how we can prevent such a thing in the future? I think this iteration was a good example on how to move fast in small steps so you can step back if you have to correct and tweak the direction. No need of a post mortem in this case i think

@jldec
Copy link
Contributor

jldec commented May 17, 2024

@felixhaeberle

to clarify this statement:

Reverts all changes of #2378

  • 2378 removed the reactiveMap for lint reports and replaced the api with an async pull api.
  • This PR reintroduces a different reactive api using a delegate pattern. It does not reintroduce the reactiveMap.

Both PRs are improvements over the prior state.

Copy link
Contributor

@jldec jldec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed some changes to the load-test to make it work and added a proper (CI) "test" script which does a tsc check, and runs the load test with 1000 messages.

The old test script is now called "load-test"
E.g. you run it with pnpm load-test 1000 1 1 1

For 1000 messages, translated, and with a subscription to lintreports getAll(), the load-test now takes about 30s and receives a lintreport per message change. Before it took about 4s because it only fetched the lint reports twice.

with this PR

$ time pnpm load-test 1000 1 1 1 
...
✔ Machine translate complete.                                                                                                                                  1:48:51 AM
  load-test lint reports changed event: 36, length: 34740 +3s
  load-test lint reports changed event: 110, length: 32076 +3s
  load-test lint reports changed event: 189, length: 29232 +3s
  load-test lint reports changed event: 279, length: 25992 +3s
  load-test lint reports changed event: 381, length: 22320 +3s
  load-test messages getAll event: 503, length: 1000 +3s
  load-test lint reports changed event: 503, length: 17928 +213ms
  load-test lint reports changed event: 675, length: 11736 +3s
  load-test load-test done - exiting +2s
  load-test messages getAll event: 1001, length: 1000 +1ms
  load-test lint reports changed event: 1001, length: 0 +952ms

real	0m25.784s
user	0m32.536s
sys	0m2.993s

Before

$ time pnpm test 1000 1 1 1
...
✔ Machine translate complete.                                                                                                                                  1:45:55 AM
  load-test load-test done - exiting +2s
  load-test messages getAll event: 1001, length: 1000 +682ms
  load-test lintReports getAll event: 2, length: 0 +1ms

real	0m4.393s
user	0m3.749s
sys	0m0.353s

@martin-lysk
Copy link
Contributor Author

This PR reintroduces a different reactive api using a delegate pattern. It does not reintroduce the reactiveMap.

It still uses ReactiveMap for Messages and for LintReport's reactivity - But the Lint reports no longer listen to signals of messages but use the delegate pattern for communication.

For 1000 messages, translated, and with a subscription to lintreports getAll(), the load-test now takes about 30s and receives a lintreport per message change. Before it took about 4s because it only fetched the lint reports twice.

This is expected with the current design - If the test asks for reactive reports - those will get executed whenever a message is changed. I think the problem we are having here is the fact, that reports are generated even if the cli doesent request them - we briefly discussed that last week.

So I see two action items here:

1. Changing the load test in the following manner:

Simulate a more realistic scenario like vscode extension running while cli runs translate.

Note

I expect translations to also running slow - as long as lints executed even if not requested.

Therefore i suggest to two separate processes both accessing the same fs.

  • One running translate (no effect attached to reports)
  • one running in the background with effects attached to messages and lint reports.

@jldec what do you think of this?

2. run lint reports on demand

I see two options on this one:

  • implicit: only if the lintReportQuery was accessed the first time
  • explicit: providing a method to turn the lintReportQuery on and off

I gonna investigate on the on demand lint reports on monday

@jldec
Copy link
Contributor

jldec commented May 18, 2024

Therefore i suggest to two separate processes both accessing the same fs.

The load-test does exactly this - it spawns a process and runs the cli to do the translation.

@jldec
Copy link
Contributor

jldec commented May 18, 2024

@martin-lysk, I discussed this with @samuelstroschein today:

  • we need reactivity on the per-message lint reports - ok to trigger those per-message.
  • reactivity on the getAll() lint reports is more challenging because the current design generates a flood of lint report events, (each containing all reports for all messages) whenever there are multi-message operations (like translate). The resulting CPU and memory pressure then causes UX issues.

I would propose that instead of returning to this heavy model of getAll() reactivity, we make the getAll() subscribe "nicer" and throttle the getAll events to a maximum rate of once/second.

It is fine imo to drop the intermediate lint report events, as long as there is always one event for the most recent changes, once the changes stop.

@martin-lysk
Copy link
Contributor Author

Therefore i suggest to two separate processes both accessing the same fs.

The load-test does exactly this - it spawns a process and runs the cli to do the translation.

thanks for the clarification - sorry missed the run part for translation.

It is fine imo to drop the intermediate lint report events, as long as there is always one event for the most recent changes, once the changes stop.

Agree - I gonna look into the option of throttling report evaluation/signal production.

@martin-lysk
Copy link
Contributor Author

martin-lysk commented May 18, 2024

@jldec Just pushed a new version that only triggers reactivity after the last repot exectued found in the report queue.
Outcome looks pretty promising for 1000 messages.

Its not the amount of events that are triggered but how the interact with async execution - the version before was like:

Execute report for message1
produce signal - run effects message1
Execute report for message2
produce signal - run effects message2
Execute report for message3
produce signal - run effects message3

Now it looks like this:

Execute report for message1
Execute report for message2
Execute report for message3
produce signal - run effects message1
produce signal - run effects message2
produce signal - run effects message3

Will clean this up beginning of next week but looks good!

ℹ Machine translated message "message_key_998"                               11:01:32 PM
ℹ Machine translated message "message_key_999"                               11:01:32 PM
ℹ Machine translated message "message_key_1000"                              11:01:32 PM
✔ Machine translate complete.                                                11:01:32 PM
  load-test load-test done - exiting +2s
  load-test messages getAll event: 1001, length: 1000 +945ms
  load-test lint reports changed event: 1998, length: 36 +0ms
pnpm load-test 1000 1 1 1  
3.06s user 
0.38s system 

@samuelstroschein samuelstroschein temporarily deployed to martinlysk1/mesdk-102 - badge-service PR #2792 May 18, 2024 23:31 — with Render Destroyed
@jldec
Copy link
Contributor

jldec commented May 19, 2024

The load test output seems to show a lot of lint report events with odd reports.length count.

load-test lint reports changed event: 1998, length: 36 +0ms

I would have expected a small number of events (2 or 3), starting with a high report count, and ending with 0.

When I tried the PR in fink, it showed similar (incorrect looking) results. e.g. only 36 missing translations instead of the expected 3600 for 100 messages in this project.

@martin-lysk
Copy link
Contributor Author

martin-lysk commented May 20, 2024

The load test output seems to show a lot of lint report events with odd reports.length count.

load-test lint reports changed event: 1998, length: 36 +0ms

I would have expected a small number of events (2 or 3), starting with a high report count, and ending with 0.

When I tried the PR in fink, it showed similar (incorrect looking) results. e.g. only 36 missing translations instead of the expected 3600 for 100 messages in this project.

@jldec thanks for the close look - your observations and expectations are 100% correct.

I batched reports now to reduce the reports noise as expected and fixed a reference error. results look as you described now - two events for reports (one after 500 updated messages - the setTimeout in propagation of load Messages we introduced recently opens the excution slot for the reports) and one after 1000 messages with the expected report count.

Tests of last push fail but i also get failing tests on things i didn't touch. will investigate further tomorrow.

@martin-lysk
Copy link
Contributor Author

martin-lysk commented May 20, 2024

@jldec It just made "click" here: with the batching we can also consider removing the sleep in loadMessages - while we apply messages into the reactivity structure (the part of load messages that is sync) no one can do something meaningful with the signals since we are still processing messages. Batching should be possible now since we no longer rely on reactivity for message crud handling internally 🍾

Will look into this as well.

@jldec
Copy link
Contributor

jldec commented May 21, 2024

I ran this again today and now the behavior is as expected.

Copy link
Contributor

@jldec jldec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samuelstroschein samuelstroschein temporarily deployed to martinlysk1/mesdk-102 - inlang-manage PR #2792 May 21, 2024 23:35 — with Render Destroyed
@martin-lysk martin-lysk marked this pull request as ready for review May 22, 2024 09:40
@samuelstroschein samuelstroschein temporarily deployed to martinlysk1/mesdk-102 - fink-editor PR #2792 May 22, 2024 09:41 — with Render Destroyed
@samuelstroschein samuelstroschein temporarily deployed to martinlysk1/mesdk-102 - inlang-website PR #2792 May 22, 2024 09:42 — with Render Destroyed
@martin-lysk martin-lysk merged commit 4e79867 into main May 22, 2024
5 checks passed
@martin-lysk martin-lysk deleted the martinlysk1/mesdk-102 branch May 22, 2024 10:31
@github-actions github-actions bot locked and limited conversation to collaborators May 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants