store: Add AccountDataStore #23

Merged: 1 commit merged into mautrix:master from accountdata-store on Dec 26, 2020
Conversation

@daenney commented Dec 17, 2020

The AccountDataStore leverages a user's account data to load/save the next_batch token, while retaining the InMemoryStore behaviour for saving filters and room state.

This permits a client to know which messages it has already seen/processed when (re)syncing, for example after a bot is restarted. That isn't possible with the InMemoryStore, since the token is lost when the process restarts, which can result in answering messages we already handled before.

An otherwise stateless client can now use the homeserver as its persistent data store for the next_batch token, instead of having to use a file or database of its own.
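
To make the idea concrete, here is a rough sketch of what an account-data-backed store boils down to. This is an illustration rather than the exact code in this PR, and the GetAccountData/SetAccountData helpers plus the SaveNextBatch/LoadNextBatch method names are assumptions that should be checked against the library:

package example

import (
	"log"

	"maunium.net/go/mautrix"
	"maunium.net/go/mautrix/id"
)

// accountDataStore keeps filters and room state in memory (via the embedded
// InMemoryStore) but persists the next_batch token in the user's account data
// under a custom event type.
type accountDataStore struct {
	*mautrix.InMemoryStore
	client    *mautrix.Client
	eventType string // e.g. "com.example.mybot.sync"
}

// syncTokenContent is the JSON body stored under eventType.
type syncTokenContent struct {
	NextBatch string `json:"next_batch"`
}

func newAccountDataStore(eventType string, client *mautrix.Client) *accountDataStore {
	return &accountDataStore{
		InMemoryStore: mautrix.NewInMemoryStore(),
		client:        client,
		eventType:     eventType,
	}
}

func (s *accountDataStore) SaveNextBatch(userID id.UserID, nextBatch string) {
	// PUTs the whole object under eventType, overwriting whatever was there.
	err := s.client.SetAccountData(s.eventType, &syncTokenContent{NextBatch: nextBatch})
	if err != nil {
		log.Println("failed to save next_batch token:", err)
	}
}

func (s *accountDataStore) LoadNextBatch(userID id.UserID) string {
	var content syncTokenContent
	if err := s.client.GetAccountData(s.eventType, &content); err != nil {
		return "" // fall back to a full sync if the token can't be loaded
	}
	return content.NextBatch
}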

@daenney commented Dec 17, 2020

I've been wondering whether it would be a good idea to change NewClient to use the AccountDataStore as its default. It does result in slightly awkward code, since the store now needs to hold a reference to the client, and we'd need to add an extra argument to NewClient for the event type. Something like:

func NewClient(..., eventType string) (*Client, error) {
	c := &Client{
		AccessToken:   accessToken,
		UserAgent:     "mautrix-go " + Version,
		HomeserverURL: hsURL,
		UserID:        userID,
		Client:        http.DefaultClient,
		Prefix:        URLPath{"_matrix", "client", "r0"},
		Syncer:        NewDefaultSyncer(),
	}
	c.Store = NewAccountDataStore(eventType, c)
	return c, nil
}

I think the default behaviour you'd get with AccountDataStore is more in line with what people, especially newcomers, would expect, i.e. the bot doesn't react to events "in the past".

@daenney commented Dec 17, 2020

In Save, we currently overwrite all account data stored under the eventType, since this does a PUT, not a PATCH. That might be acceptable, but I could also see a situation where more than the next_batch token gets stored under that type, in which case it would be better to fetch the existing data, update the value we care about, and save the result.

Right now the code ignores the id.UserID that gets passed to Save/Load, but I'm not sure that's actually safe. I can never load someone else's account data, but I wonder if this needs to be changed to store a map of user ID to token under the next_batch key instead.
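
For illustration, the fetch-update-save variant with a per-user map could look roughly like this, continuing the sketch above (same caveats about helper names apply, and note that a plain GET-then-PUT is still not atomic):

// SaveNextBatch, rewritten as a read-modify-write with a per-user token map.
// This only illustrates the alternative being discussed; it is not what the
// PR implements.
func (s *accountDataStore) SaveNextBatch(userID id.UserID, nextBatch string) {
	tokens := map[id.UserID]string{}
	// An error here (e.g. no data stored yet) just leaves the map empty; a
	// real implementation should distinguish "not found" from other failures.
	_ = s.client.GetAccountData(s.eventType, &tokens)
	tokens[userID] = nextBatch
	if err := s.client.SetAccountData(s.eventType, tokens); err != nil {
		log.Println("failed to save next_batch token:", err)
	}
}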

@daenney commented Dec 17, 2020

I'm also wondering if this change might be a good reason to drop the OldEventIgnorer entirely?

@daenney commented Dec 21, 2020

@tulir Mind taking a look at this?

Based on my understanding, I've added a check to Save and Load to ensure the user ID we're manipulating data for is the same as the one the client was created for.

The reason for it is that I think this should only be used with bots: if it were used with an appservice, we would potentially need to store something like a map[id.UserID]string instead, which could result in massive JSON blobs if an appservice handles hundreds (or thousands) of users. I'm not 100% sure my understanding is correct, though.

@daenney commented Dec 21, 2020

I've tried the current implementation out with a bot of mine; it seems to work quite well.

@tulir self-requested a review on December 21, 2020
@daenney commented Dec 21, 2020

I think I might be wrong about this working. I'm trying this against a Dendrite server, and the server logs go completely nuts. It might be a Dendrite issue, but it seems wrong somehow. I seem to end up in a cycle of:

time="2020-12-21T18:00:55.466862660Z" level=info msg=Responding func="OnIncomingSyncRequest\n\t" file=" [github.com/matrix-org/dendrite/syncapi/sync/requestpool.go:218]" device_id=GDQBot limit=20 next=s194_0_0_0_53.dl-0-2 req.id=oOCZhZS5vQqW req.method=GET req.path=/_matrix/client/r0/sync since=s193_0_0_0_53.dl-0-2 timed_out=false timeout=30s user_id="@bot1:localhost.localdomain"
time="2020-12-21T18:00:55.575826470Z" level=info msg="Producing to topic 'DendriteOutputClientData'" func="SendData\n\t" file=" [github.com/matrix-org/dendrite/clientapi/producers/syncapi.go:51]" data_type=io.github.daenney.gdqbot.internal room_id="" user_id="@bot1:localhost.localdomain"
time="2020-12-21T18:00:55.670510877Z" level=info msg="received data from client API server" func="onMessage\n\t" file=" [github.com/matrix-org/dendrite/syncapi/consumers/clientapi.go:82]" room_id="" type=io.github.daenney.gdqbot.internal

time="2020-12-21T18:00:55.900328821Z" level=info msg=Responding func="OnIncomingSyncRequest\n\t" file=" [github.com/matrix-org/dendrite/syncapi/sync/requestpool.go:218]" device_id=GDQBot limit=20 next=s195_0_0_0_53.dl-0-2 req.id=zLYawbhNqcn0 req.method=GET req.path=/_matrix/client/r0/sync since=s194_0_0_0_53.dl-0-2 timed_out=false timeout=30s user_id="@bot1:localhost.localdomain"
time="2020-12-21T18:00:56.014856776Z" level=info msg="Producing to topic 'DendriteOutputClientData'" func="SendData\n\t" file=" [github.com/matrix-org/dendrite/clientapi/producers/syncapi.go:51]" data_type=io.github.daenney.gdqbot.internal room_id="" user_id="@bot1:localhost.localdomain"
time="2020-12-21T18:00:56.137178313Z" level=info msg="received data from client API server" func="onMessage\n\t" file=" [github.com/matrix-org/dendrite/syncapi/consumers/clientapi.go:82]" room_id="" type=io.github.daenney.gdqbot.internal

@daenney commented Dec 26, 2020

It turns out the sync loop above was caused by Dendrite: it doesn't appear to have account data filters working yet. Testing this against Synapse works as expected.

@tulir merged commit 4714561 into mautrix:master on Dec 26, 2020
@daenney deleted the accountdata-store branch on December 26, 2020
@z3ntu commented Mar 21, 2021

@daenney What are you supposed to do with the filter created by mautrix.Client.CreateFilter? For my bot I just register OnEventType on client.Syncer and then call client.Sync(), quite similar to the example code at https://github.com/tulir/mautrix-go/blob/master/example/main.go. Do I have to pass in the filter somewhere?

@daenney commented Mar 22, 2021

The AccountDataStore uses account data to store the sync token. If you don't filter that event type out, writing the sync token causes a new account data event to show up on /sync, which triggers another write, and you end up in a sync storm.

So you want to create a filter that excludes your custom event type, and you want to tell the AccountDataStore to use that same event type to store the sync token.

Once you have the filter, all you need to do is register it:

client.Store.SaveFilterID(..., <filter ID from CreateFilter>)

And then pass that same event type into NewAccountDataStore.
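
Putting it all together, a rough end-to-end sketch; the Filter/FilterPart fields, event.NewEventType, and the exact CreateFilter signature are from my reading of the library and may need adjusting for the version you're on:

package main

import (
	"maunium.net/go/mautrix"
	"maunium.net/go/mautrix/event"
	"maunium.net/go/mautrix/id"
)

func main() {
	client, err := mautrix.NewClient("https://matrix.example.org", id.UserID("@bot:example.org"), "<access token>")
	if err != nil {
		panic(err)
	}

	// Custom account data event type the sync token will be stored under.
	const syncEventType = "com.example.mybot.sync"

	// Persist next_batch in account data; filters and room state stay in memory.
	client.Store = mautrix.NewAccountDataStore(syncEventType, client)

	// Filter the custom event type out of /sync; otherwise writing the token
	// produces a new account data event on the next sync and you loop forever.
	filter := &mautrix.Filter{
		AccountData: mautrix.FilterPart{
			NotTypes: []event.Type{event.NewEventType(syncEventType)},
		},
	}
	resp, err := client.CreateFilter(filter)
	if err != nil {
		panic(err)
	}
	client.Store.SaveFilterID(client.UserID, resp.FilterID)

	// Register handlers on client.Syncer as usual, then sync; the stored
	// filter ID and next_batch token are picked up automatically.
	if err := client.Sync(); err != nil {
		panic(err)
	}
}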

@httpjamesm

Is it possible to get an example with this NewAccountDataStore function?
