-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add async serialize option via coroutines #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great idea, but it needs some more work before I can merge it.
LibSerialize.lua
Outdated
@@ -1265,10 +1279,13 @@ end | |||
|
|||
function LibSerialize:SerializeEx(opts, ...) | |||
self:_ClearReferences() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the library stores state on itself during serialization, you're going to have a bad time if someone (perhaps another addon) tries to serialize something while another serialization is happening asynchronously. You'll need to update all such state to be managed at the function call level. Also add a test case for this (make sure it fails in your current version).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably rename combinedOpts
and use it as the state for the current serialization, since it's already plumbed everywhere. Just need to make sure the loop that writes user-specified keys is restricted to valid options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not store any global states in the library.
The state can be a function local variable, because the state is cleared at the beginning and at the end of Serialization/Deserialization,
Change the prototype of self:_ClearReferences() to self:_ClearReferences(state), where "state" is initialized as a table.
Store "refsDirty", "stringRefs" and "tableRefs" in the variable "state" instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I probably shouldn't have done it that way to begin with, but it didn't make a difference until now :) unless people tried to dig into the library's internals, I suppose, but that was never a support scenario and I'm not concerned about breaking them.
LibDeflate got a similar PR. I suggest a DeserializeBegin/Continue/End API, instead of coroutine, so the serialization can be done by chunk. |
Yeah I was thinking about that, but as you said it's much more disruptive to the entire serialization process, and given that the current code uses recursion and iterators as implicit state while serializing, it'd be quite a large undertaking. We do have regression tests, but a change of that magnitude still worries me. Given that the coroutine is opt-in and quite undisruptive to the code, I think it's preferable - you're taking on the overhead of the coroutine, sure, but it's an opt-in feature to avoid mini-hangs so I think the tradeoff is reasonable. But in case we did want to refactor to remove it in the future, I'd still want the coroutine management to be moved into the library, as I suggested above. For reference, the LibDeflate PR: SafeteeWoW/LibDeflate#6 |
@oratory i haven't heard from you since you first opened the PR - are you planning on making these changes? |
Hey yes I'm hoping to make these changes today or tomorrow. |
Coroutines are now all managed within the lib, and cleaned up a bunch. One thing I'd still like to add is a check if an async process is currently running since it will break if a 2nd is started. |
This needs to be supported - this is a global library and may be used by any number of addons, which should not have to handle the case of "I want to serialize something but I'm told to come back later". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current iteration still has issues:
- Introduces non-WoW dependencies
- Doesn't support multiple simultaneous serialization/deserialization
- Insufficient test coverage
The API shape as implemented isn't possible to implement in pure Lua, so it'll need updating. I'd recommend adding a section to the readme to describe how to incorporate it into WoW's environment, though, using an OnUpdate frame script to drive the API.
LibSerialize.lua
Outdated
Callback arguments: | ||
* `...`: the deserialized value(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What callback is this referring to?
LibSerialize.lua
Outdated
|
||
Arguments: | ||
* `callback`: a function called when serialization is complete | ||
`callback` is called with the serialized string as its argument, or nil on failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this, you have the explicit section below.
The serialization API didn't have a concept of a non-error()ing failure mode before - can you propagate any failure from within the coroutine and "rethrow" it?
LibSerialize.lua
Outdated
Arguments: | ||
* `input`: a string previously returned from `LibSerialize:Serialize()` | ||
* `callback`: a function called when deserialization is complete | ||
`callback` is called with `success` (Boolean) as the first parameter, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - remove it since you have the specific section below.
LibSerialize.lua
Outdated
|
||
Arguments: | ||
* `input`: a string previously returned from `LibSerialize:Serialize()` | ||
* `callback`: a function called when deserialization is complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above - remove it since you have the specific section below. But this time you actually don't have the section, so you need to add it :)
LibSerialize.lua
Outdated
@@ -182,6 +230,14 @@ The following serialization options are supported: | |||
table encountered during serialization. The function must return true for | |||
the pair to be serialized. It may be called multiple times on a table for | |||
the same key/value pair. See notes on reeentrancy and table modification. | |||
* `asyncYieldSize`: `number` (default 50000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be documented and enforced as only being valid from SerializeAsync().
LibSerialize.lua
Outdated
} | ||
|
||
local canSerializeFnOptions = { | ||
errorOnUnserializableType = false | ||
} | ||
|
||
local CreateFrame = CreateFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The library is designed to be runnable on Lua 5.1.4+ with no modifications - you can't introduce WoW-specific dependencies. Unfortunately this will complicate the API, since you can't drive the async loop yourself, but it'll fix the other issue of not supporting multiple simultaneous callers.
For reference, this is what I do on Windows:
- Get Lua from https://sourceforge.net/projects/luabinaries/files/5.1.4/Tools%20Executables/lua5_1_4_Win64_bin.zip/download
- lua.exe tests.lua
All tests should pass.
LibSerialize.lua
Outdated
return FlushWriter() | ||
end | ||
|
||
function LibSerialize:SerializeAsyncEx(callback, opts, ...) | ||
if not CreateFrame then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this isn't sufficient of a deterrent, since the tests still need to run outside the WoW environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have an idea here that if the lib is run outside of WoW (or without a callback) then the async function will instead return some simple handlers to operate the coroutine. I'll work on that and update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't be sufficient - the async version needs to be able to be tested outside WoW.
LibSerialize.lua
Outdated
self:_ClearReferences() | ||
local ReadBytes, ReaderBytesLeft = CreateReader(input) | ||
|
||
self._readBytes = ReadBytes | ||
opts = opts or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere - inconsistent indentation (you're using 2 instead of 4?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not modify function arguments. Make a copy
tests.lua
Outdated
@@ -66,6 +66,21 @@ function LibSerialize:RunTests() | |||
assert(tab.nested.c == nil) | |||
end | |||
|
|||
do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need more test cases for the other scenarios we've brought up, e.g. actually triggering a yield, multiple simultaneous async serializations/deserializations (including mixing those two categories), propagating an error from a failed serialization/deserialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to remove all global states that may share between API calls.
Test for multiple simultaneous async serializations/deserializations is still needed.
LibSerialize.lua
Outdated
@@ -1265,10 +1279,13 @@ end | |||
|
|||
function LibSerialize:SerializeEx(opts, ...) | |||
self:_ClearReferences() | |||
local WriteString, FlushWriter = CreateWriter() | |||
local WriteString, FlushWriter, DoYield = CreateWriter() | |||
|
|||
self._writeString = WriteString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not modify "self" in API. This introduces global states
LibSerialize.lua
Outdated
|
||
self._writeString = WriteString | ||
self:_WriteByte(SERIALIZATION_VERSION) | ||
if opts.async then | ||
self._doYield = DoYield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not modify "self" in API. This introduces global states
Currently, this library uses lots of global states, so coroutine will have lots of problems. Do not modify "self" in API. Store the states in a local table variable, and pass it through function |
LibSerialize.lua
Outdated
@@ -1346,6 +1464,26 @@ function LibSerialize:DeserializeValue(input) | |||
return unpack(output, 1, outputSize) | |||
end | |||
|
|||
function LibSerialize:DeserializeValueAsync(input, callback, opts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a WoW specific API is needed here.
Better to put it in a example code
LibSerialize.lua
Outdated
function LibSerialize:DeserializeAsync(input, callback, opts) | ||
opts = opts or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not modify user arguments. Make a copy
@oratory as mentioned in several comments, adding this functionality is difficult due to the library storing state effectively globally. Would you like me to make a separate PR fixing that, so you don't have to worry about it in this one? You'd just have to rebase onto it. |
If you want to make that change anyway, then sure. Then we won't need to worry about multiple serializing processes running at once. |
Well this PR won't be merged until that's supported, so the work has to be done by one of us :). It's a hard requirement that calling into the library to serialize/deserialize doesn't fail because some random other addon happens to be doing the same thing. Especially since it impacts synchronous users of the API as well. |
My original plan was to cheat and just add a isWorking boolean to the state and make new requests wait, but this method seems better. :) |
Your way wouldn't work either, since as I said even synchronous API requests are impacted if there's an ongoing async request. There's no way to block a synchronous request unless you decide to immediately/synchronously finish the work for the async one. I'll take a look at updating state tracking over the weekend. |
I wonder if the yield time should be expressed in microseconds, rather than number of items? |
I was thinking that - then on the WoW side it could be set by the current FPS. |
I don't think the library itself would do that, but rather you'd include an end-to-end example in the readme that covers the WoW-specific integration (using a frame's OnUpdate handler, calculating the yield time based on current FPS, etc. |
@oratory any update? |
Ya I've got some updates in the works, just a bit swamped lately. Will update this week. |
I changed things around a bit. Instead of being a typical async function, I'm calling it Chunks Mode which returns a handler which gets called on repeat until processing is finished. Some rough benchmarks Seralize & CompressSmall table: 115 Bytes
Medium Table: 163584 Bytes
Large Table: 1750073 Bytes
X-Large Table: 3511073 Bytes
Decompress and DeserializeSmall table: 115 Bytes
Medium Table: 163584 Bytes
Large Table: 1750073 Bytes
X-Large Table: 3511073 Bytes
Chunks Mode does have a small increase in processing time but has the benefit (in WoW and likely other environments) of not locking up the game and making the OS think it's non-responsive when processing large data. |
Please rebase on top of latest master commit instead of merging, so that the PR doesn't have my recent changes in it. |
Alrighty I think that did it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still a lot to be done before I can approve this. If you'd like me to work on this feature instead of iterating on this PR, let me know - I could probably spend some time on it in the next week or so.
LibSerialize.lua
Outdated
@@ -479,7 +558,7 @@ local function CreateWriter() | |||
bufferSize = 0 | |||
return flushed | |||
end | |||
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this extra space (and reconfigure your editor to trim trailing spaces on save imo!)
LibSerialize.lua
Outdated
@@ -633,6 +712,7 @@ end | |||
local LibSerializeInt = {} | |||
|
|||
local function CreateSerializer(opts) | |||
opts = opts or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary - remove.
LibSerialize.lua
Outdated
@@ -700,6 +790,15 @@ end | |||
function LibSerializeInt:_ReadObject() | |||
local value = self:_ReadByte() | |||
|
|||
self.currentSize = (self.currentSize or 0) + value/8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change to _currentSize
, initialize in CreateDeserializer()
.
LibSerialize.lua
Outdated
@@ -700,6 +790,15 @@ end | |||
function LibSerializeInt:_ReadObject() | |||
local value = self:_ReadByte() | |||
|
|||
self.currentSize = (self.currentSize or 0) + value/8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value
has no relationship with the size.
LibSerialize.lua
Outdated
if self._opts.chunksMode and ( | ||
(self.currentTime and self._opts.timeFn() - self.currentTime > self._opts.yieldOnChunkTime) | ||
or self.currentSize > self._opts.yieldOnChunkSize) then | ||
if self._opts.timeFn then self.currentTime = self._opts.timeFn() end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeFn
isn't included in the default options table (even as a nil entry, just for code documentation)
LibSerialize.lua
Outdated
@@ -182,6 +209,24 @@ The following serialization options are supported: | |||
table encountered during serialization. The function must return true for | |||
the pair to be serialized. It may be called multiple times on a table for | |||
the same key/value pair. See notes on reeentrancy and table modification. | |||
* `chunksMode`: `boolean` (default false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to async
LibSerialize.lua
Outdated
* `chunksMode`: `boolean` (default false) | ||
* `true`: the serialize function will return a coroutine handler to process | ||
the serialization. If true, the following additional options are considered: | ||
* `yieldOnChunkSize`: `number` How large to allow the buffer before yielding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to yieldOnObjectCount
LibSerialize.lua
Outdated
* `true`: the serialize function will return a coroutine handler to process | ||
the serialization. If true, the following additional options are considered: | ||
* `yieldOnChunkSize`: `number` How large to allow the buffer before yielding | ||
* `yieldOnChunkTime`: `number` Max duration between yields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to yieldOnElapsedTime
LibSerialize.lua
Outdated
* `yieldOnChunkTime`: `number` Max duration between yields | ||
* `timeFn`: `function` To return the time in `number` format relevant to the | ||
environment. | ||
* `false`: the serialize function will return a the serialized string directly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
@@ -145,6 +145,30 @@ function LibSerialize:RunTests() | |||
end | |||
|
|||
|
|||
--[[--------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing is still insufficient.
Tests 1 (nil) and 35 (0/0) are currently failing on async, not sure why yet. Otherwise the other requested updates are completed. |
@oratory any update? |
Ya I've been working on it here and there, should have something to post tomorrow. |
LibSerialize.lua
Outdated
* `timeFn`: `function` To return the time in `number` format relevant to the | ||
environment. | ||
* `false`: the serialize function will return the serialized string directly. | ||
When using `SerializeAsyncEx()`, these additional options are supported: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my comments in the LibDeflate PR, I think these can be collapsed into a single option, where you supply either:
yieldCheckFn: function(objectCount: number) => boolean
- a yield will occur if this function returns true, or
yieldOnObjectCount: number
- a yield will occur after this number of objects
And then if neither are provided, yield after objects. With this approach, the library code just calls yieldCheckFn
and does bookkeeping on the count of objects since yield, and the caller (or default args) handles the specifics. A time-based yield is then up to the caller to implement, which is easy (they just need to reset their own "elapsed" when returning true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The yieldCheckFn
works well, makes yieldOnObjectCount
redundant even since it's now all self contained.
Can you please retarget your PR to be into the new |
@rossnichols hey we are revisiting this and were wondering if you can take a look at this again? |
@Stanzilla sorry for the delay here. I've stopped playing WoW and would be happy to let others from the community help make changes here. Would you like to be added? |
Hey there, I wouldn't say "want" but I think it might be a good idea. We were considering a fork for Wago related content but I really don't want more on my/our plate so rather avoid that. So far Async is all we need so If we can get this PR into a good shape and merge (Can do that If you don't have time) that would be great. |
Sounds good - if anyone from the community wants to be added to the repo I'm happy to do so. For this PR specifically, IIRC I was mostly wanting to ensure adequate test coverage and validating that existing tests are unaffected. I'd have to spend some time going back through it to refresh my memory on the actual code changes. It'd be great if you can take a pass and try to make it ready, I'll spend some time looking at it as well. |
Alright well since this is targeting my async-staging branch, I'll go ahead and merge this and then take a look at merging into master. Thanks! |
* Add chunks mode for coroutine processing * Requested updates * Split sync/async functions, others * Further updates
* Add chunks mode for coroutine processing * Requested updates * Split sync/async functions, others * Further updates
* Add async serialize option via coroutines (#4) * Add chunks mode for coroutine processing * Requested updates * Split sync/async functions, others * Further updates * Implement support for custom readers and writers (#12) * Implement generic support for custom readers This permits a new options table to be supplied to the Deserialize and DeserializeValue functions on the library which may contain two keys that control how the library reads bytes from the stream for deserialization, and how it detects the end-of-stream state. As-is this implementation is fully backwards compatible and allows users of the library to throw processing into a coroutine for large inputs if required, meeting some use cases that have prompted similar PRs whilst also being more flexible and much simpler. The new signatures for Deserialize and DeserializeValue are as follows: function LibSerialize:DeserializeValue(input[, options]) function LibSerialize:Deserialize(input[, options]) The options parameter, if supplied, is expected to be a table that may contain the following optional keys: - readBytes - atEnd The "readBytes" key if present is expected to point to a function of the signature "readBytes(input, i, j)" and must return a string whose length is equal to (j - i) + 1). The "input" parameter is the same as-supplied to Deserialize/DeserializeValue verbatim. The "i" and "j" parameter represent an inclusive range of indices ([i, j]) typically as byte offsets within the input value. This convention is identical to that provided by many Lua functions, such as string.sub, table.concat, and unpack. The "atEnd" key if present is expected to point to a function of the signature "atEnd(input, i)". As with "readBytes", the "input" parameter is whatever value was supplied to Deserialize/DeserializeValue, and the "i" parameter represents the offset of the stream. By default if neither of these keys are supplied, or no options table is given, then the implementation will just default to using string.sub for "readBytes" and for "atEnd" will test if the offset exceeds the length of the input as-provided by the length operator. Examples are provided in the test cases, where we test the following scenarios: - A case where "input" may have additional data after the payload that we don't want deserializing. - A case where the "input" is a table of character bytes. - A case where we yield from within "readBytes" and feed data from the main thread, simulating receiving data with a potentially unknown bound. * Implement support for custom writers This mirrors recent changes submitted for the deserialization API and provides a generic system for custom writer support to control how and where the byte stream generated during serialization is written. The change as-is is fully backwards compatible and passes the existing and expanded test cases. Primary use cases would align with existing async serialization attempts but don't put the complexity of thread management or throttling logic into the library, instead leaving the minutia of how those aspects are handled to callers. It also enables some other advanced use cases such as streaming the data to a destination without buffering it fully beforehand, which could enable efficiencies elsewhere. * Convert generic reader/writer to use objects instead of closures * Hoist async into reader/writer layer instead of serialize/deserialize * Rename function to GenericReader_AtEnd * Expand tests, add example for reader/writer protocol * Update readme * Fix formatting * Globals cleanup --------- Co-authored-by: Mark Bosley <mark.bosley@gmail.com> Co-authored-by: Daniel Yates <me@meorawr.io>
This adds an async=true config option to coroutine.yield() in the serialize function. This removes or reduces framerate loss while processing large tables.