Reinventing the wheel with custom serialization? #29
@straight-shoota yep, and I agree. The serialization system is both redundant and complicated. You probably shouldn't be using mosquito's custom serializers; ideally, you shouldn't have to.

Most importantly, mosquito should not be used as a canonical data store. You shouldn't be storing full objects as parameters in your jobs. Whatever your real data store is, send the keys to your objects into a job instead of trying to serialize the entire object, and make the first task in your job reading those objects back from the data store. Doing it this way accomplishes three things: it keeps the mosquito data footprint low, it prevents stale objects (and thus decreases the probability of a distributed race condition), and it maintains the single responsibility principle (let your ORM do the ORM-ing, and mosquito can handle the job scheduling).

The current, unrefined method is left as such to be a kind of syntactic salt: the feature is there if you really want it, but you probably shouldn't be using it. It's not a hard rule, it's just the choice I made. There is certainly some iteration to be done even within this paradigm.

Secondly, as you mention, mosquito architecturally gets in the way here. The metadata and job parameters are currently stored in the same redis hash. With a simple set of parameters, as recommended above, this isn't a big problem, but even the recently implemented throttling code exposes some of the ways this is not going to be a scalable paradigm. Storing JSON in redis instead of using redis's built-in hashes feels like reinventing an altogether different wheel, but I digress.

An early iteration of mosquito just serialized and stored all ivars. It was fine, but it wasn't flexible: there are plenty of uses for ivars which are explicitly not serialized, and blacklisting is less intuitive than whitelisting.

Would you mind posting or linking to some job code? I'd love to see some of the ways it's being used and to document specific situations where the current framework is getting in the way.
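The pass-keys-not-objects pattern described above can be sketched in a few lines. This is an illustrative, self-contained example, not mosquito's actual API: the `USERS` hash stands in for a real data store, and the payload/worker split stands in for enqueue and perform.

```crystal
# Stand-in for your real data store / ORM (hypothetical).
USERS = {1_i64 => "alice@example.com"}

# Enqueue side: the job payload carries only the record's key,
# as a simple string value, so the queue never holds a stale copy.
payload = {"user_id" => 1_i64.to_s}

# Worker side: the job's first step re-reads the fresh record.
user_id = payload["user_id"].to_i64
email = USERS[user_id]
puts email
```

Because the worker re-reads the record at execution time, it always sees the latest data, and the queue payload stays a few bytes regardless of how large the record grows.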
Yes, of course, job arguments shouldn't store any state. But job configuration can include custom datatypes (such as enums or value objects) which use only simple values but have no default serialization handlers. Setting up JSON serialization for such types is trivial and reusable.

That's actually a very common pattern and considered good practice, because JSON is much more flexible than redis hashes. Similar applications use this method: Sidekiq, for example, stores jobs as JSON. Redis hashes are better when you need to access single hash keys, but that's not an issue for a job queue because it always reads and writes the entire job config. In fact, it might even be faster to use JSON serialization.

As an example, a serialized mosquito task in JSON could look like this:

```json
{
  "enqueue_time": "...",
  "type": "...",
  "retry_count": "...",
  "job_data": {
    // serialized Job instance
  }
}
```

You wouldn't have to worry about how jobs are serialized. The Job type just needs to be able to serialize to JSON, which is usually trivial to accomplish with stdlib's tools.

Some examples are the jobs for shardsdb. They have a slim layer of indirection, because I was experimenting with different job processors and didn't want to adapt the individual jobs for each API, so I put taskmaster in between. But for the job definition this is completely irrelevant.
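The "trivial with stdlib's tools" claim can be made concrete with `JSON::Serializable` from Crystal's standard library. The type names here are illustrative, not part of mosquito; the point is that an enum and a value object round-trip through JSON with no queue-specific handlers:

```crystal
require "json"

# A custom datatype a job config might carry (hypothetical).
enum Priority
  Low
  High
end

# A job config as a plain value object; JSON::Serializable
# generates both to_json and from_json from the ivars.
struct JobConfig
  include JSON::Serializable

  getter user_id : Int64
  getter priority : Priority

  def initialize(@user_id : Int64, @priority : Priority)
  end
end

json = JobConfig.new(42_i64, Priority::High).to_json
restored = JobConfig.from_json(json)
puts restored.priority
```

Nothing in this sketch knows about the queue: the same bindings work anywhere else the type needs to be serialized.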
I think storing the job metadata as a json hash is a nonstarter... functions like …

Regardless, for the purposes of any user of the toolchain, the job metadata storage mechanism is irrelevant anyway. Dividing the two into separate places in redis is a prerequisite for that.

I agree that the task parameters should be more flexible and that code for serializing is outside the bounds of mosquito. I believe I wandered a bit far during the implementation.

Honestly, my immediate temptation is to just punt on serialization altogether and only accept string parameters, but some experimentation with the newer JSON stuff is worth a shot.
Sure, job config and task metadata are independent of each other. Task metadata can be stored as a redis hash, and one such metadata value could simply be a JSON string containing the job config.

As a side note, I don't really follow your argument for using redis hashes. Maybe it's because I don't know your future plans. But Sidekiq uses JSON serialization, and that seems to work pretty well with thousands of job executions per second. But whatever, metadata storage is an implementation detail (and easy to change in either direction, should the need arise). I care mostly about making job serialization easy for the user.
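The compromise described here can be sketched with a plain `Hash` standing in for the fields of a redis `HSET`. The field names are illustrative (borrowed from the example earlier in the thread): metadata stays flat and individually addressable, while the job config is one opaque JSON-string field.

```crystal
require "json"

# Flat task metadata, as it would live in a redis hash (all string
# fields). "job_data" holds the entire job config as one JSON string.
metadata = {
  "enqueue_time" => Time.utc.to_rfc3339,
  "retry_count"  => "0",
  "job_data"     => {"user_id" => 42}.to_json,
}

# Mosquito could read/update retry_count without touching the config...
metadata["retry_count"] = "1"

# ...and only the job itself ever parses job_data.
config = JSON.parse(metadata["job_data"])
puts config["user_id"]
```

This keeps the queue's bookkeeping queries cheap (single hash fields) while leaving the job config fully under the control of stdlib JSON serialization.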
In order to store job data in the queue, it needs to be serialized and deserialized for execution.
I'm wondering why this shard introduces a completely proprietary serialization mechanism when there are widely supported general-purpose solutions to this problem. This makes things unnecessarily difficult for users, because you need to implement custom serialization methods for anything but the most basic data types.
In my opinion, using a standard format like JSON for this would provide great benefits. There is practically no downside I can think of besides the cost of switching (which is not very high).
JSON is a universal data format, and with Crystal's JSON implementation it is really easy to provide bindings for custom datatypes, which are then not limited to use with mosquito. It is string based, so instead of storing a hash in redis, mosquito would store a string. I don't think that makes much of a difference. The current implementation has some issues, because job data and task metadata are stored in the same hash, which will cause problems if a task uses e.g. `enqueue_time` as a parameter.

A huge benefit besides usability is also the reduction of complexity. Serializing and deserializing JSON is all provided in Crystal's stdlib. The entire `params` macro could be reduced or potentially even completely removed in favour of using plain properties/ivars.