New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Versioned Documents #224
Comments
Any thoughts on this @nicolaiarocci? I'm considering patching Eve to add this, but I don't want to do it if you aren't interested in the feature. |
Hello, I just gave a quick read to the above.
This would also allow for the use of logical operators such as I am guessing that you don't want clients to send the version field themselves and you probably want the version to be determined and updated by the server itself. This could be done with an |
Thanks for the response! Your suggestion is intriguing because it works out of the box without changes to the Eve code base. I do, however, see some downsides to this implementation. For example, a GET on the collection endpoint would by default return all versions of all items. Also, the notion of the objectid field as a unique field becomes a lot weaker since it would be different for every version. I have started modifying the Eve core to natively support the idea of versions. I think you'll like my approach! Versioning is supported on a per-field basis and is assumed to be not versioned unless the field has The goal for this approach is to have versions be transparent to the user unless they ask for it. The latest version of the document is stored as normal, so validation, filtering, sorting, and projections work as normal. In additional, all standard HTTP verbs work as expected - the client does not need any additional logic to work with versions. Also, the entire resource is treated unversioned (the normal Eve way) unless When a client is ready to care about versions, they can access the child endpoints of I think this would be a pretty damn cool feature for Eve to support natively. It is very difficult to find and API framework that supports this out-of-the-box at the moment, and including it would put Eve a step ahead of the rest. Do you think this is something you would be interested in supporting with me? |
You could set a predefined resource filter to only retrieve a specific version by default (the most recent?). This being said, your approach seems to be very consistent with Eve design and general behaviour. However, on top of my mind I have a few questions and concerns:
If you go the "I'm not returning meta fields with versioned responses" then you would end up with inconsistent API responses (for one, clients would have to support different payloads). Of course you could choose to store the metafields with the versioned documents but again, the ETAG would still be different. In general, I would say that versioning at the document level would avoid a lot of issues as the whole document snapshot would be stored every time (meta fields included). While API versioning is a common concern in REST API development, I don't remember seeing RESTful document versioning design and implementation discussion before, so I feel like we are kind of exploring a new territory here (I might be wrong of course). I'm wondering if document versioning with URLs is the correct approach. Would a header or a query parameter approach be better perhaps? I'm thinking something like |
PS: of course you could correctly handle the diffs (and this probably your approach), reconstructing the whole document when versioning. You would still need to store LAST_UPDATED and DATE_CREATED. |
I'm very glad to hear that you are open to this idea! I am strongly trying to avoid having to create a new project for this purpose, especially since Eve already has all of the other features that I need. I think you're correct in that document versioning is uncommon in many of the existing REST API frameworks available. It's too bad to, since nearly every wiki and blog software implements some sort of document version control. You'd think more API frameworks would have this as a feature. CouchDB is the closet thing I've seen to document revision control within the API. They do, in fact, maintain multiple versions of documents, but they do not guarantee that older version are kept around. It is primarily there to facilitate their replication and conflict model. (See CouchDB Revisions for more information.) Alright, so let me see if I can tackle all of your comments. First, I definitely am supporting returning meta data when returning versioned documents so far. I agree the API should be consistent between accessing the head version of a document and accessing a specific revision. Currently, I generate the return document by grabbing the head version of the document, dropping all fields that are versioned, and then adding the fields for the specific version requested. This method allows for the correct document to be recreated even when non-required fields get deleted. I currently have The ETag field is only necessary when we want to modify a document. If we want to prevent modification to a document after receiving a specific revision, we can compute the ETag with the response document. The ETag here will not match the ETag of the head version, so the user will be unable to modify the document. I like this approach, but we could also always return the head ETag and also require the client to include the I like your idea of using a query parameter to access a specific version. It seems cleaner than having child endpoints. How would you envision returning a list of document versions? I'd also like to enable a query parameter to return all of the versions (basically don't remove Sorry for the long response. Thanks for your support!! |
I'll give some more thought on whether or not to version control the entire document as well. Although even if I version control the entire document, I don't think that will include the I'd also like to add support for |
I've updated my code such that version controlling every field is much easier. If This change makes it easier to support the |
From a client perspective you are implementing document-versioning by default, which is absolutely the way to go. I would store the diffs in a different collection for better performance and reduced memory usage, which brings us to another concern of mine: will the the base io data layer need new methods/interfaces in order to support this feature? Remember we're now dealing with mongo but in the future, hopefully, someone will want to add other data-layers.
The first approach is the clear winner here.
We could support an
While this would be convenient, I'm not 100% positive on this feature as it would change the payload format (a list instead of a dictionary). We could just ignore this feature at the time being. In the rare circumstance where the client needs multiple versions, it could just send multiple requests.
I don't think this is a good idea. We should keep the interface simple, clean and coherent with the resource schema. You ask for a document, you get the document back (whatever version of it you requested). Also, mainly for performance reasons but also to keep it simple, I would only support versioning at the document endpoint, at least at the first iteration. |
I think I can add this functionality without changing the base io data layer. Most of the heavy-lifting should take place in the method definition files (get.py, post.py, etc.) Just to be clear, I'm not storing "diffs" in the same way the many version control systems do. If only one field changes, I'm still copying the entire document. You're right in that this is really only scalable when using a separate collection for changes. Storing changes within the document means we'll hit the 16MB document limit even when the document isn't 16MB and means the application has to hold all of the version in memory during the request even if they aren't needed (as you mentioned.) So yeah, I'll switch back to the separate collection approach! I like the My suggestions of |
Eh, I forgot about the original reasons I went with embedded version storage:
I think the benefits for having a separate collection still outweighs these downsides. Again, just wanted to document that I've thought about it. Question for you - can the values of |
Eh. So, I am still liking the separate collections, but I have hit a roadblock that I'd like your input in overcoming. I have the following line in
This doesn't work, because I could also just transparently add EDIT: I've gotten around this problem by copying |
Another question for ya. I can't find where the mongo objectid gets created. Or, if it is automatically set by PyMongo's insert, how do you set control if the I suppose I won't be able to make Mongo allow non-unique id's in the shadow collection. Maybe the thing to do is to create a second id field like EDIT: I avoided the problem by adding |
I currently have POST and GET (collection endpoint) working after switching back to the shadow collection. Check out my progress against the current develop branch. |
I like what you've been cooking so far. A few notes:
Some more general thoughts
|
I suppose I can. Do you specifically want that (sort of optimization) in 0.3 or do you just want it in the develop branch sooner than later? I'd like to have this versioning stuff mostly figured out in no more than a week's time. If 0.3 ships soon, we can dump all of this into develop sooner than later. I'm asking because I have another tiny optimization or two like this in mind.
This link is the same as the comment above. Not sure what you're trying to say here. Agreed on the next four comments. I'll definitely refactor a lot of the new bits into descrete functions as I fill out the other HTTP verbs. 😄
You got it! I'm not sure I like the idea of a single history collection though... Seeing collections in pairs of myresource and myresource_versions doesn't seem like too much pollution to me.
I'm definitely not a Mongo expert, but I'll look into it, especially as it pertains to this feature. |
Sorry, I meant this
0.3 is going out when i'm back from FOSDEM (early february). Don't want versioning in 0.3, more likely 0.4 so there's no rush. If you have general optimizations/refactorings (not strictly related to versioning), those would be good to go with 0.3 but again there's no rush, we can wait next release.
Yeah, most likely. I guess it depends on how many collections the maintainer going to be versioning ;) Let's keep it as it is for the time being. |
Notes
|
End of day summary:
TODO:
|
Woo. 19 test failures. Starting to work through these one by one. I changed the EDIT: I'm simply always counting versioning fields as offenders even when versioning isn't turned on for a particular resource. |
Hey @nicolaiarocci, I've noticed a lot of redundant code and inconsistent function parameters and variable names throughout the method functions. What would you think about having the only parameter to the method functions be the A change like this would probably take place outside of the this issue, of course. |
Passing all existing tests. Added documentation. Versioning support at all HTTP methods. See diff. Still need to add new unit tests and support for a versioned reference. I noticed today that the io/mongo find function includes an addition function parameter that is not defined in base.py. Doesn't this defeat the whole point of the base class if you have to have driver specific parameters? It doesn't even have a default value... |
Yeah there's a lot of room for improvement in the refactoring/cleaning department. I'm not entirely sure what you mean but feel free to submit a separate PR with your proposal :) |
Nope, I just forgot to update the base class. Will do later today. |
Loving it DRY patches included. There's some minor merging issues (I pushed a few new updates in the meanwhile) but overall I think it's Very Good. I really appreciate you providing the documentation too. Will eventually do a complete code review next week when I'm back from Bruxelles (FOSDEM) and 0.3 is released. By the way, I'll mention document versioning in the "what we are cooking" section of the FOSDEM talk ;-) |
I'll update this branch whenever we are about ready to merge this into develop (or maybe just once you release 0.3.)
I look forward to your code review! I am least satisfied with the implementation of
Awesome! I'll be curious to hear if people are interested in this feature or not. |
Looks like v0.3 is almost released!! The last piece of this puzzle that I haven't implemented yet is a version-aware data_relation. I could use some help choosing the implementation! I have a few different options here showing how the data would actually live in the document. Option 4 seems to be the only reasonable option to me. # Non-versioned data_relation:
my_field_name: ID
# Option 1
my_field_name: ID
my_field_name_version: VER
# Option 2
my_field_name: [ID, VER]
# Option 3
my_field_name: {'value': ID, '_version': VER}
# Option 4
my_field_name: {'_foreign_field_name': ID, '_version': VER} Regardless of how the foreign reference lives in the document, I think the necessary addition to the schema definition is clear: 'protocol_id': {
'type': 'objectid',
'data_relation': {
'resource': 'protocols',
'field': ID_FIELD,
'versioned': True, # add this to make a versioned data_relation!
'embeddable': True
}
} The only thing that is weird about this that when you turn And just to note, one could also have a non-versioned data_relation to a versioned resource. We don't have to do any changes to get that to work. Users will be able to mix and match versioned and non-versioned data_relations at will. What are your thoughts, @nicolaiarocci? |
It was released exactly a week ago ;-)
Dumb question: wouldn't 'protocol_id': {
'type': 'objectid',
'versioned': True
'data_relation': {
'resource': 'protocols',
'field': ID_FIELD,
'embeddable': True
}
} take care of this field versioning already? A gave quick look at the code and the first thought was, it would be nice if we had a |
That does not accomplish what I'm looking for. Putting |
Of course it does! One thing to keep in mind is that MongoDB does not hash/compress/optimize document field names (or data) in any way so your option #4 might become expensive in term of both performance and size, and #2 probably the more compact. Using short fields can go a long way ( This being said, I probably prefer #4 as well.
You are probably going to need to transform it back to its original form when sending it back to the client. An alternative would be to use a separate collection to store this kind of meta data. Something like {
'versioned_document_id': '454..',
'protocol_id': {'foreign_field_name': '_id', 'ver': 2}
} or, to only store one object in this meta collection even when multiple data relations within the same document are versioned (here I'm using option #2): {
'versioned_document_id': '454..',
'versioned': [{'protocol_id': [ID, VER]}, {'invoice_id': [ID, VER]} ..]
} These meta collections might come in handy in the future if other versioning-related stuff will need to be persisted somewhere. Just venting out my thoughts as they cross my mind here :) |
Alright! I'm back on the job here. I'm starting with the latest 0.4 develop branch and re-integrating the versioning changes. I'll be moving much of the new functionality into a versioning.py file as requested. :) I don't think I want to use a separate collection to store the versioning of the pointer, nor to I want to transform the pointer to a different format before sending it to the client. On the second topic, if the version of the reference I'm pointing to is important enough to store into the database into the first place, I don't want to strip that before sending it to the client. More coming soon! |
@nicolaiarocci - why are some settings duplicated in |
Also, how can I debug failing test cases? In particular, some tests are getting 500 errors, but since I can't actually see what the error was, I don't know what to fix! I was able to figure out some solution to this problem when I wrote the initial chunk of the versioning code, but I don't remember what I did. :| |
Some settings are used on bootstrap, before default_settings are loaded. There's a helper method in utils (I think it's
For 500s, I'm afraid good ole PDB (or even the classical "print" way) is the way to go. |
Hey @nicolaiarocci, check out my progress! I'm basically back to where I was with 0.3. I cleaned up some of the I'd like to push this into the develop branch as soon as possible to avoid having to do a bunch of back-merges. Please review my code at your earliest convenience! |
Excellent. Will hold on non-trivial updates/merges until this is merged. Promise! |
Woot!! Thanks. :) I should be 90% done tomorrow evening. |
Hey @nicolaiarocci, I've DRY'd up a bit more of your code and added in considerations for #36 at the same time. A new function called EDIT: I should note that this modifications allows |
Hmm, I'm thinking through this a little bit more... In order to pull that off, I have to move all of the So I guess the question is do you want to support return any field in a post (regardless of return all fields or only some) versus the possible performance hit during a |
I'm not opposed to doing this in a separate issue ticket. Maybe we can revive #36 after we finish this up. (I just have to do more changes across multiple files until this functionality is consolidated.) |
Versioning is quite a change already. Let's do one thing at a time. |
Agreed. Sounds good. :) |
I'm having a much worse time debugging my failing test cases since I merged upstream. Per my question before about |
Oh! Yeah. Just need to set |
👍 keep in mind I merged a couple PRs; last one was a few minutes ago and it deals with the test suite, so you might want to pull. |
Merged to the HEAD of develop. Passing all existing tests. Added 25% of the versioning specific tests. Have outlines for the remaining versioning specific tests. I also still need to implement the data_relation with version, but I'll implement that after I complete the tests for the code that is already implements. I'll probably have more unit tests done by the time you read this. Any comments on the latest diff? |
Absolutely love what you are doing here. The DRY fixes are great too. Keep up the good work. 👍 |
Hmm... I really should run the entirety of the existing Eve tests with versioning turned on. In other words, I'm not writing a special test for every single combination of settings + the version control setting on. I'm basically only testing the intricasies of the document versioning functionality against Anyway, I think this is fine since I'm obviously trying to test the most interesting and more important cases. I do know of at least two interesting cases that have different responses that I'm not testing. They are:
Thoughts? |
Just noticed that if you don't include something like It's an easy fix for a client after all...they just include those fields in the projection query parameter! |
This seems related to #282. Haven't checked it yet, hope to do it real soon (and apply a fix).
I usually follow the code and try to cover most of it with tests. I'd say your should test all possible version requests with their responses. And yes, possibly the special/odd configuration settings which can have an impact on versioning. |
Hmm. Since I have it all open, I'll go ahead and fix #282 for ya (since I can make life easier for version+projection queries.) I'm also going to add the feature to through a 400 (instead of a 500) when you try to do 0 and 1 projections together. Besides that, I have all of the features and all of the test cases done expect for data_releations with references!! Almost done!!! I also haven't tackled the unlikely combo scenarios I just mentioned. I'll maybe do those... 😒 |
If it's not too much of a pain, please send a separate PR for the #282 fix (and thanks!). |
Ok. Will do - probably tomorrow though. My goal is to finally do a pull request for this guy tomorrow!! |
Hey @nicolaiarocci, I'm trying to find a way to leverage Eve to create versioned documents. This feature might end up being a standalone extension, but I thought I'd start an "issue" here so that there was a recording of the conversion.
I'd like to extend the common pattern of
/resource/
and/resource/_id_/
endpoints with/resource/_id_/versions/
and/resource/_id_/versions/_version_/
. I'd like to implement versioning by storing shadow copies of a model as opposed to storing the changes between versions. In my intended application, I'll actually be referencing / retrieving a specific version of a resource resource often than I would retrieve the latest version of a resource. Only GET methods would be allowed on the two additional version endpoints, and a PATCH/UPDATE to/resource/_id_/
would also insert a new shadow copy.I originally liked the idea of storing shadow copies within the object itself, but storing the copies as separate documents in a separate collection probably lends itself better to the MongoDB document size limit Your support for sub-modules lends itself well to storing the shadow copies in a second collection. I think I would then make use of the hooks to automatically write to the shadow collection on update. Could you give me an example of how to write to a different resource from inside of a hook other then the resource in request? I don't see an example of this in the event hooks documentation.
Once the shadow copies are actually stored in the database, the next step is to support a versioned document reference. I guess I'd have to create a custom data type for this and then extend the validation rules.
I'd like to use this versioned resource pattern for many (if not all) of the models in my database. I'd likely create a reusable function to generate the required configuration bits for me. However, it would be pretty darn sweet if all i had to do was flip a "version_controlled" flag in the resource configuration... :)
What do you think? Any suggestions on how to implement something like this?
Thanks!
The text was updated successfully, but these errors were encountered: