Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versioned Documents #224

Closed
joshvillbrandt opened this issue Jan 15, 2014 · 53 comments
Closed

Versioned Documents #224

joshvillbrandt opened this issue Jan 15, 2014 · 53 comments

Comments

@joshvillbrandt
Copy link
Contributor

Hey @nicolaiarocci, I'm trying to find a way to leverage Eve to create versioned documents. This feature might end up being a standalone extension, but I thought I'd start an "issue" here so that there was a recording of the conversion.

I'd like to extend the common pattern of /resource/ and /resource/_id_/ endpoints with /resource/_id_/versions/ and /resource/_id_/versions/_version_/. I'd like to implement versioning by storing shadow copies of a model as opposed to storing the changes between versions. In my intended application, I'll actually be referencing / retrieving a specific version of a resource resource often than I would retrieve the latest version of a resource. Only GET methods would be allowed on the two additional version endpoints, and a PATCH/UPDATE to /resource/_id_/ would also insert a new shadow copy.

I originally liked the idea of storing shadow copies within the object itself, but storing the copies as separate documents in a separate collection probably lends itself better to the MongoDB document size limit Your support for sub-modules lends itself well to storing the shadow copies in a second collection. I think I would then make use of the hooks to automatically write to the shadow collection on update. Could you give me an example of how to write to a different resource from inside of a hook other then the resource in request? I don't see an example of this in the event hooks documentation.

Once the shadow copies are actually stored in the database, the next step is to support a versioned document reference. I guess I'd have to create a custom data type for this and then extend the validation rules.

I'd like to use this versioned resource pattern for many (if not all) of the models in my database. I'd likely create a reusable function to generate the required configuration bits for me. However, it would be pretty darn sweet if all i had to do was flip a "version_controlled" flag in the resource configuration... :)

What do you think? Any suggestions on how to implement something like this?

Thanks!

@joshvillbrandt
Copy link
Contributor Author

Any thoughts on this @nicolaiarocci? I'm considering patching Eve to add this, but I don't want to do it if you aren't interested in the feature.

@nicolaiarocci
Copy link
Member

Hello, I just gave a quick read to the above.

  • You could store different document versions in the same collection.
  • You could then simply query /resource?where={"identifier": <value>, "version": 2}

This would also allow for the use of logical operators such as $or when retrieving documents (all documents of same version, or retrieve different versions of the same document in one go, etc.)

I am guessing that you don't want clients to send the version field themselves and you probably want the version to be determined and updated by the server itself. This could be done with an on_insert hook which is using the app.data object to query the database (see the BasicAuth snippet for an usage example).

@joshvillbrandt
Copy link
Contributor Author

Thanks for the response!

Your suggestion is intriguing because it works out of the box without changes to the Eve code base. I do, however, see some downsides to this implementation. For example, a GET on the collection endpoint would by default return all versions of all items. Also, the notion of the objectid field as a unique field becomes a lot weaker since it would be different for every version.

I have started modifying the Eve core to natively support the idea of versions. I think you'll like my approach! Versioning is supported on a per-field basis and is assumed to be not versioned unless the field has versioned: True in the schema definition. For fields that are versioned, a new dictionary is created in a _versions list. A _version field is also inserted into the document and automatically maintained. Of course these field names can be changed at the global config level similar to LAST_UPDATED and DATE_CREATED.

The goal for this approach is to have versions be transparent to the user unless they ask for it. The latest version of the document is stored as normal, so validation, filtering, sorting, and projections work as normal. In additional, all standard HTTP verbs work as expected - the client does not need any additional logic to work with versions. Also, the entire resource is treated unversioned (the normal Eve way) unless versioning: True is defined in each resource definition or if the global VERSIONING is set to True.

When a client is ready to care about versions, they can access the child endpoints of versions/ and versions/_version_/ (as mentioned above.) These endpoints only support GET verbs of course, and are automatically created for the api maintainer when the resource has versioning: True defined.

I think this would be a pretty damn cool feature for Eve to support natively. It is very difficult to find and API framework that supports this out-of-the-box at the moment, and including it would put Eve a step ahead of the rest. Do you think this is something you would be interested in supporting with me?

@nicolaiarocci
Copy link
Member

Your suggestion is intriguing because it works out of the box without changes to the Eve code base. I do, however, see some downsides to this implementation. For example, a GET on the collection endpoint would by default return all versions of all items. Also, the notion of the objectid field as a unique field becomes a lot weaker since it would be different for every version.

You could set a predefined resource filter to only retrieve a specific version by default (the most recent?).

This being said, your approach seems to be very consistent with Eve design and general behaviour. However, on top of my mind I have a few questions and concerns:

  • if versioning is supported at the field level, would I have to enable versioning on every field when/if I wanted the whole document versioned?
  • moreover, if only a few fields are versioned, when I access the versioned endpoint what am I getting as LAST_UPDATED, DATE_CREATED and ETAG values (even DATE_CREATED can change in case of a PUT)? Are they going to be coherent with the version being accessed/retrieved? ETAG is particularly tricky as it is computed at runtime and not stored with the document. The versioned ETAG (since only a few fields have been stored for the version being requested) would not match the original ETAG stored with that version.

If you go the "I'm not returning meta fields with versioned responses" then you would end up with inconsistent API responses (for one, clients would have to support different payloads). Of course you could choose to store the metafields with the versioned documents but again, the ETAG would still be different. In general, I would say that versioning at the document level would avoid a lot of issues as the whole document snapshot would be stored every time (meta fields included).

While API versioning is a common concern in REST API development, I don't remember seeing RESTful document versioning design and implementation discussion before, so I feel like we are kind of exploring a new territory here (I might be wrong of course).

I'm wondering if document versioning with URLs is the correct approach. Would a header or a query parameter approach be better perhaps? I'm thinking something like resource/_id_?version=2. This way clients are still accessing the same url (remember one fundamental REST constraint is unique identifiers for every documentsì). Just thinking aloud here, I guess I should do some research on the matter though.

@nicolaiarocci
Copy link
Member

PS: of course you could correctly handle the diffs (and this probably your approach), reconstructing the whole document when versioning. You would still need to store LAST_UPDATED and DATE_CREATED.

@joshvillbrandt
Copy link
Contributor Author

I'm very glad to hear that you are open to this idea! I am strongly trying to avoid having to create a new project for this purpose, especially since Eve already has all of the other features that I need.

I think you're correct in that document versioning is uncommon in many of the existing REST API frameworks available. It's too bad to, since nearly every wiki and blog software implements some sort of document version control. You'd think more API frameworks would have this as a feature.

CouchDB is the closet thing I've seen to document revision control within the API. They do, in fact, maintain multiple versions of documents, but they do not guarantee that older version are kept around. It is primarily there to facilitate their replication and conflict model. (See CouchDB Revisions for more information.)

Alright, so let me see if I can tackle all of your comments. First, I definitely am supporting returning meta data when returning versioned documents so far. I agree the API should be consistent between accessing the head version of a document and accessing a specific revision. Currently, I generate the return document by grabbing the head version of the document, dropping all fields that are versioned, and then adding the fields for the specific version requested. This method allows for the correct document to be recreated even when non-required fields get deleted.

I currently have DATE_CREATED as an unversioned field and LAST_UPDATED (why didn't you call this DATE_UPDATED?) as a versioned field. Using the method above you can see how specific values end up in the response document.

The ETag field is only necessary when we want to modify a document. If we want to prevent modification to a document after receiving a specific revision, we can compute the ETag with the response document. The ETag here will not match the ETag of the head version, so the user will be unable to modify the document. I like this approach, but we could also always return the head ETag and also require the client to include the _version field of the head version when modifying the document as another means on concurrency control.

I like your idea of using a query parameter to access a specific version. It seems cleaner than having child endpoints. How would you envision returning a list of document versions?

I'd also like to enable a query parameter to return all of the versions (basically don't remove _versions from the response) when accessing the document head. I don't including the _versions field as part of the schema definition, so Eve already strips it from the response document. I tried to add it back in using a projection (?projection={"_versions":1}) only to discover that Eve doesn't support projections on item endpoints. (I guess the makes sense, you don't normally need to do a projection when retrieving only one document.) Would you suggest that I add the plumbing to support projections on item endpoints to accomplish this or should I add a specific query parameter like ?all_versions?

Sorry for the long response. Thanks for your support!!

@joshvillbrandt
Copy link
Contributor Author

I'll give some more thought on whether or not to version control the entire document as well. Although even if I version control the entire document, I don't think that will include the ID_FIELD or DATE_CREATED fields.

I'd also like to add support for USER_CREATED and USER_UPDATED parameters that tie in with Auth. I need to read the docs more, but I think you have USER_CREATED already in place.

@joshvillbrandt
Copy link
Contributor Author

I've updated my code such that version controlling every field is much easier. If versioning: True is defined for the resource (or it is turned on globally), all fields are considered version controlled unless versioned: False is supplied in a field definition. Version control of fields is only considered at the first depth. Automatic fields ID_FIELD and DATE_CREATED are not version controlled while LAST_UPDATED is version controlled.

This change makes it easier to support the ALLOW_UNKNOWN setting and will assume unknown fields should be version controlled. (I'm not sure if that is good are bad, but if you are allowing the unknown if the first depth of your resource anyway, who knows what you want!)

@nicolaiarocci
Copy link
Member

Currently, I generate the return document by grabbing the head version of the document, dropping all fields that are versioned, and then adding the fields for the specific version requested. This method allows for the correct document to be recreated even when non-required fields get deleted.

I've updated my code such that version controlling every field is much easier. If versioning: True is defined for the resource (or it is turned on globally), all fields are considered version controlled unless versioned: False is supplied in a field definition. Version control of fields is only considered at the first depth. Automatic fields ID_FIELD and DATE_CREATED are not version controlled while LAST_UPDATED is version controlled.

From a client perspective you are implementing document-versioning by default, which is absolutely the way to go. I would store the diffs in a different collection for better performance and reduced memory usage, which brings us to another concern of mine: will the the base io data layer need new methods/interfaces in order to support this feature? Remember we're now dealing with mongo but in the future, hopefully, someone will want to add other data-layers.

The ETag field is only necessary when we want to modify a document. If we want to prevent modification to a document after receiving a specific revision, we can compute the ETag with the response document. The ETag here will not much the ETag of the head version, so the user will be unable to modify the document. I like this approach, but we could also always return the head ETag and also require the client to include the _version field of the head version when modifying the document as another means on concurrency control.

The first approach is the clear winner here.

I like your idea of using a query parameter to access a specific version. It seems cleaner than having child endpoints. How would you envision returning a list of document versions?

We could support an all option:

  • GET resource/_id_ returns the head
  • GET resource/_id_?version=<version> returns specific version
  • GET resource/_id_?version=all returns a list of all versions

While this would be convenient, I'm not 100% positive on this feature as it would change the payload format (a list instead of a dictionary). We could just ignore this feature at the time being. In the rare circumstance where the client needs multiple versions, it could just send multiple requests.

I'd also like to enable a query parameter to return all of the versions (basically don't remove _versions from the response) when accessing the document head. I don't including the _versions field as part of the schema definition, so Eve already strips it from the response document.

I don't think this is a good idea. We should keep the interface simple, clean and coherent with the resource schema. You ask for a document, you get the document back (whatever version of it you requested).

Also, mainly for performance reasons but also to keep it simple, I would only support versioning at the document endpoint, at least at the first iteration.

@joshvillbrandt
Copy link
Contributor Author

I think I can add this functionality without changing the base io data layer. Most of the heavy-lifting should take place in the method definition files (get.py, post.py, etc.)

Just to be clear, I'm not storing "diffs" in the same way the many version control systems do. If only one field changes, I'm still copying the entire document. You're right in that this is really only scalable when using a separate collection for changes. Storing changes within the document means we'll hit the 16MB document limit even when the document isn't 16MB and means the application has to hold all of the version in memory during the request even if they aren't needed (as you mentioned.) So yeah, I'll switch back to the separate collection approach!

I like the all syntax. For my intended application, I do not think it would be uncommon to want to see all versions. For example, I will probably have a 'history' view for documents in my application that shows the changes over time. I don't think it is unreasonable to have a different output format (a list) when adding ?version=all to a query.

My suggestions of ?projection={"_versions":1} was an attempt to get all versions only at the document level (just as you've suggested with resource/_id_?version=all.) It also worked with the resource endpoint as a side effect, but that wasn't the motivation. Anyway, I'll take it out in exchange for the resource/_id_?version=all syntax.

@joshvillbrandt
Copy link
Contributor Author

Eh, I forgot about the original reasons I went with embedded version storage:

  • I don't have to do extra database queries - the HEAD version and all past versions are in one document
  • POST is easy because I don't have to insert first just to grab the ID back out
  • The shadow collection is interesting because the id field is not unique now
  • If I want to support version controlling only some features, than I always have to grab the HEAD version and the specific version I care about to synthesize the full document

I think the benefits for having a separate collection still outweighs these downsides. Again, just wanted to document that I've thought about it.

Question for you - can the values of DATE_CREATED and ID_FIELD in a given document be changed after the first insert? If I do synthesize the full document before responding, then I just don't care what the answer is because there is only, in fact, one instance of these fields. If I went the route of blindly copying the entire document for old versions, then these fields could be out of sync if changed later. So I guess I'm going with option number 1 (synthesizing), but I'm still curious if they can be changed after-the-fact or not.

@joshvillbrandt
Copy link
Contributor Author

Eh. So, I am still liking the separate collections, but I have hit a roadblock that I'd like your input in overcoming.

I have the following line in methods/post.py:

app.data.insert(resource+config.VERSIONS, versioned_documents)

This doesn't work, because resource+config.VERSIONS (or todos_versions in my test project) doesn't exist in config.SOURCES. Besides retrieving config.SOURCES[resource]['source'] (since I should really be insert to config.SOURCES[resource]['source']+config.VERSIONS), I really just need a backdoor into the database driver. I don't think I'll ever need the filter, projection, default_sort, or anything else the SOURCES variable has.

I could also just transparently add resource+config.VERSIONS to the source definition. I think I'll do that for now.

EDIT: I've gotten around this problem by copying config['SOURCES'] for each resource's shadow collections. I also made registering resources more DRY in flaskapp.py.

@joshvillbrandt
Copy link
Contributor Author

Another question for ya. I can't find where the mongo objectid gets created. Or, if it is automatically set by PyMongo's insert, how do you set control if the ID_FIELD should be the PyMongo default or not?

I suppose I won't be able to make Mongo allow non-unique id's in the shadow collection. Maybe the thing to do is to create a second id field like _id_real or something? Of course the user would never see that.

EDIT: I avoided the problem by adding _id_document which might be the right thing to do anyway.

@joshvillbrandt
Copy link
Contributor Author

I currently have POST and GET (collection endpoint) working after switching back to the shadow collection. Check out my progress against the current develop branch.

@nicolaiarocci
Copy link
Member

I like what you've been cooking so far. A few notes:

  • Would you be willing to submit this change as a separate PR to 0.3?
  • this is probably meant to be 0.4
  • consider refactoring these lines into a _resolve_versioning function (besides leaving the GET code cleaner, you are probably going to need the same code in getitem.
  • I would move this block in this position as this is the proper place where we are adding metadata and updating the documents before the insert.
  • Besides, it would probably be a good idea to refactor the above block into a eve.io.common function, as you will need a variance of it in other edit methods (PATCH, PUT). See what's been done with resolve_default_values and resolve_media_files
  • I would make a function out of this block, if nothing else to reduce post complexity.

Some more general thoughts

  • if I get it right you are building a new "history collection" for every versioned resource. I'm wondering if it would be more appropriate to have one single history collection across the board. Wether this would bring more advantages, I am not sure really, but it would probably avoid too much database pollution.
  • this isn't related to just this feature really. So far, Eve has not been dealing with database optimization at all: proper indexing is left to the API maintainer. As we add complexity, I'm wondering if we shouldn't add some auto-indexing features in the future (or at the very least sanity checks at startup).

@joshvillbrandt
Copy link
Contributor Author

Would you be willing to submit this change as a separate PR to 0.3?

I suppose I can. Do you specifically want that (sort of optimization) in 0.3 or do you just want it in the develop branch sooner than later? I'd like to have this versioning stuff mostly figured out in no more than a week's time. If 0.3 ships soon, we can dump all of this into develop sooner than later. I'm asking because I have another tiny optimization or two like this in mind.

this is probably meant to be 0.4

This link is the same as the comment above. Not sure what you're trying to say here.

Agreed on the next four comments. I'll definitely refactor a lot of the new bits into descrete functions as I fill out the other HTTP verbs. 😄

if I get it right you are building a new "history collection" for every versioned resource. I'm wondering if it would be more appropriate to have one single history collection across the board. Wether this would bring more advantages, I am not sure really, but it would probably avoid too much database pollution.

You got it! I'm not sure I like the idea of a single history collection though... Seeing collections in pairs of myresource and myresource_versions doesn't seem like too much pollution to me.

this isn't related to just this feature really. So far, Eve has not been dealing with database optimization at all: proper indexing is left to the API maintainer. As we add complexity, I'm wondering if we shouldn't add some auto-indexing features in the future (or at the very least sanity checks at startup).

I'm definitely not a Mongo expert, but I'll look into it, especially as it pertains to this feature.

@nicolaiarocci
Copy link
Member

This link is the same as the comment above. Not sure what you're trying to say here.

Sorry, I meant this

I suppose I can. Do you specifically want that (sort of optimization) in 0.3 or do you just want it in the develop branch sooner than later? I'd like to have this versioning stuff mostly figured out in no more than a week's time. If 0.3 ships soon, we can dump all of this into develop sooner than later. I'm asking because I have another tiny optimization or two like this in mind.

0.3 is going out when i'm back from FOSDEM (early february). Don't want versioning in 0.3, more likely 0.4 so there's no rush. If you have general optimizations/refactorings (not strictly related to versioning), those would be good to go with 0.3 but again there's no rush, we can wait next release.

You got it! I'm not sure I like the idea of a single history collection though... Seeing collections in pairs of myresource and myresource_versions doesn't seem like too much pollution to me.

Yeah, most likely. I guess it depends on how many collections the maintainer going to be versioning ;) Let's keep it as it is for the time being.

@joshvillbrandt
Copy link
Contributor Author

Notes

  1. Hmm, MongoDB doesn't support transactions, so we could end up in state with the primary document gets updated, but the history is not saved. This should only happen if the server is killed during a POST/PUT/PATCH request. I imagine you have other scenarios like this too (because of media files or anything other request that involved multiple independent write actions.)
  2. During GET requests, I noticed that you execute _resolve_embedded_documents() after generating the document ETag last_modified timestamp for the response. Doesn't this mean that the response could change, but you are still sending a 304?! It is more complicated, but it seems like last_modified to account for the embedded documents.

@joshvillbrandt
Copy link
Contributor Author

End of day summary:

  • I refactored some parts of methods/get.py to be a bit more DRY
    • I essentially added something similar to get_document() in methods/common.py
    • if you look at the diff, it looks like i totally trashed the file, but i promise it's not that different
    • thoughts?
  • added versioning support for PUT, PATCH, and DELETE
    • versioning support in these function (and in POST) only requires to additional lines!

TODO:

  • add support for ?version= in GET (item)
  • versioned references?
  • unit tests! both verifying against old ones and making new ones 😬

@joshvillbrandt
Copy link
Contributor Author

Woo. 19 test failures. Starting to work through these one by one.

I changed the validate_schema stub from (self, resource, schema) to (self, resource, settings) causing 4 test failures. See change for motivation. I could always make the versioning fields offenders or change the tests to support settings or adding versioning as an additional parameter to the validate_schema stub. Preference?

EDIT: I'm simply always counting versioning fields as offenders even when versioning isn't turned on for a particular resource.

@joshvillbrandt
Copy link
Contributor Author

Hey @nicolaiarocci, I've noticed a lot of redundant code and inconsistent function parameters and variable names throughout the method functions. What would you think about having the only parameter to the method functions be the request instead of the resource and a lookup variable. Within request, we could add resource, lookup, resource_definition, method, payload, etc. I think this would reduce a lot of code and increase readability! Just a thought. :)

A change like this would probably take place outside of the this issue, of course.

@joshvillbrandt
Copy link
Contributor Author

Passing all existing tests. Added documentation. Versioning support at all HTTP methods. See diff.

Still need to add new unit tests and support for a versioned reference.

I noticed today that the io/mongo find function includes an addition function parameter that is not defined in base.py. Doesn't this defeat the whole point of the base class if you have to have driver specific parameters? It doesn't even have a default value...

@nicolaiarocci
Copy link
Member

I've noticed a lot of redundant code and inconsistent function parameters and variable names throughout the method functions. What would you think about having the only parameter to the method functions be the request instead of the resource and a lookup variable. Within request, we could add resource, lookup, resource_definition, method, payload, etc. I think this would reduce a lot of code and increase readability! Just a thought. :)

Yeah there's a lot of room for improvement in the refactoring/cleaning department. I'm not entirely sure what you mean but feel free to submit a separate PR with your proposal :)

@nicolaiarocci
Copy link
Member

I noticed today that the io/mongo find function includes an addition function parameter that is not defined in base.py. Doesn't this defeat the whole point of the base class if you have to have driver specific parameters? It doesn't even have a default value...

Nope, I just forgot to update the base class. Will do later today.

@nicolaiarocci
Copy link
Member

Passing all existing tests. Added documentation. Versioning support at all HTTP methods. See diff.

Loving it DRY patches included. There's some minor merging issues (I pushed a few new updates in the meanwhile) but overall I think it's Very Good. I really appreciate you providing the documentation too. Will eventually do a complete code review next week when I'm back from Bruxelles (FOSDEM) and 0.3 is released.

By the way, I'll mention document versioning in the "what we are cooking" section of the FOSDEM talk ;-)

@joshvillbrandt
Copy link
Contributor Author

I pushed a few new updates in the meanwhile

I'll update this branch whenever we are about ready to merge this into develop (or maybe just once you release 0.3.)

Will eventually do a complete code review next week when I'm back from Bruxelles (FOSDEM) and 0.3 is released.

I look forward to your code review! I am least satisfied with the implementation of methods/get.py. 😕

By the way, I'll mention document versioning in the "what we are cooking" section of the FOSDEM talk ;-)

Awesome! I'll be curious to hear if people are interested in this feature or not.

@joshvillbrandt
Copy link
Contributor Author

Looks like v0.3 is almost released!!

The last piece of this puzzle that I haven't implemented yet is a version-aware data_relation. I could use some help choosing the implementation! I have a few different options here showing how the data would actually live in the document. Option 4 seems to be the only reasonable option to me.

# Non-versioned data_relation:
my_field_name: ID

# Option 1
my_field_name: ID
my_field_name_version: VER

# Option 2
my_field_name: [ID, VER]

# Option 3
my_field_name: {'value': ID, '_version': VER}

# Option 4
my_field_name: {'_foreign_field_name': ID, '_version': VER}

Regardless of how the foreign reference lives in the document, I think the necessary addition to the schema definition is clear:

'protocol_id': {
    'type': 'objectid',
    'data_relation': {
        'resource': 'protocols',
        'field': ID_FIELD,
        'versioned': True, # add this to make a versioned data_relation!
        'embeddable': True
    }
}

The only thing that is weird about this that when you turn versioned to True in a data_relation, the field goes from being a single value of type type to an object with two fields.

And just to note, one could also have a non-versioned data_relation to a versioned resource. We don't have to do any changes to get that to work. Users will be able to mix and match versioned and non-versioned data_relations at will.

What are your thoughts, @nicolaiarocci?

@nicolaiarocci
Copy link
Member

Looks like v0.3 is almost released!!

It was released exactly a week ago ;-)

The last piece of this puzzle that I haven't implemented yet is a version-aware data_relation. I could use some help choosing the implementation! I have a few different options here showing how the data would actually live in the document. Option 4 seems to be the only reasonable option to me.

Dumb question: wouldn't

'protocol_id': {
    'type': 'objectid',
    'versioned': True
    'data_relation': {
        'resource': 'protocols',
        'field': ID_FIELD,
        'embeddable': True
    }
}

take care of this field versioning already?

A gave quick look at the code and the first thought was, it would be nice if we had a versioning.py module (maybe in eve.methodsor even in the project root folder - not sure) where we could dump all the versioning-related code (right now we have it scattered in several places: utils.py, methods.common.py and then in some <verb>.py modules). This way imports wold be less cluttered and more logic.

@joshvillbrandt
Copy link
Contributor Author

That does not accomplish what I'm looking for. Putting 'versioned': True directly under protocol_id version controls the pointer itself (which is already on by default if the entire resource is versioned.) I'm trying to point to a specific version of a document. So instead of protocol_id pointing to protocol 123, I want to point to protocol 123 version 5 (and not the latest version.) Does that make sense?

@nicolaiarocci
Copy link
Member

Of course it does!

One thing to keep in mind is that MongoDB does not hash/compress/optimize document field names (or data) in any way so your option #4 might become expensive in term of both performance and size, and #2 probably the more compact. Using short fields can go a long way (_v instead of _versionetc.) to minimize the problem.

This being said, I probably prefer #4 as well.

The only thing that is weird about this that when you turn versioned to True in a data_relation, the field goes from being a single value of type type to an object with two fields.

You are probably going to need to transform it back to its original form when sending it back to the client.

An alternative would be to use a separate collection to store this kind of meta data. Something like

{
    'versioned_document_id': '454..', 
    'protocol_id': {'foreign_field_name': '_id', 'ver': 2}
}  

or, to only store one object in this meta collection even when multiple data relations within the same document are versioned (here I'm using option #2):

{
    'versioned_document_id': '454..', 
    'versioned': [{'protocol_id': [ID, VER]}, {'invoice_id': [ID, VER]} ..]
}

These meta collections might come in handy in the future if other versioning-related stuff will need to be persisted somewhere. Just venting out my thoughts as they cross my mind here :)

@joshvillbrandt
Copy link
Contributor Author

Alright! I'm back on the job here. I'm starting with the latest 0.4 develop branch and re-integrating the versioning changes. I'll be moving much of the new functionality into a versioning.py file as requested. :)

I don't think I want to use a separate collection to store the versioning of the pointer, nor to I want to transform the pointer to a different format before sending it to the client. On the second topic, if the version of the reference I'm pointing to is important enough to store into the database into the first place, I don't want to strip that before sending it to the client.

More coming soon!

@joshvillbrandt
Copy link
Contributor Author

@nicolaiarocci - why are some settings duplicated in eve/__init__.py and in eve/default_settings.py? Also, It looks like eve/flaskapp.py sometimes grabs eve.VARIABLE which means that the user's settings are not used.

@joshvillbrandt
Copy link
Contributor Author

Also, how can I debug failing test cases? In particular, some tests are getting 500 errors, but since I can't actually see what the error was, I don't know what to fix! I was able to figure out some solution to this problem when I wrote the initial chunk of the versioning code, but I don't remember what I did. :|

@nicolaiarocci
Copy link
Member

why are some settings duplicated in eve/init.py and in eve/default_settings.py?

Some settings are used on bootstrap, before default_settings are loaded. There's a helper method in utils (I think it's config) that will return either user/default settings or built-in settings depending on wether the former are available.

Also, how can I debug failing test cases?

For 500s, I'm afraid good ole PDB (or even the classical "print" way) is the way to go.

@joshvillbrandt
Copy link
Contributor Author

Hey @nicolaiarocci, check out my progress! I'm basically back to where I was with 0.3. I cleaned up some of the get.py file which is the method with the most new lines of code. Much of the versioning logic is now in a versioning.py file as requested. I'm going to be adding unit tests and the versioned data relations in the next day or two.

I'd like to push this into the develop branch as soon as possible to avoid having to do a bunch of back-merges. Please review my code at your earliest convenience!

@nicolaiarocci
Copy link
Member

Excellent. Will hold on non-trivial updates/merges until this is merged. Promise!

@joshvillbrandt
Copy link
Contributor Author

Woot!! Thanks. :) I should be 90% done tomorrow evening.

@joshvillbrandt
Copy link
Contributor Author

Hey @nicolaiarocci, I've DRY'd up a bit more of your code and added in considerations for #36 at the same time. A new function called resolve_write_response() has been created in methods/common.py to handle adding fields to the response for POST, PATCH, and PUT. A (verbosely named) setting called CONSERVATIVE_WRITE_RESPONSES allows all fields to be written back to the client. Otherwise, only the "automatically handled" fields and extra_response_fields fields are sent.

EDIT: I should note that this modifications allows extra_response_fields to be sent in PUT and PATCH responses instead of just POST responses. Let me know if you think I should preserve the existing functionality here or not.

@joshvillbrandt
Copy link
Contributor Author

Hmm, I'm thinking through this a little bit more... In order to pull that off, I have to move all of the _resolve_embedded_fields and _resolve_embedded_documents type functions to methods/common.py. This also means that I'm essentially build the entire document as would happen with a GET even if only some of the fields are being returned. The trade off here is that this allows any field to be returned via extra_response_fields, even those that are embedded documents. It also allows us to keep the logic for the required/automatically handled fields (ID_FIELD, LAST_UPDATED, DATE_CREATED, ETAG, LINKS, VERSION, and LATEST_VERSION) all in one place (the _build_response_document function currently in methods/get.py.) Although I guess I could separate the required field logic too...

So I guess the question is do you want to support return any field in a post (regardless of return all fields or only some) versus the possible performance hit during a POST, PATCH, or PUT requests?

@joshvillbrandt
Copy link
Contributor Author

I'm not opposed to doing this in a separate issue ticket. Maybe we can revive #36 after we finish this up. (I just have to do more changes across multiple files until this functionality is consolidated.)

@nicolaiarocci
Copy link
Member

Versioning is quite a change already. Let's do one thing at a time.

@joshvillbrandt
Copy link
Contributor Author

Agreed. Sounds good. :)

@joshvillbrandt
Copy link
Contributor Author

I'm having a much worse time debugging my failing test cases since I merged upstream. Per my question before about 500 errors, something in Eve or Flask or Werkzeug catches the errors so I never see a traceback. I can't tell if I need to turn on a debug mode or something.

@joshvillbrandt
Copy link
Contributor Author

Oh! Yeah. Just need to set DEBUG = True in test/test_settings.py. This should move much quicker now. :)

@nicolaiarocci
Copy link
Member

👍 keep in mind I merged a couple PRs; last one was a few minutes ago and it deals with the test suite, so you might want to pull.

@joshvillbrandt
Copy link
Contributor Author

Merged to the HEAD of develop. Passing all existing tests. Added 25% of the versioning specific tests. Have outlines for the remaining versioning specific tests. I also still need to implement the data_relation with version, but I'll implement that after I complete the tests for the code that is already implements. I'll probably have more unit tests done by the time you read this.

Any comments on the latest diff?

@nicolaiarocci
Copy link
Member

Any comments on the latest diff?

Absolutely love what you are doing here. The DRY fixes are great too. Keep up the good work.

👍

@joshvillbrandt
Copy link
Contributor Author

Hmm... I really should run the entirety of the existing Eve tests with versioning turned on. In other words, I'm not writing a special test for every single combination of settings + the version control setting on. I'm basically only testing the intricasies of the document versioning functionality against tests/test_settings.py, but there are many more combinations out there. I don't think the current Eve test suite is written in a way to test all of the possible combinations. (That would be a pretty crazy test suite!)

Anyway, I think this is fine since I'm obviously trying to test the most interesting and more important cases. I do know of at least two interesting cases that have different responses that I'm not testing. They are:

  • HATEOAS on/off - performing ?version=all/diff on an item returns a list and this list is nominally under _items only if HATEOAS is on
  • projections on ?version=all as forces _version and _latest_version when document versioning is on

Thoughts?

@joshvillbrandt
Copy link
Contributor Author

Just noticed that if you don't include something like _created_at in a projection, the getitem() routine actually still adds a _created_at and it has an incorrect value (the linux epoch of course.) At some point if we are returning those fields to the client, we should probably force them into the projection, but since you aren't worrying about them yet, I'm not worrying about them yet for document versioning either.

It's an easy fix for a client after all...they just include those fields in the projection query parameter!

@nicolaiarocci
Copy link
Member

Just noticed that if you don't include something like _created_at in a projection, the getitem() routine actually still adds a _created_at and it has an incorrect value (the linux epoch of course.)

This seems related to #282. Haven't checked it yet, hope to do it real soon (and apply a fix).

I don't think the current Eve test suite is written in a way to test all of the possible combinations. (That would be a pretty crazy test suite!) [...] Anyway, I think this is fine since I'm obviously trying to test the most interesting and more important cases.

I usually follow the code and try to cover most of it with tests. I'd say your should test all possible version requests with their responses. And yes, possibly the special/odd configuration settings which can have an impact on versioning.

@joshvillbrandt
Copy link
Contributor Author

Hmm. Since I have it all open, I'll go ahead and fix #282 for ya (since I can make life easier for version+projection queries.) I'm also going to add the feature to through a 400 (instead of a 500) when you try to do 0 and 1 projections together.

Besides that, I have all of the features and all of the test cases done expect for data_releations with references!! Almost done!!! I also haven't tackled the unlikely combo scenarios I just mentioned. I'll maybe do those... 😒

@nicolaiarocci
Copy link
Member

If it's not too much of a pain, please send a separate PR for the #282 fix (and thanks!).

@joshvillbrandt
Copy link
Contributor Author

Ok. Will do - probably tomorrow though. My goal is to finally do a pull request for this guy tomorrow!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants