New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fet/use indexes to decide habtm join table behaviour #877
Fet/use indexes to decide habtm join table behaviour #877
Conversation
I see why you think this is the most backward compatible because currently all HABTM relationships allow duplicate entries. However, I'm not sure that's something we want and is more of an accident. Allowing duplicate entries:
If we were to implement this, I'd prefer to become more opinionated about the join table behaviour and this is added simply to accommodate the exception of EQG. So EQG would receive similar treatment as what you are suggesting here for index (maybe instead of However...I applaud the effort to "fix" this in the code as opposed to make a change in the database level (and I think what you are suggesting would technically work) but I'm wondering if we should bite the bullet and fix the schema for EQG instead. There's two ways we could do this:
With both options we'd have to tweak the queries involving question groups, although we probably could put in some back compat shims in the Event model (currently that appears to be the only place that includes With both options there'd be some sort of db migration needed to migrate the existing entries and delete the "duplicates". The primary downside to option 1 is that the resource hit will be larger on migration I think because it involves altering the tables (so impact on ES could be greater). The primary downside to option 2 is that it means an additional join in current queries. My preference is option 2 because I think that provides a future proof way to add additional meta data to this relation down the road (eg purchaser). Why?You've identified the primary problem with fixing this in the code only Mike:
I can't overstate how significant a disadvantage this is and greatly complicates understanding not only how to interpret model data but also how to dynamically create CRUD actions for model data. It increases the code needed and with that increase in code surface area comes an increase in the potential for bugs. Now, that's okay if we need to accommodate a feature needed or desired. But we're talking about having to deal with the idiosyncrasy of a single join table that results from (in hindsight) a poorly designed schema. I don't think it's a good idea to shift the problem of this design downstream to code to handle if it's something we don't want (duplicate entries in the join table) to begin with. For example, with EQG left as is, this means the js client I'm building would have to make sure that every time the relation endpoint is used, it includes the Let's think about how we might communicate this information via the REST API should something like this (allowing for duplicates in join tables) land.
In the end, I think it lends to much simpler and easier to reason with code if we make the fix at the db table level (which I realize means we have to tweak a few places in our existing code) so that we enforce the opinion that join tables should only have unique records for the relations. |
I didn't totally follow your rationale for that, so I did some reading to try to figure out "the proper way" to have join tables. I see one stack overflow article saying that, at least in Doctrine 2, join tables should ONLY contain foreign keys.. That definition came up again elsewhere. But I found at least one tutorial talking about "join tables" with "extra columns". So I don't think it's been a hard-and-fast rule that join tables should only contain foreign keys, but I get the idea that's the most common definition. So ya, having other columns on "join tables" is a no-no to start with. Having duplicates seems to be more of a no-no. I wasn't aware of that before.
Event-Question-Group is the only table that is treated as a a join table with an extra column which "allows duplicates" (as defined solely by the foreign keys). Registration_Payment and Event_Venue are other join-models with an extra fields, but they are meant to only have one entry per foreign-key combo. But Event_Question_Group is very similar to several other models that are just two foreign keys with one or two extra columns: Answer, Question_Mailchimp_Field, and Promotion_Object and possibly others. The only difference between it and those other models is that it's treated as a join table in a HABTM relation between events and question groups. So if we're talking about backward incompatible changes, and we don't like having some join tables "allow duplicates" while others don't (what this branch does), we could remove this inconsistency by just not using Event_Question_Group as a join-model. Same goes for another other models we might find that are used for I'd really like to avoid changing the data structure as that seems like it will take an order of magnitude more effort, and the only problem I really see with it is that we're calling Event_Question_Group as join table although it breaks two of the conventions of join-tables (it has extra columns and "allows duplicates"). If we no longer call it a join table (by removing the relations entry in Event for What do you think of this proposed alternative? (Just don't use "Event_Question_Group" as a join table, and the necessary refactoring in the code for that). |
I want to try to be a bit clearer here. I'm not against join tables having extra columns when it makes sense and the extra data needs are static and very clear (to represent order etc). I'm specifically addressing the flaw I think exists because we have duplicate rows for the joins solely because of the poor choice to have a single column represent a binary value and have it possible to have multiples of that binary value. I gave reasons for why (among the suggestions I gave) I prefer the option of storing the data in extra meta as opposed to adding additional columns in the table but among those reasons I'm not saying that join tables should never have extra data in them.
I don't think this is a valid solution as it'll involve more code churn than either of the solutions I'm proposing. Effectively, changing the "label" we give a table doesn't change its purpose. It'd still be the only way we could represent the Question Group to Event relation - only we'd be complicating the process to add that relation compared to the rest of the relations in our model system.
I think before we talk about other EE_HABTM_Relations that "allow duplicates", I think we need to put that hypothetical to bed. Either we have other existing models with that kind of scenario that need fixed or we don't. Either we decide we're going to support it or we don't. I think we need a decision here to help improve the stability and usability of our code and I'm arguing that we need to take the necessary steps to be clear we don't support it and we won't support it (for all the reasons I've already given).
I do agree there will be slightly more short term pain but I'm arguing for this because I'm confident it'll save us even more headaches down the road (and yes I'm aware of the impact of db schema changes on larger db's like EventSmart).
I disagree with this. Changing the label of a table does not change how its used. By removing its status as a HABTM Relation table, we're just effectively increasing the difficulty of representing the relation between Question Group and Event (of which the need has not disappeared). So while the problem prompting this discussion may "disappear" I think we'll introduce a new set of problems needing fixed. I'm aware that any fix has the potential for introducing a new set of problems but I'm confident that what I'm proposing will help make the codebase more solid and improve the stability of the db schema as opposed to simply trying to accommodate the current flaws.
Do we really know its been okay for the past several years? Is it possible there are some bugs and/or weird behaviour that have happened here and there over the years that are because of this table setup that we've just never identified yet? I think it's a pretty significant claim. Appears to have been okay might be slightly more accurate (and likely what you meant). Also the nature of the "complaint" that is surfaced matters imo. There's real complications involved in any client code consuming and interacting with HABTM_relations in the EE schema if there's potential that there may be multiples of those relations with arbitrary indexes that define their "uniqueness". There's a very clear issue with that and an opportunity (especially with this affecting only ONE table) to simplify things and make them more in line with convention (which exists for good reason). In my developer career, I've read about and experienced the kind of pain that happens when db schema issues are covered over with clever code solutions instead of dealing with them "properly" (we're still working out what the "properly" part is). I want to avoid that future pain for us (especially the real pain being felt currently in building our js client for the REST API). I feel confident I've stated my opinions on this in what direction we should take. I don't feel I'm the person that should make the decision on this so I don't think I have anything more to say other than to respond to additional questions for clarity etc. At this point I think @tn3rb should probably weigh in and make the decision on what path we take. |
I just want to clarify a few things first before discussing any possible solutions...
No that's incorrect. The links you posted are not saying that an intersection table can not have extra columns, they are merely pointing out that both Doctrine 2 and Hibernate need to map the join table as an entity table in order to access the data in those extra columns. So that's a constraint of those ORMs and not a constraint of SQL. Here is a more pure explanation of Intersection tables and Intersection Data in SQL: https://docs.oracle.com/cd/B31104_02/books/ConfigApps/ConfigApps_TablesColumns7.html Now that said, if we want to get technical about things, our tables like It is THIS property that allows us to have duplicate entries in the first place. Had we chosen to build proper intersection tables that utilized a composite primary key, then duplicate entries would be absolutely impossible because there would be no way to have two records that both have a composite key like Now with regards to our specific tables, here are a couple of examples of how this is implemented: Event Question GroupsConsists of an intersection table with intersection data columns.
Registration PaymentsConsists of an intersection table with intersection data columns,
So just to clarify, is it the usage of |
Summary (so you don't have to read all the above): We discussed in slack https://eventespresso.slack.com/archives/C080DDVLM/p1546881500247100
How long will the above changes take? Initial development probably a few days, testing a few days, roll-out of DB schema changes to ES would be another day or two, and then probably extra support in helping clients run the data migration script (but theoretically it will just work). This PR is an alternative, but it won't make the DB change for purchasing agent, and will introduce the gotcha that @joshfeck as PM, you should probably ok the DB change and migration script before we get too far along in this. For the record, Brent and Darren are on board with it, I can go either way. |
I think these are needed since 1) it's exposed a problem while building out the underpinnings needed for REM, and 2) This was a blocker when the first crack at the Purchasing Agent was made. While we're adding to the esp_event_question_group table, could we also add one more field that's been discussed for another requested feature? We want to be able to add a context field to allow for different registration form contexts (default context would be the registration form). There's some backstory in codebase 4426 |
KK I'll embark on another PR asap. |
Problem this Pull Request solves
Restores the behaviour from #798 (which limits there to only being one join table entry for the foreign keys) but only for the join models that expect that behaviour (other models allow duplicates based on foreign keys).
WIP!
Exterior behaviour
This means join tables can either impose a uniqueness constraint based on foreign keys, or not. If a model defines a unique index/combined key (that is, a set of fields that must be globally unique for that model) then the
EE_HABTM_Relation
will respect that when adding a relation (instead of inserting a new entry, it will update the existing one).However, if a model doesn't define a unique index, the default is to to allow duplicate join table entries. (We could have the opposite be the default, but this is most backward compatible IMO).
Implementation details
This primarily means join models should define a
EE_Unique_Index
that lists the fields that make the combined key. Those that don't get the old behaviour (which allowed duplicates).Up for debate
The main disadvantage is that there's no way for REST API clients to know which behaviour to expect. We should expose that either in the
/resources
endpoint, or somehow in the schema.How has this been tested
Not yet tested.
Checklist
esc_html__()
, see https://codex.wordpress.org/I18n_for_WordPress_Developers)