Skip to content

added support for 1 to 1 relations and added support for n unique key…#3

Merged
tiger5226 merged 2 commits intomasterfrom
merge_changes
Feb 8, 2019
Merged

added support for 1 to 1 relations and added support for n unique key…#3
tiger5226 merged 2 commits intomasterfrom
merge_changes

Conversation

@tiger5226
Copy link
Copy Markdown

added support for 1 to 1 relations and added support for n unique keys for conflict resolution during merging.

Comment thread templates/singleton/boil_queries.tpl Outdated
if len(conflict.columns) == 1 && len(conflictingColumns) == 0 {
isFKManyto1 = 0
conflictingColumns = conflict.columns
selectParams = []interface{}{secondaryID}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thought of a case not covered. If the secondary user has a row but the primary does not the secondary row will still be deleted. We need to also check that the primary has a row too in order for the secondary to be deleted, otherwise there is no conflict to resolve.

@tiger5226 tiger5226 requested a review from nikooo777 September 3, 2018 20:42
@lbry-bot lbry-bot assigned nikooo777 and unassigned nikooo777 Sep 3, 2018
Comment thread templates/singleton/boil_queries.tpl Outdated
Copy link
Copy Markdown
Collaborator

@nikooo777 nikooo777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this PR, but I took a look at it and it seems to logically make sense.

Still I don't feel experienced enough to say anything else.

Copy link
Copy Markdown
Member

@lyoshenka lyoshenka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i recommend splitting the two cases out into separate functions. it will be far easier, and i don't think there's much overlap

Comment thread templates/singleton/boil_queries.tpl Outdated
selectParams = []interface{}{secondaryID}
}

//This query checks for conflicting rows based on the keys. There are two cases. The first is when we have a 1To1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id prefer not to have large comments like this in the code. we can either use godoc-style comments at the top of the function which describe what the function does, or write up the docs somewhere else. other than that, the code should speak for itself as much as possible

@lbry-bot lbry-bot assigned tiger5226 and unassigned lyoshenka Sep 4, 2018
@lyoshenka
Copy link
Copy Markdown
Member

lyoshenka commented Sep 4, 2018

some code i threw together real quick

func mergeModels(tx boil.Executor, primaryID uint64, secondaryID uint64, foreignKeys []foreignKey, conflictingKeys []conflictingUniqueKey) error {
    if len(foreignKeys) < 1 {
        return nil
    }
    var err error

    for _, conflict := range conflictingKeys {
        if len(conflict.columns) == 1 && conflict.columns[0] == conflict.objectIdColumn {
            err = deleteOneToOneConflictsBeforeMerge(tx, conflict, primaryID, secondaryID)
        } else {
            err = deleteOneToManyConflictsBeforeMerge(tx, conflict, primaryID, secondaryID)
        }
        if err != nil {
            return err
        }
    }

    for _, fk := range foreignKeys {
        // TODO: use NewQuery here, not plain sql
        query := fmt.Sprintf(
            "UPDATE %s SET %s = %s WHERE %s = %s",
            fk.foreignTable, fk.foreignColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 1, 1),
            fk.foreignColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 2, 1),
        )
        _, err = tx.Exec(query, primaryID, secondaryID)
        if err != nil {
            return errors.Err(err)
        }
    }
    return checkMerge(tx, foreignKeys)
}

func deleteOneToOneConflictsBeforeMerge(tx boil.Executor, conflict conflictingUniqueKey, primaryID uint64, secondaryID uint64) error {
    query := fmt.Sprintf(
        "SELECT COUNT(*) FROM %s WHERE %s IN (%s)",
        conflict.table, conflict.objectIdColumn,
        strmangle.Placeholders(dialect.IndexPlaceholders, 2, 1, 1),
    )

    var count int
    err := tx.QueryRow(query, primaryID, secondaryID).Scan(&count)
    if err != nil {
        return errors.Err(err)
    }

    if count > 2 {
        return errors.Err("it should not be possible to have more than two rows here")
    } else if count != 2 {
        return nil // no conflicting rows
    }

    query = fmt.Sprintf(
        "DELETE FROM %s WHERE %s = %s",
        conflict.table, conflict.objectIdColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 1, 1),
    )

    _, err = tx.Exec(query, secondaryID)
    return errors.Err(err)
}

func deleteOneToManyConflictsBeforeMerge(tx boil.Executor, conflict conflictingUniqueKey, primaryID uint64, secondaryID uint64) error {
    conflictingColumns := strmangle.SetComplement(conflict.columns, []string{conflict.objectIdColumn})

    if len(conflictingColumns) < 1 {
        return nil
    } else if len(conflictingColumns) > 1 {
        return errors.Err("this doesnt work for unique keys with more than two columns (yet)")
    }

    query := fmt.Sprintf(
        "SELECT %s FROM %s WHERE %s IN (%s) GROUP BY %s HAVING count(distinct %s) > 1",
        conflictingColumns[0], conflict.table, conflict.objectIdColumn,
        strmangle.Placeholders(dialect.IndexPlaceholders, 2, 1, 1),
        conflictingColumns[0], conflict.objectIdColumn,
    )

    rows, err := tx.Query(query, primaryID, secondaryID)
    defer rows.Close()
    if err != nil {
        return errors.Err(err)
    }

    args := []interface{}{secondaryID}
    for rows.Next() {
        var value string
        err = rows.Scan(&value)
        if err != nil {
            return errors.Err(err)
        }
        args = append(args, value)
    }

    // if no rows found, no need to delete anything
    if len(args) < 2 {
        return nil
    }

    query = fmt.Sprintf(
        "DELETE FROM %s WHERE %s = %s AND %s IN (%s)",
        conflict.table, conflict.objectIdColumn, strmangle.Placeholders(dialect.IndexPlaceholders, 1, 1, 1),
        conflictingColumns[0], strmangle.Placeholders(dialect.IndexPlaceholders, len(args)-1, 2, 1),
    )

    _, err = tx.Exec(query, args...)
    if err != nil {
        return errors.Err(err)
    }
    return nil
}

@tiger5226
Copy link
Copy Markdown
Author

@lyoshenka I split it out into the two methods per your suggestion. I also tested the merge for the flagged edge case and it passes.

Copy link
Copy Markdown
Member

@lyoshenka lyoshenka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im having a bit of trouble following some of this. whats an example in lbry where we'd use this now?

also, could you add some tests for this behavior? sqlboiler already has a lot of features for testing. it would be good to keep using them.

finally, comments like //Grab scanned values for query arguments are unnecessary, and may be a sign that the code itself is not readable. please consider rewriting the code instead so that it reads better.

Comment thread templates/singleton/boil_queries.tpl Outdated
// used in the delete query.
colNames, err := rows.Columns()
if err != nil {
log.Fatal(err)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.Fatal?

}

args := []interface{}{secondaryID}
//Since we don't don't know if advance how many columns the query returns, we have dynamically assign them to be
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't we know in advance how many columns will be returned. isn't it just len(conflictingColumns)?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, but we can't predict the columns in the code. So we have to be dynamic. One query might have 1, another might have 5. I can't create a generic struct for scanning purposes. Front the research I did online, they said to use this pointer solution for dynamic scans.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lyoshenka is this acceptable? or were you just curious? If you approve I would like deploy this fix.

@lbry-bot lbry-bot assigned tiger5226 and unassigned lyoshenka and tiger5226 Sep 5, 2018
@tiger5226 tiger5226 requested a review from lyoshenka September 8, 2018 15:57
@lbry-bot lbry-bot assigned lyoshenka and unassigned tiger5226 Sep 8, 2018
@lbry-bot lbry-bot assigned tiger5226 and unassigned lyoshenka Sep 18, 2018
@lyoshenka
Copy link
Copy Markdown
Member

Feel free to merge, though you should still remove the log.Fatal line.

@lyoshenka
Copy link
Copy Markdown
Member

Also, would still love tests for this if possible

@lyoshenka
Copy link
Copy Markdown
Member

@tiger5226 should i merge this?

@tiger5226
Copy link
Copy Markdown
Author

No, I was talking with @tzarebczan and we will need to make changes to the schema. There are certain conflicts where we don't want to delete. For example, youtube data. So in this example we need to change the index to not be unique.

@lyoshenka
Copy link
Copy Markdown
Member

lyoshenka commented Sep 27, 2018

If its not on github, it didnt happen 😉

What schema changes are you planning to make?

I think youtube_data should probably still be unique. All the code Niko wrote depends on that - it would be a lot of work to change it. I'd prefer allowing Merge to error if there's a conflict, rather than making youtube_data not unique. Then we can handle the merge manually.

@tiger5226
Copy link
Copy Markdown
Author

In my defense I posted that about 2 minutes after we discussed it :) to make sure it was on github haha. I was actually planning on merging it when Tom flagged his concern based on your comment to merge it.

@tiger5226
Copy link
Copy Markdown
Author

I talked with Tom about this more tonight. So this PR fixes the fact that manual merging is required. However, this is for things like stripe where we want this to happen for example. However, youtube data is a special exception potentially. @tzarebczan mentioned that he thinks the youtube data is actually unique per channel not per user. He plans on double checking with @nikooo777 tomorrow.

@nikooo777
Copy link
Copy Markdown
Collaborator

what do you mean by unique per channel and not per user?

the youtube_data table is strictly related to the user table, a youtube_data row can only exist if there is a user associated with it, and only one row can exist per user.

i don't think we have a concept of channels in our database, do we?

@tiger5226
Copy link
Copy Markdown
Author

@nikooo777 I will let @tzarebczan answer your question, but based on what you said, what is the expected behavior when we merge 2 users that each have a youtube account? If we have two records conflict ( meaning the uniqueness keys all match ) we delete the secondary users record. The main concern is that we would have two accounts for two users that request to be merged into one. I think, we should delete the secondary youtube_data until we support multiple youtube accounts per user. What impacts could this gave?

@nikooo777
Copy link
Copy Markdown
Collaborator

I don't think deleting data is acceptable in this case. I'd much rather drop the uniqueness constraint as I had previously proposed in an issue on internal-apis.
What this implies is that we'll have to rework rewards to allow a user to claim the YouTube rewards multiple times and verify that nowhere else we depend on that constraint.

@tiger5226
Copy link
Copy Markdown
Author

the youtube_data table is strictly related to the user table, a youtube_data row can only exist if there is a user associated with it, and only one row can exist per user.

I was referring to this comment. I would prefer that we drop the constraint as well and allow multiple channels. Does this mean that this PR is blocked by that or can this be done until that is handled? How common is it that we merge two users with different youtube channels? Also is there an issue for this?

@nikooo777
Copy link
Copy Markdown
Collaborator

probably not my call to make, but I'd vote for blocked. I don't agree in ever dropping any youtube_data rows.

Opinions @lyoshenka and @kauffj ?

@tiger5226
Copy link
Copy Markdown
Author

Per @nikooo777 from slack, issue for multiple youtube accounts is https://github.com/lbryio/internal-apis/issues/323, will block this by for now. @alyssaoc just want to make sure this is on your radar.

@alyssaoc alyssaoc assigned nikooo777 and unassigned nikooo777 Nov 7, 2018
@tiger5226
Copy link
Copy Markdown
Author

We have a PR in review https://github.com/lbryio/internal-apis/pull/639. Once this PR is merged we can potentially merge this as well.

@tzarebczan
Copy link
Copy Markdown

@tiger5226 how comfortable do you feel merging this? Should we still run through a few scenarios locally before doing so?

@tiger5226
Copy link
Copy Markdown
Author

Yes! This can be merged now! I am really confident. Nonetheless, I would still want to retest it since it's a months old issue.

@tiger5226 tiger5226 merged commit fadcbfa into master Feb 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants