[dy] Add unique_conflict_method for mssql #4712

Merged

@dy46 dy46 commented Mar 6, 2024

Description

Add logic for exporting data to MSSQL when unique_conflict_method=UPDATE and unique_constraints are set. To update records directly, I'm passing a custom callable through the method parameter of the to_sql method. The resulting query looks like this:

MERGE [dbo].[test_update] AS t
USING (VALUES
    (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
) s([passengerid], [survived], [pclass], [_name], [sex], [age], [sibsp], [parch], [ticket], [fare], [cabin], [embarked], [new_col])
ON s.[passengerid] = t.[passengerid]
WHEN NOT MATCHED THEN
    INSERT ([passengerid], [survived], [pclass], [_name], [sex], [age], [sibsp], [parch], [ticket], [fare], [cabin], [embarked], [new_col])
    VALUES (s.[passengerid], s.[survived], s.[pclass], s.[_name], s.[sex], s.[age], s.[sibsp], s.[parch], s.[ticket], s.[fare], s.[cabin], s.[embarked], s.[new_col])
WHEN MATCHED THEN UPDATE SET
    [passengerid] = s.[passengerid], [survived] = s.[survived], [pclass] = s.[pclass], [_name] = s.[_name], [sex] = s.[sex], [age] = s.[age], [sibsp] = s.[sibsp], [parch] = s.[parch], [ticket] = s.[ticket], [fare] = s.[fare], [cabin] = s.[cabin], [embarked] = s.[embarked], [new_col] = s.[new_col]
;

This may be slower than a plain insert, since every row now goes through a MERGE. I'm not sure there is a workaround, since we have to update the existing records.
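One possible mitigation, offered only as a sketch and not part of this PR, is to pack several rows into the VALUES list of each MERGE so fewer statements are sent. SQL Server caps a request at 2100 parameters, so the batch size has to be derived from the column count. The helper names and the 2000-parameter budget below are assumptions chosen for illustration. Note that MERGE raises an error if two source rows in the same statement match the same target row, so batches would need to be de-duplicated on the unique constraint columns first.

```python
def rows_per_statement(n_columns, param_budget=2000):
    """Rows that fit in one statement; 2000 is an assumed margin under the 2100 limit."""
    if n_columns <= 0:
        raise ValueError('n_columns must be positive')
    return max(1, param_budget // n_columns)


def batched(rows, size):
    """Yield consecutive slices of `rows` with at most `size` rows each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def multi_row_placeholders(n_rows, n_columns):
    """Build '(?, ?, ...), (?, ?, ...)' for a multi-row VALUES list."""
    row = '(' + ', '.join('?' for _ in range(n_columns)) + ')'
    return ',\n    '.join(row for _ in range(n_rows))


# Illustrative data: 400 rows with the 13 columns of the example table.
rows = [(i,) * 13 for i in range(400)]
size = rows_per_statement(13)
batch_sizes = []
for batch in batched(rows, size):
    placeholders = multi_row_placeholders(len(batch), 13)
    # Flatten the batch so the parameters line up with the placeholders.
    params = [value for row in batch for value in row]
    batch_sizes.append(len(batch))
print(size, batch_sizes)
```

Each batch would then be executed once with its flattened parameter list instead of calling executemany over single rows.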

How Has This Been Tested?

  • Tested locally by updating rows with different column values
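
A check of this kind can be sketched without a SQL Server instance. The example below is not the PR's test: it swaps in SQLite's INSERT ... ON CONFLICT DO UPDATE for MERGE, and the function name upsert_rows and the SimpleNamespace stand-in for the pandas table object are assumptions for illustration. It keeps the (table, conn, keys, data_iter) argument shape that a to_sql method callable receives.

```python
import sqlite3
from types import SimpleNamespace


def upsert_rows(table, conn, keys, data_iter, unique_constraints):
    """Insert new rows and update rows that collide on unique_constraints."""
    columns = ', '.join(f'"{k}"' for k in keys)
    placeholders = ', '.join('?' for _ in keys)
    conflict = ', '.join(f'"{k}"' for k in unique_constraints)
    updates = ', '.join(
        f'"{k}" = excluded."{k}"' for k in keys if k not in unique_constraints
    )
    sql = (
        f'INSERT INTO "{table.name}" ({columns}) VALUES ({placeholders}) '
        f'ON CONFLICT ({conflict}) DO UPDATE SET {updates}'
    )
    conn.executemany(sql, [tuple(row) for row in data_iter])


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE test_update '
    '(passengerid INTEGER PRIMARY KEY, fare REAL, new_col TEXT)'
)
conn.execute("INSERT INTO test_update VALUES (1, 7.25, 'old')")

table = SimpleNamespace(name='test_update')  # stand-in, not a real pandas object
keys = ['passengerid', 'fare', 'new_col']
rows = [(1, 8.05, 'changed'), (2, 71.28, 'inserted')]
upsert_rows(table, conn, keys, iter(rows), unique_constraints=['passengerid'])

result = conn.execute(
    'SELECT passengerid, fare, new_col FROM test_update ORDER BY passengerid'
).fetchall()
print(result)
```

The existing row with passengerid 1 is updated in place and the row with passengerid 2 is inserted, which mirrors the WHEN MATCHED and WHEN NOT MATCHED branches of the MERGE statement.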

Checklist

  • The PR is tagged with proper labels (bug, enhancement, feature, documentation)
  • I have performed a self-review of my own code
  • I have added unit tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • If new documentation has been added, relative paths have been added to the appropriate section of docs/mint.json

cc: @wangxiaoyou1993

@dy46 dy46 added the enhancement Polish or UX improvements label Mar 6, 2024
@dy46 dy46 linked an issue Mar 6, 2024 that may be closed by this pull request
Comment on lines +205 to +252
unique_conflict_method = kwargs.get('unique_conflict_method')
unique_constraints = kwargs.get('unique_constraints')

if unique_conflict_method and unique_constraints:

    def merge_table(
        table, conn, keys, data_iter, unique_constraints=unique_constraints
    ):
        dbapi_conn = conn.connection
        with dbapi_conn.cursor() as cur:
            if table.schema:
                table_name = f'[{table.schema}].[{table.name}]'
            else:
                table_name = table.name

            values_placeholder = ', '.join(['?' for i in range(len(keys))])
            values = [tuple(row) for row in data_iter]
            sql = MERGE_TABLE_SQL.format(
                table_name=table_name,
                values_placeholder=values_placeholder,
                columns=', '.join([f'[{k}]' for k in keys]),
                on_clause=' AND '.join(
                    [f's.[{k}] = t.[{k}]' for k in unique_constraints]
                ),
                insert=', '.join([f'[{c}]' for c in keys]),
                values=', '.join([f's.[{c}]' for c in keys]),
                update=', '.join([f'[{c}] = s.[{c}]' for c in keys]),
            )
            cur.executemany(sql, values)

    if UNIQUE_CONFLICT_METHOD_UPDATE == unique_conflict_method:
        df.to_sql(
            table_name,
            engine,
            schema=schema_name,
            if_exists=if_exists or ExportWritePolicy.APPEND,
            index=False,
            method=merge_table,
        )
        return

df.to_sql(
    table_name,
    engine,
    schema=schema_name,
    if_exists=if_exists or ExportWritePolicy.REPLACE,
    index=False,
)
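
The MERGE_TABLE_SQL constant is not shown in this excerpt. A minimal sketch of what such a template could look like, reconstructed from the query in the description and the format keys used above, follows. The name MERGE_TABLE_SQL_SKETCH and the helper render_merge_sql are hypothetical, and the actual constant in the repository may differ.

```python
# Hypothetical template: the real MERGE_TABLE_SQL is not part of this excerpt.
MERGE_TABLE_SQL_SKETCH = """\
MERGE {table_name} AS t
USING (VALUES
    ({values_placeholder})
) s({columns})
ON {on_clause}
WHEN NOT MATCHED THEN
    INSERT ({insert})
    VALUES ({values})
WHEN MATCHED THEN UPDATE SET
    {update}
;"""


def render_merge_sql(table_name, keys, unique_constraints):
    """Fill the template with the same clause fragments merge_table builds."""
    return MERGE_TABLE_SQL_SKETCH.format(
        table_name=table_name,
        values_placeholder=', '.join('?' for _ in keys),
        columns=', '.join(f'[{k}]' for k in keys),
        on_clause=' AND '.join(
            f's.[{k}] = t.[{k}]' for k in unique_constraints
        ),
        insert=', '.join(f'[{c}]' for c in keys),
        values=', '.join(f's.[{c}]' for c in keys),
        update=', '.join(f'[{c}] = s.[{c}]' for c in keys),
    )


sql = render_merge_sql(
    '[dbo].[test_update]',
    ['passengerid', 'survived'],
    ['passengerid'],
)
print(sql)
```

Rendering the statement separately from executing it makes the generated SQL easy to inspect before it is passed to executemany.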

@dy46 dy46 Mar 6, 2024

This is the main change in this file; the other changes are mostly formatting.

@wangxiaoyou1993 wangxiaoyou1993 left a comment

Can you also update the doc?

@dy46 dy46 merged commit 77d2441 into master Mar 8, 2024
5 checks passed
@dy46 dy46 deleted the 4599-updating-records-in-ms-sql-destination-instead-of-inserting branch March 8, 2024 21:54
@wangxiaoyou1993 wangxiaoyou1993 mentioned this pull request Mar 13, 2024
15 tasks
oonyoontong pushed a commit to bunker-tech/mage-ai that referenced this pull request May 2, 2024
* [dy] Add unique_conflict_method for mssql

* [dy] Update mssql doc
Development

Successfully merging this pull request may close these issues.

Updating records in MS SQL destination instead of inserting
2 participants