New Python clustering tutorial #271
added ML Services Python
from sklearn.cluster import KMeans

# get data from input query
customer_data = my_input_data
You can remove this line after changing @input_data_1_name.
OutputDataSet = customer_data
'
, @input_data_1 = @input_query
, @input_data_1_name = N'my_input_data'
Change to 'customer_data'.
clusters = est.labels_
customer_data["cluster"] = clusters

OutputDataSet = customer_data
You can remove this line and use @output_data_1_name = 'customer_data' instead. This returns the data frame to SQL Server as the result set.
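A minimal local sketch of what the embedded script could look like once both suggestions are applied: SQL Server would bind the input query's rows to a pandas data frame named customer_data (via @input_data_1_name) and read the same variable back (via @output_data_1_name), so the script only needs to add the cluster column. To run outside SQL Server, the sketch builds a small sample frame itself; the helper name assign_clusters, the columns customer and monetary_ratio, the sample values, and n_clusters=2 are hypothetical, not taken from the tutorial.

```python
import pandas as pd
from sklearn.cluster import KMeans


def assign_clusters(customer_data: pd.DataFrame, n_clusters: int = 2) -> pd.DataFrame:
    """Hypothetical helper: fit k-means and add a 'cluster' column in place."""
    # Cluster on every column except the customer key (assumed to be 'customer').
    features = customer_data.drop(columns=["customer"])
    est = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    # Same pattern as the PR: copy the fitted labels onto the data frame.
    customer_data["cluster"] = est.labels_
    return customer_data


# Inside sp_execute_external_script, SQL Server would supply customer_data;
# here a hypothetical sample stands in for the input query's rows.
customer_data = pd.DataFrame({
    "customer": [1, 2, 3, 4, 5, 6],
    "frequency": [1, 2, 1, 20, 21, 22],
    "monetary_ratio": [0.1, 0.2, 0.1, 0.9, 0.8, 0.9],
})
customer_data = assign_clusters(customer_data)

# With @output_data_1_name = N'customer_data', SQL Server would return this
# variable as the result set, so no OutputDataSet assignment is needed.
print(customer_data)
```

Keeping the input and output names identical means the script never has to copy the frame into OutputDataSet; the rename happens in the procedure's parameters instead.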
-- Stored procedure that performs customer clustering using Python and SQL Server ML Services
DROP PROCEDURE IF EXISTS [dbo].[py_generate_customer_return_clusters]
GO
CREATE procedure [dbo].[py_generate_customer_return_clusters]
Change to CREATE OR ALTER and remove the DROP PROCEDURE IF EXISTS.
JOIN
[dbo].[py_customer_clusters] as c
ON c.Customer = customer.c_customer_sk
WHERE c.cluster = 0;
(No newline at end of file)
It looks like there is no newline here, or there is a special character. Can you add a carriage return after the ; to make sure it runs?
"frequency": {"type": "integer"}
}

data_source = RxSqlServerData(sql_query=input_query, column_Info=column_info, connection_string=conn_str)
I think it should be column_info, all lowercase.
}

data_source = RxSqlServerData(sql_query=input_query, column_Info=column_info, connection_string=conn_str)
RxInSqlServer(connection_string=conn_str, num_tasks=1, auto_cleanup=False)
You can remove this line. It isn't needed because we are not using the SQL Server compute context.
print(customer_data.groupby(['cluster']).mean())


perform_clustering()
(No newline at end of file)
I see a special character at the end. Can you add a carriage return at the end?
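A local sketch of the summary step at the end of perform_clustering: once every row carries a cluster label, groupby(['cluster']).mean() reports the average of each feature per cluster, which is what lets you describe each customer segment. The sketch skips RxSqlServerData and reads from an in-memory data frame instead; the helper name summarize_clusters and the sample rows are hypothetical.

```python
import pandas as pd


def summarize_clusters(customer_data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: mean of each numeric column within each cluster."""
    # Same call as in the PR's script; the result is indexed by cluster label.
    return customer_data.groupby(["cluster"]).mean()


# Hypothetical labeled rows, e.g. the output of an earlier k-means fit.
customer_data = pd.DataFrame({
    "frequency": [1, 3, 20, 22],
    "cluster": [0, 0, 1, 1],
})
summary = summarize_clusters(customer_data)
print(summary)
```

Here cluster 0 averages a frequency of 2.0 and cluster 1 averages 21.0, so the two groups separate cleanly on that feature.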
Updated according to comments