Collaborator

NelGson commented Aug 18, 2017

No description provided.

from sklearn.cluster import KMeans

#get data from input query
customer_data = my_input_data
Contributor

You can remove this line after modifying @input_data_1.

OutputDataSet = customer_data
'
, @input_data_1 = @input_query
, @input_data_1_name = N'my_input_data'
Contributor

Change to 'customer_data'.

clusters = est.labels_
customer_data["cluster"] = clusters

OutputDataSet = customer_data
Contributor

You can remove this and use @output_data_1_name = 'customer_data' instead. That returns the dataframe to SQL Server as the output data set.
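To illustrate the two naming suggestions above, here is a small Python simulation of why the aliasing lines become unnecessary. `run_external_script` is a hypothetical helper written only for this note; it is not part of SQL Server or any library. It assumes only the documented behaviour that the input rows are bound to the variable named by @input_data_1_name (default `InputDataSet`) and the result is read from the variable named by @output_data_1_name (default `OutputDataSet`). A list of dicts stands in for the dataframe, and a constant label stands in for `est.labels_`.

```python
def run_external_script(script, input_rows,
                        input_name="InputDataSet",
                        output_name="OutputDataSet"):
    """Hypothetical simulation of the variable binding around an external script."""
    # Bind the input data under the configured name, as the host would.
    namespace = {input_name: input_rows}
    exec(script, namespace)
    # Read the result back from the configured output variable.
    return namespace[output_name]


rows = [{"customer": 1}, {"customer": 2}]

# With both names set to customer_data, the script needs neither
# "customer_data = my_input_data" nor "OutputDataSet = customer_data".
script = """
for row in customer_data:
    row["cluster"] = 0  # placeholder for the real cluster labels
"""

result = run_external_script(
    script, rows,
    input_name="customer_data",
    output_name="customer_data",
)
```

Because the same variable name serves as input and output, whatever the script adds to `customer_data` is what gets returned.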

-- Stored procedure that performs customer clustering using Python and SQL Server ML Services
DROP PROCEDURE IF EXISTS [dbo].[py_generate_customer_return_clusters]
GO
CREATE procedure [dbo].[py_generate_customer_return_clusters]
Contributor

Change to CREATE OR ALTER and remove the DROP PROCEDURE IF EXISTS.

JOIN
[dbo].[py_customer_clusters] as c
ON c.Customer = customer.c_customer_sk
WHERE c.cluster = 0;
\ No newline at end of file
Contributor

It looks like there is no newline here, or there is some special character. Can you add a carriage return after the ; to make sure it runs?
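One way to check for this before pushing is a small script like the sketch below. `ensure_trailing_newline` is a hypothetical helper name, and the sketch assumes a plain line feed (`\n`) is an acceptable terminator; a file that uses Windows line endings may want `\r\n` instead.

```python
import os
import tempfile


def ensure_trailing_newline(path):
    """Append "\\n" if the file is non-empty and lacks one. Returns True if changed."""
    with open(path, "rb") as f:
        data = f.read()
    if not data or data.endswith(b"\n"):
        return False  # empty file or already terminated: nothing to do
    with open(path, "ab") as f:
        f.write(b"\n")
    return True


# Demo on a temporary file that ends right after the semicolon.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.sql")
    with open(demo_path, "wb") as f:
        f.write(b"WHERE c.cluster = 0;")
    first_call = ensure_trailing_newline(demo_path)   # file is modified
    second_call = ensure_trailing_newline(demo_path)  # already fine, no change
    with open(demo_path, "rb") as f:
        final_bytes = f.read()
```

The helper is idempotent, so it can be run over every file in a change without adding extra blank lines.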

"frequency": {"type": "integer"}
}

data_source = RxSqlServerData(sql_query=input_query, column_Info=column_info, connection_string=conn_str)
Contributor

I think it should be column_info, all lowercase.
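The reason the casing matters is that Python keyword arguments are case-sensitive. The sketch below shows this with `make_data_source`, a hypothetical stand-in that takes a lowercase `column_info` parameter. It is not the real RxSqlServerData constructor, and whether that constructor raises or silently ignores an unknown keyword is not confirmed here.

```python
def make_data_source(sql_query=None, column_info=None, connection_string=None):
    """Hypothetical stand-in for a constructor with a lowercase column_info parameter."""
    return {
        "sql_query": sql_query,
        "column_info": column_info,
        "connection_string": connection_string,
    }


column_info = {"frequency": {"type": "integer"}}

# The mixed-case spelling is a different name, so plain Python rejects it.
try:
    make_data_source(sql_query="SELECT 1", column_Info=column_info)
    misspelled_ok = True
except TypeError as exc:
    misspelled_ok = False
    error_text = str(exc)

# The all-lowercase spelling matches the parameter and is accepted.
source = make_data_source(sql_query="SELECT 1", column_info=column_info)
```

If a function accepts arbitrary keyword arguments instead, the misspelled name would not raise and the column types could be dropped without any warning, which is harder to notice.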

}

data_source = RxSqlServerData(sql_query=input_query, column_Info=column_info, connection_string=conn_str)
RxInSqlServer(connection_string=conn_str, num_tasks=1, auto_cleanup=False)
Contributor

You can remove this line. It is not needed since we are not using SQL compute context.

print(customer_data.groupby(['cluster']).mean())


perform_clustering()
\ No newline at end of file
Contributor

I see some special character at the end. Can you add a carriage return at the end?

Collaborator Author

NelGson commented Aug 18, 2017

Updated according to the comments.

@uc-msft uc-msft merged commit 7171589 into microsoft:master Aug 18, 2017

4 participants