microsoft · uc-msft · Apr 20, 2017 · Apr 18, 2017 · Apr 18, 2017 · Apr 18, 2017
diff --git a/...es/machine-learning-services/python/getting-started/rental-prediction/README.md b/...es/machine-learning-services/python/getting-started/rental-prediction/README.md
@@ -0,0 +1,90 @@
+# Build a predictive model with SQL Server Python
+
+This sample shows how to create a predictive model in Python and operationalize it with SQL Server vNext.
+
+### Contents
+
+[About this sample](#about-this-sample)<br/>
+[Before you begin](#before-you-begin)<br/>
+[Sample details](#sample-details)<br/>
+[Related links](#related-links)<br/>
+
+
+<a name=about-this-sample></a>
+
+## About this sample
+
+Predictive modeling is a powerful way to add intelligence to your application. It enables applications to predict outcomes against new data.
+The act of incorporating predictive analytics into your applications involves two major phases: 
+model training and model operationalization.
+
+In this sample, you will learn how to create a predictive model in Python and operationalize it with SQL Server vNext.
+
+
+<!-- Delete the ones that don't apply -->
+- **Applies to:** SQL Server vNext 
+- **Key features:** SQL Server Machine Learning Services
+- **Workload:** SQL Server Machine Learning Services
+- **Programming Language:** T-SQL, Python
+- **Authors:** Nellie Gustafsson
+- **Update history:** Getting started tutorial for SQL Server ML Services - Python 
+
+<a name=before-you-begin></a>
+
+## Before you begin
+
+To run this sample, you need the following prerequisites: <br/>
+Download a DB backup file and restore it using Setup.sql. [Download DB](https://deve2e.azureedge.net/sqlchoice/static/TutorialDB.bak)
+
+**Software prerequisites:**
+
+<!-- Examples -->
+1. SQL Server vNext CTP2.0 (or higher) with Machine Learning Services (Python) installed
+2. SQL Server Management Studio
+3. Python Tools for Visual Studio
+
+## Run this sample
+1. From SQL Server Management Studio or SQL Server Data Tools, connect to your SQL Server vNext instance and execute Setup.sql to restore the sample database you downloaded.
+2. From SQL Server Management Studio or SQL Server Data Tools, open the rental_prediction.sql script. This script:
+    - creates the necessary tables
+    - creates a stored procedure that trains a model
+    - creates a stored procedure that predicts using that model
+    - saves the predicted results to a database table
+3. You can also try the Python script, rental_prediction.py, on its own. Point your Python environment to
+"C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES" if you run the in-database Python Services, or to
+"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER" if you have the standalone Machine Learning Server installed.
+
+<a name=sample-details></a>
+
+## Sample details
+
+This sample shows how to create a predictive model with Python, generate predictions with it, and deploy it in SQL Server using SQL Server Machine Learning Services.
+
+### rental_prediction.py
+The Python script that generates a predictive model and uses it to predict rental counts.
+
+### rental_prediction.sql
+Takes the Python code in rental_prediction.py and deploys it inside SQL Server. It creates the tables that store the models and the predictions, and the stored procedures for training and prediction.
+
+
+
+The stored procedures run the Python code through `sp_execute_external_script`, and the trained model is stored as a pickled `VARBINARY(MAX)` value in the `rental_py_models` table.
+
+<a name=disclaimers></a>
+
+## Disclaimers
+The code included in this sample is not intended to demonstrate general guidance or architectural patterns for predictive modeling.
+It contains the minimal code required to train a model and generate predictions in SQL Server.
+You can easily modify this code to fit the architecture of your application.
+
+
+<a name=related-links></a>
+
+## Related Links
+<!-- Links to more articles. Remember to delete "en-us" from the link path. -->
+
+For additional content, see these articles:
+
+[SQL Server R Services - Upgrade and Installation FAQ](https://msdn.microsoft.com/en-us/library/mt653951.aspx)<br/>
+[Other SQL Server R Services Tutorials](https://msdn.microsoft.com/en-us/library/mt591993.aspx)<br/>
+[Watch a presentation about predictive modeling in SQL Server, that also goes through this sample](https://www.youtube.com/watch?v=YCyj9cdi4Nk&feature=youtu.be)
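
Step 3 of "Run this sample" asks you to point your Python environment at one of two install paths. A quick way to check which interpreter a script actually runs under is to compare `sys.executable` with those paths. This is only a minimal sketch: the helper name `uses_sql_server_python` is hypothetical, and the two roots are the defaults quoted in the README, which may differ for a named instance.

```python
import sys
from pathlib import PureWindowsPath

# Default install locations quoted in the README (assumed; adjust for your instance).
SQL_PYTHON_ROOTS = [
    PureWindowsPath(r"C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES"),
    PureWindowsPath(r"C:\Program Files\Microsoft SQL Server\140\PYTHON_SERVER"),
]


def uses_sql_server_python(executable: str) -> bool:
    """Return True if `executable` lives under one of the known roots.

    PureWindowsPath compares case-insensitively, matching Windows path rules.
    """
    exe = PureWindowsPath(executable)
    return any(root == exe or root in exe.parents for root in SQL_PYTHON_ROOTS)


if __name__ == "__main__":
    # Report the interpreter running this script and whether it matches.
    print(sys.executable, uses_sql_server_python(sys.executable))
```

If the check prints `False`, the script is running under a different interpreter, and the `revoscalepy` imports in rental_prediction.py may not resolve.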
diff --git a/...s/machine-learning-services/python/getting-started/rental-prediction/rental_prediction.py b/...s/machine-learning-services/python/getting-started/rental-prediction/rental_prediction.py
@@ -0,0 +1,69 @@
+import pandas as pd
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import mean_squared_error
+
+from revoscalepy.computecontext.RxInSqlServer import RxInSqlServer
+from revoscalepy.computecontext.RxInSqlServer import RxSqlServerData
+from revoscalepy.etl.RxImport import rx_import_datasource
+
+
+def get_rental_predictions():
+    conn_str = 'Driver=SQL Server;Server=MYSQLSERVER;Database=TutorialDB;Trusted_Connection=True;'
+    column_info = { 
+            "Year" : { "type" : "integer" },
+            "Month" : { "type" : "integer" }, 
+            "Day" : { "type" : "integer" }, 
+            "RentalCount" : { "type" : "integer" }, 
+            "WeekDay" : { 
+                "type" : "factor", 
+                "levels" : ["1", "2", "3", "4", "5", "6", "7"]
+            },
+            "Holiday" : { 
+                "type" : "factor", 
+                "levels" : ["1", "0"]
+            },
+            "Snow" : { 
+                "type" : "factor", 
+                "levels" : ["1", "0"]
+            }
+        }
+
+    data_source = RxSqlServerData(table="dbo.rental_data",
+                                  connectionString=conn_str, colInfo=column_info)
+    # Define a SQL Server compute context. It is created here for reference
+    # but is not used again in this script.
+    compute_context = RxInSqlServer(
+        connectionString=conn_str,
+        numTasks=1,
+        autoCleanup=False
+    )
+
+
+    # import data source and convert to pandas dataframe
+    df = pd.DataFrame(rx_import_datasource(data_source))
+    print("Data frame:", df)
+    # Store the variable we'll be predicting.
+    target = "RentalCount"
+    # Get all the columns from the dataframe.
+    columns = df.columns.tolist()
+    # Keep only feature columns: drop "Year" and the target itself.
+    columns = [c for c in columns if c not in ["Year", target]]
+    # Generate the training set.  Set random_state to be able to replicate results.
+    train = df.sample(frac=0.8, random_state=1)
+    # Select anything not in the training set and put it in the testing set.
+    test = df.loc[~df.index.isin(train.index)]
+    # Print the shapes of both sets.
+    print("Training set shape:", train.shape)
+    print("Testing set shape:", test.shape)
+    # Initialize the model class.
+    lin_model = LinearRegression()
+    # Fit the model to the training data.
+    lin_model.fit(train[columns], train[target])
+    # Generate our predictions for the test set.
+    lin_predictions = lin_model.predict(test[columns])
+    print("Predictions:", lin_predictions)
+    # Compute error between our test predictions and the actual values.
+    lin_mse = mean_squared_error(test[target], lin_predictions)
+    print("Computed error:", lin_mse)
+
+get_rental_predictions()
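
The split and error steps in rental_prediction.py can be illustrated without SQL Server, pandas, or scikit-learn. The sketch below uses only the standard library. `split_rows` and `mse` are hypothetical stand-ins for `DataFrame.sample(frac=0.8, random_state=1)` and `sklearn.metrics.mean_squared_error`, so the rows selected will not match what pandas picks, and the input values are made up for illustration.

```python
import random


def split_rows(rows, frac=0.8, seed=1):
    """Shuffle row indices reproducibly, then split into train and test lists."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = int(len(rows) * frac)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


rows = list(range(10))            # illustrative row ids, not sample data
train, test = split_rows(rows)
print(len(train), len(test))      # sizes of the two sets
print(mse([3, 5, 7], [2, 5, 9]))  # illustrative values only
```

Every row lands in exactly one of the two sets, which is the same guarantee the script gets from `~df.index.isin(train.index)`.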
diff --git a/.../machine-learning-services/python/getting-started/rental-prediction/rental_prediction.sql b/.../machine-learning-services/python/getting-started/rental-prediction/rental_prediction.sql
@@ -0,0 +1,147 @@
+
+USE TutorialDB;
+
+-- Table containing ski rental data
+SELECT * FROM [dbo].[rental_data];
+
+
+
+-------------------------- STEP 1 - Setup model table ----------------------------------------
+DROP TABLE IF EXISTS rental_py_models;
+GO
+CREATE TABLE rental_py_models (
+                model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
+                model VARBINARY(MAX) NOT NULL
+);
+GO
+
+
+-------------------------- STEP 2 - Train model ----------------------------------------
+-- Stored procedure that trains a Python model on rental_data using a linear regression algorithm
+DROP PROCEDURE IF EXISTS generate_rental_py_model;
+GO
+CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
+AS
+BEGIN
+    EXECUTE sp_execute_external_script
+      @language = N'Python'
+    , @script = N'
+
+df = rental_train_data
+
+# Get all the columns from the dataframe.
+columns = df.columns.tolist()
+
+
+# Store the variable we will be predicting.
+target = "RentalCount"
+
+from sklearn.linear_model import LinearRegression
+
+# Initialize the model class.
+lin_model = LinearRegression()
+# Fit the model to the training data.
+lin_model.fit(df[columns], df[target])
+
+import pickle
+# Before saving the model to the DB table, we need to convert it to a binary object
+trained_model = pickle.dumps(lin_model)
+'
+
+    , @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
+    , @input_data_1_name = N'rental_train_data'
+    , @params = N'@trained_model varbinary(max) OUTPUT'
+    , @trained_model = @trained_model OUTPUT;
+END;
+GO
+
+------------------- STEP 3 - Save model to table -------------------------------------
+TRUNCATE TABLE rental_py_models;
+
+DECLARE @model VARBINARY(MAX);
+EXEC generate_rental_py_model @model OUTPUT;
+
+INSERT INTO rental_py_models (model_name, model) VALUES('linear_model', @model);
+
+SELECT * FROM rental_py_models;
+
+
+
+------------------ STEP 4  - Use the model to predict number of rentals --------------------------
+DROP PROCEDURE IF EXISTS py_predict_rentalcount;
+GO
+CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
+AS
+BEGIN
+	DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);
+
+	EXEC sp_execute_external_script 
+					@language = N'Python'
+				  , @script = N'
+
+
+import pickle
+rental_model = pickle.loads(py_model)
+
+
+df = rental_score_data
+#print(df)
+
+# Get all the columns from the dataframe.
+columns = df.columns.tolist()
+# Filter the columns to remove ones we do not want.
+# columns = [c for c in columns if c not in ["Year"]]
+
+# Store the variable we will be predicting.
+target = "RentalCount"
+
+# Generate our predictions for the test set.
+lin_predictions = rental_model.predict(df[columns])
+print(lin_predictions)
+
+# Import the scikit-learn function to compute error.
+from sklearn.metrics import mean_squared_error
+# Compute error between our test predictions and the actual values.
+lin_mse = mean_squared_error(df[target], lin_predictions)
+#print(lin_mse)
+
+import pandas as pd
+predictions_df = pd.DataFrame(lin_predictions)  
+OutputDataSet = pd.concat([predictions_df, df["RentalCount"], df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"], df["Year"]], axis=1)
+'
+	, @input_data_1 = N'Select "RentalCount", "Year" ,"Month", "Day", "WeekDay", "Snow", "Holiday"  from rental_data where Year = 2015'
+	, @input_data_1_name = N'rental_score_data'
+	, @params = N'@py_model varbinary(max)'
+	, @py_model = @py_model
+	with result sets (("RentalCount_Predicted" float, "RentalCount" float, "Month" float,"Day" float,"WeekDay" float,"Snow" float,"Holiday" float, "Year" float));
+
+END;
+GO
+
+
+---------------- STEP 5 - Create DB table to store predictions -----------------------
+DROP TABLE IF EXISTS [dbo].[py_rental_predictions];
+GO
+--Create a table to store the predictions in
+CREATE TABLE [dbo].[py_rental_predictions](
+	[RentalCount_Predicted] [int] NULL,
+	[RentalCount_Actual] [int] NULL,
+	[Month] [int] NULL,
+	[Day] [int] NULL,
+	[WeekDay] [int] NULL,
+	[Snow] [int] NULL,
+	[Holiday] [int] NULL,
+	[Year] [int] NULL
+) ON [PRIMARY]
+GO
+
+
+---------------- STEP 6 - Save the predictions in a DB table -----------------------
+TRUNCATE TABLE py_rental_predictions;
+--Insert the results of the predictions for test set into a table
+INSERT INTO py_rental_predictions
+EXEC py_predict_rentalcount 'linear_model';
+
+-- Select contents of the table
+SELECT * FROM py_rental_predictions;
+
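
The two stored procedures hand the model to each other as bytes: STEP 2 calls `pickle.dumps`, the result is stored in the `VARBINARY(MAX)` column, and STEP 4 calls `pickle.loads` on it. The round trip can be sketched outside SQL Server. To stay runnable without scikit-learn, the model here is a plain dict of made-up coefficients and `predict` is a hypothetical helper, not the `LinearRegression` object the sample pickles.

```python
import pickle


def predict(model, rows):
    """Apply y = intercept + sum(coef_i * x_i) to each row."""
    return [
        model["intercept"] + sum(c * x for c, x in zip(model["coef"], row))
        for row in rows
    ]


# Illustrative parameters standing in for a fitted model.
model = {"coef": [2.0, -1.0], "intercept": 10.0}

# Serialize to bytes: the kind of value the VARBINARY(MAX) column holds.
blob = pickle.dumps(model)

# Deserialize, as the prediction procedure does with pickle.loads(py_model).
restored = pickle.loads(blob)

print(type(blob).__name__, predict(restored, [[1.0, 3.0]]))
```

Because `pickle.loads` can execute arbitrary code when given untrusted bytes, it is worth limiting who can write to the models table.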
diff --git a/samples/features/r-services/README.md b/samples/features/r-services/README.md
@@ -1,12 +1,10 @@
-# Samples for SQL Server R Services
+# Samples for SQL Server Machine Learning Services
 
-Go to the getting started tutorials to learn more about:
 
-[Predictive Modeling with R Services](https://www.microsoft.com/en-us/sql-server/developer-get-started/rprediction)
+Go to the getting started tutorials to learn more about:
 
 [Customer Clustering with R Services](https://www.microsoft.com/en-us/sql-server/developer-get-started/rclustering)
 
-
 [Telco Customer Churn](Telco Customer Churn)
 
 Telco Customer Churn sample using SQL Server R Services.

diff --git a/samples/features/r-services/getting-started/rental-prediction/Predictive Model.R b/samples/features/r-services/getting-started/rental-prediction/Predictive Model.R
@@ -0,0 +1,66 @@
+
+##################### STEP1 - Connect to DB and read data ####################
+
+#Connection string to connect to SQL Server named instance
+connStr <- paste("Driver=SQL Server; Server=", "MYSQLSERVER", 
+                ";Database=", "Tutorialdb", ";Trusted_Connection=true;", sep = "");
+
+#Get the data from SQL Server Table
+SQL_rentaldata <- RxSqlServerData(table = "dbo.rental_data",
+                              connectionString = connStr, returnDataFrame = TRUE);
+
+#Import the data into a data frame
+rentaldata <- rxImport(SQL_rentaldata);
+
+#Let's see the structure of the data and the top rows
+# Ski rental data, giving the number of ski rentals on a given date
+head(rentaldata);
+
+
+##################### STEP2 - Clean and prepare the data ####################
+
+#Changing the three factor columns to factor types
+#This helps when building the model because we are explicitly saying that these values are categorical
+rentaldata$Holiday <- factor(rentaldata$Holiday);
+rentaldata$Snow <- factor(rentaldata$Snow);
+rentaldata$WeekDay <- factor(rentaldata$WeekDay);
+
+#Visualize the dataset after the change
+str(rentaldata);
+
+##################### STEP3 - train model ####################
+
+#Now let's split the dataset into 2 different sets
+#One set for training the model and the other for validating it
+train_data = rentaldata[rentaldata$Year < 2015,];
+test_data = rentaldata[rentaldata$Year == 2015,];
+
+#Use this column to check the quality of the prediction against actual values
+actual_counts <- test_data$RentalCount;
+
+#Model 1: Use rxLinMod to create a linear regression model. We are training the model using the training data set
+model_linmod <- rxLinMod(RentalCount ~  Month + Day + WeekDay + Snow + Holiday, data = train_data);
+
+#Model 2: Use rxDTree to create a decision tree model. We are training the model using the training data set
+model_dtree <- rxDTree(RentalCount ~ Month + Day + WeekDay + Snow + Holiday, data = train_data);
+
+
+#################### STEP4 - Predict using the models ########################
+
+#Use the models we just created to predict using the test data set.
+#That enables us to compare the predicted values of RentalCount from the two models with the actual values in the test data set
+predict_linmod <- rxPredict(model_linmod, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));
+
+predict_dtree <- rxPredict(model_dtree, test_data, writeModelVars = TRUE, extraVarsToWrite = c("Year"));
+
+#Look at the top rows of the two prediction data sets.
+head(predict_linmod);
+head(predict_dtree);
+
+#################### STEP5 - Compare models ########################
+#Now we will use the plotting functionality in R to visualize the results from the predictions
+#We are plotting the difference between actual and predicted values for both models to compare accuracy
+par(mfrow = c(2, 1));
+plot(predict_linmod$RentalCount_Pred - predict_linmod$RentalCount, main = "Difference between actual and predicted. rxLinmod");
+plot(predict_dtree$RentalCount_Pred - predict_dtree$RentalCount, main = "Difference between actual and predicted. rxDTree");
+
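
STEP5 of the R script compares the two models visually by plotting predicted minus actual values. A numeric counterpart is sketched below in Python. The helper names are hypothetical, and the three lists are made-up illustration values, not results from the rental data.

```python
def residuals(predicted, actual):
    """Predicted minus actual, the same quantity the R script plots."""
    return [p - a for p, a in zip(predicted, actual)]


def mean_absolute_error(predicted, actual):
    """Average size of the residuals, ignoring their sign."""
    res = residuals(predicted, actual)
    return sum(abs(r) for r in res) / len(res)


# Illustrative values only.
actual = [100, 150, 200]
pred_model_a = [110, 140, 205]
pred_model_b = [90, 180, 200]

print(residuals(pred_model_a, actual))
print(mean_absolute_error(pred_model_a, actual))
print(mean_absolute_error(pred_model_b, actual))
```

A smaller mean absolute error corresponds to points that sit closer to zero in the residual plots.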