![](MSFTLogo.png)
# SQL Server Machine Learning Services
## 04 - Train and save a Python model using T-SQL

In this notebook, you train a machine learning model using the Python packages *scikit-learn* and *revoscalepy*. Both packages are installed with SQL Server Machine Learning Services.

You load the modules and call the necessary functions to create and train the model using a SQL Server stored procedure. The model requires the data features you engineered in earlier lessons. Finally, you save the trained model to a SQL Server table.

## Split the sample data into training and testing sets

The following code creates a stored procedure called *PyTrainTestSplit* that divides the data in the *nyctaxi_sample* table into two tables: *nyctaxi_sample_training* and *nyctaxi_sample_testing*:


In [1]:
USE NYCTaxi;
GO

DROP PROCEDURE IF EXISTS PyTrainTestSplit;
GO

CREATE PROCEDURE [dbo].[PyTrainTestSplit] (@pct int)
AS

-- Rows whose checksum bucket (0-99) is below @pct go to the training set
DROP TABLE IF EXISTS dbo.nyctaxi_sample_training
SELECT * into nyctaxi_sample_training FROM nyctaxi_sample
WHERE (ABS(CAST(BINARY_CHECKSUM(medallion,hack_license)  as int)) % 100) < @pct

-- All remaining rows (bucket >= @pct) go to the testing set, so no row is dropped
DROP TABLE IF EXISTS dbo.nyctaxi_sample_testing
SELECT * into nyctaxi_sample_testing FROM nyctaxi_sample
WHERE (ABS(CAST(BINARY_CHECKSUM(medallion,hack_license)  as int)) % 100) >= @pct
GO

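To divide your data using a custom split, run the stored procedure and pass an integer that represents the percentage of data allocated to the training set. The share is approximate, because rows are assigned by a checksum of the *medallion* and *hack_license* columns rather than by row count. For example, the following statement allocates about 60% of the data to the training set: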

In [2]:
EXEC PyTrainTestSplit 60;
GO
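
The bucketing logic can be checked outside the database. The following is a minimal sketch, not the stored procedure itself: it assumes `zlib.crc32` as a stand-in for `BINARY_CHECKSUM` (the two functions produce different values), and the `rows` sample and the `bucket` and `split` helpers are hypothetical.

```python
import zlib

def bucket(medallion, hack_license):
    """Map a (medallion, hack_license) pair to a stable bucket in 0..99.

    zlib.crc32 is only a stand-in for T-SQL BINARY_CHECKSUM.
    """
    key = "{}|{}".format(medallion, hack_license).encode("utf-8")
    return zlib.crc32(key) % 100

def split(rows, pct):
    """Send buckets below pct to training and all other buckets to testing."""
    training = [r for r in rows if bucket(*r) < pct]
    testing = [r for r in rows if bucket(*r) >= pct]
    return training, testing

# Hypothetical sample: 1,000 made-up (medallion, hack_license) pairs
rows = [("M{:04d}".format(i), "H{:04d}".format(i * 7 % 1000)) for i in range(1000)]
training, testing = split(rows, 60)

# Every row lands in exactly one of the two sets
print(len(training) + len(testing) == len(rows))
```

Because the bucket depends only on the key columns, all trips with the same medallion and driver fall on the same side of the split, and repeated runs produce the same split.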

After the data has been prepared, you can use it to train a model. You do this by calling a stored procedure that runs Python code and takes the training data table as input. For this tutorial, you create two models, both binary classification models:

- The stored procedure *PyTrainScikit* creates a tip prediction model using the *scikit-learn* package
- The stored procedure *TrainTipPredictionModelRxPy* creates a tip prediction model using the *revoscalepy* package

Each stored procedure uses the input data you provide to create and train a logistic regression model. All Python code is wrapped in the system stored procedure, *sp_execute_external_script*. To make it easier to retrain the model on new data, you wrap the call to *sp_execute_external_script* in another stored procedure, and pass in the new training data as a parameter. This section will walk you through that process.
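
For orientation, a logistic regression model turns a weighted sum of the features into a probability between 0 and 1. The sketch below shows that calculation with the standard library only. The coefficient and intercept values are made up for illustration and are not the values either stored procedure produces, and the `predict_tip_probability` helper is hypothetical.

```python
import math

FEATURES = ["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]

def predict_tip_probability(row, coefficients, intercept):
    """Return P(tipped = 1) for one trip using the logistic (sigmoid) function."""
    z = intercept + sum(coefficients[name] * row[name] for name in FEATURES)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only to make the example run
coefficients = {
    "passenger_count": -0.03,
    "trip_distance": 0.04,
    "trip_time_in_secs": 0.0001,
    "direct_distance": 0.02,
}
intercept = 0.03

# Hypothetical trip record with the four feature columns
trip = {"passenger_count": 1, "trip_distance": 2.5, "trip_time_in_secs": 600, "direct_distance": 1.8}
probability = predict_tip_probability(trip, coefficients, intercept)
print(round(probability, 3))
```

Training a model amounts to finding the coefficients and intercept that best match the *tipped* labels in the training table.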

## PyTrainScikit

The following code creates the stored procedure *PyTrainScikit*. The stored procedure contains a definition of the input data, so you don't need to provide an input query:

In [3]:
DROP PROCEDURE IF EXISTS PyTrainScikit;
GO

CREATE PROCEDURE [dbo].[PyTrainScikit] (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXEC sp_execute_external_script
  @language = N'Python',
  @script = N'
import numpy
import pickle
from sklearn.linear_model import LogisticRegression

##Create SciKit-Learn logistic regression model
X = InputDataSet[["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]]
y = numpy.ravel(InputDataSet[["tipped"]])

SKLalgo = LogisticRegression()
logitObj = SKLalgo.fit(X, y)

##Serialize model
trained_model = pickle.dumps(logitObj)
',
@input_data_1 = N'
select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance, 
dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
from nyctaxi_sample_training
',
@input_data_1_name = N'InputDataSet',
@params = N'@trained_model varbinary(max) OUTPUT',
@trained_model = @trained_model OUTPUT;
END;
GO

In [4]:
-- Insert the trained model into table nyc_taxi_models:

DECLARE @model VARBINARY(MAX);
EXEC PyTrainScikit @model OUTPUT;
INSERT INTO nyc_taxi_models (name, model) VALUES('SciKit_model', @model);
GO
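
The `varbinary(max)` value stored in *nyc_taxi_models* is the byte string returned by `pickle.dumps`. As a minimal sketch of that round trip, the following uses a plain dictionary as a hypothetical stand-in for the fitted model object, so it runs without scikit-learn or SQL Server.

```python
import pickle

# Hypothetical stand-in for a fitted model; the stored procedure pickles logitObj
model_stand_in = {"algorithm": "LogisticRegression", "coef": [0.1, -0.2, 0.3, 0.4]}

# Serialize to bytes, which is what the OUTPUT parameter carries back to T-SQL
trained_model = pickle.dumps(model_stand_in)

# Deserialize, as a scoring script would do after reading the model column
restored = pickle.loads(trained_model)

print(type(trained_model).__name__, restored == model_stand_in)
```

Only unpickle models read from a table you trust, because `pickle.loads` can execute arbitrary code.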


In [5]:
-- Show the model - note that the code above could have specified more columns to serve as a model version:

SELECT * FROM nyc_taxi_models;
GO


model,name
0x800363736B6C6561726E2E6C696E6561725F6D6F64656C2E6C6F676973746963... (remaining bytes omitted),SciKit_model
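
The value in the *model* column is not opaque: it is the pickle byte stream displayed as hexadecimal. As a small illustration, the following decodes the first bytes of the output above using only the standard library. The variable names are hypothetical.

```python
# First bytes of the model column shown above, without the leading 0x
prefix_hex = "800363736B6C6561726E2E6C696E6561725F6D6F64656C2E6C6F676973746963"
prefix = bytes.fromhex(prefix_hex)

# 0x80 is the pickle PROTO opcode; the next byte is the protocol version
proto_opcode = prefix[0]
protocol_version = prefix[1]

# 0x63 ("c") is the pickle GLOBAL opcode, followed by the module that defines the class
global_opcode = prefix[2:3]
module_name = prefix[3:].decode("ascii")

print(hex(proto_opcode), protocol_version, global_opcode, module_name)
```

The module name decodes to `sklearn.linear_model.logistic`, which tells you the row holds a pickled scikit-learn object written with pickle protocol 3.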


## TrainTipPredictionModelRxPy

The next stored procedure uses the *revoscalepy* package, which is a new package for Python. It contains objects, transformations, and algorithms similar to those provided by the RevoScaleR package for the R language.

By using *revoscalepy*, you can create remote compute contexts, move data between compute contexts, transform data, and train predictive models using popular algorithms such as logistic and linear regression, decision trees, and more. For more information, see the articles "revoscalepy module in SQL Server" and "revoscalepy function reference".

The following statement creates the stored procedure *TrainTipPredictionModelRxPy*. Because the stored procedure already includes a definition of the input data, you don't need to provide an input query:

In [6]:
DROP PROCEDURE IF EXISTS TrainTipPredictionModelRxPy;
GO

CREATE PROCEDURE [dbo].[TrainTipPredictionModelRxPy] (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXEC sp_execute_external_script 
  @language = N'Python',
  @script = N'
import pickle
from revoscalepy.functions.RxLogit import rx_logit

## Create a logistic regression model using the rx_logit function from the revoscalepy package
logitObj = rx_logit("tipped ~ passenger_count + trip_distance + trip_time_in_secs + direct_distance", data = InputDataSet)

## Serialize model
trained_model = pickle.dumps(logitObj)
',
@input_data_1 = N'
select tipped, fare_amount, passenger_count, trip_time_in_secs, trip_distance, 
dbo.fnCalculateDistance(pickup_latitude, pickup_longitude,  dropoff_latitude, dropoff_longitude) as direct_distance
from nyctaxi_sample_training
',
@input_data_1_name = N'InputDataSet',
@params = N'@trained_model varbinary(max) OUTPUT',
@trained_model = @trained_model OUTPUT;
END;
GO

This stored procedure performs the following steps as part of model training:

- The **SELECT** query applies the custom scalar function *fnCalculateDistance* to calculate the direct distance between the pick-up and drop-off locations. The results of the query are stored in the default Python input variable, *InputDataSet*.
- The binary variable *tipped* is used as the label or outcome column, and the model is fit using these feature columns: *passenger_count*, *trip_distance*, *trip_time_in_secs*, and *direct_distance*.
- The trained model is held in the Python variable *logitObj* and then serialized with `pickle.dumps` into the variable *trained_model*. By adding the T-SQL keyword **OUTPUT**, you return that variable as an output of the stored procedure. In the next step, that variable is used to insert the binary code of the model into the database table *nyc_taxi_models*. This mechanism makes it easy to store and reuse models.
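
The formula string passed to `rx_logit` follows the pattern `label ~ feature + feature + ...`. As a minimal sketch, the formula can be assembled from a label and a feature list with plain string handling. The `build_formula` helper is hypothetical and is not part of *revoscalepy*.

```python
def build_formula(label, features):
    """Build an R-style formula string such as 'y ~ a + b'."""
    if not features:
        raise ValueError("at least one feature column is required")
    return "{} ~ {}".format(label, " + ".join(features))

# Label and feature columns taken from the stored procedure above
features = ["passenger_count", "trip_distance", "trip_time_in_secs", "direct_distance"]
formula = build_formula("tipped", features)
print(formula)
```

Keeping the feature names in one list makes it easier to keep the input query and the formula in agreement when columns change.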

The following code inserts the trained *revoscalepy* model into the table *nyc_taxi_models* and then checks the contents. You should now have two models:

In [7]:
DECLARE @model VARBINARY(MAX);
EXEC TrainTipPredictionModelRxPy @model OUTPUT;
INSERT INTO nyc_taxi_models (name, model) VALUES('revoscalepy_model', @model);
GO

-- Check the models
SELECT * FROM nyc_taxi_models;
GO

model,name
0x800363736B6C6561726E2E6C696E6561725F6D6F64656C2E6C6F676973746963... (remaining bytes omitted),SciKit_model
0x8003637265766F7363616C6570792E66756E6374696F6E732E52784C6F676974... (remaining bytes omitted),revoscalepy_model


Now proceed to the **05 - Run Batch and Single-score Predictions using T-SQL** Jupyter Notebook.