![](MSFTLogo.png)
# SQL Server Machine Learning Services
## 01 - Installation, Overview and Setup

Machine Learning Services is a feature in SQL Server that lets you run Python and R scripts against relational data. You can use open-source packages and frameworks, as well as the Microsoft Python and R packages, for predictive analytics and machine learning. The scripts are executed in-database, without moving data outside SQL Server or over the network. This notebook explains the basics of SQL Server Machine Learning Services and how to get started.

These Jupyter Notebooks walk through the well-known New York City Taxi example for predicting tips. You'll use Python on the SQL Server, but you can also use R or other languages. [You can learn more about this example, and get the database backup you need, at this reference.](https://docs.microsoft.com/en-us/sql/machine-learning/tutorials/demo-data-nyctaxi-in-sql?view=sql-server-ver15)

You'll begin by enabling SQL Server Machine Learning Services. After you enable the feature, you need to restart the instance so that the feature becomes functional.

In [1]:
-- Examine the current settings on your SQL Server Instance
sp_configure

name,minimum,maximum,config_value,run_value
allow polybase export,0,1,0,0
allow updates,0,1,0,0
backup checksum default,0,1,0,0
backup compression default,0,1,0,0
clr enabled,0,1,0,0
column encryption enclave type,0,2,0,0
contained database authentication,0,1,0,0
cross db ownership chaining,0,1,0,0
default language,0,9999,0,0
external scripts enabled,0,1,0,0


In [2]:
-- Enable SQL Server Machine Learning Services
EXEC sp_configure  'external scripts enabled', 1
RECONFIGURE WITH OVERRIDE

-- Next, restart the Instance before you run the next cell

In [1]:
-- Confirm that the setting is active after the restart (run_value should be 1)
EXECUTE sp_configure  'external scripts enabled'

name,minimum,maximum,config_value,run_value
external scripts enabled,0,1,1,1
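
As an alternative to `sp_configure`, the same setting can be read from the `sys.configurations` catalog view. The query below is a minimal sketch: `value` corresponds to the configured value and `value_in_use` to the running value, so both should be 1 once the restart has completed. The column aliases are chosen here only to mirror the `sp_configure` output.

```sql
-- Read the configured and running values of the setting from the catalog view.
SELECT name,
       CAST(value AS int)        AS config_value,  -- value set by sp_configure
       CAST(value_in_use AS int) AS run_value      -- value currently in effect
FROM sys.configurations
WHERE name = N'external scripts enabled';
```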


In [2]:
-- Check R, and then Python

EXEC sp_execute_external_script  @language =N'R',
@script=N'
OutputDataSet <- InputDataSet;
',
@input_data_1 =N'SELECT 1 AS R_Is_Functional'
WITH RESULT SETS (([R_Is_Functional] int not null));
GO

EXEC sp_execute_external_script  @language =N'Python',
@script=N'
OutputDataSet = InputDataSet;
',
@input_data_1 =N'SELECT 1 AS Python_Is_Functional'
WITH RESULT SETS (([Python_Is_Functional] int not null));
GO

R_Is_Functional
1


Python_Is_Functional
1
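
Once both round-trip checks succeed, it can be useful to see which interpreter versions the instance is running. The sketch below assumes both the R and Python runtimes are installed. It prints each version string from inside the external script, and the text should appear in the messages output rather than as a result set.

```sql
-- Print the R version used by the instance.
EXEC sp_execute_external_script @language = N'R',
@script = N'
print(R.version.string)
';
GO

-- Print the Python version used by the instance.
EXEC sp_execute_external_script @language = N'Python',
@script = N'
import sys
print(sys.version)
';
GO
```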


## Download and install the Taxi Database

The sample database is a SQL Server 2016 backup file hosted by Microsoft. You can restore it on SQL Server 2016 and later. The file download begins immediately when you click the link. (The file size is approximately 90 MB.)

1. Click [NYCTaxi_Sample.bak](https://sqlmldoccontent.blob.core.windows.net/sqlml/NYCTaxi_Sample.bak) to download the database backup file.
2. Copy the file to the **C:\Program Files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup** folder.
3. In Management Studio, right-click **Databases** and select **Restore Files and Filegroups**.
4. Enter **NYCTaxi** as the database name.
5. Click **From device** and then open the file selection page to select the backup file. Click **Add** to select **NYCTaxi_Sample.bak**.
6. Select the **Restore** checkbox and click **OK** to restore the database.
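
The Management Studio steps above can also be sketched in T-SQL. This is a minimal example, not the only way to restore the backup. It assumes the file was copied to the backup folder named in step 2, and `MSSQL-instance-name` remains a placeholder that you replace with your own instance folder. If the file paths recorded in the backup do not exist on your server, add `MOVE` clauses to the `RESTORE DATABASE` statement, using the logical file names reported by `RESTORE FILELISTONLY`.

```sql
-- List the logical and physical file names stored in the backup.
-- Replace MSSQL-instance-name with the folder name of your instance.
RESTORE FILELISTONLY
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup\NYCTaxi_Sample.bak';
GO

-- Restore the backup as a database named NYCTaxi and report progress every 10 percent.
RESTORE DATABASE NYCTaxi
FROM DISK = N'C:\Program Files\Microsoft SQL Server\MSSQL-instance-name\MSSQL\Backup\NYCTaxi_Sample.bak'
WITH RECOVERY, STATS = 10;
GO
```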

Run a few queries to examine the database contents:

In [3]:
USE NYCTaxi;
GO

SELECT TOP(10) * FROM dbo.nyctaxi_sample;
SELECT COUNT(*) FROM dbo.nyctaxi_sample;

medallion,hack_license,vendor_id,rate_code,store_and_fwd_flag,pickup_datetime,dropoff_datetime,passenger_count,trip_time_in_secs,trip_distance,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,payment_type,fare_amount,surcharge,mta_tax,tolls_amount,total_amount,tip_amount,tipped,tip_class
C6F86DFD1C85EEFBFC234FCB953E1D4B,79C37DA10EA88D6B467F9FA2B29F8006,CMT,1,N,2013-11-17 12:26:45.000,2013-11-17 12:32:02.000,1,316,0.8,-73.98864,40.748764,-74.002075,40.754913,CSH,5.5,0,0.5,0,6,0,0,0
C8B42D06B8961C18CE435A51FE6A3C0B,59A32B69453C9E2EAAFC4C822A914DC9,CMT,1,N,2013-11-20 06:33:57.000,2013-11-20 06:38:47.000,1,289,0.8,-73.980888,40.753613,-73.977753,40.763477,CSH,5.5,0,0.5,0,6,0,0,0
C291E4F982CE74F51D4C388CD2595B80,3BD04BB028B2145779C0D3F6DD42ACD6,CMT,1,N,2013-12-21 12:57:03.000,2013-12-21 13:02:26.000,1,322,0.8,-73.986847,40.750877,-73.979164,40.757526,CSH,5.5,0,0.5,0,6,0,0,0
C2B9067BB92017120FEF9865217E5E53,7D2921FDCC869190E736D3B731C66DC5,CMT,1,N,2013-12-08 11:09:52.000,2013-12-08 11:14:13.000,1,260,0.8,-73.978096,40.72953,-73.988503,40.736359,CSH,5.5,0,0.5,0,6,0,0,0
BF0445196F40892C2E39501333833245,0B071535952183F132EB38B643E2252E,CMT,1,N,2013-10-21 10:58:58.000,2013-10-21 11:03:50.000,1,291,0.8,-73.987923,40.769955,-73.981415,40.780495,CSH,5.5,0,0.5,0,6,0,0,0
BF4E062AE5D0C2EF0B82C4A6EBA19CA9,55896BD722D3504FDA94C38A17603EC1,CMT,1,N,2013-10-31 13:38:26.000,2013-10-31 13:43:59.000,1,332,0.8,-74.000748,40.718441,-73.988609,40.718552,CSH,5.5,0,0.5,0,6,0,0,0
C34BAE8B360BEFDAFEA5B28CDEC56586,40411995121B7F36031730CB5BDB6906,CMT,1,N,2013-12-14 17:42:00.000,2013-12-14 17:47:47.000,1,347,0.8,-73.971077,40.788174,-73.979683,40.783722,CSH,5.5,0,0.5,0,6,0,0,0
C34CEB1F63F59A43297B9DFB35F9C561,F6D14E7742FE7EFFCE16E55CB62274FC,CMT,1,N,2013-12-07 13:06:17.000,2013-12-07 13:11:29.000,1,312,0.8,-74.014412,40.703903,-74.003571,40.706619,CSH,5.5,0,0.5,0,6,0,0,0
CB86B19E3AAE896309FCD5226B0AEFB5,56141DE00CE0C3D480782C196B535BC3,CMT,1,N,2013-11-02 14:58:16.000,2013-11-02 15:03:01.000,1,285,0.8,-73.994019,40.751293,-73.981377,40.746857,CSH,5.5,0,0.5,0,6,0,0,0
C2009A8A5FDFE471692880EE0F2476EF,56077F7CDA24CCB1E776E36C1D471170,CMT,1,N,2013-10-06 09:42:16.000,2013-10-06 09:47:06.000,1,289,0.8,-74.010429,40.71891,-74.008064,40.709061,CSH,5.5,0,0.5,0,6,0,0,0


(No column name)
1703957


In [4]:
SELECT [passenger_count] -- GROUP BY already returns one row per passenger_count, so DISTINCT is not needed
    , ROUND (SUM ([fare_amount]),0) as TotalFares
    , ROUND (AVG ([fare_amount]),0) as AvgFares
FROM [dbo].[nyctaxi_sample]
GROUP BY [passenger_count]
ORDER BY  AvgFares DESC;
GO


passenger_count,TotalFares,AvgFares
0,388,43
3,923924,13
4,462841,13
2,2971879,13
1,14559642,12
5,1213683,12
6,832558,12
8,8,8


## Completing the Data Exploration in T-SQL

You can now use additional Transact-SQL statements to explore the dataset more fully. The data now resides in a secure, high-performance relational system, where you can train models, store and version them, and score data, all using Python, R, and other languages. This gives the Database Administrator control over the platform and gives the Data Scientist full flexibility to create and implement models.
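
As one example of further exploration, the sketch below compares tipped and non-tipped trips, since tips are what the later notebooks predict. It uses the `tipped` and `tip_amount` columns shown in the sample rows above. The aliases `trips` and `avg_tip` are illustrative names, not part of the dataset.

```sql
USE NYCTaxi;
GO

-- Count trips and average the tip amount for each value of the tipped flag.
SELECT [tipped]
    , COUNT(*) AS trips
    , ROUND(AVG([tip_amount]), 2) AS avg_tip
FROM [dbo].[nyctaxi_sample]
GROUP BY [tipped]
ORDER BY [tipped];
GO
```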

Now proceed to the **02 - Explore and visualize the data using Python** Jupyter Notebook.