# Read and write from Spark to SQL using the MSSQL JDBC Connector
In a typical big data scenario, a key usage pattern is processing high-volume, high-velocity, and high-variety data in Spark, followed by batch or streaming writes to SQL so that line-of-business (LOB) applications can access the results. This pattern benefits greatly from a connector that uses key SQL optimizations and provides efficient, reliable writes to a SQL Server Big Data Cluster or SQL DB.

The MSSQL JDBC connector, referenced by the format name `com.microsoft.sqlserver.jdbc.spark`, uses the [SQL Server bulk copy APIs](https://docs.microsoft.com/en-us/sql/connect/jdbc/using-bulk-copy-with-the-jdbc-driver?view=sql-server-2017#sqlserverbulkcopyoptions) to implement efficient writes to SQL Server. The connector is built on the Spark Data Source APIs and provides a familiar JDBC interface.
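Because the connector is a Spark data source, it is used through the standard PySpark `DataFrameWriter` and `DataFrameReader` calls with `format("com.microsoft.sqlserver.jdbc.spark")`. The sketch below wraps those calls in two small helpers. The names `write_table` and `read_table` are hypothetical, and the option keys are assumed to be the same four (`url`, `dbtable`, `user`, `password`) that this sample passes later.

```python
# Format name under which the MSSQL JDBC connector registers itself with Spark.
MSSQL_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def write_table(df, options, mode="overwrite"):
    """Write a Spark DataFrame through the MSSQL connector (hypothetical helper).

    `options` is a dict such as {"url": ..., "dbtable": ..., "user": ..., "password": ...}.
    """
    df.write.format(MSSQL_FORMAT).mode(mode).options(**options).save()


def read_table(spark, options):
    """Read a SQL table into a Spark DataFrame through the MSSQL connector (hypothetical helper)."""
    return spark.read.format(MSSQL_FORMAT).options(**options).load()
```

Keeping the options in one dictionary lets the write and the read share exactly the same connection settings.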

The following sample shows how to use the MSSQL JDBC connector to write to and read from a SQL source. In this sample we'll:
- Read a file from HDFS and do some basic processing.
- Write the dataframe to a SQL Server table using the MSSQL connector.
- Read the table back using the MSSQL connector.

Prerequisites:
- The sample uses a SQL database named "MyTestDatabase". Create it before you run the sample, for example:
    ``` sql
    CREATE DATABASE MyTestDatabase
    GO
    ```
- Download [AdultCensusIncome.csv](https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv) to your local machine. Create an HDFS folder named spark_data and upload the file to it.
- Configure the Spark session to use the MSSQL connector jar. After the Big Data Cluster is deployed, the jar is available at /jar/spark-mssql-connector-assembly-1.0.0.jar.

```
%%configure -f
{"conf": {"spark.jars": "/jar/spark-mssql-connector-assembly-1.0.0.jar"}}
```
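The HDFS upload in the prerequisites can also be scripted, assuming the Hadoop `hdfs` command-line client is installed and configured against the cluster (an assumption; it may not be available on your local machine). This minimal sketch builds and runs `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` commands; the function names `hdfs_upload_commands` and `upload_to_hdfs` are hypothetical.

```python
import posixpath
import subprocess
from pathlib import Path


def hdfs_upload_commands(local_file, hdfs_dir="/spark_data"):
    """Return the hdfs CLI commands that create `hdfs_dir` and copy `local_file` into it."""
    # Local path handling follows the local OS; HDFS paths are always POSIX-style.
    target = posixpath.join(hdfs_dir, Path(local_file).name)
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],            # no error if the folder exists
        ["hdfs", "dfs", "-put", "-f", local_file, target],    # -f overwrites an existing file
    ]


def upload_to_hdfs(local_file, hdfs_dir="/spark_data"):
    """Run the commands; raises subprocess.CalledProcessError if any hdfs command fails."""
    for cmd in hdfs_upload_commands(local_file, hdfs_dir):
        subprocess.run(cmd, check=True)
```

For this sample, `upload_to_hdfs("AdultCensusIncome.csv")` would place the file at `/spark_data/AdultCensusIncome.csv`, the path the read step below expects.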

    
 

# Configure the notebook to use the MSSQL Spark connector
This step is expected to be removed in subsequent CTPs. As of CTP 2.5, it is required to point the Spark session to the connector jar.
 

```
%%configure -f
{"conf": {"spark.jars": "/jar/spark-mssql-connector-assembly-1.0.0.jar"}}
```
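The `%%configure -f` magic sends a JSON body to the Livy session, and `-f` forces an existing session to be dropped and recreated with that configuration. Outside a notebook, the same `spark.jars` setting can be passed when a `SparkSession` is built. A sketch, assuming PySpark is installed wherever `build_session` runs; both function names are hypothetical.

```python
import json

CONNECTOR_JAR = "/jar/spark-mssql-connector-assembly-1.0.0.jar"


def configure_body(jars=(CONNECTOR_JAR,)):
    """Build the JSON body for the %%configure magic; spark.jars is a comma-separated list."""
    return json.dumps({"conf": {"spark.jars": ",".join(jars)}})


def build_session(app_name="mssql-connector-sample", jars=(CONNECTOR_JAR,)):
    """Create (or reuse) a SparkSession with the connector jar on the classpath."""
    from pyspark.sql import SparkSession  # imported lazily so this module loads without PySpark

    return (
        SparkSession.builder
        .appName(app_name)
        .config("spark.jars", ",".join(jars))
        .getOrCreate()
    )
```

`spark.jars` is read when the JVM starts, so `getOrCreate()` will not add the jar to a session that is already running; that is why the notebook forces a new session with `-f`.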





# Read data into a data frame
In this step we read the data into a data frame and do some basic cleanup.



```python
# Read the CSV file from HDFS into a dataframe
datafile = "/spark_data/AdultCensusIncome.csv"
df = spark.read.format('csv') \
    .options(header='true', inferSchema='true', ignoreLeadingWhiteSpace='true', ignoreTrailingWhiteSpace='true') \
    .load(datafile)
df.show(5)
```



```python
# Simple data cleanup: replace "-" with "_" in column names
columns_new = [col.replace("-", "_") for col in df.columns]
df = df.toDF(*columns_new)
df.show(5)
```
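The cleanup above only handles `-`. A slightly more general sketch, using a hypothetical helper `sanitize_columns`, replaces any character other than letters, digits, and underscores with `_`, which brings the names closer to valid unquoted T-SQL identifiers (leading digits and reserved words are not handled). It also fails fast if two columns collapse to the same name.

```python
import re


def sanitize_columns(columns):
    """Replace every character that is not a letter, digit, or underscore with "_".

    Raises ValueError if two columns end up with the same sanitized name.
    """
    cleaned = [re.sub(r"[^0-9A-Za-z_]", "_", col) for col in columns]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError(f"sanitized column names collide: {cleaned}")
    return cleaned
```

In the notebook it would be applied as `df = df.toDF(*sanitize_columns(df.columns))`.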




# Write dataframe to SQL using MSSQL Spark Connector

```python
# Write from Spark to a SQL table using the MSSQL Spark connector
print("Use MSSQL connector to write to master SQL instance")

servername = "jdbc:sqlserver://master-0.master-svc"
dbname = "MyTestDatabase"
url = servername + ";" + "databaseName=" + dbname + ";"

dbtable = "dbo.AdultCensus"
# Sample credentials; replace with your own
user = "sa"
password = "Yukon900"

try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password) \
        .save()
except Exception as error:
    # JVM-side failures usually surface as Py4JJavaError or a PySpark exception, not ValueError
    print("MSSQL Connector write failed:", error)
else:
    # Only report success when save() did not raise
    print("MSSQL Connector write succeeded")
```
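The cell above hard-codes the `sa` password. A minimal sketch of building the same four connector options while reading the credentials from environment variables instead; the variable names `MSSQL_USER` and `MSSQL_PASSWORD` and the helper `sql_options` are hypothetical.

```python
import os


def sql_options(server, database, table, env=os.environ):
    """Build connector options; credentials come from MSSQL_USER / MSSQL_PASSWORD (hypothetical names)."""
    # Same URL shape as the sample: <server>;databaseName=<db>;
    url = f"{server};databaseName={database};"
    try:
        user, password = env["MSSQL_USER"], env["MSSQL_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return {"url": url, "dbtable": table, "user": user, "password": password}
```

With `opts = sql_options(servername, dbname, dbtable)`, the write becomes `df.write.format("com.microsoft.sqlserver.jdbc.spark").mode("overwrite").options(**opts).save()`.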




Use MSSQL connector to write to master SQL instance
MSSQL Connector write succeeded

# Read SQL Table using MSSQL Spark connector
The following code uses the connector to read the table back. To confirm the write, you can also query the table directly with SQL.
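Besides querying the table with SQL, a quick check from Spark is to compare the row count of the dataframe that was written with the row count of the table read back. A minimal sketch with a hypothetical helper `check_row_counts`; it compares counts only, not contents, and the comparison is meaningful here because the write used `overwrite` mode.

```python
def check_row_counts(written_df, read_df):
    """Return (written, read) row counts; raise ValueError if they differ."""
    written, read = written_df.count(), read_df.count()
    if written != read:
        raise ValueError(f"row count mismatch: wrote {written}, read back {read}")
    return written, read
```

Once the cell below has loaded `jdbcDF`, the check is `check_row_counts(df, jdbcDF)`.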

```python
# Read from the SQL table using the MSSQL connector
print("read data from SQL server table  ")
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", dbtable) \
    .option("user", user) \
    .option("password", password) \
    .load()

jdbcDF.show(5)
```

read data from SQL server table  
+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
|age|       workclass|fnlwgt|education|education_num|    marital_status|       occupation| relationship| race|   sex|capital_gain|capital_loss|hours_per_week|native_country|income|
+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
| 39|       State-gov| 77516|Bachelors|           13|     Never-married|     Adm-clerical|Not-in-family|White|  Male|        2174|           0|            40| United-States| <=50K|
| 50|Self-emp-not-inc| 83311|Bachelors|           13|Married-civ-spouse|  Exec-managerial|      Husband|White|  Male|           0|           0|            13| United-States| <=50K|
| 38|         Private|215646|  HS-grad|            9|        