# Read and write from Spark to SQL using the MSSQL JDBC Connector
A key usage pattern in big data scenarios is high-volume, high-velocity, high-variety data processing in Spark, followed by batch or streaming writes to SQL so that line-of-business (LOB) applications can access the results. These usage patterns benefit greatly from a connector that uses key SQL optimizations and provides an efficient and reliable write to a SQL Server Big Data Cluster or SQL DB.

The MSSQL JDBC connector, referenced by the name com.microsoft.sqlserver.jdbc.spark, uses the [SQL Server bulk copy APIs](https://docs.microsoft.com/en-us/sql/connect/jdbc/using-bulk-copy-with-the-jdbc-driver?view=sql-server-2017#sqlserverbulkcopyoptions) to implement an efficient write to SQL Server. The connector is based on the Spark data source APIs and provides a familiar JDBC interface for access.

The following sample shows how to use the MSSQL JDBC Connector to write to and read from a SQL source. In this sample we'll:
- Read a file from HDFS and do some basic processing.
- Write the data frame to a SQL Server table using the MSSQL Connector.
- Read the table back using the MSSQL Connector.

Prerequisites:
- The sample uses a SQL database named "MyTestDatabase". Create it before you run this sample. The database can be created as follows:
    ``` sql
    CREATE DATABASE MyTestDatabase
    GO 
    ``` 
- Download [AdultCensusIncome.csv](https://amldockerdatasets.azureedge.net/AdultCensusIncome.csv) to your local machine. Create an HDFS folder named spark_data and upload the file there.
- Configure the Spark session to use the MSSQL Connector jar. After the Big Data Cluster is deployed, the jar can be found at /jar/spark-mssql-connector-assembly-1.0.0.jar.

``` sh
%%configure -f
{"conf": {"spark.jars": "/jar/spark-mssql-connector-assembly-1.0.0.jar"}}
```

    
 

# Configure the notebook to use the MSSQL Spark connector
This step will be removed in subsequent CTPs. As of CTP 2.5, it is required to point the Spark session to the relevant jar.
 

In [3]:
%%configure -f
{"conf": {"spark.jars": "/jar/spark-mssql-connector-assembly-1.0.0.jar"}}
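The body of the `%%configure` cell is plain JSON, so a quoting mistake is easy to make when more jars are added. The following is a minimal sketch that builds that body with `json.dumps`; `configure_body` is a hypothetical helper introduced here for illustration and is not part of the connector or the notebook tooling.

```python
import json

# Jar path as given in this sample.
CONNECTOR_JAR = "/jar/spark-mssql-connector-assembly-1.0.0.jar"


def configure_body(jars):
    """Return the JSON text to place under a '%%configure -f' line."""
    # spark.jars takes a comma-separated list of jar paths.
    return json.dumps({"conf": {"spark.jars": ",".join(jars)}})


body = configure_body([CONNECTOR_JAR])
print(body)
```

The printed text can be pasted under `%%configure -f` in a notebook cell.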





# Read data into a data frame
In this step we read the data into a data frame and do some basic cleanup steps.
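One common cleanup step before writing to SQL is normalizing column names. The following is a minimal sketch, assuming the cleanup consists of trimming whitespace and replacing hyphens with underscores. `sanitize_columns` is a hypothetical helper, not part of Spark or the connector, and the column names in the example are illustrative only.

```python
def sanitize_columns(columns):
    """Return column names with surrounding spaces removed and '-' replaced by '_'."""
    return [name.strip().replace("-", "_") for name in columns]


# Illustrative names only; substitute the columns of your own data frame.
print(sanitize_columns(["education-num", " hours-per-week", "age"]))
```

In the notebook, the result could be applied with `df = df.toDF(*sanitize_columns(df.columns))`.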



In [4]:
#Read a file and then write it to the SQL table
datafile = "/user/hive/warehouse/object"
df = spark.read.format('parquet').load(datafile)
df.show(5)


Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
3,application_1559756259453_0004,pyspark3,idle,Link,Link,✔


SparkSession available as 'spark'.



# Write dataframe to SQL using MSSQL Spark Connector

In [5]:
#Write from Spark to SQL table using MSSQL Spark Connector
print("Use MSSQL connector to write to master SQL instance ")

servername = "jdbc:sqldatapool://controller-svc:8080/datapools/default"
dbname = "LSST"
url = servername + ";" + "databaseName=" + dbname + ";"

dbtable = "dbo.Object"
user = "sa"
password = "fooRiuzg54" # Please specify password here


try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password) \
        .save()
except Exception as error:
    # JVM-side failures reach Python as Py4J or Spark exceptions, not ValueError
    print("MSSQL Connector write failed", error)
else:
    # Report success only when save() returned without raising
    print("MSSQL Connector write succeeded")




An error was encountered:
Session 3 did not reach idle status in time. Current status is busy.
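The URL construction and option passing in the write cell above can be factored into small helpers, which also makes it easier to keep the password out of the notebook. The following is a sketch under stated assumptions: `build_jdbc_url`, `connector_options`, and `write_with_connector` are hypothetical names, and `MSSQL_PASSWORD` is an assumed environment variable that you would set yourself.

```python
import os

CONNECTOR_FORMAT = "com.microsoft.sqlserver.jdbc.spark"


def build_jdbc_url(server, database):
    """Append the databaseName property to the server prefix."""
    return server + ";" + "databaseName=" + database + ";"


def connector_options(url, dbtable, user, password):
    """Collect the four options that the write and read cells pass to the connector."""
    return {"url": url, "dbtable": dbtable, "user": user, "password": password}


def write_with_connector(df, options, mode="overwrite"):
    """Write df through the connector; return True on success, False on failure."""
    try:
        df.write.format(CONNECTOR_FORMAT).mode(mode).options(**options).save()
    except Exception as error:  # JVM-side failures are not ValueError
        print("MSSQL Connector write failed", error)
        return False
    print("MSSQL Connector write succeeded")
    return True


url = build_jdbc_url("jdbc:sqldatapool://controller-svc:8080/datapools/default", "LSST")
# MSSQL_PASSWORD is an assumed variable name; set it outside the notebook.
options = connector_options(url, "dbo.Object", "sa", os.environ.get("MSSQL_PASSWORD", ""))
print(url)
```

In the notebook, `write_with_connector(df, options)` would take the place of the try/except block. Returning a boolean lets later cells skip the read when the write did not complete.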


# Read SQL Table using MSSQL Spark connector.
The following code uses the connector to read the table. To confirm the write above, you can also check the table directly using SQL.

In [11]:
#Read from SQL table using MSSQL Connector
print("read data from SQL server table  ")
jdbcDF = spark.read \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password) \
        .load()

jdbcDF.show(5)
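Beyond inspecting the first rows, a quick way to confirm the round trip is to compare the shape of the frame that was written with the frame that was read back. The following is a minimal sketch; `round_trip_matches` is a hypothetical helper that relies only on the `columns` attribute and `count()` method that Spark data frames provide. It checks column names and row counts, not the values themselves.

```python
def round_trip_matches(written, read_back):
    """Return True when both frames have the same column names and row count."""
    if list(written.columns) != list(read_back.columns):
        return False
    # count() triggers a Spark job on each frame, so this can be slow on large tables.
    return written.count() == read_back.count()
```

In the notebook this would be called as `round_trip_matches(df, jdbcDF)`.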