# Read and write from Spark to SQL
A typical big data scenario is large-scale ETL in Spark followed by writing the processed data to SQL Server. The following sample shows how to:
- read a file from HDFS,
- apply some basic processing to it, and
- write the processed data to a SQL Server table.

This sample requires a database that already exists in SQL Server. It uses a database named "MyTestDatabase", which can be created with the SQL statement below.

``` sql
CREATE DATABASE MyTestDatabase
GO 
``` 
 

In [8]:

#Read a file from HDFS into a Spark DataFrame; it is written to the SQL table in a later cell
datafile = "/spark_data/AdultCensusIncome.csv"
df = spark.read.format('csv').options(header='true', inferSchema='true', ignoreLeadingWhiteSpace='true', ignoreTrailingWhiteSpace='true').load(datafile)
df.show(5)


+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
|age|       workclass|fnlwgt|education|education-num|    marital-status|       occupation| relationship| race|   sex|capital-gain|capital-loss|hours-per-week|native-country|income|
+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
| 39|       State-gov| 77516|Bachelors|           13|     Never-married|     Adm-clerical|Not-in-family|White|  Male|        2174|           0|            40| United-States| <=50K|
| 50|Self-emp-not-inc| 83311|Bachelors|           13|Married-civ-spouse|  Exec-managerial|      Husband|White|  Male|           0|           0|            13| United-States| <=50K|
| 38|         Private|215646|  HS-grad|            9|          Divorced|Handlers-cleaners|Not-i

In [9]:

#Process this data. Very simple data cleanup steps. Replacing "-" with "_" in column names
columns_new = [col.replace("-", "_") for col in df.columns]
df = df.toDF(*columns_new)
df.show(5)



+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
|age|       workclass|fnlwgt|education|education_num|    marital_status|       occupation| relationship| race|   sex|capital_gain|capital_loss|hours_per_week|native_country|income|
+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
| 39|       State-gov| 77516|Bachelors|           13|     Never-married|     Adm-clerical|Not-in-family|White|  Male|        2174|           0|            40| United-States| <=50K|
| 50|Self-emp-not-inc| 83311|Bachelors|           13|Married-civ-spouse|  Exec-managerial|      Husband|White|  Male|           0|           0|            13| United-States| <=50K|
| 38|         Private|215646|  HS-grad|            9|          Divorced|Handlers-cleaners|Not-i
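
The cleanup step above only replaces "-" in column names. As a minimal sketch, assuming other characters such as spaces or dots should also become "_", the same idea can be generalized with a regular expression. The helper name `sanitize_column_names` is hypothetical and not part of the original sample.

```python
import re

def sanitize_column_names(columns):
    """Return a copy of the column names with SQL-unfriendly characters replaced.

    Hypothetical helper: every run of characters that is not a letter,
    digit, or underscore is replaced by a single underscore.
    """
    cleaned = []
    for name in columns:
        # Trim surrounding whitespace first, then replace the remaining runs.
        cleaned.append(re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()))
    return cleaned

# Example with a few of the column names from the census file
columns = ["age", "education-num", "marital-status", "hours-per-week"]
print(sanitize_column_names(columns))
```

With a Spark DataFrame, the result can be applied the same way as in the cell above: `df = df.toDF(*sanitize_column_names(df.columns))`.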

In [10]:
#Write from Spark to SQL table using JDBC
print("Use built-in JDBC connector to write to SQLServer master instance in Big data ")

servername = "jdbc:sqlserver://mssql-master-pool-0.service-master-pool"
dbname = "MyTestDatabase"
url = servername + ";" + "databaseName=" + dbname + ";"

dbtable = "dbo.AdultCensus"
user = "sa"
password = "****"

print("url is ", url)

try:
    df.write \
        .format("jdbc") \
        .mode("overwrite") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password) \
        .save()
    print("JDBC Write done  ")
except Exception as error:
    # JDBC failures surface as Py4J or Spark exceptions, not ValueError
    print("JDBC Write failed", error)


Use built-in JDBC connector to write to SQLServer master instance in Big data 
url is  jdbc:sqlserver://mssql-master-pool-0.service-master-pool;databaseName=MyTestDatabase;
JDBC Write done
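
The write cell above builds the JDBC URL by string concatenation and keeps the password in the notebook. One possible alternative, sketched here under stated assumptions, collects the connection settings in a dictionary and reads the password from an environment variable. The helper names `build_sqlserver_url` and `jdbc_options` and the variable name `SQL_PASSWORD` are hypothetical and not part of the original sample.

```python
import os

def build_sqlserver_url(host, database, port=None):
    """Assemble a SQL Server JDBC URL in the same form as the cell above."""
    server = "jdbc:sqlserver://" + host
    if port is not None:
        # The port is optional; the original sample does not specify one.
        server += ":" + str(port)
    return server + ";databaseName=" + database + ";"

def jdbc_options(url, dbtable, user, env=None, password_var="SQL_PASSWORD"):
    """Collect JDBC options, taking the password from an environment mapping.

    SQL_PASSWORD is an assumed variable name, chosen for illustration only.
    """
    env = os.environ if env is None else env
    if password_var not in env:
        raise KeyError("set the " + password_var + " environment variable first")
    return {
        "url": url,
        "dbtable": dbtable,
        "user": user,
        "password": env[password_var],
    }

url = build_sqlserver_url("mssql-master-pool-0.service-master-pool", "MyTestDatabase")
print("url is ", url)
```

Assuming `SQL_PASSWORD` is set, the same dictionary can then serve both directions: `opts = jdbc_options(url, "dbo.AdultCensus", "sa")`, followed by `df.write.format("jdbc").mode("overwrite").options(**opts).save()` for the write and `spark.read.format("jdbc").options(**opts).load()` for the read.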

In [13]:
#Read to Spark from SQL table using JDBC
print("read data from SQL server table  ")
jdbcDF = spark.read \
        .format("jdbc") \
        .option("url", url) \
        .option("dbtable", dbtable) \
        .option("user", user) \
        .option("password", password) \
        .load()

jdbcDF.show(5)

read data from SQL server table  
+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
|age|       workclass|fnlwgt|education|education_num|    marital_status|       occupation| relationship| race|   sex|capital_gain|capital_loss|hours_per_week|native_country|income|
+---+----------------+------+---------+-------------+------------------+-----------------+-------------+-----+------+------------+------------+--------------+--------------+------+
| 39|       State-gov| 77516|Bachelors|           13|     Never-married|     Adm-clerical|Not-in-family|White|  Male|        2174|           0|            40| United-States| <=50K|
| 50|Self-emp-not-inc| 83311|Bachelors|           13|Married-civ-spouse|  Exec-managerial|      Husband|White|  Male|           0|           0|            13| United-States| <=50K|
| 38|         Private|215646|  HS-grad|            9|        