# Using Python and SQL with Microsoft SQL Server

In this notebook we illustrate how to connect to MS SQL Server from Python and convert the results of SQL `SELECT` queries into pandas data frames.

## Introduction

This demo is based on https://github.com/garyfeng/DataScientistsNotebook. We use `docker-compose` to create a Docker cluster with:
- mssql: Microsoft SQL Server 2017 running on Linux
- a Jupyter data science notebook server, running as a Docker container

To install:
- Make sure you have `Docker` and `git` installed on your computer
- In a terminal, `cd` to the folder where you want to set up this project, then run `git clone https://github.com/garyfeng/DataScientistsNotebook.git`
- Go to the downloaded folder and edit the `.env` file so that the directories match your setup.
- Back in the terminal, run `docker-compose build` and make sure it succeeds.

To test:
- Run `docker-compose up` and make sure the logs show no errors.
- Open your browser to https://localhost:8888 and log in using the Jupyter password set in the `.env` file.
- You should see the `work` folder. Click into it to open new notebooks, etc. Note that the MS SQL connection may fail, because you need the IP address of the SQL Server (which is not "localhost"). See below.
- Back in the terminal, run `docker-compose down` to shut everything down.

To run:
- In the terminal, run `docker-compose up -d` to start in the background and avoid the verbose logs.
- Open the browser the same way you did in the test.
- Find the IP address of the MS SQL Server. In a terminal, type `ipconfig` on Windows or `ifconfig` on macOS and Linux, and look for something like `192.168.56.1` in the printout. On Windows this address is typically associated with `VirtualBox`; on macOS or Linux it is typically listed under an interface described as "virtual" and is harder to spot. It doesn't hurt to try every address listed; one of them should work.
- Copy that IP address and paste it into the `server` address in the notebook cell below (until I find an automatic method).
- Run the `pymssql` code to try to connect and see whether it raises an error. Repeat with each IP address until you find one that works ;-)
- You can now run the SQL exercises. Your Jupyter notebooks will be saved in your `python/notebooks` folder.
- Shut down with `docker-compose down` in the terminal; make sure you have saved the notebooks first. Saved notebooks will still be there the next time you start the Docker cluster, though you need to re-run them because the Python environment has been cleared.
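
The trial-and-error search for the right address can be partly automated. The following is a minimal sketch, not part of the original project: `first_reachable` is a hypothetical helper that tries each candidate address and returns the first one that accepts a TCP connection on port 1433 (the port used in the SQLAlchemy URL later in this notebook). A successful TCP connection only shows that something is listening there, so the `pymssql` login can still fail.

```python
import socket


def first_reachable(candidates, port=1433, timeout=1.0):
    """Return the first address accepting a TCP connection on `port`, else None.

    `candidates` is a list of IP addresses copied by hand from the
    ipconfig/ifconfig printout (hypothetical input, not discovered here).
    """
    for host in candidates:
        try:
            # create_connection raises OSError if nothing answers in time
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            continue
    return None


# Example usage (commented out so the cell does not touch the network):
# server = first_reachable(["192.168.56.1", "10.0.2.2"])
```

If the function returns `None`, none of the candidates is listening on that port, and the list of addresses or the state of the Docker cluster is worth re-checking.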

## Set up 

Once you are in the Jupyter environment, you may need to install `pymssql`. 

In [1]:
! pip install pymssql

Collecting pymssql
  Downloading pymssql-2.1.4-cp37-cp37m-manylinux1_x86_64.whl (1.4 MB)
Installing collected packages: pymssql
Successfully installed pymssql-2.1.4


### Using PyMSSQL


In [2]:
from os import getenv
import pymssql
import pandas as pd

  


In [3]:
# parameters to use for MS SQL Server connection
server = getenv("MSSQL_SERVERIP")
user = getenv("MSSQL_USER")
password = getenv("MSSQL_PASSWORD")
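
`getenv` returns `None` when a variable is not set, which tends to surface only later as a confusing connection error. As a small sketch, the hypothetical helpers `require_env` and `load_mssql_settings` below fail early with a clear message instead. The variable names are the ones used in the cell above; everything else is an assumption for illustration.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`; raise if unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_mssql_settings():
    """Collect the three connection settings used in this notebook."""
    return {
        "server": require_env("MSSQL_SERVERIP"),
        "user": require_env("MSSQL_USER"),
        "password": require_env("MSSQL_PASSWORD"),
    }
```

With these in place, `settings = load_mssql_settings()` either returns all three values or names the missing variable.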


In [5]:
pymssql.connect(server, user, password, "")

<pymssql.Connection at 0x7f9e47cf8280>

### Using SQLAlchemy

SQLAlchemy can also connect through `pymssql`, which is the driver selected by the `mssql+pymssql` prefix in the URL below.

In [None]:
from sqlalchemy import create_engine
engine = create_engine('mssql+pymssql://{}:{}@{}:1433'.format(user, password, server))
conn = engine.connect()
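
One caveat with building the URL by string formatting: a password containing characters such as `@` or `/` would break the URL. A possible workaround, sketched here with a hypothetical `build_mssql_url` helper, is to percent-encode the user name and password with the standard library before formatting.

```python
from urllib.parse import quote_plus


def build_mssql_url(user, password, server, port=1433):
    """Build a mssql+pymssql URL with the credentials percent-encoded."""
    return "mssql+pymssql://{}:{}@{}:{}".format(
        quote_plus(user),      # escape characters such as '@' or ':'
        quote_plus(password),  # escape characters such as '@' or '/'
        server,
        port,
    )
```

The result can then be passed to `create_engine` in place of the hand-formatted string, for example `create_engine(build_mssql_url(user, password, server))`.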

## Create some data in the SQL Server

We now use `pymssql` to connect to `tempdb`, a system database that SQL Server provides by default, and create a table `persons` in it. We also insert some sample data to play with.

We connect, do the above using SQL commands, and then close the connection. We also run a `SELECT` query there and illustrate how to iterate over the results row by row. Going forward, however, we will use `pandas` to convert query results into a data frame directly, without having to handle them one row at a time.

In [6]:

# server = getenv("PYMSSQL_TEST_SERVER")
# user = getenv("PYMSSQL_TEST_USERNAME")
# password = getenv("PYMSSQL_TEST_PASSWORD")

conn = pymssql.connect(server, user, password, "tempdb")

cursor = conn.cursor()
cursor.execute("""
IF OBJECT_ID('persons', 'U') IS NOT NULL
    DROP TABLE persons
CREATE TABLE persons (
    id INT NOT NULL,
    name VARCHAR(100),
    salesrep VARCHAR(100),
    PRIMARY KEY(id)
)
""")
cursor.executemany(
    "INSERT INTO persons VALUES (%d, %s, %s)",
    [(1, 'John Smith', 'John Doe'),
     (2, 'Jane Doe', 'Joe Dog'),
     (3, 'Mike T.', 'Sarah H.')])
# you must call commit() to persist your data if you don't set autocommit to True
conn.commit()

cursor.execute('SELECT * FROM persons WHERE salesrep=%s', 'John Doe')
row = cursor.fetchone()
while row:
    print("ID=%d, Name=%s" % (row[0], row[1]))
    row = cursor.fetchone()

conn.close()


ID=1, Name=John Smith
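
DB-API cursors, including those of `pymssql`, can be iterated directly, which avoids the explicit `fetchone()` loop. Because the cell above needs a running SQL Server, the sketch below uses Python's built-in `sqlite3` module as a stand-in (an assumption made only so the example runs anywhere). Note that `sqlite3` uses `?` placeholders where `pymssql` uses `%s` and `%d`. The function name `demo_persons` is hypothetical.

```python
import sqlite3
from contextlib import closing


def demo_persons():
    """Create the sample table in memory and return (id, name) for one salesrep."""
    # closing() guarantees conn.close() even if a statement fails
    with closing(sqlite3.connect(":memory:")) as conn:
        cursor = conn.cursor()
        cursor.execute(
            "CREATE TABLE persons ("
            "id INTEGER PRIMARY KEY, name TEXT, salesrep TEXT)"
        )
        cursor.executemany(
            "INSERT INTO persons VALUES (?, ?, ?)",
            [(1, "John Smith", "John Doe"),
             (2, "Jane Doe", "Joe Dog"),
             (3, "Mike T.", "Sarah H.")],
        )
        conn.commit()
        cursor.execute(
            "SELECT id, name FROM persons WHERE salesrep = ?", ("John Doe",)
        )
        # Iterate over the cursor instead of calling fetchone() in a loop
        return [(row[0], row[1]) for row in cursor]


for person_id, name in demo_persons():
    print("ID=%d, Name=%s" % (person_id, name))
```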


## SQL query using Pandas read_sql_query

Pandas provides the function [read_sql_query](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html), which executes a SQL `SELECT` query and returns the result as a data frame. See the tutorial at https://datatofish.com/sql-to-pandas-dataframe/

In [7]:
import pandas as pd

conn = pymssql.connect(server, user, password, "tempdb")

# read_sql_query already returns a DataFrame, so no extra conversion is needed
df = pd.read_sql_query("SELECT * FROM persons", conn)
conn.close()
df


|   | id | name | salesrep |
|---|----|------|----------|
| 0 | 1 | John Smith | John Doe |
| 1 | 2 | Jane Doe | Joe Dog |
| 2 | 3 | Mike T. | Sarah H. |
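
`read_sql_query` also accepts a `params` argument, so values can be passed separately from the SQL text rather than pasted into it. The sketch below uses an in-memory `sqlite3` database as a stand-in for SQL Server (an assumption made for portability); with `pymssql` the placeholder would be `%s` instead of `?`. The helper name `persons_for_salesrep` is hypothetical.

```python
import sqlite3

import pandas as pd


def persons_for_salesrep(conn, salesrep):
    """Return the persons handled by `salesrep` as a DataFrame."""
    return pd.read_sql_query(
        "SELECT * FROM persons WHERE salesrep = ?",
        conn,
        params=(salesrep,),  # value is bound by the driver, not string-formatted
    )


# Stand-in database with the same sample rows as above
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, salesrep TEXT)"
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [(1, "John Smith", "John Doe"),
     (2, "Jane Doe", "Joe Dog"),
     (3, "Mike T.", "Sarah H.")],
)
conn.commit()

df = persons_for_salesrep(conn, "John Doe")
conn.close()
```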
