# Connect to the OCHIN DB through Python

This notebook walks you through connecting to the OCHIN DB using Python.
Before you begin, make sure that you have access to the data and that the `db-credentials.txt` file is in your home directory (the code below reads it from `/home/ec2-user/SageMaker/`).

## Install ODBC drivers as necessary

In [None]:
%%sh

#Download appropriate package for the OS version
#Choose only ONE of the following, corresponding to your OS version
#`sudo tee` writes the repo file as root, so the cell does not depend on an interactive root shell

#RHEL 7 and Oracle Linux 7
curl https://packages.microsoft.com/config/rhel/7/prod.repo | sudo tee /etc/yum.repos.d/mssql-release.repo > /dev/null

#RHEL 8 and Oracle Linux 8
#curl https://packages.microsoft.com/config/rhel/8/prod.repo | sudo tee /etc/yum.repos.d/mssql-release.repo > /dev/null

#RHEL 9
#curl https://packages.microsoft.com/config/rhel/9.0/prod.repo | sudo tee /etc/yum.repos.d/mssql-release.repo > /dev/null

sudo yum remove -y unixODBC-utf16 unixODBC-utf16-devel #to avoid conflicts
sudo ACCEPT_EULA=Y yum install -y msodbcsql17
# optional: for bcp and sqlcmd
sudo ACCEPT_EULA=Y yum install -y mssql-tools
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
# only affects this cell's shell; open a new terminal to pick up the PATH change elsewhere
source ~/.bashrc
# optional: for unixODBC development headers
sudo yum install -y unixODBC-devel

## Import Python packages

In [None]:
import pyodbc
print("List of ODBC drivers:")
dlist = pyodbc.drivers()
for drvr in dlist:
    print('\t', drvr)

print("End of list")
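
The connection string later in this notebook names `ODBC Driver 17 for SQL Server`, so it can help to fail early if that driver is missing from the list. The following is a minimal sketch; `check_driver` is a hypothetical helper, not part of pyodbc, and the driver list passed to it here is made up so the snippet runs on its own.

```python
REQUIRED_DRIVER = "ODBC Driver 17 for SQL Server"  # driver named in the connection string

def check_driver(installed_drivers, required=REQUIRED_DRIVER):
    """Raise a RuntimeError if the required ODBC driver is not installed."""
    if required not in installed_drivers:
        raise RuntimeError(
            f"{required!r} not found; installed drivers: {sorted(installed_drivers)}"
        )
    return required

# In the notebook, call check_driver(pyodbc.drivers()) instead.
# The list here is a made-up example.
print(check_driver(["PostgreSQL", "ODBC Driver 17 for SQL Server"]))
```

If the check raises, rerun the driver installation cell above before continuing.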

In [None]:
%pip install pandasql
import pandas as pd
from pandasql import sqldf

## Read and parse your db credentials


In [None]:
import re

file_path = '/home/ec2-user/SageMaker/db-credentials.txt'

with open(file_path, 'r') as file:
    file_contents = file.read()

# Remove newlines and surrounding whitespace
cleaned_string = file_contents.replace('\n', '').strip()

# Extract "name": "value" pairs using a regular expression
pattern = r'"([^"]+)": "([^"]+)"'
pairs = re.findall(pattern, cleaned_string)

# Map each name to its value, e.g. parsed_data['host']
parsed_data = dict(pairs)
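
The regular expression above only matches `"name": "value"` pairs in which both sides are quoted strings, and the connection string later relies on the keys `host`, `port`, `username` and `password`. The sketch below wraps the same parsing in a hypothetical `parse_credentials` function that reports missing keys. The sample file contents are made up and only illustrate the assumed layout.

```python
import re

# Keys used by the connection string later in the notebook
REQUIRED_KEYS = ("host", "port", "username", "password")

def parse_credentials(text):
    """Extract "name": "value" pairs and verify that every required key is present."""
    pairs = re.findall(r'"([^"]+)": "([^"]+)"', text.replace("\n", ""))
    creds = dict(pairs)
    missing = [key for key in REQUIRED_KEYS if key not in creds]
    if missing:
        raise KeyError(f"db-credentials.txt is missing: {', '.join(missing)}")
    return creds

# Made-up sample contents; real values come from your own db-credentials.txt.
sample = '''{
  "host": "db.example.org",
  "port": "1433",
  "username": "jdoe",
  "password": "not-a-real-password"
}'''
creds = parse_credentials(sample)
print(sorted(creds))  # print the key names only, never the password
```

Note that an unquoted value such as `"port": 1433` would not match the pattern, so a missing `port` key usually points to a formatting difference in the file rather than an absent value.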


**Change the code below to reflect the DB view that you have access to**

In [None]:
import pyodbc
db = 'S43' ### ENTER YOUR DB VIEW HERE, for example 'S43'

## Connect to the database

In [None]:
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    f"SERVER={parsed_data['host']},{parsed_data['port']};"
    f"DATABASE={db};"
    f"UID={parsed_data['username']};"
    f"PWD={{{parsed_data['password']}}};"
)


In [None]:
# trusted_connection='no' selects SQL Server authentication with the UID/PWD above
conn = pyodbc.connect(connection_string, trusted_connection='no')

In [None]:
cursor = conn.cursor()
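
The connection string above wraps the password in braces so that characters such as `;` do not end the value early. To my understanding of ODBC connection-string rules, a `}` inside a braced value must itself be doubled, which the plain concatenation does not handle. The sketch below shows one way to cover that case; `brace_quote` and `build_connection_string` are hypothetical helpers and the credentials are made up.

```python
def brace_quote(value):
    """Wrap a value in braces, doubling any closing brace inside it."""
    return "{" + value.replace("}", "}}") + "}"

def build_connection_string(creds, database, driver="ODBC Driver 17 for SQL Server"):
    """Assemble a SQL Server connection string from a credentials dict."""
    parts = [
        f"DRIVER={{{driver}}}",
        f"SERVER={creds['host']},{creds['port']}",
        f"DATABASE={database}",
        f"UID={creds['username']}",
        f"PWD={brace_quote(creds['password'])}",
    ]
    return ";".join(parts) + ";"

# Made-up credentials for illustration only.
example = {"host": "db.example.org", "port": "1433", "username": "jdoe", "password": "p;w}d"}
print(build_connection_string(example, "S43"))
```

With the real credentials, `build_connection_string(parsed_data, db)` would produce a string that can be passed to `pyodbc.connect` in the same way as `connection_string`.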

## Example queries

**Print all tables available in your database view**

In [None]:
cursor.execute('''
SELECT name AS TABLE_NAME
FROM sys.tables
''')
for row in cursor:
    print(row[0])

**Select the top 10 entries in the CONCEPT_DIMENSION table**

In [None]:
cursor.execute('''
SELECT TOP 10 *
FROM CONCEPT_DIMENSION;

''')

results = cursor.fetchall()

# Get the column names from the cursor description
columns = [column[0] for column in cursor.description]

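# Create a DataFrame from the fetched results and column names
results_df = pd.DataFrame.from_records(results, columns=columns)
# Store the repeated concept names as a categorical column to save memory
results_df['NAME_CHAR'] = results_df['NAME_CHAR'].astype('category')
# Display the DataFrame as the cell output
results_df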
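
The fetch-then-build-a-DataFrame steps above are likely to recur for every query, so they can be collected into one function. The following is a sketch only: `cursor_to_dataframe` is a hypothetical helper, and an in-memory SQLite database with made-up rows stands in for the OCHIN connection so the snippet runs anywhere. It relies on `cursor.description` and `fetchall()`, which pyodbc and sqlite3 cursors both provide.

```python
import sqlite3
import pandas as pd

def cursor_to_dataframe(cursor):
    """Fetch all remaining rows from an executed cursor into a DataFrame."""
    columns = [column[0] for column in cursor.description]
    # Convert each row to a plain tuple so pandas handles driver-specific row types
    rows = [tuple(row) for row in cursor.fetchall()]
    return pd.DataFrame.from_records(rows, columns=columns)

# Stand-in database; the table contents are made up for illustration.
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
demo_cursor.execute("CREATE TABLE CONCEPT_DIMENSION (CONCEPT_CD TEXT, NAME_CHAR TEXT)")
demo_cursor.executemany(
    "INSERT INTO CONCEPT_DIMENSION VALUES (?, ?)",
    [("DEMO:1", "Example concept A"), ("DEMO:2", "Example concept B")],
)
demo_cursor.execute("SELECT * FROM CONCEPT_DIMENSION")
demo_df = cursor_to_dataframe(demo_cursor)
print(demo_df.shape)
demo_conn.close()
```

In this notebook the same function would be called as `cursor_to_dataframe(cursor)` right after a `cursor.execute(...)` call.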