# Data Access SQL

This document introduces how to access the [PostgreSQL](https://www.postgresql.org/) database used for Challenge 2 and the Project.

First, we import the necessary Python modules (we assume that the required OS dependencies are already installed).

In [1]:
# install the modules on the OS (uncomment if needed)
#!pip install numpy
#!pip install pandas

# import the modules
import pandas as pd
import numpy as np

Next we define the database connection string (we use it to tell the libraries how to connect to the database).

In [2]:
# define the database connection string 
DB_HOST = 'server2053.cs.technik.fhnw.ch'
DB_PORT = 5432
DB_DBNAME = 'bank_db' # or 'warenkorb_db'
DB_USERNAME = 'db_user' 
DB_PASSWORD = 'db_user_pw'

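import socket

# check whether the hostname is reachable; if not, fall back to the server's IP address
try:
    # the context manager closes the socket after the connection attempt
    with socket.create_connection((DB_HOST, DB_PORT), timeout=1):
        pass
except OSError:
    # covers timeouts, refused connections and DNS failures
    # (socket.timeout and socket.error are both OSError)
    DB_HOST = '86.119.36.94'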

db_str = f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_DBNAME}'
print(db_str)

postgresql://db_user:db_user_pw@86.119.36.94:5432/bank_db
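
The credentials above contain no special characters, so plain string formatting is enough. If a password contained characters such as `@` or `/`, they would have to be percent-encoded before being placed in the URL. A minimal sketch, assuming a hypothetical helper `build_db_url` (not part of the original notebook) and a made-up password:

```python
from urllib.parse import quote_plus


def build_db_url(username, password, host, port, dbname):
    """Assemble a PostgreSQL URL, percent-encoding the credentials.

    build_db_url is a hypothetical helper used for illustration only.
    """
    return (
        f"postgresql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# 'p@ss/word' is a made-up password that needs escaping
print(build_db_url("db_user", "p@ss/word", "localhost", 5432, "bank_db"))
# prints postgresql://db_user:p%40ss%2Fword@localhost:5432/bank_db
```

With the values defined in the cell above, the helper returns the same string as `db_str`, because none of them needs escaping.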


## Accessing the Database

There are several Python libraries that can be used to access databases and query data from them.

### IPython SQL

[IPython SQL](https://pypi.org/project/ipython-sql) is the first candidate. It lets us query the database with plain SQL through the `%sql` and `%%sql` notebook magics.

### SQLAlchemy 

[SQLAlchemy](https://www.sqlalchemy.org/) is another option for accessing a database and querying data. Together with pandas, the results of the queries can be stored in a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html).
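
As a rough illustration, the sketch below wraps this pattern in a small function. The name `query_to_dataframe` is a hypothetical helper, not part of either library, and the sketch assumes that the `sqlalchemy` and `pandas` packages are installed (plus `psycopg2` as the PostgreSQL driver when connecting to the course database).

```python
import pandas as pd
from sqlalchemy import create_engine, text


def query_to_dataframe(db_url, sql, params=None):
    """Run one SQL statement and return the result as a pandas DataFrame.

    query_to_dataframe is a hypothetical helper used for illustration only.
    """
    engine = create_engine(db_url)
    try:
        # the connection is returned to the pool when the block exits
        with engine.connect() as conn:
            # text() marks the string as SQL and enables :name placeholders
            return pd.read_sql(text(sql), conn, params=params)
    finally:
        # release all pooled connections held by this engine
        engine.dispose()


# Usage sketch (needs a reachable database; db_str is the connection
# string defined earlier in this notebook):
#
#   tables = query_to_dataframe(
#       db_str,
#       "SELECT table_name FROM information_schema.tables "
#       "WHERE table_schema = :schema",
#       {"schema": "public"},
#   )
#   print(tables.head())
```

Creating the engine inside the function keeps the sketch short. In a longer notebook it is usually preferable to create the engine once and reuse it for all queries.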

### Psycopg2

[Psycopg2](https://pypi.org/project/psycopg2) is a PostgreSQL database adapter. In contrast to the options above, it supports only PostgreSQL, so code written against it cannot be reused directly with other databases. In return, psycopg2 gives access to PostgreSQL-specific features such as server-side cursors (see [here](https://gist.github.com/lmyyao/56e03055006e09960972a16aa55da249) and [here](https://stackoverflow.com/a/48734989)), which make it possible to fetch a result set that exceeds the available RAM of the client computer. SQLAlchemy offers the same option, since it uses psycopg2 under the hood to access the database.
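
A server-side cursor can be sketched as a small generator that hands out the result in chunks. The function name `iter_chunks`, the cursor name `stream_cursor` and the default chunk size are assumptions made for illustration. The function expects an open psycopg2 connection; passing a `name` to `conn.cursor()` is what makes psycopg2 create the cursor on the server.

```python
def iter_chunks(conn, sql, params=None, chunk_size=10_000):
    """Yield lists of rows from a psycopg2 server-side (named) cursor.

    iter_chunks is a hypothetical helper. conn is expected to be an open
    psycopg2 connection; at most chunk_size rows are fetched per call.
    """
    # a named cursor lives on the server; "stream_cursor" is an arbitrary name
    with conn.cursor(name="stream_cursor") as cur:
        cur.execute(sql, params)
        while True:
            rows = cur.fetchmany(chunk_size)
            if not rows:
                break
            yield rows


# Usage sketch (needs the psycopg2 package and a reachable database;
# db_str is the connection string defined earlier in this notebook):
#
#   import psycopg2
#   conn = psycopg2.connect(db_str)
#   try:
#       for rows in iter_chunks(conn, "SELECT * FROM pg_catalog.pg_class"):
#           print(len(rows))
#   finally:
#       conn.close()
```

Because each chunk is processed and discarded before the next one is fetched, the memory needed on the client depends on `chunk_size` rather than on the size of the full result.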