# Working with Databases

A *database* is an organized collection of data that can be easily accessed, managed, and updated.

There are two broad categories of databases:
* relational databases, and
* nonrelational (NoSQL) databases.

Relational databases have a rigid structure, implemented as a *schema* that defines the tables, columns, and data types of the stored data.

This approach helps ensure the integrity, consistency, and overall accuracy of the data.

However, a commonly cited drawback of relational databases is that they are harder to scale out (spread across many servers) as data volumes grow.

In contrast, NoSQL databases don't impose a fixed structure on the stored data, which gives them more flexibility, adaptability, and scalability.


## Relational Databases

*Relational databases*, also known as *row-and-column databases*, are the most common type of database in use today.

Relational databases are designed for the efficient insertion, updating, and deletion of small or vast amounts of structured data.

In particular, relational databases are well suited for *online transaction processing* (OLTP) applications, which process a high volume of transactions for a large number of users.

Some common relational database systems are:

* MySQL
* MariaDB
* PostgreSQL

## SQL statements

*SQL*, or *Structured Query Language*, is the primary tool for interacting with a relational database.

SQL statements are text commands recognized and executed by a database engine like MySQL. 

For example, this SQL statement asks a database to retrieve all the rows from a table called <code>orders</code> whose <code>status</code> field is set to <code>Shipped</code>:

```
SELECT * FROM orders WHERE status='Shipped';
```
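To try this statement without a PostgreSQL server, the sketch below runs it against an in-memory SQLite database through Python's built-in <code>sqlite3</code> module. The <code>orders</code> table and its rows are made up for illustration; only the query text comes from the example above.

```python
import sqlite3

# Hypothetical orders table in an in-memory SQLite database (illustration only)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (pono INTEGER, date TEXT, status TEXT)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, "2024-01-05", "Shipped"),
     (102, "2024-01-06", "Pending"),
     (103, "2024-01-07", "Shipped")],
)

# Operation SELECT, target orders, condition status='Shipped'
cur.execute("SELECT * FROM orders WHERE status='Shipped';")
shipped = cur.fetchall()
print(shipped)
conn.close()
```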

SQL statements typically have three major components:
* an *operation* to be performed,
* a *target* for that operation,
* and a *condition* that narrows the scope of the operation.

In the preceding example, <code>SELECT</code> is the SQL operation. The <code>orders</code> table is the target for the operation, as defined by the <code>FROM</code> clause. The condition is specified in the <code>WHERE</code> clause of the statement.

Every such statement needs an operation and a target, but the condition is optional.

You can also restrict an SQL statement to certain columns of a table. Here's how to retrieve only the <code>pono</code> and <code>date</code> columns of all the rows in the <code>orders</code> table:

```
SELECT pono,date FROM orders;
```
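A minimal sketch of the column-restricted query, again using an in-memory SQLite database as a stand-in (the table contents are hypothetical). Because there is no <code>WHERE</code> clause, every row comes back, but the cursor's <code>description</code> lists only the two selected columns.

```python
import sqlite3

# Hypothetical orders table with more columns than the query asks for
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (pono INTEGER, date TEXT, status TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (101, '2024-01-05', 'Shipped', 19.99)")
conn.execute("INSERT INTO orders VALUES (102, '2024-01-06', 'Pending', 5.00)")

# Select only two columns; no condition, so all rows are returned
cur = conn.execute("SELECT pono,date FROM orders;")
columns = [desc[0] for desc in cur.description]
rows = cur.fetchall()
print(columns)  # ['pono', 'date']
print(rows)
conn.close()
```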

### *Data Manipulation Language (DML) statements*

<code>SELECT</code> operations like those just shown are examples of *Data Manipulation Language (DML) statements*, a category of SQL statements that you use to access and manipulate database data.

Other DML operations include <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code>, which add, change, and remove records from a database, respectively.
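The three DML operations can be sketched as follows, using Python's built-in <code>sqlite3</code> module so the example runs without a server. The table and values are hypothetical; the SQL statements themselves are standard and read the same way in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (pono INTEGER PRIMARY KEY, status TEXT)")

# INSERT adds records
cur.executemany("INSERT INTO orders (pono, status) VALUES (?, ?)",
                [(101, "Pending"), (102, "Pending"), (103, "Cancelled")])

# UPDATE changes the records that match the condition
cur.execute("UPDATE orders SET status = 'Shipped' WHERE pono = 101")

# DELETE removes the records that match the condition
cur.execute("DELETE FROM orders WHERE status = 'Cancelled'")
conn.commit()

cur.execute("SELECT pono, status FROM orders ORDER BY pono")
remaining = cur.fetchall()
print(remaining)  # [(101, 'Shipped'), (102, 'Pending')]
conn.close()
```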

### *Data Definition Language (DDL) statements*

*Data Definition Language (DDL) statements* are another common category of SQL statements.

You use them to define the structure of the database itself.

Typical DDL operations include <code>CREATE</code> to make, <code>ALTER</code> to modify, and <code>DROP</code> to delete data containers such as tables or whole databases; columns are added to or removed from an existing table with <code>ALTER TABLE</code>.
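A short sketch of the DDL life cycle of a table, run against an in-memory SQLite database. The <code>customers</code> table is hypothetical, and <code>PRAGMA table_info</code> and <code>sqlite_master</code> are SQLite-specific ways to inspect the schema (PostgreSQL uses <code>information_schema</code> or <code>\d</code> in <code>psql</code> instead).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE makes a new table
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

# ALTER modifies an existing table, here by adding a column
cur.execute("ALTER TABLE customers ADD COLUMN email TEXT")

# SQLite-specific: list the table's column names (second field of each row)
cur.execute("PRAGMA table_info(customers)")
columns = [row[1] for row in cur.fetchall()]
print(columns)  # ['id', 'name', 'email']

# DROP deletes the table
cur.execute("DROP TABLE customers")
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = cur.fetchall()
print(tables)  # []
conn.close()
```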

## [Install and set up PostgreSQL](https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/postgresql.ipynb#scrollTo=yZmI7l_GykcW)


In [None]:
# Install the PostgreSQL server and start it
!sudo apt-get -y -qq update
!sudo apt-get -y -qq install postgresql
!sudo service postgresql start

# Set the password `postgres` for the user `postgres`
!sudo -u postgres psql -U postgres -c "ALTER USER postgres PASSWORD 'postgres';"

# Create (or recreate) a database named `airq_db` for this tutorial
!sudo -u postgres psql -U postgres -c 'DROP DATABASE IF EXISTS airq_db;'
!sudo -u postgres psql -U postgres -c 'CREATE DATABASE airq_db;'

#### Set up the necessary environment variables

The following environment variables are based on the PostgreSQL setup in the previous section. If you have a different setup or are using an existing database, change them accordingly:

In [None]:
%env DEMO_DATABASE_NAME=airq_db
%env DEMO_DATABASE_HOST=localhost
%env DEMO_DATABASE_PORT=5432
%env DEMO_DATABASE_USER=postgres
%env DEMO_DATABASE_PASS=postgres
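
The Python cells later in this notebook pass the connection settings to <code>psycopg2.connect()</code> as literals. As an alternative, a minimal sketch of reading them from these environment variables is shown below; <code>connection_params</code> is a hypothetical helper, and its defaults simply mirror the values set above. <code>psycopg2.connect()</code> accepts <code>dbname</code>, <code>host</code>, <code>port</code>, <code>user</code>, and <code>password</code> as keyword arguments.

```python
import os

def connection_params():
    """Hypothetical helper: build psycopg2.connect() keyword arguments
    from the DEMO_DATABASE_* environment variables set with %env."""
    return {
        "dbname": os.environ.get("DEMO_DATABASE_NAME", "airq_db"),
        "host": os.environ.get("DEMO_DATABASE_HOST", "localhost"),
        "port": int(os.environ.get("DEMO_DATABASE_PORT", "5432")),
        "user": os.environ.get("DEMO_DATABASE_USER", "postgres"),
        "password": os.environ.get("DEMO_DATABASE_PASS", "postgres"),
    }

params = connection_params()
print(params["dbname"], params["port"])
# With psycopg2 installed and the server running:
# conn = psycopg2.connect(**params)
```
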

### Prepare data in PostgreSQL server

For demo purposes, this tutorial creates a database and populates it with some data. The data used in this tutorial is from [Air Quality Data Set](https://archive.ics.uci.edu/ml/datasets/Air+Quality), available from [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml).

Below is a sneak preview of a subset of the Air Quality Data Set:

Date|Time|CO(GT)|PT08.S1(CO)|NMHC(GT)|C6H6(GT)|PT08.S2(NMHC)|NOx(GT)|PT08.S3(NOx)|NO2(GT)|PT08.S4(NO2)|PT08.S5(O3)|T|RH|AH|
----|----|------|-----------|--------|--------|-------------|----|----------|-------|------------|-----------|-|--|--|
10/03/2004|18.00.00|2,6|1360|150|11,9|1046|166|1056|113|1692|1268|13,6|48,9|0,7578|
10/03/2004|19.00.00|2|1292|112|9,4|955|103|1174|92|1559|972|13,3|47,7|0,7255|
10/03/2004|20.00.00|2,2|1402|88|9,0|939|131|1140|114|1555|1074|11,9|54,0|0,7502|
10/03/2004|21.00.00|2,2|1376|80|9,2|948|172|1092|122|1584|1203|11,0|60,0|0,7867|
10/03/2004|22.00.00|1,6|1272|51|6,5|836|131|1205|116|1490|1110|11,2|59,6|0,7888|
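
The raw values above use a comma as the decimal separator and dots between hours, minutes, and seconds, whereas the table definition below stores them as <code>DATE</code>, <code>TIME</code>, and <code>REAL</code>. A minimal sketch of that conversion for a single value, assuming the dates are in day/month/year order (an assumption, not stated in the source); <code>parse_reading</code> is a hypothetical helper.

```python
from datetime import datetime

def parse_reading(date_str, time_str, value_str):
    """Hypothetical helper: convert one raw Date/Time/value triple.

    Assumes day/month/year dates, 'HH.MM.SS' times, and comma decimals.
    """
    day = datetime.strptime(date_str, "%d/%m/%Y").date()
    clock = datetime.strptime(time_str, "%H.%M.%S").time()
    value = float(value_str.replace(",", "."))
    return day, clock, value

print(parse_reading("10/03/2004", "18.00.00", "2,6"))
# (datetime.date(2004, 3, 10), datetime.time(18, 0), 2.6)
```
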

More information about the Air Quality Data Set and the UCI Machine Learning Repository is available in the [References](#references) section.

To simplify data preparation, an SQL version of the Air Quality Data Set has been prepared and is available as [AirQualityUCI.sql](https://github.com/tensorflow/io/blob/master/docs/tutorials/postgresql/AirQualityUCI.sql).

The statement to create the table is:

```
CREATE TABLE AirQualityUCI (
  Date DATE,
  Time TIME,
  CO REAL,
  PT08S1 INT,
  NMHC REAL,
  C6H6 REAL,
  PT08S2 INT,
  NOx REAL,
  PT08S3 INT,
  NO2 REAL,
  PT08S4 INT,
  PT08S5 INT,
  T REAL,
  RH REAL,
  AH REAL
);
```

The following commands download the SQL file, then create the table in the database and populate it:

In [None]:
!curl -s -OL https://github.com/tensorflow/io/raw/master/docs/tutorials/postgresql/AirQualityUCI.sql

!PGPASSWORD=$DEMO_DATABASE_PASS psql -q -h $DEMO_DATABASE_HOST -p $DEMO_DATABASE_PORT -U $DEMO_DATABASE_USER -d $DEMO_DATABASE_NAME -f AirQualityUCI.sql

### Interact with the database from Python code

You'll interact with the database from your Python code through psycopg2, a PostgreSQL database adapter for Python.

You can install it via <code>pip</code>, as follows:

In [None]:
!pip install psycopg2

In [None]:
import psycopg2

# Connect to the PostgreSQL database server
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(host="localhost",
                        database="airq_db",
                        user="postgres",
                        password="postgres")
# Create a cursor to execute statements
cur = conn.cursor()

# Display the PostgreSQL database server version
print('PostgreSQL database version:')
cur.execute('SELECT version()')
db_version = cur.fetchone()
print(db_version)

# Retrieve every row of the AirQualityUCI table
cur.execute("SELECT * FROM AirQualityUCI")
# For a SELECT, psycopg2 sets rowcount to the number of rows returned
print("Number of rows:", cur.rowcount)

# Fetch all rows as a list of tuples and preview the first few
ac_row = cur.fetchall()
for row in ac_row[:5]:
    print(row)

# Get the column names from the cursor description
column_names = [desc[0] for desc in cur.description]

# Close the cursor and the connection
cur.close()
conn.close()

In [None]:
# Convert the query results to a pandas DataFrame
import pandas as pd

air_quality_UCI = pd.DataFrame(ac_row, columns=column_names)
air_quality_UCI
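
Fetching rows with <code>fetchone()</code> or <code>fetchall()</code> and reading column names from <code>cursor.description</code> are part of the Python DB-API 2.0 interface (PEP 249), which psycopg2 implements. The sketch below shows the same pattern with the standard library's <code>sqlite3</code> module so that it runs without a server; the <code>readings</code> table and its values are made up. Pairing each row with the column names is essentially what <code>pd.DataFrame(rows, columns=column_names)</code> does.

```python
import sqlite3

# In-memory SQLite stand-in; table name and values are hypothetical
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE readings (day TEXT, co REAL)")
cur.executemany("INSERT INTO readings VALUES (?, ?)",
                [("2004-03-10", 2.6), ("2004-03-10", 2.0)])

cur.execute("SELECT day, co FROM readings ORDER BY co")
# fetchall() returns every remaining row as a list of tuples
rows = cur.fetchall()
# description has one entry per result column; the first item is the name
column_names = [desc[0] for desc in cur.description]
records = [dict(zip(column_names, row)) for row in rows]
print(records)  # [{'day': '2004-03-10', 'co': 2.0}, {'day': '2004-03-10', 'co': 2.6}]
cur.close()
conn.close()
```
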

# Create rezago_social_db 

In [None]:
# Create (or recreate) a database named `rezago_social_db`
!sudo -u postgres psql -U postgres -c 'DROP DATABASE IF EXISTS rezago_social_db;'
!sudo -u postgres psql -U postgres -c 'CREATE DATABASE rezago_social_db;'

In [None]:
%env DEMO_DATABASE_NAME=rezago_social_db

In [None]:
!curl -s -OL https://raw.githubusercontent.com/milocortes/diplomado_ciencia_datos_mide/main/datos/sql/rezago_social_coneval.sql
!curl -s -OL https://raw.githubusercontent.com/milocortes/diplomado_ciencia_datos_mide/main/datos/rezago_coneval_2020.csv

In [None]:
!PGPASSWORD=$DEMO_DATABASE_PASS psql -q -h $DEMO_DATABASE_HOST -p $DEMO_DATABASE_PORT -U $DEMO_DATABASE_USER -d $DEMO_DATABASE_NAME -a -f rezago_social_coneval.sql

In [None]:
# Connect to the PostgreSQL database server
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(host="localhost",
                        database="rezago_social_db",
                        user="postgres",
                        password="postgres")
# Create a cursor to execute statements
cur = conn.cursor()

# Display the PostgreSQL database server version
print('PostgreSQL database version:')
cur.execute('SELECT version()')
db_version = cur.fetchone()
print(db_version)

# Retrieve every row of the tb_rezago_social table
cur.execute("SELECT * FROM tb_rezago_social")
# For a SELECT, psycopg2 sets rowcount to the number of rows returned
print("Number of rows:", cur.rowcount)

# Fetch all rows as a list of tuples and preview the first few
ac_row = cur.fetchall()
for row in ac_row[:5]:
    print(row)

# Get the column names from the cursor description
column_names = [desc[0] for desc in cur.description]

# Close the cursor and the connection
cur.close()
conn.close()

In [None]:
# Convert the query results to a pandas DataFrame
rezago_social = pd.DataFrame(ac_row,columns=column_names)
rezago_social

# SIR Model

Let:

* $S(t)$: the number of individuals at time $t$ who are susceptible to infection. These individuals belong to the susceptible category, $S$.
* $I(t)$: the number of individuals at time $t$ in the infected category, $I$.
* $R(t)$: the number of individuals at time $t$ in the removed category, $R$, that is, those who died from the disease or who were infected and acquired immunity.

The three differential equations of the model are:


\begin{equation}
  \dot{S}(t)= - \beta S(t)I(t)
\end{equation}

\begin{equation}
  \dot{I}(t)= \beta S(t)I(t)-\gamma I(t) 
\end{equation}

\begin{equation}
  \dot{R}(t)= \gamma I(t)
\end{equation}
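
Adding the three equations shows that the total population does not change over time, because the infection and removal terms cancel pairwise:

\begin{equation}
  \dot{S}(t)+\dot{I}(t)+\dot{R}(t) = -\beta S(t)I(t) + \beta S(t)I(t) - \gamma I(t) + \gamma I(t) = 0
\end{equation}

so $N = S(t) + I(t) + R(t) = S(0) + I(0) + R(0)$ for all $t$.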


This differential equation model is known as an **SIR** model.

The inputs to the model are the parameter values $\beta$ (the transmission rate) and $\gamma$ (the removal rate), together with the initial conditions $S(0) = S_0$, $I(0) = I_0$, and $R(0) = R_0$.
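
Evaluating the second equation at $t = 0$ shows how these inputs decide whether an outbreak starts:

\begin{equation}
  \dot{I}(0) = \beta S_0 I_0 - \gamma I_0 = I_0\left(\beta S_0 - \gamma\right)
\end{equation}

so, with $I_0 > 0$, the number of infected individuals initially grows only if $\beta S_0 > \gamma$. With the values used in the code below ($\beta = 0.0008$, $S_0 = 1500$, $\gamma = 0.1$), $\beta S_0 = 1.2 > 0.1$, so $I(t)$ starts by increasing.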

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint

## Parameters
β = 0.0008   # transmission rate
γ = 0.1      # removal rate

# Initial conditions for S, I, R
S_0 = 1500
I_0 = 1
R_0 = 0

x_0 = S_0, I_0, R_0

def F(x, t):
    """
    Time derivative of the state vector.
        * x is the state vector (array) [S, I, R]
        * t is time (scalar)
    Note: β and γ are read from the global variables, so reassigning β
    (as the later cells do) changes the dynamics.
    """
    S, I, R = x

    # Time derivatives
    dS = -β*S*I
    dI = β*S*I - γ*I
    dR = γ*I

    return dS, dI, dR

## Define a function that generates the trajectories
def solve_path(t_vec, x_init=x_0):
    # odeint returns one row per time point; transpose to get one row per variable
    S_path, I_path, R_path = odeint(F, x_init, t_vec).transpose()

    return S_path, I_path, R_path

## Solve over 50 time units (t = 0, 1, ..., 49)
t_length = 50
t_vec = np.arange(0, t_length, 1)
S_end, I_end, R_end = solve_path(t_vec)
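
As a dependency-free cross-check of the <code>odeint</code> solution above, the same equations can be stepped with a fixed-step forward-Euler scheme. This is a different, simpler integration method swapped in for illustration, not the solver used in this notebook; <code>euler_sir</code> is a hypothetical helper. Because each step moves exactly the same amount out of one category and into the next, $S + I + R$ stays constant up to rounding, matching the conservation property of the model.

```python
def euler_sir(beta, gamma, s0, i0, r0, h, steps):
    """Forward-Euler sketch of the SIR equations (not the odeint solver).

    Returns lists S, I, R of length steps + 1, spaced h time units apart.
    """
    S, I, R = [s0], [i0], [r0]
    for _ in range(steps):
        s, i, r = S[-1], I[-1], R[-1]
        new_infections = beta * s * i * h   # flow S -> I during one step
        new_removals = gamma * i * h        # flow I -> R during one step
        S.append(s - new_infections)
        I.append(i + new_infections - new_removals)
        R.append(r + new_removals)
    return S, I, R

# Same parameter values as the cell above, with a small step h for accuracy
S_e, I_e, R_e = euler_sir(0.0008, 0.1, 1500, 1, 0, h=0.01, steps=4900)
total = [s + i + r for s, i, r in zip(S_e, I_e, R_e)]
print(max(total) - min(total))                 # close to 0: population conserved
print(max(I_e), I_e.index(max(I_e)) * 0.01)    # peak infected and when it occurs
```
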


In [None]:
# Plot the trajectories against time
plt.plot(t_vec, S_end, label="S")
plt.plot(t_vec, I_end, label="I")
plt.plot(t_vec, R_end, label="R")
plt.xlabel("t")
plt.ylabel("Individuals")
plt.legend(loc='upper right')
plt.show()

In [None]:
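# Sweep over several values of β; F reads the global β, so each loop
# iteration solves the model with a different transmission rate
betas = np.arange(0.0001, 0.0009, 0.00008)

for β in betas:
    S_end, I_end, R_end = solve_path(t_vec)
    plt.plot(t_vec, I_end, label=f"β = {β:.5f}")

# Add the legend once, after all curves are drawn
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
plt.show()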

In [None]:
# Create (or recreate) a database named `sir_db`, then download and run `sir_db.sql`
!sudo -u postgres psql -U postgres -c 'DROP DATABASE IF EXISTS sir_db;'
!sudo -u postgres psql -U postgres -c 'CREATE DATABASE sir_db;'
%env DEMO_DATABASE_NAME=sir_db
!curl -s -OL https://raw.githubusercontent.com/milocortes/diplomado_ciencia_datos_mide/main/datos/sql/sir_db.sql
!PGPASSWORD=$DEMO_DATABASE_PASS psql -q -h $DEMO_DATABASE_HOST -p $DEMO_DATABASE_PORT -U $DEMO_DATABASE_USER -d $DEMO_DATABASE_NAME -f sir_db.sql

In [None]:
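def build_sql(run_id, beta, gamma, suceptible, infected, recovered, day_time):
    # Build an INSERT statement for one row of tb_sir by formatting the values
    # directly into the SQL text (the column is spelled `suceptible` in tb_sir)
    sql = """INSERT INTO tb_sir(run_id,beta,gamma,suceptible,infected,recovered,day_time) VALUES({},{},{},{},{},{},{});""".format(run_id, beta, gamma, suceptible, infected, recovered, day_time)
    return sql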
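
<code>build_sql</code> pastes the values into the SQL text with <code>str.format</code>. DB-API drivers can also receive the values separately as query parameters, which avoids quoting problems and SQL injection: psycopg2 uses <code>%s</code> placeholders (<code>cur.execute(sql, params)</code>), while the standard library's <code>sqlite3</code>, used below so the sketch runs without a server, uses <code>?</code>. The sketch also sends a batch of rows with <code>executemany</code> and commits once. The table keeps only some of the <code>tb_sir</code> columns, and the rows are made up.

```python
import sqlite3

# SQLite stand-in for tb_sir with a subset of its columns (illustration only)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tb_sir (run_id INTEGER, beta REAL, infected REAL, day_time INTEGER)")

# The driver fills in the placeholders; values never become part of the SQL text.
# sqlite3 uses ? placeholders; psycopg2 uses %s for the same purpose.
sql = "INSERT INTO tb_sir (run_id, beta, infected, day_time) VALUES (?, ?, ?, ?)"
rows = [(0, 0.0001, 1.0, 0), (0, 0.0001, 1.2, 1), (0, 0.0001, 1.4, 2)]
cur.executemany(sql, rows)
conn.commit()  # one commit for the whole batch

cur.execute("SELECT COUNT(*) FROM tb_sir WHERE run_id = ?", (0,))
n_rows = cur.fetchone()[0]
print(n_rows)  # 3
conn.close()
```
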

In [None]:
# Connect to the PostgreSQL database server
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(host="localhost",
                        database="sir_db",
                        user="postgres",
                        password="postgres")
# Create a cursor to execute statements
cur = conn.cursor()

betas = np.arange(0.0001, 0.0009, 0.00008)

# Solve the model once per β (F reads the global β) and store every time step
for run_id, β in enumerate(betas):
    print(run_id)
    S_end, I_end, R_end = solve_path(t_vec)
    for t, (s, i, r) in enumerate(zip(S_end, I_end, R_end)):
        # Execute the INSERT statement for time step t
        sql = build_sql(run_id, β, γ, s, i, r, t)
        cur.execute(sql)
    # Commit the changes for this run to the database
    conn.commit()

# Close the cursor and the connection
cur.close()
conn.close()

In [None]:
# Connect to the PostgreSQL database server
print('Connecting to the PostgreSQL database...')
conn = psycopg2.connect(host="localhost",
                        database="sir_db",
                        user="postgres",
                        password="postgres")
# Create a cursor to execute statements
cur = conn.cursor()

# Display the PostgreSQL database server version
print('PostgreSQL database version:')
cur.execute('SELECT version()')
db_version = cur.fetchone()
print(db_version)

# Retrieve every row of the tb_sir table
cur.execute("SELECT * FROM tb_sir")
# For a SELECT, psycopg2 sets rowcount to the number of rows returned
print("Number of rows:", cur.rowcount)

# Fetch all rows as a list of tuples and preview the first few
ac_row = cur.fetchall()
for row in ac_row[:5]:
    print(row)

# Get the column names from the cursor description
column_names = [desc[0] for desc in cur.description]

# Close the cursor and the connection
cur.close()
conn.close()

In [None]:
# Convert the query results to a pandas DataFrame
sir = pd.DataFrame(ac_row,columns=column_names)
sir
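
Instead of pulling every stored time step into Python, the database can summarize each run. A minimal sketch that computes the peak number of infected individuals per run with <code>GROUP BY</code> and <code>MAX</code>; the query is standard SQL and should read the same against <code>tb_sir</code> in PostgreSQL, but here it runs on an in-memory SQLite stand-in with a subset of the columns and made-up rows.

```python
import sqlite3

# SQLite stand-in for tb_sir with made-up rows for two runs (illustration only)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tb_sir (run_id INTEGER, beta REAL, infected REAL, day_time INTEGER)")
cur.executemany(
    "INSERT INTO tb_sir VALUES (?, ?, ?, ?)",
    [(0, 0.0001, 1.0, 0), (0, 0.0001, 3.0, 1), (0, 0.0001, 2.0, 2),
     (1, 0.0002, 1.0, 0), (1, 0.0002, 7.0, 1), (1, 0.0002, 4.0, 2)],
)

# One row per run: the largest number of infected individuals over all time steps
cur.execute("""
    SELECT run_id, beta, MAX(infected) AS peak_infected
    FROM tb_sir
    GROUP BY run_id, beta
    ORDER BY run_id
""")
peaks = cur.fetchall()
print(peaks)  # [(0, 0.0001, 3.0), (1, 0.0002, 7.0)]
conn.close()
```
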