# Delo s podatkovnimi bazami in SQL

## Introduction to Databases

The pandas workflow works well when:
- the **data fits in memory** (a few gigabytes but not terabytes)
- the **data is relatively static** (doesn't need to be loaded into memory every minute because the data has changed)
- only a **single person is accessing** the data (shared access to memory is difficult)
- **security isn't important** (security is critical for company scale production situations)

### What is a database?

<img src="images/dbms.png">

<img src="images/database_workflow.svg">

## SQLite

https://www.sqlite.org/index.html

SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine.

SQLite is the most popular database in the world and is lightweight enough that the SQLite DBMS is included as a module in Python.

### SQLite vs Other SQL databases (PostgreSQL, MySQL, SQL Server)

### Commands

- [Command Line Shell For SQLite](https://sqlite.org/cli.html)

- `cd data`
- `sqlite3 logs.db`

- For a **listing of the available dot commands**, you can enter `.help` any time. 
    - `sqlite>.help`

- Run `.show` command to see **default setting** for your SQLite command prompt
    - `sqlite>.show`

- To specify that we want to **return the first 5 rows from weblog**, we need to run the following SQL query:
    - `sqlite> SELECT * FROM weblog LIMIT 5;`

- You can use the following sequence of dot commands to **format your output**.
    - `sqlite>.header on`
    - `sqlite>.mode column`
    - `sqlite>.timer on`
    - `sqlite> SELECT * FROM weblog LIMIT 5;`

- To see a **list of the tables in the database**, you can enter `.tables`.
    - `sqlite>.tables`

- The `.schema` command shows the **complete schema for the database**, or for a single table if an optional tablename argument is provided:
    - `sqlite>.schema`
    - `sqlite>.schema weblog`

## Introduction to SQL

<img src="images/sql_table.svg">

- `SELECT * FROM weblog LIMIT 5;`

<div class="alert alert-block alert-info">
<b>Vaja: </b> Write a SQL query that returns the first 15 rows from weblog.
</div>

<div class="alert alert-block alert-info">
<b>Vaja: </b> Write a SQL query that returns the logs where the ip is 10.131.2.1. Only return the ip and timestamp columns (in that order) and don't limit the number of rows returned.
</div>

<div class="alert alert-block alert-info">
<b>Vaja: </b> Count the number of rows returned from the previous query. <a href="https://www.w3schools.com/sql/sql_count_avg_sum.asp">Help</a>
</div>

Here are the comparison operators we can use:
- Less than: `<`
- Less than or equal to: `<=`
- Greater than: `>`
- Greater than or equal to: `>=`
- Equal to: `=`
- Not equal to: `!=`

```SQL
SELECT * FROM weblog
WHERE ip = "10.131.2.1" AND timestamp < "2017-11-29 13:47:00";
```

```SQL
SELECT * FROM weblog
WHERE ip = "10.131.2.1" OR status = 304;
```

```SQL
SELECT * FROM weblog
WHERE (ip = "10.131.2.1" AND status = 304) OR (method = "POST");
```

```SQL
SELECT * FROM weblog
WHERE (ip = "10.131.2.1" AND status = 304) OR (method = "POST")
ORDER BY timestamp DESC;
```

## Work with the SQLite database using raw Python

- [sqlite3 Python module](https://docs.python.org/3/library/sqlite3.html): The sqlite3 module provides an SQL interface compliant with the DB-API 2.0 specification described by PEP 249, and requires SQLite 3.7.15 or newer.

Import a CSV file data to a SQL database:

In [1]:
!head data/weblogs_clean.csv

IP,Time,Staus,Method
10.128.2.1,29/Nov/2017:06:58:55,200,GET
10.128.2.1,29/Nov/2017:06:59:02,302,POST
10.128.2.1,29/Nov/2017:06:59:03,200,GET
10.131.2.1,29/Nov/2017:06:59:04,200,GET
10.130.2.1,29/Nov/2017:06:59:06,200,GET
10.130.2.1,29/Nov/2017:06:59:19,200,GET
10.128.2.1,29/Nov/2017:06:59:19,200,GET
10.131.2.1,29/Nov/2017:06:59:19,200,GET
10.131.2.1,29/Nov/2017:06:59:30,200,GET


In [4]:
import sqlite3
from sqlite3 import OperationalError
from datetime import datetime
import csv

# create a connection to a databse
con = sqlite3.connect("data/my-weblogs.db")

# create a new table
create_table_query = """
    CREATE TABLE logs (
            id INTEGER PRIMARY KEY,
            ip VARCHAR(16),
            timestamp DATETIME,
            status INTEGER,
            method VARCHAR(20)
    );"""

try:
    con.execute(create_table_query)
    con.commit()
except OperationalError as err:
    print(f"Skippig: {err}")

Skippig: table logs already exists


In [5]:
# Then, insert rows of data:
with open('data/weblogs_clean.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    stmt = "INSERT INTO logs VALUES(NULL, ?, ?, ?, ?)"
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            timestamp_datetime_format = datetime.strptime(row[1], "%d/%b/%Y:%H:%M:%S")
            row[1] = timestamp_datetime_format
            con.execute(stmt, row)
            con.commit()
    print("DONE.")

Column names are IP, Time, Staus, Method
DONE.


In [22]:
# Close the connection.
con.close()

**SQLite Python: Querying Data**

In [7]:
con = sqlite3.connect("data/my-weblogs.db")
cursor = con.execute("SELECT * FROM logs LIMIT 5;")
rows = cursor.fetchall()
print(rows)
con.close()

[(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET'), (2, '10.128.2.1', '2017-11-29 06:59:02', 302, 'POST'), (3, '10.128.2.1', '2017-11-29 06:59:03', 200, 'GET'), (4, '10.131.2.1', '2017-11-29 06:59:04', 200, 'GET'), (5, '10.130.2.1', '2017-11-29 06:59:06', 200, 'GET')]


In [8]:
con = sqlite3.connect("data/my-weblogs.db")
cursor = con.execute("SELECT * FROM logs LIMIT 5;")
rows = cursor.fetchmany(3)
print(rows)
con.close()

[(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET'), (2, '10.128.2.1', '2017-11-29 06:59:02', 302, 'POST'), (3, '10.128.2.1', '2017-11-29 06:59:03', 200, 'GET')]


In [21]:
con = sqlite3.connect("data/my-weblogs.db")
cursor = con.execute("SELECT * FROM logs LIMIT 5;")
rows = cursor.fetchone()
print(rows)
con.close()

(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET')


- PostgreSQL:
    - `psycopg2`: [Psycopg](https://pypi.org/project/psycopg2/) is the most popular PostgreSQL database adapter for the Python programming language.
- Microsoft SQL Server:
    - `pyodbc`: [pyodbc](https://pypi.org/project/pyodbc/) is an open source Python module that makes accessing ODBC databases simple. It implements the DB API 2.0 specification but is packed with even more Pythonic convenience.
- MySQL:
    - `PyMySQL`: [PyMySQL](https://pypi.org/project/PyMySQL/) package contains a pure-Python MySQL client library, based on PEP 249.

## SQLAlchemy

- https://www.sqlalchemy.org/
- [ORM Quick Start](https://docs.sqlalchemy.org/en/14/orm/quickstart.html)
- [SQLAlchemy 1.4 / 2.0 Tutorial](https://docs.sqlalchemy.org/en/14/tutorial/index.html)

Installation: `pip install SQLAlchemy`

Quick check to verify that we are on version 1.4 of SQLAlchemy:

In [1]:
import sqlalchemy

sqlalchemy.__version__

'2.0.25'

### Establishing Connectivity - the Engine

In [12]:
from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///data/my-weblogs.db", echo=True, future=True)

The main argument to `create_engine` is a string URL: `dialect+driver://username:password@host:port/database`

**Database Urls Examples**

`dialect+driver://username:password@host:port/database`

PostgreSQL:

In [None]:
# psycopg2 driver
engine = create_engine('postgresql+psycopg2://scott:tiger@localhost/mydatabase', echo=True, future=True)

MySQL:

In [None]:
# PyMySQL driver
engine = create_engine('mysql+pymysql://scott:tiger@localhost/foo', echo=True, future=True)

SQLite:

In [None]:
# sqlite://<nohostname>/<path>
# where <path> is relative:
engine = create_engine('sqlite:///data/foo.db', echo=True, future=True)
engine = create_engine("sqlite+pysqlite:///:memory:", echo=True, future=True)

### Working with Transactions and the DBAPI

In [4]:
from sqlalchemy import text

print(text("SELECT * FROM 'logs' LIMIT 2;"))

SELECT * FROM 'logs' LIMIT 2;


In [5]:
from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///data/my-weblogs.db", echo=False, future=True)

with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM 'logs' LIMIT 2;"))
    print(type(result))
    print(result.all())



<class 'sqlalchemy.engine.cursor.CursorResult'>
[(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET'), (2, '10.128.2.1', '2017-11-29 06:59:02', 302, 'POST')]


In [6]:
engine = create_engine("sqlite+pysqlite:///data/my-weblogs.db", echo=False, future=True)

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE some_table_3 (x int, y int)"))
    conn.execute(
        text("INSERT INTO some_table_3 (x, y) VALUES (:x, :y)"),
        [{"x": 1, "y": 1}, {"x": 2, "y": 4}],
    )
    conn.commit()

In [7]:
with engine.begin() as conn:
    conn.execute(
        text("INSERT INTO some_table (x, y) VALUES (:x, :y)"),
        [{"x": 6, "y": 8}, {"x": 9, "y": 10}],
    )

**Fetching Rows**

In [8]:
with engine.connect() as conn:
    result = conn.execute(text("SELECT * FROM logs LIMIT 5;"))
    for row in result:
        print(f"IP: {row.ip}  Medhod: {row.method}")

IP: 10.128.2.1  Medhod: GET
IP: 10.128.2.1  Medhod: POST
IP: 10.128.2.1  Medhod: GET
IP: 10.131.2.1  Medhod: GET
IP: 10.130.2.1  Medhod: GET


In [9]:
with engine.connect() as con:
    rs = con.execute(text('SELECT * FROM logs LIMIT 5;'))        
    data = rs.fetchone()
    print(data)

(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET')


In [10]:
with engine.connect() as conn:
    rs = conn.execute(text('SELECT * FROM logs LIMIT 5;'))       
    data1 = rs.fetchone()
    data2 = rs.fetchone()
    
    print(data1)
    print(data2)

(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET')
(2, '10.128.2.1', '2017-11-29 06:59:02', 302, 'POST')


In [11]:
with engine.connect() as conn:
    rs = conn.execute(text('SELECT * FROM logs LIMIT 5;'))        
    data = rs.fetchmany(3)
    print(data)

[(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET'), (2, '10.128.2.1', '2017-11-29 06:59:02', 302, 'POST'), (3, '10.128.2.1', '2017-11-29 06:59:03', 200, 'GET')]


In [12]:
with engine.connect() as conn:
    rs = conn.execute(text('SELECT * FROM logs LIMIT 5;'))        
    data = rs.fetchall()
    print(data)

[(1, '10.128.2.1', '2017-11-29 06:58:55', 200, 'GET'), (2, '10.128.2.1', '2017-11-29 06:59:02', 302, 'POST'), (3, '10.128.2.1', '2017-11-29 06:59:03', 200, 'GET'), (4, '10.131.2.1', '2017-11-29 06:59:04', 200, 'GET'), (5, '10.130.2.1', '2017-11-29 06:59:06', 200, 'GET')]


### Working with Database Metadata

In [13]:
from sqlalchemy import MetaData

metadata_obj = MetaData()

In [14]:
from sqlalchemy import Table, Column, Integer, String

user_table = Table(
    "user_account",
    metadata_obj,
    Column("id", Integer, primary_key=True),
    Column("name", String(30), nullable=False),
    Column("fullname", String, nullable=False),
)

In [15]:
user_table.c.name

Column('name', String(length=30), table=<user_account>, nullable=False)

In [16]:
user_table.c.keys()

['id', 'name', 'fullname']

In [17]:
engine = create_engine("sqlite+pysqlite:///data/users.db", echo=True, future=True)
metadata_obj.create_all(engine)

2024-03-05 18:42:04,470 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-03-05 18:42:04,474 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("user_account")
2024-03-05 18:42:04,476 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-03-05 18:42:04,481 INFO sqlalchemy.engine.Engine COMMIT


### Working with Data

In [18]:
from sqlalchemy import insert

stmt1 = insert(user_table).values(name="matic", fullname="matic lalalala")
stmt2 = insert(user_table).values(name="jaka", fullname="jaka tatatatat")

In [19]:
with engine.connect() as conn:
    conn.execute(stmt1)
    conn.execute(stmt2)
    conn.commit()

2024-03-05 18:43:22,985 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-03-05 18:43:22,988 INFO sqlalchemy.engine.Engine INSERT INTO user_account (name, fullname) VALUES (?, ?)
2024-03-05 18:43:22,991 INFO sqlalchemy.engine.Engine [generated in 0.00615s] ('matic', 'matic lalalala')
2024-03-05 18:43:22,996 INFO sqlalchemy.engine.Engine INSERT INTO user_account (name, fullname) VALUES (?, ?)
2024-03-05 18:43:22,998 INFO sqlalchemy.engine.Engine [cached since 0.01343s ago] ('jaka', 'jaka tatatatat')
2024-03-05 18:43:23,001 INFO sqlalchemy.engine.Engine COMMIT


In [20]:
from sqlalchemy import select

In [21]:
stmt = select(user_table).where(user_table.c.name == "matic")
print(stmt)

SELECT user_account.id, user_account.name, user_account.fullname 
FROM user_account 
WHERE user_account.name = :name_1


In [22]:
with engine.connect() as conn:
    for row in conn.execute(stmt):
        print(row)

2024-03-05 18:45:12,061 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-03-05 18:45:12,063 INFO sqlalchemy.engine.Engine SELECT user_account.id, user_account.name, user_account.fullname 
FROM user_account 
WHERE user_account.name = ?
2024-03-05 18:45:12,065 INFO sqlalchemy.engine.Engine [generated in 0.00431s] ('matic',)
(1, 'matic', 'matic lalalala')
(3, 'matic', 'matic lalalala')
2024-03-05 18:45:12,068 INFO sqlalchemy.engine.Engine ROLLBACK


## Working with databases and Pandas

- [SQL queries](https://pandas.pydata.org/docs/user_guide/io.html#sql-queries)

<table border="1" class="longtable docutils">
<colgroup>
<col width="10%">
<col width="90%">
</colgroup>
<tbody valign="top">
<tr class="row-odd"><td><a class="reference internal" href="../reference/api/pandas.read_sql_table.html#pandas.read_sql_table" title="pandas.read_sql_table"><code class="xref py py-obj docutils literal notranslate"><span class="pre">read_sql_table</span></code></a>(table_name,&nbsp;con[,&nbsp;schema,&nbsp;…])</td>
<td>Read SQL database table into a DataFrame.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="../reference/api/pandas.read_sql_query.html#pandas.read_sql_query" title="pandas.read_sql_query"><code class="xref py py-obj docutils literal notranslate"><span class="pre">read_sql_query</span></code></a>(sql,&nbsp;con[,&nbsp;index_col,&nbsp;…])</td>
<td>Read SQL query into a DataFrame.</td>
</tr>
<tr class="row-odd"><td><a class="reference internal" href="../reference/api/pandas.read_sql.html#pandas.read_sql" title="pandas.read_sql"><code class="xref py py-obj docutils literal notranslate"><span class="pre">read_sql</span></code></a>(sql,&nbsp;con[,&nbsp;index_col,&nbsp;…])</td>
<td>Read SQL query or database table into a DataFrame.</td>
</tr>
<tr class="row-even"><td><a class="reference internal" href="../reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql" title="pandas.DataFrame.to_sql"><code class="xref py py-obj docutils literal notranslate"><span class="pre">DataFrame.to_sql</span></code></a>(self,&nbsp;name,&nbsp;con[,&nbsp;schema,&nbsp;…])</td>
<td>Write records stored in a DataFrame to a SQL database.</td>
</tr>
</tbody>
</table>

In [54]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")

### Writing a DataFrame to a SQL database

In [55]:
import datetime

c = ["id", "Date", "Col_1", "Col_2", "Col_3"]

d = [
    (26, datetime.datetime(2010, 10, 18), "X", 27.5, True),
    (42, datetime.datetime(2010, 10, 19), "Y", -12.5, False),
    (63, datetime.datetime(2010, 10, 20), "Z", 5.73, True),
]


data = pd.DataFrame(d, columns=c)

data

Unnamed: 0,id,Date,Col_1,Col_2,Col_3
0,26,2010-10-18,X,27.5,True
1,42,2010-10-19,Y,-12.5,False
2,63,2010-10-20,Z,5.73,True


In [56]:
data.to_sql("data", engine)

3

In [57]:
data.to_sql("data_chunked", engine, chunksize=500)

3

In [58]:
from sqlalchemy import inspect

inspector = inspect(engine)
print(inspector.get_table_names())

['data', 'data_chunked']


### SQL data types

In [59]:
from sqlalchemy.types import String

data.to_sql("data_dtype", engine, dtype={"Col_1": String})

3

**if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’**

How to behave if the table already exists.
- fail: Raise a ValueError.
- replace: Drop the table before inserting new values.
- append: Insert new values to the existing table.

In [60]:
# generate error
#data.to_sql("data", engine, if_exists="fail")

In [61]:
data.to_sql("data", engine, if_exists="append")

3

In [62]:
data.to_sql("data", engine, if_exists="replace")

3

### Importing data from a SQL database table

- [read_sql_table](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table)

In [63]:
data = pd.read_sql_table("data", engine)

In [64]:
data

Unnamed: 0,index,id,Date,Col_1,Col_2,Col_3
0,0,26,2010-10-18,X,27.5,True
1,1,42,2010-10-19,Y,-12.5,False
2,2,63,2010-10-20,Z,5.73,True


In [65]:
data.dtypes

index             int64
id                int64
Date     datetime64[ns]
Col_1            object
Col_2           float64
Col_3              bool
dtype: object

In [66]:
pd.read_sql_table("data", engine, index_col="id")

Unnamed: 0_level_0,index,Date,Col_1,Col_2,Col_3
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
26,0,2010-10-18,X,27.5,True
42,1,2010-10-19,Y,-12.5,False
63,2,2010-10-20,Z,5.73,True


In [67]:
pd.read_sql_table("data", engine, parse_dates=["Date"])

Unnamed: 0,index,id,Date,Col_1,Col_2,Col_3
0,0,26,2010-10-18,X,27.5,True
1,1,42,2010-10-19,Y,-12.5,False
2,2,63,2010-10-20,Z,5.73,True


### Querying a SQL database

In [68]:
pd.read_sql_query("SELECT * FROM data", engine)

Unnamed: 0,index,id,Date,Col_1,Col_2,Col_3
0,0,26,2010-10-18 00:00:00.000000,X,27.5,1
1,1,42,2010-10-19 00:00:00.000000,Y,-12.5,0
2,2,63,2010-10-20 00:00:00.000000,Z,5.73,1


In [69]:
pd.read_sql_query("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", engine)

Unnamed: 0,id,Col_1,Col_2
0,42,Y,-12.5


In [70]:
df = pd.DataFrame(np.random.randn(20, 3), columns=list("abc"))
df.head()

Unnamed: 0,a,b,c
0,0.899055,-1.153637,1.012166
1,1.077618,-0.583117,0.609518
2,1.521744,0.072881,0.20975
3,2.358321,-0.899797,0.615149
4,0.018241,-0.524184,-1.589565


In [71]:
df.to_sql("data_chunks", engine, index=False)

20

In [72]:
for chunk in pd.read_sql_query("SELECT * FROM data_chunks", engine, chunksize=5):
    print(chunk)

          a         b         c
0  0.899055 -1.153637  1.012166
1  1.077618 -0.583117  0.609518
2  1.521744  0.072881  0.209750
3  2.358321 -0.899797  0.615149
4  0.018241 -0.524184 -1.589565
          a         b         c
0  0.250138  0.954226  0.034682
1  0.151545 -0.936761 -0.520585
2  1.900041  1.086918 -0.187806
3 -0.474721 -0.612344  1.114790
4  0.353337  0.591369 -1.960167
          a         b         c
0 -1.283084 -0.922158  1.309969
1 -0.958558  0.305172  0.589642
2 -0.674304 -0.351499  0.841406
3  0.980569 -0.326798  0.252793
4  0.786163  0.272473  1.192571
          a         b         c
0  0.429923  0.286227  0.174444
1  1.059870  0.813949 -0.714172
2  0.707116 -0.236386 -2.144746
3 -1.081311  1.216574 -0.838694
4 -1.180923 -0.086709 -0.795342


In [73]:
import sqlalchemy

pd.read_sql_query(sqlalchemy.text("SELECT * FROM data where Col_1=:col1"), engine, params={"col1": "X"})

Unnamed: 0,index,id,Date,Col_1,Col_2,Col_3
0,0,26,2010-10-18 00:00:00.000000,X,27.5,1


In [76]:
from sqlalchemy import Table, Column
from sqlalchemy import MetaData

metadata = MetaData()

data_table = Table(
    "data",
    metadata,
    Column("index", sqlalchemy.Integer),
    Column("Date", sqlalchemy.DateTime),
    Column("Col_1", sqlalchemy.String),
    Column("Col_2", sqlalchemy.Float),
    Column("Col_3", sqlalchemy.Boolean),
)

In [77]:
pd.read_sql_query(data_table.select().where(data_table.c.Col_1 == "X"), engine)

Unnamed: 0,index,Date,Col_1,Col_2,Col_3
0,0,2010-10-18,X,27.5,True


## Primer: Uvoz podatkov iz CSV dokumenta v SQL bazo

In [78]:
weblog_df = pd.read_csv('data/weblogs_clean.csv')

In [79]:
weblog_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   IP      500 non-null    object
 1   Time    500 non-null    object
 2   Staus   500 non-null    int64 
 3   Method  500 non-null    object
dtypes: int64(1), object(3)
memory usage: 15.8+ KB


In [80]:
weblog_df.head()

Unnamed: 0,IP,Time,Staus,Method
0,10.128.2.1,29/Nov/2017:06:58:55,200,GET
1,10.128.2.1,29/Nov/2017:06:59:02,302,POST
2,10.128.2.1,29/Nov/2017:06:59:03,200,GET
3,10.131.2.1,29/Nov/2017:06:59:04,200,GET
4,10.130.2.1,29/Nov/2017:06:59:06,200,GET


[Pretvorbe](https://www.programiz.com/python-programming/datetime/strftime)

In [81]:
weblog_df['Time'] = pd.to_datetime(weblog_df['Time'], format='%d/%b/%Y:%H:%M:%S')

In [82]:
weblog_df.head()

Unnamed: 0,IP,Time,Staus,Method
0,10.128.2.1,2017-11-29 06:58:55,200,GET
1,10.128.2.1,2017-11-29 06:59:02,302,POST
2,10.128.2.1,2017-11-29 06:59:03,200,GET
3,10.131.2.1,2017-11-29 06:59:04,200,GET
4,10.130.2.1,2017-11-29 06:59:06,200,GET


In [83]:
weblog_df.rename(columns={'IP':'ip', 'Time':'timestamp', 'Staus':'status', 'Method':'method'}, inplace=True)

In [84]:
# dodamo HTTP_Ok če je Status enak 200
weblog_df['http_ok'] = weblog_df['status'] == 200

In [85]:
weblog_df.head()

Unnamed: 0,ip,timestamp,status,method,http_ok
0,10.128.2.1,2017-11-29 06:58:55,200,GET,True
1,10.128.2.1,2017-11-29 06:59:02,302,POST,False
2,10.128.2.1,2017-11-29 06:59:03,200,GET,True
3,10.131.2.1,2017-11-29 06:59:04,200,GET,True
4,10.130.2.1,2017-11-29 06:59:06,200,GET,True


Dodamo podatke v tabelo:

In [86]:
from sqlalchemy import create_engine
from sqlalchemy import DateTime, Integer, String, Boolean

In [87]:
# web.db v mapi data
engine = create_engine("sqlite:///data/web.db")

In [88]:
dtype_dict = {'ip': String(15), 
              'timestamp': DateTime(), 
              'status': Integer(), 
              'method': String(10), 
              'http_ok': Boolean()
}

In [89]:
# chunksize = 100, append, index = false
weblog_df.to_sql(name="weblog", con=engine, if_exists="append", chunksize=100, index=False, dtype=dtype_dict)

500

Preverimo podatke:

In [92]:
pd.read_sql_table("weblog", con=engine).head()

Unnamed: 0,ip,timestamp,status,method,http_ok
0,10.128.2.1,2017-11-29 06:58:55,200,GET,True
1,10.128.2.1,2017-11-29 06:59:02,302,POST,False
2,10.128.2.1,2017-11-29 06:59:03,200,GET,True
3,10.131.2.1,2017-11-29 06:59:04,200,GET,True
4,10.130.2.1,2017-11-29 06:59:06,200,GET,True


In [93]:
# 'ip', 'timestamp', 'method'
pd.read_sql_table('weblog', 
                  engine,  
                  columns=['ip', 'timestamp', 'method']).head()

Unnamed: 0,ip,timestamp,method
0,10.128.2.1,2017-11-29 06:58:55,GET
1,10.128.2.1,2017-11-29 06:59:02,POST
2,10.128.2.1,2017-11-29 06:59:03,GET
3,10.131.2.1,2017-11-29 06:59:04,GET
4,10.130.2.1,2017-11-29 06:59:06,GET


<div class="alert alert-block alert-info">
<b>Vaja: </b> Write a SQL query that returns a df with all columns for ip = '10.128.2.1' using method GET. 
</div>

In [94]:
pd.read_sql_query("SELECT * FROM weblog WHERE ip='10.128.2.1' AND method='GET';", engine).head()

Unnamed: 0,ip,timestamp,status,method,http_ok
0,10.128.2.1,2017-11-29 06:58:55.000000,200,GET,1
1,10.128.2.1,2017-11-29 06:59:03.000000,200,GET,1
2,10.128.2.1,2017-11-29 06:59:19.000000,200,GET,1
3,10.128.2.1,2017-11-29 13:38:20.000000,200,GET,1
4,10.128.2.1,2017-11-29 13:38:20.000000,200,GET,1


Using SQLAlchemy expressions: