# Working with Databases and SQL

## Introduction to Databases


<p><img alt="Database Workflow" src="images/database_workflow.svg"></p>



## SQLAlchemy

The [SQLAlchemy](https://www.sqlalchemy.org/) SQL Toolkit and Object Relational Mapper is a comprehensive set of tools for working with databases and Python. It provides a full suite of well-known enterprise-level persistence patterns, designed for efficient and high-performing database access. SQLAlchemy and Django's ORM are two of the most widely used object-relational mapping tools in the Python community.

SQLAlchemy offers three ways of working with database data:
- Raw SQL
- SQL Expression language
- ORM
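
The three styles can be sketched side by side against a throwaway in-memory SQLite database. This is only an illustration: the `users` table, its columns, and the `User` class are made up, and the `try`/`except` import is there so the sketch runs on both older and newer SQLAlchemy releases.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, text
from sqlalchemy.orm import sessionmaker

try:  # SQLAlchemy 1.4 and newer
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3
    from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite:///:memory:')

# 1. Raw SQL: statements are written as plain SQL strings
with engine.begin() as con:
    con.execute(text('CREATE TABLE users (id INTEGER PRIMARY KEY, name VARCHAR(50))'))
    con.execute(text("INSERT INTO users (name) VALUES ('User 1')"))

# 2. SQL Expression Language: the table is described as a Python object
metadata = MetaData()
users = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String(50)))
with engine.begin() as con:
    con.execute(users.insert(), {'name': 'User 2'})
    expr_names = [row[1] for row in con.execute(users.select().order_by(users.c.id))]

# 3. ORM: rows are mapped to instances of a Python class
Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

session = sessionmaker(bind=engine)()
session.add(User(name='User 3'))
session.commit()
orm_names = [user.name for user in session.query(User).order_by(User.id)]
session.close()

print(expr_names)
print(orm_names)
```

All three styles talk to the same table, so they can be mixed freely within one project.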


### Installation

Here we show how to install SQLAlchemy and other necessary packages:

`pip install SQLAlchemy`

We also install the DBAPI drivers for PostgreSQL and MySQL, which SQLAlchemy needs in order to talk to those databases. The `sqlite3` module is distributed with Python.

**[Dialects](https://docs.sqlalchemy.org/en/13/dialects/index.html)**

The dialect is the system SQLAlchemy uses to communicate with various types of DBAPI implementations and databases. The dialect documentation linked above contains reference material and notes specific to each backend, as well as notes for the various DBAPIs.

All dialects require that an appropriate DBAPI driver is installed.

The MySQL database is supported via the [PyMySQL driver](https://docs.sqlalchemy.org/en/13/dialects/mysql.html#module-sqlalchemy.dialects.mysql.pymysql):

`pip install PyMySQL`

The PostgreSQL database is supported via the [psycopg2 driver](https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#module-sqlalchemy.dialects.postgresql.psycopg2):

`pip install psycopg2`

### SQLAlchemy version
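
The SQLAlchemy API differs between releases (the links in this notebook point to the 1.3 documentation), so it helps to check which version is installed. A minimal check, assuming SQLAlchemy is already installed in the active environment:

```python
import sqlalchemy

# __version__ holds the installed release as a string
installed_version = sqlalchemy.__version__
print(installed_version)
```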

### Connecting

To connect to a database we use `create_engine()`, which takes a database URL and returns an `Engine`.

> **Lazy Connecting**: The Engine, when first returned by create_engine(), has not actually tried to connect to the database yet; that happens only the first time it is asked to perform a task against the database.
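
The lazy behaviour can be sketched with an in-memory SQLite database, which needs no server. The variable names are illustrative only, and `text()` is used so that the snippet also runs on newer SQLAlchemy releases.

```python
from sqlalchemy import create_engine, text

# Creating the engine only stores the configuration; no connection is opened here
lazy_engine = create_engine('sqlite:///:memory:')

# The first real connection is made when we ask the engine for one
with lazy_engine.connect() as con:
    answer = con.execute(text('SELECT 1')).scalar()

print(answer)  # 1
```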

#### Database URLs

The `create_engine()` function produces an `Engine` object based on a URL. These URLs follow [RFC-1738](http://rfc.net/rfc1738.html) and can usually include a username, password, hostname, and database name, as well as optional keyword arguments for additional configuration. In some cases a file path is accepted, and in others a “data source name” replaces the “host” and “database” portions. The typical form of a database URL is:

`dialect+driver://username:password@host:port/database`

The dialect name is the identifying name of the SQLAlchemy dialect, such as `sqlite`, `mysql`, `postgresql`, `oracle`, or `mssql`. The driver name is the name of the DBAPI to be used to connect to the database, written in all lowercase letters. If it is not specified, a “default” DBAPI will be imported if available; this default is typically the most widely known driver available for that backend.
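
To see how such a URL breaks down into its parts, it can be parsed with SQLAlchemy's `make_url()` helper, which does not connect to anything and does not need the driver to be installed. The credentials are the placeholder values used in this notebook, and the port `5432` is an assumption added only for illustration.

```python
from sqlalchemy.engine.url import make_url

# Parse the URL into its components without opening a connection
url = make_url('postgresql+psycopg2://scott:tiger@localhost:5432/mydatabase')

print(url.drivername)  # postgresql+psycopg2
print(url.username)    # scott
print(url.host)        # localhost
print(url.port)        # 5432
print(url.database)    # mydatabase
```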

PostgreSQL:

In [None]:
from sqlalchemy import create_engine

# psycopg2 driver
engine = create_engine('postgresql+psycopg2://scott:tiger@localhost/mydatabase')

MySQL:

In [None]:
# PyMySQL driver
engine = create_engine('mysql+pymysql://scott:tiger@localhost/foo')

SQLite:

In [None]:
# sqlite://<nohostname>/<path>
# where <path> is relative:
engine = create_engine('sqlite:///foo.db')

### Execute SQL statements

In [None]:
from sqlalchemy import create_engine

eng = create_engine('sqlite:///data/logs.db')





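In [None]:
from sqlalchemy import text

with eng.connect() as con:
    rs = con.execute(text('SELECT * FROM weblog LIMIT 5;'))
    # fetchone() returns the next row, or None when no rows remain
    data = rs.fetchone()
    print(data)

In [None]:
with eng.connect() as con:
    rs = con.execute(text('SELECT * FROM weblog LIMIT 5;'))
    # fetchmany() returns up to `size` rows as a list
    data = rs.fetchmany(size=3)
    print(data)

In [None]:
with eng.connect() as con:
    rs = con.execute(text('SELECT * FROM weblog LIMIT 5;'))
    # fetchall() returns all remaining rows as a list
    data = rs.fetchall()
    print(data)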
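
When a query depends on a value, it is safer to pass that value as a bound parameter than to paste it into the SQL string. The sketch below assumes a throwaway in-memory database instead of `data/logs.db`; the `weblog` table here has only three made-up columns and two made-up rows.

```python
from sqlalchemy import create_engine, text

param_engine = create_engine('sqlite:///:memory:')

# engine.begin() opens a transaction and commits it when the block ends
with param_engine.begin() as con:
    con.execute(text(
        'CREATE TABLE weblog (id INTEGER PRIMARY KEY, ip TEXT, status INTEGER)'
    ))
    # A list of dictionaries inserts several rows with one statement
    con.execute(
        text('INSERT INTO weblog (ip, status) VALUES (:ip, :status)'),
        [{'ip': '10.128.2.1', 'status': 200},
         {'ip': '10.131.0.1', 'status': 302}],
    )
    # :status is filled in by the driver, not by string formatting
    rows = con.execute(
        text('SELECT ip, status FROM weblog WHERE status = :status'),
        {'status': 302},
    ).fetchall()

print(rows)
```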

## Working with databases and Pandas

In [None]:
import pandas as pd
import numpy as np

The key functions are:

| Function | Description |
| --- | --- |
| [`read_sql_table`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table)(table_name, con[, schema, …]) | Read SQL database table into a DataFrame. |
| [`read_sql_query`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query)(sql, con[, index_col, …]) | Read SQL query into a DataFrame. |
| [`read_sql`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html#pandas.read_sql)(sql, con[, index_col, …]) | Read SQL query or database table into a DataFrame. |
| [`DataFrame.to_sql`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html#pandas.DataFrame.to_sql)(self, name, con[, schema, …]) | Write records stored in a DataFrame to a SQL database. |

> **Note:** The function `read_sql()` is a convenience wrapper around `read_sql_table()` and `read_sql_query()` (it is also kept for backward compatibility) and delegates to the specific function depending on the provided input (database table name or SQL query). Table names do not need to be quoted if they have special characters.
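
A small sketch of this delegation, assuming an in-memory SQLite database and a made-up `users` table: passing a table name and passing a query to `read_sql()` give the same result here.

```python
import pandas as pd
from sqlalchemy import create_engine

wrapper_engine = create_engine('sqlite:///:memory:')
pd.DataFrame({'name': ['User 1', 'User 2']}).to_sql('users', wrapper_engine, index=False)

# A table name is handed over to read_sql_table()
by_table = pd.read_sql('users', wrapper_engine)

# A SQL string is handed over to read_sql_query()
by_query = pd.read_sql('SELECT name FROM users', wrapper_engine)

print(by_table.equals(by_query))
```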

To connect with SQLAlchemy, use the `create_engine()` function to create an engine object from a database URI. You only need to create the engine once per database you are connecting to. For more information on `create_engine()` and the URI formatting, see the examples below and the SQLAlchemy [documentation](https://docs.sqlalchemy.org/en/latest/core/engines.html).

### Importing data from database

#### [read_sql_table](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_table.html#pandas.read_sql_table)

Read SQL database table into a DataFrame.

Given a table name and a SQLAlchemy connectable, returns a DataFrame. This function does not support DBAPI connections.

In [None]:
eng = create_engine('sqlite:///data/logs.db')
pd.read_sql_table('weblog', eng).head()

#### [read_sql_query](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql_query.html#pandas.read_sql_query)

Read SQL query into a DataFrame.

Returns a DataFrame corresponding to the result set of the query string. Optionally provide an `index_col` parameter to use one of the columns as the index; otherwise a default integer index will be used.

In [None]:
query = "SELECT ip, status FROM weblog WHERE status = 302;"
pd.read_sql_query(query, eng).head()

<div class="alert alert-block alert-info">
<b>Exercise: </b> Write a SQL query that returns a DataFrame with all columns for ip = '10.128.2.1' and the GET method. The id column should be the index.
</div>

### Writing a DataFrame to a SQL database

Assuming the following data is in a DataFrame data, we can insert it into the database using to_sql().

#### [to_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html)

Write records stored in a DataFrame to a SQL database.

Any database supported by SQLAlchemy can be used. Tables can be newly created, appended to, or overwritten.

Create an in-memory SQLite database.

In [None]:
engine = create_engine('sqlite:///:memory:', echo=False)

In [None]:
df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})

In [None]:
df

In [None]:
df.to_sql('users', con=engine)
pd.read_sql_table('users', engine)

In [None]:
df1 = pd.DataFrame({'name' : ['User 4', 'User 5']})

**if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’**

How to behave if the table already exists.
- fail: Raise a ValueError.
- replace: Drop the table before inserting new values.
- append: Insert new values to the existing table.
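
The three options can be compared in one short sketch. It assumes a separate in-memory SQLite database, and the engine and DataFrame names are made up so that they do not clash with the cells of this notebook.

```python
import pandas as pd
from sqlalchemy import create_engine

demo_engine = create_engine('sqlite:///:memory:')
first = pd.DataFrame({'name': ['User 1', 'User 2']})
second = pd.DataFrame({'name': ['User 3']})

first.to_sql('users', demo_engine, index=False)

# Default if_exists='fail': the table already exists, so pandas raises ValueError
try:
    second.to_sql('users', demo_engine, index=False)
    failed = False
except ValueError:
    failed = True

# 'append' adds the new row to the two existing ones
second.to_sql('users', demo_engine, index=False, if_exists='append')
n_append = len(pd.read_sql_table('users', demo_engine))

# 'replace' drops the table first, so only the new row remains
second.to_sql('users', demo_engine, index=False, if_exists='replace')
n_replace = len(pd.read_sql_table('users', demo_engine))

print(failed, n_append, n_replace)  # True 3 1
```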

In [None]:
df1

In [None]:
df1.to_sql('users', con=engine, if_exists='append')
pd.read_sql_table('users', engine)

#### SQL data types

`to_sql()` will try to map your data to an appropriate SQL data type based on the dtype of the data. When you have columns of dtype `object`, pandas will try to infer the data type.

You can always override the default type by specifying the desired SQL type of any of the columns with the `dtype` argument. This argument needs a dictionary mapping column names to [SQLAlchemy types](https://docs.sqlalchemy.org/en/13/core/type_basics.html#generic-types) (or strings for the sqlite3 fallback mode). For example, you can specify the SQLAlchemy `String` type instead of the default `Text` type for string columns.
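
A minimal sketch of such an override, assuming an in-memory SQLite database. The table name `data_dtype`, the column `Col_1`, and the length `50` are illustrative choices only.

```python
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.types import String

dtype_engine = create_engine('sqlite:///:memory:')
data = pd.DataFrame({'Col_1': ['a', 'b', 'c']})

# Without dtype the column would be created as TEXT; here we ask for VARCHAR(50)
data.to_sql('data_dtype', dtype_engine, index=False, dtype={'Col_1': String(50)})

# sqlite_master stores the CREATE TABLE statement, so the column type can be inspected
ddl = pd.read_sql_query(
    "SELECT sql FROM sqlite_master WHERE name = 'data_dtype'", dtype_engine
)['sql'].iloc[0]
print(ddl)
```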

Specifying the dtype is especially useful for integers with missing values. While pandas is forced to store such a column as floating point, the database supports nullable integers, so fetching the data with Python returns integer scalars.

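In [None]:
df = pd.DataFrame({"A": [1, None, 2], "B": ['dsds', 'haha', 'ldld']})

In [None]:
df

In [None]:
from sqlalchemy.types import Integer, String

# Store A as a nullable INTEGER and B as a string column, replacing the earlier 'users' table
df.to_sql('users', engine, if_exists='replace', index=False, dtype={"A": Integer(), "B": String()})
pd.read_sql_table('users', engine)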

With some databases, writing large DataFrames can result in errors due to packet size limitations being exceeded. This can be avoided by setting the `chunksize` parameter when calling `to_sql`. For example, the following writes `data` to the database in batches of 1000 rows at a time:

```python
data.to_sql('data_chunked', engine, chunksize=1000)
```
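
A self-contained variant of the same call, assuming an in-memory SQLite database and a made-up DataFrame of 2500 rows. The rows are written in three batches, and the row count confirms that all of them arrived.

```python
import numpy as np
import pandas as pd
from sqlalchemy import create_engine

chunk_engine = create_engine('sqlite:///:memory:')
big_df = pd.DataFrame({'value': np.arange(2500)})

# 2500 rows are sent as batches of 1000, 1000 and 500 rows
big_df.to_sql('data_chunked', chunk_engine, index=False, chunksize=1000)

row_count = pd.read_sql_query(
    'SELECT COUNT(*) AS n FROM data_chunked', chunk_engine
)['n'].iloc[0]
print(row_count)  # 2500
```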

### Example: Importing data from a CSV file into a SQL database

In [None]:
weblog_df = pd.read_csv('data/weblogs_clean.csv')

In [None]:
weblog_df.info()

In [None]:
weblog_df.head()

[Datetime conversions](https://www.programiz.com/python-programming/datetime/strftime)

In [None]:
weblog_df.rename(columns={'IP':'ip', 'Time':'timestamp', 'Staus':'status', 'Method':'method'}, inplace=True)

In [None]:
from sqlalchemy import create_engine
from sqlalchemy import DateTime, Integer, String, Boolean

Add the data to the table:

In [None]:
dtype_dict = {'ip': String(15), 
              'timestamp': DateTime(), 
              'status': Integer(), 
              'method': String(10), 
              'http_ok': Boolean()
}
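
The whole import can be sketched end to end as follows. This is a sketch under stated assumptions, not the notebook's own data: the CSV content is two made-up rows held in memory, the timestamp format is assumed, and the names `sample_df` and `sample_engine` are hypothetical. The timestamp column is converted to datetime values before writing, because the `DateTime` column type expects datetime objects rather than strings.

```python
import io

import pandas as pd
from sqlalchemy import DateTime, Integer, String, create_engine

# Made-up sample rows standing in for the real CSV file
csv_text = """ip,timestamp,status,method
10.128.2.1,2017-11-29 06:58:55,200,GET
10.131.0.1,2017-11-29 06:59:02,302,POST
"""
sample_df = pd.read_csv(io.StringIO(csv_text))

# Convert the text column to real datetime values (format is an assumption)
sample_df['timestamp'] = pd.to_datetime(sample_df['timestamp'], format='%Y-%m-%d %H:%M:%S')

sample_engine = create_engine('sqlite:///:memory:')
sample_df.to_sql(
    'weblog', sample_engine, if_exists='replace', index=False,
    dtype={'ip': String(15), 'timestamp': DateTime(),
           'status': Integer(), 'method': String(10)},
)

result = pd.read_sql_table('weblog', sample_engine)
print(result.dtypes)
```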

Check the data:

In [None]:
pd.read_sql_table('weblog', engine).head()