# Using SQLite with Python

Documentation can be found [here](https://docs.python.org/3/library/sqlite3.html).

In [3]:
import pandas as pd
import sqlite3 as sq

## Some DB basics

SQLite offers persistent storage instead of using RAM, and offers full CRUD support (create, read, update, delete). RAM offers really fast access to data, but as we learnt in the last module, RAM is volatile, so any data disappears at shutdown, and it cannot easily be shared between multiple users. A SQL database still gives fast access to data, but the data is stored on disk (a single local file in the case of SQLite, often a remote server for other SQL databases); it also allows multiple users to query simultaneously, and stores data relationally in tables to allow for more efficient storage.
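
As a rough illustration of the persistence point, the sketch below writes a row to a file-backed database, reopens it, and compares that with an in-memory database. The file name `persist_demo.db` and the `notes` table are made up for this example and are not part of the tutorial.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name, placed in a temporary directory for the demo
db_path = os.path.join(tempfile.mkdtemp(), "persist_demo.db")

# Write one row to a file-backed database, then close the connection
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE notes(body TEXT)")
con.execute("INSERT INTO notes VALUES ('saved to disk')")
con.commit()
con.close()

# A brand-new connection to the same file still sees the row
con = sqlite3.connect(db_path)
file_rows = con.execute("SELECT body FROM notes").fetchall()
con.close()

# An in-memory database lives only as long as its connection
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE notes(body TEXT)")
mem.close()
mem = sqlite3.connect(":memory:")  # a fresh, empty database
mem_tables = mem.execute("SELECT name FROM sqlite_master").fetchall()
mem.close()

print(file_rows)   # [('saved to disk',)]
print(mem_tables)  # []
```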

SQL gives us relational databases, as opposed to NoSQL databases, which are non-relational and often document-based. In a relational database the emphasis is on linking tables through keys, so that a large value is stored once and referenced by its key rather than being repeated many times in a single column.

## Comparison between Pandas and SQLite

More documentation [here](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html).

Storing large datasets in CSV files becomes problematic when manual updates are required, for example updating an address where a user is listed multiple times. Queries in SQL get around this by systematically editing every row that matches the selected criteria, and you can easily store queries that are re-run when triggered.
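
A minimal sketch of that kind of systematic update, assuming a made-up `orders` table in which one customer appears on several rows. A single `UPDATE` with a `WHERE` clause changes every matching row at once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical table: the same customer is listed on more than one row
cur.execute("CREATE TABLE orders(customer TEXT, address TEXT, item TEXT)")
cur.executemany("INSERT INTO orders VALUES(?, ?, ?)", [
    ("ana", "madrid, spain", "book"),
    ("ana", "madrid, spain", "lamp"),
    ("ben", "porto, portugal", "desk"),
])

# One statement updates every row that matches the WHERE clause
cur.execute(
    "UPDATE orders SET address = ? WHERE customer = ?",
    ("sofia, bulgaria", "ana"),
)
updated = cur.rowcount  # number of rows changed by the UPDATE
con.commit()

rows = cur.execute(
    "SELECT customer, address FROM orders ORDER BY rowid"
).fetchall()
print(updated)  # 2
print(rows)
# [('ana', 'sofia, bulgaria'), ('ana', 'sofia, bulgaria'), ('ben', 'porto, portugal')]
```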

It is possible to join tables loaded from CSV files on common values using the pandas `merge` function and its `left_on`, `right_on` and `how` parameters. SQL, however, lets you declare relationships between columns in different tables and refer to them when filtering data, so a join is typically a single query.
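
For comparison, here is a sketch of a SQL join using two made-up tables, `person` and `address`, linked by a key column. It is roughly what a pandas `merge` with `left_on="id"`, `right_on="person_id"` and `how="inner"` would do on two DataFrames.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical tables: address.person_id refers to person.id
cur.execute("CREATE TABLE person(id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE address(person_id INTEGER REFERENCES person(id), city TEXT)"
)
cur.executemany("INSERT INTO person VALUES(?, ?)", [(1, "ana"), (2, "ben")])
cur.executemany("INSERT INTO address VALUES(?, ?)", [(1, "madrid"), (2, "porto")])

# An inner join matches rows from both tables on the shared key
joined = cur.execute("""
    SELECT person.name, address.city
    FROM person
    JOIN address ON address.person_id = person.id
    ORDER BY person.id
""").fetchall()
print(joined)  # [('ana', 'madrid'), ('ben', 'porto')]
```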


## Creating a SQLite database

In [13]:
# follows this tutorial: https://docs.python.org/3/library/sqlite3.html

# creates the database file if it doesn't already exist, otherwise connects to the existing file
con = sq.connect("tutorial.db")
# a cursor is used to execute statements and fetch their results
cur = con.cursor()
# execute is an important method: it is how you pass SQL statements from Python to the database
# IF NOT EXISTS lets this cell be re-run without raising an error
cur.execute("CREATE TABLE IF NOT EXISTS name(name text, address text)")
con.commit()

In [19]:
# confirms that table is now in the database
cur.execute("SELECT name FROM sqlite_master")
cur.fetchall()

[('name',)]

In [24]:
# insert data into the table
cur.execute("""
    INSERT INTO name VALUES
        ('aaaaadasl', 'madrid, spain'),
        ('fyhttmnhb', 'porto, portugal'),
        ('ddtgryuymns', 'sofia, bulgaria')
    """)
con.commit()

In [25]:
# confirms that data is now in the database
cur.execute("SELECT * FROM name")
cur.fetchall()

[('ffsjlvdasl', 'cologne, germany'),
 ('fgggggjk', 'town yetholm, scotland'),
 ('dkfkgfkgf', 'limerick, ireland'),
 ('aaaaadasl', 'madrid, spain'),
 ('fyhttmnhb', 'porto, portugal'),
 ('ddtgryuymns', 'sofia, bulgaria')]

In [26]:
# delete some data
cur.execute("DELETE FROM name WHERE name='aaaaadasl'")
con.commit()

In [27]:
# confirms that data is now deleted
cur.execute("SELECT * FROM name")
cur.fetchall()

[('ffsjlvdasl', 'cologne, germany'),
 ('fgggggjk', 'town yetholm, scotland'),
 ('dkfkgfkgf', 'limerick, ireland'),
 ('fyhttmnhb', 'porto, portugal'),
 ('ddtgryuymns', 'sofia, bulgaria')]

In [32]:
# insert many rows at once; executemany accepts any iterable of row tuples,
# so rows read from a CSV file can be passed in the same way
data = [
    ("grffdfda","reykjavik, iceland"),
    ("jghuyuyj","brussels, belgium"),
    ("retrtuykjh","accra, ghana"),
]
cur.executemany("INSERT INTO name VALUES(?,?)", data)
con.commit()
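
On the CSV question: `executemany` takes any iterable of row sequences, so the rows produced by `csv.reader` can be passed to it directly. A minimal sketch, using an inline string in place of a real file (the sample rows are made up) and an in-memory database so that `tutorial.db` is left untouched:

```python
import csv
import io
import sqlite3

# Stand-in for a CSV file on disk; with a real file, use open(path, newline="")
csv_text = '''name,address
ana,"madrid, spain"
ben,"porto, portugal"
'''

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE name(name TEXT, address TEXT)")

# Each row from the reader is a two-item list, matching the two placeholders
cur.executemany("INSERT INTO name VALUES(?, ?)", reader)
con.commit()

rows = cur.execute("SELECT * FROM name").fetchall()
print(rows)  # [('ana', 'madrid, spain'), ('ben', 'porto, portugal')]
```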

In [34]:
# confirms that executemany was successful; ROWID is the integer key SQLite assigns to each row automatically
cur.execute("SELECT ROWID, name, address FROM name")
cur.fetchall()

[(1, 'ffsjlvdasl', 'cologne, germany'),
 (2, 'fgggggjk', 'town yetholm, scotland'),
 (3, 'dkfkgfkgf', 'limerick, ireland'),
 (5, 'fyhttmnhb', 'porto, portugal'),
 (6, 'ddtgryuymns', 'sofia, bulgaria'),
 (7, 'grffdfda', 'reykjavik, iceland'),
 (8, 'jghuyuyj', 'brussels, belgium'),
 (9, 'retrtuykjh', 'accra, ghana'),
 (10, 'grffdfda', 'reykjavik, iceland'),
 (11, 'jghuyuyj', 'brussels, belgium'),
 (12, 'retrtuykjh', 'accra, ghana')]

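There are a number of duplicates above as I've run the code twice. SQLite permits this because the table declares no PRIMARY KEY or UNIQUE constraint on `name` or `address`; each row is distinguished only by its automatically assigned ROWID, so two rows with identical values still count as different rows.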
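
If duplicate rows should be rejected, one option is to declare a `UNIQUE` constraint when creating the table. The sketch below is an assumption about how the table could be defined, not how the tutorial table was actually created, and it uses an in-memory database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The combination of name and address must be unique across rows
cur.execute("CREATE TABLE name(name TEXT, address TEXT, UNIQUE(name, address))")

row = ("grffdfda", "reykjavik, iceland")
cur.execute("INSERT INTO name VALUES(?, ?)", row)

# Inserting the same row again violates the constraint
try:
    cur.execute("INSERT INTO name VALUES(?, ?)", row)
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# INSERT OR IGNORE skips the conflicting row instead of raising an error
cur.execute("INSERT OR IGNORE INTO name VALUES(?, ?)", row)
con.commit()

count = cur.execute("SELECT COUNT(*) FROM name").fetchone()[0]
print(rejected)  # True
print(count)     # 1
```

With this constraint in place, re-running an insert cell would raise an error (or be ignored) rather than silently adding the same rows again.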

Documentation for writing a DataFrame to a SQL table with `DataFrame.to_sql` is [here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html).