<h1 style="text-align: center;">What is a database and why do we need it?</h1>

<h1 style="text-align: center;">Python Collections</h1>

<h1 style="text-align: center;">list: [1, 2, 3 ...]</h1>

- ### expandable collection of objects that can be accessed with an index
- ### useful for 'I have a bunch of examples of X'
- ### or 'I need to enforce order'

<h1 style="text-align: center;"> tuple (1, 2, 3 ...) </h1>

- ### immutable collection of objects that can be accessed with an index
- ### useful for rows in a spreadsheet/database, arguments to a function, or other correlated data
- ### immutability is useful for sharing information outside the Python process.
- ### look up namedtuple
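
As a small sketch of what `namedtuple` adds on top of a plain tuple, the example below builds a row-like record. The `TreeRow` type and its field names are made up for illustration and are not part of the dataset used later.

```python
from collections import namedtuple

# Hypothetical row type for illustration; the field names are assumptions.
TreeRow = namedtuple("TreeRow", ["tree_id", "species", "address"])

row = TreeRow(tree_id=1, species="Oak", address="123 Example St")

# Fields can be read by name or by index, like a plain tuple.
by_name = row.species
by_index = row[1]

# Like any tuple, the row is immutable: assigning to a field raises AttributeError.
try:
    row.species = "Elm"
    mutated = True
except AttributeError:
    mutated = False

print(by_name, by_index, mutated)
```

Because a `namedtuple` is still a tuple, it can be unpacked, indexed, and passed anywhere a tuple is expected.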

<h1 style="text-align: center;">dictionary {"one": 1, "two": 2 ...}</h1>

- ### collection of key value pairs that provides 'jump to' access
- ### useful for quick lookup

<h1 style="text-align: center;">set {1, 2, 3, ...}</h1>

- ### unordered collection of unique values
- ### useful for when duplicate values aren't interesting
- ### Or quick membership testing

```python
if day in {'Saturday', 'Sunday', 'Monday'}:
    ...
```

### What collections would you combine to create a phone book?


- ### Databases have advanced <span style="color: red;">data modeling</span> capabilities.
- ### Databases provide <span style="color: red;">consistency</span> through <span style="color: red;">transactions</span>.
- ### Once your data no longer fits in RAM, you can still access it through an <span style="color: red;">index</span>.
- ### Relational databases allow queries through the <span style="color: red;">declarative language</span> SQL.

<h1 style="text-align: center;">Relational Modeling</h1>

### How do you find the number of trees at Jordan's house?

### Before relational databases, there were hierarchical databases and network databases.

---

- ### Jordan
    - ### Jordan's house
        - ### Jordan's yard
            - ### Tree 1
            - ### Tree 2
            
---

### What if we want to count the trees on one block?

# Relational models use non-hierarchical tables.

<table style="border-collapse: separate; border-spacing: 30px;">
<tr><td>
    
|  <span style="font-size: 30px;">Name</span>  |  <span style="font-size: 30px;">House</span>  |
| ------ | ------- |
| <span style="font-size: 30px;">Jean</span>   | <span style="font-size: 30px;">House 1</span> |
| <span style="font-size: 30px;">Jordan</span> | <span style="font-size: 30px;">House 2</span> |
| <span style="font-size: 30px;">Jordan</span> | <span style="font-size: 30px;">House 3</span> |

</td><td>

|  <span style="font-size: 30px;">House</span>  | <span style="font-size: 30px;">Location</span> |
| ------- | -------- |
| <span style="font-size: 30px;">House 1</span> |   <span style="font-size: 30px;">Loc 1</span>  |
| <span style="font-size: 30px;">House 2</span> |   <span style="font-size: 30px;">Loc 1</span>  |
| <span style="font-size: 30px;">House 3</span> |   <span style="font-size: 30px;">Loc 2</span>  |

</td><td>

|  <span style="font-size: 30px;">Tree</span>  | <span style="font-size: 30px;">Location</span> |
| ------ | -------- |
| <span style="font-size: 30px;">Tree 1</span> |   <span style="font-size: 30px;">Loc 2</span>  |
| <span style="font-size: 30px;">Tree 2</span> |   <span style="font-size: 30px;">Loc 2</span>  |
| <span style="font-size: 30px;">Tree 3</span> |   <span style="font-size: 30px;">Null</span>   |

</td>
</tr></table>

This allows us to find the number of trees at Jordan's house by:
1. Finding all Jordan's houses
2. Finding all the locations of Jordan's houses
3. Finding all the trees with those locations
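
The three steps above can be sketched with an in-memory SQLite database. The table and column names (`owners`, `houses`, `trees`) are assumptions chosen to mirror the three tables shown; the rows are copied from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table names are illustrative; the rows mirror the three tables above.
conn.executescript("""
    CREATE TABLE owners (name TEXT, house TEXT);
    CREATE TABLE houses (house TEXT, location TEXT);
    CREATE TABLE trees  (tree TEXT, location TEXT);

    INSERT INTO owners VALUES ('Jean', 'House 1'), ('Jordan', 'House 2'), ('Jordan', 'House 3');
    INSERT INTO houses VALUES ('House 1', 'Loc 1'), ('House 2', 'Loc 1'), ('House 3', 'Loc 2');
    INSERT INTO trees  VALUES ('Tree 1', 'Loc 2'), ('Tree 2', 'Loc 2'), ('Tree 3', NULL);
""")

# Step 1: owners -> houses, step 2: houses -> locations, step 3: locations -> trees.
query = """
    SELECT COUNT(*)
    FROM owners
    JOIN houses ON houses.house = owners.house
    JOIN trees  ON trees.location = houses.location
    WHERE owners.name = ?
"""
jordan_tree_count = conn.execute(query, ("Jordan",)).fetchone()[0]
print(jordan_tree_count)  # 2: Tree 1 and Tree 2 at Loc 2
```

Tree 3 has a `NULL` location, so it never matches a house and is not counted for anyone.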

### Relational databases make it
* ### Easier to decide who owns what information
* ### Easier to keep data from duplicating
* ### Easier to keep data consistent

<h1 style="text-align: center;">Let's try it out</h1>

### Get a SQLite dataset of trees in San Francisco from
### https://san-francisco.datasettes.com/sf-trees-ebc2ad9

In [15]:
import sqlite3
from pprint import pprint

# Return one row with the column names attached
with sqlite3.connect('sf-trees.db', timeout=1) as conn:
    conn.row_factory = sqlite3.Row  # return keys and values in rows
    cursor = conn.execute('''
            SELECT * FROM Street_Tree_List
            LIMIT 1;
            '''
        )
        
    for row in cursor:
        pprint(dict(row))

{'DBH': 21,
 'Latitude': 37.7759676911831,
 'Location': '(37.7759676911831, -122.441396661871)',
 'Longitude': -122.441396661871,
 'PermitNotes': 'Permit Number 25401',
 'PlantDate': '07/21/1988 12:00:00 AM',
 'PlantType': 1,
 'PlotSize': 'Width 0ft',
 'SiteOrder': 1,
 'TreeID': 141565,
 'XCoord': 6000609.0,
 'YCoord': 2110829.0,
 'qAddress': '501X Baker St',
 'qCareAssistant': None,
 'qCaretaker': 1,
 'qLegalStatus': 1,
 'qSiteInfo': 1,
 'qSpecies': 1}


In [30]:
# Return all values from qCaretaker
with sqlite3.connect('sf-trees.db', timeout=1) as conn:
    cursor = conn.execute("SELECT value FROM qCaretaker;")
        
    for row in cursor:
        print(row)

('Private',)
('DPW',)
('Rec/Park',)
('SFUSD',)
('Dept of Real Estate',)
('MTA',)
('Mayor Office of Housing',)
('Health Dept',)
('Port',)
('DPW for City Agency',)
('Fire Dept',)
('City College',)
('PUC',)
('Public Library',)
('Purchasing Dept',)
('Asian Arts Commission',)
('War Memorial',)
('Police Dept',)
('Arts Commission',)
('Housing Authority',)
('Office of Mayor',)


In [32]:
# Return the address of all trees that are taken care of by the police
with sqlite3.connect('sf-trees.db', timeout=1) as conn:
    cursor = conn.execute('''
            SELECT Street_Tree_List.qAddress
            FROM Street_Tree_List, qCaretaker
            WHERE Street_Tree_List.qCaretaker = qCaretaker.id
            AND qCaretaker.value = 'Police Dept'
            ;
            '''
        )
        
    for row in cursor:
        print(row)

('1499 TURK ST',)
('1499 Turk St',)
('1499 TURK ST',)
('1499 TURK ST',)
('1499 Turk St',)
('1499 TURK ST',)
('1499 TURK ST',)
('1499 TURK ST',)
('1499 TURK ST',)
('1499 Turk St',)
('1499 TURK ST',)
('1499 TURK ST',)
('1499 Turk St',)
('630 Valencia St',)
('630 Valencia St',)
('630 Valencia St',)
('1401X Turk St',)
('1401X Turk St',)
('3403x 17TH ST',)
('1401X Turk St',)
('3401x 17TH ST',)
('1150X Steiner St',)
('1150X Steiner St',)
('630 Valencia St',)
('1401X Turk St',)
('3403x 17TH ST',)
('1401X Turk St',)
('1150X Steiner St',)
('630 Valencia St',)
('1401X Turk St',)
('630 Valencia St',)
('3401x 17TH ST',)
('1401X Turk St',)
('3403x 17TH ST',)
('630 Valencia St',)
('3403x 17TH ST',)
('1150X Steiner St',)
('630 Valencia St',)
('630 Valencia St',)
('1401X Turk St',)
('1401X Turk St',)
('3401x 17TH ST',)
('1401X Turk St',)
('1401X Turk St',)
('1401X Turk St',)
('1401X Turk St',)
('1150X Steiner St',)
('3403x 17TH ST',)
('3403x 17TH ST',)
('425 07th St',)


- ### Find the addresses of all trees that are cared for by the War Memorial
- ### Find all trees that are cared for by the police and have a legal status of "Significant Tree"

### Notes on SQL
* ### The SQLite documentation is probably my favorite documentation
    * ### https://sqlite.org/lang.html
* ### There are an amazing number of resources on using SQL
* ### I think it is really easy to write bugs in SQL

<h1 style="text-align: center;">Transactional Consistency</h1>
<h3 style="text-align: center;">Avoid invalid intermediate states.</h3>

In [0]:
# Without transactions, we can accidentally create money

Bank1 = 100
Bank2 = 50

Bank1 += 50
1 / 0  # uh oh!
Bank2 -= 50


In [None]:
with sqlite3.connect('sf-trees.db', timeout=1) as conn:
    with conn as transaction:
        transaction.execute('''
            UPDATE banks
            SET cash = cash + 50
            WHERE name = 'bank 1'
        ''')
        
        1 / 0  # we are fine: the exception rolls back the first update
        
        transaction.execute('''
            UPDATE banks
            SET cash = cash - 50
            WHERE name = 'bank 2'
        ''')
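
To see the rollback happen, here is a self-contained sketch that uses an in-memory database. The `banks` schema, the `transfer` helper, and its `crash` flag are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE banks (name TEXT PRIMARY KEY, cash INTEGER)")
conn.executemany("INSERT INTO banks VALUES (?, ?)", [("bank 1", 100), ("bank 2", 50)])
conn.commit()


def transfer(conn, src, dst, amount, crash=False):
    """Move `amount` from src to dst inside one transaction (hypothetical helper)."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE banks SET cash = cash + ? WHERE name = ?", (amount, dst))
        if crash:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute("UPDATE banks SET cash = cash - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, cash FROM banks"))


try:
    transfer(conn, "bank 2", "bank 1", 50, crash=True)
except RuntimeError:
    pass
after_crash = balances(conn)  # the half-finished transfer was rolled back

transfer(conn, "bank 2", "bank 1", 50)
after_success = balances(conn)

print(after_crash)
print(after_success)
```

The total amount of cash is 150 before and after both calls; the failed transfer leaves no trace.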

<h3 style="text-align: center;">While we are talking about defensive programming</h3>

- Why do we need a timeout?
- What is the outer with statement doing here?

<h1 style="text-align: center;">SQLite</h1>

<h3 style="text-align: center;">Why use SQLite?</h3> 

- "Tiny, self contained, serverless, zero-configuration, transactional SQL database"
- "The intent of the developers is to support SQLite through the year 2050"
- "Think of SQLite not as a replacement for Oracle but as a replacement for fopen"
- Has my favorite documentation of any software project -- https://www.sqlite.org


<h3 style="text-align: center;">Wait a second, Serverless?</h3>

In [10]:
# Create a database locking error

conn1 = sqlite3.connect('test.db', timeout=1)
conn2 = sqlite3.connect('test.db', timeout=1)
cur1 = conn1.cursor()
cur2 = conn2.cursor()

# conn1 opens a write transaction here and has not committed it yet
cur1.execute('''INSERT INTO student VALUES (1, 'Jared', 'Garst', '');''')

# conn2 cannot get the write lock and gives up after the 1 second timeout
cur2.execute('''INSERT INTO student VALUES (2, 'Jared', 'Garst', 'Mad Dog');''')


OperationalError: database is locked

<h3 style="text-align: center;">Create a Table</h3>

- Go to https://archive.ics.uci.edu/ml/datasets/forest+fires and look at the CSV dataset.
- Use the SQLite documentation to create a table that can store this information: https://www.sqlite.org/datatype3.html
- Insert the first row into the table with

```SQL
INSERT INTO tablename VALUES (value1, value2, ...);
```

- Check that the data is in the table with

```python
cur.execute('SELECT * FROM tablename;').fetchall()
```

- We will iterate and insert the rest of the data later.

<h3 style="text-align: center;">Always keep Data and Logic separate</h3>

In [65]:
# Little bobby tables
name = "); DROP TABLE student;"
with sqlite3.connect('test.db', timeout=1) as conn:
    with conn as cur:
        # The ? placeholders pass name as data, so it is never parsed as SQL
        cur.execute('INSERT INTO student VALUES (3, ?, ?, ?);', (name, 'b', 'Mad Dog'))
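
As a rough illustration of why placeholders matter, the sketch below compares a query built with string formatting against the same query with a `?` placeholder. The `student` column names and the sample rows are assumptions; only the four-column shape comes from the inserts above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are assumed for illustration.
conn.execute("CREATE TABLE student (id INTEGER, first TEXT, last TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Jared", "Garst", ""), (2, "Bobby", "Tables", "Little Bobby")],
)

# Attacker-controlled input that tries to rewrite the WHERE clause.
name = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and becomes part of the logic.
unsafe_sql = f"SELECT id FROM student WHERE first = '{name}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the placeholder keeps the input as data; it is only compared as a string.
safe_rows = conn.execute("SELECT id FROM student WHERE first = ?", (name,)).fetchall()

print(len(unsafe_rows))  # 2: every student matched
print(len(safe_rows))    # 0: nobody has that literal first name
```

The unsafe query ends up reading `WHERE first = 'nobody' OR '1'='1'`, which is true for every row.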

<h1 style="text-align: center;">Download and insert the rest of the fires data into your database</h1>

<h1 style="text-align: center;">SQL databases use indices to access the data.</h1>

- ### Indices take up space and cost computational power on insertion.
- ### They allow you to decouple the question of 'how am I storing my data' from 'how am I accessing my data'.
- ### They can be used to enforce consistency across rows.
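
One way an index can enforce consistency across rows is a `UNIQUE` index. This is a minimal sketch; the simplified `student` schema and the index name are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, assumed schema for illustration.
conn.execute("CREATE TABLE student (id INTEGER, first TEXT, last TEXT)")

# A unique index makes the database reject two rows with the same id.
conn.execute("CREATE UNIQUE INDEX idx_student_id ON student (id)")

conn.execute("INSERT INTO student VALUES (1, 'Jared', 'Garst')")
try:
    conn.execute("INSERT INTO student VALUES (1, 'Someone', 'Else')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```

The rule lives in the database, so every program that writes to the table is held to it.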

<h3 style="text-align: center;">SQL is declarative. You tell it what you want, it figures out how to get there</h3>

- ### The set of actions it takes can vary wildly, depending on whether you are operating on 100, 10,000 or 1,000,000 rows
- ### Ask it what it is doing by using EXPLAIN (in SQLite, EXPLAIN QUERY PLAN gives a readable summary)
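
As a sketch of asking SQLite what it plans to do, the example below runs `EXPLAIN QUERY PLAN` on the same query before and after adding an index. The `trees` table, the index name, and the `query_plan` helper are assumptions. The exact wording of the plan differs between SQLite versions, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration.
conn.execute("CREATE TABLE trees (id INTEGER, species TEXT, address TEXT)")


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


sql = "SELECT address FROM trees WHERE species = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = query_plan(conn, sql, ("Oak",))

conn.execute("CREATE INDEX idx_trees_species ON trees (species)")

# With the index, the same query text is answered by an index search.
plan_after = query_plan(conn, sql, ("Oak",))

print(plan_before)
print(plan_after)
```

The query text never changed; only the plan did, which is the point of a declarative language.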

<h1 style="text-align: center;">Key Takeaways</h1>

- ### Databases are the only realistic way to keep your data uncorrupted
- ### Databases allow sophisticated control over
    - ### How the data is stored
    - ### How the data is verified
    - ### How the data is accessed
- ### Databases decouple those questions. Changing the answer to one does not change the answer to the others.