# Week 2: Relational Databases and Python

```mermaid
flowchart TD
    Week1["Week 1: Introduction to Databases"]
    Week2["Week 2: SQL Basics & Python"] 
    Week3["Week 3: SQL Querying & Pandas"]
    Week4["Week 4: Split-Apply-Combine"]
    Week5["Week 5: Aggregation & Transformation"]
    
    Week1 --> Week2
    Week2 --> Week3
    Week3 --> Week4
    Week4 --> Week5
    
    style Week2 fill:#f96,stroke:#333,stroke-width:4px
```

## Learning Objectives

By the end of today's class, you will be able to:
- Explain the components of a relational database system
- Write basic SQL statements (CREATE, INSERT, SELECT, WHERE)
- Understand type systems and their impact on database design
- Connect database concepts to Python programming
- Apply these skills to solve information management problems

### Announcements?

## Office Hours

By Appointment.

https://links.porg.dev/book

## Review

- relational database
- SQL
- NoSQL
- structured data
- semi-structured data

## Review Lab Skills

- What's a 'notebook'?
- Python
   - What are lists?
   - How do strings, integers, and floats differ?
       - What happens when you add them?
   - How do you set a variable?
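
As a quick refresher for the questions above, here is a minimal sketch of how Python treats each type when you add values. The variable names and values are made up for illustration.

```python
# A list holds several values in order
years = [1960, 1949]

# Setting variables of three different types
count = 2        # integer
price = 9.99     # float
title = "1984"   # string, even though it looks like a number

int_plus_float = count + price   # numbers add; the result is a float
str_plus_str = title + "!"       # strings concatenate

# Adding a string to a number is an error in Python
try:
    title + count
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(type(int_plus_float), str_plus_str, mixed_ok)
```

Keep the last case in mind for later: Python refuses to mix text and numbers, and databases have to make a similar decision.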

## Our map of skills

Let's take a moment for some tentative wayfinding.

We start class with a great many new concepts, and we'll learn them by taking time to think through how they relate.

- What's the relationship of
   - relational databases to SQL?
   - structured data to relational databases?
   - Python to Colab?
   - Python to SQL?

Discuss this in groups of 2 or 3.

*I will get redundant about Python versus SQL. If you get it, forgive the repetition - but this early in the quarter, it's tricky to juggle the two concepts concurrently*

## Librarianship

### What is Data?

‘any information that can be stored in digital form, including text, numbers, images, video or movies, audio, software, algorithms, equations, animations, models, simulations, etc.’

National Science Board. (2005). Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. Arlington, VA: National Science Foundation.

![](../images/big-data.png)

**Library articles discussing 'big data'**

Zhan, M., & Widén, G. (2017). Understanding big data in librarianship. Journal of Librarianship and Information Science, 0961000617742451. https://doi.org/10.1177/0961000617742451

### What is 'Big Data' in Librarianship?

- Volume: Lots of data!
- Velocity: Working with data that arrives quickly.
- Variety: Working with a wide variety of data and data types
- Veracity: Ensuring the reliability and integrity of data

Fosso Wamba, S., Akter, S., Edwards, A., Chopin, G., & Gnanzou, D. (2015). How ‘big data’ can make big impact: Findings from a systematic review and a longitudinal case study. International Journal of Production Economics, 165, 234–246. https://doi.org/10.1016/j.ijpe.2014.12.031

From a review of 35 library articles (Zhan & Widén 2017):
 - 29 mention volume, 12 mention velocity, 12 mention variety, 12 mention veracity

## Data Literacy

> the ability to understand, use and manage data.

Koltay, T. (2017). Data literacy for researchers and data librarians. Journal of Librarianship and Information Science, 49(1), 3–14. https://doi.org/10.1177/0961000615616450

> All librarians 
are interested in information literacy; archivists and data 
librarians are interested in data literacy.

Schield, M. (2004). Information Literacy, Statistical Literacy and Data Literacy.

Such as... 
> [Information Science] students need to understand a wide variety 
of tools for accessing, converting and manipulating data.  
These may need to understand structured query language 
(SQL), relational databases (e.g. MS Access), data 
manipulation techniques, statistical software (e.g., SPSS, 
STATA, Minitab and MS Excel) and data presentation 
software (e.g., MS Excel and MS PowerPoint).  

### Core competencies: What should a data literate librarian know?

- Curation: active management of research data
- Wrangling: ability to process, clean, normalize, and transform data
- Mix of practical skills and theory (Cox et al 2013, via Semeler et al 2017)
- Adaptable to new contexts and technical needs

Semeler et al (2017):
    
 - specific knowledge on data usage, e.g. data types, metadata standards, legal and regulatory details, preservation
 - knowledge of info tech, such as Python, SQL, Java, XML, design of databases, large data, NLP

> A data librarian need not become a programmer, but should be interested in learning about the languages and programming logic of computers

#### [Digital Curation Librarian - at Wake Forest University](https://jobs.diglib.org/job/digital-curation-librarian-4/)

"Desired:

- Demonstrated skills with scripting languages and/or tools for data manipulation (e.g. OpenRefine, Python, XSLT, etc.)"

#### [Metadata Librarian - at University of Florida](https://jobs.diglib.org/job/metadata-librarian-7/)

"Required:

- Demonstrated interest in and experience with any of the following tools (or others like them) to create, extract, transform, analyze, and/or quality control metadata: XSLT, Microsoft Excel, MarcEdit, OpenRefine, *scripting languages*, regular expressions, *SQL*."

#### [Library Systems Supervisor - City of Aurora](https://joblist.ala.org/job/library-systems-supervisor/40162599/)

Minimum Qualifications
     
 - Master's Degree in Information Systems or Library science 
 
Our ideal candidate must possess:
     
 - Knowledge of troubleshooting database issues
 - Strong proficiency with SQL and its variation among popular databases
 - Knowledge of Basic Scripting and Web Management

#### [Digital Library Software Architect - at University of Colorado Boulder](https://jobs.diglib.org/job/digital-library-software-architect-2/)

"What we require:

- Demonstrated programming skills in a programming language such as Javascript, Java, Python, and REST or SOAP APIs.
- Experience with database management software (e.g. MySQL) and search/indexing software such as Apache Solr."



#### [Business Research Librarian - at Stanford](https://joblist.ala.org/job/business-research-librarian/39943022/)

"Desired qualifications – Knowledge of SQL, statistical software packages and programming languages, such as Python."

## Relational Databases

Let's consider the parts of relational databases by zooming out from the finest value.

### Field

A single piece of data.

e.g. One datum.
e.g. A single column of a single record.

![](../images/part-dbms/field.png)

### Record

A collection of related fields, represented as one row in the table.

![](../images/part-dbms/record.png)

_What does a record represent?_

### Table

A collection of records that all follow the same schema (the description of the table's fields).

![](../images/part-dbms/table.png)

### Relational Database

A collection of tables all in the same domain.

![](../images/part-dbms/db.png)

### DBMS

A _Database Management System_ (DBMS, or more exactly RDBMS) is the set of software programs that run your databases.

- e.g. MySQL, SQLite, PostgreSQL, Microsoft SQL Server, etc.

_What's the difference?_

- 'Database' is the organizational model for your content; the DBMS is how that model and content are actually managed by a system.
- DBMS is like _Word_, the database is your document.

### Structure of a relational database

- DBMS
  - ↳ Database
    - ↳ Table
      - ↳ Records or Rows
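
To make the hierarchy concrete, here is a rough sketch using Python's built-in `sqlite3` module. Treating the module as the "DBMS" layer is a simplification, and the `cats` table and its rows are hypothetical example data.

```python
import sqlite3

# The sqlite3 module plays the DBMS role; the connection is one database
conn = sqlite3.connect(":memory:")

# A table inside that database (hypothetical example data)
conn.execute("CREATE TABLE cats (name TEXT, breed TEXT);")
conn.execute("INSERT INTO cats VALUES ('Miso', 'Siamese');")
conn.execute("INSERT INTO cats VALUES ('Olive', 'Maine Coon');")

# Each row returned is one record; each item inside it is one field
records = conn.execute("SELECT name, breed FROM cats;").fetchall()
first_record = records[0]
first_field = first_record[0]

print(records)
print(first_record)
print(first_field)
```

In Python terms, the table comes back as a list, each record as a tuple, and each field as a single value.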

### Spreadsheet as Database

As per [Launch School](https://launchschool.com/books/sql/read/introduction#spreadsheetdb):

Because relational databases are tabular, we can imagine a single table as akin to a spreadsheet.

![](../images/launchschool-users.png)

When do you put something in a database? When in a table?

### Concurrency

- the ability for multiple users to access the same record at the same time

### Client-Server architecture

![](../images/client-server-msg.png)

Via [Launch School](https://launchschool.com/books/sql/read/interacting_with_postgresql)

## Intro to SQL

**SQL is intended to read naturally: even if you can't write SQL, you should be able to read it**

```sql
CREATE TABLE cats (breed);
```

**SQL is expressed in _statements_, constructed from _keywords_**

Statement:

```sql
CREATE TABLE cats (breed);
```

keywords: `CREATE TABLE`

**Keywords are capitalized - by convention, not by requirement**

This is preferred:
    
```sql
CREATE TABLE cats (breed);
```

...but this works just fine:
    
```sql
create table cats (breed);
```

**End statements with a semi-colon**

Won't work in most systems:

```sql
CREATE TABLE cats (breed)
```
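
If you are curious, Python's `sqlite3` module has a helper that reports whether a string ends in a complete, semicolon-terminated statement. This is a small sketch: the helper only checks termination, not whether the SQL is otherwise valid, and some tools are forgiving about a missing final semicolon, so treat the semicolon as a habit worth keeping.

```python
import sqlite3

# True only when the text ends with a semicolon-terminated statement
with_semicolon = sqlite3.complete_statement("CREATE TABLE cats (breed);")
without_semicolon = sqlite3.complete_statement("CREATE TABLE cats (breed)")

print(with_semicolon, without_semicolon)
```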

## SQL in Action: Library Catalog

```sql
-- Creating a books table
CREATE TABLE books (
  id INTEGER PRIMARY KEY,
  title TEXT,
  author TEXT,
  year INTEGER
);

-- Adding some classic titles
INSERT INTO books VALUES 
  (1, 'To Kill a Mockingbird', 'Harper Lee', 1960),
  (2, '1984', 'George Orwell', 1949);

-- Finding books by an author
SELECT title, year FROM books 
  WHERE author = 'George Orwell';
```

Let's take a moment to consider this example before looking at each statement and keyword in turn.
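
The same statements can also be sent to SQLite from plain Python. The following is a sketch using the standard-library `sqlite3` module rather than the notebook setup used in class; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # transient, in-memory database

# executescript runs several semicolon-separated statements at once
conn.executescript("""
CREATE TABLE books (
  id INTEGER PRIMARY KEY,
  title TEXT,
  author TEXT,
  year INTEGER
);
INSERT INTO books VALUES
  (1, 'To Kill a Mockingbird', 'Harper Lee', 1960),
  (2, '1984', 'George Orwell', 1949);
""")

# The SQL is just a Python string; the result is a list of tuples
orwell_books = conn.execute(
    "SELECT title, year FROM books WHERE author = 'George Orwell';"
).fetchall()
print(orwell_books)
```

Notice the division of labor: SQL describes what data we want, and Python carries the request to the database and receives the answer.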

In [2]:
# Ignore this for now: connecting to a transient, in-memory database
%load_ext sql
%sql sqlite://

### CREATE TABLE

In [3]:
%%sql
CREATE TABLE books (title, author, year);

 * sqlite://
Done.


[]

What's going on here?

>"create a table named _books_ with three fields: _title_, _author_, and _year_"

What if I run `CREATE TABLE` again?

In [4]:
%%sql
CREATE TABLE books (title, author, year);

 * sqlite://
(sqlite3.OperationalError) table books already exists
[SQL: CREATE TABLE books (title, author, year);]
(Background on this error at: https://sqlalche.me/e/20/e3q8)


**Solution**: IF NOT EXISTS keyword

In [5]:
%%sql
CREATE TABLE IF NOT EXISTS books (title, author, year);

 * sqlite://
Done.


[]

_This doesn't do anything if the table exists, but avoids an error._

(Better solution: don't try to create tables that exist!)
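
From Python, the same situation shows up as an exception that you can catch. This is a minimal sketch with the standard-library `sqlite3` module, not the notebook setup used in the cells above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title, author, year);")

# A second plain CREATE TABLE raises an error we can catch
try:
    conn.execute("CREATE TABLE books (title, author, year);")
    error_message = None
except sqlite3.OperationalError as err:
    error_message = str(err)

# IF NOT EXISTS quietly does nothing when the table is already there
conn.execute("CREATE TABLE IF NOT EXISTS books (title, author, year);")

print(error_message)
```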

### DROP TABLE

To delete a table, try the following:

In [7]:
%%sql
DROP TABLE books;

 * sqlite://
Done.


[]

Use carefully!
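
Here is why: a small sketch (standard-library `sqlite3`, illustrative variable names) showing that once a table is dropped, its records go with it and later queries fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title, author, year);")
conn.execute("INSERT INTO books VALUES ('1984', 'George Orwell', 1949);")

# Dropping the table removes the table and every record in it
conn.execute("DROP TABLE books;")

try:
    conn.execute("SELECT * FROM books;")
    table_gone = False
except sqlite3.OperationalError as err:
    table_gone = True
    print(err)
```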

### Creating with data types

In [8]:
%%sql
CREATE TABLE books (title TEXT, author TEXT, year INTEGER);

 * sqlite://
Done.


[]

_How is this different?_

### Some data types

- text
- int - integer (like `int` in Python)
- float - floating point number (like `float` in Python)
- boolean - True/False values
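
To see how these types line up with Python, here is a sketch that stores one value of each kind and reads it back. The `loans` table and its columns are hypothetical, and the `?` marks are placeholders that `sqlite3` fills in from the Python tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (title TEXT, days INTEGER, fee FLOAT, returned BOOLEAN);"
)

# One Python value per column: str, int, float, bool
conn.execute("INSERT INTO loans VALUES (?, ?, ?, ?);", ("1984", 14, 0.25, True))

row = conn.execute("SELECT * FROM loans;").fetchone()
value_types = [type(value).__name__ for value in row]
print(row, value_types)
```

One wrinkle to look for in the output: SQLite has no separate boolean storage, so `True` comes back as the integer `1`.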

### `INSERT` keyword

Pairs with `INTO` and `VALUES()`

_In plain text, what's happening here? Do the line breaks matter?_

In [9]:
%%sql
INSERT INTO books VALUES 
  ('To Kill a Mockingbird', 'Harper Lee', 1960),
  ('1984', 'George Orwell', 1949);

 * sqlite://
2 rows affected.


[]

"Insert two records into the table `books`, representing the book `title`, `author`, and `year`."

- As in Python, text is quoted, integers are not.
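
When the records already live in Python, you can hand a list to the database instead of typing the values into the SQL. This is a sketch with the standard-library `sqlite3` module; `new_books` is an illustrative name, and each `?` is a placeholder filled from one item of a tuple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER);")

# A Python list of tuples: one tuple per record
new_books = [
    ("To Kill a Mockingbird", "Harper Lee", 1960),
    ("1984", "George Orwell", 1949),
]

# executemany runs the INSERT once for each tuple in the list
conn.executemany("INSERT INTO books VALUES (?, ?, ?);", new_books)

rows = conn.execute("SELECT * FROM books;").fetchall()
print(len(rows), "rows inserted")
```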

### `SELECT` keyword

Get the data!

In [10]:
%%sql
SELECT title FROM books;

 * sqlite://
Done.


title
To Kill a Mockingbird
1984


In plain text:

> "SELECT the `title` fields of the `books` table"

How might we ask for "both the title and author fields"?
How might we ask for "all fields"?

- Separate field names by commas
- Say 'all fields' with `*`

In [11]:
%%sql
SELECT title, author FROM books;

 * sqlite://
Done.


title,author
To Kill a Mockingbird,Harper Lee
1984,George Orwell


In [12]:
%%sql
SELECT * FROM books;

 * sqlite://
Done.


title,author,year
To Kill a Mockingbird,Harper Lee,1960
1984,George Orwell,1949
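
In Python, the result of a `SELECT` can be looped over like a list. The sketch below uses the standard-library `sqlite3` module and illustrative variable names; it shows that the fields you name in the query decide what each row contains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER);")
conn.execute(
    "INSERT INTO books VALUES "
    "('To Kill a Mockingbird', 'Harper Lee', 1960), "
    "('1984', 'George Orwell', 1949);"
)

# Two named fields: each row is a tuple with two items
lines = []
for title, author in conn.execute("SELECT title, author FROM books;"):
    lines.append(f"{title} by {author}")

# All fields: cursor.description lists the column names that came back
cursor = conn.execute("SELECT * FROM books;")
column_names = [description[0] for description in cursor.description]

print(lines)
print(column_names)
```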


### The `WHERE` clause

How would you select just the books published in 1960?

In [16]:
%%sql
SELECT * FROM books
    WHERE year == 1960;

 * sqlite://
Done.


title,author,year
To Kill a Mockingbird,Harper Lee,1960


**psst**

In SQLite, you can say `year == 1960` or `year = 1960`, but in most programming languages, only `==` is for comparisons. Writing `==` can help avoid confusion when you switch between Python and SQL, but note that standard SQL uses `=` and many other DBMSs reject `==`.
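
A quick sketch to check this yourself, using the standard-library `sqlite3` module. It only demonstrates SQLite's behavior, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER);")
conn.execute(
    "INSERT INTO books VALUES "
    "('To Kill a Mockingbird', 'Harper Lee', 1960), "
    "('1984', 'George Orwell', 1949);"
)

# SQLite treats = and == as the same comparison inside WHERE
single = conn.execute("SELECT title FROM books WHERE year = 1960;").fetchall()
double = conn.execute("SELECT title FROM books WHERE year == 1960;").fetchall()

# In Python the two are different: = assigns, == compares
year = 1960
is_match = (year == 1960)

print(single == double, is_match)
```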

#### Other logical operators

`==` is a _logical operator_, comparing values on the left and on the right. If the comparison is _true_ then the record matches.

What other logical operators might we see?

- not equal to
- less than
- greater than

`<`, `<=`, `>`, `>=`, `!=`

In [17]:
%%sql
SELECT * FROM books WHERE year <= 1950;

 * sqlite://
Done.


title,author,year
1984,George Orwell,1949
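
The same operators exist in Python. As a rough comparison, here is a plain-Python sketch of what the `WHERE year <= 1950` query above is doing: test each record, and keep the ones where the comparison is true. The `books` list is a stand-in for the table.

```python
# A stand-in for the books table: one tuple per record
books = [
    ("To Kill a Mockingbird", "Harper Lee", 1960),
    ("1984", "George Orwell", 1949),
]

# Same idea as: SELECT * FROM books WHERE year <= 1950;
matches = []
for book in books:
    title, author, year = book
    if year <= 1950:
        matches.append(book)

# Each comparison evaluates to a boolean
print(1949 <= 1950, 1960 <= 1950, 1960 != 1949)
print(matches)
```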


### Unexpected Input

_What if you add quotes to the year value?_

In [23]:
%%sql
INSERT INTO books
    VALUES ('On Tyranny: Twenty Lessons from the Twentieth Century','Timothy Snyder', '2017');

 * sqlite://
1 rows affected.


[]

SQLite figured it out, converting the string to an integer because the column was declared `INTEGER`.

*Some other DBMSs won't let you do this*

In [24]:
result = %sql SELECT title, year from books
for book, year in result:
    print(year, "is type:", type(year))

 * sqlite://
Done.
1960 is type: <class 'int'>
1949 is type: <class 'int'>
2017 is type: <class 'int'>


_What if you add a quoted non-number to the year field?_

In [25]:
%%sql
INSERT INTO books
    VALUES ('How Democracies Die','Steven Levitsky and Daniel Ziblatt', 'BadYearValue');

 * sqlite://
1 rows affected.


[]

SQLite lets you do it, even though the value inside is now a *string* rather than *integer*.

*Most other DBMSs won't let you do this*

In [27]:
result = %sql SELECT title, year from books
for book, year in result:
    print(year, "is type:", type(year))

 * sqlite://
Done.
1960 is type: <class 'int'>
1949 is type: <class 'int'>
2017 is type: <class 'int'>
BadYearValue is type: <class 'str'>


_What if you don't put quotes around text?_

In [28]:
%%sql
INSERT INTO books
    VALUES (The Origins of Totalitarianism, Hannah Arendt, 1951);

 * sqlite://
(sqlite3.OperationalError) near "Origins": syntax error
[SQL: INSERT INTO books
    VALUES (The Origins of Totalitarianism, Hannah Arendt, 1951);]
(Background on this error at: https://sqlalche.me/e/20/e3q8)


SQLite says, 'no way'!

## Typing

![](../images/sql-lit.png)

Most databases use *strict* typing: you say what data type your columns are, and the database only allows that type.
    
SQLite does *dynamic* typing: it tries to figure out what you want, _and_ it tries to be accommodating.
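
You can ask SQLite directly what it stored by using its built-in `typeof()` function. The sketch below uses the standard-library `sqlite3` module and repeats the kinds of inserts tried earlier, with one title shortened for space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER);")

# An unquoted number, a quoted number, and a quoted non-number
conn.execute("INSERT INTO books VALUES ('To Kill a Mockingbird', 'Harper Lee', 1960);")
conn.execute("INSERT INTO books VALUES ('On Tyranny', 'Timothy Snyder', '2017');")
conn.execute(
    "INSERT INTO books VALUES "
    "('How Democracies Die', 'Steven Levitsky and Daniel Ziblatt', 'BadYearValue');"
)

# typeof() reports the type SQLite actually stored for each value
stored = conn.execute("SELECT year, typeof(year) FROM books;").fetchall()
for year, sql_type in stored:
    print(year, sql_type)
```

The quoted `'2017'` was converted to an integer, while `'BadYearValue'` was kept as text in a column declared `INTEGER`. A strictly typed database would be expected to reject that last insert instead.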

Questions

- Why learn types?
- Benefits of strict typing?
- Benefits of dynamic typing?

## Lab

- SQLite
    - Connecting to a simple database, via notebook (without Python) or command line
- SQL
    - `CREATE TABLE`
    - `DROP TABLE`
    - `SELECT`
    - `INSERT`
    - `WHERE` clause
- Python
    - Logical Operators
        - `==`, `!=`, `<`, `<=`, `>`, `>=`
    - `for` loops on lists
    - `print()`
    - Tab indentation
    - boolean datatype: `True`, `False`
- Colab
    - Auto-complete
    - Documentation lookup