# DATABASES AND SQL FOR DATA SCIENCE WITH PYTHON

## Table of Contents
1. [Module 1: Getting Started with SQL](#module-1)
2. [Module 2: Introduction to Relational Databases and Tables](#module-2)
3. [Module 3: ](#module-3)
4. [Module 4: ](#module-4)
5. [Module 5: ](#module-5)
6. [Module 6: ](#module-6)

# Module 1: Getting Started with SQL <a name="module-1"></a>

## Basic SQL

### Introduction to Databases

**SQL (Structured Query Language)** is a language used to interact with relational databases. It allows you to perform tasks like querying data, adding data, modifying data, and defining database structures.

**Data** is a collection of facts, which can be in the form of words, numbers, or even pictures. Data is a valuable asset for businesses and organizations.

**Database** is a program that stores data and provides functionalities for adding, modifying, and querying that data. Databases help keep data secure, organized, and easily accessible.

**Relational database** is a type of database that organizes data into tables with rows and columns, similar to a spreadsheet. Tables can be related to each other based on common fields.

**RDBMS (Relational Database Management System)** is a set of software tools that controls the data in a relational database, including access, organization, and storage. RDBMSs are used in industries such as banking, transportation, and healthcare. Examples include MySQL, Oracle Database, DB2 Warehouse, and DB2 on Cloud.

##### 5 Basic SQL Commands:  
* `CREATE TABLE`: Creates a new table in the database.
* `INSERT`: Adds data into a table.
* `SELECT`: Retrieves data from a table.
* `UPDATE`: Modifies existing data in a table.
* `DELETE`: Removes data from a table.

### `SELECT` Statement

The `SELECT` statement is the primary command in SQL. It is used to retrieve data from a database table. It is a Data Manipulation Language (DML) statement that lets you specify which columns and rows you want to retrieve.

A Database Management System (DBMS) not only stores data but also provides tools for retrieving and manipulating that data. The `SELECT` statement is a key tool for data retrieval.

A `SELECT` statement is often called a **query**, and the output it produces is called a **result set** or **result table**.

##### Basic Syntax
* `SELECT * FROM table_name`: Retrieves all columns and all rows from the specified table.
* `SELECT column1, column2 FROM table_name`: Retrieves only the specified columns from the table.

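The `WHERE` clause restricts the result set to rows that satisfy a condition, called a **predicate**. **Comparison operators** are used in predicates to compare values: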
* `=`: Equal to
* `>`: Greater than
* `<`: Less than
* `>=`: Greater than or equal to
* `<=`: Less than or equal to
* `<>` or `!=`: Not equal to

**Example:** `SELECT title FROM book WHERE book_id = 'B1'`: Retrieves the title of the book with the ID 'B1'.
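To make the `WHERE` predicate concrete, here is a minimal sketch that runs the query above from Python using the standard-library `sqlite3` module. The in-memory database, the `pages` column, and the sample rows are assumptions made up for illustration; they are not part of the course data.

```python
import sqlite3

# In-memory SQLite database; table contents are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (book_id CHAR(2) PRIMARY KEY, title VARCHAR(100), pages INTEGER)"
)
conn.executemany(
    "INSERT INTO book (book_id, title, pages) VALUES (?, ?, ?)",
    [
        ("B1", "Sample Title One", 300),
        ("B2", "Sample Title Two", 450),
        ("B3", "Sample Title Three", 120),
    ],
)

# Equality predicate: only the row whose book_id is 'B1' qualifies.
title_b1 = conn.execute("SELECT title FROM book WHERE book_id = 'B1'").fetchall()

# Comparison predicate: rows with more than 200 pages.
long_books = conn.execute(
    "SELECT book_id FROM book WHERE pages > 200 ORDER BY book_id"
).fetchall()

print(title_b1)    # [('Sample Title One',)]
print(long_books)  # [('B1',), ('B2',)]
conn.close()
```

Each result row comes back as a tuple, so a single-column query still yields one-element tuples.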

### `COUNT`, `DISTINCT`, and `LIMIT`

`COUNT()` is a function that retrieves the number of rows that match the specified criteria.
* `COUNT(*)` counts all rows in a table. For example, `SELECT COUNT(*) FROM employees` counts all rows in the "employees" table.
* `COUNT(column_name)` counts the number of non-null values in a specific column. For example, `SELECT COUNT(city) FROM customers` counts the number of non-null values in the "city" column of the "customers" table.

`DISTINCT` is used to remove duplicate values from a result set, returning only unique values.  For example, `SELECT DISTINCT country FROM customers` retrieves a list of unique countries from the "country" column of the "customers" table.

`LIMIT` restricts the number of rows returned by a query, which is useful for previewing data or working with large datasets. For example, `SELECT * FROM products LIMIT 10` retrieves 10 rows from the "products" table. Without an `ORDER BY` clause, which rows count as the "first" 10 is not guaranteed. Adding `OFFSET` skips rows before the limit is applied: `SELECT * FROM FilmLocations LIMIT 15 OFFSET 10;` returns 15 rows starting from row 11, leaving the first 10 rows aside.
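The difference between `COUNT(*)` and `COUNT(column_name)`, and the effect of `DISTINCT` and `LIMIT ... OFFSET`, can be sketched with a small made-up `customers` table in an in-memory SQLite database. The column names and rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT, country TEXT)"
)
# Two of the five sample rows have no city (NULL, written as None in Python).
conn.executemany(
    "INSERT INTO customers (customer_id, city, country) VALUES (?, ?, ?)",
    [
        (1, "Toronto", "CA"),
        (2, None, "CA"),
        (3, "Austin", "US"),
        (4, "Boston", "US"),
        (5, None, "IN"),
    ],
)

# COUNT(*) counts rows; COUNT(city) skips rows where city is NULL.
total_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
cities_known = conn.execute("SELECT COUNT(city) FROM customers").fetchone()[0]

# DISTINCT collapses repeated country codes into one row each.
countries = conn.execute(
    "SELECT DISTINCT country FROM customers ORDER BY country"
).fetchall()

# LIMIT 2 OFFSET 1: skip the first row, then return the next two.
page = conn.execute(
    "SELECT customer_id FROM customers ORDER BY customer_id LIMIT 2 OFFSET 1"
).fetchall()

print(total_rows, cities_known)  # 5 3
print(countries)                 # [('CA',), ('IN',), ('US',)]
print(page)                      # [(2,), (3,)]
conn.close()
```

The `ORDER BY` in the last query is what makes the "skip one, take two" result predictable.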

### `INSERT` Statement

`INSERT` statement is used to add new rows to a table in a relational database. The syntax is:

`INSERT INTO table_name (column_name1, column_name2, ...) VALUES (value1, value2, ...)`

We can also add multiple rows at a time:

`INSERT INTO table_name (column_name1, column_name2, ...) VALUES (value1_for_row1, value2_for_row1, ...), (value1_for_row2, value2_for_row2, ...)`

#### Important Considerations:
* The number of values in the `VALUES` clause must match the number of column names specified.
* If the column names are omitted, the values must be in the same order as the columns defined in the table.
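* If a column is not specified in the `column_name` list, it receives its default value, or NULL (missing value) if no default is defined. The insert fails if such a column is declared `NOT NULL` and has no default.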
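The considerations above can be checked with a short sketch against an in-memory SQLite database. The table layout, the `DEFAULT 'CA'` clause, and the placeholder names are assumptions chosen for illustration; the exact error raised for a value-count mismatch may differ in other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE author ("
    "author_id CHAR(2) PRIMARY KEY, "
    "lastname VARCHAR(15) NOT NULL, "
    "city VARCHAR(15), "
    "country CHAR(2) DEFAULT 'CA')"
)

# Single row with every column listed explicitly.
conn.execute(
    "INSERT INTO author (author_id, lastname, city, country) "
    "VALUES ('A1', 'Example1', 'Austin', 'US')"
)

# Two rows in one statement; city and country are omitted from the column list.
conn.execute(
    "INSERT INTO author (author_id, lastname) "
    "VALUES ('A2', 'Example2'), ('A3', 'Example3')"
)

# Three values for two named columns: the statement is rejected.
try:
    conn.execute(
        "INSERT INTO author (author_id, lastname) VALUES ('A4', 'Example4', 'Extra')"
    )
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

rows = conn.execute(
    "SELECT author_id, lastname, city, country FROM author ORDER BY author_id"
).fetchall()
for row in rows:
    print(row)
# ('A1', 'Example1', 'Austin', 'US')
# ('A2', 'Example2', None, 'CA')   <- city has no default (NULL); country uses its default
# ('A3', 'Example3', None, 'CA')
print(mismatch_rejected)  # True
conn.close()
```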

### `UPDATE` and `DELETE` Statements

**`UPDATE` statement** is used to modify existing data in a table. Syntax is:
`UPDATE table_name SET column_name1 = value1, column_name2 = value2, ... WHERE condition;`

* `table_name` is the name of the table you want to update
* `SET` specifies the columns to be updated and their new values
* `WHERE` specifies which rows to update. If omitted, all rows will be updated.

Example:  
`UPDATE authors SET last_name='Katta', first_name='Lakshmi' WHERE author_id='A2';` updates the last and first names of the author with the ID 'A2'.

**`DELETE` statement** is used to remove rows from a table. As with `UPDATE`, omitting the `WHERE` clause affects every row, so all rows are removed. Syntax is:
`DELETE FROM table_name WHERE condition`

Example: `DELETE FROM authors WHERE author_id IN ('A2', 'A3')` deletes rows with author IDs 'A2' and 'A3'
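As a rough sketch, the two examples above can be run in sequence against an in-memory SQLite database. The `authors` table layout and the placeholder starting rows are assumptions; `cursor.rowcount` is used to confirm how many rows each statement touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE authors ("
    "author_id CHAR(2) PRIMARY KEY, last_name VARCHAR(15), first_name VARCHAR(15))"
)
conn.executemany(
    "INSERT INTO authors (author_id, last_name, first_name) VALUES (?, ?, ?)",
    [("A1", "Last1", "First1"), ("A2", "Last2", "First2"), ("A3", "Last3", "First3")],
)

# UPDATE with a WHERE clause changes only the matching row.
cur = conn.execute(
    "UPDATE authors SET last_name='Katta', first_name='Lakshmi' WHERE author_id='A2'"
)
updated = cur.rowcount
a2 = conn.execute(
    "SELECT last_name, first_name FROM authors WHERE author_id='A2'"
).fetchone()

# DELETE with IN removes every row whose ID is in the list.
cur = conn.execute("DELETE FROM authors WHERE author_id IN ('A2', 'A3')")
deleted = cur.rowcount
remaining = conn.execute("SELECT author_id, last_name FROM authors").fetchall()

print(updated, a2)  # 1 ('Katta', 'Lakshmi')
print(deleted)      # 2
print(remaining)    # [('A1', 'Last1')]
conn.close()
```

Checking the affected-row count before committing is a cheap safeguard against a missing or overly broad `WHERE` clause.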

# Module 2: Introduction to Relational Databases and Tables <a name="module-2"></a>

### Relational Database Concepts

**Relational Model and Data Independence:** The relational model is widely used for databases because it provides data independence. This means that the way data is stored and accessed is independent of the way it is used by applications. This offers flexibility and easier maintenance.

**Entity-Relationship (ER) Model:** An alternative to the relational model, the ER model is often used as a tool to design relational databases. It represents data as entities (objects) and their relationships.

**Entities and Attributes:**  
* **Entities:** Represent real-world objects or concepts (e.g., book, author, borrower). In an ER diagram, they are represented as rectangles.
* **Attributes:** Characteristics or properties of an entity (e.g., book title, author's name, borrower's address). In an ER diagram, they are represented as ovals.

**Mapping Entities and Attributes to Tables and Columns:**  
In a relational database, entities become tables, and attributes become columns within those tables.

**Data Types:**  
Each column has a specific data type that defines the kind of value it can store. Common data types include:
* Characters (`CHAR`, `VARCHAR`) for storing text.
* Numbers (`INTEGER`, `DECIMAL`) for storing numerical values.
* Dates and times (`DATE`, `TIME`, `TIMESTAMP`) for storing dates, times, or both.

**Primary Key** is a column or set of columns that uniquely identifies each row in a table. It prevents duplicate rows and is crucial for establishing relationships between tables.

**Foreign Key** is a column (or set of columns) that refers to the primary key of another table, creating a link between the two tables.
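A minimal sketch of how these two constraints behave, using an in-memory SQLite database. The `author` and `book` layouts and the sample values are assumptions for illustration. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other database systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign-key enforcement is off unless enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE author (author_id CHAR(2) PRIMARY KEY, lastname VARCHAR(15) NOT NULL)"
)
# book.author_id is a foreign key pointing at author.author_id.
conn.execute(
    "CREATE TABLE book ("
    "book_id CHAR(2) PRIMARY KEY, "
    "title VARCHAR(100), "
    "author_id CHAR(2) REFERENCES author (author_id))"
)
conn.execute("INSERT INTO author (author_id, lastname) VALUES ('A1', 'Example')")
conn.execute(
    "INSERT INTO book (book_id, title, author_id) VALUES ('B1', 'Sample Title', 'A1')"
)


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second author with the same primary key value is refused.
duplicate_pk_rejected = rejected(
    "INSERT INTO author (author_id, lastname) VALUES ('A1', 'Duplicate')"
)
# A book that points at a non-existent author is refused.
orphan_fk_rejected = rejected(
    "INSERT INTO book (book_id, title, author_id) VALUES ('B2', 'Orphan', 'A9')"
)

print(duplicate_pk_rejected, orphan_fk_rejected)  # True True
conn.close()
```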

### Types of SQL Statements (DDL vs. DML)

**SQL statements**, the commands used to interact with databases, fall into two main categories:

1. **Data Definition Language (DDL) Statements:** Used to define, change, or drop database objects like tables:
* `CREATE`: Creates tables and defines their columns.
* `ALTER`: Modifies existing tables (adding, dropping, or modifying columns).
* `TRUNCATE`: Deletes all data from a table but keeps the data structure. 
* `DROP`: Deletes tables.

2. **Data Manipulation Language (DML) Statements:** Used to read and modify data within tables. Also known as CRUD (Create, Read, Update, Delete) Operations.
* `INSERT`: Adds new rows of data to a table.
* `SELECT`: Reads or retrieves data from a table.
* `UPDATE`: Modifies existing data in a table.
* `DELETE`: Removes rows of data from a table.

### `CREATE TABLE` Statement

`CREATE TABLE` is a DDL (Data Definition Language) statement used to create tables in a relational database. The syntax is:

```sql
CREATE TABLE table_name (
    column_name1 datatype constraints,
    column_name2 datatype constraints,
    ...
);
```

Example:

```sql
CREATE TABLE author (
    author_id CHAR(2) PRIMARY KEY NOT NULL,
    lastname VARCHAR(15) NOT NULL,
    firstname VARCHAR(15) NOT NULL,
    email VARCHAR(40),
    city VARCHAR(15),
    country CHAR(2)
);
```
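One way to confirm what a `CREATE TABLE` statement produced is to run it and then inspect the table's metadata. The sketch below does this with an in-memory SQLite database; `PRAGMA table_info` is SQLite-specific, and other systems expose the same information through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE author (
        author_id CHAR(2) PRIMARY KEY NOT NULL,
        lastname VARCHAR(15) NOT NULL,
        firstname VARCHAR(15) NOT NULL,
        email VARCHAR(40),
        city VARCHAR(15),
        country CHAR(2)
    )
    """
)

# Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(author)").fetchall()

names = [col[1] for col in columns]
required = [col[1] for col in columns if col[3]]   # columns declared NOT NULL
pk_columns = [col[1] for col in columns if col[5]]  # columns in the primary key

print(names)       # ['author_id', 'lastname', 'firstname', 'email', 'city', 'country']
print(required)    # ['author_id', 'lastname', 'firstname']
print(pk_columns)  # ['author_id']
conn.close()
```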

### `ALTER`, `DROP` and `TRUNCATE` Tables

**`ALTER TABLE` Statement** is used to modify the structure of an existing table.

* `ADD COLUMN` adds a new column to the table.
* `MODIFY COLUMN` changes the data type or constraints of an existing column. This keyword is MySQL syntax; Db2 and PostgreSQL use `ALTER COLUMN` instead.
* `DROP COLUMN` removes a column from the table.

Syntax:  
`ALTER TABLE table_name`  
`ADD COLUMN column_name datatype constraints,`  
`MODIFY COLUMN column_name datatype constraints,`    
`DROP COLUMN column_name,`  
`...;`  

**`DROP TABLE` Statement** is used to delete an entire table from the database. 

Syntax:  

`DROP TABLE table_name`

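**`TRUNCATE TABLE` Statement** is used to delete all rows from a table while keeping the table structure intact. The `IMMEDIATE` keyword in the syntax below is required by Db2; most other systems use plain `TRUNCATE TABLE table_name;`.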

Syntax:  

`TRUNCATE TABLE table_name IMMEDIATE;`
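The following sketch walks through the three operations against an in-memory SQLite database. Two assumptions matter here: SQLite has no `TRUNCATE TABLE` statement, so an unqualified `DELETE FROM` stands in for it, and the `telephone_number` column and sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE author (author_id CHAR(2) PRIMARY KEY, lastname VARCHAR(15))")
conn.execute("INSERT INTO author (author_id, lastname) VALUES ('A1', 'Example')")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchall()[0][0]


def table_exists(name):
    """Check SQLite's catalog table for a table with the given name."""
    found = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchall()[0][0]
    return found == 1


# ALTER TABLE ... ADD COLUMN changes the structure; existing rows are kept.
conn.execute("ALTER TABLE author ADD COLUMN telephone_number BIGINT")
columns_after_alter = [row[1] for row in conn.execute("PRAGMA table_info(author)")]

# Stand-in for TRUNCATE TABLE: remove every row, keep the table definition.
conn.execute("DELETE FROM author")
rows_after_clear = scalar("SELECT COUNT(*) FROM author")
exists_after_clear = table_exists("author")

# DROP TABLE removes the table definition itself.
conn.execute("DROP TABLE author")
exists_after_drop = table_exists("author")

print(columns_after_alter)                   # ['author_id', 'lastname', 'telephone_number']
print(rows_after_clear, exists_after_clear)  # 0 True
print(exists_after_drop)                     # False
conn.close()
```

The contrast between the last two steps is the key point: clearing a table leaves an empty but usable table, while dropping it removes the table from the database entirely.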