
# **PostgreSQL Practical Course: From Beginner to Intermediate**

## **Course Overview**
This course will guide you through the fundamentals of PostgreSQL, a powerful open-source relational database management system (RDBMS). By the end of this course, you will be able to:
- Install and set up PostgreSQL.
- Perform basic and advanced database operations.
- Write efficient SQL queries.
- Design and optimize databases.
- Work with real-world datasets.


## **Course Outline**

### **Module 1: Introduction to PostgreSQL**
1. **What is PostgreSQL?**
   - Overview of relational databases.
   - Features of PostgreSQL (ACID compliance, extensibility, etc.).
   - Use cases for PostgreSQL.

2. **Installation and Setup**
   - Installing PostgreSQL on Windows, macOS, and Linux.
   - Setting up `psql` (PostgreSQL command-line tool).
   - Introduction to pgAdmin (graphical tool for PostgreSQL).

3. **Basic Commands**
   - Connecting to a database.
   - Creating and dropping databases.
   - Listing databases and tables.


### **Module 2: Working with Tables and Data**
1. **Creating Tables**
   - Data types in PostgreSQL (e.g., `INTEGER`, `VARCHAR`, `DATE`, `JSONB`).
   - Creating tables with `CREATE TABLE`.
   - Adding constraints (e.g., `PRIMARY KEY`, `FOREIGN KEY`, `UNIQUE`, `NOT NULL`).

2. **Inserting Data**
   - Inserting single and multiple rows.
   - Using `INSERT INTO ... VALUES`.

3. **Querying Data**
   - Basic `SELECT` statements.
   - Filtering data with `WHERE`.
   - Sorting with `ORDER BY`.
   - Limiting results with `LIMIT` and `OFFSET`.

4. **Updating and Deleting Data**
   - Updating rows with `UPDATE`.
   - Deleting rows with `DELETE`.
   - Using `TRUNCATE` to clear a table.


### **Module 3: Advanced SQL Queries**
1. **Joins**
   - `INNER JOIN`, `LEFT JOIN`, `RIGHT JOIN`, `FULL OUTER JOIN`.
   - Practical examples of joining tables.

2. **Aggregations**
   - Using `GROUP BY` and `HAVING`.
   - Aggregation functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`).

3. **Subqueries**
   - Writing nested queries.
   - Using subqueries in `SELECT`, `FROM`, and `WHERE` clauses.

4. **Common Table Expressions (CTEs)**
   - Writing reusable queries with `WITH`.


### **Module 4: Database Design and Optimization**
1. **Normalization**
   - Understanding 1NF, 2NF, and 3NF.
   - Designing normalized tables.

2. **Indexes**
   - Creating and using indexes for faster queries.
   - Types of indexes (e.g., `B-tree`, `GIN`, `GiST`).

3. **Transactions**
   - Understanding ACID properties.
   - Using `BEGIN`, `COMMIT`, and `ROLLBACK`.

4. **Performance Tuning**
   - Analyzing query performance with `EXPLAIN`.
   - Optimizing slow queries.


## **Course Outline**

### **Module 1: Working with Tables and Data**

#### **1.1 Introduction to the Dataset**
- Overview of the Target Brazil dataset.
- Understanding the 8 CSV files:
  - `customers.csv`: Customer details.
  - `sellers.csv`: Seller details.
  - `order_items.csv`: Items in each order.
  - `geolocation.csv`: Geolocation data.
  - `payments.csv`: Payment details.
  - `orders.csv`: Order details.
  - `products.csv`: Product details.
  - `reviews.csv`: Customer reviews.

#### **1.2 Setting Up the Database**
- Creating a new database in PostgreSQL.
- Importing CSV files into PostgreSQL using `pgAdmin`:
  - Using the `Import/Export` tool.
  - Writing `COPY` commands to load data.

#### **1.3 Creating Tables**
- Designing tables based on the dataset.
- Writing `CREATE TABLE` statements with appropriate data types and constraints.
  - Example: Creating the `customers` table.
    ```sql
    CREATE TABLE customers (
        customer_id VARCHAR PRIMARY KEY,
        customer_unique_id VARCHAR,
        customer_zip_code_prefix VARCHAR,
        customer_city VARCHAR,
        customer_state VARCHAR
    );
    ```

#### **1.4 Inserting and Updating Data**
- Inserting data into tables.
- Updating records (e.g., correcting customer details).
- Deleting records (e.g., removing test data).
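
The three operations above can be sketched against the `customers` table from section 1.3. This is a minimal illustration only: every value below (the IDs, zip prefix, and cities) is made up and does not come from the dataset.

```sql
-- Insert a test customer (all values are hypothetical).
INSERT INTO customers (customer_id, customer_unique_id, customer_zip_code_prefix, customer_city, customer_state)
VALUES ('test_customer_001', 'test_unique_001', '01000', 'sao paulo', 'SP');

-- Correct the city of that customer.
UPDATE customers
SET customer_city = 'campinas'
WHERE customer_id = 'test_customer_001';

-- Remove the test row again.
DELETE FROM customers
WHERE customer_id = 'test_customer_001';
```

Always include a `WHERE` clause in `UPDATE` and `DELETE` unless you really intend to change every row in the table.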

#### **1.5 Basic Queries**
- Retrieving data with `SELECT`.
- Filtering data with `WHERE`.
- Sorting data with `ORDER BY`.
- Limiting results with `LIMIT`.

### **Module 2: Advanced SQL Queries (Very Important)**

#### **2.1 Joins**
- Understanding relationships between tables.
- Writing `INNER JOIN`, `LEFT JOIN`, and `RIGHT JOIN` queries.
  - Example: Joining `orders` and `customers` to find customer details for each order.
    ```sql
    SELECT o.order_id, c.customer_id, c.customer_city
    FROM orders o
    INNER JOIN customers c ON o.customer_id = c.customer_id;
    ```

#### **2.2 Aggregations**
- Using `GROUP BY` and `HAVING`.
- Aggregation functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`).
  - Example: Calculating total revenue by seller.
    ```sql
    SELECT seller_id, SUM(price) AS total_revenue
    FROM order_items
    GROUP BY seller_id;
    ```

#### **2.3 Subqueries**
- Writing nested queries.
- Using subqueries in `SELECT`, `FROM`, and `WHERE`.
  - Example: Finding customers who placed more than 5 orders.
    ```sql
    SELECT customer_id
    FROM (
        SELECT customer_id, COUNT(order_id) AS order_count
        FROM orders
        GROUP BY customer_id
    ) AS customer_orders
    WHERE order_count > 5;
    ```

#### **2.4 Common Table Expressions (CTEs)**
- Writing reusable queries with `WITH`.
  - Example: Calculating the average total spend per customer.
    ```sql
    WITH customer_order_totals AS (
        -- Total item price per customer across all of their orders.
        SELECT o.customer_id, SUM(oi.price) AS total_spent
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_id
        GROUP BY o.customer_id
    )
    SELECT AVG(total_spent) AS avg_total_spent
    FROM customer_order_totals;
    ```

### **Module 3: Database Design and Optimization**

#### **3.1 Normalization**
- Understanding 1NF, 2NF, and 3NF.
- Designing normalized tables for the Target dataset.

#### **3.2 Indexes**
- Creating indexes for faster queries.
  - Example: Creating an index on `customer_id` in the `orders` table.
    ```sql
    CREATE INDEX idx_customer_id ON orders (customer_id);
    ```

#### **3.3 Transactions**
- Using `BEGIN`, `COMMIT`, and `ROLLBACK`.
- Ensuring data integrity with transactions.
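
As a small sketch of how these commands fit together, the block below changes an order inside a transaction, inspects the result, and then discards the change. The order ID and the status value are hypothetical placeholders.

```sql
BEGIN;

-- Change one order inside the transaction.
UPDATE orders
SET order_status = 'canceled'
WHERE order_id = 'hypothetical_order_id';

-- Inspect the change before making it permanent.
SELECT order_id, order_status
FROM orders
WHERE order_id = 'hypothetical_order_id';

-- ROLLBACK discards the change; use COMMIT instead to keep it.
ROLLBACK;
```

Until `COMMIT` runs, other sessions do not see the updated row, which is what makes it safe to check the result first.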

#### **3.4 Performance Tuning**
- Analyzing query performance with `EXPLAIN`.
- Optimizing slow queries.
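
A minimal sketch of reading a query plan, assuming the `orders` table from this course; the customer ID is a hypothetical placeholder.

```sql
-- EXPLAIN shows the planned execution; adding ANALYZE also runs the query
-- and reports actual row counts and timings.
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 'hypothetical_customer_id';
```

Without an index on `orders (customer_id)` the plan will typically show a sequential scan. After creating such an index, the planner may switch to an index scan, depending on table size and statistics. Because `EXPLAIN ANALYZE` really executes the statement, wrap it in a transaction and roll back when testing `UPDATE` or `DELETE`.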

### **Module 4: Advanced Features (Less Important)**

#### **4.1 JSON Data**
- Storing and querying JSON data with `JSONB`.
  - Example: Storing customer preferences in a `JSONB` column.
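
One way this could look is sketched below. The `customer_preferences` table, its columns, and the JSON keys are assumptions made for illustration; they are not part of the Target dataset.

```sql
-- Hypothetical table holding one JSONB document per customer.
CREATE TABLE customer_preferences (
    customer_id VARCHAR PRIMARY KEY,
    preferences JSONB NOT NULL DEFAULT '{}'::jsonb
);

INSERT INTO customer_preferences (customer_id, preferences)
VALUES ('hypothetical_customer_id', '{"newsletter": true, "language": "pt-BR"}');

-- ->> extracts a field as text; @> tests whether the document contains the given JSON.
SELECT customer_id, preferences ->> 'language' AS language
FROM customer_preferences
WHERE preferences @> '{"newsletter": true}';
```

Containment queries with `@>` can be supported by a `GIN` index on the `JSONB` column if the table grows large.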

#### **4.2 Stored Procedures and Functions**
- Writing PL/pgSQL functions.
  - Example: Creating a function to calculate total revenue for a seller.
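
A minimal sketch of such a function, assuming revenue means the sum of `price` in `order_items`. The function name, parameter name, and the seller ID in the call are hypothetical.

```sql
CREATE OR REPLACE FUNCTION seller_total_revenue(p_seller_id VARCHAR)
RETURNS NUMERIC
LANGUAGE plpgsql
AS $$
DECLARE
    v_total NUMERIC;
BEGIN
    -- COALESCE turns the NULL that SUM returns for "no rows" into 0.
    SELECT COALESCE(SUM(price), 0)
    INTO v_total
    FROM order_items
    WHERE seller_id = p_seller_id;

    RETURN v_total;
END;
$$;

-- Example call with a placeholder seller ID.
SELECT seller_total_revenue('hypothetical_seller_id');
```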

#### **4.3 Triggers**
- Automating tasks with triggers.
  - Example: Updating a `last_updated` timestamp automatically.
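
The sketch below shows one possible shape for this trigger. It assumes a `last_updated` column is added to `customers`, which the dataset does not contain; the function and trigger names are also made up. `EXECUTE FUNCTION` requires PostgreSQL 11 or later (older versions use `EXECUTE PROCEDURE`).

```sql
-- Hypothetical extra column; add it after the CSV import so the import still matches the file.
ALTER TABLE customers ADD COLUMN last_updated TIMESTAMP;

-- Trigger function: stamps the row being written with the current time.
CREATE OR REPLACE FUNCTION set_last_updated()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    NEW.last_updated := now();
    RETURN NEW;
END;
$$;

-- Run the function before every row update on customers.
CREATE TRIGGER customers_set_last_updated
BEFORE UPDATE ON customers
FOR EACH ROW
EXECUTE FUNCTION set_last_updated();
```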

# **Module 1: Working with Tables and Data**

## **1.1 Introduction to the Dataset**
In this module, we will work with the Target Brazil dataset, which consists of 8 CSV files. Each file covers one part of the data: customers, sellers, orders, order items, products, payments, reviews, and geolocations. Below is a brief overview of the dataset:

- **customers.csv**: Contains customer details such as `customer_id`, `customer_unique_id`, `customer_zip_code_prefix`, `customer_city`, and `customer_state`.
- **sellers.csv**: Contains seller details such as `seller_id`, `seller_zip_code_prefix`, `seller_city`, and `seller_state`.
- **order_items.csv**: Contains details about items in each order, such as `order_id`, `order_item_id`, `product_id`, `seller_id`, `shipping_limit_date`, `price`, and `freight_value`.
- **geolocation.csv**: Contains geolocation data such as `geolocation_zip_code_prefix`, `geolocation_lat`, `geolocation_lng`, `geolocation_city`, and `geolocation_state`.
- **payments.csv**: Contains payment details such as `order_id`, `payment_sequential`, `payment_type`, `payment_installments`, and `payment_value`.
- **orders.csv**: Contains order details such as `order_id`, `customer_id`, `order_status`, `order_purchase_timestamp`, `order_delivered_carrier_date`, `order_delivered_customer_date`, and `order_estimated_delivery_date`.
- **products.csv**: Contains product details such as `product_id`, `product_category_name`, `product_name_lenght`, `product_description_length`, `product_photos_qty`, `product_weight_g`, `product_length_cm`, `product_height_cm`, and `product_width_cm`.
- **reviews.csv**: Contains customer reviews, linked to orders by `order_id`, with fields such as `review_id`, `review_score`, `review_comment_title`, `review_comment_message`, `review_creation_date`, and `review_answer_timestamp`.


## **1.2 Creating Tables**
Before importing the data, we need to create tables in PostgreSQL. Below are the `CREATE TABLE` commands for each CSV file:

#### 1. **Customers Table**
```sql
CREATE TABLE customers (
    customer_id VARCHAR PRIMARY KEY,
    customer_unique_id VARCHAR,
    customer_zip_code_prefix VARCHAR,
    customer_city VARCHAR,
    customer_state VARCHAR
);
```

#### 2. **Sellers Table**
```sql
CREATE TABLE sellers (
    seller_id VARCHAR PRIMARY KEY,
    seller_zip_code_prefix VARCHAR,
    seller_city VARCHAR,
    seller_state VARCHAR
);
```

#### 3. **Order Items Table**
```sql
CREATE TABLE order_items (
    order_id VARCHAR,
    order_item_id INT,
    product_id VARCHAR,
    seller_id VARCHAR,
    shipping_limit_date TIMESTAMP,
    price NUMERIC,
    freight_value NUMERIC
);
```

#### 4. **Geolocation Table**
```sql
CREATE TABLE geolocation (
    geolocation_zip_code_prefix VARCHAR,
    geolocation_lat NUMERIC,
    geolocation_lng NUMERIC,
    geolocation_city VARCHAR,
    geolocation_state VARCHAR
);
```

#### 5. **Payments Table**
```sql
CREATE TABLE payments (
    order_id VARCHAR,
    payment_sequential INT,
    payment_type VARCHAR,
    payment_installments INT,
    payment_value NUMERIC
);
```

#### 6. **Orders Table**
```sql
CREATE TABLE orders (
    order_id VARCHAR PRIMARY KEY,
    customer_id VARCHAR,
    order_status VARCHAR,
    order_purchase_timestamp TIMESTAMP,
    order_delivered_carrier_date TIMESTAMP,
    order_delivered_customer_date TIMESTAMP,
    order_estimated_delivery_date TIMESTAMP
);
```

#### 7. **Products Table**
```sql
CREATE TABLE products (
    product_id VARCHAR PRIMARY KEY,
    product_category_name VARCHAR,
    product_name_lenght INT,
    product_description_length INT,
    product_photos_qty INT,
    product_weight_g INT,
    product_length_cm INT,
    product_height_cm INT,
    product_width_cm INT
);
```

#### 8. **Reviews Table**
```sql
CREATE TABLE reviews (
    review_id VARCHAR PRIMARY KEY,
    order_id VARCHAR,
    review_score INT,
    review_comment_title VARCHAR,
    review_comment_message TEXT,
    review_creation_date TIMESTAMP,
    review_answer_timestamp TIMESTAMP
);
```

## **1.3 Importing CSV Files**
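
As a hedged sketch of the `COPY` approach, the statement below loads `customers.csv` into the `customers` table. The file path is a placeholder, and the options assume the file is comma-separated with a header row whose columns appear in the listed order; adjust them if your copy of the dataset differs.

```sql
-- Server-side COPY: the path is read by the PostgreSQL server process,
-- so the file must be readable on the database server.
COPY customers (customer_id, customer_unique_id, customer_zip_code_prefix, customer_city, customer_state)
FROM '/path/to/customers.csv'
WITH (FORMAT csv, HEADER true);

-- Quick check that rows arrived.
SELECT COUNT(*) AS loaded_rows FROM customers;
```

The same pattern applies to the other seven tables. If the file lives on your own machine rather than on the server, the `psql` meta-command `\copy` accepts the same options and reads the file client-side.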



## **1.4 PostgreSQL Query Questions**

Now that the data is loaded, let’s write 20 PostgreSQL queries to analyze the dataset. Try to answer each of the questions below with a single query:

**1. Retrieve all customers from the state of São Paulo (SP).**

**2. Count the total number of orders.**

**3. Find the top 5 cities with the most customers.**

**4. Retrieve all orders with the status "delivered".**

**5. Calculate the total revenue generated from all orders.**

**6. Find the average price of products.**

**7. Retrieve the top 10 most expensive products.**

**8. Count the number of unique product categories.**

**9. Find the total number of orders placed by each customer.**

**10. Retrieve all orders placed in 2017.**

**11. Find the top 5 sellers with the highest total sales.**

**12. Calculate the average freight value for all orders.**

**13. Retrieve all customers who have not placed any orders.**

**14. Find the total number of payments made via credit card.**

**15. Retrieve the product with the highest weight.**

**16. Find the average review score for all products.**

**17. Retrieve all orders that were delivered late.**

**18. Count the number of orders with more than one payment installment.**

**19. Find the top 3 product categories with the most orders.**

**20. Retrieve the total revenue generated by each payment type.**
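
To show the expected shape of an answer, here is a hedged sketch for questions 1, 3, 13, and 17. It assumes the tables defined in section 1.2, that states are stored as two-letter codes such as `SP`, and that "delivered late" means the customer delivery date is after the estimated delivery date. Other readings of the questions are possible.

```sql
-- Q1: customers from the state of São Paulo.
SELECT *
FROM customers
WHERE customer_state = 'SP';

-- Q3: top 5 cities by number of customers.
SELECT customer_city, COUNT(*) AS customer_count
FROM customers
GROUP BY customer_city
ORDER BY customer_count DESC
LIMIT 5;

-- Q13: customers with no orders (no matching row on the right side of the join).
SELECT c.*
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;

-- Q17: orders delivered after their estimated delivery date.
SELECT order_id, order_delivered_customer_date, order_estimated_delivery_date
FROM orders
WHERE order_delivered_customer_date > order_estimated_delivery_date;
```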

# **Module 2: Advanced SQL Queries (Very Important)**


## **1. Joins**
Joins are used to combine rows from two or more tables based on a related column.

### **1.1 `INNER JOIN`**
- **Definition**: Returns only the rows that have matching values in both tables.
- **When to Use**: When you need to retrieve records that have matching values in both tables.
- **Example**:
  ```sql
  SELECT c.customer_id, o.order_id
  FROM customers c
  INNER JOIN orders o ON c.customer_id = o.customer_id;
  ```

### **1.2 `LEFT JOIN`**
- **Definition**: Returns all rows from the left table and the matched rows from the right table. If no match is found, `NULL` values are returned for columns from the right table.
- **When to Use**: When you want to include all records from the left table, even if there are no matches in the right table.
- **Example**:
  ```sql
  SELECT c.customer_id, o.order_id
  FROM customers c
  LEFT JOIN orders o ON c.customer_id = o.customer_id;
  ```

### **1.3 `RIGHT JOIN`**
- **Definition**: Returns all rows from the right table and the matched rows from the left table. If no match is found, `NULL` values are returned for columns from the left table.
- **When to Use**: When you want to include all records from the right table, even if there are no matches in the left table.
- **Example**:
  ```sql
  SELECT o.order_id, p.payment_type, p.payment_value
  FROM orders o
  RIGHT JOIN payments p ON o.order_id = p.order_id;
  ```

### **1.4 `FULL OUTER JOIN`**
- **Definition**: Returns all rows from both tables, paired where the join condition matches. Where a row has no match, `NULL` values are returned for the columns of the other table.
- **When to Use**: When you want to include all records from both tables, regardless of whether there is a match.
- **Example**:
  ```sql
  SELECT c.customer_id, o.order_id
  FROM customers c
  FULL OUTER JOIN orders o ON c.customer_id = o.customer_id;
  ```


## **2. Aggregations**
Aggregation functions perform calculations on a set of values and return a single value.

### **2.1 `COUNT()`**
- **Definition**: Counts the number of rows that match a specified condition.
- **When to Use**: When you want to count the number of records in a table or group.
- **Example**:
  ```sql
  SELECT COUNT(*) AS total_orders FROM orders;
  ```

### **2.2 `SUM()`**
- **Definition**: Calculates the sum of a numeric column.
- **When to Use**: When you want to calculate the total value of a numeric column.
- **Example**:
  ```sql
  SELECT SUM(payment_value) AS total_revenue FROM payments;
  ```

### **2.3 `AVG()`**
- **Definition**: Calculates the average value of a numeric column.
- **When to Use**: When you want to find the average value of a numeric column.
- **Example**:
  ```sql
  SELECT AVG(price) AS avg_price FROM order_items;
  ```

### **2.4 `MIN()`**
- **Definition**: Finds the minimum value in a column.
- **When to Use**: When you want to find the smallest value in a column.
- **Example**:
  ```sql
  SELECT MIN(price) AS min_price FROM order_items;
  ```

### **2.5 `MAX()`**
- **Definition**: Finds the maximum value in a column.
- **When to Use**: When you want to find the largest value in a column.
- **Example**:
  ```sql
  SELECT MAX(price) AS max_price FROM order_items;
  ```

### **2.6 `GROUP BY`**
- **Definition**: Groups rows that have the same values into summary rows.
- **When to Use**: When you want to aggregate data based on one or more columns.
- **Example**:
  ```sql
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id;
  ```

### **2.7 `HAVING`**
- **Definition**: Filters groups based on a condition.
- **When to Use**: When you want to filter aggregated data.
- **Example**:
  ```sql
  SELECT customer_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY customer_id
  HAVING COUNT(*) > 5;
  ```



## **3. Subqueries**
Subqueries are queries nested inside another query.

### **3.1 Subquery in `SELECT`**
- **Definition**: A subquery that returns a single value and is used in the `SELECT` clause.
- **When to Use**: When you want to include a calculated value in the result set.
- **Example**:
  ```sql
  SELECT order_id, (SELECT AVG(payment_value) FROM payments) AS avg_payment
  FROM orders;
  ```

### **3.2 Subquery in `WHERE`**
- **Definition**: A subquery that returns a value used in the `WHERE` clause.
- **When to Use**: When you want to filter records based on a condition derived from another query.
- **Example**:
  ```sql
  SELECT * FROM orders
  WHERE customer_id IN (SELECT customer_id FROM customers WHERE customer_state = 'SP');
  ```



## **4. Common Table Expressions (CTEs)**
CTEs are temporary result sets that can be referenced within a query.

### **4.1 `WITH` Clause**
- **Definition**: Defines a CTE that can be used in the main query.
- **When to Use**: When you want to simplify complex queries by breaking them into smaller, reusable parts.
- **Example**:
  ```sql
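  WITH customer_spending AS (
      -- payments has no customer_id, so join through orders to reach it.
      SELECT o.customer_id, SUM(p.payment_value) AS total_spent
      FROM payments p
      JOIN orders o ON o.order_id = p.order_id
      GROUP BY o.customer_id
  )
  SELECT * FROM customer_spending;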
  ```



## **5. Advanced Functions**

### **5.1 `EXTRACT()`**
- **Definition**: Extracts a part of a date (e.g., year, month, day).
- **When to Use**: When you want to extract specific parts of a date.
- **Example**:
  ```sql
  SELECT EXTRACT(YEAR FROM order_purchase_timestamp) AS order_year
  FROM orders;
  ```

### **5.2 `CASE` Expression**
- **Definition**: Performs conditional logic in SQL, returning the result of the first branch whose condition is true.
- **When to Use**: When you want to perform conditional calculations or transformations.
- **Example**:
  ```sql
  SELECT order_id,
         CASE WHEN payment_value > 100 THEN 'High' ELSE 'Low' END AS payment_category
  FROM payments;
  ```



## **6. Window Functions**
Window functions perform calculations across a set of table rows related to the current row.

### **6.1 `RANK()`**
- **Definition**: Assigns a rank to each row within a partition of a result set.
- **When to Use**: When you want to rank rows based on a specific column.
- **Example**:
  ```sql
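  -- Assumes seller_revenue is a table, view, or CTE with the columns
  -- seller_id and total_revenue; it is not one of the 8 dataset tables.
  SELECT seller_id, total_revenue,
         RANK() OVER (ORDER BY total_revenue DESC) AS rank
  FROM seller_revenue;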
  ```
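
The definition mentions partitions, which are set with `PARTITION BY`. The sketch below ranks sellers by revenue within each state. It assumes revenue is the sum of `price` in `order_items`; the CTE and the `revenue_rank` alias are names chosen for illustration.

```sql
WITH seller_revenue AS (
    -- One row per seller with their state and total item revenue.
    SELECT s.seller_id, s.seller_state, SUM(oi.price) AS total_revenue
    FROM sellers s
    JOIN order_items oi ON oi.seller_id = s.seller_id
    GROUP BY s.seller_id, s.seller_state
)
SELECT seller_id, seller_state, total_revenue,
       -- The rank restarts at 1 for every seller_state.
       RANK() OVER (PARTITION BY seller_state ORDER BY total_revenue DESC) AS revenue_rank
FROM seller_revenue;
```

Sellers with equal revenue share a rank and the next rank is skipped. `DENSE_RANK()` avoids the gap, and `ROW_NUMBER()` gives every row a distinct number.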

### **6.2 `SUM() OVER()`**
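- **Definition**: With an `ORDER BY` inside `OVER`, calculates a running (cumulative) sum up to the current row.
- **When to Use**: When you want to calculate running totals or cumulative sums.
- **Example**:
  ```sql
  -- Assumes daily_revenue is a table, view, or CTE with one row per day
  -- and the columns order_date and revenue; it is not one of the 8 dataset tables.
  SELECT order_date, revenue,
         SUM(revenue) OVER (ORDER BY order_date) AS cumulative_revenue
  FROM daily_revenue;
  ```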

### **6.3 `AVG() OVER()`**
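- **Definition**: With a frame clause such as `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW`, calculates a moving average over the current row and the rows just before it.
- **When to Use**: When you want to calculate a rolling average.
- **Example**:
  ```sql
  -- Assumes daily_revenue is a table, view, or CTE with one row per day
  -- and the columns order_date and revenue; it is not one of the 8 dataset tables.
  SELECT order_date, revenue,
         AVG(revenue) OVER (ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_revenue
  FROM daily_revenue;
  ```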
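
The window-function examples in this section read from `seller_revenue` and `daily_revenue`, which are not among the 8 imported tables. One possible way to provide them is as views over the dataset, sketched below. The definitions are assumptions: seller revenue is taken as the sum of `price` in `order_items`, and daily revenue as the sum of `payment_value` grouped by purchase date.

```sql
-- Total item revenue per seller.
CREATE VIEW seller_revenue AS
SELECT seller_id, SUM(price) AS total_revenue
FROM order_items
GROUP BY seller_id;

-- Total payment value per purchase day.
CREATE VIEW daily_revenue AS
SELECT o.order_purchase_timestamp::date AS order_date,
       SUM(p.payment_value) AS revenue
FROM orders o
JOIN payments p ON p.order_id = o.order_id
GROUP BY o.order_purchase_timestamp::date;
```

Days with no orders produce no row in `daily_revenue`, so a frame of `6 PRECEDING` covers the last seven rows, which is not necessarily the last seven calendar days.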


## **2.1 Joins**


#### **1. Retrieve customer details along with their order details.**

#### **2. Find all orders with their corresponding payment details.**


#### **3. Retrieve product details for each order item.**


#### **4. Find the seller details for each order item.**


#### **5. Retrieve geolocation details for customers and sellers.**


## **2.2 Aggregations**


#### **6. Calculate the total revenue generated by each seller.**


#### **7. Find the average order value for each customer.**


#### **8. Count the number of orders placed in each city.**


#### **9. Find the total freight value for each order.**


#### **10. Calculate the total number of products sold in each category.**




## **2.3 Subqueries**

#### **11. Find customers who have placed more than 5 orders.**


#### **12. Retrieve products that have never been ordered.**


#### **13. Find the top 3 customers with the highest total spending.**


#### **14. Retrieve orders with a total payment value greater than the average payment value.**


#### **15. Find sellers who have sold products in more than one category.**



## **2.4 Common Table Expressions (CTEs)**


#### **16. Calculate the total revenue for each product category.**


#### **17. Find the average review score for each seller.**


#### **18. Retrieve the top 5 customers with the highest total spending using CTEs.**



## **2.5 Advanced Queries**

#### **19. Find the percentage of orders delivered late.**


#### **20. Retrieve the top 3 product categories with the highest average review score.**


#### **21. Find the cumulative revenue generated over time.**


#### **22. Retrieve the top 5 customers with the highest lifetime value (LTV).**




## **2.6 Window Functions**


#### **23. Rank sellers by total revenue within each state.**


#### **24. Calculate the moving average of daily revenue.**


#### **25. Find the top 3 most expensive products in each category.**
