<a target="_blank" href="https://colab.research.google.com/github/lukebarousse/Int_SQL_Data_Analytics_Course/blob/main/6_Data_Cleaning/2_Strings.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# String Formatting

## Overview

### 🥅 Analysis Goals

- Use string functions to clean up customer names in the `customer` table:
    - Combine `givenname` and `surname` into one column with `CONCAT`
    - Standardize letter case with `UPPER` and `LOWER`
    - Remove leading and trailing spaces with `TRIM`
- The end goal is to add a `cleaned_name` column to the final customer revenue query from `Handling_Nulls`.

### 📘 Concepts Covered

String functions covered in this notebook:

- `CONCAT`
- `UPPER`
- `LOWER`
- `TRIM`

---
## CONCAT

### 📝 Notes

**`CONCAT`**

- **CONCAT**: Combines two or more strings into a single string.

- Syntax:

  ```sql
  SELECT CONCAT(string1, string2, ...);
  ```

- In PostgreSQL, `CONCAT` ignores `NULL` arguments (it treats them as empty strings), so one `NULL` input does not turn the whole result into `NULL`, as it would with the `||` operator.
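
To make the `NULL` handling concrete, here is a minimal sketch that assumes PostgreSQL; other databases may behave differently. The inline `VALUES` rows and the `sample_names` alias are hypothetical stand-ins for the `customer` table, and `CONCAT_WS` is included only for comparison.

```sql
-- Hypothetical sample rows; the second row has a NULL surname.
SELECT
    givenname,
    surname,
    CONCAT(givenname, ' ', surname) AS concat_name,        -- NULL arguments are skipped
    givenname || ' ' || surname AS pipe_name,              -- NULL if any operand is NULL
    CONCAT_WS(' ', givenname, surname) AS concat_ws_name   -- skips NULLs and their separator
FROM (
    VALUES
        ('Ada', 'Lovelace'),
        ('Grace', NULL)
) AS sample_names (givenname, surname);
```

For the second row, `concat_name` comes back as `'Grace '` (note the trailing space), `pipe_name` is `NULL`, and `concat_ws_name` is `'Grace'`.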

### 💻 Final Result

- Return each customer's full name as a single `cleaned_name` column.

#### Problem Description

**`CONCAT`**

1. Use `CONCAT` to combine `givenname` and `surname`, separated by a space, into a new `cleaned_name` column.

In [None]:
SELECT
    customerkey,
    CONCAT(givenname, ' ', surname) AS cleaned_name
FROM customer;

<img src="../Resources/query_results/6_string_formatting_1.png" alt="Query Results 1" style="width: 50%; height: auto;">

#### Problem Description

**`CONCAT`**

1. Starting from the final query in `Handling_Nulls`, clean up the customer name by combining `givenname` and `surname` with `CONCAT`.

In [None]:
WITH sales_data AS (
        SELECT
            customerkey,
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
            SUM(quantity * netprice * exchangerate) AS net_revenue,
            COUNT(orderkey) AS num_orders
        FROM sales
        GROUP BY
            customerkey
)

SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(c.givenname, ' ', c.surname) AS cleaned_name, -- Added
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value  
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;

<img src="../Resources/query_results/6_string_formatting_2.png" alt="Query Results 2" style="width: 80%; height: auto;">

---
## UPPER

### 📝 Notes

**`UPPER()`**

- **UPPER**: Converts a string to uppercase.

- Syntax:

  ```sql
  SELECT UPPER(string_column);
  ```

- Useful for standardizing text, such as converting names or codes to uppercase for comparison.
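
As a rough illustration of the comparison use case, the sketch below checks the same value with and without `UPPER`. The `sample_codes` rows are hypothetical, and the example assumes the database compares text case-sensitively (the PostgreSQL default).

```sql
-- Hypothetical sample codes that differ only in letter case.
SELECT
    code,
    code = 'us' AS exact_match,                       -- case-sensitive comparison
    UPPER(code) = UPPER('us') AS standardized_match   -- both sides uppercased first
FROM (
    VALUES ('US'), ('us'), ('Us')
) AS sample_codes (code);
```

Only the `'us'` row passes the exact comparison, while all three rows pass once both sides are uppercased.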

### 💻 Final Result

- Return each customer's given name, surname, and full name in uppercase.

#### Problem Description

**`UPPER`**

1. Use `UPPER` to uppercase the `givenname`, the `surname`, and the full name (created with `CONCAT`).

In [None]:
SELECT
    customerkey,
    UPPER(givenname) AS uppercase_givenname,
    UPPER(surname) AS uppercase_surname,
    CONCAT(UPPER(givenname), ' ', UPPER(surname)) AS uppercase_full_name    
FROM customer;

<img src="../Resources/query_results/6_string_formatting_3.png" alt="Query Results 3" style="width: 80%; height: auto;">

#### Problem Description

**`UPPER`**

1. Starting from the query in the `CONCAT` section, convert both parts of the combined customer name to uppercase.

In [None]:
WITH sales_data AS (
        SELECT
            customerkey,
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year, 
            SUM(quantity * netprice * exchangerate) AS net_revenue,
            COUNT(orderkey) AS num_orders
        FROM sales
        GROUP BY
            customerkey
)

SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(UPPER(c.givenname), ' ', UPPER(c.surname)) AS cleaned_name, -- Added
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value  
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;

<img src="../Resources/query_results/6_string_formatting_4.png" alt="Query Results 4" style="width: 80%; height: auto;">

---
## LOWER

### 📝 Notes

**`LOWER()`**

- **LOWER**: Converts a string to lowercase.  
- Syntax:  
  ```sql
  SELECT LOWER(string_column);
  ```
- Useful for standardizing text, such as making email addresses or usernames case-insensitive for comparisons.
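
One way to see the effect on comparisons is to group on the lowercased value. This is only a sketch: the `sample_emails` rows are hypothetical and are not part of the course dataset.

```sql
-- Hypothetical sample emails; two of them differ only in letter case.
SELECT
    LOWER(email) AS normalized_email,
    COUNT(*) AS occurrences
FROM (
    VALUES
        ('Ada@Example.com'),
        ('ada@example.com'),
        ('grace@example.com')
) AS sample_emails (email)
GROUP BY LOWER(email)
ORDER BY normalized_email;
```

The two spellings of Ada's address collapse into a single `ada@example.com` row with `occurrences = 2`, and `grace@example.com` appears once.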

### 💻 Final Result

- Return each customer's given name, surname, and full name in lowercase.

#### Problem Description

**`LOWER`**

1. Use `LOWER` to lowercase the `givenname`, the `surname`, and the full name (created with `CONCAT`).

In [None]:
SELECT
    customerkey,
    LOWER(givenname) AS lowercase_givenname,
    LOWER(surname) AS lowercase_surname,
    CONCAT(LOWER(givenname), ' ', LOWER(surname)) AS lowercase_full_name    
FROM customer;

<img src="../Resources/query_results/6_string_formatting_5.png" alt="Query Results 5" style="width: 80%; height: auto;">

#### Problem Description

**`LOWER`**

1. Starting from the query in the `UPPER` section, convert both parts of the combined customer name to lowercase instead.

In [None]:
WITH sales_data AS (
        SELECT
            customerkey,
            EXTRACT(YEAR FROM MIN(orderdate)) AS cohort_year,
            SUM(quantity * netprice * exchangerate) AS net_revenue,
            COUNT(orderkey) AS num_orders
        FROM sales
        GROUP BY
            customerkey
)

SELECT
    c.customerkey,
    s.cohort_year,
    CONCAT(LOWER(c.givenname), ' ', LOWER(c.surname)) AS cleaned_name, -- Added
    COALESCE(s.net_revenue, 0) AS net_revenue,
    COALESCE(s.num_orders, 0) AS total_orders,
    s.net_revenue / NULLIF(s.num_orders, 0) AS avg_order_value
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey;

<img src="../Resources/query_results/6_string_formatting_6.png" alt="Query Results 6" style="width: 80%; height: auto;">

---
## TRIM

### 📝 Notes

**`TRIM()`**

- **TRIM**: Removes leading and/or trailing spaces (or specified characters) from a string.  
- Syntax:  
  ```sql
  SELECT TRIM([BOTH | LEADING | TRAILING] 'characters' FROM string_column);
  ```
- Default behavior removes spaces from both ends of a string.  
- Useful for cleaning up user input, formatting text, or ensuring consistent comparisons.  
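
The variants in the syntax above can be compared side by side. This sketch uses hypothetical string literals rather than course data, and the expected values in the comments assume standard SQL `TRIM` behavior as implemented in PostgreSQL.

```sql
-- Hypothetical literals showing each TRIM variant.
SELECT
    TRIM('  Ada  ') AS trimmed_both,                        -- 'Ada' (default: spaces, both ends)
    TRIM(LEADING ' ' FROM '  Ada  ') AS trimmed_leading,    -- 'Ada  '
    TRIM(TRAILING ' ' FROM '  Ada  ') AS trimmed_trailing,  -- '  Ada'
    TRIM(BOTH '-' FROM '--Ada--') AS trimmed_dashes,        -- 'Ada'
    LENGTH(TRIM('  Ada  ')) AS trimmed_length;              -- 3
```

Wrapping the result in `LENGTH` is a quick way to confirm that invisible spaces were actually removed.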

### 💻 Final Result

- Return each customer's given name, surname, and full name with leading and trailing spaces removed.

#### Problem Description

**`TRIM`**

1. Use `TRIM` to remove leading and trailing spaces from `givenname` and `surname`, then combine the trimmed values with `CONCAT`.

In [None]:
SELECT
    customerkey,
    TRIM(givenname) AS trimmed_givenname,
    TRIM(surname) AS trimmed_surname,
    CONCAT(TRIM(givenname), ' ', TRIM(surname)) AS trimmed_full_name  
FROM customer;