# SQL Working with STRINGs

In [None]:
!wget https://github.com/gt-cse-6040/bootcamp/raw/main/SQL/syllabus/NYC-311-2M_small.db

In [None]:
# create a connection to the database
import sqlite3 as db
import pandas as pd

# Connect to a database (or create one if it doesn't exist)
conn_nyc = db.connect('NYC-311-2M_small.db')

## STRINGs

*   https://www.sqlitetutorial.net/sqlite-functions/sqlite-substr/
*   https://www.sqlitetutorial.net/sqlite-functions/sqlite-instr/

SQLite offers only a basic set of string functions, so complex string operations can be challenging. These functions let you concatenate, trim, search, replace, or change the case of strings.

Notably, SQLite lacks:

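*    Built-in regular expressions: the `REGEXP` operator is recognized, but the core library ships no implementation for it unless an application or extension provides one
*    Advanced string-to-datetime parsing functionality (the built-in date and time functions expect ISO-8601-style input)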

These limitations can force you to implement more creative solutions, often involving combinations of functions like TRIM, SUBSTR, INSTR, and REPLACE to achieve results that would be straightforward in other environments.
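As an illustration of the regular-expression limitation, a query using `REGEXP` typically fails until a function is registered for it. The sketch below uses Python's `sqlite3.Connection.create_function` on an in-memory database rather than the NYC 311 file. The Python helper `regexp` is a hypothetical name; only the registered SQL name must be `REGEXP`. SQLite evaluates `X REGEXP Y` as `regexp(Y, X)`, so the pattern arrives as the first argument.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# Before registration, REGEXP usually fails with "no such function: REGEXP".
try:
    conn.execute("SELECT 'abc123' REGEXP '[0-9]+'")
except sqlite3.OperationalError as exc:
    print("REGEXP not available yet:", exc)


def regexp(pattern, value):
    """Hypothetical helper: SQLite passes (pattern, value) for 'value REGEXP pattern'."""
    if value is None:
        return None
    return re.search(pattern, value) is not None


# Register the helper under the SQL name REGEXP, taking two arguments.
conn.create_function("REGEXP", 2, regexp)

has_digits = conn.execute("SELECT 'abc123' REGEXP '[0-9]+'").fetchone()[0]
no_digits = conn.execute("SELECT 'abcdef' REGEXP '[0-9]+'").fetchone()[0]
print(has_digits, no_digits)  # 1 0
```

The function only exists for this connection, so it has to be registered again on every new connection (for example on `conn_nyc`).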

Common SQLite string functions:

| Function | Description |
| ---- | ---- |
| LENGTH(str) | Returns the length of a string. |
| LOWER(str) | Converts a string to lowercase. |
| UPPER(str) | Converts a string to uppercase. |
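| SUBSTR(str, start[, length]) | Extracts a substring from a string; `length` is optional. |
| TRIM(str[, chars]) | Removes leading and trailing spaces (or the given characters) from a string. |
| LTRIM(str[, chars]) | Removes leading spaces (or the given characters) from a string. |
| RTRIM(str[, chars]) | Removes trailing spaces (or the given characters) from a string. |
| REPLACE(str, old, new) | Replaces all occurrences of a substring with a new one. |
| INSTR(str, substr) | Returns the 1-based position of the first occurrence of a substring, or 0 if it is not found. |
| CONCAT(str1, str2, ...) | Concatenates strings (SQLite 3.44.0 and later only; see the CONCAT() section below for the portable operator). |
| str LIKE pattern | Pattern-matching operator (`%` matches any sequence, `_` any single character); case-insensitive for ASCII letters by default. |
| GROUP_CONCAT(str[, separator]) | Aggregate that concatenates values from multiple rows into one string, separated by commas unless another separator is given. |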
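To see several of these functions side by side, the sketch below runs them with Python's built-in `sqlite3` module against an in-memory database instead of the NYC 311 file. The `boroughs` table is hypothetical and exists only for the `GROUP_CONCAT` call; SQLite does not guarantee the order in which `GROUP_CONCAT` visits rows, so do not rely on the order of its output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Scalar string functions and the LIKE operator on literal values.
row = conn.execute(
    """
    SELECT LENGTH('Hello'),
           UPPER('Hello'),
           LOWER('Hello'),
           LTRIM('  Hi  '),
           RTRIM('  Hi  '),
           'Hello' LIKE 'h%'
    """
).fetchone()
print(row)  # (5, 'HELLO', 'hello', 'Hi  ', '  Hi', 1)

# GROUP_CONCAT is an aggregate: it folds many rows into one string.
conn.execute("CREATE TABLE boroughs (name TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO boroughs VALUES (?)",
    [("BRONX",), ("QUEENS",), ("BROOKLYN",)],
)
joined = conn.execute("SELECT GROUP_CONCAT(name, ';') FROM boroughs").fetchone()[0]
print(joined)  # the three names separated by ';' (order not guaranteed)
```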

### SUBSTR() in SQLite

In SQLite, the `substr()` function extracts a substring from a given string, starting at a specific position and optionally limited to a given length. It is useful for the following use cases:
*  Extracting parts of text from a database column (e.g., a part of a name or address).
*  Trimming or formatting data during queries.
*  Handling and manipulating text data more efficiently.

The function uses the following syntax and parameters:
```sql
SUBSTR(string, start_position, [length])
```

*  string: The input string from which you want to extract a substring.
*  start_position: The position in the string where the substring starts. The index is 1-based, meaning the first character in the string is at position 1. If the position is negative, it starts counting from the end of the string (i.e., position -1 is the last character).
* length (optional): The number of characters to extract from the string. If not specified, the substring will go from the start position to the end of the string.
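Because SQLite positions are 1-based while Python indexes are 0-based, comparing `SUBSTR` with the equivalent Python slice can make the parameters concrete. A minimal sketch on an in-memory connection, passing the string through a `?` placeholder:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
s = "Hello, World!"

# Positive start: SQLite position 8 corresponds to Python index 7.
sql_pos = conn.execute("SELECT SUBSTR(?, 8, 5)", (s,)).fetchone()[0]
py_pos = s[8 - 1 : 8 - 1 + 5]

# Negative start: both count from the end (-6 is the 6th character from the end).
sql_neg = conn.execute("SELECT SUBSTR(?, -6, 5)", (s,)).fetchone()[0]
py_neg = s[-6:-1]

print(sql_pos, py_pos, sql_neg, py_neg)  # World World World World
```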

#### Some basic examples showcasing usage.

1.  Basic Usage: Extract a substring from a specific position

In [None]:
def substr1_example() ->str:
    query = '''
                SELECT substr('Hello, World!', 1, 5)
                '''
    return query

substr1_example = pd.read_sql(substr1_example(),conn_nyc)
display(substr1_example)

This starts at position 1 (the letter 'H') and extracts 5 characters, giving 'Hello'.

2. Negative start_position: Start counting from the end of the string.

In [None]:
def substr2_example() ->str:
    query = '''
                SELECT substr('Hello, World!', -6, 5)
                '''
    return query

substr2_example = pd.read_sql(substr2_example(),conn_nyc)
display(substr2_example)

This starts at the 6th character from the end, which is 'W', and extracts 5 characters, giving 'World'.

3. Substring until the end: If you don’t specify a length, the substring will go to the end of the string.

In [None]:
def substr3_example() ->str:
    query = '''
                SELECT substr('Hello, World!', 8)
                '''
    return query

substr3_example = pd.read_sql(substr3_example(),conn_nyc)
display(substr3_example)

This starts at position 8 ('W') and continues to the end of the string, giving 'World!'.

4. Substring with a large length: If the specified length is larger than the remaining characters in the string, SQLite will just return the substring until the end of the string.

In [None]:
def substr4_example() ->str:
    query = '''
                SELECT substr('Hello', 3, 10)
                '''
    return query

substr4_example = pd.read_sql(substr4_example(),conn_nyc)
display(substr4_example)

Since there are only three characters starting from position 3, it will return 'llo'.
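Applied to a column, `SUBSTR` is handy for pulling fixed-position fields out of text, such as the year and month of an ISO-8601 timestamp. This sketch assumes a hypothetical `events` table with a text column `created`; it does not reflect the actual NYC 311 schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: timestamps stored as ISO-8601 text.
conn.execute("CREATE TABLE events (created TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2015-09-15 02:14:04",), ("2016-01-03 17:45:00",)],
)

# Fixed positions: characters 1-4 hold the year, 6-7 the month.
rows = conn.execute(
    """
    SELECT SUBSTR(created, 1, 4) AS year,
           SUBSTR(created, 6, 2) AS month
    FROM events
    ORDER BY created
    """
).fetchall()
print(rows)  # [('2015', '09'), ('2016', '01')]
```

The extracted pieces are still text; wrapping them in `CAST(... AS INTEGER)` would turn them into numbers.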

### INSTR() in SQLite

In SQLite, the `instr()` function finds the position of the first occurrence of a substring within a string. It returns the 1-based index of that occurrence, or 0 if the substring is not found. It is useful for the following use cases:
*  Locating the position of text within a column value, which is useful when processing or extracting parts of the data.
*  Data validation to check whether a certain substring exists in a column.
*  Simple text matching, such as identifying whether a string contains certain keywords or markers.

The function uses the following syntax and parameters:
```sql
INSTR(string, substring)
```

*  string: The input string in which you are looking for the substring.
*  substring: The substring whose position you want to find in the string.
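`INSTR` is often paired with `SUBSTR`: the position it returns tells `SUBSTR` where to cut. A minimal sketch that splits a hypothetical email address at the `@` sign, reusing one named parameter several times:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
email = "jane.doe@example.com"  # hypothetical value

row = conn.execute(
    """
    SELECT SUBSTR(:e, 1, INSTR(:e, '@') - 1) AS user_part,
           SUBSTR(:e, INSTR(:e, '@') + 1)    AS domain_part
    """,
    {"e": email},
).fetchone()
print(row)  # ('jane.doe', 'example.com')
```

If the delimiter is absent, `INSTR` returns 0 and these expressions no longer split cleanly, so real queries may need a `CASE WHEN INSTR(...) > 0` guard.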

Some basic examples showcasing usage.

1.  Basic Usage: Find the position of a substring

In [None]:
def instr1_example() ->str:
    query = '''
                SELECT instr('Hello, World!', 'World')
                '''
    return query

instr1_example = pd.read_sql(instr1_example(),conn_nyc)
display(instr1_example)

This returns `8` because the substring `World` starts at position 8 in the string `Hello, World!`

2. Substring not found: If a substring is not present, it returns 0

In [None]:
def instr2_example() ->str:
    query = '''
                SELECT instr('Hello, World!', 'SQLite')
                '''
    return query

instr2_example = pd.read_sql(instr2_example(),conn_nyc)
display(instr2_example)

The substring `SQLite` is not found in `Hello, World!`, so it returns `0`.

3. Case-sensitive search: The search is case-sensitive.

In [None]:
def instr3_example() ->str:
    query = '''
                SELECT instr('Hello, World!', 'hello')
                '''
    return query

instr3_example = pd.read_sql(instr3_example(),conn_nyc)
display(instr3_example)

Since `hello` is lowercase and the string `Hello, World!` has an uppercase `H`, the function returns `0`.
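A common workaround for the case sensitivity is to normalize both arguments with `LOWER` (or `UPPER`) before searching. A minimal sketch on an in-memory connection; note that SQLite's built-in `LOWER` folds only ASCII letters unless an extension such as ICU is loaded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# First column: case-sensitive search; second: both sides lowercased first.
row = conn.execute(
    """
    SELECT INSTR('Hello, World!', 'hello'),
           INSTR(LOWER('Hello, World!'), LOWER('hello'))
    """
).fetchone()
print(row)  # (0, 1)
```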

### REPLACE() in SQLite

In SQLite, the `replace()` function replaces all occurrences of a specified substring within a string with another substring. It is helpful when you need to make substitutions or clean up text data in your database. It is useful for the following use cases:
*  Data cleaning to remove or change certain characters or words, for example stripping unwanted characters from phone numbers or email addresses.
*  Text formatting, such as swapping delimiters (commas for semicolons) or correcting typos in strings.
*  Template substitution, such as filling placeholder values into text for emails or reports.

The function uses the following syntax and parameters:
```sql
REPLACE(string, search, replace_with)
```

*  string: The input string in which you want to perform the replacement.
*  search: The substring that you want to find in the string.
*  replace_with: The substring that will replace every occurrence of the search substring.
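Because `REPLACE` returns a string, calls can be nested to strip several characters in one expression, and combining it with `LENGTH` gives a common trick for counting occurrences of a substring. A sketch with a hypothetical phone number on an in-memory connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Nested REPLACE calls remove '(', ')', ' ' and '-' one after another.
phone = "(212) 555-0123"  # hypothetical value
clean = conn.execute(
    """
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(?, '(', ''), ')', ''), ' ', ''), '-', '')
    """,
    (phone,),
).fetchone()[0]

# Count occurrences: number of removed characters divided by the substring length.
count = conn.execute(
    "SELECT (LENGTH(:s) - LENGTH(REPLACE(:s, :sub, ''))) / LENGTH(:sub)",
    {"s": "abc abc abc", "sub": "abc"},
).fetchone()[0]

print(clean, count)  # 2125550123 3
```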

Some basic examples showcasing usage.

1.  Basic usage: Replace a substring with another substring.

In [None]:
def replace1_example() ->str:
    query = '''
                SELECT replace('Hello, World!', 'World', 'SQLite')
                '''
    return query

replace1_example = pd.read_sql(replace1_example(),conn_nyc)
display(replace1_example)

In this example, the substring `World` is replaced by `SQLite`, so the result is `Hello, SQLite!`.

2. Replacing multiple occurrences: If the substring appears more than once, all occurrences will be replaced.

In [None]:
def replace2_example() ->str:
    query = '''
                SELECT replace('abc abc abc', 'abc', '123')
                '''
    return query

replace2_example = pd.read_sql(replace2_example(),conn_nyc)
display(replace2_example)

Every occurrence of `abc` is replaced with `123`, so the result is `123 123 123`.

3. No match found: If the search substring is not found in the string, the original string is returned.

In [None]:
def replace3_example() ->str:
    query = '''
                SELECT replace('Hello, World!', 'abc', 'XYZ')
                '''
    return query

replace3_example = pd.read_sql(replace3_example(),conn_nyc)
display(replace3_example)

Since `abc` is not found, the string is returned as it is.

4. Case-sensitive replacement: The search is case-sensitive.

In [None]:
def replace4_example() ->str:
    query = '''
                SELECT replace('Hello, World!', 'world', 'SQLite')
                '''
    return query

replace4_example = pd.read_sql(replace4_example(),conn_nyc)
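display(replace4_example)

Since the search string `world` is lowercase and does not match `World`, no replacement occurs and the original string `Hello, World!` is returned.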

### TRIM() in SQLite

In SQLite, the `trim()` function removes specified characters from the beginning and end of a string. By default it removes only space characters (not tabs or newlines) from both ends, but you can also pass a set of characters to remove. It is useful for the following use cases:
*  Data cleaning to remove unwanted leading or trailing spaces from strings, especially when importing data.
*  Text formatting to clean up user input or other text fields to ensure consistent formatting before performing comparisons or storing the data in a database.
*  Removing unwanted characters by specifying any set of characters to be trimmed, not just spaces. For example, removing unwanted punctuation or padding from strings.

The function uses the following syntax and parameters:
```sql
TRIM(string, [trim_chars])
```

*  string: The string from which you want to remove the characters.
*  trim_chars (optional): A string listing the characters to remove. Each character is treated individually, and any combination of them is stripped from both ends of the string. If not provided, only spaces are removed.
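Two details of these parameters are easy to miss: `trim_chars` is a set of characters rather than a substring, and the default strips only spaces. This sketch builds tab and newline characters with SQLite's `char()` function on an in-memory connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    """
    SELECT TRIM('xyHelloyx', 'xy'),
           TRIM(char(9) || 'Hello' || char(10)),
           TRIM(char(9) || 'Hello' || char(10), char(9, 10, 13, 32))
    """
).fetchone()

# 1) any mix of 'x' and 'y' is stripped from both ends
# 2) the default leaves the tab and newline in place
# 3) listing tab, newline, carriage return and space removes them all
print(row)  # ('Hello', '\tHello\n', 'Hello')
```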

Some basic examples showcasing usage.

1.  Trim spaces (default behavior):

In [None]:
def trim1_example() ->str:
    query = '''
                SELECT trim('   Hello, World!   ')
                '''
    return query

trim1_example = pd.read_sql(trim1_example(),conn_nyc)
display(trim1_example)

In this example, the leading and trailing spaces are removed, leaving 'Hello, World!'.

2. Trim specific characters: Remove characters specified in the trim_chars argument.

In [None]:
def trim2_example() ->str:
    query = '''
                SELECT trim('000Hello, World!000','0')
                '''
    return query

trim2_example = pd.read_sql(trim2_example(),conn_nyc)
display(trim2_example)

Here, the function removes the leading and trailing '0's from the string.

3. No characters to remove: If there are no characters to remove, the original string is returned.

In [None]:
def trim3_example() ->str:
    query = '''
                SELECT trim('Hello, World!','a')
                '''
    return query

trim3_example = pd.read_sql(trim3_example(),conn_nyc)
display(trim3_example)

Since there are no leading or trailing 'a' characters, the string remains unchanged.

### CAST() in SQLite

In SQLite, the `CAST` expression converts a value from one data type to another. This is useful when you need to explicitly change the type of a value, such as converting a string to an integer or a floating-point number to a text representation. It is useful for the following use cases:
*  Data conversion from one type to another (e.g., from text to numeric).
*  Ensuring the correct type for calculations or comparisons, for example casting TEXT to INTEGER for numeric operations.
*  Cleaning data with inconsistent types (such as numeric values stored as text).

The function uses the following syntax and parameters:
```sql
CAST(expression AS target_type)
```

*  expression: The value (or column) you want to convert.
*  target_type: The type you want to convert the expression into. SQLite applies the affinity of this type name; the common choices are:
   *  INTEGER: Used for integer values.
   *  REAL: Used for floating-point values.
   *  TEXT: Used for text (string) values.
   *  BLOB: Used for binary data.
   *  NUMERIC: Produces an INTEGER or a REAL, whichever fits the value.

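SQLite's `typeof()` function reports the storage class a value ends up with, which is a quick way to confirm what each target type produces. A minimal sketch on an in-memory connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# typeof() reports the storage class that each CAST produces.
row = conn.execute(
    """
    SELECT typeof(CAST('123' AS INTEGER)),
           typeof(CAST('123.45' AS REAL)),
           typeof(CAST(123 AS TEXT)),
           typeof(CAST('123' AS NUMERIC)),
           typeof(CAST('123.5' AS NUMERIC))
    """
).fetchone()
print(row)  # ('integer', 'real', 'text', 'integer', 'real')
```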
Some basic examples showcasing usage.

1.  Converting a string to an integer:

In [None]:
def cast1_example() ->str:
    query = '''
                SELECT CAST('123' AS INTEGER)
                '''
    return query

cast1_example = pd.read_sql(cast1_example(),conn_nyc)
display(cast1_example)

This converts the string `'123'` into the integer `123`.

2. Converting a string to a real number:

In [None]:
def cast2_example() ->str:
    query = '''
                SELECT CAST('123.45' AS REAL)
                '''
    return query

cast2_example = pd.read_sql(cast2_example(),conn_nyc)
display(cast2_example)

The string `'123.45'` is successfully converted into a floating-point number `123.45`.

3. Converting a number to text:

In [None]:
def cast3_example() ->str:
    query = '''
                SELECT CAST(123 AS TEXT)
                '''
    return query

cast3_example = pd.read_sql(cast3_example(),conn_nyc)
display(cast3_example)

The integer `123` is converted into the string `'123'`.

4. Invalid conversion (returns 0):

In [None]:
def cast4_example() ->str:
    query = '''
                SELECT CAST('abc' AS INTEGER)
                '''
    return query

cast4_example = pd.read_sql(cast4_example(),conn_nyc)
display(cast4_example)

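Since `'abc'` has no leading digits that can be read as an integer, SQLite returns `0` rather than raising an error. SQLite converts the longest prefix that looks like an integer, so a value such as `'12abc'` becomes `12`.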
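Because a failed conversion silently produces `0` (or, for a value like `'12abc'`, just its leading digits), it can be worth flagging values that are not purely numeric before trusting a `CAST`. One possible check, sketched here with a hypothetical `raw` table, uses `GLOB` with the negated character class `[^0-9]` to detect any non-digit character:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of numeric-looking text values.
conn.execute("CREATE TABLE raw (val TEXT)")
conn.executemany("INSERT INTO raw VALUES (?)", [("42",), ("12abc",), ("abc",)])

# all_digits is 1 only for non-empty values made entirely of 0-9.
rows = conn.execute(
    """
    SELECT val,
           CAST(val AS INTEGER) AS as_int,
           val <> '' AND val NOT GLOB '*[^0-9]*' AS all_digits
    FROM raw
    ORDER BY rowid
    """
).fetchall()
print(rows)  # [('42', 42, 1), ('12abc', 12, 0), ('abc', 0, 0)]
```

This check only accepts unsigned whole numbers; signs, decimals, or surrounding spaces would need a more permissive pattern.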

### CONCAT() in SQLite

Older versions of SQLite do not support a `CONCAT()` function: `concat()` was only added in SQLite 3.44.0, and the SQLite library bundled with Python may be older. The portable way to concatenate strings in any version is the `||` operator. You can concatenate literal strings, columns, or a combination of both.

Syntax:
```sql
string1 || string2 || ... || stringN
```

Use-cases include:
*  Combining data such as creating full names, addresses, or any combined data from multiple columns
*  Formatting output into specific layouts e.g. combining first and last names
*  Dynamic data manipulation allowing you to modify and create dynamic content directly in your queries e.g. generating URLs or constructing dynamic reports.
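One behavior to keep in mind with column data: if any operand of `||` is NULL, the whole result is NULL. A sketch of two ways around this, using literal values on an in-memory connection: `IFNULL` works in every version, while `concat()`, which skips NULL arguments, is only attempted when the bundled SQLite is 3.44.0 or newer (checked through `sqlite3.sqlite_version_info`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# || propagates NULL; IFNULL substitutes a placeholder first.
row = conn.execute(
    """
    SELECT 'NYPD' || ':' || NULL,
           'NYPD' || ':' || IFNULL(NULL, 'UNKNOWN')
    """
).fetchone()
print(row)  # (None, 'NYPD:UNKNOWN')

# concat() ignores NULL arguments (SQLite 3.44.0+ only).
concat_result = None
if sqlite3.sqlite_version_info >= (3, 44, 0):
    concat_result = conn.execute("SELECT concat('NYPD', ':', NULL)").fetchone()[0]
print(concat_result)  # NYPD: on SQLite 3.44.0+, otherwise None
```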

1. Concatenating Two Strings

In [None]:
def concat1_example() ->str:
    query = '''
                SELECT Agency || ':' || City
                FROM data
                LIMIT 10
                '''
    return query

concat1_example = pd.read_sql(concat1_example(),conn_nyc)
display(concat1_example)

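Each row shows the agency and the city joined by a colon (`Agency:City`). If either column is NULL, the whole expression evaluates to NULL.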

2. Concatenating Strings with Literals

In [None]:
def concat2_example() ->str:
    query = '''
                SELECT Agency || ':' || City || ',NY'
                FROM data
                LIMIT 10
                '''
    return query

concat2_example = pd.read_sql(concat2_example(),conn_nyc)
display(concat2_example)

The query appends the literal suffix `,NY` to indicate New York State.