# String

Strings are sequences of characters (letters, digits, spaces, and symbols), and their type in Python is <code>str</code>. A string can be written using either single or double quotation marks.

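Individual characters in a string can be accessed by index. Positive indexes count from the start of the string, beginning at 0; negative indexes count from the end, beginning at -1.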
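A short sketch of positive and negative indexing. The variable names are only for illustration:

```python
word = "Python"

# Positive indexes count from the start, beginning at 0.
first = word[0]         # 'P'
third = word[2]         # 't'

# Negative indexes count from the end, beginning at -1.
last = word[-1]         # 'n'
second_last = word[-2]  # 'o'

print(first, third, last, second_last)  # P t n o

# An index outside the string raises an IndexError.
try:
    word[6]
except IndexError as err:
    print("IndexError:", err)
```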

## String Operations

### Changing Case
Use <code>.title()</code>, <code>.upper()</code>, and <code>.lower()</code> to convert a string to title case, upper case, or lower case.

In [1]:
name = "ada lovelace"
print(name.title())
print(name.upper())
print(name.lower())

Ada Lovelace
ADA LOVELACE
ada lovelace
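These methods return a new string rather than modifying the original. The sketch below shows that, plus one known quirk of <code>.title()</code> with apostrophes (the sample text is illustrative):

```python
name = "ada lovelace"

# The case methods return a new string; the original is left unchanged.
shouted = name.upper()
print(shouted)  # ADA LOVELACE
print(name)     # ada lovelace

# .title() capitalizes any letter that follows a non-letter,
# so apostrophes can produce surprising results.
quirky = "they're bill's friends".title()
print(quirky)  # They'Re Bill'S Friends
```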


### Slicing a String
Strings can be sliced using <code>string_name[start:end]</code>, where <code>start</code> (inclusive) and <code>end</code> (exclusive) are index positions in the original string.

In [2]:
name = "Michael Jackson"
name[0:4]

'Mich'

We can also add a step value using <code>string_name[start:end:step]</code> to select every *n*th character, where *n* is the step.

In [3]:
name[::2]

'McalJcsn'
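A few more slicing forms, sketched on the same string: omitted bounds fall back to the start or end of the string, negative values count from the end, and a negative step walks backwards.

```python
name = "Michael Jackson"

first_word = name[:7]   # start defaults to 0 -> 'Michael'
last_word = name[8:]    # end defaults to len(name) -> 'Jackson'
tail = name[-7:]        # negative start counts from the end -> 'Jackson'
backwards = name[::-1]  # a negative step walks backwards -> 'noskcaJ leahciM'
safe = name[0:100]      # an end past the string is clipped, not an error

print(first_word, last_word, tail, backwards, safe)
```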

### Manipulating a String
Strings are sequences, so many operations that work on lists and tuples (indexing, slicing, <code>len()</code>, and the <code>in</code> operator) also work on strings. Strings also have their own set of methods that work only on strings.
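A minimal sketch of sequence operations on a string, using an illustrative word. It also shows two differences from lists: `in` tests for a substring, and strings are immutable.

```python
word = "banana"

# Operations shared with lists and tuples
print(len(word))        # 6
print(word[1:4])        # ana
print(word.count("a"))  # 3
print(sorted(word))     # ['a', 'a', 'a', 'b', 'n', 'n']

# For strings, `in` checks for a substring, not only a single element.
print("nan" in word)    # True

# Unlike lists, strings are immutable: item assignment raises a TypeError.
try:
    word[0] = "B"
except TypeError as err:
    print("TypeError:", err)

# A string-only method
print(word.upper())     # BANANA
```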

Use <code>.upper()</code> and <code>.lower()</code> to change the characters in a string into upper or lower cases.

Use <code>.replace(old, new, count)</code> to replace a substring within a string. The optional <code>count</code> parameter sets the maximum number of occurrences to replace; if it is omitted, all occurrences are replaced. If <code>old</code> is not found, the string is returned unchanged.
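A short sketch of the three cases described above, using an illustrative sentence:

```python
text = "one fish, two fish, red fish"

all_replaced = text.replace("fish", "cat")   # every occurrence
first_only = text.replace("fish", "cat", 1)  # count=1: only the first occurrence
not_found = text.replace("bird", "cat")      # "bird" is absent, so nothing changes

print(all_replaced)       # one cat, two cat, red cat
print(first_only)         # one cat, two fish, red fish
print(not_found == text)  # True
```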

Use <code>.find(substring)</code> to search for a substring. It returns the index of the first occurrence of the substring in the original string, or <code>-1</code> if the substring is not found.

In [4]:
name.find('el')

5
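A sketch of the not-found case and of the optional start position that <code>.find()</code> accepts, along with the related <code>.index()</code> method, which raises an error instead of returning -1:

```python
name = "Michael Jackson"

print(name.find("el"))    # 5
print(name.find("xyz"))   # -1, because "xyz" does not occur
print(name.find("a", 5))  # 9, because the search starts at index 5

# .index() behaves like .find() but raises ValueError when nothing is found.
try:
    name.index("xyz")
except ValueError as err:
    print("ValueError:", err)
```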

## Escape sequences

A backslash (<code>\</code>) marks the beginning of an escape sequence. Escape sequences represent characters that are difficult to type directly in a string.
1. <code>\n</code> represents a new line.

2. <code>\t</code> represents a tab.
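A short sketch of escape sequences in use. It also shows the escaped backslash (<code>\\</code>), the escaped quote (<code>\'</code>), and raw strings (<code>r"..."</code>), which keep backslashes as typed and are commonly used for regular expressions. The sample strings are illustrative:

```python
two_lines = "Line one\nLine two"
tabbed = "Name:\tAda"
windows_path = "C:\\new_folder"  # \\ is an escaped, literal backslash
raw_path = r"C:\new_folder"      # raw string: backslashes are kept as typed
quote = 'It\'s here'             # \' puts a single quote inside single quotes

print(two_lines)
print(tabbed)
print(windows_path == raw_path)  # True
print(len("\n"))                 # 1: an escape sequence is a single character
print(quote)                     # It's here
```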

## RegEx

RegEx (short for regular expression) is a pattern language for matching and manipulating strings. Python provides a built-in module, <code>re</code>, for working with regular expressions.

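The <code>re.search(pattern, string)</code> function scans a string for the first location where the pattern matches. It returns a match object, or <code>None</code> if there is no match.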

In [5]:
import re

string1 = "Michael Jackson is the best"

pattern = r"Jackson" # Define the pattern to search for
result = re.search(pattern, string1)

if result:
    print("Match found!")
else:
    print("Match not found.")

Match found!
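The match object returned by <code>re.search()</code> also records what matched and where. A minimal sketch using the same string:

```python
import re

string1 = "Michael Jackson is the best"
result = re.search(r"Jackson", string1)

if result:
    print(result.group())  # Jackson
    print(result.span())   # (8, 15)

# When the pattern does not occur, re.search returns None.
print(re.search(r"Prince", string1))  # None
```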


RegEx provides several special sequences, each written with a backslash, that match specific classes of characters or specific positions in a string.

| Special Sequence | Meaning | Example |
| ---------------- | ------- | ------- |
|\d|Matches any digit character (0-9)|"123" matches "\d\d\d"|
|\D|Matches any non-digit character|"hello" matches "\D\D\D\D\D"|
|\w|Matches any word character (a-z, A-Z, 0-9, and _)|"hello_world" matches "\w\w\w\w\w\w\w\w\w\w\w"|
|\W|Matches any non-word character|"@#$%" matches "\W\W\W\W"|
|\s|Matches any whitespace character (space, tab, newline, etc.)|"hello world" matches "\w\w\w\w\w\s\w\w\w\w\w"|
|\S|Matches any non-whitespace character|"hello_world" matches "\S\S\S\S\S\S\S\S\S\S\S"|
|\b|Matches the boundary between a word character and a non-word character (or the start or end of the string)|"cat" matches "\bcat\b" in "The cat sat on the mat"|
|\B|Matches any position that is not a word boundary|"cat" matches "\Bcat\B" in "concatenate" but not in "The cat sat on the mat"|
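The table rows can be checked with a short sketch. <code>re.fullmatch()</code> succeeds only if the pattern matches the whole string; <code>\b</code> and <code>\B</code> match positions rather than characters, so <code>re.search()</code> is used for them. The <code>{n}</code> quantifier (n repetitions) is used here as shorthand for the repeated sequences in the table:

```python
import re

# (pattern, text) pairs that mirror the table rows
full_examples = [
    (r"\d\d\d", "123"),
    (r"\D{5}", "hello"),
    (r"\w{11}", "hello_world"),
    (r"\W{4}", "@#$%"),
    (r"\w{5}\s\w{5}", "hello world"),
    (r"\S{11}", "hello_world"),
]
for pattern, text in full_examples:
    print(pattern, repr(text), bool(re.fullmatch(pattern, text)))  # all True

print(bool(re.search(r"\bcat\b", "The cat sat on the mat")))  # True
print(bool(re.search(r"\Bcat\B", "concatenate")))             # True
print(bool(re.search(r"\Bcat\B", "The cat sat on the mat")))  # False
```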

The <code>re.findall(pattern, string)</code> function finds all non-overlapping occurrences of a pattern within a string and returns them as a list.

In [8]:
pattern = r"\W"
text = "Hello, world!"
matches = re.findall(pattern, text)
print("Matches:", matches)

Matches: [',', ' ', '!']
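A sketch that combines <code>re.findall()</code> with the <code>+</code> quantifier (one or more repetitions) to pull out whole numbers. The sample text is illustrative, and the matches come back as strings, not integers:

```python
import re

text = "Order 66 shipped 3 items on day 12"
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '3', '12']

# No match produces an empty list rather than None.
print(re.findall(r"\d+", "no digits here"))  # []
```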


The <code>re.split(pattern, string)</code> function splits a string into a list of substrings wherever the pattern matches.

In [10]:
string2 = "Michael Jackson was a singer and known as the 'King of Pop'"
split_array = re.split(r"\s", string2)

print(split_array)

['Michael', 'Jackson', 'was', 'a', 'singer', 'and', 'known', 'as', 'the', "'King", 'of', "Pop'"]
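When words are separated by more than one whitespace character, <code>\s</code> splits at every single character and leaves empty strings behind, whereas <code>\s+</code> treats each run of whitespace as one separator. A sketch with an illustrative string, also using the optional <code>maxsplit</code> argument:

```python
import re

messy = "Michael   Jackson\twas\na singer"

print(re.split(r"\s", messy))   # ['Michael', '', '', 'Jackson', 'was', 'a', 'singer']
print(re.split(r"\s+", messy))  # ['Michael', 'Jackson', 'was', 'a', 'singer']

# maxsplit limits the number of splits; the remainder stays in the last item.
print(re.split(r"\s+", messy, maxsplit=2))  # ['Michael', 'Jackson', 'was\na singer']
```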




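Use <code>re.sub(pattern, repl, string)</code> to replace every match of a pattern within a string with a replacement string. The optional <code>flags</code> argument changes how the pattern is matched; for example, <code>re.IGNORECASE</code> makes the match case-insensitive.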

In [11]:
pattern = r'King of Pop'
replacement = 'legend'

new_string = re.sub(pattern, replacement, string2, flags=re.IGNORECASE)
print(new_string)

Michael Jackson was a singer and known as the 'legend'
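A sketch comparing case-sensitive and case-insensitive substitution, plus the optional <code>count</code> argument, which limits how many matches are replaced. The sample text is illustrative:

```python
import re

text = "Cat, cat, CAT"

exact = re.sub(r"cat", "dog", text)                                # case-sensitive
every = re.sub(r"cat", "dog", text, flags=re.IGNORECASE)           # case-insensitive
first = re.sub(r"cat", "dog", text, count=1, flags=re.IGNORECASE)  # first match only

print(exact)  # Cat, dog, CAT
print(every)  # dog, dog, dog
print(first)  # dog, cat, CAT
```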
