# Strings

You've already seen strings in Python; we've been using them from the start. This notebook goes into a little more detail on strings: how they're created, how they're used, and how you can harness their full power.

## Creating Strings

We usually create strings using a string literal. A string literal encloses the text in single quotes (') or double quotes ("). You can use whichever you prefer.

In [None]:
str1 = 'hello'
str2 = "hello"

# Are they the same?
str1 == str2

The only practical difference between single and double quotes is which quote character needs **escaping**. If I open a string with a single quote, it ends as soon as Python finds the next single quote. This can be a pain if you need to include a single quote (an apostrophe) in the string.

In [None]:
# This won't work: the string ends at the apostrophe, so Python raises a SyntaxError
name = 'They're'

You can get around this by using a backslash to *escape* single quotes inside the string. The backslash tells the Python interpreter that the next single-quote (or double-quote) character is part of the string.

In [None]:
# This works, but it's ugly
name = 'They\'re'
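
The backslash itself isn't stored in the string. It only tells Python how to read the quote that follows it. Here is a quick check you could run in a new cell:

```python
# The escaped and unescaped versions produce exactly the same string
escaped = 'They\'re'
unescaped = "They're"

print(escaped == unescaped)  # True
print(len(escaped))          # 7: the backslash isn't counted
```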

This is ugly and error-prone. An easier method is to enclose the string in double quotes, which lets you use single quotes inside it with no problems.

In [None]:
# This works
name = "They're"

Similarly, if your string contains double quotes, you can use single quotes to enclose it.

In [None]:
str1 = '"Come with me", she said...'
str1
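
The backslash escape works for double quotes too. This means the same sentence can also be written inside double quotes by escaping the inner ones. The two literals below should produce identical strings:

```python
single = '"Come with me", she said...'
double = "\"Come with me\", she said..."

print(single == double)  # True
print(single[0])         # prints the double-quote character itself
```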

## Format Strings

You may have noticed throughout the notebooks that I've sometimes been putting the letter f just before a string. This is known as an f-string (f for format), and it allows you to add variables directly into the string.

In [None]:
# Using variables inside a string, the old-school way
name = "Lucas"
greeting = "Hello " + name
print(greeting)

# Note that we have to convert the number to a string below
answer = 42
line = "The answer to life, the universe and everything is " + str(answer)
print(line)

# The same line with an f-string: no str() conversion needed
line = f"The answer to life, the universe and everything is {answer}"
print(line)
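
The curly brackets in an f-string aren't limited to plain variable names. Python evaluates whatever expression you put inside them and converts the result to a string. Here is a short sketch using the same kind of values:

```python
name = "Lucas"
answer = 42

# Arithmetic, method calls and function calls all work inside the braces
doubled = f"Twice the answer is {answer * 2}"
shouted = f"Shouting: {name.upper()}"
counted = f"{name} has {len(name)} letters"

print(doubled)  # Twice the answer is 84
print(shouted)  # Shouting: LUCAS
print(counted)  # Lucas has 5 letters
```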

If we prefix a string with f, we can add a variable by putting its name in curly brackets. The Python interpreter takes care of converting the value to a string for us. If we need literal curly brackets inside an f-string, we escape them by doubling them: {{ and }}.

In [None]:
with_escape = f"F-Strings use {{variable_name}} to add variables directly to a string"
print(with_escape)
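
Escaped braces and real placeholders can be mixed in the same f-string. For instance, tripled braces give a literal brace on each side of a substituted value:

```python
answer = 42

# {{ and }} become literal braces; {answer} is still substituted
wrapped = f"{{{answer}}}"
print(wrapped)   # {42}

template = f"Write {{answer}} to get {answer}"
print(template)  # Write {answer} to get 42
```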

## String Access
A string is a sequence of single characters, so we can access individual characters by index, just as we would access the elements of a list. Each character we get back is itself a string of length 1.

In [None]:
greeting = 'Hello World'
print(f"The first letter of greeting is {greeting[0]}")
print(f"The second letter of greeting is {greeting[1]}")
print(f"The last letter of greeting is {greeting[-1]}")
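
Strings differ from lists in one important way: they are *immutable*. Indexing can read a character, but it can't change one. Trying to assign to an index raises a `TypeError`. The usual approach is to build a new string instead, here using the slice syntax described in the slicing section:

```python
greeting = 'Hello World'

assignment_failed = False
try:
    greeting[0] = 'J'  # strings don't support item assignment
except TypeError as err:
    assignment_failed = True
    print(f"Can't do that: {err}")

# Build a new string instead: a new first letter plus everything after index 0
new_greeting = 'J' + greeting[1:]
print(new_greeting)  # Jello World
print(greeting)      # Hello World: the original is unchanged
```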

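If we want to pull out several characters at once, we can use slicing. In `greeting[start:end]` the character at `start` is included and the one at `end` is not. Leaving out `start` means "from the beginning", and leaving out `end` means "to the end".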

In [None]:
greeting = 'Hello World'
print(greeting[1:3])   # el
print(greeting[7:10])  # orl
print(greeting[:5])    # Hello
print(greeting[6:])    # World
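
Slices can also use negative indices, which count from the end of the string, and an optional third number called the step. Here are a few more slices on the same string:

```python
greeting = 'Hello World'

last_five = greeting[-5:]            # the last five characters
every_second = greeting[::2]         # every second character
reversed_greeting = greeting[::-1]   # a step of -1 walks backwards

print(last_five)          # World
print(every_second)       # HloWrd
print(reversed_greeting)  # dlroW olleH
```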

We can also iterate through the characters of a string using a for loop.

In [None]:
for letter in greeting:
    print(f"*{letter}*")
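
A for loop gives us one character at a time, which makes it a natural fit for counting things in a string. The small sketch below counts the vowels in a short phrase. The vowel set is an assumption: it ignores 'y' and only checks lower-case letters, which is enough for this phrase.

```python
text = 'Hello World'
vowels = 'aeiou'

count = 0
for letter in text:
    # Count the letter if it appears in the vowel set
    if letter in vowels:
        count += 1

print(f"{text} has {count} vowels")  # Hello World has 3 vowels
```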

## Upper and Lower Case

We can easily convert strings to upper or lower case using the string **upper()** and **lower()** methods. This is often useful when checking whether two strings match, because by default Python treats strings with different capitalisation as different. If we want a **case-insensitive** comparison, we can convert both strings to the same case first.

In [None]:
str1 = 'hello'
str2 = 'HELLO'

print(f"{str1} and {str2} equal? {str1 == str2}")

print(f"use lower or upper if you don't care about the capitalization... Match? {str1.lower() == str2.lower()}")
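
If you compare strings case-insensitively in several places, it can help to wrap the idea in a small function. The helper name below is just illustrative. It uses Python's **casefold()** method, a more aggressive version of **lower()** intended for caseless matching. Unlike **lower()**, it also handles characters such as the German 'ß':

```python
def same_ignoring_case(a, b):
    """Return True if a and b match when capitalisation is ignored."""
    return a.casefold() == b.casefold()

print(same_ignoring_case('hello', 'HELLO'))     # True
print('Straße'.lower() == 'STRASSE'.lower())    # False: lower() keeps the ß
print(same_ignoring_case('Straße', 'STRASSE'))  # True: casefold() turns ß into ss
```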

## Finding Substrings
A substring is any sequence of characters within another string; for example, 'Hello' is a substring of 'Hello World'. We can use the string **find()** method to locate a substring within a string. **find()** returns the index where the *first occurrence* of the substring begins. If the substring isn't found, it returns -1.

In [None]:
greeting = 'Hello World'
print(f"The index of H is {greeting.find('H')}")
print(f"The index of l is {greeting.find('l')}")
print(f"The index of x is {greeting.find('x')}")
print(f"The index of World is {greeting.find('World')}")
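
**find()** also takes an optional start index, which lets us carry on searching after a match. The sketch below uses this to collect every position of a substring (the function name is illustrative). Notice the explicit check against -1. Since -1 is a valid index in Python, using the result of **find()** without checking it is a common source of bugs.

```python
def find_all(text, sub):
    """Return a list of every index where sub starts in text."""
    positions = []
    index = text.find(sub)
    while index != -1:
        positions.append(index)
        # Resume the search one character after the last match
        index = text.find(sub, index + 1)
    return positions

greeting = 'Hello World'
print(find_all(greeting, 'l'))  # [2, 3, 9]
print(find_all(greeting, 'x'))  # []
```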

If we just want to check whether a substring is present, we can use the **in** operator. It returns True or False to indicate whether the substring was found.

In [None]:
print(f"Hello is in the string? {'Hello' in greeting}")
print(f"Bonjour is in the string? {'Bonjour' in greeting}")
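
Like `==`, the **in** operator is case-sensitive. Combining it with **lower()** gives a case-insensitive check:

```python
greeting = 'Hello World'

exact = 'hello' in greeting                  # the capital H doesn't match
relaxed = 'hello' in greeting.lower()        # both sides are lower case now
shouted = 'WORLD'.lower() in greeting.lower()

print(exact)    # False
print(relaxed)  # True
print(shouted)  # True
```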


## Replacing Substrings
The **replace()** method finds every occurrence of a substring and replaces it with something else. It returns a new string and leaves the original string unchanged.

In [None]:
greeting.replace('Hello', 'Goodbye')

If the text to replace isn't found, nothing is replaced and the string comes back unchanged. It won't raise an error.

In [None]:
greeting.replace('Bonjour', "Au Revoir")
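
Two details about **replace()** are worth knowing. First, because strings are immutable, it returns a *new* string and the original stays the same, so you need to assign the result to keep it. Second, it replaces every occurrence unless you pass an optional third argument that limits the count:

```python
greeting = 'Hello World'

shouty = greeting.replace('l', 'L')
print(shouty)      # HeLLo WorLd: every 'l' replaced
print(greeting)    # Hello World: the original hasn't changed

first_only = greeting.replace('l', 'L', 1)
print(first_only)  # HeLlo World: only the first 'l' replaced

# To keep the result, assign it (here back to the same name)
greeting = greeting.replace('Hello', 'Goodbye')
print(greeting)    # Goodbye World
```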

## Splitting Strings
CSV files use commas to separate values. They're a simple way to store and load data, so they're quite common in data science. Python lets us split a string into a list of strings using the **split()** method, which is particularly handy for reading simple CSV lines.

In [None]:
line = "Lucas,D1234567,TU123"
data = line.split(',')

print(f"Name is {data[0]}")
print(f"Student Number is {data[1]}")
print(f"Course is {data[2]}")
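
Splitting on commas works for simple lines like this one, but it breaks down when a value itself contains a comma. Real CSV files handle that case by putting such values in quotes, and the **csv** module in Python's standard library understands this quoting. The sketch below compares the two approaches on a made-up record:

```python
import csv

# A hypothetical record whose middle field contains a comma
line = 'Lucas,"Dublin, Ireland",TU123'

naive = line.split(',')
print(naive)   # ['Lucas', '"Dublin', ' Ireland"', 'TU123']: four pieces, not three

# csv.reader accepts any iterable of lines, so a one-item list works here
parsed = next(csv.reader([line]))
print(parsed)  # ['Lucas', 'Dublin, Ireland', 'TU123']
```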

We can also condense the student-record example by *unpacking* the list returned by line.split(',') straight into separate variables.

In [None]:
name, student_no, course = line.split(',')
print(f"Name is {name}")
print(f"Student Number is {student_no}")
print(f"Course is {course}")
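
Unpacking only works when the number of variables matches the number of pieces. Otherwise, Python raises a `ValueError`. If you only want to split at the first few separators, **split()** takes an optional `maxsplit` argument:

```python
line = "Lucas,D1234567,TU123"

unpack_failed = False
try:
    name, student_no = line.split(',')  # three pieces, but only two names
except ValueError as err:
    unpack_failed = True
    print(f"Unpacking failed: {err}")

# maxsplit=1: split at the first comma only, keeping the rest together
name, rest = line.split(',', maxsplit=1)
print(name)  # Lucas
print(rest)  # D1234567,TU123
```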

Another common use of the **split()** method is to break a string into a list of words. Words are usually separated by spaces, so splitting on a space gives us each word as a separate list item.

In [None]:
text = "It was the best of times, it was the blurst of times"
words = text.split(' ')
words
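
Splitting on a single space has a couple of quirks. First, punctuation stays attached to words, so splitting the sentence above gives 'times,' rather than 'times'. Second, two spaces in a row produce an empty string. Calling **split()** with no argument instead splits on any run of whitespace (spaces, tabs or newlines) and ignores leading and trailing whitespace:

```python
# Extra spaces and a tab, added to show the difference
text = "  It was  the best\tof times  "

by_space = text.split(' ')
print(by_space)       # contains empty strings, and 'best\tof' stays as one item

by_whitespace = text.split()
print(by_whitespace)  # ['It', 'was', 'the', 'best', 'of', 'times']
```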

## Joining Strings

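Joining strings is the opposite of splitting them. If we have a list of strings, we can combine them into a single string using **join()**. We call **join()** on the separator we want placed between the items. The separator can be a single character or a longer string, as the second example below shows.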

In [None]:
grades = ['62', '70', '64', '68']
print('|'.join(grades))
print(','.join(grades))
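
**join()** only accepts strings, which is one reason the grades above are written as '62' rather than 62. If you have a list of numbers, you need to convert each one first, for example with a list comprehension:

```python
grades = [62, 70, 64, 68]

join_failed = False
try:
    ','.join(grades)  # fails: the items are ints, not strings
except TypeError as err:
    join_failed = True
    print(f"join failed: {err}")

# Convert each number to a string first
joined = ','.join([str(grade) for grade in grades])
print(joined)  # 62,70,64,68
```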

In [None]:
verbs = ['get', 'stand', 'stand up for your rights']
bob = ' up, '.join(verbs)
print(bob)
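
Because **join()** reverses **split()**, splitting a line and joining it again with the same separator gives back the original text. This makes a handy pattern for editing one field of a simple comma-separated line. The replacement course code below is made up:

```python
line = "Lucas,D1234567,TU123"

fields = line.split(',')
print(','.join(fields) == line)  # True: a round trip gives back the same line

# Change one field, then join the pieces back together
fields[2] = 'TU999'
updated = ','.join(fields)
print(updated)  # Lucas,D1234567,TU999
```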