Notebook created by Dragos Gruia and Valentina Giunchiglia

# Introduction to String Manipulation

In this section, we are going to cover different operations using strings. You should already be familiar with the concept of strings from previous lectures, but here we are going to discuss some more detailed applications.

As shown in previous tutorials, a string is a sequence of characters (letters, digits, spaces and symbols) enclosed in quotation marks. Strings can be printed with the `print` function.

In [None]:
my_string = "I can hold characters and numbers 123"
print(my_string)

However, this notation does not always work. What if we wanted our string to contain quotation marks? Or what if we wanted it to be printed across multiple lines? Python handles these special cases with escape characters. The most commonly used escape character is the backslash (`\`): to use it, write the backslash followed by the special character that you want to include in the string.

In [None]:
my_string_backslash = "\"This is a quote\""
print(my_string_backslash)
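Escaping is not the only option. As a small sketch (the variable names are illustrative), Python also accepts single quotes as string delimiters, so a string enclosed in one kind of quote can contain the other kind without backslashes. A literal backslash must itself be escaped as `\\`.

```python
# Single quotes around the string let double quotes appear inside it unescaped
quote_single = '"This is a quote"'
# Double quotes around the string let an apostrophe appear inside it unescaped
apostrophe = "It's a quote"
# Escaping the apostrophe inside a single-quoted string gives the same result
apostrophe_escaped = 'It\'s a quote'
# A literal backslash is written as two backslashes
windows_path = "C:\\Users\\Taylor"

print(quote_single)
print(apostrophe == apostrophe_escaped)
print(windows_path)
```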

Similarly, you can use a backslash to tell Python to start a new line, with the newline escape sequence `\n`.

In [None]:
print("This is a line\nThis is another line")
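Another way to write text across multiple lines is a triple-quoted string. The sketch below (variable names are illustrative) shows that the line break typed inside triple quotes is stored as the same `\n` character, and also uses `\t`, another common escape sequence that inserts a tab.

```python
# A triple-quoted string can span several lines in the source code;
# each line break is stored in the string as a "\n" character
multi_line = """This is a line
This is another line"""

escaped = "This is a line\nThis is another line"

print(multi_line)
print(multi_line == escaped)      # both strings hold exactly the same characters
print(multi_line.splitlines())    # split the text back into a list of lines

# "\t" inserts a tab character
print("Name:\tTaylor")
```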

We can also format strings to contain information from other variables via the `format()` method. 

In [None]:
age_mark = 26
age_steven = 22

print("Mark just turned {} while Steven is only {}".format(age_mark,age_steven))

In this example, each pair of empty curly brackets is a placeholder: the arguments passed to `format()` are inserted into the placeholders in order.
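The placeholders can also hold more information. As a short sketch using the same ages, a number inside the braces picks an argument by position, a name picks a keyword argument, and text after a colon is a format specification (here `.2f`, two decimal places).

```python
age_mark = 26
age_steven = 22

# Numbers inside the braces refer to the position of the arguments in format()
positional = "{1} is younger than {0}".format(age_mark, age_steven)

# Names inside the braces refer to keyword arguments passed to format()
named = "Mark is {mark} and Steven is {steven}".format(mark=age_mark, steven=age_steven)

# After a colon you can give a format specification, e.g. two decimal places
average = (age_mark + age_steven) / 2
formatted_average = "Their average age is {:.2f}".format(average)

print(positional)
print(named)
print(formatted_average)
```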

An alternative is the f-string (formatted string literal): write `f` immediately before the opening quotation mark, and put the name of the variable you want to insert directly inside the curly brackets.

In [None]:
print(f"Mark just turned {age_mark} while Steven is only {age_steven}")
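F-strings (available since Python 3.6) are not limited to variable names. In this sketch, the braces contain a full expression, and the same colon-based format specification used with `format()` controls the number of decimal places.

```python
age_mark = 26
age_steven = 22

# Any Python expression can go inside the braces of an f-string
difference = f"Mark is {age_mark - age_steven} years older than Steven"

# Format specifications work as with format(): here, one decimal place
average = f"Their average age is {(age_mark + age_steven) / 2:.1f}"

print(difference)
print(average)
```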

## Indexing and Iterating Through Strings 

The process of indexing a string is very similar to that of an array or a list.

In [None]:
my_string = "Taylor Swift"

print(my_string[0])   # First character in the string
print(my_string[-1])  # Last character in the string
print(my_string[:3])  # First three characters: indices 0, 1 and 2 (the slice stops before index 3)
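Slices accept a start, a stop and a step, just like lists. The sketch below shows a few common patterns on the same string, and also that strings are immutable: you can read a character by index, but assigning to an index raises a `TypeError`.

```python
my_string = "Taylor Swift"

print(len(my_string))     # number of characters, including the space
print(my_string[7:])      # from index 7 to the end
print(my_string[-3:])     # the last three characters
print(my_string[::2])     # every second character, starting at index 0
print(my_string[::-1])    # a step of -1 walks backwards, reversing the string

# Strings are immutable: indexing can read a character but not change it
try:
    my_string[0] = "t"
except TypeError as error:
    print("TypeError:", error)
```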

The `in` operator is particularly useful for strings. One way to use it is to check whether a character or a sequence of characters (a substring) is present in a string; the result is either `True` or `False`.

In [None]:
print("Taylor" in my_string) # Prints True as "Taylor" is found within the string
print("Jake Gyllenhaal" in my_string) # Prints False as "Jake Gyllenhaal" is not found within the string
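Keep in mind that `in` is case-sensitive. A short sketch: lowercasing the string first (with `lower()`, covered in the String Methods section) gives a case-insensitive check, and `not in` tests that a substring is absent.

```python
my_string = "Taylor Swift"

# "in" is case-sensitive, so the lowercase version is not found
print("taylor" in my_string)            # False
# Lowercasing the string first gives a case-insensitive check
print("taylor" in my_string.lower())    # True
# "not in" checks that a substring is absent
print("Jake" not in my_string)          # True
```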

Similarly, you can use a `for ... in` loop to iterate through all the characters in a string, one at a time.

In [None]:
for character in my_string:
    print(character)
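If you also need to know where each character sits, the built-in `enumerate()` yields (index, character) pairs. As an illustrative sketch, the loop below records the position of every uppercase letter using the `isupper()` string method.

```python
my_string = "Taylor Swift"

uppercase_positions = []
# enumerate() gives the index of each character alongside the character itself
for index, character in enumerate(my_string):
    if character.isupper():
        uppercase_positions.append((index, character))

print(uppercase_positions)   # [(0, 'T'), (7, 'S')]
```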

We can also use a `for ... in` loop to go through a list of strings one by one and analyse the contents of each string.

In [None]:
songwriters = ["Charlie Puth", "Ariana Grande", "Hozier", "Taylor Swift"]

for string in songwriters:
    if string == "Taylor Swift":
        print("I found Taylor")
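For a plain equality check like this one, the loop is optional: `in` also tests whether a list contains an element. Note the difference from strings in the sketch below: on a list, `in` compares whole elements, not substrings.

```python
songwriters = ["Charlie Puth", "Ariana Grande", "Hozier", "Taylor Swift"]

# "in" on a list checks whether any element is equal to the value
if "Taylor Swift" in songwriters:
    print("I found Taylor")

# No element is exactly "Taylor", so this is False
print("Taylor" in songwriters)
```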

Additionally, you can add another loop and check specific characters in each string. We call these nested loops, as they are loops within loops.

In [None]:
for string in songwriters:
    for character in string:
        if character == "t":
            print(string)
        

The previous code goes through each string in the list, then goes through each character in that string and compares it with the lowercase letter `t`. Every time a character matches, the whole string is printed, so a string is printed once for each occurrence of the letter. The comparison is case-sensitive, so an uppercase `T` does not count. Can you guess what would be printed if instead we were checking for the letter `a`?
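If you only want each matching string once, or want to know how many times a letter occurs, the inner loop can be replaced. In this sketch, `in` checks for the letter a single time per string, and the `count()` string method returns the number of (case-sensitive) occurrences.

```python
songwriters = ["Charlie Puth", "Ariana Grande", "Hozier", "Taylor Swift"]

strings_with_t = []
for string in songwriters:
    # "in" checks the whole string at once, so each match is recorded only once
    if "t" in string:
        strings_with_t.append(string)
print(strings_with_t)

# count() returns how many times the substring occurs in each string
for string in songwriters:
    print(string, string.count("t"))
```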

Finally, two other useful string methods for identifying strings are `startswith()`, which checks whether a string begins with a given substring, and `endswith()`, which checks whether it ends with one. Both return `True` or `False`.

In [None]:
songwriters = ["Charlie Puth", "Ariana Grande", "Hozier", "Taylor Swift"]

for string in songwriters:
    if string.startswith("Tay"):
        print("I found Taylor")
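The companion method `endswith()` works the same way at the other end of the string. Both methods also accept a tuple of options and return `True` if any of them matches, as this short sketch shows.

```python
songwriters = ["Charlie Puth", "Ariana Grande", "Hozier", "Taylor Swift"]

ends_with_e = []
for string in songwriters:
    # endswith() checks the end of the string
    if string.endswith("e"):
        ends_with_e.append(string)
print(ends_with_e)   # ['Ariana Grande']

# A tuple of prefixes: True if the string starts with any of them
print("Hozier".startswith(("Ho", "Ta")))         # True
print("Charlie Puth".startswith(("Ho", "Ta")))   # False
```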

## String Methods

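Strings come with built-in functions, known as string methods, which we invoke with dot notation (for example `my_string.lower()`). The key difference between string methods and general functions is that string methods can only be applied to strings, while functions such as `len()` are, as a rule of thumb, more general and can be applied to multiple data types like lists, arrays, etc. Because strings cannot be modified in place, string methods return a new string and leave the original unchanged.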

Below we show the most common string methods. The first are `lower()` and `upper()`, which convert all letters in a string to lowercase and to uppercase, respectively.

In [None]:
print(my_string.lower())
print(my_string.upper())
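A common use of `lower()` is comparing text while ignoring differences in case. The sketch below (the `user_input` value is made up for illustration) also confirms that the method returns a new string and leaves the original variable unchanged.

```python
my_string = "Taylor Swift"

# lower() returns a new string; my_string itself is not modified
lowered = my_string.lower()
print(lowered)      # taylor swift
print(my_string)    # Taylor Swift

# Case-insensitive comparison: lowercase both sides before comparing
user_input = "TAYLOR swift"
print(user_input.lower() == my_string.lower())   # True
```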

Another string method is `title()`, which makes the first letter of each word uppercase and the remaining letters lowercase.

In [None]:
print(my_string.title())
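Since `my_string` is already in title case, the output above looks unchanged. The sketch below uses a messier input to show what `title()` does, contrasts it with the related `capitalize()` method (which only capitalizes the first character of the whole string), and shows a known quirk: `title()` treats an apostrophe as a word boundary.

```python
messy_case = "tAYLOR sWIFT"

# title(): first letter of each word uppercase, the rest lowercase
print(messy_case.title())         # Taylor Swift
# capitalize(): first character of the whole string uppercase, the rest lowercase
print(messy_case.capitalize())    # Taylor swift

# title() starts a new "word" after any non-letter, including apostrophes
print("taylor's album".title())   # Taylor'S Album
```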

We can also split a string into a list of strings with the `split()` method. By default, the method splits the string wherever it finds whitespace, but you can also specify your own separator. For example, we can ask Python to split our string wherever it finds the letter 'a'; the separator itself is not included in the result.

In [None]:
print(my_string.split())
print(my_string.split('a'))
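The default behaviour and an explicit `" "` separator are not the same when words are separated by several spaces. This sketch (with an illustrative padded string) shows the difference, plus the `maxsplit` argument that limits how many splits are made.

```python
spaced = "Taylor   Swift"   # three spaces between the words

# With no argument, a run of whitespace counts as a single separator
print(spaced.split())          # ['Taylor', 'Swift']
# With " ", every single space is a separator, so empty strings appear
print(spaced.split(" "))       # ['Taylor', '', '', 'Swift']

# maxsplit limits the number of splits performed
print("a,b,c,d".split(",", maxsplit=1))   # ['a', 'b,c,d']
```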

But what if we change our mind after splitting the string and want to bring the pieces back together? We can use the `join()` method to combine several strings into one.

In [None]:
my_string_separated = my_string.split()
print(my_string_separated)
my_string_reunited = " ".join(my_string_separated)
print(my_string_reunited)

Notice that we called `join()` on a string containing a single space (`" "`) and passed the list as its argument, rather than calling it on the variable the way we have done so far (e.g. `my_string_separated.join()`). This is because `join()` is a string method, and string methods only work on strings. `my_string_separated` is a list, and lists have no `join()` method, so calling `my_string_separated.join()` would raise an `AttributeError`.
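To see the difference for yourself, this short sketch tries both orders on an illustrative list: calling `join()` on the separator string works, while calling it on the list raises an `AttributeError`, which the `try`/`except` block catches and prints.

```python
my_string_separated = ["Taylor", "Swift"]

# Correct: join() is called on the separator string, with the list as its argument
print(" ".join(my_string_separated))   # Taylor Swift

# Incorrect: lists have no join() method
try:
    my_string_separated.join(" ")
except AttributeError as error:
    print("AttributeError:", error)
```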


The string on which `join()` is called acts as the separator, and it can be any string. In this case we used a single space, but you can decide how the strings should be separated (or whether they should be separated at all).


In [None]:
my_string_reunited = "-".join(my_string_separated) #using dash as separation
print(my_string_reunited)
my_string_reunited = "".join(my_string_separated) #not using any separation
print(my_string_reunited)
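One limitation to be aware of: `join()` only accepts strings, so a list of numbers raises a `TypeError`. A minimal sketch of the usual fix, converting each element with `str()` first (here via a list comprehension):

```python
numbers = [1, 2, 3]

# join() fails if any element is not a string
try:
    print("-".join(numbers))
except TypeError as error:
    print("TypeError:", error)

# Convert each number to a string first, then join
numbers_text = "-".join([str(number) for number in numbers])
print(numbers_text)   # 1-2-3
```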

## Cleaning a messy string

Sometimes the data we work with is very messy, so before we can analyse it, we need to process it a little bit. In this section, we will use several Python methods to clean and format a string.

In [None]:
my_messy_string = "    TheRE are Many FAMous ?Artists     "

As you can see, there are many issues with the above string. First, it contains a lot of whitespace at the start and at the end. This is redundant and we want to get rid of it. Luckily, Python has a method called `strip()` which, by default, removes all whitespace characters from the beginning and end of a string. If we want a different set of characters to be removed, we can pass them as an argument between the parentheses (e.g. `strip("?")`).

In [None]:
my_messy_string = my_messy_string.strip()
print(my_messy_string)
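`strip()` has two one-sided relatives, `lstrip()` and `rstrip()`. Note also that the argument to `strip()` is treated as a set of characters, not as a word: any mix of those characters is removed from both ends until a character outside the set is reached. The sketch below uses `repr()` so that the remaining whitespace is visible in the output.

```python
padded = "   Taylor Swift   "

print(repr(padded.lstrip()))   # 'Taylor Swift   '  (left side only)
print(repr(padded.rstrip()))   # '   Taylor Swift'  (right side only)

# The argument is a set of characters, removed in any order from both ends
print("?!Taylor!?".strip("?!"))      # Taylor
print("abcTaylorcba".strip("abc"))   # Taylor
```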

Next, we might want to get rid of the question mark, as it does not fit the overall message. Python has a method called `find()` which searches for the first occurrence of a given substring and returns the index at which it starts. If the substring is not found, `find()` returns -1 instead.

In [None]:
index_questionmark = my_messy_string.find("?")
print(index_questionmark)
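`find()` has a few useful variations, shown here on an illustrative word: an optional start index, `rfind()` to search from the right, and `index()`, which behaves like `find()` but raises a `ValueError` when the substring is missing instead of returning -1.

```python
word = "banana"

print(word.find("a"))       # 1: index of the first "a"
print(word.find("a", 2))    # 3: start searching from index 2
print(word.rfind("a"))      # 5: index of the last "a"
print(word.find("z"))       # -1: not found

# index() raises an error instead of returning -1
try:
    word.index("z")
except ValueError as error:
    print("ValueError:", error)
```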

Now that we know the index of the question mark, we can remove it from the string with the `replace()` method, which returns a copy of the string in which every occurrence of a given substring is replaced with another substring. In our case, we replace the question mark with an empty string, which removes it.

In [None]:
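if index_questionmark != -1:  # find() returns -1 if "?" is missing, and [-1] would select the last character
    my_messy_string = my_messy_string.replace(my_messy_string[index_questionmark], "")
print(my_messy_string)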
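Two details of `replace()` are worth knowing: it replaces every occurrence by default, and an optional third argument limits how many occurrences are replaced. Also, when you already know the character to remove, you can pass it directly without looking up its index first. A short sketch:

```python
word = "banana"

print(word.replace("a", "o"))      # bonono: every "a" is replaced
print(word.replace("a", "o", 1))   # bonana: only the first "a" is replaced

# Removing a known character directly, without find()
print("Many FAMous ?Artists".replace("?", ""))   # Many FAMous Artists
```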

-------------

### Code here

Now, how would we make our string lowercase, and then make only the first letter of each word uppercase? Additionally, which method would we use to split our string into a list of strings?

In [None]:
# CODE HERE



