# Week 5 Lecture - Strings

* What is a String
* String Slicing
* String Comparison
* String Methods

## What is a string?

You already know about strings: they are a basic Python data type!

In [None]:
my_string = "I am a string"
type(my_string)

However, strings are actually a more complicated data structure with a bunch of useful features and functions. 

Strings are a *sequence* of characters. Very similar to a *list*.

In [None]:
# loop over a string as a sequence of characters
for character in my_string:
    print(character)

Because strings are sequences, that means you can use the indexing syntax to access parts of a string.
* Remember to use the `variable_name[x]` notation to get the item at index position `x` from the list or string stored in `variable_name`

In [None]:
# give me the character at index position 5
my_string[5]
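To see which character sits at which index position, here is a small sketch that assumes the same `my_string` value as above. It uses the built-in `enumerate()` function, which is not introduced in this lecture, to pair each character with its index.

```python
# Assumes the same example string used earlier in this lecture
my_string = "I am a string"

# enumerate() pairs each character with its index position, starting at 0
for index, character in enumerate(my_string):
    # repr() puts quotes around the character so spaces are visible
    print(index, repr(character))

# Index position 5 holds the "a" that comes right after "I am "
fifth = my_string[5]
print(fifth)
```

Notice that spaces count as characters and get their own index positions.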

Remember, indexing starts at 0 instead of 1, so it is easy to end up "out of range" when you aren't careful with your counting. Ugh.

In [None]:
# get the length of the string using the len() function
string_length = len(my_string)
print(string_length) # Should print 13 because the string has 13 characters
# grab the last item by using string_length as the index
my_string[string_length]

Ugh. 

In [None]:
# try it again with some math
# get the length of the string using the len() function
string_length = len(my_string)
print(string_length) # Should print 13 because the string has 13 characters
# grab the last item by using string_length-1 as the index
my_string[string_length - 1] 

This works because the last item is at index position 12 (one less than the length of 13). Fortunately, Python has an easier way of grabbing the last item. Remember, we used it with lists!

In [None]:
my_string[-1]
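As a rough sketch of how negative indexes line up with positive ones, and of what the "out of range" problem looks like when you catch it, assuming the same `my_string` as above:

```python
# Assumes the same example string used earlier in this lecture
my_string = "I am a string"

# -1 is the last character, -2 is the second to last, and so on
last = my_string[-1]
second_to_last = my_string[-2]

# A negative index is shorthand for counting back from the length
same_as_last = my_string[len(my_string) - 1]

# Asking for the index equal to the length is one past the end
try:
    my_string[len(my_string)]
    out_of_range = False
except IndexError as error:
    out_of_range = True
    print("IndexError:", error)
```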

## Slicing Strings
![No Strings Attached](https://media.giphy.com/media/jnJLHYI447XkhFSMLi/giphy.gif)



You can also use the same slicing notation we used on lists with strings. Very handy for quickly cleaning or subsetting strings.

In [None]:
# Create a new string
city_name = "Pittsburgh"
print(city_name[2:5])
print(city_name[-5:]) # slice starts 5 characters from the end and continues to the end
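Here is a short sketch of the `[start:stop]` slicing rules on a different string. The variable `word` is made up for this illustration. The start position is included, the stop position is not, and leaving either one out means "from the beginning" or "to the end".

```python
# A hypothetical example string, just for illustration
word = "Pennsylvania"

beginning = word[:4]     # index positions 0, 1, 2, 3
rest = word[4:]          # index position 4 through the end
ending = word[-3:]       # the last three characters

# Unlike single indexes, slices that run past the end do not raise an error
generous = word[8:100]

print(beginning, rest, ending, generous)

# The two halves of a split at the same position always rebuild the original
rebuilt = word[:4] + word[4:]
```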

In [None]:
# print the fight song, but what is the index position?
print("Hail to " + city_name[0:???])

## String Comparison

We have compared strings with the equals operator (`==`) to see if two strings are the same. You can also perform string comparisons with the other comparison operators (`<`, `>`). With less than or greater than, Python compares the two strings character by character, from left to right, and decides based on the first pair of characters that differ.

In [None]:
# is the string "one" less than the string "two"
"One" < "Two"

In [None]:
# is one less than this other, rather large, number?
"One" < "Fourteen million trillion gazillion"

Wut? Why isn't "One" less than that other big number? Because Python is comparing characters, not quantities, and "O" is greater than "F".
* The `ord()` function returns the Unicode number for a one-character string.

In [None]:
print("An O corresponds to the number:", ord("O"))
print("An F corresponds to the number:", ord("F"))

Why do those characters correspond to those numbers? Because of Unicode.
![Unicode Unicorn](https://cdn.dribbble.com/users/1318269/screenshots/11898064/media/7377c67c5dd852bf22d16630099d5794.png)
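To make the character-by-character comparison concrete, here is a minimal sketch. The variable names are made up for this illustration; `ord()` is the built-in function described above.

```python
first = "One"
second = "Fourteen million trillion gazillion"

# The comparison is decided at the first position where the characters differ
print(ord(first[0]), ord(second[0]))   # Unicode numbers for "O" and "F"
one_is_less = first < second

# Uppercase letters have smaller Unicode numbers than lowercase letters
zebra_before_apple = "Zebra" < "apple"

# When one string is the start of the other, the shorter one is "less"
prefix_is_less = "Pitt" < "Pittsburgh"

print(one_is_less, zebra_before_apple, prefix_is_less)
```

This is why sorting a list of mixed-case strings can give surprising results.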

Python provides an `in` operator to check whether a value is present in a sequence. This means it works for both lists and strings.

In [None]:
# make a list of cheeses
cheeses = ['Cheddar', 'Edam', 'Gouda', 'Brie', "Pepper Jack"]
# is the string "Brie" in my list of cheeses
"Brie" in cheeses # yum

Remember, computers are very precise and particular

In [None]:
# is "cheddar" in cheeses
"cheddar" in cheeses

Why did we get that answer?

In [None]:
# Is the string inside another string
"Ched" in "Cheddar"

In [None]:
# is the item inside the list
"Ched" in cheeses

The `in` operator works very similarly on lists and strings. But it is important to understand the conceptual difference.
* with a list - is this value present as an item in the list
* with a string - is this sequence of characters present in this other string
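The difference between the two kinds of `in` checks can be sketched side by side. This assumes the same `cheeses` list as above. The list comprehension and the built-in `any()` function are not covered in this lecture; they are just one possible way to get case-insensitive or partial matching.

```python
cheeses = ['Cheddar', 'Edam', 'Gouda', 'Brie', "Pepper Jack"]

# List check: is there an item exactly equal to "Ched"? No.
in_list = "Ched" in cheeses

# String check: do the characters "Ched" appear inside "Cheddar"? Yes.
in_string = "Ched" in "Cheddar"

# One way to ignore case: lowercase every item before checking
lowercase_cheeses = [cheese.lower() for cheese in cheeses]
found_ignoring_case = "cheddar" in lowercase_cheeses

# One way to get a partial match: run the string check against each item
partial_match = any("Ched" in cheese for cheese in cheeses)

print(in_list, in_string, found_ignoring_case, partial_match)
```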

In [None]:
# using the in keyword in a conditional 
if "Cheddar" in cheeses:
    print("All is good")
else:
    print("Go grocery shopping")

In [None]:
# create a big string of rambling thoughts on cheese
cheesey_monolog = """
I'd say my favorite cheese is cheddar and that's probably because I am from Vermont, but I also like brie. Brie is 
really good baked with some pears and pastry...yeah, that's really good. But don't eat too much, otherwise you will
spoil your dinner."""
# check to see if the string "cheddar" is somewhere in the monolog
if "cheddar" in cheesey_monolog:
    print("Yes")
else:
    print("No")

## String Methods
![Method Man](https://media.giphy.com/media/JnLSi2bWr80p2/giphy.gif)

Because Python strings are a more complex data structure (similar to lists), there are a bunch of functions (technically called "methods") designed to work specifically with strings.

In [None]:
string_of_words = "A string of words becomes a list"
# the split method is an oft used string method
list_of_words = string_of_words.split()
print(list_of_words)

There is another string method, `join()`, that allows you to go in reverse, but it is a bit wonky.

In [None]:
# join a list together into a single string
" ".join(list_of_words)

What is happening here? Why can't we say `list_of_words.join()`?

In [None]:
list_of_words.join()

That is because `join()` is a *string method* that only works on strings. The variable `list_of_words` is a list, which doesn't have a method called join.

What we do instead is take a string, in this case a space, and use its join method to create a string from the list passed to the function. The string we put first gets used as a separator.
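Here is a hedged sketch of two things worth knowing about `join()`: splitting and then joining with a space gets you back where you started, and every item in the list has to be a string. The `numbers` list is made up for this illustration.

```python
original = "A string of words becomes a list"
list_of_words = original.split()

# Splitting on whitespace and joining with a single space is a round trip here
round_trip = " ".join(list_of_words)

# join() only accepts strings, so a list of numbers raises a TypeError
numbers = [4, 1, 2]
try:
    "-".join(numbers)
    join_failed = False
except TypeError as error:
    join_failed = True
    print("TypeError:", error)

# Converting each number to a string first makes it work
joined_numbers = "-".join(str(number) for number in numbers)
print(joined_numbers)
```

The round trip only holds because the original string uses single spaces with nothing extra at the start or end.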

In [None]:
# join a list together into a single string
"*".join(list_of_words)

In [None]:
# join a list together into a single string
"||||".join(list_of_words)

The official Python documentation has an [explanation of all string methods](https://docs.python.org/3/library/stdtypes.html#string-methods).
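As a quick taste, here is a sketch of a few commonly used methods from that documentation page. The string `messy` is made up for this illustration. Each method returns a new string (or a number, or True/False) and leaves the original string unchanged.

```python
# A hypothetical string with extra spaces on both ends
messy = "  Hail to Pitt!  "

cleaned = messy.strip()                           # remove surrounding whitespace
shouted = cleaned.upper()                         # all uppercase
whispered = cleaned.lower()                       # all lowercase
swapped = cleaned.replace("Pitt", "Pittsburgh")   # swap one piece of text for another
starts_with_hail = cleaned.startswith("Hail")     # True or False
position = cleaned.find("Pitt")                   # index where "Pitt" begins
number_of_ts = cleaned.count("t")                 # how many lowercase "t" characters

print(cleaned, shouted, whispered, swapped)
print(starts_with_hail, position, number_of_ts)
```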

You can also use the autocomplete feature of Jupyter. Put the cursor after the "." below and hit TAB. That will show you all the possible string methods available.

Look at the documentation and autocomplete to try a couple of string methods below.

In [None]:
# Try a string method
"a string of things".

In [None]:
# Try another string method
"a string of things".

In [None]:
# one more
"a string of things".

In [None]:
"something".

## Putting it All Together

We can use string looping, slicing, and string methods to do some data science on strings.

In [None]:
# create a string of some classic song lyrics
lyric = "cash rules everything around me"
# create an empty list for storing letters
letters = []
# Use the split method to create a list of words and loop over the list
for word in lyric.split():
    # append the first letter of the word converted to uppercase to the list 
    letters.append(word[0].upper()) 

# Join all the items in the list together with periods
greatest_wu_tang_song = ".".join(letters)
print(greatest_wu_tang_song)
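If you wanted to reuse that recipe on other phrases, one possible approach is to wrap it in a function. This is only a sketch: the name `make_acronym` and its `separator` parameter are made up for this illustration, and the list comprehension is a compact stand-in for the loop above.

```python
def make_acronym(phrase, separator="."):
    """Return the uppercased first letter of each word, joined by separator."""
    # split() breaks the phrase into words; word[0] is each word's first letter
    letters = [word[0].upper() for word in phrase.split()]
    # the separator string's join method glues the letters back together
    return separator.join(letters)


acronym = make_acronym("cash rules everything around me")
print(acronym)

# A different separator, and an empty phrase (which produces an empty string)
no_dots = make_acronym("cash rules everything around me", separator="")
empty = make_acronym("")
```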