
### Strings

For more details on this topic, make sure to read [Think Python](https://greenteapress.com/thinkpython2/thinkpython2.pdf) chapter 8!

In [None]:
# A quick review of the basics:
sample_string = "kit"
another_sample_string = 'kat'

english2spanish = {}
english2spanish['one'] = 'uno'

Strings are ***immutable***—once created, they can’t (won’t) change. They are also *sequences* of characters, so you can loop over them!

In [None]:
# Run the prior code block before running this one.
for letter in sample_string:
    print(letter)

In [None]:
for character in 'This string is hardcoded—why???':
    print(character)
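
To make the word *immutable* concrete, here is a minimal sketch of what happens when code tries to change a character in place. The variable names are made up for this illustration, and the `try`/`except` is only there so the block keeps running after the error.

```python
# Hypothetical names, chosen only for this illustration.
word = 'kit'

try:
    word[0] = 'b'  # Strings do not support item assignment.
    assignment_worked = True
except TypeError:
    assignment_worked = False

print(assignment_worked)

# Making a *new* string out of the old one works fine.
new_word = word + 'ten'
print(new_word)
print(word)  # The original string is unchanged.
```

The takeaway: you never edit a string, you build a new one and (if you like) give it the old name.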

***
**Conditional and function review with strings**

Write a function `has_x` that takes a string as an argument, loops through the characters in that string, and returns `True` if it contains the character `x` and `False` if not.

In [None]:
def has_x(string_to_check):
    pass

print(has_x('Excellent!')) # Should output True.
print(has_x('Bogus!')) # Should output False.

***
**String operators**

String _concatenation_ `+` puts two strings together.

In [None]:
'Thunderbolts' + ' and ' + 'lightning'

In [None]:
'Very ' + 'very ' + 'frightening' + ' me'

String *repetition* `*` concatenates copies of the same string a given number of times.

In [None]:
'Galileo ' * 5

And of course you can combine them as needed.

In [None]:
'Galileo ' * 5 + 'Figaro magnifico!'
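
One thing worth trying for yourself: `+` only concatenates strings with other strings. The sketch below (with made-up variable names) shows the error you get when mixing a string and a number, and one common way around it using the built-in `str` function.

```python
# Hypothetical names, chosen only for this illustration.
count = 5

try:
    message = 'Galileo x' + count  # A string plus an int is an error.
except TypeError:
    message = 'Galileo x' + str(count)  # Convert the number first.

print(message)

# Repeating a string zero times gives the empty string.
nothing = 'Galileo ' * 0
print(nothing == '')
```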

***
**Indexing**

Individual characters in a string can be accessed by their position, or *index*. Remember that the first character has an index of `0`!

In [None]:
lyric = 'Scaramouch'
print(lyric[1], lyric[4], lyric[6])

Negative index values start from the last character!

In [None]:
lyric = 'Will you do the fandango'
print(lyric[-1], lyric[-9], lyric[-15])
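
If negative indices feel mysterious, it may help to see how they line up with positive ones. This sketch uses the built-in `len` function; the `try`/`except` is only there to show, without stopping the block, that an index past the end is an error.

```python
lyric = 'Will you do the fandango'
length = len(lyric)
print(length)

# A negative index counts back from the length of the string.
print(lyric[-1] == lyric[length - 1])
print(lyric[-9] == lyric[length - 9])

# The last valid index is length - 1, so lyric[length] is out of range.
try:
    lyric[length]
    in_range = True
except IndexError:
    in_range = False
print(in_range)
```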

***
**Slicing**

Placing index ranges (indices separated by a colon `:`) between the brackets will yield substrings—this is called *slicing* the string.

The ending index is the index *one past* the last character that you want to include in the slice.

Remember that strings are immutable, so your slices are *copies* of those parts of the original string.

In [None]:
fruit = 'strawberry'
print(fruit[2:5])

# Indices on either side of the colon are optional.
print(fruit[:5])
print(fruit[5:])

# Negative indices work too.
print(fruit[6:-1])
print(fruit[-8:-5])

# This is technically correct but you may as well just use the string.
print(fruit[:])
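
The "one past the end" rule has some handy consequences, sketched below under the assumption that both indices are non-negative and within the string. The name `cut` is made up for this illustration.

```python
fruit = 'strawberry'

# The length of a slice is simply end minus start.
print(len(fruit[2:5]) == 5 - 2)

# Cutting a string at any index gives two pieces that fit back together.
cut = 5  # Hypothetical cut point.
print(fruit[:cut] + fruit[cut:] == fruit)

# Unlike indexing, slicing past the end is not an error; the slice just stops.
print(fruit[5:100])
```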

Slices can take a second colon to specify a *step size*. And yes, that step can be negative—although the way it counts can take some getting used to.

In [None]:
fruit = 'strawberry' # Repeated so you don't need to run the previous block.
print(fruit[5::2])
print(fruit[4:0:-1])
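
One way to get used to step sizes is to spell the slice out as a loop. The helper below, `slice_by_hand`, is a made-up function for this illustration, not anything built into Python. It assumes all three values are given and that the indices are non-negative and within the string, in which case the slice visits the same indices as `range(start, stop, step)`.

```python
fruit = 'strawberry'

def slice_by_hand(text, start, stop, step):
    # Hypothetical helper: collect the characters at
    # indices start, start + step, ... stopping before stop.
    result = ''
    for i in range(start, stop, step):
        result = result + text[i]
    return result

print(slice_by_hand(fruit, 4, 0, -1), fruit[4:0:-1])
print(slice_by_hand(fruit, 5, len(fruit), 2), fruit[5::2])
```

Notice that with a negative step, the stop index is still excluded: counting down from `4` to `0` visits indices `4`, `3`, `2`, and `1`, but not `0`.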

***
**A computer science favorite: *palindromes***

*Palindromes* are words that read the same forward and backward (e.g. “radar,” “tenet”). Computer science has a strong affinity for them, as you’ll see when you study theory.

But for now, can you use what you have learned about strings so far to write an `is_palindrome` function? How short can you make that function?

In [None]:
def is_palindrome(word):
    pass

print(is_palindrome('not a palindrome')) # Should print False.
print(is_palindrome('polyp')) # Should print False.
print(is_palindrome('tenet')) # Should print True.

# Ignoring spaces is more advanced; for now, count them.
print(is_palindrome('race car')) # Should print False.
print(is_palindrome('racecar')) # Should print True.

***
**String methods**

String *methods* are really just functions, but the way they are written, it looks like the string is performing an action rather than being passed as an argument. This change in look motivates the change of name.

In [None]:
# Case-changing methods.
greeting = 'Hello, LMU!'
print(greeting.lower()) # As opposed to lower(greeting).
print(greeting.upper())
print(greeting)
greeting = greeting.upper()
print(greeting)

In [None]:
# The split method separates a string into a list of substrings.
greeting = 'Hello, LMU!'
print(greeting.split()) # We split on whitespace by default…
print(greeting.split(', ')) # …but we can customize that with an argument.
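
There is a subtle difference between calling `split()` with no argument and calling `split(' ')` with a single space. The sketch below uses a made-up string with extra spaces and a newline to show it.

```python
# Hypothetical sample text with three spaces and a newline in it.
spaced = 'Hello,   LMU!\nGood to be here'

# No argument: any run of whitespace (spaces, tabs, newlines) is one separator.
default_pieces = spaced.split()
print(default_pieces)

# A single space as the argument: every individual space is a separator,
# so consecutive spaces produce empty strings, and the newline is left alone.
space_pieces = spaced.split(' ')
print(space_pieces)
```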

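Processing natural language calls for something more sophisticated than `split`. That more careful kind of splitting is called *tokenization*; among other things, a tokenizer typically treats punctuation marks as separate pieces.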

This code isn’t in a runnable block because it uses a specialized library called `nltk`—this notebook doesn’t have access to that library. You would need to install it on your computer to get this code to run:

```python
from nltk.tokenize import word_tokenize

greeting = 'Hello, LMU!'
word_tokenize(greeting)
```

In [None]:
# The join method is the reverse of split: it combines a list
# into a single string.
#
# In Python, join is a method of the string that will connect
# the items in the list.
words = ['Hello,', 'LMU!', 'Good', 'to', 'be', 'here']
print(' '.join(words))
print('-'.join(words))
print('...'.join(words))
print('__'.join(words))
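
Two things about `join` that are worth checking for yourself: splitting and then joining can give back the string you started with, and `join` only accepts lists of strings. The names below are made up for this illustration.

```python
# Hypothetical names, chosen only for this illustration.
sentence = 'Good to be here'
round_trip = ' '.join(sentence.split())
print(round_trip == sentence)

# join needs strings, so numbers must be converted first.
numbers = [1, 2, 3]
try:
    joined = '-'.join(numbers)
except TypeError:
    number_strings = []
    for n in numbers:
        number_strings.append(str(n))
    joined = '-'.join(number_strings)
print(joined)
```

The round trip only works this neatly because the words in `sentence` are separated by exactly one space each.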

In [None]:
# The find method takes a substring and returns the first index
# where the substring appears (or -1 if it doesn't appear at all).
greeting = 'Hello, LMU!'
index = greeting.find('LMU!')

print(index)
print(greeting[index:])

snark = 'Well well well'
index = snark.find('well') # Remember it's the _first_ index.
print(index)
print(snark[index:])
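
What happens when the substring isn't there? The sketch below shows the `-1` that `find` returns in that case, the `in` operator for when you only need a yes-or-no answer, and the optional second argument to `find` that says where to start looking. The variable names are made up for this illustration.

```python
snark = 'Well well well'

# Not found: find returns -1 rather than raising an error.
missing = snark.find('swell')
print(missing)

# If you only care whether it's there, use the in operator.
print('well' in snark)
print('swell' in snark)

# An optional second argument tells find where to start searching.
second = snark.find('well', 6)
print(second)
print(snark[second:])
```

Be careful with that `-1`: it is also a valid index (the last character), so check for it before using the result in a slice.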

In [None]:
# The replace method takes a string to replace and its replacement.
# All instances of the former get replaced by the latter.
greeting = 'Hello, LMU!'
print(greeting.replace('Hello', 'Hola'))

snark = 'Well well well'
print(snark.replace('well', 'Marsha'))

# Methods can be chained together. This is like nesting function
# calls but looks more sequential.
print(snark.replace('well', 'Marsha').replace('Well', 'Marsha'))
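
A few more things to try with `replace`, sketched with made-up variable names: an optional third argument limits how many replacements are made, the original string is never changed, and chaining with `lower` is one way to catch every capitalization (at the cost of lowercasing the rest of the string too).

```python
snark = 'Well well well'

# The optional third argument caps the number of replacements.
once = snark.replace('well', 'Marsha', 1)
print(once)

# replace returns a new string; snark itself is untouched.
print(snark)

# Lowercasing first makes all three words match.
everyone = snark.lower().replace('well', 'Marsha')
print(everyone)
```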

This is just the tip of the iceberg—you can visit https://docs.python.org/3/library/stdtypes.html#string-methods to see the full list of available string methods!