# Strings


## Lesson Overview

> A string is an ordered sequence of characters, generally used to store text data. 

An example string might look like this:

In [None]:
greeting = 'Hello, World!'

### Creating strings

Strings can be defined using either single quotes (`' '`) or double quotes (`" "`). 

In [None]:
quote = 'Hi, friends!'
print(quote)

quote = "Hi, friends!"
print(quote)

### Accessing characters in a string

A string behaves like an ordered sequence of characters, much like a list.

We can use square brackets to access individual characters by position, and `len` to find out how many characters the string contains:

In [None]:
greeting = 'Hello, World!'
print(greeting[0]) # 'H'
print(greeting[5]) # ','
print(len(greeting)) # 13

Keep in mind that strings are 0-indexed. That means they start counting their characters from 0 rather than 1. Therefore, the first character (the `'H'` in `'Hello'`) is accessible via `greeting[0]`.
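Because counting starts at 0, the last character sits at index `len(greeting) - 1`, not `len(greeting)`. A minimal sketch (the name `last_index` is introduced here purely for illustration):

```python
greeting = 'Hello, World!'

# The string has 13 characters, so the valid indices run from 0 to 12.
last_index = len(greeting) - 1
print(last_index)            # 12
print(greeting[last_index])  # '!'
```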


### Changing characters in a string and immutability

Accessing characters is an important part of working with strings, but changing characters is more difficult. In Python and many other languages, a string is **immutable**, meaning that its characters cannot be edited once the string has been created. If a string needs to be changed, most languages provide functions and built-in features that make it easy to build a modified copy instead.

In [None]:
greeting = 'Hello, World!'
print(greeting[0]) # 'H'

greeting[0] = 'M' # This will cause an error!
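If you want to see the error without stopping the rest of the cell, one option is to catch it. The sketch below assumes the error is a `TypeError`, which is what Python raises for item assignment on a `str`; the variable `error_message` is illustrative.

```python
greeting = 'Hello, World!'

error_message = ''
try:
  greeting[0] = 'M'  # Strings do not support item assignment.
except TypeError as error:
  error_message = str(error)

print(error_message)
print(greeting)  # The original string is unchanged: 'Hello, World!'
```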

Typically a string is modified by creating a new string with the change.

Try this:

In [None]:
greeting = 'Hello, World!'

new_greeting = 'M'
for i in range(1, len(greeting)):
  new_greeting += greeting[i]
print(new_greeting)

### String formatting

Strings can also be used as templates. The `%` operator substitutes values for the format specifiers in a string:

*   `%s` is used to substitute strings and boolean values.
*   `%d` is used to substitute integers.
*   `%f` is used to substitute floating point numbers.

In [None]:
print('Here is a sample string: %s' % 
      'This string will be inserted in place of the first "%s".')
print('This string prints a number: %d' % 50)
print('Kiran has two favorite numbers: %d and %d' % (0, 1))
print('Sidnie\'s favorite number is %d and her favorite word is %s.' % 
      (8, 'neat'))
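The cell above does not exercise `%f` or boolean values, so here is a small sketch of both. The variable names are illustrative; `%.2f` is standard printf-style syntax for limiting the output to two decimal places.

```python
pi_default = 'Pi is roughly %f' % 3.14159
pi_rounded = 'Pi is roughly %.2f' % 3.14159
raining = 'Is it raining? %s' % False

print(pi_default)  # Pi is roughly 3.141590
print(pi_rounded)  # Pi is roughly 3.14
print(raining)     # Is it raining? False
```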

### Escaping characters

When creating strings, it's best to use one style of quotation mark consistently, but there will be times when that's challenging.

Consider the following text:

> He said, "No, that's my horse!".

That text contains both an apostrophe and double quotation marks, so whichever quote style you use to delimit the string, you'll have to **escape** the matching characters inside it. You can do that by putting a backslash (`\`) in front of them.

In [None]:
quote = "He said, \"No, that's my horse!\"."
print(quote)

quote = 'He said, "No, that\'s my horse!".'
print(quote)
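Both spellings describe exactly the same characters; the backslashes are only part of the source code, not of the string itself. A quick check, with illustrative variable names:

```python
double_quoted = "He said, \"No, that's my horse!\"."
single_quoted = 'He said, "No, that\'s my horse!".'

print(double_quoted == single_quoted)  # True
print('\\' in double_quoted)           # False: no backslash is stored
```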

## Question 1

Which of the following best defines a string?

**a)** An unordered collection of characters.

**b)** An ordered collection of characters.

**c)** An ordered collection of integers.

**d)** An ordered collection of objects with any data type.

### Solution

The correct answer is **b)**.

**a)** A string has an ordering.

**c)** Be careful. The characters in a string can represent numbers, but they are not of type `int`.

**d)** This defines another data structure, called a *list* in Python and an *array* in some other languages.

## Question 2

Write a function that returns the following passage from Emily Bronte's *Wuthering Heights* as a single string. 

> Wuthering Heights is the name of Mr. Heathcliff's dwelling. "Wuthering" being a significant provincial adjective, descriptive of the atmospheric tumult to which its station is exposed in stormy weather.


In [None]:
def wuthering_passage():
  # TODO(you): Implement
  print('This function has not been implemented.')

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(wuthering_passage()[:120])
# Should print: Wuthering Heights is the name of Mr. Heathcliff's dwelling. "Wuthering" being a significant provincial adjective, descri

print(wuthering_passage()[120:])
# Should print: ptive of the atmospheric tumult to which its station is exposed in stormy weather.

### Solution

The key idea here is to add the quotation marks and apostrophe by escaping (e.g. `'can\'t'`) or by mixing double and single quotes (e.g. `"can't"`).

We break the passage onto different lines to limit line length in our code. 

In [None]:
def wuthering_passage():
  return ('Wuthering Heights is the name of Mr. Heathcliff\'s '
          + 'dwelling. "Wuthering" being a significant provincial adjective, '
          + 'descriptive of the atmospheric tumult to which its station is exposed '
          + 'in stormy weather.')

## Question 3

For many string manipulation tasks, you'll need to quickly look through the string and perform certain actions. Let's try that by writing a function called `print_vowels`. This function takes in a string and prints all the vowels (`['a', 'e', 'i', 'o', 'u']`) in it, one vowel per line.

In [None]:
def print_vowels(input_text):
  # TODO(you): Implement
  print('This function has not been implemented.')

For example, when given the string `'Hello, World!'`, the function should print out the letters `e`, `o`, and `o`, each on their own line.


### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print_vowels('Test')
# Should print: e

print_vowels('More vowels!')
# Should print multiline:
# o
# e
# o
# e

### Solution

For this we'll iterate through the string using a `for` loop. If we match any vowels, we'll print them out:


In [None]:
def print_vowels(input_text):
  for i in range(len(input_text)):
    c = input_text[i]
    if c == 'a':
      print(c)
    if c == 'e':
      print(c)
    if c == 'i':
      print(c)
    if c == 'o':
      print(c)
    if c == 'u':
      print(c)

You can also simplify this by combining the comparisons with the `or` operator:

In [None]:
def print_vowels(input_text):
  for i in range(len(input_text)):
    c = input_text[i]
    if c == 'a' or c == 'e' or c == 'i' or c == 'o' or c == 'u':
      print(c)

As you cover more data structures, you may find a way to make this function shorter. Some languages even have their own built-in functions for determining if a given character is a vowel or not.
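One such shortening, offered as a sketch and not as the lesson's official answer, relies on the `in` operator, which checks whether a character appears in another string. The names `VOWELS`, `find_vowels`, and `print_vowels_short` are hypothetical, chosen so they don't overwrite your own `print_vowels`.

```python
VOWELS = 'aeiou'

def find_vowels(input_text):
  # Collect the lowercase vowels in the order they appear.
  result = ''
  for c in input_text:
    if c in VOWELS:
      result += c
  return result

def print_vowels_short(input_text):
  for c in find_vowels(input_text):
    print(c)

print_vowels_short('More vowels!')  # Prints o, e, o, e on separate lines.
```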

## Question 4

Strings also support **slicing**, or extracting part of a string as a new string. The original string is left unchanged. Let's look at the syntax and then a few examples:



In Python, the syntax for slicing is `[start_index:end_index:step]`. The slice starts at `start_index` and stops just before `end_index`.

*   `step` is a particularly interesting parameter, as it tells you how to move from `start_index` to `end_index`. If not included, it defaults to `1`.
*   `start_index` and `end_index` default to the beginning and end of the string (or to the end and beginning when `step` is negative).

In [None]:
greeting = 'Hello, World!'

# The step parameter is optional (1, by default).

print(greeting[1:3]) # 'el'
print(greeting[4:8]) # 'o, W'
print(greeting[2::1]) # 'llo, World!'
print(greeting[::2]) # 'Hlo ol!'
print(greeting[:10:2]) # 'Hlo o'
print(greeting[::-1]) # '!dlroW ,olleH'

print(greeting[5:]) # ', World!'
print(greeting[5::1]) # ', World!'
print(greeting[:10]) # 'Hello, Wor'
print(greeting[:10:1]) # 'Hello, Wor'
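Two properties of slices are worth checking for yourself: `end_index` is exclusive, and a slice that runs past the end of the string is quietly shortened instead of raising an error. A short sketch with illustrative variable names:

```python
greeting = 'Hello, World!'

# end_index is exclusive, so these two halves share no character and
# glue back together into the original string.
first_part = greeting[:5]   # 'Hello'
second_part = greeting[5:]  # ', World!'
print(first_part + second_part == greeting)  # True

# Indexing past the end raises an IndexError, but slicing past the end
# just stops at the last character.
clipped = greeting[7:100]
print(clipped)  # 'World!'
```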

Slicing allows us to do all sorts of interesting things. Take, for instance, [Spoonerisms](https://en.wikipedia.org/wiki/Spoonerism), a classic wordplay where you swap the first letters of two words. For example, the phrase `'train down'` would become `'drain town'`.

Write a `spoonerize` function that takes two strings as input and returns them with their first letters swapped. Slicing may help you here.

In [None]:
def spoonerize(first_word, second_word):
  # TODO(you): Implement
  new_first_word = '' # Fill this in.
  new_second_word = '' # Fill this in.
  return new_first_word + ' ' + new_second_word

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(spoonerize('train', 'down')) 
# Should print: drain town

print(spoonerize('jelly', 'beans')) 
# Should print: belly jeans

### Solution

Slicing will let us quickly separate the first character from the rest of the string.

In [None]:
def spoonerize(first_word, second_word):
  new_first_word = second_word[:1] + first_word[1:]
  new_second_word = first_word[:1] + second_word[1:]
  return new_first_word + ' ' + new_second_word

## Question 5

Palindromes are another interesting application of wordplay. A **palindrome** is a word that reads the same starting from the front or back. For instance, "madam" and "racecar" are palindromes. Complete the `is_palindrome` function, which returns `True` if a given string is a palindrome and `False` otherwise.



In [None]:
def is_palindrome(input_text):
  """Determines whether or not a given string is a palindrome.

  Args:
    input_text: A string input.

  Returns: 
    Boolean. True if input_text is a palindrome, False otherwise.
  """
  # TODO(you): Implement
  return False

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(is_palindrome('racecar')) 
# Should print: True

print(is_palindrome('car')) 
# Should print: False

### Solution

There are a number of ways to do this, but the easiest is to remember how slicing and the `step` parameter work.

In [None]:
def is_palindrome(input_text):
  """Determines whether or not a given string is a palindrome.

  Args:
    input_text: A string input.

  Returns: 
    Boolean. True if input_text is a palindrome, False otherwise.
  """
  return input_text == input_text[::-1]
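For comparison, here is a sketch of one of those other ways: compare characters from both ends and walk toward the middle. The name `is_palindrome_loop` is hypothetical, used so it doesn't replace the slicing version.

```python
def is_palindrome_loop(input_text):
  left = 0
  right = len(input_text) - 1
  # Stop once the two positions meet or cross in the middle.
  while left < right:
    if input_text[left] != input_text[right]:
      return False
    left += 1
    right -= 1
  return True

print(is_palindrome_loop('racecar'))  # True
print(is_palindrome_loop('car'))      # False
```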

## Question 6

It feels a bit strange to exclude things that we know are palindromes just because they have additional spaces and other punctuation. For instance, "Madam, I'm Adam" can be considered a palindrome, even though it has punctuation in it. Let's fix that. Write a function called `strip_punctuation` that removes all punctuation and spacing from a string and returns the new string. Then update your `is_palindrome` to use this new function.

You may assume all letters are lowercase. To guarantee that, we'll use the `lower` method to convert them all to lowercase. `lower` is one of many [built-in methods](https://docs.python.org/3/library/stdtypes.html#string-methods) for strings in Python. There are plenty of other methods, including ones that will convert a string to uppercase or titlecase and ones that tell you if a particular character is a letter, number, or space.
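A few of those methods in action, as a quick sketch (all of them are documented `str` methods; the variable names are illustrative):

```python
shout = 'Madam'.upper()              # 'MADAM'
whisper = 'Madam'.lower()            # 'madam'
title = 'wuthering heights'.title()  # 'Wuthering Heights'

print(shout, whisper, title)
print('8'.isdigit())  # True
print(' '.isspace())  # True
print('a'.isalpha())  # True
```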

In [None]:
def strip_punctuation(input_text):
  # TODO(you): Implement
  return input_text
 
def is_palindrome(input_text):
  """Determines whether or not a given string is a palindrome.

  Args:
    input_text: A string input.

  Returns: 
    Boolean. True if input_text is a palindrome, False otherwise.
  """
  input_text = input_text.lower()
  input_text = strip_punctuation(input_text)
  # TODO(you): Copy your is_palindrome function from the last problem, and
  # modify it to use strip_punctuation.
  return False

### Hint

Use the built-in method [`isalpha`](https://docs.python.org/3/library/stdtypes.html#str.isalpha) to check if a given character is a letter or not.

In [None]:
print('b'.isalpha()) # True
print('8'.isalpha()) # False
print('!'.isalpha()) # False

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(is_palindrome('Madam, I\'m Adam'))
# Should print: True

print(is_palindrome('So many dynamos!'))
# Should print: True

print(is_palindrome('blob'))
# Should print: False

### Solution

This can be done two ways. We can iterate through the string, slicing it every time we see a character we want to omit. That's not great, though; generally, you don't want to modify a structure that you're iterating through. Instead, let's create a `result` variable and add letters to it as we see them. Then we can fix up `is_palindrome`:

In [None]:
def strip_punctuation(input_text):
  result = ''
  for c in input_text:
    if c.isalpha():
      result += c
  return result

def is_palindrome(input_text):
  """Determines whether or not a given string is a palindrome.

  Args:
    input_text: A string input.

  Returns: 
    Boolean. True if input_text is a palindrome, False otherwise.
  """
  input_text = input_text.lower()
  input_text_no_punctuation = strip_punctuation(input_text)
  return input_text_no_punctuation == input_text_no_punctuation[::-1]

## Question 7

Next let's try making acronyms. An acronym is usually a combination of the first letters in a phrase to create a new word, like **SCUBA** (self-contained underwater breathing apparatus) or **LASER** (light amplification by stimulated emission of radiation). Some acronyms omit some words, such as LASER omitting "by" and "of". But for your `make_acronym` function, don't worry about that yet. You'll also notice that these are all capitalized; you may use `upper` to convert any string to its uppercase equivalent. 

In [None]:
print('asap'.upper()) # 'ASAP'

Implement `make_acronym` below. 



In [None]:
def make_acronym(input_text):
  # TODO(you): Implement
  return input_text

### Hint

You may want to make use of the `isalpha` or `isspace` methods.

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(make_acronym('as soon as possible')) 
# Should print: ASAP

print(make_acronym('do it yourself'))  
# Should print: DIY

print(make_acronym('pacific daylight time'))  
# Should print: PDT

print(make_acronym('national basketball association'))  
# Should print: NBA

print(make_acronym('fun with acronyms'))  
# Should print: FWA

### Solution

The trick here is to grab the first character of the phrase, then the first letter after each space or hyphen, and add them to `result`.

In [None]:
def make_acronym(input_text):
  result = ''
  is_first_character = True
  grab_next_character = False

  for c in input_text:

    if is_first_character:
      result += c
      is_first_character = False

    elif c.isspace() or c == '-':
      # If you see a whitespace character or a hyphen, you know the next
      # character is the one that you want.
      grab_next_character = True

    elif c.isalpha() and grab_next_character:
      result += c
      # Make sure to set grab_next_character to False! Otherwise, you will grab
      # all of the letters after the space instead of just grabbing the first
      # one. This is a common bug.
      grab_next_character = False
      
  return result.upper()
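Once you are comfortable with lists, a shorter variant is possible. The sketch below assumes hyphens should be treated like spaces, as in the solution above, and uses the built-in `split` method to break the phrase into words. `make_acronym_split` is a hypothetical name, and the function is not guaranteed to match the loop version on every unusual input.

```python
def make_acronym_split(input_text):
  result = ''
  # Treat hyphens as word separators, then split on whitespace.
  for word in input_text.replace('-', ' ').split():
    # Skip words that do not start with a letter.
    if word[0].isalpha():
      result += word[0]
  return result.upper()

print(make_acronym_split('as soon as possible'))  # ASAP
print(make_acronym_split('do it yourself'))       # DIY
```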

## Question 8

[Advanced] How does `make_acronym` need to change if you do want to ignore stop words?

Let's say you need to ignore the following words:

```python
'with', 'of', 'the', 'and', 'by'
```

In [None]:
def make_acronym(input_text):
  # TODO(you): Implement
  return input_text

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(make_acronym('light amplification by stimulated emission of radiation'))
# Should print: LASER

print(make_acronym('United States of America'))
# Should print: USA

### Solution

This one is substantially more challenging. You need to build up each word and check that it's not one of your stop words before adding its first letter. Let's see how we can build off the previous solution:

In [None]:
def make_acronym(input_text):
  result = ''
  next_word = ''

  for c in input_text:
    if c.isspace():
      # You don't need grab_next_character anymore, since you're grabbing the
      # entire word and checking it. Instead, you want to check a word before
      # you process the next one.
      if (
          next_word == 'with' or next_word == 'of' or next_word == 'the' or
          next_word == 'and' or next_word == 'by'):
        next_word = ''
      elif next_word:
        # Skip empty words (for example, after two spaces in a row) so that
        # next_word[0] never raises an IndexError.
        result += next_word[0]
        next_word = ''
    elif c.isalpha():
      next_word += c

  # The trickiest part of this problem is that you will usually exit your loop
  # with a word still stored in next_word. You need to make sure you process
  # that final word, as well.
  if (
      next_word == 'with' or next_word == 'of' or next_word == 'the' or
      next_word == 'and' or next_word == 'by'):
    next_word = ''
  elif next_word:
    result += next_word[0]
    next_word = ''

  return result.upper()
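The repeated `or` chain can also be folded into a membership test. This is a sketch under a few assumptions: the stop words are stored in a tuple named `STOP_WORDS`, the comparison is case-insensitive, and the function is called `make_acronym_without_stop_words`. None of these names come from the lesson.

```python
STOP_WORDS = ('with', 'of', 'the', 'and', 'by')

def make_acronym_without_stop_words(input_text):
  result = ''
  for word in input_text.split():
    # Lowercase before comparing so that 'Of' and 'of' are both skipped.
    if word.lower() in STOP_WORDS:
      continue
    result += word[0]
  return result.upper()

print(make_acronym_without_stop_words(
    'light amplification by stimulated emission of radiation'))  # LASER
print(make_acronym_without_stop_words('United States of America'))  # USA
```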

## Question 9

Your colleague wants to post some spoilers about the latest season of your favorite TV show to your office chat group, but they don't want to actually ruin the show for anyone. To avoid doing so, they're using [ROT13](https://en.wikipedia.org/wiki/ROT13), an encryption method that "encrypts" a message by "rotating" each letter to the letter 13 places later in the alphabet (wrapping around to a if it would go past z). For example, when you encrypt `'secret'`, it becomes `'frperg'`.
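The wrap-around is just modular arithmetic on a letter's position in the alphabet. The sketch below works one letter by hand; it assumes lowercase ASCII letters only, and the variable names are illustrative.

```python
letter = 's'

# Position of the letter in the alphabet, counting 'a' as 0.
position = ord(letter) - ord('a')        # 18
# Move 13 places forward and wrap around after 'z' using modulo 26.
rotated_position = (position + 13) % 26  # 31 % 26 == 5
rotated_letter = chr(rotated_position + ord('a'))

print(rotated_letter)  # 'f'
```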

Unfortunately, their implementation has some issues, and you need to get this right or your whole office is going to get spoiled. It looks like `rotate_character` is doing what it should, but `rot13` keeps crashing. Can you fix it?

In [None]:
LENGTH_OF_ALPHABET = 26
ROTATION_LENGTH = 13
START_OF_ALPHABET = 'a'
 
# This function takes a character `c` and rotates it `rotation` characters
# forward in the alphabet. If it reaches the end of the alphabet, it
# wraps around using the modulo operator.
def rotate_character(c, rotation):
  # First, find the character's position in the alphabet and move it forward
  # by `rotation`. This assumes `c` is a lowercase letter.
  # ord converts a char to its ascii value so that we can 'rotate' it.
  rotated_value = ord(c) - ord(START_OF_ALPHABET) + rotation
  # Next, use the modulo operator so that we don't run off the end of the
  # alphabet.
  new_character_index = rotated_value % LENGTH_OF_ALPHABET
  # Finally, convert the index back to a character.
  # chr converts an ascii value back to a character.
  new_character = chr(new_character_index + ord(START_OF_ALPHABET))
  return new_character
 
def rot13(input_text):
  for i in range(len(input_text)):
    current_character = input_text[i]
    if current_character.isalpha():
      input_text[i] = rotate_character(current_character, ROTATION_LENGTH)
  return input_text

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(rot13('a')) 
# Should print: n

print(rot13('an')) 
# Should print: na

print(rot13('and')) 
# Should print: naq

print(rot13('my new rock band name')) 
# Should print: zl arj ebpx onaq anzr

### Solution

The first big issue with this `rot13` function is that it's trying to update a character in a string. Strings are immutable, which means they cannot be modified. Instead, make a `result` string and add characters to that, as we've done in the past.

In [None]:
LENGTH_OF_ALPHABET = 26
ROTATION_LENGTH = 13
START_OF_ALPHABET = 'a'

# This function takes a character `c` and rotates it `rotation` characters
# forward in the alphabet. If it reaches the end of the alphabet, it
# wraps around using the modulo operator.
def rotate_character(c, rotation):
  # First, find the character's position in the alphabet and move it forward
  # by `rotation`. This assumes `c` is a lowercase letter.
  # ord converts a char to its ascii value so that we can 'rotate' it.
  rotated_value = ord(c) - ord(START_OF_ALPHABET) + rotation
  # Next, use the modulo operator so that we don't run off the end of the
  # alphabet.
  new_character_index = rotated_value % LENGTH_OF_ALPHABET
  # Finally, convert the index back to a character.
  # chr converts an ascii value back to a character.
  new_character = chr(new_character_index + ord(START_OF_ALPHABET))
  return new_character

def rot13(input_text):
  result = ''
  for i in range(len(input_text)):
    current_character = input_text[i]
    if current_character.isalpha():
      result += rotate_character(current_character, ROTATION_LENGTH)
    else:
      result += current_character
  return result

## Question 10

After fixing the first issue, your colleague noticed that the output looks wrong when they put in longer phrases and sentences. ROT13 output is meant to be incomprehensible, so at first glance everything seems to work as intended, but `rot13` is rotating more characters than it should. Can you fix it?

In [None]:
LENGTH_OF_ALPHABET = 26
ROTATION_LENGTH = 13
START_OF_ALPHABET = 'a'

# This function takes a character `c` and rotates it `rotation` characters
# forward in the alphabet. If it reaches the end of the alphabet, it
# wraps around using the modulo operator.
def rotate_character(c, rotation):
  # First, find the character's position in the alphabet and move it forward
  # by `rotation`. This assumes `c` is a lowercase letter.
  # ord converts a char to its ascii value so that we can 'rotate' it.
  rotated_value = ord(c) - ord(START_OF_ALPHABET) + rotation
  # Next, use the modulo operator so that we don't run off the end of the
  # alphabet.
  new_character_index = rotated_value % LENGTH_OF_ALPHABET
  # Finally, convert the index back to a character.
  # chr converts an ascii value back to a character.
  new_character = chr(new_character_index + ord(START_OF_ALPHABET))
  return new_character
 
def rot13(input_text):
  result = ''
  for i in range(len(input_text)):
    current_character = input_text[i]
    result += rotate_character(current_character, ROTATION_LENGTH)
  return result

### Unit Tests

Run the following cell to check your answer against some unit tests.

In [None]:
print(rot13('ebfrohq jnf fbzr thl\'f fyrq!'))
# Should print: rosebud was some guy's sled!

### Solution

The second issue is a bit more difficult to notice. Currently our solution takes every character in `input_text` and converts it. That's fine if you're only converting single words. For full sentences, you need to ignore anything that isn't a letter, similar to what we did in `strip_punctuation` earlier in the lesson. The key here is that if it's not a letter, we should add it to `result` unaltered. That should skip the digits and special characters in the string, converting only the letters.

In [None]:
LENGTH_OF_ALPHABET = 26
ROTATION_LENGTH = 13
START_OF_ALPHABET = 'a'

# This function takes a character `c` and rotates it `rotation` characters
# forward in the alphabet. If it reaches the end of the alphabet, it
# wraps around using the modulo operator.
def rotate_character(c, rotation):
  # First, find the character's position in the alphabet and move it forward
  # by `rotation`. This assumes `c` is a lowercase letter.
  # ord converts a char to its ascii value so that we can 'rotate' it.
  rotated_value = ord(c) - ord(START_OF_ALPHABET) + rotation
  # Next, use the modulo operator so that we don't run off the end of the
  # alphabet.
  new_character_index = rotated_value % LENGTH_OF_ALPHABET
  # Finally, convert the index back to a character.
  # chr converts an ascii value back to a character.
  new_character = chr(new_character_index + ord(START_OF_ALPHABET))
  return new_character

def rot13(input_text):
  result = ''
  for i in range(len(input_text)):
    current_character = input_text[i]
    if not input_text[i].isalpha():
      result += input_text[i]
    else:
      result += rotate_character(current_character, ROTATION_LENGTH)
  return result
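Because 13 is exactly half of 26, rotating twice lands every letter back where it started, so the same function both encrypts and decrypts. The sketch below checks this with a compact, self-contained variant; `rot13_sketch` is a hypothetical name, and like the lesson's code it only rotates lowercase letters.

```python
def rot13_sketch(input_text):
  result = ''
  for c in input_text:
    if 'a' <= c <= 'z':
      # Rotate lowercase letters 13 places, wrapping around after 'z'.
      result += chr((ord(c) - ord('a') + 13) % 26 + ord('a'))
    else:
      # Leave spaces, digits, and punctuation untouched.
      result += c
  return result

message = "rosebud was some guy's sled!"
encrypted = rot13_sketch(message)
print(encrypted)                # ebfrohq jnf fbzr thl'f fyrq!
print(rot13_sketch(encrypted))  # rosebud was some guy's sled!
```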