<a href="https://colab.research.google.com/github/rfpg/python-examples/blob/main/day1_rg_strings.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#An Introduction to Strings and String Methods in Python

Strings, in simple terms, are sequences of plain-text characters. They form the basis of how we can use Python to analyze large bodies of text. You need to know how to use strings in order to work with more advanced applications like predictive text or grammar checks. That's what we'll focus on in this lesson.

##Learning Objectives

1.   Create string variables containing palindromes.
2.   Explore the use of indexing to retrieve specific characters in your text.
3.   Count the number of characters in a given string.
4.   Optimize the structure of a string by removing unnecessary spaces or letters.
5.   Utilize string methods to examine segments of your string.
6.   Create fun puzzles/ciphers by offsetting characters in a string.




### **Example 1: Here's a simple palindrome as a string. It reads the same left-to-right and right-to-left.**

In [None]:
palindrome = "race car"

# Let's print the string to our console. 
# Again, we can print by typing the word, "print" with our variable name nested in parentheses 
# Or by simply typing the name of our variable but only if it's the only item in the code cell.

print(palindrome)

race car


###**Consider these points**
* Note that there's a space between the words "race" and "car". 
* The text "race car" is now stored in a **string** called "palindrome."
* When using text in a computer program, it's a good idea to think about how we can optimize that text before doing something with it.

###**Consider these questions**
* How might a computer recognize a palindrome?
* How might we augment the text to make it easier to discern?
* How might we get rid of the space in-between the two words so we can write a more helpful computer program? 
* We can use a Python string **method** to remove that space.

A **method** is a tool that performs a specific function on a specific thing. It's similar to the different blades on a Swiss Army knife. Each blade usually has a specific application or use case, such as sawing a piece of wood or peeling an apple.


###**How do programmers talk about this stuff?**
Programmers might refer to the **thing** in question here as an **object**, the action to be taken as a **method**, and the process of applying the tool as an **invocation**.

***A technical detail: in Python, some functions are associated with a particular class of objects (e.g., strings). The word method is used in this case, and we have a new way to call them: the dot operator.***

`myString.method()`

We'll discuss methods with more examples as we move forward. For now, all you need to know is that we can apply this kind of very specific tooling by placing a period after the name of our thing and providing the name of the tool that we need. The method we need in this case is called **"replace"**, and we use it to replace the space with an empty string (two quotation marks with nothing between them). This is similar to find and replace in a text editor.

In [None]:
palindrome_nospaces = palindrome.replace(" ", "")

#Let's print the result to see if the method did what we wanted it to do!

palindrome_nospaces

'racecar'

At this point, we now have two strings or two versions of our text.

In [None]:
palindrome, palindrome_nospaces

('race car', 'racecar')

A human might see these both as equivalent forms of a given palindrome but a computer will not. One basic difference is each string's respective length. How many characters exist in each string of text? Let's use the function **"len"** to calculate the number of characters for each of our strings and print the lengths to our console. Notice that we write len with parentheses like we write print above. There's no variable name or dot included in how we write out the code. Just note the difference for now. More on this later.

In [None]:
len(palindrome)

8

In [None]:
len(palindrome_nospaces)

7

We see that our original text has **8** characters including the blank space, whereas our text without the space has **7** characters. Knowing the length of a string of plain text allows us to search within the bounds of the text for similarities and differences using our methods. Here's a basic example where we use the "index" method to find exactly where the space occurs in our original palindrome.

In [None]:
palindrome.index(' ')

4

The number that is returned represents the index within the string where the space exists. Remember, in programming languages, we count from 0. Take a moment to write out the original palindrome with the indices 0 through 7 underneath each of the characters to get a clearer sense of how this works. The space sits at index 4.
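If you'd like to check your handwritten indices, here is a minimal sketch that prints each character next to its index. It assumes the built-in `enumerate()` function and a `for` loop, neither of which has been introduced in this lesson yet, so treat it as a preview and not as part of the exercise.

```python
palindrome = "race car"

# enumerate() pairs each character with its index, starting from 0.
# repr() wraps the character in quotes so the space is visible.
for index, character in enumerate(palindrome):
    print(index, repr(character))
```

The line for index 4 should show the space, which matches what `palindrome.index(' ')` returned.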

We can use indices to query where certain letters are in a given string.

In [None]:
firstLetter = palindrome[0]
length = len(palindrome)
lastLetter = palindrome[length-1]

#print the result
firstLetter, lastLetter

('r', 'r')
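Comparing the first and last letters hints at an answer to the earlier question of how a computer might recognize a palindrome. The sketch below is one possible approach, not the only one: it removes spaces with `replace()` and then compares each character with its mirror image from the other end. The function name `looks_like_palindrome` is made up for this illustration, and the `def` and `for` syntax is a preview of later material.

```python
def looks_like_palindrome(text):
    """Return True if text reads the same in both directions, ignoring spaces."""
    cleaned = text.replace(" ", "")
    length = len(cleaned)
    # Compare the character at index i with the one at the mirrored position.
    for i in range(length // 2):
        if cleaned[i] != cleaned[length - 1 - i]:
            return False
    return True


print(looks_like_palindrome("race car"))
print(looks_like_palindrome("race cars"))
```

Only the first half of the string needs to be visited, because each comparison covers one character from each end.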

We can also shift the characters around to make fun puzzles. We can use the `ord()` function to convert the character to an integer and `chr()` to convert it back to a char.

In [None]:
firstLetterAsNumber = ord(firstLetter)

#print 
print(firstLetterAsNumber)

newLetter = firstLetterAsNumber + 2

#print 
print(newLetter)

chr(newLetter)

114
116


't'

##Exercise 1
Given the string "dnkuu" and the knowledge that each letter is offset by 2, use Python to decode the message one letter at a time.

In [None]:
secret_message = "dnkuu"

#hint: use ord() and chr() to convert each letter to a number and shift the letters by 2! Note the last two letters are the same!


#Solution

In [None]:
#nice, nested solution

one = chr(ord(secret_message[0]) - 2)
two = chr(ord(secret_message[1]) - 2)
three = chr(ord(secret_message[2]) - 2)
four = chr(ord(secret_message[3]) - 2)

one, two, three, four, four

('b', 'l', 'i', 's', 's')
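Decoding one letter at a time works, but it gets tedious for longer messages. As a rough sketch of where this is heading, the same `ord()` and `chr()` steps can be repeated for every letter with a `for` loop. The names `offset` and `decoded` are chosen here for illustration, and loops are covered properly later.

```python
secret_message = "dnkuu"
offset = 2

# Start with an empty string and add one decoded letter at a time.
decoded = ""
for letter in secret_message:
    decoded = decoded + chr(ord(letter) - offset)

print(decoded)  # bliss
```

Changing `offset` (or adding instead of subtracting) lets you encode your own messages with the same few lines.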



---



#What have we learned about strings so far?
* We've learned that each letter in a string has its own index.
  * e.g., the letter at index 1 of the string, "hello", would be 'e'.
  * i.e., we can use that index value to call and act on its associated value.
* We applied string **methods** to manipulate basic text. 

###Let's look at some text from well-known publications.

Let's imagine we are trying to build a dataset of newspaper headlines so that we can do some kind of text analysis on it. Perhaps you want to go through them all to see if they contain a particular word, or how far into the headline a particular word appears. We won't be writing the code to scrape the website right now. However, we can dive into some of the steps that might be involved in working with that kind of text.

Here's an example from the New York Times:

"Tiny Vanuatu Uses Its ‘Unimportance’ to Launch Big Climate Ideas"

You can use single quotes or double quotes. **Just keep it consistent.**

In [4]:
headline = 'Tiny Vanuatu Uses Its ‘Unimportance’ to Launch Big Climate Ideas'

The **`count()`** method gives the number of occurrences of a substring in a range. The arguments for the range are optional. 

*Syntax:*

`str.count(substr, start, end)`

Here, `substr` is the thing we want to count (a single letter or a longer piece of text), and `start` and `end` are integers that indicate the indices where the count starts and ends. The character at index `end` itself is not included. For example, if we want to know how many instances of the letter `'a'` we have in the whole string, we can do:

In [35]:
headline.count('a')

6

If we want to know how many of those 'a' characters are in the range [0:10] (first 10 chars), we do:

In [36]:
headline.count('a', 0, 10)

2
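To see which characters that range actually covers, it can help to look at the same range directly. The sketch below assumes slice notation (`headline[0:10]`), which this lesson has only hinted at, and the variable name `first_ten` is made up for the example.

```python
headline = 'Tiny Vanuatu Uses Its ‘Unimportance’ to Launch Big Climate Ideas'

# A slice returns the characters from index 0 up to, but not including, index 10.
first_ten = headline[0:10]
print(first_ten)             # Tiny Vanua
print(first_ten.count('a'))  # 2

# Counting inside the slice matches counting with start and end arguments.
print(headline.count('a', 0, 10) == first_ten.count('a'))  # True
```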

We can read more about string methods in the [Python documentation](https://docs.python.org/3/library/stdtypes.html#string-methods). Let's look at two more string methods that may serve us well.

### `find()`

The **find()** method tells us if a string `'substr'` occurs in the string we are applying the method on.


*Syntax:*

`str.find(substr)`

If the string `'substr'` is in the original string, the `find()` method will return the index where its first occurrence starts; otherwise it will return `-1`.

What if we're looking for any one of several possible substrings? `find()` is rather simple in what it can do: it searches for one exact substring at a time.
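One way to work within that limit, sketched here under the assumption that a `for` loop and a list are acceptable previews, is to call `find()` once per candidate and check for `-1` each time. The list name `candidates` and the words in it are chosen only for illustration.

```python
headline = 'Tiny Vanuatu Uses Its ‘Unimportance’ to Launch Big Climate Ideas'

# Hypothetical list of words we are interested in.
candidates = ['Vanuatu', 'Launch', 'Ocean']

for word in candidates:
    position = headline.find(word)
    if position == -1:
        print(word, "was not found")
    else:
        print(word, "starts at index", position)
```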

Let's do the following:

1. Use the count() method to count how many letters 'c' are in the headline.
2. Use the find() method to find the position of the word 'climate' in the headline.


In [30]:
headline.count('c')

2

In [31]:
headline.find('climate')

-1

Wait, why is `find()` returning `-1`? We can see that the word "climate" does indeed feature in our example headline. So, what's up? The case of our text matters!

In [14]:
headline.find('Climate')

51

What's a quick way to level the playing field?

In [16]:
headline.lower().find('climate')

51
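It is worth noticing that `lower()` does not change `headline` itself. It returns a new, lowercased string, so the original keeps its capital letters unless we store the result. A small sketch, with `lowered` as an illustrative variable name:

```python
headline = 'Tiny Vanuatu Uses Its ‘Unimportance’ to Launch Big Climate Ideas'

# lower() returns a new string; headline is left untouched.
lowered = headline.lower()

print(headline == lowered)        # False
print(headline.find('climate'))   # -1
print(lowered.find('climate'))    # 51
```

Because lowercasing keeps every character in the same position, the index found in `lowered` also points to the right place in `headline`.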

##Exercise 2

Let's see if we can bring this all together to solve a problem. Don't worry if you struggle a bit. You're still starting out, and we're here to guide you through this exercise. Try to write one code cell for each request to help step through the problem.

You are given a string with some simple text and a brand new method that we haven't looked at called `split()`. What does `split()` do? Where can we find out more about our methods? **[The documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).**

In [None]:
"Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and razor lay crossed. A yellow dressinggown, ungirdled, was sustained gently behind him on the mild morning air."

Please do the following to complete the exercise:
  1. Assign the text to a variable and ensure the entire string is lowercase.
  2. So far, we have only counted the instances of single letters. Can we use `count()` to return the number of times "ing" occurs in this text?
  3. Use the `find()` string method to determine if the text is more than one sentence. Hint: What would we use as our separator? 
  4. `split()` the string into two sentences.
  5. Print the result to the console. How does it look different?

In [38]:
joyce = "Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and razor lay crossed. A yellow dressinggown, ungirdled, was sustained gently behind him on the mild morning air."

In [40]:
joyce.count("ing")

3

In [41]:
joyce.lower()

'stately, plump buck mulligan came from the stairhead, bearing a bowl of lather on which a mirror and razor lay crossed. a yellow dressinggown, ungirdled, was sustained gently behind him on the mild morning air.'

In [42]:
result = joyce.split(".")
print(result)

['Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and razor lay crossed', ' A yellow dressinggown, ungirdled, was sustained gently behind him on the mild morning air', '']
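The cells above skip the `find()` step, and the split result still contains a leading space and an empty string at the end. Here is one possible way to fill in those gaps. It assumes two things not covered in this lesson, the `strip()` string method and the list `append()` method, and the variable names are illustrative only.

```python
joyce = "Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and razor lay crossed. A yellow dressinggown, ungirdled, was sustained gently behind him on the mild morning air."

# Step 3: if the first period is not the final character, more text follows it.
first_period = joyce.find(".")
more_than_one_sentence = first_period != -1 and first_period < len(joyce) - 1
print(more_than_one_sentence)

# Steps 4 and 5: split on the period, then tidy up each piece.
sentences = []
for piece in joyce.split("."):
    cleaned = piece.strip()   # remove spaces at the start and end
    if cleaned != "":         # skip the empty piece after the final period
        sentences.append(cleaned)

print(sentences)
print(len(sentences))
```

Note that `split(".")` removes the periods themselves, so they would need to be added back if the punctuation matters for later analysis.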


##Notes for Team

At this point in the lesson, I would hope the students would feel somewhat comfortable exploring methods and the documentation with our guidance and so I tried to build a simple exercise to test this theory. I think it's alright to let the students struggle a bit with the exercise and for all of us to walk around the room and help them out as they go. 

I removed the note about the `start` and `end` indices for `find()` because this is something that could be introduced later with a better example.

I did not include `startswith()` because I think it could be introduced later with a good example as opposed to tacked on to the end of this lesson. It will become immediately useful when we use split() to make a list of multiple strings.

I included `split()` as a new method to explore within the context of a short exercise. It serves as a nice segue to the next topic because it returns a list of strings.