## Manipulating Strings

### A string object

Almost everything in Python is an [*object*](https://en.wikipedia.org/wiki/Object_%28computer_science%29). In this notebook, we will begin learning what that means.

An object has *attributes*: both *data* and *methods* (functions attached to the object). Let's look at what those are for a string.

We access the data and methods of a Python object with the *dot operator*: `.`

In [None]:
i = 7
i.__class__

In [None]:
s = "A String"

Some attributes of `s`:

In [None]:
print(s.__doc__)

In [None]:
print(s.__class__)

A string object also has *methods* (functions attached to the object):

In [None]:
s.lower()

In [None]:
print(s)

In [None]:
s.upper()

We can get all of an object's attributes and methods with the built-in function `dir()`:

In [None]:
dir(i)

In [None]:
print(i >= 7)
print(i.__ge__(7))
print(i + 10)
print(i.__add__(10))

In [None]:
len(s)

In [None]:
s.__len__()

We can see all string attributes by running:

In [None]:
dir(s)

In [None]:
s2 = "ZZZZZ!"
s < s2

In [None]:
s.__lt__(s2)
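The cells above suggest a general pattern: operators and built-in functions such as `+`, `<`, `len()`, and `in` are shorthand for methods on the object. A minimal sketch of that correspondence for strings follows; the variable names are illustrative only.

```python
# Operators and built-in functions delegate to "dunder" methods on the object.
greeting = "Hello"
name = "World"

# + calls __add__, < calls __lt__, len() calls __len__, `in` calls __contains__
print(greeting + name == greeting.__add__(name))
print((greeting < name) == greeting.__lt__(name))
print(len(greeting) == greeting.__len__())
print(("ell" in greeting) == greeting.__contains__("ell"))
```

In everyday code we write the operator form; the method form is shown here only to make the connection visible.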

In [None]:
"Is This Formatted Like A Title?".istitle()

We are going to focus on the following string methods:

- `upper()`: returns string in uppercase
- `lower()`: returns string in lowercase
- `capitalize()`: returns string with its first character in uppercase and the rest in lowercase
- `strip()`: returns string stripped of leading and trailing whitespace
- `count(item)`: returns the number of times `item` occurs in a string
- `replace(old, new)`: returns a string with each occurrence of `old` replaced by `new`
- `find(item)`: returns the first index in a string where `item` is found, or `-1` if it is not found
- `format(substitutions)`: returns a string formatted according to `substitutions`

We've already looked at `upper()` and `lower()`. Let's try the rest:

In [None]:
s = "porgie tirebiter"
print(s.capitalize())
print(s)  # original is unchanged

In [None]:
s = "   lot's o' whitespace   "
print(s)
s_stripped = s.strip()
print(s_stripped)
print("len s = ", len(s), "; len s_stripped =", len(s_stripped))

In [None]:
s = "Beeeeeeeeeeeer"
print(s.count("eee"))
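One detail worth seeing in isolation: `count()` counts non-overlapping occurrences, so a repeated pattern may be counted fewer times than you expect. The sketch below uses a made-up string and compares `count()` with a hand-rolled overlapping count.

```python
text = "aaaaaa"  # six a's

# count() scans left to right and skips past each match, so matches never overlap
non_overlapping = text.count("aa")

# Counting overlapping matches by hand: test every possible starting index
overlapping = sum(1 for i in range(len(text) - 1) if text.startswith("aa", i))

print("non-overlapping:", non_overlapping)
print("overlapping:", overlapping)
```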

In [None]:
s2 = "846-902-3474".replace("-", "")
print(s2)

Notice we **have not** changed `s`; string methods return new strings rather than modifying the original:

In [None]:
print(s)

`find()` returns the index at which a substring first occurs in a string, or `-1` if the substring is not found.

In [None]:
s2.find("234")

In [None]:
s2.find("432")

Let's find repeated instances of a substring in a string:

In [3]:
s = "812-921-4535-12345"
i = 0
print("Looking for -")
while i != -1:
    i = s.find("-", i)
    if i >= 0:
        print("found - at index", i)
        i += 1

Looking for -
found - at index 3
found - at index 7
found - at index 12
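The same loop can be wrapped in a function that collects the indices instead of printing them. This is a sketch only; `find_all` is a hypothetical helper name, not a built-in string method.

```python
def find_all(text, target):
    """Return a list of every index where target starts in text."""
    positions = []
    i = text.find(target)
    while i != -1:
        positions.append(i)
        # resume the search one position past the last hit
        i = text.find(target, i + 1)
    return positions

print(find_all("812-921-4535-12345", "-"))
```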


Because the substring might not be present, we need to check the result of `find()` before using it:

In [None]:
name = "Duke Reginald Balderdash"
index = name.find("Sir")
print("index =", index)
if index != -1:
    # the next line uses slicing, described below
    s3 = name[index+4:]
    print(s3)
else:
    print("No 'Sir' found.")
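To see why the check matters: without it, `index` would be `-1`, and `name[index+4:]` would quietly become `name[3:]` and return the wrong text instead of raising an error. The sketch below wraps the check in a hypothetical helper, `strip_title`, and tries it on one name that contains the title and one that does not.

```python
def strip_title(name, title):
    """Return name without title; return name unchanged if title is absent."""
    index = name.find(title)
    if index == -1:
        return name
    # skip the title itself plus the space that follows it
    return name[index + len(title) + 1:]

print(strip_title("Sir Reginald Balderdash", "Sir"))
print(strip_title("Duke Reginald Balderdash", "Sir"))

# What the unchecked slice would give for the second name:
unchecked = "Duke Reginald Balderdash"[-1 + 4:]
print(unchecked)
```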

### String Operators

String *concatenation* (pasting two strings together):

In [None]:
s1 = "foo"
s2 = "bar"
s3 = s1 + s2
print(s3)

String multiplication:

In [None]:
s = "-" * 20
print(s)
print("Hello class!")
print(s)
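Concatenation and multiplication combine naturally. As a small sketch, the hypothetical `banner` function below builds a rule exactly as long as the message and joins the pieces with `+`.

```python
def banner(message, fill="-"):
    """Return message with a line of fill characters above and below it."""
    line = fill * len(message)  # multiplication repeats the fill character
    return line + "\n" + message + "\n" + line  # concatenation joins the parts

print(banner("Hello class!"))
```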

### String Slicing

*Slicing* a string allows us to "slice" out parts of it:

In [None]:
s = "--History, Stephen said, is a nightmare from which I am trying to awaken."
print(s[0:4])
print("", s[11:19], "", sep="|")
print(s[19])

What indices did we get by doing `s[0:4]`?

a) 1, 2, 3, 4  
b) 0, 1, 2, 3, 4  
c) 0, 1, 2, 3

We can also use negative numbers to slice from the end of the string. To get the last three characters, for instance, use:

In [None]:
print(s[-3:None])
print(s[-1])

(Why do we need `None` or a blank as the second index?)

In [None]:
-0
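Since `-0` is just `0`, there is no negative index that means "one past the last character". The sketch below, using a short made-up string, compares the ways of writing the end of the slice.

```python
word = "nightmare"

tail_blank = word[-3:]      # blank end: the slice runs to the end of the string
tail_none = word[-3:None]   # None means the same thing as a blank
tail_zero = word[-3:-0]     # -0 == 0, so this asks for "up to index 0"

print(tail_blank)
print(tail_none)
print(repr(tail_zero))      # repr() makes the empty string visible
```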

### `format()`

The [`format()`](https://docs.python.org/3/library/stdtypes.html#str.format) method allows us to do fancy string formatting; for instance, we can place variables into our string:

In [None]:
i = 8192
print("s = {}; s2 = {}; i = {:>8x} end".format(s, s2, i))
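The part after the colon in `{:>8x}` is a format specification: an alignment character, a minimum width, and a type code. A few variations, wrapped in brackets so the padding is visible, might look like the sketch below; the values are arbitrary.

```python
i = 8192

# > right-aligns, 8 is the minimum field width, x prints the integer in hexadecimal
hex_field = "[{:>8x}]".format(i)
# < left-aligns, d prints the integer in decimal
dec_field = "[{:<8d}]".format(i)
# ^ centers the value within the field
centered = "[{:^8}]".format("hi")
# .2f rounds a float to two decimal places
rounded = "[{:8.2f}]".format(3.14159)

for field in (hex_field, dec_field, centered, rounded):
    print(field)

# An f-string accepts the same specification after the colon
print(f"[{i:>8x}]")
```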

## Some string manipulating programs

### Letter frequency program

In [None]:
address = '''Four score and seven years ago our fathers brought forth on this
continent, a new nation, conceived in Liberty, and dedicated to the proposition
that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any
nation so conceived and dedicated, can long endure. We are met on a great
battle-field of that war. We have come to dedicate a portion of that field, as
a final resting place for those who here gave their lives that that nation
might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate — we can not consecrate — we can
not hallow — this ground. The brave men, living and dead, who struggled here,
have consecrated it, far above our poor power to add or detract. The world
will little note, nor long remember what we say here, but it can never forget
what they did here. It is for us the living, rather, to be dedicated here to
the unfinished work which they who fought here have thus far so nobly advanced.
It is rather for us to be here dedicated to the great task remaining before us
— that from these honored dead we take increased devotion to that cause for
which they gave the last full measure of devotion — that we here highly
resolve that these dead shall not have died in vain — that this nation, under
God, shall have a new birth of freedom — and that government of the people, by
the people, for the people, shall not perish from the earth.
'''
num_of_a = 0
num_of_b = 0
num_of_periods = 0

print("Processing the Gettysburg Address:", end="")
for letter in address:
    letter = letter.lower()
    print(".", end="")
    if letter == 'a':
        num_of_a += 1
    elif letter == 'b':
        num_of_b += 1
    elif letter == '.':
        num_of_periods += 1

print("\nThere are {:>10d} periods, {:<12d} occurrences of the letter a, and {} of the letter b.".format(
    num_of_periods, num_of_a, num_of_b))
print("\nThere are", num_of_periods, "periods,", num_of_a, "occurrences of the letter a, and",
      num_of_b, "of the letter b.")
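The loop above shows how to walk through a string one character at a time. For simple tallies like these, the `count()` method introduced earlier gives the same numbers with less code. A sketch on a short made-up sentence (not the full address):

```python
sample = "A brave band. A big battle."

# Lowercase once so that 'A' and 'a' are counted together
lowered = sample.lower()

num_of_a = lowered.count("a")
num_of_b = lowered.count("b")
num_of_periods = lowered.count(".")

print("There are {} periods, {} occurrences of the letter a, and {} of the letter b.".format(
    num_of_periods, num_of_a, num_of_b))
```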

### Counting Words

In [None]:
user_input = input("Enter a string or phrase: ")
count = user_input.strip().count(" ") + 1
print("There are {} words in the string you entered.".format(count))
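Counting spaces assumes exactly one space between words and at least one word. A sketch of an alternative uses `split()`, which with no arguments splits on any run of whitespace; `count_words` is a hypothetical helper name.

```python
def count_words(text):
    """Count whitespace-separated words in text."""
    return len(text.split())

phrase = "two   words"  # three spaces between the words

by_spaces = phrase.strip().count(" ") + 1  # the space-counting approach
by_split = count_words(phrase)

print("counting spaces:", by_spaces)
print("using split():", by_split)
print("empty string:", count_words(""))
```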

### Reordering Words

In [None]:
# Given a name as "firstname middlename lastname", reorder the string
# and print it as "lastname, firstname middlename"

name = "Eugene J. Callahan"

firstname, middlename, lastname = name.split()  # splits on whitespace by default
firstmiddle, lastname = name.split('.')  # splits on '.'; lastname keeps a leading space
lastname = lastname.strip()  # remove that leading space
print("Firstmiddle = ", firstmiddle, "lastname = ", lastname)
transformed = "{}, {} {}".format(lastname, firstname, middlename)
print(transformed)
print("name = {}".format(name)) # name is unchanged

In [None]:
# A common data format is CSV (comma-separated values); split each record with split(",")
#
# lname,fname,year,gpa
name = "Goodman,Saul,sophomore,2.88"

lastname, firstname, grade, gpa = name.split(",")
transformed = "{0} {1} ({3:<10.8f}): {2}".format(firstname,
                                                lastname,
                                                grade,
                                                float(gpa))
print(transformed)
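The same `split(",")` call extends to several records at once. In the sketch below the second record is invented purely for illustration, and `splitlines()` is used to break the text into one record per line.

```python
# lname,fname,year,gpa (the second row is made-up sample data)
records = """Goodman,Saul,sophomore,2.88
Doe,Jane,senior,3.50"""

formatted = []
for line in records.splitlines():
    lastname, firstname, year, gpa = line.split(",")
    # gpa arrives as a string, so convert it before applying a float format
    formatted.append("{} {} ({}): {:.2f}".format(firstname, lastname, year, float(gpa)))

for row in formatted:
    print(row)
```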

### Palindrome Checker

Check a simple palindrome:

In [None]:
original = "kayak"
backward = original[::-1]
print("backward = ", backward)
print("{} vs {}: {}\n".format(original, backward,
                              original == backward))

A nonsense palindrome passes the same test:

In [None]:
# nonsense palindrome:
s = "aoxomoxoa"
backward = s[::-1]
s == backward

This palindrome fails the test: the first 'M' and the last 'm' differ in case, as do the 4th character 'a' and the 4th-to-last character 'A'. The spaces and the apostrophe do not mirror each other either:

In [None]:
original = "Madam I'm Adam"
backward = original[::-1]
print("{} vs {}: {}\n".format(original, backward,
                              original == backward))

This palindrome will work, because its letters, spaces, and capitalization all mirror exactly:

In [None]:
original = "able was I ere I saw elba"
backward = original[::-1]
print("{} VS. {}: {}\n".format(original, backward,
                              original == backward))

We'll use the string module to get access to:

- `string.punctuation`
- `string.digits`
- `string.ascii_lowercase`
- `string.whitespace`

In [None]:
import string

original = "Madam I'm Adam"
original = original.lower()
print("{}: {}\n".format(original, original == original[::-1]))

bad_chars = string.whitespace + string.punctuation
for char in bad_chars:
    original = original.replace(char, "")

print("{}: {}\n".format(original,
                        original == original[::-1]))
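The cleaning steps above can be gathered into one reusable function. This is a sketch under the same assumptions as the cell above (ignore case, whitespace, and punctuation); `is_palindrome` is a hypothetical helper name.

```python
import string

def is_palindrome(text):
    """Return True if text reads the same both ways, ignoring case,
    whitespace, and punctuation."""
    cleaned = text.lower()
    for char in string.whitespace + string.punctuation:
        cleaned = cleaned.replace(char, "")
    return cleaned == cleaned[::-1]

for candidate in ["kayak", "Madam I'm Adam", "Able was I ere I saw elba", "palindrome"]:
    print("{}: {}".format(candidate, is_palindrome(candidate)))
```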