Text is one of the most common forms of data your programs will handle. You already know how to concatenate two string values with the + operator, but you can do much more than that. You can extract partial strings from string values, add or remove spacing, convert letters to lowercase or uppercase, and check that strings are formatted correctly. You can even write Python code to access the clipboard for copying and pasting text. In this module, we will study techniques for string manipulation.

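The first thing we need to know is how to use triple-quoted strings as comments. A triple-quoted string that is not assigned to anything is evaluated and then discarded, so it serves much the same purpose as a comment written with the hash sign "#", with the advantage that it can span several lines. When such a string is the first statement in a file or function, Python also stores it as that object's docstring.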

In [1]:
"""This is a test Python program.
Written by Al Sweigart al@inventwithpython.com
This program was designed for Python 3, not Python 2.
"""
def spam():
    """This is a multiline comment to help 
    explain what the spam() function does."""
print('Hello!') # the triple-quoted strings above act as comments

Hello!
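
One difference from a "#" comment is worth seeing in code. The sketch below reuses the spam() name from the cell above, with a return value added purely for illustration, and shows that the docstring is kept on the function object while the hash comment is not.

```python
def spam():
    """This is a multiline string that documents
    what the spam() function does."""
    # A hash comment, by contrast, is discarded by the interpreter.
    return 'Hello!'

# The docstring is stored on the function object as __doc__.
doc = spam.__doc__
print(doc)
print(spam())
```

Tools such as help(spam) read this same __doc__ attribute.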


The second thing we need to realize is that many methods are associated with strings. Below are some examples.

In [2]:
spam = 'Hello world!'
spam = spam.upper()
print(spam)
spam = spam.lower()
print(spam)

HELLO WORLD!
hello world!


The isupper() and islower() methods will return a Boolean 'True' value if the string has at least one letter and all the letters are uppercase or lowercase, respectively. Otherwise, the method returns 'False'.

In [3]:
Janice='Chandler Bing!'
print(Janice.islower())
print(Janice.isupper())
Janice2='O-M-G!'
print(Janice2.isupper()) # True

False
False
True
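
The "at least one letter" condition is easy to overlook. As a small illustration (the sample strings are made up for this example), the loop below shows that digits and punctuation are ignored, and that a string with no letters at all fails both tests.

```python
samples = ['HELLO', 'Hello', 'hello', '12345', 'ABC123', '']

# Map each sample to its (isupper, islower) pair.
results = {s: (s.isupper(), s.islower()) for s in samples}

for s, (upper, lower) in results.items():
    print(repr(s), upper, lower)
```

Note that '12345' and the empty string are neither uppercase nor lowercase, while 'ABC123' still counts as uppercase because its only letters are uppercase.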


Along with islower() and isupper(), there are several string methods that have names beginning with the word 'is'. These methods return Boolean values that describe the nature of the string. Here are some common 'isX' string methods:
1) isalpha() returns 'True' if the string consists only of letters and is not blank.
2) isalnum() returns 'True' if the string consists only of letters and numbers and is not blank.
3) isdecimal() returns 'True' if the string consists only of numeric characters and is not blank.
4) isspace() returns 'True' if the string consists only of spaces, tabs, and newlines and is not blank.
5) istitle() returns 'True' if the string consists only of words that begin with an uppercase letter followed by only lowercase letters.

In [1]:
s1=['hello', 'hello123', '123', 'Hello123', '', 'Hello']
print(s1[0].isalpha()) # True
print(s1[1].isalpha()) # False
print(s1[0].isalnum()) # True
print(s1[1].isalnum()) # True
print(s1[2].isdecimal()) # True
print(s1[4].isspace()) # False

True
False
True
True
True
False
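
The cell above does not exercise istitle() or a non-empty isspace() case, so here is a short supplementary sketch. The sample strings and the is_valid_age() helper are hypothetical, added only to show how an isX method can validate input.

```python
titles = ['This Is Title Case', 'This Is not Title Case', 'This Is NOT Title Case Either']
title_flags = [t.istitle() for t in titles]
print(title_flags)

whitespace_flags = [' \t\n'.isspace(), ''.isspace(), ' a '.isspace()]
print(whitespace_flags)

def is_valid_age(text):
    """Return True if text contains only decimal digits (hypothetical helper)."""
    return text.isdecimal()

for candidate in ['42', 'forty two', '-5', '']:
    print(repr(candidate), is_valid_age(candidate))
```

Because isdecimal() rejects the minus sign, the decimal point, and the empty string, it is a fairly strict check; whether that strictness is what you want depends on the input you expect.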


The startswith() and endswith() methods return 'True' if the string value they are called on begins or ends (respectively) with the string passed to the method; otherwise, they return 'False'.

In [2]:
print('Hello world!'.startswith('Hello'))
print('Hello world!'.endswith('world!'))

True
True
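
Two details of these methods are worth a quick sketch: the comparison is case-sensitive, and both methods also accept a tuple of candidate strings. The file names below are invented for the example.

```python
# The check is case-sensitive.
case_check = 'Hello world!'.startswith('hello')
print(case_check)

# A tuple argument matches if any one of its strings matches.
filenames = ['report.txt', 'photo.png', 'notes.TXT', 'data.csv']
text_like = [name for name in filenames
             if name.lower().endswith(('.txt', '.csv'))]
print(text_like)
```

Lowercasing each name first is one simple way to make the suffix test ignore case.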


The join() method is useful when you have a list of strings that need to be joined together into a single string value. The join() method is called on a string, gets passed a list of strings, and returns a string. The returned string is the concatenation of each string in the passed-in list. Notice that the string join() is called on is inserted between each pair of strings in the list argument. For example, when join(['cats', 'rats', 'bats']) is called on the ', ' string, the returned string is 'cats, rats, bats'.

In [3]:
s2=', '
animals=['cats', 'rats', 'bats']
print(s2.join(animals))
s3=' '
sentences=['My', 'name', 'is', 'Simon']
print(s3.join(sentences))

cats, rats, bats
My name is Simon
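
Since join() expects every item to be a string, passing a list of numbers raises a TypeError. The following sketch, using a made-up list of integers, shows the failure and one common workaround of converting each item with str() first.

```python
numbers = [1, 2, 3]

try:
    ', '.join(numbers)  # fails: the items are ints, not strings
    failed = False
except TypeError as err:
    failed = True
    print('join() failed:', err)

# Convert each item to a string before joining.
joined = ', '.join(str(n) for n in numbers)
print(joined)
```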


The split() method does the opposite: It’s called on a string value and returns a list of strings.

In [4]:
'My name is Simon'.split()

['My', 'name', 'is', 'Simon']

By default, the string 'My name is Simon' is split wherever whitespace characters such as the space, tab, or newline characters are found. These whitespace characters are not included in the strings in the returned list. You can pass a delimiter string to the split() method to specify a different string to split upon.

In [5]:
s4='MyABCnameABCisABCSimon'
print(s4.split('ABC'))

['My', 'name', 'is', 'Simon']
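
A common use of a delimiter is splitting a multiline string into lines. The sketch below also contrasts split() with split(' '), which behave differently when the text contains runs of spaces; the sample strings are illustrative.

```python
message = 'Dear Alice,\nHow have you been?\nSincerely,\nBob'
lines = message.split('\n')
print(lines)

spaced = 'My  name   is Simon'
default_split = spaced.split()     # runs of whitespace count as one separator
space_split = spaced.split(' ')    # every single space is a separator
print(default_split)
print(space_split)
```

With an explicit ' ' delimiter, consecutive spaces produce empty strings in the result, so the default form is usually the better choice for splitting words.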


Next, the rjust() and ljust() string methods return a padded version of the string they are called on, with spaces inserted to justify the text. The first argument to both methods is an integer length for the justified string. For example, 'Hello'.rjust(10) says that we want to right-justify 'Hello' in a string of total length 10. 'Hello' is five characters, so five spaces will be added to its left, giving us a string of 10 characters with 'Hello' justified right. In addition, an optional second argument to rjust() and ljust() will specify a fill character other than a space character. The center() string method works like ljust() and rjust() but centers the text rather than justifying it to the left or right. Below are some examples:

In [6]:
print('Hello'.rjust(10))
print('Hello'.ljust(20, '-'))

     Hello
Hello---------------
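
One case the examples above leave out is a width smaller than the string itself. As a brief sketch, the padding methods never truncate: they return the original text unchanged when no padding is needed.

```python
word = 'Hello'

unchanged = word.rjust(3)        # width is shorter than the string
padded = word.rjust(10, '*')     # five fill characters on the left
left_padded = word.ljust(8, '.') # three fill characters on the right

print(repr(unchanged))
print(repr(padded))
print(repr(left_padded))
```

Using repr() here makes any padding visible in the printed result.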


In [10]:
print('Hello'.center(20))
print('Hello'.center(20, '='))

       Hello        
=======Hello========


As an example, here is a more complicated program that uses these string methods. In this program, we define a printPicnic() function that takes a dictionary of information and uses center(), ljust(), and rjust() to display that information in a neatly aligned, table-like format.

In [7]:
def printPicnic(itemsDict, leftWidth, rightWidth):
    # Title row, centered across the full table width and padded with hyphens.
    print('PICNIC ITEMS'.center(leftWidth + rightWidth, '-'))
    for k, v in itemsDict.items():
        # Item name padded with periods on the right, quantity right-justified.
        print(k.ljust(leftWidth, '.') + str(v).rjust(rightWidth))

picnicItems = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'dumplings': 80}
printPicnic(picnicItems, 12, 5)
printPicnic(picnicItems, 20, 6)

---PICNIC ITEMS--
sandwiches..    4
apples......   12
cups........    4
dumplings...   80
-------PICNIC ITEMS-------
sandwiches..........     4
apples..............    12
cups................     4
dumplings...........    80
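
The column widths above are chosen by hand. As one possible variation, the sketch below derives them from the data instead; format_picnic() and its gap parameter are hypothetical names introduced for this illustration, and the function returns the lines rather than printing them so that they are easy to inspect.

```python
def format_picnic(items, gap=2):
    """Build the table lines, sizing each column from its longest entry."""
    left_width = max(len(name) for name in items) + gap
    right_width = max(len(str(count)) for count in items.values()) + gap
    lines = ['PICNIC ITEMS'.center(left_width + right_width, '-')]
    for name, count in items.items():
        lines.append(name.ljust(left_width, '.') + str(count).rjust(right_width))
    return lines

picnic_items = {'sandwiches': 4, 'apples': 12, 'cups': 4, 'dumplings': 80}
table = format_picnic(picnic_items)
print('\n'.join(table))
```

Every line comes out the same length, because each one is padded to left_width + right_width characters.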


Sometimes you may want to strip off whitespace characters (space, tab, and newline) from the left side, right side, or both sides of a string. The strip() string method will return a new string without any whitespace characters at the beginning or end. This is very similar to the strip() function in SAS. The lstrip() and rstrip() methods will remove whitespace characters from the left and right ends, respectively.

In [8]:
text = '    Hello World              '
print(text.strip()) # whitespace removed from both ends
print(text.lstrip()) # leading whitespace removed; trailing spaces remain
print(text.rstrip()) # trailing whitespace removed; leading spaces remain
garbage = 'SpamSpamBaconSpamEggsSpamSpam'
print(garbage.strip('ampS')) # a string argument specifies which characters to strip from the ends

Hello World
Hello World              
    Hello World
BaconSpamEggs
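
The argument to strip() is treated as a set of characters rather than as a prefix or suffix, so the order of the characters does not matter and characters in the middle of the string are left alone. The sketch below illustrates this, along with a typical clean-up of user input; the raw and url values are made up for the example.

```python
garbage = 'SpamSpamBaconSpamEggsSpamSpam'
same_result = garbage.strip('ampS') == garbage.strip('Spam')
print(same_result)

# Only the ends are affected; the 'Spam' in the middle survives.
print(garbage.strip('Spam'))

# Each character in the argument is stripped individually from both ends.
url = 'www.example.com'
stripped = url.strip('w.moc')
print(stripped)

# Typical clean-up of typed input before comparing it.
raw = '   Yes\n'
cleaned = raw.strip().lower()
print(repr(cleaned))
```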
