# Strings

### Strings are the basic data type for storing text.

We can create a string with quotes: single `'`, double `"`, triple `'''` - all quotes work in Python!

In [1]:
str0 = '''Prof.'''
str1 = 'Daniel'
str2 = "Trugman"
print(str0,str1,str2) # print() puts a space between its arguments

Prof. Daniel Trugman
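
Having several quote styles is handy when the text itself contains quotes, or when it spans more than one line. A small sketch (the variable names and sample text here are made up for illustration):

```python
quote1 = "It's easy"         # a single quote inside double quotes
quote2 = 'She said "hello"'  # double quotes inside single quotes
multi = '''Line one
Line two'''                  # triple quotes may span several lines
print(quote1)
print(quote2)
print(multi)
```

The triple-quoted string stores the line break as a newline character.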


### Use `\` for special characters
* `\n` - newline
* `\t` - tab

In [2]:
print('This is how \n you print on 2 lines')

This is how 
 you print on 2 lines


In [3]:
print('This is how \nyou print on 2 lines')

This is how 
you print on 2 lines


In [4]:
print('A tab\tbetween words')

A tab	between words


In [5]:
print("What if we want an actual backslash? \\")

What if we want an actual backslash? \
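
When a string contains many backslashes, doubling each one gets tedious. A raw string, written with an `r` prefix, leaves backslashes as typed. A minimal sketch; the path below is hypothetical and only serves as an illustration:

```python
# Hypothetical path, used only to illustrate escaping
escaped = 'C:\\data\\new_table.txt'  # every backslash doubled
raw = r'C:\data\new_table.txt'       # raw string: backslashes kept as typed
print(escaped)
print(raw)
print(escaped == raw)  # both spellings build the same string
```

Without the `r` prefix or the doubling, the `\n` in `\new_table` would be read as a newline.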


### Combine strings with `+`

In [6]:
str3 = str0 + str1 + str2
print(str3) # no spaces

Prof.DanielTrugman


In [7]:
str3 = str0 + ' ' + str1 + ' ' + str2
print(str3) # add spaces

Prof. Daniel Trugman


### What happens when you multiply strings by a number?

In [8]:
str4 = 3 * str1
print(str4)

DanielDanielDaniel


In [9]:
str5 = 5 * '5'
print(str5)

55555


### Access string elements just like indexing with a list
Think of strings as a list of characters: `"abcd" = 'a' + 'b' + 'c' + 'd'`

In [10]:
print(str1)
print(str1[0], str1[-1]) # first and last character

Daniel
D l


### You can slice strings by index just like normal lists

In [11]:
print(str2)
print(str2[0:5]) # first 5 characters only

Trugman
Trugm


In [12]:
print(str2[0::2]) # every other character

Tumn


In [13]:
print(str2[5:]) # start at index 5 and go to the end

an


### Strings are immutable!!!
This means we cannot change characters in a string once it is created. (Lists, by contrast, can be modified in place.)

In [14]:
str1 = 'eric' # define a string
str1[3] = 'k' # change the spelling

TypeError: 'str' object does not support item assignment

In [15]:
str1 = 'eric' # define a string
str1[4] = 'k' # can't append either

TypeError: 'str' object does not support item assignment
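
To make the contrast with lists concrete, here is a small sketch that tries the same item assignment on a list and on a string. It uses `try`/`except` (which may not have been introduced yet) only so that the example runs to completion; the variable names are illustrative.

```python
letters = ['e', 'r', 'i', 'c']  # a list of characters
letters[3] = 'k'                # lists are mutable, so this works
print(letters)

name = 'eric'
try:
    name[3] = 'k'               # strings are immutable, so this fails
except TypeError as err:
    print('TypeError:', err)
print(name)                     # the string is unchanged
```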

#### A workaround: redefine a string to update it

In [16]:
str1 = 'eric' # define a string
str1 = str1[0:-1] + 'k' # a new string with character replaced
print(str1)

erik


In [17]:
str1 = 'eric' # define a string
str1 += 'k' # append to end by redefinition
print(str1)

erick


### `len()` returns length of string (number of characters)

In [18]:
print(len(str1), len(str2)) # analogous to list

5 7


### Use `split()` for splitting a string into pieces using a break character (e.g. space, comma).

This is especially useful when breaking a long block of text into words, or parsing a text file with a known format.

In [19]:
# Make a sentence
sentence = 'This class     teaches fundamental and scientific Python'
print(sentence)

# Split with no arguments will split on white space by default
words = sentence.split()
print(words)

This class     teaches fundamental and scientific Python
['This', 'class', 'teaches', 'fundamental', 'and', 'scientific', 'Python']


In [20]:
# What happens if there is leading or trailing whitespace?
sentence = '  This class teaches fundamental Python and scientific Python  '
words = sentence.split()
print(words)

['This', 'class', 'teaches', 'fundamental', 'Python', 'and', 'scientific', 'Python']


In [21]:
# We can split with other delimiters like a comma
languages='English, French, German, Japanese, Spanish, Chinese'
words2 = languages.split(',') # comma delimiter
print(words2) # Note that all but the first word keep a leading space

['English', ' French', ' German', ' Japanese', ' Spanish', ' Chinese']


In [22]:
# Compare to default options
words3 = languages.split()
print(words3) # These words don't have spaces, but have commas

['English,', 'French,', 'German,', 'Japanese,', 'Spanish,', 'Chinese']
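
Neither result above is quite clean. Two possible ways to get words with no stray spaces or commas are sketched below: split on the full two-character delimiter `', '`, or split on the comma and `strip()` each piece. The names `words4` and `words5` are new here, and the second option uses a list comprehension, which is just a compact loop.

```python
languages = 'English, French, German, Japanese, Spanish, Chinese'

# Option 1: split on the full two-character delimiter (comma + space)
words4 = languages.split(', ')
print(words4)

# Option 2: split on the comma, then strip whitespace from each piece
words5 = [word.strip() for word in languages.split(',')]
print(words5)
```

Option 2 is the more forgiving choice when the spacing after each comma is inconsistent.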


### Remove or change characters with .replace()
* `mystr.replace(',', '')` returns a copy of `mystr` with every comma removed

In [23]:
# notice the usage of \ to continue writing on a new line
mystring = 'that this nation, under God, shall have a new birth of freedom -\
and that government of the people, by the people, for the people shall not \
perish from the earth.'

In [24]:
# Note that because strings are immutable, we can't do this "in-place"
mystring = mystring.replace(',', '') 
mystring = mystring.replace('.', '')
mystring = mystring.replace('-', '')
print(mystring)

that this nation under God shall have a new birth of freedom and that government of the people by the people for the people shall not perish from the earth


In [25]:
mystring = 'that this nation, under God, shall have a new birth of freedom -\
and that government of the people, by the people, for the people shall not \
perish from the earth.'

In [26]:
# this doesn't do anything to mystring - why?
mystring.replace(',', '')
print(mystring)

that this nation, under God, shall have a new birth of freedom -and that government of the people, by the people, for the people shall not perish from the earth.


#### You can chain multiple operations like `.replace().replace().split()`

The operations are performed left to right: each method acts on the string returned by the call before it.

In [27]:
mystring = 'that this nation, under God, shall have a new birth of freedom - \
and that government of the people, by the people, for the people shall not \
perish from the earth.'

# this will remove all punctuation and split into a list of words at the end
wordlist = mystring.replace(',', '').replace('.', '').replace('-', '').split()
print(wordlist)

['that', 'this', 'nation', 'under', 'God', 'shall', 'have', 'a', 'new', 'birth', 'of', 'freedom', 'and', 'that', 'government', 'of', 'the', 'people', 'by', 'the', 'people', 'for', 'the', 'people', 'shall', 'not', 'perish', 'from', 'the', 'earth']
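
One way to check the order in which a chain is evaluated is to pick operations whose order matters. In the sketch below (the variable names are just for illustration), each call acts on the result of the call written to its left.

```python
word = 'Hello'

# lower() runs first, then replace() acts on its result
chained = word.lower().replace('h', 'j')
print(chained)

# The same chain written out one step at a time
step1 = word.lower()             # 'hello'
step2 = step1.replace('h', 'j')  # 'jello'
print(step2)

# Swapping the order gives a different answer:
# 'Hello' has no lowercase 'h' for replace() to find
swapped = word.replace('h', 'j').lower()
print(swapped)
```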


### The opposite of `split()` is `join()`

In [28]:
str3 = " ".join(["Hello", "world"]) # puts a space between Hello and world
print(str3)

Hello world


In [29]:
str3 = "___".join(["Hello", "world"]) # puts ___ between Hello and world
print(str3)

Hello___world


In [30]:
animals = ['giraffe', 'monkey', 'lion', 'elephant']
str_animals = ', '.join(animals) # convert the list to a phrase
print(str_animals)

giraffe, monkey, lion, elephant


In [31]:
mylist = [3.0, 4, 5.0, 6.3]
test = ', '.join(mylist) # whoops
print(test)

TypeError: sequence item 0: expected str instance, float found
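
The error arises because `join()` only accepts strings. A possible fix, sketched here, is to convert each item with `str()` before joining:

```python
mylist = [3.0, 4, 5.0, 6.3]
# join() needs strings, so convert each item with str() first
test = ', '.join(str(item) for item in mylist)
print(test)
```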

### Strings have many built-in methods for things like capitalizing, lowercase, uppercase, etc.
https://docs.python.org/3/library/stdtypes.html#string-methods

In [32]:
names = 'John, Rick, Mary, Sebastian, Jaime'
names = names.lower() # all lowercase
print(names)

john, rick, mary, sebastian, jaime


In [33]:
names = names.upper() # all uppercase
print(names)

JOHN, RICK, MARY, SEBASTIAN, JAIME


In [34]:
names = names.capitalize() # first letter caps, only
print(names)

John, rick, mary, sebastian, jaime


In [35]:
### A large block of text to process... 
mystring = 'Fourscore and seven years ago our fathers brought forth on this continent, a new nation, \
conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are \
engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, \
can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of \
that field, as a final resting place for those who here gave their lives that that nation might live. \
It is altogether fitting and proper that we should do this. But, in a larger sense, we can not dedicate - \
we can not consecrate - we can not hallow - this ground. The brave men, living and dead, who struggled here, \
have consecrated it, far above our poor power to add or detract. The world will little note, nor long \
remember what we say here, but it can never forget what they did here. It is for us the living, rather, \
to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. \
It is rather for us to be here dedicated to the great task remaining before us - that from these honored \
dead we take increased devotion to that cause for which they gave the last full measure of devotion - that \
we here highly resolve that these dead shall not have died in vain - that this nation, under God, shall \
have a new birth of freedom - and that government of the people, by the people, for the people shall not \
perish from the earth.'

# We did this last time, now hopefully it makes sense!
# remove punctuation, convert to lowercase, split into words
wordlist = mystring.replace(',', '').replace('.','').replace('-', '').lower().split()
print(wordlist)

['fourscore', 'and', 'seven', 'years', 'ago', 'our', 'fathers', 'brought', 'forth', 'on', 'this', 'continent', 'a', 'new', 'nation', 'conceived', 'in', 'liberty', 'and', 'dedicated', 'to', 'the', 'proposition', 'that', 'all', 'men', 'are', 'created', 'equal', 'now', 'we', 'are', 'engaged', 'in', 'a', 'great', 'civil', 'war', 'testing', 'whether', 'that', 'nation', 'or', 'any', 'nation', 'so', 'conceived', 'and', 'so', 'dedicated', 'can', 'long', 'endure', 'we', 'are', 'met', 'on', 'a', 'great', 'battlefield', 'of', 'that', 'war', 'we', 'have', 'come', 'to', 'dedicate', 'a', 'portion', 'of', 'that', 'field', 'as', 'a', 'final', 'resting', 'place', 'for', 'those', 'who', 'here', 'gave', 'their', 'lives', 'that', 'that', 'nation', 'might', 'live', 'it', 'is', 'altogether', 'fitting', 'and', 'proper', 'that', 'we', 'should', 'do', 'this', 'but', 'in', 'a', 'larger', 'sense', 'we', 'can', 'not', 'dedicate', 'we', 'can', 'not', 'consecrate', 'we', 'can', 'not', 'hallow', 'this', 'ground', 't

### Convert numbers to strings using `str()`

In [36]:
k = 123 # an int
print(k,"has type:", type(k))
kstr = str(k) # now a string
print(kstr,"has type:", type(kstr))

123 has type: <class 'int'>
123 has type: <class 'str'>
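
The conversion also works in the other direction, which matters when reading numbers from text. A brief sketch using the built-in `int()` and `float()`; the variable names are illustrative:

```python
kstr = '123'
as_text = kstr + '1'           # '+' on strings concatenates
as_number = int(kstr) + 1      # convert to int first to do arithmetic
as_float = float('6.3') * 2    # float() parses decimal text
print(as_text, as_number, as_float)
```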


### Locate substrings with `find()`
This returns the index in a larger string where a *substring* starts, or `-1` if the substring is not present. (A substring is just a small piece of a larger string, one or more characters.) `find()` only finds the first occurrence!

In [37]:
print(str_animals) # we defined this above
print("0123456789012345678901234567890") # indices for reference

giraffe, monkey, lion, elephant
0123456789012345678901234567890


In [38]:
print(str_animals.find('monkey')) # starts at index 9

9


In [39]:
print(str_animals.find('giraffe')) # starts at index 0

0


In [40]:
print(str_animals.find('longhorn')) # not in our list

-1


In [41]:
print(str_animals) # defined above
print(str_animals.find('a')) # searches from left to right
print(str_animals.rfind('a')) # search from right to left

giraffe, monkey, lion, elephant
3
28
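
Because `find()` stops at the first match, locating later occurrences takes a little more work. `find()` accepts an optional second argument giving the index at which to start searching, so one possible approach is to restart the search just past each hit. A sketch, with the string redefined so the example is self-contained:

```python
str_animals = 'giraffe, monkey, lion, elephant'

first = str_animals.find('e')              # first 'e'
second = str_animals.find('e', first + 1)  # resume just past the first hit
print(first, second)

# Collect every index where 'e' occurs
positions = []
idx = str_animals.find('e')
while idx != -1:                 # find() returns -1 when nothing is left
    positions.append(idx)
    idx = str_animals.find('e', idx + 1)
print(positions)
```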


### Count substrings with `count()`

In [42]:
print(str_animals.count('a')) # appears twice

2


In [43]:
print(str_animals.count('g')) # appears once

1


In [44]:
print(str_animals.count('z')) # appears not at all

0


### The `string` module has lots of interesting string constants

In [45]:
import string # need to import to use

In [46]:
# lowercase letters
print(string.ascii_lowercase)

abcdefghijklmnopqrstuvwxyz


In [47]:
# uppercase letters
print(string.ascii_uppercase)

ABCDEFGHIJKLMNOPQRSTUVWXYZ


In [48]:
# digits 0-9
print(string.digits)

0123456789


In [49]:
# punctuation characters
print(string.punctuation)

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


### Example: remove punctuation

In [50]:
# replace all punctuation as follows
mytext = 'Smash! What happened?'
print(mytext)
for val in string.punctuation:
    mytext = mytext.replace(val, '')
print(mytext)

Smash! What happened?
Smash What happened
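
The loop above calls `replace()` once per punctuation character. An alternative, offered here as a sketch rather than as the recommended way, is `str.maketrans()` together with `translate()`, which deletes all of the listed characters in one pass:

```python
import string

mytext = 'Smash! What happened?'
# The third argument to maketrans() lists the characters to delete
table = str.maketrans('', '', string.punctuation)
cleaned = mytext.translate(table)
print(cleaned)
```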


### String formatting: An Introduction
Using `format()` can be very useful, but takes some getting used to. We'll return to this later, but for those who are interested in diving in now, check this out: https://docs.python.org/3/library/string.html#format-string-syntax

Two styles of formatting are covered here, one using the character `%` and one using the characters `{:}`. The first is similar to `printf`-style formatting in languages such as C. The second is more flexible and recommended once you get the hang of it.

In the example below, notice how the letter after `%` controls how each value is printed: `f` for fixed-point decimals, `d` for integers, `e` for scientific notation, and `s` for strings.

In [51]:
# "old-school formatting" with % is a bit less flexible than the new school approach
a = 3.0
b = 30
c = 300000
d = "three hundred"
print("a=%.1f b=%d c=%.1e d='%s'" %(a, b, c, d))

a=3.0 b=30 c=3.0e+05 d='three hundred'


The block of code below does the same thing but with the new-school formatting strings `{:}`.

In [52]:
# new school formatting, with the .format()
a = 3.0
b = 30
c = 300000
d = "three hundred"
print("a={:.1f} b={:d} c={:.1e} d='{:s}'".format(a, b, c, d))

a=3.0 b=30 c=3.0e+05 d='three hundred'


We can get pretty fancy with padding strings in different ways...

In [53]:
# :> pads on left
a = 3.0
b = 30
c = 300000
d = "three hundred"
print("a={:>8.4f} b={:d} c={:.4e} d='{:>15s}'".format(a, b, c, d)) 

a=  3.0000 b=30 c=3.0000e+05 d='  three hundred'


In [54]:
# :< pads on right
a = 3.0
b = 30
c = 300000
d = "three hundred"
print("a={:<8.4f} b={:d} c={:.4e} d='{:<15s}'".format(a, b, c, d)) 

a=3.0000   b=30 c=3.0000e+05 d='three hundred  '
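
The same `{:}` format codes also work inside f-strings (available in Python 3.6 and later), where the variable name goes directly inside the braces. This is a sketch for comparison with the `.format()` calls above, not a replacement for them; `line` and `padded` are names introduced here.

```python
a = 3.0
b = 30
c = 300000
d = "three hundred"

# The format spec follows the colon, exactly as with .format()
line = f"a={a:.1f} b={b:d} c={c:.1e} d='{d:s}'"
print(line)

# Padding works the same way
padded = f"a={a:>8.4f} d='{d:<15s}'"
print(padded)
```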


There is a lot more that can be done; see https://docs.python.org/3/library/string.html#format-string-syntax to get started.

# Summary
* Strings store textual data
* Strings are sequences of characters and can be indexed or sliced
* Strings are immutable: they cannot be modified "in-place", but a variable can be reassigned to a new string
* There are many handy built-in methods: `split()`, `replace()`, `join()`, etc.
* String formatting can be used to control printing of numeric data
* Knowing these tricks will be useful when parsing tricky text files of data