## More fun with the Python built-in string type

[Strings](#str)

[Searching strings](#search)

[Regular expressions](#reg)

[Format](#format)

[Dict](#dict)

## <a name='str'></a>Strings

Let's start with a few more basic str methods. Besides tab completion in Jupyter, you can always use dir() to list the attributes and methods of an object or class.

In [1]:
dir('')

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
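
Most of the double-underscore entries are hooks that operators and built-in functions rely on (for example, __add__ is what + calls). Here is a small sketch that lists only the public methods and prints one method's docstring. The exact list depends on your Python version, because newer releases add methods.

```python
# Keep only the public methods (names that don't start with an underscore)
public = [name for name in dir(str) if not name.startswith('_')]
print(public[:5])

# Every method carries its documentation in __doc__ (help() prints the same text)
print(str.capitalize.__doc__)
```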

capitalize() uppercases the first character of the string and lowercases all the others. It knows nothing about sentences, so a letter that follows a period is not capitalized.

In [2]:
s = "hello world. how are you?"
s.capitalize()

'Hello world. how are you?'
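
capitalize() also lowercases the rest of the string, and title() works on every word instead of only the first one. The sketch below compares the two, including a well-known quirk of title():

```python
# capitalize() uppercases the first character and lowercases everything else
print('hELLO wORLD'.capitalize())   # Hello world

# title() uppercases the first letter of every word
print('hello world'.title())        # Hello World

# title() treats any non-letter as a word boundary, so apostrophes trip it up
print("they're here".title())       # They'Re Here
```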

Let's capitalize each sentence in a one-liner. There's a lot going on here:
* Think about what '. '.join() means and why it's valid
* Looks like we called a method from a method. What does that mean? What if we reverse the order?
* What do you think lstrip does?
* What about punctuation other than a period?

In [3]:
cap = '. '.join([sentence.lstrip().capitalize() for sentence in s.split('.')])
print(cap)

Hello world. How are you?


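If we need to compare letters but don't care about case, we can lowercase everything. casefold() works like lower() but is more aggressive, so it is the better choice for caseless comparisons: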

In [6]:
s.casefold()  # returns a lowercased copy; s itself is unchanged because strings are immutable
s = s.casefold()  # reassign the variable to keep the result
print(s)

hello world. how are you?
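
The difference between lower() and casefold() only appears with certain non-ASCII letters. The German sharp s is the usual example: casefold() maps 'ß' to 'ss', but lower() leaves it unchanged.

```python
a, b = 'Straße', 'STRASSE'

# lower() keeps 'ß', so the two spellings still differ
print(a.lower() == b.lower())        # False  ('straße' vs 'strasse')

# casefold() turns 'ß' into 'ss', so the caseless comparison succeeds
print(a.casefold() == b.casefold())  # True
```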


Or uppercase if we're feeling loud:

In [7]:
s.upper()

'HELLO WORLD. HOW ARE YOU?'

## <a name="search">Searching strings
Remember how find() only returns the first index? What if we want to find all?

In [8]:
print(s)
s.find('o')

hello world. how are you?


4

In [9]:
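def findall(string, char, start=0):
    # find the first match at or after `start`
    index = string.find(char, start)
    if index > -1:
        # keep this index, then search the rest of the string recursively
        return [index] + findall(string, char, index + 1)
    return []  # no more matches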

In [10]:
findall(s.casefold(),'o')

[4, 7, 14, 22]
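
The recursive version adds one stack frame per match, so a string with more matches than Python's recursion limit (1000 by default, see sys.getrecursionlimit()) would raise RecursionError. Below is a loop-based sketch that does the same thing. The name findall_iter is made up for this example.

```python
def findall_iter(string, sub, start=0):
    """Return every index where sub occurs in string, scanning left to right."""
    indices = []
    index = string.find(sub, start)
    while index != -1:
        indices.append(index)
        # resume one past the last hit, so overlapping matches are found too
        index = string.find(sub, index + 1)
    return indices

print(findall_iter('hello world. how are you?', 'o'))  # [4, 7, 14, 22]
print(findall_iter('aaaa', 'aa'))                       # [0, 1, 2]
```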

Or maybe avoid find altogether. Remember enumerate?

In [11]:
[i for i, ch in enumerate(s) if ch == 'h']  # compares single characters only, so it can't find substrings

[0, 13]

In [12]:
s.find('are')  # s.index() does the same but raises ValueError instead of returning -1 if not found

17
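
A few related methods are worth knowing. rfind() searches from the right, count() counts non-overlapping occurrences, and the in operator checks for membership when the position doesn't matter. The sketch also shows the ValueError that index() raises:

```python
s = 'hello world. how are you?'

print(s.rfind('o'))     # 22  (last occurrence instead of first)
print(s.count('o'))     # 4
print('are' in s)       # True  (membership test, no index needed)
print(s.find('xyz'))    # -1

try:
    s.index('xyz')
except ValueError as err:
    print('ValueError:', err)   # ValueError: substring not found
```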

## <a name="reg">Regular Expressions
With the __re__ module from the standard library we can use regular expressions. This is just a small taste of what it can do. For more, see the documentation:
https://docs.python.org/3/library/re.html

In [13]:
import re

In [14]:
result = re.search('are',s)
print(s)
result.span()

hello world. how are you?


(17, 20)
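
re.search() stops at the first match, just like find(). To get every match, re.finditer() yields a Match object for each non-overlapping match, which gives another way to write the findall helper from the previous section. re.findall() returns the matched text instead:

```python
import re

s = 'hello world. how are you?'

# start() of every match, equivalent to the findall helper above
print([m.start() for m in re.finditer('o', s)])   # [4, 7, 14, 22]

# \w+ matches runs of word characters, so this splits out the words
print(re.findall(r'\w+', s))   # ['hello', 'world', 'how', 'are', 'you']
```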

In [15]:
code = 'US123A4'
re.fullmatch('[A-Z]{2}[0-9]{3}[A-Z][0-9]',code)

<_sre.SRE_Match object; span=(0, 7), match='US123A4'>

In [16]:
re.fullmatch('[A-Z]{8}', code) is None  # compare with None using `is`

True
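
re has three closely related functions. match() anchors the pattern at the start of the string only, fullmatch() requires the whole string to match, and search() finds the first match anywhere. Parentheses create groups that pull out the pieces of a match:

```python
import re

code = 'US123A4'

print(re.match('[A-Z]{2}', code).group())    # US    (must match at the start)
print(re.fullmatch('[A-Z]{2}', code))        # None  (the whole string must match)
print(re.search('[0-9]+', code).group())     # 123   (first match anywhere)
print(re.match('[0-9]+', code))              # None  (code doesn't start with a digit)

# groups() returns what each parenthesized part matched
parts = re.fullmatch(r'([A-Z]{2})([0-9]{3})([A-Z][0-9])', code)
print(parts.groups())                        # ('US', '123', 'A4')
```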

Strings are immutable. That is also what makes them hashable, so they can be used as keys in a dictionary:

In [17]:
# s[4] = 'a' # throws an error
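
The commented-out line raises a TypeError. The sketch below catches it, builds the changed string from slices instead, and shows the same rule in reverse: a mutable list is unhashable and can't be used as a dict key.

```python
s = 'hello world. how are you?'

try:
    s[4] = 'a'
except TypeError as err:
    print(err)   # 'str' object does not support item assignment

# Build a new string from slices instead, then reassign it if you want to keep it
t = s[:4] + 'a' + s[5:]
print(t)         # hella world. how are you?

# Lists are mutable and therefore unhashable, so they can't be dict keys
try:
    {['a', 'b']: 1}
except TypeError as err:
    print(err)   # unhashable type: 'list'
```
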

But we can build a modified copy of the string and, if we want to keep it, reassign the variable.

In [18]:
s.replace('o','a')

'hella warld. haw are yau?'

In [19]:
s.translate({ord('o'):'a'})

'hella warld. haw are yau?'
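
Writing the translate() dictionary by hand gets tedious. str.maketrans() builds the table for you from two equal-length strings, and an optional third argument lists characters to delete:

```python
s = 'hello world. how are you?'

# each character in the first string maps to the one at the same position in the second
table = str.maketrans('ol', 'OL')
print(s.translate(table))      # heLLO wOrLd. hOw are yOu?

# characters in the third argument are removed
no_vowels = str.maketrans('', '', 'aeiou')
print(s.translate(no_vowels))  # hll wrld. hw r y?
```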

We also have a few methods that test certain properties of strings and characters. Take some time to play with these.

In [20]:
s.islower()  # also try .isupper()

True

In [21]:
'ac'.isalpha()  # also try .isnumeric()

True

In [22]:
'   O     '.strip() # also try .rstrip()

'O'
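
Two details that often surprise people. First, isdecimal(), isdigit() and isnumeric() accept progressively more Unicode characters. Second, strip() and its variants take an optional argument listing the characters to remove:

```python
# every decimal is a digit, and every digit is numeric, but not the other way round
for ch in ['7', '²', '½']:
    print(ch, ch.isdecimal(), ch.isdigit(), ch.isnumeric())
# 7 True True True
# ² False True True
# ½ False False True

# strip() removes any of the given characters from both ends
print('--==title==--'.strip('-='))   # title
print('  padded  '.lstrip() + '|')   # padded  |
```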

## <a name="format">Format
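format() is very useful. It fills the {} placeholders in a string with values, so one message can change based on a few variables, and it converts numbers and other non-string values for you. In Jupyter, s.format?? shows its documentation.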

In [28]:
s.format??

In [24]:
message = 'Say "{0}" {num} times'

In [25]:
print(message.format(s,num=5))

Say "hello world. how are you?" 5 times


In [29]:
'{:010.5f},{:50d}'.format(3.14159265359,100)

'0003.14159,                                               100'
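
The text after the colon is a format specification. In {:010.5f}, the leading 0 pads with zeros, 10 is the total width, .5 keeps five decimals, and f means fixed-point. In {:50d}, d formats an integer in a 50-character field, and numbers are right-aligned by default. The sketch below shows a few more specifications, written as f-strings because they use the same syntax:

```python
pi = 3.14159265359

print(f'{pi:010.5f}')    # 0003.14159  zero-padded to width 10, 5 decimals
print(f'[{100:<6d}]')    # [100   ]    left-aligned in 6 characters
print(f'[{100:^7d}]')    # [  100  ]   centered in 7 characters
print(f'{1234567:,}')    # 1,234,567   thousands separator
print(f'{0.25:.1%}')     # 25.0%       multiplied by 100, with a percent sign
```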

## <a name='dict'></a>Dict
A bit more on dicts

In [30]:
mydict = {'Alice': 100, 'Bob': 2000000, 'Christopher': 123456789}  # keys must be hashable: str, int, tuples of immutable values
d = {(1, 2): 'fish'}  # a tuple is a valid key; a list would raise TypeError

In [31]:
mydict.get('Alice')  # like mydict['Alice'], but returns None instead of raising KeyError if the key is missing

100

In [32]:
mydict.get('John',0) # returns a default value if key not found

0

In [33]:
mydict.setdefault('John', 200)  # if 'John' is missing, inserts 'John': 200 and returns 200; otherwise returns the existing value

200
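
get() and setdefault() are most useful in loops. The sketch below shows two common patterns: counting with get() and a default of 0, and grouping with setdefault() and an empty list. The standard library's collections.Counter and collections.defaultdict handle the same jobs, but the plain-dict versions show what the two methods are for.

```python
text = 'hello world'

# Counting: a missing key starts at 0
counts = {}
for ch in text:
    counts[ch] = counts.get(ch, 0) + 1
print(counts['l'], counts['o'])   # 3 2

# Grouping: create the list the first time a key is seen, then append to it
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = {}
for word in words:
    groups.setdefault(word[0], []).append(word)
print(groups)  # {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```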

In [34]:
mydict.values()

dict_values([2000000, 100, 123456789, 200])

In [35]:
mydict.keys()

dict_keys(['Bob', 'Alice', 'Christopher', 'John'])

The key order shown above comes from an older Python version, in which dicts did not preserve order. Since Python 3.7, dicts are guaranteed to keep insertion order, so mydict.keys() gives dict_keys(['Alice', 'Bob', 'Christopher', 'John']), and the loop and format() examples below list Alice first.

In [36]:
for key, value in mydict.items():  # items() yields (key, value) pairs
    print(key)
    print(value)

Bob
2000000
Alice
100
Christopher
123456789
John
200


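Use \* to unpack an iterable into positional arguments: foo(\*[1, 2]) is the same as foo(1, 2).

Use \*\* to unpack a dict's key: value pairs into keyword arguments: foo(\*\*{'a': 1}) is the same as foo(a=1).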

In [37]:
print('{:10s}{Alice}\n{:10s}{Bob}'.format(*mydict.keys(),**mydict))

Bob       100
Alice     2000000
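
Unpacking works with any function, not only format(). The describe() function below is hypothetical and exists only to show the mechanics. The sketch also uses format_map(), which behaves like format(\*\*d) but reads from the mapping directly. That is why a dict subclass with __missing__ can supply a value for fields the dict doesn't contain.

```python
def describe(name, amount, currency='USD'):
    # hypothetical helper, used only to demonstrate argument unpacking
    return f'{name}: {amount} {currency}'

args = ['Alice', 100]
kwargs = {'currency': 'EUR'}
print(describe(*args))             # Alice: 100 USD  (same as describe('Alice', 100))
print(describe(*args, **kwargs))   # Alice: 100 EUR  (same as describe('Alice', 100, currency='EUR'))

mydict = {'Alice': 100, 'Bob': 2000000}
print('{Alice} and {Bob}'.format_map(mydict))   # 100 and 2000000

class KeepMissing(dict):
    # dict lookups call __missing__ for absent keys; here the field is put back unchanged
    def __missing__(self, key):
        return '{' + key + '}'

print('{Alice} and {Carol}'.format_map(KeepMissing(mydict)))   # 100 and {Carol}
```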


Exercises
* Reformat a date string into something else, like ‘9/14/17’ to ‘2017-09-14’.
* Reformat it to ‘Sep 14, 2017’ without using the datetime library.
* Write a loop that formats and prints the values in a dictionary.