# Strings in Python

## What is a string?

A "string" is a series of characters of arbitrary length.
Strings are immutable: they cannot be changed once created. Operations that appear to modify a string, such as upper() below, actually return a new string and leave the original untouched.

In [None]:
s1 = 'Godzilla'
print(s1, s1.upper(), s1)
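
As a minimal sketch (Python 3 syntax) of what immutability means in practice: assigning to a character in place fails, while a method such as upper() hands back a new string. The names `original` and `shouted` are illustrative only.

```python
original = 'Godzilla'

# Item assignment is not allowed on a str; Python raises TypeError.
try:
    original[0] = 'g'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# upper() builds and returns a new string; the original is untouched.
shouted = original.upper()
print(assignment_failed, original, shouted)
```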

## String literals

A "literal" is essentially a string constant, already spelled out for you. Python normally displays strings in single quotes on output, whichever kind you typed; that's just a display convention.

In [None]:
"Godzilla"

### Single and double quotes

Generally, a string literal can be in single ('), double ("), or triple (''' or """) quotes. Single and double quotes are equivalent - use whichever you prefer (but be consistent). If you need a single or double quote inside your literal, surround the literal with the other type, or use a backslash to escape the quote.

In [None]:
"Godzilla's a kaiju."

In [None]:
'Godzilla\'s a kaiju.'

In [None]:
'We call him... "Godzilla".'

### Triple quotes (''')

Triple-quoted strings can span multiple lines and are most often used for documenting your Python files (docstrings). We won't cover docstrings in detail here.
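
For orientation only, here is a minimal Python 3 sketch of the two common uses of triple quotes: a docstring and a multi-line literal. The function `roar` and the variable `poster` are made up for this illustration.

```python
def roar(name):
    """Return a short roar message for the kaiju called `name`."""
    return name + ' roars!'

# A triple-quoted literal may span several lines; the line breaks are kept.
poster = '''Godzilla
vs
Mothra'''

print(roar.__doc__)
print(roar('Godzilla'))
print(poster.count('\n'))
```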

### Raw strings

Raw strings, written with an r prefix, do no escape character interpretation: every backslash is kept as a literal character. Use them when a string contains many backslashes that you don't want to double up by hand.

In [None]:
print('This is a\ncomplicated string with newline escapes in it.')

In [None]:
print(r'This is a\ncomplicated string with newline escapes in it.')
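
A small sketch of the difference, assuming Python 3: the same characters typed with and without the r prefix produce strings of different lengths. The variable names are illustrative.

```python
escaped = 'a\nb'   # \n is interpreted as a single newline character
raw = r'a\nb'      # the backslash and the n are kept as two characters

print(len(escaped), len(raw))
# A raw string is the same text you get by doubling the backslash by hand.
print(raw == 'a\\nb')
```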

## Strings and numbers

In [None]:
x = int('122', 3)  # parse '122' as a base-3 number: 1*9 + 2*3 + 2 = 17
x+1

### String objects

String objects are just the string variables you create in Python.

In [None]:
kaiju = 'Godzilla'
print(kaiju)

In [None]:
kaiju

Note that the print() call shows no quotes, while entering the bare variable name does. That is a Python output convention: the interactive prompt displays repr() of the value, which shows it as you would type it in Python source code, while print() shows it as a reader would want to see it.

In [None]:
repr(kaiju)

In [None]:
print(repr(kaiju))
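
The difference is easier to see with a string that contains a special character. A minimal sketch, assuming Python 3; the variable names are illustrative.

```python
roar = 'Skreeonk!\n'

as_str = str(roar)     # what print() writes: the characters themselves
as_repr = repr(roar)   # what the bare name shows: a quoted, escaped literal

print(as_repr)
print(len(as_str), len(as_repr))
```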

### String operators

When you read text from a file, it's just that - text. No matter what the data represents, it's still text. To use it as a number, you have to explicitly convert it to a number.

In [None]:
one = 1
two = '2'
print(one, two, one + two)  # raises TypeError: an int and a str cannot be added

In [None]:
one = 1
two = int('2')
print(one, two, one + two)

In [None]:
num1 = 1.1
num2 = float('2.2')
print(num1, num2, num1 + num2)

You can also do this with hexadecimal and octal numbers, or any other base, for that matter.

In [None]:
print(int('FF', 16))
print(int('0xff', 16))
print(int('777', 8))
print(int('0777', 8))
print(int('222', 7))
print(int('110111001', 2))

If the conversion cannot be done, int() raises a ValueError.

In [None]:
print(int('0xGG', 16))  # raises ValueError: G is not a hexadecimal digit
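
When the text comes from a file you don't control, it is common to catch the exception rather than let it stop the program. A minimal sketch, assuming Python 3; the helper name `to_int` is hypothetical, and returning None for bad input is just one possible convention.

```python
def to_int(text, base=10):
    """Return int(text, base), or None if text is not a valid number."""
    try:
        return int(text, base)
    except ValueError:
        return None

print(to_int('FF', 16))
print(to_int('0xGG', 16))
```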

#### Concatenation

In [None]:
kaiju1 = 'Godzilla'
kaiju2 = 'Mothra'
kaiju1 + ' versus ' + kaiju2

#### Repetition

In [None]:
'Run away! ' * 3

### String keywords

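#### in

The keyword 'in' is an operator, not a function: a in b is True when the string a occurs somewhere inside the string b.

NOTE: As movie trivia, this _particular_ statement is false; as a Python expression, it evaluates to True! :^)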

In [None]:
'Godzilla' in 'Godzilla vs Gamera'

### String functions

#### len()

In [None]:
len(kaiju)

### String methods

Remember - methods are functions attached to objects, accessed via the 'dot' notation.

#### Basic formatting and manipulation

##### capitalize()/lower()/upper()/swapcase()/title()

In [None]:
kaiju.capitalize()

In [None]:
kaiju.lower()

In [None]:
kaiju.upper()

In [None]:
kaiju.swapcase()

In [None]:
'godzilla, king of the monsters'.title()

##### center()/ljust()/rjust()

In [None]:
kaiju.center(20, '*')

In [None]:
kaiju.ljust(20, '*')

In [None]:
kaiju.rjust(20, '*')

##### expandtabs()

In [None]:
tabbed_kaiju = '\tGodzilla'
print('[' + tabbed_kaiju + ']')

In [None]:
print('[' + tabbed_kaiju.expandtabs(16) + ']')

##### join()

In [None]:
' vs '.join(['Godzilla', 'Hedorah'])

In [None]:
','.join(['Godzilla', 'Mothra', 'King Ghidorah'])

##### strip()/lstrip()/rstrip()

In [None]:
'   Godzilla   '.strip()

In [None]:
'xxxGodzillayyy'.strip('xy')

In [None]:
'    Godzilla   '.lstrip()

In [None]:
'    Godzilla   '.rstrip()

##### partition()/rpartition()

In [None]:
battle = 'Godzilla x Gigan'
battle.partition(' x ')

In [None]:
battle = 'Godzilla and Jet Jaguar vs. Gigan and Megalon'
battle.partition(' vs. ')

In [None]:
battle = 'Godzilla vs Megalon vs Jet Jaguar'
battle.partition('vs')

In [None]:
battle = 'Godzilla vs Megalon vs Jet Jaguar'
battle.rpartition('vs')

##### replace()

In [None]:
battle = 'Godzilla vs Mothra'
battle.replace('Mothra', 'Anguiras')

In [None]:
battle = 'Godzilla vs a monster and another monster'
battle.replace('monster', 'kaiju', 2)

In [None]:
battle = 'Godzilla vs a monster and another monster and yet another monster'
battle.replace('monster', 'kaiju', 2)

##### split()/rsplit()

In [None]:
battle = 'Godzilla vs King Ghidorah vs Mothra'
battle.split(' vs ')

In [None]:
kaijus = 'Godzilla,Mothra,King Ghidorah'
kaijus.split(',')

In [None]:
kaijus = 'Godzilla Mothra King Ghidorah'
kaijus.split()

In [None]:
kaijus = 'Godzilla,Mothra,King Ghidorah,Megalon'
kaijus.rsplit(',', 2)

##### splitlines()

In [None]:
kaijus_in_lines = 'Godzilla\nMothra\nKing Ghidorah\nEbirah'
print(kaijus_in_lines)

In [None]:
kaijus_in_lines.splitlines()

In [None]:
kaijus_in_lines.splitlines(True)

##### zfill()

In [None]:
age_of_Godzilla = 60
age_string = str(age_of_Godzilla)
print(age_string, age_string.zfill(5))

#### String information

##### isXXX()

In [None]:
print('Godzilla'.isalnum())
print('*Godzilla*'.isalnum())
print('Godzilla123'.isalnum())

In [None]:
print('Godzilla'.isalpha())
print('Godzilla123'.isalpha())

In [None]:
print('Godzilla'.isdigit())
print('60'.isdigit())

In [None]:
print('SpaceGodzilla'.isspace())
print('   '.isspace())

In [None]:
print('Godzilla'.islower())
print('godzilla'.islower())

In [None]:
print('Godzilla'.isupper())
print('GODZILLA'.isupper())

In [None]:
print('Godzilla vs Mothra'.istitle())
print('Godzilla X Mothra'.istitle())

##### count()

In [None]:
monsters = 'Godzilla and Space Godzilla and MechaGodzilla'
print('There are', monsters.count('Godzilla'), 'Godzillas.')
print('There are', monsters.count('Godzilla', len('Godzilla')), 'pseudo-Godzillas.')

##### startswith()/endswith()

In [None]:
king_kaiju = 'Godzilla'
print(king_kaiju.startswith('God'))
print(king_kaiju.endswith('lla'))
print(king_kaiju.startswith('G'))
print(king_kaiju.endswith('amera'))

##### find()/index()/rfind()/rindex()

In [None]:
kaiju_string = 'Godzilla,Gamera,Gorgo,Space Godzilla'
print('The first Godz is at position', kaiju_string.find('Godz'))
print('The second Godz is at position', kaiju_string.find('Godz', len('Godz')))

In [None]:
kaiju_string.index('Minilla')  # unlike find(), index() raises ValueError when nothing is found

In [None]:
kaiju_string.rindex('Godzilla')
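
The practical difference between the two families is how they report a miss: find()/rfind() return -1, while index()/rindex() raise ValueError. A minimal sketch, assuming Python 3; `kaiju_list` and the other names are illustrative.

```python
kaiju_list = 'Godzilla,Gamera,Gorgo,Space Godzilla'

missing_via_find = kaiju_list.find('Minilla')   # -1 signals "not found"

try:
    kaiju_list.index('Minilla')
    index_raised = False
except ValueError:
    index_raised = True

last_godzilla = kaiju_list.rfind('Godzilla')    # search from the right
print(missing_via_find, index_raised, last_godzilla)
```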

#### Advanced features

##### decode()/encode()/translate()

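In Python 3, encode() converts a string to bytes in a given encoding (such as UTF-8), and decode() converts bytes back to a string; translate() maps individual characters through a lookup table. Rarely used in science code.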
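
A short round trip, as a sketch under the assumption of Python 3 (where str and bytes are separate types). The sample text and the translation table are invented for illustration.

```python
name = '\u30b4\u30b8\u30e9'          # three katakana characters ("Gojira")
encoded = name.encode('utf-8')       # str -> bytes
decoded = encoded.decode('utf-8')    # bytes -> str

# translate() maps characters through a table built by str.maketrans().
table = str.maketrans('oz', '0s')
swapped = 'Godzilla'.translate(table)

print(len(name), len(encoded), decoded == name, swapped)
```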

##### String formatting

Similar to formatting in C, FORTRAN, etc. There is a _lot_ more to this than I am showing here.

In [None]:
kaiju = 'Godzilla'
age = 60
print('%s is %d years old.' % (kaiju, age))
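
Python also offers two other formatting styles that are not shown above: the str.format() method and, from Python 3.6 on, f-strings. A minimal sketch showing that all three produce the same text here; the variable names are illustrative.

```python
kaiju = 'Godzilla'
age = 60

percent_style = '%s is %d years old.' % (kaiju, age)
format_style = '{} is {:d} years old.'.format(kaiju, age)
fstring_style = f'{kaiju} is {age:d} years old.'   # requires Python 3.6+

print(percent_style)
print(percent_style == format_style == fstring_style)
```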

## The _string_ module

The _string_ module is the Python equivalent of "junk DNA" in living organisms. It's been around since the beginning, but many of its functions have been superseded by evolution. But some ancient code still relies on it, so they leave the old parts in....

For modern code, the _string_ module does have some useful constants and functions.

In [None]:
import string

In [None]:
print(string.ascii_letters)
print(string.ascii_lowercase)
print(string.ascii_uppercase)

In [None]:
print(string.digits)
print(string.hexdigits)
print(string.octdigits)

In [None]:
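# Python 2 only: string.letters, string.lowercase and string.uppercase were
# removed in Python 3. Use the ascii_* constants above instead.
print(hasattr(string, 'letters'))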

In [None]:
print(string.printable)
print(string.punctuation)
print(string.whitespace)

The _string_ module also provides the _Formatter_ class, which can be useful for sophisticated text formatting.
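
A brief sketch of _Formatter_ in use, assuming Python 3. It accepts the same template syntax as str.format(), and its parse() method exposes the pieces of a template. The template strings here are made up for illustration.

```python
import string

formatter = string.Formatter()
line = formatter.format('{name:>10}|{age:05d}', name='Godzilla', age=60)

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
fields = [field for _, field, _, _ in formatter.parse('{name} is {age}') if field]

print(line)
print(fields)
```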

## Regular Expressions

### What is a regular expression?

Regular expressions ('regexps') are essentially a mini-language for describing string operations. Everything shown above with string methods and operators can be done with regular expressions. Most of the time, the regular expression version is more concise. But not always more readable....

To use regular expressions, you have to import the 're' module.

In [None]:
import re

### A very short, whirlwind tour of regular expressions

#### Scanning

In [None]:
kaiju_truth = 'Godzilla is the King of the Monsters. Ebirah is also a monster, but looks like a giant lobster.'
re.findall('Godz', kaiju_truth)

In [None]:
print(re.findall(r'(^.+) is the King', kaiju_truth))

For simple searches like this, using the 'in' operator is typically easier.
Regexps are case-sensitive by default.

In [None]:
print(re.findall(r'\. (.+) is also', kaiju_truth))

In [None]:
print(re.findall(r'(.+) is also a (.+)', kaiju_truth)[0])
print(re.findall(r'\. (.+) is also a (.+),', kaiju_truth)[0])

#### Changing

In [None]:
some_kaiju = 'Godzilla, Space Godzilla, Mechagodzilla'
print(re.sub('Godzilla', 'Gamera', some_kaiju))
print(re.sub('(?i)Godzilla', 'Gamera', some_kaiju))

#### And so much more...

You could spend a whole day (or more) just learning about regular expressions. But they are incredibly useful and powerful, especially in the all-too-frequent drudgery of munging files from one format to another.

Regular expressions can be compiled once with re.compile() and then reused, which saves time when the same pattern is applied many times.
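
A minimal sketch of a compiled pattern, assuming Python 3. The pattern object offers the same operations as the module-level functions (findall(), sub(), and so on); `kaiju_pattern` is an illustrative name.

```python
import re

# Compile once; the flag makes the pattern case-insensitive.
kaiju_pattern = re.compile(r'godzilla', re.IGNORECASE)

some_kaiju = 'Godzilla, Space Godzilla, Mechagodzilla'
matches = kaiju_pattern.findall(some_kaiju)
renamed = kaiju_pattern.sub('Gamera', some_kaiju)

print(matches)
print(renamed)
```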