# Python for testers - Strings

# Basics

# Python strings

* Sequence of characters
* Supports the full Unicode range (since Python 3.3, regardless of how the interpreter was compiled)
* Length only limited by memory
* Memory use grows and shrinks as needed

## Declare a string

In [304]:
first_name = 'Bärbel'  # single quotes preferred
surname = "Huber"  # double quotes possible

## Concatenation

In [305]:
name = surname + ', ' + first_name
print(name)

Huber, Bärbel


In [306]:
name = name + ', MBA'
print(name)

Huber, Bärbel, MBA


In [307]:
name += ' MSc'
print(name)

Huber, Bärbel, MBA MSc


## Length

In [308]:
len(name)  # number of characters (not bytes used for internal storage)

22
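
The difference between characters and bytes becomes visible once the string is encoded. A minimal sketch, assuming UTF-8 as the target encoding (the material above does not name one); `name` is rebuilt here so the block runs on its own:

```python
# Rebuild the string from the cells above so this block is self-contained.
name = 'Huber, Bärbel, MBA MSc'

char_count = len(name)                  # counts Unicode characters
byte_count = len(name.encode('utf-8'))  # counts bytes after encoding (UTF-8 is an assumption)

print(char_count)  # 22
print(byte_count)  # 23, because 'ä' takes two bytes in UTF-8
```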

## Part of a string (slice)

In [309]:
first_name[0]  # first character

'B'

In [310]:
first_name[:3]  # first 3 characters

'Bär'

In [311]:
first_name[1:4]

'ärb'

In [312]:
first_name[-3:]  # last 3 characters

'bel'
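
One detail that matters when testing edge cases: slices are clipped to the bounds of the string, while a single index outside the string raises an error. A short sketch, using an arbitrary out-of-range position of 100:

```python
first_name = 'Bärbel'

# A slice that reaches past the end is silently clipped.
clipped = first_name[:100]
print(clipped)  # Bärbel

# A single index past the end raises IndexError instead.
try:
    first_name[100]
    raised = False
except IndexError as error:
    raised = True
    print(error)  # string index out of range
```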

# Comparison

In [313]:
first_name == 'Josef'

False

In [314]:
first_name.startswith('Jos')

False

In [315]:
first_name.endswith('sef')

False

In [316]:
first_name > 'Albert'

True

In [317]:
first_name > 'Äther'  # compares Unicode code points, so 'Ä' sorts after 'B'

False
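
The result above follows from the code points of the first characters. The sketch below makes them visible with `ord()` and shows the effect on plain sorting; the list of names is made up for illustration. Dictionary-style ordering would need a locale-aware key such as `locale.strxfrm`, which is not shown here.

```python
# Comparison works on code points: 'B' is 66, 'Ä' is 196.
code_b = ord('B')
code_ae = ord('Ä')
print(code_b, code_ae)  # 66 196

# Plain sorting therefore puts 'Äther' after 'Bärbel'.
names = sorted(['Bärbel', 'Äther', 'Albert'])
print(names)  # ['Albert', 'Bärbel', 'Äther']
```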

# Searching

In [318]:
haystack = 'same old, same old'
needle = 'old'

In [319]:
needle in haystack

True

In [320]:
haystack.count(needle)  # number of occurrences

2

## Searching: `index()`

In [321]:
haystack.index(needle)  # index of first occurrence

5

In [322]:
haystack.index(needle, 6)  # index of second occurrence (search starts at index 6)

15

In [323]:
haystack.index('???')  # no match results in an error

ValueError: substring not found

## Searching: `find()`

In [324]:
haystack.find(needle)  # in case of success, same as index()

5

In [325]:
haystack.find('???')  # on no match, return -1

-1

## `index()` vs `find()`

* Use `index()` if `haystack` and `needle` come from "inside your program" and you are certain that `needle` is in `haystack`. If your assumptions are wrong, you will soon find out and can fix your code.
* Use `find()` if `haystack` and `needle` are out of your control (read from file; entered by user). However, then you have to manually deal with a result of -1 to avoid spurious exceptions later.
* In case of doubt, prefer `index()` - "crash early"
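
A sketch of what "manually deal with a result of -1" can look like. The helper `text_after` is hypothetical and only serves as an illustration; the last line shows the kind of silent error an unchecked -1 can cause.

```python
haystack = 'same old, same old'


def text_after(haystack, needle):
    """Return the text following the first needle, or None if it is absent.

    Hypothetical helper for illustration only.
    """
    position = haystack.find(needle)
    if position == -1:  # find() signals "no match" with -1
        return None
    return haystack[position + len(needle):]


print(text_after(haystack, 'old'))  # , same old
print(text_after(haystack, '???'))  # None

# Pitfall: an unchecked -1 is a valid index and silently selects the last character.
print(haystack[haystack.find('???')])  # d
```

With `index()` the last line would raise `ValueError` instead of returning a wrong character, which is the "crash early" behavior recommended above.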


# Transformations

## Change case

In [326]:
'HELLO'.lower()

'hello'

In [327]:
'hello'.upper()

'HELLO'

In [328]:
'Mäher'.upper()  # also works with non-ASCII characters

'MÄHER'
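
Case conversion of non-ASCII text can change the length of a string, which matters for case-insensitive comparisons. A brief sketch, assuming the German word "Straße" as test data; `casefold()` is the built-in string method intended for caseless matching:

```python
# upper() turns 'ß' into 'SS', so the result is one character longer.
upper_street = 'straße'.upper()
print(upper_street)  # STRASSE

# lower() keeps 'ß', so the two spellings do not compare equal.
same_lower = 'Straße'.lower() == 'STRASSE'.lower()
print(same_lower)  # False

# casefold() maps 'ß' to 'ss', so the comparison succeeds.
same_folded = 'Straße'.casefold() == 'STRASSE'.casefold()
print(same_folded)  # True
```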

## Replace and remove

In [329]:
'hello'.replace('he', 'Ha')  # change 'he' to 'Ha'

'Hallo'

In [330]:
'hello'.replace('l', '')  # remove all 'l's

'heo'

In [331]:
'  hello '.strip()  # strip white space on both sides

'hello'

In [332]:
'xoxohelloxoxo'.lstrip('ox')  # strip specific characters on the left side

'helloxoxo'

In [333]:
'xoxohelloxoxo'.strip('ox')  # strip any 'o' or 'x' on both sides, including the final 'o' of 'hello'

'hell'
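
The `'hell'` result shows that the argument to `strip()` is a set of characters, not a fixed prefix or suffix. A sketch of the same pitfall with a made-up file name, followed by one possible way to remove an exact suffix:

```python
filename = 'test.txt'  # hypothetical example value

# rstrip() removes any trailing '.', 't' or 'x', not the suffix '.txt'.
stripped = filename.rstrip('.txt')
print(stripped)  # tes

# To remove an exact suffix, check for it and slice.
suffix = '.txt'
if filename.endswith(suffix):
    stem = filename[:-len(suffix)]
else:
    stem = filename
print(stem)  # test
```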

# Special characters

## Escape sequences

Escape sequences start with a backslash (similar to C and Java) and represent special characters:
* `\\` - single backslash
* `\"` - double quote
* `\'` - single quote
* `\n` - linefeed, newline
* `\r` - carriage return
* `\t` - horizontal tab
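
A short sketch of how these sequences behave inside string literals; the sample strings are made up. Each escape sequence counts as a single character, and `repr()` shows it in escaped form again:

```python
line = 'name\tvalue\n'
print(len(line))   # 11: each escape sequence is a single character
print(repr(line))  # 'name\tvalue\n'

# Escaping the single quote allows it inside a single-quoted literal.
quote = 'It\'s "quoted"'
print(quote)  # It's "quoted"
```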

## Unicode characters

In [334]:
surname = 'Müller'
surname = 'M\xfcller'  # 0xfc (decimal 252) is the Unicode code point of ü; \x takes 2 hex digits (0 to 0xff)
surname = 'M\u00fcller'  # \u takes 4 hex digits (0 to 0xffff)
surname = 'M\U000000fcller'  # \U takes 8 hex digits (0 to 0x10ffff, the highest Unicode code point)

Also, you can use Unicode character names:

In [335]:
print('\N{snowman}')

☃


For a better view, visit http://unicodesnowmanforyou.com/.
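
All of these notations produce the same string. The sketch below checks this by collecting the variants in a set; the official character name of ü used in the last variant is `LATIN SMALL LETTER U WITH DIAERESIS`.

```python
variants = [
    'Müller',
    'M\xfcller',
    'M\u00fcller',
    'M\U000000fcller',
    'M\N{LATIN SMALL LETTER U WITH DIAERESIS}ller',
]

# Equal strings collapse into one set element.
distinct = len(set(variants))
print(distinct)       # 1
print(hex(ord('ü')))  # 0xfc
```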

## Multiple lines

In [336]:
text = """a string
  that spans
    multiple lines"""
print(text)

a string
  that spans
    multiple lines


The same can be achieved using escape sequences:

In [337]:
print('a string\n  that spans\n    multiple lines')

a string
  that spans
    multiple lines


## Raw strings

Use the `r` prefix to suppress escape sequences:

In [338]:
print('\t/')

	/


In [339]:
print(r'\t/')

\t/


This is particularly useful for regular expressions - more on that later.
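
As a small preview, here is a sketch using the standard `re` module with a made-up input string. In the raw string, the backslash in `\d` reaches the regular expression engine unchanged:

```python
import re

print(len('\t/'))   # 2: a tab and a slash
print(len(r'\t/'))  # 3: a backslash, a 't' and a slash

# \d+ matches one or more digits.
numbers = re.findall(r'\d+', 'build 42, test 7')
print(numbers)  # ['42', '7']
```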

## Legacy Unicode strings

* In Python 3, every string can contain Unicode characters
* In Python 2, Unicode strings had to be prefixed with `u`
* Python 3.3 or later supports this notation again in order to make it easier to write source code that works with Python 2 and 3.

In [340]:
'Mäher'

'Mäher'

In [342]:
u'Mäher'

'Mäher'

# Summary

* Python strings are powerful and easy to use
* Slicing gives access to parts of a string
* Many methods to transform strings
* Escape sequences can describe special characters (even snowmen!)
* Prefixes like `r` and `u` simplify certain applications