# Python for testers - Strings

# Basics

# Python strings

* Sequence of characters
* Supports the full Unicode range (on every build since Python 3.3; older versions could be compiled with a narrower range)
* Length only limited by memory
* Memory use grows and shrinks as needed

## Declare a string

In [123]:
first_name = 'Josef'  # single quotes preferred
surname = "Huber"  # double quotes possible

## Using non-ASCII characters

In [227]:
surname = 'Müller'
surname = 'M\xfcller'  # 0xfc = 252 = Unicode code point of ü; \x takes 2 hex digits (0 to 0xff)
surname = 'M\u00fcller'  # \u takes 4 hex digits (0 to 0xffff)
surname = 'M\U000000fcller'  # \U takes 8 hex digits (0 to 0x10ffff, the highest Unicode code point)
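
As a quick check, the following sketch compares the four spellings and converts between a character and its code point. The name `spellings` is introduced here only for illustration.

```python
# Four ways to write the same six-character string.
spellings = [
    'Müller',
    'M\xfcller',
    'M\u00fcller',
    'M\U000000fcller',
]

# Every spelling compares equal to the first one.
all_equal = all(s == spellings[0] for s in spellings)
print(all_equal)       # True

# ord() and chr() convert between a character and its code point.
code_point = ord('\xfc')
print(code_point)      # 252
print(hex(code_point)) # 0xfc
print(chr(0xfc))       # ü
```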

## Concatenation and length

In [131]:
name = surname + ', ' + first_name
print(name)

Müller, Josef


In [132]:
name += ', MSc'
print(name)

Müller, Josef, MSc


In [133]:
len(name)  # number of characters (not bytes used for internal storage)

18
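
To illustrate the difference between characters and bytes, this small sketch encodes the same string as UTF-8 and compares both lengths. UTF-8 is chosen here as an example encoding; other encodings give other byte counts.

```python
name = 'M\u00fcller, Josef, MSc'  # same text as above, with ü written as an escape

char_count = len(name)                  # number of characters
byte_count = len(name.encode('utf-8'))  # number of bytes after encoding as UTF-8

print(char_count)  # 18
print(byte_count)  # 19, because ü takes two bytes in UTF-8
```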

## Part of a string (slice)

In [134]:
first_name[0]  # first character

'J'

In [135]:
first_name[:3]  # first 3 characters

'Jos'

In [136]:
first_name[1:4]

'ose'

In [137]:
first_name[-3:]  # last 3 characters

'sef'
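
A few more slicing cases, as a sketch: the start index is included, the end index is excluded, and slices that run past the end are clipped instead of raising an error. The variable names are illustrative only.

```python
first_name = 'Josef'

middle = first_name[1:4]     # indices 1, 2 and 3; index 4 is excluded
last_char = first_name[-1]   # negative indices count from the end
clipped = first_name[:100]   # slices are clipped to the string length
backwards = first_name[::-1] # a step of -1 walks the string in reverse

print(middle)     # ose
print(last_char)  # f
print(clipped)    # Josef
print(backwards)  # fesoJ

# Unlike a slice, a single index outside the string raises IndexError.
try:
    first_name[100]
    index_error_raised = False
except IndexError:
    index_error_raised = True
print(index_error_raised)  # True
```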

## Comparison

In [138]:
first_name == 'Josef'

True

In [139]:
first_name.startswith('Jos')

True

In [140]:
first_name.endswith('sef')

True

In [141]:
first_name > 'Albert'

True

In [142]:
first_name > 'Äther'  # compares Unicode code points: 'J' (74) < 'Ä' (196)

False
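
The effect of comparing by code point shows up when sorting. This sketch sorts a small, made-up list of names; note that `Äther` ends up last, which is not the order a German dictionary would use.

```python
names = ['Josef', '\u00c4ther', 'Albert']  # '\u00c4' is Ä

# Code points of the first characters decide the order.
first_code_points = [ord(name[0]) for name in names]
print(first_code_points)  # [74, 196, 65]

by_code_point = sorted(names)
print(by_code_point)      # ['Albert', 'Josef', 'Äther']
```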

# Searching

In [196]:
haystack = 'same old, same old'
needle = 'old'

In [197]:
needle in haystack

True

In [200]:
haystack.find(needle)

5

In [207]:
haystack.find(needle, 6)  # start searching at index 6 to find the second occurrence

15

In [204]:
haystack.find('???')  # returns -1 if there is no match

-1

In [202]:
haystack.index(needle)  # in case of success, same as find()

5

In [203]:
haystack.index('???')  # unlike find(), raises ValueError if the needle cannot be found

ValueError: substring not found

In [209]:
haystack.count(needle)  # number of occurrences

2
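
Building on `find()` with a start index, one possible way to collect the positions of all matches is a small loop. `find_all` is a hypothetical helper written for this example, not a built-in string method.

```python
def find_all(haystack, needle):
    """Return the start index of every non-overlapping occurrence of needle."""
    if not needle:
        raise ValueError('needle must not be empty')
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Continue searching right after the match that was just found.
        start = haystack.find(needle, start + len(needle))
    return positions


positions = find_all('same old, same old', 'old')
print(positions)  # [5, 15]
```

The number of positions found matches what `count()` reports for the same needle.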

# Transformations

## Change case

In [191]:
'HELLO'.lower()

'hello'

In [192]:
'hello'.upper()

'HELLO'

In [217]:
'Mäher'.upper()  # also works with non-ASCII characters

'MÄHER'

## Replace and remove

In [221]:
'hello'.replace('he', 'Ha')  # change 'he' to 'Ha'

'Hallo'

In [220]:
'hello'.replace('l', '')  # remove all 'l's

'heo'

In [213]:
'  hello '.strip()  # strip white space on both sides

'hello'

In [212]:
'xoxohelloxoxo'.lstrip('ox')  # strip only specific characters on the left side

'helloxoxo'
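
Note that the argument of `lstrip()` is a set of characters, not a prefix. The sketch below contrasts both behaviors. The prefix removal uses `startswith()` and a slice, which is only one of several possible ways to do it.

```python
text = 'xoxohelloxoxo'

# lstrip('xo') removes every leading 'x' and 'o', in any order.
stripped = text.lstrip('xo')
print(stripped)        # helloxoxo

# Removing the exact prefix 'xo' once needs a different approach.
prefix = 'xo'
without_prefix = text[len(prefix):] if text.startswith(prefix) else text
print(without_prefix)  # xohelloxoxo
```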

# Special characters

# Escape sequences

Escape sequences start with a backslash (similar to C and Java) and make it possible to represent special characters:
* `\\` - single backslash
* `\"` - double quote
* `\'` - single quote
* `\n` - linefeed, newline
* `\r` - carriage return
* `\t` - tab

Also, you can use Unicode character names:

In [226]:
print('\N{snowman}')

☃
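
The following sketch shows that an escape sequence takes several characters to type but results in a single character in the string. The path `C:\temp` is a made-up example value.

```python
import unicodedata

path = 'C:\\temp'    # the escaped backslash is stored as one character
print(path)          # C:\temp
print(len(path))     # 7

print(len('\n'))     # 1
print(repr('a\tb'))  # 'a\tb' (repr() shows the escape sequence again)

snowman = '\N{SNOWMAN}'
print(hex(ord(snowman)))          # 0x2603
print(unicodedata.name(snowman))  # SNOWMAN
```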


# Multiple lines

In [224]:
text = """a string
  that spans
    multiple lines"""
print(text)

a string
  that spans
    multiple lines


The same can be achieved using escape sequences:

In [223]:
print('a string\n  that spans\n    multiple lines')

a string
  that spans
    multiple lines
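
As a small check, both notations produce exactly the same string. The variable names are chosen for this example only.

```python
triple_quoted = """a string
  that spans
    multiple lines"""
escaped = 'a string\n  that spans\n    multiple lines'

print(triple_quoted == escaped)  # True

# splitlines() breaks the text at the line ends and keeps the indentation.
lines = triple_quoted.splitlines()
print(lines)  # ['a string', '  that spans', '    multiple lines']
```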


# Raw strings

Use the `r` prefix to suppress escape sequences:

In [228]:
print('\t/')

	/


In [229]:
print(r'\t/')

\t/


This is particularly useful for regular expressions - more on that later.
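
As a brief preview, here is a minimal sketch with the standard `re` module. The sample text and the pattern are made up for illustration.

```python
import re

# A raw string keeps the backslash, so the two strings differ in length.
print(len('\t/'))   # 2: a tab and a slash
print(len(r'\t/'))  # 3: a backslash, a 't' and a slash

# Without the r prefix, every backslash in a pattern must be doubled.
print(r'\d+' == '\\d+')  # True

numbers = re.findall(r'\d+', 'build 42 passed, 7 skipped')
print(numbers)  # ['42', '7']
```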

## Unicode strings

* In Python 3, every string can contain Unicode characters
* In Python 2, Unicode strings had to be prefixed with `u`
* Python 3 (since version 3.3) still accepts this prefix to make it easier to write source code that works with both Python 2 and 3.

In [231]:
'Mäher'

'Mäher'

In [233]:
u'Mäher'

'Mäher'
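
A short check under Python 3 confirms that the prefix makes no difference: both literals are of type `str` and compare equal.

```python
plain = 'M\u00e4her'      # Mäher, with ä written as an escape
prefixed = u'M\u00e4her'  # same text with the u prefix

print(plain == prefixed)        # True
print(type(plain).__name__)     # str
print(type(prefixed).__name__)  # str
```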