# Strings

In this section we shall take a closer look at the string type and some of the operations associated with it. The following section makes heavy reference to online notes by Dr. Andrew N. Harrington, [Hands-on Python 3 Tutorial](http://anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/index.html), released under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.

## Concatenation `+`
For strings the `+` symbol is used to concatenate two strings together. For example:

In [3]:
print('One string' + ' and another')

One string and another
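
Note that `+` only joins a string to another string. As a small sketch (the names `label` and `count` are illustrative and not from the original notes), a number has to be converted with `str()` before it can be concatenated:

```python
# Illustrative names; they do not appear elsewhere in these notes.
label = 'Items: '
count = 3

# label + count would raise a TypeError, because count is an int.
message = label + str(count)
print(message)
```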


## Duplication `*`
The duplication `*` operator takes a string and an integer and repeats the string as many times as the integer value:

In [6]:
print('hello '*4)
print(2*'bye ')

hello hello hello hello 
bye bye 
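
The two operators can be combined. The sketch below, which uses the made-up names `border` and `banner`, builds a simple banner by duplicating a character and concatenating the result around some text:

```python
# Illustrative example: repeat a character, then join the pieces with +.
border = '=' * 10
banner = border + ' menu ' + border
print(banner)
```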


## Indexing `[]`
Strings can be seen as a collection of characters. Each of these characters has an integer index associated with it, based on its position in the string. For example, take the string `'computer'`:

| character | c | o | m | p | u | t | e | r |
|-----------|---|---|---|---|---|---|---|---|
| index     | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |

You can access individual characters in the string by index using:
```
string[index]
```
for example:

In [1]:
computer_string = 'computer'

print('Index 3:', computer_string[3])

print('Index 7:', computer_string[7])

Index 3: p
Index 7: r


If you use an index that is too large for the given string, Python will raise an error:

In [8]:
print('Index 11', computer_string[11])

IndexError: string index out of range
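
If an out-of-range index is a real possibility, one option is to catch the error rather than let the program stop. The following is only a sketch of that idea; the variable `character` and the choice to fall back to `None` are assumptions made for illustration:

```python
computer_string = 'computer'

try:
    # Index 11 is deliberately too large for an 8-character string.
    character = computer_string[11]
except IndexError as error:
    # Fall back to None and report what went wrong.
    character = None
    print('Could not index the string:', error)
```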

You can find the number of characters in a string using the `len()` function:

In [9]:
print('There are', len(computer_string), 'characters in the string')

There are 8 characters in the string


Notice how the length of `computer_string` is one greater than its largest index. This is because Python indexes from `0`.

Thus, if we don't know how long a string is beforehand (if a variable holding a string is subject to change, for instance) and we want to index the last character of the string, we could use `len() - 1` as the index:

In [11]:
print('The last character:', computer_string[len(computer_string) - 1])

The last character: r


This method works, but Python gives us a far cleaner way of doing this: using an index of `-1`. This won't work in most other programming languages.

In [12]:
print('The last character:', computer_string[-1])

The last character: r


In general, negative indices in Python index the strings (and other objects) backwards:

In [13]:
print('Second last character', computer_string[-2])

print('Third last character', computer_string[-3])

Second last character e
Third last character t


Note that the index `-8` corresponds to the `0` index (`len(computer_string) - 8` is `0`) so anything less than this would be out of bounds.
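
As a quick check of this relationship, the sketch below compares a few negative indices with their positive counterparts (the variable names are illustrative):

```python
computer_string = 'computer'
length = len(computer_string)  # 8

# A negative index i refers to the same character as length + i.
first_matches = computer_string[-length] == computer_string[0]
third_last_matches = computer_string[-3] == computer_string[length - 3]
print(first_matches, third_last_matches)
```

Both comparisons should print `True`, while an index such as `-9` would raise an `IndexError` just like an index that is too large.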

## Slicing
Slicing allows us to extract segments of the string, as opposed to individual characters. The syntax for string slicing is:
```
string[start_index:stop_index]
```
where the `stop_index` is not included in the slice, rather the slice stops before this index. For example, consider the slice:

In [14]:
print(computer_string[2:5])

mpu


where the last character is `'u'`, but the character with index `5` is `'t'`.

If we want to take a slice from the beginning of a string we could use `0` as the `start_index`:

In [21]:
print(computer_string[0:3])

com


Alternatively, if we leave the `start_index` blank, Python will interpret this as starting from the beginning of the string:

In [22]:
print(computer_string[:3])

com


Similarly if we wanted to take a slice up to and including the last character in the string, we can use: 

In [25]:
print(computer_string[3:len(computer_string)])

puter


or simply leave the `stop_index` blank:

In [24]:
print(computer_string[3:])

puter


Notice the slice above is not the same as if we used `-1` as the `stop_index`:

In [27]:
print(computer_string[3:-1])

pute


Although `-1` refers to the last character, just as it does when indexing, the slice always stops **before** the `stop_index`, so the last character is left out.
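
One convenient consequence of stopping before the `stop_index` is that a string can be split at any index into two slices that join back into the original. A minimal sketch, using the illustrative names `split_at`, `head` and `tail`:

```python
computer_string = 'computer'
split_at = 3

head = computer_string[:split_at]  # characters with index 0, 1, 2
tail = computer_string[split_at:]  # characters from index 3 onwards

print(head, tail)
print(head + tail == computer_string)
```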

We can use a third index when slicing as a step size:
```
string[start_index: stop_index: step_size]
```
For example, we can get every second character from a string using a step size of `2`:

In [20]:
print('Starting from 0:', computer_string[0:8:2])
print('Starting from 1:', computer_string[1:8:2])

Starting from 0: cmue
Starting from 1: optr


The step size can be any non-zero integer. Note that by default it is set to 1. As another example, let's print out every third character from `computer_string`, starting from the first:

In [4]:
print(computer_string[::3])

cpe


The step size need not be positive. If a negative step size is used the string will be sliced backwards. For example if we want to print out the whole of `computer_string` backwards:

In [6]:
print(computer_string[::-1])

retupmoc


Note that when slicing with a negative step size you must ensure that `start_index` is greater than `stop_index`; otherwise your slice will be empty.

In [9]:
print('Empty slice:', computer_string[0:6:-1])
print('Not empty slice:', computer_string[6:0:-1])

Empty slice: 
Not empty slice: etupmo


Also notice how, in the second slice above, the `0` index character is not present. Even when slicing with a negative step size the `stop_index` is **not** included in the slice.
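
If the backwards slice should run all the way to the first character, the `stop_index` can be left blank, as in this short sketch (the variable names are illustrative):

```python
computer_string = 'computer'

# Stops before index 0, so 'c' is left out.
stops_before_zero = computer_string[6:0:-1]

# A blank stop_index lets the slice run to the start of the string.
reaches_zero = computer_string[6::-1]

print(stops_before_zero)
print(reaches_zero)
```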

## String Formatting

Concatenating strings can sometimes be cumbersome and hard to automate. If you need to include variables and/or values in your string, you may be better off using string formatting. We will use this technique more extensively later on.

There are a few ways to format strings. We will cover one of the ways introduced in Python 3: the `str.format()` method.

This method treats everything contained in curly braces `{}` in the string as a replacement field. Each replacement field, braces included, is replaced by the corresponding argument of `format` in the output string.

In [1]:
print('Hello {}, how are you?'.format('world'))

Hello world, how are you?


As you can see above, the blank curly braces were replaced with the string argument `'world'`.

Note that the method does not change the string itself but returns a new string.
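
Because the original string is left untouched, it can be kept as a template and formatted more than once. A small sketch, with `template` and the other names chosen purely for illustration:

```python
template = 'Hello {}, how are you?'

# Each call returns a new string; template itself is unchanged.
greeting = template.format('world')
other_greeting = template.format('Python')

print(template)
print(greeting)
print(other_greeting)
```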

You can make multiple replacements at a time if you have a string with multiple replacement fields:

In [2]:
print('{}, {}, {}'.format(1, 2, 3))

1, 2, 3


Sometimes you will want more control over how the arguments of format are placed into the string. There is a specific syntax for formatting which you can read in the [documentation](https://docs.python.org/3.4/library/string.html#format-string-syntax). We will cover a few examples.

### Specify Arguments by Position

If you want to specify the order in which the arguments of format are placed into the string, you can put numbers in the replacement fields to reference the positional arguments:

In [2]:
print('{0}, {2}, {1}'.format(1, 2, 3))

1, 3, 2


Note that this also allows you to repeat elements:

In [3]:
print('{0}, {2}, {1}, {2}'.format(1, 2, 3))

1, 3, 2, 3


### Specify Arguments by Name

You can also specify arguments by name, in which case they must be passed as keyword arguments:

In [35]:
print('You can find the point at position ({x}, {y}).'.format(x = 2, y = 6)) #Arguments with names 'x' and 'y'

You can find the point at position (2, 6).
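
The keyword arguments do not have to be literal values. In the sketch below the values come from variables, and one named field is used twice; all of the names are made up for the example:

```python
x_value = 2
y_value = 6

# Named fields can be repeated, just like positional ones.
sentence = 'Point {name} is at ({x}, {y}); {name} has x = {x}.'.format(
    name='P', x=x_value, y=y_value
)
print(sentence)
```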


### Specifying Numerical Types and Precision

To put it simply, when formatting numerical arguments the format specifier (to be placed in the replacement field) is of the structure:
`[argument_reference]:[width][.precision][type]`

Where:
- `argument_reference` is the position or name of the argument.
- `width` specifies the minimum width that the replacement will take up (see the docs for alignment options).
- `precision`, for floats, can be seen as the number of decimal places.
- `type` specifies how the number should be displayed. Multiple types exist for both integers and floats, but the most commonly used are `d` for a decimal integer and `f` for a fixed-point number (which you can use for floats).

Each of these parts of the format specifier is optional.

As a first example, let's display an integer:

In [27]:
print('{:d}'.format(5))

5


Now, let's see how the width affects the output:

In [28]:
print('{:d}'.format(5)) #minimum width of 0
print('{:1d}'.format(5)) #minimum width of 1
print('{:2d}'.format(5)) #minimum width of 2
print('{:3d}'.format(5)) #minimum width of 3

5
5
 5
  5


As you can see, the first two outputs are the same. That is because the number itself is already one character wide, so a minimum width of 0 or 1 adds no padding.

If you want to display a float to 2 decimal places, specify precision:

In [29]:
print('{:.2f}'.format(1.232435455))

1.23


If you want to specify the position of the argument, include a reference to the argument position:

In [32]:
print('{1:.3f}'.format(1.232435455, 5.35362)) #argument position of 1

5.354
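
The parts of the format specifier can also be combined in a single replacement field. The sketch below uses an argument reference, a width, a precision and a type together; the square brackets are only there to make the padding visible, and the variable names are illustrative:

```python
value = 3.14159

# Argument 0, minimum width 8, 2 decimal places, fixed point.
positional = '[{0:8.2f}]'.format(value)

# The same idea with a named argument and an integer type.
named = '[{count:6d}]'.format(count=42)

print(positional)
print(named)
```

Numbers are right-aligned within the given width by default, so the padding appears on the left.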


### Including Curly Braces in Formatted String

If you want to include curly braces in a string you are formatting you can double them up:

In [5]:
print('Format a {} while keeping {{}}'.format('string'))

Format a string while keeping {}


You can also enclose the replacement field in double braces:

In [8]:
print('{{{}}}'.format('Text inside braces'))

{Text inside braces}
