# <b>Book 3b - Variables: string manipulation</b>
****

## Part 1 - Basics

A string is a sequence of characters that Python treats as a single unit. Strings are defined with single **'  '** or double **"  "** quote marks, which are equivalent (but must match). For example:

In [1]:
print('This is a string!')
print("This is a string!")

This is a string!
This is a string!


Two different strings can be joined together, or concatenated, using the **+** operator. Other operators have different effects: the multiplication operator, for example, repeats a string a given number of times.

Note that the same operator performs a different operation depending on the type of the values it acts on. Trying to combine values of different types with an operator will often result in a TypeError.

In [6]:
print("This is a string " + "which is now joined together with this second string!")
print("Rolling " * 3)

This is a string which is now joined together with this second string!
Rolling Rolling Rolling 
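
As a minimal sketch of the type mismatch described above, the cell below repeats a string with an integer (allowed) and then tries to add a string to an integer (not allowed). The names `label`, `count`, `repeated` and `error_raised` are made up for illustration, and `try`/`except` is used here only so that the cell runs to the end instead of stopping at the error.

```python
label = "Rolling "   # a string
count = 3            # an integer

# str * int is allowed: it repeats the string
repeated = label * count
print(repeated)

# str + int is not allowed: Python raises a TypeError
error_raised = False
try:
    label + count
except TypeError as err:
    error_raised = True
    print("TypeError:", err)
```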


Strings can also be stored in variables, which can be concatenated in the same way.

In [7]:
string1 = "This is a string "
string2 = "which is now joined together with this second string!"

string3 = string1 + string2
print(string3)

This is a string which is now joined together with this second string!


Strings can also contain special characters, like inverted commas or quotation marks which may otherwise end the string. They are included by placing a backslash, '\\' before that character. (To type a backslash, two backslashes are required i.e., '\\\\')
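
The cell below is a small sketch of the escaping rule just described. The variable names and example text are assumptions chosen for illustration.

```python
# Escape a single quote inside a single-quoted string
quote1 = 'It\'s a string!'
# Alternatively, enclose the string in the other kind of quote mark
quote2 = "It's a string!"
# A literal backslash is written as two backslashes
path_like = 'folder\\file'

print(quote1)
print(quote2)
print(path_like)
print(len(path_like))  # the doubled backslash counts as one character
```

Both ways of writing the apostrophe produce the same string, so which one you use is mostly a matter of readability.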

There exist pre-defined functions in Python which act on strings. One example is the **len()** function, which returns the length of a string.
<br/><br/>
**Note** this will count **all characters** in a string, including spaces and special characters. 

In [8]:
# Here we are using the strings defined in the cell above. You could try changing the variable name from string1
# to string2 or string3.
len(string1)


17

You can read more about built-in string methods <a href="https://www.w3schools.com/python/python_ref_string.asp">here</a>.

## Part 2 - Slicing and indexing

You can also index the different characters within the string, by specifying the position of the desired character. 

**Note**: in Python, as in most programming languages, counting (or <i>indexing</i>) starts from 0.

In [6]:
# Try changing this number and see what happens. You can also change string1 to string2 or string3.
# Remember, the spaces are part of the string: if you index a space you will print a space. Try this and see
# what happens (character 4 in string1, for example).

string1[3]

's'

As well as accessing specific characters within a string, groups of characters can be accessed. This can be done by providing a <i>range</i> within the string, defined by a colon between two string positions. The second number is the character to stop at; it will not be returned.
<br/><br/>
Try and change the range to access whole words within the string.

In [9]:
string3[10:16]

'string'

This technique is called **slicing**, i.e., you are slicing up your text. Try the following examples and see if you can figure out what each is indexing within our string.
<br/><br/>
Remember, string3 = "This is a string which is now joined together with this second string!"

In [10]:
print(string3[:])
print(string3[-1])
print(string3[5:-7])

This is a string which is now joined together with this second string!
!
is a string which is now joined together with this second 
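
The slices above rely on two conventions: a position that is left out defaults to the start or the end of the string, and negative positions count back from the end. The sketch below shows the same conventions on a shorter, made-up string (`demo` and the other variable names are assumptions for illustration).

```python
demo = 'Python strings'   # hypothetical example string

first_six = demo[0:6]     # characters at positions 0 to 5
same_thing = demo[:6]     # a missing start means "from the beginning"
the_rest = demo[7:]       # a missing stop means "to the end"
last_seven = demo[-7:]    # negative positions count back from the end

print(first_six)
print(same_thing)
print(the_rest)
print(last_seven)
```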


<font color = "skyblue"> 
In the box below, see how many ways you can extract the word 'string' from the variable string3 using variable slicing.
</font>

In [11]:
print(string3[:])



This is a string which is now joined together with this second string!


In the cell below, what type of variable do you think the numbers are? 
<br/><br/>
Why don't you add a line to test your thoughts (**type()**)?

In [13]:
a = '456'



If you enclose numbers within quotes then they are considered a string by Python and can be used as a string. Out of the two cells below, which do you think will work and which will give an error?

In [30]:
a + 7

TypeError: can only concatenate str (not "int") to str

In [14]:
print('This is a large number: ' + a)

This is a large number: 456


We can convert a number to a string using **str()**; this is called typecasting.

In [15]:
print(a + str(7))


4567
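
Typecasting also works in the other direction. The sketch below assumes the built-in **int()** function, which is not used elsewhere in this notebook, to turn a string of digits into a number. The names `as_text` and `as_number` are made up for illustration.

```python
a = '456'                # a string that happens to contain digits

as_text = a + str(7)     # str() turns 7 into '7', so + concatenates
as_number = int(a) + 7   # int() turns '456' into 456, so + adds

print(as_text)
print(as_number)
print(type(as_text), type(as_number))
```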


## Part 3 - Working with strings

There are a few different ways to display a string together with a variable. The most common are separating them with a comma inside print() or using the % symbol. For example:

In [16]:
name = 'Alan'

print('Good morning',name)

Good morning Alan


In [36]:
name = 'Alan'

print('Good morning %s, what did you have for breakfast today?' %name)

Good morning Alan, what did you have for breakfast today?


**Note** the letter after the % indicates the type of the value being inserted. **%s** is for a string. Some other examples would be:
<li>%d, %i --> Integer</li>
<li>%s     --> String </li>
<li>%f, %g --> float</li>
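
As a rough sketch of how these codes combine, the cell below inserts a string, an integer and a float into one message. The values of `age` and `height` are made up, and `%.2f` is an assumed variation on `%f` that rounds the float to two decimal places. When more than one value is inserted, the values are grouped in brackets after the %.

```python
name = 'Alan'      # string
age = 41           # integer (made-up value)
height = 1.8349    # float (made-up value)

message = '%s is %d years old and %.2f m tall.' % (name, age, height)
print(message)
```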

Strings can also take certain **modifiers** (prefixes), which change how they are processed by the interpreter. Placing 'f' before a string allows expressions inside curly braces to be evaluated within it, and 'r' makes Python treat the string as a raw character sequence, so backslashes are kept as ordinary characters. These can be combined, e.g. 'rf'. Read more <a href="http://docs.python.org/reference/lexical_analysis.html#strings">here</a>. Don't worry too much about these for now.

In [18]:
name = 'Alan'

print(f'Good morning {name}, what did you have for breakfast today?')

Good morning Alan, what did you have for breakfast today?
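
The sketch below goes one step further with the 'f' and 'r' prefixes. The variable `cups` and the folder name are made-up values for illustration only.

```python
name = 'Alan'
cups = 2  # made-up value

# f-string: the expressions inside {} are evaluated
f_example = f'{name} had {cups + 1} cups of tea.'
# raw string: the backslash is kept as an ordinary character
r_example = r'C:\new_folder'
# the same text written as a normal string needs a doubled backslash
normal_example = 'C:\\new_folder'

print(f_example)
print(r_example)
print(r_example == normal_example)
```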


Another useful string operation is finding a string within a string. For example, you may want to find a specific keyword in a string. Here we want to find out if a patient received radiotherapy. The **.find()** method returns the index at which the searched-for string starts. The search stops at the first match, so only the index of the first instance is returned.

In [24]:
trt = 'The patient recieved radiotherapy as part of their treatment pathway.'

In [25]:
trt.find('radiotherapy')

21
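
Because only the first match is reported, finding a later occurrence needs a second call. One possible approach, sketched below, uses the optional second argument of **.find()**, which sets the position to start searching from. The names `first_pa` and `second_pa` are made up for illustration.

```python
trt = 'The patient recieved radiotherapy as part of their treatment pathway.'

first_pa = trt.find('pa')                  # first occurrence of 'pa'
second_pa = trt.find('pa', first_pa + 1)   # start searching just after it

print(first_pa, second_pa)
# Slice four characters from each position to see which word was found
print(trt[first_pa:first_pa + 4])
print(trt[second_pa:second_pa + 4])
```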

<font color = "skyblue"> 
Change the word in the function above and see what happens. What happens if you look for a word which isn't in the string? Is the find() function case sensitive?
</font> 

You can also split a string based on a given **delimiter**, i.e., the character(s) at which you want to split the string. For example, the **delimiter** could be a space, a comma, a tab, or any other character. (Try splitting on 'a' or 'e'.)
<br/><br/>
<i> Note that this function will return a **list**; we will cover lists in the following notebook.</i>

In [50]:
words = trt.split(' ')
print(words)



['The', 'patient', 'recieved', 'radiotherapy', 'as', 'part', 'of', 'their', 'treatment', 'pathway.']
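
As a further sketch, the cell below splits a made-up comma-separated record, and then calls **.split()** with no argument, which splits on any run of whitespace. The strings and variable names here are assumptions for illustration.

```python
# Hypothetical comma-separated record
record = 'radiotherapy,chemotherapy,surgery'

treatments = record.split(',')
print(treatments)
print(len(treatments))

# With no argument, split() treats any run of whitespace as one delimiter
spaced = 'one   two  three'
pieces = spaced.split()
print(pieces)
```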


We can also do the opposite and **.join()** a list of strings into a single string. We will apply this to our variable <i>words</i> above to recreate the original string.

In [52]:
joined_string = ' '.join(words)
print(joined_string)

The patient recieved radiotherapy as part of their treatment pathway.


<font color = "skyblue"> 
The ' ' before .join() tells the function what to put between the entries in the list. Try changing it: you can put (almost) anything, or nothing, between the quotes.
</font>

In [58]:
joined_string2 = ' '.join(words)
print(joined_string2)

The patient recieved radiotherapy as part of their treatment pathway.


Another useful function for strings is **.replace()**. As the name suggests, this will allow us to replace one part of a string with another string. Here we will put the section of string we want to replace first, followed by the new string to go in its place. 

In Python, strings are immutable, meaning they cannot be changed once created. However, the variable name (their label) can be reassigned to a new string.

For example, if our patient changed from radiotherapy to chemotherapy we could reassign the variable 'trt' as follows: 

In [30]:
trt = 'The patient recieved radiotherapy as part of their treatment pathway.'
trt = trt.replace('radiotherapy', 'chemotherapy')
print(trt)

The patient recieved chemotherapy as part of their treatment pathway.
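
The sketch below illustrates what immutability means in practice: **.replace()** builds a new string and leaves the original untouched, while trying to change a single character in place raises an error. The names `new_trt` and `assignment_failed` are made up, and `try`/`except` is used only so that the cell runs to the end.

```python
trt = 'The patient recieved radiotherapy as part of their treatment pathway.'

# replace() returns a new string; the original is left untouched
new_trt = trt.replace('radiotherapy', 'chemotherapy')
print(trt)
print(new_trt)

# Assigning to a single position is not allowed on a string
assignment_failed = False
try:
    trt[0] = 't'
except TypeError as err:
    assignment_failed = True
    print('TypeError:', err)
```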


<font color = "skyblue"> 
There are many more string operations for you to discover. Try some of the following lines and see what happens. Also spend a little time looking at online resources to see what else you can do with strings. One resource to start with is the Codecademy cheat sheet for Python strings here: <br/><br/>
https://www.codecademy.com/learn/learn-python-3/modules/learn-python3-strings/cheatsheet

</font>

In [73]:
txt = 'radiotherapy'
print(txt.lower())  # changes all letters to lower case
print(txt.upper())  # changes all letters to upper case
print(txt.isalpha()) # checks if text contains only letters
print(txt.isdigit()) # checks if text contains only digits
print(txt.isspace()) # checks if text contains only whitespace
print(txt.startswith('r')) # checks if a string starts with the given text
print(txt.endswith('r')) # checks if a string ends with the given text

radiotherapy
RADIOTHERAPY
True
False
False
True
False


We can also use **escape characters** within strings. The most common are **\t** for a tab, **\n** for a new line and **\r** for a 'carriage return' (a term taken from old-style typewriters).

<font color = "skyblue"> 
Can you figure out what the carriage return is doing? Change the position of the **\r**

</font>

In [76]:
print('This is a \t tab.')
print('This is a \n new line.')
print('This is a carriage \r return.')

This is a 	 tab.
This is a 
 new line.
 return.a carriage 


## Part 4 - Logical operations on strings

We can also use boolean logic to test whether a string is within another string. The keyword '**in**' checks whether one string is contained in another, and the modifier '**not**' negates the '**in**' keyword.
<br/><br/>
Read the lines of code below and see if you can predict the output (**True or False**) before you run each in turn.

In [77]:
test_string = 'Hi, how are you today?'

In [78]:
'you' in test_string

True

In [79]:
'me' in test_string

False

In [80]:
'me' not in test_string

True

In [81]:
'you' not in test_string

False
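
Like other string comparisons, '**in**' is case sensitive. One possible way around this, sketched below, is to convert the text to lower case with **.lower()** before testing. The names `exact_match` and `relaxed_match` are made up for illustration.

```python
test_string = 'Hi, how are you today?'

# 'in' is case sensitive, so 'hi' does not match 'Hi'
exact_match = 'hi' in test_string
# Lower-casing the text first gives a case-insensitive test
relaxed_match = 'hi' in test_string.lower()

print(exact_match)
print(relaxed_match)
```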

****

That's all for string manipulations! See you in the next notebook where we'll talk about data structures!