# Introduction to Jupyter Notebooks and Text Processing in Python
This 'document' is a Jupyter notebook. It allows you to combine explanatory **text** and **code** that executes to produce results you can see on the same page.

## Notebook Basics

### Text cells

The box this text is written in is called a *cell*. It is a *text cell* written in a very simple markup language called 'Markdown'. Here is a useful [Markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). You can edit and then run cells to produce a result. Running this text cell produces formatted text.

### Code cells

The other main kind of cell is a *code cell*. The cell immediately below this one is a code cell. Running a code cell runs the code in the cell and produces a result.

In [46]:
# This is a comment in a code cell. Comments start with a # symbol. They are ignored and do not do anything.

In [47]:
# This box is a code cell. When this cell is run, the code below will execute and produce a result
3 + 4

7

## Simple String Manipulation in Python
This section introduces some very basic things you can do in Python to create and manipulate *strings*. A string is a simple sequence of characters, like `flabbergast`. This introduction is limited to those things that may be useful to know in order to understand the *Bughunt!* data mining in the following two notebooks.

### Creating and Storing Strings in Variables
Strings are simple to create in Python. You can simply write some characters in quote marks.

In [48]:
'Butterflies are important as pollinators.'

'Butterflies are important as pollinators.'

In order to do something useful with this string, other than print it out, we need to store it in a *variable* by using the assignment operator `=` (equals sign). Whatever is on the right-hand side of the `=` is stored in a variable with the name given on the left-hand side.

In [49]:
# my_variable is the variable on the left
# 'Butterflies are important as pollinators.' is the string on the right that is stored in the variable my_variable

my_variable = 'Butterflies are important as pollinators.'

Notice that nothing is printed to the screen. That's because the string has been stored in the variable `my_variable` rather than displayed. To see what is inside `my_variable`, we can simply write `my_variable` in a code cell and run it, and the interpreter will print its contents for us.

In [50]:
my_variable

'Butterflies are important as pollinators.'

### Manipulating Bits of Strings

#### Accessing Individual Characters
A string is just a sequence (or list) of characters. You can access **individual characters** in a string by giving the position (the *index*) of the one you want in square brackets. If you want the first character, you might try specifying `1`:

In [51]:
my_variable[1]

'u'

Hang on a minute! Why did it give us `u` instead of `B`?

In Python, as in many programming languages, sequences are *zero indexed*, which means that positions are counted from 0 rather than 1. Thus, in the example above, `1` gives us the *second* character in the string.

If you want the first character in the string, you need to specify the index `0`! 

In [52]:
my_variable[0]

'B'
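As a small illustrative sketch of what zero indexing implies for the *end* of a string: the last character sits at an index one less than the string's length. The example below uses Python's built-in `len()` function and a negative index, neither of which is covered elsewhere in this notebook; the variable names `length`, `last_character` and `also_last_character` are made up for this example.

```python
# Same example string as in the cells above.
my_variable = 'Butterflies are important as pollinators.'

# len() counts how many characters the string contains.
length = len(my_variable)

# Because counting starts at 0, the last character is at index length - 1.
last_character = my_variable[length - 1]

# A negative index counts backwards from the end, so -1 is also the last character.
also_last_character = my_variable[-1]

print(length)               # 41
print(last_character)       # .
print(also_last_character)  # .
```

Asking for `my_variable[41]` would raise an `IndexError`, because the valid indexes for this string run from `0` to `40`.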

#### Accessing a Range of Characters

You can also pick out a **range of characters** from within a string, by giving the *start index* followed by the *end index* with a colon (`:`) in between.

The example below gives us the character at index `0` all the way up to, *but not including*, the character at index `20`.

In [53]:
my_variable[0:20]

'Butterflies are impo'
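The following sketch shows a few variations on the same idea, on the assumption that they may be handy later: leaving out the start index means "from the beginning", and leaving out the end index means "to the end". The variable names `first_word`, `second_word` and `last_part` are invented for this example.

```python
# Same example string as in the cells above.
my_variable = 'Butterflies are important as pollinators.'

# Leaving out the start index is the same as starting at 0.
first_word = my_variable[:11]

# A range in the middle: indexes 12, 13 and 14 (15 is not included).
second_word = my_variable[12:15]

# Leaving out the end index runs to the end of the string.
last_part = my_variable[29:]

print(first_word)   # Butterflies
print(second_word)  # are
print(last_part)    # pollinators.
```

Because the end index is not included, the number of characters you get back is simply the end index minus the start index: `my_variable[12:15]` contains 15 - 12 = 3 characters.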

### Changing Whole Strings with Functions
Python strings come with some built-in *functions* (strictly speaking *methods*, which are written after the string with a dot) that transform a whole string at once. You can change all characters to lowercase or uppercase:

In [54]:
my_variable.lower()

'butterflies are important as pollinators.'

In [55]:
my_variable.upper()

'BUTTERFLIES ARE IMPORTANT AS POLLINATORS.'

NB: These functions do not change the original string but create a new one. Our original string is still the same as it was before:

In [61]:
my_variable

'Butterflies are important as pollinators.'
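If you want to keep the changed version, one option is to store the result in a variable of its own. This is only a minimal sketch; the name `upper_version` is chosen for illustration and does not appear elsewhere in these notebooks.

```python
# Same example string as in the cells above.
my_variable = 'Butterflies are important as pollinators.'

# .upper() returns a new string, which we store under a new name.
upper_version = my_variable.upper()

print(upper_version)  # BUTTERFLIES ARE IMPORTANT AS POLLINATORS.
print(my_variable)    # Butterflies are important as pollinators.
```

Using a separate variable name leaves `my_variable` untouched, so later cells that rely on the original string keep working.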

### Testing Strings

You can also test a string to see if it passes some test, e.g. does the string consist of alphabetic characters only?

In [57]:
my_variable.isalpha()

False

Does the string have the letter `p` in it?

In [60]:
'p' in my_variable

True
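The sketch below tries to show why the two tests above behave as they do. It assumes a second, made-up string `single_word` for comparison. `.isalpha()` is only `True` when *every* character is a letter, so the spaces and the full stop in our sentence make it `False`. The `in` test is case-sensitive and also works with longer pieces of text, not just single letters.

```python
# Same example string as in the cells above.
my_variable = 'Butterflies are important as pollinators.'

# A hypothetical string made up of letters only, for comparison.
single_word = 'Butterflies'

print(single_word.isalpha())  # True: every character is a letter
print(my_variable.isalpha())  # False: spaces and '.' are not letters

# 'in' is case-sensitive: the string has an uppercase 'B' but no lowercase 'b'.
print('B' in my_variable)          # True
print('b' in my_variable)          # False
print('b' in my_variable.lower())  # True: lowercasing first ignores case

# 'in' also accepts a whole word or phrase.
print('pollinators' in my_variable)  # True
```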

### Lists of Strings
Another important thing we can do with strings is to create a list of strings, by writing them inside square brackets `[]` separated by commas:

In [65]:
my_list = ['Butterflies are important as pollinators',
           'Butterflies feed primarily on nectar from flowers',
           'Butterflies are widely used in objects of art']
my_list

['Butterflies are important as pollinators',
 'Butterflies feed primarily on nectar from flowers',
 'Butterflies are widely used in objects of art']

### Manipulating Lists of Strings
Just like with strings, we can access individual items inside a list by index number:

In [70]:
my_list[0]

'Butterflies are important as pollinators'

And we can access a range of items inside a list by *slicing*:

In [69]:
my_list[0:2]

['Butterflies are important as pollinators',
 'Butterflies feed primarily on nectar from flowers']
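Since each item in the list is itself a string, the two kinds of indexing can be combined. The following is a small sketch of that idea; the names `first_item`, `first_letter`, `same_letter` and `word` are invented for this example.

```python
# Same example list as in the cells above.
my_list = ['Butterflies are important as pollinators',
           'Butterflies feed primarily on nectar from flowers',
           'Butterflies are widely used in objects of art']

# Step 1: pick a string out of the list.
first_item = my_list[0]

# Step 2: pick a character out of that string.
first_letter = first_item[0]

# Both steps can be written in one go: list index first, then string index.
same_letter = my_list[0][0]

# The same works with a range: characters 12 to 15 of the second string.
word = my_list[1][12:16]

print(first_letter)  # B
print(same_letter)   # B
print(word)          # feed
```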

### Advanced: Creating Lists of Strings with List Comprehensions
We can create new lists in an elegant way by combining some of the things we have covered above. Here is an example where we have taken our original list `my_list` and created a new list `new_list` by going over each string in the list:

In [72]:
new_list = [string for string in my_list]
new_list

['Butterflies are important as pollinators',
 'Butterflies feed primarily on nectar from flowers',
 'Butterflies are widely used in objects of art']

Why do this? If we combine it with a test, we can have a list that only contains strings with the letter `p` in them:

In [74]:
new_list_p = [string for string in my_list if 'p' in string]
new_list_p

['Butterflies are important as pollinators',
 'Butterflies feed primarily on nectar from flowers']

This is a very powerful way to quickly create lists. We can even change all the strings to uppercase at the same time!

In [75]:
new_list_p_upper = [string.upper() for string in my_list if 'p' in string]
new_list_p_upper

['BUTTERFLIES ARE IMPORTANT AS POLLINATORS',
 'BUTTERFLIES FEED PRIMARILY ON NECTAR FROM FLOWERS']
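If the one-line form looks cryptic, it may help to see the same steps written out in full. The sketch below uses a `for` loop and the list method `.append()`, which are not otherwise covered in this notebook, to build what should be the same list as the last comprehension above. The name `long_form_list` is made up for this example.

```python
# Same example list as in the cells above.
my_list = ['Butterflies are important as pollinators',
           'Butterflies feed primarily on nectar from flowers',
           'Butterflies are widely used in objects of art']

# Start with an empty list and fill it one string at a time.
long_form_list = []
for string in my_list:                 # go over each string in the list
    if 'p' in string:                  # keep only strings containing 'p'
        long_form_list.append(string.upper())  # add the uppercase version

# The list comprehension does the same work in a single line.
new_list_p_upper = [string.upper() for string in my_list if 'p' in string]

print(long_form_list == new_list_p_upper)  # True
```

Reading the comprehension from the middle outwards mirrors the loop: `for string in my_list` goes over the list, `if 'p' in string` filters it, and `string.upper()` at the front says what to put in the new list.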