# A Python (Jupyter) notebook

This is a Python notebook. It is a way of explaining what your code does (to yourself as well as others!), while also including working code that can be run again and again (by yourself as well as others). These used to be called Jupyter notebooks but are increasingly referred to more generically as just 'Python notebooks'.

The text parts (like this) are written in **Markdown**. These are then broken up with **code snippets**.

Let's start with the obligatory code to print the text string 'hello world'...

In [1]:
print("Hello world")

Hello world


Happy now? OK, let's move on to some basic programming concepts. These are:

* Variables
* Loops
* If/else tests
* Functions
* Libraries

## Variables: storing and manipulating information

A variable is a way of storing and manipulating information in programming. A typical example would be the user's name: in a piece of programming you might want to ask the user for their name, and then store it somehow to use later (in a welcome message, or a result at the end, for example). 

Another example would be the user's score in a quiz or game: this is a variable which you not only want to store, but also *change*: every right answer, for example, might add 10 points to their score variable.

If you've ever written a spreadsheet formula that uses a cell reference like A2 or C10 then you've already used a form of variable: a formula like `A2*10` can give different results depending on the value in A2, which makes it more flexible. Instead of having to type a new formula each time you want to multiply a number by 10, you only have to change one thing: the value in A2.

In Python you create a variable by specifying a name for the variable, followed by the `=` operator, and the value that you want to store in that variable, like this:

In [20]:
mynewvariable = "Paul"
myage = 18

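The name of the variable can be almost anything, with a few exceptions. It cannot have spaces (use underscores instead), it cannot start with a digit, and it cannot be one of Python's reserved keywords such as `for`, `if` or `def` - if you try to use one of these you will get a `SyntaxError`. The names of built-in functions like `print` are technically allowed, but avoid them: assigning a value to `print` replaces the function without any warning.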
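If you are unsure whether a name is usable, Python can check for you. The sketch below is only an illustration: it uses the standard `keyword` module and the string method `isidentifier()`, and the names being tested are made up for the example.

```python
import keyword

# Is the word reserved by Python? (reserved words cannot be variable names)
for_is_reserved = keyword.iskeyword("for")
myage_is_reserved = keyword.iskeyword("myage")
print(for_is_reserved)    # True
print(myage_is_reserved)  # False

# Is the text a validly formed name? (no spaces, no leading digit)
underscore_ok = "my_age".isidentifier()
space_ok = "my age".isidentifier()
digit_ok = "2nd_user".isidentifier()
print(underscore_ok)  # True
print(space_ok)       # False
print(digit_ok)       # False
```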

Once created, variables can then be used - and changed - by using the name, like so:

In [4]:
print(mynewvariable)

Paul


You can change a variable by assigning a new value in the same way:

In [5]:
mynewvariable = "Sarah"
print(mynewvariable)

Sarah


And you can even use a variable to update itself:

In [6]:
print(myage)
myage = myage+1
print(myage)

18
19


Note that the code to the right of the `=` operator is executed *first*, so in the part `myage+1`, the variable `myage` is 18. 18+1 is 19, and that value is then **assigned** to the `myage` variable, overwriting its previous value of 18.
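The quiz score mentioned earlier works in the same way. This is a minimal sketch, assuming a made-up variable called `score` and 10 points per right answer:

```python
score = 0           # hypothetical starting score
score = score + 10  # first right answer: 0+10 is assigned back to score
score = score + 10  # second right answer: 10+10 is assigned back to score
print(score)        # 20

# Python also has a shorthand that does the same thing
score += 10
print(score)        # 30
```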

### Types of variable

Note that the two values we used above - `"Paul"` and `18` - have an important difference: one uses quotation marks. This indicates that the value is a **string**, i.e. a series of characters. If you try to use a word without quotation marks, the code will assume that it has a special meaning, like a variable, function or library (more on those below).

See what happens, for example, if we try to assign the value `Paul` *without* quotation marks to a variable:

In [7]:
mynewvariable = Paul

NameError: name 'Paul' is not defined

The error tells us that `'Paul' is not defined`. In other words, we have not *defined* a function or variable called Paul. That's because it's not a function or variable - it's a string, but we forgot to indicate that by putting it inside quotation marks.

Numbers do not need quotation marks because they are, well, numerical. And we want to treat them that way. Sometimes we *might* want to treat numbers as strings - for example if we are putting them into a URL, or adding them to a postcode or similar code. In those cases we would put them in quotation marks. But if we want to perform calculations like adding, multiplying, and so on, we don't use quotation marks.

It's worth mentioning here that there is more than one type of number: **integers** are whole numbers, and **floats** are numbers with decimal points. 
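The difference is easiest to see by trying it. In the sketch below the variable names are invented for illustration; `type` is a built-in function that reports what kind of value something is, and `int` and `str` convert between numbers and strings.

```python
age_as_number = 18
age_as_text = "18"

number_result = age_as_number + 1   # a calculation
text_result = age_as_text + "1"     # joins two strings together
print(number_result)  # 19
print(text_result)    # 181

print(type(age_as_number))  # <class 'int'>
print(type(18.5))           # <class 'float'>
print(type(age_as_text))    # <class 'str'>

# Converting between the two
converted_number = int(age_as_text) + 1
converted_text = str(age_as_number) + "1"
print(converted_number)  # 19
print(converted_text)    # 181
```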

As well as strings, integers and floats, there are two other types of variable in Python worth mentioning here: **lists** and **dictionaries**.

A list is useful for storing multiple objects that you might want to pull items from, or measure. For example if you wanted to add together a series of numbers, you'd probably want to store them in a list in order to do that. Likewise, if you wanted to check whether a document contained any directors from a particular company, you'd probably want to store the names of those directors in a list too.

A list is created by using square brackets, with each item separated by a comma, like so:

In [11]:
mynewlist = [10,20,40]
myotherlist = ["Paul", "Sarah", "Diane"]

Lists are often used with **loops**, which we'll come on to.
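A few things you can do with a list without a loop are sketched below. The list of directors is invented for the example. Note that positions in a list are counted from zero, so the first item is at position `0`.

```python
directors = ["Paul", "Sarah", "Diane"]  # hypothetical list of directors

how_many = len(directors)          # measure the list
first = directors[0]               # pull out the first item
has_sarah = "Sarah" in directors   # check whether a name is in the list
print(how_many)   # 3
print(first)      # Paul
print(has_sarah)  # True

directors.append("Ahmed")          # add a new item to the end
print(directors)  # ['Paul', 'Sarah', 'Diane', 'Ahmed']
```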

The other type of variable - a **dictionary** - is a collection of *key-value pairs*, where each value is looked up by its key rather than by its position. It's easiest to demonstrate with an example:

In [None]:
mynewdict = {"name" : "Paul", "age" : 18, "hometown" : "Birmingham"}

The *key* is like a column heading in a spreadsheet: name, age, hometown. The *value* is like the cell underneath that column heading: "Paul", 18, "Birmingham". A **colon** is used to create each *key-value pair*, and commas are used to separate one pair from the next.

Dictionaries are useful in storing data. In a scraper, for example, you might grab one piece of information from a webpage and store it in a dictionary variable against a relevant key, then grab a different piece of information and store that against another key, and so on. If you get into scraping you'll get more experience with this.
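Storing and retrieving values by key might look something like this. It is a sketch only: the `occupation` key and its value are made up to show how a new piece of information is added.

```python
person = {"name": "Paul", "age": 18, "hometown": "Birmingham"}

# Look up a value by putting its key in square brackets
print(person["name"])  # Paul

# Store a new piece of information against a new key (hypothetical value)
person["occupation"] = "journalist"

# Change the value stored against an existing key
person["age"] = person["age"] + 1
print(person["age"])   # 19
print(len(person))     # 4
```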

## Loops

Loops are a way to perform repetitive actions in code. They are extremely useful for situations involving lists or ranges of numbers: for example you might loop through a list of items and check each one, or save it in a datastore; or loop through a range of numbers and add them to a URL to create a page number.

Here is an example:

In [12]:
for i in mynewlist:
    print(i)

10
20
40


A loop has a number of parts which are worth breaking down:

* `for` and `in`
* the variable here called `i`
* the colon
* the indented code after the colon

The word `for` indicates that we want to begin looping. What we want to loop *through* is indicated by `in`. So in this case we want to loop through items in the list variable `mynewlist`. I'll come back to `i` in a minute.

At the end of this line comes a colon, and that begins an indented section which contains the code we want to be executed for *each* item in that list. In this case the command `print(i)` will run 3 times - once for each of the 3 items in the list.

Now what about that `i`?

When we loop through a list we need some way to store each item while we're working with it. The word between `for` and `in` is a way of assigning that item to a variable - so `i` is a name for the item the loop is currently working with. The first time the loop runs, `i` is `10` (the first item in the list); the second time, `i` is `20`, and so on. 

Within the indented code, then, we can use that `i` variable and do things with it, each time the loop runs. In other words, we can do things with *each item* in a list.

The choice of `i` is entirely arbitrary: we can use any name we want. It's quite common for people to use `i` in loops because it's short and also represents an 'item' in a 'list', but we can make other choices which are more meaningful. For example, if we wanted to loop through a list of names we might write code like this:

In [13]:
usernames = ["Paul", "Sarah", "Diane"]
for username in usernames:
    print("The user is "+username)

The user is Paul
The user is Sarah
The user is Diane
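The other use mentioned at the start of this section - looping through a range of numbers to build page URLs - can be sketched with the built-in `range` function. The web address here is a placeholder, not a real site to scrape. Note that `range(1, 4)` stops *before* 4, so it produces 1, 2 and 3.

```python
base_url = "https://example.com/results?page="  # hypothetical address
page_urls = []

for page_number in range(1, 4):
    # str() turns the number into a string so it can be joined to the URL
    page_urls.append(base_url + str(page_number))

for url in page_urls:
    print(url)
```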


## If/else tests

An if/else test in programming allows us to test if something is true or false, and then do different things depending on the answer. Here's an example:

In [14]:
userage = 30
if userage > 25:
    print("The user is over 25")
else:
    print("The user is 25 or younger")

The user is over 25


In the example above we store the number 30 in the variable `userage`. The first `if` test asks if that variable is above 25. Note that it ends in a colon and the indented code after that colon will only run if that test returns `True`.

If it does not return `True`, then it ignores the indented code and moves to the next line that begins `else`. Again, this ends in a colon and some indented code, which will now run.

In [16]:
userage = 21
if userage > 25:
    print("The user is over 25")
else:
    print("The user is 25 or younger")

The user is 25 or younger


These tests can be very useful in all sorts of contexts. For example, you can use them to run certain code only if you know it is going to work (asking for the 5th item in a list, for example, means you may need to test that there are at least 5 items in that list).
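That 5th-item check could be written roughly as follows. The list and the messages are invented for the example; remember that the 5th item sits at position `4` because counting starts at zero.

```python
scores = [10, 20, 40]  # hypothetical list with only 3 items

if len(scores) >= 5:
    # Safe to ask for the 5th item
    message = "The 5th item is " + str(scores[4])
else:
    message = "The list has fewer than 5 items"

print(message)  # The list has fewer than 5 items
```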

In addition to `if` and `else` you can insert extra conditions using `elif` (else if) like so:

In [17]:
userage = 21
if userage > 25:
    print("The user is over 25")
elif userage < 18:
    print("The user is below 18")
else:
    print("The user is between 18 and 25")

The user is between 18 and 25


Because these tests are likely to be used more than once, you may want to store them as a **function** so you don't have to write the same code over and over again.

## Functions

In order to do things with variables, it's likely that we will need **functions** of some sort. These are special words which perform some sort of action, such as calculating an average from a collection of numbers, finding the biggest number in that collection, measuring the number of characters in a string, or replacing certain characters.

We've already come across one function: `print`. This is used to display something in the console. We used it with the string "hello world" and with variables too.

The function `print` is followed by parentheses containing what it needs to work properly: the object you want it to print.

This is a defining feature of functions: they are always followed by parentheses, and those parentheses contain any *ingredients* that are needed for it to work. Some functions have 1 ingredient, some have 2 or more. A few have none, but they still need the parentheses.

The function `len`, for example, will tell you the length of a string. Here we use it to find out how long our variable is:

In [10]:
len(mynewvariable)

5

The function `sum` will add up all numbers in a list. Below we first create a list variable, and then use the function on it:

In [16]:
mylist = [1,3]
sum(mylist)

4

A list can also refer to variables. Here, for example, we store the areas (in millions of square kilometres) of a number of countries and regions in appropriately named variables. Then we store those in a *list*. Finally we use `sum` to calculate the total of that list, print it, and test whether Africa's area is smaller than that total.

In [8]:
# Is Africa really as big as this: https://twitter.com/simongerman600/status/944535955867881472
africa = 30.37
# Here are the rest based on quick Google searches
usa = 9.834
india = 3.287
china = 9.597
europe = 10.18
mexico = 1.964
japan = 0.377
countrieslist = [usa,india,china,europe,mexico,japan]
print(countrieslist)
allcountries = sum(countrieslist)
print(allcountries)
print(africa<allcountries)

[9.834, 3.287, 9.597, 10.18, 1.964, 0.377]
35.239
True
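The other actions listed at the start of this section can be sketched in the same way. The values below are invented for illustration. `max` and `round` are built-in functions (`round` takes 2 ingredients), while `replace` is strictly a *method*: a function attached to a string and called with a dot.

```python
numbers = [10, 20, 40]

biggest = max(numbers)                  # the biggest number in the list
average = sum(numbers) / len(numbers)   # total divided by number of items
rounded = round(average, 1)             # 2 ingredients: the number, and decimal places
print(biggest)  # 40
print(rounded)  # 23.3

greeting = "hello world"
new_greeting = greeting.replace("world", "Paul")  # swap one set of characters for another
print(new_greeting)  # hello Paul
```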


### Creating your own functions

Earlier I mentioned how if/else tests might be used over and over again, in which case you may want to save them as a user-defined function. These are functions created by you - the user - and they are created with the command `def` like so:

In [18]:
def myagetest(userage):
    if userage > 25:
        print("The user is over 25")
    elif userage < 18:
        print("The user is below 18")
    else:
        print("The user is between 18 and 25")

The `def` is short for *define*. It needs to be followed by a *name* for your new function (so that it can be used - called - later), then some *parentheses* containing any ingredients it needs to work, followed by a *colon*.

As you might guess, after the colon comes some indented code. That indented code is the code you want to run when the function is *called*. If you've specified any ingredients then chances are the code is going to do something with them.

Once you've **defined** a function like this you call it by using its name like so:

In [19]:
myagetest(22)

The user is between 18 and 25


What happens here is worth breaking down a little. 

Firstly, the code looks for a function called `myagetest` - it needs to have been defined *before* this line runs. 

Secondly, it *passes* the value `22` to that function, as an ingredient - an *argument*.

Now we look at the function as we defined it above. In the line `def myagetest(userage):` that ingredient is given the name `userage`. In other words, `22` is stored in a variable called `userage`, which can then be used by the code inside the function. 

This is what's called a **local variable**. In other words, it only exists within the scope of the function itself. Once the function is finished, the variable no longer exists. 

That variable is then tested by the code that comes next, and the relevant `print` command is executed.
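You can see the local variable disappear for yourself. The sketch below uses a made-up function and a made-up name, `visitor_age`, that does not exist anywhere else. It also uses `try` and `except`, which this notebook has not covered: here they simply catch the error so the code keeps running.

```python
def agetest_demo(visitor_age):
    # visitor_age is a local variable: it only exists while the function runs
    if visitor_age > 25:
        print("The user is over 25")
    elif visitor_age < 18:
        print("The user is below 18")
    else:
        print("The user is between 18 and 25")

agetest_demo(22)

# Outside the function, the local variable is gone
try:
    print(visitor_age)
    exists_outside = True
except NameError:
    exists_outside = False
    print("visitor_age is not defined outside the function")
```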

### Using functions from other libraries

Python comes with a number of built-in functions, but you can access more functions by importing **libraries**. Using libraries in Python notebooks is a bit more complex than using them in environments like Morph.io, so I've [covered that in a separate notebook]().
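For libraries that come with Python itself, importing usually takes a single line. This is a minimal sketch using the standard `math` and `statistics` libraries, which are my choice for illustration. Functions from a library are called by writing the library name, a dot, and then the function name.

```python
import math
import statistics

root = math.sqrt(16)                        # square root, from the math library
middle = statistics.median([10, 20, 40])    # middle value, from the statistics library
print(root)    # 4.0
print(middle)  # 20
```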