# Introduction to Python 

This tutorial is intended to get you started with coding in Python. By the end you should be able to:

* Use basic commands like `print` to see what the code is doing
* Explain coding concepts like variables, loops and if/else tests - and explain how those can be used in journalistic applications like scraping
* Identify different types of data: numerical (integers and floats), text (strings), lists and dictionaries

Let's get started.


## Programming environments

To use Python you need to be using a computer on which Python has been installed. There are two ways of doing this:

* Use a **web-based environment** which allows you to run Python on a remote computer where it is already installed (also called **server-based**)
* Install Python on your own computer (what's called **client-based**). 

Web-based environments include:

* [Google's Colab notebooks](http://colab.research.google.com/), which you can access and share through your own Google Drive account
* [Execute Python Online](https://github.com/paulbradshaw/python_demo/blob/master/executepythononline.md)
* [Try Jupyter](https://github.com/paulbradshaw/python_demo/blob/master/tryjupyter.md) (a good way to get used to notebooks - but code is stored only temporarily)
* [PythonAnywhere](https://github.com/paulbradshaw/python_demo/blob/master/pythonanywhere.md)
* [Trinket](https://trinket.io/) (which could be a way to allow people to play with your code as you can embed both code and results in an iframe), and 
* [Repl.it](https://repl.it/).

If you want to run Python "locally" (i.e. on your own computer) the easiest way to install Python is to [install Anaconda](https://www.anaconda.com/download/#macos). This not only installs Python but also useful tools like **Jupyter Notebook** (a way of making your code accessible to others - you are reading a notebook right now) and **Spyder** (a coding 'environment' that allows you to see what you're working with, similar to RStudio). [You can find a video tutorial on installing Python and Jupyter Notebook here](https://www.lynda.com/Software-Development-tutorials/Installing-Python-Jupyter-Notebook/576698/605448-4.html). 



## A brief introduction to Google Colab

[Google's Colab notebooks](http://colab.research.google.com/) allow you to use Python inside Google Drive. Start there.

If you are logged into your Google account when you go to [colab.research.google.com](https://colab.research.google.com) a window will open showing any notebooks, with a **NEW NOTEBOOK** option in the bottom right corner. Click that.

A new notebook will be created, with the name 'Untitled0.ipynb'

Click on that to rename it. Keep the `.ipynb` extension - this marks the file as a notebook (the format was originally called an IPython notebook).

A Colab notebook has, broadly, 3 sections:

* The top of the screen has your menus: File, Edit, View and so on. 
* The main part of the screen is the notebook itself: to start with it has an empty code block at the top; and above that you should see a **+ Code** and a **+ Text** button that can be clicked on to create new blocks of code or text. 
* The left area has three icons: a table of contents button; a 'code snippets' button; and a Files button. Clicking on any of these expands that left hand area to show you more. These will become more useful as you start to edit your notebook and run code.

Click inside the empty code block and start typing. Try typing this code:

In [None]:
#This is a print statement
print("Hello World!")

Hello World!



The first line is a **comment**. Comments start with a hash symbol (`#`) and do not do anything, apart from describe what comes after. They are useful for explaining what the code is doing, both for others and for yourself when you return to it later. 

They can also be used to turn code 'off' temporarily by 'commenting out' a line of code (the line can then be turned back 'on' by removing the comment, or 'uncommenting').

The second line is a **print** command. The word `print` tells Python to *display* anything that comes after it. In this case, a **string** of text: `"Hello World!"`

If you click the play button (to "execute" the code), those two lines of code will run, and that string of text will be displayed in the output window below.

## 'Printing'

When a computer runs code, you cannot see what it is doing unless you ask it to show you. 

In Python, you ask it to show you by using the `print` command. This doesn't actually print things to a physical printer, but instead *displays* information in an area (normally called the **console**: in a notebook that area is immediately under the code block). 

*Note: some online Python services, such as Execute Python Online, use an earlier version of Python which doesn't require parentheses after `print` but from version 3.0 onwards they need to be included.*

## Variables: storing and manipulating information

A variable is a way of storing and manipulating information in programming. In journalism, for example, you will need to store any URLs that you want to scrape, or page numbers that you want to cycle through, or numbers that you want to calculate, or text that you want to search.

If you've ever written a spreadsheet formula that uses a cell reference like A2 or C10 then you've already used a form of variable. A formula like `=A2*10` or `=FIND("Smith",A2)` produces different results depending on the value in A2. That makes it more flexible: instead of having to type a new formula each time you want to multiply a number by 10 or find a word, you only have to change one thing: the value in A2.

In Python you create a variable by specifying a name for the variable, followed by the `=` operator, and the value that you want to store in that variable.

Type two lines to create two different variables:

In [None]:
mynewvariable = "Paul"
myage = 18

Now test that the variables have been created by using the `print` command to show the variable in the next line:

In [None]:
print(mynewvariable)

Paul


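The name of the variable can be almost anything. The main exceptions are Python's reserved words (`for`, `if` and `else`, for example): if you try to use one of these as a variable name you will get a syntax error. It is also best to avoid names that already belong to built-in functions such as `print`, because Python 3 will let you overwrite them without any warning.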

A variable name cannot have spaces (use underscores instead).
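
As a quick illustration (the names below are invented for this example), underscores keep multi-word variable names readable:

```python
# Underscores stand in for the spaces a variable name cannot contain
page_number = 1
first_name = "Paul"
print(page_number)
print(first_name)
```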

There are different *types* of variable in Python. The main ones to know are:

* **Strings** (any text, like "Hello World")
* **Integers** (whole numbers, like 15)
* **Floats** (numbers with decimal places, like 13.0)
* **Booleans** (`True`/`False` values)
* **Lists** (a series of items. Those items can be numbers, strings, or any of the other types - including dictionaries and lists)
* **Dictionaries** (key-value pairs, like "name":"paul" - typically more than one)

Create one of each yourself, like so:

In [None]:
hername = "Jane"
herage = 18
hertemp = 37.5
isfemale = True
herfriends = ["Paul","Sarah","Raneem","Alf"]
herassets = {"house":90000,"car":10000,"computer":1000,"phone":300}

Note the differences between each type of variable: numbers don't use quotation marks, but **strings** do use quotation marks (you can use single or double quotes). **Booleans** must use the word `True` or `False` with the first letter capitalised (`TRUE` or `true` will not work). Colour coding often helps indicate the type of data you have created.

**Lists** use square brackets, with a comma between each item. And **dictionaries** use curly brackets, with a comma between each *key-value pair*, and a colon to separate the key and value themselves.

## Checking a variable type

You can check a variable type by putting the name of the variable inside `type()` like so:

In [None]:
type(hername)

str

In [None]:
type(herage)

int

In [None]:
type(hertemp)

float

In [None]:
type(isfemale)

bool

In [None]:
type(herfriends)

list

In [None]:
type(herassets)

dict



### Key-value pairs

I should explain what this key-value business is about. The *key* is like a column heading in a spreadsheet: name, age, hometown. The *value* is like the cells underneath that column heading: "Paul", 18, "Birmingham". 

Dictionaries are useful in storing data. In a scraper, for example, you might grab one piece of information from a webpage and store it in a dictionary variable against a relevant key, then grab a different piece of information and store that against another key, and so on. If you get into scraping you'll get more experience with this.
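
Here is a minimal sketch of that idea. It uses the example values above rather than a real webpage, and the variable name `record` is an assumption made for illustration:

```python
# Start with one piece of information stored against the key "name"
record = {"name": "Paul"}
# Add further pieces of information, each against its own key
record["age"] = 18
record["hometown"] = "Birmingham"
print(record)
# Ask for one value by naming its key in square brackets
print(record["hometown"])
```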

Note that the list and dictionary can contain either strings or numbers, or both. For example:

```python
herlastfivescores = [10,20,6,9,14]
herbirthdate = {"year":2000, "month":"July", "date":27}
```

You can also have *empty* lists, dictionaries, or strings, like so:

```python
nothingtoseehere = ""
fillme = []
waitingforsomethingtohappen = {}
```

Lists and dictionaries can also contain their own lists or dictionaries. When you get lists within lists or dictionaries within dictionaries it can look a bit confusing at first, but you get used to it.
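
To make that less abstract, here is a small sketch of a list that contains dictionaries, each of which contains a list. The names and values are made up for illustration:

```python
# A list of dictionaries: one dictionary per person
people = [
    {"name": "Paul", "scores": [10, 20, 6]},
    {"name": "Sarah", "scores": [9, 14]},
]
# Positions in a list are counted from 0, so people[0] is the first dictionary
first_person = people[0]
print(first_person["name"])
# Steps can be chained: second person, then their scores, then the first score
first_score_of_second = people[1]["scores"][0]
print(first_score_of_second)
```

Reading from left to right, each pair of square brackets takes you one level deeper into the structure.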

## Changing variables

Once created, variables can be used - and changed - by using the name again, like so:

In [None]:
mynewvariable = "Sarah"
print(mynewvariable)

Sarah


Here the value "Paul" has been replaced: assigning a new value to an existing variable overwrites the old one.

You can even use a variable to update itself:

In [None]:
print(myage)
myage = myage+1
print(myage)

18
19


Note that the code to the right of the `=` operator is executed *first*, so in the part `myage+1`, the variable `myage` is 18. 18+1 is 19, and that value is then **assigned** to the `myage` variable, overwriting its previous value of 18.

## Errors and data types

Errors are part and parcel of coding. It is a rare piece of code which doesn't generate some sort of error at some point - and solving those errors becomes part of the fun of coding.

Typically when you get an error you will also get information that helps you solve it. Here are some that you might get when creating variables...

First, if you try to use a word *without* quotation marks, the code will assume that it has a special meaning, like a variable, function or library (more on those below).

See what happens, for example, if we try to assign the value `Paula` *without* quotation marks to a variable:

In [None]:
hername = Paula

NameError: name 'Paula' is not defined

The error tells us that `'Paula' is not defined`. In other words, we have not *defined* a function or variable called Paula. That's because it's not a function or variable - it's a string, but we forgot to indicate that by putting it inside quotation marks.

Numbers do not need quotation marks because they are, well, numerical. And we want to treat them that way. Sometimes we *might* want to treat numbers as strings - for example if we are putting them into a URL, or adding them to a postcode or similar code. In those cases we would put them in quotation marks. But if we want to perform calculations like adding, multiplying, and so on, we don't use quotation marks.

If you try to combine a string with a number you will get an error like so:

In [None]:
combiningstuff = hername+herage

TypeError: Can't convert 'int' object to str implicitly

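That error tells you that it cannot convert an `'int'` (integer) to `str` (string) *implicitly*. In other words, without us being *explicit* that we want it to do so. (Newer versions of Python word this error differently - `can only concatenate str (not "int") to str` - but the cause is the same.)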

To be explicit, we would have to add an extra instruction to convert that number into a string. That instruction is `str()`: a function that converts numbers to strings. Here it is in action:

In [None]:
combiningstuff = hername+str(herage)
print(combiningstuff)

Jane18
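
The same approach covers the URL case mentioned earlier. This is only a sketch: the address is a made-up placeholder, and the variable names are assumptions for illustration.

```python
# A made-up base URL, used only for illustration
baseurl = "https://example.com/results?page="
pagenumber = 3
# str() turns the integer 3 into the string "3" so the two can be joined
fullurl = baseurl + str(pagenumber)
print(fullurl)
```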


## Loops

Loops are a way to perform repetitive actions in code. They are extremely useful for situations involving lists or ranges of numbers: for example you might loop through a list of items and check each one, or save it in a datastore; or loop through a range of numbers and add them to a URL to create a page number.

Here is an example:

In [None]:
for i in herfriends:
    print(i)

Paul
Sarah
Raneem
Alf


A loop has a number of parts which are worth breaking down:

* `for` and `in`
* the variable here called `i`
* the colon
* the indented code after the colon

The word `for` indicates that we want to begin looping. What we want to loop *through* is indicated by `in`. So in this case we want to loop through items in the list variable `herfriends`. I'll come back to `i` in a minute.

At the end of this line comes a colon, and that begins an indented section which contains the code we want to be executed for *each* item in that list. In this case the command `print(i)` will run 4 times - once for each of the 4 items in the list.

Now what about that `i`?

When we loop through a list we need some way to store each item while we're working with it. The word between `for` and `in` is a way of assigning that item to a variable - so `i` is a name for the item the loop is currently working with. The first time the loop runs, `i` is `"Paul"` (the first item in the list); the second time, `i` is `"Sarah"`, and so on. 

Within the indented code, then, we can use that `i` variable and do things with it, each time the loop runs. In other words, we can do things with *each item* in a list.

The choice of `i` is entirely arbitrary: we can use any name we want. It's quite common for people to use `i` in loops because it's short and also represents an 'item' in a 'list', but we can make other choices which are more meaningful. For example, if we wanted to loop through a list of names we might write code like this:

In [None]:
usernames = ["Paul", "Sarah", "Diane"]
for username in usernames:
    print("The user is "+username)

The user is Paul
The user is Sarah
The user is Diane
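
Loops work with ranges of numbers too, which is how you might generate a series of page URLs. The sketch below uses two things not covered above: the built-in `range` function, and `append`, which adds an item to the end of a list. The URL is a made-up placeholder.

```python
# A made-up base URL, used only for illustration
baseurl = "https://example.com/results?page="
pageurls = []
# range(1, 4) produces 1, 2 and 3: it stops before the second number
for pagenumber in range(1, 4):
    # Convert each number to a string, join it to the URL, and store the result
    pageurls.append(baseurl + str(pagenumber))
print(pageurls)
```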


## If/else tests

An if/else test in programming allows us to test if something is true or false, and then do different things depending on the answer. Here's an example:

In [None]:
userage = 30
if userage > 25:
    print("The user is over 25")
else:
    print("The user is 25 or younger")

The user is over 25


In the example above we store the number 30 in the variable `userage`. The first `if` test asks if that variable is above 25. Note that it ends in a colon and the indented code after that colon will only run if that test returns `True`.

If it does not return `True`, then it ignores the indented code and moves to the next line that begins `else`. Again, this ends in a colon and some indented code, which will now run.

In [None]:
userage = 21
if userage > 25:
    print("The user is over 25")
else:
    print("The user is 25 or younger")

The user is 25 or younger


These tests can be very useful in all sorts of contexts. For example you can use them to only run certain code if you know it is going to work (asking for the 5th item in a list, for example, means you may need to test that there are at least 5 items in that list).
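
A rough sketch of that list check might look like this. It assumes two things explained elsewhere: the `len` function, which counts the items in a list, and square brackets for fetching an item by its position. The variable `fifthfriend` is invented for the example.

```python
herfriends = ["Paul", "Sarah", "Raneem", "Alf"]
# Only ask for the 5th item if the list has at least 5 items
if len(herfriends) >= 5:
    # Positions are counted from 0, so the 5th item is at position 4
    fifthfriend = herfriends[4]
else:
    fifthfriend = "no fifth friend"
print(fifthfriend)
```

Without the test, asking for `herfriends[4]` in a four-item list would stop the code with an `IndexError`.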

In addition to `if` and `else` you can insert extra conditions using `elif` (else if) like so:

In [None]:
userage = 21
if userage > 25:
    print("The user is over 25")
elif userage <18:
    print("The user is below 18")
else:
    print("The user is between 18 and 25")

The user is between 18 and 25


Because these tests are likely to be used more than once, you may want to store them as a **function** so you don't have to write the same code over and over again.

## Functions

In order to do things with variables, it's likely that we will need **functions** of some sort. These are special words which perform some sort of action, such as calculating an average from a collection of numbers, or the biggest number in that collection, or measuring the number of characters in a string, or replacing certain characters.

We've already come across a few functions: `print`, `type` and `str`. The last of these converts a number to a string.

The function `str` is followed by parentheses containing what it needs to work properly: the object you want it to convert.

This is a defining feature of functions: they are always followed by parentheses, and those parentheses contain any *ingredients* that are needed for it to work. Some functions have 1 ingredient, some have 2 or more. A few have none, but they still need the parentheses.

The function `len`, for example, will tell you the length of a string. Here we use it to find out how long our variable is:

In [None]:
len(mynewvariable)

5

Some functions work in different ways with different types of variable. Use `len` with a list, for example, and it will tell you how many *items* are in that list.
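
For instance, here is `len` used on a list and then on a string (the variable names `friendcount` and `namelength` are just for illustration):

```python
herfriends = ["Paul", "Sarah", "Raneem", "Alf"]
# With a list, len counts the items
friendcount = len(herfriends)
print(friendcount)
# With a string, len counts the characters
namelength = len("Raneem")
print(namelength)
```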

The function `sum` will add up all numbers in a list. Below we first create a list variable, and then use the function on it:

In [None]:
mylist = [1,3]
sum(mylist)

4
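
Other built-in functions follow the same pattern. As a brief sketch, `max` and `min` return the biggest and smallest numbers in a list, here using the scores list from earlier (the names `highest` and `lowest` are assumptions for illustration):

```python
herlastfivescores = [10, 20, 6, 9, 14]
# max returns the biggest number in the list
highest = max(herlastfivescores)
# min returns the smallest
lowest = min(herlastfivescores)
print(highest)
print(lowest)
```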

A list can also refer to variables. Here for example we store the area (in millions of square kilometres) of a number of countries and regions in appropriately named variables. Then we store those in a *list*. Finally we use `sum` to calculate the total of that list, and print it.

In [None]:
#Is Africa really as big as this: https://twitter.com/simongerman600/status/944535955867881472
africa = 30.37
#Here are the rest based on quick Google searches
usa = 9.834
india = 3.287
china = 9.597
europe = 10.18
mexico = 1.964
japan = 0.377
countrieslist = [usa,india,china,europe,mexico,japan]
print(countrieslist)
allcountries = sum(countrieslist)
print(allcountries)
print(africa<allcountries)

[9.834, 3.287, 9.597, 10.18, 1.964, 0.377]
35.239
True


### Creating your own functions

Earlier I mentioned how if/else tests might be used over and over again, in which case you may want to save them as a user-defined function. These are functions created by you - the user - and they are created with the command `def` like so:

In [None]:
def myagetest(userage):
    if userage > 25:
        print("The user is over 25")
    elif userage <18:
        print("The user is below 18")
    else:
        print("The user is between 18 and 25")

The `def` is short for *define*. It is followed by:

* a *name* for your new function so that it can be used later. (When a function is used we talk about **calling** a function)
* some *parentheses* inside which... 
* you name the ingredients it needs to work. Basically you are creating variables. When someone calls the function and specifies any ingredients in the parentheses, those ingredients are assigned to these variables.
* a *colon* then, indented after that...
* the code that you want to run when the function is *called*

If you've specified any ingredients then chances are the code is going to do something with that variable.

Once you've **defined** a function like this you call it by using its name like so:

In [None]:
myagetest(22)

The user is between 18 and 25


What happens here is worth breaking down a little. 

Firstly, the code looks for a function called `myagetest` - it needs to have been defined *before* this line runs. 

Secondly, it **passes** the value `22` to that function, as an ingredient - an **argument**.

Now we look at the function as we defined it above. In the line `def myagetest(userage):` that ingredient is given the name `userage`. In other words, `22` is stored in a variable called `userage`, which can then be used by the code inside the function. 

This is what's called a **local variable**. In other words, it only exists within the scope of the function itself. Once the function is finished, the variable no longer exists. 

That variable is then tested by the code that comes next, and the relevant `print` command is executed.
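
You can see a local variable disappear for yourself. The sketch below uses a hypothetical function, `showage`, with its own ingredient name, `visitorage`, so that it cannot clash with anything created earlier. It also uses `try` and `except`, which are not covered in this tutorial: they let the code carry on when an error occurs instead of stopping.

```python
def showage(visitorage):
    # visitorage is a local variable: it exists only inside this function
    print(visitorage)

showage(22)

# Outside the function the name visitorage does not exist,
# so asking for it raises a NameError, which we catch here
try:
    print(visitorage)
    outcome = "visitorage exists"
except NameError:
    outcome = "visitorage is not defined outside the function"
print(outcome)
```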

*Note: Python comes with a number of built-in functions, but you can access more functions by importing **libraries**.*
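
As a small example of importing, the `statistics` library that comes with Python includes a `mean` function for calculating an average. Here it is used on the scores list from earlier (the variable name `averagescore` is an assumption for illustration):

```python
# statistics is part of Python's standard library, so nothing extra needs installing
import statistics

herlastfivescores = [10, 20, 6, 9, 14]
# Functions from a library are called as libraryname.functionname
averagescore = statistics.mean(herlastfivescores)
print(averagescore)
```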

## Saving your code

Try to save your code regularly by pressing CTRL+S or CMD+S. You can also export your notebook by selecting **File > Download .ipynb**. This can then be shared with others on GitHub etc.