<a href="https://colab.research.google.com/github/scottlynn73/python_training/blob/main/Python_programming_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Python programming- Strathclyde University, March 2024

Please get this file too.

https://colab.research.google.com/drive/1BPB_mdsgWHbGf8SgHn8DC9cg9JZGLlP_?usp=sharing


This 'notebook' is a combination of text, explanations of key concepts, executable code, and some real world case studies that will hopefully show you that Python programming is worth pursuing as part of your studies.

**I have deliberately made it practical as that's the best way to learn, by doing!**

First I will provide some nice additional learning resources you might take a look at if you want to take this further.



## Basics of working with this notebook file

This is a `Jupyter Notebook` - this is the tool we're using to write and execute code. You can also combine code with text describing what the code is doing. Some people write entire books and reports with this technology!

Say you want to do a simple calculation adding two integers, but you want to explain what these represent so that its easy to follow later, or for a colleague or academic supervisor to understand your thinking.

**You would do it like this**

* You will need a 'Text' cell for your text explanation.
* You will need a 'Code' cell for your python code

In this notebook, we can hover the cursor just under the cell and click `+ Text` and `+ Code` to create the two cells we need.

## Data types/structures in Python
The main ones I use are:
* Strings (`e.g. "Today is the 21st of March"`)

* Integers (`e.g. 1, 2, 3, 4`)

* Floats (`e.g. 1.2, 1.4, 1.43893939`)

* Lists (`e.g. ["Ball", "Bat", "Table", 1, 1.32345]`) (in Python we use square brackets to contain the items in the list).

* Dataframes (e.g. a 2D table with columns and rows, like a spreadsheet)

| Item         | Price |  Stock     | Stock value |
|--------------|:-----:|-----------:|------------:|
| Apples       |  1.99 |        739 | To be calculated... |
| Bananas      |  1.89 |          6 | |
| Pears        |  2.29 |        158 | |

Dataframes are good because you can calculate new columns very easily. Say we wanted to calculate the Stock value column in a dataframe called "shop_df", and at the same time add 20% to the value, we would do this:

`shop_df['Stock_value'] = shop_df['Price'] * shop_df['Stock'] * 1.20`

Our dataframe Stock_value column would now have the values from the calculation

| Item         | Price |  Stock     | Stock value |
|--------------|:-----:|-----------:|------------:|
| Apples       |  1.99 |        739 | 1764.7 |
| Bananas      |  1.89 |          6 | 13.6   |
| Pears        |  2.29 |        158 | 434.2  |
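
As a rough sketch, the table and the calculation above could be put together like this. Building the dataframe by hand with `pd.DataFrame` and rounding to one decimal place are my own choices for illustration; in practice the data would usually be read from a file.

```python
import pandas as pd

# Build the example shop table by hand (values copied from the table above).
shop_df = pd.DataFrame({
    "Item": ["Apples", "Bananas", "Pears"],
    "Price": [1.99, 1.89, 2.29],
    "Stock": [739, 6, 158],
})

# New column: price x stock, with 20% added on top.
shop_df["Stock_value"] = shop_df["Price"] * shop_df["Stock"] * 1.20

# Round for display only (an assumption, so the output matches the table).
print(shop_df.round({"Stock_value": 1}))
```

The whole calculation lives in one visible line, which is the point being made about dataframes here.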

This is very powerful for some key reasons:

1) **The calculation is clear and visible in a single location**, unlike Excel, where formulas are hidden inside cells, which makes them hard to validate.

2) When you have a lot of rows and columns, it is **orders of magnitude faster than spreadsheets**. Millions of rows can be handled (try asking Excel for the max, min, mean and median of 1 million rows of data!)

3) It is very **easy to generate statistics and plots**, which we will do later. You can decide on what stats you want, and how you want a plot to look, and reuse those same 'templates' over and over. This kind of automation is much harder to set up in Excel.
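
To give a feel for point 2, here is a minimal sketch that summarises one million values in a single call. The data is random and made up purely for illustration (the seed, mean and spread are arbitrary assumptions).

```python
import numpy as np
import pandas as pd

# One million made-up values (random data, purely for illustration).
rng = np.random.default_rng(seed=0)
values = pd.Series(rng.normal(loc=100, scale=15, size=1_000_000))

# describe() reports count, mean, std, min, quartiles and max;
# the "50%" row is the median.
summary = values.describe()
print(summary)
```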

Later we will use a library called pandas, which is arguably the most powerful package in Python for data science. A real example we'll go through will show you some of the things you can do with it.

## What's good about Jupyter Notebooks for running Python?
Jupyter notebooks are great for **storing reusable workflows** that you can share with colleagues or supervisors.

Say you put together 100 discrete steps that take a raw dataset, transform it in some way, add new columns, do calculations, make plots. The chances of you remembering those steps (and crucially WHY you're doing them) in a few months/years is pretty slim.

Notebooks allow you to show the workflow in a linear manner, with commentary to guide you or whoever is reading it. If you document it properly, you end up with a completely reproducible 'living' document that can be run again in the future.

Compare this with Excel, where a lot of decisions can be made in a spreadsheet and never recorded anywhere, making it very hard to retrace your own steps. In Python you will chain together a series of simple steps in a clear, logical manner. In Excel the same steps would be spread through a bunch of functions, nested IF statements and so on (yikes).

For example, imagine I gave you a .csv file with 200,000 rows in it, and said `"can you get rid of any rows where the 'Sales' column is blank, convert the "Cost" column from $ to £, and write out a new .csv with the changes. There are some values of "Cost" that were recorded wrongly as "666", convert them to the mean of all values in the "Cost" column. Also calculate the max, min, mean, 25%ile, 50%ile, 75%ile of all the columns, and make a few explanatory plots. Oh and could you make different output files for all the different territories in the "Location" column."`

In Excel, this is several hours of **unbelievably tedious** work. In Python (with the pandas library) it's 10 minutes and about 10 lines of code, and you get to iterate and change any part of the workflow, and redo the entire thing in seconds. If the boss adds yet another requirement, you add another line of code. **We'll do some of this trickery later**.
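
Here is a rough sketch of what such a workflow might look like in pandas, using a tiny made-up table in place of the 200,000-row file. The exchange rate, the data values and the temporary output folder are all invented for illustration, and I have assumed the replacement mean should be taken over the correctly recorded costs only. Plots are left out to keep it short.

```python
import tempfile
from pathlib import Path

import pandas as pd

USD_TO_GBP = 0.79  # assumed exchange rate, for illustration only

# Tiny made-up stand-in for the big .csv file.
df = pd.DataFrame({
    "Location": ["UK", "UK", "France", "France", "Spain"],
    "Sales": [10, None, 7, 3, 5],
    "Cost": [100.0, 50.0, 666.0, 200.0, 300.0],
})

# 1. Get rid of rows where 'Sales' is blank.
df = df.dropna(subset=["Sales"])

# 2. Replace the wrongly recorded 666 values with the mean of the valid costs.
bad = df["Cost"] == 666
df.loc[bad, "Cost"] = df.loc[~bad, "Cost"].mean()

# 3. Convert 'Cost' from $ to £.
df["Cost"] = df["Cost"] * USD_TO_GBP

# 4. Max, min, mean and percentiles of the numeric columns in one go.
print(df.describe())

# 5. Write one output file per territory in the 'Location' column.
out_dir = Path(tempfile.mkdtemp())
for location, group in df.groupby("Location"):
    group.to_csv(out_dir / f"{location}.csv", index=False)
```

Each requirement is one or two lines, so adding or changing a step means editing that line and re-running the cell.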


With a notebook, there is only one way - start at the top, run the cells, and the answer at the bottom will be the same every time. In my case I'm quite forgetful, so 6 months after I write code, I need to be able to understand what I was doing at the time- so I write as if I am trying to guide my future (forgetful) self.

In [None]:
from IPython.display import YouTubeVideo
import pandas as pd

## Quick primer on Markdown (the text format we are using here)

I'll show you how to:

* Make headings
* Format text as bold/italic
* Make bullet lists
* Make numbered lists
* Highlight code chunks

Most of the things you can do in Word to style text can also be done in Markdown- I'll show you the basics.

## Doing a simple calculation with integers
Below we work with a couple of **integers**, which are a type of data in Python.

#### Notes for my supervisor (in a text cell)
The calculation below computes the sum of the mass of thing 1, plus the mass of thing 2. The units are in tons per year, and represent values for the year 2022. The values are from ***Smith et al (2022)*** and ***Jones et al (2022)***.

In [None]:

#  Calculation- in a code cell.
total = 195 + 205
print("The total is")
print(total)

The total is
400


In the case above we did a few things.

We wrote a **comment**. Comments start with **#**. Use these to write notes in a code cell, say to remember what a line of code is doing. A comment is not part of the code; the computer ignores it when you hit run.

We created a **variable**, called `total`, and defined its value as the sum of two integers, using Python's built-in calculator.

We **printed** a short message followed by the result.

When you run a notebook, it remembers any values for variables you define so the value of `total` is available to use in further calculations.

Say you want to multiply the new `total` variable by 5, and then 10, and then 100- like this.

In [None]:
total_times_5 = total * 5
total_times_10 = total * 10
total_times_100 = total * 100

print("Original total x 5")
print(total_times_5)
print("Original total x 10")
print(total_times_10)
print("Original total x 100")
print(total_times_100)

Original total x 5
2000
Original total x 10
4000
Original total x 100
40000


So you now have some new variables you can use in even more calculations.

## Working with strings
**Strings** are just the term we use for text in Python code. Like integers, they are a data type in Python.

A string is defined like this - with quotation marks around it

`my_string = "Hello world"`

Let's do the same thing in a code cell.

In [None]:
my_string = "Hello world"
print(my_string)

Hello world


We did two things: set a **variable** called `my_string`, and printed it out.

Let's deliberately break Python by trying to add a string to an integer.

Type this into a code cell

`total = "10" + 10`

In [None]:
total = "10" + 10

TypeError: can only concatenate str (not "int") to str

Didn't work, right? Python won't let you add a string to an integer. Error messages aren't always easy to read, but this one says exactly that: you can only join (concatenate) a string to another string.

How would we fix that? Well, we can change the type of the data that's causing the problem. This is called "casting" a variable to a new data type.

`total = int("10") + 10`

In [None]:
total = int("10") + 10
print(total)

20


Above, we added a new step: putting the string "10" inside brackets, with `int` in front.

This is saying "pass the string to the `int` function, and add the result to the integer 10".

The `int` part is what's called a **built-in function** in Python: a piece of functionality that comes baked into the language. Others include `sum`, `min` and `max` for arithmetic on collections of numbers. You also have `float` and `str`, which, like `int`, convert one data type to another.

This is an example of how you apply a built-in Python function, in this case `int`, to do something useful. Normally the name of the function is outside the brackets, and the thing we want it to apply to is inside the brackets.

You could do the reverse, make a string from an integer, like this.

`string_10 = str(10)`

This would make a text string "10" by passing the integer 10 to the str() function.
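
A short sketch of these conversions in a code cell. The variable names other than `string_10` are made up for illustration, and `type()` is used here only to check what each result is.

```python
# Converting between data types with built-in functions.
string_10 = str(10)        # integer -> string
number_10 = int("10")      # string -> integer
number_1_5 = float("1.5")  # string -> float

# type() reports the data type of any value.
print(type(string_10), type(number_10), type(number_1_5))

# Other built-in functions work on collections of numbers.
values = [3, 1, 2]
print(sum(values), min(values), max(values))
```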

Python has dozens of built-in functions like these, plus a large standard library, for all sorts of things. Inevitably you will want to do something that isn't built into Python already, which is where `libraries` come in: this is how we extend Python. The most common libraries we would need are called `numpy`, `pandas` and `matplotlib`. We'll look at these later; they extend Python to allow it to do complex mathematics, handle arrays, make plots and so on.
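
As a small taste of what using a library looks like, here is a minimal sketch with `numpy`. The numbers reuse the two masses from the earlier calculation; the variable names are my own.

```python
import numpy as np  # 'np' is the conventional short name for numpy

# A numpy array holds numbers and supports maths on all of them at once.
masses = np.array([195, 205])

print(masses.mean())  # average of the values
print(masses * 2)     # every value doubled in one step
```

The pattern is always the same: `import` the library at the top of the notebook, then call its functions through the short name.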

If you can think of a data or science problem, there is a very good chance a python library exists that could help.



### Combining strings
Learning how to combine or alter strings is really powerful. You can use this trick to build meaningful text variables- say a path on your computer to a file, or a title for a plot.

For example, say we want to generate a title for a plot that's made up of the parts "Plot of the total", "representing" and "my calculation".

string_a = "Plot
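
One way the three parts listed above could be joined, as a sketch. The names `string_b`, `string_c`, `plot_title` and `plot_title_f` are my own guesses, not taken from the original cell.

```python
string_a = "Plot of the total"
string_b = "representing"
string_c = "my calculation"

# Join the parts with + and explicit spaces...
plot_title = string_a + " " + string_b + " " + string_c
print(plot_title)

# ...or with an f-string, which is often easier to read.
plot_title_f = f"{string_a} {string_b} {string_c}"
print(plot_title_f)
```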

## Code snippets
Sometimes you just want the solution to a problem, for example how to remove duplicate rows in a 2D Dataframe. It's worth checking the `Code Snippets` menu, under `Insert` at the top.

Type 'Pandas' in the box to filter the results, you'll see a list of things you can do, with some code to copy/paste and amend to suit your needs.



## Potential learning resources (beginner level)
There is a lot of excellent tutorial material on YouTube. For those with a scientific or statistical interest you might want to search for "Introduction to Pandas", or "Python Data Science".


### Web resources
* `Automate the Boring Stuff with Python` (web pages): https://automatetheboringstuff.com/

* `Office for National Statistics Python Course` (free): https://datasciencecampus.ons.gov.uk/capability/data-science-campus-faculty/introduction-to-python-programming/

## Video resources

### Learn to program with Python (excellent!)
Derek Banas is a well-known educator on YouTube, and his programming videos are excellent. It's best to play them and do the exercises as you go, pausing the video when you need to. This one is a general intro but still has good practical examples.

In [None]:
YouTubeVideo("nwjAHQERL08", width=400)

### Jupyter notebooks
This is a Jupyter notebook we are working with....

In [None]:
YouTubeVideo("5pf0_bpNbkw", width=400)

### Beginner's guide to general Python programming

In [None]:
YouTubeVideo("kqtD5dpn9C8", width=400)

### Automate the boring stuff with Python
This is the best guide for general purpose things that can save you time. Say you wanted to do something like rename 100 photos on your computer all at the same time using the same structure for the name, or if you wanted to remove a bunch of files based on their file extension.

In [None]:
YouTubeVideo("1F_OgqRuSdI", width=400)

# Python Objects

Object is the general name for data types, data structures, functions and so on that Python stores in memory and references with an identifier. Technically, an object is a specific instance of a 'class', which is an abstract template for a data type, data structure etc. A class is a pattern for creating new objects that can be reused and extended.

```python
days_list = ["Mon", "Tues", "Weds", "Thurs", "Fri", "Sat", "Sun"]
```
`days_list` is the identifier for a Python 'list' data structure (object) that in this case stores some text values.
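
A quick sketch of how you might check which class an object was created from, using the built-in `type()` and `isinstance()` functions:

```python
days_list = ["Mon", "Tues", "Weds", "Thurs", "Fri", "Sat", "Sun"]

# type() shows the class that this object is an instance of.
print(type(days_list))

# isinstance() checks an object against a class and gives True or False.
print(isinstance(days_list, list))
print(isinstance(days_list, str))
```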

All objects have properties and methods, which relate to data that the object stores and behaviours (procedures) that can be performed on the object to get a specific output.

Accessing the properties and methods of an object usually means using the '.' (dot notation). This means we type the name of the object we are interested in, put a dot, then type the name of the property or method we want to call.

For instance, calling the `clear()` method of a list object named `fruit` would be achieved like this:
```python
fruit.clear()
```
This would have the effect of removing any items stored by the list `fruit`, emptying the list.

Similarly, if we wanted to use the `count()` method to count the number of times a particular item occurred in a list, we might do the following:
```python
fruit = ['apple','apple','banana','orange'] # create a list called fruit.
fruit.count('apple') # count the number of times that the string 'apple' occurs in the fruit list
```
In the above, we would expect the `.count()` method, with the parameter 'apple' to return the value 2, because 'apple' appears twice in the list `fruit`.

Additionally, there are some special in-built Python functions that interact with objects and perform special behaviours, like `print()` and `len()`, or the numeric functions (e.g. `pow()`) you've seen already.

For example, calling the Python `print()` function enables us to print a representation of an object. This is based on how a given object behaves with respect to that function.
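
A minimal sketch of these built-in functions applied to a few objects (the `fruit` list is reused from the example above):

```python
fruit = ["apple", "apple", "banana", "orange"]

print(fruit)         # prints a text representation of the list
print(len(fruit))    # number of items in the list
print(len("apple"))  # len() also works on strings: number of characters
print(pow(2, 3))     # 2 to the power of 3
```

The same function (`len()`) gives a sensible answer for both a list and a string, because each type of object defines what "length" means for itself.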

The signature of Python's built-in `sorted()` function is:

`sorted(iterable, key=None, reverse=False)`

Here `key` and `reverse` are optional keyword arguments.

```python
sample_str = ['Modi', 'Trump', 'Putin', 'Jinping']
print("Default sort: ", sorted(sample_str))
print("Reversed sort: ", sorted(sample_str, reverse = True))
```
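
The `key` argument is not used in the example above, so here is a small sketch of it. The `words` list is made up for illustration; `key` takes a function that is applied to each item before comparing.

```python
words = ["banana", "Cherry", "apple", "fig"]

# Default sort: uppercase letters come before lowercase ones.
print(sorted(words))

# key=str.lower compares the lowercase version of each word.
print(sorted(words, key=str.lower))

# key=len sorts by length, shortest first.
print(sorted(words, key=len))
```

Note that `sorted()` returns a new list and leaves `words` itself unchanged.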

## 3.1 Lists

* The Python list object is (possibly) the most versatile of the built in data structures.

* It can hold any sequence of objects, including mixes of objects like strings, integers and Boolean variables together.

* Lists can also hold other lists and dictionaries.

* They can also hold custom data structures, and can be used to create custom data structures.

* Lists are generally created, or 'instantiated', using square brackets `[]`.

* A new list is an 'instance' of the `list` class.

* Each item in the list is separated by a comma.

* Lists can be amended (they are 'mutable' - they can be changed).

* A string behaves like a sequence of characters; this becomes clear if we explicitly convert it to a list with `list()`.

* The `in` operator is an easy way to test whether an element appears in a list (or other collection): `value in collection` returns `True` if the value is in the collection and `False` otherwise.
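
The points above can be sketched in a few lines. The list contents and variable names are made up for illustration.

```python
# A list can mix strings, numbers, Booleans, other lists and dictionaries.
mixed = ["Ball", 1, 1.32345, True, ["nested", "list"], {"key": "value"}]

# Lists are mutable: we can add items and change existing ones.
mixed.append("Bat")
mixed[0] = "Table"
print(mixed)

# Converting a string to a list gives one item per character.
letters = list("Hello")
print(letters)

# 'in' tests membership and gives True or False.
print("Bat" in mixed)
print("Ball" in mixed)
print("e" in "Hello")
```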