# Class 1 Homework: Python thinking

This homework has more on the basics of Python. These are the kinds of things that often cause errors, so we are going to practice making some errors and working out what to do next.

Some of this material is also covered in the Python tutorial at https://docs.python.org/3/tutorial/introduction.html

It's also covered in *Python for Biologists*, by Martin Jones, and in our recommended course books.


We'll load modules first, as it's helpful to have them at the beginning of a notebook, even though we won't use these until much later.

In [None]:
import numpy as np
import pandas as pd

## Python data types

What is the difference between these lines of code?

Before you run each line, try to predict the outcome.

If the line gives an error, try to understand it.

In [None]:
1 + 1

In [None]:
'1' + '1'

In [None]:
1 + '1'

In [None]:
one + one

In [None]:
'one' + 'one'

In [None]:
'one' + '1'

Does 1 plus 1 equal 2, or 11?

This may seem absurd, but programming languages need to keep track of many different kinds of things: numbers, text, dates, data frames, etc.

Everything that Python plays with has a "data type" that tracks what kind of thing it is, what the allowed values are, and what you can do with it. Most programming languages have a concept of "data type", to tell the difference between numbers, text, and so on. See [Data type at Simple English Wikipedia](https://simple.wikipedia.org/wiki/Data_type).

**Quotes** around some text tell Python to interpret it as a string, i.e. text. Without quotes, Python reads a word as the name of a variable.

We'll mostly use data frames in this course. Data frames can contain different kinds of information in different columns: one column can hold numbers and another text. However, some background knowledge on how Python deals with other kinds of objects helps to understand how it works, and especially what goes wrong and how to fix it.

In Python, the `type` function tells you what kind of data a given thing is. Common data types include:

- int / integer for whole numbers
- float / numeric for non-whole numbers
- str / string for text
- datetime - basic dates and times
- bool / logical for True and False values. (bool is short for Boolean logic)
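
The cells below cover most of the types in this list, but not dates. As a minimal sketch, assuming the `datetime` module from Python's standard library (pandas also has its own date types), a date can be made and inspected like this. The variable names are only for illustration.

```python
import datetime

# A calendar date; `some_day` is just an illustrative name
some_day = datetime.date(2024, 1, 31)
print(type(some_day))

# Subtracting two dates gives the time between them
gap = datetime.date(2024, 3, 1) - some_day
print(gap.days)   # 30, because 2024 is a leap year
```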

In [None]:
type(1)

In [None]:
type('1')

In [None]:
type(1.0)

In [None]:
type(True)

In code like `1 + '1'`, the two values have different types. It makes no sense to add a number to a string, so Python does not allow it and raises a `TypeError`.
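
One common way out, sketched here with illustrative variable names, is to convert one of the values so that both sides have the same type. The built-in functions `int` and `str` do the conversion.

```python
# Convert the string to a number, then add two numbers
as_number = int('1') + 1
print(as_number)   # 2

# Or convert the number to a string, then join two strings
as_text = str(1) + '1'
print(as_text)     # 11
```

Which conversion is right depends on what you meant: arithmetic or joining text.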

But it does make sense to add two different kinds of numbers. So add an integer to a floating point number and you get another floating point number.

Note that Python shows you that a number is a floating point number by displaying a decimal point. What's happening in the next cells?

In [None]:
1 + 1.5

In [None]:
1 + 2

In [None]:
1 + 2.0

In [None]:
1.5 + 1.5

This can have intended or unintended effects. For example, in arithmetic Python treats the logical value `True` as `1` and `False` as `0`. This is bad if you don't want it, and good if you do.

Try playing around with these to help you understand it.

In [None]:
1 + True
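
One place where this behaviour is useful is counting. The sketch below uses a made-up list of yes/no answers (the name `answers` is only for illustration) and counts how many are `True`.

```python
answers = [True, False, True, True]   # hypothetical yes/no answers

# Each True counts as 1 and each False as 0, so the sum is the number of True values
n_true = sum(answers)
print(n_true)   # 3
```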

## Wait, can I just use Python as a calculator?

Yes you can. You can add, multiply, divide, and exponentiate to your heart's content.
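
As a quick reference, here is a sketch of the main arithmetic operators, using illustrative variable names. Note that exponentiation is written `**` in Python. The `^` symbol does something else entirely.

```python
total = 7 + 2        # addition
product = 7 * 2      # multiplication
quotient = 7 / 2     # division always gives a float
whole = 7 // 2       # floor division: rounds down to a whole number
remainder = 7 % 2    # remainder after division
power = 7 ** 2       # exponentiation: 7 squared

print(total, product, quotient, whole, remainder, power)
```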

More advanced functions, like logarithms and exponentials, tend to be found in the `numpy` module.

Play around here, if you like...

In [None]:
1000000 + 1

In [None]:
np.log2(8)

In [None]:
np.exp(10)

Again, there's more on this in the Python tutorial we linked to at the beginning.

Remember, the `+` sign can do different things on different data types. As you will be able to see from the exercises above, adding two ints returns an int, and adding two floats returns a float. If there is a mix of ints and floats, then Python will return a float.

You may be thinking, but then why can I add two strings? Remember, operators and functions do different things on different data types. So when we use the `+` operator on strings, we are joining (or 'concatenating') them together.

In [None]:
'how' + ' ' + 'to' + ' ' + 'add' + ' ' + 'strings'

Using strings and the `+` operator, try writing some code in the cell below that will produce a short sentence, just like the example above.

## Functions

Functions have input and output. Here we are revising the material on functions from the Discovery first-year course.

One function we like to use is `print`, to display things.

In [None]:
print(1)

In [None]:
print(1 + 2)

In [None]:
type(print)

Functions take arguments, i.e. input, and they need to be written in Python with brackets in order to act. Otherwise they just, well, they just are there.

What's happening in the next few cells?


In [None]:
print

In [None]:
print()

In [None]:
print(print)

You can ask for help about functions using a `?` in a Jupyter notebook. (The `?` is a feature of Jupyter/IPython rather than of plain Python.)

In [None]:
print?
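
The `?` shortcut only works in Jupyter and IPython. As a small sketch of an alternative, the built-in `help` function works anywhere Python runs and shows similar text.

```python
# help() is built into Python, so it also works outside a notebook
help(print)

# The same description is stored on the function itself, as its "docstring"
doc_text = print.__doc__
print(doc_text)
```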

You can look inside existing functions, or write your own functions.

One of the workshops in Discovery wrote functions to calculate distances between two strings.

The code below defines a function to add one to whatever the input is.

In [None]:
def addone(x):
    return x + 1

In [None]:
addone(100)

You can play around with writing your own functions, if you want.
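
If you would like a starting point, here is a sketch of a function with two inputs. The name `add_numbers` is made up for this example. The text in triple quotes is a docstring, which is what the help system displays.

```python
def add_numbers(x, y):
    """Return x plus y."""
    return x + y

result = add_numbers(100, 1)
print(result)   # 101

# The function uses +, so it works on any pair of values that + can combine
joined = add_numbers('one', '1')
print(joined)   # one1
```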

A big learning point here is that complex Python objects are built out of simpler Python objects.

## Lists

Often, we need to keep track of many different pieces of data together.

Python lists are the easiest way of doing this. They're created and displayed using square brackets.

In [None]:
[1, 2, 3]

In [None]:
type([1, 2, 3])

Adding two lists creates a new, longer list by joining the second list onto the end of the first (this is called "concatenation", and is similar to appending).

In [None]:
[1, 2, 3] + [4]
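
There is a subtle difference between `+` and the list method `append`, sketched below with an illustrative list called `numbers`. Using `+` builds a new list and leaves the original alone, while `append` changes the original list.

```python
numbers = [1, 2, 3]

longer = numbers + [4]   # + builds a new list
print(longer)            # [1, 2, 3, 4]
print(numbers)           # [1, 2, 3], the original is unchanged

numbers.append(4)        # append changes the list in place
print(numbers)           # [1, 2, 3, 4]
```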

Square brackets define lists, so brackets in the wrong place can cause confusion, just as misplaced quotes can.

What's happening in the next examples?

In [None]:
[1] + [1]

In [None]:
[1] + ['1']

In [None]:
[1 + 1]

In [None]:
[1] + 1

Yes... when learning to code, it helps to be precise and even pedantic.

Lists can contain different kinds of objects.

In [None]:
[1, 2, "many"]

That inconsistency can cause problems, so in this course we'll try to work with lists that contain only one kind of thing, like the columns of a data frame.
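
Here is a small sketch of the kind of problem a mixed list can cause. The built-in `sum` works on a list of numbers but fails on a list that also contains text. The `try` / `except` lines are only there so that the cell finishes and shows the error message instead of stopping. The variable names are illustrative.

```python
all_numbers = [1, 2, 3]
mixed = [1, 2, "many"]

total = sum(all_numbers)
print(total)   # 6

error_seen = False
try:
    sum(mixed)                    # fails when it reaches the string
except TypeError as err:
    error_seen = True
    print("TypeError:", err)
```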

## Wait, what is a data frame really?

We now want to understand what kind of data type a data frame is. We'll re-use some of the material from the Class 1 notebook.

Next, make a data frame.

In [None]:
# example data frame with just numbers
df_simple = pd.DataFrame(
    {"a" : [4, 5, 6],
     "b" : [7, 8, 9] })

df_simple

What kinds of objects are these?

In [None]:
type(pd.DataFrame)

In [None]:
type(df_simple)

This means that the "type" DataFrame is defined by the pandas module.

Remember, you can ask for help on functions straight from your Python session.

In [None]:
pd.DataFrame?

These help files are usually findable on nicely formatted websites too. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html

In [None]:
# example data frame with numbers and names - taken from https://en.wikipedia.org/wiki/List_of_universities_in_Scotland
df_scottishuniversities = pd.DataFrame(
    {"University" : ["St Andrews", "Glasgow", "Aberdeen", "Edinburgh"],
     "Founding_year" : [1413, 1451, 1495, 1582],
     "Total_students_2023" : [11895, 38125, 15455, 40625 ]
    })

df_scottishuniversities

In [None]:
type(df_scottishuniversities)

In [None]:
df_penguins = pd.read_csv('../Datasets/penguins.csv')

df_penguins

Complex Python objects are made out of simpler Python objects. Clearly, this data frame contains both numbers and strings. It's arranged into columns and rows.

Conceptually, a dataframe is built out of columns.
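
To see the "built out of columns" idea on a small scale, here is a sketch using a made-up data frame called `df_small`. Each list in the dictionary becomes one column, and each column has its own data type.

```python
import pandas as pd

# Hypothetical example data: one text column and one number column
df_small = pd.DataFrame(
    {"name": ["a", "b", "c"],
     "value": [1.5, 2.5, 3.5]})

print(df_small.columns)   # the column names
print(df_small.dtypes)    # one data type per column
print(df_small.shape)     # (number of rows, number of columns)
```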

pandas calls a single column of a data frame a **Series**. A pandas Series is a bit like a Python list. However, a Series has a single data type, keeps its values in order, and has an index. Like data frames, Series come with functions to describe and summarise them. A Series can be accessed and used in different ways. Here are some examples of how we can quantify just the bill length measurements in the penguins dataset.

In the first example, we assign to a variable.

In [None]:
df_penguins_col = df_penguins['bill_length_mm']

In [None]:
max(df_penguins_col)

Some Python types, like the data frame, come with "built-in functions" (properly called methods) that are accessed by putting a dot after the object, followed by the method name.

In [None]:
df_penguins_col.max()

In [None]:
df_penguins_col.max?

In these next examples, we avoid assigning the column to a variable and call it directly instead.

In [None]:
max(df_penguins['bill_length_mm'])

In [None]:
max(df_penguins.bill_length_mm) # note: this only works if the column name has no spaces or odd characters.
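
To illustrate the note in the cell above, here is a sketch with a made-up data frame whose column name contains a space. The square-bracket form still works, while the dot form cannot even be written as valid Python.

```python
import pandas as pd

# Hypothetical data frame with a space in the column name
df_spaces = pd.DataFrame({"bill length": [39.1, 39.5, 40.3]})

longest = df_spaces["bill length"].max()
print(longest)   # 40.3

# df_spaces.bill length   <- this would be a SyntaxError, so it is commented out
```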

In [None]:
df_penguins['bill_length_mm'].max()
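
The built-in `max` and the Series method `.max()` often give the same answer, but they can differ when values are missing. Below is a minimal sketch with made-up numbers. The series `lengths` is hypothetical and is not taken from the penguins file.

```python
import math
import pandas as pd

# Hypothetical measurements where the first value is missing (NaN)
lengths = pd.Series([float("nan"), 39.1, 40.3])

builtin_result = max(lengths)    # plain Python: comparisons with NaN are always False
method_result = lengths.max()    # pandas: skips missing values by default

print(builtin_result)   # nan, because the missing value comes first
print(method_result)    # 40.3
```

For columns that might contain missing values, the pandas method is usually the safer choice.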

There are lots of functions you can use to summarise a series. In the list below, `P` stands for any Series, such as `df_penguins_col`:

-  `P.min()`
-  `P.mean()`
-  `P.median()`
-  `P.std()`
-  `P.skew()`
-  `P.quantile(0.5)`
-  `P.nlargest(3)`
-  `P.sample(2)`
-  `P.argmin()`
-  `P.argmax()`
-  `P.count()`
-  `P.value_counts()`

Have a go running some of these commands in the cells below. Can you work out what they are doing?
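
If it helps, you can first try the commands on a tiny series where the answers are easy to check by hand. The sketch below makes up such a series and runs three of the commands. The others are left for you to explore.

```python
import pandas as pd

# A small made-up series, so the results can be checked by hand
P = pd.Series([3, 1, 2, 2])

print(P.mean())           # 2.0
print(P.count())          # 4
print(P.value_counts())   # how many times each value appears
```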

## Wait, what's going on?

The goal of this homework was to revise some Python basics, in a way that helps you learn many key features and how to troubleshoot errors.

Thinking about why these lines of code work the way they do will help you code in the rest of the course.

Yes, it requires precision and pedantry. It's like learning a language - it takes time and practice. By starting to understand what each component of a line of code does, you can learn to make the code work for you.

### The worst part - there are lots of ways of doing the same thing

Yes. This is the worst part. This is a frequent complaint about Python.

Learn one thing at a time, and you can learn to make Python work for you. Millions of people use Python and other coding languages every day - it gets easier as you learn to recognise patterns.