# Class 3 homework: understanding Python commands

This notebook briefly revises how to think about Python syntax and how to break down commands to build your understanding. It connects to and builds on the Class 1 homework, so you may wish to revise that first.

The main idea is that **complex commands are built up from simple commands**. So, understanding a complex command involves breaking it down into simple parts. And, doing something complex involves putting together the right simpler ingredients.

We will start by loading a data frame.


In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('../Datasets/count.csv')

## Reminder: punctuation, quotes, and brackets are important!

What's going on in each of these lines of code?

Try to predict what each line does, then run it and try to understand the output or error message.

In [None]:
df

In [None]:
"df"

In [None]:
'df'

In [None]:
print(df)

In [None]:
print("df")

In [None]:
print("df'")
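Once you have tried the cells above, here is a small sketch of the same idea with a different name. The variable `animal` and its value are made up for illustration. Without quotes, Python looks up the variable; with quotes, Python treats the characters as plain text.

```python
# `animal` is a hypothetical variable, not part of the course data
animal = "gannet"

print(animal)    # no quotes: Python looks up the variable and prints its value
print("animal")  # double quotes: Python prints the text itself
print('animal')  # single quotes work the same way as double quotes
```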

## Breaking down a complex command into parts

Suppose you encounter a complex command. How do you go about understanding it?

For example, consider the next command.

In [None]:
np.log10(df.iloc[:, 1:2])

How can you figure out this complex command?

Break it down from the inside to the outside, and ask Python for help on each part using `?`.

Try these commands, then try to explain in words what the more complex command is doing.

In [None]:
df.iloc?

In [None]:
df.iloc[:, 1:2]

Try some variations on that command - edit the `:` and the numbers to see what changes.

In [None]:
?np.log10
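One way to write out the inside-to-outside breakdown is to give each part its own line and its own name. This is only a sketch: the data frame `toy` and its columns are made up, because the contents of `count.csv` are not shown here.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; this is not the course's count.csv
toy = pd.DataFrame({"sample": [1, 2, 3], "count": [10, 100, 1000]})

# Inner part: all rows (:) and the column at position 1 (1:2 stops before 2)
selected = toy.iloc[:, 1:2]

# Outer part: log base 10 of every value in the selection
logged = np.log10(selected)

print(selected)
print(logged)
```

Printing both `selected` and `logged` lets you compare the input and output of the outer function.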

## Another complex command

Take the same approach to the following command. This one builds up a complex command using the Python approach where a method, or bound function, is added on to the end of a variable with a dot. For example, we run the method `head` on a data frame like `df` by typing `df.head()`.

This command just does that again and again.


In [None]:
pd.read_csv('../Datasets/gannets.csv').sort_values('male', ascending = False).head(10)

Don't panic! What are the simple parts? Can you ask for help on them? What does each one do?

In this example, it would give more readable code to first load the data and assign it to a data frame, then to sort the values, and so on. This step-by-step approach is what we've been doing in most of the class notebooks.
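As a rough sketch of the step-by-step style, here is the same kind of command written one step per line. The data frame `birds` is a made-up stand-in for the gannets data: only the column name `male` comes from the command above, and the `colony` column and all the values are assumptions.

```python
import pandas as pd

# Made-up stand-in for the gannets data; 'colony' and the numbers are hypothetical
birds = pd.DataFrame({"colony": ["A", "B", "C", "D"], "male": [12, 40, 7, 25]})

# One step per line, each result with its own name
birds_sorted = birds.sort_values("male", ascending=False)
top_two = birds_sorted.head(2)

# The chained version does the same steps in a single line
chained = birds.sort_values("male", ascending=False).head(2)

print(top_two)
print(top_two.equals(chained))  # checks that both approaches give the same result
```

Both versions do the same work. The step-by-step version also lets you inspect `birds_sorted` on its own if something looks wrong.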

In a previous homework, we remarked that there are many ways to do the same thing, and that's the worst part about learning Python. Sometimes one way to do something is easier to follow than another way, for example, if it's clear that one step is happening at a time. Try doing things so that they are easy to follow.

## Order and repetition

What's happening in the following code, if you run it in order?

For each line, first try to predict what the outcome will be, and then run the line and try to understand the result.

In [None]:
x = 0

In [None]:
print(x)

In [None]:
x = x + 1

In [None]:
print(x)

In [None]:
x = x + 1

In [None]:
print(x)

We now introduce the `del` statement, which deletes a variable from the Python workspace. It is usually written `del x`; the form `del(x)` used below also works.

In [None]:
del(x)

In [None]:
print(x)

In [None]:
x = 'life is full of change'

In [None]:
print(x)

These pieces of code illustrate that you can run the same line of code, and get different output, based on what other code has been run first.

In this case, `print(x)` prints the value of a variable, `x`. If the value of `x` changes, then the outcome is different. The same applies to `x = x + 1`: each time it runs, it starts from whatever value `x` has at that moment.
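The whole sequence can also be sketched in one cell, recording the value of `x` at each stage. The names `first`, `second` and `x_exists` are made up for this sketch, and `try`/`except` is used here only to catch the error that appears once `x` has been deleted.

```python
x = 0
x = x + 1
first = x        # value of x after adding one once

x = x + 1
second = x       # same line of code, different result

del x            # x no longer exists after this line

x_exists = True
try:
    x            # looking up a deleted variable raises NameError
except NameError:
    x_exists = False

print(first, second, x_exists)
```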

This makes it risky to write code that changes the value of a variable while keeping the same name. We have done this in notebooks before, for example in the Class 3 notebook:

In [None]:
df

In [None]:
df['Cereals'] = df['Barley'] + df['Oats']

In [None]:
df

Sometimes this makes for readable code, but it creates the risk that you overwrite something you need, or get confused in circular logic.

This can be avoided by:

1. giving new values new names
2. if things get confusing, start again
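The first suggestion, giving new values new names, might look like the sketch below. The data frame `crops` and its values are made up; only the column names `Barley`, `Oats` and `Cereals` come from the example above. Using the pandas method `assign` is one possible approach, not the one used in the class notebook.

```python
import pandas as pd

# Made-up values; only the column names come from the example above
crops = pd.DataFrame({"Barley": [3, 5], "Oats": [2, 4]})

# assign() returns a new data frame and leaves the original unchanged
crops_with_cereals = crops.assign(Cereals=crops["Barley"] + crops["Oats"])

print(crops)               # still has only Barley and Oats
print(crops_with_cereals)  # has the extra Cereals column
```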

What do we mean by "start again"? The small way to do this is to use `del` to delete the confusing variable, or to create it again from the beginning.

The big way to start again is to restart the entire Python session and run from the beginning. In Noteable, open the Kernel menu and choose "Restart Kernel". After restarting, you can fix the problem lines of code.

For example, to check that the code in these notebooks will work for you, the course team has restarted the kernel and re-run every line of code from the beginning in the answers notebooks.

As you practise with Python, you'll build your mental model of what is happening and how variables are updated as code runs. Try to practise some more here.

## When disaster strikes

It is possible for code to go so wrong that you need to start again. This can even happen by accident.

The next example has happened many times with students on this course.

Again, for each line, first try to predict what the outcome will be, and then run the line and try to understand the result.

In [None]:
year

In [None]:
# this assigns the value 1965 to the variable year
year = 1965 

In [None]:
print

The next line is where disaster strikes...

In [None]:
print = year

In [None]:
print(year)

In [None]:
print

You may think "this shouldn't be possible", and many would agree. The assignment `print = year` has changed the value of `print`, so that the `print` function no longer works. Other similar problems can happen with unintended assignment of values to variables.

However, we can recover. In this case, deleting the new variable fixes the problem.

In [None]:
del(print)

In [None]:
print

But if everything seems hopelessly confusing, and we find ourselves re-running the same code chunks again and again, it's time to restart the kernel.
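The problem and the recovery can be sketched together in a single cell. This assumes the code runs at the top level of a notebook or script, as in the cells above; the name `message` is made up for this sketch. The built-in `print` is never destroyed, only hidden behind a new variable of the same name, so deleting that variable brings it back.

```python
year = 1965

print = year          # hides the built-in print behind a variable holding 1965

try:
    print(year)       # now means 1965(year), which is an error
except TypeError as err:
    message = str(err)

del print             # removes our variable; the built-in print is visible again

print(message)        # prints: 'int' object is not callable
```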

## More intro to Python material

- Software Carpentry Python novice
  https://swcarpentry.github.io/python-novice-inflammation/
- Variation 1 self-study notebooks
  https://github.com/edinburgh-bto/Variation1
- Python for Biologists, Martin Jones (also at library)
  https://pythonforbiologists.com/tutorial/