# Session 0 - Using Jupyter Notebooks

## A brief introduction

Jupyter notebooks can run code in several programming languages; the notebooks in these sessions use `python`, which was designed to offer a clear and simple coding experience. Python code can be run in many ways, so you don't necessarily need a Jupyter notebook to perform the tasks you will learn in these sessions. However, Jupyter makes it easy to communicate, visualise and share Python code, so we will take advantage of these benefits.

## Resources

The Jupyter website (https://jupyter.org/) contains many good interactive resources and online documentation for learning about Jupyter notebooks and experimenting with examples. In addition, the following third-party resources are useful.

#### Online tutorials

Several people have written excellent blog posts that guide you through getting started, as well as through more advanced functionality:

* https://www.dataquest.io/blog/jupyter-notebook-tutorial/
* https://www.datacamp.com/community/tutorials/tutorial-jupyter-notebook

#### YouTube videos

There are also free YouTube videos that provide nice introductions:

* https://www.youtube.com/watch?v=HW29067qVWk
* https://www.youtube.com/watch?v=jZ952vChhuI
* https://www.youtube.com/watch?v=3C9E2yPBw7s


## Getting started

Firstly, click on the Help menu and select `user interface tour`. A pop-up window will appear. Then use the keyboard arrow keys to step through some of the functionality offered by Jupyter notebooks.

#### Cells

A Jupyter notebook is made up of grey boxes called **cells**. Each cell contains one of two things:

* Python code that can be run (executed)
* Text content to help with explanation

We start with the former.

Click on the cell below and change the text 'Hello World' to something else, e.g. 'Hello Jupyter'. This is the command that will be executed. To execute it, do one of the following:
* Select Cell/Run Cell from the dropdown menu
* Hit ctrl+Enter on the keyboard

You should see the output change to what you specified.

In [1]:
print('Hello World')

Hello World


## Programming Basics
The notes below introduce some programming ideas. It is not essential to cover all of them: go as far as you can, depending on your previous experience.

#### Performing calculations

Add a new cell by clicking on this cell (the one with the text) and doing one of the following

* Click the small '+' icon on the toolbar
* Select Insert/Insert Cell Below from the menu
* Press Esc and then B on the keyboard (the shortcut for inserting a cell below)

Then type in a mathematical operation, e.g. 4 + 3, 12/4, 3*9 or 7-5, and run the cell (ctrl+Enter). Select the cell again and try something else.

In [1]:
(1+3+5)*3*2

54
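For reference, a few of Python's arithmetic operators are collected below. Note that `/` always produces a decimal (float) result, so `12/4` gives `3.0` rather than `3`. The `//`, `%` and `**` operators are standard Python operators that the text above does not mention; they are included only as extras to try.

```python
# Basic arithmetic in Python; each result is stored so it can be inspected
total = 4 + 3        # addition: 7
difference = 7 - 5   # subtraction: 2
product = 3 * 9      # multiplication: 27
quotient = 12 / 4    # true division always returns a float: 3.0
floor = 13 // 4      # floor (whole-number) division: 3
remainder = 13 % 4   # remainder after division: 1
power = 2 ** 3       # exponentiation: 8

print(total, difference, product, quotient, floor, remainder, power)
```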

#### Assigning variables

To do more interesting things, it usually becomes necessary to store values for later use. Try the following:

* Create a new cell below this one (see above)
* Write `x = 3` in this cell and run the cell (see above)

You will notice there is no output this time. This is because we have only assigned the value 3 to x. Create another cell, type `x` and run the cell. This time you should see the output 3.

In [11]:
x = 3

In [12]:
x

3
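Once a value is stored in a variable, it can be used in later calculations just like a number. The names `y` and `area` below are purely illustrative.

```python
x = 3
y = x * 2      # y is computed from the current value of x
area = x * y   # variables can be combined in any expression
print(x, y, area)
```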

#### Reassigning variables

In Jupyter it is possible to reassign the value of a variable. For example, we can reassign the value of x that we set above. Do the following:

* Create a new cell below this one
* Write `x = 11` in this cell and run the cell
* Create a new cell below that one, type `x` and run it

You should see the output has changed to 11.

**IMPORTANT** - This brings us to a key aspect of Jupyter: the outputs depend on the order in which you run the cells, not the order in which they appear in the notebook. For example, re-run the cell above in which you previously entered `x`. It should now output 11. Moreover, the number inside the square brackets next to the cell (`In [ ]` and `Out[ ]`) will change; these numbers record the order in which the cells have been run.


In [5]:
x = 11

In [13]:
x

3
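Within a single cell, statements run from top to bottom, so the last assignment is the one that counts. A variable can also be updated using its own previous value, as in this small sketch.

```python
x = 3
x = 11        # reassignment replaces the old value
print(x)      # 11
x = x + 1     # the right-hand side uses the current value (11) before reassigning
print(x)      # 12
```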

#### Using multiple line programs

Rather than running single lines, Jupyter cells can hold blocks of code that run together. This gives us more flexibility in writing longer commands and helps break up the whole notebook into workable chunks. This is particularly necessary when writing things like for loops.

Below are some examples.

In [7]:
my_name = 'Alan Turing'
print('My name is', my_name)

My name is Alan Turing


In [8]:
for count in range(0,3):
    print(count)

0
1
2


#### Exercise

Change the for loop in the code above so that the output is

* 1, 2, 3
* 0, 2, 4, 6

In [9]:
my_list = ['apple', 'dog', 45, True]
for item in my_list:
    print(item)
    if item == 'dog':
        print('Yes')

apple
dog
Yes
45
True
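In the loop above, the `if` condition is checked once per item, and `print('Yes')` runs only when the item equals `'dog'`. An `else` branch handles every other case. The sketch below collects the results in a list (the name `labels` is illustrative) rather than printing them, so they are easy to inspect.

```python
my_list = ['apple', 'dog', 45, True]
labels = []
for item in my_list:
    if item == 'dog':
        labels.append('Yes')   # runs only for the matching item
    else:
        labels.append('No')    # runs for every other item
print(labels)  # ['No', 'Yes', 'No', 'No']
```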


#### Getting Errors

If you have never run programs before, be aware that it is very common to get errors. Whenever Jupyter (or Python) encounters an error, it displays an error message. This is nothing to worry about: it simply says there is something wrong with the command you have tried to run and tries to help you identify what is wrong.

#### Exercise

Try running the following cells; they will give you error messages. From those messages, try to figure out what is wrong, then correct the command and run the cells again.

In [10]:
print('This is my message'

SyntaxError: unexpected EOF while parsing (<ipython-input-10-cc98faec0927>, line 1)

In [None]:
z = 3 + '7'
z

#### Writing text comments

As you will have noticed, this notebook contains a lot of explanatory text. You can add your own should you wish. To do so, create a new cell, select it and then, from the dropdown menu in the toolbar, change 'Code' to 'Markdown'. You can then start writing in the cell. Run the cell as normal to switch the text from edit mode to display mode.

You can also double-click on any of the other text (markdown) cells above to see how to write headings, bullet points or bold text.

A nice cheatsheet for markdown syntax is given here:
* https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed

# Introduction to DataFrames

DataFrames are essentially tables of information, like those traditionally seen in Excel. They are one of the most common ways of storing data, particularly for data visualisation, statistical analysis and machine learning.

Pandas (Python Data Analysis Library) is an open-source package for Python that lets the user easily create and manipulate DataFrames. Many other packages now accept pandas DataFrames as input.

## Resources

#### Pandas website

The website for pandas is https://pandas.pydata.org/. It contains extensive documentation about all the commands you can use, as well as tutorials for getting started. In fact, we will be loosely following their '10 minutes to Pandas' guide, which you can find here:

* https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html

#### Other resources

Again, you will be able to find many blog posts and YouTube videos giving both introductory and advanced tutorials on pandas. Here are two examples:

* https://www.youtube.com/watch?v=e60ItwlZTKM
* https://www.youtube.com/watch?v=vmEHCJofslg

## Outline

In the current session we will learn about

1. Creating Dataframes
2. Viewing the data
3. Selecting/filtering data
4. Reassigning data

## Creating DataFrames

#### Importing pandas
In order to use pandas, we must first load it with the `import` statement, as in the cell below. We only need to do this once per notebook, but that cell must be run before any cell that uses pandas, otherwise you will get an error. In fact, try running another cell first to see what happens, then run it again after running the cell with the import command.

#### Series
The first thing we are going to create is a Series, which is essentially an ordered list of data. For example, a list of names:

In [1]:
import pandas as pd

names = pd.Series(['Ada', 'Alan', 'Isaac', 'Rosalind'])

names

0         Ada
1        Alan
2       Isaac
3    Rosalind
dtype: object
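Each entry in a Series is paired with a label, shown on the left of the output above. By default the labels are 0, 1, 2 and so on, but you can supply your own with the `index` argument and then look values up by label. The `hours` Series and its labels below are illustrative only.

```python
import pandas as pd

names = pd.Series(['Ada', 'Alan', 'Isaac', 'Rosalind'])
print(names[0])         # look up by the default integer label: Ada
print(len(names))       # number of entries: 4

# A Series with custom labels (the labels here are illustrative)
hours = pd.Series([35, 15], index=['Lovelace', 'Turing'])
print(hours['Turing'])  # 15
```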

#### Dataframes

As alluded to at the beginning, DataFrames are tables of information. In pandas they are collections of Series, where each column of the table is a Series (i.e. an ordered list of data).

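There are numerous ways of creating DataFrames in pandas; all of them use the `DataFrame` constructor. Below are two examples: the first passes a list of rows together with the column names, the second a dictionary that maps each column name to its values.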

In [2]:
df1 = pd.DataFrame([['Lovelace', 35], ['Turing', 15],
                    ['Newton', 35], ['Franklin', 15]], columns=['Surname', 'Hours'])

df1

| | Surname | Hours |
|---|---|---|
| 0 | Lovelace | 35 |
| 1 | Turing | 15 |
| 2 | Newton | 35 |
| 3 | Franklin | 15 |
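A quick way to check what you have built is to look at a DataFrame's `shape` attribute (number of rows, number of columns) and its `columns` attribute.

```python
import pandas as pd

df1 = pd.DataFrame([['Lovelace', 35], ['Turing', 15],
                    ['Newton', 35], ['Franklin', 15]], columns=['Surname', 'Hours'])

print(df1.shape)          # (4, 2): four rows, two columns
print(list(df1.columns))  # ['Surname', 'Hours']
```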


In [3]:
df2 = pd.DataFrame({'Department': ['Computer Science', 'Mathematics', 'Physics', 'Chemistry'],
                   'Years Service': [5, 7, 3, 9],
                   'EmployeeID': [1, 2, 3, 4]})

df2

| | Department | Years Service | EmployeeID |
|---|---|---|---|
| 0 | Computer Science | 5 | 1 |
| 1 | Mathematics | 7 | 2 |
| 2 | Physics | 3 | 3 |
| 3 | Chemistry | 9 | 4 |
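The two ways of building a DataFrame differ only in orientation: one lists the data row by row, the other column by column. As a small sketch using just the two numeric columns of `df2`, building the same table both ways and comparing them with `DataFrame.equals` shows that the results are identical. The names `by_rows` and `by_columns` are illustrative.

```python
import pandas as pd

# Row by row: a list of rows plus the column names
by_rows = pd.DataFrame([[5, 1], [7, 2], [3, 3], [9, 4]],
                       columns=['Years Service', 'EmployeeID'])

# Column by column: a dictionary of column name -> list of values
by_columns = pd.DataFrame({'Years Service': [5, 7, 3, 9],
                           'EmployeeID': [1, 2, 3, 4]})

print(by_rows.equals(by_columns))  # True
```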


## Indexing, viewing and filtering

Before proceeding, let us join our two DataFrames together:

In [4]:
df = df1.join(df2)
df

| | Surname | Hours | Department | Years Service | EmployeeID |
|---|---|---|---|---|---|
| 0 | Lovelace | 35 | Computer Science | 5 | 1 |
| 1 | Turing | 15 | Mathematics | 7 | 2 |
| 2 | Newton | 35 | Physics | 3 | 3 |
| 3 | Franklin | 15 | Chemistry | 9 | 4 |
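`join` lines rows up by their index labels (here 0 to 3, which both tables share), not by any column. As a sketch of what happens when the labels do not all match, the small illustrative tables below share only labels 1 and 2. The default behaviour (`how='left'`) keeps every row of the left table and fills the gaps with `NaN`, pandas' marker for a missing value.

```python
import pandas as pd

# Illustrative tables whose index labels only partly overlap
left = pd.DataFrame({'Surname': ['Lovelace', 'Turing', 'Newton']}, index=[0, 1, 2])
right = pd.DataFrame({'Department': ['Mathematics', 'Physics', 'Biology']}, index=[1, 2, 3])

joined = left.join(right)   # default how='left': keep all rows of `left`
print(joined)               # row 0 has no match, so its Department is NaN
```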


In [5]:
df['Surname']

0    Lovelace
1      Turing
2      Newton
3    Franklin
Name: Surname, dtype: object
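Selecting a single column with one name in square brackets returns a Series, consistent with the idea that each column of a DataFrame is a Series. Passing a *list* of names (note the double brackets) returns a smaller DataFrame instead. The table below is a cut-down copy of `df` so the example runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Surname': ['Lovelace', 'Turing', 'Newton', 'Franklin'],
                   'Hours': [35, 15, 35, 15],
                   'Department': ['Computer Science', 'Mathematics', 'Physics', 'Chemistry']})

surnames = df['Surname']           # one name -> a Series
subset = df[['Surname', 'Hours']]  # a list of names -> a DataFrame

print(type(surnames).__name__)  # Series
print(type(subset).__name__)    # DataFrame
print(subset.shape)             # (4, 2)
```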