# Data Exploration with Python and Jupyter

Basic usage of the Pandas library to download a dataset,
explore its contents, clean up missing or invalid data,
filter the data according to different criteria,
and plot visualizations of the data.

- **Part 1: Python and Jupyter**
- [Part 2: Pandas with toy data](https://ssciwr.github.io/jupyter-data-exploration/pandas-toy-data.slides.html)
- [Part 3: Pandas with real data](https://ssciwr.github.io/jupyter-data-exploration/pandas-real-data.slides.html)

*Press Spacebar or the right arrow key to go to the next slide*

# Python
is a widely used programming language with many useful libraries

# Jupyter
an interactive notebook interface for running code in a programming language (the process that executes the code is called the "kernel")

# Jupyter notebook: Cells

A notebook is divided into cells, which can be
- **code** cells
  - contain Python code to be executed
- **markdown** cells
  - contain text in markdown format

To select a cell: click on it with the mouse

To run the selected cell, press `Ctrl+Enter` or click "Cell -> Run Cells" on the menubar

In [None]:
# This is a code cell: press Ctrl+Enter to execute the code in it
#
print("Hello World!")

### Markdown cell
This is a markdown cell, which can contain
- headings, lists, *formatted* **text**, [links to websites](https://ssc.iwr.uni-heidelberg.de/).
- math in latex format: $\int_0^\infty \cos(x) dx$
- ![images can also be displayed](https://ssc.iwr.uni-heidelberg.de/sites/default/files/inline-images/logo_ssc_iwr_uni.png)

# Jupyter notebook: Mode

Two *modes* of interacting with the active/selected cell
- <span style="color:green">**edit**</span> mode
  - edit the text inside the cell (green outline)
- <span style="color:blue">**command**</span> mode
  - use keyboard shortcuts to modify the cell or run commands (blue outline)


- To enter edit mode: double click inside a cell, or press `Enter` with a cell selected
- To enter command mode: click to the left of a cell inside the green outline, or press `Escape`

# Jupyter notebook: Commands

Lots of keyboard shortcuts available. Press `Escape` to enter command mode, then the `H` key to see a list.

Some commonly used shortcuts:
- `A`: insert a cell above the current cell
- `B`: insert a cell below the current cell
- `M`: convert the current cell to a markdown cell
- `Y`: convert the current cell to a code cell
- `Shift+Enter`: run the current cell and advance to the next cell

## Jupyter notebook: Order of Execution

- you are free to execute / run cells in any order you choose
- they can make use of and modify any objects, functions or variables that have already been created
- however this can quickly get confusing and make reproducing results difficult!
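
The risk of out-of-order execution can be sketched in plain Python. This is a hypothetical illustration rather than real notebook code: each function stands in for a cell, and the dictionary `state` stands in for the variables held by the kernel. All names are made up for the example.

```python
# Hypothetical illustration: each function plays the role of a notebook cell,
# and `state` plays the role of the variables stored in the kernel.
state = {}

def cell_1():
    state["x"] = 10

def cell_2():
    state["x"] = state["x"] * 2

def cell_3():
    return state["x"] + 1

# Run the "cells" once each, in the order they are written
cell_1()
cell_2()
top_to_bottom = cell_3()

# Start again, but this time "cell 2" is accidentally run twice
state.clear()
cell_1()
cell_2()
cell_2()
out_of_order = cell_3()

print(top_to_bottom)  # 21
print(out_of_order)   # 41
```

The code is identical in both runs; only the execution history differs, which is why a result obtained this way can be hard to reproduce later.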

## Top to bottom

- it is good practice to have a top-to-bottom flow of execution
- i.e. write your notebook so that it can be executed in the order it is written
- this makes it easier to understand what is going on

## Useful commands when things go wrong

- menubar `Kernel -> Restart` (or command mode shortcut: `0 0`)
  - fresh start (your code is still there, but all existing objects, functions and variables are cleared)
- menubar `Kernel -> Restart and Clear Output`
  - as above, but additionally clears all cell outputs
- menubar `Kernel -> Restart and Run All`
  - restarts the kernel as above, then executes all the cells in order from top to bottom


# Python: Variables

In [None]:
# any lines starting with "#" are comments that Python ignores
#
# assign the number 12 to the variable "a":
#
a = 12

In [None]:
# any variable or object can be printed
print(a)

In [None]:
# display the type of an object
type(a)

In [None]:
# variables can be re-assigned, including to different types
a = "Hello!"

In [None]:
print(a)

In [None]:
type(a)
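
As a small sketch of the cells above: the type belongs to the value, not to the variable name, so the same name can report a different type after each re-assignment. The variable `b` and the `*_type` names are used here only for illustration.

```python
# The same name is bound to three different values in turn
b = 12
first_type = type(b).__name__   # name of the type of the integer value

b = "Hello!"
second_type = type(b).__name__  # name of the type of the string value

b = 1.5
third_type = type(b).__name__   # name of the type of the decimal value

print(first_type, second_type, third_type)  # int str float
```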

# Python: Lists

In [None]:
# a list is an ordered container of objects (the objects don't have to have the same type)
my_list = [1, 3, 88, -13, "hello"]

In [None]:
print(my_list)

In [None]:
# can reference an item in the list by its index: 0 is the first item
print(my_list[0])

In [None]:
# can also use negative indices: -1 is the last item, -2 the second-to-last, etc
print(my_list[-1])

In [None]:
# can use slicing to get a subset of the list: here elements with index 1 up to (but not including) index 3:
print(my_list[1:3])

In [None]:
# can add two lists together: this concatenates them into a single long list
print(my_list + [5, 6, 7])

In [None]:
# can iterate over the items in a list
for item in my_list:
    print(item)
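
The indexing, slicing and concatenation rules above can be collected in one short sketch. The list `letters` is a made-up example; note that slicing and `+` both produce new lists and leave the original unchanged.

```python
letters = ["a", "b", "c", "d", "e"]

first = letters[0]          # index 0 is the first item
last = letters[-1]          # index -1 is the last item
middle = letters[1:3]       # items with index 1 and 2 (index 3 is excluded)
from_start = letters[:2]    # omitting the start means "from the beginning"
to_end = letters[3:]        # omitting the end means "up to the last item"
combined = letters + ["f"]  # concatenation creates a new, longer list

print(first, last)               # a e
print(middle)                    # ['b', 'c']
print(from_start, to_end)        # ['a', 'b'] ['d', 'e']
print(len(letters), len(combined))  # 5 6
```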

# Python: Dictionaries

In [None]:
# a dictionary is a collection of key-value pairs (since Python 3.7 it keeps the order in which keys were inserted)
my_dict = {"name": "Bob", "age": 6}

In [None]:
print(my_dict)

In [None]:
# can look up a value using its key
print(my_dict["name"])

In [None]:
# can add a key-value pair to the dictionary by assigning a value to a key
my_dict["sizes"] = [1, 2, 3]

In [None]:
print(my_dict)

In [None]:
# assigning to an existing key overwrites the old value with the new one
my_dict["sizes"] = [5, 10, 24]

In [None]:
print(my_dict)

In [None]:
# can iterate over dictionary items
for key, value in my_dict.items():
    print(key, value)
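
A brief sketch of how adding and overwriting keys affects a dictionary, assuming Python 3.7 or later (where insertion order is preserved). The dictionary `person` and the key `"height"` are made up for the example.

```python
person = {"name": "Bob", "age": 6}

person["sizes"] = [1, 2, 3]  # new key: appended after the existing keys
person["age"] = 7            # existing key: value replaced, position unchanged

keys = list(person.keys())
has_name = "name" in person      # `in` checks whether a key exists
has_height = "height" in person  # this key was never added

print(keys)                  # ['name', 'age', 'sizes']
print(person["age"])         # 7
print(has_name, has_height)  # True False
```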

# Python: Functions

In [None]:
# functions are defined using the def keyword
def my_function():
    print("hi")

In [None]:
my_function()

In [None]:
# functions can take arguments
def my_function(name):
    print("hi", name)

In [None]:
my_function("Liam")
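
The functions above only print. A function can also hand a value back to the caller using `return`, so that the result can be stored in a variable and reused. The following is a minimal sketch; the function name `make_greeting` is made up for illustration.

```python
def make_greeting(name):
    """Build a greeting string and return it instead of printing it."""
    return "hi " + name

# the returned value can be assigned to a variable...
message = make_greeting("Liam")
print(message)  # hi Liam

# ...and used in further expressions
print(len(message))  # 7
```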

# Python: Libraries

In [None]:
# import a library, and (optionally) give it a shorter name
import numpy as np

In [None]:
my_list = [1, 2, 3, 4, 5]
# library functions are accessed using library_name.function_name
# here we create a numpy array from a list
my_array = np.array(my_list)

In [None]:
print(my_array)

In [None]:
type(my_array)

In [None]:
# apply the numpy `sqrt` function to every element of the array
np.sqrt(my_array)

In [None]:
# display help about this sqrt function (the trailing "?" is Jupyter/IPython syntax, not plain Python)
np.sqrt?

In [None]:
np.mean(my_array)

In [None]:
np.std(my_array)
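
To make clear what `np.mean` and `np.std` compute, the sketch below reproduces both by hand using element-wise array operations. It relies on `np.std` using its default setting, which divides by the number of elements (the population standard deviation); the array `values` and the `*_by_hand` names are made up for the example.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# mean: sum of the elements divided by the number of elements
mean_by_hand = values.sum() / len(values)

# standard deviation: square root of the mean squared deviation from the mean
deviations = values - mean_by_hand  # subtraction applies to every element
std_by_hand = np.sqrt((deviations ** 2).sum() / len(values))

print(mean_by_hand)                                # 3.0
print(np.isclose(mean_by_hand, np.mean(values)))   # True
print(np.isclose(std_by_hand, np.std(values)))     # True
```

If the sample standard deviation (dividing by one less than the number of elements) is wanted instead, `np.std` accepts the argument `ddof=1`.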

# Next

- [Part 2: Pandas with toy data](https://ssciwr.github.io/jupyter-data-exploration/pandas-toy-data.slides.html)