# Data Exploration with Python and Jupyter

Basic usage of the Pandas library to download a dataset,
explore its contents, clean up missing or invalid data,
filter the data according to different criteria,
and plot visualizations of the data.

- **Part 1: Python and Jupyter**
- [Part 2: Pandas with toy data](pandas-toy-data.slides.html)
- [Part 3: Pandas with real data](pandas-real-data.slides.html)

*Press Spacebar or the right arrow key to go to the next slide*

# Python
is a widely used programming language with many useful libraries

# Jupyter
an interactive, notebook-based way of writing and running code

# Jupyter notebook

A notebook is divided into cells, which can be
- code cells
  - contain Python code to be executed; any output is displayed below the cell
- markdown cells
  - contain markdown text, which is displayed as rendered markdown (nicely formatted text)

To run the selected cell, press `Shift+Enter` or click "Cell -> Run Cells" on the menubar

In [None]:
# This is a code cell: press Shift+Enter to execute the code in it
#
print("Hello!")

### Markdown cell
This is a markdown cell, with *formatted* **text**, useful for documenting / organizing your notebook

## Jupyter order of execution

- you are free to execute / run cells in any order you choose
- they can make use of and modify any objects, functions or variables that have already been created
- however this can quickly get confusing
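
As a rough illustration of why this gets confusing, the sketch below uses two ordinary functions as stand-ins for notebook cells. The names `cell_1`, `cell_2` and `state` are made up for this example and are not part of Jupyter. Re-running the second "cell" on its own silently changes the result:

```python
# Each function stands in for one notebook cell (hypothetical names)
state = {}

def cell_1():
    # first cell: create a variable
    state["total"] = 10

def cell_2():
    # second cell: modify the variable created by the first cell
    state["total"] = state["total"] * 2

# run the cells once, in the order they are written
cell_1()
cell_2()
in_order = state["total"]

# run only the second cell again: the stored value changes once more
cell_2()
rerun = state["total"]

print(in_order, rerun)
```

The code is identical in both cases; only the order and number of executions differ.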

## Top to bottom

- it is good practice to have a top-to-bottom flow of execution
- i.e. write your notebook so that it can be executed in the order it is written
- this makes it easier to understand what is going on

## Useful commands when things go wrong

- menubar `Kernel -> Restart and Clear Output`
  - fresh start (your code is still there, but all existing objects, functions and variables are cleared)
- menubar `Kernel -> Restart and Run All`
  - as above, but additionally executes all the cells in order


# Python variables

In [None]:
# any lines starting with "#" are comments that Python ignores
#
# assign the number 12 to the variable "a":
#
a = 12

In [None]:
# any variable or object can be printed
print(a)

In [None]:
# display the type of an object
type(a)

In [None]:
# variables can be re-assigned, including to different types
a = "Hello!"

In [None]:
print(a)

In [None]:
type(a)
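
The cells above can be condensed into one short sketch, assuming the same variable `a`; the names `first_type` and `second_type` are introduced only for this example. It shows that the type belongs to the object currently assigned, not to the variable name:

```python
# the variable initially refers to an integer
a = 12
first_type = type(a)

# after re-assignment the same name refers to a string
a = "Hello!"
second_type = type(a)

print(first_type, second_type)
```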

# Python lists

In [None]:
# a list is an ordered container of objects (the objects don't have to have the same type)
my_list = [1, 3, 88, -13, "hello"]

In [None]:
print(my_list)

In [None]:
# can reference an item in the list by its index: 0 is the first item
print(my_list[0])

In [None]:
# can also use negative indices: -1 is the last item, -2 the second-to-last, etc.
print(my_list[-1])

In [None]:
# can use slicing to get a subset of the list: here elements with index 1 up to (but not including) index 3:
print(my_list[1:3])

In [None]:
# can add two lists together: this concatenates them into a single long list
print(my_list + [5, 6, 7])

In [None]:
# can iterate over the items in a list
for item in my_list:
    print(item)
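
As a small extension of the slicing cell above, here is a sketch of what happens when the start or end index of a slice is left out. It uses the original five-item `my_list`; the names `first_two`, `from_third` and `last_two` are made up for this example:

```python
my_list = [1, 3, 88, -13, "hello"]

# omitting the start index means "from the beginning"
first_two = my_list[:2]

# omitting the end index means "up to and including the last item"
from_third = my_list[2:]

# negative indices also work in slices: here the last two items
last_two = my_list[-2:]

print(first_two)
print(from_third)
print(last_two)
```

Slicing returns a new list, so `my_list` itself is left unchanged.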

# Python dictionaries

In [None]:
# a dictionary is a collection of key-value pairs (since Python 3.7 it preserves insertion order)
my_dict = {"name": "Bob", "age": 6}

In [None]:
print(my_dict)

In [None]:
# can look up a value using its key
print(my_dict["name"])

In [None]:
# can add a key-value pair to the dictionary by assigning a value to a key
my_dict["sizes"] = [1, 2, 3]

In [None]:
print(my_dict)

In [None]:
# can iterate over the key-value pairs in a dictionary
for key, value in my_dict.items():
    print(key, value)
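
Looking up a key that is not in the dictionary with square brackets raises a `KeyError`. The sketch below shows two common ways to handle that, using the built-in `in` operator and the `dict.get` method; the key `"email"` and the default value `"unknown"` are made up for this example:

```python
my_dict = {"name": "Bob", "age": 6}

# check whether a key exists before looking it up
has_name = "name" in my_dict
has_email = "email" in my_dict

# get() returns the supplied default instead of raising an error
email = my_dict.get("email", "unknown")

print(has_name, has_email, email)
```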

# Libraries

In [None]:
# import a library, and (optionally) give it a shorter name
import numpy as np

In [None]:
my_list = [1, 2, 3, 4, 5]
# library functions are accessed using library_name.function_name
# here we create a numpy array from a list
my_array = np.array(my_list)

In [None]:
print(my_array)

In [None]:
type(my_array)

In [None]:
# numpy functions are applied to every element of the array
np.sqrt(my_array)
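
To show how a numpy array differs from a plain list, here is a minimal sketch, assuming numpy is installed. The names `doubled_list` and `doubled_array` are made up for this example. The same `* 2` expression repeats a list but multiplies each element of an array:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)

# multiplying a list by 2 repeats its contents
doubled_list = my_list * 2

# multiplying an array by 2 doubles each element
doubled_array = my_array * 2

print(doubled_list)
print(doubled_array)
```

This element-wise behaviour is the same one that lets `np.sqrt` work on the whole array at once.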

# Next

- [Part 2: Pandas with toy data](pandas-toy-data.slides.html)