# Data Exploration with Python and Jupyter

Basic usage of the Pandas library to download a dataset,
explore its contents, clean up missing or invalid data,
filter the data according to different criteria,
and plot visualizations of the data.

- [Part 1: Python and Jupyter](https://ssciwr.github.io/jupyter-data-exploration)
- **Part 2: Pandas with toy data**
- [Part 3: Pandas with real data](https://ssciwr.github.io/jupyter-data-exploration/pandas-real-data.slides.html)

*Press Spacebar or the right arrow key to go to the next slide*

# Pandas
is a Python library for data analysis and manipulation

In [None]:
# Import the Pandas library
import pandas as pd

In [None]:
# Import some toy data as a pandas DataFrame
df = pd.read_csv("https://ssciwr.github.io/jupyter-data-exploration/data.csv")

In [None]:
type(df)

In [None]:
len(df)

In [None]:
# Display the first few rows of data
df.head()

In [None]:
# List the columns
df.columns

# Selecting rows and columns

Three main ways of doing this:

- Python-style indexing operator `[]`
- Pandas `loc` indexer (label-based)
- Pandas `iloc` indexer (integer position-based)

We'll start with the more intuitive Python-style methods, and later move on to the more powerful `loc` and `iloc` alternatives.
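
As a rough preview of how the three approaches differ, here is a minimal sketch on a tiny hand-made DataFrame. The `toy` frame, its `Name` and `Age` values, and its row labels `a`, `b`, `c` are all made up for illustration; they are not the contents of the `data.csv` file loaded above.

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch (not the tutorial's data.csv)
toy = pd.DataFrame(
    {"Name": ["Ada", "Bob", "Cleo"], "Age": [36, 41, 29]},
    index=["a", "b", "c"],
)

# Python-style indexing operator: select a column by name
ages = toy["Age"]

# loc: select by row label and column label
age_of_b = toy.loc["b", "Age"]

# iloc: select by integer position (second row, second column)
age_at_1_1 = toy.iloc[1, 1]

print(ages.tolist())
print(age_of_b, age_at_1_1)
```

Here `loc` and `iloc` reach the same cell: the row labelled `b` is also the row at position 1, and `Age` is the column at position 1.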

In [None]:
# A DataFrame is a bit like a Dictionary - we can look up columns by name
names = df["Name"]

In [None]:
# A column of a DataFrame is a Series
type(names)

In [None]:
print(names)

In [None]:
# A Series is a bit like a List - we can select items by index
names[0]

In [None]:
# Here are the first five items:
names[0:5]
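
One caveat worth keeping in mind: `names[0]` works here because `pd.read_csv` gives the rows a default index of 0, 1, 2, ..., so the label `0` happens to coincide with the first position. The sketch below uses a made-up Series `s` with a different index (the values and the labels 10, 20, 30 are assumptions for illustration only) to show the label-based and position-based lookups side by side.

```python
import pandas as pd

# Hypothetical Series whose index labels do not start at 0
s = pd.Series(["Ada", "Bob", "Cleo"], index=[10, 20, 30])

# Position-based: the first item, whatever its label is
first_by_position = s.iloc[0]

# Label-based: the item whose index label is 10
by_label = s.loc[10]

# Position-based slice: the first two items
first_two = s.iloc[0:2]

print(first_by_position, by_label)
print(first_two.tolist())
```

With this index there is no label `0`, so the list-like intuition only carries over reliably when the position-based `iloc` is used.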

# Next

- [Part 3: Pandas with real data](https://ssciwr.github.io/jupyter-data-exploration/pandas-real-data.slides.html)