<img src=images/gdd-logo.png width=300px align=right>

# Pandas introduction

The next notebooks will cover how to use the `pandas` library to explore datasets.

In this section we will cover:

* [Lambdas starter](#lambdas)
* [Pandas overview](#overview)
* [Benefits of using pandas](#benefits)
* [Data wrangling](#wrangling)
* [<mark>Exercise: Exploratory Data Analysis</mark>](#exploring)
    * [<mark>Exercise: Learn some key methods!</mark>](#exploring)
    * [<mark>Exercise: Explore a dataset</mark>](#ex-explore-data)
* [Analysis](#analysis)

<a id = 'lambdas'></a>
## <mark>Exercise: Lambda Starter</mark>

**Lambda functions** really start to come into their own when we use them with pandas. Therefore we need to be really comfortable with the syntax. 

Here is an example of a lambda function that adds 4 to a number:

In [None]:
add_4 = lambda x: x + 4

In [None]:
add_4(10)

This lambda function does essentially the same thing as a function defined formally with `def`:

In [None]:
def add_4(x):
    return x + 4

In [None]:
add_4(10)

We can also give a lambda function multiple parameters by separating them with commas:

In [None]:
add_4y = lambda x, y: x + 4*y

In [None]:
add_4y(10, y = 2)
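
To hint at why lambdas pair well with pandas, here is a minimal sketch that passes a lambda to the `apply` method of a pandas Series. The values in `numbers` are made up for illustration and are not part of the course dataset.

```python
import pandas as pd

# Hypothetical example values, not taken from the course dataset
numbers = pd.Series([1, 5, 10])

# apply calls the lambda once for every element of the Series
plus_4 = numbers.apply(lambda x: x + 4)

print(plus_4.tolist())
```

Because the lambda is only needed in this one place, there is no need to give it a name with `def` first.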

Now complete the following questions:

1. Create a lambda function that multiplies two numbers together (and check it)

2. Create a lambda function to check if a number is bigger than 10 (and check it)

<a id = 'overview'></a>
## Pandas overview

Pandas is a specialised package that allows you to work with tabular data using Python.

First you need to import the package:

In [None]:
import pandas as pd

Then, to read in a CSV file you can use:
```python
pd.read_csv('filepath/file.csv')
```

This notebook uses the `chickweight.csv` dataset, which is in the `data/` folder:

In [None]:
pd.read_csv('data/chickweight.csv')
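
`pd.read_csv` returns a DataFrame, which you would normally assign to a variable so you can keep working with it. The sketch below illustrates this without needing a file on disk: it reads from an in-memory string instead. The CSV text and its column names are assumptions made up for this example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "weight,Time,Chick,Diet\n42,0,1,1\n51,2,1,1\n"

# read_csv accepts a file path or a file-like object and returns a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)
print(df.shape)
```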

Aside from making tables look prettier in your Jupyter notebook, there are many advantages to using the pandas package when working with data.

<a id = 'benefits'></a>

## The benefits of using Pandas (and Python)

**Question:** What benefits do you think `pandas` offers when working with data?

<details>
    <summary><font color=blue>Show answer</font></summary>

- **Automation**: You can automate otherwise tedious tasks such as merging multiple datasets.
- **Cleaning**: Pandas allows you to automate the cleaning of your datasets.
- **Speed**: When working with large datasets, it is much faster than tools like Excel.
- **Filtering**: You can easily filter rows to find specific values.
- **Groupby**: Split your dataset into groups, apply a function to each group, and combine the results.
- **Creating new columns**: Easily create new columns from calculations with other columns

    
*And much more!*

</details>
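
As a rough illustration of the filtering, groupby, and new-column benefits, here is a minimal sketch on a toy dataframe. The data and the `double_weight` column are hypothetical and only loosely shaped like the chick weight dataset.

```python
import pandas as pd

# Hypothetical toy data, loosely shaped like the chick weight dataset
df = pd.DataFrame({
    "weight": [40, 60, 42, 80],
    "time": [0, 10, 0, 10],
    "diet": [1, 1, 2, 2],
})

# Filtering: keep only the rows where a condition holds
heavy = df[df["weight"] > 50]

# Creating new columns: compute a column from an existing one
df["double_weight"] = df["weight"] * 2

# Groupby: split by diet, take the mean weight per group, combine the results
mean_by_diet = df.groupby("diet")["weight"].mean()

print(heavy)
print(mean_by_diet)
```

Each step is a single line, which is part of what makes these tasks easy to automate and repeat.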


<a id = 'wrangling'></a>

## Data wrangling with Pandas

**Data Wrangling** is the process of transforming and mapping data, with the intent of making it more appropriate and valuable for a variety of downstream purposes such as for dashboards or analytics.

To demonstrate pandas' capabilities, let's load the `chickweight.csv` dataset.

In [None]:
chickweight = pd.read_csv('data/chickweight.csv').rename(str.lower, axis='columns')

chickweight.head()
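
The `.rename(str.lower, axis='columns')` call above converts every column name to lowercase. A minimal sketch of what it does, using a made-up dataframe with assumed mixed-case column names:

```python
import pandas as pd

# Hypothetical column names with mixed capitalisation
df = pd.DataFrame({"Weight": [42], "Time": [0], "Diet": [1]})

# With axis='columns', rename calls str.lower on every column label
lowered = df.rename(str.lower, axis="columns")

print(list(df.columns))
print(list(lowered.columns))
```

Note that `rename` returns a new dataframe and leaves the original unchanged, which is why the result is assigned to a variable.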

<a id = 'exploring'></a>
## <mark> Exercise: Exploratory Data Analysis</mark>

### <mark>Part 1: Learn some key attributes/methods!</mark>

Fill in the comments to explain what each cell does **in your own words**. 

You can use `help(pd.DataFrame.X)` to access the documentation for the attribute/method. For example,

```python
help(chickweight.info)
```

The first one is done for you.

In [None]:
# the shape attribute... gives the number of rows and number of columns
chickweight.shape

In [None]:
# the info method...
chickweight.info()

In [None]:
# the describe method...
chickweight.describe()

In [None]:
# the columns attribute...
chickweight.columns

In [None]:
# the head method...
chickweight.head()

In [None]:
# the tail method...
chickweight.tail()

In [None]:
# the sample method...
chickweight.sample(5)

In [None]:
# you can use square brackets to...
chickweight['diet']

In [None]:
# the unique method...
chickweight['diet'].unique()

In [None]:
# the value_counts method...
chickweight['diet'].value_counts()

In [None]:
# the mean method...
chickweight.mean(numeric_only=True)

<a id = 'ex-explore-data'></a>
### <mark>Part 2: Explore a dataset</mark>
Investigate the `weight` and `time` columns of the dataframe.

1. How many unique values of `time` are there? What do you think `time` represents in this dataframe?

2. What are the min and max of the `time` and `weight` columns?

**Bonus:** What is the most common (i.e. the mode) weight of a chicken?

### Answers


<details>
    <summary><font style=font-weight:bold>Part 1:</font>
        <font color=blue>Show answer</font></summary>
  
Exploring a dataframe includes:

* Checking the shape (`df.shape`) of the dataframe
* Checking the length (`len(df)`) of the dataframe
* Getting general information (`df.info()`) about the dataframe and its columns
* Getting summary statistics such as the count, mean, min and max of each numeric column (`df.describe()`)
* Listing the column names (`df.columns`)
* Fetching the first few rows, the last few rows, or a random sample of rows (`df.head()`, `df.tail()`, `df.sample()`)
* Selecting one (or more) columns (`df['column_name']`)
* Fetching the unique values of a column (`df['column_name'].unique()`)
* Counting how often each unique value occurs in a column (`df['column_name'].value_counts()`)

</details>
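
The difference between `unique` and `value_counts` is easy to mix up, so here is a small sketch on made-up values. The `nunique` method is not used elsewhere in this notebook; it is included here as a related pandas method that counts the distinct values.

```python
import pandas as pd

# Hypothetical values, not taken from the course dataset
colours = pd.Series(["red", "blue", "red", "red"])

# The distinct values themselves
distinct = colours.unique()

# How many distinct values there are
n_distinct = colours.nunique()

# How many times each distinct value occurs
counts = colours.value_counts()

print(distinct)
print(n_distinct)
print(counts)
```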


**Part 2:** Uncomment (remove the `# `) and run the cell to see the solution.

In [None]:
# %load answers/01_Introduction/ex-explore-data-1.py

In [None]:
# %load answers/01_Introduction/ex-explore-data-2.py

In [None]:
# %load answers/01_Introduction/ex-explore-data-3.py

<a id = 'analysis'></a>
## Analysis

### What analysis could you do? 


<img src="images/01_Introduction/chick.png" width="240" height="240" align="center"/>

Imagine that you own a farm and have this dataset available.

Now that you have a feel for the dataset, what could you do with it?

In [None]:
chickweight.head()

### Potential areas for analysis:

Some questions you might want to answer are:

The main use case could be to figure out which diet is best, but it is good to think about some of the other use cases. 