# Pandas - Part 3
> "Python Data Science Handbook" - *Jake Vanderplas (2016)*

# Pivot Tables
We have seen how the GroupBy abstraction lets us explore relationships within a dataset. A pivot table is a similar operation that is commonly seen in spreadsheets and
other programs that operate on tabular data. The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides
a multidimensional summarization of the data. The difference between pivot tables
and GroupBy can sometimes cause confusion; it helps me to think of pivot tables as
essentially a multidimensional version of GroupBy aggregation. That is, you split-apply-combine, but both the split and the combine happen not across a one-dimensional index, but across a two-dimensional grid.

## Motivating Pivot Tables
For the examples in this section, we’ll use the database of passengers on the Titanic,
available through the Seaborn library:
```python
In[1]:  import numpy as np
        import pandas as pd
        import seaborn as sns
        titanic = sns.load_dataset('titanic')
In[2]:  titanic.head()
Out[2]:
           survived  pclass     sex   age  sibsp  parch     fare  embarked  class  \
        0         0       3    male  22.0      1      0   7.2500         S  Third
        1         1       1  female  38.0      1      0  71.2833         C  First
        2         1       3  female  26.0      0      0   7.9250         S  Third
        3         1       1  female  35.0      1      0  53.1000         S  First
        4         0       3    male  35.0      0      0   8.0500         S  Third

             who  adult_male  deck  embark_town  alive  alone
        0    man        True   NaN  Southampton     no  False
        1  woman       False     C    Cherbourg    yes  False
        2  woman       False   NaN  Southampton    yes   True
        3  woman       False     C  Southampton    yes  False
        4    man        True   NaN  Southampton     no   True
```
This contains a wealth of information on each passenger of that ill-fated voyage,
including gender, age, class, fare paid, and much more.

## Pivot Table Syntax
Here is the survival rate broken down by both sex and class, computed with the `pivot_table` method of
DataFrames:
```python
In[5]: titanic.pivot_table('survived', index='sex', columns='class')
Out[5]: 
 class  First    Second   Third
 sex
 female 0.968085 0.921053 0.500000
 male   0.368852 0.157407 0.135447
 ```
This is eminently more readable than the equivalent GroupBy approach (group by `sex`
and `class`, average `survived`, then unstack), and gives identical numbers. As you
might expect of an early 20th-century transatlantic cruise, the survival gradient
favors both women and higher classes. First-class women survived with near
certainty (hi, Rose!), while only about one in seven third-class men survived (sorry, Jack!).
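To see that `pivot_table` really is a multidimensional GroupBy aggregation, here is a sketch comparing the two routes on a small hand-made stand-in for the Titanic data (the six rows are hypothetical, chosen only so every sex/class cell is populated). The GroupBy version splits on both keys, averages `survived`, and then uses `unstack` to move the inner `class` level into the columns.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Titanic data (not real passengers)
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'male', 'male', 'male'],
    'class':    ['First', 'Third', 'Third', 'First', 'Third', 'Third'],
    'survived': [1, 1, 0, 0, 0, 1],
})

# GroupBy route: split on both keys, average, then pivot 'class' into the columns
by_groupby = toy.groupby(['sex', 'class'])['survived'].mean().unstack()

# pivot_table route: the same split-apply-combine in one call (mean is the default)
by_pivot = toy.pivot_table('survived', index='sex', columns='class')

print(by_pivot)
```

Both produce the same sex-by-class grid of survival rates; `pivot_table` just hides the `unstack` step.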

### Multilevel pivot tables
Just as with GroupBy, the grouping in pivot tables can be specified with multiple levels,
and via a number of options. For example, we might be interested in looking at
age as a third dimension. We'll bin the age using the `pd.cut` function:
```python
In[6]:  age = pd.cut(titanic['age'], [0, 18, 80])
        titanic.pivot_table('survived', ['sex', age], 'class')
Out[6]: class               First   Second  Third
        sex    age
        female (0, 18]     0.909091 1.000000 0.511628
               (18, 80]    0.972973 0.900000 0.423729
        male   (0, 18]     0.800000 0.600000 0.215686
               (18, 80]    0.375000 0.071429 0.133663
 ```
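The behavior of `pd.cut` at the bin edges matters here: by default each interval is closed on the right, so an age of exactly 18 falls in `(0, 18]`, while values outside every bin (or missing ages) become NaN and are left out of the grouping. A quick check on a few hypothetical ages:

```python
import numpy as np
import pandas as pd

# A few hypothetical ages, including both bin edges and a missing value
ages = pd.Series([0.42, 18, 18.5, 80, np.nan])

# Bins are right-closed by default: (0, 18] and (18, 80]
binned = pd.cut(ages, [0, 18, 80])
print(binned)

# Integer code of each value's bin; -1 marks a value that fell in no bin
print(binned.cat.codes.tolist())
```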
We can apply this same strategy when working with the columns as well; let's add info
on the fare paid, using `pd.qcut` to automatically compute quantiles:
```python
In[7]:  fare = pd.qcut(titanic['fare'], 2)
        titanic.pivot_table('survived', ['sex', age], [fare, 'class'])
Out[7]:
        fare            [0, 14.454]
        class                 First    Second     Third  \
        sex    age
        female (0, 18]          NaN  1.000000  0.714286
               (18, 80]         NaN  0.880000  0.444444
        male   (0, 18]          NaN  0.000000  0.260870
               (18, 80]         0.0  0.098039  0.125000

        fare            (14.454, 512.329]
        class                 First    Second     Third
        sex    age
        female (0, 18]     0.909091  1.000000  0.318182
               (18, 80]    0.972973  0.914286  0.391304
        male   (0, 18]     0.800000  0.818182  0.178571
               (18, 80]    0.391304  0.030303  0.192308
```
The result is a four-dimensional aggregation with hierarchical indices, shown in a grid demonstrating the relationship between
the values.
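Unlike `pd.cut`, which takes explicit bin edges, `pd.qcut` places the edges at sample quantiles; with `2` it splits at the median so that each bin holds about half of the rows, which is why the fare columns above divide at 14.454. A minimal check on hypothetical fares:

```python
import pandas as pd

# Eight hypothetical fares; their median is 4.5
fares = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

# Two quantile bins split at the median, so each bin gets an equal share of rows
halves = pd.qcut(fares, 2)

print(halves.value_counts().sort_index())
print(halves.cat.codes.tolist())
```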

### Additional pivot table options
The full call signature of the pivot_table method of DataFrames is as follows:
```python
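# call signature as of Pandas 0.18; when called as a method, `data` is the
# DataFrame itself (self), so in practice you write df.pivot_table(values, ...)
DataFrame.pivot_table(data, values=None, index=None, columns=None,
                      aggfunc='mean', fill_value=None, margins=False,
                      dropna=True, margins_name='All')
# later releases add further keywords, such as `observed` and `sort`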
```
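Two of these options come up constantly. A dict passed as `aggfunc` applies a different aggregation to each column (the columns to aggregate are then taken from the dict keys, so `values` can be omitted), and `margins=True` appends totals computed over all rows, labelled with `margins_name`. A sketch on a hypothetical six-row stand-in:

```python
import pandas as pd

# Hypothetical toy passengers (not real Titanic rows)
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'male', 'male', 'male'],
    'class':    ['First', 'Third', 'Third', 'First', 'First', 'Third'],
    'survived': [1, 1, 0, 0, 1, 0],
    'fare':     [80.0, 8.0, 7.0, 60.0, 40.0, 7.0],
})

# aggfunc as a dict: count survivors, but average the fare
per_column = toy.pivot_table(index='sex', columns='class',
                             aggfunc={'survived': 'sum', 'fare': 'mean'})

# margins=True adds an 'All' row and column aggregated over the ungrouped data
with_totals = toy.pivot_table('survived', index='sex', columns='class',
                              margins=True)

print(per_column)
print(with_totals)
```

Note that the margins use the same aggregation as the table body: here the `'All'` entries are survival *rates* over all passengers in that row or column, not sums of the cells.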



### Try it yourself: Using the titanic data, create a pivot table of the `survived` value with the hierarchical index `['sex', 'alone']` and with `class` as the columns

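```python
import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')

titanic
```

|     | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|-----|----------|--------|-----|-----|-------|-------|------|----------|-------|-----|------------|------|-------------|-------|-------|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 | 13.0000 | S | Second | man | True | NaN | Southampton | no | True |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 | 30.0000 | S | First | woman | False | B | Southampton | yes | True |
| 888 | 0 | 3 | female | NaN | 1 | 2 | 23.4500 | S | Third | woman | False | NaN | Southampton | no | False |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | First | man | True | C | Cherbourg | yes | True |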

```python
# your code here
```
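If you get stuck, one possible approach is a single `pivot_table` call with a list passed as `index`. Because `sns.load_dataset` downloads the data over the network, the sketch below runs the same call on a hypothetical six-row stand-in with just the columns the exercise needs; on the real data, replace `toy` with `titanic`.

```python
import pandas as pd

# Hypothetical stand-in with the columns the exercise uses (not real passengers)
toy = pd.DataFrame({
    'sex':      ['female', 'female', 'female', 'male', 'male', 'male'],
    'alone':    [False, True, True, False, True, True],
    'class':    ['First', 'Third', 'Third', 'First', 'Third', 'First'],
    'survived': [1, 1, 0, 0, 0, 1],
})

# A list passed as `index` builds a two-level (sex, alone) row index
result = toy.pivot_table('survived', index=['sex', 'alone'], columns='class')

# Combinations with no passengers appear as NaN
print(result)
```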

# Working with Time Series
Pandas was developed in the context of financial modeling, so as you might expect, it
contains a fairly extensive set of tools for working with dates, times, and time-indexed data. Date and time data comes in a few flavors, which we will discuss here:
* *Time stamps* reference particular moments in time (e.g., July 4th, 2015, at 7:00
a.m.).
* *Time intervals* and *periods* reference a length of time between a particular beginning
and end point; for example, the year 2015. Periods usually reference a special case
of time intervals in which each interval is of uniform length and does
not overlap (e.g., 24-hour-long periods constituting days).
* *Time deltas* or *durations* reference an exact length of time (e.g., a duration of
22.56 seconds).

In this section, we will introduce how to work with each of these types of date/time
data in Pandas. This short section is by no means a complete guide to the time series
tools available in Python or Pandas, but instead is intended as a broad overview of
how you as a user should approach working with time series. Here we focus on the
tools Pandas itself provides, chiefly indexing data by time, and close with a short
exercise on real daily rainfall data.
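Each kind of date/time data listed above has a matching Pandas scalar type: `pd.Timestamp` for time stamps, `pd.Period` for periods, and `pd.Timedelta` for durations. A short sketch using the examples from that list:

```python
import pandas as pd

# Time stamp: a single moment (July 4th, 2015, at 7:00 a.m.)
stamp = pd.Timestamp('2015-07-04 07:00')

# Period: the whole year 2015; the annual frequency is inferred from the string
year = pd.Period('2015')

# Time delta: an exact duration of 22.56 seconds
delta = pd.Timedelta('22.56s')

# Uniform, non-overlapping daily periods, as in the "days" example
days = pd.period_range('2015-07-01', periods=3, freq='D')

print(stamp.year, stamp.hour)                      # 2015 7
print(year.start_time <= stamp <= year.end_time)   # True: the moment lies in the period
print(stamp + delta)                               # a time stamp shifted by a duration
print(days)
```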

## Pandas Time Series: Indexing by Time
Where the Pandas time series tools really become useful is when you begin to index
data by timestamps. For example, we can construct a Series object that has time-indexed data:

```python
In[12]: index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                                '2015-07-04', '2015-08-04'])
        data = pd.Series([0, 1, 2, 3], index=index)
        data
Out[12]:    2014-07-04 0
            2014-08-04 1
            2015-07-04 2
            2015-08-04 3
            dtype: int64
```
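The same index can also be built from raw strings with `pd.to_datetime`; passing an explicit `format` (here `'%Y%m%d'`, the compact layout that the rainfall exercise at the end of this section also parses) avoids any format guessing. A sketch:

```python
import pandas as pd

# Compact YYYYMMDD strings, as often found in raw data files
raw = ['20140704', '20140804', '20150704', '20150804']

# pd.to_datetime on a list returns a DatetimeIndex directly
index = pd.to_datetime(raw, format='%Y%m%d')
data = pd.Series([0, 1, 2, 3], index=index)
print(data)
```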
Now that we have this data in a Series, we can make use of any of the Series indexing
patterns we discussed in previous sections, passing values that can be coerced into
dates:
```python
In[13]:     data['2014-07-04':'2015-07-04']
Out[13]:    2014-07-04 0
            2014-08-04 1
            2015-07-04 2
            dtype: int64
 ```
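Unlike integer-position slicing, slicing with date strings is label-based, so both endpoints are included, and on a sorted index the endpoints do not even need to appear in the index. The sketch below uses the explicit `.loc` indexer, which makes the label-based intent clear:

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)

# Both endpoints are included, as in any label-based slice
inclusive = data.loc['2014-07-04':'2015-07-04']

# Bounds missing from the (sorted) index still work: they act as cut points
between = data.loc['2014-07-10':'2015-07-31']

print(inclusive.tolist())  # [0, 1, 2]
print(between.tolist())    # [1, 2]
```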
There are additional special date-only indexing operations, such as passing a year to
obtain a slice of all data from that year:
```python
In[14]: data['2015']
Out[14]: 2015-07-04    2
         2015-08-04    3
         dtype: int64
```
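Partial strings work at other resolutions too, such as a single month. On a DataFrame, recent Pandas releases no longer use `df['2015']` to select rows (the string is looked up as a column label instead), so the `.loc` form in this sketch is the safer habit for both Series and DataFrames:

```python
import pandas as pd

index = pd.DatetimeIndex(['2014-07-04', '2014-08-04',
                          '2015-07-04', '2015-08-04'])
data = pd.Series([0, 1, 2, 3], index=index)
frame = data.to_frame(name='value')

# A year string selects every row in that year
print(data.loc['2015'].tolist())             # [2, 3]

# A month string selects every row in that month
print(data.loc['2014-08'].tolist())          # [1]

# The same row selection on a DataFrame goes through .loc
print(frame.loc['2015', 'value'].tolist())   # [2, 3]
```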

### Try it yourself: From the provided code, get the total precipitation for each month

```python
import pandas as pd

rainfall = pd.read_csv(r"https://github.com/jakevdp/PythonDataScienceHandbook/raw/master/notebooks_v1/data/Seattle2014.csv")
rainfall['DATE'] = pd.to_datetime(rainfall['DATE'], format='%Y%m%d')
rainfall = rainfall.loc[:, ['DATE', 'PRCP']].set_index('DATE')
rainfall
```

| DATE | PRCP |
|------|------|
| 2014-01-01 | 0 |
| 2014-01-02 | 41 |
| 2014-01-03 | 15 |
| 2014-01-04 | 0 |
| 2014-01-05 | 0 |
| ... | ... |
| 2014-12-27 | 33 |
| 2014-12-28 | 41 |
| 2014-12-29 | 0 |
| 2014-12-30 | 0 |


```python
# your code here
```
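If you get stuck, one possible approach is to resample the daily `PRCP` column to calendar months and sum each bin. The sketch wraps this in a hypothetical helper, `monthly_precipitation`, and checks it on a tiny made-up frame shaped like `rainfall` (a `DATE` index and a `PRCP` column) so that it runs offline; on the real data, call `monthly_precipitation(rainfall)`. The handbook's own rainfall examples treat these `PRCP` values as tenths of a millimetre, so the monthly sums are in the same units.

```python
import pandas as pd

def monthly_precipitation(rainfall: pd.DataFrame) -> pd.Series:
    """Total PRCP per calendar month (hypothetical helper for this exercise)."""
    # 'MS' = month-start frequency: one bin per calendar month, labelled by its first day
    return rainfall['PRCP'].resample('MS').sum()

# Tiny made-up frame shaped like `rainfall` (not real Seattle observations)
toy = pd.DataFrame(
    {'PRCP': [0, 41, 15, 5]},
    index=pd.DatetimeIndex(['2014-01-01', '2014-01-02',
                            '2014-02-01', '2014-02-15'], name='DATE'),
)

monthly = monthly_precipitation(toy)
print(monthly)

# Alternative: group on the month number (fine here because the data cover one year)
by_month_number = toy['PRCP'].groupby(toy.index.month).sum()
print(by_month_number)
```

`resample` keeps a proper monthly DatetimeIndex, which is handy for plotting; grouping on `index.month` gives plain month numbers but would merge the same month from different years.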