# Group by 

Let's take our HVAC data and compare the mean power of houses with and without solar.

In [None]:
import pandas as pd

import seaborn as sns
import matplotlib.pyplot as plt

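# The 'seaborn-whitegrid' matplotlib style was renamed in matplotlib 3.6 and
# later removed, so request the whitegrid look through seaborn instead.
sns.set_theme(style='whitegrid', rc={'figure.figsize': (12, 6)})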

import warnings
warnings.simplefilter('ignore')

In [None]:
import utils

df = utils.read_csv('data/measured_real_power.csv')
df.head()

In [None]:
df[['triplex_meter_0']].head()

Let's **melt** again to turn the wide table (one column per meter) into long format (one row per timestamp and meter).

In [None]:
stacked = df.reset_index().melt(id_vars='timestamp')
stacked.head()
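As a minimal sketch of what `melt` does, here is the same reshaping on a tiny made-up table. The names `wide_toy`, `meter_a`, and `meter_b` and all the values are hypothetical, not taken from the HVAC data.

```python
import pandas as pd

# Toy wide table: one row per timestamp, one column per meter (made-up values).
wide_toy = pd.DataFrame(
    {
        "timestamp": ["t0", "t1"],
        "meter_a": [1.0, 2.0],
        "meter_b": [3.0, 4.0],
    }
)

# melt keeps `timestamp` as an identifier and stacks every other column
# into two new columns, named `variable` and `value` by default.
long_toy = wide_toy.melt(id_vars="timestamp")
print(long_toy)
```

Each (timestamp, meter) pair becomes its own row, which is why the long table has rows × meters entries.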

Rename the default `variable` and `value` columns to something more descriptive.

In [None]:
stacked.rename(columns = {'variable': 'meter', 'value': 'power'}, inplace=True)
stacked.head()

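How do we select just a few meters? In long format, a boolean mask on the `meter` column does the job, which is often easier than selecting columns in the wide table.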

In [None]:
filter_index = stacked['meter'].isin(['triplex_meter_0', 'triplex_meter_1'])
filter_index.head()

In [None]:
stacked = stacked[filter_index]
stacked.head()
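The two steps above (build a boolean mask, then index with it) can be sketched on a small made-up frame. The name `stacked_toy` and its values are hypothetical.

```python
import pandas as pd

# Toy long table with three meters (made-up values).
stacked_toy = pd.DataFrame(
    {
        "meter": ["a", "b", "c", "a"],
        "power": [1.0, 2.0, 3.0, 4.0],
    }
)

# isin returns one True/False per row: True where the meter is in the list.
mask = stacked_toy["meter"].isin(["a", "c"])

# Indexing with the mask keeps only the rows marked True.
subset = stacked_toy[mask]
print(subset)
```

The mask is an ordinary boolean `Series`, so it can be combined with other conditions using `&` and `|`.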

We can check the result by reshaping back to wide format with **pivot**.

In [None]:
stacked.pivot(index="timestamp", columns="meter", values='power').head()

Or with **pivot_table**, which also aggregates duplicate index/column pairs (here with `sum`).

In [None]:
pd.pivot_table(stacked, index="timestamp", columns="meter", values='power', aggfunc="sum").head()
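The practical difference between the two shows up when an index/column pair occurs more than once. The sketch below uses a made-up frame (`dup_toy` is a hypothetical name) in which timestamp `t0` appears twice for the same meter.

```python
import pandas as pd

# Toy long table with a duplicated (timestamp, meter) pair (made-up values).
dup_toy = pd.DataFrame(
    {
        "timestamp": ["t0", "t0", "t1"],
        "meter": ["a", "a", "a"],
        "power": [1.0, 2.0, 5.0],
    }
)

# pivot cannot decide which of the two t0 values to keep, so it raises.
try:
    dup_toy.pivot(index="timestamp", columns="meter", values="power")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves the duplicates by aggregating them.
summed = pd.pivot_table(
    dup_toy, index="timestamp", columns="meter", values="power", aggfunc="sum"
)
print(pivot_failed)
print(summed)
```

When every pair is unique, as it should be for one reading per meter per timestamp, both calls give the same table.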

## Create the base table 

We just want the mean power for each **triplex_meter**, computed from the original wide `df`.

In [None]:
mean_power = df.mean()
mean_power.head()

In [None]:
type(mean_power)

In [None]:
mean_power = mean_power.reset_index()
mean_power.head()

In [None]:
mean_power.columns = ['house', 'power']
type(mean_power)

In [None]:
mean_power.head()
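The same base table can probably be built in one chained expression. This is a sketch on made-up data (`wide_toy` and `mean_toy` are hypothetical names), using `rename_axis` to name the index and `reset_index(name=...)` to name the value column.

```python
import pandas as pd

# Toy wide table: one column per meter (made-up values).
wide_toy = pd.DataFrame(
    {
        "meter_a": [1.0, 3.0],
        "meter_b": [10.0, 20.0],
    }
)

# mean() returns a Series indexed by column name; naming the index and the
# values while resetting yields a two-column DataFrame in one step.
mean_toy = wide_toy.mean().rename_axis("house").reset_index(name="power")
print(mean_toy)
```

This avoids assigning to `.columns` afterwards, at the cost of a slightly longer line.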

## Read the relationship `DataFrame` 

In [None]:
import pandas as pd

housedf = pd.read_csv('data/triplex_meter_solar.csv')
housedf.head()

In [None]:
mean_power.head()

## Merge 

In [None]:
pd.merge(mean_power, housedf).head()

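The merge above does not work as written because the two frames share no column name: the meter is called `house` in `mean_power` and `triplex_meter` in `housedf`. One way to fix this is to tell the `merge` function which columns you want to merge on.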

In [None]:
pd.merge(mean_power, housedf, left_on='house', right_on='triplex_meter').head()

Another option is to rename the columns so that the key has the same name in both frames.

In [None]:
housedf.columns = ['house', 'type']
pd.merge(mean_power, housedf).head()

In [None]:
merged = pd.merge(mean_power, housedf, on='house')
merged.head()
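A default merge is an inner join, so houses missing from either table are dropped silently. One way to check for that, sketched here on made-up data, is a left join with `indicator=True`. The frame names and the `solar` / `no_solar` labels are hypothetical and may not match the values in `triplex_meter_solar.csv`.

```python
import pandas as pd

# Toy tables: m2 has a power value but no entry in the type table.
power_toy = pd.DataFrame({"house": ["m0", "m1", "m2"], "power": [1.0, 2.0, 3.0]})
types_toy = pd.DataFrame(
    {"triplex_meter": ["m0", "m1"], "type": ["solar", "no_solar"]}
)

# Inner join (the default): only houses present in both tables survive.
inner = pd.merge(power_toy, types_toy, left_on="house", right_on="triplex_meter")

# Left join with an indicator column: every house is kept, and `_merge`
# records whether a match was found in the right-hand table.
checked = pd.merge(
    power_toy,
    types_toy,
    left_on="house",
    right_on="triplex_meter",
    how="left",
    indicator=True,
)
unmatched = checked.loc[checked["_merge"] == "left_only", "house"].tolist()
print(unmatched)
```

An empty `unmatched` list would suggest that the inner join did not lose any houses.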

What is the average power by type?

In [None]:
grp = merged.groupby('type')
type(grp)

In [None]:
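# Select the numeric column first: `house` holds strings, and recent pandas
# versions raise an error when asked to average a non-numeric column.
grp[['power']].mean()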

We can also use the `agg` function to compute several statistics at once.

In [None]:
# Aggregate only the numeric `power` column; `house` holds strings.
grp['power'].agg(['mean', 'std', 'max', 'min'])
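As a sketch of how `agg` behaves, the example below runs it on a made-up merged table and also shows pandas' named-aggregation form, which lets you choose the output column names. The frame `merged_toy`, the type labels, and the output names `mean_power` and `n_houses` are all hypothetical.

```python
import pandas as pd

# Toy merged table: two houses per type (made-up values and labels).
merged_toy = pd.DataFrame(
    {
        "house": ["m0", "m1", "m2", "m3"],
        "type": ["solar", "solar", "no_solar", "no_solar"],
        "power": [1.0, 3.0, 10.0, 20.0],
    }
)

# A list of function names gives one output column per statistic.
summary = merged_toy.groupby("type")["power"].agg(["mean", "max", "min"])

# Named aggregation: output_name=(input_column, function).
named = merged_toy.groupby("type").agg(
    mean_power=("power", "mean"),
    n_houses=("house", "count"),
)
print(summary)
print(named)
```

Counting houses per group alongside the mean is a cheap way to see how many observations each average rests on.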

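What about power by `house` and by `type`? Pivoting gives one row per house and one column per type, with `NaN` wherever a house does not belong to that type.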

In [None]:
merged.pivot(index='house', columns='type', values='power').head()
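To see the shape this pivot produces, here is a sketch on a made-up merged table, assuming each house has exactly one type. `merged_toy` and the type labels are hypothetical.

```python
import pandas as pd

# Toy merged table: one row per house, each with a single type (made-up).
merged_toy = pd.DataFrame(
    {
        "house": ["m0", "m1", "m2", "m3"],
        "type": ["solar", "solar", "no_solar", "no_solar"],
        "power": [1.0, 3.0, 10.0, 20.0],
    }
)

# One row per house, one column per type. Each house fills exactly one
# column, so the other column holds NaN for that row.
wide_toy = merged_toy.pivot(index="house", columns="type", values="power")
print(wide_toy)

# Dropping the NaN entries recovers the per-type values.
solar_values = wide_toy["solar"].dropna().tolist()
```

The result is sparse by construction: half the cells are `NaN`, and only the non-missing values in each column carry information about that type.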

In [None]:
merged.pivot(index='house', columns='type', values='power').boxplot(return_type='axes')

In [None]:
sns.boxplot(data=merged, y='power', x='type')