# 2. Summarizing, Aggregating & Grouping
Knowing how to use pandas aggregation & grouping functions lets us reduce the dimensionality of our data (most often over the rows - aka `axis=0`).  

For completeness, `axis=1` refers to the columns.
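
As a quick illustration of the two axes, here is a minimal sketch on a tiny, made-up DataFrame (the column names `a` and `b` are placeholders, not part of the wine data):

```python
import pandas as pd

# A tiny, made-up frame (not the wine data) to show what each axis does.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 collapses the rows: we get one result per column.
col_sums = df.sum(axis=0)

# axis=1 collapses the columns: we get one result per row.
row_sums = df.sum(axis=1)

print(col_sums)
print(row_sums)
```

Most pandas aggregators default to `axis=0`, which is why we rarely have to write it out.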

In [1]:
import pandas as pd
import numpy as np

In [2]:
wine = pd.read_csv('data/wine/winemag-data_first150k.csv', index_col=0)

Answer to exercise from pandas notebook 1:


`scrambled_wine[['points', 'region_1']].iloc[:5]`

## Initial quick analysis using pandas
Before we dive into this section on using pandas to view our data in different ways, let's review some of the basics of pandas from before.

Pandas has multiple built-in functions that make it easy to quickly see what's in your dataframe. 
You can combine them with the selection tools you used before.

Here, we will select a column, and then see how pandas lets us quickly analyse it.

To quickly see which columns our wine dataset has, we can use the .columns attribute.

In [3]:
wine.columns

Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'variety', 'winery'],
      dtype='object')

Let's select price.

In [4]:
wine['price'].tail()

150925    20.0
150926    27.0
150927    20.0
150928    52.0
150929    15.0
Name: price, dtype: float64

We can quickly see some metrics of the price, using some built-in aggregating functions in pandas.

In [5]:
print('Average wine price: ', wine['price'].mean())
print('Min wine price: ', wine['price'].min())
print('Median wine price: ', wine['price'].median())
print('Max wine price: ', wine['price'].max())

Average wine price:  33.13148249353299
Min wine price:  4.0
Median wine price:  24.0
Max wine price:  2300.0
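
If you want several of these metrics at once, `.agg()` accepts a list of aggregator names. The sketch below uses a handful of made-up prices rather than the wine data, so the numbers are only illustrative:

```python
import pandas as pd

# Hypothetical prices (not from the wine dataset); None becomes NaN and is skipped.
prices = pd.Series([4.0, 15.0, 24.0, 52.0, None], name='price')

# One call, several aggregators: the result is a Series indexed by aggregator name.
summary = prices.agg(['mean', 'min', 'median', 'max'])
print(summary)
```

On the real data, the equivalent call would be `wine['price'].agg(['mean', 'min', 'median', 'max'])`.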


And some more advanced metrics...

Here we see how many records we have for each country.

In [6]:
wine['country'].value_counts()

US                        62397
Italy                     23478
France                    21098
Spain                      8268
Chile                      5816
Argentina                  5631
Portugal                   5322
Australia                  4957
New Zealand                3320
Austria                    3057
Germany                    2452
South Africa               2258
Greece                      884
Israel                      630
Hungary                     231
Canada                      196
Romania                     139
Slovenia                     94
Uruguay                      92
Croatia                      89
Bulgaria                     77
Moldova                      71
Mexico                       63
Turkey                       52
Georgia                      43
Lebanon                      37
Cyprus                       31
Brazil                       25
Macedonia                    16
Serbia                       14
Morocco                      12
England 

By default, `value_counts()` sorts from most to least frequent. What if we want to see this list in the opposite order? We can sort with the `ascending` keyword.

In [7]:
wine['country'].value_counts().sort_values(ascending=True)

US-France                     1
Japan                         2
Montenegro                    2
Albania                       2
Tunisia                       2
China                         3
Slovakia                      3
Egypt                         3
Switzerland                   4
Bosnia and Herzegovina        4
South Korea                   4
Ukraine                       5
Czech Republic                6
Lithuania                     8
India                         8
Luxembourg                    9
England                       9
Morocco                      12
Serbia                       14
Macedonia                    16
Brazil                       25
Cyprus                       31
Lebanon                      37
Georgia                      43
Turkey                       52
Mexico                       63
Moldova                      71
Bulgaria                     77
Croatia                      89
Uruguay                      92
Slovenia                     94
Romania 
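
`value_counts()` also takes the `ascending` keyword directly, which saves the extra `.sort_values()` call. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up country labels, just to show the ordering.
countries = pd.Series(['US', 'US', 'US', 'Italy', 'Italy', 'Spain'])

# Least frequent values first.
fewest_first = countries.value_counts(ascending=True)
print(fewest_first)
```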

We can also quickly view all of the unique values of a given column using the `.unique()` method with a specific column

In [8]:
wine['country'].unique()

array(['US', 'Spain', 'France', 'Italy', 'New Zealand', 'Bulgaria',
       'Argentina', 'Australia', 'Portugal', 'Israel', 'South Africa',
       'Greece', 'Chile', 'Morocco', 'Romania', 'Germany', 'Canada',
       'Moldova', 'Hungary', 'Austria', 'Croatia', 'Slovenia', nan,
       'India', 'Turkey', 'Macedonia', 'Lebanon', 'Serbia', 'Uruguay',
       'Switzerland', 'Albania', 'Bosnia and Herzegovina', 'Brazil',
       'Cyprus', 'Lithuania', 'Japan', 'China', 'South Korea', 'Ukraine',
       'England', 'Mexico', 'Georgia', 'Montenegro', 'Luxembourg',
       'Slovakia', 'Czech Republic', 'Egypt', 'Tunisia', 'US-France'],
      dtype=object)

This list is too long. What if we only want to see the top 10 countries?
We can string together the other selectors we learned before!

Can you think of another way to get the top 10 rows?

In [9]:
wine['country'].value_counts().head(10)

US             62397
Italy          23478
France         21098
Spain           8268
Chile           5816
Argentina       5631
Portugal        5322
Australia       4957
New Zealand     3320
Austria         3057
Name: country, dtype: int64

What if we just want to know how many countries are on the list?

In [10]:
# Number of non-null unique values
wine['country'].nunique()

48

Another way to find unique values is using `set`

In [11]:
# Gives all unique values
set(wine['country'])

{'Albania',
 'Argentina',
 'Australia',
 'Austria',
 'Bosnia and Herzegovina',
 'Brazil',
 'Bulgaria',
 'Canada',
 'Chile',
 'China',
 'Croatia',
 'Cyprus',
 'Czech Republic',
 'Egypt',
 'England',
 'France',
 'Georgia',
 'Germany',
 'Greece',
 'Hungary',
 'India',
 'Israel',
 'Italy',
 'Japan',
 'Lebanon',
 'Lithuania',
 'Luxembourg',
 'Macedonia',
 'Mexico',
 'Moldova',
 'Montenegro',
 'Morocco',
 'New Zealand',
 'Portugal',
 'Romania',
 'Serbia',
 'Slovakia',
 'Slovenia',
 'South Africa',
 'South Korea',
 'Spain',
 'Switzerland',
 'Tunisia',
 'Turkey',
 'US',
 'US-France',
 'Ukraine',
 'Uruguay',
 nan}
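
Notice that `nan` appears in the `.unique()` and `set` results, while `.nunique()` skipped it when it counted 48 countries. The sketch below, on made-up data, shows how the `dropna` keyword controls this:

```python
import numpy as np
import pandas as pd

# Made-up data with two missing country values.
countries = pd.Series(['US', 'Spain', np.nan, 'US', np.nan])

n_without_nan = countries.nunique()           # NaN is ignored by default
n_with_nan = countries.nunique(dropna=False)  # NaN counts as one extra value

# Drop the missing values first if you want a clean list of labels.
unique_no_nan = countries.dropna().unique()

print(n_without_nan, n_with_nan, unique_no_nan)
```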

You can look [here](https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/) for a list of all the built-in pandas stats.

One of the most powerful built-in summary tools for pandas is `df.describe()`. This quickly calculates some of these stats for the numeric columns in the df.

In [12]:
wine.describe()

Unnamed: 0,points,price
count,150930.0,137235.0
mean,87.888418,33.131482
std,3.222392,36.322536
min,80.0,4.0
25%,86.0,16.0
50%,88.0,24.0
75%,90.0,40.0
max,100.0,2300.0


**Question**: Why are only 2 of the columns included?

### Conditional Selections 
We can use conditional selections to narrow our analysis even further.

DON'T FORGET - to make things easier, we can save selections we plan to use often as their own variables.

In [13]:
us = wine[wine['country']=='US']
france = wine[wine['country']=='France']

In [14]:
print('Mean American wine price: $', round(us['price'].mean(),2))
print('Mean French wine price: $', round(france['price'].mean(),2))
print('Mean overall wine price is: $', round(wine['price'].mean(),2))

Mean American wine price: $ 33.65
Mean French wine price: $ 45.62
Mean overall wine price is: $ 33.13


We can then use these to calculate more targeted metrics.

In [15]:
print('French wine is ${} more expensive on average'\
      .format(round(france['price'].mean() - wine['price'].mean(),2)))

French wine is $12.49 more expensive on average


#### More advanced conditionals: Using masks
When you want to filter on more than one criterion, it can be easier to use a mask.

How many wines from North America do we have on our list?

In [16]:
wine.columns

Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'variety', 'winery'],
      dtype='object')

A mask is short for a 'boolean mask'. We combine one or more conditions with `&` (and) or `|` (or), and pandas returns a Series of booleans indicating, for each row, whether it satisfies the combined condition. Each individual condition must be wrapped in parentheses.

Then we can use these booleans to select on the overall DataFrame. 

In [17]:
na_mask = (wine.country == 'US') | (wine.country == 'Mexico') | (wine.country == 'Canada') 
na_indexes = wine.index[na_mask]
na = wine.loc[na_indexes]

In [18]:
# another way
na = wine[na_mask]
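
When a mask tests one column against several values, `.isin()` is a shorter way to write the chain of `|` conditions, and the result can still be combined with other conditions using `&`. A minimal sketch on made-up rows (the `price < 25` threshold is an arbitrary choice for illustration):

```python
import pandas as pd

# Made-up rows with the same column names as the wine data.
df = pd.DataFrame({
    'country': ['US', 'France', 'Mexico', 'Canada', 'Italy'],
    'price': [20.0, 45.0, 15.0, 30.0, 38.0],
})

# Equivalent to (country == 'US') | (country == 'Mexico') | (country == 'Canada')
na_mask = df['country'].isin(['US', 'Mexico', 'Canada'])

# Masks combine with & (and): North American wines under $25.
cheap_na = df[na_mask & (df['price'] < 25)]
print(cheap_na)
```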

In [19]:
na.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,US,This tremendous 100% varietal wine hails from ...,Martha's Vineyard,96,235.0,California,Napa Valley,Napa,Cabernet Sauvignon,Heitz
2,US,Mac Watson honors the memory of a wine once ma...,Special Selected Late Harvest,96,90.0,California,Knights Valley,Sonoma,Sauvignon Blanc,Macauley
3,US,"This spent 20 months in 30% new French oak, an...",Reserve,96,65.0,Oregon,Willamette Valley,Willamette Valley,Pinot Noir,Ponzi
8,US,This re-named vineyard was formerly bottled as...,Silice,95,65.0,Oregon,Chehalem Mountains,Willamette Valley,Pinot Noir,Bergström
9,US,The producer sources from two blocks of the vi...,Gap's Crown Vineyard,95,60.0,California,Sonoma Coast,Sonoma,Pinot Noir,Blue Farm


How many wines do we have in total from North America?

In [20]:
na['country'].count()

62656

In [21]:
na.count()['country']

62656

In [22]:
na.shape[0]

62656
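
All three approaches agree here only because no row in `na` has a missing country. In general, `.count()` skips missing values while `.shape[0]` counts every row. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up rows: one price is missing.
df = pd.DataFrame({
    'country': ['US', 'US', 'Canada'],
    'price': [20.0, np.nan, 30.0],
})

n_rows = df.shape[0]                  # every row, missing values or not
n_prices = df['price'].count()        # only the non-null prices
n_missing = df['price'].isna().sum()  # how many prices are missing

print(n_rows, n_prices, n_missing)
```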

**Question:** How many of the wines belong to each country?

**Question:** From which US state do most of our wines come?

In [23]:
wine[wine['country']=='US']['province'].value_counts().sort_values(ascending=False).index[0]

'California'

In [24]:
wine[wine['country']=='US']['province'].value_counts().index[0]

'California'

In [25]:
na.country.value_counts()

US        62397
Canada      196
Mexico       63
Name: country, dtype: int64

## Groupby
One of the most flexible ways to aggregate in pandas is with `.groupby()`.
We will look at how this works for categorical datasets like this one, and also for datetime datasets, as dealing with datetimes in pandas can be tricky.

In [26]:
# importing matplotlib to make plots later
import matplotlib.pyplot as plt
%matplotlib inline

### How Groupby Works:
You can group your data in many different ways, and also aggregate it by any of the aggregators we saw before: like mean, mode, sum, etc.

#### NOTE:
When using groupby, the column(s) you group on (the ones passed in the parentheses) become the index of the result!

Also, we ALWAYS need an 'aggregator' function, which tells pandas how to combine the values in each group. Before, our data had one row per wine review. Now we are combining all the rows for each country under that country's name, so pandas needs to know how to reduce them. Take price as an example: for the US, we are collapsing about 60,000 rows into a single number. Should that number be the average price? The total price? A count of how many records have a price? We have to tell pandas explicitly.
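
To make the choice of aggregator concrete, here is a minimal sketch on five made-up rows. The same grouping gives three different answers depending on the aggregator:

```python
import pandas as pd

# Five made-up reviews: two from the US, three from France.
df = pd.DataFrame({
    'country': ['US', 'US', 'France', 'France', 'France'],
    'price': [10.0, 30.0, 20.0, 40.0, 60.0],
})

grouped = df.groupby('country')['price']

mean_price = grouped.mean()   # average price per country
total_price = grouped.sum()   # total price per country
n_reviews = grouped.count()   # number of non-null prices per country

print(mean_price, total_price, n_reviews, sep='\n\n')
```

Note that in each result the country names have become the index.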

In [27]:
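# numeric_only=True keeps only the numeric columns; recent pandas versions raise an error on text columns otherwise
wine.groupby('country').mean(numeric_only=True).sort_values(by=['points','price'], ascending=False)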

Unnamed: 0_level_0,points,price
country,Unnamed: 1_level_1,Unnamed: 2_level_1
England,92.888889,47.5
Austria,89.276742,31.192106
France,88.92587,45.619885
Germany,88.626427,39.011078
Italy,88.413664,37.547913
Canada,88.239796,34.628866
Slovenia,88.234043,28.061728
Morocco,88.166667,18.833333
Turkey,88.096154,25.8
Portugal,88.057685,26.332615


**Tip:** If you want to aggregate different columns in different ways, it can make your code cleaner to move the aggregations out into their own variable that you can update separately. For this purpose, we can store the aggregators in a dictionary keyed by column name.

In [28]:
aggs = {
    'price': 'mean',
    'points': 'max'
}

wine.groupby('country').agg(aggs).sort_values(by=['points','price'], ascending=[False, True]).head()

Unnamed: 0_level_0,price,points
country,Unnamed: 1_level_1,Unnamed: 2_level_1
Australia,31.25848,100
US,33.653808,100
Italy,37.547913,100
France,45.619885,100
Portugal,26.332615,99
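
An alternative to the dictionary is pandas' named aggregation, where each keyword argument names an output column and maps it to a `(column, aggregator)` pair. The sketch below uses made-up rows, and the output names `avg_price` and `max_points` are arbitrary choices for illustration:

```python
import pandas as pd

# Made-up rows with the same column names as the wine data.
df = pd.DataFrame({
    'country': ['US', 'US', 'France'],
    'price': [10.0, 30.0, 50.0],
    'points': [90, 100, 95],
})

# Each keyword becomes a column in the result: name=(source column, aggregator).
summary = df.groupby('country').agg(
    avg_price=('price', 'mean'),
    max_points=('points', 'max'),
)
print(summary)
```

Because the output columns are named up front, the result has ordinary, flat column names.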


You can also use a list in your aggs to aggregate one column in several ways.

This will give a **multi-index**. Multi-indexes can be difficult to sort on. But, there are a few different ways we can deal with this.

In [29]:
aggs = {
    'price': ['min', 'mean', 'max', 'std']
}

price_table = wine.groupby('country').agg(aggs)
price_table

Unnamed: 0_level_0,price,price,price,price
Unnamed: 0_level_1,min,mean,max,std
country,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Albania,20.0,20.0,20.0,0.0
Argentina,4.0,20.794881,250.0,20.18654
Australia,5.0,31.25848,850.0,39.008512
Austria,8.0,31.192106,1100.0,28.540861
Bosnia and Herzegovina,12.0,12.75,13.0,0.5
Brazil,11.0,19.92,35.0,8.840814
Bulgaria,7.0,11.545455,28.0,4.959163
Canada,12.0,34.628866,145.0,24.267644
Chile,5.0,19.34478,400.0,19.618082
China,7.0,20.333333,27.0,11.547005


One way is by dropping the top level ('price'):

In [30]:
price_table.columns = price_table.columns.droplevel(level=0)

# now, we can sort by any of the columns. Here, by average price.
price_table.sort_values(by='mean', ascending=False).head()

Unnamed: 0_level_0,min,mean,max,std
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
US-France,50.0,50.0,50.0,
England,38.0,47.5,75.0,11.964232
France,5.0,45.619885,2300.0,69.69706
Hungary,7.0,44.204348,764.0,66.264502
Luxembourg,36.0,40.666667,50.0,7.0


Another way is to flatten the column MultiIndex with its `.ravel()` method. This preserves the "price" indicator in each column name by joining the first level, 'price', with the name of the aggregation for that column.

In [31]:
price_table = wine.groupby('country').agg(aggs)

price_table.head()

Unnamed: 0_level_0,price,price,price,price
Unnamed: 0_level_1,min,mean,max,std
country,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2
Albania,20.0,20.0,20.0,0.0
Argentina,4.0,20.794881,250.0,20.18654
Australia,5.0,31.25848,850.0,39.008512
Austria,8.0,31.192106,1100.0,28.540861
Bosnia and Herzegovina,12.0,12.75,13.0,0.5


In [32]:
price_table.columns

MultiIndex([('price',  'min'),
            ('price', 'mean'),
            ('price',  'max'),
            ('price',  'std')],
           )

In [33]:
# Using ravel, and a string join, we can create better names for the columns:
price_table.columns = ["_".join(x) for x in price_table.columns.ravel()]
price_table.sort_values(by='price_mean', ascending=False).head()

Unnamed: 0_level_0,price_min,price_mean,price_max,price_std
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
US-France,50.0,50.0,50.0,
England,38.0,47.5,75.0,11.964232
France,5.0,45.619885,2300.0,69.69706
Hungary,7.0,44.204348,764.0,66.264502
Luxembourg,36.0,40.666667,50.0,7.0
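
A closely related option is `MultiIndex.to_flat_index()`, which also turns each column into a `(level_0, level_1)` tuple that we can join. The sketch below uses made-up rows and is offered as an alternative, not as the method this notebook relies on:

```python
import pandas as pd

# Made-up rows with the same column names as the wine data.
df = pd.DataFrame({
    'country': ['US', 'US', 'France'],
    'price': [10.0, 30.0, 50.0],
})

table = df.groupby('country').agg({'price': ['min', 'max']})

# Each column is a tuple such as ('price', 'min'); join the parts with an underscore.
table.columns = ['_'.join(col) for col in table.columns.to_flat_index()]
print(table)
```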


## Selecting the max and min values with Index Max and Min
In addition to `.max()` and `.min()`, which return the maximum and minimum values, we can use `.idxmax()` and `.idxmin()` to return the *index label* of the maximum and minimum values.

For example, let's use `.idxmax()` to find the country with the highest standard deviation in its prices.

In [34]:
price_table.loc[:, 'price_std'].idxmax()

'France'
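
Note that US-France has a `NaN` standard deviation (it has a single price), and `.idxmax()` skips missing values by default. A minimal sketch using four of the values from the table above, rounded:

```python
import numpy as np
import pandas as pd

# A few standard deviations copied (rounded) from the price table above.
price_std = pd.Series(
    {'US-France': np.nan, 'England': 11.96, 'France': 69.70, 'Hungary': 66.26},
    name='price_std',
)

largest_label = price_std.idxmax()   # label of the largest value, NaN skipped
largest_value = price_std.max()      # the value itself
smallest_label = price_std.idxmin()  # label of the smallest value, NaN skipped

print(largest_label, largest_value, smallest_label)
```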

# Dealing with datetime in Pandas

Much of the data we work with in the Energy sector has a time component.


pandas builds on Python's `datetime` module to offer a DatetimeIndex and plenty of ways to work with it.
However, it is still far from intuitive.
That doesn't mean it's not useful: anyone doing a timeseries project will need to deal with dates and times in pandas often.

Let's load a sample dataset of datetime energy data and get started!

In [35]:
energy = pd.read_csv('data/energy/PJM_Load_hourly.csv', 
                     parse_dates=True, index_col=0)

Note that this data has a DatetimeIndex.
Setting `parse_dates=True` when we read the CSV tells pandas to try parsing the index column as dates, which gives us this DatetimeIndex.
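
To see what `parse_dates=True` changes, the sketch below reads the same two-row CSV text twice. The CSV content and the `load_mw` column name are made up for illustration:

```python
import io
import pandas as pd

# A made-up, two-row CSV held in memory instead of a file on disk.
csv_text = "Datetime,load_mw\n2000-01-01 00:00:00,100\n2000-01-01 01:00:00,110\n"

# Without parse_dates, the index is just text.
as_text = pd.read_csv(io.StringIO(csv_text), index_col=0)

# With parse_dates=True, pandas tries to parse the index as dates.
as_dates = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# If the data is already loaded, pd.to_datetime converts the index afterwards.
converted = as_text.copy()
converted.index = pd.to_datetime(converted.index)

print(type(as_text.index).__name__, type(as_dates.index).__name__)
```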

In [36]:
energy.index

DatetimeIndex(['1998-12-31 01:00:00', '1998-12-31 02:00:00',
               '1998-12-31 03:00:00', '1998-12-31 04:00:00',
               '1998-12-31 05:00:00', '1998-12-31 06:00:00',
               '1998-12-31 07:00:00', '1998-12-31 08:00:00',
               '1998-12-31 09:00:00', '1998-12-31 10:00:00',
               ...
               '2001-01-01 15:00:00', '2001-01-01 16:00:00',
               '2001-01-01 17:00:00', '2001-01-01 18:00:00',
               '2001-01-01 19:00:00', '2001-01-01 20:00:00',
               '2001-01-01 21:00:00', '2001-01-01 22:00:00',
               '2001-01-01 23:00:00', '2001-01-02 00:00:00'],
              dtype='datetime64[ns]', name='Datetime', length=32896, freq=None)

We can select data points within a specific time range using the DatetimeIndex and `.loc`.
Here, we select one full day of data.

In [37]:
energy.loc['1998-12-31 01:00:00':'1999-01-01 00:00:00']

Unnamed: 0_level_0,PJM_Load_MW
Datetime,Unnamed: 1_level_1
1998-12-31 01:00:00,29309.0
1998-12-31 02:00:00,28236.0
1998-12-31 03:00:00,27692.0
1998-12-31 04:00:00,27596.0
1998-12-31 05:00:00,27888.0
1998-12-31 06:00:00,29382.0
1998-12-31 07:00:00,31373.0
1998-12-31 08:00:00,33272.0
1998-12-31 09:00:00,34133.0
1998-12-31 10:00:00,35232.0


In [38]:
# One record for each hour of this day.
energy.loc['1998-12-31 01:00:00':'1999-01-01 00:00:00', :].shape[0]

24

## Selecting with boolean indexing on pandas datetimeindex
We can use attribute (dot) access on the index, such as `energy.index.month`, together with conditionals to select on specific parts of the datetime, like days or months.

Python datetime functionality example:

These functions, `.strptime()` and `.strftime()`, allow us to move back and forth between strings and datetime formats, allowing us to print out the datetime values how we want while still preserving the underlying data.

In [39]:
from datetime import datetime

# string listing a date 
s = "8 March, 2017"

# using the .strptime() method to extract the day, month and year and save it as a datetime object
d = datetime.strptime(s, '%d %B, %Y')

# using the .strftime() to format our datetime object in a different string and return it
print(d.strftime('%Y-%m-%d'))

2017-03-08


Here, we look at how the datetime has been saved.

Note the two zeros at the end, where the hour and minute are stored; they default to 0 because our string contained no time information.

In [40]:
d

datetime.datetime(2017, 3, 8, 0, 0)
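
pandas offers the same round trip through `pd.to_datetime()` and `.strftime()`, and it also works on a whole column at once through the `.dt` accessor. A minimal sketch (the second date string is made up):

```python
import pandas as pd

# Parse a single string with an explicit format, as with datetime.strptime().
ts = pd.to_datetime("8 March, 2017", format='%d %B, %Y')
print(ts.strftime('%Y-%m-%d'))

# The same parsing applied to a whole Series of strings.
dates = pd.to_datetime(pd.Series(["8 March, 2017", "21 June, 2018"]), format='%d %B, %Y')

# The .dt accessor exposes datetime parts and formatting for every row.
formatted = dates.dt.strftime('%d/%m/%Y')
print(formatted)
```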

In [41]:
# making a new DF that only includes the month of September from each year.
septembers = energy[energy.index.month == 9]
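
Conditions on different parts of the datetime can be combined like any other mask. The sketch below builds two days of made-up hourly data (1 September 2000 was a Friday) and selects weekday evening hours; the 17:00 to 20:00 window is an arbitrary choice for illustration:

```python
import pandas as pd

# Two days of made-up hourly data: Friday 1 and Saturday 2 September 2000.
idx = pd.date_range('2000-09-01', periods=48, freq='h')
df = pd.DataFrame({'load_mw': range(48)}, index=idx)

# September, Monday to Friday (weekday 0-4), hours 17 to 20 inclusive.
mask = (
    (df.index.month == 9)
    & (df.index.weekday < 5)
    & (df.index.hour >= 17)
    & (df.index.hour < 21)
)
weekday_evenings = df[mask]
print(weekday_evenings)
```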

We can also call just a date, and get all the hours/time periods in that day:

In [42]:
energy.loc['2000-01-03']

Unnamed: 0_level_0,PJM_Load_MW
Datetime,Unnamed: 1_level_1
2000-01-03 01:00:00,21557.0
2000-01-03 02:00:00,20464.0
2000-01-03 03:00:00,20057.0
2000-01-03 04:00:00,19988.0
2000-01-03 05:00:00,20463.0
2000-01-03 06:00:00,22228.0
2000-01-03 07:00:00,25780.0
2000-01-03 08:00:00,28369.0
2000-01-03 09:00:00,29126.0
2000-01-03 10:00:00,29616.0


The same works for a year and month:

In [43]:
energy.loc['2000-01'].shape

(744, 1)

In [44]:
# We see that it includes one record for each hour of each day of January, which has 31 days
31*24

744

In [45]:
# or better, with an assert statement:
month_jan = energy.loc['2000-01']
assert month_jan.shape[0] == 31*24

## Resampling
We can also combine the data in different ways, and over different time periods.
This means that just because our data is in hourly time periods, we don't have to keep it that way.

Resampling is a powerful way to change the granularity of our timeseries data.

Think of it as a groupby function for datetime. **This means we still need an aggregator!**

However, consider that with groupby, we generally only move to a higher level of abstraction: we are literally grouping multiple rows together. With resample, it's possible to go the other way and actually *add* rows, moving to a finer granularity rather than a coarser, aggregated one. An example of this is up-sampling from 1-minute to 30-second granularity: we now get one new row for every original minute, and we have to choose how to fill these new rows. Options include `.bfill()`, which would fill the new row at 1:30 with the original value from 2:00, and `.ffill()`, which would fill the new row at 1:30 with the original value from 1:00. Which one to use depends on your specific project.
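
The up-sampling case can be sketched on three made-up readings taken at 0:01, 0:02 and 0:03. Resampling to 30 seconds creates new rows at 0:01:30 and 0:02:30, which the two fill methods populate differently:

```python
import pandas as pd

# Three made-up 1-minute readings at 00:01, 00:02 and 00:03.
idx = pd.date_range('2000-01-01 00:01:00', periods=3, freq='min')
readings = pd.Series([10.0, 20.0, 30.0], index=idx)

# Forward fill: each new row takes the previous original value.
forward = readings.resample('30s').ffill()

# Backward fill: each new row takes the next original value.
backward = readings.resample('30s').bfill()

print(forward)
print(backward)
```

Here the new 0:01:30 row is 10.0 with `.ffill()` and 20.0 with `.bfill()`.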

In [46]:
# We can get the average load over a day:
daily_avg_energy = energy.resample('D').mean()
daily_avg_energy.head()

Unnamed: 0_level_0,PJM_Load_MW
Datetime,Unnamed: 1_level_1
1998-04-01,27813.73913
1998-04-02,26605.791667
1998-04-03,25672.333333
1998-04-04,24487.083333
1998-04-05,23487.565217


Be careful when moving to different levels of granularity and with which aggregation method you use. Ask yourself: What data am I showing now? Did the data change fundamentally? How can I indicate this to future users of the data and future readers of my code? 

The example below illustrates this. When we use `sum` as the aggregator over a timeframe for power data, we move from MW of power flow to MWh of energy used, and we have to make sure we update our data to reflect this. The conversion is only this simple because our data is hourly: with any other granularity, we would also have to account for the length of each interval.

In [47]:
# We can also get the total MWh used in a day:
daily_energy = energy.resample('D').sum()
daily_energy.columns = ["PJM_Load_MWh"]
daily_energy.head()

Unnamed: 0_level_0,PJM_Load_MWh
Datetime,Unnamed: 1_level_1
1998-04-01,639716.0
1998-04-02,638539.0
1998-04-03,616136.0
1998-04-04,587690.0
1998-04-05,540214.0
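
For data that is not hourly, each reading has to be weighted by the length of its interval before summing. The sketch below assumes made-up 15-minute readings of a constant 100 MW, so each reading represents 100 MW x 0.25 h = 25 MWh:

```python
import pandas as pd

# Two hours of made-up 15-minute readings at a constant 100 MW.
idx = pd.date_range('2000-01-01', periods=8, freq='15min')
load_mw = pd.Series([100.0] * 8, index=idx, name='load_mw')

# Each reading covers a quarter of an hour.
interval_hours = 0.25

# Convert power (MW) to energy (MWh) per interval, then sum per hour.
energy_mwh = (load_mw * interval_hours).resample('h').sum()

# Summing the raw MW values instead would overstate the energy four times.
naive_sum = load_mw.resample('h').sum()

print(energy_mwh)
print(naive_sum)
```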


## Groupby with DateTimeIndex
Using groupby with a pandas DateTimeIndex can be extremely useful and powerful.
Let's look at how this can work.

In [48]:
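indexes = [energy.index.year,
           energy.index.month,
           # .week was removed in pandas 2.0; isocalendar() provides the ISO week number
           energy.index.isocalendar().week.to_numpy(),
           energy.index.weekday,
           energy.index.day]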

aggregated = energy.groupby(indexes).sum()

In [49]:
aggregated.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,PJM_Load_MW
Datetime,Datetime,Datetime,Datetime,Datetime,Unnamed: 5_level_1
1998,4,14,2,1,639716.0
1998,4,14,3,2,638539.0
1998,4,14,4,3,616136.0
1998,4,14,5,4,587690.0
1998,4,14,6,5,540214.0


The index level names are not particularly helpful here. We can change them.

In [50]:
# the long, ugly way
aggregated.index.set_names('year', level=0, inplace=True)
aggregated.index.set_names('month', level=1, inplace=True)
aggregated.index.set_names('week', level=2, inplace=True)
aggregated.index.set_names('weekday', level=3, inplace=True)
aggregated.index.set_names('day', level=4, inplace=True)

In [51]:
# the short, clean way. Both do the same thing.
index_level_names = ['year', 'month', 'week', 'weekday', 'day']
for i, index_level in enumerate(index_level_names):
    aggregated.index.set_names(index_level, level=i, inplace=True)
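
An even shorter option is to assign the whole list to `index.names` in one step. The sketch below shows this on two days of made-up hourly data, grouped by fewer levels than above to keep it small:

```python
import pandas as pd

# Two days of made-up hourly data, 1 MW in every hour.
idx = pd.date_range('1998-04-01', periods=48, freq='h')
df = pd.DataFrame({'load_mw': 1.0}, index=idx)

daily = df.groupby([df.index.year, df.index.month, df.index.day]).sum()

# Name all index levels at once.
daily.index.names = ['year', 'month', 'day']
print(daily)
```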

In [52]:
aggregated.head(50)

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,PJM_Load_MW
year,month,week,weekday,day,Unnamed: 5_level_1
1998,4,14,2,1,639716.0
1998,4,14,3,2,638539.0
1998,4,14,4,3,616136.0
1998,4,14,5,4,587690.0
1998,4,14,6,5,540214.0
1998,4,15,0,6,640312.0
1998,4,15,1,7,643340.0
1998,4,15,2,8,636976.0
1998,4,15,3,9,657606.0
1998,4,15,4,10,601388.0


Now, we can plot by these levels!

In [53]:
aggregated.head()

In [None]:
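# Assumption: this variable is not defined anywhere in the notebook; ISO week 52 is used as a stand-in.
energy_last_week_of_year = aggregated.xs(52, level='week')
energy_last_week_of_year.groupby(level=0).sum().plot(kind='bar')
plt.show()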

## Multi-level aggregations on the same variable: reset index

Clearly, these complicated slices can be cumbersome to use. A much simpler way is to take these groupings out of the index and move them into the columns.

Then we can select them like any other column, and no longer need to use any complicated slices. 

In [None]:
aggregated.head()

In [None]:
aggregated.reset_index(inplace=True)

In [None]:
aggregated.head()

In [None]:
aggregated_1998 = aggregated[aggregated['year']==1998]
aggregated_1998.head()

## Exercise:
1. Find the week (and its associated year) with the highest total weekly consumption.

2. Find the day of the week that averages the highest consumption

3. Find the time of day that averages the lowest consumption.
    - Has this changed over the years?

4. Is average consumption rising, falling, or staying the same over the years?

5. What is the percentage difference in consumption on average between April and June?
