# Panel Data

Data often arrives in a form where many observations share common features. For example, several measurements may be taken at the same location, under the same conditions, or for the same subject. To understand the data and extract meaningful insights, we often need to aggregate these observations. This is where the pandas `groupby()` method comes into play.

## Exploring Panel Data

As always, let's start by importing pandas and loading our dataset. This time our conversion to datetime will be a bit different.

We'll stop short of setting the datetime column as our index, though. A pandas index is allowed to contain duplicate values, but a date index is most useful when each date identifies a single row. Because this panel data covers many different company stocks over just one quarter of a year, the same date appears many times.
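As a rough sketch of this loading step: the real file name and date format are not shown here, so the example below reads a tiny made-up CSV from memory and assumes dates are stored as `YYYYMMDD` integers. The `date` and `close` column names and all values are hypothetical stand-ins.

```python
import io

import pandas as pd

# Made-up stand-in for the real dataset; column names other than
# `tic` and `conm` are assumptions, and the prices are toy values.
csv_text = """date,tic,conm,close
20230103,AIR,AAR CORP,10.0
20230104,AIR,AAR CORP,11.0
20230103,AAL,AMERICAN AIRLINES GROUP INC,20.0
20230104,AAL,AMERICAN AIRLINES GROUP INC,19.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumed date format: integers such as 20230103, so we go through strings
# and give pandas an explicit format.
df["date"] = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")

# Each date appears once per stock, so the column is not unique and we
# keep the default integer index instead of indexing by date.
print(df.dtypes)
print("Dates unique?", df["date"].is_unique)
```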

Let's explore this panel data a bit more, to answer some simple questions:

- How many companies are considered in the data?
- How many stocks are considered in the data?
- Which exchanges are considered in the data?
- Which exchanges appear most?
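One way to answer these questions is with `nunique()`, `unique()` and `value_counts()`. The sketch below uses a small made-up frame with fictional companies. The `exchange` column name is an assumption, since the real column is not shown here.

```python
import pandas as pd

# Toy panel: fictional companies, one row per stock per day.
df = pd.DataFrame(
    {
        "tic": ["AAA", "AAA", "AAB", "BBB", "BBB", "CCC"],
        "conm": ["ALPHA CORP", "ALPHA CORP", "ALPHA CORP",
                 "BETA INC", "BETA INC", "GAMMA LTD"],
        "exchange": ["NYSE", "NYSE", "NYSE", "NASDAQ", "NASDAQ", "NYSE"],
    }
)

n_companies = df["conm"].nunique()            # distinct company names
n_stocks = df["tic"].nunique()                # distinct ticker symbols
exchanges = df["exchange"].unique()           # which exchanges appear
exchange_counts = df["exchange"].value_counts()  # rows per exchange, largest first

print("Companies:", n_companies)
print("Stocks:", n_stocks)
print("Exchanges:", list(exchanges))
print(exchange_counts)
```

Note that `value_counts()` counts rows, not companies, so an exchange with many trading days per stock will rank higher.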


## Grouping

What if we wanted to calculate daily returns in this data set? Is it as simple as using `pct_change()`? Let's try.
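A minimal sketch of that naive attempt, on made-up prices for two stocks stacked one after the other. The `close` column name and the values are hypothetical.

```python
import pandas as pd

# Toy prices: AAR's rows come first, then American Airlines' rows.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-01-03", "2023-01-04", "2023-01-03", "2023-01-04"]
        ),
        "tic": ["AIR", "AIR", "AAL", "AAL"],
        "conm": ["AAR CORP", "AAR CORP",
                 "AMERICAN AIRLINES GROUP INC", "AMERICAN AIRLINES GROUP INC"],
        "close": [10.0, 11.0, 20.0, 19.0],
    }
)

# Naive approach: percentage change from the previous row, ignoring
# which stock each row belongs to.
df["naive_return"] = df["close"].pct_change()
print(df[["tic", "date", "close", "naive_return"]])
```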

Can you see what's gone wrong here? The first calculated daily return for American Airlines uses AAR's last closing price, because `pct_change()` simply compares each row with the one above it, regardless of which stock the row belongs to. This shows why *grouping* matters so much for this kind of panel data.


We can solve this with the `groupby()` method of data frames.
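A sketch of the grouped version, again on made-up prices with a hypothetical `close` column. Grouping by `tic` restarts the calculation for each stock, so the first row of every stock has no return.

```python
import pandas as pd

# Same toy layout as before: two stocks stacked one after the other.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-01-03", "2023-01-04", "2023-01-03", "2023-01-04"]
        ),
        "tic": ["AIR", "AIR", "AAL", "AAL"],
        "close": [10.0, 11.0, 20.0, 19.0],
    }
)

# Compute the percentage change within each ticker only. The result keeps
# the original row order, so it can be assigned straight back as a column.
df["daily_return"] = df.groupby("tic")["close"].pct_change()
print(df)
```

This assumes the rows are already in date order within each stock. If they are not, sorting by ticker and date first would be needed.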

Perfect! Grouping is a very powerful way to manipulate panel data. Let's see what else we can do with it. Recall that we had more stocks than companies. Let's see why that is by counting the unique issues (identified by `iid`) for each company, then listing the companies that have more than one.
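One possible way to do this is to group by `conm` and count distinct `iid` values. The frame below is made up and uses fictional companies.

```python
import pandas as pd

# Toy panel: ALPHA CORP has two issues, the others have one.
df = pd.DataFrame(
    {
        "conm": ["ALPHA CORP", "ALPHA CORP", "ALPHA CORP",
                 "BETA INC", "BETA INC", "GAMMA LTD"],
        "iid": ["01", "01", "02", "01", "01", "01"],
        "tic": ["AAA", "AAA", "AAB", "BBB", "BBB", "CCC"],
    }
)

# Number of distinct issue IDs per company.
issues_per_company = df.groupby("conm")["iid"].nunique()

# Keep only the companies with more than one issue and list their names.
multi_issue = issues_per_company[issues_per_company > 1]
print(issues_per_company)
print("Companies with several issues:", list(multi_issue.index))
```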

So it looks like some companies have more than one issue. What if we wanted to see the ticker of each issue, for each company? Here we could use `groupby()` to group based on multiple columns.
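A sketch of grouping on two columns at once, using the same kind of made-up data with fictional companies. Passing a list to `groupby()` creates one group per (company, issue) pair.

```python
import pandas as pd

# Toy panel: ALPHA CORP has two issues with different tickers.
df = pd.DataFrame(
    {
        "conm": ["ALPHA CORP", "ALPHA CORP", "ALPHA CORP", "ALPHA CORP",
                 "BETA INC"],
        "iid": ["01", "01", "02", "02", "01"],
        "tic": ["AAA", "AAA", "AAB", "AAB", "BBB"],
    }
)

# One entry per (company, issue) pair, holding the ticker(s) seen in it.
tickers = df.groupby(["conm", "iid"])["tic"].unique()
print(tickers)
```

The result is indexed by both grouping columns, so a single pair can be looked up with a tuple such as `tickers.loc[("ALPHA CORP", "02")]`.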

Note that grouping by company name and issue ID should give us the same number of groups as when we group by the ticker symbol, since each issue of each company will have a unique ticker symbol. Run the code cell below to confirm it.

In [13]:
print("Number of groups with 'tic'", len(df.groupby("tic")))
print("Number of groups with 'iid' and 'conm'", len(df.groupby(["iid", "conm"])))

### Exercise 1

Identify whether any companies trade on more than one exchange.

In [14]:
## YOUR CODE GOES HERE

Create a new data frame with only the trading data for the company you identify, using the `get_group()` method to do so. What sort of data cleaning could you imagine carrying out on this data?

In [15]:
## YOUR CODE GOES HERE

## Aggregation

Aggregation functions like `mean()`, `median()`, `sum()`, `min()`, `max()` and `std()` can be applied to grouped data to give insights across panel data. For example, we might want the average daily return of each traded stock, or the maximum volume traded in a single day for each stock.
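The two examples just mentioned could be sketched as follows. The `close` and `volume` column names and all values are made up for illustration.

```python
import pandas as pd

# Toy panel: three trading days for each of two stocks.
df = pd.DataFrame(
    {
        "tic": ["AIR", "AIR", "AIR", "AAL", "AAL", "AAL"],
        "close": [10.0, 11.0, 12.1, 20.0, 19.0, 19.0],
        "volume": [100, 300, 200, 500, 400, 600],
    }
)

# Daily return within each stock (first day of each stock is NaN).
df["daily_return"] = df.groupby("tic")["close"].pct_change()

# Named aggregation: one output column per (input column, function) pair.
# mean() skips the NaN on each stock's first day.
summary = df.groupby("tic").agg(
    mean_return=("daily_return", "mean"),
    max_volume=("volume", "max"),
)
print(summary)
```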

Once we've done these sorts of aggregation, we're often curious to see who sits at the top or the bottom of the distribution. We can use `nlargest()` and its counterpart `nsmallest()` here.
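A short sketch of ranking the aggregated values, using made-up prices for four fictional tickers and a hypothetical `close` column.

```python
import pandas as pd

# Toy panel: two trading days for each of four fictional stocks.
df = pd.DataFrame(
    {
        "tic": ["A", "A", "B", "B", "C", "C", "D", "D"],
        "close": [10.0, 11.0, 20.0, 19.0, 50.0, 51.0, 40.0, 30.0],
    }
)

df["daily_return"] = df.groupby("tic")["close"].pct_change()
mean_return = df.groupby("tic")["daily_return"].mean()

# The two best and two worst stocks by average daily return.
top = mean_return.nlargest(2)
bottom = mean_return.nsmallest(2)
print(top)
print(bottom)
```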

We can group by multiple columns when doing aggregation too. This can be useful, for example, to find high performers in each month.
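As an illustration of the monthly case, the sketch below groups by a derived month column and by ticker, then picks the best average return in each month. The `date` and `daily_return` columns and their values are made up, and the returns are assumed to be precomputed.

```python
import pandas as pd

# Toy panel: two fictional stocks, two trading days in each of two months.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-01-03", "2023-01-04", "2023-01-03", "2023-01-04",
             "2023-02-01", "2023-02-02", "2023-02-01", "2023-02-02"]
        ),
        "tic": ["A", "A", "B", "B", "A", "A", "B", "B"],
        "daily_return": [0.02, 0.04, 0.01, -0.01, -0.03, 0.01, 0.02, 0.02],
    }
)

# Derive a month key from the date, then aggregate per (month, ticker).
df["month"] = df["date"].dt.to_period("M")
monthly = df.groupby(["month", "tic"])["daily_return"].mean().reset_index()

# For each month, keep the row with the highest average daily return.
best = monthly.loc[monthly.groupby("month")["daily_return"].idxmax()]
print(best)
```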

### Exercise 2

Identify the two exchanges that have the highest number of companies trading.

In [19]:
## YOUR CODE GOES HERE

Next identify the total trading volume of each exchange.

In [20]:
## YOUR CODE GOES HERE