# Panel Data

Sometimes data arrives in a form where many observations share common features. For example, several measurements can be made in the same location, under the same conditions, or for the same subject. To understand the data and extract meaningful insights, we often need to aggregate these observations. This is where the `groupby()` method comes into play.

## Exploring Panel Data

As always, let's start by importing pandas and loading our dataset. This time our conversion to datetime will be a bit different.

```python
import pandas as pd

# Load the data
df = pd.read_csv("https://raw.githubusercontent.com/ImperialCollegeLondon/efds-ta-python/refs/heads/main/data/sec_data.csv")
```

We'll stop short of setting the datetime value as our index, though. pandas does allow duplicate index labels, but a date index is most useful when each date identifies a single row. This panel data contains many different company stocks over just one quarter of a year, so the same date appears many times.
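To make the point about repeated dates concrete, here is a minimal sketch on a toy panel. The column names `date`, `ticker` and `close` are hypothetical stand-ins and are not taken from the real dataset.

```python
import pandas as pd

# Toy panel: two tickers observed on the same two dates (all names hypothetical)
toy = pd.DataFrame({
    "date": ["2023-01-03", "2023-01-03", "2023-01-04", "2023-01-04"],
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "close": [10.0, 20.0, 11.0, 19.0],
})

# Convert the string column to datetime, but keep it as an ordinary column
toy["date"] = pd.to_datetime(toy["date"])

# Each date appears once per ticker, so the dates are not unique
dates_are_unique = toy["date"].is_unique
rows_per_date = toy["date"].value_counts().sort_index()
print(dates_are_unique)
print(rows_per_date)
```

Keeping the date as a regular column leaves the default integer index in place, so every row still has its own label.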

Let's explore this panel data a bit more, to answer some simple questions:

- How many companies are considered in the data?
- How many stocks are considered in the data?
- Which exchanges are considered in the data?
- Which exchanges appear most?
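Questions like these can usually be answered with `nunique()`, `unique()` and `value_counts()`. The sketch below uses a small made-up frame; the column names `company`, `ticker` and `exchange` are assumptions, so substitute whatever the real columns are called.

```python
import pandas as pd

# Made-up rows: one company (Alpha) issues two different stocks
explore = pd.DataFrame({
    "company": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "ticker": ["ALP.A", "ALP.B", "ALP.A", "BET", "BET", "GAM"],
    "exchange": ["NYSE", "NYSE", "NYSE", "NASDAQ", "NASDAQ", "NYSE"],
})

# Count distinct companies and distinct stocks
n_companies = explore["company"].nunique()
n_stocks = explore["ticker"].nunique()

# List the distinct exchanges, then count rows per exchange (most frequent first)
exchanges = explore["exchange"].unique()
exchange_counts = explore["exchange"].value_counts()

print(n_companies, n_stocks)
print(exchanges)
print(exchange_counts)
```

Note that `value_counts()` counts rows, not distinct stocks, so an exchange with many observations of few stocks can still rank first.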


## Grouping


Grouping is a powerful way to manipulate panel data. Once you've grouped, you can call functions and they will be applied groupwise. The most common application of grouping is to calculate returns on a stock-by-stock basis, but there are many other uses!
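As a rough illustration of stock-by-stock returns, the sketch below computes percentage changes within each group. The frame and its column names (`date`, `ticker`, `close`) are invented for the example.

```python
import pandas as pd

# Toy panel with rows interleaved by date (column names are hypothetical)
prices = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-03", "2023-01-03", "2023-01-04",
                            "2023-01-04", "2023-01-05", "2023-01-05"]),
    "ticker": ["AAA", "BBB", "AAA", "BBB", "AAA", "BBB"],
    "close": [10.0, 20.0, 11.0, 19.0, 12.1, 19.0],
})

# Sort so each ticker's rows are in date order before taking changes
prices = prices.sort_values(["ticker", "date"])

# pct_change() runs within each ticker, so the first row of every group is NaN
prices["ret"] = prices.groupby("ticker")["close"].pct_change()
print(prices)
```

Calling `pct_change()` without the `groupby` would compare the first price of one stock with the last price of another, which is why grouping matters here.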

Let's see what else we can do with grouping. Recall that we had more stocks than companies. Let's find out why by counting how many unique stocks each company has issued (using the `tic` column), and then listing the companies with more than one.
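One way to count unique stocks per company is to group by the company and call `nunique()` on the ticker column. This is a sketch on made-up data, with `company` and `ticker` as placeholder column names.

```python
import pandas as pd

# Made-up rows: Alpha has two share classes, the others have one each
issues = pd.DataFrame({
    "company": ["Alpha", "Alpha", "Alpha", "Beta", "Beta", "Gamma"],
    "ticker": ["ALP.A", "ALP.B", "ALP.A", "BET", "BET", "GAM"],
})

# Number of distinct tickers within each company
stocks_per_company = issues.groupby("company")["ticker"].nunique()

# Keep only the companies that issue more than one stock
multi_stock = stocks_per_company[stocks_per_company > 1].index.tolist()

print(stocks_per_company)
print(multi_stock)
```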

### Exercise: Excellent Exchanges

**Part 1** Identify the number of unique stocks traded on each exchange.

**Part 2** Then identify any companies that trade on more than one exchange.

## Aggregation

Aggregation functions like `mean()`, `median()`, `sum()`, `min()`, `max()`, `first()`, `last()` and `std()` can be applied to grouped data to give insights across panel data. Suppose we want the average daily return of each traded stock, or the maximum volume traded on any single day for each stock.
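A minimal sketch of both aggregations, using named aggregation in `agg()` so the result has one row per stock. The data and the column names `ticker`, `ret` and `volume` are assumptions made for illustration.

```python
import pandas as pd

# Toy daily observations for two tickers (names and values are made up)
daily = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "ret": [0.01, 0.03, -0.02, 0.02],
    "volume": [100, 300, 500, 200],
})

# One row per ticker: the mean daily return and the largest single-day volume
summary = daily.groupby("ticker").agg(
    mean_ret=("ret", "mean"),
    max_volume=("volume", "max"),
)
print(summary)
```

The same numbers could be obtained one at a time with `daily.groupby("ticker")["ret"].mean()` and `daily.groupby("ticker")["volume"].max()`.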

Once we've done these sorts of aggregation, we're often curious to see who sits at the top or the bottom of the distribution. We can use `nlargest()` and its counterpart, `nsmallest()`, here.
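To pick out the top and bottom of an aggregated result, something like the following could work. The tickers and returns are invented, and `ticker` and `ret` are placeholder column names.

```python
import pandas as pd

# Toy returns for four tickers (all values are made up)
rets = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB", "CCC", "CCC", "DDD", "DDD"],
    "ret": [0.01, 0.03, -0.02, 0.02, -0.01, -0.01, 0.04, 0.06],
})

# Average return per ticker
avg_ret = rets.groupby("ticker")["ret"].mean()

# The two best and the single worst performer, already sorted
top_two = avg_ret.nlargest(2)
bottom_one = avg_ret.nsmallest(1)

print(top_two)
print(bottom_one)
```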

We can also group by multiple columns! This can be helpful when doing aggregation, for example, to find high performers in each month.
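Grouping by several columns takes a list of column names. The sketch below averages returns per month and ticker, then keeps the best ticker in each month; the frame and the names `date`, `ticker`, `ret` and `month` are hypothetical.

```python
import pandas as pd

# Toy returns across two months (all names and values are made up)
panel = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-03", "2023-01-04",
                            "2023-02-01", "2023-02-02", "2023-02-01", "2023-02-02"]),
    "ticker": ["AAA", "AAA", "BBB", "BBB", "AAA", "AAA", "BBB", "BBB"],
    "ret": [0.01, 0.03, 0.05, 0.01, 0.04, 0.02, -0.01, 0.01],
})

# Derive a month column to group on
panel["month"] = panel["date"].dt.month

# Average return for every (month, ticker) pair, kept as ordinary columns
monthly = panel.groupby(["month", "ticker"], as_index=False)["ret"].mean()

# For each month, select the row with the highest average return
best = monthly.loc[monthly.groupby("month")["ret"].idxmax()]
print(best)
```

Using `dt.month` alone would merge the same month from different years, so for data spanning more than one year a year column (or `dt.to_period("M")`) would be a safer grouping key.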

### Exercise: Good Days

Which two days of the week see the highest average close in this data set, and what is the average close on those days?

### Exercise: Trading Exchanges

Next, identify the total trading volume on each exchange.

### Exercise: The 500 Club

For stocks that reached a closing price above 500, how many times in each month did they achieve this?