# Introduction to Time Series Data

Data can come in many different formats, shapes and sizes. You may have heard of tabular data, the rows-and-columns format you may be familiar with from working in something like Excel.

We will explore two main kinds of tabular data in this module. The first is time series data. Time series data will be *indexed* with a date and time. We'll look a bit more closely at that soon, but for now just think of it as each row having a date or time, rather than a row number.

## Loading Data

One of the most popular packages in Python for working with tabular data is called Pandas. Today we'll get acquainted with Pandas.

The first thing we'll do is `import` the `pandas` package. Convention has us use a shortform name - `pd` - because we'll be using the package so often.

And below we'll use pandas' `read_csv()` to load the data into a `DataFrame`. DataFrames are the main data structure in pandas for tabular data, and lots of other programming languages use the concept of a DataFrame too! By convention, you'll often see `df` used as a variable name.

In [None]:
url = "https://raw.githubusercontent.com/ImperialCollegeLondon/efds-ta-python/main/data/AAPL_2020.csv"
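
As a minimal sketch of the import and load steps described above: the snippet below reads a tiny in-memory CSV so that it runs without network access. The column names and all values are made up for illustration and are only assumed to resemble the real file.

```python
import io

import pandas as pd  # conventional short name for the package

# Hypothetical stand-in for the CSV file: made-up values, assumed column names.
csv_text = """Date,Open,High,Low,Close,Volume
2020-01-02,10.0,11.0,9.5,10.5,1000
2020-01-03,10.5,11.5,10.0,11.0,1200
2020-01-06,11.0,12.0,10.5,11.5,900
"""

# read_csv accepts a file path, a URL, or a file-like object such as StringIO.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# With network access, the same call should work on the URL from the cell above:
# df = pd.read_csv(url)
```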


Before we do anything else, it's a good idea to take a look at the DataFrame. Some methods will let us take a closer look at parts of our data. 
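
The text does not name the methods it has in mind; `head()` and `tail()` are one common choice for peeking at the start and end of a DataFrame. A small sketch on a made-up data set (hypothetical values and column names):

```python
import pandas as pd

# Hypothetical miniature data set: made-up values, assumed column names.
df = pd.DataFrame({
    "Close": [10.5, 11.0, 11.5, 12.0, 12.5, 13.0],
    "Volume": [1000, 1200, 900, 1100, 1300, 950],
})

first_rows = df.head(3)  # first three rows (head() alone returns five)
last_rows = df.tail(2)   # last two rows

print(first_rows)
print(last_rows)
```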

Other methods and attributes can give us an overview, and further insight into our data in general. The `shape` attribute (no parentheses) will tell us the number of rows and columns in our data frame, while the `info()` method will give us some info on the data type (`dtype`) of each column.

You'll notice the types are slightly different from the usual Python types - this is because they belong to the `numpy` package, which sits under the hood of `pandas`. We'll look more at `numpy` tomorrow, but for now here is a word about each of the types in our data frame.

- `float64` - 64-bit floating point (number with a decimal point)
- `int64` - 64-bit integer (whole number)
- `object` - other Python data types (strings in this case)
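
A short sketch of these overview tools on a made-up data set (hypothetical values, assumed column names). Note that recent pandas versions may report string columns with a dedicated string dtype rather than `object`.

```python
import pandas as pd

# Hypothetical miniature data set: made-up values, assumed column names.
df = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],  # strings
    "Close": [10.5, 11.0, 11.5],                         # floats
    "Volume": [1000, 1200, 900],                         # integers
})

# shape is an attribute, so no parentheses: a (rows, columns) tuple.
print(df.shape)

# info() is a method: it prints each column's name, non-null count and dtype.
df.info()

# dtypes lists just the data type of each column.
print(df.dtypes)
```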

For a look at some actual data within the data frame, we can use square bracket notation and `iloc` to access columns and rows.
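
As an illustration, the sketch below selects columns with square brackets and rows with `iloc`, again on made-up data (hypothetical values, assumed column names):

```python
import pandas as pd

# Hypothetical miniature data set: made-up values, assumed column names.
df = pd.DataFrame({
    "Open": [10.0, 10.5, 11.0],
    "Close": [10.5, 11.0, 11.5],
    "Volume": [1000, 1200, 900],
})

close = df["Close"]             # one column name: returns a Series
prices = df[["Open", "Close"]]  # list of names: returns a DataFrame

first_row = df.iloc[0]          # first row by position, as a Series
first_two = df.iloc[0:2]        # rows 0 and 1 (the end position is excluded)
one_value = df.iloc[1, 2]       # row 1, column 2: the second Volume value

print(close)
print(first_two)
print(one_value)
```

`iloc` always works by integer position, so it behaves the same whatever the index looks like.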

## Setting the Index

In a DataFrame, each row has an index label. By default, this is just a row number (starting at 0). When it makes sense, we can choose one of the other columns to be the index. For time series data, where each row represents a different point in time, we'll set our `Date` column as the index. This will make it easier for us to work with the data, and can speed up other operations later on.


We convert the `Date` column to datetime values because pandas can recognise and work efficiently with them. We set the `Date` column as the index because in time series data like ours, most operations are time-based.
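
One way to carry out these two steps is sketched below with `pd.to_datetime()` and `set_index()`. The data is made up (hypothetical values), and only the `Date` column name is taken from the text.

```python
import pandas as pd

# Hypothetical miniature data set: made-up values.
df = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Close": [10.5, 11.0, 11.5],
})

# Step 1: turn the date strings into proper datetime values.
df["Date"] = pd.to_datetime(df["Date"])

# Step 2: use the Date column as the index (it stops being a regular column).
df = df.set_index("Date")

print(df.index)
print(df)
```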

With the index set, we can now use it to access different portions of our data a little bit more easily.
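
For example, with a datetime index, `loc` accepts date strings, including partial ones such as a year and month. The sketch below uses made-up data (hypothetical values and dates):

```python
import pandas as pd

# Hypothetical miniature data set with a datetime index: made-up values.
df = pd.DataFrame(
    {"Close": [10.5, 11.0, 11.5, 12.0, 12.5]},
    index=pd.to_datetime(
        ["2020-01-02", "2020-01-31", "2020-02-03", "2020-02-28", "2020-03-02"]
    ),
)

# A partial date string selects every row in that month.
february = df.loc["2020-02"]

# A slice between two dates includes both end points.
window = df.loc["2020-01-31":"2020-02-28"]

print(february)
print(window)
```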

## Basic Operations

There are also many basic operations we can do with pandas, such as calculating the mean of a column, the maximum of a column, and so on.
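
A brief sketch of a few such operations on made-up data (hypothetical values, assumed column names):

```python
import pandas as pd

# Hypothetical miniature data set: made-up values, assumed column names.
df = pd.DataFrame({
    "High": [11.0, 11.5, 12.0, 12.5],
    "Close": [10.5, 11.0, 11.5, 12.0],
    "Volume": [1000, 1200, 900, 1100],
})

mean_close = df["Close"].mean()  # average of one column
max_high = df["High"].max()      # largest value in one column
min_volume = df["Volume"].min()  # smallest value in one column

print(mean_close, max_high, min_volume)

# describe() summarises every numeric column at once.
print(df.describe())
```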


### Exercise 1

Compare AAPL's *median* **high** in Q1 and Q2 of 2020. In which quarter was it higher? Use the cell below to show your work.

In [None]:
## YOUR CODE GOES HERE

### Exercise 2

Looking only at the first 100 days of trading in the AAPL dataset, print the following information:

* First opening price of the period
* Last close price of the period
* Total volume traded over the period

In [None]:
## YOUR CODE GOES HERE

### (Optional) Exercise 3

Run the cell below to create a new DataFrame with hourly trading info from a single day.

Extend the code to compare the mean close price and trading volume in:
- the morning (up to and including 11:00)
- the afternoon (from 12:00 onwards)

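**HINT** Instead of square brackets, use `between_time()` to slice. It only works on a datetime index, so convert the `Time` column with `pd.to_datetime()` and set it as the index first.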

In [None]:
data = {
    'Time': ['2023-06-01 00:00:00', '2023-06-01 01:00:00', '2023-06-01 02:00:00', 
             '2023-06-01 03:00:00', '2023-06-01 04:00:00', '2023-06-01 05:00:00', 
             '2023-06-01 06:00:00', '2023-06-01 07:00:00', '2023-06-01 08:00:00', 
             '2023-06-01 09:00:00', '2023-06-01 10:00:00', '2023-06-01 11:00:00',
             '2023-06-01 12:00:00', '2023-06-01 13:00:00', '2023-06-01 14:00:00',
             '2023-06-01 15:00:00', '2023-06-01 16:00:00', '2023-06-01 17:00:00',
             '2023-06-01 18:00:00', '2023-06-01 19:00:00', '2023-06-01 20:00:00',
             '2023-06-01 21:00:00', '2023-06-01 22:00:00', '2023-06-01 23:00:00'],
    'Close': [120, 121, 119, 119, 118, 119, 120, 121, 122, 123, 124, 125,
              125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136],
    'Volume': [1000, 1050, 1075, 1100, 1125, 1150, 1200, 1250, 1300, 1350, 
               1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 
               1900, 1950, 2000, 2050]
}

trading = pd.DataFrame(data) # Create a DataFrame "literal"

## YOUR CODE GOES HERE


