# Time Series Forecasting: Fundamentals and Data Exploration

## Introduction
Time series forecasting is an important, and often neglected, area of machine learning. It matters because so many prediction problems involve a time component. Examples of time series forecasting problems include:
1. Forecasting the corn yield in tons by state each year.
2. Forecasting whether an EEG trace in seconds indicates a patient is having a
   seizure or not.
3. Forecasting the closing price of a stock each day.
4. Forecasting the birth rate at all hospitals in a city each year.
5. Forecasting product sales in units sold each day for a store.
6. Forecasting the number of passengers through a train station each day.
7. Forecasting unemployment for a state each quarter.
8. Forecasting utilization demand on a server each hour.
9. Forecasting the size of the rabbit population in a state each breeding
   season.
10. Forecasting the average price of gasoline in a city each day.

By the end of this notebook, you will know the fundamental concepts of time series forecasting: time series data, the difference between time series analysis and time series forecasting, examples of time series forecasting problems, the sliding window method, and univariate versus multivariate time series.

You will also learn basic time series data wrangling: how to load time series data with pandas, query the loaded data using date-times, and calculate and review summary statistics.


## Time Series
A time series is a sequence of observations taken sequentially in time. For example:

| time | measure |
|------|---------|
| 1    | 100     |
| 2    | 110     |
| 3    | 108     |
| 4    | 115     |
| 5    | 120     |
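The same five observations can be held in a pandas Series indexed by time step. This is a minimal sketch assuming pandas is installed; the variable name `toy` is only for illustration.

```python
import pandas as pd

# The five (time, measure) pairs from the example above, indexed by time step
toy = pd.Series([100, 110, 108, 115, 120], index=[1, 2, 3, 4, 5], name="measure")
toy.index.name = "time"

print(toy)
# Observations are ordered by time, so the last value is the most recent one
print(toy.iloc[-1])  # 120
```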

## Time Series Analysis vs. Time Series Forecasting
We have different goals depending on whether we are interested in understanding a dataset
or making predictions. Understanding a dataset, called time series analysis, can help to make
better predictions.

In classical statistics, we are mainly interested in the analysis of time series (descriptive modeling). The primary objective of time series analysis is to develop mathematical models that best describe the sample data.
This field seeks to answer the "why" question behind a time series dataset.

In contrast, time series forecasting (predictive modeling) uses the information in a time
series (perhaps with additional information) to forecast future values of that series.


## Features of Time Series
The main features of many time series are level, trend, seasonality, and noise.

• Level. The baseline value for the series if it were a straight line.

• Trend. The optional and often linear increasing or decreasing behavior of the series over
time.

• Seasonality. The optional repeating patterns or cycles of behavior over time.

• Noise. The optional variability in the observations that cannot be explained by the model.

All time series have a level, most have noise, and the trend and seasonality are optional.
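To make the four components concrete, the sketch below builds a synthetic monthly series by adding them together. All component values (a level of 50, a slope of 0.5 per step, a 12-step sine cycle, Gaussian noise) are hypothetical choices for illustration, not taken from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical component values chosen only for illustration
rng = np.random.default_rng(seed=0)
n = 48                                          # four years of monthly steps
t = np.arange(n)

level = np.full(n, 50.0)                        # constant baseline
trend = 0.5 * t                                 # linear increase over time
seasonality = 10 * np.sin(2 * np.pi * t / 12)   # repeating 12-step cycle
noise = rng.normal(0.0, 2.0, size=n)            # unexplained variability

# Additive combination: observed value = level + trend + seasonality + noise
synthetic = pd.Series(
    level + trend + seasonality + noise,
    index=pd.date_range("2000-01-01", periods=n, freq="MS"),
)
print(synthetic.head())
```

An additive combination is only one option; a multiplicative model, where the components are multiplied, is also common when seasonal swings grow with the level of the series.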


## Time Series Nomenclature


Before we move on, it is important to establish the standard terms used when describing
time series data. The current time is denoted t, and an observation at the current time is
denoted obs(t).

We are often interested in the observations made at prior times, called lag times or lags.
Times in the past are negative relative to the current time. For example, the previous time is t-1
and the time before that is t-2. The observations at these times are obs(t-1) and obs(t-2),
respectively.

Times in the future are what we are interested in forecasting, and they are positive
relative to the current time. For example, the next time is t+1 and the time after that is t+2.
The observations at these times are obs(t+1) and obs(t+2), respectively.

For simplicity, we often drop the obs(t) notation and write t+1 instead, with the understanding
that we mean the observation at that time rather than the time index itself. Similarly, an
observation at a lag can be referred to by shorthand such as "a lag of 10" or lag=10, which is
the same as t-10.

To summarize:

• t-n: A prior or lag time (e.g. t-1 for the previous time).

• t: A current time and point of reference.

• t+n: A future or forecast time (e.g. t+1 for the next time).
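The t-n / t / t+n shorthand maps directly onto pandas `Series.shift()`: a positive shift pulls each value from an earlier row, and a negative shift pulls it from a later row. A minimal sketch, reusing the toy measurements from the Time Series section; the column labels are just the notation above written as strings.

```python
import pandas as pd

# The toy measurements from the Time Series section
obs = pd.Series([100, 110, 108, 115, 120])

# shift(1) places obs(t-1) on row t; shift(-1) places obs(t+1) on row t
frame = pd.DataFrame({
    "obs(t-1)": obs.shift(1),
    "obs(t)": obs,
    "obs(t+1)": obs.shift(-1),
})
print(frame)
```

The first row has no previous observation and the last row has no next one, so those cells are NaN.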


## Sliding Window For Time Series Data
The use of prior time steps to predict the next time step is called the sliding window method. Some literature shortens this to the window method. In statistics and time series analysis, it is called a lag or the lag method.

The number of previous time steps is called the window width or size of the lag.
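A sliding window turns a single sequence into supervised (input, output) pairs. The sketch below is one straightforward way to do it with NumPy; `sliding_window` is a hypothetical helper name, not a library function, and `width` is the window width described above.

```python
import numpy as np

def sliding_window(values, width):
    """Turn a 1-D sequence into (X, y) pairs using `width` prior steps.

    Each row of X holds obs(t-width) ... obs(t-1) and y holds obs(t).
    """
    values = np.asarray(values)
    X, y = [], []
    for end in range(width, len(values)):
        X.append(values[end - width:end])  # the lag window
        y.append(values[end])              # the value to predict
    return np.array(X), np.array(y)

X, y = sliding_window([100, 110, 108, 115, 120], width=2)
print(X)
print(y)
```

Only windows with a full history are kept, so the first `width` observations appear as inputs but never as targets.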

## Univariate and Multivariate Time Series Data
The number of variables observed at each time step in a time series dataset matters. Traditionally, we distinguish two cases:

1. Univariate Time Series: These are datasets where only a single variable is observed at each time, such as temperature each hour. The time, measure example shown earlier is a univariate time series dataset.

2. Multivariate Time Series: These are datasets where two or more variables are observed at each time.

In general, multivariate time series analysis is much more complicated than univariate time series analysis.
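With more than one variable, the lag idea applies to every column. The sketch below uses a hypothetical two-variable dataset (the `temp` and `demand` values are invented for illustration) and builds inputs from both variables at t-1 to predict `demand` at t.

```python
import pandas as pd

# Hypothetical multivariate series: two variables observed at each time step
data = pd.DataFrame({
    "temp":   [20.1, 21.4, 19.8, 22.0, 23.5],
    "demand": [310, 335, 300, 360, 380],
})

# Lag every variable by one step, pair it with demand at time t,
# and drop the first row, which has no t-1 values
supervised = pd.concat(
    [data.shift(1).add_suffix("(t-1)"), data["demand"].rename("demand(t)")],
    axis=1,
).dropna()
print(supervised)
```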

## Load and Explore Time Series Data
The Pandas library in Python provides excellent, built-in support for time series data. Once
loaded, Pandas also provides tools to explore and better understand your dataset. In this lesson,
you will discover how to load and explore your time series dataset.

### Daily Female Births Dataset
In this lesson, we will use the Daily Female Births Dataset as an example. This dataset
describes the number of daily female births in California in 1959.

### Load Time Series Data
Pandas represents time series data as a Series. A Series is a one-dimensional array with a
label for each row; for time series data, those labels are date-times. Let's load our data:

In [1]:
# load dataset using read_csv()
from pandas import read_csv
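# squeeze=True is no longer accepted by read_csv in pandas 2.x, so
# .squeeze('columns') turns the one-column DataFrame into a Series instead
series = read_csv('daily-total-female-births.csv', header=0, index_col=0,
                  parse_dates=True).squeeze('columns')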



Let's check the data type and print the first few rows of the data:

In [2]:
print(type(series))
print(series.head())

<class 'pandas.core.series.Series'>
Date
1959-01-01    35
1959-01-02    32
1959-01-03    30
1959-01-04    31
1959-01-05    44
Name: Births, dtype: int64
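If the CSV file is not at hand, the same loading pattern can be tried on an in-memory string. This is a minimal sketch assuming the file's header row is `Date,Births` (as the index name and series name in the output above suggest); it uses only the five rows shown above.

```python
from io import StringIO
from pandas import read_csv

# A few rows copied from the head() output above, standing in for the full CSV file
csv_text = """Date,Births
1959-01-01,35
1959-01-02,32
1959-01-03,30
1959-01-04,31
1959-01-05,44
"""

# index_col=0 uses the Date column as the index, parse_dates=True converts it to
# datetimes, and squeeze("columns") turns the single remaining column into a Series
sample = read_csv(StringIO(csv_text), header=0, index_col=0,
                  parse_dates=True).squeeze("columns")
print(type(sample))
print(sample.index.dtype)
```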


### Exploring Time Series Data
Pandas also provides tools to explore and summarize your time series data. We’ll
take a look at a few common operations to explore and summarize the loaded data.

Let's see the number of observations we have:


In [3]:
print(series.size)


365


As expected, there are 365 observations, one for each day of 1959.
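A count of 365 suggests one row per day, but comparing the index against a complete daily calendar confirms there are no gaps. The sketch below uses a five-day stand-in for the loaded series; `missing_days` is a hypothetical helper name, and the same call could be applied to the full `series`.

```python
import pandas as pd

# Stand-in for the loaded births series: the first five days from the output above
sample = pd.Series(
    [35, 32, 30, 31, 44],
    index=pd.date_range("1959-01-01", periods=5, freq="D"),
    name="Births",
)

# size and len() both report the number of observations
print(sample.size, len(sample))  # 5 5

def missing_days(s):
    """Return calendar days between the first and last index entries that have no row."""
    full_calendar = pd.date_range(s.index.min(), s.index.max(), freq="D")
    return full_calendar.difference(s.index)

print(len(missing_days(sample)))  # 0: one observation per day, no gaps

# Dropping a day shows what a gap looks like
gappy = sample.drop(sample.index[2])
print(missing_days(gappy))
```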

### Querying By Time
You can slice, dice, and query your series using the time index. For example, you can access all
observations in January as follows:

In [4]:
print(series['1959-01'])


Date
1959-01-01    35
1959-01-02    32
1959-01-03    30
1959-01-04    31
1959-01-05    44
1959-01-06    29
1959-01-07    45
1959-01-08    43
1959-01-09    38
1959-01-10    27
1959-01-11    38
1959-01-12    33
1959-01-13    55
1959-01-14    47
1959-01-15    45
1959-01-16    37
1959-01-17    50
1959-01-18    43
1959-01-19    41
1959-01-20    52
1959-01-21    34
1959-01-22    53
1959-01-23    39
1959-01-24    32
1959-01-25    37
1959-01-26    43
1959-01-27    39
1959-01-28    35
1959-01-29    44
1959-01-30    38
1959-01-31    24
Name: Births, dtype: int64
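Date strings also work for ranges, and the DatetimeIndex exposes calendar fields that can drive boolean queries. A minimal sketch, rebuilding the January 1959 values shown above so it runs without the CSV file; the variable names are only for illustration.

```python
import pandas as pd

# January 1959 values, copied from the output above
january = pd.Series(
    [35, 32, 30, 31, 44, 29, 45, 43, 38, 27, 38, 33, 55, 47, 45, 37,
     50, 43, 41, 52, 34, 53, 39, 32, 37, 43, 39, 35, 44, 38, 24],
    index=pd.date_range("1959-01-01", periods=31, freq="D"),
    name="Births",
)

# Slicing with date strings includes both endpoints
first_week = january.loc["1959-01-01":"1959-01-07"]
print(len(first_week), first_week.sum())  # 7 246

# dayofweek is 0 for Monday through 6 for Sunday, so >= 5 selects weekends
weekends = january[january.index.dayofweek >= 5]
print(weekends)
```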


### Descriptive Statistics
Calculating descriptive statistics on your time series can give you an idea of the distribution and
spread of values. This may suggest data scaling and even data cleaning steps that you can
perform later as part of preparing your dataset for modeling.

The describe() function creates a summary of the loaded time series: the count of observations,
the mean, the standard deviation, the minimum, the 25th, 50th (median) and 75th percentiles,
and the maximum.

In [5]:
print(series.describe())


count    365.000000
mean      41.980822
std        7.348257
min       23.000000
25%       37.000000
50%       42.000000
75%       46.000000
max       73.000000
Name: Births, dtype: float64
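Each row of the describe() output corresponds to an individual Series method, which is handy when only one statistic is needed. The sketch below checks this on the January 1959 values shown earlier, used as a small stand-in sample; the `checks` dictionary is just for illustration.

```python
import pandas as pd

# January 1959 values from the query above, used as a small stand-in sample
january = pd.Series(
    [35, 32, 30, 31, 44, 29, 45, 43, 38, 27, 38, 33, 55, 47, 45, 37,
     50, 43, 41, 52, 34, 53, 39, 32, 37, 43, 39, 35, 44, 38, 24],
    name="Births",
)

summary = january.describe()

# Individual methods that reproduce each row of describe()
checks = {
    "count": january.count(),
    "mean": january.mean(),
    "std": january.std(),          # sample standard deviation (ddof=1)
    "min": january.min(),
    "25%": january.quantile(0.25),
    "50%": january.median(),
    "75%": january.quantile(0.75),
    "max": january.max(),
}
for name, value in checks.items():
    print(f"{name:>5}: describe={summary[name]:.3f} method={value:.3f}")
```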


## Acknowledgements
The content of this notebook was adapted from:

1. Time Series Forecasting with Python (Book by Jason Brownlee)

2. Time Series Forecasting as Supervised Learning (Blog by Jason Brownlee)

Thanks!