# Final Capstone Proposal

### What is the problem you are attempting to solve?

In this project, I will use time series analysis to predict the future share price for an end user's desired stock ticker symbol.

### How is your solution valuable?

My solution is valuable because it can help small investors make buy or sell decisions for a particular stock. It aims to support those decisions with models built on historical OHLC (open, high, low, close) stock price data and key technical indicators.

The end user will input a stock ticker symbol and choose a forecast horizon from four options: day, week, month, or year. The horizon is how far ahead of today the stock price should be projected. For example, if the end user picks 'week', the model forecasts the desired stock's closing price one week from today.
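
As a rough illustration of how the four horizon options could turn into a prediction target, the sketch below pairs each closing price with the close a fixed number of trading days later. The trading-day counts (1, 5, 21, 252) and the names `HORIZON_TRADING_DAYS` and `make_supervised` are assumptions made for this example, not part of the proposal. The sample prices are the five AAPL closes from the API output in this document, in chronological order.

```python
# Assumed trading-day counts per horizon (illustrative values, not from the proposal).
HORIZON_TRADING_DAYS = {"day": 1, "week": 5, "month": 21, "year": 252}


def make_supervised(closes, horizon):
    """Pair each close with the close `horizon` trading days later.

    `closes` must be in chronological order (oldest first).
    Returns a list of (close_today, close_at_horizon) tuples.
    """
    steps = HORIZON_TRADING_DAYS[horizon]
    return [(closes[i], closes[i + steps]) for i in range(len(closes) - steps)]


# AAPL closes for 2019-10-31 through 2019-11-06, oldest first.
closes = [248.76, 255.82, 257.5, 257.13, 257.24]

day_pairs = make_supervised(closes, "day")    # four (today, next day) pairs
week_pairs = make_supervised(closes, "week")  # empty: five days is too short
```

Longer horizons need correspondingly longer price histories, since the last `steps` observations have no known target yet.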

### What is your data source and how will you access it?

I will use the Alpha Vantage stock API to download the OHLC pricing data and other technical indicators. The API is documented at the following URL:

https://www.alphavantage.co/documentation/

The example further below shows how to access the daily data for a particular stock (in this case, AAPL). This API call returns nine variables in total:
* Date
* Open
* High
* Low
* Close
* Adjusted Close
* Volume
* Dividend Amount
* Split Coefficient

I will also be utilizing additional variables based on technical indicators available from Alpha Vantage. These technical indicators are listed below:
* Simple Moving Average (SMA)
* Exponential Moving Average (EMA)
* Weighted Moving Average (WMA)
* Triangular Moving Average (TRIMA)
* Kaufman Adaptive Moving Average (KAMA)
* Volume Weighted Average Price (VWAP)
* Moving Average Convergence / Divergence (MACD)
* Stochastic Oscillator (STOCH)
* Relative Strength Index (RSI)
* Average Directional Movement Index (ADX)
* Absolute Price Oscillator (APO)
* Directional Movement Index (DX)
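
To make the first two indicators concrete, here is a minimal, dependency-free sketch of a simple and an exponential moving average. It is only an illustration. The function names are hypothetical, the EMA is seeded with the first observation (one common convention), and values served by Alpha Vantage may differ if the API uses a different seeding rule.

```python
def sma(values, period):
    """Simple moving average: one output per full window of `period` values."""
    return [
        sum(values[i - period + 1 : i + 1]) / period
        for i in range(period - 1, len(values))
    ]


def ema(values, period):
    """Exponential moving average with smoothing factor 2 / (period + 1).

    Seeded with the first observation (an assumed convention).
    """
    alpha = 2 / (period + 1)
    out = [values[0]]
    for value in values[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


prices = [1, 2, 3, 4, 5]
sma_3 = sma(prices, 3)  # averages of [1,2,3], [2,3,4], [3,4,5]
ema_3 = ema(prices, 3)  # same length as the input
```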

In [3]:
from alpha_vantage.timeseries import TimeSeries
import pandas as pd
from pandas import Series, DataFrame

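# Alpha Vantage API key
ALPHA_VANTAGE_API_KEY = 'WTX6IKTWWR57LOIQ'

# Return results as a pandas DataFrame
ts = TimeSeries(key=ALPHA_VANTAGE_API_KEY, output_format='pandas')
# 'compact' requests only the most recent data points
data, meta_data = ts.get_daily_adjusted('AAPL', outputsize='compact')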
data.head()

| date | 1. open | 2. high | 3. low | 4. close | 5. adjusted close | 6. volume | 7. dividend amount | 8. split coefficient |
|---|---|---|---|---|---|---|---|---|
| 2019-11-06 | 256.77 | 257.49 | 255.365 | 257.24 | 257.24 | 16916993.0 | 0.0 | 1.0 |
| 2019-11-05 | 257.05 | 258.19 | 256.32 | 257.13 | 257.13 | 19942900.0 | 0.0 | 1.0 |
| 2019-11-04 | 257.33 | 257.85 | 255.38 | 257.5 | 257.5 | 25553100.0 | 0.0 | 1.0 |
| 2019-11-01 | 249.54 | 255.93 | 249.16 | 255.82 | 255.82 | 37738700.0 | 0.0 | 1.0 |
| 2019-10-31 | 247.24 | 249.17 | 237.26 | 248.76 | 248.76 | 34766600.0 | 0.0 | 1.0 |
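
The column labels in the output above carry numeric prefixes such as `1. open`, and the rows arrive newest first. One possible cleaning step, sketched here under the assumption that shorter snake_case names are preferred, strips the prefix from each label. The helper name `clean_column` is hypothetical.

```python
import re


def clean_column(label):
    """Drop a leading 'N. ' prefix and replace spaces with underscores."""
    return re.sub(r"^\d+\.\s*", "", label).strip().replace(" ", "_")


raw_columns = [
    "1. open", "2. high", "3. low", "4. close", "5. adjusted close",
    "6. volume", "7. dividend amount", "8. split coefficient",
]
clean_columns = [clean_column(label) for label in raw_columns]
```

With the DataFrame from the API call, the same helper could be applied as `data.columns = [clean_column(c) for c in data.columns]`, followed by `data = data.sort_index()` to put the rows in chronological order before any time series modeling.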


### What techniques from the course do you anticipate using?

I will be using the Time Series Analysis specialization for this project. I will utilize the following concepts and techniques from this specialization:
* Linear Trends
* Indicators
* Stochastic Processes
* ARIMA Modeling
    * Stationarity and Differencing
    * Autoregressive Models
    * Moving Average Models
    * ARMA Modeling
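
Two of the ARIMA building blocks listed above can be sketched in a few lines of plain Python: differencing, which removes a trend so the series is closer to stationary, and a least-squares fit of a first-order autoregressive coefficient. This is a simplified illustration with hypothetical function names and no intercept term. A full ARIMA fit would normally rely on a statistics library.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every valid t."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (no intercept)."""
    numerator = sum(curr * prev for curr, prev in zip(series[1:], series[:-1]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


trend = [1, 3, 6, 10]
first_diff = difference(trend)  # step sizes between consecutive points

doubling = [1, 2, 4, 8]
phi = fit_ar1(doubling)         # each value is exactly twice the previous one
```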

I anticipate using the following general techniques from the course:
* Data Cleaning
* Feature Engineering
* Principal Components Analysis
* Multivariable Linear Regression
* Random Forest
* Gradient Boosting
* Time Series Analysis

### What do you anticipate to be the biggest challenge you’ll face?

The biggest challenge I expect to face is cross-validation, since the methodology must be adjusted for time series data: every training fold has to come strictly before its test fold, otherwise the model is evaluated on periods it has effectively already seen. It will not be as simple as calling scikit-learn's cross_val_score function with its default k-fold splitting.
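
One way to respect temporal order is an expanding-window (walk-forward) scheme, sketched below in plain Python. The function name and parameters are hypothetical and the layout is only one of several reasonable choices. scikit-learn's `TimeSeriesSplit` offers a comparable splitter that can be passed as the `cv` argument of `cross_val_score`.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each training window starts at index 0 and ends just before its test
    window, so no test observation precedes any training observation.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
```

Each successive split trains on a longer history and tests on the next unseen block, which mirrors how the model would be used in practice.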

I also expect overfitting to make it difficult to achieve good out-of-sample forecast accuracy. I anticipate iterating on my model several times to reduce overfitting as far as possible.