# Part 3 -- Get Target Data (AAPL)

Download the target data (Apple stock, ticker AAPL) and compute the target variable. We use the daily Close price to determine whether the stock went up, went down, or stayed neutral relative to the previous trading day. This will be a **classification model**, and the threshold separating up/down/neutral is a 1% change in Close.

**Load library code**

In [2]:
from os import chdir
chdir('/home/jovyan/work/Portfolio/Analyzing_Unstructured_Data_for_Finance/')

# Project helper module; it is expected to provide the `pd` and `dt` names used below
from lib import *
# suppress_warnings()

In [3]:
!pip install pandas-datareader



In [4]:
import pandas_datareader.data as web

In [5]:
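# Eight years of daily prices
start = dt.datetime(2009,6,11)
end = dt.datetime(2017,6,11)
# NOTE: the 'google' source worked when this notebook was run; later
# pandas-datareader releases deprecated the Google Finance reader
AAPL_df = web.DataReader('AAPL', 'google', start, end)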

In [6]:
AAPL_df = AAPL_df.reset_index()

In [7]:
AAPL_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2009-06-11,19.94,20.22,19.79,19.99,131205851
1,2009-06-12,19.83,19.87,19.43,19.57,140771232
2,2009-06-15,19.43,19.56,19.27,19.44,134987111
3,2009-06-16,19.52,19.78,19.44,19.48,128701237
4,2009-06-17,19.52,19.64,19.22,19.37,142853172


In [8]:
AAPL_df['Diff'] = AAPL_df['Close'].diff()

In [9]:
AAPL_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Diff
0,2009-06-11,19.94,20.22,19.79,19.99,131205851,
1,2009-06-12,19.83,19.87,19.43,19.57,140771232,-0.42
2,2009-06-15,19.43,19.56,19.27,19.44,134987111,-0.13
3,2009-06-16,19.52,19.78,19.44,19.48,128701237,0.04
4,2009-06-17,19.52,19.64,19.22,19.37,142853172,-0.11
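`Series.diff()` subtracts the previous row from each row, so the first row has no predecessor and comes back as `NaN`. A minimal sketch using the five Close prices shown above (the names `close`, `diff` and `manual` are only for illustration):

```python
import pandas as pd

# The five Close prices from the head() output above.
close = pd.Series([19.99, 19.57, 19.44, 19.48, 19.37], name="Close")

# diff() is shorthand for subtracting the previous row from each row.
diff = close.diff()
manual = close - close.shift(1)

print(diff.round(2).tolist())
print(diff.iloc[1:].tolist() == manual.iloc[1:].tolist())
```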


# NOTE: This can be changed from a classification model to a regression model by using Percent_Change as the target instead of Percent_Change_Class

**We want the <u>PERCENT CHANGE</u> in each stock's Close price so that the data is normalized (this is especially important if we are going to compare stocks that trade at different price levels).**

In [10]:
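# Fractional change in Close versus the previous trading day.
# NOTE: this divides by the current day's Close; the conventional percent change
# divides by the previous day's Close, i.e. AAPL_df['Close'].pct_change()
AAPL_df['Percent_Change'] = ((AAPL_df['Close']-AAPL_df['Close'].shift(1))/AAPL_df['Close'])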

In [11]:
AAPL_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume,Diff,Percent_Change
0,2009-06-11,19.94,20.22,19.79,19.99,131205851,,
1,2009-06-12,19.83,19.87,19.43,19.57,140771232,-0.42,-0.021461
2,2009-06-15,19.43,19.56,19.27,19.44,134987111,-0.13,-0.006687
3,2009-06-16,19.52,19.78,19.44,19.48,128701237,0.04,0.002053
4,2009-06-17,19.52,19.64,19.22,19.37,142853172,-0.11,-0.005679
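The values above are the day's change divided by the current day's Close. The more common daily-return convention divides by the previous day's Close, which is what `Series.pct_change()` computes. A small sketch comparing the two on the first five Close prices (the column names `relative_to_current` and `relative_to_previous` are made up for this illustration):

```python
import pandas as pd

close = pd.Series([19.99, 19.57, 19.44, 19.48, 19.37], name="Close")

# Formula used in the cell above: change relative to the *current* Close.
relative_to_current = (close - close.shift(1)) / close

# Conventional daily return: change relative to the *previous* Close.
relative_to_previous = close.pct_change()

comparison = pd.DataFrame({
    "relative_to_current": relative_to_current,
    "relative_to_previous": relative_to_previous,
})
print(comparison.round(6))
```

For daily moves of a percent or two the two columns differ only slightly, but the gap grows with the size of the move, so it is worth choosing one convention and applying it consistently.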


# NOTE: Consider tuning the threshold
(e.g. what size of percent change can the model reliably detect?)

In [12]:
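# Make threshold 1% (for percent change in daily Close price)
# NOTE: despite the name, this returns three classes. Values are rounded to
# two decimals before comparison, so the classes are not symmetric around zero:
#   rounds to 0.00 or lower -> 'down'    (includes small gains below about 0.5%)
#   rounds to exactly 0.01  -> 'neutral' (roughly 0.5% to 1.5%)
#   rounds to 0.02 or more  -> 'up'      (roughly 1.5% and above)
def make_binary(data):
    data_list = []
    for d in data:
        if round(d,2) < 0.01:
            data_list.append('down')
        elif round(d,2) == 0.01:
            data_list.append('neutral')
        elif round(d,2) > 0.01:
            data_list.append('up')
        else:
            # NaN (e.g. the first row) fails every comparison above
            data_list.append(None)
    return data_list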

In [13]:
AAPL_df['Percent_Change_Class'] = make_binary(AAPL_df['Percent_Change'])

In [14]:
AAPL_df.shape

(2013, 9)

In [15]:
AAPL_df['Percent_Change_Class'].value_counts()

down       1257
neutral     447
up          308
Name: Percent_Change_Class, dtype: int64
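The counts lean heavily toward `down` because `make_binary` compares the rounded value against 0.01 only, so every change that rounds to 0.00 or lower, including small gains, is labelled `down`. If the intent is a symmetric band of plus or minus 1%, one possible alternative looks like the sketch below. `label_change` and its `threshold` parameter are hypothetical names, not part of this notebook, and using it would change the counts reported here.

```python
import math

def label_change(pct_changes, threshold=0.01):
    """Label each fractional change as 'up', 'down', 'neutral', or None."""
    labels = []
    for change in pct_changes:
        if change is None or math.isnan(change):
            labels.append(None)       # e.g. the first row, which has no previous day
        elif change > threshold:
            labels.append('up')
        elif change < -threshold:
            labels.append('down')
        else:
            labels.append('neutral')  # within +/- threshold, boundaries included
    return labels

# A few Percent_Change values taken from the outputs in this notebook.
sample = [float('nan'), -0.021461, -0.006687, 0.002053, 0.017401]
print(label_change(sample))
```

Because `threshold` is a parameter, the same function can be rerun with other cut-offs to see how the class balance shifts.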

In [16]:
AAPL_df.sample(5)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Diff,Percent_Change,Percent_Change_Class
447,2011-03-22,48.94,48.95,48.45,48.74,81558162,0.27,0.00554,neutral
1024,2013-07-09,59.09,60.5,58.63,60.34,88172238,1.05,0.017401,up
710,2012-04-05,89.57,90.67,89.06,90.53,160318858,1.34,0.014802,neutral
211,2010-04-15,35.11,35.58,35.07,35.56,94195920,0.46,0.012936,neutral
362,2010-11-17,43.03,43.43,42.54,42.93,119862407,-0.15,-0.003494,down


In [17]:
AAPL_df['Date'] = AAPL_df['Date'].dt.date

In [18]:
pd.to_pickle(AAPL_df, '../Analyzing_Unstructured_Data_for_Finance/data/3.1.AAPL_df.pickle')
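Pickling keeps Python objects intact, so the `datetime.date` values created by `.dt.date` come back as dates rather than strings when the file is read again. A minimal round-trip sketch on a two-row frame, written to a temporary directory (the file name `example_df.pickle` is a placeholder, not a file from this project):

```python
import os
import tempfile

import pandas as pd

# Two rows shaped like the notebook's frame, with plain datetime.date values.
frame = pd.DataFrame({
    "Date": pd.to_datetime(["2009-06-11", "2009-06-12"]).date,
    "Close": [19.99, 19.57],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_df.pickle")  # placeholder file name
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.equals(frame))
print(type(restored.loc[0, "Date"]))
```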