# Design Specification and Project Plan

## Data Manipulation - Jahnavi
Dataset  
Pytrends to download reports from Google Trends  
Potential keyword selection
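
A minimal sketch of the download step, assuming the `pytrends` package's `TrendReq` client with its `build_payload` and `interest_over_time` methods. The helper names `chunk_keywords` and `download_trends` and the sample keywords are hypothetical; the real keyword list is still to be selected. Google Trends compares at most five terms per request, so the keywords are split into batches of five.

```python
def chunk_keywords(keywords, size=5):
    """Split a keyword list into batches of at most `size` terms."""
    return [keywords[i:i + size] for i in range(0, len(keywords), size)]


def download_trends(keywords, timeframe="today 5-y"):
    """Return one interest-over-time DataFrame per keyword batch."""
    # Imported here so the batching helper above works without pytrends installed.
    from pytrends.request import TrendReq

    client = TrendReq(hl="en-US", tz=360)
    frames = []
    for batch in chunk_keywords(keywords):
        client.build_payload(batch, timeframe=timeframe)
        frames.append(client.interest_over_time())
    return frames


if __name__ == "__main__":
    # Hypothetical keywords, for illustration only.
    candidate_keywords = ["amazon prime", "aws", "kindle", "alexa", "echo", "amazon stock"]
    print(chunk_keywords(candidate_keywords))
```

Each request is scaled by Google Trends independently, so values from different batches are not directly comparable without a shared reference keyword.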


## Use Cases - Angel
(Describe how the components interact to accomplish the use case):

- Input: Company selection.
- Output:
  - Upcoming earnings report date, based on the current input time, plus the days remaining until the next earnings date
  - Likelihood that the stock price will go up or down on the next earnings report release date
  - Trend-line plot of historical data (daily, percentage change)
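
Two of the outputs listed here are simple calculations. The sketch below assumes hypothetical helper names (`days_until`, `daily_pct_change`) and made-up example values. It uses only the standard library and is not the project's implementation.

```python
from datetime import date


def days_until(earnings_date, today):
    """Number of days from `today` until `earnings_date` (negative if already past)."""
    return (earnings_date - today).days


def daily_pct_change(closes):
    """Percentage change between consecutive daily closing prices."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(closes, closes[1:])
    ]


# Example values are made up for illustration.
print(days_until(date(2017, 7, 26), today=date(2017, 6, 1)))
print(daily_pct_change([100.0, 102.0, 99.96]))
```

With price data held in a pandas `Series`, `Series.pct_change()` returns the same changes as fractions rather than percentages.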


## Functional Specification of Components
Name, What it does, Inputs (with type information), Outputs (with type information), Pseudocode. (If the component is an existing package, then you should point to the documentation for the package. If the component is something that you'll build, then describe, maybe at a high level, the functions and their inputs and outputs.)



### Quarterly Earning Reports

Federal securities law requires large, publicly traded companies to file 10-Q reports for each of the first three fiscal quarters, and a 10-K report for the entire year, with the Securities and Exchange Commission. These reports include the net income earned during each quarter. The net income is divided by the number of shares of stock outstanding, counting the additional shares that convertible securities and stock options could create, to get the diluted earnings per share (EPS). Market analysts pay attention to the quarterly reports for each company, and when the reported earnings are better than analysts expected, the company's stock price generally increases.  
For example, this table shows Amazon.com, Inc.'s last four quarterly earnings reports and the stock prices on the days the reports were filed.  

Fiscal Quarter End | Report Date | AMZN Net Income | AMZN Diluted EPS | AMZN Open   | AMZN Close
------------------ | ----------- | --------------- | ---------------- | ----------- | ----------
Mar 2017           | 4/27/2017   | \$ 724,000,000  |   1.48           | \$ 914.39   | \$ 918.38
Dec 2016           | 2/2/2017    | \$ 749,000,000  |   1.54           | \$ 836.59   | \$ 839.95
Sep 2016           | 10/27/2016  | \$ 252,000,000  |   0.52           | \$ 831.24   | \$ 818.36
Jun 2016           | 7/28/2016   | \$ 857,000,000  |   1.78           | \$ 745.98   | \$ 752.61
 
 Amazon's next earnings report, for the quarter ending in June 2017, will be filed during the period: Jul 26, 2017 - Jul 31, 2017. Currently, the average forecasted EPS among analysts is \$1.40 for June 2017 and \$1.13 for September 2017. 
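
As a worked check on the EPS relationship, the table's figures can be used to back out the share count each report implies (net income divided by diluted EPS) and the open-to-close price change on each report day. The function names are hypothetical; the numbers are copied from the table.

```python
# (fiscal quarter end, net income, diluted EPS, open, close), copied from the table above
reports = [
    ("Mar 2017", 724_000_000, 1.48, 914.39, 918.38),
    ("Dec 2016", 749_000_000, 1.54, 836.59, 839.95),
    ("Sep 2016", 252_000_000, 0.52, 831.24, 818.36),
    ("Jun 2016", 857_000_000, 1.78, 745.98, 752.61),
]


def implied_diluted_shares(net_income, diluted_eps):
    """Share count implied by EPS = net income / diluted shares."""
    return net_income / diluted_eps


def report_day_change_pct(open_price, close_price):
    """Open-to-close price change on the report day, in percent."""
    return (close_price - open_price) / open_price * 100.0


for quarter, net_income, eps, open_price, close_price in reports:
    shares = implied_diluted_shares(net_income, eps)
    change = report_day_change_pct(open_price, close_price)
    print(f"{quarter}: ~{shares / 1e6:.0f}M diluted shares, report-day change {change:+.2f}%")
```

Because the reported EPS is rounded to the cent, the implied share counts are approximate.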
 
 

```python
# Standard library
from collections import OrderedDict
import datetime
import json
import time
from time import sleep

# Third-party
import bokeh
from bokeh.io import output_file, output_notebook, push_notebook, show
from bokeh.layouts import layout, widgetbox
from bokeh.models import Div, HoverTool, Paragraph
from bokeh.models.widgets import Button, RadioButtonGroup, Select, Slider, Panel, Tabs
from bokeh.plotting import figure
from lxml import html
import numpy as np
import pandas as pd
import quandl
import requests

# output_notebook()
```

```python
import os

import pandas as pd
import quandl

# Quarterly earnings data for Microsoft from the Quandl ZSES dataset.
# The API key is read from an environment variable instead of being
# hard-coded; the variable name QUANDL_API_KEY is an assumption.
msft_earn = quandl.get("ZSES/MSFT", authtoken=os.environ["QUANDL_API_KEY"], collapse="quarterly")

# Store the year and report-date columns as integers.
for col in ["PER_CAL_YEAR", "PER_FISC_YEAR", "ACT_RPT_DATE"]:
    msft_earn[col] = pd.to_numeric(msft_earn[col], downcast="integer")

# Keep the fiscal period, report date, actual EPS, and mean analyst estimate.
msft_data = msft_earn[["PER_FISC_YEAR", "PER_FISC_QTR", "ACT_RPT_DATE", "STREET_ACT",
                       "STREET_MEAN_EST"]]

# Daily price history from the Bokeh sample data files.
price_columns = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]
aapl = pd.read_csv("C:/Users/aerin/.bokeh/data/AAPL.csv", header=0, parse_dates=['Date'],
                   names=price_columns)
aapl = aapl.set_index('Date')
goog = pd.read_csv("C:/Users/aerin/.bokeh/data/GOOG.csv", header=0, parse_dates=['Date'],
                   names=price_columns)
goog = goog.set_index('Date')
msft = pd.read_csv("C:/Users/aerin/.bokeh/data/MSFT.csv", header=0, parse_dates=['Date'],
                   names=price_columns)
#msft = msft.set_index('Date')
```

```python
"""
Scrape current stock data from Yahoo Finance.

This code is based on the code from the website:
https://www.scrapehero.com/scrape-yahoo-finance-stock-market-data/

Scraping logic:
    1. Construct the URL of the search results page from Yahoo Finance.
       For example, here is the one for Apple: http://finance.yahoo.com/quote/AAPL?p=AAPL
    2. Download the HTML of the search results page using Python Requests.
    3. Parse the page using LXML, which lets you navigate the HTML tree structure using XPaths.
       The XPaths for the details we need are predefined in the code.
    4. Save the data to a JSON file.
"""
from collections import OrderedDict
import json
import logging
from time import sleep

from lxml import html
import requests


def currentdata(ticker):
    """Return (earnings_date, summary_data); earnings_date is None if parsing fails."""
    url = "http://finance.yahoo.com/quote/%s?p=%s" % (ticker, ticker)
    response = requests.get(url)
    logging.info("Parsing %s", url)
    sleep(4)  # pause before the next request to the site
    parser = html.fromstring(response.text)
    summary_table = parser.xpath('//div[contains(@data-test,"summary-table")]//tr')
    summary_data = OrderedDict()
    other_details_json_link = "https://query2.finance.yahoo.com/v10/finance/quoteSummary/{0}?formatted=true&lang=en-US&region=US&modules=summaryProfile%2CfinancialData%2CrecommendationTrend%2CupgradeDowngradeHistory%2Cearnings%2CdefaultKeyStatistics%2CcalendarEvents&corsDomain=finance.yahoo.com".format(
        ticker)
    summary_json_response = requests.get(other_details_json_link)
    try:
        json_loaded_summary = json.loads(summary_json_response.text)
        result = json_loaded_summary["quoteSummary"]["result"][0]
        y_Target_Est = result["financialData"]["targetMeanPrice"]['raw']
        earnings_list = result["calendarEvents"]['earnings']
        eps = result["defaultKeyStatistics"]["trailingEps"]['raw']
        # new_format = "%m-%d-%y" # wanted to reformat the dates
        datelist = [i['fmt'] for i in earnings_list['earningsDate']]
        earnings_date = ' to '.join(datelist)
        # Read each key/value row of the summary table on the quote page.
        for table_data in summary_table:
            raw_table_key = table_data.xpath('.//td[@class="C(black)"]//text()')
            raw_table_value = table_data.xpath('.//td[contains(@class,"Ta(end)")]//text()')
            table_key = ''.join(raw_table_key).strip()
            table_value = ''.join(raw_table_value).strip()
            summary_data.update({table_key: table_value})
        summary_data.update(
            {'1y Target Est': y_Target_Est, 'EPS (TTM)': eps, 'Earnings Date': earnings_date, 'ticker': ticker,
             'url': url})
        return earnings_date, summary_data
    except (ValueError, KeyError, IndexError, TypeError):
        # ValueError: the body is not valid JSON; the others: an expected field is missing.
        print("Failed to parse json response")
        # Return a pair so that getdata() can always unpack the result.
        return None, {"error": "Failed to parse json response"}


def getdata(ticker):
    """Write the current summary data to '<ticker>-summary.json' and return the earnings date."""
    earnings_date, current_data = currentdata(ticker)
    # hist_data = histdata(ticker)
    logging.info("Writing data to output file")
    with open('%s-summary.json' % (ticker), 'w') as fp:
        json.dump(current_data, fp, indent=4)
    # with open('%s-histstockprice.json' % (ticker), 'w') as fp:
    #     json.dump(hist_data, fp, indent=4)
    return earnings_date
```
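
The JSON lookup inside `currentdata` can be exercised offline. The sketch below assumes a hypothetical helper, `extract_earnings_dates`, and a hand-written payload that mirrors only the keys the scraper reads. It is not a real Yahoo Finance response, and the live response format may differ.

```python
def extract_earnings_dates(payload):
    """Return the formatted earnings dates from a quoteSummary-style payload, or [] if absent."""
    try:
        result = payload["quoteSummary"]["result"][0]
        entries = result["calendarEvents"]["earnings"]["earningsDate"]
        return [entry["fmt"] for entry in entries]
    except (KeyError, IndexError, TypeError):
        # A missing key, an empty result list, or a null field all mean "no dates found".
        return []


# Hand-written sample that mimics the structure read by the scraper.
sample = {
    "quoteSummary": {
        "result": [
            {
                "calendarEvents": {
                    "earnings": {
                        "earningsDate": [{"fmt": "2017-07-26"}, {"fmt": "2017-07-31"}]
                    }
                }
            }
        ]
    }
}

print(" to ".join(extract_earnings_dates(sample)))
```

Keeping the lookup separate from the network request makes it possible to test the parsing without calling the site.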


```python
import numpy as np

from bokeh.layouts import layout, widgetbox
from bokeh.models import Div
from bokeh.models.widgets import Panel, Select, Slider, Tabs
from bokeh.plotting import curdoc, figure, show
from bokeh.sampledata.stocks import AAPL, GOOG, MSFT


def to_datetime64(x):
    """Convert a sequence of date strings to a NumPy datetime64 array."""
    return np.array(x, dtype=np.datetime64)


# Tab 1: adjusted closing prices of the three sample stocks.
p1 = figure(x_axis_type="datetime", title="Stock Closing Prices")
p1.grid.grid_line_alpha = 0.3
p1.xaxis.axis_label = 'Date'
p1.yaxis.axis_label = 'Price'

p1.line(to_datetime64(AAPL['date']), AAPL['adj_close'], color='#A6CEE3', legend='AAPL')
p1.line(to_datetime64(GOOG['date']), GOOG['adj_close'], color='#33A02C', legend='GOOG')
p1.line(to_datetime64(MSFT['date']), MSFT['adj_close'], color='#FB9A99', legend='MSFT')
p1.legend.location = "top_left"
tab1 = Panel(child=p1, title="compare")

# Tab 2: AAPL closing prices with a 30-day moving average.
aapl = np.array(AAPL['adj_close'])
aapl_dates = to_datetime64(AAPL['date'])

window_size = 30
window = np.ones(window_size) / float(window_size)
aapl_avg = np.convolve(aapl, window, 'same')

p2 = figure(x_axis_type="datetime", title="AAPL One-Month Average")
p2.grid.grid_line_alpha = 0
p2.xaxis.axis_label = 'Date'
p2.yaxis.axis_label = 'Price'
p2.ygrid.band_fill_color = "#deebf7"
p2.ygrid.band_fill_alpha = 0.1

p2.circle(aapl_dates, aapl, size=4, legend='close',
          color='darkgrey', alpha=0.2)

p2.line(aapl_dates, aapl_avg, legend='avg', color='navy')
p2.legend.location = "top_left"
tab2 = Panel(child=p2, title="apple")

tabs = Tabs(tabs=[tab1, tab2])

#slider = Slider(start=0, end=10, value=1, step=.1, title="Sample Slider")
# put the results in a row
#show(widgetbox(slider, button_group, select, width=400))

# getdata() is defined in the scraping cell above.
amzn_earnings_date = getdata("AMZN")
goog_earnings_date = getdata("GOOG")
amzn_div = Div(text="The next earnings release date for Amazon is %s"
               % amzn_earnings_date, width=500, height=40)
goog_div = Div(text="The next earnings release date for Google is %s"
               % goog_earnings_date, width=500, height=50)

#def update_div(ticker, earnings_date):
#    div = Div(text="""The next earnings release date for %s is %s \n"""
#               % (ticker, earnings_date), width=400, height=60)

# Company names shown in the dropdown, mapped to their ticker symbols.
tickers = {"Amazon": "AMZN", "Apple": "AAPL", "Google": "GOOG",
           "Microsoft": "MSFT", "Netflix": "NFLX"}
dropdown = Select(title="Select Company:", value="Amazon",
                  options=sorted(tickers))


def function_to_call(attr, old, new):
    """Look up the earnings date for the newly selected company."""
    ticker = tickers[dropdown.value]
    earnings_date = getdata(ticker)
#    update_div(ticker, earnings_date)

#dropdown.on_change('value', function_to_call)

#curdoc().add_root(dropdown)

out = layout(
    [
        [widgetbox(dropdown, amzn_div, goog_div)],
        [tabs, ],
    ],
    sizing_mode='scale_width'
)
show(out)
```

The prediction tool under development will show the next earnings report period and a prediction of whether the EPS will be positive or negative. That binary outcome is the response variable in the logistic regression model, which will be trained on the last 10 years of EPS reports. 
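
To illustrate the kind of model described, here is a minimal logistic regression sketch with a single hypothetical feature and made-up training data. It is fitted by plain gradient descent using only the standard library. The feature, the data, and the function names are assumptions for illustration; the project's actual predictors and fitting library are not decided here.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, epochs=500):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


def predict_proba(x, w, b):
    """Predicted probability that the outcome is 1 (positive) for feature value x."""
    return sigmoid(w * x + b)


# Made-up data: one hypothetical feature value per quarter and a 0/1 outcome.
features = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
outcomes = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(features, outcomes)
print(predict_proba(1.5, w, b), predict_proba(-1.5, w, b))
```

A probability above 0.5 would be reported as a "positive" prediction and one below 0.5 as "negative".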

Control Logic (Pseudocode)  

### Prediction of stock price - Abhishek
What it does  
Inputs   
Outputs    
Control Logic (Pseudocode)  

### Historical Data Plot - Khyati
What it does  
Inputs   
Outputs    
Control Logic (Pseudocode)  

## Project plan - Angel
Provide details for what you'll accomplish in the next two weeks, and higher level descriptions for the remaining weeks in the quarter so that the end result is that you have implemented and tested a system that accomplishes your use cases.
