# Predicting Stock Fluctuation Through Financial News

In this project, we try to use financial news to predict stock fluctuation. We believe that articles published in popular financial outlets such as the WSJ, the Financial Times, and Bloomberg correlate directly with changes in a given company's stock price: positive articles about a company tend to be followed by a rise in its stock price, whereas negative articles tend to be followed by a fall.

### Links
Frontend - http://ec2-52-21-26-59.compute-1.amazonaws.com:5000/stocks<br>
Database - IP address 34.235.205.203<br>
API - runs on 34.235.205.203:5000. Example request: http://34.235.205.203:5000/api/company?company_name=oracle (replace `oracle` with any other company's name).




## Project Description

### Why use the app?

The main purpose of the app is to give users data that helps them understand the overall news climate around a company of interest, and to help them predict stock volatility based on the sentiment scores the application reports.

## The Code

#### List of all imports and libraries

```python
# Third-party packages: requests, pymysql, beautifulsoup4 (bs4), flask,
# mysqlclient (MySQLdb), arrow, pandas, matplotlib, slackclient
import json
import re
import time
from pprint import pprint

import arrow
import matplotlib.pyplot as plt
import MySQLdb as mdb
import pandas as pd
import pymysql
import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify, render_template, request
from slackclient import SlackClient
```

# sentiment_analysis.py

The file starts with the following installs, imports, and constants.

```python
# Install the dependencies first (shell, not Python):
#   sudo -H pip3 install -U requests pymysql beautifulsoup4
import json
from pprint import pprint

import pymysql
import requests
from bs4 import BeautifulSoup

BLOOMBERG_URL = "https://www.bloomberg.com/search?query="
companies = ["Google", "Apple", "Snapchat", "Bloomberg"]
```

The `Spider` class scrapes Bloomberg search results and articles with Beautiful Soup, scores each article's sentiment with the Microsoft Cognitive Services Text Analytics API (on a scale from 0, most negative, to 1, most positive), and stores the results in a MySQL database named `ArticlesSentiment`. The constants above set the search URL to https://www.bloomberg.com/search?query= and the companies to Google, Apple, Snapchat, and Bloomberg.
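Before the full class, here is a minimal sketch of the two pure steps around the sentiment call: building the request body and reading the score back out of the response. The response shape (`{"documents": [{"id": ..., "score": ...}], "errors": [...]}`) is our assumption based on the v2.0 Text Analytics API, and the helper names `build_sentiment_payload` and `parse_sentiment_score` are hypothetical; they are not part of the project.

```python
import json


def build_sentiment_payload(story, doc_id="1", language="en"):
    """Build a request body for a single document (hypothetical helper)."""
    return {"documents": [{"id": doc_id, "language": language, "text": story}]}


def parse_sentiment_score(response_body):
    """Return the first document's score in [0, 1], or None if it is missing.

    Assumes the v2.0 response shape {"documents": [{"id": ..., "score": ...}]}.
    """
    try:
        documents = json.loads(response_body)["documents"]
        return float(documents[0]["score"])
    except (ValueError, KeyError, IndexError, TypeError):
        # Invalid JSON, no "documents" key, or an empty document list
        return None


# Made-up sample bodies, for illustration only
print(json.dumps(build_sentiment_payload("Oracle beat earnings expectations.")))
print(parse_sentiment_score('{"documents": [{"id": "1", "score": 0.91}], "errors": []}'))
print(parse_sentiment_score('{"documents": [], "errors": []}'))
```

Returning `None` on failure, rather than 0, keeps a failed request from looking like a maximally negative article.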

```python
# Shared connection settings, so each method does not repeat them.
# In production, load the credentials from environment variables instead.
DB_CONFIG = dict(host='34.235.205.203',
                 user='root',
                 password='dwdstudent2015',
                 charset='utf8',
                 cursorclass=pymysql.cursors.DictCursor)


class Spider:
    def __init__(self):
        self._key1 = "c31155fb4ef44e598697433926e764ae"
        self._key2 = "01572def178342179993aa4eef97d341"
        self._sentiment_analysis_endpoint = (
            "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"
            "?Subscription-Key={key}".format(key=self._key2))

    @staticmethod
    def _connect(database=None):
        # database=None connects to the server without selecting a database.
        return pymysql.connect(database=database, **DB_CONFIG)

    def _create_database(self):
        # Connect without selecting ArticlesSentiment: it may not exist yet,
        # and selecting a missing database makes the connection fail.
        connection = self._connect()
        try:
            with connection.cursor() as cursor:
                cursor.execute("CREATE DATABASE IF NOT EXISTS ArticlesSentiment "
                               "DEFAULT CHARACTER SET 'utf8'")
        finally:
            connection.close()

    def _create_table(self):
        connection = self._connect('ArticlesSentiment')
        try:
            with connection.cursor() as cursor:
                cursor.execute("CREATE TABLE IF NOT EXISTS ArticlesSentiment.Articles("
                               "company VARCHAR(255), article_url VARCHAR(400), "
                               "score DOUBLE, PRIMARY KEY (company, article_url))")
        finally:
            connection.close()

    def _get_sentiment(self, story):
        # Text Analytics v2.0 takes a list of documents and returns a score
        # from 0 (negative) to 1 (positive) for each one.
        payload = {'documents': [{'id': '1', 'language': 'en', 'text': story}]}
        result = requests.post(self._sentiment_analysis_endpoint, json=payload)
        try:
            return result.json()["documents"][0]["score"]
        except (ValueError, KeyError, IndexError, TypeError):
            # Return None instead of 0, so a failed call is not stored
            # as a maximally negative score.
            return None

    def get_links(self, company):
        # Collect article URLs from the Bloomberg search results page.
        content = requests.get(BLOOMBERG_URL + company).text
        soup = BeautifulSoup(content, "html.parser")
        headlines = soup.find_all("h1", {"class": "search-result-story__headline"})
        links = []
        for headline in headlines:
            anchor = headline.find("a", href=True)
            if anchor is not None:
                links.append(anchor["href"])
        return links

    def get_content(self, company):
        # Map each article URL to its body text.
        return_content = {}
        for link in self.get_links(company):
            soup = BeautifulSoup(requests.get(link).text, "html.parser")
            # Skip the first five <p> elements, which the scraper treats as
            # non-article text; get_text() keeps text nested inside links etc.
            paragraphs = soup.find_all("p")[5:]
            texts = (p.get_text(strip=True) for p in paragraphs)
            return_content[link] = "\n".join(text for text in texts if text)
        return return_content

    def _store_company(self, cursor, company):
        # Score each article for one company and insert one row per article.
        for url, story in self.get_content(company).items():
            score = self._get_sentiment(story)
            if score is None:
                continue
            # Parameterized query: the driver escapes quotes in names and URLs.
            cursor.execute("INSERT IGNORE INTO ArticlesSentiment.Articles "
                           "VALUES (%s, %s, %s)", (company, url, float(score)))

    def get_sentiments_for_company(self, company):
        self._create_database()
        self._create_table()
        connection = self._connect('ArticlesSentiment')
        try:
            with connection.cursor() as cursor:
                self._store_company(cursor, company)
            connection.commit()
        finally:
            connection.close()

    def get_sentiments(self):
        # Score every company in the module-level `companies` list.
        for company in companies:
            self.get_sentiments_for_company(company)


if __name__ == "__main__":
    Spider().get_sentiments()
```

The `Articles` table has three columns: `company`, `article_url`, and `score`. `company` stores the name of the company, `article_url` stores the URL of the article the score is based on, and `score` stores the article's sentiment score (0 to 1), which we use as an indicator of whether the company's stock price will rise or fall. Several methods gather this information, including `get_links()`, `get_content()`, and `get_sentiments()`.
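The composite primary key `(company, article_url)` is what keeps repeated scraping runs from storing the same article twice: `INSERT IGNORE` silently skips a row whose key already exists. A minimal sketch of that behavior, using Python's built-in `sqlite3` as a stand-in for MySQL (SQLite spells the statement `INSERT OR IGNORE`); the URLs are placeholders, not real articles.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL ArticlesSentiment database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS Articles("
    "company VARCHAR(255), article_url VARCHAR(400), score DOUBLE, "
    "PRIMARY KEY (company, article_url))"
)

rows = [
    ("amazon", "https://www.bloomberg.com/news/articles/example-1", 0.5),
    ("amazon", "https://www.bloomberg.com/news/articles/example-1", 0.9),    # same key: skipped
    ("alphabet", "https://www.bloomberg.com/news/articles/example-1", 0.2),  # new company: kept
]
# Parameterized inserts; a row whose primary key already exists is ignored
conn.executemany("INSERT OR IGNORE INTO Articles VALUES (?, ?, ?)", rows)
conn.commit()

for row in conn.execute("SELECT company, article_url, score FROM Articles ORDER BY company"):
    print(row)
```

The first `amazon` row wins, so a re-scrape never overwrites an existing score for the same article.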

# company_stock.py

```python
import pymysql
from flask import Flask, jsonify, request

# Spider is defined in sentiment_analysis.py; the API uses it to scrape on demand.
from sentiment_analysis import Spider

app = Flask(__name__)


def get_result_from_db(company_name):
    if not isinstance(company_name, str):
        return None

    connection = pymysql.connect(host='34.235.205.203',
                                 user='root',
                                 password='dwdstudent2015',
                                 database='ArticlesSentiment',
                                 charset='utf8',
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Parameterized query: the driver escapes the user-supplied name.
            cursor.execute("SELECT * FROM ArticlesSentiment.Articles WHERE company = %s",
                           (company_name,))
            return cursor.fetchall()
    finally:
        connection.close()


@app.route('/api/company', methods=['GET'])
def get_scores():
    company_name = request.args.get("company_name", type=str)
    if not company_name:
        return jsonify(error="company_name query parameter is required"), 400

    results = get_result_from_db(company_name)

    if not results:
        # No rows stored for this company yet: scrape and score it, then query again.
        print("Dynamic Fetching")
        Spider().get_sentiments_for_company(company_name)
        results = get_result_from_db(company_name)

    return jsonify(results), 200


if __name__ == "__main__":
    # Listen on all interfaces so the API is reachable at the instance's public IP.
    app.run(host='0.0.0.0', port=5000, debug=True)
```

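The API accepts GET requests. It looks up the requested company in the database and, if no rows are stored yet, invokes `Spider` to scrape and score articles on demand before returning the results. The API runs on port 5000 of 34.235.205.203, for example: http://34.235.205.203:5000/api/company?company_name=oracle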
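Company names are passed as a query parameter, so names containing spaces or `&` must be URL-encoded. A minimal client sketch using only the standard library; `build_company_url` and `fetch_company_scores` are hypothetical helper names, and the base URL is the one given above. The first request for a new company triggers scraping, so it may take a while.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://34.235.205.203:5000/api/company"


def build_company_url(company_name, base=API_BASE):
    # urlencode escapes spaces (as "+") and special characters in the name
    return base + "?" + urlencode({"company_name": company_name})


def fetch_company_scores(company_name, timeout=60):
    # Returns the decoded JSON list of {"article_url", "company", "score"} rows
    with urlopen(build_company_url(company_name), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_company_url("oracle"))
print(build_company_url("wells fargo"))
```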

Below is an example response body for `company_name=oracle`, showing 10 articles:

```json
[
  {
    "article_url": "https://www.bloomberg.com/gadfly/articles/2017-12-04/three-flavors-of-infosys-to-tempt-new-ceo", 
    "company": "oracle", 
    "score": 0.5
  }, 
  {
    "article_url": "https://www.bloomberg.com/news/articles/2017-11-15/oracle-investors-say-no-to-executive-pay-for-sixth-straight-year", 
    "company": "oracle", 
    "score": 0.5
  }, 
  {
    "article_url": "https://www.bloomberg.com/news/articles/2017-11-27/video-startup-can-t-outspend-youtube-so-it-created-a-new-currency-to-pay-creators", 
    "company": "oracle", 
    "score": 0.0
  }, 
  {
    "article_url": "https://www.bloomberg.com/news/articles/2017-12-01/what-it-s-like-to-be-a-sustainable-investor-under-trump", 
    "company": "oracle", 
    "score": 0.5
  }, 
  {
    "article_url": "https://www.bloomberg.com/news/articles/2017-12-04/end-of-year-gifts-go-to-bugs-wreaths-math-logic-opioid-crisis", 
    "company": "oracle", 
    "score": 0.0
  }, 
  {
    "article_url": "https://www.bloomberg.com/news/articles/2017-12-04/what-we-know-about-corporate-winners-and-losers-in-u-s-tax-bill", 
    "company": "oracle", 
    "score": 0.0
  }, 
  {
    "article_url": "https://www.bloomberg.com/news/articles/2017-12-06/corporate-boards-are-ignoring-risks-of-sexual-harassment-q-a", 
    "company": "oracle", 
    "score": 0.5
  }, 
  {
    "article_url": "https://www.bloomberg.com/politics/articles/2017-11-22/at-t-case-vaults-antitrust-chief-from-obscurity-to-spotlight", 
    "company": "oracle", 
    "score": 0.0
  }, 
  {
    "article_url": "https://www.bloomberg.com/politics/articles/2017-12-04/senate-s-unpleasant-surprise-hurts-tax-breaks-for-tech-others", 
    "company": "oracle", 
    "score": 0.0
  }, 
  {
    "article_url": "https://www.bloomberg.com/politics/articles/2017-12-05/trump-s-stock-forecast-fails-after-senate-tax-bill-s-bombshell", 
    "company": "oracle", 
    "score": 0.0
  }
]
```
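A client can reduce a response like this to a few summary numbers. A minimal sketch, assuming the response has already been decoded into a list of dicts with a `score` key; `summarize_scores` is a hypothetical helper, and counting scores above 0.5 as positive is our assumption, not part of the API.

```python
from statistics import mean


def summarize_scores(articles, threshold=0.5):
    """Return (article count, mean score, count of scores above threshold)."""
    scores = [float(article["score"]) for article in articles]
    if not scores:
        return 0, None, 0
    positive = sum(1 for score in scores if score > threshold)
    return len(scores), mean(scores), positive


# Scores of the ten oracle articles in the response above, in order
oracle_scores = [0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]
articles = [{"company": "oracle", "score": s} for s in oracle_scores]
print(summarize_scores(articles))
```

For this sample the mean is 0.2 and no article scores above 0.5.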

# wserver.py

```python
from flask import Flask, render_template
import MySQLdb as mdb

app = Flask(__name__)

@app.route('/stocks')
def articles():

    con = mdb.connect(host='34.235.205.203',
                      user='root',
                      password='dwdstudent2015',
                      db='ArticlesSentiment',
                      charset='utf8',
                      use_unicode=True)

    cur = con.cursor(mdb.cursors.DictCursor)
    cur.execute("SELECT company, article_url, score FROM Articles")
    article = cur.fetchall()
    cur.close()
    con.close()

    return render_template('index.html', articles=article)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=True)
```

The frontend is served at http://ec2-52-21-26-59.compute-1.amazonaws.com:5000/stocks

This file connects to the `ArticlesSentiment` database, fetches the `company`, `article_url`, and `score` columns from the `Articles` table, and passes the rows to the `index.html` template, which Flask renders at the `/stocks` address above. We used Flask to pass the database values from Python into HTML. All image, CSS, and JavaScript files are stored in a public Google Drive folder, so anyone visiting the website can load its images and styles.

In our project, the custom API runs on one AWS instance and the frontend runs on another. To run both on the same instance, merge the two files as follows.

```python
import pymysql
import MySQLdb as mdb
from flask import Flask, render_template, jsonify, request

# Spider is defined in sentiment_analysis.py; the API uses it to scrape on demand.
from sentiment_analysis import Spider

app = Flask(__name__)

DB_HOST = '34.235.205.203'
DB_USER = 'root'
DB_PASSWORD = 'dwdstudent2015'
DB_NAME = 'ArticlesSentiment'


def get_result_from_db(company_name):
    if not isinstance(company_name, str):
        return None

    connection = pymysql.connect(host=DB_HOST, user=DB_USER, password=DB_PASSWORD,
                                 database=DB_NAME, charset='utf8',
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Parameterized query: the driver escapes the user-supplied name.
            cursor.execute("SELECT * FROM Articles WHERE company = %s", (company_name,))
            return cursor.fetchall()
    finally:
        connection.close()


@app.route('/stocks')
def articles():
    con = mdb.connect(host=DB_HOST, user=DB_USER, password=DB_PASSWORD,
                      db=DB_NAME, charset='utf8', use_unicode=True)
    try:
        cur = con.cursor(mdb.cursors.DictCursor)
        cur.execute("SELECT company, article_url, score FROM Articles")
        rows = cur.fetchall()
        cur.close()
    finally:
        con.close()

    return render_template('index.html', articles=rows)


@app.route('/api/company', methods=['GET'])
def get_scores():
    company_name = request.args.get("company_name", type=str)
    if not company_name:
        return jsonify(error="company_name query parameter is required"), 400

    results = get_result_from_db(company_name)

    if not results:
        # No rows stored for this company yet: scrape and score it, then query again.
        print("Dynamic Fetching")
        Spider().get_sentiments_for_company(company_name)
        results = get_result_from_db(company_name)

    return jsonify(results), 200


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000, debug=True)
```


# StockCompanySentiment.py

The Slack bot fetches article sentiment for a company from our API and posts a prediction to Slack. You can send it a request such as "stock prediction for wayfair" and it replies with a message like:

> Thank you for asking about the stock prediction for wayfair<br>
> Latest article about wayfair<br>
> Article URL is https://www.bloomberg.com/gadfly/articles/2017-09-22/williams-sonoma-built-to-thrive-amid-home-goods-turmoil<br>
> Probability of stock rising in the coming days because of the article is 73.69354486465454%


```python
# Install the extra dependency first (shell, not Python): sudo pip3 install arrow
import time
import arrow
import re
import requests
import json
import MySQLdb as mdb
import pandas as pd
import matplotlib.pyplot as plt
from slackclient import SlackClient

host = '34.235.205.203'
username = 'root'
password = 'dwdstudent2015'
database = 'ArticlesSentiment'


def message_is_for_our_bot(user_id, message_text):
    # Match messages that mention the bot's user ID ("@<id>") followed later by "bot".
    # DOTALL lets the match span multi-line messages.
    match = re.search('@' + re.escape(user_id) + '.*bot', message_text, re.DOTALL)
    # Returns True if the regex had a match
    return match is not None


def extract_company_name(message_text):
    # Capture the phrase that comes after "stock prediction for", e.g. "wayfair".
    match = re.search(r'stock prediction for (.+)', message_text.lower())
    # If there is no match, return None
    return match.group(1).strip() if match else None
```

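- `message_is_for_our_bot` checks whether the message mentions the bot's user ID followed by the word "bot".
- `extract_company_name` returns the text after "stock prediction for", or `None` if the phrase is missing.
- `dynamically_get_company_details` calls our custom API and returns the company name, the URL of the first article returned, and the average sentiment score of the first five articles.
- `create_message` formats the reply that is posted to Slack.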

```python
def dynamically_get_company_details(company_name):
    '''
    Returns [company, article URL, average score of up to five articles]
    from our custom API, or None if the API returned no articles.
    '''
    url = 'http://34.235.205.203:5000/api/company'
    # params= URL-encodes names with spaces or special characters
    data = requests.get(url, params={'company_name': company_name}).json()
    if not data:
        return None
    scores = [article['score'] for article in data[:5]]
    avg_score = sum(scores) / len(scores)
    return [data[0]['company'], data[0]['article_url'], avg_score]


def create_message(company_name):
    '''
    Takes the company name extracted from the user's question (possibly None),
    queries our API, and builds the reply text.
    '''
    if company_name is None:
        return ("Unfortunately I did not understand the company you are asking for.\n"
                "Ask me `stock prediction for {name of company}` and I will try to answer.")

    message = "Thank you for asking about the stock prediction for " + company_name + '\n'

    matching_company = dynamically_get_company_details(company_name)
    if matching_company is None:
        return message + "\nI could not find any articles about " + company_name + "."

    company_name, news_article_URL, sentiment_rating = matching_company
    message += ("\nLatest article about " + company_name + "\n"
                "Article URL is " + news_article_URL + "\n")
    percent = float(sentiment_rating) * 100
    if percent > 50:
        message += ("Probability of stock rising in the coming days because of the article is "
                    + str(percent) + "%")
    else:
        # A score at or below 0.5 leans negative: report the complement as the chance of a fall.
        message += ("Probability of stock falling in the coming days because of the article is "
                    + str(100 - percent) + "%")
    return message


def process_slack_event(event):
    '''
    The Slack RTM (real time messaging) generates a lot of events.
    We want to examine them all but only react to:
    1. Messages
    2. ...that come from a user
    3. ...that ask our bot to do something
    4. ...and act only for messages for which we can extract the data we need
    '''

    # Check that the event is a message. If not, ignore and proceed to the next event.
    if event.get("type") != 'message':
        return None

    # Check that the message comes from a user. If not, ignore and proceed to the next event.
    # We do not reply to bots, to avoid getting into infinite loops of discussions with other bots
    if event.get("user") is None:
        return None

    # Check that the message has text and asks the bot to do something. If not, ignore it.
    message_text = event.get('text')
    if not message_text or not message_is_for_our_bot(bot_user_id, message_text):
        return None

    # Extract the company name from the user's message
    company_name = extract_company_name(message_text)

    # Prepare the message that we will send back to the user
    return create_message(company_name)
```

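`process_slack_event` filters the stream of Slack events: it ignores anything that is not a user message addressed to the bot, extracts the company name, and builds the reply with `create_message`.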
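The bot turns the average score (0 to 1) into a percentage: above 50 it reports the chance of a rise, otherwise it reports the complement as the chance of a fall. A minimal sketch of that mapping as a pure function that is easy to test; `describe_prediction` is a hypothetical name, not part of the bot.

```python
def describe_prediction(avg_score):
    """Map a sentiment score in [0, 1] to (direction, probability in percent)."""
    if not 0.0 <= avg_score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    percent = avg_score * 100
    if percent > 50:
        return "rising", percent
    # Scores at or below 0.5 lean negative: report the complement as the chance of a fall
    return "falling", 100 - percent


print(describe_prediction(0.75))  # ('rising', 75.0)
print(describe_prediction(0.25))  # ('falling', 75.0)
print(describe_prediction(0.0))   # ('falling', 100.0)
```

Note that a score of exactly 0.5 is reported as a 50% chance of falling.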

```python
if __name__ == "__main__":
    auth_token = "xoxp-237811062240-249700956231-264640927623-43e4a64bb0bbb2f4798b2197db644710"
    bot_user_id = 'U7BLLU46T'
    # Connect to the Real Time Messaging API of Slack and process the events
    sc = SlackClient(auth_token)
    if not sc.rtm_connect():
        raise SystemExit("Could not connect to the Slack RTM API; check the token.")

    # Poll the Slack API for recent events continuously
    while True:
        # Wait 1 second between monitoring attempts
        time.sleep(1)
        # rtm_read() returns a list of new events, or an empty list if there are none
        for event in sc.rtm_read():
            # Check if we should generate a response for the event
            response = process_slack_event(event)
            if response:
                # Post our response to the #company_stock channel
                sc.api_call("chat.postMessage", channel="#company_stock", text=response)
```

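The main block authenticates with the Slack token, connects to the Real Time Messaging API, polls for new events every second, and posts each reply to the `#company_stock` channel.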

## Findings


A few records in our database are as follows

```csv
company	article_url	score
alphabet	https://www.bloomberg.com/news/articles/2017-12-11/a-manager-of-42-billion-fears-bubble-in-world-s-biggest-stocks	0.1109285653
alphabet	https://www.bloomberg.com/news/articles/2017-12-11/deepmind-has-simple-tests-that-might-prevent-elon-musk-s-ai-apocalypse	0.5
alphabet	https://www.bloomberg.com/news/articles/2017-12-11/gm-could-go-back-to-europe-just-not-selling-mass-market-cars	0.5
alphabet	https://www.bloomberg.com/news/articles/2017-12-11/microsoft-takes-path-less-traveled-to-build-a-quantum-computer	0.5
alphabet	https://www.bloomberg.com/news/articles/2017-12-12/blame-active-not-passive-for-the-surge-in-tech-stocks-says-jpm	0.5
alphabet	https://www.bloomberg.com/news/articles/2017-12-12/u-s-fuels-the-world-as-shale-boom-powers-record-oil-exports	0.2274275422
alphabet	https://www.bloomberg.com/news/videos/2017-12-06/alphabet-stays-above-moving-averages-video	0.5
alphabet	https://www.bloomberg.com/politics/articles/2017-12-08/sex-trafficking-bill-soothes-a-tech-worry-with-focus-on-intent	0.784122467
alphabet	https://www.bloomberg.com/politics/articles/2017-12-12/comcast-s-pledge-to-behave-without-open-web-rules-draws-doubters	0
alphabet	https://www.bloomberg.com/politics/articles/2017-12-12/macron-s-pick-of-encrypted-messaging-app-isn-t-what-you-d-think	0.9126118422
amazon	https://www.bloomberg.com/news/articles/2017-12-06/deal-of-the-week-insuring-against-amazon	0.5
amazon	https://www.bloomberg.com/news/articles/2017-12-06/two-executives-depart-amazon-s-audible-after-harassment-probe	0.5
amazon	https://www.bloomberg.com/news/articles/2017-12-07/wal-mart-is-said-to-hire-ex-amazon-executive-for-u-k-turnaround	0.5
amazon	https://www.bloomberg.com/news/articles/2017-12-08/amazon-gains-help-from-state-ag-in-marketplace-fraud-fight	0.5
amazon	https://www.bloomberg.com/news/audio/2017-12-06/insuring-against-amazon	0.5
amazon	https://www.bloomberg.com/news/videos/2017-12-06/google-blocks-youtube-from-amazon-echo-video	0.5
amazon	https://www.bloomberg.com/news/videos/2017-12-07/amazon-s-prasad-seeks-to-make-alexa-smarter-video	0.5
amazon	https://www.bloomberg.com/news/videos/2017-12-08/full-show-bloomberg-technology-12-07-video	0.5
```

This data is stored in the `Articles` table of the `ArticlesSentiment` database at IP address 34.235.205.203.

We found that the score of the latest article(s) can be used to predict volatility, under the assumption that the articles affect the stock price in the first place.
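Applying this requires knowing which articles are the latest. The table has no publication-date column, and a `SELECT` without `ORDER BY` returns rows in no guaranteed order, but the Bloomberg URLs in our data embed a `YYYY-MM-DD` date. A minimal sketch, assuming every stored URL follows that pattern (rows without a date are dropped); `article_date` and `latest_articles` are hypothetical helpers, and the sample rows are taken from the records above.

```python
import re
from datetime import date

# Matches the /YYYY-MM-DD/ segment in URLs such as
# https://www.bloomberg.com/news/articles/2017-12-11/...
DATE_IN_URL = re.compile(r"/(\d{4})-(\d{2})-(\d{2})/")


def article_date(url):
    """Return the date embedded in a Bloomberg URL, or None if there is none."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    return date(*(int(part) for part in match.groups()))


def latest_articles(rows, n=3):
    """Return the n most recent rows (dicts with article_url and score), newest first."""
    dated = [(article_date(row["article_url"]), row) for row in rows]
    dated = [(day, row) for day, row in dated if day is not None]
    dated.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in dated[:n]]


rows = [
    {"article_url": "https://www.bloomberg.com/news/articles/2017-12-11/a-manager-of-42-billion-fears-bubble-in-world-s-biggest-stocks", "score": 0.1109285653},
    {"article_url": "https://www.bloomberg.com/politics/articles/2017-12-08/sex-trafficking-bill-soothes-a-tech-worry-with-focus-on-intent", "score": 0.784122467},
    {"article_url": "https://www.bloomberg.com/news/articles/2017-12-12/u-s-fuels-the-world-as-shale-boom-powers-record-oil-exports", "score": 0.2274275422},
]
for row in latest_articles(rows, n=2):
    print(row["article_url"], row["score"])
```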