In [2]:
'''
tasks that we need to do are:
1. Data collection
2. Data preprocessing
3. Model Selection
4. Training the model - online vs offline models 
5. Backtesting 
6. Live deployement

But since we first need to create a simple working model, we will just focus on doing 
sentiment analysis first
So we need to collect the tools for doing sentiment analysis
'''

'\ntasks that we need to do are:\n1. Data collection\n2. Data preprocessing\n3. Model Selection\n4. Training the model - online vs offline models \n5. Backtesting \n6. Live deployement\n\nBut since we first need to create a simple working model, we will just focus on doing \nsentiment analysis first\nSo we need to collect the tools for doing sentiment analysis\n'

In [3]:
'''
Task1: To create a simple sentiment analyser
All the news APIs that make our lives easier are not available for the indian markets
So need to scrap the data from the web. We will be using the library BeautifulSoup for
web scraping and textblob to do sentiment analysis
'''

'\nTask1: To create a simple sentiment analyser\nAll the news APIs that make our lives easier are not available for the indian markets\nSo need to scrap the data from the web. We will be using the library BeautifulSoup for\nweb scraping and textblob to do sentiment analysis\n'

In [3]:
import requests
from bs4 import BeautifulSoup
from textblob import TextBlob

ModuleNotFoundError: No module named 'textblob'

In [None]:
'''
Testing whether scraping would work
'''
URL = 'https://www.moneycontrol.com/news/tags/companies.html'
STORE_FILE = 'latest_news.txt'

def fetch_news(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    news_container = soup.find('div', class_='topictab_cnot')
    item = news_container.find('div', class_='topictabpane')
    hmm = item.find('li', class_= 'clearfix')
    latest_news = hmm.find('h2').text.strip()
    news_link = hmm.find('a')['href']
    article_response = requests.get(news_link)
    article_soup = BeautifulSoup(article_response.text, 'html.parser')
    article_body = article_soup.find('div', class_='content_wrapper arti-flow')
    article_text = article_body.get_text(strip=True) if article_body else "No content found"
    print(article_text)
    # print(news_link)
    # print(latest_news)
    return article_text, news_link

def store_and_check_news(news_link):
    try:
        with open(STORE_FILE, 'r') as file:
            last_news = file.read().strip()
    except FileNotFoundError:
        last_news = ''
    if last_news != news_link:
        with open(STORE_FILE, 'w') as file:
            file.write(news_link)
        return True
    return False
# latest_news, news_link = fetch_news(URL)
# if store_and_check_news(news_link):
#     print(f"News found")


In [6]:
latest_news, news_link = fetch_news(URL)
print(f"Found")
analysis = TextBlob(latest_news)
sentiment = analysis.sentiment.polarity
sentiment_status = 'Positive' if sentiment > 0 else "Negative" if sentiment < 0 else "Neutral"
print("Sentiment: ", sentiment_status)
print(sentiment)

#print(f"Not Found")

UBS loses another senior India banker with strategy yet unclearUBS Group AG’s India operation is losing another senior executive to competitors, at a time when the firm is still charting its strategy to manage wealth in the country, according to people familiar with the matter.Aditya Goenka, a Mumbai-based managing director in structured lending, is leaving to join HSBC Holdings Plc in its commercial banking business, the people said, asking not to be named as the information is private.Story continues below AdvertisementRemove AdUBS, which exited India’s onshore wealth business about a decade ago, is seeking to lean on the Credit Suisse team it acquired to make a comeback there. While margins in India are slimmer than other markets, the firm sees there is opportunity in managing both the business and personal assets of clients benefiting from the boom in the stock market and economy.Still, even as UBS draws up its plans in India, it has become a hunting ground for wealth managers and 

In [7]:
'''
Now that we have done the sentiment analysis, we want to know what are the stocks 
associated with this company. Thus we will have to do Named Entity Recognition.
Need to match this with the list of NSE-listed companies.
We will use the library of spacy for this task

'''
import spacy

In [None]:
'''
Now, only trying to do the sentiment analysis and get the name of the company.
Spacy will get all the organisation names in the article, but this wont be matched to 
the stock name on NSE. Also a major issue is there may be 2 companies one having positive
news but the sentiment analysis is independent of the org names and hence may catch a 
false sentiment corresponding the one or more of the companies in the article.
Thus we need specific NLP model for finance and not just a general NLP model. But this is
just a prototype
'''
nlp = spacy.load("en_core_web_sm")
article_text = latest_news
doc = nlp(article_text)
def analyze_sentiment(text):
    #This function is same as the one that we had defined earlier.
    analysis = TextBlob(text)
    return analysis.sentiment.polarity
company_sentiments = []
senti_dict = {}
for ent in doc.ents:
    if ent.label_ == 'ORG':
        sentence = ent.sent
        sentiment = analyze_sentiment(sentence.text)
        company_sentiments.append((ent.text, sentiment, sentence.text))
        if ent.text not in senti_dict:
            senti_dict[ent.text] = 0
        senti_dict[ent.text] += sentiment

for company, sentiment, sentence in company_sentiments:
    print((f"Company: {company}, Sentiment : {sentiment}, Context: {sentence}"))
for i in senti_dict:
    print(f"Company : {i}, Total sentiment : {senti_dict[i]}")

Company: UBS, Sentiment : -0.041666666666666664, Context: UBS loses another senior India banker with strategy yet unclearUBS Group AG’s India operation is losing another senior executive to competitors, at a time when the firm is still charting its strategy to manage wealth in the country, according to people familiar with the matter.
Company: Group AG’s, Sentiment : -0.041666666666666664, Context: UBS loses another senior India banker with strategy yet unclearUBS Group AG’s India operation is losing another senior executive to competitors, at a time when the firm is still charting its strategy to manage wealth in the country, according to people familiar with the matter.
Company: HSBC Holdings Plc, Sentiment : 0.0, Context: Aditya Goenka, a Mumbai-based managing director in structured lending, is leaving to join HSBC Holdings Plc in its commercial banking business, the people said, asking not to be named as the information is private.
Company: AdvertisementRemove AdUBS, Sentiment : 0.

In [20]:
'''
Now we need to find the closest symbol listed on NSE compared to the name that we find
using spacy. 
We will only be implementing basic functionality for now, so will only match with NSE
symbols. Task for later: expand the exchanges list
'''
import pickle
from thefuzz import process
with open('company_data.pkl', 'rb') as f:
    company_dict = pickle.load(f)
def find_closest_company_name(query_name):
    closest_name, score = process.extractOne(query_name,company_dict.keys())
    return closest_name, company_dict[closest_name], score
query_name = "Ajay Punjabi"
match_name, match_symbol, match_score = find_closest_company_name(query_name)
print(f"Matched Company Name: {match_name}, Symbol: {match_symbol}, Score: {match_score}")



Matched Company Name: Punjab National Bank, Symbol: PNB, Score: 65
