💸 Financial Sentiments Scraper

🗒️ Summary

This project scrapes financial news from the web (Yahoo! Finance in particular) for finance-related topics such as stocks/tickers. The scraping is done with the requests and BeautifulSoup libraries, a simple and well-established approach. The scraped texts come back as long paragraphs, which are summarized with Pegasus, a Hugging Face transformer fine-tuned specifically for financial texts. Sentiment analysis is then carried out with the transformers pipeline API. Finally, the decoded summaries and sentiment scores are compiled and exported to a CSV file.

📱 Example Output

| Topic | Summary | Sentiment | Confidence |
| --- | --- | --- | --- |
| TATA | Tata Communications to gain industry-proven platform with strong capabilities and scale. | Positive | 0.99976 |
| HDFC | Shareholders in HDFC will receive 42 shares of the bank for every 25 shares. | Positive | 0.55101 |
| HDFC | HDFC to stop issuing debt, rely on bank deposits. Yields could fall about 10 basis points. | Negative | 0.97710 |

📑 Code Snippets
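The snippets below assume the following setup. The topic list here is only an illustrative example; substitute whichever tickers or topics you want to track:

import re

import requests
from bs4 import BeautifulSoup

topics = ['TATA', 'HDFC']  # example topics; not fixed by the project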

1. Search for URLs:

def search_urls(topic):
    # Query Google News for Yahoo! Finance articles on the given topic
    search_url = "https://www.google.com/search?q=yahoo+finance+{}&tbm=nws".format(topic)
    r = requests.get(search_url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Collect every anchor's href; skip anchors that have none
    atags = soup.find_all('a')
    hrefs = [link.get('href') for link in atags if link.get('href')]
    return hrefs

raw_urls = {topic: search_urls(topic) for topic in topics}
2. Remove unwanted URLs (a small demo follows the snippet):

excludewords = ['maps', 'policies', 'preferences', 'accounts', 'support']

def remove_urls(urls, excludewords):
    # Keep only article links, dropping Google's own navigation pages
    valid = []
    for url in urls:
        if 'https://' in url and not any(exclude_word in url for exclude_word in excludewords):
            # Unwrap the Google redirect and strip trailing query parameters
            result = re.findall(r'(https?://\S+)', url)[0].split('&')[0]
            valid.append(result)
    return list(set(valid))  # deduplicate

cleaned_urls = {topic: remove_urls(raw_urls[topic], excludewords) for topic in topics}
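For illustration, here is how a hypothetical Google-wrapped result href is reduced to a bare article URL (the input below is made up):

# Google wraps result links as '/url?q=<target>&...'
demo = remove_urls(['/url?q=https://finance.yahoo.com/news/example.html&sa=U'], excludewords)
# demo == ['https://finance.yahoo.com/news/example.html']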
3. Search and scrape through the cleaned URLs:

def search_scrape(links):
    contents = []
    for link in links:
        r = requests.get(link)
        soup = BeautifulSoup(r.text, 'html.parser')
        # Join all paragraph text and keep the first 300 words so the
        # article fits comfortably within the summarizer's input limit
        paragraphs = soup.find_all('p')
        text = [para.text for para in paragraphs]
        words = ' '.join(text).split(' ')[:300]
        contents.append(' '.join(words))
    return contents

articles = {topic: search_scrape(cleaned_urls[topic]) for topic in topics}
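The summarization step assumes a Pegasus tokenizer and model have already been loaded. The original README links to the exact model, but the link is not preserved in this copy, so the checkpoint name below is an assumption; one publicly available Pegasus checkpoint fine-tuned for financial news is used as a stand-in:

from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# Assumed checkpoint: a Pegasus model fine-tuned for financial summarization
model_name = 'human-centered-summarization/financial-summarization-pegasus'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)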
4. Summarize the articles:

def summarize(articles):
    summaries = []
    for item in articles:
        # Tokenize (truncating to the model's maximum input length),
        # generate a beam-search summary, then decode it back to text
        input_ids = tokenizer.encode(item, return_tensors='pt', truncation=True)
        output = model.generate(input_ids, max_length=55, num_beams=5, early_stopping=True)
        summaries.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return summaries

summaries = {topic: summarize(articles[topic]) for topic in topics}
5. Sentiment analysis:

from transformers import pipeline

# Default sentiment-analysis pipeline; each result is a dict with a
# 'label' (POSITIVE/NEGATIVE) and a confidence 'score'
sentiment = pipeline('sentiment-analysis')

scores = {topic: sentiment(summaries[topic]) for topic in topics}
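The original snippets stop before the export step, so here is a minimal sketch of compiling the results and writing the CSV with Python's standard csv module (the output filename is arbitrary):

import csv

with open('financial_sentiments.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Topic', 'Summary', 'Sentiment', 'Confidence'])
    for topic in topics:
        for summary, score in zip(summaries[topic], scores[topic]):
            writer.writerow([topic, summary, score['label'], score['score']])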
