# Stock Analysis Notebook

This Jupyter notebook is designed to analyze stocks based on various filters and sentiment analysis of news articles. 

## Dependencies Installation

First, we need to install the necessary dependencies. Uncomment the lines below if you haven't installed these packages yet.



In [1]:
'''
%pip install finvizfinance pandas transformers yfinance goose3 requests ipywidgets torch tensorflow nltk tf-keras
from IPython.display import display, Javascript
display(Javascript('IPython.notebook.kernel.restart()'))

import nltk
nltk.download('punkt')

'''

"\n%pip install finvizfinance pandas transformers yfinance goose3 requests ipywidgets torch tensorflow nltk\nfrom IPython.display import display, Javascript\ndisplay(Javascript('IPython.notebook.kernel.restart()'))\n\nimport nltk\nnltk.download('punkt')\n\n"



## Import Libraries

Next, we import the necessary libraries for our analysis.



In [2]:
# Standard library imports
from datetime import datetime, timedelta
import warnings
from requests.exceptions import HTTPError

from finvizfinance.screener.overview import Overview # type: ignore
from finvizfinance.quote import finvizfinance        # type: ignore
from IPython.display import display                  # type: ignore
import pandas as pd                                  # type: ignore
from transformers import pipeline                    # type: ignore
import yfinance as yf                                # type: ignore
from goose3 import Goose                             # type: ignore
from requests import get                             # type: ignore
from nltk.tokenize import sent_tokenize              # type: ignore
from transformers import AutoTokenizer               # type: ignore
from bs4 import BeautifulSoup                        # type: ignore
import csv
import os
warnings.simplefilter(action='ignore', category=FutureWarning)
pd.set_option('display.max_colwidth', None) # Display full text in pandas dataframe / no line wrapping



## Stock Filtering

We define our filters for selecting stocks. These filters can be modified as per your requirements.



In [3]:
FILTERS_DICT = {
    'Performance': 'Today +10%',       # Day increase 10%
    'Relative Volume': 'Over 5',       # High Relative Volume
    #'Price': 'Under $20',             # Price under 20 USD / Ross Cameron's filter
    'Price': 'Under $50',             # Price under 20 USD / Ross Cameron's filter

    'Float': 'Under 10M',              # Float under / Ross Cameron's filter

    #'Price': '$10 to $50',             # Price 10-100 USD / Andrew Aziz's filter
    #'Float': 'Under 100M'             # Float under / Andrew Aziz's filter (float between 20M and 500M)
}


NEWS_AGE_LIMIT_IN_DAYS = 5 # Limit of days for the news to be considered relevant
POST_FILTER = False

# Alternative filtering to consider:
'''
FILTERS_DICT = {'Debt/Equity':'Under 1',                 # Positive Operating Margin
                'PEG':'Low (<1)',                        # Debt-to-Equity ratio under 1
                'Operating Margin':'Positive (>0%)',     # Low P/B (under 1)
                'P/B':'Low (<1)',                        # Low P/E ratio (under 15)
                'P/E':'Low (<15)',                       # Low PEG ratio (under 1)
                'InsiderTransactions':'Positive (>0%)<'} # Positive Insider Transactions
'''


# The filters and general manual link for the finvizfinance library: https://finvizfinance.readthedocs.io/_/downloads/en/latest/pdf/ 
# Possible filters can be found by running the following code:

#from finvizfinance.screener.overview import Overview # type: ignore
#foverview = Overview()    # Create Overview object
#foverview.get_filters()   # Get list of all possible filters

# And after to see the possible options for a filter, run the following code:
#foverview.get_filter_options('Float') # Get list of all possible options for a filter, example on 'Relative Volume'


"\nFILTERS_DICT = {'Debt/Equity':'Under 1',                 # Positive Operating Margin\n                'PEG':'Low (<1)',                        # Debt-to-Equity ratio under 1\n                'Operating Margin':'Positive (>0%)',     # Low P/B (under 1)\n                'P/B':'Low (<1)',                        # Low P/E ratio (under 15)\n                'P/E':'Low (<15)',                       # Low PEG ratio (under 1)\n                'InsiderTransactions':'Positive (>0%)<'} # Positive Insider Transactions\n"



## Fetching Filtered Stocks

We define a function `get_filtered_stocks()` to fetch the stocks that meet our filter criteria.



In [4]:
def get_filtered_stocks():
    """
    Returns a dataframe of tickers with user-defined filters applied.:
    """
    
    foverview = Overview()
    foverview.set_filter(filters_dict=FILTERS_DICT)
    df_undervalued_overview = foverview.screener_view()
    if not df_undervalued_overview.empty:  # Use .empty to check if DataFrame is empty
        df_undervalued_overview.drop('P/E', axis=1, inplace=True) # Drop P/E column as it is not relevant
        if not os.path.exists('out'): #ensures you have an 'out' folder ready
            os.makedirs('out')
        df_undervalued_overview.to_csv('out/Overview.csv', index=False)
        #display(df_undervalued_overview)
        return df_undervalued_overview
    
    else:  
        print('No stocks found with the given filters')
        return None

In [5]:



def post_filtering_stocks(df_undervalued_overview):
    if POST_FILTER:
        # MIN_FLOAT = '20M' # Minimum float to consider
        # MAX_FLOAT = '500M' # Maximum float to consider
        # MIN_MARKET_CAP = '300M' # Minimum market cap to consider
        # MAX_MARKET_CAP = '2B' # Maximum market cap to consider

        MIN_FLOAT = '1K' # Minimum float to consider
        MAX_FLOAT = '10B' # Maximum float to consider
        MIN_MARKET_CAP = '10M' # Minimum market cap to consider
        MAX_MARKET_CAP = '200B' # Maximum market cap to consider
    else:
        return df_undervalued_overview
    
    def convert_to_int(s): # Function to convert string to integer
        return int(float(s[:-1]) * {'B': 1e9, 'M': 1e6, 'K': 1e3, '': 1}[s[-1]])
    
    # Filter stocks based on Market Cap:
    df_undervalued_overview = df_undervalued_overview[ (df_undervalued_overview['Market Cap'] >= convert_to_int(MIN_MARKET_CAP)) 
                                                      & (df_undervalued_overview['Market Cap'] <= convert_to_int(MAX_MARKET_CAP))]
    
    
    if len(df_undervalued_overview) == 0:
        print("No stocks found with the given filters. Please modify the filters and try again.")
        return
    if len(df_undervalued_overview) > 20:   # If more than 20 stocks are found, sort by Change and keep only the top 20
        df_undervalued_overview = df_undervalued_overview.sort_values(by='Change', ascending=False) # Sort by Change
        df_undervalued_overview = df_undervalued_overview.iloc[20:].index # Keep only the top 20 stocks, API limitation
    
    df_undervalued_overview['Float'] = float('nan')
    for ticker in df_undervalued_overview['Ticker']:
        stock = finvizfinance(ticker)
        ticker_fundament = stock.ticker_fundament(ticker)
        df_undervalued_overview.loc[df_undervalued_overview['Ticker'] == ticker, 'Float'] = convert_to_int(ticker_fundament['Shs Float'])

    # Filter stocks based on Float:
    df_undervalued_overview = df_undervalued_overview[ (df_undervalued_overview['Float'] >= convert_to_int(MIN_FLOAT)) 
                                                      & (df_undervalued_overview['Float'] <= convert_to_int(MAX_FLOAT))]
    
    
    display(df_undervalued_overview)
    return df_undervalued_overview



## Sentiment Analysis

We define a function `get_ticker_news_sentiment(ticker)` to perform sentiment analysis on the news articles of a given ticker.



In [6]:

def get_recent_ticker_news(ticker):
    # Get news articles
    yf_ticker = yf.Ticker(ticker)
    
    try:
        news_list = yf_ticker.get_news()
    except:
        print(f"Error getting news for ticker {ticker}")
        return
    
    headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    extractor = Goose()
    
    titles, links, times, texts = [], [], [], []

    for dic in news_list:        
        try:
            response = get(dic['link'], headers=headers)
            response.raise_for_status()  # Raises a HTTPError if the status is 4xx, 5xx
        except HTTPError as http_err:
            print(f'HTTP error occurred: {http_err}')  # Python 3.6
        except Exception as err:
            print(f'Other error occurred: {err}')  # Python 3.6
        else:
            pass
        
        # Append the values to the respective lists
        titles.append(dic['title'])                                             # Article title
        links.append(dic['link'])                                               # Article link                         
        times.append(datetime.fromtimestamp(dic['providerPublishTime']))        # Article publish time
        texts.append(extractor.extract(raw_html=response.content).cleaned_text) # Article text
    
    #news_df = pd.DataFrame({'ticker': ticker, 'time': times, 'title': titles, 'text': texts, 'link': links})
    news_df = pd.DataFrame({'ticker': [ticker]*len(times), 'time': times, 'title': titles, 'text': texts, 'link': links})
    
    # Filter out news older than ""news_age_limit_in_days" days
    news_df = news_df[news_df['time'] > datetime.now() - timedelta(days=NEWS_AGE_LIMIT_IN_DAYS)]
    
    return news_df



def get_ticker_news_sentiment(stocknews):
    ALLOW_TOKENIZATION = True # True: the model will feed the full article into the model in chunks of 512 tokens, 
    #                            False: the model will consider only the first sentences of the article until the total number of tokens does not exceed 512

    tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
    pipe = pipeline("text-classification", model="ProsusAI/finbert")

    # Initialize a list for the sentiment scores
    sentiments = []
    sentiment_scores = []

    for _, row in stocknews.iterrows():
        text = row['text']

        # Model max input is 512 tokens, so we need to split the text into chunks of 512 tokens
        if ALLOW_TOKENIZATION: # feed into model in chunks
            inputs = tokenizer.encode_plus(
                text,
                max_length=510,
                truncation='longest_first',
                padding='max_length',
                return_tensors='pt',
            )

            input_ids = inputs["input_ids"].tolist()[0]
            new_text = tokenizer.decode(input_ids)
        else: # feed into model only the first sentences (until 512 tokens)
            sentences = sent_tokenize(text)
            new_text = ''
            for sentence in sentences:
                if len(tokenizer.encode(new_text)) + len(tokenizer.encode(sentence)) <= 512:
                    new_text += ' ' + sentence
                else:
                    break

        results = pipe(new_text) # run the model

        # Get the sentiment score from the results and append it to the list        
        sentiment = results[0]['label']
        sentiment_score = results[0]['score'] if sentiment == 'positive' else -results[0]['score']
        
        sentiments.append(sentiment)
        sentiment_scores.append(sentiment_score) # instead assigning positive etc values, assign the score

    # Add the sentiment scores to the DataFrame
    stocknews['sentiment'] = sentiments
    stocknews['sentiment_score'] = sentiment_scores

    return stocknews






## Generate CSV

We define a function `generate_csv(sentiment,ticker)` to generate a CSV file for each ticker's sentiment analysis.



In [7]:
def generate_csv(stocknews, ticker):
    # Select all columns except 'text'
    stocknews = stocknews.loc[:, stocknews.columns != 'text']
    stocknews.to_csv(f'out/{ticker}.csv', index=False)



## Fetch Sentiments and Generate CSVs

We fetch the sentiments for each undervalued stock and generate a CSV file for each.



In [8]:
df_undervalued_overview = get_filtered_stocks()
df_undervalued_overview = post_filtering_stocks(df_undervalued_overview)
list_undervalued_tickers = df_undervalued_overview['Ticker'].to_list()

df_undervalued_overview

[Info] loading page [##############################] 1/1 

Unnamed: 0,Ticker,Company,Sector,Industry,Country,Market Cap,Price,Change,Volume
0,AILE,iLearningEngines Inc.,Technology,Software - Infrastructure,USA,75360000.0,6.48,0.1289,413116.0
1,BWAQ,Blue World Acquisition Corp,Financial,Shell Companies,USA,50080000.0,9.04,0.3255,1645831.0
2,KWE,KWESST Micro Systems Inc.,Industrials,Aerospace & Defense,Canada,4250000.0,1.05,0.2033,35180214.0
3,PIXY,ShiftPixy Inc,Industrials,Staffing & Employment Services,USA,10950000.0,1.62,0.1096,19130828.0
4,TRIB,Trinity Biotech Plc ADR,Healthcare,Medical Devices,Ireland,22020000.0,2.88,0.2639,236679.0
5,WHLR,Wheeler Real Estate Investment Trust Inc,Real Estate,REIT - Retail,USA,6340000.0,2.24,0.4268,4963749.0


In [9]:

stocknews_list = []
for ticker in list_undervalued_tickers:
    stocknews = get_recent_ticker_news(ticker)
    stocknews = get_ticker_news_sentiment(stocknews)

    if stocknews is not None:
        stocknews_list.append(stocknews)
        #generate_csv(stocknews,ticker)
    else:
        print(f'No news found for {ticker}')
    print(f'{ticker} News sentiment analysis done, {len(list_undervalued_tickers) - list_undervalued_tickers.index(ticker) - 1} stock tickers left')	


tokenizer_config.json:   0%|          | 0.00/252 [00:00<?, ?B/s]

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


config.json:   0%|          | 0.00/758 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

RuntimeError: Failed to import transformers.models.bert.modeling_tf_bert because of the following error (look up to see its traceback):
Your currently installed version of Keras is Keras 3, but this is not yet supported in Transformers. Please install the backwards-compatible tf-keras package with `pip install tf-keras`.



## Display Sentiments

Finally, we display the sentiments for each ticker.



In [None]:

for ticker in stocknews_list:
    ticker = ticker.loc[:, ticker.columns != 'text']
    display(ticker)

Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,MLGO,2024-06-04 10:10:00,MicroAlgo Inc. (NASDAQ: MLGO) Announced to Jointly Establish a Micro-Consciousness Quantum Research Center With WIMI (NASDAQ: WIMI),https://finance.yahoo.com/news/microalgo-inc-nasdaq-mlgo-announced-081000969.html,neutral,-0.870487


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,MRIN,2024-06-03 23:10:00,Marin Software Announces Expanded Amazon Integration to Unlock Channel for All Advertisers,https://finance.yahoo.com/news/marin-software-announces-expanded-amazon-211000730.html,positive,0.631122


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,SILO,2024-06-04 14:41:00,Silo Pharma Submits Pre-Investigational New Drug Application to FDA for SPC-15 as a Treatment for PTSD and Anxiety,https://finance.yahoo.com/news/silo-pharma-submits-pre-investigational-124100723.html,positive,0.579352


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,SSY,2024-06-03 23:20:00,"SunLink Health Systems, Inc. Announces Sale of Trace Extended Care & Rehab",https://finance.yahoo.com/news/sunlink-health-systems-inc-announces-212000561.html,neutral,-0.906956




## Additional Code

The following code snippets are not integrated into the main flow of the notebook but can be used for additional analysis.



In [None]:
'''
quote = finvizfinance('SGE')
df = quote.ticker_inside_trader()
from datetime import datetime
today = datetime.today().date()
df = quote.ticker_news()
df = df[df['Date'].dt.date == today]
df
df = quote.ticker_fundament()
df
'''

"\nquote = finvizfinance('SGE')\ndf = quote.ticker_inside_trader()\nfrom datetime import datetime\ntoday = datetime.today().date()\ndf = quote.ticker_news()\ndf = df[df['Date'].dt.date == today]\ndf\ndf = quote.ticker_fundament()\ndf\n"