# Stock Analysis Notebook

This Jupyter notebook is designed to analyze stocks based on various filters and sentiment analysis of news articles. 

## Dependencies Installation

First, we need to install the necessary dependencies. Uncomment the lines below if you haven't installed these packages yet.



In [1]:
'''
%pip install finvizfinance
%pip install pandas
%pip install transformers
%pip install yfinance
%pip install goose3
%pip install requests
%pip install ipywidgets
%pip install torch
%pip install tensorflow
%pip install nltk
import nltk
nltk.download('punkt')
'''

"\n%pip install finvizfinance\n%pip install pandas\n%pip install transformers\n%pip install yfinance\n%pip install goose3\n%pip install requests\n%pip install ipywidgets\n%pip install torch\n%pip install tensorflow\n%pip install nltk\nimport nltk\nnltk.download('punkt')\n"



## Import Libraries

Next, we import the necessary libraries for our analysis.



In [2]:
# Standard library imports
from datetime import datetime, timedelta
import warnings
from requests.exceptions import HTTPError

from finvizfinance.screener.overview import Overview # type: ignore
from finvizfinance.quote import finvizfinance        # type: ignore
from IPython.display import display                  # type: ignore
import pandas as pd                                  # type: ignore
from transformers import pipeline                    # type: ignore
import yfinance as yf                                # type: ignore
from goose3 import Goose                             # type: ignore
from requests import get                             # type: ignore
from nltk.tokenize import sent_tokenize              # type: ignore
from transformers import AutoTokenizer               # type: ignore
from bs4 import BeautifulSoup                        # type: ignore
import csv
import os
warnings.simplefilter(action='ignore', category=FutureWarning)
pd.set_option('display.max_colwidth', None) # Display full text in pandas dataframe / no line wrapping



## Stock Filtering

We define our filters for selecting stocks. These filters can be modified as per your requirements.



In [3]:
FILTERS_DICT = {
    'Performance': 'Today +10%',       # Day increase 10%
    'Relative Volume': 'Over 5',       # High Relative Volume
    #'Price': 'Under $20',             # Price under 20 USD / Ross Cameron's filter
    'Price': 'Under $50',             # Price under 20 USD / Ross Cameron's filter

    'Float': 'Under 10M',              # Float under / Ross Cameron's filter

    #'Price': '$10 to $50',             # Price 10-100 USD / Andrew Aziz's filter
    #'Float': 'Under 100M'             # Float under / Andrew Aziz's filter (float between 20M and 500M)
}


NEWS_AGE_LIMIT_IN_DAYS = 5 # Limit of days for the news to be considered relevant
POST_FILTER = False

# Alternative filtering to consider:
'''
FILTERS_DICT = {'Debt/Equity':'Under 1',                 # Positive Operating Margin
                'PEG':'Low (<1)',                        # Debt-to-Equity ratio under 1
                'Operating Margin':'Positive (>0%)',     # Low P/B (under 1)
                'P/B':'Low (<1)',                        # Low P/E ratio (under 15)
                'P/E':'Low (<15)',                       # Low PEG ratio (under 1)
                'InsiderTransactions':'Positive (>0%)<'} # Positive Insider Transactions
'''


# The filters and general manual link for the finvizfinance library: https://finvizfinance.readthedocs.io/_/downloads/en/latest/pdf/ 
# Possible filters can be found by running the following code:

#from finvizfinance.screener.overview import Overview # type: ignore
#foverview = Overview()    # Create Overview object
#foverview.get_filters()   # Get list of all possible filters

# And after to see the possible options for a filter, run the following code:
#foverview.get_filter_options('Float') # Get list of all possible options for a filter, example on 'Relative Volume'


"\nFILTERS_DICT = {'Debt/Equity':'Under 1',                 # Positive Operating Margin\n                'PEG':'Low (<1)',                        # Debt-to-Equity ratio under 1\n                'Operating Margin':'Positive (>0%)',     # Low P/B (under 1)\n                'P/B':'Low (<1)',                        # Low P/E ratio (under 15)\n                'P/E':'Low (<15)',                       # Low PEG ratio (under 1)\n                'InsiderTransactions':'Positive (>0%)<'} # Positive Insider Transactions\n"



## Fetching Filtered Stocks

We define a function `get_filtered_stocks()` to fetch the stocks that meet our filter criteria.



In [4]:
def get_filtered_stocks():
    """
    Returns a dataframe of tickers with user-defined filters applied.:
    """
    
    foverview = Overview()
    foverview.set_filter(filters_dict=FILTERS_DICT)
    df_undervalued_overview = foverview.screener_view()
    if not df_undervalued_overview.empty:  # Use .empty to check if DataFrame is empty
        df_undervalued_overview.drop('P/E', axis=1, inplace=True) # Drop P/E column as it is not relevant
        if not os.path.exists('out'): #ensures you have an 'out' folder ready
            os.makedirs('out')
        df_undervalued_overview.to_csv('out/Overview.csv', index=False)
        #display(df_undervalued_overview)
        return df_undervalued_overview
    
    else:  
        print('No stocks found with the given filters')
        return None

In [5]:



def post_filtering_stocks(df_undervalued_overview):
    if POST_FILTER:
        # MIN_FLOAT = '20M' # Minimum float to consider
        # MAX_FLOAT = '500M' # Maximum float to consider
        # MIN_MARKET_CAP = '300M' # Minimum market cap to consider
        # MAX_MARKET_CAP = '2B' # Maximum market cap to consider

        MIN_FLOAT = '1K' # Minimum float to consider
        MAX_FLOAT = '10B' # Maximum float to consider
        MIN_MARKET_CAP = '10M' # Minimum market cap to consider
        MAX_MARKET_CAP = '200B' # Maximum market cap to consider
    else:
        return df_undervalued_overview
    
    def convert_to_int(s): # Function to convert string to integer
        return int(float(s[:-1]) * {'B': 1e9, 'M': 1e6, 'K': 1e3, '': 1}[s[-1]])
    
    # Filter stocks based on Market Cap:
    df_undervalued_overview = df_undervalued_overview[ (df_undervalued_overview['Market Cap'] >= convert_to_int(MIN_MARKET_CAP)) 
                                                      & (df_undervalued_overview['Market Cap'] <= convert_to_int(MAX_MARKET_CAP))]
    
    
    if len(df_undervalued_overview) == 0:
        print("No stocks found with the given filters. Please modify the filters and try again.")
        return
    if len(df_undervalued_overview) > 20:   # If more than 20 stocks are found, sort by Change and keep only the top 20
        df_undervalued_overview = df_undervalued_overview.sort_values(by='Change', ascending=False) # Sort by Change
        df_undervalued_overview = df_undervalued_overview.iloc[20:].index # Keep only the top 20 stocks, API limitation
    
    df_undervalued_overview['Float'] = float('nan')
    for ticker in df_undervalued_overview['Ticker']:
        stock = finvizfinance(ticker)
        ticker_fundament = stock.ticker_fundament(ticker)
        df_undervalued_overview.loc[df_undervalued_overview['Ticker'] == ticker, 'Float'] = convert_to_int(ticker_fundament['Shs Float'])

    # Filter stocks based on Float:
    df_undervalued_overview = df_undervalued_overview[ (df_undervalued_overview['Float'] >= convert_to_int(MIN_FLOAT)) 
                                                      & (df_undervalued_overview['Float'] <= convert_to_int(MAX_FLOAT))]
    
    
    display(df_undervalued_overview)
    return df_undervalued_overview



## Sentiment Analysis

We define a function `get_ticker_news_sentiment(ticker)` to perform sentiment analysis on the news articles of a given ticker.



In [6]:

def get_recent_ticker_news(ticker):
    # Get news articles
    yf_ticker = yf.Ticker(ticker)
    
    try:
        news_list = yf_ticker.get_news()
    except:
        print(f"Error getting news for ticker {ticker}")
        return
    
    headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    extractor = Goose()
    
    titles, links, times, texts = [], [], [], []

    for dic in news_list:        
        try:
            response = get(dic['link'], headers=headers)
            response.raise_for_status()  # Raises a HTTPError if the status is 4xx, 5xx
        except HTTPError as http_err:
            print(f'HTTP error occurred: {http_err}')  # Python 3.6
        except Exception as err:
            print(f'Other error occurred: {err}')  # Python 3.6
        else:
            pass
        
        # Append the values to the respective lists
        titles.append(dic['title'])                                             # Article title
        links.append(dic['link'])                                               # Article link                         
        times.append(datetime.fromtimestamp(dic['providerPublishTime']))        # Article publish time
        texts.append(extractor.extract(raw_html=response.content).cleaned_text) # Article text
    
    #news_df = pd.DataFrame({'ticker': ticker, 'time': times, 'title': titles, 'text': texts, 'link': links})
    news_df = pd.DataFrame({'ticker': [ticker]*len(times), 'time': times, 'title': titles, 'text': texts, 'link': links})
    
    # Filter out news older than ""news_age_limit_in_days" days
    news_df = news_df[news_df['time'] > datetime.now() - timedelta(days=NEWS_AGE_LIMIT_IN_DAYS)]
    
    return news_df



def get_ticker_news_sentiment(stocknews):
    ALLOW_TOKENIZATION = True # True: the model will feed the full article into the model in chunks of 512 tokens, 
    #                            False: the model will consider only the first sentences of the article until the total number of tokens does not exceed 512

    tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
    pipe = pipeline("text-classification", model="ProsusAI/finbert")

    # Initialize a list for the sentiment scores
    sentiments = []
    sentiment_scores = []

    for _, row in stocknews.iterrows():
        text = row['text']

        # Model max input is 512 tokens, so we need to split the text into chunks of 512 tokens
        if ALLOW_TOKENIZATION: # feed into model in chunks
            inputs = tokenizer.encode_plus(
                text,
                max_length=510,
                truncation='longest_first',
                padding='max_length',
                return_tensors='pt',
            )

            input_ids = inputs["input_ids"].tolist()[0]
            new_text = tokenizer.decode(input_ids)
        else: # feed into model only the first sentences (until 512 tokens)
            sentences = sent_tokenize(text)
            new_text = ''
            for sentence in sentences:
                if len(tokenizer.encode(new_text)) + len(tokenizer.encode(sentence)) <= 512:
                    new_text += ' ' + sentence
                else:
                    break

        results = pipe(new_text) # run the model

        # Get the sentiment score from the results and append it to the list        
        sentiment = results[0]['label']
        sentiment_score = results[0]['score'] if sentiment == 'positive' else -results[0]['score']
        
        sentiments.append(sentiment)
        sentiment_scores.append(sentiment_score) # instead assigning positive etc values, assign the score

    # Add the sentiment scores to the DataFrame
    stocknews['sentiment'] = sentiments
    stocknews['sentiment_score'] = sentiment_scores

    return stocknews






## Generate CSV

We define a function `generate_csv(sentiment,ticker)` to generate a CSV file for each ticker's sentiment analysis.



In [7]:
def generate_csv(stocknews, ticker):
    # Select all columns except 'text'
    stocknews = stocknews.loc[:, stocknews.columns != 'text']
    stocknews.to_csv(f'out/{ticker}.csv', index=False)



## Fetch Sentiments and Generate CSVs

We fetch the sentiments for each undervalued stock and generate a CSV file for each.



In [8]:
df_undervalued_overview = get_filtered_stocks()
df_undervalued_overview = post_filtering_stocks(df_undervalued_overview)
list_undervalued_tickers = df_undervalued_overview['Ticker'].to_list()

df_undervalued_overview

[Info] loading page [##############################] 1/1 

Unnamed: 0,Ticker,Company,Sector,Industry,Country,Market Cap,Price,Change,Volume
0,AFBI,Affinity Bancshares Inc,Financial,Banks - Regional,USA,132570000.0,20.65,0.2179,17035.0
1,BNRG,Brenmiller Energy Ltd,Utilities,Utilities - Renewable,Israel,31270000.0,1.75,0.2376,1242292.0
2,CISS,C3is Inc,Industrials,Marine Shipping,Greece,10880000.0,1.87,0.1402,9381373.0
3,DHAC,Digital Health Acquisition Corp,Financial,Shell Companies,USA,82220000.0,22.84,0.5227,1014420.0
4,ERNA,Eterna Therapeutics Inc,Healthcare,Biotechnology,USA,11700000.0,2.16,0.1817,5647.0
5,FOXO,FOXO Technologies Inc,Healthcare,Health Information Services,USA,3490000.0,0.39,0.1462,464824.0
6,GNPX,Genprex Inc,Healthcare,Biotechnology,USA,5960000.0,2.84,0.1736,308247.0
7,LUCY,Innovative Eyewear Inc,Healthcare,Medical Instruments & Supplies,USA,20710000.0,1.18,0.5,33843854.0
8,VCNX,Vaccinex Inc,Healthcare,Biotechnology,USA,11150000.0,7.85,0.5096,30818.0
9,XHG,XChange Tec.Inc. ADR,Real Estate,Real Estate Services,China,4650000.0,1.08,0.554,16150452.0


In [9]:

stocknews_list = []
for ticker in list_undervalued_tickers:
    stocknews = get_recent_ticker_news(ticker)
    stocknews = get_ticker_news_sentiment(stocknews)

    if stocknews is not None:
        stocknews_list.append(stocknews)
        #generate_csv(stocknews,ticker)
    else:
        print(f'No news found for {ticker}')
    print(f'{ticker} News sentiment analysis done, {len(list_undervalued_tickers) - list_undervalued_tickers.index(ticker) - 1} stock tickers left')	



AFBI News sentiment analysis done, 9 stock tickers left
BNRG News sentiment analysis done, 8 stock tickers left
CISS News sentiment analysis done, 7 stock tickers left
DHAC News sentiment analysis done, 6 stock tickers left
ERNA News sentiment analysis done, 5 stock tickers left
FOXO News sentiment analysis done, 4 stock tickers left
GNPX News sentiment analysis done, 3 stock tickers left
LUCY News sentiment analysis done, 2 stock tickers left
VCNX News sentiment analysis done, 1 stock tickers left
XHG News sentiment analysis done, 0 stock tickers left




## Display Sentiments

Finally, we display the sentiments for each ticker.



In [10]:

for ticker in stocknews_list:
    ticker = ticker.loc[:, ticker.columns != 'text']
    display(ticker)

Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,AFBI,2024-05-30 22:15:00,APCU/Center Parc Credit Union Announces Definitive Agreement to Acquire Affinity Bank,https://finance.yahoo.com/news/apcu-center-parc-credit-union-201500571.html,positive,0.86249


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,CISS,2024-05-28 15:20:00,"C3is Inc. reports Revenue of $12.8 million, Net Income of $3.8 million, and financial and operating results for the quarter ended March 31, 2024",https://finance.yahoo.com/news/c3is-inc-reports-revenue-12-132000325.html,positive,0.737764


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,GNPX,2024-05-30 14:31:00,Genprex to Present at the 2024 BIO International Convention,https://finance.yahoo.com/news/genprex-present-2024-bio-international-123100674.html,neutral,-0.897014


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score
0,LUCY,2024-05-30 15:19:00,Eddie Bauer Smart Eyewear collection includes voice access to ChatGPT,https://finance.yahoo.com/m/3c7e30b0-00fe-31d4-a0df-6370100d4cd6/eddie-bauer-smart-eyewear.html,neutral,-0.725734
1,LUCY,2024-05-29 22:15:00,"Innovative Eyewear, Inc. Announces Closing of $2.5 Million Registered Direct Offering Priced At-the-Market Under Nasdaq Rules",https://finance.yahoo.com/news/innovative-eyewear-inc-announces-closing-201500305.html,neutral,-0.953569
2,LUCY,2024-05-29 19:22:24,Top Midday Gainers,https://finance.yahoo.com/news/top-midday-gainers-172224856.html,neutral,-0.630008
3,LUCY,2024-05-29 17:12:40,Innovative Eyewear Launches Rimless Eddie Bauer Smart Eyewear; Shares Skyrocket,https://finance.yahoo.com/news/innovative-eyewear-launches-rimless-eddie-151240033.html,neutral,-0.630008
4,LUCY,2024-05-29 15:00:00,"Innovative Eyewear, Inc. Launches Eddie Bauer® Smart Eyewear with ChatGPT",https://finance.yahoo.com/news/innovative-eyewear-inc-launches-eddie-130000736.html,positive,0.568539
5,LUCY,2024-05-28 20:43:00,"Innovative Eyewear, Inc. Announces $2.5 Million Registered Direct Offering Priced At-the-Market Under Nasdaq Rules",https://finance.yahoo.com/news/innovative-eyewear-inc-announces-2-184300125.html,neutral,-0.95429


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score


Unnamed: 0,ticker,time,title,link,sentiment,sentiment_score




## Additional Code

The following code snippets are not integrated into the main flow of the notebook but can be used for additional analysis.



In [11]:
'''
quote = finvizfinance('SGE')
df = quote.ticker_inside_trader()
from datetime import datetime
today = datetime.today().date()
df = quote.ticker_news()
df = df[df['Date'].dt.date == today]
df
df = quote.ticker_fundament()
df
'''

"\nquote = finvizfinance('SGE')\ndf = quote.ticker_inside_trader()\nfrom datetime import datetime\ntoday = datetime.today().date()\ndf = quote.ticker_news()\ndf = df[df['Date'].dt.date == today]\ndf\ndf = quote.ticker_fundament()\ndf\n"