# Financial Information and Programming, IT, Algorithms and Data Science

Neftalí Valdez

<a href="http://twitter.com/neftalivldz" target="_blank">@neftalivldz</a> | <a href="mailto:nvaldez@tec.mx">nvaldez@tec.mx</a>

References

<a href="https://developers.refinitiv.com/en/article-catalog/article/using-ai-modeling-to-interpret-10-Q-filings" target="_blank">Original article by Nick Zincone</a>

<a href="https://pypi.org/project/sec-api/" target="_blank">SEC API</a>

<a href="https://huggingface.co/yiyanghkust/finbert-fls" target="_blank">Hugging Face: Forward-Looking Statements (finbert-fls)</a>


In [4]:
#!pip3 install transformers
#!pip3 install torch
#!conda install -c pytorch torchtext
#!conda install pytorch torchvision -c pytorch
#!pip3 install sec-api

In [1]:
import eikon as ek  # the Eikon Python wrapper package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
import configparser as cp
import datetime as dt
cf.set_config_file(offline=True)  # set the plotting mode to offline

In [2]:
# NLP package used to aid in text manipulation
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
nltk.download('punkt')

# Machine Learning modules used to prepare and measure text
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
import torch

# HTML text processing
from bs4 import BeautifulSoup

# Helper modules
import matplotlib.pyplot as plt
from tqdm.notebook import trange  # Progress bar

from sec_api import QueryApi

pd.set_option('display.max_colwidth', 60)

[nltk_data] Downloading package punkt to
[nltk_data]     /Users/neftalivaldez/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [3]:
cfg = cp.ConfigParser()
cfg.read('../refinitiv.cfg')
ek.set_app_key(cfg['eikon']['app_id'])

In [4]:
cfg = cp.ConfigParser()
cfg.read('../secapi.cfg')
sec = cfg['sec']['app_id']
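
The two cells above read API keys from local files that are not part of the notebook. A minimal sketch of the layout they appear to expect, assuming INI-style files with an `[eikon]` and a `[sec]` section that each hold an `app_id` entry. The key values are placeholders, and the two files are merged into one string here only for illustration:

```python
import configparser as cp

# Hypothetical contents of ../refinitiv.cfg and ../secapi.cfg, merged into
# one string for illustration; the real files are not shown in the notebook.
SAMPLE_CFG = """
[eikon]
app_id = YOUR_EIKON_APP_KEY

[sec]
app_id = YOUR_SEC_API_KEY
"""

cfg = cp.ConfigParser()
cfg.read_string(SAMPLE_CFG)  # read_string parses text instead of a file path

eikon_key = cfg['eikon']['app_id']
sec_key = cfg['sec']['app_id']
print(eikon_key, sec_key)
```

Note that `ConfigParser.read` silently skips files it cannot open, so a wrong path only surfaces later as a `KeyError` on `cfg['eikon']` or `cfg['sec']`.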


In [5]:
today = dt.date.today()
start = today - dt.timedelta(days=365.2*5)
print(today, start)

2023-04-25 2018-04-25


In [6]:
fields = ['TR.PortfolioConstituentName','TR.PortfolioWeight']
data, err = ek.get_data(['Portfolio(RETO2023_B)'],fields)
ric = data['Instrument'][3]

In [7]:
lista = ['TSLA.O', 'MSFT.O']
ric = lista[1]
ric

'MSFT.O'

In [8]:
data['Instrument']

0      BIMBOA.MX
1          ATT.L
2    CEMEXCPO.MX
3         TSLA.O
Name: Instrument, dtype: string

In [9]:
ric = 'MSFT.O'

In [10]:
#help(ek.get_symbology)

In [11]:
tick = ek.get_symbology(ric, from_symbol_type="RIC", to_symbol_type="ticker")['ticker'][0]
tick

'MSFT'

In [12]:
# Query the SEC filings index through the sec-api QueryApi.
#
# Retrieve Microsoft's 10-Q filings for the chosen date range. The
# "Management's Discussion" text is extracted from each filing in a later cell.

queryApi = QueryApi(api_key=sec)

query = {
  "query": { "query_string": {
      "query": "ticker:MSFT AND filedAt:{2022-01-01 TO 2023-12-31} AND formType:\"10-Q\""
    } },
  "from": "0",
  "size": "10",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

filings = queryApi.get_filings(query)

print(filings)

{'total': {'value': 4, 'relation': 'eq'}, 'query': {'from': 0, 'size': 10}, 'filings': [{'id': '3f455f9e50fd9b413801d596a91f6554', 'accessionNo': '0001564590-23-000733', 'cik': '789019', 'ticker': 'MSFT', 'companyName': 'MICROSOFT CORP', 'companyNameLong': 'MICROSOFT CORP (Filer)', 'formType': '10-Q', 'description': 'Form 10-Q - Quarterly report [Sections 13 or 15(d)]', 'filedAt': '2023-01-24T16:34:20-05:00', 'linkToTxt': 'https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/0001564590-23-000733.txt', 'linkToHtml': 'https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/0001564590-23-000733-index.htm', 'linkToXbrl': '', 'linkToFilingDetails': 'https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/msft-10q_20221231.htm', 'entities': [{'companyName': 'MICROSOFT CORP (Filer)', 'cik': '789019', 'irsNo': '911144442', 'stateOfIncorporation': 'WA', 'fiscalYearEnd': '0630', 'type': '10-Q', 'act': '34', 'fileNo': '001-37845', 'filmNo': '23548555', 'sic': '737
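
The query above hardcodes the ticker and the date range. One possible way to parameterise it is sketched below; `build_filing_query` is a hypothetical helper, not part of `sec-api`, and it only rebuilds the same dictionary shape used in the cell above:

```python
def build_filing_query(ticker, start, end, form_type="10-Q", size=10):
    """Build a query dict shaped like the one passed to get_filings above."""
    # Doubled braces in an f-string produce the literal { } of the range syntax
    query_string = (
        f'ticker:{ticker} AND filedAt:{{{start} TO {end}}} '
        f'AND formType:"{form_type}"'
    )
    return {
        "query": {"query_string": {"query": query_string}},
        "from": "0",
        "size": str(size),
        "sort": [{"filedAt": {"order": "desc"}}],
    }


query = build_filing_query("MSFT", "2022-01-01", "2023-12-31")
print(query["query"]["query_string"]["query"])
```

With this helper, the ticker obtained earlier from `ek.get_symbology` (`tick`) could be passed in place of the literal `"MSFT"`.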

In [13]:
filings['query']

{'from': 0, 'size': 10}

In [14]:
filings['filings']

[{'id': '3f455f9e50fd9b413801d596a91f6554',
  'accessionNo': '0001564590-23-000733',
  'cik': '789019',
  'ticker': 'MSFT',
  'companyName': 'MICROSOFT CORP',
  'companyNameLong': 'MICROSOFT CORP (Filer)',
  'formType': '10-Q',
  'description': 'Form 10-Q - Quarterly report [Sections 13 or 15(d)]',
  'filedAt': '2023-01-24T16:34:20-05:00',
  'linkToTxt': 'https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/0001564590-23-000733.txt',
  'linkToHtml': 'https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/0001564590-23-000733-index.htm',
  'linkToXbrl': '',
  'linkToFilingDetails': 'https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/msft-10q_20221231.htm',
  'entities': [{'companyName': 'MICROSOFT CORP (Filer)',
    'cik': '789019',
    'irsNo': '911144442',
    'stateOfIncorporation': 'WA',
    'fiscalYearEnd': '0630',
    'type': '10-Q',
    'act': '34',
    'fileNo': '001-37845',
    'filmNo': '23548555',
    'sic': '7372 Services-Prepackaged So

In [15]:
df = pd.json_normalize(filings['filings'])
df

Unnamed: 0,id,accessionNo,cik,ticker,companyName,companyNameLong,formType,description,filedAt,linkToTxt,linkToHtml,linkToXbrl,linkToFilingDetails,entities,documentFormatFiles,dataFiles,seriesAndClassesContractsInformation,periodOfReport
0,3f455f9e50fd9b413801d596a91f6554,0001564590-23-000733,789019,MSFT,MICROSOFT CORP,MICROSOFT CORP (Filer),10-Q,Form 10-Q - Quarterly report [Sections 13 or 15(d)],2023-01-24T16:34:20-05:00,https://www.sec.gov/Archives/edgar/data/789019/000156459...,https://www.sec.gov/Archives/edgar/data/789019/000156459...,,https://www.sec.gov/Archives/edgar/data/789019/000156459...,"[{'companyName': 'MICROSOFT CORP (Filer)', 'cik': '78901...","[{'sequence': '1', 'description': '10-Q', 'documentUrl':...","[{'sequence': '7', 'description': 'XBRL TAXONOMY EXTENSI...",[],2022-12-31
1,4a746e26f404299c8bb632a6c864b1fa,0001564590-22-035087,789019,MSFT,MICROSOFT CORP,MICROSOFT CORP (Filer),10-Q,Form 10-Q - Quarterly report [Sections 13 or 15(d)],2022-10-25T16:08:55-04:00,https://www.sec.gov/Archives/edgar/data/789019/000156459...,https://www.sec.gov/Archives/edgar/data/789019/000156459...,,https://www.sec.gov/Archives/edgar/data/789019/000156459...,"[{'companyName': 'MICROSOFT CORP (Filer)', 'cik': '78901...","[{'sequence': '1', 'description': '10-Q', 'documentUrl':...","[{'sequence': '9', 'description': 'XBRL TAXONOMY EXTENSI...",[],2022-09-30
2,51ce4292e8873deb1d7be191dbd49eff,0001564590-22-015675,789019,MSFT,MICROSOFT CORP,MICROSOFT CORP (Filer),10-Q,Form 10-Q - Quarterly report [Sections 13 or 15(d)],2022-04-26T16:08:55-04:00,https://www.sec.gov/Archives/edgar/data/789019/000156459...,https://www.sec.gov/Archives/edgar/data/789019/000156459...,,https://www.sec.gov/Archives/edgar/data/789019/000156459...,"[{'companyName': 'MICROSOFT CORP (Filer)', 'cik': '78901...","[{'sequence': '1', 'description': '10-Q', 'documentUrl':...","[{'sequence': '7', 'description': 'XBRL TAXONOMY EXTENSI...",[],2022-03-31
3,35d48ca58bcc83e681e697225c42c614,0001564590-22-002324,789019,MSFT,MICROSOFT CORP,MICROSOFT CORP (Filer),10-Q,Form 10-Q - Quarterly report [Sections 13 or 15(d)],2022-01-25T16:09:04-05:00,https://www.sec.gov/Archives/edgar/data/789019/000156459...,https://www.sec.gov/Archives/edgar/data/789019/000156459...,,https://www.sec.gov/Archives/edgar/data/789019/000156459...,"[{'companyName': 'MICROSOFT CORP (Filer)', 'cik': '78901...","[{'sequence': '1', 'description': '10-Q', 'documentUrl':...","[{'sequence': '7', 'description': 'XBRL TAXONOMY EXTENSI...",[],2021-12-31


In [16]:
from sec_api import ExtractorApi

extractorApi = ExtractorApi(sec)

#
# 10-Q example
#
# Microsoft 10-Q filing (filed 2023-01-24)
filing_url_10q = "https://www.sec.gov/Archives/edgar/data/789019/000156459023000733/0001564590-23-000733-index.htm"

# Get the original HTML of Part I, Item 2, "Management’s Discussion and Analysis of Financial Condition and Results of Operations"
section_html = extractorApi.get_section(filing_url_10q, "part1item2", "html")
beautifulSoupText = BeautifulSoup(section_html, "html.parser").text.replace(u'\xa0', ' ').replace('\r', ' ')
beautifulSoupText


'ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIAL CONDITION AND RESULTS OF OPERATIONS Note About Forward-Looking Statements This report includes estimates, projections, statements relating to our business plans, objectives, and expected operating results that are “forward-looking statements” within the meaning of the Private Securities Litigation Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E of the Securities Exchange Act of 1934. Forward-looking statements may appear throughout this report, including the following sections: “Management’s Discussion and Analysis of Financial Condition and Results of Operations” and “Risk Factors” (Part II, Item 1A of this Form 10-Q). These forward-looking statements generally are identified by the words “believe,” “project,” “expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,” “opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will continue,” “will likely result,” and similar ex
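
The cleanup above replaces non-breaking spaces and carriage returns but leaves newlines and repeated spaces in place. As an optional extra step (an assumption, not something the original article does), the whitespace could be collapsed before sentence splitting. `normalize_whitespace` is a hypothetical helper and the sample string is made up for illustration:

```python
import re


def normalize_whitespace(text):
    """Replace non-breaking spaces and collapse whitespace runs to one space."""
    text = text.replace('\xa0', ' ')
    return re.sub(r'\s+', ' ', text).strip()


# Made-up sample resembling the start of the extracted section
sample = 'ITEM 2.\xa0MANAGEMENT DISCUSSION\r\n\r\nNote About   Forward-Looking Statements'
print(normalize_whitespace(sample))
```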

In [17]:
# Parse Section Pending


In [18]:
# Declare our final results table
results = pd.DataFrame()
text = []
dates = []
section = []

# Pull out the filings text for each report
extractorApi = ExtractorApi(sec)  # one client is enough for all requests

for filing_url_10q, filed_at in zip(df['linkToHtml'], df['filedAt']):
    # Get the original HTML of Part I, Item 2, "Management’s Discussion and Analysis of Financial Condition and Results of Operations"
    section_html = extractorApi.get_section(filing_url_10q, "part1item2", "html")
    # Clean the data and capture it for later processing
    beautifulSoupText = BeautifulSoup(section_html, "html.parser").text.replace(u'\xa0', ' ').replace('\r', ' ')
    text.append(beautifulSoupText)
    dates.append(filed_at)
    section.append('ManagementDiscussion')

In [19]:
results['text'] = text
results['FilingDate'] = dates
results['section'] = section

In [20]:
results

Unnamed: 0,text,FilingDate,section
0,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2023-01-24T16:34:20-05:00,ManagementDiscussion
1,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-10-25T16:08:55-04:00,ManagementDiscussion
2,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-04-26T16:08:55-04:00,ManagementDiscussion
3,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-01-25T16:09:04-05:00,ManagementDiscussion


In [21]:
# Load models

In [22]:
# Load the models
finbert = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-fls',num_labels=3)

In [23]:
# Download the Pre-trained transformer used to process our raw text
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-fls')
nlp = pipeline("text-classification", model=finbert, tokenizer=tokenizer)

In [24]:
prediction = nlp("At the same time, our competitors are rapidly developing and deploying cloud-based services for consumers and business customer", top_k=3)
prediction

[{'label': 'Not FLS', 'score': 0.9801842570304871},
 {'label': 'Specific FLS', 'score': 0.010232682339847088},
 {'label': 'Non-specific FLS', 'score': 0.009583091363310814}]
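
With `top_k=3` the pipeline returns one entry per class, and the later `evaluate` cell reads the label from the first entry. A small sketch of the same decision made explicit, using the scores recorded above; `is_forward_looking` is a hypothetical helper name:

```python
# Recorded output of the cell above (scores copied from the notebook)
prediction = [
    {'label': 'Not FLS', 'score': 0.9801842570304871},
    {'label': 'Specific FLS', 'score': 0.010232682339847088},
    {'label': 'Non-specific FLS', 'score': 0.009583091363310814},
]


def is_forward_looking(prediction):
    """Return True when the highest-scoring label is one of the two FLS classes."""
    top = max(prediction, key=lambda p: p['score'])
    return top['label'] != 'Not FLS'


print(is_forward_looking(prediction))
```

Picking the maximum by score avoids relying on the order of the returned list.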

In [25]:
# Sentiment - Download the Pre-trained transformer used to process our raw text
sent_tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

In [26]:
# Sentiment - Download the FinBert model used to process our transformed data
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")


In [27]:
# Capture closing prices

In [28]:
# Container to hold the Closing Prices based on the filing date
prices = []

# Walk through the collection of filings and pull out the reported filing date
num_rows = len(results)

for i in range(num_rows):
    date = results.iloc[i]['FilingDate']
    end = dt.datetime.strptime(date, "%Y-%m-%dT%H:%M:%S%z")
    start = end - dt.timedelta(minutes=1)
    try:
        response = ek.get_timeseries([ric], fields=['CLOSE'], interval='minute', start_date=start, end_date=end)
        print(response)
        prices.append(response.iloc[0]['CLOSE'])
    except Exception:
        # No price could be retrieved for this window; record a gap
        prices.append(None)

if prices:
    results['close'] = prices

MSFT.O                  CLOSE
Date                         
2023-01-24 21:34:00  251.5114
MSFT.O                CLOSE
Date                       
2022-10-25 20:08:00  247.31
MSFT.O                CLOSE
Date                       
2022-04-26 20:08:00  261.55


2023-04-25 14:28:17,595 P[33087] [MainThread 8076811072] Error with MSFT.O: No data available for the requested date range
2023-04-25 14:28:17,595 P[33087] [MainThread 8076811072] MSFT.O: No data available for the requested date range | 
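
The filing timestamps carry a UTC offset, while the printed responses show times five (or four) hours later. A short sketch of the window computation for the first filing, using only the standard library, shows that the filing time converted to UTC matches the minute printed in the first response; that the returned bars are stamped in UTC is an inference from this match, not something the notebook states:

```python
import datetime as dt

filed_at = "2023-01-24T16:34:20-05:00"  # first FilingDate in the results table

# %z accepts offsets written with a colon, such as -05:00
end = dt.datetime.strptime(filed_at, "%Y-%m-%dT%H:%M:%S%z")
start = end - dt.timedelta(minutes=1)

# Convert to UTC to compare with the timestamps in the printed responses
end_utc = end.astimezone(dt.timezone.utc)
print(start.isoformat(), end.isoformat(), end_utc.isoformat())
```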


In [29]:
results

Unnamed: 0,text,FilingDate,section,close
0,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2023-01-24T16:34:20-05:00,ManagementDiscussion,251.5114
1,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-10-25T16:08:55-04:00,ManagementDiscussion,247.31
2,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-04-26T16:08:55-04:00,ManagementDiscussion,261.55
3,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-01-25T16:09:04-05:00,ManagementDiscussion,


In [38]:
def n_4(x, y):
    z = x + y + 4
    print(z)

In [39]:
n_4(1,2)

7


In [70]:
def evaluate(filings):
    # Container to hold the percentages of FLS sentences within each filing
    fls_pct = []

    # Container to hold the sentiment scores
    scores = []
    
    # Walk through the collection of filings and feed into the FinBert models
    num_rows = len(filings)
    for i in range(num_rows):
        # Pull out the "Management Section" text from our filings
        management_section = filings.iloc[i]['text']
        
        # For this section, break it into individual sentences
        sentences = sent_tokenize(management_section)
        
        # Initialize our FLS container
        fls = []
        
        # Accumulator for the sentiment probabilities of all
        # forward-looking statements in this section
        sentiments = torch.Tensor([0,0,0])
        
        # Process each sentence, converting into tokens required by the FinBert model.
        for sentence in sentences:
            # FLS prediction (the slice caps the input at 512 characters, not tokens)
            prediction = nlp(sentence[:512], top_k=3)[0]['label']

            # Capture FLS statements
            if prediction.startswith("Specific") or prediction.startswith("Non"):
                fls.append(sentence)
                # Tokenize - The FinBert model requires tensor-based tokens as input. truncation=True
                # ensures the sentence does not exceed the model's maximum input length.
                encoded_input = sent_tokenizer(sentence, return_tensors="pt", truncation=True)
                
                with torch.no_grad():
                    # Run the sentence through the model...
                    output = model(**encoded_input)

                    # The prediction will be in the form of a probability
                    fls_sentiment = torch.nn.functional.softmax(output.logits, dim=-1)
                    
                    # Tally the predictions for each sentence
                    sentiments = sentiments+fls_sentiment

        # Record the percentage of FLS sentences
        fls_pct.append(len(fls)/len(sentences)*100)
        
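        # Record the resulting sentiment for 'FLS' sentences within this section.
        # The tally is divided by the total number of sentences; dividing by a positive
        # constant does not change the argmax, so the label below is unaffected.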
        sentiments = sentiments.divide(len(sentences))
        
        score = model.config.id2label[sentiments.argmax().item()]
        print(f'Filing: {i+1} contains {len(sentences)} sentences of which {len(fls)} are "FLS" with a sentiment of: {sentiments} => {score}')
        scores.append(score)
    
    # Add the measures to our results table
    filings['fls_pct'] = fls_pct
    filings['fls_sentiment'] = scores
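
The aggregation in `evaluate` can be checked by hand against the figures the notebook prints for filing 1 (375 sentences, 48 of them FLS). The sketch below uses plain Python rather than tensors. The label order in `id2label` is an assumption inferred from the printed "=> neutral" result, and the rescaling to an FLS-only average is an illustration, not part of the original function:

```python
# Figures copied from the printed output for filing 1
n_sentences = 375
n_fls = 48
mean_over_all = [0.0197, 0.0334, 0.0749]  # tally divided by all sentences

# Assumed label order, inferred from the printed "=> neutral" result
id2label = {0: 'positive', 1: 'negative', 2: 'neutral'}

# Share of forward-looking sentences, as stored in fls_pct
fls_pct = n_fls / n_sentences * 100

# Rescale to an average over FLS sentences only
mean_over_fls = [p * n_sentences / n_fls for p in mean_over_all]


def top_label(probs):
    """Return the label of the largest entry (argmax)."""
    return id2label[max(range(len(probs)), key=probs.__getitem__)]


print(fls_pct)
print([round(p, 3) for p in mean_over_fls])
print(top_label(mean_over_all), top_label(mean_over_fls))
```

The three printed values for filing 1 sum to about 0.128, which is 48/375: each FLS sentence contributes a probability vector that sums to 1. The rescaled average therefore sums to about 1, and both versions select the same label.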

In [71]:
# Plot the data
def plot(x_axis, title, y1_label, x2_axis, **kwargs):
    plt.style.use('dark_background')
    plt.rcParams['figure.figsize'] = (17,8)
    fig, ax = plt.subplots()
    for label, data in kwargs.items():
        # Special label '_' to plot vertical bar
        if label == '_':
            x = 0
            for s in data:
                plt.axvline(x, ymax=0.25, color=s['color'], label=s['label'], linestyle="--")
                x += 1
        else:
            ax.plot(x_axis, data, label=label)
    ax2 = ax.twinx()
    ax2.plot(x_axis, x2_axis, label="Closing Price", color='red')
    ax.tick_params(labelrotation=90)
    # Remove the spines from the graph - leave the bottom
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    # Add a faint grid
    ax.yaxis.grid(True, alpha=0.2)
    ax.xaxis.grid(True, alpha=0.2)
    # Add labels and a title. Note the use of `labelpad` and `pad` to add some
    # extra space between the text and the tick labels.
    ax.set_ylabel(y1_label, labelpad=12, fontsize=14, color='cyan')
    ax.set_title(title, pad=15, fontsize=16, color='cyan')
    if len(kwargs) > 1:
        ax.legend(loc='upper left')
    ax2.legend(loc='upper right')
    fig.tight_layout()

In [72]:
def sentiment_bars(scores):
    bars = []
    frequency = {'negative':0, 'neutral':0, 'positive':0}
    
    for s in scores:
        bar = {'color': '', 'label': ''}
        if s == 'negative':
            color = 'red'
        elif s == 'neutral':
            color = 'yellow'
        else:
            color = 'green'
            
        bar['color'] = color
        bar['label'] = s if frequency[s] == 0 else '_'
        frequency[s] += 1
        bars.append(bar)
        
    return bars
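
A quick usage sketch of the labelling logic above. Only the first bar of each sentiment keeps its name and later ones get `'_'`; Matplotlib leaves labels that start with an underscore out of the legend, so each sentiment appears there once. The function below is a compact, self-contained variant written for this example, and the input list is made up:

```python
def sentiment_bars(scores):
    """Compact variant of the function above for the three FinBERT labels."""
    colors = {'negative': 'red', 'neutral': 'yellow', 'positive': 'green'}
    seen = set()
    bars = []
    for s in scores:
        # Keep the real label only the first time a sentiment appears
        label = s if s not in seen else '_'
        seen.add(s)
        bars.append({'color': colors[s], 'label': label})
    return bars


bars = sentiment_bars(['neutral', 'neutral', 'positive', 'neutral'])
for bar in bars:
    print(bar)
```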

In [73]:
# Evaluate our predictions and compare against the sentiment scores based on the
# FLS sentences.
evaluate(results)

Filing: 1 contains 375 sentences of which 48 are "FLS" with a sentiment of: tensor([[0.0197, 0.0334, 0.0749]]) => neutral
Filing: 2 contains 289 sentences of which 46 are "FLS" with a sentiment of: tensor([[0.0238, 0.0410, 0.0943]]) => neutral
Filing: 3 contains 356 sentences of which 43 are "FLS" with a sentiment of: tensor([[0.0161, 0.0341, 0.0706]]) => neutral
Filing: 4 contains 334 sentences of which 42 are "FLS" with a sentiment of: tensor([[0.0160, 0.0361, 0.0737]]) => neutral


In [74]:
results

Unnamed: 0,text,FilingDate,section,close,fls_pct,fls_sentiment
0,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2023-01-24T16:34:20-05:00,ManagementDiscussion,251.5114,12.8,neutral
1,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-10-25T16:08:55-04:00,ManagementDiscussion,247.31,15.916955,neutral
2,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-04-26T16:08:55-04:00,ManagementDiscussion,261.55,12.078652,neutral
3,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-01-25T16:09:04-05:00,ManagementDiscussion,,12.57485,neutral


In [35]:
# Top 10...
results.head(10)


Unnamed: 0,text,FilingDate,section,close,fls_pct,fls_sentiment
0,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2023-01-24T16:34:20-05:00,ManagementDiscussion,251.5114,0.0,positive
1,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-10-25T16:08:55-04:00,ManagementDiscussion,247.31,0.0,positive
2,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-04-26T16:08:55-04:00,ManagementDiscussion,261.55,0.0,positive
3,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-01-25T16:09:04-05:00,ManagementDiscussion,,0.0,positive


In [69]:
filings = results
# Container to hold the percentages of FLS sentences within each filing
fls_pct = []

# Container to hold the sentiment scores
scores = []
    
# Walk through the collection of filings and feed into the FinBert models
num_rows = len(filings)
#for i in range(num_rows):
i = 0
# Pull out the "Management Section" text from our filings
management_section = filings.iloc[i]['text']

# For this section, break it into individual sentences
sentences = sent_tokenize(management_section)

# Initialize our FLS container
fls = []

# Accumulator for the sentiment probabilities of all
# forward-looking statements in this section
sentiments = torch.Tensor([0,0,0])

# Process each sentence, converting into tokens required by the FinBert model.
for sentence in sentences:
    #print(sentence)
    # FLS prediction
    prediction = nlp(sentence[:512], top_k=3)[0]['label']
    #print(prediction)

    # Capture FLS statements
    if prediction.startswith("Specific") or prediction.startswith("Non"):
        print(sentence)
        print(prediction)
        fls.append(sentence)
        # Tokenize - The FinBert model requires tensor-based tokens as input. truncation=True
        # ensures the sentence does not exceed the model's maximum input length.
        encoded_input = sent_tokenizer(sentence, return_tensors="pt", truncation=True)
        #print(encoded_input)
        
        with torch.no_grad():
            # Run the sentence through the model...
            output = model(**encoded_input)

            # The prediction will be in the form of a probability
            fls_sentiment = torch.nn.functional.softmax(output.logits, dim=-1)
                    
            # Tally the predictions for each sentence
            sentiments = sentiments+fls_sentiment
        print(sentiments)
# Record the percentage of FLS sentences
fls_pct.append(len(fls)/len(sentences)*100)
        
# Record the resulting sentiment for 'FLS' sentences within this section
sentiments = sentiments.divide(len(sentences))
        
score = model.config.id2label[sentiments.argmax().item()]
print(f'Filing: {i+1} contains {len(sentences)} sentences of which {len(fls)} are "FLS" with a sentiment of: {sentiments} => {score}')
        

We must continue to evolve and adapt over an extended time in pace with this changing environment.
Non-specific FLS
tensor([[0.6679, 0.0093, 0.3227]])
The investments we are making in infrastructure and devices will continue to increase our operating costs and may decrease our operating margins.
Non-specific FLS
tensor([[0.6814, 0.9763, 0.3423]])
Extended disruptions at these suppliers and/or manufacturers could lead to a similar disruption in our ability to manufacture devices on time to meet consumer demand.
Non-specific FLS
tensor([[0.6906, 1.9406, 0.3689]])
As a result, changes in foreign exchange rates may significantly affect revenue and expenses.
Non-specific FLS
tensor([[0.7066, 2.8586, 0.4348]])
While we are eliminating roles in some areas, we will continue to hire in key strategic areas.
Specific FLS
tensor([[0.7786, 3.3708, 0.8506]])
Third, we are consolidating our leases to create higher density across our workspaces, which will also impact our financial results through the

tensor([[ 6.4748,  7.2022, 20.3230]])
Equity investments without readily determinable fair values are written down to fair value if a qualitative assessment indicates that the investment is impaired and the fair value of the investment is less than carrying value.
Non-specific FLS
tensor([[ 6.4860,  8.0907, 20.4233]])
We are required to estimate the fair value of the investment to determine the amount of the impairment loss.
Non-specific FLS
tensor([[ 6.5126,  8.1225, 21.3650]])
Once an investment is determined to be impaired, an impairment charge is recorded in other income (expense), net.
Non-specific FLS
tensor([[ 6.5303,  8.2167, 22.2530]])
Goodwill is tested for impairment at the reporting unit level (operating segment or one level below an operating segment) on an annual basis (May 1 for us) and between annual tests if an event occurs or circumstances change that would more likely than not reduce the fair value of a reporting unit below its carrying value.
Non-specific FLS
tensor

In [75]:
results

Unnamed: 0,text,FilingDate,section,close,fls_pct,fls_sentiment
0,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2023-01-24T16:34:20-05:00,ManagementDiscussion,251.5114,12.8,neutral
1,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-10-25T16:08:55-04:00,ManagementDiscussion,247.31,15.916955,neutral
2,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-04-26T16:08:55-04:00,ManagementDiscussion,261.55,12.078652,neutral
3,ITEM 2. MANAGEMENT’S DISCUSSION AND ANALYSIS OF FINANCIA...,2022-01-25T16:09:04-05:00,ManagementDiscussion,,12.57485,neutral


In [79]:
# Visualize the results
plot(
    results['FilingDate'],
    f"Distribution of 10-Q filings for {ric}",
    '% of FLS sentences',
    results['close'],
    percent_fls=results['fls_pct'],
    _=sentiment_bars(results['fls_sentiment']),
)