# Overview
In this notebook we'll do an overview of textual analysis techniques in Python.

# **Install Various Packages We'll Be Using**
**<font color='red'>Run This Next Code Block If You Are Early to the Webinar to Save Time</font>**

To run the code block, place your cursor in the code block, then click the run icon (white triangle in black circle). Alternatively, you can use the keyboard shortcut CTRL/COMMAND+ENTER.

You can find the list of all keyboard shortcuts by following these steps:
- Go to the "Tools" menu in the top menu bar.
- Select "Keyboard shortcuts" from the dropdown menu.

In [None]:
%%time
#!pip install transformers --quiet
#!pip install spacy --quiet
#!python -m spacy download en --quiet
!pip install -U sec-edgar-downloader --quiet
!python -m spacy download en_core_web_sm --quiet
!pip install textstat --quiet
!pip install wordfreq --quiet
!pip install -U deep-translator --quiet
!pip install gtts --quiet
!pip install vaderSentiment --quiet
!pip install textblob --quiet
!pip install sumy --quiet
#!pip install wordcloud --quiet

# Download a Few Helper Files

In [None]:
%%time
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

In [None]:
%%time
!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/Flesch-Kincaid.png?raw=true' --quiet
!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/Flesch%20Reading%20Score.png?raw=true' --quiet
!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/Gunning-Fog.png?raw=true' --quiet
!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/CabinSketch-Bold.ttf?raw=true' --quiet
!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/Bob.png?raw=true' --quiet

In [None]:
import os

In [None]:
pwd

In [None]:
os.listdir()

In [None]:
os.rename('Flesch-Kincaid.png?raw=true', 'Flesch-Kincaid.png')
os.rename('Flesch Reading Score.png?raw=true', 'Flesch Reading Score.png')
os.rename('Gunning-Fog.png?raw=true', 'Gunning-Fog.png')
os.rename('CabinSketch-Bold.ttf?raw=true', 'CabinSketch-Bold.ttf')
os.rename('Bob.png?raw=true', 'Bob.png')

In [None]:
os.listdir()

# Load Packages

First, we will import several necessary Python packages. We will be using the <a href="http://pandas.pydata.org/">Python Data Analysis Library,</a> or <i>PANDAS</i>, extensively for our data manipulations. It is invaluable for analyzing datasets.

In [None]:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series

#### Set PANDAS to show all columns in DataFrame

PANDAS allows you to set various options for, among other things, inspecting the data. I like to be able to see all of the columns. Therefore, I typically include this line at the top of all my notebooks.

In [None]:
#http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_columns', None)
pd.set_option('max_colwidth', 500)

<br>

We can check which version of various packages we're using.

In [None]:
from platform import python_version
print(python_version())

In [None]:
print(pd.__version__)

In [None]:
pd.options.display.float_format = '{:,.2f}'.format

##### Import plotting packages
We are using the <i>matplotlib</i> plotting package, and making the plots prettier by applying the <i>Seaborn</i> package's tweaks to matplotlib. PANDAS makes it easy to produce fine plots of your data, though typically the default graphs have a few things we'd like to tweak. Learning the ins and outs of all the possible modifications takes time, so don't worry about learning them all now. Instead, I'd recommend using the following examples as a template for your own data and then learning new options as you need them.

In [None]:
import matplotlib
print(matplotlib.__version__)
import matplotlib.pyplot as plt

#NECESSARY FOR XTICKS OPTION, ETC.
from pylab import*

In [None]:
import seaborn as sns
print(sns.__version__)

In [None]:
%matplotlib inline

In [None]:
from wordcloud import WordCloud

In [None]:
#FOR FUTUREWARNING REGARDING *append*
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
warnings.filterwarnings('ignore')

# Download Recent Annual Reports (10-K) for Tesla

Extract from sample Tesla 10-K filing

In [4]:
from IPython.display import display, HTML
display(HTML('<p style="margin-bottom:0pt;margin-top:0pt;text-indent:0%;font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;" id="ITEM_1A_RISK_FACTORS">ITEM 1A. RISK FACTORS</p> <p style="margin-top:4pt;margin-bottom:0pt;text-indent:4.54%;font-style:italic;font-family:Times New Roman;font-size:10pt;font-weight:normal;text-transform:none;font-variant: normal;">You should carefully consider the risks described below together with the other information set forth in this report, which could materially affect our business, financial condition and future results. The risks described below are not the only risks facing our company. Risks and uncertainties not currently known to us or that we currently deem to be immaterial also may materially adversely affect our business, financial condition and operating results. </p> <p style="margin-top:14pt;margin-bottom:0pt;text-indent:0%;font-weight:bold;font-family:Times New Roman;font-size:10pt;font-style:normal;text-transform:none;font-variant: normal;">Risks Related to Our Ability to Grow Our Business</p> <p style="margin-top:4pt;margin-bottom:0pt;margin-left:4.54%;text-indent:0%;font-weight:bold;font-style:italic;font-family:Times New Roman;font-size:10pt;text-transform:none;font-variant: normal;">We may be impacted by macroeconomic conditions resulting from the global COVID-19 pandemic.</p> '))

January 26, 2024 Filing: https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm

#### Install sec-edgar-downloader
Uncomment the next code block (remove the '#' symbol) if you didn't install the package above

https://sec-edgar-downloader.readthedocs.io/en/latest/#api-guide

In [6]:
%%time
#!pip install -U sec-edgar-downloader --quiet

In [None]:
from sec_edgar_downloader import Downloader

##### **<font color='red'>Add in your email in the code block below</font>**

In [None]:
%%time
dl = Downloader("York", "YOUR EMAIL ADDRESS") #ADD YOUR EMAIL ADDRESS; e.g., dl = Downloader("York", "montypython@gmail.com")

In [None]:
%%time
# Get the latest supported filings for a specified list of tickers and CIKs
equity_ids = ["TSLA"]
filing_type = "10-K"
for equity_id in equity_ids:
    dl.get(filing_type, equity_id, limit=3)

#### View current directory and switch to sec-edgar-filings folder

In [7]:
pwd

'/content'

In [None]:
#cd sec-edgar-filings

In [None]:
ls sec-edgar-filings

# Parse Downloaded Filings and Insert into Dataframe

### Rename each file to have the accession number
"Accession number: In the example above, 0001193125-15-118890 is the accession number, a unique identifier assigned automatically to an accepted submission by EDGAR. The first set of numbers (0001193125) is the CIK of the entity submitting the filing. This could be the company or a third-party filer agent. Some filer agents without a regulatory requirement to make disclosure filings with the SEC have a CIK but no searchable presence in the public EDGAR database. The next two numbers (15) represent the year. The last series of numbers represent a sequential count of submitted filings from that CIK. The count is usually, but not always, reset to zero at the start of each calendar year."

https://www.sec.gov/os/accessing-edgar-data
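
The three parts described in the quote above can be pulled apart with a simple split on the dashes. This is only a minimal sketch: the helper name `parse_accession` is hypothetical (it is not part of `sec-edgar-downloader`), and the two-digit year is left as-is rather than guessed into a four-digit year.

```python
def parse_accession(accession):
    """Split an EDGAR accession number into its three dash-separated parts."""
    filer_cik, year, sequence = accession.split('-')
    return {
        'filer_cik': filer_cik,     # CIK of the entity that submitted the filing
        'year': year,               # two-digit year, kept as a string
        'sequence': int(sequence),  # sequential count of filings from that CIK
    }

parts = parse_accession('0001193125-15-118890')
print(parts)
```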

In [None]:
import os
# Set the path to the parent directory
parent_directory = '/content/sec-edgar-filings'

# Walk through all sub-folders and files in the directory
for root, dirs, files in os.walk(parent_directory):
    for file in files:
        # Check if the file is 'full-submission.txt'
        if file == 'full-submission.txt':
            # Get the name of the current folder
            folder_name = os.path.basename(root)
            # Construct the old and new file paths
            old_file_path = os.path.join(root, file)
            new_file_path = os.path.join(root, folder_name + '.txt')

            # Rename the file
            os.rename(old_file_path, new_file_path)
            print(f"Renamed '{old_file_path}' to '{new_file_path}'")

#### Move all TSLA filings out of their sub-folders into the main TSLA folder

---



In [None]:
import shutil

# Set the path to the parent directory and the destination directory
parent_directory = '/content/sec-edgar-filings/TSLA'
destination_directory = '/content/sec-edgar-filings/TSLA'

# Create the destination directory if it doesn't exist
if not os.path.exists(destination_directory):
    os.makedirs(destination_directory)

# Walk through all sub-folders and files in the directory
for root, dirs, files in os.walk(parent_directory):
    for file in files:
        # Check if the file is a .txt file
        if file.endswith('.txt'):
            # Get the name of the current folder
            folder_name = os.path.basename(root)
            # Construct the old and new file paths
            old_file_path = os.path.join(root, file)
            new_file_name = folder_name + '.txt'
            new_file_path = os.path.join(destination_directory, new_file_name)

            # Rename and move the file
            os.rename(old_file_path, new_file_path)
            print(f"Moved and renamed '{old_file_path}' to '{new_file_path}'")


In [None]:
!rm -rf '/content/sec-edgar-filings/TSLA/10-K'

In [None]:
ls '/content/sec-edgar-filings/TSLA'

#### Create DataFrame holding data on all downloaded filings

In [None]:
%%time
from bs4 import BeautifulSoup
import re

# Set the path to the folder containing the text files
folder_path = '/content/sec-edgar-filings/TSLA'

# Initialize an empty DataFrame to store the CIK data
df = pd.DataFrame(columns=['Filename', 'CIK', 'Company Name', 'Reporting Period'])

# Loop over all .txt files in the folder
for filename in os.listdir(folder_path):
    if filename.endswith('.txt'):
        file_path = os.path.join(folder_path, filename)

        # Open and read the file
        with open(file_path, 'r', encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'html.parser')

            # Use regular expression to find the CIK number
            cik_search = re.search(r'CENTRAL INDEX KEY:?\s+(\d+)', soup.get_text())
            if cik_search:
                cik_number = cik_search.group(1)
            else:
                cik_number = 'Not Found'

            name_search = re.search(r'COMPANY CONFORMED NAME:?\s+([^\n]+)', soup.get_text())
            if name_search:
                name = name_search.group(1).strip()
            else:
                name = 'Not Found'

            period_search = re.search(r'CONFORMED PERIOD OF REPORT:?\s+(\d+)', soup.get_text())
            if period_search:
                period = period_search.group(1)
            else:
                period = 'Not Found'

            # Append the data to the DataFrame
            #df = df.append({'Filename': filename, 'CIK': cik_number}, ignore_index=True)
            df = pd.concat([df, pd.DataFrame.from_records([{'Filename': filename, 'CIK': cik_number,
                                                           'Company Name': name, 'Reporting Period': period}])])

            #outputxlsx = pd.concat([outputxlsx, df], ignore_index=True)

# Display the DataFrame
df
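
The three header searches above follow the same pattern, so they can be tried out on a small string before running them on full filings. The sketch below assumes a made-up `sample_header` that imitates the header lines the regular expressions look for; the helper name `extract_fields` and the `patterns` dictionary are hypothetical and only for illustration.

```python
import re

# Made-up sample imitating the header lines the searches above look for
sample_header = (
    "CONFORMED PERIOD OF REPORT:\t20231231\n"
    "COMPANY CONFORMED NAME:\t\t\tTesla, Inc.\n"
    "CENTRAL INDEX KEY:\t\t\t0001318605\n"
)

# Raw strings (r'...') keep backslashes such as \s and \d intact for the regex engine
patterns = {
    'CIK': r'CENTRAL INDEX KEY:?\s+(\d+)',
    'Company Name': r'COMPANY CONFORMED NAME:?\s+([^\n]+)',
    'Reporting Period': r'CONFORMED PERIOD OF REPORT:?\s+(\d+)',
}

def extract_fields(header_text):
    """Return one value per pattern, or 'Not Found' when a pattern has no match."""
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, header_text)
        record[field] = match.group(1).strip() if match else 'Not Found'
    return record

record = extract_fields(sample_header)
print(record)
```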

#### Create 'ticker' column in dataframe

In [None]:
df['ticker'] = np.where(df['Company Name']=='Tesla, Inc.', 'TSLA', np.nan)  # np.NaN was removed in NumPy 2.0
df

In [None]:
df.columns

<br>Re-arrange Columns

In [None]:
df = df[['ticker', 'Filename', 'CIK', 'Company Name', 'Reporting Period']]
df = df.reset_index()
df = df.drop('index', axis=1)
df

### Loop Over all Filings in Dataframe

In [None]:
%%time
start_list = ['ITEM_1A_RISK_FACTORS', 'item_1a_risk_factors', 'risk_factors']
end_list = ['ITEM_1B_UNRESOLVED_STAFF_COMMENTS', 'item_1b_unresolved_staff_comments', 'unresolved_staff_comments']

for index, row in df[:].iterrows():
    file_path = '/content/sec-edgar-filings/'+row['ticker']+'/'+row['Filename']
    print(file_path)
    extracted_text = ''

    with open(file_path, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')

        # Initialize a flag to indicate when to start collecting text
        start_collecting = False
        extracted_text = ''

        # Iterate through all <p> tags
        for p_tag in soup.find_all('p'):
            # Check if the current tag is the start tag
            # Get the id attribute of the tag in lowercase
            tag_id = (p_tag.get('id') or '').lower()

            # Check if the id is in the list
            if tag_id in start_list:
                # Do something if the id is in the list
                #print(f"Found a matching id: {tag_id}")
                start_collecting = True
                continue  # Skip the start tag itself

            # Check if the current tag is the end tag
            if tag_id in end_list:
                break  # Stop collecting text

            # Collect text if between the start and end tags
            if start_collecting:
                extracted_text += p_tag.get_text(' ', strip=True) + '\n'

        ##### CODE BLOCK ADDED IN FOR LATEST TESLA FILING
        if extracted_text=='':
          for span in soup.find_all('span'):
            if span.text == "ITEM 1A. RISK FACTORS":
              print("Found the span with text: 'ITEM 1A. RISK FACTORS'")
              start_collecting = True
              continue  # Skip the start tag itself

            # Check if the current tag is the end tag
            if span.text == "ITEM 1B. UNRESOLVED STAFF COMMENTS":
              print("Found the span with text: 'ITEM 1B. UNRESOLVED STAFF COMMENTS'")
              break  # Stop collecting text

            # Collect text if between the start and end tags
            if start_collecting:
                extracted_text += span.get_text(' ', strip=True) + '\n'

        print(extracted_text.strip()[:100], '\n')

        df.loc[index, '1A_Text'] = extracted_text

df

#### Sort DF

In [None]:
df = df.sort_values('Reporting Period', ascending=False)
df

#### Save DF

In [None]:
pwd

In [None]:
os.listdir()

In [None]:
df.to_pickle('Three TSLA 10-K Risk Analysis Sections.pkl')

# Read in Dataset (If Needed)

PANDAS can read in data from a variety of different file formats. We have a data set with 3 Risk Factors sections from Tesla's 10-K filings. This is a PANDAS dataframe, which I also saved as a 'pickled' file and uploaded to a repository on my GitHub account. So, if needed, we can access this GitHub file using the following code blocks, where we will read in the remote file and assign it to the name ``df`` -- short for 'dataframe', the PANDAS name for a dataset. Second, we will use the <i>slice</i> notation ``df[:1]`` to show the first row of the dataset.

- Reading in files in Google Colab: https://neptune.ai/blog/google-colab-dealing-with-files

- I'm showing some alternative methods here. I've "commented out" these lines by using the "#" sign at the start of each line. We could run the code by removing the pound sign.

### Read in Pickled File Directly from GitHub Repository
- Make sure to add "?raw=true" to the end of the file's URL

In [None]:
#%%time
#import datetime
#print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
#df = pd.read_pickle('https://github.com/gdsaxton/Textual_Analysis/blob/main/Three%20TSLA%2010-K%20Risk%20Analysis%20Sections.pkl?raw=true')
#print('# of columns:', len(df.columns))
#print('# of observations:', len(df))
#df

In [None]:
#SORT DF
#df = df.sort_values('Reporting Period', ascending=False)
#df

### Read in Saved Version of DF (if necessary) - Download Pickled File Directly from GitHub Repository then Read in
- Make sure to add "?raw=true" to the end of the file's URL

In [None]:
#!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/Three%20TSLA%2010-K%20Risk%20Analysis%20Sections.pkl?raw=true'

In [None]:
#os.listdir()

In [None]:
#%%time
#import datetime
#print ("Current date and time : ", datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), '\n')
#df = pd.read_pickle('Three TSLA 10-K Risk Analysis Sections.pkl?raw=true')
#print('# of columns:', len(df.columns))
#print('# of observations:', len(df))
#df

In [None]:
#SORT DF
#df = df.sort_values('Reporting Period', ascending=False)
#df

# String Operations in Python

### String Slicing and Indexing
Python strings are like arrays of characters, and you can easily extract parts of strings using slicing.

In [8]:
text = "You should carefully consider the risks described below together with the other information set forth in this report, which could materially affect our business, financial condition and future results. The risks described below are not the only risks facing our company. Risks and uncertainties not currently known to us or that we currently deem to be immaterial also may materially adversely affect our business, financial condition and operating results. Risks Related to Our Ability to Grow Our Business We may be impacted by macroeconomic conditions resulting from the global COVID-19 pandemic. Since the first quarter of 2020, there has been a worldwide impact from the COVID-19 pandemic."
text

'You should carefully consider the risks described below together with the other information set forth in this report, which could materially affect our business, financial condition and future results. The risks described below are not the only risks facing our company. Risks and uncertainties not currently known to us or that we currently deem to be immaterial also may materially adversely affect our business, financial condition and operating results. Risks Related to Our Ability to Grow Our Business We may be impacted by macroeconomic conditions resulting from the global COVID-19 pandemic. Since the first quarter of 2020, there has been a worldwide impact from the COVID-19 pandemic.'

In [9]:
print(text[:25])

You should carefully cons


In [10]:
print(text[1:25])

ou should carefully cons


In [11]:
print(text[-25:-3])

om the COVID-19 pandem


### Regular Expressions
With the re module, you can perform complex string searches, replacements, and manipulations.

In [None]:
import re
x = re.findall("COVID-19", text)
print(x)
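
Beyond `findall` with a literal string, the `re` module can report where matches occur, match patterns rather than fixed text, and replace matches. The following is a small sketch on a made-up `sample` sentence (not the filing text), using only `re.finditer`, `re.findall`, and `re.sub`.

```python
import re

sample = "Impact from the COVID-19 pandemic. Since the first quarter of 2020, COVID-19 spread."

# finditer yields match objects, which carry the position of each hit
positions = [(m.start(), m.end()) for m in re.finditer(r"COVID-19", sample)]
print(positions)

# A pattern instead of a literal: any standalone run of exactly four digits
years = re.findall(r"\b\d{4}\b", sample)
print(years)

# sub replaces every match of the pattern
masked = re.sub(r"COVID-\d+", "[DISEASE]", sample)
print(masked)
```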

### String Concatenation and Repetition
You can easily concatenate (join) strings together, or repeat them a certain number of times.

In [None]:
str1 = "Hello"
str2 = "World"
print(str1 + " " + str2)
print(str1 * 3)

### Changing Case
Python strings come with a variety of built-in methods for common tasks. To start, there are four methods for changing case:

``.upper()``: Changing to upper case.

``.lower()``: Changing to lower case.

``.capitalize()``: Changing the first character of the string to upper case and all remaining characters to lower case.

``.swapcase()``: Changing each letter to its opposite case.

##### Upper case

In [None]:
print(text[:600].upper())

##### Lower case

In [None]:
print(text[:600].lower())

##### Capitalize

In [None]:
print(text[:600].capitalize())

##### Case Conversion

In [None]:
print(text[:600].swapcase())

### Replacement

``.replace()``: Replacing parts of a string.

In [None]:
text.replace('COVID-19', '*****THAT WHICH MUST NEVER BE NAMED*****')

### Select Latest Filing and Clean by Using ``replace`` to Remove Page Numbers, Bullet Points, and "Table of Contents"

In [None]:
pd.set_option('max_colwidth', 500)
df[:1]['1A_Text'].item()

In [None]:
for index, row in df[:1].iterrows():
    print('# of characters in risk factors section:', len(row['1A_Text']), '\n')
    print(row['1A_Text'][:15000])

In [None]:
for index, row in df[:1].iterrows():
  print('# of characters in risk factors section:', len(row['1A_Text']))

df['1A_Text'] = df['1A_Text'].replace(r'\n(\d{2})\n', '', regex=True)
for index, row in df[:1].iterrows():
    print('# of characters in risk factors section after removing page numbers:', len(row['1A_Text']))
    #print(row['1A_Text'], '\n\n\n\n\n\n\n\n\n\n')

df['1A_Text'] = df['1A_Text'].str.replace('•', '', regex=False)  # .str.replace works on substrings; Series.replace would only match whole cells
#for index, row in df[:1].iterrows():
    #print('# of characters in risk factors section after removing bullet points:', len(row['1A_Text']))
    #print(row['1A_Text'], '\n\n\n\n\n\n\n\n\n\n')

df['1A_Text'] = df['1A_Text'].str.replace('Table of Contents', '', regex=False)  # substring replacement inside each cell
for index, row in df[:1].iterrows():
    print('# of characters in risk factors section after removing page numbers, bullet points, and Table of Contents:', len(row['1A_Text']), '\n')
    text = row['1A_Text']
    print(text[:15000])
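
When cleaning a text column it helps to know how the two PANDAS replacement methods differ: `Series.replace` without `regex=True` swaps out cells whose entire value equals the target, while `Series.str.replace` works on substrings inside each cell. The sketch below shows the difference on a tiny made-up Series named `demo` (an assumption for illustration, not the filing data).

```python
import pandas as pd

demo = pd.Series(['• First risk\n12\nTable of Contents second risk', '•'])

# Series.replace without regex=True only replaces cells that equal the value exactly
whole_cell = demo.replace('•', '')

# Series.str.replace removes the substring wherever it appears inside a cell
substring = demo.str.replace('•', '', regex=False)

# A regex pattern can strip two-digit page numbers sitting on their own line
no_pages = substring.str.replace(r'\n\d{2}\n', ' ', regex=True)

print(whole_cell.tolist())
print(substring.tolist())
print(no_pages.tolist())
```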

### Splitting - Create a List
``.split()``: Splitting a string into a list.

In [None]:
word_list = text.split()
print('# of words in risk factors section:', len(word_list))
print("First 5 words: ", word_list[:5])

### Joining a List

In [None]:
my_word_list = text[:101].split()
print(my_word_list)

In [None]:
print(' '.join(my_word_list))

### Palindrome Check
You can easily check if a string is a palindrome. Here is a plug for the world's best-ever song about palindromes. :)

In [None]:
from IPython.display import Image
Image(width=400, filename='Bob.png')

In [None]:
s = "radar"
print(s == s[::-1])  # True

<br>Take every second character

In [None]:
s[::2]

In [None]:
s[::5]

The slice notation generally follows the format [start:stop:step].

In ``[::-1]``:

start is omitted; because the step is negative, it defaults to the end of the string (the last character).

stop is omitted; because the step is negative, it defaults to just before the beginning of the string.

step is -1, which tells Python to step through the string backwards.

So, ``[::-1]`` starts at the last character and steps backwards one character at a time until it has passed the first character, effectively reversing the string.
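
One way to check which start and stop values Python actually fills in for an omitted bound is the built-in `slice.indices` method, which returns the concrete `(start, stop, step)` for a sequence of a given length. A minimal sketch, using a made-up six-character string:

```python
s = "radar!"

# Concrete (start, stop, step) that Python uses when the bounds are omitted
forward = slice(None, None, 1).indices(len(s))
backward = slice(None, None, -1).indices(len(s))
print(forward)
print(backward)

# Reversing with omitted bounds matches starting explicitly at the last index
reversed_s = s[::-1]
explicit = s[len(s) - 1::-1]
print(reversed_s, explicit)
```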

In [None]:
s = "Lisa Bonet ate no basil"
print(s, '\n')
if s == s[::-1]:
  print("Yes, it's a palindrome!")
else:
  print("No, it's not a palindrome")

<br>Try making it lower case

In [None]:
s = s.lower()
print(s, '\n')
if s == s[::-1]:
  print("Yes, it's a palindrome!")
else:
  print("No, it's not a palindrome")

<br>Try removing spaces

In [None]:
s = s.replace(' ', '')
print(s, '\n')
if s == s[::-1]:
  print("Yes, it's a palindrome!")
else:
  print("No, it's not a palindrome")
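
The lower-casing and space-removal steps above can be combined into a single helper. This is a sketch under one extra assumption: punctuation should be ignored too, so only letters and digits are compared. The function name `is_palindrome` is hypothetical.

```python
def is_palindrome(phrase):
    """Compare only letters and digits, ignoring case, spaces, and punctuation."""
    cleaned = ''.join(ch.lower() for ch in phrase if ch.isalnum())
    return cleaned == cleaned[::-1]

for candidate in ["Lisa Bonet ate no basil", "A man, a plan, a canal: Panama!", "Tesla"]:
    print(candidate, '->', is_palindrome(candidate))
```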

### Zfill
Pad a numeric string on the left with zeros using ``zfill``.

In [None]:
number = "50"
print(number.zfill(5))  # "00050"

<br>Show *CIK* column of dataframe

In [None]:
df[['CIK']]

<br>Grab value from first cell

In [None]:
df[:1]['CIK'].item()

In [None]:
print('# of characters in first CIK:', len(df[:1]['CIK'].item()))

<br>Create new column called *CIK15* and add as many zeroes as needed to the start of the cell to make the string 15 characters.

In [None]:
df['CIK15'] = df['CIK'].apply(lambda x: x.zfill(15))

In [None]:
df[['CIK', 'CIK15']]

In [None]:
df[:1]['CIK15'].item()

In [None]:
print('# of characters in first CIK after using zfill:', len(df[:1]['CIK15'].item()))

<br>Add characters to end of string using ``ljust()``

In [None]:
cik = '0001318605'
print(cik.ljust(15, '0'))

### Strip Characters
You can use ``strip``, ``lstrip``, and ``rstrip`` to remove whitespace or, if you pass an argument, any of the specified characters from the ends of the string.

In [None]:
string = "xxxyyyHello Worldyyxx"
print(string.strip('xy'))

The ``lstrip()`` and ``rstrip()`` methods in Python are used to remove leading (at the start) and trailing (at the end) characters from a string, respectively. By default, these methods remove whitespace, but you can also specify other characters to remove. Here are some examples to illustrate their usage:

<br>Removing leading whitespaces

In [None]:
string = "   Hello, World!   "
print(string)
print(string.lstrip())

<br>Remove specific trailing characters

In [None]:
string = "xxxHello, World!xxx"
print(string.rstrip('x'))
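
One detail worth seeing in action: the argument to `strip`, `lstrip`, and `rstrip` is treated as a set of characters, not as a literal prefix or suffix. The sketch below uses a made-up filename to show the difference, and uses `str.removesuffix` (available in Python 3.9 and later) for the literal-suffix case.

```python
filename = "report.txt.txt"

# rstrip keeps removing trailing characters as long as each is one of '.', 't', 'x'
stripped = filename.rstrip('.txt')
print(stripped)

# removesuffix removes the exact suffix once, if it is present (Python 3.9+)
trimmed = filename.removesuffix('.txt')
print(trimmed)
```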

### Count Substring Occurrences
Use ``count`` to find how many times a substring occurs in a string.

In [None]:
string = "banana"
print(string.count('na'))  # 2

In [None]:
print(text.count('Tesla'))

### String to List Conversion (List Comprehension)
Convert strings to lists using list comprehensions for more control over the elements.

In [None]:
chars = [char for char in string]
print(chars)

<br>We can also use list comprehensions for selecting or replacing content, etc.

In [None]:
word_list[:10]

In [None]:
[word for word in word_list[:10] if word!='risks']

### Use List Comprehension to Replace 'U.S.' with "US"

In [None]:
word_list = [word.replace("U.S.", "US") for word in word_list]

# Miscellaneous String Operations (Optional)
Run through these on your own time if you are interested in playing more.

#### Strip
``.strip()``: Remove whitespace at beginning and end of a string

In [None]:
string = "    Our business and our future plans for expansion are capital-intensive, and the specific.    "
string

In [None]:
string = string.strip()
string

*Remove All Spaces*

In [None]:
no_spaces = string.replace(' ', '')
no_spaces

*Remove Extra White Space*

In [None]:
no_extra_spaces = string.replace('  ', ' ')
no_extra_spaces
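
Note that a single `replace('  ', ' ')` pass only halves runs of spaces, so longer runs can survive. Two common alternatives are splitting on whitespace and re-joining, or a regular expression. A small sketch on a made-up `messy` string:

```python
import re

messy = "Our   business  and    our future plans"

# One pass of replace leaves double spaces where the run was longer than two
once = messy.replace('  ', ' ')
print(repr(once))

# split() with no arguments splits on any run of whitespace, so join rebuilds with single spaces
joined = ' '.join(messy.split())
print(repr(joined))

# Same result with a regular expression
with_regex = re.sub(r'\s+', ' ', messy).strip()
print(repr(with_regex))
```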

*No Line Breaks*

In [None]:
string = 'Our business and\t our future plans for expansion are\n capital-intensive, and the specific.'
print(string)

In [None]:
no_line_breaks = string.replace('\n', '')
print(no_line_breaks)

*No Tabs*

In [None]:
no_tabs = string.replace('\t', '')
print(no_tabs)

*No Tabs or Line Breaks*

In [None]:
no_tabs_or_breaks = string.replace('\t', '').replace('\n', '')
print(no_tabs_or_breaks)

#### String Translation
The ``str.maketrans()`` and ``translate()`` methods can be used to replace or remove characters in a string.

In [None]:
intab = "aeiou"
outtab = "12345"
trantab = str.maketrans(intab, outtab)
sample = "this is string example....wow!!!"  # avoid naming a variable `str`, which would shadow the built-in type
print(sample.translate(trantab))
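
For removing characters rather than swapping them, `str.maketrans` accepts a third argument listing characters to delete. The sketch below strips punctuation from a made-up sentence using the standard library's `string.punctuation` constant.

```python
from string import punctuation

sentence = "Risks, uncertainties; and (other) factors!"

# Empty first and second arguments: no substitutions, only deletions
remove_punct = str.maketrans('', '', punctuation)
cleaned = sentence.translate(remove_punct)
print(cleaned)
```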

#### Expand Tabs
Convert tab characters in a string into spaces using ``expandtabs``.

In [None]:
string = "Hello\tWorld"
print(string)
print(string.expandtabs(24))  # "Hello" padded with spaces up to column 24, then "World"

### String Formatting
Python's ``format()`` method and f-strings (formatted string literals) make it easy to create neatly formatted strings.

In [None]:
name = "Alice"
age = 25
print(f"My name is {name} and I am {age} years old.")
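
Format specifications after a colon control how values are displayed, with both `format()` and f-strings. The sketch below reuses the word counts reported elsewhere in this notebook (12,028 total and 2,551 unique words) as plain numbers; the variable names are only illustrative.

```python
words_total = 12028
words_unique = 2551

# Thousands separator with str.format()
total_line = "Total words: {:,}".format(words_total)

# Percentage with one decimal place in an f-string
share_line = f"Unique share: {words_unique / words_total:.1%}"

# Left- and right-aligned columns of fixed width
table_line = f"{'CIK':<6}|{'1318605':>12}|"

print(total_line)
print(share_line)
print(table_line)
```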

### String Encoding and Decoding
In Python, you can encode strings into different formats like UTF-8 and decode them back.

In [None]:
s = "pythön!"  # avoid naming a variable `str`, which would shadow the built-in type
s_utf = s.encode("utf-8")
print(s_utf.decode("utf-8"))
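
Encoding turns a string into bytes, and the number of bytes can differ from the number of characters. A short sketch with the same example word, also showing the `errors` argument of `encode` for characters a target encoding cannot represent:

```python
word = "pythön!"
encoded = word.encode("utf-8")

# 'ö' takes two bytes in UTF-8, so the byte count exceeds the character count
print(len(word), len(encoded))
print(encoded)

# ASCII cannot represent 'ö'; errors="replace" substitutes a question mark
ascii_safe = word.encode("ascii", errors="replace")
print(ascii_safe)
```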

### String Templates
The string module provides a Template class which offers another way to substitute values into strings.

In [None]:
from string import Template
t = Template('Hello, $name!')
print(t.substitute(name='Alice'))

### String Methods for Checking Content
Python offers string methods like ``isdigit()``, ``isalpha()``, ``isalnum()``, ``islower()``, ``isupper()``, and ``isspace()`` to check the content of the string.

- The ``isalnum()`` method returns True if all characters in the string are alphanumeric (either alphabets or numbers)

In [None]:
print(len(word_list))
print("First 5 words: ", word_list[:5])

In [None]:
[word for word in word_list if word.isdigit()]
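
To see how the different checks behave, it can help to run several of them over the same small list. The `tokens` list below is made up for illustration; note that a hyphen makes `isalnum()` return False, while `isupper()` only looks at the letters in the token.

```python
tokens = ['2020', 'COVID-19', 'risks', 'TESLA', '10-K', ' ', 'Q4']

digits = [t for t in tokens if t.isdigit()]
alnum = [t for t in tokens if t.isalnum()]
upper = [t for t in tokens if t.isupper()]
spaces = [t for t in tokens if t.isspace()]

print('isdigit:', digits)
print('isalnum:', alnum)
print('isupper:', upper)
print('isspace:', spaces)
```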

### Find Substrings
``find`` returns the index of the first occurrence of a substring and ``rfind`` the index of the last occurrence; both return -1 if the substring is not found.

In [None]:
string = "Hello world"
print(string.find('o'))  # 4

<br>Show character position of first and last instances of the word "Tesla" in our risk filing

In [None]:
text.find('Tesla')

In [None]:
text.rfind('Tesla')
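
`find` also accepts a starting position, which makes it possible to collect every occurrence rather than just the first or last. A minimal sketch on a made-up sentence; the helper name `find_all` is hypothetical.

```python
def find_all(haystack, needle):
    """Return the start index of every non-overlapping occurrence of needle."""
    positions = []
    start = haystack.find(needle)
    while start != -1:  # find returns -1 when there is no further match
        positions.append(start)
        start = haystack.find(needle, start + len(needle))
    return positions

sample = "Tesla builds cars. Tesla also builds batteries."
print(find_all(sample, 'Tesla'))
print(sample.rfind('Tesla'))
print(sample.find('Ford'))
```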

# Tokenization and Parts of Speech Tagging

### Splitting text into words or sentences.

In [None]:
#%%time
#!pip install textblob --quiet

In [None]:
from textblob import TextBlob

In [None]:
blob = TextBlob(text)

In [None]:
print(blob.words)  # Word tokenization

In [None]:
print(blob[:10000].sentences)  # Sentence tokenization

### Part-of-Speech Tagging
TextBlob can also tag words in a sentence into parts of speech like nouns, verbs, adjectives, etc.

In [None]:
for word in blob[:150].tags:
    print(word)

# Translation

In [None]:
#%%time
#!pip install -U deep-translator --quiet

In [None]:
from deep_translator import GoogleTranslator
GoogleTranslator(source='auto', target='de').translate("keep it up, you are awesome")

In [None]:
GoogleTranslator(source='auto', target='en').translate('Mach weiter so, du bist großartig')

In [None]:
from deep_translator import GoogleTranslator
GoogleTranslator(source='auto', target='fr').translate(text[:456])

In [None]:
GoogleTranslator(source='auto', target='en').translate("Vous devez examiner attentivement les risques décrits ci-dessous ainsi que les autres informations présentées dans ce rapport, qui pourraient affecter sensiblement nos activités, notre situation financière et nos résultats futurs. Les risques décrits ci-dessous ne sont pas les seuls risques auxquels notre entreprise est confrontée. Les risques et incertitudes dont nous ne sommes pas actuellement conscients ou que nous considérons actuellement comme négligeables peuvent également avoir un effet défavorable important sur notre activité, notre situation financière et nos résultats d'exploitation.")

# Text-to-Speech

In [None]:
#%%time
#!pip install gtts --quiet

In [None]:
from gtts import gTTS

# Language in which you want to convert
language = 'en'

In [None]:
# Passing the text and language to the engine;
# slow=False tells the module to read the text at normal speed rather than slowed down
myobj = gTTS(text=text[:100], lang=language, slow=False)

## Saving the converted audio in a mp3 file named 'converted_speech.mp3'
#myobj.save("converted_speech.mp3")

## Playing the converted file
# Note that you need an external tool like mpg321 to play the mp3 file from the command line, which you can install on Linux using `sudo apt-get install mpg321`. On Windows, you can use any standard media player to play the mp3 file.
#os.system("mpg321 converted_speech.mp3")

# Save the audio file
audio_file = "converted_speech.mp3"
myobj.save(audio_file)

from IPython.display import Audio
Audio(audio_file)

# Named Entity Recognition (NER)

https://spacy.io/usage/spacy-101#annotations-ner

In [None]:
#%%time
#!python -m spacy download en --quiet

In [None]:
#%%time
#SPACY SHOULD BE PRE-INSTALLED
#!pip install spacy --quiet

In [None]:
from spacy import displacy

In [None]:
import spacy

In [None]:
# Load the English model
nlp = spacy.load('en_core_web_sm')

In [None]:
# Test it on some text
doc = nlp("Hello Bob, I just bought an Apple computer in Berlin!")
for token in doc:
    print(token.text)

In [None]:
# Display entities in the Jupyter Notebook
displacy.render(doc, style='ent', jupyter=True)

In [None]:
# Example text
string = "Google LLC is an American multinational technology company that specializes in Internet-related services and products."

# Process the text
doc = nlp(string)

# Extract entities
for ent in doc.ents:
    print(ent.text, ent.label_)

In [None]:
displacy.render(doc, style='ent', jupyter=True)

In [None]:
#%%time
#!pip install transformers

In [None]:
doc = nlp(text)

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

In [None]:
displacy.render(doc[:450], style='ent', jupyter=True)

# Word Frequency

We still need to do one thing -- we need to break our combined string up into words. We do this with the *split( )* command in Python, which, called with no arguments, breaks the string up at each run of whitespace. The output is a list with as many elements as there are words in the string. We see below that there are 12,028 elements (words) in our list -- in other words, the risk factors section of Tesla's most recent 10-K filing has 12,028 words.

In [None]:
word_list = text.split()
print(len(word_list))
print("First 5 words: ", word_list[:5])

#### 2,551 Unique Words

In [None]:
print('# of Total Words:', len(word_list))
print('# of Unique Words:', len(set(word_list)))

In [None]:
Series(word_list).value_counts()[:25]

#### Remove 'stopwords'

In [None]:
import nltk
from nltk.corpus import stopwords
#from nltk.tokenize import word_tokenize

In [None]:
nltk.download('stopwords')

In [None]:
# NLTK's default stop words
stop_words = set(stopwords.words('english'))

In [None]:
# Remove stop words
filtered_text = [word for word in word_list if word.lower() not in stop_words]

# Join words back into a string
filtered_text_string = ' '.join(filtered_text)

print(len(filtered_text_string))
print(filtered_text_string[:1000])

In [None]:
len(filtered_text_string.split())

In [None]:
wordcount_df = DataFrame(Series(filtered_text).value_counts())
wordcount_df.columns = ['frequency']
print(len(wordcount_df))
wordcount_df[:10]

# Word Clouds

https://python-charts.com/ranking/wordcloud-matplotlib/

##### Download specific font (if not done at beginning)

In [None]:
#!wget 'https://github.com/gdsaxton/Textual_Analysis/blob/main/CabinSketch-Bold.ttf?raw=true' --quiet

In [None]:
#os.listdir()

In [None]:
#pwd

In [None]:
#%%time
#!pip install wordcloud --quiet

In [None]:
%%time
wordcloud = WordCloud(font_path='/content/CabinSketch-Bold.ttf',
                      background_color='white',
                      width=1400,
                      height=700
                     ).generate(filtered_text_string)
plt.imshow(wordcloud)
plt.axis('off')
plt.show()

In [None]:
plt.rcParams['figure.figsize'] = (15, 5)

In [None]:
# Circle mask
x, y = np.ogrid[:300, :300]
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(background_color = "white", repeat = True, mask = mask,
               font_path='/content/CabinSketch-Bold.ttf',
               height = 300, width = 300,
               #max_words = 25,
               contour_width = 3, contour_color = "black")
wc.generate(filtered_text_string)

plt.axis("off")
plt.imshow(wc, interpolation = "bilinear")

### Comparing TSLA words with most common English-language words
``wordfreq`` package
https://github.com/rspeer/wordfreq

In [None]:
#%%time
#!pip install wordfreq --quiet

In [None]:
from wordfreq import top_n_list
most_common_words = top_n_list('en', 2000)

In [None]:
print(len(most_common_words))
most_common_words[:10]

In [None]:
most_common_words[-10:]

##### Compare Top 1,000 Words

In [None]:
print(len(set(most_common_words[:1000]) - set(wordcount_df[:1000].index.tolist())))

In [None]:
print(len(wordcount_df))
print(len(set(wordcount_df[:1000].index.tolist()) - set(most_common_words[:1000])))

In [None]:
print('Top 1,000 English words not in top 1,000 Tesla Risk Factors section:',
      set(most_common_words[:1000]) - set(wordcount_df[:1000].index.tolist()))

### Show Which of the 1,000 Most-Common English Words Are Not in the Tesla Risk Factors Section

In [None]:
not_in_TSLA = list(set(most_common_words[:1000]) - set(wordcount_df[:].index.tolist()))
print(type(not_in_TSLA))
print(len(not_in_TSLA))
not_in_TSLA[:5]

In [None]:
plt.rcParams['figure.figsize'] = (15, 5)

# Circle mask
x, y = np.ogrid[:300, :300]
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(background_color = "white", repeat = True, mask = mask,
               font_path='/content/CabinSketch-Bold.ttf',
               contour_width = 1, contour_color = "black")
wc.generate(' '.join(not_in_TSLA))

plt.axis("off")
plt.imshow(wc, interpolation = "bilinear")

### Show Words in Tesla Risk Factors Section That Are Not in the Top 2,000 Most-Common English Words

In [None]:
print(len(word_list))
print("First 5 words: ", word_list[:5])

In [None]:
common_words = set(most_common_words[:2000])  # set membership checks are much faster than list checks
word_list_v2 = [word for word in word_list if word not in common_words]
print(len(word_list_v2))
print(len(set(word_list_v2)))
word_list_v2[:5]

In [None]:
plt.rcParams['figure.figsize'] = (15, 5)

# Circle mask
x, y = np.ogrid[:300, :300]
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)

wc = WordCloud(background_color = "white", repeat = True, mask = mask,
               font_path='/content/CabinSketch-Bold.ttf',
               contour_width = 1, contour_color = "black")
wc.generate(' '.join(word_list_v2))

plt.axis("off")
plt.imshow(wc, interpolation = "bilinear")

# Readability

Checking readability is useful before sending a document to its target audience.

https://pypi.org/project/textstat/

In [None]:
#%%time
#!pip install textstat --quiet

In [None]:
import textstat

### Flesch Reading Ease
Flesch Reading Ease: Scores typically fall between 0 and 100. Higher scores indicate material that is easier to read; lower scores mark passages that are more difficult to read.
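
To make the score less of a black box, here is a minimal sketch of the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The syllable counter is a rough vowel-group heuristic and the two sample sentences are made up, so the numbers will not match `textstat` exactly:

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease_sketch(sample):
    # Split on sentence-ending punctuation and drop empty pieces
    sentences = [s for s in re.split(r"[.!?]+", sample) if s.strip()]
    words = re.findall(r"[A-Za-z']+", sample)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Hypothetical examples: short common words vs. long technical words
easy = "The cat sat on the mat. The dog ran."
hard = "Regulatory uncertainty materially complicates international manufacturing operations."

print(round(flesch_reading_ease_sketch(easy), 1))
print(round(flesch_reading_ease_sketch(hard), 1))
```

Short sentences with short words push the score up, and very simple or very dense text can land outside the 0 to 100 range.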

In [None]:
# Flesch Reading Ease
flesch_reading_score = textstat.flesch_reading_ease(text)
print("Flesch Reading Ease:", flesch_reading_score)

In [None]:
from IPython.display import Image
Image(width=800, filename='Flesch Reading Score.png')

### Flesch-Kincaid Grade Level

Flesch-Kincaid Grade Level: Shows the US grade level needed to understand the text. For example, a score of 8 means eighth grade.
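
As a rough illustration, the standard Flesch-Kincaid formula is 0.39 × (words / sentences) + 11.8 × (syllables / words) - 15.59. The sketch below takes the three counts as inputs; the counts used are hypothetical, and `textstat` derives its own counts from the text:

```python
def flesch_kincaid_grade_sketch(n_words, n_sentences, n_syllables):
    """Standard Flesch-Kincaid Grade Level formula applied to raw counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59

# Hypothetical counts: 100 words and 150 syllables, in 5 sentences vs. 2 sentences
shorter_sentences = flesch_kincaid_grade_sketch(100, 5, 150)
longer_sentences = flesch_kincaid_grade_sketch(100, 2, 150)

print(round(shorter_sentences, 2))
print(round(longer_sentences, 2))
```

Holding the words fixed, longer sentences raise the estimated grade level.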

In [None]:
flesch_kincaid_grade = textstat.flesch_kincaid_grade(text)
print("Flesch-Kincaid Grade Level:", flesch_kincaid_grade)

In [None]:
from IPython.display import Image
Image(width=800, filename='Flesch-Kincaid.png')

### Gunning Fog Index
Gunning Fog Index: Estimates the years of formal education needed to understand the text on the first reading. Higher scores indicate harder texts. For instance, a fog index of 12 requires the reading level of a United States high school senior (around 18 years old).
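
The usual form of the index is 0.4 × [(words / sentences) + 100 × (complex words / words)], where "complex" words are generally those with three or more syllables. A small sketch with hypothetical counts (how `textstat` identifies complex words may differ in detail):

```python
def gunning_fog_sketch(n_words, n_sentences, n_complex_words):
    """Gunning Fog formula applied to raw counts."""
    avg_sentence_length = n_words / n_sentences
    pct_complex = 100 * n_complex_words / n_words
    return 0.4 * (avg_sentence_length + pct_complex)

# Hypothetical counts: 100 words, 5 sentences, 10 complex words
fog = gunning_fog_sketch(100, 5, 10)
print(fog)
```

With these made-up counts the index comes out at the high-school-senior level mentioned above.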

In [None]:
textstat.gunning_fog(text)

In [None]:
from IPython.display import Image
Image(width=300, filename='Gunning-Fog.png')

### SMOG Index - "Simple Measure of Gobbledygook"

The SMOG index is a measure of readability that estimates the years of education needed to understand a piece of writing.

It is commonly used in the health care sector.
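
For reference, the commonly cited SMOG formula is 1.0430 × sqrt(polysyllables × 30 / sentences) + 3.1291, where polysyllables are words with three or more syllables. The sketch below applies it to hypothetical counts; it is an illustration of the formula, not a reproduction of the `textstat` internals:

```python
import math

def smog_sketch(n_polysyllables, n_sentences):
    """Commonly cited SMOG formula applied to raw counts."""
    return 1.0430 * math.sqrt(n_polysyllables * (30 / n_sentences)) + 3.1291

# Hypothetical counts: 100 polysyllabic words found in 30 sentences
smog = smog_sketch(100, 30)
print(round(smog, 2))
```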

In [None]:
smog_index = textstat.smog_index(text)
print("SMOG Index:", smog_index)
# Other metrics can be calculated in a similar manner

# Sentiment

##### Using VADER
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool.

It returns positive, negative, neutral, and compound scores, where the compound score summarizes the overall sentiment.

In [None]:
#%%time
#!pip install vaderSentiment --quiet

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.

It is also useful for researchers who would like to set standardized thresholds for classifying sentences as positive, neutral, or negative. Typical threshold values (as suggested in the VADER documentation) are:

- positive sentiment: compound score >= 0.05
- neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
- negative sentiment: compound score <= -0.05
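
The normalization and the thresholds above can be sketched in a few lines. The `alpha=15` constant reflects my understanding of the vaderSentiment source and should be treated as an assumption; the raw scores in the loop are made up, and `label_sentiment` is a hypothetical helper, not part of the library:

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded summed valence score into the open interval (-1, 1)."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

def label_sentiment(compound, threshold=0.05):
    """Apply the typical +/-0.05 cutoffs to a compound score."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Hypothetical summed valence scores
for raw in (-4.0, 0.1, 2.5):
    compound = normalize(raw)
    print(raw, round(compound, 4), label_sentiment(compound))
```

In practice you would pass `sentiment_dict['compound']` from `polarity_scores` straight to a labeling function like this one.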

In [None]:
%%time
sentiment_dict = analyzer.polarity_scores(text)
print(sentiment_dict)

In [None]:
print("Overall sentiment dictionary is : ", sentiment_dict)
print("text was rated as ", sentiment_dict['neg']*100, "% Negative")
print("text was rated as ", sentiment_dict['neu']*100, "% Neutral")
print("text was rated as ", sentiment_dict['pos']*100, "% Positive")

# Text Similarity

https://spotintelligence.com/2022/12/19/text-similarity-python/

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

##### Calculate TF-IDF Vectors
TF-IDF is a numerical statistic that reflects how important a word is to a document in a collection of documents. It's a common algorithm used in text mining and information retrieval.

You'll need the TfidfVectorizer from sklearn.feature_extraction.text to convert the text data into TF-IDF vectors.
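
To see what a TF-IDF vector contains, here is a pure-Python sketch on two tiny made-up documents. It uses the smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, which is how the scikit-learn documentation describes the `TfidfVectorizer` defaults. Tokenization here is a plain `split()`, so results on real text will not match scikit-learn exactly:

```python
import math
from collections import Counter

# Hypothetical mini-corpus, used only for illustration
docs = [
    "supply chain risk and battery risk",
    "battery supply and demand",
]
tokenized = [d.split() for d in docs]
vocab = sorted(set(w for doc in tokenized for w in doc))
n_docs = len(tokenized)

# Document frequency: in how many documents does each word appear?
doc_freq = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}

# Smoothed inverse document frequency: rarer words get larger weights
idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}

def tfidf_vector(tokens):
    """Term count times IDF for every vocabulary word, scaled to unit length."""
    counts = Counter(tokens)
    raw = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]

vectors = [tfidf_vector(doc) for doc in tokenized]
for word, weight in zip(vocab, vectors[0]):
    print(f"{word:8s} {weight:.3f}")
```

Words that are frequent in one document but absent from the other ("risk" here) receive the largest weights, while words shared by both documents are down-weighted.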

In [None]:
pd.set_option('display.max_colwidth', 150)

In [None]:
df

In [None]:
# Extract the Risk Factors sections
text_2023 = df[(df['Reporting Period'] == '20231231')&(df['ticker']=='TSLA')]['1A_Text'].iloc[0]
text_2022 = df[(df['Reporting Period'] == '20221231')&(df['ticker']=='TSLA')]['1A_Text'].iloc[0]

# Create a TfidfVectorizer object
vectorizer = TfidfVectorizer()

# Convert the text to TF-IDF vectors
tfidf_matrix = vectorizer.fit_transform([text_2023, text_2022])

##### Compute Cosine Similarity
After converting the text sections into TF-IDF vectors, you can use cosine similarity to determine how similar these vectors (and thus the texts) are to each other. Cosine similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space, which in this context represents the similarity between two documents.
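
The calculation itself is short: the dot product of the two vectors divided by the product of their lengths. A minimal sketch with made-up three-dimensional vectors (`cosine` is a hypothetical helper, not the scikit-learn function):

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical term-weight vectors
same_direction = cosine([1, 2, 0], [2, 4, 0])  # same mix of terms, different length
no_overlap = cosine([1, 0, 0], [0, 3, 0])      # no terms in common
partial = cosine([1, 1, 0], [1, 0, 1])         # one of two terms shared

print(round(same_direction, 3), round(no_overlap, 3), round(partial, 3))
```

Because the measure depends only on direction, a document and a longer document with the same relative term weights are treated as identical.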

In [None]:
%%time
# Compute the cosine similarity
similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])

print(f"Cosine similarity between the Risk Factors sections of 2023 and 2022: {similarity[0][0]}", '\n')

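<br>For TF-IDF vectors, whose entries are never negative, the resulting cosine similarity ranges from 0 to 1. A value of 1 indicates documents with identical relative term weightings, and a value of 0 indicates that the two documents are orthogonal (they share no terms). In general, cosine similarity can range from -1 to 1, but negative values cannot occur here.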

In [None]:
cosine_similarity(tfidf_matrix)[0]

# Gauge Plots - Summarize Similarity, Sentiment, and Readability

https://plotly.com/python/gauge-charts/

In [None]:
import plotly.graph_objects as go
import plotly.io as pio
#pio.renderers.default = "plotly_mimetype+notebook"
#pio.renderers.default = "plotly_mimetype+notebook_connected" #FOR LOCAL JUPYTER NOTEBOOK
pio.renderers.default = 'colab'

In [None]:
from plotly.subplots import make_subplots

# traces with separate domains to form a subplot
trace1 = go.Indicator(mode="gauge+number",    value=similarity[0][0]*100,
                      domain={'row' : 1, 'column' : 1},  title={'text': "Similarity"},
                      gauge = {'axis': {'range': [None, 100]}}
                     )

trace2 = go.Indicator(mode="gauge+number", value=sentiment_dict['neg']*100,
                      domain={'row' : 1, 'column' : 2},    title={'text': "Sentiment (% Negative)"},
                      gauge = {'axis': {'range': [None, 100]}}
                     )

trace3 = go.Indicator(mode="gauge+number", value=flesch_reading_score,
                      domain={'row' : 1, 'column' : 3},   title={'text': "Readability"},
                      gauge = {'axis': {'range': [None, 100]}}
                     )

fig1 = make_subplots(
    rows=1,
    cols=3,
    specs=[[{'type' : 'indicator'}, {'type' : 'indicator'}, {'type' : 'indicator'}]],
    )

fig1.append_trace(trace1, row=1, col=1)
fig1.append_trace(trace2, row=1, col=2)
fig1.append_trace(trace3, row=1, col=3)

fig1.show()

# Summarize

In [None]:
#%%time
#!pip install sumy --quiet

In [None]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer

# Parse the text
parser = PlaintextParser.from_string(text, Tokenizer("english"))

#### Use the different summarizers
https://miso-belica.github.io/sumy/summarizators.html

##### Luhn
Heuristic method: the simplest real-world algorithm. It was the first one known, and it is based on the assumption that the most important sentences are those with the most significant words. Significant words are those that occur often in the text but are not stop words.
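
The idea can be sketched in a simplified form: find the significant words, then score each sentence by how densely it uses them. This is not the `sumy` implementation; the stop-word list, the `min_count` cutoff, the scoring rule (significant words squared divided by sentence length), and the sample text are all illustrative assumptions:

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not NLTK's list)
STOP_WORDS = {"the", "a", "of", "and", "is", "to", "in", "our", "on", "we", "may", "are"}

def luhn_like_summary(sample, n_sentences=1, min_count=2):
    # Split into sentences at whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", sample) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]

    # Significant words: non-stop-words that occur at least min_count times
    counts = Counter(w for toks in tokenized for w in toks if w not in STOP_WORDS)
    significant = {w for w, c in counts.items() if c >= min_count}

    def score(toks):
        hits = sum(1 for w in toks if w in significant)
        return hits * hits / len(toks) if toks else 0.0

    # Rank sentences by score, keep the top ones, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])
    return [sentences[i] for i in keep]

# Hypothetical three-sentence text
sample = ("Battery supply is a key risk. "
          "We may open new stores. "
          "Battery supply risk may increase battery costs.")
print(luhn_like_summary(sample, n_sentences=1))
```

The full Luhn method scores clusters of significant words within each sentence rather than the sentence as a whole, so treat this only as intuition for what the summarizer rewards.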

In [None]:
%%time
from sumy.summarizers.luhn import LuhnSummarizer

# Using Luhn Summarizer
summarizer = LuhnSummarizer()

# Summarize the text
summary = summarizer(parser.document, 5)  # Number of sentences in the summary

#for sentence in summary:
#   print(sentence, '\n')

summary

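##### LexRank
Graph-based method: sentences are treated as nodes in a graph, and the sentences that are most similar to many other sentences are ranked as the most important.

In [None]: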
%%time
from sumy.summarizers.lex_rank import LexRankSummarizer
summarizer = LexRankSummarizer()
summary = summarizer(parser.document, 5)
#for sentence in summary:
#    print(sentence, '\n')
summary

# ChatGPT

### [PROMPT] **I am going to paste in one of the risk factors sections from an SEC 10-K filing. I'm looking for a quick rundown of the major risk factors from this SEC filing. Can you help with that?**

Certainly! Here's a quick rundown of the major risk factors from the SEC filing:

1. COVID-19 Pandemic Impact: The pandemic has significantly disrupted global supply chains, causing labor shortages, increased operational costs, and supplier delays. This includes challenges like port congestion and a semiconductor shortage, affecting production and supply chain efficiency.

2. Operational Challenges in New Factories: Scaling and managing new production facilities, such as Gigafactory Berlin-Brandenburg and Gigafactory Texas, pose risks due to regulatory, compliance, and production uncertainties.

3. Dependency on Lithium-Ion Battery Cells: The company relies heavily on lithium-ion battery cells for its products, with fluctuating costs and availability of raw materials like lithium and cobalt being a major concern.

4. Competitive Automotive Market: The company operates in a highly competitive electric vehicle market, where consumer demand and the pace of market adoption are critical to success.

5. Operational Risks: The company faces risks from potential union activities, product recalls, intellectual property disputes, and cybersecurity threats, which could impact business operations and financial stability.

These summarized points highlight the critical risks and challenges that could affect the company's business and financial outcomes as reported in the SEC filing.

### [PROMPT] **If I paste in a second filing from a year earlier (2021), can you tell me what was added to the 2022 filing/dropped from the 2021 filing? Also, can you please produce that answer in a format that I can copy and paste into a Jupyter notebook cell and keep the formatting?**

1. **Less Emphasis on COVID-19 Operational Disruptions**: The 2021 filing had detailed descriptions of operational disruptions due to COVID-19, such as temporary suspensions and furloughs, which are less emphasized or omitted in the 2022 filing, indicating a possible stabilization post the initial pandemic shock.

2. **Specific References to Geographic COVID-19 Impact**: The 2021 filing contained more specific references to the impact of COVID-19 in certain regions, like the lockdowns in Shanghai and their direct impact on operations. This seems to be less emphasized or absent in the 2022 filing.

3. **Details on Temporary Measures During Pandemic**: The 2021 filing discussed specific temporary measures like employee furloughs and salary reductions due to pandemic-related shutdowns, which are not mentioned in the 2022 filing, possibly indicating a return to more normal operations.

4. **Certain Product-specific Challenges**: The 2021 filing had more detailed mentions of challenges faced with specific products, like Model X and Model 3, and their production ramp difficulties, which are less detailed in the 2022 filing.

5. **In-depth Discussion on Vehicle Safety and Recalls**: The 2021 filing had a more detailed discussion on vehicle safety, recalls, and related liabilities, which seems to have been condensed or reduced in detail in the 2022 filing.

6. **Less Emphasis on Specific Challenges in International Expansion**: While the 2022 filing continues to mention international expansion, there's less emphasis on specific challenges faced in different international jurisdictions compared to the 2021 filing.

7. **Certain Legal and Regulatory Risks**: Some specific legal and regulatory challenges or cases that were ongoing and mentioned in the 2021 filing might have been resolved or no longer deemed significant enough to include in the 2022 filing.

8. **Detailed Discussion on Certain Financial Risks**: The 2021 filing might have contained more detailed discussions on specific financial risks, such as lease management and vehicle financing programs, which are less prominent in the 2022 filing.

9. **Specific Manufacturing Challenges and Bottlenecks**: Detailed discussions on past manufacturing bottlenecks and challenges, particularly with older models, might have been omitted in the 2022 filing, possibly due to their resolution or reduced relevance.

10. **Detailed References to Past Product Adjustments and Pricing**: The 2021 filing included specific references to adjustments in vehicle pricing and product updates, which might be less detailed or omitted in the 2022 filing, perhaps due to a shift in focus or strategy.


# **<font color='blue'>If You're Interested in Learning More about How to Use ChatGPT to Analyze Financial Information, Join Us for this Webinar on March 15th!</font>** https://centre-in-digital-financial-information.webinargeek.com/chatgpt-python-api-a-deep-dive-into-large-language-model

And for a list of all our upcoming webinars, please visit https://schulich.yorku.ca/faculty-research/cpa-ontario-centre/industry-outreach/

- February 23: Hiring Trends and Technology https://centre-in-digital-financial-information.webinargeek.com/hiring-trends-and-technology
- March 1 [RESEARCH PRESENTATION]: Cross-National Comparison of Cryptoasset Income Taxation Regimes https://centre-in-digital-financial-information.webinargeek.com/cross-national-comparison-of-cryptoasset-income-taxation-regimes
- March 15: ChatGPT Python API: A Deep Dive into Large Language Model https://centre-in-digital-financial-information.webinargeek.com/chatgpt-python-api-a-deep-dive-into-large-language-model
- March 29 [RESEARCH PRESENTATION]: From Numbers to Stories: Accounting Numbers and Executive Compensation Narratives https://centre-in-digital-financial-information.webinargeek.com/from-numbers-to-stories-accounting-numbers-and-executive-compensation-narratives
- April 19: Innovations in Fraud Detection https://centre-in-digital-financial-information.webinargeek.com/accounting-analytics
- May 17: Introduction to Social Network Analysis https://centre-in-digital-financial-information.webinargeek.com/intro-to-social-network-analysis
- May 31 [RESEARCH PRESENTATION]: The Effects of Ego Network Structure on Market Reactions: A Social Network Analysis Perspective of Twitter Cashtag Networks https://centre-in-digital-financial-information.webinargeek.com/the-effects-of-ego-network-structure-on-market-reactions-a-social-network-analysis-perspective

All webinars are Fridays at noon Toronto time.

# Some Additional Readings and Coding Resources


https://www.sfmagazine.com/articles/2021/june/textual-analysis-for-accountants/

https://www.vscpa.com/article/text-analytics-and-accounting-profession

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4029950

https://www.cpajournal.com/2020/07/15/textual-analysis-for-risk-profiles-from-10-k-filings/

https://publications.aaahq.org/iae/article/37/1/141/8264/Put-Your-Best-Text-Forward-Introducing-Textual

https://github.com/jtkiley/2020_carma_python


<br>

For more Notebooks as well as additional Python and Big Data tutorials, please visit http://social-metrics.org or follow me on Twitter <a href='https://twitter.com/gregorysaxton'>@gregorysaxton</a>