### Parsing SEC filings with SEC Parsers

Import the library and define a small helper that prints the first n lines of a string.

In [1]:
from sec_parsers import Filing, download_sec_filing, set_headers

def print_first_n_lines(text, n):
    lines = text.split('\n')
    for line in lines[:n]:
        print(line)
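
If you would rather get the lines back as a string than print them, for example to log or test them, a small variant can return them instead. This is a sketch; first_n_lines is a hypothetical name and is not part of sec_parsers.

```python
from itertools import islice


def first_n_lines(text: str, n: int) -> str:
    """Return the first n lines of text as a single newline-joined string."""
    # splitlines() also handles '\r\n' line endings, unlike split('\n')
    return "\n".join(islice(text.splitlines(), n))


sample = "root\ndocument\n|-introduction\n|-part"
print(first_n_lines(sample, 2))
```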

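Set the request headers, then download Tesla's 2023 10-K filing. set_headers takes a name and an email address, which identify you to SEC EDGAR (the SEC asks automated clients to declare who they are).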

In [2]:
set_headers("John Test", "johntest@example.com")
html = download_sec_filing('https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm')
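
The filing URL follows EDGAR's archive layout: the path holds the company's CIK (1318605 for Tesla) and the accession number with its dashes removed. The sketch below pulls those pieces out using only the standard library. parse_edgar_url is a hypothetical helper, and it assumes the /Archives/edgar/data/<CIK>/<accession>/<file> layout seen in the URL above.

```python
from urllib.parse import urlparse


def parse_edgar_url(url: str) -> dict:
    """Split an EDGAR archive URL into CIK, accession number, and file name.

    Assumes the path layout /Archives/edgar/data/<CIK>/<accession>/<file>.
    """
    parts = urlparse(url).path.strip("/").split("/")
    if parts[:3] != ["Archives", "edgar", "data"] or len(parts) != 6:
        raise ValueError(f"unexpected EDGAR path: {url}")
    cik, raw_accession, filename = parts[3], parts[4], parts[5]
    # Accession numbers are written as 10-2-6 digits, e.g. 0001628280-24-002390
    accession = f"{raw_accession[:10]}-{raw_accession[10:12]}-{raw_accession[12:]}"
    return {"cik": cik, "accession": accession, "filename": filename}


info = parse_edgar_url(
    "https://www.sec.gov/Archives/edgar/data/1318605/000162828024002390/tsla-20231231.htm"
)
print(info)
```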

Create a Filing object and examine its html attribute, which holds the filing's parsed HTML tree.

In [3]:
filing = Filing(html)
filing.html

<Element html at 0x1af5f3f2780>

Parse the filing and examine the new filing.xml attribute.

In [4]:
filing.parse()  # build the XML section tree from the HTML
filing.xml

<Element root at 0x1af5f0d4080>

Print the first 10 lines of the tag tree. Each |- marks one level of nesting.

In [5]:
print_first_n_lines(filing.get_tree(),10)

root
document
|-introduction
|-part
|-|-item
|-|-|-company_designated_header
|-|-|-company_designated_header
|-|-|-|-company_designated_header
|-|-|-|-|-company_designated_header
|-|-|-|-|-|-company_designated_header
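
To show how an indented listing like this can be built, here is a sketch that walks an XML tree with the standard library's xml.etree.ElementTree and prefixes each tag with one |- per level of nesting. The sample XML is made up, and get_tree's exact indentation rules may differ (in the output above, document is printed at the same level as root).

```python
import xml.etree.ElementTree as ET


def tag_tree(element: ET.Element, depth: int = 0) -> str:
    """Return one line per element: '|-' repeated once per nesting level, then the tag."""
    lines = ["|-" * depth + element.tag]
    for child in element:
        lines.append(tag_tree(child, depth + 1))
    return "\n".join(lines)


# Hypothetical miniature of a parsed filing
doc = ET.fromstring(
    "<document><introduction/><part><item>"
    "<company_designated_header/></item></part></document>"
)
print(tag_tree(doc))
```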


Titles are more descriptive than tags, so let's print the title tree instead.

In [6]:
print_first_n_lines(filing.get_title_tree(),20)


Document
|-introduction
|-PART I
|-|-ITEM 1. BUSINESS
|-|-|-Overview
|-|-|-Segment Information
|-|-|-|-Our Products and Services
|-|-|-|-|-Automotive
|-|-|-|-|-|-Energy Generation and Storage
|-|-|-|-|-|-|-Energy Storage Products
|-|-|-Solar Energy Offerings
|-|-|-|-Technology
|-|-|-|-|-Automotive
|-|-|-|-|-|-Battery and Powertrain
|-|-|-|-|-|-Vehicle Control and Infotainment Software
|-|-|-|-|-Self-Driving Development and Artificial Intelligence
|-|-|-|-|-|-Energy Generation and Storage
|-|-|-|-|-|-|-Energy Storage Products
|-|-|-Solar Energy Systems

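
Because get_title_tree returns plain text, you can recover each title's nesting depth by counting its leading |- markers. The sketch below does that; parse_tree_lines is a hypothetical helper, and it assumes each level adds exactly one |- prefix, as in the output above.

```python
def parse_tree_lines(tree_text: str) -> list[tuple[int, str]]:
    """Convert '|-|-Title' style lines into (depth, title) pairs."""
    pairs = []
    for line in tree_text.splitlines():
        depth = 0
        # Strip one '|-' marker per nesting level
        while line.startswith("|-"):
            line = line[2:]
            depth += 1
        if line:  # skip blank lines
            pairs.append((depth, line))
    return pairs


sample = "Document\n|-PART I\n|-|-ITEM 1. BUSINESS\n|-|-|-Overview"
for depth, title in parse_tree_lines(sample):
    print(depth, title)
```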

Find a section by its title with find_section_from_title. To get every matching section as a list, use find_all_sections_from_title.

In [7]:
item1a = filing.find_section_from_title('item 1a')
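
Here the lowercase query 'item 1a' finds the section titled ITEM 1A. RISK FACTORS, so at least in this case matching ignores case. As a rough illustration of this kind of lookup (not sec_parsers' actual matching rules), the sketch below searches an ElementTree for elements whose title attribute contains the query, ignoring case. The sample XML and the find_sections_by_title name are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_sections_by_title(root: ET.Element, query: str) -> list[ET.Element]:
    """Return every element whose 'title' attribute contains query, ignoring case."""
    needle = query.casefold()
    return [el for el in root.iter() if needle in el.get("title", "").casefold()]


root = ET.fromstring(
    "<document>"
    '<part title="PART I">'
    '<item title="ITEM 1. BUSINESS"/>'
    '<item title="ITEM 1A. RISK FACTORS"/>'
    "</part>"
    "</document>"
)
matches = find_sections_by_title(root, "item 1a")
print([el.get("title") for el in matches])
```

A plain substring test means a query like "item 1" would also match ITEM 1A, which is one reason a real implementation may need stricter matching.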

SEC Parsers moves each section's title into the section's title attribute and removes it from the body text. You can still read a node's text with standard XML methods, but the result will be missing the titles of the node and its children.

To keep the titles, use get_text_from_section. Setting include_title=True includes the section's own title; it defaults to False. Subsection titles are always included.

In [8]:
item1a_text = filing.get_text_from_section(item1a,include_title=True)
print_first_n_lines(item1a_text,10)

ITEM 1A. RISK FACTORS

You should carefully consider the risks described below together with the other information set forth in this report, which could materially affect our business, financial condition and future results. The risks described below are not the only risks facing our company. Risks and uncertainties not currently known to us or that we currently deem to be immaterial also may materially adversely affect our business, financial condition and operating results.

Risks Related to Our Ability to Grow Our Business

We may experience delays in launching and ramping the production of our products and features, or we may be unable to control our manufacturing costs.
We have previously experienced and may in the future experience launch and production ramp delays for new products and features. For example, we encountered unanticipated supplier issues that led to delays during the initial ramp of our first Model X and experienced challenges with a supplier and with ramping full 
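
To see why plain XML text extraction drops titles, consider a tree where each section's title lives in a title attribute, as described above. In the sketch below, ElementTree's itertext() returns only the body text, while the hypothetical text_with_titles helper re-inserts subsection titles and, optionally, the top-level title, mimicking the include_title behavior described above. It is an illustration, not sec_parsers' implementation.

```python
import xml.etree.ElementTree as ET


def text_with_titles(section: ET.Element, include_title: bool = False) -> str:
    """Join a section's text, inserting the 'title' attribute of every subsection.

    The section's own title is included only when include_title is True.
    """
    parts = []
    if include_title and section.get("title"):
        parts.append(section.get("title"))
    if section.text and section.text.strip():
        parts.append(section.text.strip())
    for child in section:
        # Subsection titles are always included
        parts.append(text_with_titles(child, include_title=True))
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return "\n".join(p for p in parts if p)


item = ET.fromstring(
    '<item title="ITEM 1A. RISK FACTORS">You should carefully consider the risks.'
    '<header title="Risks Related to Our Ability to Grow Our Business">'
    "We may experience delays.</header></item>"
)
print("".join(item.itertext()))  # titles are missing
print(text_with_titles(item, include_title=True))
```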

Find sections by the text they contain. find_all_sections_from_text returns a list, so take the first match with [0].

In [9]:
section = filing.find_all_sections_from_text("Moreover, significant increases in our production or product design changes by us have required and may in the future require us to procure additional components in a short amoun")[0]
section_text = filing.get_text_from_section(section,include_title=False)
print_first_n_lines(section_text,10)

Our products contain thousands of parts purchased globally from hundreds of suppliers, including single-source direct suppliers, which exposes us to multiple potential sources of component shortages. Unexpected changes in business conditions, materials pricing, including inflation of raw material costs, labor issues, wars, trade policies, natural disasters, health epidemics such as the global COVID-19 pandemic, trade and shipping disruptions, port congestions, cyberattacks and other factors beyond our or our suppliers’ control could also affect these suppliers’ ability to deliver components to us or to remain solvent and operational. For example, a global shortage of semiconductors beginning in early 2021 has caused challenges in the manufacturing industry and impacted our supply chain and production. Additionally, if our suppliers do not accurately forecast and effectively allocate production or if they are not willing to allocate sufficient production to us, or face other challenges 
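
The search string in the cell above is cut off mid-word ("short amoun") and still finds a section, which suggests a substring match. Indexing the result with [0] raises IndexError when nothing matches. The sketch below assumes substring matching on each element's own text and guards against an empty result; find_sections_by_text and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_sections_by_text(root: ET.Element, query: str) -> list[ET.Element]:
    """Return elements whose own text (excluding children) contains query."""
    return [el for el in root.iter() if el.text and query in el.text]


root = ET.fromstring(
    '<item title="ITEM 1A. RISK FACTORS">'
    '<header title="Supply chain">Our products contain thousands of parts. '
    "Moreover, significant increases in our production may require "
    "additional components.</header>"
    '<header title="Other">Unrelated text.</header>'
    "</item>"
)
matches = find_sections_by_text(root, "Moreover, significant increases in our produ")
# Guard against an empty result instead of indexing [0] directly
section = matches[0] if matches else None
print(section.get("title") if section is not None else "no match")
```

The check uses `is not None` rather than truthiness because an ElementTree element with no children has historically evaluated as false.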

Save the parsed data as XML or CSV. The CSV file will be larger because it duplicates data. Supported encodings are utf-8 and ascii.

In [10]:
filing.save_xml('tesla_10k.xml', encoding='utf-8')
filing.save_csv('tesla_10k.csv', encoding='ascii')
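
One plausible source of the size difference is that a flat CSV has to repeat context, such as each row's chain of ancestor titles, which XML expresses once through nesting. The sketch below assumes such a layout; flatten is a hypothetical helper, and sec_parsers' actual CSV columns may differ. It also shows that the curly apostrophe (’) in the filing text cannot be encoded as ASCII without an error handler. How save_csv handles such characters is not covered here.

```python
import csv
import io
import xml.etree.ElementTree as ET


def flatten(root: ET.Element) -> list[list[str]]:
    """Hypothetical flat layout: one row per section with its full title path and own text."""
    rows: list[list[str]] = []

    def walk(el: ET.Element, path: list[str]) -> None:
        path = path + [el.get("title", el.tag)]
        rows.append([" > ".join(path), (el.text or "").strip()])
        for child in el:
            walk(child, path)

    walk(root, [])
    return rows


root = ET.fromstring(
    '<part title="PART I"><item title="ITEM 1A. RISK FACTORS">Intro.'
    '<header title="Supply">Our suppliers\u2019 ability.</header></item></part>'
)
buf = io.StringIO()
csv.writer(buf).writerows(flatten(root))
text = buf.getvalue()
print(text)  # ancestor titles repeat on every row

# U+2019 is not ASCII, so strict encoding fails; errors="replace" substitutes '?'
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print(text.encode("ascii", errors="replace").decode("ascii"))
```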

Visualize the parsed filing in a web browser.

In [11]:
filing.visualize()