# Parsing SEC Filing XBRL Document


## Objective

Parse the XBRL file of an SEC filing into a DOM-like structure that represents the filing data.

## References

* [XBRL Specification - Extensible Business Reporting Language (XBRL) 2.1](https://www.xbrl.org/Specification/XBRL-2.1/REC-2003-12-31/XBRL-2.1-REC-2003-12-31+corrected-errata-2013-02-20.html)

* [XBRL US - List of Elements](https://xbrl.us/data-rule/dqc_0015-le/)

**Element Version**|**Element ID**|**Namespace**|**Element Label**|**Element Name**|**Balance Type**|**Definition**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
1|1367|us-gaap|Interest Expense|InterestExpense|debit|Amount of the cost of borrowed funds accounted for as interest expense.
2|2692|us-gaap|Cash and Cash Equivalents, at Carrying Value|CashAndCashEquivalentsAtCarryingValue|debit|Amount of currency on hand as well as demand deposits with banks or financial institutions. Includes other kinds of accounts that have the general characteristics of demand deposits. Also includes short-term, highly liquid investments that are both readily convertible to known amounts of cash and so near their maturity that they present insignificant risk of changes in value because of changes in interest rates. Excludes cash and cash equivalents within disposal group and discontinued operation.

## XBRL Element

* [Understanding the Financial Report Logical System](https://www.youtube.com/playlist?list=PLqMZRUzQ64B7EWamzDP-WaYbS_W0RL9nt)

### Example
For instance, the [Qorvo 2020 10-K](https://www.sec.gov/Archives/edgar/data/1604778/000160477821000032/rfmd-20210403_htm.xml) contains:

```
<us-gaap:cashandcashequivalentsatcarryingvalue contextref="*" decimals="-3" id="..." unitref="usd">
  1397880000
</us-gaap:cashandcashequivalentsatcarryingvalue>
<us-gaap:cashandcashequivalentsatcarryingvalue contextref="***" decimals="-3" id="..." unitref="usd">
  714939000
</us-gaap:cashandcashequivalentsatcarryingvalue>
<us-gaap:cashandcashequivalentsatcarryingvalue contextref="***" decimals="-3" id="..." unitref="usd">
  711035000
</us-gaap:cashandcashequivalentsatcarryingvalue>
```

These facts correspond to the Cash and Cash Equivalents line in the Cash Flow statement.

<img src="../image/edgar_qorvo_2020_10K_CF.png" align="left" width=800 />

---
# Setup

In [69]:
import re
import requests
import unicodedata
from bs4 import BeautifulSoup
# IPython.display is the public import path; IPython.core.display is deprecated for these names.
from IPython.display import (
    display,
    HTML
)

In [70]:
%%html
<style>
table {float:left}
</style>

In [71]:
def restore_windows_1252_characters(restore_string):
    """
        Replace C1 control characters in the Unicode string `restore_string`
        with the characters at the corresponding code points in Windows-1252,
        where possible.
    """

    def to_windows_1252(match):
        try:
            return bytes([ord(match.group(0))]).decode('windows-1252')
        except UnicodeDecodeError:
            # No character at the corresponding code point: remove it.
            return ''
        
    # C1 control characters span U+0080 through U+009F.
    return re.sub(r'[\u0080-\u009f]', to_windows_1252, restore_string)
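
The helper above targets a common kind of mojibake: bytes that were written as Windows-1252 but decoded as Latin-1, which turns curly quotes and dashes into invisible C1 control characters. The following is a minimal sketch of that situation, using a made-up sample string rather than text from the filing. It applies the same per-character conversion inline so that it runs on its own.

```python
# Hypothetical sample: curly quotes (0x93, 0x94) and an en dash (0x96) in Windows-1252 bytes.
raw = b"\x93Qorvo\x94 \x96 10-K"

# Decoding as Latin-1 maps those bytes to C1 control characters (U+0093, U+0094, U+0096).
as_latin1 = raw.decode("latin-1")

# Decoding as Windows-1252 gives the intended punctuation.
as_cp1252 = raw.decode("windows-1252")

# Per-character repair: re-interpret each C1 code point as a Windows-1252 byte.
repaired = "".join(
    bytes([ord(ch)]).decode("windows-1252") if "\u0080" <= ch <= "\u009f" else ch
    for ch in as_latin1
)

print(repr(as_latin1))
print(repaired)
```

Unlike the helper, this sketch does not handle the few code points that Windows-1252 leaves undefined (such as 0x81); the helper removes those.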

---
# Load EDGAR Filing XBRL

Download the ```_htm.xml``` file from EDGAR. The SEC now requires a ```User-Agent``` header on every request.

In [72]:
# Define the URL of the XBRL instance document (_htm.xml) for a specific filing
CIK = '1604778'
ACCESSION = '000160477821000032'

FILING_DIR_URL = f"https://www.sec.gov/Archives/edgar/data/{CIK}/{ACCESSION}"
XBRL_NAME = "rfmd-20210403_htm.xml"
XBRL_URL = "/".join([FILING_DIR_URL, XBRL_NAME])

XBRL_URL

'https://www.sec.gov/Archives/edgar/data/1604778/000160477821000032/rfmd-20210403_htm.xml'
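
EDGAR usually displays accession numbers with dashes (for example `0001604778-21-000032`) and CIKs padded with leading zeros, while the archive path used above has neither. As a rough sketch, a small helper can normalise both before building the URL. The name `filing_dir_url` is a hypothetical helper introduced here for illustration, not part of any library.

```python
def filing_dir_url(cik, accession):
    """Build the EDGAR archive directory URL for one filing (illustrative helper)."""
    cik_path = str(int(cik))                      # drop leading zeros from the CIK
    accession_path = accession.replace("-", "")   # drop dashes from the accession number
    return f"https://www.sec.gov/Archives/edgar/data/{cik_path}/{accession_path}"


url = "/".join([
    filing_dir_url("0001604778", "0001604778-21-000032"),
    "rfmd-20210403_htm.xml",
])
print(url)
```

With the padded CIK and dashed accession number of this filing, the helper reproduces the same `XBRL_URL` as the cell above.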

In [73]:
headers = {"User-Agent": "Company Name myname@company.com"}
response = requests.get(XBRL_URL, headers=headers)

if response.status_code == 200:
    content = response.content.decode("utf-8")
else:
    # Fail loudly so later cells do not run against a missing or stale `content`.
    raise RuntimeError(f"{XBRL_URL} failed with status {response.status_code}")

In [43]:
soup = BeautifulSoup(content, 'html.parser')

## Cash & Cash Equivalents

Look for cash and cash equivalents, which the 10-K reports in both the Balance Sheet and the Cash Flow statement.

In [68]:
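# html.parser lower-cases tag and attribute names, so the pattern and attributes are in lower case.
cash_equivalents = soup.find_all(name=re.compile('us-gaap:cashandcashequivalent.*'), decimals=True, unitref=True)
cash_equivalents # decimals="-3" means the value is accurate to the nearest 1000; it is not a scaling factor.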

[<us-gaap:cashandcashequivalentsatcarryingvalue contextref="i531402faf1d04969ac2b2ba0e1680766_I20210403" decimals="-3" id="id3VybDovL2RvY3MudjEvZG9jOjEyODM3MDU1ODhkNzQyNjY4MWYxNTY3ZWE2MTZhZjBhL3NlYzoxMjgzNzA1NTg4ZDc0MjY2ODFmMTU2N2VhNjE2YWYwYV83MC9mcmFnOmI2YjIwMjNiYjk2NzRmMjZiZTkzNzVhYjYxNzk3YmMwL3RhYmxlOjYxNmIyYzU1MGNkNTQwNmRiYjk5Zjk5YzIyYjNiMmUwL3RhYmxlcmFuZ2U6NjE2YjJjNTUwY2Q1NDA2ZGJiOTlmOTljMjJiM2IyZTBfMy0xLTEtMS0w_b08df05f-df5e-45b4-ba6f-72638eca470f" unitref="usd">1397880000</us-gaap:cashandcashequivalentsatcarryingvalue>,
 <us-gaap:cashandcashequivalentsatcarryingvalue contextref="i61ca1a85fc064eb1a4858bc0478ba964_I20200328" decimals="-3" id="id3VybDovL2RvY3MudjEvZG9jOjEyODM3MDU1ODhkNzQyNjY4MWYxNTY3ZWE2MTZhZjBhL3NlYzoxMjgzNzA1NTg4ZDc0MjY2ODFmMTU2N2VhNjE2YWYwYV83MC9mcmFnOmI2YjIwMjNiYjk2NzRmMjZiZTkzNzVhYjYxNzk3YmMwL3RhYmxlOjYxNmIyYzU1MGNkNTQwNmRiYjk5Zjk5YzIyYjNiMmUwL3RhYmxlcmFuZ2U6NjE2YjJjNTUwY2Q1NDA2ZGJiOTlmOTljMjJiM2IyZTBfMy0zLTEtMS0w_087d6bff-c391-45cf-b2dc-d365b3aca5d7" unitref=

In [77]:
cash_equivalents[0].attrs

{'contextref': 'i531402faf1d04969ac2b2ba0e1680766_I20210403',
 'decimals': '-3',
 'id': 'id3VybDovL2RvY3MudjEvZG9jOjEyODM3MDU1ODhkNzQyNjY4MWYxNTY3ZWE2MTZhZjBhL3NlYzoxMjgzNzA1NTg4ZDc0MjY2ODFmMTU2N2VhNjE2YWYwYV83MC9mcmFnOmI2YjIwMjNiYjk2NzRmMjZiZTkzNzVhYjYxNzk3YmMwL3RhYmxlOjYxNmIyYzU1MGNkNTQwNmRiYjk5Zjk5YzIyYjNiMmUwL3RhYmxlcmFuZ2U6NjE2YjJjNTUwY2Q1NDA2ZGJiOTlmOTljMjJiM2IyZTBfMy0xLTEtMS0w_b08df05f-df5e-45b4-ba6f-72638eca470f',
 'unitref': 'usd'}
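
In XBRL 2.1 the `decimals` attribute states how precise a reported number is: a value of `n` means the number is accurate to `n` decimal places, so `-3` means accurate to the nearest thousand, and `INF` means exact. The following is a minimal sketch of reading a fact's text and `decimals` together. The function name `fact_precision` is a hypothetical helper chosen for illustration.

```python
from decimal import Decimal


def fact_precision(value_text, decimals_text):
    """Return (value, rounding_unit); rounding_unit is None when decimals is INF."""
    value = Decimal(value_text.strip())
    if decimals_text == "INF":
        return value, None
    # decimals="-3" -> 10**3, decimals="2" -> 10**-2
    rounding_unit = Decimal(10) ** (-int(decimals_text))
    return value, rounding_unit


value, unit = fact_precision("1397880000", "-3")
print(value, unit)
```

Note that the value in the instance document is already the full amount in USD; `decimals` only describes its precision.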

In [85]:
for element in cash_equivalents:
    print(f"{element.name} {element.text:15} {element['unitref']:5} {element['decimals']:5}")

us-gaap:cashandcashequivalentsatcarryingvalue 1397880000      usd   -3   
us-gaap:cashandcashequivalentsatcarryingvalue 714939000       usd   -3   
us-gaap:cashandcashequivalentsatcarryingvalue 1397880000      usd   -3   
us-gaap:cashandcashequivalentsatcarryingvalue 714939000       usd   -3   
us-gaap:cashandcashequivalentsatcarryingvalue 711035000       usd   -3   
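
The same element name appears several times because each fact is tied to a context through `contextref`, and the context carries the reporting date or period. The sketch below shows one way to resolve a fact to its period. It uses the standard library's `xml.etree.ElementTree` instead of BeautifulSoup so that it runs without extra dependencies and keeps the original element case. The sample document, the context ids, and the `us-gaap` namespace URI are made-up placeholders, not copied from the Qorvo filing.

```python
import xml.etree.ElementTree as ET

# Illustrative instance fragment; ids, dates and the us-gaap URI are placeholders.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://example.com/us-gaap">
  <context id="ctx_instant">
    <entity><identifier scheme="http://www.sec.gov/CIK">0001604778</identifier></entity>
    <period><instant>2021-04-03</instant></period>
  </context>
  <context id="ctx_duration">
    <entity><identifier scheme="http://www.sec.gov/CIK">0001604778</identifier></entity>
    <period><startDate>2020-03-29</startDate><endDate>2021-04-03</endDate></period>
  </context>
  <us-gaap:CashAndCashEquivalentsAtCarryingValue contextRef="ctx_instant"
      decimals="-3" unitRef="usd">1397880000</us-gaap:CashAndCashEquivalentsAtCarryingValue>
</xbrl>"""

XBRLI = {"xbrli": "http://www.xbrl.org/2003/instance"}


def context_periods(root):
    """Map each context id to (instant,) or (start_date, end_date)."""
    periods = {}
    for ctx in root.findall("xbrli:context", XBRLI):
        instant = ctx.find("xbrli:period/xbrli:instant", XBRLI)
        if instant is not None:
            periods[ctx.get("id")] = (instant.text.strip(),)
            continue
        start = ctx.find("xbrli:period/xbrli:startDate", XBRLI)
        end = ctx.find("xbrli:period/xbrli:endDate", XBRLI)
        if start is not None and end is not None:
            periods[ctx.get("id")] = (start.text.strip(), end.text.strip())
    return periods


root = ET.fromstring(SAMPLE)
periods = context_periods(root)

# Pair every cash fact with the period of its context.
facts = [
    (child.text.strip(), periods[child.get("contextRef")])
    for child in root
    if child.tag.endswith("}CashAndCashEquivalentsAtCarryingValue")
]
print(facts)
```

Contexts can also carry dimensional segments (for example a breakdown by business segment), which this sketch ignores.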


---

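# Load EDGAR Inline XBRL (HTML)

Download the ```.htm``` file from EDGAR. This is the Inline XBRL version of the filing, in which the facts are embedded in the HTML as ```ix:``` elements.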

In [87]:
XBRL_HTML_NAME = "rfmd-20210403.htm"
XBRL_HTML_URL = "/".join([FILING_DIR_URL, XBRL_HTML_NAME])

XBRL_HTML_URL

'https://www.sec.gov/Archives/edgar/data/1604778/000160477821000032/rfmd-20210403.htm'

In [88]:
headers = {"User-Agent": "Company Name myname@company.com"}
response = requests.get(XBRL_HTML_URL, headers=headers)

if response.status_code == 200:
    content = response.content.decode("utf-8")
else:
    # Fail loudly so later cells do not run against a missing or stale `content`.
    raise RuntimeError(f"{XBRL_HTML_URL} failed with status {response.status_code}")

In [89]:
soup = BeautifulSoup(content, 'html.parser')

In [104]:
cash_equivalents = soup.find_all(attrs={
    "name" : "us-gaap:CashAndCashEquivalentsAtCarryingValue",
    "decimals": True,
    "unitref": True
})
cash_equivalents  # scale="3" means the displayed number is multiplied by 10**3 to get the reported value.

[<ix:nonfraction contextref="i531402faf1d04969ac2b2ba0e1680766_I20210403" decimals="-3" format="ixt:numdotdecimal" id="id3VybDovL2RvY3MudjEvZG9jOjEyODM3MDU1ODhkNzQyNjY4MWYxNTY3ZWE2MTZhZjBhL3NlYzoxMjgzNzA1NTg4ZDc0MjY2ODFmMTU2N2VhNjE2YWYwYV83MC9mcmFnOmI2YjIwMjNiYjk2NzRmMjZiZTkzNzVhYjYxNzk3YmMwL3RhYmxlOjYxNmIyYzU1MGNkNTQwNmRiYjk5Zjk5YzIyYjNiMmUwL3RhYmxlcmFuZ2U6NjE2YjJjNTUwY2Q1NDA2ZGJiOTlmOTljMjJiM2IyZTBfMy0xLTEtMS0w_b08df05f-df5e-45b4-ba6f-72638eca470f" name="us-gaap:CashAndCashEquivalentsAtCarryingValue" scale="3" unitref="usd">1,397,880</ix:nonfraction>,
 <ix:nonfraction contextref="i61ca1a85fc064eb1a4858bc0478ba964_I20200328" decimals="-3" format="ixt:numdotdecimal" id="id3VybDovL2RvY3MudjEvZG9jOjEyODM3MDU1ODhkNzQyNjY4MWYxNTY3ZWE2MTZhZjBhL3NlYzoxMjgzNzA1NTg4ZDc0MjY2ODFmMTU2N2VhNjE2YWYwYV83MC9mcmFnOmI2YjIwMjNiYjk2NzRmMjZiZTkzNzVhYjYxNzk3YmMwL3RhYmxlOjYxNmIyYzU1MGNkNTQwNmRiYjk5Zjk5YzIyYjNiMmUwL3RhYmxlcmFuZ2U6NjE2YjJjNTUwY2Q1NDA2ZGJiOTlmOTljMjJiM2IyZTBfMy0zLTEtMS0w_087d6bff-c391-45cf-b2dc
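
In Inline XBRL the element text is the number as displayed on the page (`1,397,880`), and the `scale` attribute gives the power of ten needed to recover the reported value (`1397880000` in the `_htm.xml` file). The following is a minimal sketch of that conversion. The function name `ix_value` is a hypothetical helper, and it assumes the `ixt:numdotdecimal` format seen above (comma as the thousands separator, dot as the decimal point).

```python
from decimal import Decimal


def ix_value(text, scale="0", sign=None):
    """Convert displayed ix:nonfraction text to the reported value.

    Assumes ixt:numdotdecimal formatting; other ixt formats need their own parsing.
    """
    number = Decimal(text.replace(",", "").strip())
    value = number * (Decimal(10) ** int(scale))
    # Inline XBRL marks negative facts with sign="-" rather than in the text.
    return -value if sign == "-" else value


reported = ix_value("1,397,880", scale="3")
print(reported)
```

The result matches the value stored in the XBRL instance document for the same `contextref`.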