## How to get access to XBRL data using Python
References
- XBRL US: https://xbrl.us/
- See Ch. 11.4 in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3576098

All XBRL data for a 10-K filing is contained in the file labeled "XBRL INSTANCE DOCUMENT" on the filing index page. For example, see the last file listed at https://www.sec.gov/Archives/edgar/data/1690820/000169082022000080/0001690820-22-000080-index.htm.  
This notebook demonstrates how to extract several values from a few XBRL instance documents.

--------------

### Imports and setup

In [1]:
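import requests
import re
from html import unescape

# SEC EDGAR asks automated clients to send a descriptive User-Agent
# (for example, a name and contact email); replace this placeholder with your own.
headers = {'user-agent': 'sample text'}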

### Example URL setup

In [3]:
url1 = "https://www.sec.gov/Archives/edgar/data/1690820/000169082022000080/cvna-20211231_htm.xml"
url2 = "https://www.sec.gov/Archives/edgar/data/100517/000010051722000009/ual-20211231_htm.xml"
url3 = "https://www.sec.gov/Archives/edgar/data/1334036/000133403622000011/crox-20211231_htm.xml"

url_dicts = {
    'Carvana': url1,
    'United Airlines': url2,
    'Crox': url3,
}

In [4]:
for key, value in url_dicts.items():
    print(key, value)

Carvana https://www.sec.gov/Archives/edgar/data/1690820/000169082022000080/cvna-20211231_htm.xml
United Airlines https://www.sec.gov/Archives/edgar/data/100517/000010051722000009/ual-20211231_htm.xml
Crox https://www.sec.gov/Archives/edgar/data/1334036/000133403622000011/crox-20211231_htm.xml
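
Each extraction cell in this notebook requests the same three files again. A minimal sketch of downloading every document once and reusing the text is shown here. The helper name `download_all`, its `get_text` callable, and the `pause` delay are assumptions made for illustration, not part of the original notebook.

```python
import time

def download_all(urls, get_text, pause=0.2):
    """Fetch every URL once and return {name: document text}.

    `get_text` is any callable that maps a URL to its body text.
    `pause` (seconds) is an assumed courtesy delay between requests.
    """
    documents = {}
    for i, (name, url) in enumerate(urls.items()):
        if i > 0:
            time.sleep(pause)  # spread requests out instead of sending them back to back
        documents[name] = get_text(url)
    return documents

# Usage with the notebook's names (requires network access):
# xbrl_docs = download_all(url_dicts, lambda u: requests.get(u, headers=headers).text)
```

Passing the fetch function in as an argument keeps the helper independent of `requests` and makes it easy to try out with a stub.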


--------------

## Extract data from XBRL

#### Emerging growth companies

In [32]:
print("----------------------------------------")
print("Emerging growth companies obtained from XBRL")
print("----------------------------------------")

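# Capture the text between the opening and closing dei:EntityEmergingGrowthCompany tags.
pattern = re.compile(r"<dei:EntityEmergingGrowthCompany[^>]*>(.*?)</dei:EntityEmergingGrowthCompany>", re.DOTALL | re.IGNORECASE)

for key, value in url_dicts.items():
    response = requests.get(value, headers=headers)
    xbrl_10k = response.text
    match = pattern.search(xbrl_10k)
    # Report a missing tag instead of raising AttributeError on None.
    egc = match.group(1) if match else "not found"
    print(f"\t{key:<20}: {egc}")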

----------------------------------------
Emerging growth companies obtained from XBRL
----------------------------------------
	Carvana             : false
	United Airlines     : false
	Crox                : false
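
The same search pattern recurs for every `dei:` tag in this notebook, so it can be factored into one function. The sketch below assumes a hypothetical helper named `extract_dei` and a made-up sample string; it returns `None` when the tag is absent rather than raising.

```python
import re

def extract_dei(xbrl_text, tag):
    """Return the text of the first <dei:tag> fact, or None if the tag is absent."""
    pattern = re.compile(
        rf"<dei:{re.escape(tag)}\b[^>]*>(.*?)</dei:{re.escape(tag)}>",
        re.DOTALL | re.IGNORECASE,
    )
    match = pattern.search(xbrl_text)
    return match.group(1).strip() if match else None

# Made-up fragment for illustration only.
sample = (
    '<dei:EntityEmergingGrowthCompany contextRef="c1">false'
    "</dei:EntityEmergingGrowthCompany>"
)
print(extract_dei(sample, "EntityEmergingGrowthCompany"))  # false
print(extract_dei(sample, "EntityShellCompany"))           # None
```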


#### Well-known seasoned issuers

In [14]:
print("----------------------------------------")
print("Well-known seasoned issuers obtained from XBRL")
print("----------------------------------------")

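# Capture the text between the opening and closing dei:EntityWellKnownSeasonedIssuer tags.
pattern = re.compile(r"<dei:EntityWellKnownSeasonedIssuer[^>]*>(.*?)</dei:EntityWellKnownSeasonedIssuer>", re.DOTALL | re.IGNORECASE)

for key, value in url_dicts.items():
    response = requests.get(value, headers=headers)
    xbrl_10k = response.text
    match = pattern.search(xbrl_10k)
    # Report a missing tag instead of raising AttributeError on None.
    wksi = match.group(1) if match else "not found"
    print(f"\t{key:<20}: {wksi}")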

----------------------------------------
Well-known seasoned issuers obtained from XBRL
----------------------------------------
	Carvana             : Yes
	United Airlines     : Yes
	Crox                : Yes
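
The two flags printed so far use different spellings (`false` for the emerging growth flag, `Yes` for the well-known seasoned issuer flag). A minimal sketch of normalizing both into Python booleans follows. The function name `to_bool` is hypothetical, and the accepted spellings are an assumption based only on the values printed in this notebook.

```python
def to_bool(value):
    """Map XBRL flag text such as 'true'/'false' or 'Yes'/'No' to a Python bool.

    Returns None for missing or unrecognized text instead of guessing.
    """
    if value is None:
        return None
    text = value.strip().lower()
    if text in {"true", "yes"}:
        return True
    if text in {"false", "no"}:
        return False
    return None

print(to_bool("false"))  # False
print(to_bool("Yes"))    # True
```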


#### Public floats

In [38]:
print("----------------------------------------")
print("Public floats (Billions) obtained from XBRL")
print("----------------------------------------")

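# [^>]* confines the attribute match to the opening tag.
pattern = re.compile(r"<dei:EntityPublicFloat[^>]*>(\d+)</dei:EntityPublicFloat>", re.IGNORECASE)

for key, value in url_dicts.items():
    response = requests.get(value, headers=headers)
    xbrl_10k = response.text
    match = pattern.search(xbrl_10k)
    # Convert dollars to billions; report a missing tag as None instead of raising.
    pf = int(match.group(1)) / 10**9 if match else None
    print(f"\t{key:<20}: {pf}")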

----------------------------------------
Public floats (Billions) obtained from XBRL
----------------------------------------
	Carvana             : 24.6
	United Airlines     : 16.9
	Crox                : 4.6
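
Regular expressions are enough for a single tag, but they ignore attributes such as `unitRef`. As an alternative, the sketch below parses the document with the standard library's `xml.etree.ElementTree` and matches on the local element name, so the `dei` namespace URI does not have to be hard-coded. The function name `public_float` and the miniature sample document (including its namespace URI and value) are made up for illustration.

```python
import xml.etree.ElementTree as ET

def public_float(xbrl_text):
    """Return (value in billions, unitRef) of the first EntityPublicFloat fact, or None."""
    root = ET.fromstring(xbrl_text)
    for element in root.iter():
        # Tags look like '{namespace-uri}LocalName'; compare only the local name.
        if element.tag.rsplit("}", 1)[-1] == "EntityPublicFloat" and element.text:
            return float(element.text) / 10**9, element.get("unitRef")
    return None

# Made-up miniature instance document (namespace URI and value are illustrative).
sample = """<xbrl xmlns:dei="http://xbrl.sec.gov/dei/2021">
  <dei:EntityPublicFloat contextRef="c1" unitRef="usd" decimals="0">2500000000</dei:EntityPublicFloat>
</xbrl>"""

print(public_float(sample))  # (2.5, 'usd')
```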


#### Audit firm name and location

In [24]:
print("----------------------------------------")
print("Audit firm name and location obtained from XBRL")
print("----------------------------------------")

afn_pattern = re.compile(r"<dei:AuditorName[^>]*>(.*?)</dei:AuditorName>", re.DOTALL | re.IGNORECASE)
afl_pattern = re.compile(r"<dei:AuditorLocation[^>]*>(.*?)</dei:AuditorLocation>", re.DOTALL | re.IGNORECASE)

for key, value in url_dicts.items():
    response = requests.get(value, headers=headers)
    xbrl_10k = response.text
    afn_match = afn_pattern.search(xbrl_10k)
    afl_match = afl_pattern.search(xbrl_10k)
    # unescape() turns XML entities such as &amp; back into plain characters.
    afn = unescape(afn_match.group(1)) if afn_match else "not found"
    afl = unescape(afl_match.group(1)) if afl_match else "not found"
    print(f"\t{key:<20}: {afn:<30} {afl}")

----------------------------------------
Audit firm name and location obtained from XBRL
----------------------------------------
	Carvana             : GRANT THORNTON LLP             Southfield, Michigan
	United Airlines     : Ernst & Young LLP              Chicago, Illinois
	Crox                : Deloitte & Touche LLP          Denver, Colorado
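
The auditor cell searches the document once per tag. A possible variation, sketched here under the assumption that both tags appear as plain `dei:` elements, collects them in a single pass: one pattern with an alternation, plus a backreference so that each closing tag must match its opening tag. XML text escapes `&` as `&amp;`, so values are passed through `html.unescape`. The names `AUDITOR_PATTERN` and `auditor_facts` and the sample string are hypothetical.

```python
import re
from html import unescape

# One pattern for both tags; \1 forces the closing tag to match the opening one.
AUDITOR_PATTERN = re.compile(
    r"<dei:(AuditorName|AuditorLocation)\b[^>]*>(.*?)</dei:\1>",
    re.DOTALL,
)

def auditor_facts(xbrl_text):
    """Return {tag name: unescaped text} for the auditor tags found in the document."""
    facts = {}
    for match in AUDITOR_PATTERN.finditer(xbrl_text):
        # Keep the first occurrence of each tag; decode entities such as &amp;.
        facts.setdefault(match.group(1), unescape(match.group(2).strip()))
    return facts

# Made-up fragment for illustration only.
sample = (
    '<dei:AuditorName contextRef="c1">Ernst &amp; Young LLP</dei:AuditorName>'
    '<dei:AuditorLocation contextRef="c1">Chicago, Illinois</dei:AuditorLocation>'
)
print(auditor_facts(sample))
# {'AuditorName': 'Ernst & Young LLP', 'AuditorLocation': 'Chicago, Illinois'}
```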


--------------