Analyst-grade SEC EDGAR financials. A Python library, CLI, and MCP server that pulls XBRL filings directly from the SEC and layers an analyst-normalized metric engine on top.
Unlike raw filing tools, edgar-link aims to return usable financial
outputs such as normalized revenue, EBIT, EBITDA, FCF, leverage, margins,
returns, growth, LTM rollups, and structural balance-sheet / cash-flow
buildups. It also includes 5-year peer beta / R^2 utilities. No API key, no
subscription; the live data source is the SEC.
SEC identity required. SEC fair-access policy requires every requester to identify themselves. Before any live call, set:
export EDGAR_IDENTITY="Your Name your@email.com" # macOS/Linux
$env:EDGAR_IDENTITY = "Your Name your@email.com" # Windows PowerShell
Without it the SEC will throttle requests. Do not hardcode, borrow, or commit someone else's identity.
SEC_EDGAR_USER_AGENT is also accepted as an alias for compatibility with other SEC tooling.
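The identity lookup described above can be sketched as follows. This is a minimal illustration, not edgar-link's actual code: `resolve_identity` and `sec_headers` are hypothetical helper names, and the precedence (EDGAR_IDENTITY first, then the alias) is an assumption.

```python
import os


# Hypothetical helpers for illustration; not part of edgar-link's public API.
def resolve_identity(env=None):
    """Return the SEC identity string from the environment.

    Assumption: EDGAR_IDENTITY takes precedence over SEC_EDGAR_USER_AGENT.
    """
    env = os.environ if env is None else env
    for name in ("EDGAR_IDENTITY", "SEC_EDGAR_USER_AGENT"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    raise RuntimeError(
        "Set EDGAR_IDENTITY (or SEC_EDGAR_USER_AGENT) to "
        "'Your Name your@email.com' before making live SEC calls."
    )


def sec_headers(env=None):
    """Build request headers that identify the caller to the SEC."""
    return {"User-Agent": resolve_identity(env)}
```

Failing fast with a clear error when neither variable is set is usually friendlier than letting the first live request get throttled.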
- Looks up public companies by name or ticker and resolves them to SEC CIKs
- Pulls Company Facts and Company Concept data from the SEC XBRL APIs
- Normalizes issuer-specific XBRL tags into reusable statement data
- Computes analyst-normalized derived metrics on top of that normalized data
- Builds structural balance-sheet and cash-flow statements from a frozen slot taxonomy
- Exposes the engine through a local CLI and an MCP server
This project primarily uses the SEC XBRL APIs:
- Company Facts API
  https://data.sec.gov/api/xbrl/companyfacts/CIK##########.json
  - Full XBRL fact history for a filer in one response
- Company Concept API
  https://data.sec.gov/api/xbrl/companyconcept/CIK##########/taxonomy/tag.json
  - Historical values for one concept such as Assets or Revenues
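The `##########` placeholder in both URLs is the CIK zero-padded to 10 digits. A small sketch of building the two endpoints follows; the function names are hypothetical and are not edgar-link's API.

```python
BASE_URL = "https://data.sec.gov/api/xbrl"


def pad_cik(cik):
    """Normalize a CIK (int, '320193', or 'CIK0000320193') to 10 digits."""
    digits = str(cik).strip().upper()
    if digits.startswith("CIK"):
        digits = digits[3:]
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def company_facts_url(cik):
    """URL for the full XBRL fact history of one filer."""
    return f"{BASE_URL}/companyfacts/CIK{pad_cik(cik)}.json"


def company_concept_url(cik, tag, taxonomy="us-gaap"):
    """URL for the historical values of a single concept, e.g. Assets."""
    return f"{BASE_URL}/companyconcept/CIK{pad_cik(cik)}/{taxonomy}/{tag}.json"
```

For example, `company_facts_url(320193)` yields the Company Facts URL for CIK 0000320193.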
The engine uses:
- request identification via EDGAR_IDENTITY
- local caching to reduce repeated pulls
- rate-limited access intended to stay within SEC fair-access expectations
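To illustrate the rate-limiting idea, here is a minimal minimum-interval limiter. It is a sketch under stated assumptions: the class name is hypothetical, edgar-link's own limiter and configured rate are not shown in this README, and the 10 requests per second ceiling used in the usage note is the SEC's published fair-access guideline.

```python
import time


class MinIntervalLimiter:
    """Space out calls so they are at least 1 / max_per_second apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None      # time at which the previous call was released

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

A caller would create one shared `MinIntervalLimiter(10)` and call `wait()` before each request; staying somewhat below the ceiling leaves headroom for other tools using the same identity.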
More information: https://www.sec.gov/developer
- Python 3.11+
- pip
The pinned dependency set (pandas, numpy) requires Python 3.11+, so 3.11 is the hard floor. The hash-verified clean-room install and CI path are validated on Windows / CPython 3.11 (requirements.lock is Windows/cp311-specific); a Linux lock is deferred until a fresh dependency resolve is safe under the active supply-chain incident policy.
No PyPI release yet.
# library + metric engine
pip install "git+https://github.com/jibarix/edgar-link.git#egg=edgar-link"
# with MCP support
pip install "edgar-link[mcp] @ git+https://github.com/jibarix/edgar-link.git"
git clone https://github.com/jibarix/edgar-link.git
cd edgar-link
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS/Linux
pip install -e .
Dependencies are declared in pyproject.toml and pinned to
reviewed versions. requirements.lock is the corresponding hash-pinned lockfile.
Three ways to drive the engine. They share the same Python core; pick the one that fits the workflow.
| | Talk to Claude (MCP) | CLI | Library |
|---|---|---|---|
| Best for | Ad-hoc questions, exploration, no coding | One-shot pulls, scripted or scheduled exports | Building analyses, pipelines, custom code |
| Invocation | Natural-language prompt to an MCP client | python main.py ... | from edgar.metrics import ... |
| Setup | Install [mcp] extra, register the server once | Install the runtime | Install the runtime |
| Output | Conversational response in the client | Console table or Excel / CSV / JSON / HTML file | Python objects (dicts, DataFrames) |
| Lookup by | Name, ticker, or CIK | Name, ticker, or --cik | CIK (resolve names via edgar.company_lookup) |
| EDGAR_IDENTITY | In the MCP client's env config | In your shell env | In your shell env |
The conversational path. The same engine is exposed as an MCP stdio server, so an MCP-aware client (Claude Code, Claude Desktop) can call it directly in natural language:
- "Show Apple's last 3 fiscal years of revenue, EBIT, and FCF."
- "Compute ROIC for ticker MSFT, last 5 quarters."
- "Find software filers (SIC 7372) with U.S. revenue dominance."
Install and register the server once (see MCP server reference below), then just ask. The caller writes no code.
Interactive mode:
python main.py
Command-line mode:
python main.py --company "Apple Inc" --statement-type BS --period-type annual --num-periods 3 --output-format excel
Supported statement types:
- BS - Balance Sheet
- IS - Income Statement
- CF - Cash Flow Statement
- EQ - Equity Statement
- CI - Comprehensive Income
- ALL - All supported statements
For building analyses on top of the engine in your own code:
from edgar.filing_retrieval import FilingRetrieval
from edgar.xbrl_parser import XBRLParser
from edgar.metrics import REGISTRY, NormalizedStatement, compute, list_slugs
filings = FilingRetrieval()
parser = XBRLParser()
facts = filings.get_company_facts("0000320193") # Apple Inc.
normalized = parser.parse_company_facts(
facts,
statement_type="IS",
period_type="annual",
num_periods=3,
)
ns = NormalizedStatement(normalized)
ebit_series = compute("ebit", ns) # {period: value}
margin_slugs = list_slugs(category="margins") # discover what's registered
spec = REGISTRY["roic"] # MetricSpec (fn, unit, ...)
EDGAR_IDENTITY must be set in the environment for any live retrieval.
See Architecture for module boundaries and
MCP server reference for the equivalent tool
surface.
Install, registration, configuration, and the tool surface for the
stdio server invoked via python -m edgar_mcp.
pip install -e ".[mcp]"
claude mcp add edgar -e EDGAR_IDENTITY="Your Name your@email.com" -- python -m edgar_mcp
{
"mcpServers": {
"edgar": {
"command": "python",
"args": ["-m", "edgar_mcp"],
"env": {
"EDGAR_IDENTITY": "Your Name your@email.com"
}
}
}
}

| Tool | Purpose |
|---|---|
| lookup_company(query) | Resolve a name or ticker to SEC CIK candidates |
| get_financial_statement(cik_or_ticker, statement_type, period_type, num_periods) | Return normalized BS / IS / CF / EQ / CI / ALL data by period |
| get_concept(cik_or_ticker, concept, taxonomy) | Return a full historical time series for one XBRL concept |
| search_companies(sic, industry, country_inc, revenue_country, name_substring, limit) | Filter the local company classification index |
| list_metrics(category) | Enumerate registered derived metrics |
| compute_metric(slug, cik_or_ticker, period_type, num_periods) | Compute one derived metric series with the required internal lookback |
search_companies reads data/company_index.json; build it first with:
python -m edgar.company_classifier --build
The MCP server gives Claude access to the engine; the bundled
edgar-link-financials skill gives Claude
the workflow knowledge to use it well. Together they mean a question like
"show MSFT's ROIC over 5 quarters" resolves the ticker, computes the metric
with the right internal lookback, and reports the result with units — without
the user having to know tool names, that EBIT is analyst-normalized, or that
search_companies needs a prebuilt index.
The skill is a portable folder (SKILL.md + references/) that works in
Claude.ai, Claude Code, and the API. To use it in Claude Code, copy the folder
into your skills directory, or upload it in Claude.ai via Settings →
Capabilities → Skills. It triggers on financial / fundamentals / filings
questions and defers detailed metric and error guidance to its references/
files (progressive disclosure), so it adds little to context until needed.
EDGAR_IDENTITY must still be set in the MCP server's environment — the skill
documents that precondition but cannot supply the identity itself.
- edgar/filing_retrieval.py - SEC submissions, Company Facts, Company Concept, filing instance retrieval
- edgar/company_lookup.py - ticker / company-name lookup and CIK resolution
- edgar/xbrl_parser.py - converts Company Facts into categorized, periodized statement data
- edgar/tag_classifier.py - maps raw XBRL concepts into statement categories
The metric engine is registry-based. Metric functions register themselves in
edgar/metrics/registry.py, and the public
surface is imported through edgar/metrics/__init__.py.
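A registry-based design of this kind can be sketched with a decorator. The code below illustrates the pattern only and is not the contents of edgar/metrics/registry.py: the `register` decorator, the `MetricSpec` fields beyond `fn` and `unit`, and the plain-dict statement (a stand-in for NormalizedStatement) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class MetricSpec:
    slug: str
    fn: Callable[[dict], dict]   # statement -> {period: value}
    unit: str
    category: str


REGISTRY: Dict[str, MetricSpec] = {}


def register(slug: str, *, unit: str, category: str):
    """Decorator that records a metric function under a unique slug."""
    def decorator(fn):
        if slug in REGISTRY:
            raise ValueError(f"duplicate metric slug: {slug}")
        REGISTRY[slug] = MetricSpec(slug, fn, unit, category)
        return fn
    return decorator


def list_slugs(category: Optional[str] = None) -> List[str]:
    """Return registered slugs, optionally filtered by category."""
    return sorted(
        slug for slug, spec in REGISTRY.items()
        if category is None or spec.category == category
    )


def compute(slug: str, statement: dict) -> dict:
    """Look up a metric by slug and evaluate it on a statement."""
    return REGISTRY[slug].fn(statement)


@register("ebit_margin", unit="ratio", category="margins")
def ebit_margin(statement: dict) -> dict:
    """EBIT / revenue per period; periods without usable inputs are skipped."""
    out = {}
    for period, revenue in statement["revenue"].items():
        ebit = statement["ebit"].get(period)
        if revenue and ebit is not None:
            out[period] = ebit / revenue
    return out
```

The appeal of the pattern is that adding a metric means adding one decorated function; discovery (`list_slugs`) and dispatch (`compute`) need no changes.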
Main metric modules:
| Module | Examples |
|---|---|
| derived_lines.py | revenue, gross_profit, ebit, ebitda, fcf, total_debt |
| margins.py | ebit_margin, ebitda_margin, ni_margin, fcf_margin |
| ratios.py | debt_to_capital, debt_to_equity, current_ratio, quick_ratio |
| returns.py | roa, roe, roic, asset_turnover |
| working_capital.py | dso, dio, dpo, cash_conversion_cycle |
| growth.py | <base>_growth, <base>_cagr_{3,5,7}y |
| ltm.py | trailing-twelve-months rollups |
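For readers unfamiliar with the growth and working-capital slugs in the table, the textbook formulas behind them look roughly like this. This is a sketch only: edgar-link's exact definitions (ending versus average balances, 365 versus 360 days, handling of negative bases) are not documented here, so every choice below is an assumption.

```python
def cagr(first, last, years):
    """Compound annual growth rate; None when the inputs make it undefined."""
    if first <= 0 or last <= 0 or years <= 0:
        return None
    return (last / first) ** (1.0 / years) - 1.0


def cash_conversion_cycle(receivables, inventory, payables, revenue, cogs, days=365):
    """Textbook DSO / DIO / DPO using ending balances (an assumption)."""
    dso = receivables / revenue * days   # days sales outstanding
    dio = inventory / cogs * days        # days inventory outstanding
    dpo = payables / cogs * days         # days payables outstanding
    return {
        "dso": dso,
        "dio": dio,
        "dpo": dpo,
        "cash_conversion_cycle": dso + dio - dpo,
    }
```

With made-up inputs, `cagr(100.0, 133.1, 3)` is approximately 0.10, since 1.1 cubed is 1.331.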
The repo also includes a second normalization path for balance-sheet and cash-flow structure:
- edgar/metrics/_statement_taxonomy.py - frozen closed-set slot taxonomy
- edgar/metrics/_bs_prefilter.py - deterministic balance-sheet tag prefilter with polarity guardrails
- edgar/metrics/_cf_prefilter.py - deterministic cash-flow tag prefilter with polarity guardrails
- edgar/metrics/statement_buildup.py - derives structural BS / CF buildups from raw Company Facts
Important design rule:
- reported subtotal tags are kept for provenance and drift checking
- they are not treated as raw input lines to be summed into the buildup
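The design rule above can be illustrated with a small drift check: the total is built only from leaf slots, and the reported subtotal is used purely as a comparison point. The function name, slot names, and the 0.5% tolerance are hypothetical and not taken from statement_buildup.py.

```python
def build_total(slots, reported_subtotal=None, tolerance=0.005):
    """Sum leaf slots; compare against the reported subtotal without using it.

    slots: {slot_name: value} for raw line items only (no subtotals).
    reported_subtotal: the issuer's own subtotal tag, kept for provenance.
    """
    total = sum(slots.values())
    drift = None
    flagged = False
    if reported_subtotal is not None:
        drift = total - reported_subtotal
        flagged = abs(drift) > tolerance * abs(reported_subtotal)
    return {
        "total": total,
        "reported": reported_subtotal,
        "drift": drift,
        "flagged": flagged,
    }
```

If a subtotal were summed in as though it were a line item, the buildup would double count; keeping it outside the sum turns it into a free consistency check.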
edgar/metrics/beta.py computes 5-year monthly
beta and R^2 versus the S&P 500 from Yahoo monthly bars.
What it does today:
- peer beta / R^2 regression
- one row per ticker
- aligned monthly return window
What it does not currently implement:
- bottom-up beta chain
- unlever -> cash-correct -> total-beta -> relever workflow
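The peer regression amounts to an ordinary least-squares fit of stock returns on index returns. The sketch below shows that calculation on already aligned monthly series using only the standard library; it is not the code in edgar/metrics/beta.py, and the function names are hypothetical.

```python
def simple_returns(prices):
    """Convert a list of month-end prices into period-over-period returns."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


def beta_r2(stock_returns, market_returns):
    """OLS slope (beta) and R^2 of stock returns regressed on market returns."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two aligned series with at least 2 observations")
    mean_x = sum(market_returns) / n
    mean_y = sum(stock_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in market_returns)
    syy = sum((y - mean_y) ** 2 for y in stock_returns)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(market_returns, stock_returns)
    )
    if sxx == 0:
        raise ValueError("market returns have no variance")
    beta = sxy / sxx
    # With one regressor, R^2 equals the squared correlation.
    r2 = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return beta, r2
```

A 5-year monthly window gives about 60 return observations per ticker once the stock and index series are aligned on the same month-ends.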
derived_lines.ebit() is intentionally not just raw
us-gaap:OperatingIncomeLoss.
Current normalization:
EBIT = OperatingIncomeLoss + goodwill_impairment + asset_impairment
This is meant to move the output closer to institutional analyst convention for
names where unusual impairments sit inside reported operating income. A
pretax-plus-interest fallback is also used for some hybrid-finance issuers that
do not tag OperatingIncomeLoss cleanly.
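The normalization and its fallback can be written out as a short function. This is a sketch of the formula stated above, not the body of derived_lines.ebit(): the dictionary keys `pretax_income` and `interest_expense` are hypothetical, impairments are assumed to be reported as positive expense amounts, and whether the fallback path also adds back impairments is not stated in this README.

```python
def normalized_ebit(facts):
    """Apply the stated EBIT normalization to one period's values.

    facts: {name: value} for a single period; missing items are absent or None.
    """
    operating_income = facts.get("OperatingIncomeLoss")
    if operating_income is not None:
        # Add back impairments that sit inside reported operating income.
        return (
            operating_income
            + (facts.get("goodwill_impairment") or 0.0)
            + (facts.get("asset_impairment") or 0.0)
        )
    # Fallback for issuers that do not tag OperatingIncomeLoss cleanly.
    pretax = facts.get("pretax_income")
    interest = facts.get("interest_expense")
    if pretax is not None and interest is not None:
        return pretax + interest
    return None
```

For example, operating income of 100 with a 15 goodwill impairment normalizes to 115, while a filer with only pretax income of 80 and interest expense of 12 falls back to 92.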
The current scripts/ directory is focused on validation, maintenance,
and workbook/reporting workflows:
| Script | Purpose |
|---|---|
| scripts/smoke_test_metrics.py | Live AAPL smoke test for the parser + metric registry. Prints a compact multi-period table of hand-checked metrics. Requires EDGAR_IDENTITY or SEC_EDGAR_USER_AGENT. |
| scripts/gen_lockfile.py | Regenerates requirements.lock from pip --dry-run --report ... output using exact versions and sha256 hashes. |
| scripts/update_sec_tag_mapping.py | Maintenance tool for data/sec_tag_mapping.json. Forward-only integrity check via a sha256 manifest, plus an additive merge of new us-gaap tags from a fresh SEC Financial Statement Data Set quarter. |
| scripts/update_company_index.py | Maintenance tool for data/company_index.json. Forward-only integrity check via a sha256 manifest, plus a snapshot rebuild of the company classification index from one or more fresh SEC Financial Statement Data Set quarters. |
| scripts/build_comps.py | Build a styled multi-peer comparables workbook from data/company_index.json (no anchor company required). Filters by SIC plus optional name / subindustry / country, pulls Company Facts per peer, and writes one Excel with a Universe sheet (classification fields), a Metrics matrix (peers × metric × relative fiscal period), CapIQ-mirror Screening_24col / Screening_36col snapshot sheets (LTM, point-in-time at --as-of), per-peer drilldown sheets (BS / IS / CF stacked), and an About methodology sheet. Styled header band, freeze panes, auto-filter, accounting number formats. Optional --extensions merges captive-finance extension XBRL; 5Y monthly β + R² vs ^GSPC is on by default (fail-soft on Yahoo errors / <24 months of history) and disabled with --no-beta. Requires EDGAR_IDENTITY or SEC_EDGAR_USER_AGENT for live runs; --dry-run previews the peer set offline; --no-capiq-layout skips the Screening / submissions / quarterly path for a faster build. |
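The sha256 manifest check used by the two maintenance tools can be pictured as below. This is a minimal sketch of the idea, assuming a one-file JSON manifest; the real manifest format and the function names are not documented here and are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Hex sha256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def init_manifest(data_path, manifest_path):
    """Baseline the current file: record its name and digest."""
    record = {"file": Path(data_path).name, "sha256": sha256_of(data_path)}
    Path(manifest_path).write_text(json.dumps(record, indent=2))


def check_manifest(data_path, manifest_path):
    """True when the file still matches the digest recorded at baseline."""
    record = json.loads(Path(manifest_path).read_text())
    return record["sha256"] == sha256_of(data_path)
```

A failing check means the data file changed outside the tool, which is the hand-edit case the `check` subcommands are meant to catch before an update or rebuild is applied.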
Lockfile regeneration flow:
pip install --dry-run --report report.json --ignore-installed -e .
python scripts/gen_lockfile.py report.json requirements.lock
Tag-mapping update flow (after a new FSDS quarter is published):
python scripts/update_sec_tag_mapping.py init # one-time, baselines the current file
python scripts/update_sec_tag_mapping.py check # verify nobody hand-edited the mapping
python scripts/update_sec_tag_mapping.py update 2026q1 # dry-run: diff + review report
python scripts/update_sec_tag_mapping.py update 2026q1 --apply # merge new tags; needs_review held back
Company-index rebuild flow (after one or more new FSDS quarters are published):
python scripts/update_company_index.py init # one-time, baselines the current file
python scripts/update_company_index.py check # verify nobody hand-edited the index
python scripts/update_company_index.py rebuild 2025q4 2026q1 # dry-run: snapshot diff
python scripts/update_company_index.py rebuild 2025q4 2026q1 --apply # replace the index, rotate the manifest
Comparables workbook flow (no anchor company; universe is selected by
filtering data/company_index.json):
# Preview the peer set without any SEC calls (offline)
python scripts/build_comps.py --sic 5500 \
--exclude-name casey murphy copart openlane camping lazydays \
--dry-run
# Live build (writes output/comps_<label>_<period>_<YYYYMMDD>.xlsx).
# Requires EDGAR_IDENTITY or SEC_EDGAR_USER_AGENT in the environment.
python scripts/build_comps.py --sic 5500 \
--exclude-name casey murphy copart openlane camping lazydays \
--num-periods 5
Each row in the resulting Universe sheet carries the classification
fields already present in company_index.json (SIC, industry,
subindustry, country of incorporation, dominant revenue country, and
the per-country revenue mix where the issuer tags it). The Metrics
sheet is a peers × (metric, relative fiscal period) matrix; peers with
different fiscal-year ends line up because columns are FY 0 / FY -1 /
FY -2 ... rather than absolute dates. The CapIQ-mirror Screening_24col
and Screening_36col sheets give a single point-in-time snapshot at
--as-of (LTM = trailing 4 quarters whose period-end is ≤ as-of for
non-Dec filers, else the FY-aligned annual); 36-col adds 6 trailing
LTM revenue columns. Forward analyst-estimate columns (CapIQ cols
26–29) are blank by design — EDGAR has no equivalent and they are
never fabricated. Per-peer drilldown sheets stack the normalized BS /
IS / CF line items with the peer's own period dates as columns. The
About sheet records the filter set, the as-of date, the period basis,
and which columns are blank by design.
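Two ideas from the description above lend themselves to a short sketch: relabelling each peer's fiscal years as FY 0 / FY -1 / ... so different year-ends line up, and taking LTM as the trailing four quarters on or before the as-of date. The function names are hypothetical, the values in the usage note are made up, and the FY-aligned annual branch mentioned above is not modelled.

```python
from datetime import date


def relative_fiscal_columns(annual, num_periods=3):
    """Relabel {fiscal_year_end: value} as FY 0, FY -1, ... (newest first)."""
    ends = sorted(annual, reverse=True)[:num_periods]
    return {
        ("FY 0" if i == 0 else f"FY -{i}"): annual[end]
        for i, end in enumerate(ends)
    }


def ltm_sum(quarterly, as_of):
    """Sum the four latest quarters whose period end is on or before as_of.

    Assumes the quarters are consecutive; returns None with fewer than four.
    """
    eligible = sorted(end for end in quarterly if end <= as_of)
    if len(eligible) < 4:
        return None
    return sum(quarterly[end] for end in eligible[-4:])
```

For instance, a peer with a September year-end and a peer with a December year-end both get an `FY 0` column holding their latest fiscal year, even though the underlying dates differ by a quarter.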
edgar-link/
|-- main.py
|-- pyproject.toml
|-- requirements.lock
|-- README.md
|-- LICENSE
|-- scripts/
| |-- smoke_test_metrics.py
| |-- gen_lockfile.py
| |-- update_sec_tag_mapping.py
| |-- update_company_index.py
| `-- build_comps.py
|-- edgar_mcp/
| |-- __main__.py
| `-- server.py
|-- edgar/
| |-- company_lookup.py
| |-- filing_retrieval.py
| |-- xbrl_parser.py
| |-- tag_classifier.py
| |-- statement_extractor.py
| |-- data_formatter.py
| |-- company_classifier.py
| |-- _extension_mappings.py
| |-- market_data/
| `-- metrics/
| |-- registry.py
| |-- _concepts.py
| |-- _statement_taxonomy.py
| |-- _bs_prefilter.py
| |-- _cf_prefilter.py
| |-- _bs_slot_map.py
| |-- _cf_slot_map.py
| |-- statement_buildup.py
| |-- derived_lines.py
| |-- margins.py
| |-- ratios.py
| |-- returns.py
| |-- working_capital.py
| |-- growth.py
| |-- ltm.py
| `-- beta.py
|-- config/
|-- utils/
`-- data/
- Data quality depends on the issuer's XBRL filings with the SEC
- Some companies use different taxonomies or inconsistent tags for similar concepts
- Some analyst-normalized adjustments cannot be reproduced from raw XBRL alone
- Live SEC usage is subject to throttling and availability limits
- The company classifier index must be built locally before search_companies is useful
- Beta utilities currently cover peer beta / R^2 only, not the full bottom-up beta chain
- Company not found
  - Try the ticker instead of the full company name
  - Verify the company is an SEC filer
- No data available
  - Older periods may not be available in XBRL
  - Try a different statement type or period type
- MCP search_companies returns an error
  - Build the local index first: python -m edgar.company_classifier --build
- Smoke test fails or throttles
  - Confirm EDGAR_IDENTITY is set
  - scripts/smoke_test_metrics.py performs live SEC calls
- Missing derived metrics
  - Not all filers expose every concept needed for every metric
  - Use list_metrics() to inspect the public metric catalog, then test one metric at a time with compute_metric()
- Create a feature branch
- Keep metric semantics explicit and avoid casual dependency changes
- Validate live SEC-dependent changes when possible
- Regenerate requirements.lock if dependencies change
- Open a pull request
Distributed under the MIT License. See LICENSE.