Data Layer
mosaic/dataflows/ provides market + macro data to the agents, plus the qlib historical database and ingest toolchain.
- Tushare (tushare.py): primary A-share equity + ETF data (pro.daily, pro.fund_daily, pro.index_daily, financials) plus research reports via pro.research_report: get_broker_reports (行业研报, industry-level research reports) and get_stock_reports (个股研报, stock-level research reports). The LangChain @tool wrappers get_broker_research / get_stock_research live in mosaic/agents/utils/research_report_tools.py and are attached to the sector + superinvestor agents (see Agents).
- akshare, yfinance, FRED (macro_data.py, fred.py), Xueqiu heat, etc.: macro/global/sentiment tools. Includes get_property_data (akshare macro_china_real_estate, the monthly 国房景气指数 / national real-estate climate index, point-in-time clamped by curr_date) and get_stock_moneyflow / get_industry_moneyflow (A-share capital flow by 同花顺), used by the china and sector agents. The macro layer is 18 tools total (30 across all 5 tool modules; see Bridge RPC for the full module split).
- Tool modules: all LangChain @tool-decorated functions under mosaic/agents/utils/ are registered as tools.list / tools.call RPCs. Five modules: macro_tools (18), etf_tools (4: info/NAV/holdings/universe), financial_tools (4: fundamentals/balance-sheet/income/cashflow), research_report_tools (2: broker/stock), technical_tools (2: price/indicators). Each agent uses a scoped subset; see Agents for per-layer assignments.
- Tool selection is config-driven (data_vendors / tool_vendors in MosaicConfig).
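The point-in-time clamp mentioned for get_property_data can be illustrated with a minimal sketch. The function name clamp_point_in_time, the row layout, and the sample values are hypothetical and not taken from macro_data.py; the only idea shown is that rows dated after curr_date are dropped, so an agent never sees data from after the simulated date.

```python
from datetime import date


def clamp_point_in_time(rows, curr_date):
    """Keep only rows dated on or before curr_date (ISO 'YYYY-MM-DD' string)."""
    cutoff = date.fromisoformat(curr_date)
    return [row for row in rows if date.fromisoformat(row["date"]) <= cutoff]


# Synthetic monthly readings, for illustration only (not real index values).
rows = [
    {"date": "2024-01-31", "value": 1.0},
    {"date": "2024-02-29", "value": 2.0},
    {"date": "2024-03-31", "value": 3.0},
]

# With a simulated date in mid-March, the March month-end row is hidden.
visible = clamp_point_in_time(rows, "2024-03-15")
visible_dates = [row["date"] for row in visible]
print(visible_dates)
```

A real tool would probably also need to account for publication lag (a monthly figure is usually released some time after month end), which this sketch ignores.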
Reads OHLCV directly from qlib's binary feature files without importing qlib. Restores split-adjusted values to market scale (original = adjusted / factor). Provides get_stock, get_indicator, etc. matching the Tushare vendor signatures.
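As a rough illustration of reading a feature file without importing qlib and then undoing the adjustment, the sketch below assumes the layout written by qlib's dump_bin.py: little-endian float32 values, where the first element is the start offset into the trading calendar and the rest are the daily values. The helper names read_feature_bin and restore_market_scale are hypothetical, and the files are synthetic so the example runs without a real dataset.

```python
import struct
import tempfile
from pathlib import Path


def read_feature_bin(path):
    """Return (start_index, values) from a qlib-style .bin feature file.

    Assumed layout: little-endian float32; element 0 is the offset of the
    first value in the trading calendar, the remainder are daily values.
    """
    data = Path(path).read_bytes()
    count = len(data) // 4
    values = struct.unpack(f"<{count}f", data[: count * 4])
    return int(values[0]), list(values[1:])


def restore_market_scale(adjusted, factor):
    """Undo the adjustment element-wise: original = adjusted / factor."""
    return [a / f for a, f in zip(adjusted, factor)]


# Write two tiny synthetic files (values chosen to be exact in float32).
with tempfile.TemporaryDirectory() as tmp:
    close_path = Path(tmp) / "close.day.bin"
    factor_path = Path(tmp) / "factor.day.bin"
    close_path.write_bytes(struct.pack("<4f", 3.0, 5.0, 5.5, 6.0))
    factor_path.write_bytes(struct.pack("<4f", 3.0, 0.5, 0.5, 0.25))

    start, adj_close = read_feature_bin(close_path)
    _, factor = read_feature_bin(factor_path)
    original_close = restore_market_scale(adj_close, factor)

print(start, original_close)
```

The start offset is what lets a reader align the values with calendars/day.txt: value i corresponds to calendar entry start + i.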
- Stocks → cn_data dataset (~/.qlib/qlib_data/cn_data), QLIB_CN_DATA_PATH override.
- ETFs → cn_etf dataset (~/.qlib/qlib_data/cn_etf), QLIB_CN_ETF_PATH override.
- An instrument is an ETF iff it matches sh5xxxxx / sz1xxxxx (disjoint from the stock prefixes sh6/sz0/sz3). The same routing is mirrored in the scorecard scorer (_is_a_share_etf) so ETF recommendations get forward-return scoring via pro.fund_daily.
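The routing rule above can be sketched as follows. The names is_etf and dataset_dir are hypothetical (the source only names _is_a_share_etf in the scorer), and lowercasing the instrument code before matching is an assumption; the prefixes, default directories, and environment variable names come from the list above.

```python
import os
import re
from pathlib import Path

# sh5xxxxx / sz1xxxxx: exchange prefix, leading digit, then five more digits.
_ETF_PATTERN = re.compile(r"^(sh5|sz1)\d{5}$")

# kind -> (override environment variable, default dataset directory)
DATASETS = {
    "stock": ("QLIB_CN_DATA_PATH", "~/.qlib/qlib_data/cn_data"),
    "etf": ("QLIB_CN_ETF_PATH", "~/.qlib/qlib_data/cn_etf"),
}


def is_etf(instrument):
    """True when the code matches the ETF prefixes (case-insensitive here)."""
    return bool(_ETF_PATTERN.match(instrument.lower()))


def dataset_dir(instrument, env=None):
    """Pick cn_etf or cn_data for an instrument, honouring the env override."""
    env = os.environ if env is None else env
    env_var, default = DATASETS["etf" if is_etf(instrument) else "stock"]
    return Path(env.get(env_var) or default).expanduser()


print(is_etf("sh510300"), is_etf("sh600519"))
print(dataset_dir("sh510300", env={"QLIB_CN_ETF_PATH": "/data/cn_etf"}))
```

Because the ETF prefixes are disjoint from sh6/sz0/sz3, a single prefix check is enough to route a code; anything that is not an ETF falls through to the stock dataset in this sketch.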
A thin orchestrator over the vendored collectors. Public API:
- ingest_full(start, end, kind=...): pipeline: download → normalize → dump_to_bin.
- ingest_incremental(end, kind=...): append latest days (update_data_to_bin).
- sync_calendar(end, ...): refresh calendars/day.txt only.
- validate_after_ingest(...): per-ticker gap report + skip manifest (data/qlib_skipped.txt).
kind="stock" drives cn_data, kind="etf" drives cn_etf. Exposed to the front-end via the data.* RPCs and the pnpm dev data incremental|validate CLI.
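To make the idea of a per-ticker gap report concrete, here is a minimal sketch. It is not the logic of validate_after_ingest, whose definition of a gap is not stated here; the helpers find_gaps and gap_report are hypothetical, and a gap is assumed to mean a calendar day missing between a ticker's first and last available bar. Dates are ISO strings, which compare correctly as plain text.

```python
def find_gaps(calendar, ticker_days):
    """Return calendar days missing between a ticker's first and last bar."""
    present = set(ticker_days)
    if not present:
        return []
    first, last = min(present), max(present)
    return [day for day in calendar if first <= day <= last and day not in present]


def gap_report(calendar, days_by_ticker):
    """Map each ticker that has gaps to its list of missing days."""
    report = {}
    for ticker, days in days_by_ticker.items():
        gaps = find_gaps(calendar, days)
        if gaps:
            report[ticker] = gaps
    return report


# Synthetic calendar and coverage, for illustration only.
calendar = ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
days_by_ticker = {
    "sh600000": ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"],
    "sh510300": ["2024-01-02", "2024-01-05"],
}
report = gap_report(calendar, days_by_ticker)
print(report)
```

Note that a missing day is not always an ingest failure: trading suspensions produce the same pattern, so a report like this is a prompt for inspection rather than proof of bad data.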
The collectors' working dirs default to ~/.cache/mosaic_tushare_{raw,norm}, never the project tree. Because the collectors are now vendored inside the repo, ingest_incremental / sync_calendar pass explicit --source_dir / --normalize_dir (and .gitignore ignores any stray source/, normalize/, or tmp/ under collectors/) so raw/normalized CSVs and __inc_tmp__ never pollute the repo.
The following files are vendored so that ingest is self-contained (no external qlib checkout required at run time):
- data_collector/tushare/collector.py + data_collector/tushare_etf/collector.py: the stock + ETF collectors.
- dump_bin.py, data_collector/base.py, data_collector/utils.py: copied verbatim from microsoft/qlib (MIT), which the collectors build on.
- Run time still imports qlib.utils from pyqlib (the backtest extra). Subprocess deps are the ingest extra (fire/loguru/joblib/yahooquery/beautifulsoup4).
find_qlib_collector(kind) prefers the vendored copy; a valid MOSAIC_QLIB_REPO (stock) / MOSAIC_QLIB_ETF_COLLECTOR (etf) env override wins, with graceful fallback to the vendored copy if an env override is set-but-invalid.
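The resolution order described above (a valid env override wins, otherwise fall back to the vendored copy) might look roughly like this. The function resolve_collector is a hypothetical stand-in for find_qlib_collector, and treating "valid" as "the path is an existing directory" is an assumption; the real check may be stricter.

```python
import os
import tempfile
from pathlib import Path

# Environment variable consulted for each collector kind.
ENV_OVERRIDES = {"stock": "MOSAIC_QLIB_REPO", "etf": "MOSAIC_QLIB_ETF_COLLECTOR"}


def resolve_collector(kind, vendored_dir, env=None):
    """Return the override path if set and valid, else the vendored path."""
    env = os.environ if env is None else env
    override = env.get(ENV_OVERRIDES[kind])
    if override:
        candidate = Path(override).expanduser()
        if candidate.is_dir():  # assumed validity check
            return candidate
        # Set but invalid: fall through to the vendored copy.
    return Path(vendored_dir)


vendored = "mosaic/dataflows/collectors"

with tempfile.TemporaryDirectory() as tmp:
    valid = resolve_collector("stock", vendored, env={"MOSAIC_QLIB_REPO": tmp})
    valid_matches = valid == Path(tmp)

invalid = resolve_collector("stock", vendored, env={"MOSAIC_QLIB_REPO": "/no/such/dir"})
unset = resolve_collector("etf", vendored, env={})
print(valid_matches, invalid, unset)
```

Falling back silently keeps ingest working when an override points at a stale path; a production version would likely log a warning in that branch so the misconfiguration is visible.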
MOSAIC is Apache-2.0; the three vendored qlib files remain MIT under Microsoft's copyright. See mosaic/dataflows/collectors/NOTICE.md + LICENSE.qlib. MIT is Apache-2.0-compatible.