# 01 — Load the Data

Ingest raw CSV data from two sources into Parquet, then load the results as DataFrames. Ingestion is idempotent: if the Parquet output already exists, the build is skipped.

| Source | Library | Raw dir | DB dir |
|--------|---------|---------|--------|
| Check Expert (체크전문가) | `utils.chkxp_ingest` | `data/raw/chkxp/` | `data/db/chkxp/` |
| FnGuide DataGuide 6 | `fn_dg6_ingest` | `data/raw/fnguide/` | `data/db/fnguide/` |

In [20]:
from pathlib import Path

import fn_dg6_ingest
from utils.chkxp_ingest import open as chkxp_open

## Raw CSV paths

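Paths to all Check Expert files under `data/raw/chkxp/` (excluding `stock_lob/`). The `../` prefix in the code assumes the notebook runs from a subdirectory of the project root.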

In [21]:
RAW_DIR = Path("../data/raw/chkxp")
DB_DIR  = Path("../data/db/chkxp")

DATASETS = {
    "etf_1m":  RAW_DIR / "chkxp_etf(kodex200)_(1m)_ohlcvNAV.csv",
    "etf_10s": RAW_DIR / "chkxp_etf(kodex200)_(10s)_ohlcvNAVlob.csv",
    "kp200":   RAW_DIR / "kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20260207).csv",
    "ktb":     RAW_DIR / "ktb_(3)(10)_(fut)(spread)(2nd)_(1m)_from(20200101)_to(20260207).csv",
}
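
Before ingesting, it may be worth confirming that every raw file is actually present. The following is a minimal sketch using only the standard library. `find_missing` is a hypothetical helper, not part of `utils.chkxp_ingest`, and the demo runs against a temporary directory rather than the real `data/raw/chkxp/`.

```python
import tempfile
from pathlib import Path


def find_missing(datasets: dict[str, Path]) -> dict[str, Path]:
    """Return the subset of name -> path entries whose file does not exist."""
    return {name: path for name, path in datasets.items() if not path.is_file()}


# Demo against a throwaway directory (stand-in for RAW_DIR).
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp)
    (raw_dir / "present.csv").write_text("datetime,value\n")
    demo = {
        "present": raw_dir / "present.csv",
        "absent": raw_dir / "absent.csv",
    }
    missing = find_missing(demo)
    missing_names = sorted(missing)

print(missing_names)
```

In the notebook, the same helper could be called as `find_missing(DATASETS)` right after the cell above.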

## Ingest (idempotent)

`chkxp_open()` checks whether the Parquet output already exists. If it does, the build is skipped and a `Dataset` handle is returned immediately.

In [22]:
ds = {}
for name, csv_path in DATASETS.items():
    output_dir = DB_DIR / csv_path.stem
    ds[name] = chkxp_open(str(csv_path), output_dir=str(output_dir))
    info = ds[name].describe()
    print(f"{name:10s}  {info.format_name:15s}  {info.frequency:4s}  {info.shape[0]:>10,} rows  entities={info.entities}")

etf_1m      single_entity    1M        23,800 rows  entities=['KODEX 200']
etf_10s     single_entity    10S       49,200 rows  entities=['KODEX 200']
kp200       multi_entity     1M       786,800 rows  entities=['KOSPI200 선물 2603', 'K200 스프레드 6366', 'KOSPI200 선물 2606', 'MINI KOSPI200 선물 2602', 'MINI K200 스프레드 6263', 'V-KOSPI200 선물 2602', 'V-KOSPI200 스프레드 6263']
ktb         multi_entity     1M     1,513,800 rows  entities=['(N)KTB3 선물 2603', '(N)KTB3 스프레드 6366', '(N)KTB3 선물 2606', '(N)KTB10 선물 2603', '(N)KTB10 스프레드 6366', '(N)KTB10 선물 2606']
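
The skip-if-built behaviour can be illustrated with a small stand-alone sketch. This is not the implementation of `chkxp_open()`. The function `build_once`, the `_SUCCESS` marker file, and the output file name are all hypothetical, chosen only to show the pattern.

```python
import tempfile
from pathlib import Path
from typing import Callable


def build_once(output_dir: Path, build: Callable[[Path], None]) -> bool:
    """Run build(output_dir) unless a completion marker already exists.

    Returns True if the build ran, False if it was skipped.
    """
    marker = output_dir / "_SUCCESS"  # hypothetical marker name
    if marker.exists():
        return False
    output_dir.mkdir(parents=True, exist_ok=True)
    build(output_dir)
    marker.touch()  # written last, so a crashed build is retried next time
    return True


calls = []


def fake_build(out: Path) -> None:
    """Stand-in for the CSV -> Parquet conversion."""
    calls.append(out)
    (out / "data.bin").write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "db" / "example"
    first = build_once(target, fake_build)   # builds
    second = build_once(target, fake_build)  # skipped

print(first, second, len(calls))
```

Writing the marker only after the build finishes is one way to avoid treating a half-written output directory as complete. Whether the real library uses a marker, a metadata file, or a plain existence check is not stated in this notebook.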


## Load full datasets

In [23]:
df_etf_1m  = ds["etf_1m"].load()
df_etf_10s = ds["etf_10s"].load()
df_kp200   = ds["kp200"].load()
df_ktb     = ds["ktb"].load()

## Quick preview

In [24]:
df_etf_1m.head()

Unnamed: 0,datetime,entity,entity_code,Intra시가,Intra고가,Intra저가,Intra종가,Intra매도거래량,Intra매수거래량,IntraETP기초지수,Intra장중지표가치(iNAV/iIV)시가,Intra장중지표가치(iNAV/iIV)고가,Intra장중지표가치(iNAV/iIV)저가,Intra장중지표가치(iNAV/iIV)종가,IntraETP괴리율,Intra추적오차율
0,2025-11-11 09:01:00,KODEX 200,069500*001,58450.0,58720.0,58450.0,58705.0,40040.0,145573.0,586.33,58501.73,58776.2,58487.99,58773.38,-0.11,0.01
1,2025-11-11 09:02:00,KODEX 200,069500*001,58720.0,58810.0,58625.0,58660.0,96993.0,106907.0,586.35,58773.38,58838.48,58719.31,58779.26,-0.2,0.02
2,2025-11-11 09:03:00,KODEX 200,069500*001,58660.0,58770.0,58605.0,58745.0,70163.0,62650.0,586.63,58779.26,58874.7,58709.6,58823.43,-0.14,0.04
3,2025-11-11 09:04:00,KODEX 200,069500*001,58740.0,58815.0,58675.0,58700.0,82122.0,48424.0,586.4,58823.43,58916.0,58767.96,58780.09,-0.14,0.01
4,2025-11-11 09:05:00,KODEX 200,069500*001,58695.0,58810.0,58685.0,58800.0,38236.0,26858.0,587.43,58780.09,58890.11,58776.59,58881.69,-0.13,0.0


In [25]:
df_kp200.head()

Unnamed: 0,datetime,entity,entity_code,Intra시가,Intra고가,Intra저가,Intra종가,Intra거래대금
0,2025-01-02 09:46:00,K200 스프레드 6366,K2FS020*005,-0.8,-0.8,-0.85,-0.85,1271112000.0
1,2025-01-02 09:46:00,KOSPI200 선물 2603,K2FA020*005,317.7,318.65,317.55,318.45,270716200000.0
2,2025-01-02 09:46:00,KOSPI200 선물 2606,K2FB020*005,316.75,317.8,316.75,317.8,4360162000.0
3,2025-01-02 09:46:00,MINI K200 스프레드 6263,M2FS020*103,0.94,0.94,0.94,0.94,0.0
4,2025-01-02 09:46:00,MINI KOSPI200 선물 2602,M2FA020*103,316.88,317.94,316.8,317.76,15342060000.0


In [26]:
df_ktb.head()

Unnamed: 0,datetime,entity,entity_code,Intra시가,Intra고가,Intra저가,Intra종가,Intra거래대금
0,2023-07-12 09:01:00,(N)KTB10 선물 2603,KXFA020*016,110.15,110.22,110.05,110.07,167078700000.0
1,2023-07-12 09:01:00,(N)KTB10 선물 2606,KXFB020*016,110.14,110.14,110.14,110.14,0.0
2,2023-07-12 09:01:00,(N)KTB10 스프레드 6366,XBFS020*016,-0.27,-0.27,-0.27,-0.27,0.0
3,2023-07-12 09:01:00,(N)KTB3 선물 2603,KBFA020*017,103.56,103.56,103.54,103.55,396713800000.0
4,2023-07-12 09:01:00,(N)KTB3 선물 2606,KBFB020*017,103.58,103.58,103.58,103.58,0.0


## Filtered loading examples

`load()` supports entity, item (column), and date-range filters. All three are pushed down to PyArrow, so only the needed data is read from disk.

In [27]:
# KP200 front-month futures only, OHLC columns, Jan 2025
ds["kp200"].load(
    entities=["KOSPI200 선물 2603"],
    items=["Intra시가", "Intra고가", "Intra저가", "Intra종가"],
    date_from="2025-01-01",
    date_to="2025-01-31",
)

Unnamed: 0,datetime,entity,entity_code,Intra시가,Intra고가,Intra저가,Intra종가
0,2025-01-02 09:46:00,KOSPI200 선물 2603,K2FA020*005,317.70,318.65,317.55,318.45
1,2025-01-02 09:47:00,KOSPI200 선물 2603,K2FA020*005,318.40,318.40,317.75,318.00
2,2025-01-02 09:48:00,KOSPI200 선물 2603,K2FA020*005,318.05,318.15,317.40,317.40
3,2025-01-02 09:49:00,KOSPI200 선물 2603,K2FA020*005,317.40,317.55,317.20,317.50
4,2025-01-02 09:50:00,KOSPI200 선물 2603,K2FA020*005,317.45,317.50,317.30,317.30
...,...,...,...,...,...,...,...
7075,2025-01-24 15:41:00,KOSPI200 선물 2603,K2FA020*005,336.10,336.10,336.10,336.10
7076,2025-01-24 15:42:00,KOSPI200 선물 2603,K2FA020*005,336.10,336.10,336.10,336.10
7077,2025-01-24 15:43:00,KOSPI200 선물 2603,K2FA020*005,336.10,336.10,336.10,336.10
7078,2025-01-24 15:44:00,KOSPI200 선물 2603,K2FA020*005,336.10,336.10,336.10,336.10


In [28]:
# ETF 1-min close vs. iNAV close, plus the ETF premium/discount (IntraETP괴리율)
ds["etf_1m"].load(
    items=["Intra종가", "Intra장중지표가치(iNAV/iIV)종가", "IntraETP괴리율"],
)

Unnamed: 0,datetime,entity,entity_code,Intra종가,Intra장중지표가치(iNAV/iIV)종가,IntraETP괴리율
0,2025-11-11 09:01:00,KODEX 200,069500*001,58705.0,58773.38,-0.11
1,2025-11-11 09:02:00,KODEX 200,069500*001,58660.0,58779.26,-0.20
2,2025-11-11 09:03:00,KODEX 200,069500*001,58745.0,58823.43,-0.14
3,2025-11-11 09:04:00,KODEX 200,069500*001,58700.0,58780.09,-0.14
4,2025-11-11 09:05:00,KODEX 200,069500*001,58800.0,58881.69,-0.13
...,...,...,...,...,...,...
23795,NaT,KODEX 200,069500*001,,,
23796,NaT,KODEX 200,069500*001,,,
23797,NaT,KODEX 200,069500*001,,,
23798,NaT,KODEX 200,069500*001,,,
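
The tail of the output above shows rows whose `datetime` is `NaT` and whose value columns are empty. Whether these are padding from the raw export or an ingest artifact is not established here, so the sketch below only counts and separates such rows. It uses pandas on a small synthetic frame, and the `close` column name and values are illustrative.

```python
import pandas as pd

# Synthetic stand-in for the loaded frame: two valid rows, two NaT rows.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2025-11-11 09:01:00", "2025-11-11 09:02:00", None, None]
        ),
        "entity": ["KODEX 200"] * 4,
        "close": [58705.0, 58660.0, None, None],
    }
)

# Rows with no timestamp cannot be placed on the time axis.
nat_mask = df["datetime"].isna()
n_nat = int(nat_mask.sum())

# Keep the valid rows in a separate frame; the original is left untouched.
df_valid = df.loc[~nat_mask].reset_index(drop=True)

print(f"{n_nat} of {len(df)} rows have no timestamp")
```

Applied to the real frame, the same mask would be `df_etf_1m["datetime"].isna()`.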


## Metadata / lineage

In [29]:
ds["kp200"].load_meta()

Unnamed: 0,source_file,source_hash,detected_format,entity_name,entity_code,item_code,field_code,item_name,frequency,period_start,period_end,processed_at
0,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,KOSPI200 선물 2603,K2FA020*005,20005,F20005,Intra시가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
1,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,KOSPI200 선물 2603,K2FA020*005,20006,F20006,Intra고가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
2,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,KOSPI200 선물 2603,K2FA020*005,20007,F20007,Intra저가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
3,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,KOSPI200 선물 2603,K2FA020*005,20008,F20008,Intra종가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
4,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,KOSPI200 선물 2603,K2FA020*005,20011,F20011,Intra거래대금,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
5,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,K200 스프레드 6366,K2FS020*005,20005,F20005,Intra시가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
6,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,K200 스프레드 6366,K2FS020*005,20006,F20006,Intra고가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
7,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,K200 스프레드 6366,K2FS020*005,20007,F20007,Intra저가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
8,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,K200 스프레드 6366,K2FS020*005,20008,F20008,Intra종가,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
9,kp200_(fut)(mini)(v)_(1m)_from(20250101)_to(20...,10117c64a23608f104e056c65a04d79cae98982a446c46...,multi_entity,K200 스프레드 6366,K2FS020*005,20011,F20011,Intra거래대금,1M,20250101,20260207,2026-02-12T07:16:08.660412+00:00
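
The `source_hash` column makes it possible to check whether a raw file has changed since it was ingested. The hash algorithm is not stated in this notebook and the values above are truncated, so the sketch below assumes SHA-256 purely for illustration. `file_sha256` is a hypothetical helper, not part of the ingest library.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large CSVs are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "demo.csv"
    csv_path.write_bytes(b"datetime,value\n2025-01-02 09:46:00,1.0\n")
    recorded = file_sha256(csv_path)  # what an ingest step might store
    unchanged = file_sha256(csv_path) == recorded

    # Modify the file and confirm the digest no longer matches.
    csv_path.write_bytes(b"datetime,value\n2025-01-02 09:46:00,2.0\n")
    changed = file_sha256(csv_path) != recorded

print(unchanged, changed)
```

If the library's actual algorithm differs, only the `hashlib.sha256()` call would need to change.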


---

# FnGuide DataGuide 6

Daily equity data (OHLCV, consensus, ETF constituents) from `data/raw/fnguide/`.

In [30]:
FN_RAW_DIR = Path("../data/raw/fnguide")
FN_DB_DIR  = Path("../data/db/fnguide")

FN_DATASETS = {
    "ohlcv":      FN_RAW_DIR / "dataguide_kse+kosdaq_ohlcv_from(20160101)_to(20260207).csv",
    "consensus":  FN_RAW_DIR / "dataguide_kse+kosdaq_sales-consensus_from(20180101)_to(20260207).csv",
    "etf_const":  FN_RAW_DIR / "dataguide_etfconst(kodex200)_from(20250101)_to(20260207).csv",
}

## Ingest FnGuide (idempotent)

`fn_dg6_ingest.open()` auto-detects the CSV format, builds the Parquet output along with `_meta`, and skips the build if the output already exists. The large OHLCV file (~350 MB) takes a few minutes on the first run.

In [31]:
fn_ds = {}
for name, csv_path in FN_DATASETS.items():
    output_dir = FN_DB_DIR / csv_path.stem
    fn_ds[name] = fn_dg6_ingest.open(str(csv_path), output_dir=str(output_dir))
    info = fn_ds[name].describe()
    print(f"{name:12s}  {info.format_name:20s}  {info.shape}  entities={info.entities}")

ohlcv         timeseries_wide       {'default': (7613009, 9)}  entities=4071
consensus     timeseries_wide       {'default': (4913310, 13)}  entities=4071
etf_const     misc_etf              {'default': (53836, 8)}  entities=0


## Load FnGuide datasets

In [32]:
df_ohlcv     = fn_ds["ohlcv"].load()
df_consensus = fn_ds["consensus"].load()
df_etf_const = fn_ds["etf_const"].load()

## Quick preview

In [33]:
df_ohlcv.head()

Unnamed: 0,코드,코드명,date,수정시가(원),수정고가(원),수정저가(원),수정주가(원),거래량(주),거래대금(원)
0,A000020,동화약품,2015-12-30,8180.0,8180.0,8020.0,8140.0,166761.0,1348911000.0
1,A000020,동화약품,2016-01-04,8130.0,8150.0,7920.0,8140.0,281440.0,2265829000.0
2,A000020,동화약품,2016-01-05,8040.0,8250.0,8000.0,8190.0,243179.0,1981977000.0
3,A000020,동화약품,2016-01-06,8200.0,8590.0,8110.0,8550.0,609906.0,5129946000.0
4,A000020,동화약품,2016-01-07,8470.0,8690.0,8190.0,8380.0,704752.0,5919556000.0


In [34]:
df_consensus.head()

Unnamed: 0,코드,코드명,date,추정기관수,최근기업리포트발간일,매출액(원),매출액(최고)(원),매출액(최저)(원),매출액(중앙값)(원),매출액 최근 추정일자,매출액(Fwd.12M)(원),매출액(Fwd.24M)(원),매출액(LTM.12M)(원)
0,A000020,동화약품,2017-12-28,,20170221.0,,,,,,,,
1,A000020,동화약품,2018-01-02,,20170221.0,,,,,,,,
2,A000020,동화약품,2018-01-03,,20170221.0,,,,,,,,
3,A000020,동화약품,2018-01-04,,20170221.0,,,,,,,,
4,A000020,동화약품,2018-01-05,,20170221.0,,,,,,,,


In [35]:
df_etf_const.head()

Unnamed: 0,date,ETF코드,ETF명,구성종목코드,구성종목,주식수(계약수),금액,금액기준 구성비중(%)
0,2025-01-02,A069500,KODEX 200,,원화현금,,9945243,0.62
1,2025-01-02,A069500,KODEX 200,A000080,하이트진로,47.0,914620,0.06
2,2025-01-02,A069500,KODEX 200,A000100,유한양행,91.0,10765300,0.67
3,2025-01-02,A069500,KODEX 200,A000120,CJ대한통운,17.0,1429700,0.09
4,2025-01-02,A069500,KODEX 200,A000150,두산,11.0,2915000,0.18


## Filtered loading examples

`load()` on a `fn_dg6_ingest` dataset supports filtering by `codes` (tickers), `items` (columns), and `date_from`/`date_to`.

In [36]:
# Samsung Electronics adjusted close, 2024 only
fn_ds["ohlcv"].load(
    codes=["A005930"],
    items=["수정주가(원)"],
    date_from="2024-01-01",
    date_to="2024-12-31",
)

Unnamed: 0,코드,코드명,date,수정주가(원)
0,A005930,삼성전자,2024-01-02,79600.0
1,A005930,삼성전자,2024-01-03,77000.0
2,A005930,삼성전자,2024-01-04,76600.0
3,A005930,삼성전자,2024-01-05,76600.0
4,A005930,삼성전자,2024-01-08,76500.0
...,...,...,...,...
239,A005930,삼성전자,2024-12-23,53500.0
240,A005930,삼성전자,2024-12-24,54400.0
241,A005930,삼성전자,2024-12-26,53600.0
242,A005930,삼성전자,2024-12-27,53700.0


In [37]:
# KODEX 200 ETF constituents, latest date
fn_ds["etf_const"].load(date_from="2026-02-06")

Unnamed: 0,date,ETF코드,ETF명,구성종목코드,구성종목,주식수(계약수),금액,금액기준 구성비중(%)
0,2026-02-06,A069500,KODEX 200,A377300,카카오페이,42.0,2725800,0.07
1,2026-02-06,A069500,KODEX 200,A383220,F&F,21.0,1533000,0.04
2,2026-02-06,A069500,KODEX 200,A373220,LG에너지솔루션,74.0,28490000,0.76
3,2026-02-06,A069500,KODEX 200,A375500,DL이앤씨,44.0,1982200,0.05
4,2026-02-06,A069500,KODEX 200,A352820,하이브,32.0,11376000,0.30
...,...,...,...,...,...,...,...,...
196,2026-02-06,A069500,KODEX 200,A000100,유한양행,90.0,9432000,0.25
197,2026-02-06,A069500,KODEX 200,A454910,두산로보틱스,29.0,2949300,0.08
198,2026-02-06,A069500,KODEX 200,A457190,이수스페셜티케미컬,30.0,3000000,0.08
199,2026-02-06,A069500,KODEX 200,A450080,에코프로머티,38.0,2394000,0.06


## FnGuide metadata / lineage

In [38]:
fn_ds["ohlcv"].load_meta()

Unnamed: 0,table_name,source_file,source_hash,source_last_updated,detected_format,아이템코드,아이템명,아이템명_normalized,유형,집계주기,frequency,period_start,period_end,unit_original,unit_multiplier,non_business_days,include_weekends,entities_total,entities_dropped,processed_at
0,default,dataguide_kse+kosdaq_ohlcv_from(20160101)_to(2...,523e45d81c505847b44929c1b2b4c806e08d3a68327db3...,2026-02-07 15:46:56,timeseries_wide,S410000650,수정시가(원),수정시가(원),SSC,일간,일간,20160101,최근일자(20260206),원,1,제외,제외,4071,1000,2026-02-12T05:10:21+00:00
1,default,dataguide_kse+kosdaq_ohlcv_from(20160101)_to(2...,523e45d81c505847b44929c1b2b4c806e08d3a68327db3...,2026-02-07 15:46:56,timeseries_wide,S410000660,수정고가(원),수정고가(원),SSC,일간,일간,20160101,최근일자(20260206),원,1,제외,제외,4071,1000,2026-02-12T05:10:21+00:00
2,default,dataguide_kse+kosdaq_ohlcv_from(20160101)_to(2...,523e45d81c505847b44929c1b2b4c806e08d3a68327db3...,2026-02-07 15:46:56,timeseries_wide,S410000670,수정저가(원),수정저가(원),SSC,일간,일간,20160101,최근일자(20260206),원,1,제외,제외,4071,1000,2026-02-12T05:10:21+00:00
3,default,dataguide_kse+kosdaq_ohlcv_from(20160101)_to(2...,523e45d81c505847b44929c1b2b4c806e08d3a68327db3...,2026-02-07 15:46:56,timeseries_wide,S410000700,수정주가(원),수정주가(원),SSC,일간,일간,20160101,최근일자(20260206),원,1,제외,제외,4071,1000,2026-02-12T05:10:21+00:00
4,default,dataguide_kse+kosdaq_ohlcv_from(20160101)_to(2...,523e45d81c505847b44929c1b2b4c806e08d3a68327db3...,2026-02-07 15:46:56,timeseries_wide,S41000080F,거래량(주),거래량(주),SSC,일간,일간,20160101,최근일자(20260206),,1,제외,제외,4071,1000,2026-02-12T05:10:21+00:00
5,default,dataguide_kse+kosdaq_ohlcv_from(20160101)_to(2...,523e45d81c505847b44929c1b2b4c806e08d3a68327db3...,2026-02-07 15:46:56,timeseries_wide,S410000900,거래대금(원),거래대금(원),SSC,일간,일간,20160101,최근일자(20260206),원,1,제외,제외,4071,1000,2026-02-12T05:10:21+00:00
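
In the metadata above, `period_start` is a plain `YYYYMMDD` value while `period_end` appears as `최근일자(20260206)` (`최근일자` means "latest date"). The sketch below normalises both to `datetime.date`, assuming those are the only two shapes that occur. `parse_period` is a hypothetical helper, not part of `fn_dg6_ingest`.

```python
import re
from datetime import date, datetime

_YYYYMMDD = re.compile(r"(\d{8})")


def parse_period(value: str) -> date:
    """Extract the first 8-digit YYYYMMDD run from value and return it as a date."""
    match = _YYYYMMDD.search(value)
    if match is None:
        raise ValueError(f"no YYYYMMDD date found in {value!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


start = parse_period("20160101")
end = parse_period("최근일자(20260206)")
print(start, end)
```

If the metadata column holds integers rather than strings, wrapping the value in `str()` before calling the helper would be needed.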
