Full Digital Corpus of Sanskrit (DCS) as a queryable SQLite master database, built by the M1-M7 pipeline in src/DCS-data-2026/ from gasyoun/dcs-conllu @ 04e0778 (2026-03-05).

Contents: 270 texts, 754,726 sentences, 5,688,416 tokens, 98,606 attested lemmas, 74 treebank texts.

Validation: cross-walk reports 0 mismatches; coverage matches upstream.

Tables: text, chapter, sentence, token (flatten-all UD FEATS + MISC), mwt, lemma, provenance.

Use: gunzip dcs_full.sqlite.gz, then sqlite3 dcs_full.sqlite

Regenerate: python src/DCS-data-2026/import_dcs_conllu.py --all --db dcs_full.sqlite

Size: 287 MB gzipped (~920 MB uncompressed).

License: CC BY 4.0 (Oliver Hellwig / DCS).
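For readers who prefer to stay in Python, the Use step can be sketched with only the standard library. This is a minimal sketch, not part of the repository: `decompress` and `summarize` are hypothetical helper names, and the only schema knowledge assumed is the list of table names given in these notes. Column layouts are not documented here, so the sketch discovers them at runtime with `PRAGMA table_info` rather than guessing.

```python
import gzip
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path

# Table names as listed in the release notes. Column layouts are not
# documented there, so this sketch discovers them at runtime.
EXPECTED_TABLES = ("text", "chapter", "sentence", "token", "mwt", "lemma", "provenance")


def decompress(gz_path: str, out_path: str) -> Path:
    """Stream the .gz archive to disk (like `gunzip -k`, keeping the archive)."""
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return Path(out_path)


def summarize(db_path: str) -> dict:
    """Return {table: (row_count, [column names])} for each expected table present."""
    summary = {}
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        present = {
            row[0]
            for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        }
        for table in EXPECTED_TABLES:
            if table not in present:
                continue
            # Names come from the fixed tuple above, so quoting them inline is safe.
            count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
            summary[table] = (count, columns)
    return summary
```

Usage, assuming the archive sits in the working directory under its released name: call `decompress("dcs_full.sqlite.gz", "dcs_full.sqlite")`, then iterate over `summarize("dcs_full.sqlite")` to print row counts and column names per table. Streaming with `shutil.copyfileobj` keeps memory use flat, which matters for a ~920 MB uncompressed file.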