Multi-domain Open Research and Inferential Estimation
Pronounced /ˈmɔɪraɪ/ — "MOY-rye", like the Greek Moirai (the Fates).
A multi-domain scientific computing toolkit (Python and R) for observational inference, with sociolegal, signal-processing, cryptographic, spatial-statistics, statistical-physics, and psychometrics modules. Hosts the MRM framework as a primary application for Canadian carceral, police, and oversight data analysis.
⚠️ Pre-alpha (v0.x). MORIE is in pre-alpha. The first alpha milestone is v1.0.0; every release before that is a point release of pre-alpha code. APIs may shift, datasets may move, and findings may be refined between minor versions. Paper sources are in papers/ (LaTeX); compiled PDFs are on Zenodo via the DOIs listed in the citation block below.
Full step-by-step install guide with platform-specific notes (PEP 668 on Debian, python 3.13 segfault on Raspberry Pi OS, etc.) is at INSTALLATION.md.
morie is a Python (and R) package: once Python is present, installing it is just pip install morie. If you are starting with nothing installed, INSTALLATION.md opens with Step 1 (install the prerequisites), which lists every tool you might need (Python, curl, bash/WSL, Git Bash, winget, Homebrew, Docker, R) with its official download. The short version:
- Windows: install Python from python.org (on the first installer screen, tick Add python.exe to PATH), then run pip install morie. Windows has no bash, so the one-liner does not apply there; see the full Windows walkthrough below.
- macOS / Linux: the one-liner below sets up everything. It needs curl and bash, which macOS has built in and most Linux distributions ship.
- Already have Python ≥3.10: just run pip install morie.
This is the simplest path if you have a terminal with curl and bash, both of which are built into macOS and preinstalled on most Linux distributions (Windows has no bash, so use the Python-installer route above instead). The script bootstraps everything else for you: Python via uv, a managed venv, and the morie wheel. No pre-existing Python or pip is needed.
curl -fsSL https://hadesllm.github.io/morie/install.sh | bash
Or, with R alongside Python:
curl -fsSL https://hadesllm.github.io/morie/install.sh | bash -s -- --auto
After install, ~/.local/bin/morie is a thin shim into the managed venv at ~/.venvs/morie. Full install instructions, channel comparison, and platform-specific notes are at hadesllm.github.io/morie/#quick-start.
On minimal Linux containers (Alpine, slim Debian) that ship without curl, install it first: apt-get install -y curl or apk add curl. macOS already has curl built in.
Windows doesn't ship bash, Python, or R, so the Linux/macOS one-liner above won't run there. The path that works on any Windows machine with no prerequisites:
- Install Python from python.org/downloads. On the first installer screen, tick "Add python.exe to PATH" (skipping this is the No. 1 cause of python being "not recognized" in the terminal).
- (Optional, for the R package) Install R from cran.r-project.org/bin/windows/base.
- Open PowerShell and install morie:
python -m pip install --upgrade pip
python -m pip install morie
python -c "import morie; print(morie.__version__)"
For the R package: Rscript -e "install.packages('morie', repos=c('https://hadesllm.r-universe.dev','https://cloud.r-project.org'))"
Prefer a package manager? If winget --version works on your machine, winget install -e --id Python.Python.3.12 (and winget install -e --id RProject.R for R) installs each prerequisite in one line. However, winget is absent from many Windows installs, so the installer steps above are the reliable default. The full Windows walkthrough, including fixes for common errors (python opening the Microsoft Store, PowerShell execution policy, long-path limits), is in INSTALLATION.md.
If you don't have Homebrew yet, install it first (macOS ships curl and bash, so this works out of the box):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Then:
brew tap hadesllm/morie
brew install morie
The tap repo is hadesllm/homebrew-morie. It pulls morie's source distribution from PyPI and bundles a self-contained python@3.12 venv, so no system Python is required.
pip install morie
Heads-up: modern Debian / Ubuntu / Raspberry Pi OS block pip installs into the system Python outside virtual environments (PEP 668), and the system python3 on Raspberry Pi OS 13 segfaults on importing the SciPy stack. If pip install morie errors or import morie segfaults, use the one-liner above instead; it handles both cases automatically.
# Latest stable
docker run --rm ghcr.io/hadesllm/morie:latest morie --help
# Pin to a specific version (recommended for reproducibility)
docker run --rm ghcr.io/hadesllm/morie:0.9.5.4 morie --help
A multi-arch image is published on every release with both versioned and :latest tags. It requires only Docker: no Python, no pip.
# Stable from CRAN (when listing is live)
install.packages("morie")
# Nightly binary builds (recommended while CRAN listing is rolling out)
install.packages(
"morie",
repos = c(
hadesllm = "https://hadesllm.r-universe.dev",
CRAN = "https://cloud.r-project.org"
)
)
import morie
# Load a built-in dataset
df = morie.load_dataset("otis-2025")
# Run an MRM module on OTIS data
from morie.otis_all_analyze import analyze_a01_mrm
result = analyze_a01_mrm(df)
print(result)
- SIU subsystem, first-class. A full pipeline for the Ontario Special Investigations Unit director's-report corpus (English + French, 2005-present): morie_fetch_siu() with a polite token-bucket fetcher (4 req/s default, exponential backoff on 429/5xx, optional on-disk page cache); a hand-rolled C++ parser (src/siu_parser.cpp) that handles both the 2015-2019 and 2020+ template families plus the 2014 Overview and 2005 Director's report variants; 38 police-service acronyms (English + French) mapped to canonical English names; compound officer-count handling; and a linear html_to_text state machine replacing the segfault-prone std::regex_replace.
- Language-aware DRID manifest. inst/extdata/siu_drid_manifest.csv.gz ships with 4,743 probed drids (en=2,531, fr=2,212, unknown=0) and a canonical_drid column for English-preferred dedupe. morie_fetch_siu(lang = "en") skips French drids, cutting network round-trips roughly in half. morie_siu_index() exposes the manifest.
- Canonical override system: the parser learns. inst/extdata/siu_canonical_overrides.csv.gz ships with 47 hand-verified corrections; morie_siu_record_correction(case_number, field, value) lets users add their own. Overrides are applied automatically at the end of every fetch.
- Audit + AI tooling. morie_siu_audit_case(), morie_siu_compare(), morie_siu_sanity_check(), morie_siu_anomaly_check(), morie_siu_audit_columns(), morie_siu_translate(), and morie_siu_llm_extract(), with four providers (ollama as the local, free default, plus gemini, claude, and vertex) and a c("ollama", "gemini") failover chain so paid APIs only fire when the local model fails. Defaults: OLLAMA_HOST=http://localhost:11434, OLLAMA_MODEL=gemma3:4b, OLLAMA_KEEP_ALIVE=30m. French → English translation via translategemma:latest.
- 559 exported morie_* R functions: every public callable is now prefixed. Cleared rOpenSci pkgcheck's duplicated-function-names finding by renaming 352 unprefixed exports to morie_* across R/, tests/, vignettes/, inst/, and data-raw/. No aliases: the unprefixed names are gone from NAMESPACE.
- TPS open-data ingestion fixes (carried over from the original v0.9.5 plan). Corrected the Homicides and Shootings date ranges in the dataset catalog (2004-present, not 2014); rewrote morie_fetch_tps() ArcGIS paging to follow the server's exceededTransferLimit flag so large layers are no longer silently truncated to the first page; daily-resolution Hawkes fits now build the occurrence date from the local-time OCC_YEAR / OCC_MONTH / OCC_DAY fields rather than the UTC-converted OCC_DATE.
- T_horizon rename in the Hawkes C++ likelihood. The time-horizon parameter was a bare T in the auto-generated R/RcppExports.R, which lintr flags as potentially shadowing TRUE. The C++ signature is now T_horizon; the mathematical notation T is preserved only in the C++ docstrings.
- rOpenSci 770 blockers cleared. .github/CONTRIBUTING.md shipped, 16 @return docs added, 15 @examples added, full roxygen2 conversion (RoxygenNote 7.3.3), coverage validated at ≥75% under covr::package_coverage, \dontrun{} count reduced from 72 to 0, and setwd() replaced with withr::local_dir() in R/workflow.R.
- Six-cell R CMD check matrix, all green on release/v0.9.5-audit: macOS-latest release, Windows-2025 release, Ubuntu-latest release, Ubuntu-latest release + postgres-15, Ubuntu-latest oldrel-1, Ubuntu-latest devel. Plus pkgcheck, covr + Codecov upload, lintr, goodpractice, and CodeQL.
- Final SIU corpus stats: 2,218 unique cases × 64 columns, 100.000% format-clean per morie_siu_sanity_check().
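The fetcher's politeness policy (a token bucket at 4 requests per second plus exponential backoff on 429/5xx responses) can be illustrated with a short Python sketch. It is not morie's fetcher, which is the R function morie_fetch_siu(); the class and helper names, the burst capacity of 4, the exact retryable status set, and the 1-second base / 60-second cap on backoff are all assumptions.

```python
import random
import time

# Assumption: the concrete status codes treated as retryable ("429/5xx").
RETRYABLE = {429, 500, 502, 503, 504}


class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens/s up to `capacity`."""

    def __init__(self, rate=4.0, capacity=4.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so a short burst is allowed
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self):
        """Consume one token if available; return True on success."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def wait_time(self):
        """Seconds until one token will be available (0.0 if available now)."""
        self._refill()
        return max(0.0, (1.0 - self.tokens) / self.rate)


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=False):
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    delay = min(cap, base * 2 ** attempt)
    return random.uniform(0.0, delay) if jitter else delay


def fetch_with_retry(get, url, bucket, max_attempts=5, sleep=time.sleep):
    """Call get(url) -> (status, body) under the rate limit, retrying 429/5xx."""
    status, body = None, None
    for attempt in range(max_attempts):
        # Block until the bucket grants a token for this request.
        while not bucket.try_acquire():
            sleep(bucket.wait_time())
        status, body = get(url)
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts - 1:  # no pointless sleep after the last try
            sleep(backoff_delay(attempt))
    return status, body
```

Passing clock and sleep explicitly keeps the logic deterministic under test; in normal use the defaults (time.monotonic and time.sleep) apply.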
- CRAN source-package compliance. The R package's vendored copy of the shared C++ core header was renamed from morie_core.hpp to morie_core.h. R CMD check --as-cran does not recognize .hpp as a src/ file extension and warned about it; the rename clears the WARNING. No behaviour change: the canonical libmorie/morie_core.hpp (Python/CMake side) is unchanged.
- Docker image build fixed (completely). v0.9.2's Dockerfile fix was incomplete: the builder stage didn't copy LICENSE, and without it the license-files declaration in pyproject.toml made scikit-build-core fail metadata generation. The builder now copies LICENSE too; the image build is verified end-to-end.
- Homebrew tap bump fixed. The tap-bump job raced the PyPI publish: it polled for the sdist for only ~4 minutes, but the sdist uploads last, after the full wheel matrix (~20 minutes). It now waits for the sdist itself, for up to ~35 minutes.
- Atomic releases. The release pipeline now verifies the full build — the sdist and the Docker image — before the version tag is created. If any build fails, no tag is created and nothing publishes, so a half-broken release (a working PyPI package but a failed Docker image, as in v0.9.1/v0.9.2) can no longer ship.
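Waiting for the sdist itself, rather than for the release as a whole, can be sketched as a poll against PyPI's public JSON API (https://pypi.org/pypi/<project>/<version>/json, whose urls list tags each file with a packagetype). This is an illustrative Python sketch, not the project's actual tap-bump job; the helper names and the 30-second polling interval are assumptions, while the 35-minute ceiling matches the text above.

```python
import json
import time
import urllib.error
import urllib.request


def sdist_available(project, version, opener=urllib.request.urlopen):
    """True once PyPI's JSON API lists an sdist file for project==version."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    try:
        with opener(url, timeout=30) as resp:
            data = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # this version is not visible on PyPI yet
            return False
        raise
    # Wheels appear as "bdist_wheel"; the source distribution as "sdist".
    return any(f.get("packagetype") == "sdist" for f in data.get("urls", []))


def wait_for(check, timeout_s=35 * 60, interval_s=30.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A tap-bump step could then gate on wait_for(lambda: sdist_available("morie", "0.9.5.4")) and fail the job if it returns False.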
- Docker image build fixed for the C/C++ core. The container build's Python stage was written for the old pure-Python package: it staged the install from a stub before copying the source. v0.9.1's compiled libmorie core (scikit-build-core + CMake) cannot build that way, so the published image failed to build. The builder stage now installs cmake/ninja and builds from the real CMakeLists.txt + libmorie/ sources.
- C/C++ computational backend: the hot numerical kernels (formerly _jit.py) are ported to a shared C++ core (libmorie), exposed to Python via nanobind and to R via Rcpp. One compiled core now serves both language sides.
- Hawkes-process engine: a self-exciting point-process suite in the C++ core, with sum-of-exponentials and complex-pole SoE engines, a matrix-pencil exponential fitter, sub-quadratic truncated Weibull / Lomax / gamma kernels, sinusoidal-baseline variants, and a hybrid gamma-tail kernel. An R-side Hawkes fitter with Poisson-degeneracy detection and multi-start restarts is included.
- Wheels via cibuildwheel: the PyPI wheel matrix for the compiled extension is now built with cibuildwheel.
- IP / licensing cleanup: the bundled demo dataset was replaced with public-domain Solar System data; copyrighted pop-culture quotes throughout fn/ were replaced with public-domain ones; 85 franchise-derived function codes were renamed to neutral names, and four themed categories were merged into AtomicPrimitives.
- OTIS data resolution fix: load_otis() and the OTIS analysis modules resolve their data directory robustly (a pyproject.toml marker walk) instead of relying on a hard-coded path depth.
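The sum-of-exponentials engines build on a standard property of exponential kernels: the excitation from all past events can be carried forward recursively, so the log-likelihood costs O(n) instead of O(n²). Below is a minimal single-exponential sketch in pure Python. It assumes the parameterisation φ(t) = αβe^(−βt), so α is the branching ratio, on the window [0, T_horizon]. The function names are hypothetical, and this is not the libmorie C++ code. An O(n²) reference version is included so the recursion can be checked.

```python
import math


def hawkes_exp_loglik(times, mu, alpha, beta, T_horizon):
    """O(n) log-likelihood of a univariate Hawkes process with baseline mu and
    kernel phi(t) = alpha * beta * exp(-beta * t), observed on [0, T_horizon].
    `times` must be sorted ascending and lie inside [0, T_horizon]."""
    loglik = 0.0
    A = 0.0  # A_i = sum_{j<i} exp(-beta * (t_i - t_j)), updated recursively
    prev = None
    for t in times:
        if prev is not None:
            A = math.exp(-beta * (t - prev)) * (1.0 + A)
        loglik += math.log(mu + alpha * beta * A)  # log intensity at event t
        prev = t
    # Compensator: integral of the intensity over [0, T_horizon].
    compensator = mu * T_horizon + alpha * sum(
        1.0 - math.exp(-beta * (T_horizon - t)) for t in times)
    return loglik - compensator


def hawkes_exp_loglik_naive(times, mu, alpha, beta, T_horizon):
    """O(n^2) reference: sums the kernel over all earlier events directly."""
    loglik = 0.0
    for i, t in enumerate(times):
        lam = mu + sum(alpha * beta * math.exp(-beta * (t - s)) for s in times[:i])
        loglik += math.log(lam)
    compensator = mu * T_horizon + alpha * sum(
        1.0 - math.exp(-beta * (T_horizon - s)) for s in times)
    return loglik - compensator
```

The recursion A_i = e^(−β(t_i − t_(i−1)))(1 + A_(i−1)) is what makes exponential (and sum-of-exponential) kernels cheap. Heavy-tailed kernels such as Weibull or Lomax do not factor this way, which is consistent with the separate sub-quadratic treatment listed above.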
- check_datasets() dataset auditor: probes every entry in the dataset catalogue and reports which datasets are reachable and which need attention, classified by tier.
- More open-data sources: new morie.ingest.statcan and morie.ingest.cihi modules add the StatCan Canadian Community Health Survey 2022 PUMF and five CIHI indicator data tables, fetched on demand.
- 16 datasets wired to verified sources: the Cannabis / Substance Use / Alcohol-and-Drugs / Student survey PUMFs got verified open.canada.ca CKAN ids; the Toronto Police crime datasets and the Ontario SIU case data now fetch through their scrapers. The catalogue went from 33 to 49 reachable datasets.
- New-version notification + morie update: import morie does a fail-silent, daily-cached PyPI check and warns when a newer release exists (opt out with MORIE_NO_UPDATE_CHECK); morie update upgrades in place.
- CRAN fix: the morie_load_cpads example is wrapped in \dontrun{}, clearing an R CMD check --as-cran error.
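A fail-silent, daily-cached update check of the kind described above can be sketched as follows. Only the MORIE_NO_UPDATE_CHECK variable comes from the text. The cache-file format, the 24-hour TTL, the naive version parser and the function names are assumptions, and morie's own implementation may differ. The endpoint https://pypi.org/pypi/<project>/json is PyPI's public JSON API, where info.version is the latest release.

```python
import json
import os
import time
import urllib.request
from pathlib import Path

CACHE_TTL_S = 24 * 60 * 60  # assumption: "daily" means a 24-hour cache


def _version_tuple(v):
    """Naive parse: '0.9.5.4' -> (0, 9, 5, 4). Pre-release tags unsupported."""
    return tuple(int(part) for part in v.split("."))


def fetch_latest_from_pypi(project="morie", timeout=3):
    """Latest released version according to PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["info"]["version"]


def newer_version_available(current, cache_file,
                            fetch_latest=fetch_latest_from_pypi, now=time.time):
    """Return the newer version string, or None. Never raises."""
    if os.environ.get("MORIE_NO_UPDATE_CHECK"):
        return None  # user opted out
    try:
        cache = Path(cache_file)
        latest = None
        if cache.exists():
            cached = json.loads(cache.read_text())
            if now() - cached["checked_at"] < CACHE_TTL_S:
                latest = cached["latest"]  # fresh enough: skip the network
        if latest is None:
            latest = fetch_latest()
            cache.write_text(json.dumps({"checked_at": now(), "latest": latest}))
        if _version_tuple(latest) > _version_tuple(current):
            return latest
        return None
    except Exception:  # fail silently: an update check must never break import
        return None
```

A more robust comparison, for example packaging.version.Version, would also order pre-release and post-release tags correctly.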
- New: the fairness & disparity-audit subsystem (morie.fairness), for auditing risk-assessment, recidivism, and predictive-policing systems for racial and other group disparities. morie measures whether an existing system encodes disparate treatment; it does not deploy one.
- Six group-fairness metrics: disparate impact (the four-fifths rule), demographic parity gap, equalized odds, average odds difference, Gini, and the composite Bias Amplification Score (Python + R parity).
- Predictive-policing calibration audit: rank areas by predicted risk vs. realised outcomes and test whether the disagreement tracks demographics; a city-agnostic CityProfile layer runs the audit for Chicago, New York, Toronto, or any registered city.
- Multi-city temporal audit: the disparity metrics per (city, period), surfacing temporal instability and cross-city divergence.
- Simulation framework: a Noisy-OR detection model, a synthetic biased-data generator, a JAX spatial GAN, and a CTGAN-style debiaser (the optional morie[sim] extra; JAX rather than PyTorch, to stay lean).
- Explainability (XAI) suite: permutation importance, partial dependence, ALE, ceteris paribus, and SHAP. It is model-agnostic and wired to flag when a model leans on a protected attribute.
- Clean-room reimplementations from published methods (IBM AIF360; the SciencesPo Predictive-policing-Chicago project; Barman & Barman, arXiv:2603.18987; the COMPAS XAI Stories audit) — no third-party code copied.
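Four of the six metrics listed above reduce to per-group selection and error rates. The sketch below follows the definitions popularised by IBM AIF360: ratio and difference are taken as unprivileged relative to privileged, and average odds difference is the mean of the FPR and TPR gaps. It summarises equalized odds as the larger absolute TPR/FPR gap, which is one common convention. Function names are hypothetical. This is not the morie.fairness API, and Gini and the Bias Amplification Score are not covered. Predictions are assumed binary, with 1 as the outcome whose rates are compared.

```python
def _rate(num, den):
    """num/den, or NaN when the group has no members of that kind."""
    return num / den if den else float("nan")


def group_rates(y_true, y_pred, group, g):
    """Selection rate, true-positive rate and false-positive rate for group g."""
    pairs = [(t, p) for t, p, a in zip(y_true, y_pred, group) if a == g]
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    pos = sum(1 for t, _ in pairs if t == 1)
    neg = len(pairs) - pos
    selected = sum(1 for _, p in pairs if p == 1)
    return {
        "selection": _rate(selected, len(pairs)),
        "tpr": _rate(tp, pos),
        "fpr": _rate(fp, neg),
    }


def fairness_report(y_true, y_pred, group, unprivileged, privileged):
    """Four group-fairness metrics, unprivileged group compared with privileged."""
    u = group_rates(y_true, y_pred, group, unprivileged)
    p = group_rates(y_true, y_pred, group, privileged)
    di = _rate(u["selection"], p["selection"])
    return {
        "disparate_impact": di,
        "fails_four_fifths": di < 0.8,  # the 80% rule of thumb
        "demographic_parity_gap": u["selection"] - p["selection"],
        "equalized_odds_gap": max(abs(u["tpr"] - p["tpr"]), abs(u["fpr"] - p["fpr"])),
        "average_odds_difference": 0.5 * ((u["fpr"] - p["fpr"]) + (u["tpr"] - p["tpr"])),
    }
```

On the toy data used in the check, group a is selected at a rate of 0.25 and group b at 0.75, so the disparate-impact ratio is 1/3, well under the four-fifths (0.8) threshold.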
- Security fix: resolved a regular-expression denial-of-service (ReDoS) vulnerability in the Ontario SIU scraper (siu_fetch), flagged by static analysis (CodeQL py/redos, high severity). A repeated sub-pattern could backtrack catastrophically on a maliciously crafted page; it now runs in linear time, with no change to parsing of valid SIU index pages.
- Stale User-Agent strings across the data-ingestion modules aligned to the release version.
- Licensing: morie is licensed AGPL-3.0-or-later on both language sides. The two optional Linux-kernel adjuncts (kernel-module/ and daemon/) stay GPL-2.0-only because the kernel ABI requires it; they are not part of the wheel or CRAN tarball.
- Empirical applications paper published: Solitary Confinement, Self-Excitation, and Institutional Churn: Empirical Applications of MRM to Canadian Carceral and Police Data, on Zenodo at 10.5281/zenodo.20175689. The five-paper publication set is now complete.
- ac/vm terminology locked across all 5 papers: ac (alert complexity) and vm (volatility measure of placements, with "regional-transition count" used alongside) are now the canonical operational terms.
- DOI + version propagation sweep: the empirical-paper DOI now reaches the Sphinx index, pyproject.toml [project.urls], papers/README.md, and CITATION.cff. Sphinx install snippets, Docker tag examples, and the in-tree papers/README.md were also un-pinned from stale versions.
- R-package roxygen docs for fast Rcpp kernels: morie_mean, morie_var, morie_cor_pearson, morie_normal_pdf, and morie_fast_available ship with Rd man pages.
- R 4.6.0 compatibility: DESCRIPTION carries an explicit Author: field alongside Authors@R:, so R CMD check passes on the strict 4.6.0 build.
- Three replication modules from Laniyonu et al.: morie.laniyonu.gentrification_policing() (Spatial Durbin replication of Laniyonu 2018 UAR: gentrification spillover on NYPD SQF), morie.laniyonu.smi_force_disparity() (Bayesian-style hierarchical negative-binomial replication of Laniyonu & Goff 2021 BMC Psych: police force on persons with serious mental illness), and morie.laniyonu.actuarial_risk_disparity() (cumulative-logit replication of O'Connell & Laniyonu 2025 Race & Justice: Canadian federal-prison risk-assessment bias).
- Five reusable MRM identification primitives: mrm.primitive.gentrification_panel, spatial_spillover_decomposition, synthetic_area_exposure, threshold_specific_ordinal, and score_net_residual. These are the building blocks every future module composes.
- US + Canadian crime-data adapters: morie.datasets.chicago_crime(), nyc_stop_and_frisk(), bigquery() (lazy Google Cloud BigQuery), plus nibrs() (FBI Crime Data Explorer), namus_missing_persons(), and nist_rds() (NIST Reference Datasets catalog).
- Toy bundles for every new dataset: Chicago crime (50 rows), NYC SQF (40 rows), NIBRS (30 rows), NamUs (20 rows), NIST RDS (10 rows). offline=True works on every loader.
- morie.fast opt-in JIT acceleration surface: drop-in JIT-compiled kernels (normal_pdf, cor_pearson_jit, bootstrap_mean_jit, trimmed_ipw_weights_jit, …) plus a jit_if_available decorator. pip install morie[fast] activates Numba; without it, the kernels run as pure NumPy. Results are numerically equivalent to SciPy/NumPy (maximum absolute difference ≤5.55e-17).
- ci-numba-bench.yml: a nightly benchmark workflow comparing JIT and non-JIT paths on every release.
- Three new BibTeX entries added to all 4 paper bibliographies: Laniyonu (2018), Laniyonu & Goff (2021), O'Connell & Laniyonu (2025).
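The opt-in JIT pattern described for morie.fast can be sketched with a decorator that applies numba.njit only when Numba imports successfully. optional_njit and normal_pdf_sketch are hypothetical names used for illustration, not morie.fast's source; numba.njit is Numba's real no-Python-mode decorator.

```python
import numpy as np

try:
    from numba import njit  # only importable when the optional extra is installed
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False


def optional_njit(func):
    """Compile `func` with numba.njit if Numba is importable; otherwise
    return it unchanged so it runs as ordinary NumPy code."""
    return njit(func) if HAVE_NUMBA else func


@optional_njit
def normal_pdf_sketch(x, mu, sigma):
    """Normal density, written with NumPy ufuncs that Numba also supports."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))
```

Because the decorated function keeps the same signature either way, callers never need to know whether the JIT path is active.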
- Lazy-import fix in morie.ingest.__init__: the PEP 562 __getattr__ for BigQuery uses importlib.import_module to avoid the infinite-recursion trap that from . import bigquery would create.
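The lazy-import fix relies on the PEP 562 module-level __getattr__ hook. The self-contained sketch below writes a throwaway package named demo_ingest (a hypothetical stand-in for morie.ingest) to a temporary directory and shows the submodule being imported only on first attribute access. The comment inside the hook explains why a relative from-import there would recurse.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package `demo_ingest` on disk (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg = root / "demo_ingest"
pkg.mkdir()
(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _LAZY_SUBMODULES = {"bigquery"}

    def __getattr__(name):  # PEP 562: called only for missing attributes
        if name in _LAZY_SUBMODULES:
            # import_module loads the submodule and binds it on the package,
            # so later lookups no longer reach __getattr__. Writing
            # `from . import bigquery` here instead would recurse: the
            # from-import first calls hasattr(package, "bigquery"), which
            # re-enters this hook.
            return importlib.import_module(f"{__name__}.{name}")
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))
(pkg / "bigquery.py").write_text("LOADED = True  # stands in for a heavy import\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_ingest = importlib.import_module("demo_ingest")
```

Accessing demo_ingest.bigquery triggers the import once; afterwards the submodule is an ordinary attribute of the package.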
- Any-dataset support: bring your own column names. morie.schema.infer_mapping(your_df, canonical=...) fuzzy-matches your columns onto morie's canonical schema; pass the resulting dict to apply_mapping and your data flows through every module without renaming. CLI users get morie run-module ... --columns my_wt:weight,drinks_yn:alcohol_past12m.
- 9-locale CLI: MORIE_LOCALE=es|de|zh|pt|ja|ar|hi morie ... adds seven locales to the existing EN + FR. Methodology docs stay in English; the CLI surface is multilingual.
- No-code dataset shortcuts: morie pull tps-major --year 2024 --out file.csv writes the Toronto Police "Major Crime" feed for 2024 to disk in one line. No Python, no API URLs, no SQL. Also: morie pull tps-shootings, morie pull tps-homicide, morie pull cpads, morie pull otis-a01-toy, morie pull siu-toy, morie pull tps-layers.
- TUTORIAL.md: your first analysis, no Python knowledge required. Copy and paste five commands and you have 13 CSVs, explained.
- Python facade: import morie.datasets as md; df = md.tps_major_crime(year=2024) for users who want to script.
- Open-data adapters: morie ingest ckan/tps/siu pulls feeds from CKAN portals (open.canada.ca, data.gov.uk, etc.), Toronto Police Service ArcGIS layers, and Special Investigations Unit director's reports directly into pandas. See morie.ingest.{ckan,tps,siu}.
- Synthetic CPADS bundled: morie run-module power-design works on a fresh install with no manual download; it emits a clear "synthetic data" warning so toy outputs aren't mistaken for real findings.
- INSTALLATION.md walkthrough covering all 5 install channels, with platform-specific notes (PEP 668 on Debian, python 3.13 segfault on Raspberry Pi OS, Windows).
- papers/: allowlisted JSS paper sources in-tree (5 papers; no emails or drafts).
- Sphinx "Edit on GitHub" link in the sidebar, so readers can suggest doc changes in one click.
- anova_oneway backwards-compat alias + gibbons_chakraborti rename (from v0.4.14, carried forward).
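The any-dataset idea above (fuzzy-matching user columns onto a canonical schema, plus the --columns override syntax) can be approximated with the standard library's difflib. This is a hypothetical sketch: the function names, the normalisation rule and the 0.6 cutoff are assumptions, and morie.schema.infer_mapping may score matches differently.

```python
import difflib


def _normalise(name):
    """Lower-case and unify separators so 'Age Group' ~ 'age_group'."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def infer_mapping_sketch(columns, canonical, cutoff=0.6):
    """Map each user column to its closest canonical name, if close enough."""
    by_norm = {_normalise(c): c for c in canonical}
    mapping = {}
    for col in columns:
        hits = difflib.get_close_matches(_normalise(col), list(by_norm), n=1, cutoff=cutoff)
        if hits:
            mapping[col] = by_norm[hits[0]]
    return mapping


def parse_columns_flag(spec):
    """Parse 'src:canonical,src2:canonical2' into a {src: canonical} dict."""
    mapping = {}
    for item in spec.split(","):
        if not item.strip():
            continue  # tolerate trailing commas
        src, sep, dst = item.partition(":")
        if not sep:
            raise ValueError(f"expected source:canonical, got {item!r}")
        mapping[src.strip()] = dst.strip()
    return mapping
```

With the default cutoff, infer_mapping_sketch(["Weight", "Age Group", "weights"], ["weight", "age_group", "province"]) maps both Weight and weights to weight and Age Group to age_group; a real matcher would also need a policy for such many-to-one collisions.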
Full documentation is at hadesllm.github.io/morie.
If you use morie in your research, please cite both software papers (R and Python) and, where applicable, the MRM framework paper, the Hawkes methodology paper, and the empirical applications paper.
# Software paper — R (also the R package source on Zenodo)
Ruhela, V. S. (2026). morie: Multi-domain Open Research and Inferential
Estimation in R (v0.9.5.4). Zenodo.
https://doi.org/10.5281/zenodo.20111233
# Software paper — Python (also the Python package source on Zenodo)
Ruhela, V. S. (2026). morie: Multi-domain Open Research and Inferential
Estimation in Python (v0.9.5.4). Zenodo.
https://doi.org/10.5281/zenodo.20096350
# MRM framework paper (theoretical foundations)
Ruhela, V. S. (2026). MRM Framework: Multi-Source Statistical Foundation
for Canadian Carceral, Police, and Oversight Data (v1). Zenodo.
https://doi.org/10.5281/zenodo.20096075
# Hawkes-process methodology paper
Ruhela, V. S. (2026). Criminological Hawkes Process via MORIE: Markovian
and Non-Markovian Self-Exciting Point Processes for Toronto Crime (v1).
Zenodo. https://doi.org/10.5281/zenodo.20102198
# Empirical applications paper
Ruhela, V. S. (2026). Solitary Confinement, Self-Excitation, and
Institutional Churn: Empirical Applications of MRM to Canadian Carceral
and Police Data (v1). Zenodo. https://doi.org/10.5281/zenodo.20175689
See CITATION.cff for machine-readable citation metadata.
MORIE was developed with substantial assistance from frontier AI assistants. The author retains full responsibility for the code, the methods, and the scientific claims; AI assistance accelerated implementation but does not change the attribution of the work.
- Claude (Anthropic). Anthropic's Claude family (Opus, Sonnet, and Haiku across the 4.x generation) was used extensively throughout development for code generation, refactoring, documentation, code review, and design discussions. Use was supported by Anthropic research-credit programs.
- Gemini and Vertex AI (Google). Google's Gemini 2.5 models (Pro and Flash) on the Vertex AI platform were used extensively for additional code generation, cross-checking Claude-generated code, multi-modal data analysis, and prototype evaluation. Use was supported by Google research-credit programs.
- Anthropic: Claude API research credits.
- Google: Gemini / Vertex AI research credits.
- The author thanks Glenn McNamara, who had a 35-year career with the Ontario Government, for his methodological mentorship. He brings expertise in distribution theory, applied-statistics intuition for administrative data, and the judgment that grounds much of this framework. Glenn is the first M in MRM (Multilevel Reconciliation Methodology; people-credit reading: McNamara-Ruhela-Medina), credited as catalyst.
- The author thanks Prof. Angela Zorro Medina, Centre for Criminology and Sociolegal Studies, University of Toronto, who is the author's supervisor and methodological instructor, the domain-expert reviewer of the preliminary methodological approach, and a knowledge user of the framework. The methodological lineage MRM follows is established in her work on anti-gang legislation (Zorro Medina, 2023, The Effect of Anti-Gang Laws on Crime and Social Control): staggered two-way-fixed-effects identification, formal leads-and-lags Granger-causality diagnostics for parallel trends, multi-source data integration over five jurisdictional sources, deterrence / routine-activities / certainty mechanism categorisation, and the inequality-effects-of-criminal-law framing, all of which directly shape MRM's empirical-statistical spine. Prof. Zorro Medina is the final M in MRM (supervisor and reviewer).
Several MRM analyses use Statistics Canada and Health Canada Public
Use Microdata Files (PUMFs) — including the Canadian Cannabis
Survey (CCS), the Canadian Student Alcohol and Drugs Survey
(CSADS), the Canadian Substance Use Survey (CSUS), the
Canadian Alcohol and Drugs Survey (CADS, 2019;
doi.org/10.25318/132500052021001-eng),
and the Canadian Postsecondary Education Alcohol and Drug Use
Survey (CPADS) — along with Public Health Agency of Canada (PHAC)
and Canadian Institute for Health Information (CIHI) aggregates.
Although the analyses use Statistics Canada and Health Canada data,
the analyses, interpretations, and conclusions are those of the
author and do not represent the views of Statistics Canada or
Health Canada. Ontario open data (OTIS, A01-RCDD release; via
data.ontario.ca) and Toronto Police Service open data are used
under the same standard disclaimer.
morie is licensed under the GNU Affero General Public License, version 3.0 or later (AGPL-3.0-or-later), on both the Python and R sides. The AGPL is a strong copyleft license: anyone who distributes a modified morie — or offers a modified morie to users over a network — must publish their source. Modifications and improvements cannot be kept secret or taken closed-source.
- Python and R packages (src/morie/, r-package/morie/): AGPL-3.0-or-later. See LICENSE.
- Optional Linux kernel adjuncts (kernel-module/morie.c, daemon/morie_lsm.py): GPL-2.0-only (the Linux kernel ABI requires GPL for loaded modules). These are NOT part of the R / Python distribution; they are separately licensed, independently distributed adjuncts. See kernel-module/LICENSE-GPL2.
- Papers, data and documentation: CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike) unless explicitly marked otherwise.
Full detail in LICENSING.md.
- General issues: GitHub Issues
- Security vulnerabilities: see SECURITY.md