# **High Level Guide to Data Sets and Variables in [MIMIC-IV](https://physionet.org/content/mimiciv/3.1/)**

## MIMIC-IV data is organized into modules, including `hosp`, `icu`, and `ed`. We are currently exploring the `ed` module and plan to extend our analysis to `hosp` later on. The table and column summaries below cover `ed` and `hosp`, since those are the two modules we plan to focus on.

# **MIMIC-ED**

### Comprises 6 tables:

## 1. **edstays** -- One row per ED stay (stay_id = unique visit).

Columns:
    
`subject_id`: patient identifier (links to MIMIC-IV, MIMIC-CXR).

`hadm_id`: hospital admission ID (if admitted after ED stay).

`intime` / `outtime`: ED admission and discharge times.

`gender`, `race`: patient demographics.

`arrival_transport`: how patient arrived (AMBULANCE, HELICOPTER, WALK IN, etc.).

`disposition`: discharge outcome (ADMITTED, HOME, TRANSFER, etc.).
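
As a minimal sketch of how `intime` and `outtime` can be used, the snippet below computes ED length of stay in hours. The rows are invented toy data shaped like `edstays`, and the timestamp format `YYYY-MM-DD HH:MM:SS` is an assumption about how the CSV stores times; adjust `TIME_FORMAT` if your export differs.

```python
from datetime import datetime

# Toy rows shaped like edstays; every value here is invented for illustration.
edstays = [
    {"subject_id": 1, "stay_id": 101,
     "intime": "2180-05-06 19:17:00", "outtime": "2180-05-06 23:30:00"},
    {"subject_id": 2, "stay_id": 102,
     "intime": "2180-06-26 15:54:00", "outtime": "2180-06-26 21:31:00"},
]

# Assumed timestamp layout; not confirmed by this guide.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def ed_los_hours(row):
    """Return the ED length of stay in hours for one edstays row."""
    start = datetime.strptime(row["intime"], TIME_FORMAT)
    end = datetime.strptime(row["outtime"], TIME_FORMAT)
    return (end - start).total_seconds() / 3600


# One length-of-stay value per stay_id.
los_by_stay = {row["stay_id"]: ed_los_hours(row) for row in edstays}
```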

## 2. **diagnosis** -- Coded diagnoses (ICD-9/10) for each ED stay.

Columns:
    
`subject_id`, `stay_id`: link to ED stay.

`seq_num`: priority/order of diagnosis (1 = most relevant).

`icd_code`, `icd_version`, `icd_title`: diagnosis details.

Only includes ED-related diagnoses (hospital diagnoses are separate in MIMIC-IV).
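
The sketch below shows one way to pick the primary diagnosis for each stay by filtering on `seq_num == 1`. The rows are toy data shaped like `diagnosis`; the code keeps `icd_version` next to `icd_code` because ICD-9 and ICD-10 codes come from different code sets.

```python
# Toy rows shaped like the diagnosis table; stay_ids are invented.
diagnosis = [
    {"stay_id": 101, "seq_num": 1, "icd_code": "R079", "icd_version": 10,
     "icd_title": "Chest pain, unspecified"},
    {"stay_id": 101, "seq_num": 2, "icd_code": "I10", "icd_version": 10,
     "icd_title": "Essential (primary) hypertension"},
    {"stay_id": 102, "seq_num": 1, "icd_code": "78650", "icd_version": 9,
     "icd_title": "Chest pain, unspecified"},
]

# Keep only the first-listed diagnosis (seq_num == 1) for each stay.
primary = {row["stay_id"]: row for row in diagnosis if row["seq_num"] == 1}

# Pair each code with its version so ICD-9 and ICD-10 codes are never mixed up.
primary_codes = {
    stay_id: (row["icd_version"], row["icd_code"])
    for stay_id, row in primary.items()
}
```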


## 3. **medrecon** -- Medication reconciliation: drugs the patient was taking before the ED visit.

Columns:

`subject_id`, `stay_id`, `charttime`.

`name`: medication name.

`gsn`, `ndc`: drug identifiers (0 = missing).

`etc_rn`, `etccode`, `etcdescription`: ontology grouping (e.g., CNS stimulant).

A single drug can appear in multiple rows if it belongs to multiple drug classes.
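
Because one drug can span several rows, counting rows overstates the number of medications. The following sketch, using invented toy rows and placeholder class names, collapses `medrecon` to unique drugs per stay and maps the `0` placeholder in `gsn`/`ndc` to `None`. The helper name `clean_id` is hypothetical.

```python
from collections import defaultdict

# Toy rows shaped like medrecon; identifiers and class names are invented.
medrecon = [
    {"stay_id": 101, "name": "aspirin", "gsn": "111111", "ndc": "0",
     "etc_rn": 1, "etcdescription": "class A"},
    {"stay_id": 101, "name": "aspirin", "gsn": "111111", "ndc": "0",
     "etc_rn": 2, "etcdescription": "class B"},
    {"stay_id": 101, "name": "lisinopril", "gsn": "0", "ndc": "0",
     "etc_rn": 1, "etcdescription": "class C"},
]


def clean_id(value):
    """Map the 0 placeholder (int or string) to None so it is not read as a real code."""
    return None if str(value).strip() == "0" else value


drugs_by_stay = defaultdict(set)    # stay_id -> unique drug names
classes_by_drug = defaultdict(set)  # (stay_id, name) -> drug classes

for row in medrecon:
    drugs_by_stay[row["stay_id"]].add(row["name"])
    classes_by_drug[(row["stay_id"], row["name"])].add(row["etcdescription"])
```

Here the stay has three rows but only two distinct drugs, which is the number most analyses will want.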

## 4. **pyxis** -- Medications dispensed in the ED by the Pyxis MedStation.

Columns:

`subject_id`, `stay_id`, `charttime`.

`med_rn`: distinguishes multiple drugs dispensed at once.

`name`: medication name/formulation.

`gsn`: drug identifier; `gsn_rn`: row number distinguishing multiple `gsn` values for the same medication.

Only includes drugs dispensed via Pyxis, not large fluids or non-Pyxis meds.

## 5. **triage** -- Initial triage data collected upon ED arrival.

Columns:

`subject_id`, `stay_id`

`temperature`, `heartrate`, `resprate`, `o2sat`, `sbp`, `dbp`: vital signs

`pain`: patient-reported pain level.

`acuity`: severity (1 = highest, 5 = lowest).

`chiefcomplaint`: free-text reason for visit (with PHI redacted).
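
Since lower `acuity` numbers mean higher severity, sorting ascending puts the most severe stays first. The sketch below uses invented toy rows and assumes values arrive as strings (as they would from a CSV reader) and that some `acuity` cells may be empty; `parse_acuity` is a hypothetical helper.

```python
# Toy rows shaped like triage; all values are invented.
triage = [
    {"stay_id": 101, "acuity": "3", "chiefcomplaint": "Abd pain"},
    {"stay_id": 102, "acuity": "1", "chiefcomplaint": "Chest pain"},
    {"stay_id": 103, "acuity": "", "chiefcomplaint": "Fall"},
]


def parse_acuity(value):
    """Return acuity as an int, or None when the field is empty or not numeric."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return None


# Drop rows without a usable acuity, then sort so acuity 1 (most severe) comes first.
with_acuity = [row for row in triage if parse_acuity(row["acuity"]) is not None]
most_severe_first = sorted(with_acuity, key=lambda row: parse_acuity(row["acuity"]))
```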

## 6. **vitalsign** -- Ongoing vital sign measurements taken during the ED stay.

Columns:

`subject_id`, `stay_id`, `charttime`.

Same vital sign variables as `triage`, plus `rhythm` (heart rhythm).
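
Because `vitalsign` holds repeated measurements per stay, a common first step is to reduce it to one row per `stay_id`. This sketch, on invented toy rows, keeps the most recent measurement by `charttime`; the timestamp format is an assumption.

```python
from datetime import datetime

# Toy rows shaped like vitalsign; all values are invented.
vitalsign = [
    {"stay_id": 101, "charttime": "2180-05-06 19:30:00", "heartrate": "98"},
    {"stay_id": 101, "charttime": "2180-05-06 22:10:00", "heartrate": "84"},
    {"stay_id": 102, "charttime": "2180-06-26 16:05:00", "heartrate": "110"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout

# stay_id -> (charttime, row) for the latest measurement seen so far.
latest = {}
for row in vitalsign:
    when = datetime.strptime(row["charttime"], TIME_FORMAT)
    current = latest.get(row["stay_id"])
    if current is None or when > current[0]:
        latest[row["stay_id"]] = (when, row)

# Last recorded heart rate for each stay.
latest_heartrate = {
    stay_id: float(row["heartrate"]) for stay_id, (_, row) in latest.items()
}
```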
    


## In summary:

`edstays` = core table defining each ED visit.

`diagnosis` = ED diagnoses.

`medrecon` = medications before arrival.

`pyxis` = medications dispensed during stay.

`triage` = initial assessment at arrival.

`vitalsign` = vital signs throughout stay.

All of these tables are already merged in the dataset `mimicel.csv`, which is accessible through the Google Drive link in the README.

# **MIMIC-hosp**

## Overview

Contains data on 546,028 hospitalizations for 223,452 unique patients.

Focuses on in-hospital events, though some data (e.g. lab tests) may come from outpatient or ED settings.

Each table centers on different aspects of a hospital stay.

## Core Tables

`patients` – Basic demographics for each individual.

`admissions` – Details of each hospital admission (one row per hospital stay).

`transfers` – Records of intra-hospital movements (e.g., between wards or ICUs).

## Clinical Data

`labevents`, `d_labitems` – Laboratory test results and their definitions.

`microbiologyevents`, `d_micro` – Microbiology cultures and test descriptions.

`poe`, `poe_detail` – Provider orders and order details.

`emar`, `emar_detail` – Medication administrations and their details.

`prescriptions`, `pharmacy` – Prescribed medications and related pharmacy info.

## Billing and Administrative Data

`diagnoses_icd`, `d_icd_diagnoses` – ICD-coded diagnoses.

`procedures_icd`, `d_icd_procedures` – ICD-coded procedures.

`hcpcsevents`, `d_hcpcs` – HCPCS-coded billing events.

`drgcodes` – DRG billing classifications.

`services` – Information about hospital services involved in the patient’s care.

## Additional Tables

`omr` – Online medical record data (e.g., blood pressure, height, weight, BMI).

`provider` – List of deidentified care providers.

Columns ending in `_provider_id` in other tables can link here.

Prefixes on `provider_id` indicate context (e.g., admitting vs. treating provider).
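
The naming rule above can be applied mechanically. The sketch below scans a table header for columns ending in `_provider_id` and returns the context prefix of each. The header list is an illustrative subset of `admissions` columns and `provider_columns` is a hypothetical helper.

```python
SUFFIX = "_provider_id"


def provider_columns(header):
    """Return {column_name: prefix} for every column ending in _provider_id."""
    return {
        col: col[: -len(SUFFIX)]
        for col in header
        if col.endswith(SUFFIX)
    }


# Illustrative header; check the real file for the full column list.
admissions_header = ["subject_id", "hadm_id", "admittime", "admit_provider_id"]
found = provider_columns(admissions_header)
```

Each column found this way can then be matched against `provider_id` in the `provider` table.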

## In relation to MIMIC-ED:

`subject_id` → links patients across all MIMIC modules.

`hadm_id` → links ED visits to hospital admissions (when admitted).
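
A minimal sketch of the `hadm_id` link, using invented toy rows: each ED stay is matched to its hospital admission when one exists, and stays without an admission keep `None`. It assumes an empty `hadm_id` cell means the patient was not admitted.

```python
# Toy rows shaped like edstays and hosp admissions; all values are invented.
edstays = [
    {"subject_id": 1, "stay_id": 101, "hadm_id": "5001", "disposition": "ADMITTED"},
    {"subject_id": 2, "stay_id": 102, "hadm_id": "", "disposition": "HOME"},
]
admissions = [
    {"subject_id": 1, "hadm_id": "5001", "admittime": "2180-05-06 22:23:00"},
]

# Index admissions by hadm_id for constant-time lookup.
admissions_by_hadm = {row["hadm_id"]: row for row in admissions}

# Left join: every ED stay is kept, with admittime filled in only when linked.
linked = []
for stay in edstays:
    admission = admissions_by_hadm.get(stay["hadm_id"]) if stay["hadm_id"] else None
    linked.append({**stay, "admittime": admission["admittime"] if admission else None})
```

Keeping unmatched stays (a left join) avoids silently dropping patients who were discharged from the ED.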