# Laboratory Feature Extraction for MIMIC-IV MDP Environment

This notebook extracts valid laboratory features for an RL agent by filtering the dictionary of all possible lab tests to include only those present in actual patient data.

## Setup: Import Libraries

In [None]:
import pandas as pd

## Step 1: Define File Paths

Set the input and output paths. The values below are placeholders; update them to match your data location.

In [None]:
# Define file paths - update these to match your data location
labevents_file_path = 'labevents.csv.gz'  # Large file containing actual lab events
d_labevents_file_path = 'd_labevents.csv.gz'  # Dictionary file with all possible lab tests (the standard MIMIC-IV hosp module ships it as d_labitems.csv.gz)
output_file_path = 'features_labevents.csv'  # Output file for filtered features

## Step 2: Extract Unique Item IDs from Lab Events

Read only the `itemid` column from the large labevents file to keep memory use and load time down, then extract the unique values.

In [None]:
# Read only the itemid column from labevents for efficiency
print("Reading labevents file (only itemid column)...")
labevents_df = pd.read_csv(labevents_file_path, usecols=['itemid'])

# Extract unique itemids
unique_itemids = labevents_df['itemid'].unique()

# Convert to set for faster lookup
unique_itemids_set = set(unique_itemids)

# Print the count of unique items
print(f"Number of unique lab test items found in labevents: {len(unique_itemids_set)}")
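If even the single `itemid` column is too large to hold in memory at once, the same set can be built incrementally. The following is a minimal sketch, assuming chunked reading with the `chunksize` parameter of `pd.read_csv`; the helper name `collect_unique_itemids`, the chunk size, and the tiny in-memory CSV are illustrative and not part of the original notebook.

```python
import io

import pandas as pd


def collect_unique_itemids(path_or_buffer, chunksize=1_000_000):
    """Return the set of distinct itemid values, reading the input in chunks.

    Hypothetical helper: the name and default chunk size are assumptions.
    """
    seen = set()
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
    for chunk in pd.read_csv(path_or_buffer, usecols=["itemid"], chunksize=chunksize):
        # Drop missing IDs, then store plain Python ints in the running set.
        seen.update(chunk["itemid"].dropna().astype(int).unique().tolist())
    return seen


# Made-up demo rows standing in for labevents; only the itemid column is read.
demo_csv = io.StringIO(
    "labevent_id,itemid,valuenum\n"
    "1,50912,1.1\n"
    "2,50971,4.0\n"
    "3,50912,0.9\n"
    "4,51221,38.5\n"
)
demo_ids = collect_unique_itemids(demo_csv, chunksize=2)
print(sorted(demo_ids))  # [50912, 50971, 51221]
```

On the real data, passing `labevents_file_path` in place of the demo buffer should give the same set as the cell above, with peak memory bounded by the chunk size rather than the file size.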

## Step 3: Filter Dictionary to Keep Only Valid Items

Read the complete d_labevents dictionary and keep only the rows whose `itemid` appears in the actual lab events.

In [None]:
# Read the complete d_labevents dictionary
print("\nReading d_labevents dictionary file...")
d_labevents_df = pd.read_csv(d_labevents_file_path)

print(f"Total items in d_labevents dictionary: {len(d_labevents_df)}")

# Filter to keep only items that exist in actual data
filtered_features_df = d_labevents_df[d_labevents_df['itemid'].isin(unique_itemids_set)]

print(f"Items after filtering (present in actual data): {len(filtered_features_df)}")
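The filter above silently ignores any `itemid` that occurs in the lab events but has no dictionary row. A set comparison makes both kinds of mismatch visible. This is a sketch under assumptions: `compare_itemids` and the demo values are hypothetical and do not appear in the original notebook.

```python
import pandas as pd


def compare_itemids(dictionary_df, observed_ids):
    """Compare dictionary item IDs with the IDs observed in the event data.

    Hypothetical helper; returns two sets describing the mismatch.
    """
    dict_ids = set(dictionary_df["itemid"].tolist())
    observed = set(observed_ids)
    return {
        # Defined in the dictionary but never recorded for any patient.
        "unused_in_data": dict_ids - observed,
        # Recorded for patients but absent from the dictionary.
        "missing_from_dictionary": observed - dict_ids,
    }


# Made-up demo values, not real MIMIC-IV item IDs.
demo_dict = pd.DataFrame({"itemid": [1, 2, 3], "label": ["A", "B", "C"]})
demo_report = compare_itemids(demo_dict, {2, 3, 4})
print(demo_report)
```

In the notebook this would be called as `compare_itemids(d_labevents_df, unique_itemids_set)`. A non-empty `missing_from_dictionary` set would mean some recorded tests have no label or metadata and are dropped from the feature list.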

## Step 4: Save Filtered Feature List

Save the filtered DataFrame to a CSV file with all original columns retained.

In [None]:
# Save the filtered features to CSV
print(f"\nSaving filtered features to {output_file_path}...")
filtered_features_df.to_csv(output_file_path, index=False)
print("Features saved successfully!")

## Step 5: Verification

Display the first 5 rows and total number of features available for the RL agent.

In [None]:
# Display first 5 rows
print("\n" + "="*80)
print("VERIFICATION: First 5 rows of the feature list:")
print("="*80)
print(filtered_features_df.head())

# Display total count
print("\n" + "="*80)
print(f"TOTAL NUMBER OF FEATURES AVAILABLE FOR THE AGENT: {len(filtered_features_df)}")
print("="*80)
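Printing the head and the row count confirms that something was written, but not that the table is usable as a feature list. A few structural checks can supplement it. The sketch below assumes that each feature should have exactly one non-missing `itemid`; the helper `validate_feature_table` and the demo frame are illustrative only.

```python
import io

import pandas as pd


def validate_feature_table(df):
    """Return a list of problems found; an empty list means all checks passed.

    Hypothetical helper: the checks reflect an assumed one-row-per-itemid layout.
    """
    problems = []
    if df.empty:
        problems.append("feature table is empty")
    if df["itemid"].isna().any():
        problems.append("missing itemid values")
    if df["itemid"].duplicated().any():
        problems.append("duplicate itemid values")
    return problems


# Made-up demo table with one deliberate duplicate.
demo_df = pd.DataFrame({"itemid": [10, 20, 20], "label": ["A", "B", "B2"]})
demo_problems = validate_feature_table(demo_df)
print(demo_problems)  # ['duplicate itemid values']

# Round trip through CSV (an in-memory buffer here) to confirm the IDs survive.
buffer = io.StringIO()
demo_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)
round_trip_ok = reloaded["itemid"].tolist() == demo_df["itemid"].tolist()
print(round_trip_ok)  # True
```

In the notebook, the same checks could be run on `filtered_features_df`, and the round trip could read back `output_file_path` instead of the buffer.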

## Summary

This notebook has successfully:
1. ✅ Extracted unique lab test item IDs from the actual patient data (labevents)
2. ✅ Filtered the dictionary of all possible lab tests (d_labevents) to include only those present in actual data
3. ✅ Saved the filtered feature list to `features_labevents.csv`
4. ✅ Verified the results by displaying the first 5 rows and total count

The resulting feature list defines the action space for the RL agent, so the agent can only select lab tests that actually occur in the patient data.
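The notebook does not specify how the environment indexes its actions. One common approach is to map each `itemid` to a contiguous integer, as in the sketch below. The function `build_action_index`, the sorted ordering, and the demo values are all assumptions made for illustration.

```python
import pandas as pd


def build_action_index(features_df):
    """Map each itemid to a contiguous action index and back.

    Hypothetical helper: sorting by itemid is an assumed convention that keeps
    the mapping stable across runs.
    """
    itemids = sorted(features_df["itemid"].tolist())
    itemid_to_action = {itemid: idx for idx, itemid in enumerate(itemids)}
    action_to_itemid = {idx: itemid for itemid, idx in itemid_to_action.items()}
    return itemid_to_action, action_to_itemid


# Made-up demo values, not real MIMIC-IV item IDs.
demo_features = pd.DataFrame({"itemid": [30, 10, 20], "label": ["C", "A", "B"]})
itemid_to_action, action_to_itemid = build_action_index(demo_features)
print(itemid_to_action)  # {10: 0, 20: 1, 30: 2}
print(action_to_itemid)  # {0: 10, 1: 20, 2: 30}
```

With such a mapping, a discrete action space of size `len(itemid_to_action)` would cover exactly the lab tests listed in `features_labevents.csv`.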