# Study-level Evaluator
- **df_metadata** (real-world DICOM metadata; loaded below as `df_dataset`)
    - `study_id`: Study identifier
    - `file_id`: DICOM file identifier
    - `IOD`: DICOM Information Object Definition (IOD)
    - `Tag`: DICOM Tag
    - `Value`: Tag value
- **df_standard** (DICOM standard definition)
    - `IOD`: DICOM IOD
    - `Tag`: DICOM Tag
    - `Attribute Name`: DICOM Tag's Attribute Name

In [1]:
import numpy as np
import pandas as pd
import os

In [2]:
import sys
dir_function = '/DicomStandardEvaluator/Evaluator'
sys.path.append(dir_function)

In [None]:
data_dir = 
output_dir = 

## Data Set (DICOM Metadata)
**Columns**:
- **IOD**: IOD name, derived from `(0008,0016) SOP Class UID` via the IOD Specification column of [Table B.5-1. Standard SOP Classes](https://dicom.nema.org/medical/Dicom/2025c/output/chtml/part04/sect_B.5.html)
- **study_id**: unique ID per DICOM study
- **series_id**: unique ID per DICOM series
- **file_id**: unique ID per DICOM instance (file)
- **Manufacturer (optional)**: `(0008,0070) Manufacturer` - e.g., 'Siemens Healthineers', 'GE Healthcare',
       'Canon Medical Systems Corporation', 'Philips Healthcare',
       'Hitachi Healthcare Corporation', 'Shimadzu Corporation',
       'AI Lab Co., Ltd.', 'Scimedix Corporation', 'Agfa HealthCare N.V.',
       'Carestream Health, Inc.',
       'Konica Minolta Healthcare Americas, Inc.', None, 'Hologic, Inc.',
       'MEDI-FUTURE, Inc.', 'IMS Giotto S.p.A.', 'FUJIFILM Corporation',
       'GENORAY Co., Ltd.', 'DRTech Corporation'
- **ScannerModel (optional)**: `(0008,1090) Manufacturer's Model Name` - e.g., 'SOMATOM Definition AS', 'LightSpeed16', 'Asteion', 'HiSpeed',
       'SOMATOM Spirit', 'SOMATOM Perspective', 'CT/e', 'LightSpeed',
       'BrightSpeed', 'Brilliance 6', 'Pronto', 'LightSpeed VCT',
       'SOMATOM Definition Flash', 'SOMATOM Definition AS+', 'Aquilion',
       'Brilliance 16', 'SOMATOM Emotion Duo', 'SOMATOM Emotion 6',
       'Optima CT660', 'SOMATOM Emotion 16', 'Alexion', 'Biograph20',
       None, 'Mx8000', 'SOMATOM Volume Zoom', 'Supria',
       'Brilliance iCT 256', 'ProSpeed FII', 'Sytec SRi', 'Brilliance 64',
       'SCT-4800TC', 'AIRIS Vento', 'MAGNETOM Essenza', 'MAGNETOM Avanto',
       'Signa Excite 1.5T', 'Achieva', 'GoldSeal Signa HDxt', 'Ingenia',
       'Intera', 'Genesis Signa', 'AIRIS II', 'MAGNETOM Espree',
       'MagFinder II', 'MAGNETOM Trio', 'Discovery MR750w',
       'Signa profile excite', 'Optima MR430s 1.5T', 'SM160',
       'MAGNETOM Skyra', 'ADC', 'CLASSIC CR', '0862', 'Senographe DS',
       'KODAK DirectView CR 975', 'Lorad Selenia', 'CR 85-X', 'BRESTIGE',
       'CR 75', 'GIOTTO IMAGE MD', 'MAMMOMAT Inspiration',
       'Senographe 2000D', 'KODAK DirectView CR 850',
       'Selenia Dimensions', 'FCR 5000 CR', 'DMX-600', 'RSM 1824C'
- **Tag**
- **AttributeName**
- **Value**
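
The `IOD` column is described above as being derived from `(0008,0016) SOP Class UID`. A minimal sketch of such a lookup for the four IODs in this dataset is shown below. The dictionary and function names are hypothetical (they are not part of the evaluator package), and the UIDs are the storage SOP classes listed in Table B.5-1.

```python
# Hypothetical lookup: SOP Class UID -> IOD label used in this notebook.
# UIDs are taken from DICOM PS3.4 Table B.5-1; names here are illustrative only.
SOP_CLASS_TO_IOD = {
    "1.2.840.10008.5.1.4.1.1.1": "Computed Radiography Image IOD",
    "1.2.840.10008.5.1.4.1.1.2": "CT Image IOD",
    "1.2.840.10008.5.1.4.1.1.4": "MR Image IOD",
    # Digital mammography has two storage SOP classes that share one IOD.
    "1.2.840.10008.5.1.4.1.1.1.2": "Digital Mammography X-Ray Image IOD",    # For Presentation
    "1.2.840.10008.5.1.4.1.1.1.2.1": "Digital Mammography X-Ray Image IOD",  # For Processing
}


def iod_from_sop_class(sop_class_uid):
    """Return the IOD label for a SOP Class UID, or None if it is not one of the four IODs."""
    if sop_class_uid is None:
        return None
    return SOP_CLASS_TO_IOD.get(str(sop_class_uid).strip())


print(iod_from_sop_class("1.2.840.10008.5.1.4.1.1.2"))  # CT Image IOD
```

Files whose SOP class is not in the table map to `None`, so they can be filtered out before evaluation.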

In [None]:
df_dataset = pd.read_parquet(os.path.join(data_dir, '.parquet'))

In [6]:
df_dataset.head()

Unnamed: 0,IOD,study_id,series_id,file_id,Manufacturer,ScannerModel,Tag,AttributeName,Value
0,CT Image IOD,1,279,16597,Siemens Healthineers,SOMATOM Definition AS,"(0008,0005)",SpecificCharacterSet,ISO_IR 100
1,CT Image IOD,1,279,16597,Siemens Healthineers,SOMATOM Definition AS,"(0008,0008)",ImageType,"['ORIGINAL', 'PRIMARY', 'LOCALIZER', 'CT_SOM5 ..."
2,CT Image IOD,1,279,16597,Siemens Healthineers,SOMATOM Definition AS,"(0008,0016)",SOPClassUID,1.2.840.10008.5.1.4.1.1.2
3,CT Image IOD,1,279,16597,Siemens Healthineers,SOMATOM Definition AS,"(0008,0018)",SOPInstanceUID,1.3.12.2.1107.5.1.4.11035.30000015100423574059...
4,CT Image IOD,1,279,16597,Siemens Healthineers,SOMATOM Definition AS,"(0008,0020)",StudyDate,20151005


In [7]:
def table_1(df):
    """Per-IOD summary: distinct manufacturers, models, studies, series, files, and tag rows."""
    grouped = df.groupby('IOD', dropna=False)  # group once and reuse
    n_file = grouped['file_id'].nunique()
    n_tag = grouped['Tag'].size()  # number of tag rows, not distinct tags
    summary = pd.DataFrame({
        'n_manufacturer': grouped['Manufacturer'].nunique(),
        'n_scannermodel': grouped['ScannerModel'].nunique(),
        'n_study_global': grouped['study_id'].nunique(),
        'n_series_global': grouped['series_id'].nunique(),
        'n_file_global': n_file,
        'n_tag': n_tag,
        'n_tag/n_file': n_tag / n_file,  # average tag rows per file
    })
    return summary

table_1(df_dataset)

IOD,n_manufacturer,n_scannermodel,n_study_global,n_series_global,n_file_global,n_tag,n_tag/n_file
CT Image IOD,6,30,155,700,45693,4651308,101.794761
Computed Radiography Image IOD,5,9,163,307,743,60725,81.729475
Digital Mammography X-Ray Image IOD,8,11,68,170,280,45135,161.196429
MR Image IOD,7,18,145,1021,24675,2891907,117.199878


## Reference Set (DICOM Standard 2025c)
- Mandatory Modality-specific Modules

In [8]:
# 2025c standard
df_standard = pd.read_excel(r"..\files\DicomStandardReference_2025c\C2025MandatoryModalityspecificModules_ReferenceSet.xlsx")
print(f"2025c standard: {len(df_standard)} rows with {df_standard['Tag'].nunique()} unique tags")

2025c standard: 216 rows with 164 unique tags


In [9]:
display(df_standard.groupby('IOD')['Tag'].nunique())
display(df_standard.head(1))

IOD
CT Image IOD                           57
Computed Radiography Image IOD         31
Digital Mammography X-Ray Image IOD    72
MR Image IOD                           56
Name: Tag, dtype: int64

Unnamed: 0,IOD,IE,Module,Tag,Attribute Name,Keyword,Type,VR,VM,Attribute Description,Source,Standard Terms,Type_Group
0,Computed Radiography Image IOD,Image,CR_Image,"(0028,0004)",Photometric Interpretation,Photometric​Interpretation,1,CS,1,Specifies the intended interpretation of the p...,Table C.8-2. CR Image Module Attributes,"{'Enumerated Values': ['MONOCHROME1', 'MONOCHR...",Required


## Evaluation

In [10]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ast
import tabulate
import sys

sys.path.append('Evaluator')
from DicomCodeStandardEvaluator import DicomCodeStandardEvaluator

In [11]:
#df_standard = df_standard.copy()
#df_dataset = df_dataset.copy() 

# Double-quoted f-strings so the inner ['Tag'] lookups also parse on Python < 3.12
print(f"Standard: {df_standard.shape[0]} rows with {df_standard['Tag'].nunique()} unique tags")
print(f"Dataset: {df_dataset.shape[0]} rows with {df_dataset['Tag'].nunique()} unique tags")

Standard: 216 rows with 164 unique tags
Dataset: 7649075 rows with 529 unique tags


### Completeness (Study-level)

In [None]:
from DicomCodeStandardEvaluator_withoutVR import DicomCodeStandardEvaluator_withoutVR # Completeness Only (No need for 'VR')
evaluator_2025c = DicomCodeStandardEvaluator_withoutVR(df_dataset, df_standard)

In [None]:
study_rates_2025c, study_stats_2025c = evaluator_2025c.analyze_rates_with_stats(group_cols=['IOD', 'study_id'])
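
The internals of `analyze_rates_with_stats` are not shown in this notebook. As a rough illustration only, the sketch below shows one way per-study `tag_existence_rate` and `value_existence_rate` could be computed from the long-format tables. It assumes that both rates are divided by the number of files in the study and that a value "exists" when it is neither null nor an empty string; the function name `sketch_study_rates` and the toy data are hypothetical, and the real evaluator may differ.

```python
import pandas as pd


def sketch_study_rates(df_data, df_std):
    """Per (IOD, study_id, Tag): share of files with the tag, and share with a non-empty value."""
    # Denominator: number of distinct files in each (IOD, study_id) group.
    totals = (df_data.groupby(['IOD', 'study_id'])['file_id']
              .nunique().rename('total_files').reset_index())
    # Every standard tag of an IOD is expected in every study of that IOD.
    expected = totals.merge(df_std[['IOD', 'Tag']].drop_duplicates(), on='IOD')

    # Assumption: a value exists when it is neither null nor an empty string.
    has_value = df_data['Value'].notna() & (df_data['Value'].astype(str).str.strip() != '')
    keys = ['IOD', 'study_id', 'Tag']
    with_tag = df_data.groupby(keys)['file_id'].nunique().rename('files_with_tag')
    with_value = df_data[has_value].groupby(keys)['file_id'].nunique().rename('files_with_value')

    out = (expected.merge(with_tag.reset_index(), on=keys, how='left')
                   .merge(with_value.reset_index(), on=keys, how='left'))
    count_cols = ['files_with_tag', 'files_with_value']
    out[count_cols] = out[count_cols].fillna(0).astype(int)  # tags never seen count as 0 files
    out['tag_existence_rate'] = out['files_with_tag'] / out['total_files']
    out['value_existence_rate'] = out['files_with_value'] / out['total_files']
    return out


# Toy data: one CT study with two files; the second tag has no value, the third is absent.
toy_data = pd.DataFrame({
    'IOD': ['CT Image IOD'] * 3,
    'study_id': [1, 1, 1],
    'file_id': [10, 11, 10],
    'Tag': ['(0008,0008)', '(0008,0008)', '(0018,0060)'],
    'Value': ['ORIGINAL', 'ORIGINAL', None],
})
toy_std = pd.DataFrame({
    'IOD': ['CT Image IOD'] * 3,
    'Tag': ['(0008,0008)', '(0018,0060)', '(0018,9361)'],
})
rates = sketch_study_rates(toy_data, toy_std)
print(rates[['Tag', 'total_files', 'files_with_tag', 'files_with_value']])
```

Starting from the standard's tag list (rather than from the observed tags) is what lets tags that never appear in a study show up with a rate of 0 instead of being silently dropped.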

#### study_rates_2025c

In [None]:
# Information Completeness Index (ICI) = tag existence x value existence
study_rates_2025c['ICI'] = study_rates_2025c['tag_existence_rate'] * study_rates_2025c['value_existence_rate']

# Manufacturer, ScannerModel
iod_study_rates_2025c = pd.merge(study_rates_2025c, df_dataset[['IOD', 'study_id', 'Manufacturer', 'ScannerModel']].drop_duplicates(keep='first'), on = ['IOD', 'study_id'], how='left')
iod_study_rates_2025c.head(2)

Unnamed: 0,Tag,Attribute Name,total_files,files_with_tag,files_with_value,tag_existence_rate,value_existence_rate,IOD,study_id,ICI,Manufacturer,ScannerModel
0,"(0008,0008)",Image Type,146,146,146,1.0,1.0,CT Image IOD,1,1.0,Siemens Healthineers,SOMATOM Definition AS
1,"(0018,9361)",Multi-energy CT Acquisition,146,0,0,0.0,0.0,CT Image IOD,1,0.0,Siemens Healthineers,SOMATOM Definition AS
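
As a small worked example of the ICI definition used in this cell (tag existence rate multiplied by value existence rate), the sketch below evaluates the product for a few hypothetical rate pairs. The helper name `ici` and the range check are illustrative additions, not part of the evaluator.

```python
def ici(tag_existence_rate, value_existence_rate):
    """Information Completeness Index: product of the two rates, each expected in [0, 1]."""
    for rate in (tag_existence_rate, value_existence_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate out of range: {rate}")
    return tag_existence_rate * value_existence_rate


# Tag present and filled in every file.
print(ici(1.0, 1.0))  # 1.0
# Tag present in every file but filled in only half of them (hypothetical rates).
print(ici(1.0, 0.5))  # 0.5
# Tag absent from every file.
print(ici(0.0, 0.0))  # 0.0
```

Because it is a product, the ICI reaches 1 only when a tag is both present and populated in every file of the study.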


In [None]:
# SAVE
study_rates_2025c.to_excel(os.path.join(output_dir, 'study_rates_2025c.xlsx'), index=False) 
iod_study_rates_2025c.to_excel(os.path.join(output_dir, 'iod_study_rates_2025c.xlsx'), index=False) 

#### study_stats_2025c

In [None]:
print(study_stats_2025c.shape)
study_stats_2025c.head(2)

(432, 11)


Unnamed: 0,IOD,Tag,Attribute Name,Metric,Mean,Std,CV(%),Min,Max,Range,n_groups
0,CT Image IOD,"(0008,0008)",Image Type,tag_existence_rate,1.0,0.0,0.0,1.0,1.0,0.0,155
1,CT Image IOD,"(0008,0008)",Image Type,value_existence_rate,1.0,0.0,0.0,1.0,1.0,0.0,155


In [None]:
# 1. Convert to pivot table (create wide-format columns for each metric and statistic)
study_stats_2025c_wide = study_stats_2025c.pivot_table(
    index=['IOD', 'Tag', 'n_groups'],
    columns='Metric',
    values=['Mean', 'Std']
).reset_index()

# 2. Clean up column names (MultiIndex → single column)
study_stats_2025c_wide.columns = ['_'.join(col).strip('_') for col in study_stats_2025c_wide.columns.values]

# 3. Reorder columns
cols = ['IOD', 'Tag'] + \
       [col for col in study_stats_2025c_wide.columns if 'tag_existence_rate' in col] + \
       [col for col in study_stats_2025c_wide.columns if 'value_existence_rate' in col] + \
       [col for col in study_stats_2025c_wide.columns if 'value_standardization_rate' in col] + \
       [col for col in study_stats_2025c_wide.columns if 'value_diversity' in col] + \
       ['n_groups']
study_stats_2025c_wide = study_stats_2025c_wide[cols]
study_stats_2025c_wide.head(2)

Unnamed: 0,IOD,Tag,Mean_tag_existence_rate,Std_tag_existence_rate,Mean_value_existence_rate,Std_value_existence_rate,n_groups
0,CT Image IOD,"(0008,0008)",1.0,0.0,1.0,0.0,155
1,CT Image IOD,"(0008,2218)",0.0,0.0,0.0,0.0,155


In [None]:
# Merge with df_standard
iod_study_stats_2025c = pd.merge(study_stats_2025c_wide, df_standard, on=['IOD', 'Tag'], how='left')
iod_study_stats_2025c = iod_study_stats_2025c[['IOD', 'Tag', 'Attribute Name', 'Type', 'Type_Group', 
       'Attribute Description', 'Source', 'n_groups', 
       'Mean_tag_existence_rate', 'Std_tag_existence_rate',
       'Mean_value_existence_rate', 'Std_value_existence_rate'
       ]]
iod_study_stats_2025c.head(2)

Unnamed: 0,IOD,Tag,Attribute Name,Type,Type_Group,Attribute Description,Source,n_groups,Mean_tag_existence_rate,Std_tag_existence_rate,Mean_value_existence_rate,Std_value_existence_rate
0,CT Image IOD,"(0008,0008)",Image Type,1,Required,Image identification characteristics. See Sect...,Table C.8-3. CT Image Module Attributes,155,1.0,0.0,1.0,0.0
1,CT Image IOD,"(0008,2218)",Anatomic Region Sequence,3,Optional,Sequence that identifies the anatomic region o...,Table 10-7. General Anatomy Optional Macro Att...,155,0.0,0.0,0.0,0.0


In [None]:
# SAVE
study_stats_2025c.to_excel(os.path.join(output_dir, 'study_stats_2025c.xlsx'), index=False) 
iod_study_stats_2025c.to_excel(os.path.join(output_dir, 'iod_study_stats_2025c.xlsx'), index=False) 