# Understanding Radiological Features

The feature names are combinations of different abbreviations, describing different imaging and measurement types.

There are 4752 different radiological features. To get an understanding of the combinations of abbreviations and types of features, we analyzed the feature names more thoroughly.

Import packages: 

In [1]:
import pandas as pd

Read .csv files:

'UPENN-GBM_CaPTk_radiomic_features_list.csv' is a file from the Cancer Imaging Archive listing all measurement types.

In [2]:
radiological = pd.read_csv("./Data/radFeatures_UPENN.csv", header=0, index_col=0)
feature_measurements = pd.read_csv("./Additional Data/UPENN-GBM_CaPTk_radiomic_features_list.csv", header=0)

Extract all feature names from radiological:

In [3]:
names = pd.DataFrame(data=radiological.head(0).transpose())
names = names.reset_index()
names.columns = ["Original Feature Names"]
names["Original Feature Names"]

0       FLAIR_ED_Intensity_CoefficientOfVariation
1                       FLAIR_ED_Intensity_Energy
2           FLAIR_ED_Intensity_InterQuartileRange
3                     FLAIR_ED_Intensity_Kurtosis
4                      FLAIR_ED_Intensity_Maximum
                          ...                    
4747                    DSC_PH_ED_NGTDM_Coarsness
4748                   DSC_PH_ED_NGTDM_Complexity
4749                     DSC_PH_ED_NGTDM_Contrast
4750                     DSC_PH_ED_NGTDM_Strength
4751           DSC_PH_ED_LBP_Radius-1_Bins-16_LBP
Name: Original Feature Names, Length: 4752, dtype: object
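
As a minimal sketch, assuming only that the feature names are the column labels of the table, the same single-column frame can also be built directly from `DataFrame.columns`. The two-row table `toy` and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical two-patient table; the column names are taken from the listing above
toy = pd.DataFrame(
    [[0.1, 2.0], [0.3, 1.5]],
    columns=["FLAIR_ED_Intensity_Energy", "FLAIR_ED_Intensity_Kurtosis"],
)

# Direct route: wrap the column labels in a one-column DataFrame
toy_names = pd.DataFrame({"Original Feature Names": toy.columns})

# Route used in the cell above, applied to the toy table for comparison
notebook_style = pd.DataFrame(data=toy.head(0).transpose()).reset_index()
notebook_style.columns = ["Original Feature Names"]

print(toy_names["Original Feature Names"].tolist())
```

Both routes yield the same list of names; the direct one avoids the empty intermediate frame.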

Split the feature names into their components:

This creates a pandas DataFrame with one column per name component.

In [4]:
names_split = names["Original Feature Names"].str.split('_', expand=True)
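
To illustrate what `str.split(..., expand=True)` returns, here is a small sketch on three names taken from the listing above. The variable names `toy_names`, `toy_split` and `n_components` are made up for this example.

```python
import pandas as pd

# Three feature names from the listing above, with 4, 5 and 7 components
toy_names = pd.Series([
    "FLAIR_ED_Intensity_Energy",
    "DSC_PH_ED_NGTDM_Contrast",
    "DSC_PH_ED_LBP_Radius-1_Bins-16_LBP",
])

# expand=True gives one column per component; shorter names are padded
# with missing values on the right
toy_split = toy_names.str.split("_", expand=True)

# Number of components per name = number of underscores + 1
n_components = toy_names.str.count("_") + 1

print(toy_split.shape)
print(n_components.tolist())
```

Because names differ in length, the same column position can hold different kinds of components for different features.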

To inspect the unique name components at each position, we collect them in a new DataFrame:

In [5]:
unique_name_components = pd.DataFrame()

for col in names_split.columns:
    unique_col_values = pd.DataFrame(data=names_split[col].unique(), columns=[col])
    unique_name_components = pd.concat([unique_name_components, unique_col_values], axis=1)

unique_name_components

Unnamed: 0,0,1,2,3,4,5,6,7
0,FLAIR,ED,Intensity,CoefficientOfVariation,,,,
1,T1,PSR,Histogram,Energy,Bins-16,Bin-0,Frequency,Frequency
2,DSC,TR,Volumetric,InterQuartileRange,Axis-0,Bin-10,Probability,Probability
3,DTI,FA,Morphologic,Kurtosis,Axis-1,Bin-11,Bin-0,
4,T1GD,AD,GLCM,Maximum,Axis-2,Bin-12,Bin-10,
...,...,...,...,...,...,...,...,...
81,,,,,,Bins-16,ZoneSizeVariance,
82,,,,,,Axis-0,LBP,
83,,,,,,Axis-1,,
84,,,,,,Axis-2,,
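
The table mixes image type and measurement components in the same columns. The sketch below, on four names that follow the patterns in the listing above, shows why: a component such as `ED` can sit at position 1 or position 2 depending on the length of the image type prefix. The dictionary `unique_by_position` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

toy_split = pd.Series([
    "FLAIR_ED_Intensity_Energy",
    "FLAIR_ED_Intensity_Kurtosis",
    "T1_ED_Intensity_Energy",
    "DSC_PH_ED_NGTDM_Contrast",
]).str.split("_", expand=True)

# Distinct components per position, with the padding values dropped
unique_by_position = {
    col: toy_split[col].dropna().unique().tolist() for col in toy_split.columns
}

for position, components in unique_by_position.items():
    print(position, components)
```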


Based on the paper (https://doi.org/10.1038/s41597-022-01560-7), the first 2 columns of unique_name_components describe the image type, and the remaining name components belong to the measurement types listed in feature_measurements.

We need to determine which image types exist for each unique measurement type.

In [6]:
n_count = []
versions = []

# List with the unique image type name components
str_remove_image = [name + '_' for name in names_split.iloc[:, :2].stack().unique()]

# For every measurement type (144 in total),
# count how often it occurs among all 4752 radiological features (stored in names)
for idx, feature in enumerate(feature_measurements["FeatureName"]):
    n_count.append(0)
    versions.append([])
    # Go through all radiological features
    for f in names["Original Feature Names"]:
        feature_strip = f
        # Remove the image type components
        for prefix in str_remove_image:
            feature_strip = feature_strip.replace(prefix, '')
        # Check if the remaining str is the same as the measurement feature (from feature_measurements)
        if feature_strip == feature:
            n_count[idx] += 1
            versions[idx].append(f)

# Add the counts as a new column to the existing feature_measurements DataFrame
feature_measurements['Count'] = n_count

feature_measurements

Unnamed: 0,FeatureName,Count
0,Intensity_CoefficientOfVariation,33
1,Intensity_Energy,33
2,Intensity_InterQuartileRange,33
3,Intensity_Kurtosis,33
4,Intensity_Maximum,33
...,...,...
139,NGTDM_Coarsness,33
140,NGTDM_Complexity,33
141,NGTDM_Contrast,33
142,NGTDM_Strength,33


We see that each measurement type occurs in 33 different image types.
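
The same count can be sketched with suffix matching instead of stripping image type components. This is an alternative illustration, not the method used in the cell above, and it assumes that no measurement name is itself the tail of another measurement name. The lists `toy_measurements` and `toy_features` are hypothetical miniatures that follow the naming patterns shown earlier.

```python
# Hypothetical miniature inputs following the naming patterns shown above
toy_measurements = ["Intensity_Energy", "NGTDM_Contrast"]
toy_features = [
    "FLAIR_ED_Intensity_Energy",
    "T1_ED_Intensity_Energy",
    "DSC_PH_ED_NGTDM_Contrast",
]

counts = {m: 0 for m in toy_measurements}
prefixes = {m: [] for m in toy_measurements}

for name in toy_features:
    for m in toy_measurements:
        # Require the underscore so only whole trailing components match
        if name.endswith("_" + m):
            counts[m] += 1
            # Keep the image type prefix including its trailing underscore
            prefixes[m].append(name[: -len(m)])

print(counts)
print(prefixes)
```

Matching on the suffix leaves the image type prefix untouched, so it can be collected in the same pass.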

In versions, we saved the radiological feature names grouped by measurement type, one sublist per measurement type. Each sublist holds the 33 image type versions of the same measurement type.

To analyze whether those image type versions are all the same for each measurement and how the image type name components are combined, we further analyze these version names.

In [7]:
# List with the unique measurement type name components
str_remove_measurement = feature_measurements["FeatureName"].to_list()

# Every sublist has the same length because n_count was 33 for each measurement type
# (144 * 33 accounts for all feature names)
for idx in range(len(versions)):
    for idx2 in range(len(versions[idx])):
        # Remove the measurement type name by replacing it with an empty string
        versions[idx][idx2] = versions[idx][idx2].replace(str_remove_measurement[idx], '')

# DataFrame filled with the image type compositions
vers_df = pd.DataFrame(versions)

# Create dataframe to save all unique prefix compositions
unique_vers = pd.DataFrame()
for col in vers_df.columns:
    unique_col_values = pd.DataFrame(data=vers_df[col].unique(), columns=[col])
    unique_vers = pd.concat([unique_vers, unique_col_values], axis=1)

unique_vers = unique_vers.transpose()
unique_vers.columns = ["ImageType"]

unique_vers

Unnamed: 0,ImageType
0,FLAIR_ED_
1,T1_ED_
2,DSC_PSR_ED_
3,DTI_TR_ED_
4,DTI_FA_NC_
5,DTI_AD_ET_
6,DSC_ap-rCBV_ET_
7,DTI_TR_ET_
8,DSC_PSR_ET_
9,T1_ET_


Each of the 33 columns of vers_df contains a single unique value, so there are only 33 unique image type versions and every measurement type is combined with the same image types.
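
The conclusion can also be checked explicitly. The sketch below compares the set of image type prefixes of every measurement type against the first one; unlike the column-wise check above, it ignores the order of the prefixes. The function `same_image_types` and both toy lists are hypothetical.

```python
def same_image_types(versions):
    """Return True if every sublist contains the same set of prefixes."""
    reference = set(versions[0])
    return all(set(v) == reference for v in versions)


# Two measurement types that share the same three image type prefixes
toy_versions = [
    ["FLAIR_ED_", "T1_ED_", "DSC_PH_ED_"],
    ["FLAIR_ED_", "T1_ED_", "DSC_PH_ED_"],
]

# A third measurement type that is missing one prefix
mismatched = toy_versions + [["FLAIR_ED_", "T1_ED_"]]

print(same_image_types(toy_versions))
print(same_image_types(mismatched))
```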

In [8]:
# Save as .csv
feature_measurements.to_csv("./Output/Data/01 Understanding/feature_exploration_measurement.csv", sep=',', index=False, encoding='utf-8')
unique_vers.to_csv("./Output/Data/01 Understanding/feature_exploration_imaging.csv", sep=',', index=False, encoding='utf-8')

# Conclusion

This means we have 33 unique image types and 144 unique measurement types.

Combining each image type with each measurement type yields the full set of 33 × 144 = 4752 unique radiological features.
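
The arithmetic behind this can be sketched as a Cartesian product. The two short lists are hypothetical stand-ins for the 33 image types and 144 measurement types; only the counts 33, 144 and 4752 come from the analysis above.

```python
from itertools import product

# Hypothetical stand-ins for the full lists of image and measurement types
toy_image_types = ["FLAIR_ED_", "T1_ED_", "DSC_PH_ED_"]
toy_measurements = ["Intensity_Energy", "NGTDM_Contrast"]

# Every image type paired with every measurement type
combined = [img + m for img, m in product(toy_image_types, toy_measurements)]

# Same product for the sizes reported in this notebook
full_size = 33 * 144

print(len(combined), full_size)
```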