### Workflow
- Input data
    - NIfTI images
- Preprocessing if needed
- Feature extraction 
    - PyRadiomics
- Model development
    - Neural network?
    - Random forest?

*Note*
- I had to create a conda virtual environment to install PyRadiomics, which I believe only works with specific versions of Python.
- I also had to downgrade NumPy to use PyRadiomics.
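
Since the working combination of versions matters here, it may help to record what is actually installed in the environment. A minimal sketch using only the standard library; `report_versions` is a hypothetical helper name, and the package names are assumed to be the ones the installer uses:

```python
import sys
from importlib import metadata


def report_versions(packages=("pyradiomics", "numpy", "SimpleITK")):
    """Return the Python version and the installed version of each package."""
    versions = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


print(report_versions())
```

Pasting the printed dictionary into these notes would make the environment easier to rebuild later.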

# Radiomic Feature Extractor (PyRadiomics)
### Important Notes: 
- This portion of the project essentially creates our own dataset of features on which to perform ML classification tasks
- Features extracted
    - The current baseline uses the default settings of the PyRadiomics feature extractor: https://pyradiomics.readthedocs.io/en/latest/features.html
        - First Order Statistics (19 features)
        - Shape-based 3D (16 features)
        - Shape-based 2D (10 features)
        - Gray Level Co-occurrence Matrix (24 features)
        - Gray Level Size Zone Matrix (16 features)
        - Gray Level Run Length Matrix (16 features)
        - Neighbouring Gray Tone Difference Matrix (5 features)
        - Gray Level Dependence Matrix (14 features)
    - *Moving forward, there is a lot of room to better understand these features and how to customize the extractor*
    - Customization: choose the best features to craft a dataset optimized for our images/task
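
One way the customization could look is to switch off everything and re-enable only chosen feature classes. This is a sketch, not a tested configuration: `check_classes`, `build_extractor`, and the `selected` list are hypothetical names and choices, while `disableAllFeatures` and `enableFeatureClassByName` are the PyRadiomics extractor methods I believe are meant for this.

```python
# Feature class names as PyRadiomics spells them
KNOWN_CLASSES = {"firstorder", "shape", "shape2D", "glcm", "glszm", "glrlm", "ngtdm", "gldm"}


def check_classes(names):
    """Return the requested class names, raising on anything unrecognised."""
    unknown = sorted(set(names) - KNOWN_CLASSES)
    if unknown:
        raise ValueError(f"Unknown feature classes: {unknown}")
    return list(names)


def build_extractor(names):
    """Create an extractor with only the requested feature classes enabled."""
    # Imported here so check_classes can be used without PyRadiomics installed
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor()
    extractor.disableAllFeatures()
    for name in check_classes(names):
        extractor.enableFeatureClassByName(name)
    return extractor


# Hypothetical selection; which classes suit these images is still an open question
selected = check_classes(["firstorder", "glcm"])
```

Calling `build_extractor(selected)` would then give an extractor that computes fewer features, which should also shorten the extraction.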
### Runtime Note:
*With the default feature extractor, extraction takes a very long time. In theory it only has to be done once; the saved features can then be loaded into a pandas DataFrame.*
- Run time: ~2 hours
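
To make the "only once" part concrete, the slow step can be wrapped so it runs only when no saved file exists yet. A minimal sketch using the standard `csv` module; `load_or_extract` and its arguments are hypothetical names, and it assumes the extraction function returns a list of dicts that all share the same keys:

```python
import csv
import os


def load_or_extract(csv_path, extract_fn):
    """Return cached feature rows from csv_path, running extract_fn only if the file is missing."""
    if not os.path.exists(csv_path):
        rows = extract_fn()  # the slow step; expected to return a list of dicts
        fieldnames = list(rows[0].keys()) if rows else []
        with open(csv_path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    # Always read back from disk so cached and fresh runs return the same types (strings)
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

The file written here is an ordinary CSV, so `pandas.read_csv` can load it as well.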

In [None]:
import os

# Pair each image with its mask so both can be passed to the feature extractor
def pair(image_dir, mask_dir):
    """Map each image file name to the mask file with the same name."""
    image_files = [f for f in os.listdir(image_dir) if f.endswith('.nii.gz')]
    # A set makes the membership test below fast
    mask_files = {f for f in os.listdir(mask_dir) if f.endswith('.nii.gz')}

    image_mask_pairs = {}

    # Sort so the pairs come out in the same order on every run
    for image_file in sorted(image_files):
        if image_file in mask_files:
            image_mask_pairs[image_file] = image_file  # The image and mask share a name

    return image_mask_pairs

train_pairs_T1 = pair(train_images_T1, train_masks_T1)
valid_pairs_T1 = pair(valid_images_T1, valid_masks_T1)
test_pairs_T1 = pair(test_images_T1, test_masks_T1)

train_pairs_T2 = pair(train_images_T2, train_masks_T2)
valid_pairs_T2 = pair(valid_images_T2, valid_masks_T2)
test_pairs_T2 = pair(test_images_T2, test_masks_T2)

# Verify pairs
print("T1 Train Pairs:", train_pairs_T1)
print("T1 Valid Pairs:", valid_pairs_T1)
print("T1 Test Pairs:", test_pairs_T1)
# A cell only displays its last bare expression, so print every count explicitly
print("T1 pair counts (train, valid, test):",
      len(train_pairs_T1), len(valid_pairs_T1), len(test_pairs_T1))
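
The pairing above silently skips any image without a mask of the same name, so it may be worth listing what was left out. A small sketch under the same naming assumption (image and mask share a file name); `unmatched` is a hypothetical helper:

```python
import os


def unmatched(image_dir, mask_dir, suffix=".nii.gz"):
    """Return (images without a mask, masks without an image), each sorted."""
    images = {f for f in os.listdir(image_dir) if f.endswith(suffix)}
    masks = {f for f in os.listdir(mask_dir) if f.endswith(suffix)}
    return sorted(images - masks), sorted(masks - images)
```

Two empty lists would mean every image in the directory made it into the pairs.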


In [None]:
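import os

import pandas as pd
import SimpleITK as sitk
from radiomics import featureextractor

# Initialize the feature extractor with its default settings
extractor = featureextractor.RadiomicsFeatureExtractor()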

def extract_features_for_pairs(image_mask_pairs, image_dir, mask_dir):
    all_features = []
    
    for image_file, mask_file in image_mask_pairs.items():
        # Load the image and mask using SimpleITK
        image_path = os.path.join(image_dir, image_file)
        mask_path = os.path.join(mask_dir, mask_file)

        image = sitk.ReadImage(image_path)
        mask = sitk.ReadImage(mask_path)
        
        # Extract features using PyRadiomics
        features = extractor.execute(image, mask)
        
        # Copy the returned features into a plain dict and add it to the list
        all_features.append(dict(features))
    
    return all_features

# Extract features for training, validation, and test sets for T1
train_features_T1 = extract_features_for_pairs(train_pairs_T1, train_images_T1, train_masks_T1)
valid_features_T1 = extract_features_for_pairs(valid_pairs_T1, valid_images_T1, valid_masks_T1)
test_features_T1 = extract_features_for_pairs(test_pairs_T1, test_images_T1, test_masks_T1)

# Extract features for T2 images
train_features_T2 = extract_features_for_pairs(train_pairs_T2, train_images_T2, train_masks_T2)
valid_features_T2 = extract_features_for_pairs(valid_pairs_T2, valid_images_T2, valid_masks_T2)
test_features_T2 = extract_features_for_pairs(test_pairs_T2, test_images_T2, test_masks_T2)
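
The rows built above do not record which image they came from, and as far as I can tell the extractor output also carries `diagnostics_` entries that are run metadata rather than features. A possible refinement, sketched with a hypothetical `to_row` helper and made-up example values:

```python
def to_row(image_file, features):
    """Build one table row: the file name plus every non-diagnostic feature as a float."""
    row = {"image_file": image_file}
    for key, value in features.items():
        if key.startswith("diagnostics_"):
            continue  # metadata about the run, not a radiomic feature
        row[key] = float(value)
    return row


# Made-up values shaped like extractor output, for illustration only
example = {"diagnostics_Versions_PyRadiomics": "v3.0.1", "original_firstorder_Mean": 12.5}
print(to_row("case_001.nii.gz", example))
```

Keeping the file name in each row would make it possible to join the features with labels later.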

In [None]:
# Convert the feature lists to pandas DataFrames
df_train_T1 = pd.DataFrame(train_features_T1)
df_valid_T1 = pd.DataFrame(valid_features_T1)
df_test_T1 = pd.DataFrame(test_features_T1)

df_train_T2 = pd.DataFrame(train_features_T2)
df_valid_T2 = pd.DataFrame(valid_features_T2)
df_test_T2 = pd.DataFrame(test_features_T2)

# Save the DataFrames to CSV files
df_train_T1.to_csv('train_features_T1.csv', index=False)
df_valid_T1.to_csv('valid_features_T1.csv', index=False)
df_test_T1.to_csv('test_features_T1.csv', index=False)

df_train_T2.to_csv('train_features_T2.csv', index=False)
df_valid_T2.to_csv('valid_features_T2.csv', index=False)
df_test_T2.to_csv('test_features_T2.csv', index=False)

# These CSV files are now the datasets we can use for ML