<h1>Multi-Modal COVID-19 Diagnosis Using Chest X-rays and Patient Metadata</h1>

## Importing Necessary Libraries

In [29]:
import pandas as pd
import numpy as np
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array


## Load Metadata

In [30]:
metadata_path = 'covid_cx_dataset/metadata.csv'
metadata = pd.read_csv(metadata_path)

In [31]:
print(metadata.head())

  filename  patient_id     sex   age view     label  pcr_test  survival  \
0  260.jpg          10  female  68.0  NaN  COVID-19  positive       NaN   
1  261.jpg          11    male  47.0  NaN  COVID-19  positive       NaN   
2  262.jpg          11    male  47.0  NaN  COVID-19  positive       NaN   
3  263.jpg          11    male  47.0  NaN  COVID-19  positive       NaN   
4  264.jpg          11    male  47.0  NaN  COVID-19  positive       NaN   

  location  admission_offset  ...  has_cough  has_dyspnea  has_diarrhea  spo2  \
0    Spain               NaN  ...        1.0          NaN           NaN   NaN   
1    Spain               1.0  ...        NaN          NaN           NaN   NaN   
2    Spain               4.0  ...        NaN          NaN           NaN   NaN   
3    Spain               8.0  ...        NaN          NaN           NaN   NaN   
4    Spain              12.0  ...        NaN          NaN           NaN   NaN   

   other_symptoms  medical_background  opacification other is_

## Prepare Image Data
Define a function to preprocess images:

In [32]:
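def preprocess_image(filepath):
    """Load an image, resize it to 224x224, and scale pixel values to [0, 1]."""
    img = load_img(filepath, target_size=(224, 224))  # PIL image, RGB by default
    img = img_to_array(img)  # array of shape (224, 224, 3)
    img = img / 255.0
    return img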

## Process Image Data
Extract image paths from metadata and preprocess them:

In [33]:
image_folder = 'covid_cx_dataset/covid19/'
metadata['filepath'] = image_folder + metadata['filename'].astype(str)

train_metadata, val_metadata = train_test_split(metadata, test_size=0.2, random_state=42)

train_images = np.array([preprocess_image(filepath) for filepath in train_metadata['filepath']])
val_images = np.array([preprocess_image(filepath) for filepath in val_metadata['filepath']])
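
The `head()` output above shows several images for the same `patient_id`, so a row-level random split can place images of one patient in both the training and validation sets. A minimal sketch of a patient-level split follows, assuming scikit-learn's `GroupShuffleSplit`; the helper name `split_by_patient` and the toy table are hypothetical and only stand in for `metadata`.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit


def split_by_patient(df, group_col="patient_id", test_size=0.2, seed=42):
    """Split rows into train/validation so that no group spans both sets."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, val_idx = next(splitter.split(df, groups=df[group_col]))
    return df.iloc[train_idx], df.iloc[val_idx]


# Toy stand-in for the metadata table (values are illustrative only)
toy = pd.DataFrame({
    "filename": [f"{i}.jpg" for i in range(10)],
    "patient_id": [10, 11, 11, 11, 11, 12, 12, 13, 14, 15],
})

train_df, val_df = split_by_patient(toy)

# Patients that appear on both sides of the split (empty by construction)
shared = set(train_df["patient_id"]) & set(val_df["patient_id"])
print("patients in both splits:", len(shared))
```

Note that `test_size` here is a fraction of patients rather than of images, so the validation share of images may differ somewhat from 20%.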

## Process Tabular Data
Select relevant features and preprocess them:

In [34]:
# Select numerical and categorical columns
numerical_cols = ['age', 'has_fever', 'has_cough', 'has_dyspnea']
categorical_cols = ['sex']

# Fill missing numerical values with 0 (a simple placeholder; an age of 0 is not
# meaningful, so median imputation may be a better choice for that column)
X_train_num = train_metadata[numerical_cols].fillna(0)
X_val_num = val_metadata[numerical_cols].fillna(0)

# One-hot encode categorical columns
# (scikit-learn >= 1.2 uses `sparse_output`; older versions use `sparse`)
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
X_train_cat = encoder.fit_transform(train_metadata[categorical_cols])
X_val_cat = encoder.transform(val_metadata[categorical_cols])

# Concatenate numerical and encoded categorical data
X_train_tabular = np.concatenate([X_train_num, X_train_cat], axis=1)
X_val_tabular = np.concatenate([X_val_num, X_val_cat], axis=1)

# Standardize all tabular features, including the one-hot columns
# (the scaler is fit on the training split only)
scaler = StandardScaler()
X_train_tabular_scaled = scaler.fit_transform(X_train_tabular)
X_val_tabular_scaled = scaler.transform(X_val_tabular)
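
As one alternative to filling every missing value with 0, the sketch below imputes `age` with the training median and adds a "was missing" indicator column, while the symptom flags still default to 0. It assumes scikit-learn's `SimpleImputer`; the toy frames and variable names are hypothetical stand-ins for `train_metadata` and `val_metadata`.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-ins for the train/validation metadata (values are illustrative only)
train_toy = pd.DataFrame({"age": [68.0, 47.0, np.nan, 55.0],
                          "has_cough": [1.0, np.nan, 0.0, np.nan]})
val_toy = pd.DataFrame({"age": [np.nan, 60.0],
                        "has_cough": [np.nan, 1.0]})

# Median imputation for age; add_indicator=True appends a 0/1 column that
# marks rows where age was missing. The median is learned from training data only.
age_imputer = SimpleImputer(strategy="median", add_indicator=True)
age_train = age_imputer.fit_transform(train_toy[["age"]])
age_val = age_imputer.transform(val_toy[["age"]])

# Symptom flags: treat "not recorded" as 0, as in the cell above
flags_train = train_toy[["has_cough"]].fillna(0).to_numpy()
flags_val = val_toy[["has_cough"]].fillna(0).to_numpy()

# Column order: age, age_was_missing, has_cough
X_train_num_imputed = np.hstack([age_train, flags_train])
X_val_num_imputed = np.hstack([age_val, flags_val])

print(X_train_num_imputed.shape, X_val_num_imputed.shape)
```

Whether a missing symptom flag really means "absent" depends on how the dataset was collected, so the 0 default remains an assumption.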




## Combine Image and Tabular Data
Extract the target labels and prepare the combined image and tabular inputs for training (adapt this to your model's input requirements):

In [36]:
y_train = train_metadata['label']
y_val = val_metadata['label']

# Example of combining image and tabular data using tf.data.Dataset or other methods
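
One possible way to combine the two inputs is a `tf.data.Dataset` whose features are a dictionary with one entry per modality. The sketch below is an assumption-laden illustration, not the notebook's original code: the arrays are synthetic stand-ins with the shapes produced earlier, the tabular width of 6 is arbitrary, and `"Normal"` is a placeholder class name.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-ins for train_images, X_train_tabular_scaled, and y_train
rng = np.random.default_rng(0)
images = rng.random((8, 224, 224, 3), dtype=np.float32)
tabular = rng.random((8, 6)).astype(np.float32)
labels = np.array(["COVID-19", "Normal"] * 4)  # placeholder class names

# String labels -> integer class ids (classes are sorted alphabetically)
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# Each element is ({"image": ..., "tabular": ...}, label)
dataset = (
    tf.data.Dataset.from_tensor_slices(({"image": images, "tabular": tabular}, y))
    .shuffle(buffer_size=len(y), seed=42)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

for features, targets in dataset.take(1):
    print(features["image"].shape, features["tabular"].shape, targets.shape)
```

The dictionary keys only matter if the model's input layers are given the same names, in which case Keras can match them up when the dataset is passed to `fit`.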


## Build and Train Model
Build a deep learning model combining image and tabular data:
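
A minimal sketch of such a model, using the Keras functional API with a small CNN branch for the images, a dense branch for the tabular features, and late fusion by concatenation. The function name `build_multimodal_model`, the layer sizes, the tabular width of 6, and the two-class output are all assumptions chosen for illustration; a pretrained image backbone could replace the CNN.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_multimodal_model(image_shape=(224, 224, 3), n_tabular=6, n_classes=2):
    # Image branch: a small CNN reduced to one feature vector per image
    image_in = layers.Input(shape=image_shape, name="image")
    x = layers.Conv2D(16, 3, activation="relu")(image_in)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Tabular branch: a single dense layer
    tab_in = layers.Input(shape=(n_tabular,), name="tabular")
    t = layers.Dense(16, activation="relu")(tab_in)

    # Late fusion: concatenate both feature vectors, then classify
    merged = layers.Concatenate()([x, t])
    merged = layers.Dense(32, activation="relu")(merged)
    out = layers.Dense(n_classes, activation="softmax")(merged)

    model = Model(inputs={"image": image_in, "tabular": tab_in}, outputs=out)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # expects integer class ids
        metrics=["accuracy"],
    )
    return model


model = build_multimodal_model()
model.summary()
```

Because the loss is `sparse_categorical_crossentropy`, the string values in `label` would need to be converted to integer class ids before training.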

## Evaluate Model Performance
Evaluate the model on validation data and iterate as needed:
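
Accuracy alone can be misleading when classes are imbalanced, so a confusion matrix and per-class precision and recall are often more informative. The sketch below assumes integer class ids and a matrix of predicted class probabilities (such as the output of a softmax model); the helper name `evaluate_predictions`, the toy arrays, and the class name `"Normal"` are hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix


def evaluate_predictions(y_true, y_prob, class_names):
    """Return the confusion matrix and a per-class text report."""
    y_pred = np.argmax(y_prob, axis=1)  # most probable class per sample
    class_ids = list(range(len(class_names)))
    cm = confusion_matrix(y_true, y_pred, labels=class_ids)
    report = classification_report(
        y_true, y_pred, labels=class_ids, target_names=class_names, zero_division=0
    )
    return cm, report


# Toy values standing in for validation labels and model outputs
y_true = np.array([0, 0, 1, 1])
y_prob = np.array([[0.9, 0.1],
                   [0.4, 0.6],
                   [0.2, 0.8],
                   [0.3, 0.7]])

cm, report = evaluate_predictions(y_true, y_prob, ["COVID-19", "Normal"])
print(cm)       # rows are true classes, columns are predicted classes
print(report)
```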