# Genetic Trait Predictor and Family Tree Visualizer

This notebook demonstrates a proof-of-concept implementation of the core machine learning components for the Genesis project. We'll walk through the entire pipeline, from generating synthetic genomic data to trait prediction, model explanation, and family tree visualization.

## Project Overview

The Genesis project aims to develop an AI-driven system capable of predicting human phenotypic traits based on genomic data, specifically Single Nucleotide Polymorphisms (SNPs). Key components demonstrated in this notebook:

1. Genomic data preprocessing
2. Feature engineering for SNPs
3. Machine learning model development
4. Explainable AI techniques
5. Basic family tree visualization

Let's begin by importing the necessary libraries and setting up our environment.

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Machine learning libraries
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_curve, auc
from sklearn.feature_selection import SelectKBest, chi2, f_classif

# Deep learning libraries
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# For explainable AI
import shap

# Visualization
import networkx as nx
import plotly.express as px
import plotly.graph_objects as go

# Set random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure plot style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("colorblind")

print("Libraries imported successfully!")

## 1. Data Collection and Preprocessing

For demonstration purposes, we'll simulate a dataset of SNPs and phenotypic traits. In a real-world application, you would:

1. Download SNP data from sources like NCBI dbSNP, 1000 Genomes Project, or UK Biobank
2. Associate SNPs with known phenotypic traits from literature or existing studies
3. Create a feature matrix of genetic markers and target traits

Let's generate synthetic data to represent this:

In [None]:
# Function to simulate SNP data
def generate_snp_data(n_samples=1000, n_snps=200, trait_snps=20):
    """
    Generate synthetic SNP data and trait information
    
    Parameters:
    - n_samples: Number of individuals
    - n_snps: Total number of SNPs
    - trait_snps: Number of SNPs that actually influence the trait
    
    Returns:
    - X: SNP data (0, 1, 2 representing genotypes)
    - y_eye_color: Eye color (0=blue, 1=green, 2=brown)
    - y_height: Height in cm (continuous)
    - y_disease_risk: Disease risk (0=low, 1=high)
    """
    # Generate SNP data: 0, 1, 2 (homozygous reference, heterozygous, homozygous alternate)
    X = np.random.choice([0, 1, 2], size=(n_samples, n_snps), p=[0.25, 0.5, 0.25])
    
    # Select subset of SNPs that influence traits
    causal_snps_indices = np.random.choice(n_snps, trait_snps, replace=False)
    
    # Eye color: influenced by multiple SNPs (categorical)
    # Create effect sizes for SNPs affecting eye color
    eye_color_effects = np.zeros(n_snps)
    eye_color_effects[causal_snps_indices[:5]] = [0.5, 0.3, 0.2, 0.4, 0.6]  # First 5 SNPs affect eye color
    
    # Calculate genetic component for eye color
    genetic_component = X @ eye_color_effects
    
    # Convert to categorical eye color
    thresholds = np.percentile(genetic_component, [33, 66])
    y_eye_color = np.digitize(genetic_component, thresholds)
    
    # Height: influenced by many SNPs (continuous)
    height_effects = np.zeros(n_snps)
    height_effects[causal_snps_indices[5:15]] = np.random.normal(0, 1, 10)  # Next 10 SNPs affect height
    
    # Base height + genetic component + random environmental noise
    base_height = 165  # cm
    genetic_height = X @ height_effects
    environmental_effect = np.random.normal(0, 5, n_samples)  # environmental noise
    y_height = base_height + 10 * genetic_height + environmental_effect
    
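    # Disease risk: binary trait
    disease_effects = np.zeros(n_snps)
    disease_idx = causal_snps_indices[15:]  # Remaining causal SNPs affect disease risk
    # Size the effect vector from the index slice so trait_snps values other than 20 also work
    disease_effects[disease_idx] = np.random.uniform(0.1, 0.5, len(disease_idx))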
    
    # Calculate risk score
    risk_score = X @ disease_effects
    y_disease_risk = (risk_score > np.median(risk_score)).astype(int)
    
    return X, y_eye_color, y_height, y_disease_risk

# Generate data
X, y_eye_color, y_height, y_disease_risk = generate_snp_data(n_samples=1000, n_snps=200, trait_snps=20)

# Create placeholder SNP column names in the style of dbSNP rs identifiers
# (these are synthetic labels, not real rsIDs)
snp_columns = [f"rs{i+100}" for i in range(X.shape[1])]

# Create DataFrame for SNP data
snp_df = pd.DataFrame(X, columns=snp_columns)

# Add phenotype data
phenotype_df = pd.DataFrame({
    'eye_color': y_eye_color,
    'height': y_height,
    'disease_risk': y_disease_risk
})

# Map eye color codes to actual colors for better interpretability
eye_color_map = {0: 'Blue', 1: 'Green', 2: 'Brown'}
phenotype_df['eye_color_label'] = phenotype_df['eye_color'].map(eye_color_map)

# Display the first few rows of our data
print("SNP data (first 5 rows, first 10 SNPs):")
print(snp_df.iloc[:5, :10])
print("\nPhenotype data (first 5 rows):")
print(phenotype_df.head())

# Check for missing values
print(f"\nMissing values in SNP data: {snp_df.isna().sum().sum()}")
print(f"Missing values in phenotype data: {phenotype_df.isna().sum().sum()}")

# Basic statistics of the phenotype data
print("\nPhenotype statistics:")
print(phenotype_df.describe())
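
The simulation above draws every SNP with genotype probabilities 0.25/0.5/0.25, which corresponds to Hardy-Weinberg proportions at an allele frequency of 0.5 for every marker. As a minimal sketch (not part of the pipeline above), genotypes could instead be drawn with a different allele frequency per SNP. The function name `simulate_hwe_genotypes` and the 5% to 50% frequency range are assumptions chosen for illustration.

```python
import numpy as np

def simulate_hwe_genotypes(n_samples, allele_freqs, rng):
    """Draw 0/1/2 genotypes under Hardy-Weinberg equilibrium.

    Each genotype is the number of alternate alleles among two independent
    draws, i.e. Binomial(2, p) with p the alternate-allele frequency.
    """
    allele_freqs = np.asarray(allele_freqs, dtype=float)
    return rng.binomial(2, allele_freqs, size=(n_samples, allele_freqs.size))

rng = np.random.default_rng(0)
# Hypothetical per-SNP alternate-allele frequencies between 5% and 50%
freqs = rng.uniform(0.05, 0.5, size=200)
genotypes = simulate_hwe_genotypes(1000, freqs, rng)

# The observed allele frequency (mean genotype / 2) should track the input
observed = genotypes.mean(axis=0) / 2
print(genotypes.shape, float(np.abs(observed - freqs).max()))
```

With varied frequencies, a minor allele frequency filter has something to act on, whereas a fixed frequency of 0.5 leaves every SNP far above a 1% cutoff.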

In [None]:
# Let's simulate missing data and preprocessing steps as would be typical in real genomic datasets

# 1. Introduce some missing values (NaN) to SNP data
missing_mask = np.random.random(X.shape) < 0.05  # 5% missing values
X_missing = X.copy().astype(float)
X_missing[missing_mask] = np.nan

# Create DataFrame with missing values
snp_df_missing = pd.DataFrame(X_missing, columns=snp_columns)

print(f"SNP data with missing values (5%):")
print(snp_df_missing.iloc[:5, :10])
print(f"\nMissing values per SNP (first 10): {snp_df_missing.iloc[:, :10].isna().sum()}")

# 2. Preprocessing: Handle missing values

# Simple strategies for SNP missing values:
# a. Impute with the most common genotype at each SNP
snp_df_imputed_mode = snp_df_missing.fillna(snp_df_missing.mode().iloc[0])

# b. Impute with mean (can be useful for PCA or statistical methods)
snp_df_imputed_mean = snp_df_missing.fillna(snp_df_missing.mean())

# Use mode imputation here as a simple baseline that keeps genotypes in {0, 1, 2};
# real studies typically rely on dedicated imputation tools that exploit linkage disequilibrium
snp_df_processed = snp_df_imputed_mode.copy()

# Check that missing values have been handled
print(f"\nMissing values after imputation: {snp_df_processed.isna().sum().sum()}")

# 3. Quality control: Remove SNPs with low variation (would be non-informative)
# Calculate minor allele frequency (MAF)
maf = snp_df_processed.mean() / 2  # divide by 2 because values are 0, 1, or 2
maf = np.minimum(maf, 1 - maf)  # MAF is the minimum of allele frequency and its complement

# Filter SNPs with very low MAF (< 1%)
low_maf_snps = maf[maf < 0.01].index.tolist()
print(f"\nRemoved {len(low_maf_snps)} SNPs with MAF < 1%")

if low_maf_snps:
    snp_df_processed = snp_df_processed.drop(columns=low_maf_snps)

# Display final processed data
print(f"\nFinal processed SNP data shape: {snp_df_processed.shape}")
print(snp_df_processed.iloc[:5, :10])
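
The MAF above is computed after imputation, so the imputed genotypes influence the frequency estimate. A minimal sketch of an alternative is to compute quality-control metrics from the observed genotypes only, before any imputation. The helper name `snp_qc_metrics` and the thresholds used on the toy matrix are assumptions for illustration, not values taken from this notebook.

```python
import numpy as np

def snp_qc_metrics(genotypes):
    """Per-SNP call rate and minor allele frequency.

    Expects 0/1/2 genotypes with NaN marking missing calls; missing entries
    are ignored when estimating the allele frequency.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    call_rate = 1.0 - np.isnan(genotypes).mean(axis=0)
    alt_freq = np.nanmean(genotypes, axis=0) / 2.0
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return call_rate, maf

# Toy matrix: 4 individuals x 3 SNPs, NaN = missing call
toy = np.array([
    [0, 2, np.nan],
    [1, 2, 0],
    [0, 2, 0],
    [1, np.nan, 0],
])
call_rate, maf = snp_qc_metrics(toy)

# Hypothetical thresholds: keep SNPs with call rate >= 70% and MAF >= 1%
keep = (call_rate >= 0.7) & (maf >= 0.01)
print(call_rate, maf, keep)
```

Only the first toy SNP passes, because the other two are monomorphic among the observed calls.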

## 2. Exploratory Data Analysis

Now that we have our dataset, let's explore it to understand the distribution of SNP values and their relationships with phenotypic traits. This step is crucial to identify patterns and correlations that could inform our feature selection and model development.

In [None]:
# 1. Examine the distribution of eye colors in our dataset
plt.figure(figsize=(10, 6))
# Fix the category order so each bar gets the matching color
ax = sns.countplot(x='eye_color_label', data=phenotype_df,
                   order=['Blue', 'Green', 'Brown'],
                   palette=['blue', 'green', 'saddlebrown'])
plt.title('Distribution of Eye Colors', fontsize=16)
plt.xlabel('Eye Color', fontsize=14)
plt.ylabel('Count', fontsize=14)

# Add count labels
for p in ax.patches:
    ax.annotate(f'{int(p.get_height())}', 
                (p.get_x() + p.get_width()/2., p.get_height()), 
                ha='center', va='bottom', fontsize=12)
plt.tight_layout()
plt.show()

# 2. Visualize the height distribution
mean_height = phenotype_df['height'].mean()
plt.figure(figsize=(12, 6))
# hue_order keeps the palette aligned with the eye color labels
sns.histplot(data=phenotype_df, x='height', hue='eye_color_label',
             hue_order=['Blue', 'Green', 'Brown'], kde=True,
             palette=['blue', 'green', 'saddlebrown'])
plt.title(f'Height Distribution by Eye Color (mean: {mean_height:.2f} cm)', fontsize=16)
plt.xlabel('Height (cm)', fontsize=14)
plt.ylabel('Count', fontsize=14)
# Mark the overall mean; calling plt.legend() here would replace seaborn's hue legend
plt.axvline(x=mean_height, color='red', linestyle='--')
plt.tight_layout()
plt.show()

# 3. Examine disease risk distribution by eye color
plt.figure(figsize=(10, 6))
ax = sns.countplot(x='eye_color_label', hue='disease_risk', data=phenotype_df, 
                  palette=['lightgreen', 'salmon'])
plt.title('Disease Risk Distribution by Eye Color', fontsize=16)
plt.xlabel('Eye Color', fontsize=14)
plt.ylabel('Count', fontsize=14)
plt.legend(title='Disease Risk', labels=['Low', 'High'])

# Add percentage labels
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x()+p.get_width()/2.,
            height + 5,
            '{:.1f}%'.format(100 * height/len(phenotype_df)),
            ha="center", fontsize=10) 
plt.tight_layout()
plt.show()

# 4. Correlation between numeric features (SNPs) and height
# Select a subset of SNPs for visualization clarity
selected_snps = np.random.choice(snp_df_processed.columns, 10, replace=False)

# Create correlation matrix
corr_data = pd.concat([snp_df_processed[selected_snps], phenotype_df[['height']]], axis=1)
corr_matrix = corr_data.corr()

# Plot heatmap
plt.figure(figsize=(12, 10))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation between Selected SNPs and Height', fontsize=16)
plt.tight_layout()
plt.show()

# 5. PCA to visualize overall genetic structure
from sklearn.decomposition import PCA

# Apply PCA to SNP data
pca = PCA(n_components=2)
pca_result = pca.fit_transform(snp_df_processed)

# Create PCA plot with eye color as the color
plt.figure(figsize=(12, 8))
scatter = plt.scatter(pca_result[:, 0], pca_result[:, 1], 
                      c=phenotype_df['eye_color'], cmap='viridis', alpha=0.6)
plt.colorbar(scatter, label='Eye Color (0:Blue, 1:Green, 2:Brown)')
plt.title('PCA of Genetic Data Colored by Eye Color', fontsize=16)
plt.xlabel(f'Principal Component 1 ({pca.explained_variance_ratio_[0]:.2%} variance)', fontsize=14)
plt.ylabel(f'Principal Component 2 ({pca.explained_variance_ratio_[1]:.2%} variance)', fontsize=14)
plt.tight_layout()
plt.show()

# Summary of findings
print("\nExploratory Data Analysis Findings:")
print("-" * 40)
print(f"1. Eye color distribution: {phenotype_df['eye_color_label'].value_counts().to_dict()}")
print(f"2. Average height: {phenotype_df['height'].mean():.2f} cm (std: {phenotype_df['height'].std():.2f} cm)")
print(f"3. Disease risk: {phenotype_df['disease_risk'].mean()*100:.1f}% of individuals have high risk")
print(f"4. PCA explained variance: PC1 = {pca.explained_variance_ratio_[0]:.2%}, PC2 = {pca.explained_variance_ratio_[1]:.2%}")

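# Note: the PCA is not expected to separate eye colors here, because only 5 of the
# simulated SNPs influence eye color and all SNPs are drawn independently
eye_color_counts = phenotype_df.groupby(['eye_color']).size()
print(f"5. Eye color groups in the data: {eye_color_counts.index.size} (trait labels, not clusters found by PCA)")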
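
Because the simulated SNPs are independent and identically distributed, the PCA above has no population structure to find. The following sketch contrasts that with a case where structure exists: two hypothetical subpopulations with different allele frequencies, projected with PCA computed directly from an SVD. The group sizes and the frequencies 0.2 and 0.8 are assumptions chosen to make the effect obvious.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_snps = 100, 50

# Two hypothetical subpopulations with very different allele frequencies
pop_a = rng.binomial(2, 0.2, size=(n_per_group, n_snps))
pop_b = rng.binomial(2, 0.8, size=(n_per_group, n_snps))
genotypes = np.vstack([pop_a, pop_b]).astype(float)

# PCA via SVD of the column-centered genotype matrix
centered = genotypes - genotypes.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
explained_ratio = s**2 / np.sum(s**2)
pc_scores = u * s  # sample coordinates on each principal component

print(f"PC1 explains {explained_ratio[0]:.1%}, PC2 explains {explained_ratio[1]:.1%}")
print("PC1 mean, group A:", pc_scores[:n_per_group, 0].mean())
print("PC1 mean, group B:", pc_scores[n_per_group:, 0].mean())
```

Here the first component captures the between-group difference and the two groups fall on opposite sides of zero, which is the pattern PCA plots are used to detect in real cohorts.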

## 3. Feature Engineering for SNP Data

In genomic studies, not all SNPs are equally informative for predicting traits. Feature engineering and selection help identify the most relevant genetic markers, reducing dimensionality and improving model performance.

Let's implement some common techniques for SNP feature selection:

In [None]:
# 1. Feature selection for eye color prediction using chi-squared test
# Chi-squared test is appropriate for categorical outcomes like eye color

# Convert SNP data to numpy array for scikit-learn
X_array = snp_df_processed.values
y_eye_color_array = phenotype_df['eye_color'].values

# Apply chi-squared feature selection
chi2_selector = SelectKBest(chi2, k=20)  # Select top 20 features
X_chi2 = chi2_selector.fit_transform(X_array, y_eye_color_array)

# Get selected feature indices and p-values
selected_indices = chi2_selector.get_support(indices=True)
selected_features_chi2 = [snp_df_processed.columns[i] for i in selected_indices]
scores = chi2_selector.scores_
pvalues = chi2_selector.pvalues_

# Create a DataFrame to display results
chi2_results = pd.DataFrame({
    'SNP': snp_df_processed.columns,
    'Chi2 Score': scores,
    'p-value': pvalues
}).sort_values(by='Chi2 Score', ascending=False).head(20)

print("Top 20 SNPs for eye color based on Chi-squared test:")
print(chi2_results)

# 2. Feature selection for height prediction using a univariate F-test
# Height is continuous, so use f_regression (f_classif would treat each distinct
# height value as its own class)
from sklearn.feature_selection import f_regression

y_height_array = phenotype_df['height'].values

# Apply F-test feature selection
f_selector = SelectKBest(f_regression, k=20)
X_f = f_selector.fit_transform(X_array, y_height_array)

# Get selected feature indices and p-values
selected_indices_f = f_selector.get_support(indices=True)
selected_features_f = [snp_df_processed.columns[i] for i in selected_indices_f]
scores_f = f_selector.scores_
pvalues_f = f_selector.pvalues_

# Create a DataFrame to display results
f_results = pd.DataFrame({
    'SNP': snp_df_processed.columns,
    'F Score': scores_f,
    'p-value': pvalues_f
}).sort_values(by='F Score', ascending=False).head(20)

print("\nTop 20 SNPs for height prediction based on F-test:")
print(f_results)

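# 3. Visualize the top 5 SNPs for eye color prediction (ranked by chi-squared score)
# selected_features_chi2 is in column order, so take the ranking from chi2_results instead
top5_chi2 = chi2_results.head(5)
plt.figure(figsize=(15, 8))

for i, (snp, score) in enumerate(zip(top5_chi2['SNP'], top5_chi2['Chi2 Score'])):
    plt.subplot(1, 5, i+1)
    
    # Create crosstab of SNP values vs eye color
    cross_tab = pd.crosstab(snp_df_processed[snp], phenotype_df['eye_color_label'])
    # crosstab sorts columns alphabetically; reorder so the colors below match the labels
    cross_tab = cross_tab.reindex(columns=['Blue', 'Green', 'Brown'], fill_value=0)
    cross_tab_pct = cross_tab.div(cross_tab.sum(axis=1), axis=0) * 100
    
    # Plot stacked bar chart
    cross_tab_pct.plot(kind='bar', stacked=True, ax=plt.gca(), 
                       color=['skyblue', 'lightgreen', 'sandybrown'])
    
    plt.title(f'SNP {snp}\nChi2: {score:.1f}', fontsize=12)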
    plt.xlabel('Genotype (0/1/2)')
    plt.ylabel('Percentage %')
    plt.xticks(rotation=0)
    if i == 0:
        plt.legend(title='Eye Color')
    else:
        plt.legend([]).set_visible(False)
        
plt.suptitle('Association between Top 5 SNPs and Eye Color', fontsize=16)
plt.tight_layout()
plt.subplots_adjust(top=0.85)
plt.show()

# 4. Create a feature importance plot for height prediction
plt.figure(figsize=(12, 6))
top_10_indices = f_results['F Score'].sort_values(ascending=False).index[:10]
top_10_snps = f_results.loc[top_10_indices, 'SNP'].values
top_10_scores = f_results.loc[top_10_indices, 'F Score'].values

sns.barplot(x=top_10_scores, y=top_10_snps, palette='viridis')
plt.title('Top 10 SNPs for Height Prediction (F-test)', fontsize=16)
plt.xlabel('F Score', fontsize=14)
plt.ylabel('SNP ID', fontsize=14)
plt.tight_layout()
plt.show()

# 5. Identify overlap between SNPs selected for different traits
# Overlap would suggest pleiotropy in real data; in this synthetic data the causal
# SNP sets are disjoint, so any overlap arises by chance
common_snps = set(selected_features_chi2).intersection(set(selected_features_f))

print(f"\nNumber of SNPs shared between eye color and height prediction: {len(common_snps)}")
if common_snps:
    print(f"Common SNPs: {', '.join(common_snps)}")

# Feature selection for disease risk (binary outcome)
# This runs unconditionally because selected_features_disease is needed for modeling below
y_disease_array = phenotype_df['disease_risk'].values
chi2_disease = SelectKBest(chi2, k=20)
chi2_disease.fit(X_array, y_disease_array)

selected_indices_disease = chi2_disease.get_support(indices=True)
selected_features_disease = [snp_df_processed.columns[i] for i in selected_indices_disease]

# Check whether any shared SNPs are also selected for disease risk
common_all = set(common_snps).intersection(set(selected_features_disease))
print(f"\nSNPs predictive of all three traits (eye color, height, disease risk): {len(common_all)}")
if common_all:
    print(f"Common SNPs across all traits: {', '.join(common_all)}")

# 6. Create feature datasets for our models
X_eye_color = snp_df_processed[selected_features_chi2].values
X_height = snp_df_processed[selected_features_f].values
X_disease = snp_df_processed[selected_features_disease].values

print("\nSelected feature sets ready for modeling:")
print(f"Eye color prediction: {X_eye_color.shape[1]} features")
print(f"Height prediction: {X_height.shape[1]} features")
print(f"Disease risk prediction: {X_disease.shape[1]} features")
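
scikit-learn's `chi2` scorer treats each feature value as a non-negative count, which is not the same as a test of independence between genotype category and trait. As a minimal sketch of the latter, the genotype-by-trait contingency table (the same table the stacked bar charts above visualize) can be tested with `scipy.stats.chi2_contingency`. The data and the helper name `genotype_trait_test` are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)
n = 600

# Hypothetical data: one SNP that shifts a binary trait, one that does not
causal_snp = rng.binomial(2, 0.4, size=n)
null_snp = rng.binomial(2, 0.4, size=n)
trait = (rng.random(n) < 0.2 + 0.3 * causal_snp).astype(int)

def genotype_trait_test(genotype, trait):
    """Chi-squared test of independence on the genotype x trait table."""
    table = pd.crosstab(genotype, trait)
    stat, p_value, dof, _ = chi2_contingency(table)
    return stat, p_value, dof

stat_causal, p_causal, dof_causal = genotype_trait_test(causal_snp, trait)
stat_null, p_null, dof_null = genotype_trait_test(null_snp, trait)

print(f"Causal SNP: chi2 = {stat_causal:.1f}, dof = {dof_causal}, p = {p_causal:.2e}")
print(f"Null SNP:   chi2 = {stat_null:.1f}, dof = {dof_null}, p = {p_null:.2e}")
```

With three genotypes and two trait levels the test has two degrees of freedom, and it makes no assumption that the genotype effect is additive.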

## 4. Machine Learning Model Development

Now that we've preprocessed our data and selected relevant features, let's develop and train machine learning models for trait prediction. We'll implement:

1. Random Forest Classifier for eye color prediction
2. Neural Network for disease risk prediction

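The cells below evaluate each model on a held-out 20% test split with fixed hyperparameters. Cross-validation and hyperparameter tuning with `cross_val_score` and `GridSearchCV` are the natural way to make these estimates more robust.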

In [None]:
# First, let's split our data into training and testing sets for each task

# 1. Eye color prediction (classification task)
X_train_eye, X_test_eye, y_train_eye, y_test_eye = train_test_split(
    X_eye_color, y_eye_color_array, test_size=0.2, random_state=42, stratify=y_eye_color_array
)

# 2. Disease risk prediction (binary classification)
X_train_disease, X_test_disease, y_train_disease, y_test_disease = train_test_split(
    X_disease, y_disease_risk, test_size=0.2, random_state=42, stratify=y_disease_risk
)

print("Data split complete:")
print(f"Eye color training set: {X_train_eye.shape}")
print(f"Eye color testing set: {X_test_eye.shape}")
print(f"Disease risk training set: {X_train_disease.shape}")
print(f"Disease risk testing set: {X_test_disease.shape}")

# Let's implement Random Forest for eye color prediction
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,
    min_samples_split=5,
    random_state=42,
    n_jobs=-1  # Use all cores
)

# Train the model
rf_model.fit(X_train_eye, y_train_eye)

# Evaluate on test set
y_pred_eye = rf_model.predict(X_test_eye)
accuracy = accuracy_score(y_test_eye, y_pred_eye)
print(f"\nRandom Forest accuracy for eye color prediction: {accuracy:.4f}")

# Detailed classification report
print("\nClassification Report (Eye Color):")
print(classification_report(y_test_eye, y_pred_eye, target_names=['Blue', 'Green', 'Brown']))

# Confusion matrix
plt.figure(figsize=(10, 8))
cm = confusion_matrix(y_test_eye, y_pred_eye)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Blue', 'Green', 'Brown'],
            yticklabels=['Blue', 'Green', 'Brown'])
plt.title('Confusion Matrix - Eye Color Prediction', fontsize=16)
plt.ylabel('True Label', fontsize=14)
plt.xlabel('Predicted Label', fontsize=14)
plt.tight_layout()
plt.show()

# Feature importance from Random Forest
plt.figure(figsize=(12, 6))
feature_importance = pd.DataFrame({
    'SNP': selected_features_chi2,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

sns.barplot(x='Importance', y='SNP', data=feature_importance.head(10), palette='viridis')
plt.title('Top 10 SNPs for Eye Color Prediction (Random Forest)', fontsize=16)
plt.xlabel('Feature Importance', fontsize=14)
plt.ylabel('SNP ID', fontsize=14)
plt.tight_layout()
plt.show()

# Now, let's implement a Neural Network for disease risk prediction
# Scale features for neural network
scaler = StandardScaler()
X_train_disease_scaled = scaler.fit_transform(X_train_disease)
X_test_disease_scaled = scaler.transform(X_test_disease)

# Build the model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train_disease.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.2),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid')
])

# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Set up early stopping
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
    verbose=1
)

# Train the model
history = model.fit(
    X_train_disease_scaled,
    y_train_disease,
    epochs=100,
    batch_size=32,
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=1
)

# Evaluate on test set
y_pred_prob = model.predict(X_test_disease_scaled)
y_pred_disease = (y_pred_prob > 0.5).astype(int).flatten()
accuracy = accuracy_score(y_test_disease, y_pred_disease)
print(f"\nNeural Network accuracy for disease risk prediction: {accuracy:.4f}")

# Detailed classification report
print("\nClassification Report (Disease Risk):")
print(classification_report(y_test_disease, y_pred_disease, target_names=['Low Risk', 'High Risk']))

# Plot training history
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy', fontsize=14)
plt.ylabel('Accuracy', fontsize=12)
plt.xlabel('Epoch', fontsize=12)
plt.legend(['Train', 'Validation'], loc='lower right')

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss', fontsize=14)
plt.ylabel('Loss', fontsize=12)
plt.xlabel('Epoch', fontsize=12)
plt.legend(['Train', 'Validation'], loc='upper right')
plt.tight_layout()
plt.show()

# ROC curve for disease risk prediction
y_pred_prob = model.predict(X_test_disease_scaled).flatten()
fpr, tpr, thresholds = roc_curve(y_test_disease, y_pred_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(10, 8))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=14)
plt.ylabel('True Positive Rate', fontsize=14)
plt.title('ROC Curve - Disease Risk Prediction', fontsize=16)
plt.legend(loc="lower right")
plt.grid(True, linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

# Model performance summary
print("\nModel Performance Summary:")
print("-" * 50)
print(f"Eye Color Prediction (Random Forest): Accuracy = {accuracy_score(y_test_eye, y_pred_eye):.4f}")
print(f"Disease Risk Prediction (Neural Network): Accuracy = {accuracy_score(y_test_disease, y_pred_disease):.4f}, AUC = {roc_auc:.4f}")

# Store the trained models for later use
models = {
    'eye_color': {
        'model': rf_model,
        'features': selected_features_chi2,
        'accuracy': accuracy_score(y_test_eye, y_pred_eye)
    },
    'disease_risk': {
        'model': model,
        'features': selected_features_disease,
        'scaler': scaler,
        'accuracy': accuracy_score(y_test_disease, y_pred_disease),
        'auc': roc_auc
    }
}
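
The models above are scored on a single held-out split, and the SNPs were selected using all samples before that split, so the reported accuracies are likely optimistic. A minimal sketch of one way to address both points is to place feature selection inside a scikit-learn `Pipeline` and score it with `cross_val_score`, so that selection is refit on each training fold. The data, `k=10`, and the forest settings below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(3)
X_demo = rng.binomial(2, 0.5, size=(300, 100))
# Hypothetical binary trait driven by the first three SNPs plus noise
signal = X_demo[:, :3].sum(axis=1) + rng.normal(0, 1, size=300)
y_demo = (signal > np.median(signal)).astype(int)

# Selection and model are fit together, separately within each training fold
pipeline = Pipeline([
    ("select", SelectKBest(chi2, k=10)),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X_demo, y_demo, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The same pipeline object could be passed to `GridSearchCV` to tune `k` and the forest depth without leaking test-fold information into the selection step.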

## 5. Explainable AI Integration

Explainable AI (XAI) techniques are crucial in genetic trait prediction to provide transparency and interpretability. This is especially important in healthcare applications where understanding the "why" behind predictions is as important as the predictions themselves.

We'll implement SHAP (SHapley Additive exPlanations) values to explain our model predictions:

In [None]:
# Using SHAP to explain Random Forest predictions for eye color
# Create a SHAP explainer for the Random Forest model
explainer = shap.TreeExplainer(rf_model)

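# Calculate SHAP values for a subset of test samples (for speed)
sample_size = min(100, X_test_eye.shape[0])
sample_indices = np.random.choice(X_test_eye.shape[0], sample_size, replace=False)
X_sample = X_test_eye[sample_indices]
shap_values = explainer.shap_values(X_sample)

# Older SHAP releases return a list with one (n_samples, n_features) array per class;
# newer ones return a single (n_samples, n_features, n_classes) array.
# Normalize to the list form used below.
if isinstance(shap_values, np.ndarray) and shap_values.ndim == 3:
    shap_values = [shap_values[:, :, k] for k in range(shap_values.shape[2])]

print(f"SHAP values calculated for {sample_size} test samples")
print(f"Number of classes: {len(shap_values)}")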

# Plot SHAP summary plots for each class
plt.figure(figsize=(12, 8))
plt.suptitle('SHAP Summary Plot for Eye Color Prediction', fontsize=16)

eye_color_classes = ['Blue', 'Green', 'Brown']
for i, color_name in enumerate(eye_color_classes):
    plt.subplot(1, 3, i+1)
    shap.summary_plot(
        shap_values[i], 
        X_sample, 
        feature_names=selected_features_chi2, 
        plot_size=(12, 4),
        show=False,
        plot_type='bar'
    )
    plt.title(f'Class: {color_name}', fontsize=14)
    
plt.tight_layout()
plt.subplots_adjust(top=0.85)
plt.show()

# Individual explanation plots for a sample individual
individual_idx = 0  # First individual in the sample
predicted_label = eye_color_classes[rf_model.predict(X_sample[[individual_idx]])[0]]
print(f"Predicted eye color for sample person: {predicted_label}")

# force_plot with matplotlib=True draws on a new figure of its own,
# so each class is shown in a separate figure rather than a subplot
for i, color_name in enumerate(eye_color_classes):
    shap.force_plot(
        explainer.expected_value[i],
        shap_values[i][individual_idx],
        X_sample[individual_idx],
        feature_names=selected_features_chi2,
        matplotlib=True,
        show=False
    )
    plt.title(f'SHAP values for {color_name} prediction', fontsize=14)
    plt.tight_layout()
    plt.show()

# Create a function to explain predictions for new individuals
def explain_eye_color_prediction(snp_values, model=rf_model, explainer=explainer, feature_names=selected_features_chi2):
    """
    Explain eye color prediction for a new individual
    
    Parameters:
    - snp_values: SNP values for the individual (numpy array)
    - model: Trained Random Forest model
    - explainer: SHAP explainer
    - feature_names: Names of the SNP features
    
    Returns:
    - prediction: Predicted eye color class
    - explanation: SHAP values for the prediction
    """
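    # Make prediction (scikit-learn and SHAP expect a 2-D array: one row per individual)
    x = np.asarray(snp_values).reshape(1, -1)
    prediction = model.predict(x)[0]
    prediction_proba = model.predict_proba(x)[0]
    
    # Get SHAP values for the predicted class; handle both the list-per-class
    # output of older SHAP releases and the 3-D array of newer ones
    sv = explainer.shap_values(x)
    class_shap = sv[prediction][0] if isinstance(sv, list) else np.asarray(sv)[0, :, prediction]
    
    # Create a dictionary with feature names and their SHAP values
    explanation = dict(zip(feature_names, class_shap))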
    
    # Sort by absolute SHAP value
    explanation = {k: v for k, v in sorted(explanation.items(), key=lambda item: abs(item[1]), reverse=True)}
    
    return {
        'prediction': eye_color_classes[prediction],
        'confidence': prediction_proba[prediction],
        'probabilities': dict(zip(eye_color_classes, prediction_proba)),
        'top_factors': list(explanation.items())[:5]  # Top 5 contributing factors
    }

# Test the explanation function with a sample individual
sample_person = X_test_eye[0]
explanation = explain_eye_color_prediction(sample_person)

print("\nSample Individual Prediction Explanation:")
print("-" * 50)
print(f"Predicted eye color: {explanation['prediction']} (confidence: {explanation['confidence']:.2f})")
print("\nClass probabilities:")
for color, prob in explanation['probabilities'].items():
    print(f"  - {color}: {prob:.4f}")
    
print("\nTop 5 contributing SNPs:")
for snp, value in explanation['top_factors']:
    impact = "increases" if value > 0 else "decreases"
    print(f"  - SNP {snp}: {impact} probability by {abs(value):.4f}")

# Function to generate a visual explanation
def plot_individual_explanation(snp_values, model=rf_model, explainer=explainer, feature_names=selected_features_chi2):
    """Generate a visual explanation for an individual's prediction"""
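    x = np.asarray(snp_values).reshape(1, -1)
    prediction = model.predict(x)[0]
    sv = explainer.shap_values(x)
    # Handle both list-per-class (older SHAP) and 3-D array (newer SHAP) outputs
    class_shap = sv[prediction][0] if isinstance(sv, list) else np.asarray(sv)[0, :, prediction]
    
    # force_plot with matplotlib=True creates its own figure
    shap.force_plot(
        explainer.expected_value[prediction],
        class_shap,
        x[0],
        feature_names=feature_names,
        matplotlib=True,
        show=False
    )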
    plt.title(f'Explanation for {eye_color_classes[prediction]} Eye Color Prediction', fontsize=16)
    plt.tight_layout()
    plt.show()
    
    return eye_color_classes[prediction]

# Demonstrate the visual explanation for our sample person
predicted_class = plot_individual_explanation(sample_person)
print(f"\nVisual explanation generated for sample individual with predicted eye color: {predicted_class}")

# Let's also explain Neural Network predictions for disease risk
# (Note: SHAP KernelExplainer is slower but works with any model)
# For brevity, we'll use a small sample size

# Select a small subset for demonstration
nn_sample_size = 20
nn_sample_indices = np.random.choice(X_test_disease_scaled.shape[0], nn_sample_size, replace=False)
X_nn_sample = X_test_disease_scaled[nn_sample_indices]

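# Create a SHAP explainer for the Neural Network
# Flatten the (n, 1) Keras output to 1-D so SHAP returns one value per feature per sample
nn_explainer = shap.KernelExplainer(
    model=lambda x: model.predict(x, verbose=0).flatten(), 
    data=X_train_disease_scaled[:50]  # Background data for approximation
)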

# Calculate SHAP values (this can be slow)
nn_shap_values = nn_explainer.shap_values(X_nn_sample)

print("\nCalculated SHAP values for Neural Network disease prediction")

# Plot SHAP summary plot
plt.figure(figsize=(12, 6))
shap.summary_plot(
    nn_shap_values, 
    X_nn_sample,
    feature_names=selected_features_disease,
    plot_type="bar",
    show=False
)
plt.title('SHAP Feature Importance for Disease Risk Prediction', fontsize=16)
plt.tight_layout()
plt.show()

# Define a function for explaining disease risk predictions
def explain_disease_prediction(snp_values, nn_model=model, scaler=scaler, explainer=nn_explainer, feature_names=selected_features_disease):
    """
    Explain disease risk prediction for a new individual
    
    Parameters:
    - snp_values: SNP values for the individual (numpy array)
    - nn_model: Trained Neural Network model
    - scaler: Feature scaler used during training
    - explainer: SHAP explainer
    - feature_names: Names of the SNP features
    
    Returns:
    - Dictionary with prediction and explanation
    """
    # Scale the input (kept 2-D: one row, n_features columns)
    scaled_values = scaler.transform(np.asarray(snp_values).reshape(1, -1))
    
    # Make prediction (pass the array directly rather than wrapping it in a list)
    prediction_prob = nn_model.predict(scaled_values, verbose=0)[0][0]
    prediction = 'High Risk' if prediction_prob > 0.5 else 'Low Risk'
    
    # Get SHAP values (this can be slow); flatten to one value per feature
    shap_values = np.ravel(explainer.shap_values(scaled_values[0]))
    
    # Create a dictionary with feature names and their SHAP values
    explanation = dict(zip(feature_names, shap_values))
    
    # Sort by absolute SHAP value
    explanation = {k: v for k, v in sorted(explanation.items(), key=lambda item: abs(item[1]), reverse=True)}
    
    return {
        'prediction': prediction,
        'probability': float(prediction_prob),
        'top_factors': list(explanation.items())[:5]  # Top 5 contributing factors
    }

# The function is defined but not called here, because KernelExplainer is slow
print("\nDisease risk explanation functionality is ready to use")
print("Note: For real applications, this would be integrated into the web interface")
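
To make concrete what SHAP estimates, the sketch below computes exact Shapley values by enumerating every feature subset for a tiny model. It is an illustration of the underlying definition under stated assumptions, not how the `shap` library is implemented: features outside a subset are replaced by a single baseline vector, and the linear risk score, weights, and genotype values are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np

def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature subsets.

    Features outside a subset are set to their baseline value. The cost grows
    as 2**n_features, so this is only practical for a handful of features.
    """
    n = len(x)
    phi = np.zeros(n)

    def value(subset):
        z = baseline.copy()
        idx = np.array(subset, dtype=int)
        z[idx] = x[idx]
        return predict(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

# Hypothetical linear risk score over four SNPs
weights = np.array([0.5, -0.2, 0.0, 0.3])
predict = lambda z: float(weights @ z)

x = np.array([2.0, 1.0, 0.0, 2.0])         # genotype of one individual
baseline = np.array([1.0, 1.0, 1.0, 1.0])  # e.g. a reference genotype profile

phi = exact_shapley(predict, x, baseline)
print(phi)
```

For a linear model each value reduces to `weight * (x - baseline)`, and the values sum to the difference between the prediction for `x` and the prediction for the baseline. Tree and kernel explainers approximate or exploit structure to avoid the exponential enumeration.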

## 6. Family Tree Visualization

An important component of our application is the family tree visualizer, which helps users understand trait inheritance patterns across generations. Let's implement a basic family tree visualization using NetworkX and Matplotlib that could later be enhanced with D3.js in the web application.
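
Since the tree is meant to illustrate inheritance, it can be useful to check that the genotypes attached to family members are consistent with Mendelian transmission before drawing them. The sketch below does this for a single biallelic autosomal SNP coded as 0/1/2. The function names are hypothetical, and the check ignores de novo mutations and genotyping error.

```python
def possible_child_genotypes(parent_a, parent_b):
    """Genotypes (0/1/2 alternate-allele counts) a child can inherit from two parents."""
    # Alleles each parent can transmit: 0 = reference, 1 = alternate
    transmit = {0: {0}, 1: {0, 1}, 2: {1}}
    return {a + b for a in transmit[parent_a] for b in transmit[parent_b]}

def is_mendelian_consistent(child, parent_a, parent_b):
    """True if the child's genotype can arise from the two parental genotypes."""
    return child in possible_child_genotypes(parent_a, parent_b)

# A 2 x 0 cross can only produce heterozygous children
print(possible_child_genotypes(2, 0))
print(is_mendelian_consistent(1, 2, 0))
print(is_mendelian_consistent(2, 2, 0))
# Two heterozygous parents can produce any genotype
print(possible_child_genotypes(1, 1))
```

Running such a check over every parent-child trio and every SNP in a family structure flags entries that cannot be explained by inheritance alone.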

In [None]:
# Create a sample family tree with genetic traits
# First, let's define a simple family structure over 3 generations

class FamilyMember:
    def __init__(self, id, name, gender, generation, eye_color=None, height=None, disease_risk=None, genotype=None):
        self.id = id
        self.name = name
        self.gender = gender  # 'M' or 'F'
        self.generation = generation  # 0, 1, 2 (grandparents, parents, children)
        self.eye_color = eye_color
        self.height = height
        self.disease_risk = disease_risk
        self.genotype = genotype if genotype is not None else {}  # SNP dictionary

# Create a sample family with three generations
family_members = [
    # Generation 0 (Grandparents)
    FamilyMember(1, "Grandfather 1", "M", 0, "Brown", 178, "Low", {"rs100": 2, "rs101": 1}),
    FamilyMember(2, "Grandmother 1", "F", 0, "Blue", 165, "Low", {"rs100": 0, "rs101": 1}),
    FamilyMember(3, "Grandfather 2", "M", 0, "Brown", 182, "High", {"rs100": 2, "rs101": 2}),
    FamilyMember(4, "Grandmother 2", "F", 0, "Green", 170, "High", {"rs100": 1, "rs101": 0}),
    
    # Generation 1 (Parents); genotypes are consistent with the grandparents' allele counts
    FamilyMember(5, "Father", "M", 1, "Brown", 180, "Low", {"rs100": 1, "rs101": 1}),
    FamilyMember(6, "Mother", "F", 1, "Green", 168, "High", {"rs100": 1, "rs101": 1}),
    
    # Generation 2 (Children)
    FamilyMember(7, "Child 1", "M", 2, "Brown", 175, "Low", {"rs100": 2, "rs101": 1}),
    FamilyMember(8, "Child 2", "F", 2, "Green", 170, "Low", {"rs100": 1, "rs101": 1}),
    FamilyMember(9, "Child 3", "F", 2, "Blue", 165, "High", {"rs100": 1, "rs101": 2})
]

# Define relationships (parent-child)
relationships = [
    (1, 5),  # Grandfather 1 -> Father
    (2, 5),  # Grandmother 1 -> Father
    (3, 6),  # Grandfather 2 -> Mother
    (4, 6),  # Grandmother 2 -> Mother
    (5, 7),  # Father -> Child 1
    (5, 8),  # Father -> Child 2
    (5, 9),  # Father -> Child 3
    (6, 7),  # Mother -> Child 1
    (6, 8),  # Mother -> Child 2
    (6, 9)   # Mother -> Child 3
]

# Create a directed graph for the family tree
family_tree = nx.DiGraph()

# Add nodes (family members)
for member in family_members:
    family_tree.add_node(member.id, 
                        name=member.name,
                        gender=member.gender,
                        generation=member.generation,
                        eye_color=member.eye_color,
                        height=member.height,
                        disease_risk=member.disease_risk)

# Add edges (relationships)
for parent_id, child_id in relationships:
    family_tree.add_edge(parent_id, child_id)

# Visualize the family tree with traits
plt.figure(figsize=(15, 10))

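# Define positions for the nodes by generation and relative position
pos = {}
gen_sizes = {}  # Number of members in each generation
for member in family_members:
    gen_sizes[member.generation] = gen_sizes.get(member.generation, 0) + 1

gen_counts = {gen: 0 for gen in gen_sizes}  # Members placed so far per generation
for member in family_members:
    gen = member.generation
    # Center each generation horizontally; later generations are drawn lower
    pos[member.id] = (gen_counts[gen] - gen_sizes[gen] / 2, -gen)
    gen_counts[gen] += 1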

# Define node colors based on eye color
eye_color_map = {'Blue': 'skyblue', 'Green': 'lightgreen', 'Brown': 'sandybrown'}
node_colors = [eye_color_map[family_tree.nodes[node]['eye_color']] for node in family_tree.nodes]

# Define node shapes based on gender
gender_shapes = {'M': 's', 'F': 'o'}  # square for male, circle for female
node_shapes = [gender_shapes[family_tree.nodes[node]['gender']] for node in family_tree.nodes]

# Plot nodes with different shapes based on gender
for shape in set(node_shapes):
    nodelist = [node for i, node in enumerate(family_tree.nodes) if node_shapes[i] == shape]
    node_color_subset = [c for i, c in enumerate(node_colors) if node_shapes[i] == shape]
    
    nx.draw_networkx_nodes(family_tree, pos, 
                          nodelist=nodelist,
                          node_color=node_color_subset,
                          node_shape=shape,
                          node_size=800)

# Draw edges (family relationships)
nx.draw_networkx_edges(family_tree, pos, arrows=True, arrowsize=20, width=1.5, alpha=0.7)

# Add labels for names and traits
labels = {}
for node in family_tree.nodes:
    member_data = family_tree.nodes[node]
    labels[node] = f"{member_data['name']}\n{member_data['eye_color']} eyes\nHeight: {member_data['height']}cm\nRisk: {member_data['disease_risk']}"

nx.draw_networkx_labels(family_tree, pos, labels, font_size=10)

plt.title("Family Tree with Genetic Traits", fontsize=20)
plt.legend(handles=[
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='skyblue', markersize=10, label='Blue Eyes'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='lightgreen', markersize=10, label='Green Eyes'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='sandybrown', markersize=10, label='Brown Eyes'),
    plt.Line2D([0], [0], marker='s', color='w', markerfacecolor='grey', markersize=10, label='Male'),
    plt.Line2D([0], [0], marker='o', color='w', markerfacecolor='grey', markersize=10, label='Female')
], loc='upper left')

plt.axis('off')
plt.tight_layout()
plt.show()

# For integration with a web application, we would convert this visualization to D3.js
# Here's a function to generate the JSON structure needed for D3.js
def generate_d3_family_tree_json(family_members, relationships):
    """Generate JSON structure for D3.js visualization"""
    nodes = [{"id": member.id, 
              "name": member.name,
              "gender": member.gender,
              "generation": member.generation,
              "eye_color": member.eye_color,
              "height": member.height,
              "disease_risk": member.disease_risk,
              "genotype": member.genotype} 
             for member in family_members]
    
    links = [{"source": source, "target": target} for source, target in relationships]
    
    return {"nodes": nodes, "links": links}

# Generate the JSON for D3.js
import json

d3_json = generate_d3_family_tree_json(family_members, relationships)
print("D3.js JSON structure for web application integration:")
print(json.dumps(d3_json, indent=2))  # Serialize so the output is valid JSON, not a dict repr

# Trait inheritance analysis
def analyze_trait_inheritance(family_members, trait_name):
    """Analyze the inheritance pattern of a specific trait"""
    
    # Group by generation
    by_generation = {}
    for member in family_members:
        gen = member.generation
        if gen not in by_generation:
            by_generation[gen] = []
        by_generation[gen].append(member)
    
    # Get trait distribution by generation
    trait_distribution = {}
    for gen, members in by_generation.items():
        trait_counts = {}
        for member in members:
            trait_value = getattr(member, trait_name)
            if trait_value not in trait_counts:
                trait_counts[trait_value] = 0
            trait_counts[trait_value] += 1
        trait_distribution[gen] = trait_counts
    
    return trait_distribution

# Analyze eye color inheritance
eye_color_inheritance = analyze_trait_inheritance(family_members, 'eye_color')
print("\nEye Color Distribution by Generation:")
for gen, counts in eye_color_inheritance.items():
    gen_name = "Grandparents" if gen == 0 else "Parents" if gen == 1 else "Children"
    print(f"{gen_name}: {counts}")

# Visualize the trait distribution across generations
plt.figure(figsize=(12, 6))
traits = ['Blue', 'Green', 'Brown']
generations = ['Grandparents', 'Parents', 'Children']
trait_data = np.zeros((len(traits), len(generations)))

for i, trait in enumerate(traits):
    for gen in range(len(generations)):  # 0, 1, 2
        # Traits absent from a generation count as zero
        trait_data[i, gen] = eye_color_inheritance[gen].get(trait, 0)

# Create a stacked bar chart
bottom = np.zeros(3)
for i, trait in enumerate(traits):
    plt.bar(generations, trait_data[i], bottom=bottom, label=trait, 
            color=eye_color_map[trait], alpha=0.8)
    bottom += trait_data[i]

plt.title("Eye Color Distribution Across Generations", fontsize=16)
plt.xlabel("Generation", fontsize=14)
plt.ylabel("Number of Individuals", fontsize=14)
plt.legend()
plt.tight_layout()
plt.show()

print("\nThis family tree visualization shows how traits are distributed across generations in a sample family.")
print("In a real application, we would integrate this with D3.js for an interactive web visualization,")
print("allowing users to explore trait inheritance patterns and correlate them with genotype information.")
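
One way to correlate a family tree with genotype information is to check whether each child's genotype could have come from its parents. The sketch below is a minimal illustration, assuming autosomal biallelic SNPs coded as 0/1/2 counts of the alternate allele, and ignoring de novo mutations and genotyping errors. The function names and the trio genotypes are hypothetical and not part of the notebook's pipeline.

```python
def possible_child_genotypes(parent_a, parent_b):
    """Allele counts (0, 1, 2) a child can have, given both parents' allele counts."""
    def transmittable(count):
        # 0 -> passes only the reference allele, 2 -> only the alternate allele,
        # 1 -> may pass either
        return {0: {0}, 1: {0, 1}, 2: {1}}[count]

    return {a + b for a in transmittable(parent_a) for b in transmittable(parent_b)}


def mendelian_errors(child, parent_a, parent_b):
    """List SNP IDs where the child's genotype cannot arise from the parents' genotypes."""
    shared = child.keys() & parent_a.keys() & parent_b.keys()
    return sorted(
        snp for snp in shared
        if child[snp] not in possible_child_genotypes(parent_a[snp], parent_b[snp])
    )


# Hypothetical trio, for illustration only
father = {"rs100": 2, "rs101": 1}
mother = {"rs100": 0, "rs101": 1}
child_ok = {"rs100": 1, "rs101": 2}
child_bad = {"rs100": 2, "rs101": 0}

print(mendelian_errors(child_ok, father, mother))   # no inconsistencies
print(mendelian_errors(child_bad, father, mother))  # rs100 is flagged
```

A check like this could run before a tree is rendered, so that data-entry mistakes or sample mix-ups are surfaced to the user instead of being drawn as if they were valid inheritance.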

## 7. Web Application Integration

In this section, we'll outline how the machine learning models and visualization components would be integrated into a Flask web application. We'll create basic API endpoints and model serialization functions.

In [None]:
# Import serialization and imaging libraries (Flask itself is imported inside create_flask_app)
import pickle
import json
import base64
import io
from PIL import Image, ImageDraw

# This code would be in a separate Flask application file, but we demonstrate it here
def save_models(models, path='models/'):
    """Save trained models, scalers, and JSON-friendly metadata to disk"""
    import os
    os.makedirs(path, exist_ok=True)
    
    for model_name, model_data in models.items():
        model = model_data['model']
        
        if isinstance(model, tf.keras.Model):
            # Keras models are saved in Keras' own format rather than pickled
            model.save(os.path.join(path, f"{model_name}_nn_model.keras"))
        else:
            with open(os.path.join(path, f"{model_name}_model.pkl"), 'wb') as f:
                pickle.dump(model, f)
        
        # The scaler is pickled separately so it can be reloaded alongside the model
        if 'scaler' in model_data:
            with open(os.path.join(path, f"{model_name}_scaler.pkl"), 'wb') as f:
                pickle.dump(model_data['scaler'], f)
        
        # Save the remaining entries (feature names, etc.) as JSON;
        # NumPy arrays and scalars are converted with tolist()
        metadata = {k: v for k, v in model_data.items() if k not in ('model', 'scaler')}
        with open(os.path.join(path, f"{model_name}_metadata.json"), 'w') as f:
            json.dump(metadata, f, default=lambda o: o.tolist() if hasattr(o, 'tolist') else str(o))
    
    print(f"Models saved to {path}")

# Uncomment to save models
# save_models(models)

# Mock Flask API implementation
def create_flask_app(models):
    """
    Create Flask application with API endpoints
    
    This is a mock function - in a real application, this would be in app.py
    """
    from flask import Flask, request, jsonify
    
    app = Flask(__name__)
    
    @app.route('/api/predict/eye-color', methods=['POST'])
    def predict_eye_color():
        """API endpoint for eye color prediction"""
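        # Fall back to an empty payload if the body is missing or not valid JSON
        data = request.get_json(silent=True) or {}
        
        # Extract SNP values
        snp_values = []
        for feature in models['eye_color']['features']:
            # Default to 1 when a SNP is not provided (a placeholder choice for this mock)
            snp_values.append(data.get('snps', {}).get(feature, 1))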
        
        # Make prediction
        prediction = models['eye_color']['model'].predict([snp_values])[0]
        probabilities = models['eye_color']['model'].predict_proba([snp_values])[0].tolist()
        
        # Map prediction to eye color (assumes the model's classes are 0, 1, 2 in this order)
        eye_color_labels = ['Blue', 'Green', 'Brown']
        
        return jsonify({
            'prediction': eye_color_labels[int(prediction)],
            'probabilities': dict(zip(eye_color_labels, probabilities))
        })
    
    @app.route('/api/predict/disease-risk', methods=['POST'])
    def predict_disease_risk():
        """API endpoint for disease risk prediction"""
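        # Fall back to an empty payload if the body is missing or not valid JSON
        data = request.get_json(silent=True) or {}
        
        # Extract SNP values
        snp_values = []
        for feature in models['disease_risk']['features']:
            # Default to 1 when a SNP is not provided (a placeholder choice for this mock)
            snp_values.append(data.get('snps', {}).get(feature, 1))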
        
        # Scale the input
        scaled_values = models['disease_risk']['scaler'].transform([snp_values])
        
        # Make prediction
        prediction_prob = float(models['disease_risk']['model'].predict(scaled_values)[0][0])
        prediction = 'High Risk' if prediction_prob > 0.5 else 'Low Risk'
        
        return jsonify({
            'prediction': prediction,
            'probability': prediction_prob
        })
    
    @app.route('/api/visualize/family-tree', methods=['POST'])
    def visualize_family_tree():
        """API endpoint for family tree visualization"""
        data = request.json
        
        # In a real application, this would generate a D3.js visualization
        # For this demo, we'll return the JSON structure needed for D3.js
        
        return jsonify(data)
    
    return app

# Create a mock Flask app (this would be in a separate file)
# app = create_flask_app(models)

# Mock client to demonstrate API usage
def test_api_endpoints():
    """Simulate API calls to the mock Flask application"""
    # Mock eye color prediction request
    eye_color_request = {
        'snps': {
            'rs100': 2,
            'rs101': 1,
            'rs102': 0,
            # Add more SNPs as needed
        }
    }
    
    print("Mock API request for eye color prediction:")
    print(json.dumps(eye_color_request, indent=2))
    
    # In a real application, this would be:
    # response = requests.post('http://localhost:5000/api/predict/eye-color', json=eye_color_request)
    # result = response.json()
    
    # Mock response
    result = {
        'prediction': 'Brown',
        'probabilities': {'Blue': 0.1, 'Green': 0.3, 'Brown': 0.6}
    }
    
    print("\nMock API response:")
    print(json.dumps(result, indent=2))
    
    # Mock disease risk prediction
    disease_risk_request = {
        'snps': {
            'rs200': 1,
            'rs201': 0,
            'rs202': 2,
            # Add more SNPs as needed
        }
    }
    
    print("\nMock API request for disease risk prediction:")
    print(json.dumps(disease_risk_request, indent=2))
    
    # Mock response
    result = {
        'prediction': 'Low Risk',
        'probability': 0.25
    }
    
    print("\nMock API response:")
    print(json.dumps(result, indent=2))
    
    print("\nThis demonstrates how the machine learning models would be exposed via a Flask API.")
    print("In a real application, these endpoints would be called from the React frontend.")

# Demonstrate the API endpoints
test_api_endpoints()

# Create a mockup of what the web application interface might look like
def create_ui_mockup():
    """Create a simple mockup of the web application interface"""
    # Create a blank image
    width, height = 800, 600
    img = Image.new('RGB', (width, height), color='white')
    draw = ImageDraw.Draw(img)
    
    # Draw header
    draw.rectangle([0, 0, width, 60], fill='#4a69bd')
    draw.text((20, 20), "Genetic Trait Predictor & Family Tree Visualizer", fill='white')
    
    # Draw sidebar
    draw.rectangle([0, 60, 200, height], fill='#f5f6fa')
    draw.text((20, 80), "Navigation", fill='black')
    draw.text((30, 120), "Home", fill='black')
    draw.text((30, 150), "Trait Prediction", fill='#4a69bd')
    draw.text((30, 180), "Family Tree", fill='black')
    draw.text((30, 210), "My Profile", fill='black')
    draw.text((30, 240), "About", fill='black')
    
    # Draw main content area - Trait prediction form
    draw.rectangle([220, 80, 780, 120], fill='#dcdde1')
    draw.text((240, 90), "Genetic Trait Prediction", fill='black')
    
    # Form fields
    draw.rectangle([240, 140, 760, 180], fill='#f5f6fa')
    draw.text((250, 150), "Upload genetic data file (23andMe, AncestryDNA, etc.)", fill='gray')
    
    draw.rectangle([240, 200, 760, 300], fill='#f5f6fa')
    draw.text((250, 210), "Or enter SNP values manually:", fill='black')
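    # An ASCII "v" stands in for the dropdown arrow, since PIL's default font
    # may not be able to render characters outside Latin-1
    draw.text((250, 240), "rs429358:  [ T/T v ]", fill='black')
    draw.text((250, 270), "rs7412:    [ C/T v ]", fill='black')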
    
    # Predict button
    draw.rectangle([240, 320, 340, 360], fill='#4a69bd')
    draw.text((270, 330), "Predict", fill='white')
    
    # Results area
    draw.rectangle([240, 380, 760, 550], fill='#f5f6fa')
    draw.text((250, 390), "Results", fill='black')
    draw.text((250, 420), "Eye Color: Brown (76% confidence)", fill='black')
    draw.text((250, 450), "Height Prediction: 175-180 cm", fill='black')
    draw.text((250, 480), "Disease Risk: Low risk for selected conditions", fill='black')
    
    # Display the mockup
    plt.figure(figsize=(15, 10))
    plt.imshow(img)
    plt.axis('off')
    plt.title('Web Application UI Mockup', fontsize=16)
    plt.show()
    
    print("This mockup illustrates how the prediction interface might look in the final web application.")
    print("The actual implementation would use React components with API calls to the Flask backend.")

# Generate and display UI mockup
create_ui_mockup()
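
The mock prediction endpoints above fill in missing SNPs with a default and pass the values straight to the model. A real service would likely validate the payload first. The sketch below is one possible approach, assuming genotypes are coded as the integers 0, 1, or 2; the helper name `build_feature_vector` and the example payload are hypothetical.

```python
VALID_GENOTYPES = {0, 1, 2}


def build_feature_vector(payload, feature_names, default=1):
    """Turn a request payload into an ordered SNP vector, or raise ValueError.

    Returns the vector and the list of SNPs that were filled with the default.
    """
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    snps = payload.get("snps", {})
    if not isinstance(snps, dict):
        raise ValueError("'snps' must map SNP IDs to genotype codes")

    vector, missing = [], []
    for name in feature_names:
        if name not in snps:
            missing.append(name)
            vector.append(default)
            continue
        value = snps[name]
        # bool is a subclass of int in Python, so it is rejected explicitly
        if not isinstance(value, int) or isinstance(value, bool) or value not in VALID_GENOTYPES:
            raise ValueError(f"{name}: expected 0, 1 or 2, got {value!r}")
        vector.append(value)
    return vector, missing


# Hypothetical feature list and payload, for illustration only
features = ["rs100", "rs101", "rs102"]
vector, missing = build_feature_vector({"snps": {"rs100": 2, "rs102": 0}}, features)
print("Feature vector:", vector)
print("Filled with default:", missing)
```

Returning the list of defaulted SNPs lets an endpoint tell the caller how much of the prediction rests on assumed values, which matters when most of a model's features are absent from the request.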

## 8. Conclusion and Next Steps

This notebook has demonstrated the core machine learning components for the Genetic Trait Predictor and Family Tree Visualizer project. We've covered data preprocessing, feature selection, model development, explainable AI integration, and visualization.

### Summary of achievements:
- Created synthetic SNP data and phenotypic traits
- Implemented data preprocessing techniques for genomic data
- Developed machine learning models for trait prediction
- Applied explainable AI techniques to provide interpretable predictions
- Created a basic family tree visualization
- Outlined how these components would integrate into a web application

### Next steps for project implementation:
1. **Data Collection**: Partner with genomic data providers to access real SNP datasets
2. **Model Refinement**: Train models on real genomic data and optimize hyperparameters
3. **Feature Engineering**: Implement more sophisticated feature selection techniques
4. **Web Application Development**: Build React frontend and Flask backend
5. **Interactive Visualization**: Implement D3.js-based family tree visualization
6. **Security & Privacy**: Ensure HIPAA compliance and put data protection measures in place
7. **User Testing**: Conduct usability testing with target user groups
8. **Deployment**: Deploy the application on a secure cloud platform

This project represents a convergence of bioinformatics, artificial intelligence, and data visualization to promote awareness of genetic inheritance and proactive healthcare planning.