The logistic regression model used in the previous example is not necessarily the best model for liver disease prediction. This section explains why and outlines approaches that can improve accuracy.

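### Why Logistic Regression Might Not Be Best
Class imbalance (common in medical datasets), which biases an unweighted model toward the majority class

Non-linear relationships between features and the outcome, which a linear decision boundary cannot represent

Feature interactions, which logistic regression captures only when interaction terms are added by hand

Dataset-specific characteristics, such as outliers or missing values, that need specialized handling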

### Better Approaches for Higher Accuracy

1. Try Different Algorithms
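
Before tuning any single model, it can help to compare a few algorithm families under the same cross-validation protocol. The sketch below is a minimal illustration, not the original workflow: `X_demo` and `y_demo` are synthetic stand-ins generated with `make_classification`, the three candidates are arbitrary choices from scikit-learn, and ROC AUC is used as the comparison metric by assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the liver data (assumption); replace with the real X_train, y_train.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.3, 0.7], random_state=42,
)

# Candidate models; logistic regression is scaled first because it is sensitive to feature scale.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "grad_boost": GradientBoostingClassifier(random_state=42),
}

# Mean 5-fold ROC AUC for each candidate, evaluated on identical data.
results = {
    name: cross_val_score(est, X_demo, y_demo, cv=5, scoring="roc_auc").mean()
    for name, est in candidates.items()
}

# Print candidates from best to worst mean score.
for name, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Only the best one or two candidates from such a comparison usually deserve the more expensive hyperparameter search.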

In [None]:
# Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

param_grid = {
    'n_estimators': [100, 200, 500],
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2]
}

grid_search = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
best_model = grid_search.best_estimator_

In [None]:
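# Handle Class Imbalance
# Check class distribution
print(y.value_counts())

# Oversample the minority class with SMOTE.
# Resample the training split only: resampling before the split lets synthetic
# points derived from test rows leak into training and inflates test scores.
from imblearn.over_sampling import SMOTE

smote = SMOTE(random_state=42)
X_res, y_res = smote.fit_resample(X_train, y_train)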
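
Resampling is not the only way to handle imbalance. As a lighter alternative (a different technique from SMOTE, shown here only for comparison), many scikit-learn classifiers accept `class_weight="balanced"`, which reweights the loss instead of creating synthetic rows. The sketch below assumes a synthetic dataset with a 15% minority class; `X_demo`, `y_demo`, and the split variables are illustrative names.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): class 1 is the minority at about 15%.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

# Same model with and without class weighting.
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Recall on the minority class is the quantity class weighting is meant to raise.
recall_plain = recall_score(y_te, plain.predict(X_te))
recall_weighted = recall_score(y_te, weighted.predict(X_te))
print(f"minority recall, unweighted: {recall_plain:.3f}")
print(f"minority recall, balanced:   {recall_weighted:.3f}")
```

Higher minority recall typically comes at the cost of lower precision, so both should be checked before choosing between weighting and resampling.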

In [None]:
# Feature Engineering
# Create ratio features (a zero denominator yields inf or NaN, so check for zeros first)
df['Total_Bilirubin_ratio'] = df['Direct_Bilirubin'] / df['Total_Bilirubin']
df['Protein_ratio'] = df['Albumin'] / df['Total_Proteins']

### Step-by-Step Improvement Plan

In [None]:
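# Data Preprocessing
import numpy as np
import pandas as pd
from scipy import stats

# Handle outliers: keep rows where every numeric feature lies within 3 standard deviations.
# numerical_features is assumed to be the list of numeric column names. Impute missing
# values first: a NaN makes the whole column's z-scores NaN, which drops every row.
df = df[(np.abs(stats.zscore(df[numerical_features])) < 3).all(axis=1)]

# Better encoding: one-hot encode Gender and drop the first level to avoid a redundant column
df = pd.get_dummies(df, columns=['Gender'], drop_first=True)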

In [None]:
# Cross-Validation Strategy
# Stratified folds keep the class ratio roughly constant in every fold
from sklearn.model_selection import StratifiedKFold
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
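
A splitter like this only takes effect once it is passed to an evaluation routine. The following sketch shows one way to use it with `cross_validate` to collect several metrics per fold. The data is a synthetic stand-in and the random forest is an arbitrary placeholder model, both assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (assumption); replace with the real features and labels.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    weights=[0.3, 0.7], random_state=42,
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
metrics = ["accuracy", "f1", "roc_auc"]

# cross_validate returns one score per fold for each metric, under keys "test_<metric>".
scores = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_demo, y_demo, cv=cv, scoring=metrics,
)

# Report mean and spread across folds; a large spread suggests an unstable estimate.
for metric in metrics:
    fold_scores = scores[f"test_{metric}"]
    print(f"{metric}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```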

In [None]:
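# Ensemble Methods
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier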

estimators = [
    ('xgb', XGBClassifier()),
    ('rf', RandomForestClassifier()),
    ('svm', SVC(probability=True))
]

ensemble = VotingClassifier(estimators, voting='soft')

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

### Critical Enhancements for Production

In [None]:
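# Model Monitoring
# Log metrics with MLflow
import mlflow
from sklearn.metrics import accuracy_score

# log_metric expects a number, so compute the score instead of passing the function itself
mlflow.log_metric("accuracy", accuracy_score(y_test, best_model.predict(X_test)))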

In [None]:
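# Feature Importance Analysis
import shap
# TreeExplainer supports tree-based models only, so pass the tuned XGBoost model, not the Keras network
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)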

In [None]:
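# Probability Calibration
from sklearn.calibration import CalibratedClassifierCV
# Isotonic calibration tends to overfit on small datasets; method='sigmoid' is the safer choice there
calibrated_model = CalibratedClassifierCV(best_model, method='isotonic', cv=5)
calibrated_model.fit(X_train, y_train)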
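
Whether calibration actually helps should be measured rather than assumed. One hedged way to check is to compare the Brier score (lower is better) and the reliability curve before and after calibration. The sketch below uses synthetic data and a random forest as placeholders, and `method="sigmoid"` is chosen here purely for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=1
)

# Uncalibrated model and a cross-validated calibrated wrapper around the same estimator type.
raw = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=100, random_state=1),
    method="sigmoid", cv=5,
).fit(X_tr, y_tr)

brier = {}
for label, clf in [("raw", raw), ("calibrated", calibrated)]:
    proba = clf.predict_proba(X_te)[:, 1]
    # Brier score: mean squared error between predicted probability and the 0/1 outcome.
    brier[label] = brier_score_loss(y_te, proba)
    # Reliability curve: observed positive rate per bin of predicted probability.
    frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
    print(f"{label}: Brier = {brier[label]:.4f}, bins used = {len(frac_pos)}")
```

If the calibrated score is not lower on held-out data, the extra wrapper is probably not worth keeping.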

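In [None]:
# Deployment Optimization
# Convert a fitted scikit-learn model to ONNX format.
# sklearn_model is a placeholder name; XGBoost models need an extra converter registered first.
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

initial_types = [('input', FloatTensorType([None, X_train.shape[1]]))]
onnx_model = convert_sklearn(sklearn_model, initial_types=initial_types)
with open('liver_model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())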

### How to Choose the Best Model
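Evaluate using precision-recall curves, which are more informative than accuracy when the classes are imbalanced

Use Bayesian optimization for hyperparameter tuning (for example with Optuna or scikit-optimize), which usually needs fewer trials than an exhaustive grid search

Implement automated feature selection, such as recursive feature elimination with cross-validation

Test multiple evaluation metrics: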

In [None]:
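from sklearn.metrics import classification_report

# Per-class precision, recall, and F1 for the tuned model on the held-out test set
y_pred = best_model.predict(X_test)
print(classification_report(y_test, y_pred))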
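
The precision-recall evaluation recommended above can be sketched as follows. This is an illustration under assumptions: the data is synthetic with a 15% positive class, logistic regression is a placeholder model, and all variable names are introduced here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): class 1 is the minority at about 15%.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.85, 0.15], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # predicted probability of the positive class

# One (precision, recall) pair per candidate threshold, plus a final (1, 0) end point.
precision, recall, thresholds = precision_recall_curve(y_te, proba)

# Average precision summarizes the curve in one number; a classifier that ranks
# at random scores about the positive-class prevalence.
ap = average_precision_score(y_te, proba)
baseline = y_te.mean()
print(f"average precision: {ap:.3f} (random baseline: {baseline:.3f})")
```

The `thresholds` array can then be scanned for the operating point that meets a required recall, which matters in screening settings where missed cases are costly.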