<a href="https://colab.research.google.com/github/mewilke/GI-diagnosis/blob/main/GI_neural_net.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Neural Network
A neural network is a machine-learning model loosely inspired by the brain. It consists of interconnected nodes, or "neurons", arranged in layers. The network learns by adjusting the connection weights and neuron biases so that it can recognize patterns and make predictions or decisions.
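To make "adjusting weights and biases" concrete, here is a minimal sketch of a single neuron trained by gradient descent. It is an illustration only: the inputs, target, and learning rate are made-up values, and this is not how TabPFN itself is trained or used.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    """Forward pass of one neuron: weighted sum plus bias, then sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update for log loss on a single example."""
    # For a sigmoid output with log loss, the gradient w.r.t. z is (prediction - target).
    error = predict(weights, bias, inputs) - target
    new_weights = [w - lr * error * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * error
    return new_weights, new_bias

# Toy example (hypothetical numbers): push the output for this input toward 1.
weights, bias = [0.0, 0.0], 0.0
inputs, target = [1.0, 2.0], 1.0

before = predict(weights, bias, inputs)
for _ in range(20):
    weights, bias = train_step(weights, bias, inputs, target)
after = predict(weights, bias, inputs)

print(f"prediction before training: {before:.3f}")
print(f"prediction after training:  {after:.3f}")
```

Each update nudges the weights and bias in the direction that reduces the error, so the prediction moves from its starting value toward the target. A full network repeats the same idea across many neurons and layers.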

In this application, a neural network is used for classification. TabPFN was chosen because it excels on small tabular datasets. Its "...transformer architecture learns a generic algorithm from a massive number of synthetic datasets, allowing it to make predictions on new data with a single forward pass (in-context learning) without requiring model retraining or extensive hyperparameter tuning."

In [7]:
# TabPFN neural network classifier

# Install TabPFN with a compatible version of scikit-learn
!pip install -q scikit-learn==1.6.1
!pip install -q tabpfn

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from tabpfn import TabPFNClassifier
# suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Load the dataset and check for duplicate rows
df = pd.read_csv('./data/gastrointestinal_disease_dataset.csv')
num_duplicate_rows = df.duplicated().sum()
print(f"Number of duplicate rows: {num_duplicate_rows}")

# train_test_split using just the top features
df_top_feats = df[['Body_Weight', 'Gender', 'Age', 'Family_History', 'Disease_Class']]
X = df_top_feats.drop('Disease_Class', axis=1)
y = df_top_feats['Disease_Class']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = TabPFNClassifier(ignore_pretraining_limits=True)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Collect the evaluation metrics in a one-row DataFrame.
# zero_division=0 makes explicit that a class that is never predicted
# contributes a precision (and F1) of 0 to the weighted average.
statistics_df = pd.DataFrame([{
    'Algorithm': 'Neural Network',
    'Accuracy': accuracy_score(y_test, y_pred),
    'Precision': precision_score(y_test, y_pred, average='weighted', zero_division=0),
    'Recall': recall_score(y_test, y_pred, average='weighted'),
    'F1': f1_score(y_test, y_pred, average='weighted', zero_division=0),
}])

statistics_df

Number of duplicate rows: 0


| | Algorithm | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | Neural Network | 0.169339 | 0.028676 | 0.169339 | 0.049046 |
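
The reported numbers have a recognizable pattern: precision equals accuracy squared, and F1 equals 2a²/(1 + a) for accuracy a. That is what support-weighted metrics give when a model predicts the same class for every row, so the result is consistent with (though not proof of) the classifier collapsing to a single class. The sketch below computes the weighted metrics by hand to show the arithmetic. The helper `weighted_scores` and the six-class toy labels are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), support-weighted over classes.

    Undefined per-class ratios (0/0) are counted as 0.
    """
    n = len(y_true)
    support = Counter(y_true)      # true rows per class
    predicted = Counter(y_pred)    # predicted rows per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / n
    return accuracy, precision, recall, f1

# Hypothetical labels: six balanced classes and a model that always predicts class 0.
y_true = [0, 1, 2, 3, 4, 5] * 10
y_pred = [0] * len(y_true)
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(f"toy data: accuracy={acc:.6f} precision={prec:.6f} recall={rec:.6f} f1={f1:.6f}")

# The same identities applied to the accuracy reported in the table above.
a = 0.169339
implied_precision = a ** 2
implied_f1 = 2 * a ** 2 / (1 + a)
print(f"implied precision={implied_precision:.6f} implied f1={implied_f1:.6f}")
```

If this explanation holds, a useful next check in the notebook would be to inspect `pd.Series(y_pred).value_counts()` and compare it with the class distribution of `y_test`.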
