
# **Generative AI Tools & Platforms 2025** — Classical ML vs Neural Network 


In [1]:
CSV_PATH = "Generative AI Tools - Platforms 2025.csv"   
RANDOM_STATE = 42

# TABULAR track config
TAB_TEST_SIZE = 0.4    
TAB_CLASSICAL = "rf"   

# TEXT track config
TEXT_TEST_SIZE = 0.3
TEXT_CLASSICAL = "logreg"  
TFIDF_MAX_FEATURES = 3000
TFIDF_NGRAMS = (1, 2)

# NN training
VAL_SPLIT_FOR_NN = 0.2
EPOCHS = 60
BATCH_SIZE = 16
PATIENCE = 8  # EarlyStopping


In [2]:

import os, time, warnings, numpy as np, pandas as pd
warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report

# Classical models
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Text features
from sklearn.feature_extraction.text import TfidfVectorizer

# Imbalance helper
from sklearn.utils.class_weight import compute_class_weight

# NN (Keras)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print("TensorFlow:", tf.__version__)


TensorFlow: 2.20.0



## 1) Load dataset & quick EDA


In [3]:
def load_csv(path):
    if not os.path.exists(path):
        raise FileNotFoundError(f"CSV not found at: {path}")
    df = pd.read_csv(path)
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

df = load_csv(CSV_PATH)
print("Shape:", df.shape)
display(df.head())
print("\nColumns:", list(df.columns))
print("\nMissing values per column:\n", df.isna().sum())


Shape: (113, 22)


,tool_name,company,category_canonical,modality_canonical,open_source,api_available,api_status,website,source_domain,release_year,...,mod_image,mod_video,mod_audio,mod_code,mod_design,mod_infra,mod_productivity,mod_safety,mod_multimodal,modality_count
0,ChatGPT,OpenAI,LLMs & Chat Assistants,multimodal,0,1,api,https://chatgpt.com,chatgpt.com,2022,...,0,0,0,0,0,0,0,0,1,0
1,Claude,Anthropic,LLMs & Chat Assistants,multimodal,0,1,api,https://claude.ai,claude.ai,2023,...,0,0,0,0,0,0,0,0,1,0
2,Gemini,Google,LLMs & Chat Assistants,multimodal,0,1,api,https://gemini.google.com,gemini.google.com,2023,...,0,0,0,0,0,0,0,0,1,0
3,Midjourney,Midjourney,Image Gen & Editing,image,0,0,unavailable,https://www.midjourney.com,midjourney.com,2022,...,1,0,0,0,0,0,0,0,0,1
4,Stable Diffusion,Stability AI,Image Gen & Editing,image,1,1,api,https://stability.ai/stable-image,stability.ai,2022,...,1,0,0,0,0,0,0,0,0,1



Columns: ['tool_name', 'company', 'category_canonical', 'modality_canonical', 'open_source', 'api_available', 'api_status', 'website', 'source_domain', 'release_year', 'years_since_release', 'mod_text', 'mod_image', 'mod_video', 'mod_audio', 'mod_code', 'mod_design', 'mod_infra', 'mod_productivity', 'mod_safety', 'mod_multimodal', 'modality_count']

Missing values per column:
 tool_name              0
company                0
category_canonical     0
modality_canonical     0
open_source            0
api_available          0
api_status             0
website                0
source_domain          0
release_year           0
years_since_release    0
mod_text               0
mod_image              0
mod_video              0
mod_audio              0
mod_code               0
mod_design             0
mod_infra              0
mod_productivity       0
mod_safety             0
mod_multimodal         0
modality_count         0
dtype: int64


In [4]:
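def classification_report_robust(y_true, y_pred, id2label, present_only=False):
    """Print a classification report with readable class names.

    present_only=True restricts the report to labels that occur in y_true or y_pred;
    otherwise every class in id2label is listed, including ones with zero support.
    """
    if present_only:
        labels_list = sorted(np.unique(np.concatenate([y_true, y_pred])))
    else:
        labels_list = sorted(id2label.keys())  # include all classes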
    target_names = [id2label[i] for i in labels_list]
    print(classification_report(
        y_true, y_pred,
        labels=labels_list,
        target_names=target_names,
        zero_division=0
    ))

def display_table(df_, caption=None):
    try:
        return display(df_.style.set_caption(caption) if caption else df_.style)
    except Exception:
        print(caption or "")
        print(df_.to_string(index=False))



## 2) Choose a target column


In [5]:
TAB_TARGET = "category_canonical"

TEXT_TARGET_CANDIDATES = ['category_canonical', 'modality_canonical', 'api_status', 'open_source', 'api_available']

def pick_text_target(d):
    for col in TEXT_TARGET_CANDIDATES:
        if col in d.columns and d[col].nunique(dropna=True) >= 2:
            return col
    return None

TEXT_TARGET = pick_text_target(df)
print("Tabular target:", TAB_TARGET)
print("Text target:", TEXT_TARGET)


Tabular target: category_canonical
Text target: category_canonical



# A) **Tabular Track** — RandomForest vs Dense NN
Predict `TAB_TARGET` from structured features only (no text/URLs/IDs).


In [6]:

# Select tabular features
target_col = TAB_TARGET
if target_col not in df.columns:
    raise ValueError(f"Tabular target '{target_col}' not found in columns.")

# Features to include
num_cols = ["release_year","years_since_release","modality_count"]
bin_cols = ["open_source","api_available","mod_text","mod_image","mod_video","mod_audio",
            "mod_code","mod_design","mod_infra","mod_productivity","mod_safety","mod_multimodal"]
cat_cols = ["api_status"]

# Validate presence; skip missing ones gracefully
num_cols = [c for c in num_cols if c in df.columns]
bin_cols = [c for c in bin_cols if c in df.columns]
cat_cols = [c for c in cat_cols if c in df.columns]

feat_cols = num_cols + bin_cols + cat_cols
if not feat_cols:
    raise ValueError("No tabular feature columns found. Please adjust the lists above to match your dataset.")

X = df[feat_cols].copy()
y_raw = df[target_col].astype(str)
classes_tab = sorted(y_raw.unique())
y_tab = y_raw.astype('category').cat.codes.values
id2label_tab = dict(enumerate(y_raw.astype('category').cat.categories))

# Preprocess pipeline
preprocess = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    ("bin", "passthrough", bin_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])

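# Train/test split
# Stratification raises ValueError when a class has too few samples
# (e.g. a single row); fall back to a plain random split in that case.
try:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y_tab, test_size=TAB_TEST_SIZE, random_state=RANDOM_STATE, stratify=y_tab)
except ValueError:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y_tab, test_size=TAB_TEST_SIZE, random_state=RANDOM_STATE)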

print("Tabular train size:", len(X_tr), "test size:", len(X_te))
if len(y_tab) < 50:
    print("⚠️ Small sample size; metrics may be unstable.")


Tabular train size: 67 test size: 46
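
The split cell falls back to an unstratified split when stratification fails, which scikit-learn does, for example, when a class has fewer than two samples. A minimal sketch for listing such classes before splitting is shown below; the helper name `rare_classes` and the sample labels are illustrative assumptions, not part of the notebook.

```python
from collections import Counter

def rare_classes(labels, min_count=2):
    """Return {label: count} for classes with fewer than `min_count` samples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < min_count}

# Illustrative labels only (not taken from the dataset)
example_labels = [
    "Other", "Other",
    "Image Gen & Editing", "Image Gen & Editing",
    "Code Assistants",
]
print(rare_classes(example_labels))
```

Running something like `rare_classes(y_raw)` before the split would show which categories prevent stratification, so the fallback does not go unnoticed.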


In [7]:

# Classical model
if TAB_CLASSICAL == "rf":
    clf_core = RandomForestClassifier(n_estimators=300, random_state=RANDOM_STATE)
elif TAB_CLASSICAL == "logreg":
    clf_core = LogisticRegression(max_iter=2000)
else:
    raise ValueError("Unknown TAB_CLASSICAL")

clf_tab = Pipeline([("prep", preprocess), ("clf", clf_core)])

t0 = time.perf_counter()
clf_tab.fit(X_tr, y_tr)
t1 = time.perf_counter()

y_pred_tab = clf_tab.predict(X_te)
acc = accuracy_score(y_te, y_pred_tab)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, y_pred_tab, average="macro", zero_division=0)

print(f"Tabular Classical: {TAB_CLASSICAL}")
print(f"Train time (s): {t1 - t0:.4f}")
print(f"Test Accuracy:  {acc:.4f}")
print(f"Macro Precision:{prec:.4f}  Macro Recall:{rec:.4f}  Macro F1:{f1:.4f}")
print("\nClassification Report (all classes):\n")
classification_report_robust(y_te, y_pred_tab, id2label_tab, present_only=False)


Tabular Classical: rf
Train time (s): 1.2243
Test Accuracy:  0.5870
Macro Precision:0.4094  Macro Recall:0.4105  Macro F1:0.3699

Classification Report (all classes):

                         precision    recall  f1-score   support

        Audio/Music/TTS       1.00      0.20      0.33         5
        Code Assistants       0.00      0.00      0.00         1
            Design & UI       0.00      0.00      0.00         2
Evaluation & Benchmarks       0.00      0.00      0.00         2
    Image Gen & Editing       0.83      1.00      0.91         5
      Infra & Inference       0.00      0.00      0.00         2
 LLMs & Chat Assistants       0.60      1.00      0.75         9
                  Other       0.39      0.64      0.48        11
Productivity & Copilots       0.00      0.00      0.00         1
    Safety & Guardrails       1.00      1.00      1.00         2
           Search & RAG       0.50      1.00      0.67         1
   Speech-to-Text (ASR)       0.00      0.00      0

In [8]:

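# Prepare features for NN (dense matrix)
X_tr_nn = preprocess.fit_transform(X_tr)
X_te_nn = preprocess.transform(X_te)
# ColumnTransformer can return a SciPy sparse matrix when the output is mostly zeros;
# convert so Keras always receives dense arrays.
if hasattr(X_tr_nn, "toarray"):
    X_tr_nn, X_te_nn = X_tr_nn.toarray(), X_te_nn.toarray()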
input_dim = X_tr_nn.shape[1]
num_classes_tab = len(np.unique(y_tab))

def make_dense_tabular(input_dim, num_classes):
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(64, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.3),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation="softmax")
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

nn_tab = make_dense_tabular(input_dim, num_classes_tab)
cb_early = keras.callbacks.EarlyStopping(patience=PATIENCE, restore_best_weights=True, monitor="val_accuracy")

t0 = time.perf_counter()
hist = nn_tab.fit(
    X_tr_nn, y_tr,
    validation_split=VAL_SPLIT_FOR_NN if len(X_tr_nn) > 10 else 0.0,
    epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=0, callbacks=[cb_early]
)
t1 = time.perf_counter()

loss_te, acc_te = nn_tab.evaluate(X_te_nn, y_te, verbose=0)
y_pred_tab_nn = np.argmax(nn_tab.predict(X_te_nn, verbose=0), axis=1)

prec_nn, rec_nn, f1_nn, _ = precision_recall_fscore_support(y_te, y_pred_tab_nn, average="macro", zero_division=0)

print("Tabular Neural Network (Dense)")
print(f"Train time (s): {t1 - t0:.4f}")
print(f"Test Accuracy:  {acc_te:.4f}")
print(f"Macro Precision:{prec_nn:.4f}  Macro Recall:{rec_nn:.4f}  Macro F1:{f1_nn:.4f}")
print("\nClassification Report (all classes):\n")
classification_report_robust(y_te, y_pred_tab_nn, id2label_tab, present_only=False)


Tabular Neural Network (Dense)
Train time (s): 11.2472
Test Accuracy:  0.1957
Macro Precision:0.0681  Macro Recall:0.0738  Macro F1:0.0482

Classification Report (all classes):

                         precision    recall  f1-score   support

        Audio/Music/TTS       0.00      0.00      0.00         5
        Code Assistants       0.00      0.00      0.00         1
            Design & UI       0.00      0.00      0.00         2
Evaluation & Benchmarks       0.00      0.00      0.00         2
    Image Gen & Editing       0.00      0.00      0.00         5
      Infra & Inference       0.00      0.00      0.00         2
 LLMs & Chat Assistants       0.22      0.78      0.34         9
                  Other       0.67      0.18      0.29        11
Productivity & Copilots       0.00      0.00      0.00         1
    Safety & Guardrails       0.00      0.00      0.00         2
           Search & RAG       0.00      0.00      0.00         1
   Speech-to-Text (ASR)       0.00      0

### Class imbalance — compute weights

In [9]:

# Compute class weights for tabular target
classes_unique = np.unique(y_tr)
weights = compute_class_weight(class_weight="balanced", classes=classes_unique, y=y_tr)
class_weight_tab = {int(c): float(w) for c, w in zip(classes_unique, weights)}
class_weight_tab


{0: 7.444444444444445,
 2: 3.7222222222222223,
 4: 1.0634920634920635,
 5: 7.444444444444445,
 6: 0.32367149758454106,
 7: 0.4652777777777778,
 9: 2.4814814814814814,
 10: 1.488888888888889,
 12: 0.8271604938271605}
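
The weights above follow the "balanced" heuristic, `n_samples / (n_classes_present * class_count)`. The sketch below reproduces them with NumPy only. The per-class counts are inferred from the printed weights (67 training rows, 9 classes present), and `num_classes=13` is an assumption based on the highest class index seen (12). Giving absent classes a neutral weight of 1.0 is a defensive choice made here, not something the notebook does.

```python
import numpy as np

def balanced_class_weights(y, num_classes):
    """Balanced weights: n_samples / (n_present_classes * count).

    Classes absent from `y` get a neutral weight of 1.0 (an assumption made
    here so the dict covers every class index).
    """
    y = np.asarray(y)
    counts = np.bincount(y, minlength=num_classes)
    n_present = int(np.count_nonzero(counts))
    weights = {}
    for cls, count in enumerate(counts):
        weights[cls] = float(len(y) / (n_present * count)) if count > 0 else 1.0
    return weights

# Class counts inferred from the weights printed above
train_counts = {0: 1, 2: 2, 4: 7, 5: 1, 6: 23, 7: 16, 9: 3, 10: 5, 12: 9}
y_example = np.repeat(list(train_counts.keys()), list(train_counts.values()))
weights = balanced_class_weights(y_example, num_classes=13)
print(weights[0], weights[6])

# The dict could then be passed to Keras via the `class_weight` argument of fit, e.g.
# nn_tab.fit(X_tr_nn, y_tr, class_weight=weights, ...)
```

Note that the training cells in this notebook do not pass `class_weight` to `fit`, so the reported NN metrics come from unweighted training.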


### Tabular — Generalization summary


In [10]:

# Train metrics classical
y_pred_tr_tab = clf_tab.predict(X_tr)
acc_tr = accuracy_score(y_tr, y_pred_tr_tab)
f1_tr = precision_recall_fscore_support(y_tr, y_pred_tr_tab, average="macro", zero_division=0)[2]

# Train metrics NN
y_pred_tr_tab_nn = np.argmax(nn_tab.predict(X_tr_nn, verbose=0), axis=1)
acc_tr_nn = accuracy_score(y_tr, y_pred_tr_tab_nn)
f1_tr_nn = precision_recall_fscore_support(y_tr, y_pred_tr_tab_nn, average="macro", zero_division=0)[2]

summary_tab = pd.DataFrame({
    "model": [f"tabular_{TAB_CLASSICAL}", "tabular_nn_dense"],
    "train_acc": [acc_tr, acc_tr_nn],
    "train_f1_macro": [f1_tr, f1_tr_nn],
    "test_acc": [accuracy_score(y_te, y_pred_tab), accuracy_score(y_te, y_pred_tab_nn)],
    "test_f1_macro": [
        precision_recall_fscore_support(y_te, y_pred_tab, average="macro", zero_division=0)[2],
        precision_recall_fscore_support(y_te, y_pred_tab_nn, average="macro", zero_division=0)[2]
    ]
})
display_table(summary_tab, "Tabular — Model Generalization Summary")
# display_table above already renders summary_tab; a trailing bare expression would show it twice


,model,train_acc,train_f1_macro,test_acc,test_f1_macro
0,tabular_rf,0.895522,0.871164,0.586957,0.369886
1,tabular_nn_dense,0.432836,0.157925,0.195652,0.048244





# B) **Text Track** — TF‑IDF + LogisticRegression/LinearSVC vs Dense NN


In [11]:

# Build a text feature
def build_text_feature(d, target_col):
    text_like_cols = [c for c in d.columns if any(k in c for k in ["name","title","desc","feature","capab","tag"])]
    if not text_like_cols:
        text_like_cols = [c for c in d.columns if d[c].dtype == 'object' and c != target_col]
    if not text_like_cols:
        return pd.Series([""]*len(d), index=d.index), []
    txt = d[text_like_cols].fillna("").astype(str).agg(" | ".join, axis=1)
    return txt, text_like_cols

if TEXT_TARGET is None:
    print("No suitable TEXT target; skip text track or set TEXT_TARGET manually.")
else:
    text_series, used_cols = build_text_feature(df, TEXT_TARGET)
    print("Text columns used:", used_cols[:10])

    # Fill missing labels before casting; after astype(str), NaN would already be the string "nan"
    y_raw_text = df[TEXT_TARGET].fillna("unknown").astype(str)
    classes_text = sorted(y_raw_text.unique())
    y_text = y_raw_text.astype('category').cat.codes.values
    id2label_text = dict(enumerate(y_raw_text.astype('category').cat.categories))

    # Split (stratification raises ValueError when a class has too few samples;
    # fall back to a plain random split in that case)
    try:
        X_tr_text, X_te_text, y_tr_text, y_te_text = train_test_split(
            text_series.values, y_text, test_size=TEXT_TEST_SIZE, random_state=RANDOM_STATE, stratify=y_text
        )
    except ValueError:
        X_tr_text, X_te_text, y_tr_text, y_te_text = train_test_split(
            text_series.values, y_text, test_size=TEXT_TEST_SIZE, random_state=RANDOM_STATE
        )

    print("Text train size:", len(X_tr_text), "test size:", len(X_te_text))
    if len(y_text) < 200:
        print("Small dataset; text metrics may be noisy.")

    # TF-IDF
    tfidf = TfidfVectorizer(max_features=TFIDF_MAX_FEATURES, ngram_range=TFIDF_NGRAMS)
    X_tr_tfidf = tfidf.fit_transform(X_tr_text)
    X_te_tfidf  = tfidf.transform(X_te_text)
    X_tr_dense = X_tr_tfidf.toarray()
    X_te_dense  = X_te_tfidf.toarray()
    num_classes_text = len(np.unique(y_text))


Text columns used: ['tool_name']
Text train size: 79 test size: 34
Small dataset; text metrics may be noisy.
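
Because the only text column found is `tool_name`, many test rows may share no token with the training vocabulary, in which case their TF-IDF vector is all zeros and the classifier has nothing to work with. The sketch below checks this in plain Python, using a token pattern of the same shape as `TfidfVectorizer`'s default. The helper names and the train/test assignment of the names are hypothetical; "Stable Audio" is an invented test entry for illustration.

```python
import re

# Same shape as TfidfVectorizer's default token pattern: words of 2+ characters
TOKEN = re.compile(r"\b\w\w+\b")

def tokens(text):
    """Lowercase and tokenize a string into a set of word tokens."""
    return set(TOKEN.findall(text.lower()))

def unseen_rows(train_texts, test_texts):
    """Return test texts that share no token with the training vocabulary."""
    vocab = set()
    for text in train_texts:
        vocab |= tokens(text)
    return [text for text in test_texts if not (tokens(text) & vocab)]

train_names = ["ChatGPT", "Claude", "Gemini", "Stable Diffusion"]
test_names = ["Midjourney", "Stable Audio"]  # hypothetical test split
print(unseen_rows(train_names, test_names))
```

Applied to `X_tr_text` and `X_te_text`, the fraction of unseen rows gives a rough upper bound on how many test rows the text models can classify from content at all.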


In [12]:

if TEXT_TARGET is not None:
    # Classical
    def make_text_classical(name):
        if name == "logreg":
            return LogisticRegression(max_iter=3000)
        if name == "svm":
            return LinearSVC()
        if name == "rf":
            return RandomForestClassifier(n_estimators=300, random_state=RANDOM_STATE)
        raise ValueError("Unknown TEXT_CLASSICAL")

    clf_text = make_text_classical(TEXT_CLASSICAL)

    t0 = time.perf_counter()
    clf_text.fit(X_tr_tfidf, y_tr_text)
    t1 = time.perf_counter()

    y_pred_text = clf_text.predict(X_te_tfidf)
    acc = accuracy_score(y_te_text, y_pred_text)
    prec, rec, f1, _ = precision_recall_fscore_support(y_te_text, y_pred_text, average="macro", zero_division=0)

    print(f"Text Classical: {TEXT_CLASSICAL}")
    print(f"Train time (s): {t1 - t0:.4f}")
    print(f"Test Accuracy:  {acc:.4f}")
    print(f"Macro Precision:{prec:.4f}  Macro Recall:{rec:.4f}  Macro F1:{f1:.4f}")
    print("\nClassification Report (all classes):\n")
    classification_report_robust(y_te_text, y_pred_text, id2label_text, present_only=False)


Text Classical: logreg
Train time (s): 0.1017
Test Accuracy:  0.2353
Macro Precision:0.0403  Macro Recall:0.0868  Macro F1:0.0474

Classification Report (all classes):

                         precision    recall  f1-score   support

        Audio/Music/TTS       0.00      0.00      0.00         4
        Code Assistants       0.00      0.00      0.00         1
            Design & UI       0.00      0.00      0.00         1
Evaluation & Benchmarks       0.00      0.00      0.00         2
    Image Gen & Editing       0.00      0.00      0.00         4
      Infra & Inference       0.00      0.00      0.00         2
 LLMs & Chat Assistants       0.23      0.88      0.37         8
                  Other       0.25      0.17      0.20         6
Productivity & Copilots       0.00      0.00      0.00         0
    Safety & Guardrails       0.00      0.00      0.00         1
           Search & RAG       0.00      0.00      0.00         1
   Speech-to-Text (ASR)       0.00      0.00      

In [13]:

if TEXT_TARGET is not None:
    def make_dense_text(input_dim, num_classes):
        model = keras.Sequential([
            layers.Input(shape=(input_dim,)),
            layers.Dense(128, activation="relu"),
            layers.Dropout(0.3),
            layers.Dense(64, activation="relu"),
            layers.Dropout(0.2),
            layers.Dense(num_classes, activation="softmax")
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        return model

    nn_text = make_dense_text(X_tr_dense.shape[1], num_classes_text)
    cb_early = keras.callbacks.EarlyStopping(patience=PATIENCE, restore_best_weights=True, monitor="val_accuracy")

    t0 = time.perf_counter()
    hist = nn_text.fit(
        X_tr_dense, y_tr_text,
        validation_split=VAL_SPLIT_FOR_NN if len(X_tr_dense) > 10 else 0.0,
        epochs=EPOCHS, batch_size=BATCH_SIZE, verbose=0, callbacks=[cb_early]
    )
    t1 = time.perf_counter()

    loss_te, acc_te = nn_text.evaluate(X_te_dense, y_te_text, verbose=0)
    y_pred_text_nn = np.argmax(nn_text.predict(X_te_dense, verbose=0), axis=1)

    prec_nn, rec_nn, f1_nn, _ = precision_recall_fscore_support(y_te_text, y_pred_text_nn, average="macro", zero_division=0)

    print("Text Neural Network (Dense on TF-IDF)")
    print(f"Train time (s): {t1 - t0:.4f}")
    print(f"Test Accuracy:  {acc_te:.4f}")
    print(f"Macro Precision:{prec_nn:.4f}  Macro Recall:{rec_nn:.4f}  Macro F1:{f1_nn:.4f}")
    print("\nClassification Report (all classes):\n")
    classification_report_robust(y_te_text, y_pred_text_nn, id2label_text, present_only=False)


Text Neural Network (Dense on TF-IDF)
Train time (s): 8.1610
Test Accuracy:  0.2353
Macro Precision:0.0196  Macro Recall:0.0833  Macro F1:0.0317

Classification Report (all classes):

                         precision    recall  f1-score   support

        Audio/Music/TTS       0.00      0.00      0.00         4
        Code Assistants       0.00      0.00      0.00         1
            Design & UI       0.00      0.00      0.00         1
Evaluation & Benchmarks       0.00      0.00      0.00         2
    Image Gen & Editing       0.00      0.00      0.00         4
      Infra & Inference       0.00      0.00      0.00         2
 LLMs & Chat Assistants       0.24      1.00      0.38         8
                  Other       0.00      0.00      0.00         6
Productivity & Copilots       0.00      0.00      0.00         0
    Safety & Guardrails       0.00      0.00      0.00         1
           Search & RAG       0.00      0.00      0.00         1
   Speech-to-Text (ASR)       0.00 


### Text class imbalance — compute weights



In [14]:

if TEXT_TARGET is not None:
    classes_unique_text = np.unique(y_tr_text)
    weights_text = compute_class_weight(class_weight="balanced", classes=classes_unique_text, y=y_tr_text)
    class_weight_text = {int(c): float(w) for c, w in zip(classes_unique_text, weights_text)}
    print(class_weight_text)  # a bare expression inside an if-block is not displayed



### Text — Generalization summary


In [15]:

if TEXT_TARGET is not None:
    # Classical train metrics
    y_pred_tr_text = clf_text.predict(X_tr_tfidf)
    acc_tr = accuracy_score(y_tr_text, y_pred_tr_text)
    f1_tr = precision_recall_fscore_support(y_tr_text, y_pred_tr_text, average="macro", zero_division=0)[2]

    # NN train metrics
    y_pred_tr_text_nn = np.argmax(nn_text.predict(X_tr_dense, verbose=0), axis=1)
    acc_tr_nn = accuracy_score(y_tr_text, y_pred_tr_text_nn)
    f1_tr_nn = precision_recall_fscore_support(y_tr_text, y_pred_tr_text_nn, average="macro", zero_division=0)[2]

    summary_text = pd.DataFrame({
        "model": [f"text_{TEXT_CLASSICAL}", "text_nn_dense"],
        "train_acc": [acc_tr, acc_tr_nn],
        "train_f1_macro": [f1_tr, f1_tr_nn],
        "test_acc": [accuracy_score(y_te_text, y_pred_text), accuracy_score(y_te_text, y_pred_text_nn)],
        "test_f1_macro": [
            precision_recall_fscore_support(y_te_text, y_pred_text, average="macro", zero_division=0)[2],
            precision_recall_fscore_support(y_te_text, y_pred_text_nn, average="macro", zero_division=0)[2]
        ]
    })
    display_table(summary_text, "Text — Model Generalization Summary")
    summary_text


,model,train_acc,train_f1_macro,test_acc,test_f1_macro
0,text_logreg,0.708861,0.296083,0.235294,0.047368
1,text_nn_dense,0.379747,0.089111,0.235294,0.031746


# Findings & Reflections — Generative AI Tools & Platforms (2025)
## 1) Dataset relevance

Source: Kaggle — Generative AI Tools & Platforms 2025 (tarekmasryo)

Why it fits my field: It catalogs real AI products (e.g., ChatGPT, Claude, Gemini) with traits such as modality flags, API availability, open-source, and release year. This enables practical modeling for AI/tech product analysis (e.g., predicting a tool’s category from its traits/text).

## 2) Classical ML vs. Neural Network (accuracy & generalization)

Text track (TF-IDF features):

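Logistic Regression (the configured `TEXT_CLASSICAL`) and the Dense NN tied on test accuracy (0.2353); Logistic Regression had the slightly higher macro-F1 (0.0474 vs. 0.0317). Both scores are weak, which is unsurprising given that the only text column found was `tool_name`.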

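The Dense NN collapsed onto the majority class: it predicted `LLMs & Chat Assistants` for the test rows (recall 1.00 on that class, 0.00 on every other class in the report), which yields 0.2353 accuracy but only 0.0317 macro-F1. The runs above train without class weights; in my experiments the collapse eased only after adding class weights and stronger regularization.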
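
The gap between accuracy and macro-F1 for a collapsed model can be reproduced by hand. The sketch below assumes 34 test rows, 12 classes present in the test split (inferred from the macro recall of 0.0833 = 1/12), and 8 rows in the majority class. How the remaining 26 rows are spread over the other classes is made up for illustration and does not change the result when every prediction is the majority class.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Class 0 stands in for the majority class; classes 1..11 share the other 26 rows
y_true = [0] * 8 + [1 + (i % 11) for i in range(26)]
y_pred = [0] * 34  # a model that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(accuracy, 4), round(macro_f1(y_true, y_pred), 6))
```

This prints `0.2353 0.031746`, matching the text NN's test accuracy and macro-F1: one class scores F1 = 16/42 and the other eleven score 0.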

Tabular track (structured features):

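Random Forest clearly beat the Dense NN on the test split (accuracy 0.5870 vs. 0.1957; macro-F1 0.3699 vs. 0.0482) and was more stable across classes. Its own train/test gap (0.8955 vs. 0.5870 accuracy) shows that it still overfits on 67 training rows.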

Bottom line: At this dataset size (113 rows), the classical models matched or outperformed the small Dense NN on test macro-F1 in both tracks: Random Forest by a wide margin on tabular features, Logistic Regression only marginally on text.

Numbers from the runs above (test split):

Text: Logistic Regression at 0.2353 acc / 0.0474 macro-F1 vs. Dense NN at 0.2353 / 0.0317.

Tabular: Random Forest at 0.5870 acc / 0.3699 macro-F1 vs. Dense NN at 0.1957 / 0.0482.

## 3) Which approach trains faster—and why?

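Faster: Classical models

Logistic Regression on TF-IDF trained in 0.10 s versus 8.16 s for the Dense NN on the same features (roughly 80x, single run). Linear models on TF-IDF leverage sparse matrices and efficient solvers.

Random Forest (300 trees) trained in 1.22 s versus 11.25 s for the tabular Dense NN (roughly 9x, single run), and it reaches good performance with minimal tuning on small tabular data.

Slower: Dense NN

Requires multiple epochs of gradient descent on dense tensors (up to 60 here, with early stopping), so wall-clock time is higher, and in these runs it still did not reach the classical models' test scores.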

## 4) Preprocessing needs (NN vs. classical)

I kept preprocessing parallel and fair across approaches:

Text: Both classical and NN used the same TF-IDF features (no extra scaling/encoding for NN in this setup).

Tabular: Both consumed the same ColumnTransformer (StandardScaler for numerics, One-Hot for api_status, passthrough for binary flags).

If I switched the NN to learned embeddings (tokenization, padding) or richer representations, the NN would require more preprocessing than the classical baseline.

## 5) Model complexity, overfitting & generalization

Increasing NN layers/width improved training metrics but hurt test macro-F1 (overfitting on small data).

Dropout and early stopping are already part of the models above. Adding class weights and L2 helped, but did not fully close the gap without more data.

Random Forest and linear TF-IDF models had inductive biases that generalized better given the dataset scale.
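
The train/test gap behind these observations can be read directly off the two generalization summary tables. A minimal sketch, with the accuracy values copied from those tables (the dictionary layout is just a convenience for this example):

```python
# (train, test) accuracy copied from the two generalization summary tables
summary = {
    "tabular_rf":       {"train_acc": 0.895522, "test_acc": 0.586957},
    "tabular_nn_dense": {"train_acc": 0.432836, "test_acc": 0.195652},
    "text_logreg":      {"train_acc": 0.708861, "test_acc": 0.235294},
    "text_nn_dense":    {"train_acc": 0.379747, "test_acc": 0.235294},
}

# Generalization gap = train accuracy minus test accuracy
gaps = {name: m["train_acc"] - m["test_acc"] for name, m in summary.items()}

for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} gap={gap:.6f}")
```

A large gap (text Logistic Regression, Random Forest) indicates overfitting. A small gap with low scores on both sides, as for the two Dense NNs in these runs, points to underfitting rather than overfitting.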


Overall: In this domain and at this dataset size (113 rows), classical ML performed as well as or better than a small feed-forward neural network. Linear models on sparse TF-IDF and tree ensembles on compact tabular features offer a suitable inductive bias and lower variance, train faster, and generalized better here. The NN would need more data and stronger regularization to match their macro-F1 and minority-class recall.