# Feature Selection Significance on Student Performance (Colab Notebook)

This notebook reproduces the full **feature-selection significance** pipeline for three
prediction tasks (Binary, Five-Level, Regression), as implemented in the accompanying public repository.

- Leak-free feature selection (FS), fitted inside each cross-validation fold (Varimax, LASSO, RFE, TreeImp, Min2, Union, Intersection)
- Significance tests: **Friedman** (omnibus) and **Wilcoxon** (pairwise)
- Outputs: Tables 3–5 with mean ± CI and significance marks
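
"Leak-free, fold-internal" means the selector is refitted on the training rows of every fold and never sees the held-out rows. The sketch below illustrates the idea with a scikit-learn `Pipeline`. It is not the repository's code: the synthetic data, the choice of RFE with logistic regression, and the fold count are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the student dataset).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Scaler and selector live inside the pipeline, so cross_val_score refits
# them on the training split of each fold only. Selecting features on the
# full data before splitting would leak test information into training.
leak_free = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(leak_free, X, y, cv=cv, scoring="accuracy")
print("accuracy per fold:", np.round(scores, 3))
```

Because every fold keeps its own selected subset, the chosen features may differ between folds. That is expected when selection is part of the model being evaluated.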

> Reference: *Cortez & Silva (2008) — Student Performance Dataset (UCI)*
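
The three task formulations can all be derived from the final grade `G3` (0 to 20). The sketch below assumes the script follows the encoding described in the reference paper: pass/fail at `G3 >= 10`, and five Erasmus-style levels (I = 16–20, II = 14–15, III = 12–13, IV = 10–11, V = 0–9). The helper name `make_targets` is hypothetical, and the script's actual encoding may differ.

```python
import pandas as pd


def make_targets(g3: pd.Series) -> pd.DataFrame:
    """Derive binary, five-level and regression targets from G3 (0-20)."""
    # Binary task: 1 = pass (G3 >= 10), 0 = fail.
    binary = (g3 >= 10).astype(int)
    # Five-level task: right-closed bins (-1, 9], (9, 11], (11, 13], (13, 15], (15, 20].
    five = pd.cut(
        g3,
        bins=[-1, 9, 11, 13, 15, 20],
        labels=["V", "IV", "III", "II", "I"],
    )
    # Regression task: the raw grade itself.
    return pd.DataFrame({"binary": binary, "five_level": five, "regression": g3})


demo = make_targets(pd.Series([0, 9, 10, 12, 15, 20], name="G3"))
print(demo)
```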

## 1) Setup — Install Dependencies (run this cell)

In [ ]:
!pip -q install factor-analyzer tabulate xgboost

## 2) Download Dataset from Your GitHub (run this cell)
This will place the CSV exactly where the script expects it (`/content/student-mat.csv`).

In [ ]:
import os
os.makedirs('/content', exist_ok=True)
!wget -O /content/student-mat.csv https://raw.githubusercontent.com/karman09/student-performance-FS-significance/main/fs-significance-studentdata/student-mat.csv
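
An optional sanity check after the download is to load the file and confirm that the grade columns are present. The sketch below is an assumption-laden helper, not part of the repository. The name `load_student_csv` is hypothetical, and because the delimiter of the hosted copy is not stated here (the original UCI file is semicolon-separated), pandas is asked to sniff it.

```python
import os

import pandas as pd


def load_student_csv(path: str) -> pd.DataFrame:
    """Load the dataset, letting pandas detect the column delimiter."""
    # sep=None is only supported by the slower 'python' parsing engine.
    return pd.read_csv(path, sep=None, engine="python")


path = "/content/student-mat.csv"
if os.path.exists(path):
    df = load_student_csv(path)
    print("shape:", df.shape)
    # G1, G2, G3 are the period and final grades in the UCI dataset.
    print("grade columns found:", [c for c in ("G1", "G2", "G3") if c in df.columns])
else:
    print("File not found; run the download cell first.")
```

If the reported shape has a single column, the delimiter was probably not detected, and passing `sep=';'` explicitly is the usual fix.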

## 3) Fetch the Main Script from Your GitHub (run this cell)
This downloads the **exact** Python file hosted in the repository into the Colab runtime's working directory.

In [ ]:
!wget -O full_master_v3_8_FSsignif.py https://raw.githubusercontent.com/karman09/student-performance-FS-significance/main/fs-significance-studentdata/full_master_v3_8_FSsignif.py
!ls -lh full_master_v3_8_FSsignif.py

## 4) Run the Full Pipeline (Tables 3–5)
Click ▶ to execute. This will:

- Read `/content/student-mat.csv`
- Produce Tables 3–5 (Binary, Five-Level, Regression)
- Print significance summaries (Friedman + Wilcoxon)
- Export CSVs to `/content/results/`


In [ ]:
%run full_master_v3_8_FSsignif.py
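
For readers unfamiliar with the two tests named in the significance summaries, the sketch below shows how they are commonly combined using SciPy. It assumes one score per method per fold, measured on the same folds. The scores are randomly generated, and the Bonferroni adjustment is an assumption: the script may use a different correction, or none.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
n_folds = 20

# Hypothetical per-fold accuracies for three FS methods on the same folds.
scores = {
    "LASSO": 0.80 + 0.03 * rng.standard_normal(n_folds),
    "RFE": 0.78 + 0.03 * rng.standard_normal(n_folds),
    "TreeImp": 0.74 + 0.03 * rng.standard_normal(n_folds),
}

# Omnibus test: do the methods differ at all across the paired folds?
stat, p_omnibus = friedmanchisquare(*scores.values())
print(f"Friedman chi2={stat:.3f}, p={p_omnibus:.4f}")

# Post-hoc: paired Wilcoxon signed-rank test for every pair of methods,
# with a Bonferroni adjustment for the number of comparisons.
names = list(scores)
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
for a, b in pairs:
    _, p = wilcoxon(scores[a], scores[b])
    p_adj = min(1.0, p * len(pairs))
    print(f"{a} vs {b}: p={p:.4f}, Bonferroni-adjusted p={p_adj:.4f}")
```

Both tests are paired, so the score arrays must be aligned fold by fold; shuffling one array independently would invalidate the comparison.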

## 5) Optional: Inspect Generated CSVs
If you want to preview the CSV outputs generated under `/content/results/`, run:

In [ ]:
import os
import pandas as pd

base = '/content/results'
# The folder does not exist until the pipeline in step 4 has run.
files = sorted(os.listdir(base)) if os.path.isdir(base) else []
print('Files under /content/results:')
print(files if files else 'No results found yet.')

# Preview the first CSV (alphabetical order), if one exists.
csv_files = [f for f in files if f.endswith('.csv')]
if csv_files:
    print('\nPreview of', csv_files[0])
    display(pd.read_csv(os.path.join(base, csv_files[0])).head())

## 6) Notes on Reproducibility
- The script uses **`FAST_MODE=True`** by default for quick runs in Colab.
- To run the full research-grade CV (10×20), set `FAST_MODE=False` **in the Python file** and re-run the pipeline.
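
Instead of editing the file by hand, the flag can be flipped from a notebook cell. The sketch below assumes the script defines the flag as a top-level assignment of the form `FAST_MODE = True`. That layout has not been verified against the file, and the helper `set_fast_mode` is hypothetical.

```python
import re
from pathlib import Path


def set_fast_mode(source: str, enabled: bool) -> str:
    """Return `source` with its first top-level FAST_MODE assignment rewritten."""
    pattern = re.compile(r"^FAST_MODE\s*=\s*(True|False)\b", flags=re.MULTILINE)
    new_source, n_subs = pattern.subn(f"FAST_MODE = {enabled}", source, count=1)
    if n_subs == 0:
        # Fail loudly rather than silently running with the old setting.
        raise ValueError("no top-level FAST_MODE assignment found")
    return new_source


script = Path("full_master_v3_8_FSsignif.py")
if script.exists():
    script.write_text(set_fast_mode(script.read_text(), enabled=False))
    print("FAST_MODE set to False; re-run the pipeline cell.")
else:
    print("Script not found; run the fetch cell first.")
```

Re-running the fetch cell overwrites the file and restores the default, so the toggle has to be applied again after each download.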


## 7) Citation & License
- Cortez, P., & Silva, A. M. G. (2008). *Using Data Mining to Predict Secondary School Student Performance.* University of Minho, Portugal. Dataset available from the UCI Machine Learning Repository.
- This notebook and script are released under the **MIT License** (see repository `LICENSE`).