83 commits
dcf93a4
cast site to numpy
zaRizk7 Jun 4, 2025
f0d92aa
add num_solver_iter and rename extension
zaRizk7 Jun 4, 2025
7047c23
update notebook objectives and trainer imports
zaRizk7 Jun 4, 2025
092bf9d
update base exp yaml
zaRizk7 Jun 5, 2025
448c359
use skf by default
zaRizk7 Jun 5, 2025
1e32ba2
add handle for google colab runtime
zaRizk7 Jun 5, 2025
cf9e370
update output
zaRizk7 Jun 5, 2025
bff672b
reduce preprocess_phenotypic_data functionality and use polars to rep…
zaRizk7 Jun 16, 2025
72161f0
use polars to replace pandas
zaRizk7 Jun 16, 2025
11da26e
add manifest and load_data function to fetch data from gdrive
zaRizk7 Jun 16, 2025
d22141d
update default cfg and base exp yml
zaRizk7 Jun 16, 2025
b5b5923
update notebook contents
zaRizk7 Jun 16, 2025
5bbcc39
add polars and gdown to req
zaRizk7 Jun 16, 2025
237691d
change nilearn req
zaRizk7 Jun 16, 2025
afb3b0a
remove param_validation
zaRizk7 Jun 16, 2025
fb65616
add handle to prioritize site-packages for colab
zaRizk7 Jun 16, 2025
8567ddd
use single core only
zaRizk7 Jun 16, 2025
ac4a5e5
update pre_dispatch config
zaRizk7 Jun 16, 2025
7a175ba
add --user to handle site-packages
zaRizk7 Jun 16, 2025
fe52be1
use default n_jobs
zaRizk7 Jun 16, 2025
bebe42f
fallback to pandas
zaRizk7 Jun 16, 2025
79f0336
update config and base yml
zaRizk7 Jun 16, 2025
d93784d
remove polars
zaRizk7 Jun 16, 2025
1ad406b
use tangent-pearson by default
zaRizk7 Jun 16, 2025
544fb50
remove fc cfg
zaRizk7 Jun 16, 2025
63d78c4
reduce search iter
zaRizk7 Jun 16, 2025
d41007a
update notebook with new cfg
zaRizk7 Jun 16, 2025
dba1953
Merge branch 'main' into brain-decoding
zaRizk7 Jun 16, 2025
df96558
revert to use param_validation for load_data
zaRizk7 Jun 16, 2025
8688472
fix pydoc typo
zaRizk7 Jun 16, 2025
c0fb3f4
explicitly name loaded fc as fc_data
zaRizk7 Jun 16, 2025
7aa3355
add dirname(__file__) to prevent relative dir errors
zaRizk7 Jun 16, 2025
ed386a1
use dirname(__file__) for atlas_folder
zaRizk7 Jun 16, 2025
627008b
remove note for colab
zaRizk7 Jun 16, 2025
a584d53
remove check_random_state
zaRizk7 Jun 16, 2025
667e097
annotate config for classifier and split
zaRizk7 Jun 16, 2025
7efc7c2
fix seed with trainer
zaRizk7 Jun 16, 2025
763c1b9
include cc400 in the validation for load_data
zaRizk7 Jun 16, 2025
ac29f3e
update comments
zaRizk7 Jun 16, 2025
98a8d16
remove nilearn imports
zaRizk7 Jun 16, 2025
7ba387c
remove aal imports
zaRizk7 Jun 16, 2025
2175be3
remove unused seaborn import
zaRizk7 Jun 16, 2025
fabed46
reformat load_data validation
zaRizk7 Jun 16, 2025
f11b4ba
resolve missing return in pydoc
zaRizk7 Jun 16, 2025
42f8329
update markdown per-section
zaRizk7 Jun 16, 2025
fa62738
update additional requirements to include pyg
zaRizk7 Jun 25, 2025
ca3552d
add args to select top k sites with most subjects
zaRizk7 Jun 25, 2025
b8229a5
add filter_param_grid to automatically handle baseline param_grid
zaRizk7 Jun 25, 2025
97652c8
include param_grid to yacs
zaRizk7 Jun 25, 2025
0a6b2a9
add extra config yml to show how to define custom param_grid
zaRizk7 Jun 25, 2025
87b267b
add a try except to re-raise more descriptive error msg
zaRizk7 Jun 25, 2025
c642791
use relative dir for default data_dir
zaRizk7 Jun 25, 2025
4c4c14f
remove redudant os imports
zaRizk7 Jun 25, 2025
c1e77f5
revise logic for parse_param_grid
zaRizk7 Jun 25, 2025
5f441ce
remove n_jobs for tmi config
zaRizk7 Jun 26, 2025
473f8bb
update copyright
zaRizk7 Jun 26, 2025
76eda1d
add top_k_sites to dataset config
zaRizk7 Jun 26, 2025
61695e9
reduce runtime by taking top-5 site
zaRizk7 Jun 26, 2025
cef1494
Merge branch 'main' of https://github.com/pykale/embc-mmai25 into bra…
zaRizk7 Jun 26, 2025
e6b4b7c
include top_k_site args, parse_param_grid function, select coef from …
zaRizk7 Jun 26, 2025
234a74d
include flowchart
zaRizk7 Jun 26, 2025
17d7b7e
update interpretation with new base config
zaRizk7 Jun 26, 2025
9e65f6c
include sphinx-exercise as requirements to format exercise question
zaRizk7 Jun 27, 2025
2cf6674
correct mapping for missing handedness
zaRizk7 Jun 27, 2025
cfd95f9
add one_hot_encode as optional arguments for preprocess_phenotype_data
zaRizk7 Jun 27, 2025
84f3948
improve error message clarity
zaRizk7 Jun 27, 2025
5267704
add visualization code for phenotype distribution and fc and figure f…
zaRizk7 Jun 27, 2025
030db01
include rst and sphinx style captioning and exercise format
zaRizk7 Jun 27, 2025
1ed4fdb
reorganize config yml
zaRizk7 Jun 27, 2025
aa4cc8e
add additional param_grid for tmi2022
zaRizk7 Jun 27, 2025
393cca2
reorganize resources like logo and add pykale icon
zaRizk7 Jun 27, 2025
090551f
include abide logo for notebook intro
zaRizk7 Jun 27, 2025
ee0a647
add hue grouping for phenotype distribution and upper triangular matr…
zaRizk7 Jun 27, 2025
ebd3b96
replace path to data_dir to standardize with load_data args
zaRizk7 Jun 27, 2025
3f9d9d2
update logo directory
zaRizk7 Jun 27, 2025
d9dc5b3
complete reorganization of notebook structure and add extra contents …
zaRizk7 Jun 27, 2025
08ed488
use latest nilearn version for pip
zaRizk7 Jun 27, 2025
760b240
updates indexing for selecting subjects for visualizing fc
zaRizk7 Jun 28, 2025
15bccdf
add description about compile_results
zaRizk7 Jun 28, 2025
9ec41b6
use top-10 sites for base config
zaRizk7 Jun 28, 2025
3dd3b92
updates notebook content
zaRizk7 Jun 28, 2025
db83128
adds hue order
zaRizk7 Jun 28, 2025
54f522f
swap the abide logo order
zaRizk7 Jun 28, 2025
9 changes: 8 additions & 1 deletion _config.yml
@@ -3,7 +3,8 @@

title: PyKale
author: PyKale Contributors
logo: EMBC_logo.png
logo: resources/embc_logo.png
copyright: 2025

# Force re-execution of notebooks on each build.
# See https://jupyterbook.org/content/execute.html
@@ -33,6 +34,12 @@ repository:
html:
  use_issues_button: true
  use_repository_button: true
  favicon: resources/icon.ico

sphinx:
  extra_extensions:
    - sphinx_exercise
    - sphinx_togglebutton

# Only works for .ipynb files
launch_buttons:
Expand Down
19 changes: 16 additions & 3 deletions requirements.txt
@@ -1,8 +1,21 @@
# Requirements for the book itself
jupyter-book==1.0.4.post1
sphinx-exercise==1.0.1

# Visualization tools
matplotlib==3.10.3
seaborn==0.13.2

# Data loading, processing, and manipulation
numpy==1.26.4
git+https://github.com/pykale/pykale@main
nilearn==0.10.4
yacs==0.1.8
gdown==5.2.0

# PyKale latest version
git+https://github.com/pykale/pykale@main

# Additional dependencies for the tutorial notebooks
nilearn==0.12.0
torch==2.3.0
torch-geometric==2.3.0
torch-sparse
torch-scatter
File renamed without changes
Binary file added resources/icon.ico
Binary file not shown.
9 changes: 8 additions & 1 deletion tutorials/brain-disorder-diagnosis/config.py
@@ -6,7 +6,7 @@
# Dataset configuration
_C.DATASET = CfgNode()
# Path to the dataset directory
_C.DATASET.PATH = "data"
_C.DATASET.DATA_DIR = "data"
# Name of the brain atlas to use
# Available options:
# - "aal" (AAL)
@@ -27,6 +27,8 @@
# - "covariance"
# - "tangent-pearson"
_C.DATASET.FC = "tangent-pearson"
# Number of top sites to load for the runtime.
_C.DATASET.TOP_K_SITES = None

# Phenotype configuration
_C.PHENOTYPE = CfgNode()
@@ -57,6 +59,11 @@
# - "ridge"
# - "auto"
_C.TRAINER.CLASSIFIER = "lr"
# Parameter grid for hyperparameter tuning
# We use list of pairs directly instead of CfgNode for flexibility
# As a workaround for yacs limitation, we use None to indicate
# that we're using the large set of default hyperparameters.
_C.TRAINER.PARAM_GRID = None
# Use non-linear transformations (no interpretability)
_C.TRAINER.NONLINEAR = False
# Search strategy for hyperparameter tuning
Expand Down
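The comments on `_C.TRAINER.PARAM_GRID` describe a list of pairs with `None` standing in for the default hyperparameters. A minimal sketch of how such a value could be turned into a dictionary is given here. The helper name `pairs_to_param_grid` and the contents of `DEFAULT_PARAM_GRID` are hypothetical and are not taken from the PR; the repository's own parsing logic may differ.

```python
from typing import Optional

# Hypothetical placeholder for the "large set of default hyperparameters";
# the real defaults are not shown in this diff.
DEFAULT_PARAM_GRID = {"alpha": [0.1, 1.0, 10.0]}


def pairs_to_param_grid(pairs: Optional[list]) -> dict:
    """Convert a list of [name, values] pairs into a dict; None selects the defaults."""
    if pairs is None:
        # Return a copy so callers cannot mutate the shared defaults.
        return dict(DEFAULT_PARAM_GRID)
    grid = {}
    for entry in pairs:
        if len(entry) != 2:
            raise ValueError(f"expected a [name, values] pair, got {entry!r}")
        name, values = entry
        grid[str(name)] = list(values)
    return grid


custom = pairs_to_param_grid(
    [["alpha", [0.25, 0.5]], ["domain_adapter__mu", [0.5, 1.0]]]
)
fallback = pairs_to_param_grid(None)
print(custom)
print(fallback)
```

Keeping the grid as a plain list of pairs means the YAML file can name any parameter without each key having to be declared in the config defaults first, which appears to be the flexibility the comment refers to.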
30 changes: 28 additions & 2 deletions tutorials/brain-disorder-diagnosis/data.py
@@ -4,7 +4,12 @@
import pandas as pd
import gdown

from sklearn.utils._param_validation import StrOptions, validate_params
from sklearn.utils._param_validation import (
    StrOptions,
    validate_params,
    Interval,
    Integral,
)


@validate_params(
@@ -28,12 +33,18 @@
            )
        ],
        "vectorize": ["boolean"],
        "top_k_sites": [None, Interval(Integral, 1, None, closed="left")],
        "verbose": ["boolean"],
    },
    prefer_skip_nested_validation=False,
)
def load_data(
    data_dir="data", atlas="cc200", fc="tangent-pearson", vectorize=True, verbose=True
    data_dir="data",
    atlas="cc200",
    fc="tangent-pearson",
    vectorize=True,
    top_k_sites=None,
    verbose=True,
):
    """
    Load functional connectivity data and phenotypic data with gdown support.
@@ -55,6 +66,10 @@ def load_data(
    vectorize : bool, optional (default=True)
        Whether to vectorize the upper triangle of the connectivity matrices.

    top_k_sites : int or None, optional (default=None)
        If specified, only the top K sites with the most subjects will be used.
        If None, all sites will be used.

    verbose : bool, optional (default=True)
        Whether to print download and progress messages.

@@ -101,6 +116,17 @@ def load_data(
        rois = np.array(f.read().strip().split("\n"))
    coords = np.load(os.path.join(atlas_path, "coords.npy"))

    sites = phenotypes["SITE_ID"].value_counts()
    if top_k_sites is not None:
        if top_k_sites > len(sites):
            raise ValueError(
                f"top_k_sites ({top_k_sites}) cannot be greater than the number of sites ({len(sites)})"
            )
        top_sites = sites.nlargest(top_k_sites).index
        mask = phenotypes["SITE_ID"].isin(top_sites)
        phenotypes = phenotypes[mask]
        fc_data = fc_data[mask]

    return fc_data, phenotypes, rois, coords


Expand Down
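The `top_k_sites` filter added to `load_data` can be illustrated without pandas. The sketch below uses only the standard library and made-up site labels; `top_k_site_mask` is a hypothetical helper, not a function from the PR. Its range check roughly mirrors what `Interval(Integral, 1, None, closed="left")` expresses (an integer of at least 1), and ties between equally sized sites may be broken differently from pandas' `nlargest`.

```python
from collections import Counter
from numbers import Integral


def top_k_site_mask(site_ids, top_k_sites=None):
    """Return a boolean list that keeps subjects from the k sites with most subjects."""
    if top_k_sites is None:
        # No filtering requested: keep every subject.
        return [True] * len(site_ids)
    if not isinstance(top_k_sites, Integral) or top_k_sites < 1:
        raise ValueError("top_k_sites must be None or an integer >= 1")
    counts = Counter(site_ids)
    if top_k_sites > len(counts):
        raise ValueError(
            f"top_k_sites ({top_k_sites}) cannot be greater than "
            f"the number of sites ({len(counts)})"
        )
    # most_common(k) returns the k labels with the highest subject counts.
    keep = {site for site, _ in counts.most_common(top_k_sites)}
    return [site in keep for site in site_ids]


# Toy labels: site_a has 3 subjects, site_b has 2, site_c has 1.
sites = ["site_a", "site_b", "site_a", "site_c", "site_a", "site_b"]
mask = top_k_site_mask(sites, top_k_sites=2)
print(mask)
```

The same mask is applied to both the phenotype table and the connectivity array in the PR, which keeps the two aligned row by row.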
@@ -1,11 +1,13 @@
DATASET:
  ATLAS: hcp-ica
  TOP_K_SITES: 10

CROSS_VALIDATION:
  NUM_REPEATS: 1
  SPLIT: lpgo
  NUM_FOLDS: 1

TRAINER:
  NUM_SEARCH_ITER: 20
  NUM_SEARCH_ITER: 100
  NUM_SOLVER_ITER: 100

RANDOM_STATE: 0
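The diff does not spell out what `SPLIT: lpgo` means. Assuming it stands for leave-p-groups-out with acquisition sites as the groups, and that `NUM_FOLDS` plays the role of p (both are assumptions, not statements from the PR), the splitting scheme could be sketched as follows. The generator name `leave_p_groups_out` and the toy group labels are illustrative only.

```python
from itertools import combinations


def leave_p_groups_out(groups, p=1):
    """Yield (train_idx, test_idx) pairs, holding out every combination of p groups."""
    unique = sorted(set(groups))
    for held_out in combinations(unique, p):
        held = set(held_out)
        test_idx = [i for i, g in enumerate(groups) if g in held]
        train_idx = [i for i, g in enumerate(groups) if g not in held]
        yield train_idx, test_idx


# Toy example: five subjects from three sites.
groups = ["a", "a", "b", "c", "c"]
splits = list(leave_p_groups_out(groups, p=1))
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

Under these assumptions, p = 1 with `TOP_K_SITES: 10` would give ten splits, one per held-out site.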
15 changes: 15 additions & 0 deletions tutorials/brain-disorder-diagnosis/experiments/lpgo/tmi2022.yml
@@ -0,0 +1,15 @@
CROSS_VALIDATION:
  SPLIT: lpgo
  NUM_FOLDS: 1

TRAINER:
  CLASSIFIER: ridge
  PARAM_GRID:
    - [alpha, [0.25, 0.5, 0.75, 1.0]]
    - [domain_adapter__num_components, [50, 150, 300]]
    - [domain_adapter__mu, [0.25, 0.5, 0.75, 1.0]]
    - [domain_adapter__ignore_y, [True]]
    - [domain_adapter__augment, [pre, post, null]]
Copilot AI Jun 28, 2025

Consider quoting the values 'pre', 'post', and 'null' in the parameter grid to ensure they are parsed as strings in YAML.

Suggested change
- [domain_adapter__augment, [pre, post, null]]
- [domain_adapter__augment, ['pre', 'post', 'null']]

Collaborator Author

@zaRizk7 zaRizk7 Jun 28, 2025

Using 'null' might cause yacs to misparse it as a string; the expected result is null = None.

  SEARCH_STRATEGY: grid

RANDOM_STATE: 0
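With `SEARCH_STRATEGY: grid`, every combination of the listed values becomes a candidate. As a rough sketch, assuming the YAML pairs have already been loaded into Python values (with the unquoted `null` arriving as `None`), the size of this grid can be checked with the standard library alone:

```python
from itertools import product

# The PARAM_GRID pairs from tmi2022.yml as they would look once loaded.
param_grid = [
    ["alpha", [0.25, 0.5, 0.75, 1.0]],
    ["domain_adapter__num_components", [50, 150, 300]],
    ["domain_adapter__mu", [0.25, 0.5, 0.75, 1.0]],
    ["domain_adapter__ignore_y", [True]],
    ["domain_adapter__augment", ["pre", "post", None]],
]

names = [name for name, _ in param_grid]
value_lists = [values for _, values in param_grid]

# One dict per candidate: the Cartesian product of all value lists.
candidates = [dict(zip(names, combo)) for combo in product(*value_lists)]
print(len(candidates))  # 4 * 3 * 4 * 1 * 3 = 144
```

Each added value multiplies the number of candidates, so the grid is best kept small when it is combined with many cross-validation splits.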
14 changes: 14 additions & 0 deletions tutorials/brain-disorder-diagnosis/experiments/skf/base.yml
@@ -0,0 +1,14 @@
DATASET:
  ATLAS: hcp-ica
  TOP_K_SITES: 10

CROSS_VALIDATION:
  SPLIT: skf
  NUM_FOLDS: 5
  NUM_REPEATS: 2

TRAINER:
  NUM_SEARCH_ITER: 100
  NUM_SOLVER_ITER: 100

RANDOM_STATE: 0
16 changes: 16 additions & 0 deletions tutorials/brain-disorder-diagnosis/experiments/skf/tmi2022.yml
@@ -0,0 +1,16 @@
CROSS_VALIDATION:
  SPLIT: skf
  NUM_FOLDS: 10
  NUM_REPEATS: 5

TRAINER:
  CLASSIFIER: ridge
  PARAM_GRID:
    - [alpha, [0.25, 0.5, 0.75, 1.0]]
    - [domain_adapter__num_components, [50, 150, 300]]
    - [domain_adapter__mu, [0.25, 0.5, 0.75, 1.0]]
    - [domain_adapter__ignore_y, [True]]
    - [domain_adapter__augment, [pre, post, null]]
Copilot AI Jun 28, 2025

Consider quoting the values 'pre', 'post', and 'null' in the parameter grid to ensure they are parsed as strings in YAML.

Suggested change
- [domain_adapter__augment, [pre, post, null]]
- [domain_adapter__augment, ['pre', 'post', 'null']]

Collaborator Author

@zaRizk7 zaRizk7 Jun 28, 2025

Using 'null' might cause yacs to misparse it as a string; the expected result is null = None.

  SEARCH_STRATEGY: grid

RANDOM_STATE: 0
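The review exchange on the `domain_adapter__augment` line turns on the difference between an unquoted `null` and a quoted `'null'`. The sketch below uses the standard-library `json` module as a stand-in for a YAML loader, which is an assumption made only to keep the example dependency-free: JSON draws the same distinction between the `null` literal and the string `"null"`, but it requires double quotes and quoted strings throughout, unlike the YAML in the PR.

```python
import json

# Unquoted null is the null literal and loads as Python None.
unquoted = json.loads('["pre", "post", null]')

# Quoted "null" is an ordinary four-character string.
quoted = json.loads('["pre", "post", "null"]')

print(unquoted[-1] is None)  # True
print(quoted[-1] == "null")  # True
```

If the estimator expects `None` for the "no augmentation" case, the quoted form would pass the string `"null"` instead, which is the misparse the author's reply is guarding against.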