Add JOSS paper #36

Merged
merged 61 commits into from
Nov 22, 2023
3904cf2
Add JOSS paper
rikhuijzer Jun 6, 2023
b424a92
Update paper
rikhuijzer Jun 6, 2023
d0fd581
Merge branch 'main' into rh/paper
rikhuijzer Jun 29, 2023
677dd53
Rewrite section
rikhuijzer Jun 29, 2023
f44cac4
Update `Statement of need`
rikhuijzer Jun 29, 2023
d719cae
Add text
rikhuijzer Jun 29, 2023
f2c3227
Add table
rikhuijzer Jun 29, 2023
728af97
Merge branch 'main' into rh/paper
rikhuijzer Jun 29, 2023
7e96e05
Merge branch 'main' into rh/paper
rikhuijzer Jul 5, 2023
38d36ca
Process feedback
rikhuijzer Jul 5, 2023
a3175ed
Merge branch 'main' into rh/paper
rikhuijzer Jul 5, 2023
d5d728d
Add many datasets to benchmark
rikhuijzer Jul 5, 2023
e0bfed0
Update conclusion
rikhuijzer Jul 5, 2023
7421b35
Update
rikhuijzer Jul 7, 2023
092fef3
Update tables
rikhuijzer Jul 7, 2023
bbf5320
Merge branch 'main' into rh/paper
rikhuijzer Jul 7, 2023
ddee2e2
Process feedback
rikhuijzer Jul 7, 2023
27cc389
Add missing citation
rikhuijzer Jul 7, 2023
aca55af
Add extra sentence in summary
rikhuijzer Jul 7, 2023
9f777d7
Add workflow
rikhuijzer Jul 7, 2023
15b29fa
Clarify the drawback of semi-interpretable models
rikhuijzer Aug 14, 2023
d0cbfe8
Merge branch 'main' into rh/paper
rikhuijzer Aug 24, 2023
e09a81f
Incorporate feedback from Antonello Lobianco
rikhuijzer Aug 24, 2023
4321eb9
Merge branch 'main' into rh/paper
rikhuijzer Aug 24, 2023
3671372
Make text clearer based on some feedback from Ruud
rikhuijzer Aug 25, 2023
0b51dc3
Merge branch 'main' into rh/paper
rikhuijzer Aug 25, 2023
6f39b12
Remove non-paper workflows
rikhuijzer Aug 25, 2023
30e499f
Fix two mistakes in the text
rikhuijzer Aug 26, 2023
9823730
Merge branch 'main' into rh/paper
rikhuijzer Sep 11, 2023
b60215d
Add link to original SIRUS source code
rikhuijzer Sep 11, 2023
5f6bbd1
Add quantitive information about lines of code
rikhuijzer Sep 11, 2023
c09a5ef
Merge branch 'main' into rh/paper
rikhuijzer Sep 11, 2023
3af7744
Merge branch 'main' into rh/paper
rikhuijzer Sep 13, 2023
632d6b1
Update performance results
rikhuijzer Sep 13, 2023
f23847c
Merge `main` into `rh/paper`
rikhuijzer Sep 15, 2023
b54a901
Update code example and output for the fitted model
rikhuijzer Sep 15, 2023
416b41d
Mention some more related papers
rikhuijzer Sep 22, 2023
64b3da7
Process comments from Ruud
rikhuijzer Sep 25, 2023
0a249a2
Non-interpretable to noninterpratable
rikhuijzer Sep 25, 2023
47d12d2
Apply Ruud's suggestion to reverse the order
rikhuijzer Sep 25, 2023
90c5695
Update text
rikhuijzer Sep 25, 2023
996702b
Make the introduction flow better
rikhuijzer Sep 26, 2023
3054e70
Make the introduction flow better
rikhuijzer Sep 26, 2023
e67c8ea
Add text to example code
rikhuijzer Sep 26, 2023
3e5d8ec
Merge branch 'main' into rh/paper
rikhuijzer Sep 26, 2023
fbd16d0
Mention Shapley.jl
rikhuijzer Sep 26, 2023
49708a6
Merge branch 'main' into rh/paper
rikhuijzer Sep 27, 2023
04839ad
Improve structure and split introduction into paragraphs
rikhuijzer Sep 27, 2023
e49c151
Fix one typo
rikhuijzer Sep 27, 2023
a271976
Add comparison to the original SIRUS implementation
rikhuijzer Oct 6, 2023
9017cf5
Fix "last details" issue raised by Guillaume
rikhuijzer Oct 10, 2023
e96e061
Incorporate feedback from Antonello
rikhuijzer Oct 10, 2023
16eae13
Fix typos in Table 1
rikhuijzer Oct 10, 2023
7597f67
Fix typo
rikhuijzer Oct 10, 2023
07ef4e3
add citation of Julia (#63)
jbytecode Oct 11, 2023
3767dc5
Update bibtex (#64)
jbytecode Oct 11, 2023
664349b
Revert CI changes
rikhuijzer Nov 22, 2023
97bb385
Merge branch 'main' into rh/paper
rikhuijzer Nov 22, 2023
f4b4b6b
Move workflows to right place again
rikhuijzer Nov 22, 2023
5f2157c
Remove outdated file
rikhuijzer Nov 22, 2023
63d2ec0
Trigger CI
rikhuijzer Nov 22, 2023
10 changes: 10 additions & 0 deletions paper/build.sh
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

DIR=$(realpath "$(dirname "$0")")

docker run --rm \
    --volume "$DIR:/data" \
    --user "$(id -u):$(id -g)" \
    --env JOURNAL=joss \
    --platform linux/amd64 \
    openjournals/inara
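Before starting the container, it can help to confirm that the mounted directory holds the files the build consumes. The following is a minimal sketch, not part of this PR: the helper name `check_paper_inputs` is hypothetical, and the file name `paper.md` is an assumption (only `paper.bib` appears in this diff).

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which expected input files are missing.
# Assumption: the paper sources are named paper.md and paper.bib.
# Prints one missing file name per line; returns non-zero if any is missing.
check_paper_inputs() {
    local dir="$1"
    local missing=0
    local name
    for name in paper.md paper.bib; do
        if [ ! -f "$dir/$name" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

A script such as `build.sh` could call `check_paper_inputs "$DIR" || exit 1` before the `docker run` line, so a missing source file is reported before the container starts.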
13 changes: 13 additions & 0 deletions paper/entr-build.sh
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

DIR=$(realpath "$(dirname "$0")")

FILES=$(find "$DIR" \
    -iname "*.md" \
    -o -iname "*.bib")

echo "Running build.sh..."

echo "$FILES" | entr -s "$DIR/build.sh"

echo "Build finished"
311 changes: 311 additions & 0 deletions paper/paper.bib
@@ -0,0 +1,311 @@
@article{julia,
doi = {10.1137/141000671},
url = {https://doi.org/10.1137/141000671},
year = 2017,
month = {jan},
publisher = {Society for Industrial {\&} Applied Mathematics ({SIAM})},
volume = {59},
number = {1},
pages = {65--98},
author = {Jeff Bezanson and Alan Edelman and Stefan Karpinski and Viral B. Shah},
title = {Julia: A Fresh Approach to Numerical Computing},
journal = {{SIAM} Review}
}

@article{ashari2013performance,
title={Performance comparison between Na{\"\i}ve {B}ayes, decision tree and k-nearest neighbor in searching alternative design in an energy simulation tool},
author={Ashari, Ahmad and Paryudi, Iman and Tjoa, A Min},
journal={International Journal of Advanced Computer Science and Applications},
volume={4},
number={11},
year={2013},
publisher={Citeseer},
doi={10.14569/IJACSA.2013.041105}
}

@article{barredo2020explainable,
title={{Explainable Artificial Intelligence (XAI)}: Concepts, taxonomies, opportunities and challenges toward responsible {AI}},
author={Barredo Arrieta, Alejandro and D{\'i}az-Rodr{\'i}guez, Natalia and Del Ser, Javier and Bennetot, Adrien and Tabik, Siham and Barbado, Alberto and Garc{\'i}a, Salvador and Gil-L{\'o}pez, Sergio and Molina, Daniel and Benjamins, Richard and others},
journal={Information Fusion},
volume={58},
pages={82--115},
year={2020},
publisher={Elsevier},
doi={10.1016/j.inffus.2019.12.012}
}

@article{benard2021sirus,
title={{SIRUS: Stable and Interpretable RUle Set for classification}},
author={Cl{\'e}ment B{\'e}nard and G{\'e}rard Biau and S{\'e}bastien Da Veiga and Erwan Scornet},
volume={15},
journal={Electronic Journal of Statistics},
number={1},
publisher={Institute of Mathematical Statistics and Bernoulli Society},
pages={427--505},
year={2021},
doi={10.1214/20-EJS1792},
URL={https://doi.org/10.1214/20-EJS1792}
}

@inproceedings{benard2021interpretable,
title={Interpretable random forests via rule extraction},
author={B{\'e}nard, Cl{\'e}ment and Biau, G{\'e}rard and Da Veiga, S{\'e}bastien and Scornet, Erwan},
booktitle={International Conference on Artificial Intelligence and Statistics},
pages={937--945},
year={2021},
organization={PMLR}
}

@article{biau2016random,
title={A random forest guided tour},
author={Biau, G{\'e}rard and Scornet, Erwan},
journal={Test},
volume={25},
pages={197--227},
year={2016},
publisher={Springer},
doi={10.1007/s11749-016-0481-7}
}

@article{blaom2020mlj,
title={{MLJ}: A {J}ulia package for composable machine learning},
author={Anthony D. Blaom and Franz Kiraly and Thibaut Lienart and Yiannis Simillides and Diego Arenas and Sebastian J. Vollmer},
year={2020},
publisher={The Open Journal},
volume={5},
number={55},
pages={2704},
journal={Journal of Open Source Software},
doi={10.21105/joss.02704}
}

@article{breiman2001random,
title={Random forests},
author={Breiman, Leo},
journal={Machine learning},
volume={45},
pages={5--32},
year={2001},
publisher={Springer},
doi={10.1023/A:1010933404324}
}

@inproceedings{chen2016xgboost,
title={{XGBoost}: A scalable tree boosting system},
author={Chen, Tianqi and Guestrin, Carlos},
booktitle={{Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining}},
pages={785--794},
year={2016},
doi={10.1145/2939672.2939785}
}

@article{cranmer2023interpretable,
title={Interpretable machine learning for science with {PySR} and {SymbolicRegression.jl}},
author={Cranmer, Miles},
journal={arXiv preprint arXiv:2305.01582},
year={2023},
doi={10.48550/arXiv.2305.01582}
}

@article{doshi2017towards,
title={Towards a rigorous science of interpretable machine learning},
author={Doshi-Velez, Finale and Kim, Been},
journal={arXiv preprint arXiv:1702.08608},
year={2017},
doi={10.48550/arXiv.1702.08608}
}

@book{eaton1995titanic,
title={Titanic: Triumph and tragedy},
author={Eaton, John P and Haas, Charles},
year={1995},
publisher={WW Norton \& Company}
}

@article{fisher1936use,
title={The use of multiple measurements in taxonomic problems},
author={Fisher, Ronald A},
journal={Annals of Eugenics},
volume={7},
number={2},
pages={179--188},
year={1936},
publisher={Wiley Online Library},
doi={10.1111/j.1469-1809.1936.tb02137.x}
}

@misc{haberman1999survival,
author={Haberman, S},
title={{Haberman's Survival}},
year={1999},
howpublished={UCI Machine Learning Repository},
doi={10.24432/C5XK51},
url={https://doi.org/10.24432/C5XK51}
}

@misc{hanson2023discourse,
title={[ANN] SIRUS.jl v1.2: Interpretable Machine Learning via Rule Extraction},
author={Hanson, Eric},
year={2023},
url={https://discourse.julialang.org/t/ann-sirus-jl-v1-2-interpretable-machine-learning-via-rule-extraction/100932/3}
}

@article{harrison1978hedonic,
title={Hedonic housing prices and the demand for clean air},
author={Harrison, David and Rubinfeld, Daniel L},
journal={Journal of Environmental Economics and Management},
volume={5},
number={1},
pages={81--102},
year={1978},
publisher={Elsevier},
doi={10.1016/0095-0696(78)90006-2}
}

@book{james2013introduction,
title={An introduction to statistical learning},
author={James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert and others},
volume={112},
year={2013},
publisher={Springer},
doi={10.1007/978-1-0716-1418-1}
}

@article{lundberg2017unified,
title={A unified approach to interpreting model predictions},
author={Lundberg, Scott M and Lee, Su-In},
journal={Advances in Neural Information Processing Systems},
volume={30},
year={2017}
}

@book{molnar2022interpretable,
title={Interpretable machine learning},
author={Molnar, Christoph},
year={2022}
}

@software{grisel2023scikit,
author={Olivier Grisel and
Andreas Mueller and
Lars and
Alexandre Gramfort and
Gilles Louppe and
Thomas J. Fan and
Peter Prettenhofer and
Mathieu Blondel and
Vlad Niculae and
Joel Nothman and
Arnaud Joly and
Guillaume Lemaitre and
Loïc Estève and
Jake Vanderplas and
Jérémie du Boisberranger and
manoj kumar and
Hanmin Qin and
Nicolas Hug and
Nelle Varoquaux and
Robert Layton and
Adrin Jalali and
Jan Hendrik Metzen and
(Venkat) Raghav, Rajagopalan and
Johannes Schönberger and
Roman Yurchak and
Julien Jerphanion and
Tom Dupré la Tour and
Wei Li and
Lucy Liu and
Chiara Marmo},
title={scikit-learn/scikit-learn: Scikit-learn},
publisher={Zenodo},
year={2023},
doi={10.5281/zenodo.8363803},
url={https://doi.org/10.5281/zenodo.8363803}
}

@software{sadeghi2022decisiontree,
title={{DecisionTree.jl} - A {Julia} implementation of the {CART} {Decision Tree and Random Forest} algorithms},
author={Ben Sadeghi and Poom Chiarawongse and Kevin Squire and Daniel C. Jones and Andreas Noack and Cédric St-Jean and Rik Huijzer and Roland Schätzle and Ian Butterworth and Yu-Fong Peng and Anthony Blaom},
month = nov,
year = 2022,
publisher={Zenodo},
version={0.12.3},
doi={10.5281/zenodo.7359268}
}

@inproceedings{smith1988using,
title={Using the {ADAP} learning algorithm to forecast the onset of diabetes mellitus},
author={Smith, Jack W and Everhart, James E and Dickson, WC and Knowler, William C and Johannes, Robert Scott},
booktitle={Proceedings of the Annual Symposium on Computer Application in Medical Care},
pages={261},
year={1988},
organization={American Medical Informatics Association}
}

@article{taleb2020statistical,
title={Statistical consequences of fat tails: Real world preasymptotics, epistemology, and applications},
author={Taleb, Nassim Nicholas},
journal={arXiv preprint arXiv:2001.10488},
year={2020},
doi={10.48550/arXiv.2001.10488}
}

@article{innes2018flux,
title = {Flux: Elegant machine learning with {J}ulia},
author={Mike Innes},
journal = {Journal of Open Source Software},
doi={10.21105/joss.00602},
url={https://doi.org/10.21105/joss.00602},
year={2018},
publisher = {The Open Journal},
volume = {3},
number = {25},
pages = {602}
}

@article{ke2017lightgbm,
title={{LightGBM}: A highly efficient gradient boosting decision tree},
author={Ke, Guolin and Meng, Qi and Finley, Thomas and Wang, Taifeng and Chen, Wei and Ma, Weidong and Ye, Qiwei and Liu, Tie-Yan},
journal={Advances in Neural Information Processing Systems},
volume={30},
year={2017}
}

@article{lobianco2021betaml,
title={BetaML: The Beta Machine Learning Toolkit, a self-contained repository of Machine Learning algorithms in {J}ulia},
author={Antonello Lobianco},
doi={10.21105/joss.02849},
url={https://doi.org/10.21105/joss.02849},
year={2021},
publisher={The Open Journal},
journal={Journal of Open Source Software},
volume={6},
number={60},
pages={2849}
}

@inproceedings{shalev2007pegasos,
title={{Pegasos: Primal Estimated sub-GrAdient SOlver for SVM}},
author={Shalev-Shwartz, Shai and Singer, Yoram and Srebro, Nathan},
booktitle={Proceedings of the 24th International Conference on Machine Learning},
pages={807--814},
year={2007},
doi={10.1145/1273496.1273598}
}

@misc{wolberg1995breast,
author={Wolberg, William and Mangasarian, Olvi and Street, Nick and Street, W.},
title={{Breast Cancer Wisconsin (Diagnostic)}},
year={1995},
howpublished={UCI Machine Learning Repository},
doi={10.24432/C5DW2B}
}

@inproceedings{yu2020veridical,
title={Veridical data science},
author={Yu, Bin},
booktitle={Proceedings of the 13th International Conference on Web Search and Data Mining},
pages={4--5},
year={2020},
doi={10.1073/pnas.1901326117}
}
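
Several entries in `paper.bib` carry no `doi` field (for example `benard2021interpretable`, `lundberg2017unified`, and `molnar2022interpretable`). The sketch below lists such entries so they can be reviewed by hand. It is not part of this PR: the function name `list_entries_without_doi` is hypothetical, and it assumes the layout used in this file, where each entry opens with `@type{key,` on its own line and closes with a lone `}`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the citation key of every entry without a doi field.
# Assumes each entry starts with "@type{key," on one line and ends with a lone "}".
list_entries_without_doi() {
    awk '
        /^@/ {
            # New entry: extract the citation key and reset the DOI flag.
            key = $0
            sub(/^@[^{]*[{]/, "", key)
            sub(/,.*$/, "", key)
            has_doi = 0
            in_entry = 1
            next
        }
        # A line starting with "doi" (any letter case) marks the entry as having a DOI.
        in_entry && tolower($0) ~ /^[ \t]*doi[ \t]*=/ { has_doi = 1 }
        # A lone closing brace ends the entry; report it if no DOI was seen.
        in_entry && /^}[ \t]*$/ {
            if (!has_doi) print key
            in_entry = 0
        }
    ' "$1"
}
```

Running `list_entries_without_doi paper/paper.bib` prints one key per line. An entry in the output is not necessarily wrong, since some works (such as books or forum posts) may have no DOI at all.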