View in JupyterLab with Open With → Markdown Preview.
- Introduction to Machine Learning for Coders, fast.ai [DONE]
- Deep Learning, fast.ai [in progress]
- Computational Linear Algebra, fast.ai [TBD]
- mlcourse.ai [Scheduled to start Sept 9, 2019]
- Deep Learning Book, math prerequisites. [TBD]
- Feature Engineering Talk, slides link. [TBD]
- 3blue1brown Linear Algebra.
- Owen Zhang talk at NYC Data Academy (link). Key ideas on model stacking (using glm on sparse and then feeding into xgb); using leave-one-out target encoding for high cardinality categorical variables; gbm tuning.
- raddar, My Journey to Kaggle Grandmaster, Kaggle Days talk link.
- raddar NCAA March Madness competition 1st place model approach; paris madness kernel. link
- CPMP Beyond Feature Engineering and HPO, Kaggle Days talk link.
- Vincent W. Winning with Linear Models link.
- Vincent W. The Duct Tape of Heroes (Bayesian stats; pomegranate) link.
- Szilard, @datascienceLA, On Machine Learning Software link.
- Szilard, @datascienceLA, Better than Deep Learning: GBM link
- Tianqi Chen, XGBoost: A Scalable Tree Boosting System, June 2016 talk at DataScienceLA link
```python
from pandas_profiling import ProfileReport
```
- pyjanitor: https://github.com/ericmjl/pyjanitor
```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import cross_val_score, RepeatedKFold

baseline = DummyRegressor(strategy="mean")  # naive baseline: always predict the training mean
scorer = make_scorer(mean_squared_error)
scores_dummy = cross_val_score(baseline, train_df.values, y,
                               cv=RepeatedKFold(n_repeats=100), scoring=scorer)
```
```python
from eli5 import show_weights
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(model, random_state=1).fit(X_train, y_train)
show_weights(perm, top=50, feature_names=top_columns)
```
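If eli5 is unavailable, scikit-learn (0.22+) ships an equivalent in `sklearn.inspection.permutation_importance`; a minimal sketch on synthetic data (model and data here are illustrative, not the notebook's own):

```python
# Permutation importance with scikit-learn's built-in helper: shuffle each
# feature column and measure how much the score drops.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=200, n_features=5, n_informative=2, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=1).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=1)
for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```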
- treeinterpreter
- shap
- Jeremy's dendrogram code to inspect for redundant features
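A sketch of the dendrogram idea with scipy (not Jeremy's actual code): cluster features by Spearman rank correlation; features that merge near distance 0 are near-duplicates and candidates for dropping. Data and column names here are synthetic:

```python
# Hierarchical clustering of features on 1 - |Spearman correlation|;
# "a_copy" is a tiny perturbation of "a", so they merge first.
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as hc
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
a = rng.normal(size=500)
df = pd.DataFrame({"a": a,
                   "a_copy": a + rng.normal(scale=0.01, size=500),
                   "b": rng.normal(size=500)})

corr = spearmanr(df).correlation                  # feature-by-feature matrix
linkage = hc.linkage(squareform(1 - corr, checks=False), method="average")
# hc.dendrogram(linkage, labels=df.columns.tolist())  # draw in a notebook
```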
- Jeremy's RF code to see if the features can predict whether a sample is in the train or test set. If they can, the train and test distributions differ, so a random validation split may overstate test performance; the most important features in that model point at the drifting columns.
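A minimal sketch of that train-vs-test check (often called adversarial validation), on synthetic data with deliberate drift injected into one feature:

```python
# Label every row by which set it came from, then see if a classifier can
# tell them apart. AUC near 0.5 = no detectable drift; well above 0.5 = drift.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 4))
X_test = rng.normal(size=(300, 4))
X_test[:, 0] += 2.0                      # simulate drift in feature 0

X_all = np.vstack([X_train, X_test])
is_test = np.r_[np.zeros(300), np.ones(300)]

clf = RandomForestClassifier(n_estimators=50, random_state=0)
auc = cross_val_score(clf, X_all, is_test, cv=5, scoring="roc_auc").mean()
print(f"train/test separability AUC: {auc:.2f}")
```

Feature importances of `clf` after fitting on the full data then show which columns drive the separation.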
- Category Encoding http://contrib.scikit-learn.org/categorical-encoding/index.html
```python
import category_encoders as ce

encoder = ce.LeaveOneOutEncoder(cols=[...])  # high-cardinality categorical columns
encoder.fit(X, y)                            # fit on training data only
X_cleaned = encoder.transform(X_dirty)
```
```python
from fancyimpute import KNN

# impute each missing value from its 3 nearest neighbours
X_filled_knn = KNN(k=3).fit_transform(X_incomplete)
```
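If fancyimpute is not installed, scikit-learn's `KNNImputer` (0.22+) does the same kind of k-nearest-neighbours imputation; a toy sketch:

```python
# Fill the NaN from the 2 rows closest on the observed features.
import numpy as np
from sklearn.impute import KNNImputer

X_incomplete = np.array([[1.0, 2.0],
                         [3.0, 4.0],
                         [np.nan, 6.0],
                         [8.0, 8.0]])
X_filled = KNNImputer(n_neighbors=2).fit_transform(X_incomplete)
print(X_filled)
```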
- vecstack: stacking package compatible with the sklearn API, https://github.com/vecxoz/vecstack
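The core stacking idea behind vecstack, sketched with plain scikit-learn rather than the vecstack API: out-of-fold predictions from level-1 models become the features for a level-2 model (data and model choices here are illustrative):

```python
# Out-of-fold predictions via cross_val_predict avoid leaking each level-1
# model's own training labels into the level-2 features.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

X, y = make_regression(n_samples=300, n_features=8, n_informative=5,
                       noise=5.0, random_state=0)

level1 = [Ridge(), RandomForestRegressor(n_estimators=30, random_state=0)]
S_train = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in level1])

meta = Ridge().fit(S_train, y)     # level-2 model on the stacked features
```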
BayesOptCV
- Kaggle Days SF: H2O AutoML gets 8th place in the Hackathon, link