misc thoughts and reminders
-
uncertain excursions (talking about uncertainty)
-
Practical modeling vs. idealistic (academic) modeling
-
Data Issues
- missing data
- recommend assessment of predicted data similarity to observed data
- data quality and reliability/measurement
- Sparsity
- Outliers
- Imbalanced data
- 'Big' Data, Scalability
- Data types
- Categorical
- Ordinal
- Continuous
- Time series
- Text
- Images
- Audio
- Video
- Geospatial
- etc.
- Feature Engineering/Pre-processing/Categorical Embeddings/Dimensionality Reduction/Feature Selection/Feature Extraction
- misc feature types: ordinal, zero-inflated, etc.
- Transformations: std, log, max
- Data leakage
- Data drift
- Data bias (lack of representativeness), vs. statistical bias
- Misc:
- Data privacy, security, ethics
- Data provenance, governance
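A minimal sketch of the "Transformations: std, log, max" item above. Reading "std" as standardization and "max" as scaling by the maximum absolute value is an assumption; the toy array `x` is made up for illustration.

```python
import numpy as np

# Hypothetical toy feature spanning several orders of magnitude.
x = np.array([1.0, 10.0, 100.0, 1000.0])

# Standardize: subtract the mean, divide by the standard deviation.
x_std = (x - x.mean()) / x.std()

# Log transform: compresses large values (requires strictly positive input).
x_log = np.log(x)

# Max scaling: divide by the largest absolute value so values lie in [-1, 1].
x_max = x / np.abs(x).max()
```

To avoid data leakage, the mean, standard deviation, and max would normally be estimated on training data only and then reused on validation and test data.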
-
Causality
- Causal inference
- Techniques: experimental design, matching, meta-learners, uplift modeling, etc.
-
Model interpretability
- Feature importance
- Model explainability
- Model transparency (e.g. model cards)
- Model debugging
- Model fairness: models are ideas, and ideas can be incorrect, ill-posed, generally off-base, or even wrong by most standards. The data may be inaccurate or unrepresentative. None of this is the model's fault.
-
Uncertainty
- Bayesian inference
- Bootstrap
- Conformal prediction
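A minimal sketch of the bootstrap item above: a percentile bootstrap interval for a sample statistic. The function name `bootstrap_ci`, its parameters, and the simulated data are all hypothetical, chosen only for illustration; other bootstrap variants (e.g. BCa) exist and may be preferable.

```python
import numpy as np

def bootstrap_ci(x, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(x)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    # Resample the data with replacement and recompute the statistic each time.
    stats = np.array(
        [stat(x[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    # Take the alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution.
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Simulated data, for illustration only.
x = np.random.default_rng(1).normal(loc=10.0, scale=2.0, size=200)
lo, hi = bootstrap_ci(x)
print(f"mean = {x.mean():.2f}, 95% interval = ({lo:.2f}, {hi:.2f})")
```

The same function works for other statistics by passing, say, `stat=np.median`.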
-
Misc Models
- Graphical/Network models
- Survival models, censoring
- MMM
- Mixture models/Clustering
- zero-inflated/altered/adjusted
- Time series models
- Reinforcement learning
- Ensemble models
- Regression vs. Classification
-
Inference vs./is not Prediction (somewhere along the way these ideas were conflated). Prediction could be said to be a form of inference, but not all inference is prediction. Inference does not require a data-driven model, nor even the direction of generalization implied by such models. In the modeling sense, inference strongly suggests a causal framework. We can make inference mean whatever we want in the modeling context (causal modeling, understanding the data generating process, prediction, or whatever), but doing so doesn't add clarity, because the term has long-standing usage outside of the modeling context (which also applies to modeling in a general way). Refs: ISL, https://stats.stackexchange.com/questions/244017/what-is-the-difference-between-prediction-and-inference
- optimization algorithm that was developed on the spot in the classroom? RMSProp by Hinton (unpublished; proposed in his Coursera lectures)
- https://arxiv.org/pdf/1609.04747.pdf (Ruder)
- The initial 'book' was just some algorithms; this book should not be.
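For the RMSProp note above, a minimal sketch of the update rule as it is commonly stated: keep an exponentially decaying average of squared gradients and divide each step by its square root. The function name `rmsprop`, the default hyperparameters, and the toy quadratic objective are assumptions for illustration; implementations differ in details such as where `eps` is placed.

```python
import numpy as np

def rmsprop(grad_fn, theta0, lr=0.01, decay=0.9, eps=1e-8, n_steps=1000):
    """Minimize a function given its gradient, using RMSProp-style updates."""
    theta = np.array(theta0, dtype=float)
    avg_sq_grad = np.zeros_like(theta)
    for _ in range(n_steps):
        g = grad_fn(theta)
        # Exponentially decaying average of squared gradients.
        avg_sq_grad = decay * avg_sq_grad + (1 - decay) * g**2
        # Per-parameter step, scaled by the root of that running average.
        theta -= lr * g / (np.sqrt(avg_sq_grad) + eps)
    return theta

# Toy objective: f(theta) = sum((theta - target)^2), with gradient 2 * (theta - target).
target = np.array([3.0, -2.0])
theta_hat = rmsprop(lambda t: 2 * (t - target), theta0=[0.0, 0.0])
print(theta_hat)
```

With a constant learning rate the iterates settle into a small neighborhood of the minimum rather than converging exactly, which is one reason learning-rate schedules are often layered on top.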
-
Literati
- Arthur C. Clarke
- Barthelme
- Bukowski
- Huxley
-
Music
- Chuck D.
- Wu-Tang
- Joy Division
- David Berman (Oh data, you shine with an evil light...)
-
Film
- Star Wars/Trek
-
Science
- Jacques Cousteau
- Carl Sagan
- Tukey
Preface:
- The book does not need to be read sequentially; take what you need, but realize there is a thread.