misc thoughts and reminders

topics to touch on in some fashion

  • uncertain excursions (talking about uncertainty)

  • cross validation and related

Interludes

  • Practical modeling vs. idealistic (academic) modeling

  • Data Issues

    • missing data
      • recommend assessment of predicted data similarity to observed data
    • data quality and reliability/measurement
    • Sparsity
    • Outliers
    • Imbalanced data
    • 'Big' Data, Scalability
    • Data types
      • Categorical
      • Ordinal
      • Continuous
      • Time series
      • Text
      • Images
      • Audio
      • Video
      • Geospatial
      • etc.
    • Feature Engineering/Pre-processing/Categorical Embeddings/Dimensionality Reduction/Feature Selection/Feature Extraction
    • misc feature types: ordinal, zero-inflated, etc.
    • Transformations: std, log, max
    • Data leakage
    • Data drift
    • Data bias (lack of representativeness) vs. statistical bias
    • Misc:
      • Data privacy, security, ethics
      • Data provenance, governance
  • Causality

    • Causal inference
    • Techniques: experimental design, matching, meta-learners, uplift modeling, etc.
  • Model interpretability

    • Feature importance
    • Model explainability
    • Model transparency (e.g. model cards)
    • Model debugging
    • Model fairness: models are ideas, and ideas may be incorrect, ill-posed, generally off-base, or even wrong by most standards. The data may be inaccurate or unrepresentative. None of this is the model's fault.
  • Uncertainty

    • Bayesian inference
    • Bootstrap
    • Conformal prediction
  • Misc Models

    • Graphical/Network models
    • Survival models, censoring
    • MMM
    • Mixture models/Clustering
    • zero-inflated/altered/adjusted
    • Time series models
    • Reinforcement learning
    • Ensemble models
    • Regression vs. Classification
  • Inference vs./is not Prediction (somewhere along the way these ideas were conflated). Prediction could be said to be a form of inference, but not all inference is prediction. Inference does not require a data-driven model, nor even the direction of generalization implied by such models. In the modeling sense, inference strongly suggests a causal framework. We can make inference mean whatever we want in the modeling context (causal modeling, understanding the data-generating process, prediction, or something else), but doing so doesn't add clarity, because the term has long-standing usage outside the modeling context (usage that also applies to modeling in a general way). Refs: ISL, https://stats.stackexchange.com/questions/244017/what-is-the-difference-between-prediction-and-inference

random thoughts

issues of focus

  • The initial 'book' was just some algorithms; this book should not be.

Folks to quote

  • Literati

    • Arthur C. Clarke
    • Barthelme
    • Bukowski
    • Huxley
  • Music

    • Chuck D.
    • Wu-Tang
    • Joy Division
    • David Berman (Oh data, you shine with an evil light...)
  • Film

    • Star Wars/Trek
  • Science

    • Jacques Cousteau
    • Carl Sagan
    • Tukey

Preface:

  • The book does not need to be read in order; take what you need, but realize there is a thread running through it.