# Machine Learning, AI, and GenAI Taxonomy and MLOps Frameworks

## 1. Foundational Concepts and Terminology

### Core Machine Learning Concepts

- **Machine Learning (ML)**: Field of study that gives computers the ability to learn without being explicitly programmed
- **Algorithm**: Step-by-step procedure for solving a problem or accomplishing a task
- **Model**: Mathematical representation learned from data to make predictions or decisions
- **Feature**: Individual measurable property or characteristic used as input for a machine learning algorithm
- **Label/Target**: The output value that a model is trained to predict
- **Training**: Process of fitting a model's parameters to data (labeled data in supervised learning, unlabeled data in unsupervised learning)
- **Inference**: Process of using a trained model to make predictions
- **Supervised Learning**: Training a model on labeled data
- **Unsupervised Learning**: Finding patterns in unlabeled data
- **Semi-Supervised Learning**: Training with a combination of labeled and unlabeled data
- **Reinforcement Learning**: Learning through interaction with an environment using rewards and penalties
- **Transfer Learning**: Applying knowledge from one trained model to a different but related task
- **Fine-tuning**: Further training a pre-trained model on a specific dataset for a specific task
- **Overfitting**: When a model learns the training data too closely, including noise and outliers, and therefore generalizes poorly to unseen data
- **Underfitting**: When a model is too simple to capture the underlying patterns in the data
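- **Bias**: Error due to overly simplistic assumptions in the model (associated with underfitting)
- **Variance**: Error due to the model's sensitivity to small fluctuations in the training data, typically caused by excessive model complexity (associated with overfitting)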
- **Hyperparameter**: Parameter set before training begins (not learned from data)
- **Parameter**: Value learned during model training from data
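To make the training, inference, parameter, and hyperparameter terms above concrete, here is a minimal sketch assuming a one-feature linear model fit by least squares with an optional L2 (ridge) penalty. The function names `fit_line` and `predict` are hypothetical illustrations, not a library API: `w` and `b` are parameters learned from data, while `l2_penalty` is a hyperparameter fixed before training.

```python
def fit_line(xs, ys, l2_penalty=0.0):
    """Training: learn parameters (w, b) for y ~ w*x + b by ridge least squares.

    l2_penalty is a hyperparameter: it is chosen before training, not learned.
    The intercept b is left unpenalized.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered cross-product and sum of squares of the feature
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / (sxx + l2_penalty)  # learned parameter (slope)
    b = mean_y - w * mean_x       # learned parameter (intercept)
    return w, b


def predict(w, b, x):
    """Inference: apply the trained parameters to a new input."""
    return w * x + b


# Labeled training data: each feature x has a target y (here y = 2x + 1 exactly)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(w, b, predict(w, b, 5.0))

# A larger penalty shrinks the slope, trading variance for bias
w_ridge, b_ridge = fit_line(xs, ys, l2_penalty=5.0)
print(w_ridge, b_ridge)
```

Raising `l2_penalty` is one simple lever on the bias/variance trade-off: it pulls the slope toward zero, which makes the model less sensitive to the particular training sample.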

### Artificial Intelligence Concepts

- **Artificial Intelligence (AI)**: Broad field encompassing machine learning and systems designed to mimic human intelligence
- **Narrow/Weak AI**: AI systems designed for specific tasks (all current AI systems)
- **General/Strong AI**: Hypothetical AI with human-like general intelligence across domains
- **Artificial General Intelligence (AGI)**: Hypothetical AI with the ability to understand, learn, and apply knowledge across different domains; often used interchangeably with strong AI
- **Superintelligence**: Hypothetical AI that surpasses human intelligence
- **Expert System**: Rule-based AI system that emulates human expertise in a specific domain
- **Cognitive Computing**: AI systems that attempt to simulate human thought processes
- **Computer Vision**: AI field focused on enabling computers to see, identify and process images
- **Natural Language Processing (NLP)**: AI field focused on enabling computers to understand and generate human language
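The rule-based reasoning behind an expert system can be sketched with forward chaining: repeatedly fire any rule whose premises are all known facts until no new conclusions appear. This is a minimal toy, assuming rules are `(premises, conclusion)` pairs; the `forward_chain` function and the medical-style rule names are hypothetical examples, not taken from any real expert system.

```python
def forward_chain(facts, rules):
    """Derive every conclusion reachable from the initial facts.

    rules: list of (set_of_premises, conclusion) pairs.
    """
    known = set(facts)
    changed = True
    while changed:  # keep sweeping until a full pass adds nothing new
        changed = False
        for premises, conclusion in rules:
            # Fire the rule if all premises are known and it adds something new
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base encoding "expertise" as if-then rules
RULES = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu"}, "recommend_rest"),
    ({"has_rash"}, "see_dermatologist"),
]

derived = forward_chain({"has_fever", "has_cough"}, RULES)
print(sorted(derived))
```

Note that the second rule fires only because the first one added `possible_flu`; chaining conclusions like this is what lets a small rule base cover multi-step reasoning.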

### Generative AI Concepts

- **Generative AI (GenAI)**: AI systems that can generate new content (text, images, audio, video, etc.)
- **Generative Model**: Model that can generate new data instances similar to the training data
- **Large Language Model (LLM)**: Neural network trained on vast text data capable of generating human-like text
- **Foundation Model**: Large model trained on broad data that can be adapted to many downstream tasks
- **Diffusion Model**: Type of generative model trained to reverse a gradual noising process; it generates data by iteratively denoising random noise
- **Prompt**: Input text that directs or instructs an AI system to generate specific outputs
- **Prompt Engineering**: Practice of designing effective prompts to get desired outputs from AI systems
- **In-context Learning**: Ability of models to perform a task from examples provided in the prompt, without updating model weights
- **Retrieval-Augmented Generation (RAG)**: Technique that improves generation by retrieving relevant information from an external knowledge source and adding it to the model's input
- **Hallucination**: When AI generates plausible-sounding but factually incorrect or fabricated information
- **Multimodal Model**: Model capable of processing and generating multiple types of data (text, image, audio)
- **Text-to-Image**: AI models that generate images based on text descriptions
- **Text-to-Audio/Speech**: AI models that convert text to spoken audio
- **Text-to-Video**: AI models that generate video content based on text descriptions
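The retrieval step of RAG can be sketched without any ML library: score documents against the query, take the best matches, and splice them into the prompt. This minimal sketch assumes bag-of-words cosine similarity as a stand-in for the learned embeddings and vector database a real system would use; the documents, prompt template, and function names are hypothetical, and the final call to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (a crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    return sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]


def build_prompt(query, documents, k=1):
    """Augment the user's question with retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}\nAnswer:"


# Hypothetical knowledge base
DOCS = [
    "A feature store stores, manages, and serves features for ML models.",
    "A model registry is a central repository for storing and versioning models.",
    "Data drift is a change in the statistical properties of input data.",
]

prompt = build_prompt("What does a feature store do?", DOCS)
print(prompt)  # this string would then be sent to an LLM (not shown)
```

Grounding the answer in retrieved text is one common way to reduce hallucination, since the model is asked to answer from supplied context rather than from memory alone.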

### Neural Network Concepts

- **Neural Network**: Computing system inspired by biological neural networks
- **Deep Learning**: Subset of ML using neural networks with multiple layers
- **Neuron/Node**: Basic computational unit in a neural network
- **Weight**: Parameter associated with connections between neurons
- **Activation Function**: Function determining the output of a neuron
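- **Backpropagation**: Algorithm that computes the gradient of the loss with respect to every weight by applying the chain rule backward through the network
- **Gradient Descent**: Optimization algorithm that minimizes the loss by repeatedly adjusting parameters in the direction opposite to its gradient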
- **Epoch**: One complete pass through the entire training dataset
- **Batch**: Subset of training data processed together
- **Layer**: Group of neurons processing input and producing output
- **Hidden Layer**: Layer between input and output layers
- **Convolutional Neural Network (CNN)**: Neural network specialized for image processing
- **Recurrent Neural Network (RNN)**: Neural network with feedback connections for sequential data
- **Transformer**: Neural network architecture using self-attention mechanisms
- **Attention Mechanism**: Technique allowing models to focus on relevant parts of input
- **Encoder-Decoder**: Architecture consisting of modules for encoding input and decoding output
- **Embedding**: Dense vector representation of discrete variables
- **Tokenization**: Process of converting text into discrete tokens for processing
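Several of the terms above (weight, activation function, backpropagation, gradient descent, epoch, batch) meet in the training loop of a single neuron. This is a minimal sketch, assuming a sigmoid activation, binary cross-entropy loss, zero-initialized weights, and the logical OR function as toy data; `train_neuron` and the learning-rate, epoch, and batch-size values are illustrative choices, not recommendations.

```python
import math


def sigmoid(z):
    """Activation function squashing the weighted sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, lr=0.5, epochs=2000, batch_size=2):
    w = [0.0, 0.0]  # weights on the two input connections
    b = 0.0         # bias
    for _ in range(epochs):  # one epoch = one full pass over the data
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # mini-batch
            grad_w = [0.0, 0.0]
            grad_b = 0.0
            for x, y in batch:
                p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # forward pass
                # Backpropagation: for sigmoid + cross-entropy, dLoss/dz = p - y,
                # and the chain rule gives dLoss/dw_i = (p - y) * x_i
                dz = p - y
                grad_w[0] += dz * x[0]
                grad_w[1] += dz * x[1]
                grad_b += dz
            n = len(batch)
            # Gradient descent step using the batch-averaged gradient
            w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
            b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(OR_DATA)
for x, y in OR_DATA:
    print(x, y, round(predict(w, b, x), 3))
```

A deep network repeats the same forward/backward pattern across many layers; frameworks such as PyTorch and TensorFlow automate the gradient computation that is written out by hand here.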

### Causal AI Concepts

- **Causal AI**: AI systems designed to understand and model cause-and-effect relationships rather than just statistical correlations
- **Causality**: Study of how one event influences the occurrence of another event
- **Causal Inference**: Process of determining cause-and-effect relationships from data
- **Counterfactual**: Hypothetical scenario describing what would have happened under different conditions
- **Structural Causal Model (SCM)**: Mathematical framework for representing causal relationships
- **Causal Graph/Diagram**: Visual representation of causal relationships between variables
- **Directed Acyclic Graph (DAG)**: Graph with directed edges and no cycles, used to represent causal relationships
- **Confounding Variable**: Factor that influences both the cause and effect, potentially leading to spurious correlations
- **Treatment Effect**: Causal effect of an intervention or treatment on an outcome
- **Average Treatment Effect (ATE)**: Average effect of a treatment across an entire population
- **Do-Calculus**: Set of rules for manipulating causal graphs to identify causal effects
- **Instrumental Variable**: Variable that affects the treatment but influences the outcome only through the treatment, used to estimate causal effects when unobserved confounding is present
- **Propensity Score**: Probability of receiving treatment based on observed covariates
- **Rubin Causal Model**: Framework for causal inference based on potential outcomes
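- **Granger Causality**: Statistical concept where past values of one time series improve prediction of another beyond that series' own past values (a predictive relationship, not necessarily a causal one)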
- **Natural Experiment**: Observational study where treatment assignment resembles random assignment
- **Randomized Controlled Trial (RCT)**: Experimental design where subjects are randomly assigned to treatment groups
- **Causal Discovery**: Automated identification of causal relationships from observational data
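The gap between correlation and causation, and how adjusting for a confounder closes it, can be shown on simulated data. This minimal sketch assumes a single binary confounder `Z` that raises both the chance of treatment `T` and the outcome `Y`, with a true treatment effect of 2.0 set by construction; all numbers and function names are hypothetical. The naive difference in means is biased, while stratifying on `Z` and weighting by `P(Z)` (back-door adjustment) recovers the average treatment effect.

```python
import random


def simulate(n=100_000, seed=0):
    """Generate (z, t, y) rows where Z confounds the T -> Y relationship."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0                  # confounder
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0  # Z makes treatment more likely
        y = 2.0 * t + 3.0 * z + rng.gauss(0.0, 1.0)         # true causal effect of T is 2.0
        rows.append((z, t, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean outcome between treated and untreated (biased here)."""
    return mean([y for _, t, y in rows if t == 1]) - mean([y for _, t, y in rows if t == 0])


def adjusted_effect(rows):
    """Back-door adjustment: ATE = sum_z P(z) * (E[Y|T=1,z] - E[Y|T=0,z])."""
    n = len(rows)
    ate = 0.0
    for z_val in (0, 1):
        stratum = [(t, y) for z, t, y in rows if z == z_val]
        p_z = len(stratum) / n
        treated = mean([y for t, y in stratum if t == 1])
        control = mean([y for t, y in stratum if t == 0])
        ate += p_z * (treated - control)
    return ate


rows = simulate()
naive = naive_effect(rows)
adjusted = adjusted_effect(rows)
print(f"naive: {naive:.2f}  adjusted: {adjusted:.2f}")
```

With these settings the naive estimate lands near 3.8 because treated units are disproportionately those with `Z = 1`. Libraries such as DoWhy and EconML generalize this idea to many covariates and estimators.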

## 2. MLOps Terminology and Concepts

### MLOps General Concepts

- **MLOps**: Practices combining Machine Learning, DevOps, and Data Engineering
- **ML Lifecycle**: End-to-end process of developing, deploying, and maintaining ML models
- **Model Registry**: Central repository for storing and versioning models
- **Feature Store**: System for storing, managing, and serving features
- **Model Versioning**: Tracking different iterations of models
- **Experiment Tracking**: Recording parameters, metrics, and artifacts during model development
- **A/B Testing**: Comparing the performance of different models or features by randomly splitting live traffic between variants
- **Model Monitoring**: Tracking model performance in production
- **Drift Detection**: Identifying changes in data or model performance over time
- **Concept Drift**: Change in the relationship between the input data and the target variable, so that the patterns a model learned no longer hold
- **Data Drift**: Change in the statistical properties of the input data
- **CI/CD for ML**: Continuous Integration/Continuous Deployment adapted for ML workflows
- **Model Governance**: Policies and procedures for managing models throughout their lifecycle
- **Model Explainability**: Making model decisions understandable to humans
- **ML Metadata**: Information about datasets, models, and experiments
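Data drift detection often compares the distribution of a feature at training time against what the model sees in production. This minimal sketch assumes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) as the drift measure and a fixed threshold of 0.1; both the threshold and the `detect_drift` helper are hypothetical choices. In practice, `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Maximum absolute difference between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    stat = 0.0
    # The maximum gap always occurs at one of the observed values
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        stat = max(stat, abs(cdf_ref - cdf_cur))
    return stat


def detect_drift(reference, current, threshold=0.1):
    """Flag drift when the distributions differ by more than the threshold."""
    return ks_statistic(reference, current) > threshold


training_feature = list(range(100))           # distribution seen at training time
production_same = list(range(100))            # unchanged production data
production_shifted = list(range(50, 150))     # production data shifted upward

print(ks_statistic(training_feature, production_shifted))
print(detect_drift(training_feature, production_same),
      detect_drift(training_feature, production_shifted))
```

Note that this check only covers data drift in a single input feature; concept drift usually requires labeled production data to detect, for example via a drop in accuracy.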

### Data Engineering Concepts

- **Data Pipeline**: Series of processing steps to transform raw data
- **ETL (Extract, Transform, Load)**: Process of collecting, transforming, and storing data
- **Data Warehouse**: System for storing and analyzing structured data
- **Data Lake**: Repository for storing structured and unstructured data
- **Data Lakehouse**: Combines elements of data warehouses and data lakes
- **Data Catalog**: Inventory of available data assets
- **Data Lineage**: Documentation of data's origins and transformations
- **Data Version Control**: Tracking changes to datasets over time
- **Data Quality**: Measures of data's fitness for use
- **Data Validation**: Checking data against defined rules and constraints
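Data validation amounts to checking each record against declared rules before it enters a pipeline. This minimal sketch assumes rules written as `(field, check, message)` tuples; the schema (`user_id`, `age`, `email`) and the `validate_record` function are hypothetical. Tools such as Great Expectations provide a much richer, declarative version of the same idea.

```python
def validate_record(record, rules):
    """Return a list of human-readable errors (empty if the record is valid)."""
    errors = []
    for field, check, message in rules:
        if field not in record or record[field] is None:
            errors.append(f"{field}: missing")
        elif not check(record[field]):
            errors.append(f"{field}: {message}")
    return errors


# Hypothetical constraints for a user table
RULES = [
    ("user_id", lambda v: isinstance(v, int) and v > 0, "must be a positive integer"),
    ("age", lambda v: isinstance(v, (int, float)) and 0 <= v <= 120, "must be between 0 and 120"),
    ("email", lambda v: isinstance(v, str) and "@" in v, "must contain '@'"),
]

good = {"user_id": 1, "age": 30, "email": "a@example.com"}
bad = {"user_id": -1, "age": 200}

print(validate_record(good, RULES))
print(validate_record(bad, RULES))
```

Collecting every error instead of stopping at the first one makes the report more useful when a batch fails validation.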

### Model Deployment Concepts

- **Model Serving**: Making models available for inference
- **Inference API**: Interface for interacting with deployed models
- **Model Containerization**: Packaging models with dependencies
- **Orchestration**: Managing and automating workflows
- **Scaling**: Adjusting resources based on demand
- **Microservices**: Architecture using small, independent services
- **Edge Deployment**: Running models on edge devices
- **Model Compression**: Reducing model size for efficient deployment
- **Quantization**: Reducing the numerical precision of model parameters (e.g., from 32-bit floats to 8-bit integers)
- **Pruning**: Removing unnecessary connections in neural networks
- **Knowledge Distillation**: Training a smaller model to mimic a larger one
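Quantization can be illustrated with the simplest scheme: symmetric, per-tensor int8 quantization, where a single scale maps the largest-magnitude weight to 127. This is a minimal sketch under that assumption; real toolkits also use zero points, per-channel scales, and calibration data, and the `quantize_int8` and `dequantize` names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print([round(r, 4) for r in restored])
```

Each weight now needs 1 byte instead of 4, and rounding keeps the per-weight error within half a quantization step (`scale / 2`). Small values such as `0.003` collapse to zero, which is the precision cost of the smaller representation.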

## 3. Python Frameworks for MLOps

### Data Processing and Engineering

- **Pandas**: Data manipulation and analysis
- **NumPy**: Numerical computing with arrays and matrices
- **Polars**: Fast DataFrame library (alternative to Pandas)
- **Dask**: Parallel computing with larger-than-memory datasets
- **PySpark**: Python API for Apache Spark (distributed computing)
- **Great Expectations**: Data validation and documentation
- **Delta Lake**: Storage layer for lakehouse architecture

### Machine Learning and Model Training

- **Scikit-learn**: General-purpose ML library
- **PyTorch**: Deep learning framework
- **TensorFlow**: Deep learning framework
- **JAX**: High-performance numerical computing with automatic differentiation and JIT compilation
- **Keras**: High-level neural networks API
- **XGBoost**: Gradient boosting framework
- **LightGBM**: Gradient boosting framework
- **CatBoost**: Gradient boosting framework
- **Hugging Face Transformers**: Pre-trained models for NLP and computer vision
- **Fastai**: High-level deep learning library
- **PyTorch Lightning**: Lightweight PyTorch wrapper for research

### Causal Inference and Causal AI Frameworks

- **DoWhy**: End-to-end library for causal inference
- **EconML**: Library for estimating heterogeneous treatment effects
- **CausalML**: Suite of uplift modeling and causal inference methods
- **CausalNex**: Library for causal reasoning and Bayesian Networks
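- **CausalImpact**: Inferring the causal effect of an intervention on a time series with Bayesian structural time-series models (originally an R package; Python ports are available)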
- **PyMC**: Probabilistic programming for Bayesian modeling
- **CausalPy**: Causal inference for quasi-experimental designs (e.g., synthetic control, difference-in-differences), built on PyMC
- **WhyNot**: Framework for causal counterfactual analysis

### Experiment Tracking and Model Development

- **MLflow**: Platform for ML lifecycle management
- **Weights & Biases**: Experiment tracking and visualization
- **TensorBoard**: Visualization toolkit for TensorFlow (also usable from PyTorch)
- **Optuna**: Hyperparameter optimization framework
- **Ray Tune**: Hyperparameter tuning at scale

### Model Deployment and Serving

- **FastAPI**: Web framework for building APIs
- **Flask**: Lightweight web framework
- **TorchServe**: Serving framework for PyTorch models
- **TensorFlow Serving**: Serving system for TensorFlow models
- **Seldon Core**: Platform for deploying ML models on Kubernetes
- **Ray Serve**: Framework for scalable model serving
- **Gradio**: Library for creating UIs for ML models
- **Streamlit**: App framework for ML and data science

### MLOps Orchestration and Pipelines

- **Airflow**: Platform for orchestrating workflows
- **Prefect**: Workflow management system
- **Kubeflow**: ML toolkit for Kubernetes
- **Luigi**: Pipeline management framework
- **Kedro**: Framework for structuring reproducible, maintainable data science code and pipelines

### Model Explainability and Interpretability

- **SHAP (SHapley Additive exPlanations)**: Game theoretic approach to explain model output
- **LIME (Local Interpretable Model-agnostic Explanations)**: Explaining predictions of any classifier
- **InterpretML**: Package for training interpretable models and explaining blackbox systems
- **ELI5**: Library for debugging/inspecting machine learning classifiers

### LLM and GenAI Frameworks

- **LangChain**: Framework for developing applications with LLMs
- **LlamaIndex**: Data framework for LLM applications
- **Haystack**: Framework for building search systems and question answering
