<a href="https://colab.research.google.com/github/mugalan/working/blob/main/Projects_2026.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simulation and Experimental Verification of a Momentum Based PID Controller for Rigid Body Tracking


## Overview

This project aims at the simulation and experimental verification of a **geometric, almost-global, locally exponential PID controller for fully actuated rigid body systems**.

A key feature of this controller is that it leverages the **linearity of the momentum equations**. When the error dynamics are formulated in terms of momentum, the control design reduces to applying a **standard PID structure** to a linear system, despite the nonlinear configuration space. This considerably simplifies the controller implementation while preserving geometric correctness.
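
To make the linearity claim concrete, consider the purely rotational case as an illustration. The symbols below ($R$, $\mathbb{I}$, $\Omega$, $\pi$, $\tau$) are notation assumed here for the sketch and are not taken from the cited papers. Expressed in the inertial frame, Euler's law for the angular momentum reads:

```latex
\begin{align}
\pi \triangleq R\,\mathbb{I}\,\Omega, \qquad \dot{\pi} = \tau ,
\end{align}
```

Here $R \in SO(3)$ is the attitude, $\mathbb{I}$ the body inertia tensor, $\Omega$ the body angular velocity, and $\tau$ the net torque in the inertial frame. The momentum equation is linear in $\pi$ even though the kinematics $\dot{R} = R\,\widehat{\Omega}$ evolve on a nonlinear manifold.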

This controller achieves **almost global coordinate-free tracking** of desired rigid body trajectories, avoiding singularities and ambiguities associated with parameterizations of manifolds.


Because of the topology of Lie groups such as the rotation group $SO(3)$, **global asymptotic stabilization** is impossible with continuous state feedback. However, this controller achieves **almost-global locally exponential (AGLE) convergence**, meaning:


* The desired configuration is **asymptotically stable** from almost all initial conditions.
* The only exceptions form a measure-zero set of initial conditions, namely those lying on the stable manifolds of the **unstable saddle points** of the error function.
* The convergence is locally exponential.

This is the best possible result achievable with smooth feedback on general Lie groups, making the controller **theoretically optimal** under the constraints of continuous control.
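
The topological obstruction can be seen numerically. The sketch below uses a commonly used trace-type attitude error function with hypothetical weights `K`; the cited papers define their own error functions, so this is only an illustration. It shows that the gradient vanishes not only at the identity but also at a 180-degree rotation, which is an additional critical point that continuous feedback cannot remove.

```python
import numpy as np

K = np.diag([1.0, 2.0, 3.0])  # hypothetical distinct weights (assumption)

def rot_x(angle):
    """Rotation matrix for a rotation by `angle` (rad) about the x-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def error_function(R_e):
    """Weighted trace error: zero at the identity, positive elsewhere."""
    return 0.5 * np.trace(K @ (np.eye(3) - R_e))

def error_gradient(R_e):
    """Vector form of the skew part of K R_e; it vanishes at critical points."""
    S = 0.5 * (K @ R_e - R_e.T @ K)
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

for deg in (0, 90, 180):
    R_e = rot_x(np.radians(deg))
    print(deg, error_function(R_e), np.linalg.norm(error_gradient(R_e)))
```

The printed gradient norm is nonzero at 90 degrees and numerically zero at both 0 and 180 degrees, even though the error itself is nonzero at 180 degrees.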

## Objective
The project aims to explore the robustness of the controller to parameter uncertainties and unmeasured disturbances, using realistic simulations and experiments on the twin-rotor setup in the laboratory.


## References

* D.H.S. Maithripala, Jordan M. Berg,
An intrinsic PID controller for mechanical systems on Lie groups, Automatica, Volume 54, 2015, Pages 189-200, ISSN 0005-1098,
[PDF](https://www.sciencedirect.com/science/article/pii/S0005109815000060)

* Rama Seshan Chandrasekaran, Ravi N. Banavar, Arun D. Mahindrakar, D.H.S. Maithripala,
Geometric PID controller for stabilization of nonholonomic mechanical systems on Lie groups, Automatica, Volume 165, 2024, 111658, ISSN 0005-1098, [PDF](https://www.sciencedirect.com/science/article/pii/S0005109824001511)

* D. H. S. Maithripala, J. M. Berg and W. P. Dayawansa, "Almost-global tracking of simple mechanical systems on a general class of Lie Groups," in IEEE Transactions on Automatic Control, vol. 51, no. 2, pp. 216-225, Feb. 2006, doi: 10.1109/TAC.2005.862219. [PDF](https://ieeexplore.ieee.org/abstract/document/1593897)

* https://github.com/mugalan/intrinsic-rigid-body-control-estimation

# Simulation and Experimental Verification of an Intrinsic Extended Kalman Filter for Rigid Body Attitude Tracking

## Overview

This project aims at the experimental validation of a **geometrically consistent formulation** of the EKF on the group of rigid body motions $SE(3)$.

Instead of running an EKF in a Euclidean chart (where you must pick coordinates and retractions), the **IEKF** keeps the state on the group $G$ (e.g., $SO(3)$ or $SE(3)$) and defines the estimation error **invariantly** using group operations. One then propagates the state with the group exponential, places the process noise in the Lie algebra, and linearizes the **error dynamics** (not the state), so that the linearized model depends only on the inputs and not on the current estimate. This yields better consistency, especially under large rotations or fast motion.

### Why IEKF (vs. "plain" EKF)

* **Geometric consistency:** The state stays on $G$; updates use $\exp$ instead of add-then-reproject.
* **Trajectory-independent linearization:** Invariant error dynamics make the filter parameters depend on the inputs and not on the current estimate. This reduces linearization bias and improves consistency under aggressive motion.
* **Robustness at high rates/large rotations:** The error lives in the vector space of a Lie algebra, even when the state undergoes large motions on $G$.
* **Clean observability structure:** The invariant formulation exposes what is observable from the inputs and outputs.
* **Implementation clarity:** Same pattern for different types of rigid body motion.
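
As an illustration of the points above, the following is a minimal sketch of one prediction step on $SO(3)$. It assumes a body-frame gyro rate `omega`, the left-invariant error $\eta = \hat{R}^\top R$, and a first-order discretization of the process noise. The function names are hypothetical and this is not the implementation from the referenced repository.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_so3(phi):
    """Rodrigues' formula: exponential map from a rotation vector to SO(3)."""
    theta = np.linalg.norm(phi)
    K = hat(phi)
    if theta < 1e-9:
        return np.eye(3) + K  # first-order approximation near zero
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta**2 * (K @ K))

def propagate(R_hat, P, omega, dt, Q):
    """One prediction step with the left-invariant error eta = R_hat^T R."""
    step = exp_so3(omega * dt)
    R_next = R_hat @ step          # group update: the estimate stays on SO(3)
    F = step.T                     # error transition depends on omega only
    P_next = F @ P @ F.T + Q * dt  # first-order noise discretization (assumption)
    return R_next, P_next

R_hat = np.eye(3)
P = 0.01 * np.eye(3)
Q = 1e-4 * np.eye(3)
omega = np.array([0.3, -2.0, 5.0])  # fast rotation, rad/s
for _ in range(1000):
    R_hat, P = propagate(R_hat, P, omega, dt=0.01, Q=Q)
print(np.linalg.norm(R_hat.T @ R_hat - np.eye(3)))
```

Note that `F` is built from `omega` alone, so the covariance propagation is the same whatever the current attitude estimate is, which is the trajectory-independence property listed above.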


## Objective

This project has two parts. **First**, we will validate the IEKF for rigid-body attitude, and an extension with **dead-reckoning** for short-term position estimation on a **public dataset** (e.g., M2DGR / Oxford RobotCar) to benchmark the filters under controlled, repeatable conditions. **Then**, we will experimentally verify the filter on hardware using IMU measurements with the [BNO055](https://www.adafruit.com/product/2472?srsltid=AfmBOoosqAqLrgpJstjU4Gk0sP7RAk6JLxFUTCUPnOxTxecSNgejZiNz), comparing accuracy across large-angle and high-rate maneuvers.


## References


* https://github.com/mugalan/intrinsic-rigid-body-control-estimation
* **Barrau, A. & Bonnabel, S.**
  *The Invariant Extended Kalman Filter as a Stable Observer.* **SICON**, 2017. (Foundational IEKF theory.)
  *The Invariant EKF: Theory and Applications to Robotics.* (tutorial/overview, 2018-2020; see arXiv:1410.1465 and follow-ups.)
* **Bonnabel, S.** *Left-Invariant Extended Kalman Filter on Lie Groups.* **CDC**, 2007. (Early invariant observer.)
* **Bloesch, M., et al.** *A Tutorial on SE(3) Estimation.* (Various tutorials; error-state filtering on manifolds.)
* **Sola, J.** *Quaternion kinematics for the error-state Kalman filter.* 2017. (Clear notes on attitude ES-KF/IEKF.)
* **Barfoot, T.** *State Estimation for Robotics.* 2017. (Manifold/SE(3) estimation primer.)
* **Forster, C., et al.** *IMU Preintegration on Manifolds.* **TRO**, 2017. (Closely related manifold filtering.)
* [BNO055](https://www.adafruit.com/product/2472?srsltid=AfmBOoosqAqLrgpJstjU4Gk0sP7RAk6JLxFUTCUPnOxTxecSNgejZiNz)

## Public Datasets: 9-axis IMU, GPS, and Ground Truth

- **M2DGR (SJTU):** ground robot; **9-axis IMU (Handsfree A9)**, **u-blox GNSS**, and high-accuracy GT (Vicon/RTK depending on sequence).  
  [Dataset & paper](https://github.com/SJTU-ViSYS/M2DGR)

- **Oxford RobotCar:** vehicle platform; automotive IMU + GPS/INS, with **centimetre-level RTK ground truth** add-on sets. *(No magnetometer in OXTS; still excellent GPS+IMU+GT.)*  
  [Dataset](https://robotcar-dataset.robots.ox.ac.uk/) · [RTK Ground Truth](https://robotcar-dataset.robots.ox.ac.uk/ground_truth/)

- **NCLT (UMich):** long-term campus drives; multiple IMUs, GPS (standard & RTK), LiDAR/vision; strong GT resources. *(Mag availability varies; treat as IMU+GPS+GT.)*  
  [Dataset](https://robots.engin.umich.edu/nclt/)

- **UrbanNav (PolyU/IPNL):** urban canyon focus; **u-blox F9P GNSS**, dedicated IMU, **Applanix POS LV** GT (vehicle). *(IMU typically 6-axis; mag not guaranteed.)*  
  [Dataset](https://www.ipin-conference.org/urbaNav-dataset/)

- **KITTI:** AV classic; OXTS GPS/IMU and lidar/camera-derived GT. *(IMU high-grade; no magnetometer.)*  
  [Dataset](http://www.cvlibs.net/datasets/kitti/)

- **RoNIN:** smartphone inertial navigation; **accel/gyro/mag** with trajectories; some logs include GPS fields.  
  [Dataset](https://ronin.cs.sfu.ca/)

- **IDOL:** iPhone **accel/gyro/mag** + GT from LiDAR-VIO rig (Kaarta Stencil); indoor trajectories (GPS not primary).  
  [Dataset](https://github.com/NIH-CCB-IDOL/IDOL)


# Development of a Simulation Platform for Testing Distributed Control Architectures of Decentralized Rigid Body Multi-Agent Systems

## 📘 Project Overview

This project aims to develop a **scalable, physics-faithful simulation platform** for studying and verifying **distributed control architectures** in **decentralized rigid-body multi-agent systems** evolving on the Lie group $SE(3)$.  
Each agent represents a fully actuated rigid body, such as a spacecraft or an omnidirectional multirotor, with six degrees of freedom and independent local dynamics, control, and communication interfaces.

The platform will enable:
- **Realistic multibody dynamics**, using accurate physics rather than kinematic approximations.  
- **Scalable simulations** with tens or hundreds of agents running in parallel.  
- **Decentralized control architectures**, where each agent executes its own control law and exchanges information only with neighboring agents.  
- **Configurable communication topologies**, supporting fixed, random, or time-varying network graphs.  
- **Distributed synchronization and consensus experiments**, under realistic conditions including link dropouts, delays, and asynchronous updates.

This environment will serve as a research-grade testbed for developing and evaluating **geometric control laws**, **formation maintenance**, and **multi-agent coordination** strategies on $SE(3)$.

---

### ⚙️ Motivation

While several robotic simulators exist, including **Gazebo**, **PyBullet**, **CoppeliaSim**, and **Drake** itself, to our knowledge none currently provides a **unified framework** that combines:

- Full **Lie-group-based rigid-body dynamics**,  
- Modular **multi-agent abstraction**,  
- Explicit **distributed communication models**, and  
- Scalable **parallel execution**.

Existing tools are typically centralized, designed for single-robot physics, or rely on simplified kinematic models without intrinsic geometric structure.  
Thus, there is **no single platform** that accommodates *decentralized control*, *realistic physics*, and *agent-level communication* within one coherent simulation framework.

---

### 🧠 Why Drake (MIT)?

**Drake (MIT)** is chosen as the base of this platform because it provides:
- **Native SE(3) multibody dynamics**, handled via the `MultibodyPlant` framework.  
- **Accurate inertial, constraint, and contact modeling**, suitable for aerial and space robots.  
- **System/Diagram abstraction**, allowing modular composition of multiple interacting subsystems.  
- **Symbolic and differentiable mechanics**, useful for verifying Lyapunov-based stability and control laws.  
- **Python and C++ APIs**, making it ideal for coupling with high-level multi-agent orchestration.

Drake's modular structure allows each agent to be represented as an independent **`MultibodyPlant` + controller subsystem**, connected through a **communication graph layer** that defines inter-agent information flow.

---

### 🧩 How This Project Extends Drake

The proposed system will **extend Drake** by developing an upper-layer framework that manages:
1. **Agent objects:** each encapsulating a local instance of Drake's `MultibodyPlant`, controller logic, and state observer.  
2. **Communication network:** a distributed message-passing system using `networkx` and `asyncio` or `mpi4py` to simulate realistic topologies and delays.  
3. **Global orchestrator:** a parallel simulation manager coordinating agents' time steps and interactions (for example, using `LangGraph`).
4. **Visualization tools:** live rendering via Drake's `MeshCat` and additional 3D plotting of communication graphs, trajectories, and formation metrics.

Together, these extensions will enable **realistic, decentralized, and scalable multi-agent simulations**, bridging the gap between **theoretical geometric control** and **practical distributed robotics** experimentation.
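
The communication-network layer can be prototyped independently of the physics. The sketch below is a pure-Python stand-in under stated assumptions: a fixed ring topology, a scalar state per agent in place of a pose on $SE(3)$, synchronous rounds, and symmetric random link dropouts. All function names are hypothetical. In the full platform, `networkx` would supply the graph and the update would use group operations.

```python
import random

def ring_graph(n):
    """Adjacency lists for an undirected ring of n agents."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def consensus_step(values, graph, gain, drop_prob, rng):
    """One synchronous round of neighbor-only averaging with link dropouts."""
    # Decide once per undirected link whether it is active this round,
    # so that both endpoints experience the same dropout.
    active = {}
    for i, nbrs in graph.items():
        for j in nbrs:
            key = (min(i, j), max(i, j))
            if key not in active:
                active[key] = rng.random() >= drop_prob
    new_values = {}
    for i, nbrs in graph.items():
        correction = sum(values[j] - values[i]
                         for j in nbrs if active[(min(i, j), max(i, j))])
        new_values[i] = values[i] + gain * correction
    return new_values

rng = random.Random(0)
graph = ring_graph(6)
values = {i: float(i) for i in graph}
for _ in range(300):
    values = consensus_step(values, graph, gain=0.3, drop_prob=0.2, rng=rng)
spread = max(values.values()) - min(values.values())
mean = sum(values.values()) / len(values)
print(spread, mean)
```

Because each dropout removes a link in both directions, the network average is preserved from round to round, so the agents agree on the initial mean despite the unreliable links.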

---


## References

* D. H. S. Maithripala, J. M. Berg, D. H. A. Maithripala and S. Jayasuriya,  **"A geometric virtual structure approach to decentralized formation control,"**  *2014 American Control Conference*, Portland, OR, USA, 2014, pp. 5736-5741.  [PDF](https://ieeexplore.ieee.org/abstract/document/6859451)

* D. H. S. Maithripala and J. M. Berg,  **"An intrinsic PID controller for mechanical systems on Lie groups,"**  *Automatica*, Vol. 54, 2015, pp. 189-200.  [PDF](https://www.sciencedirect.com/science/article/pii/S0005109815000060)

* Rama Seshan Chandrasekaran, Ravi N. Banavar, Arun D. Mahindrakar, D. H. S. Maithripala,  **"Geometric PID controller for stabilization of nonholonomic mechanical systems on Lie groups,"**  *Automatica*, Vol. 165, 2024, 111658. [PDF](https://www.sciencedirect.com/science/article/pii/S0005109824001511)

* D. H. S. Maithripala, J. M. Berg, W. P. Dayawansa,  **"Almost-global tracking of simple mechanical systems on a general class of Lie Groups,"**  *IEEE Transactions on Automatic Control*, Vol. 51, No. 2, pp. 216-225, 2006. [PDF](https://ieeexplore.ieee.org/abstract/document/1593897)

* Reza Olfati-Saber and Naomi Ehrich Leonard,  **"Consensus and Cooperation in Networked Multi-Agent Systems,"**  *Proceedings of the IEEE*, Vol. 95, No. 1, 2007, pp. 215-233.  [PDF](https://ieeexplore.ieee.org/document/4118472)

* https://github.dev/mugalan/multi-agents-on-a-lie-group

# From Insight to Forecast: Bayesian & Kalman Models for Garment Productivity

## Overview

Build a rigorous, time-aware analytics and forecasting pipeline, grounded in **Bayesian modeling** and **Kalman filtering (Dynamic Linear Models)**, to **explain**, **predict**, and **optimize** team-day productivity in a garment factory using the [UCI dataset](https://archive.ics.uci.edu/ml/datasets/Productivity%2BPrediction%2Bof%2BGarment%2BEmployees).

The results will be compared with the findings of the paper [*Mining the productivity data of the garment industry*](https://dl.acm.org/doi/abs/10.1504/ijbidm.2021.118183). Specifically, we will **cross-validate our Bayesian findings against the paper's baselines** on the questions it actually studied:

* **Drivers of low productivity:** we will compare our posterior effect sizes and interaction terms (e.g., incentive, WIP, workers, and incentive × WIP) with the paper's rule-based importance and thresholds (notably the ~69.5 BDT incentive split).
* **Classification performance:** for both **3-class** (low/moderate/normal) and **2-class** (low vs. not-low) formulations, we will report accuracy/AUC on the paper's split and benchmark against its top models (tree ensemble; GBT + SMOTE).
* **Actionable thresholds:** we will check whether our Bayesian "what-if" and monotone/spline estimates imply breakpoints consistent with the paper's rules.
* **Class-imbalance handling:** while the Bayesian models will not use SMOTE, we will compare our calibrated probability outputs and cost-sensitive results to the paper's oversampled 2-class scores to assess whether we match or improve their detection of "low" days.

Time-aware forecasting, drift, and causal policy optimization are outside the paper's scope and will be presented as added value rather than direct validations.

We will also validate the method against another dataset: **SECOM (UCI, semiconductor manufacturing).** The [SECOM dataset](https://archive.ics.uci.edu/ml/datasets/SECOM) contains **1,567 runs × 590+ anonymized sensor features** with a **pass/fail yield label** (-1 = pass, 1 = fail) and **timestamped records with substantial missingness and class imbalance**, making it a strong proxy for line-productivity analytics and well suited to **Bayesian GLMs with sparsity priors**, **Dynamic Linear Models/Kalman filtering** (to capture sensor or per-tool drift), and **time-aware validation**. Numerous studies use SECOM to benchmark preprocessing and modeling strategies, e.g., a **comprehensive evaluation** on imbalanced, high-dimensional industrial data that analyzes SECOM alongside other sets ([Salem et al., 2018](https://www.mdpi.com/2504-2289/2/4/30)) and recent **preprocessing reviews** focused on missing-data handling and rebalancing for semiconductor processes ([Park, 2024](https://pmc.ncbi.nlm.nih.gov/articles/PMC11398254/)). For drift-aware workflows, Kalman-based drift suppression in industrial sensing provides a template that can be adapted to SECOM's multivariate signals ([Arpaia et al., 2022](https://www.mdpi.com/1424-8220/22/1/182)).

## Questions to be addressed (feasible & high-value)

**Descriptive/diagnostic**

* How do **incentive**, **WIP**, **overtime**, **SMV**, and **staffing** relate to productivity? Are effects **non-linear** (e.g., diminishing returns to incentive)?
* Do **interactions** (e.g., **WIP × workers**, **incentive × WIP**) meaningfully shift outcomes?
* Are there **calendar effects** (weekday, quarter-of-month) and **style-change dips**?

**Predictive**

* What is each team's **one-day-ahead** productivity forecast with calibrated uncertainty?
* What is the **probability of a low-productivity day** tomorrow, $\Pr(y_{t+1}<\tau)$?

**Prescriptive/what-if**

* Expected uplift from **raising incentives** by $\Delta$, **reducing WIP** by $\Delta$, or **adding $k$ workers**.
* Simple, actionable **thresholds** (e.g., incentive $\geq \tau$, WIP $\leq \tau$) that minimize low-productivity risk.

**Stability/heterogeneity**

* Do the effects of **incentive/WIP** **drift over time** (change-points)?
* Which **teams** benefit most/least from specific levers (heterogeneous effects)?

## Dataset & Granularity

* Unit: **team × day** (Jan-Mar 2015).
* Target: **actual_productivity $\in [0,1]$**.
* Key covariates: incentive, WIP (with missingness), over_time, SMV, no_of_workers, no_of_style_change, plus department, day, quarter.

## Methodology (Bayes + Kalman, end-to-end)

### 1) Leakage-safe time splits

Train up to date **$T$**, validate on **$(T,\;T+\Delta]$**, test on a **held-out future block**. Group by **team** where needed.
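
The split rule above can be sketched in a few lines. The record layout `(day, team, value)` and the function name are assumptions made for this example, and the data are synthetic; the point is only that every partition is defined by date, so no future row can leak into training.

```python
from datetime import date, timedelta

def time_split(rows, train_end, val_days):
    """Split (day, team, value) rows into train / validation / test by date only."""
    val_end = train_end + timedelta(days=val_days)
    train = [r for r in rows if r[0] <= train_end]
    val = [r for r in rows if train_end < r[0] <= val_end]
    test = [r for r in rows if r[0] > val_end]
    return train, val, test

# Synthetic team-day records (hypothetical values, two teams, 30 days).
start = date(2015, 1, 1)
rows = [(start + timedelta(days=d), team, 0.5 + 0.01 * d)
        for d in range(30) for team in (1, 2)]
train, val, test = time_split(rows, train_end=date(2015, 1, 20), val_days=5)
print(len(train), len(val), len(test))
```

Both teams share the same cut dates here; a per-team variant would apply the same function to each team's rows separately.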

### 2) Data model (two complementary lenses)

**A. Bayesian Beta (or Logit-Normal) Regression**

* **Likelihood:** $y_t \in (0,1)$ via Beta regression with $\text{logit}(\mu_t)=X_t^\top \beta + u_{\text{team}}$.
* **Priors:** weakly informative; hierarchical **team effects** $u_{\text{team}}$.
* **Nonlinearity:** monotone splines for **incentive**; interaction terms (**WIP × workers**, **incentive × WIP**).
* **Missing WIP:** model jointly (latent) or include a missingness indicator; compare both.
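
For reference, one common way to write the Beta likelihood is the mean-precision parameterization. The precision symbol $\phi$ and the normal prior on the team effects with scale $\sigma_u$ are assumptions introduced here for illustration; the final model may use different priors.

```latex
\begin{align}
y_t \mid \mu_t, \phi &\sim \mathrm{Beta}\big(\mu_t \phi,\; (1-\mu_t)\phi\big), \\
\mathbb{E}[y_t] &= \mu_t, \qquad \mathrm{Var}[y_t] = \frac{\mu_t(1-\mu_t)}{1+\phi}, \\
u_{\text{team}} &\sim \mathcal{N}(0, \sigma_u^2).
\end{align}
```

With this form the regression acts on the mean $\mu_t$ through the logit link, while $\phi$ controls how tightly observations concentrate around that mean.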

**B. Dynamic Linear Model (Kalman) = Time-varying Linear Regression**

* Transform $y_t$ via logit to $z_t \in \mathbb{R}$.
* **Observation:** $z_t = X_t^\top \theta_t + v_t,\quad v_t\sim\mathcal{N}(0,R)$.
* **State (drift):** $\theta_t = \theta_{t-1} + w_t,\quad w_t\sim\mathcal{N}(0,Q)$.
* **Seasonality:** weekday states or dummies; **change-points** via temporary inflation of $Q$ at style changes.
* **Latent WIP option:** augment state with a local-level WIP process; when observed, WIP updates as a measurement; when missing, it's inferred.

> **Why both?** The **Bayesian GLM** gives interpretable global/pooled effects with credible intervals; the **DLM** captures **time variation** and supports **rolling forecasts** and **online updates**.
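
A minimal sketch of the DLM lens, assuming a scalar observation and known $Q$ and $R$, is given below. The function name and the synthetic data are hypothetical; in the project, $Q$ and $R$ would be learned and `X` would hold the real covariates.

```python
import numpy as np

def dlm_filter(X, z, Q, R, theta0, P0):
    """Kalman filter for z_t = X_t^T theta_t + v_t with random-walk coefficients."""
    theta, P = theta0.copy(), P0.copy()
    forecasts, variances = [], []
    for x_t, z_t in zip(X, z):
        P = P + Q                      # predict: theta_t = theta_{t-1} + w_t
        f = x_t @ theta                # one-step-ahead forecast of z_t
        S = x_t @ P @ x_t + R          # forecast variance
        forecasts.append(f)
        variances.append(S)
        K = P @ x_t / S                # Kalman gain
        theta = theta + K * (z_t - f)  # measurement update
        P = P - np.outer(K, x_t @ P)
    return np.array(forecasts), np.array(variances), theta, P

# Synthetic data (hypothetical): intercept plus one slowly drifting slope.
rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
slope = np.linspace(0.5, 1.5, T)
z = 0.2 + slope * X[:, 1] + 0.1 * rng.normal(size=T)

f, S, theta, P = dlm_filter(X, z, Q=1e-3 * np.eye(2), R=0.01,
                            theta0=np.zeros(2), P0=np.eye(2))
print(theta)
```

The forecasts `f` and variances `S` are computed before each observation is used, so they are genuine one-step-ahead quantities and can feed the rolling validation directly.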

### 3) Hyperparameter learning

* **Kalman EM** (estimate $Q,R$) or discount-factor tuning on a rolling validation.
* For Bayes, **prior sensitivity** and **PSIS-LOO** to compare link functions and spline complexity.

### 4) Inference to decisions

* Compute **uplift curves**: $\mathbb{E}[\Delta y \mid \Delta \text{incentive}]$, stratified by team/weekday.
* **Risk controls:** $\Pr(y_{t+1}<\tau)$ alarms; recommend **threshold rules** (e.g., raise incentive if risk $> p^*$ and WIP $\leq$ cap).
* **Budgeted allocation:** greedy optimization using posterior means/variances to maximize expected productivity under an incentive budget.
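
The risk alarm can be computed in closed form when the forecast is Gaussian on the logit scale, as in the DLM. The sketch below assumes that setting; the function names, the rule, and all numeric values are illustrative only.

```python
import math

def logit(p):
    """Log-odds of a probability in (0, 1)."""
    return math.log(p / (1.0 - p))

def prob_low_day(z_mean, z_var, tau):
    """Pr(y < tau) when logit(y) is Gaussian with the given mean and variance."""
    standardized = (logit(tau) - z_mean) / math.sqrt(z_var)
    return 0.5 * (1.0 + math.erf(standardized / math.sqrt(2.0)))

def raise_incentive(risk, wip, risk_limit, wip_cap):
    """Illustrative rule: act only if risk is high and WIP is within the cap."""
    return risk > risk_limit and wip <= wip_cap

risk = prob_low_day(z_mean=logit(0.7), z_var=0.25, tau=0.6)
print(risk, raise_incentive(risk, wip=900, risk_limit=0.15, wip_cap=1200))
```

Because the logit is monotone, the event $y_{t+1} < \tau$ is the same as $z_{t+1} < \text{logit}(\tau)$, which is why no back-transformation of the forecast distribution is needed.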

## Outputs & Deliverables

* **Reports & dashboards**

  * Driver analysis (posterior effect sizes, PDP/ICE-style summaries with uncertainty).
  * Rolling **1-day-ahead forecasts** + 80/95% bands by team.
  * **Risk of low day** heatmap and **what-if calculators** (incentive/WIP/workers).
* **Model artifacts**

  * PyMC model for Beta/Logit-Normal regression (posterior saved).
  * NumPy/Python DLM with EM; utility to run per-team or pooled.
* **Playbook**

  * Data prep (time splits, encoding), missing-WIP handling, leakage checks.
  * Threshold recommendation rules and budget-allocation procedure.

## Validation & Metrics

* **Forecast:** rolling **MAE/RMSE** on original scale; **calibration** (coverage of prediction intervals).
* **Classification proxy:** AUC/PR for the "low day" threshold.
* **Stability:** coefficient-drift diagnostics; change-point flags aligned with style changes.
* **Decision utility:** expected gain vs. baseline incentives (offline policy evaluation).
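
The forecast metrics above are simple to compute once forecasts and intervals are on the original scale. The helper names and the toy numbers below are assumptions for illustration.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)

y_true = [0.80, 0.65, 0.72, 0.50]
y_pred = [0.75, 0.70, 0.70, 0.62]
lower = [p - 0.10 for p in y_pred]
upper = [p + 0.10 for p in y_pred]
print(mae(y_true, y_pred), rmse(y_true, y_pred),
      interval_coverage(y_true, lower, upper))
```

A well-calibrated 80% band should give a coverage close to 0.8 over a long rolling window; systematic over- or under-coverage signals a misestimated forecast variance.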

## Risks & Mitigations

* **Missing WIP (MNAR risk):** include a missingness model and latent-state WIP variant; run sensitivity analysis.
* **Small sample per team:** hierarchical pooling and partial pooling of $Q$ across teams.
* **Non-stationarity:** allow time-varying coefficients; change-point handling.
* **Leakage:** strict temporal splits; audit feature timestamps (e.g., ensure WIP is start-of-day or treated as latent).

## References


* **Imran, A. A., Rahim, M. S., & Ahmed, T. (2021). *Mining the productivity data of the garment industry*. International Journal of Business Intelligence and Data Mining.** DOI: 10.1504/IJBIDM.2021.118183. [link](https://www.inderscienceonline.com/doi/10.1504/IJBIDM.2021.118183)
* **UCI ML Repository: *Productivity Prediction of Garment Employees* (Dataset).** DOI: 10.24432/C51S6D. [link](https://archive.ics.uci.edu/ml/datasets/Productivity%2BPrediction%2Bof%2BGarment%2BEmployees)
* **McCann, M., & Johnston, A. (2008). *SECOM* (Dataset). UCI ML Repository.** DOI: 10.24432/C54305. [link](https://archive.ics.uci.edu/ml/datasets/SECOM)
* **Salem, M., Taheri, S., & Yuan, J.-S. (2018). *An Experimental Evaluation of Fault Diagnosis from Imbalanced and Incomplete Data for Smart Semiconductor Manufacturing*. Big Data and Cognitive Computing, 2(4), 30.** [link](https://www.mdpi.com/2504-2289/2/4/30)
* **Park, H. J. (2024). *Study on Data Preprocessing for Machine Learning Based on Semiconductor Manufacturing Processes*. Sensors.** (open-access review). [link](https://pmc.ncbi.nlm.nih.gov/articles/PMC11398254/)
* **Arpaia, P., Buzio, M., Di Capua, V., Grassini, S., Parvis, M., & Pentella, M. (2022). *Drift-Free Integration in Inductive Magnetic Field Measurements Achieved by Kalman Filtering*. Sensors, 22(1), 182.** [link](https://www.mdpi.com/1424-8220/22/1/182)


# HVAC Controller Simulation and Online Parameter Identification Using EKF/RL (Reserved)

## Overview

This project aims to create an AI-assisted, simulation-driven framework for intelligent building HVAC modeling, control, and characterization. First, we will build a multi-zone HVAC modeling and simulation platform on top of EnergyPlus. This platform will support natural-language interaction for configuring buildings, HVAC systems, and schedules, and provide a user-friendly interface for visualizing simulation outputs (temperature, humidity, $CO_2$, loads, flows, etc.).

Using that platform, we will develop an Extended Kalman Filter (EKF)-based online parameter estimation method to identify key unknown building/zone parameters in real time. The target parameters include: (i) effective sensible thermal capacitance, (ii) overall heat transfer conductance to ambient, (iii) zone occupancy, and (iv) infiltration / outdoor air mass flow rates. We will verify the EKF estimator entirely in simulation by injecting known "ground truth" parameters into EnergyPlus and checking recovery accuracy and convergence.
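
The verification idea (inject a known parameter, then check recovery) can be illustrated on a deliberately reduced model. The sketch below assumes a single lumped parameter $a = U/C$, a known normalized heat input `q_in`, and an Euler-discretized first-order zone model simulated in plain Python rather than EnergyPlus. All names and numbers are hypothetical.

```python
import numpy as np

def ekf_step(x, P, T_meas, T_out, q_in, dt, Qn, Rn):
    """One EKF step for the augmented state x = [T, a], with a = U/C."""
    T, a = x
    # Predict with the Euler-discretized model; the parameter is a random walk.
    x_pred = np.array([T + dt * (a * (T_out - T) + q_in), a])
    F = np.array([[1.0 - dt * a, dt * (T_out - T)],
                  [0.0, 1.0]])
    P_pred = F @ P @ F.T + Qn
    # Update with the zone temperature measurement (H = [1, 0]).
    H = np.array([1.0, 0.0])
    S = H @ P_pred @ H + Rn
    K = P_pred @ H / S
    x_new = x_pred + K * (T_meas - x_pred[0])
    P_new = P_pred - np.outer(K, H @ P_pred)
    return x_new, P_new

# Simulated "ground truth" (hypothetical numbers, time in hours).
rng = np.random.default_rng(1)
a_true, dt, sigma = 0.5, 0.05, 0.05
T_true = 20.0
x, P = np.array([20.0, 0.2]), np.diag([1.0, 1.0])
Qn, Rn = np.diag([1e-4, 1e-6]), sigma**2
for k in range(2000):
    T_out = 30.0 + 5.0 * np.sin(2.0 * np.pi * k * dt / 24.0)
    q_in = 2.0 if (k // 100) % 2 == 0 else -2.0  # known input, K/h
    T_true = T_true + dt * (a_true * (T_out - T_true) + q_in)
    T_meas = T_true + sigma * rng.normal()
    x, P = ekf_step(x, P, T_meas, T_out, q_in, dt, Qn, Rn)
print(x[1])
```

The switching input and the varying outdoor temperature keep the temperature difference away from zero, which is what makes the parameter identifiable. Estimating $U$ and $C$ separately, or occupancy and infiltration, would require additional states and sufficiently rich excitation.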

Beyond the EKF, we will explore alternative adaptive/learning strategies for online estimation of the same parameters (for example, recursive least squares variants or data-driven/ML estimators), and compare their performance and robustness.

Finally, we will experimentally validate the estimator on a controlled laboratory-scale single-zone HVAC test rig. The goal is to demonstrate that the identified parameters track real physical behavior and can serve as a foundation for future decentralized multizone HVAC control and optimization.


## Objectives

* Develop a multi-zone HVAC modeling and simulation platform based on **EnergyPlus**, with conversational assistance for interactively building the model and a convenient user interface for visualizing simulation data.

* Develop an EKF-based parameter estimation method to extract unknown model parameters such as effective sensible thermal capacitance, overall heat transfer conductance, occupancy, and infiltrated air mass flow rates.

* Verify the EKF in simulation using the developed simulation platform.

* Explore other learning strategies for the online estimation of the same unknown parameters.

* Validate the parameter estimation experimentally on a laboratory-scale single-zone HVAC setup.



## References (Selected)


**EKF-based Estimation in Buildings**  
1. **Madsen, H., & co-authors.** "Grey-box modeling and Kalman filtering for building thermal dynamics and parameter estimation." (Various works, e.g., DTU Technical University reports and journal articles, 2000s-2010s.)  
2. **Wang, S., & Chen, Q.** "$CO_2$-based occupancy estimation and ventilation control: modeling and state estimation approaches." *Building and Environment*, ~2012 (methods include state-space estimation such as EKF variations).

**Simulation Engine**  
3. **EnergyPlus Documentation.** *Engineering Reference* and *Input Output Reference*. U.S. DOE/ORNL/NREL. Available at: <https://energyplus.net/documentation>




# Decentralized HVAC Controller Development and Experimental Validation (Reserved)

## Overview

This 28-week undergraduate project develops and validates a **decentralized HVAC control system** for a multi-zone building. Each zone runs a local controller that regulates **temperature**, **humidity ratio**, and **$CO_2$** concentration using zone **mass flow rate** commands, while a lightweight **central Air-Handling Unit (AHU) coordinator** chooses AHU setpoints (supply air temperature, humidity ratio, and $CO_2$) to meet aggregate load and ventilation requirements.

Simulation is performed in **EnergyPlus**, with final **bench-scale experimental verification** on a single-zone testbed.

---

## Objectives

1. **Control**: Design per-zone controllers that drive zone states into a comfort/safety set while minimizing energy use.
2. **Coordination**: Implement a central AHU policy that chooses $(T_{sa}, \omega_{sa}, c_{sa})$ based on the zones' aggregate thermal/moisture/IAQ needs.
3. **Validation**: Demonstrate closed-loop performance and energy savings in EnergyPlus; verify modeling/estimation on a single-zone physical rig.

---

## Problem Setting & Notation

For each zone $z \in \{1,\dots,N\}$, define the zone state
\begin{align}
\mathbf{x}_z(t) \;\triangleq\; \begin{bmatrix} T_z(t) \\ \omega_z(t) \\ c_z(t) \end{bmatrix},
\qquad
\mathbf{x}_z(t) \in \mathscr{D},
\end{align}
where the admissible set $\mathscr{D}$ encodes comfort/IAQ limits:
\begin{align}
\mathscr{D} \;=\; \Big\{ (T_z,\omega_z,c_z)\; \Big|\; T_{\min}\le T_z \le T_{\max},\; \omega_{\min}\le \omega_z \le \omega_{\max},\; c_z \le c_{\max} \Big\}.
\end{align}

**Control inputs** are zone **mass flow rates** $\dot m_{sa,z}(t)$ supplied from an AHU delivering air at $(T_{sa}(t), \omega_{sa}(t), c_{sa}(t))$. The AHU variables are coordinated centrally.

A simplified, control-oriented continuous-time model for thermal/moisture/$CO_2$ dynamics is
\begin{align}
\dot T_z = \frac{1}{C_{T,z}}\Big( U_z(T_o-T_z) + \dot m_{sa,z}c_p(T_{sa}-T_z) + Q^{\mathrm{int}}_{T,z} \Big),
\end{align}
\begin{align}
\dot \omega_z = \frac{1}{C_{\omega,z}}\Big( k_z(\omega_o-\omega_z) + \dot m_{sa,z}(\omega_{sa}-\omega_z) + Q^{\mathrm{int}}_{\omega,z} \Big),
\end{align}
\begin{align}
\dot c_z = \frac{1}{V_z}\Big( \dot m_{sa,z}(c_{sa}-c_z) + \dot m^{\mathrm{inf}}_z(c_o-c_z) + q^{\mathrm{occ}}_z \Big).
\end{align}
Unknowns such as $C_{T,z}, C_{\omega,z}, U_z, k_z, q^{\mathrm{occ}}_z$ are **estimated online** (EKF).

**Goal**: choose $\dot m_{sa,z}(t)$ and AHU setpoints $(T_{sa}, \omega_{sa}, c_{sa})$ to minimize energy use (fan + coil/pump surrogates) **subject to** $\mathbf{x}_z(t)\in\mathscr{D}$ and equipment limits.
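
As a quick sanity check of the model above, the three zone equations can be integrated directly. The sketch below uses forward Euler with constant inputs. Every numeric value is a hypothetical placeholder chosen only to give plausible magnitudes, and units are assumed to be mutually consistent (in particular, `V` is treated as whatever normalization makes the $CO_2$ equation dimensionally consistent).

```python
def zone_derivatives(state, m_sa, supply, outdoor, p):
    """Right-hand side of the three zone equations for one zone."""
    T, w, c = state
    T_sa, w_sa, c_sa = supply
    T_o, w_o, c_o = outdoor
    dT = (p["U"] * (T_o - T) + m_sa * p["cp"] * (T_sa - T) + p["Q_T"]) / p["C_T"]
    dw = (p["k"] * (w_o - w) + m_sa * (w_sa - w) + p["Q_w"]) / p["C_w"]
    dc = (m_sa * (c_sa - c) + p["m_inf"] * (c_o - c) + p["q_occ"]) / p["V"]
    return dT, dw, dc

def simulate(state, m_sa, supply, outdoor, p, dt, steps):
    """Forward-Euler integration with constant inputs."""
    for _ in range(steps):
        d = zone_derivatives(state, m_sa, supply, outdoor, p)
        state = tuple(s + dt * ds for s, ds in zip(state, d))
    return state

# Hypothetical parameters and inputs (assumptions, not measured values).
params = {"U": 100.0, "cp": 1005.0, "C_T": 5.0e5, "Q_T": 500.0,
          "k": 0.01, "C_w": 60.0, "Q_w": 1.0e-4,
          "m_inf": 0.01, "V": 60.0, "q_occ": 50.0}
supply = (14.0, 0.008, 420.0)    # (T_sa, w_sa, c_sa)
outdoor = (32.0, 0.018, 420.0)   # (T_o, w_o, c_o)
final = simulate((26.0, 0.012, 800.0), m_sa=0.2, supply=supply,
                 outdoor=outdoor, p=params, dt=10.0, steps=3600)
print(final)
```

With constant inputs each equation is linear and first order, so the simulated state settles at the balance point where the bracketed terms sum to zero. That gives an easy analytical check for any implementation of the model.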

---

## System Architecture

- **Zone Controllers (decentralized):** Each zone computes $\dot m_{sa,z}$ using local measurements $(T_z,\omega_z,c_z)$, forecasts (optional), and EKF state/parameter estimates. Controllers can be PI/MPC with soft comfort constraints.
- **AHU Coordinator (centralized, lightweight):** Aggregates zone demands to pick $(T_{sa},\omega_{sa},c_{sa})$ and total supply flow $\sum_z \dot m_{sa,z}$. Examples: rule-based "cooling-dominant/heating-dominant" logic or a small convex program.
- **Estimator (per zone):** EKF/UKF estimates thermal capacity, effective envelope conductance, and occupancy-related gains from data (T, RH, $CO_2$).
- **Supervisor:** Enforces safety limits, fault flags, and fallbacks (e.g., revert to baseline schedules if estimates diverge).

---

## Research Questions

1. **Decentralization vs. performance:** How close can zone-wise controllers get to a centralized optimum with only minimal coordination?
2. **Robustness:** How sensitive is performance to modeling error, weather disturbances, and actuator limits?
3. **Observability:** What minimal sensing (T, RH, $CO_2$, flows) yields reliable EKF parameter/occupancy estimates?
4. **AHU policy:** What simple policies for $(T_{sa},\omega_{sa},c_{sa})$ work well with diverse zone needs?

---

## Metrics & Evaluation

- **Comfort/IAQ compliance**: fraction of time $\mathbf{x}_z(t)\in \mathscr{D}$.  
- **Energy surrogate**: fan power $ \propto \big(\sum_z \dot m_{sa,z}\big)^\alpha$, coil loads $\propto \dot m_{sa}(T_{mix}-T_{sa})$, latent $\propto \dot m_{sa}(\omega_{mix}-\omega_{sa})$.  
- **Stability/robustness**: boundedness under forecast/model error; constraint violations (count, magnitude).  
- **Estimator quality**: parameter RMSE, occupancy estimation error, innovation whiteness.
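
Two of the metrics above are simple enough to sketch directly. The helper names, the comfort limits, and the exponent `alpha=3.0` (a cube-law assumption for fan power) are illustrative choices, not values fixed by the project.

```python
def in_comfort_set(state, limits):
    """True if (T, w, c) satisfies the box constraints of the admissible set."""
    T, w, c = state
    return (limits["T_min"] <= T <= limits["T_max"]
            and limits["w_min"] <= w <= limits["w_max"]
            and c <= limits["c_max"])

def compliance_fraction(trajectory, limits):
    """Fraction of samples inside the admissible set (uniform sampling assumed)."""
    return sum(in_comfort_set(s, limits) for s in trajectory) / len(trajectory)

def fan_power_surrogate(zone_flows, alpha=3.0, gain=1.0):
    """Fan power proportional to (total supply flow) ** alpha."""
    return gain * sum(zone_flows) ** alpha

limits = {"T_min": 20.0, "T_max": 26.0,
          "w_min": 0.004, "w_max": 0.012, "c_max": 1000.0}
trajectory = [(22.0, 0.008, 600.0),   # inside
              (27.0, 0.008, 600.0),   # too warm
              (24.0, 0.013, 700.0),   # too humid
              (25.0, 0.010, 1000.0)]  # inside (on the CO2 limit)
print(compliance_fraction(trajectory, limits),
      fan_power_surrogate([0.2, 0.3, 0.5]))
```

Because the surrogate depends on the total flow raised to a power greater than one, shifting flow between zones at constant total does not change fan power, while any increase in the total is penalized more than proportionally.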


---

## Resources

- EnergyPlus with Python API (runtime callbacks for reading/writing actuators & variables).  
- Standard Python stack (NumPy/SciPy, plotting, optimization).  
- Single-zone testbed (ducted fan, heating/cooling source, T/RH/$CO_2$ sensors, DAQ).

---

## Expected Outcomes

- A working **decentralized HVAC** controller with a simple **AHU coordinator**.  

---

## References (Selected)

**Decentralized / Distributed HVAC Control**  
1. **Ma, Y., Kelman, A., Daly, A., & Borrelli, F.** "Distributed Model Predictive Control for Building Temperature Regulation." *American Control Conference (ACC)*, 2012.  
2. **Dounis, A. I., & Caraiscos, C.** "Advanced control systems engineering for energy and comfort management in a building environment: A review." *Renewable and Sustainable Energy Reviews*, 2012.
3. **Yang, Y., Srinivasan, S., Hu, G., & Spanos, C. J.** (2021). "Distributed Control of Multi-Zone HVAC Systems Considering Indoor Air Quality." arXiv:2003.08208.


**Simulation Engine**  
4. **EnergyPlus Documentation.** *Engineering Reference* and *Input Output Reference*. U.S. DOE/ORNL/NREL. Available at: <https://energyplus.net/documentation>

> The list above provides high-impact entry points. During the project, refine with the exact editions/DOIs most aligned to your chosen model structures and estimation variants.


# Maintenance Management

## References

### Scientific Literature using Graph / Knowledge Graph Approaches in Industrial Maintenance

1. **Xia et al. (2023)**  
   *Maintenance planning recommendation of complex industrial equipment based on knowledge graph and graph neural network*  
   [Reliability Engineering & System Safety, Vol 232](https://doi.org/10.1016/j.ress.2022.109068)  
   DOI: 10.1016/j.ress.2022.109068

2. **Lou et al. (2023)**  
   *Knowledge Graph Construction Based on a Joint Model for Equipment Maintenance*  
   [Mathematics, 11(17): 3748](https://www.mdpi.com/2227-7390/11/17/3748)  
   DOI: 10.3390/math11173748

3. **Teern et al. (2022)**  
   *Knowledge graph construction and maintenance process: Design challenges for industrial maintenance support*  
   [CEUR Workshop Proceedings (PDF)](https://www.researchgate.net/publication/363926032_Knowledge_graph_construction_and_maintenance_process_Design_challenges_for_industrial_maintenance_support)

4. **Stewart et al. (2024)**  
   *MWO2KG and Echidna: Constructing and exploring an interactive maintenance knowledge graph*  
   [Journal of Maintenance & Innovation (DOI)](https://journals.sagepub.com/doi/10.1177/1748006X221131128)

5. **Cai et al. (2024)**  
   *Knowledge graph-driven equipment fault diagnosis method for intelligent manufacturing*  
   [Int J Adv Manufacturing Technology, Vol 130](https://link.springer.com/article/10.1007/s00170-024-12998-x)

6. **Pérez Hernández (2022)**  
   *Maintenance Strategies for Networked Assets*  
   [University of Cambridge Repository (PDF)](https://www.repository.cam.ac.uk/bitstream/handle/1810/336867/Maintenance_Strategies_for_Networked_Assets.pdf)

7. **Barberá et al. (2013)**  
   *The Graphical Analysis for Maintenance Management Method (GAMM)*  
   [ResearchGate (PDF)](https://www.researchgate.net/publication/262906105_The_Graphical_Analysis_for_Maintenance_Management_Method_A_Quantitative_Graphical_Analysis_to_Support_Maintenance_Management_Decision_Making)

8. **Zheng et al. (2022)**  
   *Query-based Industrial Analytics over Knowledge Graphs with Ontology Reshaping*  
   [arXiv preprint](https://arxiv.org/abs/2209.11089)

9. **Fenza et al. (2020)**  
   *A Cognitive Approach based on the Actionable Knowledge Graph for supporting Maintenance Operations*  
   [arXiv preprint](https://arxiv.org/abs/2011.09554)

* https://github.com/jonathanwvd/awesome-industrial-datasets

* https://github.com/IBM/FailureSensorIQ/

* https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/industrial_safety_and_health_analytics_database.md

* https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/smart_manufacturing_iot-cloud_monitoring_dataset.md

* https://github.com/jonathanwvd/awesome-industrial-datasets/blob/master/markdown/productivity_prediction_of_garment_employees.md

# Data-Driven Workplace Safety Insights from the “Industrial Safety & Health Analytics” Dataset

## Overview

This project uses the **Industrial Safety & Health Analytics Database** (accident logs from 12 plants across 3 countries; each row is one incident) to generate practical, evidence-based safety insights ([Kaggle][1]). We will build an explainable analytics and **Bayesian** pipeline focused on severity drivers, near-miss patterns, **Tier-3 text classification**, and a **domain Knowledge Graph (KG)** that grounds predictions in a structured safety ontology for richer queries, reasoning, and explainability. The Bayesian components are hierarchical (partial-pooling) ordinal models for severity, **a parallel Bayesian classifier on sentence embeddings for calibrated uncertainty**, and decision-focused posterior risk scoring. The **KG** captures entities such as Task, Equipment, Hazard, and Control, and links them to each incident for search, analytics, and constraint checks. The developed methods will be validated against at least one other similar dataset.

## Core Questions

1. **Drivers of severity:** Which factors most strongly predict higher `Accident Level` (I–VI)? (Explainable ordinal/multiclass models, SHAP + KG features.)
2. **Near-miss hotspots:** Where is `Potential Accident Level` ≫ actual, and which **Task–Hazard–Control** motifs dominate those gaps (via KG subgraphs)?
3. **Auto-tagging risk from text (Tier-3 + KG):** Can `Description` accurately predict `Critical Risk`/severity using a fine-tuned transformer plus a Bayesian companion, **and** can those labels be grounded to KG nodes for auditable, queryable context?
4. **Role/party disparities:** How do risks and severities differ for employees vs. third-party workers and across sectors/plants (with KG-derived features and motifs)?
5. **When risk spikes:** What time patterns (hour/weekday/month) correlate with higher counts and severity, and which KG motifs are over-represented in those windows?
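
For the near-miss question, one simple way to quantify "potential ≫ actual" is the difference between the two levels on an integer scale. The sketch below assumes both columns hold Roman numerals I to VI and treats a gap of at least two levels as a near miss; the threshold `min_gap` and the helper names are illustrative choices, not part of the dataset.

```python
from collections import Counter

# Integer encoding of the severity scale (assumed to be stored as Roman numerals).
LEVELS = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6}


def near_miss_gap(row):
    """Potential minus actual severity for one incident (a dict keyed by column name)."""
    return LEVELS[row["Potential Accident Level"]] - LEVELS[row["Accident Level"]]


def hotspot_counts(rows, group_key, min_gap=2):
    """Count incidents per group whose potential severity exceeds actual by >= min_gap."""
    return Counter(r[group_key] for r in rows if near_miss_gap(r) >= min_gap)
```

Calling `hotspot_counts(rows, "Critical Risk")` then ranks risk categories by how often an incident could have been much worse than it was; the same grouping works for plant, sector, or a KG motif label.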

## Data & Features

* Source: Kaggle “Industrial Safety & Health Analytics Database” (IHM Stefanini). Columns include date/time, sector, country (anonymized), `Accident Level`, `Potential Accident Level`, `Critical Risk`, `Employee/Third-Party`, narrative `Description`, etc. ([Kaggle][1])
* Feature sets:

  * **Tabular:** one-hot sector/risk/party; temporal (hour/weekday/month).
  * **Text:** transformer tokenization of `Description` for fine-tuning; parallel **sentence embeddings** (e.g., MiniLM) reduced (PCA 50–200 dims) for the Bayesian classifier.
  * **KG-derived:** indicators of extracted **Task/Equipment/Hazard/Control**; motif counts (e.g., `working at height ∧ missing harness`), node centralities, and link-prediction scores.
  * **Fusion:** concatenate KG-derived features with text embeddings/tabular for downstream models.
  * For exploratory context, see a community EDA example. ([Kaggle][2])
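
The fusion step can be pictured as plain concatenation of per-incident feature blocks. The sketch below is a standard-library illustration only: the column names `Date` and `Sector`, the vocabularies passed in, and the helper names are assumptions, and the embedding is taken as an already-reduced list of floats.

```python
from datetime import datetime


def temporal_features(timestamp):
    """Hour, weekday (Monday=0), and month from an ISO-formatted timestamp."""
    dt = datetime.fromisoformat(timestamp)
    return [dt.hour, dt.weekday(), dt.month]


def one_hot(value, vocabulary):
    """Indicator vector over a fixed, ordered vocabulary."""
    return [int(value == v) for v in vocabulary]


def fuse(row, embedding, motif_counts, sectors, parties, motifs):
    """Concatenate tabular, KG-derived, and text features into one flat vector."""
    return (
        temporal_features(row["Date"])                       # assumed column name
        + one_hot(row["Sector"], sectors)                    # assumed column name
        + one_hot(row["Employee/Third-Party"], parties)
        + [motif_counts.get(m, 0) for m in motifs]           # KG motif counts
        + list(embedding)                                    # reduced text embedding
    )
```

Fixing the vocabularies and motif list up front keeps the vector layout identical across training and validation splits.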

## Methods (brief)

* **EDA & balancing:** stratified summaries; handle class imbalance (class weights / focal loss for DL; class weights for Bayesian/logistic).

* **Bayesian modeling (severity):**

  * **Ordinal severity (I–VI):** cumulative probit/logit with **hierarchical (plant/sector) random effects** for partial pooling.
  * **Decision layer:** pick alert thresholds via **expected utility** using posterior predictive draws.
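
To make the severity model and decision layer concrete, the sketch below evaluates a cumulative logit model, $P(Y \le k) = \sigma(c_k - \eta)$, for one incident and then applies an expected-cost alert rule over posterior draws. It assumes the draws of the linear predictor $\eta$ (fixed effects plus a plant or sector effect) and the cutpoints $c_k$ come from a fitted model; the cost values and function names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(eta, cutpoints):
    """P(Y = k), k = 1..K, under a cumulative logit model with K-1 ordered cutpoints."""
    cdf = [sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, len(cdf))]


def linear_predictor(x, beta, group_effect):
    """Fixed effects plus a partially pooled plant/sector intercept."""
    return sum(b * xi for b, xi in zip(beta, x)) + group_effect


def expected_utility_alert(prob_draws, severe_from=4, cost_false_alarm=1.0, cost_miss=10.0):
    """Alert when the expected cost of alerting is below that of staying silent.

    prob_draws: one ordinal probability vector per posterior draw.
    severe_from: first level (1-based) counted as severe; an assumed cut-off.
    """
    p_severe = sum(sum(p[severe_from - 1:]) for p in prob_draws) / len(prob_draws)
    cost_alert = cost_false_alarm * (1.0 - p_severe)
    cost_silent = cost_miss * p_severe
    return cost_alert < cost_silent, p_severe
```

With these costs the rule alerts whenever the posterior-mean probability of a severe outcome exceeds `cost_false_alarm / (cost_false_alarm + cost_miss)`, which makes the threshold an explicit, reviewable choice.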

* **Text classification:**

  1. **Fine-tuned transformer (primary):** DistilBERT/BERT fine-tuned for `Critical Risk` (and/or severity) with class weights/focal loss; calibrate (temperature or isotonic).
  2. **Bayesian companion (calibration & uncertainty):** Bayesian logistic/ordinal on MiniLM embeddings ± selected tabular/KG features with a **regularized horseshoe** prior; use posterior draws for credible intervals and disagreement checks.
  3. **Active learning (HIL):** query by uncertainty **and** model disagreement; newly confirmed labels also update **KG nodes/edges** (closing the loop).
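
One possible scoring rule for the active-learning step combines predictive entropy with the disagreement between the two classifiers. In the sketch below, disagreement is measured by total variation distance and the two terms are mixed with a hypothetical `weight`; both choices are assumptions for illustration rather than a prescribed acquisition function.

```python
import math


def entropy(probs):
    """Shannon entropy (nats) of a predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def disagreement(p_a, p_b):
    """Total variation distance between two predictive distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p_a, p_b))


def select_queries(transformer_probs, bayes_probs, budget, weight=1.0):
    """Indices of the `budget` incidents with the highest acquisition score."""
    scores = []
    for i, (pt, pb) in enumerate(zip(transformer_probs, bayes_probs)):
        mean = [(a + b) / 2.0 for a, b in zip(pt, pb)]
        scores.append((entropy(mean) + weight * disagreement(pt, pb), i))
    return [i for _, i in sorted(scores, reverse=True)[:budget]]
```

Incidents where the two models are individually confident but contradict each other rank highest, which is where a human label is likely to be most informative.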

* **Knowledge Graph:**

  * **Ontology (starter):** `Incident`, `Plant`, `Sector`, `Actor`, `Task`, `Equipment`, `Location`, `Hazard`, `CriticalRisk`, `Control`, `InjuryType`, `SeverityLevel`.
  * **Extraction:** rules + NER/RE (spaCy/transformer) from `Description` to populate `involves(Task/Equipment/Hazard/Control)`, `hasCriticalRisk`, `resultedIn(Severity)`.
  * **Entity linking & normalization:** synonym tables (e.g., “forklift” ≈ “lift truck”); map model outputs to canonical KG nodes.
  * **Reasoning & QA:** run **constraint checks** (e.g., SHACL-like rules: `working at height` should mention `harness/guardrail`), surface **missing controls**, and produce **motif analytics** (Task–Hazard–Control triads).
  * **Features to ML:** KG motifs, degree/centrality, and rule-violation flags feed the transformer head (via adapters) and the Bayesian model (as covariates).
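
The normalization and constraint-check steps can be prototyped in plain Python before committing to a graph store. The sketch below uses a small synonym table and a single rule mirroring the `working at height` example; the table contents, the incident dictionary layout, and the function names are hypothetical.

```python
# Synonym table mapping surface forms to canonical KG node names (illustrative entries).
SYNONYMS = {
    "lift truck": "forklift",
    "fork lift": "forklift",
    "safety harness": "harness",
}

# Rule table: a hazard is satisfied if at least one listed control is mentioned.
REQUIRED_CONTROLS = {"working at height": {"harness", "guardrail"}}


def normalize(term):
    """Lower-case, trim, and map a term to its canonical form."""
    t = term.strip().lower()
    return SYNONYMS.get(t, t)


def missing_controls(incident):
    """Return {hazard: expected controls} for hazards with none of their controls present."""
    controls = {normalize(c) for c in incident["controls"]}
    hazards = {normalize(h) for h in incident["hazards"]}
    return {
        h: sorted(REQUIRED_CONTROLS[h])
        for h in hazards
        if h in REQUIRED_CONTROLS and not (REQUIRED_CONTROLS[h] & controls)
    }
```

The returned dictionary doubles as a rule-violation flag for the ML features and as the content of a "missing controls" entry on an incident card.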

* **Explainability:**

  * **Transformer:** salient phrases (Integrated Gradients) + KG grounding (“why” links to Hazard/Control nodes).
  * **Bayesian:** posterior effect sizes for KG features/embeddings; show credible intervals.
  * **KG views:** incident cards with extracted entities, relations, provenance spans, and confidence.

* **Validation:**

  * Text tasks: **macro-F1**, class-wise F1, **ECE** calibration, PR curves for rare classes.
  * Ordinal severity: MAE (ordinal), C-index, calibration.
  * KG extraction: precision/recall on a labeled subset; **rule-violation detection rate** and downstream lift when adding KG features.
  * Robustness: temporal split to check drift; KG-motif drift over time.
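
For reference, the two headline text metrics can be written out directly. The sketch below computes macro-F1 and a binned expected calibration error (ECE) with equal-width confidence bins; the bin count of 10 is a common default assumed here, and in practice a library implementation would normally be used instead.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean |accuracy - mean confidence| over equal-width confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(accuracy - avg_conf)
    return ece
```

Macro-F1 weights rare `Critical Risk` classes equally with common ones, while ECE checks whether the reported confidences can be trusted when setting alert thresholds.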

## Deliverables

* Reproducible notebooks: cleaning, EDA, **Tier-3 text classification**, Bayesian severity model, KG extraction/ingest, and explainability.
* A **running KG** (Neo4j or RDF store) with ontology, ETL scripts, synonym tables, and provenance.
* Interactive dashboards: incident/severity trends, near-miss gap heatmaps, **uncertainty-aware predictions**, and **KG motif explorer** (Task–Hazard–Control patterns, missing controls).
* A short memo with actionable recommendations and **calibrated risk** bands; KG-grounded checklists (e.g., “Top hazards lacking controls”).

## References

1. **Kaggle: Industrial Safety & Health Analytics Database** (IHM Stefanini). Dataset page and schema. ([Kaggle][1])
2. Community example: “Industrial Safety & Health Analytics: EDA” (Kaggle notebook). ([Kaggle][2])
3. Lipianina-Honcharenko, K. et al. “Intelligent Method for Classifying the Level of Anthropogenic Disasters,” *Big Data and Cognitive Computing*, 2023. ([MDPI][3])
4. Samarasinghe, H.; Heenatigala, S. “Insights from the Field: A Comprehensive Analysis of Industrial Accidents in Plants…,” 2024. ([SCIRP][4])
5. Kundra, I. et al. “Chatbot for Industrial Safety and Health Analytics Database using NLP and ML,” *JETIR*, 2023. ([Jetir][5])
6. **Salvatier, J., Wiecki, T.V., & Fonnesbeck, C.** (2016). “Probabilistic programming in Python using PyMC3,” *PeerJ Computer Science*, 2:e55. ([PeerJ CS][7])
7. **Chipman, H.A., George, E.I., & McCulloch, R.E.** (2010). “BART: Bayesian Additive Regression Trees,” *Annals of Applied Statistics*, 4(1), 266–298. ([AOAS][8])

[1]: https://www.kaggle.com/datasets/ihmstefanini/industrial-safety-and-health-analytics-database "Industrial Safety and Health Analytics Database"
[2]: https://www.kaggle.com/code/niteshhalai/industrial-safety-and-health-analytics-eda "Industrial Safety and Health Analytics - EDA"
[3]: https://www.mdpi.com/2504-2289/7/3/157 "Intelligent Method for Classifying the Level of ..."
[4]: https://www.scirp.org/pdf/ojsst_2024032115560814.pdf "A Comprehensive Analysis of Industrial Accidents in Plants ..."
[5]: https://www.jetir.org/papers/JETIR2312694.pdf "Chatbot for Industrial Safety and Health Analytics Database ..."
[7]: https://peerj.com/articles/cs-55/ "Probabilistic programming in Python using PyMC3"
[8]: https://projecteuclid.org/journals/annals-of-applied-statistics/volume-4/issue-1/BART-Bayesian-additive-regression-trees/10.1214/09-AOAS285.full "BART: Bayesian Additive Regression Trees"
