# A4 Report — DevOps, CI/CD, and Quality Assurance

This notebook documents the DevOps and quality assurance improvements implemented in the project, including:

- CI/CD pipeline development
- Automated linting and notebook quality checks
- Unit testing integration
- Deployment safeguards for HuggingFace
- Adoption of Git LFS for model storage
- Team development and coding practices

The goal is to improve reliability, reproducibility, and deployment stability of the machine learning system.


## Project Context

The application is deployed via HuggingFace Spaces using Python and Gradio.

Key challenges before improvements:

- No CI/CD quality gates
- Direct pushes to main branch
- Deployment failures caused by incompatible files
- Models stored externally (Google Drive), causing version inconsistencies
- Lack of automated testing
- Notebook-heavy workflow without linting support

The improvements documented here address these issues.


## CI/CD Pipeline Implementation

The GitHub Actions pipeline was extended to introduce quality assurance barriers before deployment.

### Previous pipeline
- Only synchronized the repository with HuggingFace
- No linting
- No testing
- No deployment safety checks

### Updated pipeline flow

1. Repository checkout (with Git LFS enabled)
2. Python environment setup
3. Dependency installation
4. Linting for Python scripts
5. Notebook linting using nbQA
6. File restriction checks
7. Unit test execution
8. Deployment to HuggingFace

Deployment only occurs if all quality checks pass.
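The gating logic of this flow can be sketched in a few lines of Python. This is only an illustration. The real pipeline is a GitHub Actions workflow, and the check names and callables below are hypothetical placeholders for the linting, file-restriction, and test steps. In a GitHub Actions job, a failing step causes later steps to be skipped by default. The sketch behaves the same way: the first failed check stops the run, so the deploy step is never reached.

```python
from typing import Callable, List, Tuple

# Hypothetical (name, check) pairs standing in for the CI steps;
# each check returns True on success.
Check = Tuple[str, Callable[[], bool]]


def run_gated_pipeline(checks: List[Check], deploy: Callable[[], None]) -> bool:
    """Run checks in order and deploy only if every one of them passes."""
    for name, check in checks:
        if not check():
            print(f"FAILED: {name}; deployment skipped")
            return False
        print(f"passed: {name}")
    deploy()  # reached only when all quality gates have passed
    return True
```

For example, if a `pytest` check returns `False`, the function returns `False` without calling `deploy`.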


## CI/CD Workflow Design

The GitHub Actions workflow enforces code quality and deployment stability.

Key components:

### Linting
- flake8 for Python scripts
- nbQA + flake8 for Jupyter notebooks

### Deployment safeguards
- CI fails if .pdf or .xlsx files are committed
- Prevents HuggingFace sync crashes

### Unit testing
- pytest integrated into CI
- Tests run before deployment


The implemented tests validate the full ML pipeline, including:
- Regression model loading
- Regression prediction functionality
- Classification model loading
- Classification prediction functionality
- Model artifact structure validation
- Error handling for incorrect inputs and failures
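The sketch below shows tests that cover loading, prediction, artifact structure, and error handling. It is not the project's test suite. The artifact format (a pickled dict of linear coefficients), `load_model`, and `predict` are hypothetical stand-ins for the real regression and classification models. The tests use the standard-library `unittest` module. Pytest also collects `unittest.TestCase` classes, so a `test_*.py` file like this placed under `A4/` would be picked up by the CI's pytest command.

```python
import os
import pickle
import tempfile
import unittest

# Hypothetical artifact layout: a pickled dict with linear-model parameters.
REQUIRED_KEYS = {"coef", "intercept"}


def load_model(path):
    """Unpickle a model artifact and check that it has the expected keys."""
    with open(path, "rb") as fh:
        artifact = pickle.load(fh)
    if not isinstance(artifact, dict) or not REQUIRED_KEYS.issubset(artifact):
        raise ValueError("model artifact is missing required keys")
    return artifact


def predict(artifact, features):
    """Linear prediction that rejects inputs with the wrong number of features."""
    coef = artifact["coef"]
    if len(features) != len(coef):
        raise ValueError(f"expected {len(coef)} features, got {len(features)}")
    return artifact["intercept"] + sum(c * x for c, x in zip(coef, features))


class TestModelPipeline(unittest.TestCase):
    def setUp(self):
        # Write a fresh artifact into a temporary directory for every test.
        self.tmp = tempfile.TemporaryDirectory()
        self.path = os.path.join(self.tmp.name, "model.pkl")
        with open(self.path, "wb") as fh:
            pickle.dump({"coef": [1.0, 2.0, 3.0], "intercept": 0.5}, fh)

    def tearDown(self):
        self.tmp.cleanup()

    def test_model_loads_with_expected_structure(self):
        artifact = load_model(self.path)
        self.assertTrue(REQUIRED_KEYS.issubset(artifact))

    def test_prediction(self):
        artifact = load_model(self.path)
        self.assertAlmostEqual(predict(artifact, [1.0, 1.0, 1.0]), 6.5)

    def test_rejects_wrong_number_of_features(self):
        artifact = load_model(self.path)
        with self.assertRaises(ValueError):
            predict(artifact, [1.0, 1.0])

    def test_rejects_malformed_artifact(self):
        with open(self.path, "wb") as fh:
            pickle.dump(["not", "a", "model"], fh)
        with self.assertRaises(ValueError):
            load_model(self.path)
```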


### Git LFS support
- Models tracked using Git LFS
- Ensures version-controlled model artifacts

This transforms the pipeline into a quality-gated deployment system.


## Notebook Linting with nbQA

The project relies heavily on Jupyter notebooks for:

- Model experimentation
- Evaluation
- Feature engineering

Standard Python linters such as flake8 cannot read .ipynb files directly, because notebooks are stored as JSON rather than as plain Python source.

nbQA enables:

- Running flake8 on notebooks
- Detecting unused imports
- Detecting syntax errors
- Improving notebook readability

This ensures notebooks meet the same quality standards as Python scripts.
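nbQA converts a notebook's code cells into a temporary Python file, runs the chosen tool on that file, and maps the reported line numbers back to the cells. In CI the command takes roughly the form `nbqa flake8 <notebook>.ipynb`. The sketch below is not nbQA's implementation. It illustrates only the conversion idea, plus a rough unused-import check built on Python's `ast` module. The function names are hypothetical, and commenting out magics is a simplification chosen for this sketch.

```python
import ast
import json


def notebook_to_source(nb_json: str) -> str:
    """Concatenate a notebook's code cells into a single Python script."""
    nb = json.loads(nb_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are not linted
        src = cell.get("source", "")
        if isinstance(src, list):  # nbformat allows a list of lines or a string
            src = "".join(src)
        # Comment out IPython magics and shell escapes so the script parses.
        lines = [
            "# " + line if line.lstrip().startswith(("%", "!")) else line
            for line in src.splitlines()
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def unused_imports(source: str) -> list:
    """Rough analogue of flake8's F401: imported names never referenced."""
    tree = ast.parse(source)  # raises SyntaxError for invalid code
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import os.path" binds the name "os"
            imported.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            imported.update(a.asname or a.name for a in node.names if a.name != "*")
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)
```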


## Unit Testing Integration

Unit testing was introduced using pytest.

The CI pipeline executes `pytest A4/ -v --tb=short`.

Purpose:

- Validate model behavior
- Prevent regression errors
- Verify preprocessing and prediction logic
- Support reproducibility

One example is test_model.py, which evaluates model predictions and generates diagnostic plots.

Testing will expand as more components stabilize.


## Model Versioning with Git LFS

Originally, models were stored on Google Drive, leading to:

- Version inconsistencies
- Difficulty reproducing results
- Deployment mismatches

Git LFS was introduced to store models directly in the repository.

Benefits:

- Version-controlled model artifacts
- Consistent deployment models
- Easier collaboration
- Improved reproducibility

CI runs the repository checkout step with `lfs: true`.

This ensures models are downloaded correctly during pipeline execution.
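If a checkout runs without LFS enabled, each model file in the working tree is only a small text pointer. Its first line is `version https://git-lfs.github.com/spec/v1`, and loading such a file as a model fails. The sketch below shows a hypothetical sanity check that could run before the tests. The `*.pkl` pattern and the directory layout are assumptions about how the models are stored.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line; a real model file does not.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is an un-fetched Git LFS pointer."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unfetched_models(model_dir: Path, pattern: str = "*.pkl") -> list:
    """List model files under model_dir that are still LFS pointers."""
    return sorted(
        p for p in Path(model_dir).rglob(pattern) if p.is_file() and is_lfs_pointer(p)
    )
```

A CI step could fail whenever `find_unfetched_models` returns a non-empty list. This surfaces a missing `lfs: true` immediately, rather than as an obscure unpickling error later.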


## Deployment Stability Improvements

The pipeline now prevents common failure scenarios.

### Restricted files
CI blocks:
- .pdf
- .xlsx

These previously caused HuggingFace sync crashes.
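The following is a Python sketch of the restricted-file rule. The project's actual CI step may implement it differently, for example as a shell command. Here, `git ls-files` lists the tracked files, and any file with a blocked extension is reported. The extension list comes from this report; the function names are hypothetical.

```python
import subprocess

# Extensions that previously crashed the HuggingFace sync (from this report).
BLOCKED_EXTENSIONS = (".pdf", ".xlsx")


def find_blocked(paths, blocked=BLOCKED_EXTENSIONS):
    """Return the paths whose extension is blocked (case-insensitive)."""
    return [p for p in paths if p.lower().endswith(blocked)]


def tracked_files():
    """List the files Git tracks in the current repository."""
    result = subprocess.run(
        ["git", "ls-files"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def main() -> int:
    """Print any blocked files and return a non-zero exit code if found."""
    offending = find_blocked(tracked_files())
    for path in offending:
        print(f"Blocked file type committed: {path}")
    return 1 if offending else 0
```

In CI, the script would end with `sys.exit(main())`, so any offending file makes the step fail before deployment.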

### Dependency consistency
- scikit-learn version pinned
- Prevents InconsistentVersionWarning, which scikit-learn raises when a model pickled with one version is loaded with another
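Pinning keeps the environment that trains and pickles the models aligned with the HuggingFace runtime that loads them. The sketch below compares `==` pins in a requirements file with the versions reported by `importlib.metadata`. The function names and the version numbers used in testing are hypothetical. The parser deliberately ignores version ranges, extras, and environment markers.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Map package name -> pinned version for lines of the form 'pkg==x.y.z'."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip().lower()] = pinned.strip()
    return pins


def check_pins(requirements_text, lookup=version):
    """Return (package, pinned, installed) for every pin that is not satisfied."""
    problems = []
    for name, pinned in parse_pins(requirements_text).items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package missing entirely
        if installed != pinned:
            problems.append((name, pinned, installed))
    return problems
```

The `lookup` parameter defaults to the real installed-version query. It can be replaced with a dictionary's `.get` for testing without touching the environment.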



## DevOps and QA Process Improvements

The project transitioned from ad-hoc development to structured DevOps practices.

Improvements include:

- Automated linting
- Notebook quality enforcement
- Unit testing integration
- Deployment safeguards
- Git LFS model management
- CI quality gates before deployment

These changes improve:

- Reliability
- Collaboration
- Reproducibility
- Deployment stability


## Design and Coding Rules

The team defined shared development practices.

### Code structure
- Modular Python scripts
- Separation of experimentation and production logic

### Notebook standards
- Executable cells
- Clear documentation
- Minimal unused code

### Deployment awareness
- Avoid large or incompatible files
- Maintain compatibility with HuggingFace environment

### Quality enforcement
- CI linting
- Automated tests
- Dependency control


## Future Work

Planned DevOps enhancements:

- Full PR-based workflow
- Automated model evaluation metrics in CI
- Continuous training pipelines
- Model version tracking dashboards
- Automated notebook formatting

The current pipeline provides the foundation for these improvements.


## A4 – Classification Task

Two datasets were merged into a single dataset containing 41 features (including movement angles and weak-link indicators). For each data point, the weakest link was identified as the weak-link indicator column with the maximum score.
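The weakest-link labelling is a row-wise argmax over the weak-link indicator columns. With pandas this is typically written as `df[weak_link_columns].idxmax(axis=1)`. The dependency-free sketch below does the same for a single row. The column names are hypothetical, since the report does not list them.

```python
# Hypothetical weak-link column names; the real dataset defines its own.
WEAK_LINK_COLUMNS = ["weak_neck", "weak_shoulder", "weak_knee"]


def weakest_link(row: dict, columns=WEAK_LINK_COLUMNS) -> str:
    """Label a row with the weak-link column holding the maximum score."""
    # max() keeps the first column on ties, matching pandas idxmax.
    return max(columns, key=lambda col: row[col])
```

Non-indicator features such as movement angles are ignored, because only the listed columns are compared.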

Initially, a 14-class classifier was used. Following lab guidance and feedback, an alternative approach was then explored: the features were separated into upper-body and lower-body regions, and a separate model was trained for each region. The region-specific models were then combined to evaluate whether performance improved.

- 5-fold cross-validation was applied  
- Weighted-average metrics (each class weighted by its number of samples) were used to account for class imbalance
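Weighted averaging computes F1 for each class, then averages the per-class scores weighted by each class's number of true samples (its support). As a result, common weak-link classes count more than rare ones. The sketch below spells this out. It should agree with scikit-learn's `f1_score(y_true, y_pred, average="weighted")`, and the `"f1_weighted"` scoring string is the usual way to request the same metric from `cross_val_score(..., cv=5)`. It is an assumption that the report's F1 values were computed exactly this way.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        n_pred = sum(p == label for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # F1 is zero when there are no true positives.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        score += f1 * n_true / total
    return score
```

With 5-fold cross-validation, this score is computed on each held-out fold and the five values are averaged.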

Body-region classification models tested:
- Logistic Regression  
- LDA  
- QDA  
- Naive Bayes  
- KNN (k = 5, 7, 10)

Champion model: KNN with k = 7 (best performer).

For the 14-class weak-link classification:
- Logistic Regression  
- LDA  
- Naive Bayes  
- KNN (k = 5, 7, 10)

Champion model: LDA (best performer, F1 = 0.57).

Following feedback, a two-step approach was tested:
1. Predict body region using KNN  
2. Apply the corresponding upper-body or lower-body LDA model to classify the weak link

This did not improve performance (F1 ≈ 0.54).
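The two-step approach is a small hierarchical classifier. In the sketch below, the models are plain callables: hypothetical stand-ins for the fitted KNN region model and the per-region LDA models. This keeps the routing logic visible. One property of this design is that step 2 cannot correct a wrong region from step 1, because each region-specific model only knows the weak links of its own region.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]


def two_step_predict(
    x: Features,
    region_clf: Callable[[Features], str],
    weak_link_clfs: Dict[str, Callable[[Features], str]],
) -> str:
    """Step 1: predict the body region; step 2: classify within that region."""
    region = region_clf(x)  # e.g. KNN returning "upper" or "lower"
    return weak_link_clfs[region](x)  # e.g. that region's LDA model
```

With scikit-learn estimators, each callable would wrap a fitted model, for example `lambda x: knn.predict([x])[0]`, where `knn` is a hypothetical fitted classifier.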

A t-test comparing the two-step approach with the single LDA model gave t = -0.661 and p = 0.5447. Since p > 0.05, the difference is not statistically significant, so the two-step pipeline cannot be considered better than the single LDA model.
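Both models are scored on the same five CV folds, so a paired t-test on the per-fold scores is a natural choice. `scipy.stats.ttest_rel` implements this test for any number of folds. The dependency-free sketch below assumes exactly 5 folds, i.e. 4 degrees of freedom, for which the two-sided p-value has a closed form. It is an assumption that the report used a paired test. However, plugging the reported t = -0.661 into the 4-degree-of-freedom formula gives p ≈ 0.545, which is consistent with the value reported above.

```python
import math
from statistics import mean, stdev


def two_sided_p_df4(t):
    """Two-sided p-value of Student's t with 4 degrees of freedom (closed form)."""
    a = abs(t) / math.sqrt(t * t + 4)
    cdf = 0.5 + 0.75 * a * (1 - a * a / 3)  # P(T <= |t|) for df = 4
    return 2 * (1 - cdf)


def paired_t_test_5fold(scores_a, scores_b):
    """Paired t-test on per-fold scores from 5-fold CV; returns (t, p)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    if len(diffs) != 5:
        raise ValueError("this sketch assumes exactly 5 folds (df = 4)")
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, two_sided_p_df4(t)
```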

Applying a Random Forest classifier improved results:

- Baseline (LDA): F1 = 0.57  
- Two-step approach (after feedback): F1 ≈ 0.54  
- Random Forest: F1 = 0.61 (best performance)

The A4_Classification notebook extends A3 with these improvements.

---

## A4 – Regression Task

The regression setup remains consistent with A2, with a Random Forest Regressor introduced to improve performance.

- Baseline model R²: 0.54  
- Random Forest R²: 0.65  

This represents a direct improvement over the earlier regression pipeline.

A t-test comparing the Random Forest with the previous champion model (feature_selection_lasso) gave t = 11.76 and p = 0.00029. Since p < 0.05, the difference is statistically significant, and the Random Forest can be considered reliably better than feature_selection_lasso. The 95% confidence interval for the improvement, [0.0935, 0.1435], lies entirely above zero, so we can be 95% confident that the improvement is positive.
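A confidence interval for the mean per-fold improvement can be formed from the per-fold score differences as mean ± t_crit × s / √n. Here s is the standard deviation of the differences, and t_crit ≈ 2.776 is the two-sided 95% critical value for 4 degrees of freedom. The sketch assumes 5 folds and a t-based interval. The report does not state how its interval was computed, so this illustrates the reasoning rather than reproducing the exact bounds.

```python
import math
from statistics import mean, stdev

T_CRIT_95_DF4 = 2.7764  # two-sided 95% critical value of Student's t, df = 4


def mean_improvement_ci_5fold(scores_new, scores_old):
    """95% CI for the mean per-fold improvement of one model over another."""
    diffs = [n - o for n, o in zip(scores_new, scores_old)]
    if len(diffs) != 5:
        raise ValueError("the critical value above assumes exactly 5 folds")
    centre = mean(diffs)
    half_width = T_CRIT_95_DF4 * stdev(diffs) / math.sqrt(len(diffs))
    return centre - half_width, centre + half_width
```

If the lower bound is above zero, the improvement is positive at the 95% level. This is the argument made above for the Random Forest.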

The A4_Regression notebook is an enhanced version of A2_ModelBuilding.ipynb, while A4_Classification extends A3 based on feedback and model experimentation.
