# Syllabus

**06-642: Data Science and Machine Learning in Chemical Engineering**

**Spring 2026 · Half Semester · Carnegie Mellon University**

## Instructor

Professor John Kitchin  
Department of Chemical Engineering  
jkitchin@cmu.edu

## Course Description

This course introduces data science and machine learning techniques with applications to chemical engineering problems. We emphasize practical implementation in Python, focusing on tools and methods that are directly applicable to research and industrial practice.

Topics include:
- Data manipulation with NumPy and Pandas
- Regression and classification with scikit-learn
- Dimensionality reduction and clustering
- Ensemble methods (Random Forests, Gradient Boosting)
- Uncertainty quantification
- Model interpretability

## Learning Objectives

By the end of this course, students will be able to:

1. Load, clean, and manipulate data using Pandas
2. Visualize data effectively using Matplotlib
3. Build and evaluate regression models
4. Apply cross-validation and regularization to prevent overfitting
5. Use ensemble methods for improved predictions
6. Perform clustering and dimensionality reduction on complex datasets
7. Quantify uncertainty in model predictions
8. Interpret machine learning models to extract scientific insights

## Schedule

| Week | Lectures | Topics | Assignment Due |
|------|----------|--------|----------------|
| 1 | 00, 01 | Introduction, NumPy | A00, A01 |
| 2 | 02, 03 | Pandas Intro, Intermediate Pandas | A02, A03 |
| 3 | 04, 05 | Dimensionality Reduction, Linear Regression | A04, A05 |
| 4 | 06, 07 | Regularization, Nonlinear Methods | A06, A07 |
| 5 | 08, 09 | Ensemble Methods, Clustering | A08, A09 |
| 6 | 10, 11 | Uncertainty Quantification, Interpretability | A10, A11 |
| 7 | — | Project work | Project |

## Grading

| Component | Weight |
|-----------|--------|
| Assignments (12) | 30% |
| Participation | 20% |
| Project | 50% |

### Assignments

There is one assignment per lecture module. Each assignment consists of:
- Technical implementation (2/3 of assignment grade)
- Presentation/documentation quality (1/3 of assignment grade)

Assignments are due one week after the lecture.

### Project

The project is an opportunity to apply course techniques to a problem of your choice. Projects should:
- Address a real chemical engineering or scientific problem
- Use at least 3 techniques from the course
- Include proper validation and uncertainty quantification
- Be presented in a well-documented notebook

Project proposals are due in Week 4. Final projects are due at the end of Week 7.

### Grade Scale

| Grade | Percentage |
|-------|------------|
| A | ≥ 95% |
| A- | 90–94% |
| B+ | 83–89% |
| B | 76–82% |
| B- | 70–75% |
| C+ | 60–69% |
| C | 50–59% |
| R | < 50% |
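
The weighting and grade scale above can be illustrated with a short calculation. This is a minimal sketch, not an official grade calculator: the function names and the example scores are hypothetical, and it assumes each letter band starts at its listed lower bound and extends up to the next band, which the syllabus does not state explicitly for fractional percentages.

```python
# Component weights taken from the Grading table.
WEIGHTS = {"assignments": 0.30, "participation": 0.20, "project": 0.50}

# Lower bound (in percent) of each letter band, highest first.
# Assumption: a score belongs to the highest band whose lower bound it meets.
GRADE_FLOORS = [
    (95, "A"), (90, "A-"), (83, "B+"), (76, "B"),
    (70, "B-"), (60, "C+"), (50, "C"),
]


def assignment_score(technical, presentation):
    """Combine the two assignment parts: 2/3 technical, 1/3 presentation."""
    return (2 * technical + presentation) / 3


def course_percentage(scores):
    """Weighted sum of component scores, each given on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(percentage):
    """Map a course percentage to a letter grade; below 50% is an R."""
    for floor, letter in GRADE_FLOORS:
        if percentage >= floor:
            return letter
    return "R"


# Hypothetical example scores, for illustration only.
scores = {
    "assignments": assignment_score(technical=93.0, presentation=84.0),
    "participation": 100.0,
    "project": 85.0,
}
total = course_percentage(scores)
print(f"{total:.1f}% -> {letter_grade(total)}")
```

Because the project carries half the weight, a 10-point change in the project score moves the course percentage by 5 points, while the same change in the assignment average moves it by 3.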

## Required Software

- Python 3.10+
- Jupyter Notebook or JupyterLab
- Required packages: numpy, pandas, matplotlib, scikit-learn, pycse, shap, xgboost

See the Introduction lecture for installation instructions.
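
Once everything is installed, a quick check like the one below can confirm that the environment meets the requirements listed above. It is a sketch rather than part of the official installation instructions; the helper name `missing_packages` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys

# Import names for the required packages (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "pycse", "shap", "xgboost"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The course requires Python 3.10 or newer.
python_ok = sys.version_info >= (3, 10)
missing = missing_packages(REQUIRED)

print("Python version OK:", python_ok)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only looks up each package without importing it, so it runs quickly even when large libraries are installed.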

## Academic Integrity

You are encouraged to discuss concepts with classmates, but all submitted work must be your own. Code copied from external sources must be cited. Using AI assistants (e.g., ChatGPT, Claude) is permitted for learning and debugging, but you must understand and be able to explain all code you submit.

## Accommodations

If you have a disability and require accommodations, please contact the Office of Disability Resources as soon as possible.

## Getting Help

- Office hours: TBD
