STAT 151A - Linear Modeling: Theory and Applications

  • Description: This is a course on linear models, as well as generalized linear models, and their applications. Topics include linear regression and modeling, visualization and diagnostics, confidence intervals and hypothesis tests, analysis of variance, dealing with a large number of predictors, and generalized linear models.

  • Instructor: Gaston Sanchez

  • Lecture: 3 hours of lecture per week

  • Lab: 2 hours of computer lab sessions

  • Assignments: biweekly HW assignments

  • Exams: Up to 2 midterm exams, and a final exam

  • Notes and texts:

    • Prof. Sanchez's notes
    • Applied Regression Analysis and Generalized Linear Models (by John Fox)
  • Prerequisites: Statistical and Probability Theory, as well as Linear Algebra. It would also be nice to have some familiarity with R.

  • LMS: the specific learning resources of a given semester are shared in the Learning Management System (LMS) approved by Campus authorities (e.g. bCourses, Canvas)

  • Policies:


1. Introduction

πŸ“‡ ABOUT: By the end of this introductory module, you will be able to:

  • Define what a linear model is (in what sense a model is said to be linear)
  • Describe the high-level intuition of regression (and the regression function)

πŸ“– READING:

  • Chapters 1 and 4
  • Preliminary concepts

✏️ TOPICS:

  • Preliminary Concepts
    • Intuition of regression
    • Meaning of the term "linear"
    • Geometric duality of a data set
    • Review of orthogonal projections
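
A minimal R sketch of an orthogonal projection via the hat matrix H = X (X'X)^{-1} X', using simulated data (the variables are made up purely for illustration):

```r
set.seed(151)
X <- cbind(1, rnorm(10))          # design matrix: intercept plus one predictor
y <- 2 + 3 * X[, 2] + rnorm(10)   # simulated response

H <- X %*% solve(t(X) %*% X) %*% t(X)   # hat (projection) matrix
y_hat <- H %*% y                        # orthogonal projection of y onto col(X)

# the residuals are orthogonal to the columns of X (up to numerical error)
round(t(X) %*% (y - y_hat), 10)
```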

2. Simple Linear Regression (SLR)

πŸ“‡ ABOUT:

This week, we introduce the descriptive aspects of a Simple Linear Regression model, postponing the discussion of inferential aspects until later. In particular, we focus on the method of (Ordinary) Least Squares to obtain the estimated coefficients of a simple linear model. Likewise, we discuss the geometric aspects of OLS, and see how the Gauss-Markov assumptions wrap a linear model with a first layer of "soft" statistical assumptions.


πŸ“– READING:

  • Chapters 5.1 and 10.1
  • Geometry of simple regression
  • Gauss-Markov assumptions in simple regression

✏️ TOPICS:

  • Simple Linear Regression (SLR)
    • Residual Sum of Squares
    • Least Squares estimates
    • Geometry of simple OLS
    • Analysis of Variance decomposition
  • SLR under GM assumptions
    • Gauss-Markov Assumptions
    • Properties of OLS coefficients
    • Properties of OLS estimates
    • Estimate of standard deviation (sigma)
    • Gauss-Markov Theorem
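
A minimal R sketch of the least squares estimates in simple regression, computed by hand on simulated data and checked against lm():

```r
set.seed(1)
x <- runif(50, 0, 10)
y <- 1 + 0.5 * x + rnorm(50)     # simulated response

b1 <- cov(x, y) / var(x)         # least squares slope
b0 <- mean(y) - b1 * mean(x)     # least squares intercept

coef(lm(y ~ x))                  # should match the hand computations
c(b0, b1)
```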

3. Multiple Linear Regression (MLR)

πŸ“‡ ABOUT:

This week, we introduce the model-fitting aspects of Multiple Linear Regression. Like we did in the previous module, we postpone the discussion of the inferential aspects for later. We'll keep our focus on the method of (Ordinary) Least Squares to obtain the coefficients of a multiple linear model. Likewise, we'll continue to study the geometric aspects of OLS, and understand how the Gauss-Markov assumptions wrap a linear model with a first layer of "soft" statistical assumptions.


πŸ“– READING:

  • Chapters 5.2, 10.2, and 10.3
  • Geometry of multiple regression
  • Gauss-Markov assumptions in multiple regression

✏️ TOPICS:

  • Multiple Linear Regression (MLR)
    • Introduction to Multiple Regression
    • Least Squares estimates
    • Geometry of multiple OLS
  • MLR under GM assumptions
    • Properties of OLS coefficients
    • Properties of OLS estimates (y-hat and residuals)
    • Estimate of variance
    • Gauss-Markov Theorem
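
A minimal R sketch of the matrix form of the OLS solution, b = (X'X)^{-1} X'y, on simulated data:

```r
set.seed(2)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))      # intercept plus two predictors
beta <- c(1, 2, -1)
y <- X %*% beta + rnorm(n)             # simulated response

b <- solve(t(X) %*% X, t(X) %*% y)     # OLS coefficients
fitted <- X %*% b                      # y-hat
resid  <- y - fitted                   # residuals

cbind(b, coef(lm(y ~ X[, 2] + X[, 3])))   # matches lm()
```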

4. Normality Assumptions in Linear Regression

πŸ“‡ ABOUT:

In this module, we begin the introduction of the Normal Theory (i.e. so-called Normality assumptions) for linear regression models. This involves assuming that random error terms are Normally distributed, which is a requirement in order to make inferences (e.g. confidence intervals, hypothesis tests) within regression modeling.

We study how the Normality assumptions wrap a linear model with another layer of theoretical assumptions (we like to think of this as a second layer of "hard" statistical assumptions). This involves deriving Maximum Likelihood (ML) estimators, and also studying the distributions of the estimated regression quantities (e.g. coefficients, fitted values, residuals, sums of squares, etc).


πŸ“– READING:

  • Chapter 6
  • Normality assumptions in simple regression
  • Normality assumptions in multiple regression

✏️ TOPICS:

  • Normality assumptions in SLR
    • Normality assumptions
    • Maximum Likelihood estimators
    • Distributions of estimators
    • Distributions of sum of squares
  • Normality assumptions in MLR
    • Multivariate Normal distribution
    • Distributions of estimators
    • Distributions of sum of squares
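
A minimal R sketch contrasting the maximum likelihood estimate of sigma^2 (divide the RSS by n) with the unbiased estimate (divide by n - p), using simulated data:

```r
set.seed(3)
n <- 60
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 1.5)   # simulated response

fit <- lm(y ~ x)
rss <- sum(resid(fit)^2)              # residual sum of squares
p <- length(coef(fit))                # number of estimated coefficients

sigma2_ml       <- rss / n            # maximum likelihood estimate
sigma2_unbiased <- rss / (n - p)      # unbiased estimate
c(sigma2_ml, sigma2_unbiased, summary(fit)$sigma^2)   # last two agree
```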

5. Inference in Linear Regression Models

πŸ“‡ ABOUT:

After reviewing the normality assumptions in regression models and how they affect the distributions of various estimates, we move on to the inferential aspects. In this module we describe how to construct confidence intervals and how to carry out hypothesis tests.


πŸ“– READING:

  • Chapter 6
  • Confidence Intervals in regression models
  • Hypothesis Tests in regression models

✏️ TOPICS:

  • Confidence Intervals
    • Confidence intervals for regression coefficients
    • Meaning of "predictions"
    • Intervals for predictions
  • Hypothesis Tests
    • Test for a single predictor
    • F-test for multiple predictors
    • F-test and ANOVA test
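
A minimal R sketch of confidence intervals, prediction intervals, and an F-test for nested models, on simulated data:

```r
set.seed(4)
x1 <- rnorm(80); x2 <- rnorm(80)
y  <- 1 + 0.8 * x1 + rnorm(80)          # x2 is irrelevant by construction

fit_full    <- lm(y ~ x1 + x2)
fit_reduced <- lm(y ~ x1)

confint(fit_full)                       # confidence intervals for coefficients
predict(fit_full, newdata = data.frame(x1 = 0, x2 = 0),
        interval = "prediction")        # interval for a prediction
anova(fit_reduced, fit_full)            # F-test comparing nested models
```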

6. Dummy Variables and ANOVA

πŸ“‡ ABOUT:

So far we've studied linear regression models under the implicit assumption that both the response and the predictors are quantitative variables. However, we still need to study what to do when we have one or more predictors that are qualitative (i.e. categorical).


πŸ“– READING:

  • Chapters 7 and 8
  • Dummy Variables
  • ANOVA

✏️ TOPICS:

  • Dummy Variables
    • Dummy Regressors for categorical variables
    • The use of dummy (i.e. binary) indicator variables
    • Various types of encoding for categorical variables
  • ANOVA
    • Introduction to ANOVA
    • One-way ANOVA: constraints, estimates, and dispersion
    • ANOVA test
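
A minimal R sketch of dummy regressors and a one-way ANOVA, using the built-in PlantGrowth data set purely as an illustration:

```r
data(PlantGrowth)   # built-in data: plant weight by treatment group

fit <- lm(weight ~ group, data = PlantGrowth)   # group expanded into dummies
model.matrix(fit)[1:3, ]                        # inspect the dummy regressors
summary(fit)

anova(fit)                                      # one-way ANOVA table
```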

7. Residual Analysis and Diagnostic Tools

πŸ“‡ ABOUT:

The estimation of and inference from the regression model depend on several assumptions. These assumptions should be checked using regression diagnostics before using the model in earnest. This week, we cover diagnostic tools for assessing the validity of assumptions about the model specification, the error terms, and issues with unusual and influential observations.


πŸ“– READING:

  • Chapters 11 and 12
  • Residual Analysis (part 1)
  • Residual Analysis (part 2)

✏️ TOPICS:

  • Residual Analysis (part 1)
    • Problems in regression analysis
    • Residuals and Leverages
    • Types of residuals
    • Basic residual plots
  • Residual Analysis (part 2)
    • Detecting heteroscedasticity
    • Detecting non-normality
    • Detecting unusual observations
    • Detecting influential observations
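
A minimal R sketch of common diagnostic quantities (leverages, standardized and studentized residuals, Cook's distance) and the base residual plots, on a made-up model:

```r
set.seed(5)
x <- rnorm(50)
y <- 1 + x + rnorm(50)      # simulated response
fit <- lm(y ~ x)

h  <- hatvalues(fit)        # leverages
rs <- rstandard(fit)        # standardized residuals
rt <- rstudent(fit)         # studentized residuals
cd <- cooks.distance(fit)   # influence measure

par(mfrow = c(2, 2))
plot(fit)                   # base R residual diagnostic plots
```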

8. Multicollinearity

πŸ“‡ ABOUT:

Previously, we mentioned that one class of problematic issues in regression has to do with the Rank assumption of the design matrix X (full rank). This week, we discuss in what way not having a full rank matrix X affects the estimated regression quantities. More specifically, we'll study the common issue of dealing with multicollinearity.


πŸ“– READING:

  • Chapter 13
  • The Sum-of-Squares-and-Cross-Products (SSCP) matrix X'X
  • Multicollinearity

✏️ TOPICS:

  • Review of the SSCP matrix
    • The Sum-of-Squares-and-Cross-Products (SSCP) matrix
    • SSCP and friends
    • Notion and measures of multidimensional scatter
    • Eigenstructure of the SSCP matrix
  • Multicollinearity
    • What is multicollinearity
    • Examples of multicollinearity
    • Variance Inflation Factor (VIF)
    • Singular Value Decomposition (SVD) and multicollinearity
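
A minimal R sketch of detecting multicollinearity via the SSCP matrix, the singular values of the predictors, and a hand-computed VIF; the commented vif() call assumes the car package, which is one common option but not necessarily the one used in class:

```r
set.seed(6)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)    # nearly collinear with x1
y  <- 1 + x1 + x2 + rnorm(n)

X <- cbind(1, x1, x2)
crossprod(X)                      # the SSCP matrix X'X
svd(scale(cbind(x1, x2)))$d       # a small singular value signals collinearity

fit <- lm(y ~ x1 + x2)
# car::vif(fit)                   # VIFs, if the car package is installed
1 / (1 - summary(lm(x1 ~ x2))$r.squared)   # VIF for x1, computed by hand
```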

9. Dealing with Multicollinearity

πŸ“‡ ABOUT:

In this module, we continue the discussion about multicollinearity. More specifically, we describe two methods, Principal Components Regression (PCR) and Ridge Regression (RR), that allow us to overcome some of the obstacles posed when dealing with multicollinearity.


πŸ“– READING:

  • Chapters 13.1 and 13.2.3
  • Principal Components Analysis (PCA)
  • Ridge Regression

✏️ TOPICS:

  • Use of PCA to deal with multicollinearity
    • Crash introduction to Principal Components Analysis
    • PCA and EVD
    • Geometry of PCA
    • Use of PCA for regression analysis
  • Ridge Regression
    • Introduction to Ridge Regression
    • Mean-Square-Error (MSE) in Ridge Regression
    • Geometry of Ridge Regression
    • Solution of Ridge Regression
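
A minimal R sketch of principal components regression and the ridge solution b = (X'X + lambda I)^{-1} X'y, using base R and simulated data; the value of lambda below is arbitrary:

```r
set.seed(7)
n  <- 100
x1 <- rnorm(n); x2 <- x1 + rnorm(n, sd = 0.1)   # nearly collinear predictors
y  <- 1 + x1 + x2 + rnorm(n)

# PCR: regress y on the leading principal component
pca <- prcomp(cbind(x1, x2), scale. = TRUE)
pcr_fit <- lm(y ~ pca$x[, 1])
coef(pcr_fit)

# Ridge: closed-form solution on centered and scaled predictors
X  <- scale(cbind(x1, x2))
yc <- y - mean(y)
lambda <- 1
b_ridge <- solve(t(X) %*% X + lambda * diag(2), t(X) %*% yc)
b_ridge
```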

10. Variable Selection and Model Building

πŸ“‡ ABOUT:

In this module, we go over common methods for selecting variables, comparing models of different sizes (i.e. with different numbers of predictors), and choosing the "best" model.


πŸ“– READING:

  • Chapter 22.1
  • Model Choice Criteria

✏️ TOPICS:

  • Model Selection
    • Introduction to model selection
    • Predictive performance
    • Limitations of R-squared for comparing models with different numbers of predictors
  • Model Comparison Criteria
    • Adjusted R-squared
    • Mallows's Cp
    • Akaike Information Criterion (AIC)
    • Bayesian Information Criterion (BIC)
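
A minimal R sketch comparing models of different sizes with adjusted R-squared, AIC, and BIC, using the built-in mtcars data purely as an illustration:

```r
fit1 <- lm(mpg ~ wt,             data = mtcars)
fit2 <- lm(mpg ~ wt + hp,        data = mtcars)
fit3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

sapply(list(fit1, fit2, fit3), function(m) summary(m)$adj.r.squared)
AIC(fit1, fit2, fit3)   # smaller is better
BIC(fit1, fit2, fit3)   # penalizes model size more heavily than AIC
```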

11. Introduction to Logistic Regression

πŸ“‡ ABOUT:

In this module, we transition into the so-called framework of Generalized Linear Models (GLM). Specifically, we start with regression models to predict a (binary) categorical response using the "plain vanilla" logistic regression model.


πŸ“– READING:

  • Chapter 14.1
  • Logistic Regression
  • Logistic Regression toy example

✏️ TOPICS:

  • Logistic Regression
    • Limitations of a linear model when applied to a binary response variable
    • Core idea to formulate a binary regression model with a logistic function
    • The Logistic regression model
  • Logistic Regression Example
    • Coronary Heart Disease (chd) data
    • Fitting a logistic regression model
    • Interpretation of regression coefficients
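
A minimal R sketch of fitting a logistic regression with glm(); the data are simulated, not the coronary heart disease (chd) data used in lecture:

```r
set.seed(8)
age <- runif(100, 20, 70)
p   <- plogis(-5 + 0.1 * age)           # logistic function of the predictor
chd <- rbinom(100, size = 1, prob = p)  # simulated binary response

fit <- glm(chd ~ age, family = binomial)
summary(fit)
exp(coef(fit))   # coefficients on the odds scale (odds ratios)
```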

12. Estimation in Logistic Regression

πŸ“‡ ABOUT:

This week, we focus on the estimation of logistic regression models. The estimation criterion is based on maximum likelihood, which unfortunately cannot be maximized analytically. Instead, we need to use numerical methods such as Newton's method (aka the Newton-Raphson method). This is the method behind what is perhaps the most common algorithm to estimate logistic regression models, namely Iterative Weighted Least Squares (IWLS), also known as Iteratively Reweighted Least Squares (IRLS).


πŸ“– READING:

  • Chapter 14.1
  • Estimation of Logistic Regression

✏️ TOPICS:

  • Maximum Likelihood estimation in Logistic Regression
    • Derivation of the (log-)likelihood of a binary logistic regression model
    • Limitations of maximizing the log-likelihood analytically
    • Estimation via numerical optimization methods (e.g. Newton's method)
    • Review of Newton's method
  • Numerical estimation in Logistic Regression
    • Newton's method to estimate a logistic regression model
    • Iterative Weighted Least Squares (IWLS) algorithm
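
A bare-bones R sketch of the Newton / IWLS update for logistic regression, b <- b + (X'WX)^{-1} X'(y - p), on simulated data; it runs a fixed number of iterations instead of a proper convergence check:

```r
set.seed(9)
n <- 200
X <- cbind(1, rnorm(n))                  # design matrix: intercept + predictor
beta_true <- c(-0.5, 1)
y <- rbinom(n, 1, plogis(X %*% beta_true))   # simulated binary response

b <- c(0, 0)                             # starting values
for (iter in 1:25) {
  p <- plogis(X %*% b)                   # fitted probabilities
  W <- diag(as.vector(p * (1 - p)))      # weights
  b <- b + solve(t(X) %*% W %*% X, t(X) %*% (y - p))   # Newton / IWLS step
}
cbind(b, coef(glm(y ~ X[, 2], family = binomial)))     # should agree closely
```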

13. Poisson Regression

πŸ“‡ ABOUT:

This week we briefly describe Poisson regression, and the theoretical framework of Generalized Linear Models (GLM). Much of what we've discussed about logistic regression applies to Poisson regression, and to other members of the GLM family.


πŸ“– READING:

  • Chapter 15
  • Introduction to Poisson Regression
  • GLM Framework

✏️ TOPICS:

  • Poisson Regression
    • Derivation of the (log-)likelihood of the Poisson regression model
    • Limitations of maximizing the log-likelihood analytically
    • Estimation via numerical optimization methods (e.g. Newton's method)
    • Review of Newton's method
  • GLM Framework
    • Main components of a GLM (random component, linear predictor, and link function)
    • Link functions, and their inverses, for linear regression, Poisson regression, and logistic regression
    • The R function glm() and its summary() output
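
A minimal R sketch of a Poisson regression fit with glm() and its log link, on simulated count data:

```r
set.seed(10)
x  <- runif(150, 0, 2)
mu <- exp(0.3 + 0.9 * x)               # log link: log(mu) = 0.3 + 0.9 x
y  <- rpois(150, lambda = mu)          # simulated counts

fit <- glm(y ~ x, family = poisson)    # link = "log" by default
summary(fit)
exp(coef(fit))                         # multiplicative effects on the mean count
```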