# IS-4100: NFL Team Performance Analysis using Generalized Linear Models (GLMs)

**Why are GLMs Useful?**
- Flexibility in Modeling Different Types of Data:
  - GLMs can handle various types of response variables:
    - Continuous (Normal distribution)
    - Binary outcomes (Binomial distribution)
    - Counts (Poisson or Negative Binomial distribution)
    - Proportions (Binomial distribution) and positive, skewed values (Gamma distribution)
  - This flexibility allows for modeling data that do not meet the assumptions of traditional linear regression (e.g., non-constant variance, non-normal errors).
- Appropriate for Non-Normal Distributions:
  - Many real-world datasets have response variables that are skewed, bounded, or discrete.
  - GLMs can model these variables appropriately, providing better estimates and inference.
- Interpretability:
  - Exponentiated coefficients can often be interpreted as odds ratios (logit link) or as rate or risk ratios (log link), which are meaningful in many applied contexts.
- Extensibility:
  - GLMs form the foundation for more advanced models like Generalized Linear Mixed Models (GLMMs), allowing for random effects and hierarchical data structures.
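- Modeling Non-Linear Relationships:
  - The link function relates a linear combination of the predictors to a transformed mean of the response, so the relationship between the predictors and the response itself can be non-linear (e.g., an S-shaped probability curve under the logit link).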
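
To make the link-function and interpretation points above concrete, here is a minimal sketch in plain Python (standard library only). The function names and the coefficient values are illustrative assumptions and do not come from any NFL dataset. It maps the same linear predictor to a mean under the identity, logit, and log links, and checks that exponentiating a coefficient gives an odds ratio (logit link) or a rate ratio (log link).

```python
import math

def inverse_identity(eta):
    """Identity link: the mean equals the linear predictor."""
    return eta

def inverse_logit(eta):
    """Logit link: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inverse_log(eta):
    """Log link: maps any real number to a positive mean (e.g., an expected count)."""
    return math.exp(eta)

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
intercept, slope = -1.0, 0.5

for x in (0, 1, 2):
    eta = intercept + slope * x  # linear predictor
    print(
        f"x={x}  eta={eta:+.2f}  "
        f"identity={inverse_identity(eta):+.2f}  "
        f"logit={inverse_logit(eta):.3f}  "
        f"log={inverse_log(eta):.3f}"
    )

# A one-unit increase in x multiplies the odds (logit link)
# or the expected count (log link) by exp(slope).
odds_ratio = odds(inverse_logit(intercept + slope)) / odds(inverse_logit(intercept))
rate_ratio = inverse_log(intercept + slope) / inverse_log(intercept)
print(f"odds ratio={odds_ratio:.4f}  rate ratio={rate_ratio:.4f}  exp(slope)={math.exp(slope):.4f}")
```

Note that the model stays linear in its coefficients on the link scale; only the mapping back to the response scale is non-linear.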

In this lab, you will analyze NFL team performance data using Generalized Linear Models (GLMs). You'll use either the `nfl_data_py` (Python) or `nflfastR` (R) package to retrieve the data for your analysis.

**Learning Objectives**
- Data Retrieval: Learn how to fetch and manipulate NFL data using `nfl_data_py` or `nflfastR`.
- Data Preprocessing: Clean and prepare the dataset for modeling.
- GLM Modeling: Understand and apply GLMs to model NFL team performance.
- Interpretation: Interpret the results of your GLM and derive meaningful insights.

**Assignment Overview**

You will perform the following tasks:

- Setup and Data Retrieval
- Data Preprocessing / EDA
  - Build on prior feedback and on the additional functions and methods we have covered in class.
- GLM Modeling
  - Choose an appropriate GLM for modeling team performance.
  - Fit the model and check its assumptions.
- Results Interpretation
  - Analyze the output of your GLM.
  - Discuss the implications of your findings.
- Reporting
  - Summarize your methodology, results, and conclusions in a brief report.
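
As one example of an assumption check for count responses, the sketch below estimates a Pearson dispersion statistic for an intercept-only Poisson model. A Poisson model assumes the variance equals the mean, so a value well above 1 suggests overdispersion and that a Negative Binomial model may fit better. The function name `pearson_dispersion` and both lists of counts are made up for illustration; they are not real NFL data, and a full analysis would compute the statistic from the fitted values of a model with predictors.

```python
from statistics import mean

def pearson_dispersion(counts):
    """Pearson chi-square divided by residual degrees of freedom
    for an intercept-only Poisson model (fitted mean = sample mean)."""
    n = len(counts)
    mu = mean(counts)  # maximum-likelihood estimate of the Poisson mean
    if n < 2 or mu <= 0:
        raise ValueError("need at least two observations and a positive mean")
    chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return chi2 / (n - 1)

# Made-up per-game counts with the same mean (4) but different spread.
steady = [2, 4, 6, 4, 3, 5, 2, 6]
bursty = [0, 0, 1, 12, 0, 9, 1, 9]

print(round(pearson_dispersion(steady), 2))  # about 0.64: no sign of overdispersion
print(round(pearson_dispersion(bursty), 2))  # about 6.43: variance far exceeds the mean
```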

**Extra Resources**
- [nfl_data_py Documentation](https://pypi.org/project/nfl-data-py/)
- [nflfastR Documentation](https://www.nflfastr.com/)
- [nflfastR Data Dictionary](https://www.nflfastr.com/articles/field_descriptions.html)
- [Generalized Linear Models in Python: A Comprehensive Guide](https://statisticseasily.com/generalized-linear-models-in-python/)
- [GLM guide in R](https://albert-rapp.de/posts/14_glms/14_glms)

**Potential Areas to Explore**
- Predicting Touchdown Probability
  - Estimate the probability of a play resulting in a touchdown based on game and play characteristics.
  - Model: Logistic Regression.
- Estimating Player Performance
  - Predict the number of tackles a defensive player makes in a game (count data).
  - Model: Poisson Regression (or Negative Binomial Regression if the counts are overdispersed).
- Assessing Penalty Likelihood
  - Predict the probability of a penalty occurring on a play.
  - Model: Logistic Regression.
- Evaluating Defensive Sack Rates
  - Analyze factors that contribute to the likelihood of a quarterback sack.
  - Model: Logistic Regression.
- Modeling Over/Under Betting Outcomes
  - Predict whether the total game score will be over or under the bookmaker's line.
  - Model: Logistic Regression.
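
Several of the ideas above come down to logistic regression on a binary outcome. The sketch below fits a one-predictor logistic model with Newton's method in plain Python, purely to show what the fitting step does. In practice you would more likely use a library routine such as the `GLM` class in `statsmodels`. The function names and the twelve data points are hypothetical: they are loosely modeled on a distance-to-end-zone predictor and a touchdown indicator, and the values are invented for illustration.

```python
import math

def sigmoid(eta):
    """Inverse logit: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(xs, ys, max_iter=25, tol=1e-10):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)      # Bernoulli variance, used as the weight
            g0 += y - p            # score (gradient) for the intercept
            g1 += (y - p) * x      # score (gradient) for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (X'WX) * step = score by hand.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Invented toy data: distance to the end zone in tens of yards,
# and whether the play scored a touchdown (1) or not (0).
distance = [0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
scored = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]

b0, b1 = fit_logistic(distance, scored)
fitted = [sigmoid(b0 + b1 * x) for x in distance]

print(f"intercept={b0:.3f}  slope={b1:.3f}")
print(f"odds ratio per 10 yards = {math.exp(b1):.3f}")
print(f"P(score) at 1 yard = {fitted[0]:.3f}, at 60 yards = {fitted[-1]:.3f}")
```

A negative slope here means the estimated odds of scoring shrink as the distance grows, and `exp(b1)` is the multiplicative change in those odds per 10 additional yards.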