shijiew97/PGQR

PGQR

This repository contains code implementing the Penalized Generative Quantile Regression (PGQR) model from "Generative Quantile Regression with Variability Penalty" by Shijie Wang, Minsuk Shin, and Ray Bai. https://arxiv.org/abs/2301.03661

Abstract

We introduce a deep learning generative model for joint quantile estimation called Penalized Generative Quantile Regression (PGQR). Our approach simultaneously generates samples from many random quantile levels, allowing us to infer the conditional distribution of a response variable given a set of covariates. Our method employs a novel variability penalty to avoid the problem of vanishing variability, or memorization, in deep generative models. Further, we introduce a new family of partial monotonic neural networks (PMNN) to circumvent the problem of crossing quantile curves. A major benefit of PGQR is that it can be fit using a single optimization, thus bypassing the need to repeatedly train the model at multiple quantile levels or use computationally expensive cross-validation to tune the penalty parameter. We illustrate the efficacy of PGQR through extensive simulation studies and analysis of real datasets.

Prerequisites for PGQR

To run the PGQR model successfully, the following environments must be installed and verified on your local machine. Several R packages also need to be installed beforehand.

A. Python, PyTorch and CUDA environment

The main code for implementing PGQR is written in Python, while the simulation and data generation code is written in R. The partial monotonic neural network (PMNN) is built with the PyTorch library. We strongly recommend training PGQR with CUDA (GPU support), which is considerably faster than training on a CPU.
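Before training, it can help to confirm whether PyTorch actually sees a GPU. The snippet below is a minimal sketch, not part of the repository: the helper name `pick_device` is hypothetical, and it falls back to the CPU when PyTorch is missing or no CUDA device is available.

```python
def pick_device():
    """Return "cuda" when PyTorch sees a usable GPU, otherwise "cpu"."""
    try:
        import torch  # PyTorch must be installed for the GPU check
    except ImportError:
        # Without PyTorch nothing can be trained; "cpu" is only a placeholder here.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Training device:", pick_device())
```

If this reports "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build or the CUDA driver is the usual place to look.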

B. Required R packages

In R, the reticulate package is needed to call the Python implementation of PGQR from R. For comparison, we also consider several traditional conditional density estimation (CDE) methods, including

  • Random Forest CDE (RFCDE)
  • Nearest Neighbor Conditional Density Estimation (NNKCDE)
  • FlexCoDE; installation instructions can be found at FlexCoDE.

The motorcycle dataset is included in the adlift package, and nonparametric quantile regression is implemented using the R package quantreg.

install.packages("reticulate")
install.packages("RFCDE")
install.packages("NNKCDE")
install.packages("HDInterval")
install.packages("adlift")
install.packages("quantreg")

Implementation of PGQR

To implement PGQR, we provide the Python code for PGQR in the Python_code folder and the R code for the simulations in the R_code folder. More detailed explanations are provided below.

Working directory

To run the PGQR model, save the results, and produce the plots, the working directory must be set beforehand. Every R code file requires the working directory (for example "/yourlocalmachine/") to be set manually, otherwise the PGQR model will not run successfully.

  • Create a Python code folder: "/yourlocalmachine/Python_code/", which should include the same Python files as the 'Github/Python_code/' folder.
  • Create an R code folder: "/yourlocalmachine/R_code/", which should include the same R files as the 'Github/R_code/' folder.
  • Create a result folder: "/yourlocalmachine/result/"
  • Create two subfolders in the result folder: "/yourlocalmachine/result/2000/" for simulation studies and "/yourlocalmachine/result/real/" for real data analysis.
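The folder layout listed above can also be created programmatically. The following is a small sketch using only the Python standard library; the function name `make_pgqr_layout` is hypothetical and not part of the repository, and the code files still have to be copied into Python_code and R_code by hand.

```python
from pathlib import Path


def make_pgqr_layout(root):
    """Create the expected folders under `root` and return their paths."""
    root = Path(root)
    folders = [
        root / "Python_code",       # Python scripts copied from the repository
        root / "R_code",            # R scripts copied from the repository
        root / "result" / "2000",   # simulation results
        root / "result" / "real",   # real data results
    ]
    for folder in folders:
        # parents=True also creates "result"; exist_ok=True makes reruns harmless
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `make_pgqr_layout("/yourlocalmachine")` would create the four folders under that placeholder path; replace it with the actual working directory.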

Python code folder

Under the Python_code folder, we have the following scripts:

  • QR_pen_m.py constructs the main body of Penalized Generative Quantile Regression (PGQR).
  • QR_nopen_m.py constructs Generative Quantile Regression (GQR) without the regularization term.
  • Cond_WGAN.py constructs the Wasserstein generative conditional sampler (WGCS).
  • CondGAN_MS.py constructs the generative conditional distribution sampler (GCDS).

R_code folder

As noted above, the working directory (for example "/yourlocalmachine/") must be set manually in every R code file for the PGQR model to run successfully. Under the R_code folder, we provide code for training PGQR, saving the results in .RData format, plotting the graphs presented in the paper, and constructing the results tables for the simulation studies and real data analyses in Section 6 of the manuscript.

A. PGQR Simulation Train

  • PGQR.R trains the PGQR model under different simulation settings (see the code annotations and the descriptions in the paper). The results are saved under the path "/result/2000", where "2000" is the corresponding sample size.
  • data_gen.R generates the simulation datasets. It should be placed in the "/R_code/" directory.
  • model_fit.R runs the Python code for the deep generative models such as PGQR, GCDS or WGCS.

B. PGQR Simulation Graph

  • graph.R plots the graphs from the saved results (in .RData format). The resulting graphs are saved in "/yourlocalmachine/result/2000/".

C. PGQR Simulation table

  • table_compute.R computes the simulation tables in the paper and saves the corresponding results.
  • table_eval.R evaluates the performance measures described in the paper from the results saved by table_compute.R and summarizes them in table form.
  • quantile_plot.R plots the predicted mean squared error (PMSE) at different quantile levels, evaluated from the output of table_compute.R.
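To make the PMSE comparison concrete, here is a minimal sketch of a per-quantile error computation. It assumes PMSE at level tau is the average squared difference between the estimated and the true conditional quantiles over the test points; the paper has the exact definition. The function name and the toy numbers are invented for illustration and are not taken from the repository.

```python
def pmse_by_quantile(predicted, truth):
    """Mean squared error between predicted and true quantiles, per level.

    Both arguments map a quantile level tau to a list of values,
    one value per test point.
    """
    result = {}
    for tau, pred_values in predicted.items():
        true_values = truth[tau]
        if len(pred_values) != len(true_values):
            raise ValueError(f"length mismatch at tau={tau}")
        squared = [(p - t) ** 2 for p, t in zip(pred_values, true_values)]
        result[tau] = sum(squared) / len(squared)
    return result


# Toy numbers, invented purely for illustration.
predicted = {0.1: [0.9, 2.1], 0.5: [2.0, 3.0], 0.9: [3.5, 4.5]}
truth = {0.1: [1.0, 2.0], 0.5: [2.0, 3.0], 0.9: [3.0, 4.0]}
print(pmse_by_quantile(predicted, truth))
```

Plotting the returned values against the quantile levels shows where a method loses accuracy, for example in the tails.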

D. Real data analysis

  • real_fit.R fits PGQR to three real datasets, as well as two classic crossing-quantile benchmark datasets, and saves the results under "/yourlocalmachine/real/".
  • real.table.R evaluates the out-of-sample prediction interval width and coverage rate from the results produced by real_fit.R.
  • cross_quantile.R plots the crossing quantile phenomenon in the motorcycle and bone mass density datasets from the results produced by real_fit.R.
  • APL_plot.R produces the APL/F ratio plot from the results produced by real_fit.R.
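The two out-of-sample measures named above, prediction interval width and coverage rate, can be sketched as follows. This is an illustrative stand-alone version with a hypothetical function name and made-up numbers; it is not the code in real.table.R.

```python
def interval_summary(y_true, lower, upper):
    """Return (coverage rate, average width) of prediction intervals."""
    if not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    n = len(y_true)
    # An observation is covered when it lies inside its interval (bounds included).
    covered = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    total_width = sum(hi - lo for lo, hi in zip(lower, upper))
    return covered / n, total_width / n


# Made-up held-out responses and interval bounds.
y_true = [1.0, 2.5, 4.0, 0.5]
lower = [0.0, 2.0, 4.5, 0.0]
upper = [2.0, 3.0, 5.5, 1.0]
coverage, avg_width = interval_summary(y_true, lower, upper)
print(coverage, avg_width)
```

A well-calibrated 95% interval should have coverage close to 0.95 on held-out data; among methods with similar coverage, narrower intervals are preferable.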

E. Takeuchi's Example

  • Takeuchi_fit.R fits PGQR to the illustrative example 1 from "Nonparametric Quantile Estimation (Takeuchi et al., 2006)".
  • Takeuchi_plot.R plots the results from Takeuchi_fit.R.

F. Quantile comparison

  • quantile_compare.R compares PGQR with MCQRNN and NMQN under Simulations 1 to 6; the script also runs NMQN and MCQRNN.
  • quantile_summary.R computes and plots the performance comparison in terms of total variation distance (TV) and Hellinger distance (HD).
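For reference, the two distances can be sketched for discrete distributions as below. This assumes the estimated and true conditional densities have been discretized into bin probabilities on a common grid, which is an assumption of this example; quantile_summary.R may compute the distances differently.

```python
import math


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * s)


# Two made-up probability vectors over the same three bins.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]
print(total_variation(p, q), hellinger(p, q))
```

Both distances lie between 0 and 1, with 0 for identical distributions and 1 for distributions with disjoint support.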
