
Commit

new dataset - examples for classification
pbiecek committed Aug 3, 2018
1 parent 4972bbf commit d9b6af9
Showing 6 changed files with 91 additions and 1 deletion.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: DALEX
Title: Descriptive mAchine Learning EXplanations
-Version: 0.2.3
+Version: 0.2.4
Authors@R: person("Przemyslaw", "Biecek", email = "przemyslaw.biecek@gmail.com", role = c("aut", "cre"))
Description: Machine Learning (ML) models are widely used and have various applications in classification
or regression. Models created with boosting, bagging, stacking or similar techniques are often
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
DALEX 0.2.4
----------------------------------------------------------------
* New datasets `HR` and `HRTest`. The target variable is a factor with three levels. They are used in examples for classification.

DALEX 0.2.3
----------------------------------------------------------------
* Small fixes in `variable_response()` to better support `gbm` models (c8393120ffb05e2f3c70b0143c4e92dc91f6c823).
54 changes: 54 additions & 0 deletions R/HR.R
@@ -0,0 +1,54 @@
#' Human Resources Data
#'
#' Datasets \code{HR} and \code{HRTest} are artificial, generated from the same model.
#' The structure of the dataset is based on real data from a Human Resources department, with
#' information on which employees were promoted and which were fired.
#'
#' Values are generated so as to have:
#' \itemize{
#' \item an interaction between age and gender for the 'fired' status,
#' \item a non-monotonic relation for the salary variable,
#' \item linear effects for hours and evaluation.
#' }
#'
#' \itemize{
#' \item gender - gender of an employee.
#' \item age - age of an employee at the moment of evaluation.
#' \item hours - average number of working hours per week.
#' \item evaluation - evaluation on a scale from 2 (bad) to 5 (very good).
#' \item salary - salary level on a scale from 0 (lowest) to 5 (highest).
#' \item status - target variable, one of \code{fired}, \code{promoted} or \code{ok}.
#' }
#'
#' @aliases HRTest
#' @docType data
#' @keywords HR
#' @name HR
#' @usage data(HR)
#' @format a data frame with 10000 rows and 6 columns
NULL


# Simulation code producing HR (kept as comments, so it is not run):
# N <- 10000
# set.seed(1313)
#
# gender <- rbinom(N, size = 1, prob = 0.5)
# age <- runif(N, 20, 60)
# hours <- 35 + 45*runif(N, 0, 1)^2
# evaluation <- floor(runif(N, 0, 4)) + 2
# salary <- floor(runif(N, 0, 6))
#
#
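# # latent score for 'fired': age x gender interaction, U-shaped in salary, lower when hours > 45
# score1 <- 2*(gender - 0.5)*(age-40)/15 + 0.35*(salary - 2.5)^2 - 1.6*(hours > 45)
# # latent score for 'promoted': step in evaluation, linear in hours
# score2 <- 2*(evaluation > 3.5) + (hours-50)/15
#
# # Bernoulli draws through a probit link on the centred scores
# y1 <- runif(N) < pnorm(score1 - mean(score1))
# y2 <- runif(N) < pnorm(score2 - mean(score2))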
#
# HR <- data.frame(gender = factor(ifelse(gender == 0, "female", "male")),
# age, hours, evaluation, salary,
# status = factor(ifelse(y1 == 1, "fired",
# ifelse(y2 == 1, "promoted",
# "ok"))),
# y1 = factor(y1),
# y2 = factor(y2))
# # drop employees drawn as both fired and promoted; keep the first six columns
# HR <- HR[!(y1&y2),1:6]
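For reference, the commented simulation code above corresponds to the model below. This is a transcription of that code, not a separate specification. Here $g$ is 1 for male and 0 for female, $a$ is age, $h$ is hours, $e$ is evaluation, $s$ is salary, $\Phi$ is the standard normal CDF, and $\bar z_1$, $\bar z_2$ are the sample means of the two scores:

```latex
\begin{aligned}
z_1 &= 2\left(g - \tfrac{1}{2}\right)\frac{a - 40}{15} + 0.35\,(s - 2.5)^2 - 1.6\,\mathbb{1}[h > 45] \\
z_2 &= 2\,\mathbb{1}[e > 3.5] + \frac{h - 50}{15} \\
P(y_1 = 1) &= \Phi(z_1 - \bar{z}_1), \qquad P(y_2 = 1) = \Phi(z_2 - \bar{z}_2)
\end{aligned}
```

Because $2(g - \tfrac{1}{2})$ is $+1$ for men and $-1$ for women, the age term $(a - 40)/15$ raises the 'fired' score with age for men and lowers it for women, which is the age × gender interaction; the quadratic salary term gives the non-monotonic effect. The status is `fired` if $y_1 = 1$, `promoted` if $y_2 = 1$ and $y_1 = 0$, and `ok` otherwise; rows with $y_1 = y_2 = 1$ are removed.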

Binary file added data/HR.rda
Binary file added data/HRTest.rda
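The commented simulation code in `R/HR.R` can be turned into a runnable check of the properties listed in the roxygen header, together with a small three-class classification example of the kind the commit message mentions. This is a sketch under stated assumptions. `simulate_hr()` is a hypothetical helper, not part of DALEX, that re-runs the commented code; whether it reproduces `data/HR.rda` exactly has not been verified here. The held-out set is only a stand-in for `HRTest`, whose seed and size are not given in this commit. The classifier is `nnet::multinom()` (from a recommended package shipped with standard R), chosen for illustration and not taken from DALEX's own examples.

```r
# Sketch only: simulate_hr() is a hypothetical helper that re-runs the commented
# simulation code from R/HR.R; it is not a DALEX function.
library(nnet)  # multinom() fits a multinomial logistic regression

simulate_hr <- function(N = 10000, seed = 1313) {
  set.seed(seed)
  gender     <- rbinom(N, size = 1, prob = 0.5)
  age        <- runif(N, 20, 60)
  hours      <- 35 + 45 * runif(N, 0, 1)^2
  evaluation <- floor(runif(N, 0, 4)) + 2
  salary     <- floor(runif(N, 0, 6))

  # latent scores: score1 drives 'fired', score2 drives 'promoted'
  score1 <- 2 * (gender - 0.5) * (age - 40) / 15 +
    0.35 * (salary - 2.5)^2 - 1.6 * (hours > 45)
  score2 <- 2 * (evaluation > 3.5) + (hours - 50) / 15

  # Bernoulli draws through a probit link on the centred scores
  y1 <- runif(N) < pnorm(score1 - mean(score1))
  y2 <- runif(N) < pnorm(score2 - mean(score2))

  hr <- data.frame(
    gender = factor(ifelse(gender == 0, "female", "male")),
    age, hours, evaluation, salary,
    status = factor(ifelse(y1, "fired", ifelse(y2, "promoted", "ok")))
  )
  # drop employees drawn as both fired and promoted
  hr[!(y1 & y2), ]
}

hr <- simulate_hr()
str(hr)

# age x gender interaction: share of fired employees by gender and age band
fired    <- hr$status == "fired"
age_band <- cut(hr$age, breaks = c(20, 30, 40, 50, 60), include.lowest = TRUE)
rates    <- tapply(fired, list(hr$gender, age_band), mean)
print(round(rates, 2))

# non-monotonic salary effect: share of fired employees by salary level
salary_rates <- tapply(fired, hr$salary, mean)
print(round(salary_rates, 2))

# three-class classifier; hr_test is only a stand-in for HRTest
hr_test <- simulate_hr(N = 2000, seed = 2018)
fit <- multinom(status ~ gender * age + salary + I(salary^2) + hours + evaluation,
                data = hr, trace = FALSE, maxit = 500)
pred  <- predict(fit, newdata = hr_test, type = "class")
probs <- predict(fit, newdata = hr_test, type = "probs")  # one column per class
acc   <- mean(as.character(pred) == as.character(hr_test$status))
print(acc)
```

Any other classifier that accepts a three-level factor response can replace `multinom()` here; the tabulations of `rates` and `salary_rates` are informal checks of the documented effects, not formal tests.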
32 changes: 32 additions & 0 deletions man/HR.Rd

