Distinguish between discrete and continuous variables #7

tfaits · 2016-07-19T14:23:28Z

In its current form, PathoStat accepts "batch" and "condition" as possible discrete variables and lets the user color/group data (in various plots) by either of them. However, we're adding functionality: PathoStat will accept any number of covariates, such as patient age, weight, race, or disease status. We still want to let users color/group data by these covariates, but that doesn't make much sense for continuous variables. Without binning, how do you group people by weight? You can, however, order data by continuous variables. At a minimum, we want to distinguish between the two types, and we may want to add functionality specific to continuous variables.

mlbendall · 2016-07-28T21:53:02Z

I agree with this; I'm running up against the same issue now. If you just want the types as currently assigned, you can do this:

sapply(sample_variables(pstat), function(v) { class(sample_data(pstat)[[v]])[1] })

However, I think we need to be explicit about assigning types to sample variables. We should implement a function that lets the user assign types or, failing that, infers them from the data. Inference will not be 100% accurate. For example, if subject IDs are numbers, R (read.table or similar) reads a "Subject ID" column as integer, but it should be a factor, since there is no meaningful ordering of subjects. Still, inferring from the data would be a good first step.
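
As a first step toward the inference function described above, here is a minimal sketch. `infer_var_types()`, its `id_pattern` and `max_levels` arguments, and all of its heuristics are hypothetical assumptions, not PathoStat code: columns whose name looks like an ID become factors (covering the "Subject ID" case), short lists of repeated strings become factors, and other numbers are treated as continuous. The result is meant as a default that the user can override.

```r
# Hypothetical helper (not part of PathoStat): guess a type label for each
# sample variable. The heuristics are assumptions and will sometimes be wrong,
# so the result should be treated as a default the user can override.
infer_var_types <- function(df,
                            id_pattern = "(^|[^[:alpha:]])id($|[^[:alpha:]])",
                            max_levels = 10) {
  vapply(names(df), function(v) {
    x <- df[[v]]
    n_distinct <- length(unique(x[!is.na(x)]))
    if (is.ordered(x)) return("ordered")                 # keep explicit ordinal types
    if (is.factor(x)) return("factor")                   # keep explicit nominal types
    if (grepl(id_pattern, tolower(v))) return("factor")  # IDs have no ordering
    if (is.logical(x)) return("factor")
    if (is.character(x)) {
      # few distinct strings: categorical; many: free text for display only
      return(if (n_distinct <= max_levels) "factor" else "character")
    }
    if (is.integer(x)) return("integer")
    if (is.numeric(x)) return("numeric")
    "character"                                          # fallback for other classes
  }, character(1))
}

# Toy sample table; read.table() would also give Subject.ID an integer type
meta <- data.frame(
  Subject.ID = c(101L, 102L, 103L, 104L),
  age        = c(34L, 51L, 29L, 62L),
  weight     = c(70.2, 81.5, 64.0, 90.3),
  disease    = c("case", "control", "case", "control"),
  notes      = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
infer_var_types(meta, max_levels = 3)
```

With `max_levels = 3`, this labels `Subject.ID` and `disease` as factors, `age` as integer, `weight` as numeric, and `notes` as character.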

I propose we have more than two types, following the standard R data types:

factors: categorical/nominal variables
ordered factors: ordinal variables, useful for representing longitudinal variables and discretized continuous variables
integer: continuous type
numeric/double: continuous type
character: text that does not need to be treated as a variable, mostly for display purposes.
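
To make the list concrete, here is a minimal sketch of applying user-assigned type labels to a sample data frame. `apply_var_types()` and its `n_bins` argument are hypothetical names, and binning with `cut()` into equal-width intervals is just one assumed way to turn a continuous variable into an ordered factor.

```r
# Hypothetical helper (not part of PathoStat): coerce columns to user-assigned
# type labels ("factor", "ordered", "integer", "numeric", "character").
apply_var_types <- function(df, types, n_bins = 3) {
  for (v in names(types)) {
    x <- df[[v]]
    df[[v]] <- switch(types[[v]],
      factor  = factor(x),
      ordered = if (is.numeric(x)) {
        # discretize a continuous variable into equal-width ordered bins
        cut(x, breaks = n_bins, ordered_result = TRUE)
      } else {
        # level order defaults to sorted order; pass levels explicitly if needed
        factor(x, ordered = TRUE)
      },
      integer   = as.integer(x),
      numeric   = as.numeric(x),
      character = as.character(x),
      stop("unknown type for ", v, ": ", types[[v]])
    )
  }
  df
}

meta <- data.frame(
  Subject.ID = c(101L, 102L, 103L, 104L),
  weight     = c(64.0, 70.2, 81.5, 90.3),
  visit      = c("week0", "week4", "week0", "week4"),
  stringsAsFactors = FALSE
)
typed <- apply_var_types(meta,
  c(Subject.ID = "factor", weight = "ordered", visit = "ordered"))
str(typed)
```

Equal-width bins are easy to explain but can leave some bins empty; quantile-based breaks would be a reasonable alternative.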

Each type naturally suggests how to display it. For example, factors can be displayed using "select" inputs and qualitative color palettes, while ordered factors may also use "select" inputs but be displayed with sequential color palettes.
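
A minimal sketch of this type-to-palette mapping, assuming base R's `hcl.colors()` (R 3.6 or later) as the color source. `palette_for()` is a hypothetical helper, and the palette names `"Set 2"` (qualitative) and `"viridis"` (sequential) are arbitrary choices, not anything PathoStat uses. Giving continuous values the same sequential palette is one option among several.

```r
# Hypothetical helper (not part of PathoStat): pick colors from a variable's type.
# Unordered factor -> qualitative palette; ordered factor or numeric -> sequential.
palette_for <- function(x) {
  if (is.ordered(x)) {
    pal <- hcl.colors(nlevels(x), palette = "viridis")
    names(pal) <- levels(x)
    pal
  } else if (is.factor(x)) {
    pal <- hcl.colors(nlevels(x), palette = "Set 2")
    names(pal) <- levels(x)
    pal
  } else if (is.numeric(x)) {
    # continuous: map each value onto a 100-step sequential ramp
    ramp <- hcl.colors(100, palette = "viridis")
    rng <- range(x, na.rm = TRUE)
    idx <- 1 + floor(99 * (x - rng[1]) / max(diff(rng), .Machine$double.eps))
    ramp[idx]
  } else {
    stop("no palette rule for class ", class(x)[1])
  }
}

palette_for(factor(c("case", "control", "case")))  # one color per level
palette_for(c(64.0, 70.2, 90.3))                    # one color per value
```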

In addition, users should be able to indicate which covariates are "of interest". Perhaps there should be several other categories as well, such as secondary variables/confounders, batch covariates, and random effects.
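
One way to record both the assigned type and the role of each covariate is a small per-variable table kept next to the sample data. This is a hypothetical sketch: `var_info`, its columns, and the role labels are assumptions based on the categories mentioned above, not an existing PathoStat structure.

```r
# Hypothetical per-variable metadata table (not an existing PathoStat object).
var_info <- data.frame(
  variable = c("condition", "age", "batch", "Subject.ID"),
  type     = c("factor", "integer", "factor", "factor"),
  role     = c("interest", "confounder", "batch", "random_effect"),
  stringsAsFactors = FALSE
)
# Fix the set of allowed roles; any value outside these levels becomes NA
var_info$role <- factor(var_info$role,
  levels = c("interest", "confounder", "batch", "random_effect", "none"))

# Variables a plot could offer for coloring/grouping: discrete types only,
# since continuous variables would need binning first.
groupable <- var_info$variable[var_info$type %in% c("factor", "ordered")]

# Variables to adjust for in a model, e.g. confounders and batch covariates.
covariates <- var_info$variable[var_info$role %in% c("confounder", "batch")]

groupable
covariates
```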
