Code quit working #138

Open
morgan-j-black opened this issue Mar 28, 2022 · 5 comments

Comments

@morgan-j-black

Hi there,
I am not sure whether this issue is related to #135, but I have been using the same code and data for many weeks, and a few days ago it quit working and produced an inexplicable error. I am using the most recent versions of R and RStudio and I have re-installed Hmsc from GitHub. Any suggestions would be much appreciated. Here is the call, the error, and the structure of the data the error pertains to.

model = Hmsc(Y = Y,
             XData = XData, XFormula = ~ trt + season,
             studyDesign = data.frame(Site = RE),
             ranLevels = list(Site = HmscRandomLevel(units = levels(RE))),
             distr = "poisson")

Error in Hmsc(Y = Y, XData = XData, XFormula = ~CG + season, studyDesign = data.frame(Site = RE), :
all XData variables must be numeric or factors

str(XData)
'data.frame': 46 obs. of 14 variables:
$ sample : Factor w/ 46 levels "08RF12016","08RF12017",..: 1 2 3 4 5 6 7 8 9 10 ...
$ site : Factor w/ 18 levels "08RF","11CG",..: 1 1 1 1 2 2 2 2 3 3 ...
$ CG : int 0 0 0 0 1 1 1 1 0 0 ...
$ season : Factor w/ 2 levels "Spring","Summer": 1 1 2 2 1 1 2 2 1 1 ...
$ year : int 2016 2017 2016 2017 2016 2017 2016 2017 2016 2017 ...
$ trt : Factor w/ 2 levels "aRF","CG": 1 1 1 1 2 2 2 2 1 1 ...
$ bay : Factor w/ 3 levels "Calvert","Kanish",..: 2 2 2 2 2 2 2 2 2 2 ...
$ island : Factor w/ 2 levels "Calvert","Quadra": 2 2 2 2 2 2 2 2 2 2 ...
$ lat : num 50.3 50.3 50.3 50.3 50.2 ...
$ long : num -125 -125 -125 -125 -125 ...
$ Richness : num [1:46, 1] -1.935 -0.935 -5.935 -4.935 7.065 ...
$ Complexity : num [1:46, 1] -11.13 -11.13 -11.13 -11.13 -6.41 ...
$ Dist.Refuge: num [1:46, 1] -1.37 -1.37 -1.37 -1.37 3.13 ...
$ sdlevel : Factor w/ 2 levels "high","low": 2 2 2 2 2 2 2 2 2 2 ...

@jarioksa
Collaborator

Again a tibble! That tibble brings so much pain.

Richness, Complexity and Dist.Refuge are not plain variables (vectors) but single-column matrices. Please change them to vectors. Currently str() gives their dimensions as [1:46, 1], but it should only read Richness: num -1.935 -0.935 ... without those dimensions.
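A minimal sketch of how such columns might be found and flattened, assuming a toy data frame (the values and the name `richAll` are illustrative, not the reporter's data). It only converts single-column matrices; wider matrix columns are left alone.

```r
# Toy data standing in for XData; values are made up for illustration.
XData <- data.frame(trt = factor(c("aRF", "CG", "aRF", "CG")),
                    richAll = c(4, 5, 0, 1))
XData$Richness <- scale(XData$richAll)  # a one-column matrix sneaks in here

# Flag every column that carries a dim attribute (i.e. is a matrix).
is_mat <- vapply(XData, is.matrix, logical(1))
print(names(XData)[is_mat])

# Flatten only the single-column matrices back to plain vectors.
one_col <- is_mat & vapply(XData, function(col) NCOL(col) == 1L, logical(1))
XData[one_col] <- lapply(XData[one_col], as.vector)
str(XData)
```

After this, str() should list Richness as a plain num column without the bracketed dimensions.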

See issue #65 which is a duplicate of this.

@morgan-j-black
Author

morgan-j-black commented Mar 28, 2022 via email

@morgan-j-black
Author

Just a quick follow-up... in this instance, the matrices that should have been vectors were generated using the scale function in base R. I had been avoiding tibbles, as I know they can be problematic in general.

dat$Richness = scale(dat$richAll) # this code created the matrices in the dataframe.

@brendanf

@morgan-j-black You are right, base scale() always returns a matrix, even if its argument is a vector, and matrix columns are allowed in base data.frames. Single-column matrices are converted to a vector by the data.frame() constructor function, but this does not happen when they are assigned into an existing data.frame. Try dat$Richness = c(scale(dat$richAll)) to remove the dimension attribute.
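The three cases described above can be compared side by side. This is a small sketch on made-up numbers; `dat` and `richAll` mirror the names used in this thread, and the other names are illustrative.

```r
# Toy data; the values are invented for illustration.
dat <- data.frame(richAll = c(3, 5, 8, 13))

# 1. Assigning scale() output into an existing data frame keeps the matrix.
dat$Richness <- scale(dat$richAll)
kept_matrix <- is.matrix(dat$Richness)

# 2. Wrapping in c() drops the dim attribute, leaving a plain vector.
dat$Richness <- c(scale(dat$richAll))
now_vector <- is.null(dim(dat$Richness))

# 3. Passing the matrix to the data.frame() constructor also yields a vector.
fresh <- data.frame(Richness = scale(dat$richAll))
from_constructor <- is.matrix(fresh$Richness)

print(c(kept_matrix = kept_matrix,
        now_vector = now_vector,
        from_constructor = from_constructor))
```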

(As an aside, as recently as 2018, tibble was more restrictive than base data.frame, and did not allow matrix columns by assignment.)

@jarioksa
Collaborator

I was too hasty in blaming tibble, which is where we have seen this problem earlier. Actually, it is possible to have even more pathological data frames, such as with poly(x, 2), which adds one variable that is a two-column matrix.
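A short illustration of the poly() case, on a made-up data frame `d` (the names are illustrative): the data frame reports two variables, yet one of them is itself a two-column matrix.

```r
# Toy data frame; names and values are for illustration only.
d <- data.frame(x = 1:10)

# Assigning poly() output stores one variable that is a 10 x 2 matrix.
d$p <- poly(d$x, 2)

print(ncol(d))        # number of variables in the data frame
print(dim(d$p))       # dimensions of the single variable p
print(is.matrix(d$p))
```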

We (or probably I) added the test against matrix entries after issue #65. The basic Hmsc and sampleMcmc commands accept data frames with matrix entries, but then some posterior analysis tools fail for reasons that lie outside the Hmsc package (that is, we call functions in other packages such as base and stats, and these fail). So we decided it is better to catch these cases before the sampleMcmc run, and not weeks later when you have finished sampling and try to do something with the result. The change was quick and dirty. At a minimum, we need to improve the error reporting. I don't have an instant idea for automatically removing the matrix entries from the data frames stored in the result object. This really concerns only those auxiliary methods that need access to the original data frame (most of the analytic Hmsc tools do not need the data frame; they operate only on the model matrix, which is OK with matrix variables).
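For users who want to catch this before calling Hmsc, a pre-flight check could look roughly like the following. This is only a sketch: `check_xdata` is a hypothetical helper, not a function in Hmsc, and it does not reproduce the package's actual test.

```r
# Hypothetical helper (NOT part of Hmsc): fail early and name the offenders.
check_xdata <- function(XData) {
  # A column passes if it is numeric or a factor AND has no dim attribute.
  ok <- vapply(XData,
               function(col) (is.numeric(col) || is.factor(col)) && is.null(dim(col)),
               logical(1))
  if (!all(ok)) {
    stop("XData variables must be numeric or factor vectors; check: ",
         paste(names(XData)[!ok], collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Toy data with one matrix column created by scale(); values are made up.
XData <- data.frame(season = factor(c("Spring", "Summer", "Spring")),
                    richAll = c(3, 5, 8))
XData$Richness <- scale(XData$richAll)

# Capture the error message instead of aborting the script.
msg <- tryCatch(check_xdata(XData), error = function(e) conditionMessage(e))
print(msg)
```

Naming the offending columns in the message is the point of the sketch; the exact wording of the error is arbitrary.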

jarioksa added a commit that referenced this issue Mar 29, 2022
Matrix-variables in data.frames can occur, but are difficult to
detect, and our reporting was really cryptic: now we at least
report names of those variables that we do not accept. See
issues #67 and #138.