
restrict map_rect to legal data-only args #441

Open
bob-carpenter opened this issue Jan 23, 2020 · 7 comments
Labels
bug Something isn't working feature New feature or request

Comments

@bob-carpenter
Contributor

The following program should not compile because xi[i] is not an entirely data-only expression: the value xi is a data variable, but the index i is a local variable. As a result, illegal programs like this one currently get compiled:

functions {
  vector f(vector beta, vector theta, data real[] xr, data int[] xi ) {
    return [1.0]';
  }
}
transformed data {
  vector[2] beta;
  vector[2] theta[2];
  real xr[2, 2];
  int xi[2, 2, 2];
}
parameters {
  real y;
}
transformed parameters {
  real s = 0;
  for (i in 1:2) {
    // BAD!  DATA CHANGING BETWEEN CALLS
    s += sum(map_rect(f, beta, theta, xr, xi[i]));
  }
}

Such programs will produce different results with MPI turned on versus turned off, which we never want to happen.
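A legal variant would take the slice in transformed data, so that every argument passed to map_rect is itself fully data. This is only a sketch and not compiler-checked; xi_slice is a hypothetical name:

```stan
functions {
  vector f(vector beta, vector theta, data real[] xr, data int[] xi) {
    return [1.0]';
  }
}
transformed data {
  vector[2] beta;
  vector[2] theta[2];
  real xr[2, 2];
  int xi[2, 2, 2];
  // the slice is computed here, so it is itself data and never changes
  int xi_slice[2, 2] = xi[1];
}
transformed parameters {
  // OK: every data argument is static for the whole program execution
  real s = sum(map_rect(f, beta, theta, xr, xi_slice));
}
```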

@bob-carpenter bob-carpenter added bug Something isn't working feature New feature or request labels Jan 23, 2020
@rok-cesnovar
Member

Is this for MPI only, or for any map_rect case? Also tagging @wds15

@wds15

wds15 commented Jan 23, 2020

Only MPI needs these additional restrictions.

@wds15

wds15 commented Jan 24, 2020

I just want to point out that our user documentation does correctly state the additional restrictions we need for map_rect when it is used with MPI:

https://mc-stan.org/docs/2_21/stan-users-guide/map-function.html

The thing is that for MPI we call "data" something which never changes throughout program execution, which is different from calling "data" something which we do not autodiff.

The crux of the example above is that when MPI is not used (with or without threading), map_rect uses the data as given in each loop iteration, since no data caching happens. Only if MPI is used (so, something that happens after the parser gets involved) does data caching kick in, under the assumption that the data is static (i.e., defined in data or transformed data).

The GPU caching facilities should be affected by these issues as well.

FYI, during the development of map_rect we decided that hashing the data in order to decide whether it needs updating would be too costly.

The hardest thing about this parser "bug" is not restricting map_rect too much, since the additional requirements really only apply to MPI, which is probably not used much any longer anyway: the Intel TBB is far easier to deploy and run, and there is no longer a speed penalty for threading compared to MPI (that went away with the TLS refactor and the use of the TBB).

So before putting additional constraints on usability, we should consider building into the map_rect MPI functionality a way to detect whether the data-caching assumption holds. The MPI user group is likely a lot smaller than the TBB user group.

@rok-cesnovar
Member

> The GPU caching facilities should be affected by these issues as well.

Yes, the way the GPU caching was done initially was affected by the same issue. But we removed that after a month or so, once Sean pointed out this problem.

In order to leave data on the GPU and not copy it on each iteration, we then modified stanc3. Stanc3 now copies all the data or transformed data variables in the init phase of the model (data input). It only copies the variables that are used with GPU-supported functions. But it's easier to do that there, because we have a separate type and we know where data resides based on that type (on the OpenCL device if matrix_cl, in CPU global memory otherwise).

We could add the MPI restrictions to the stanc3 compiler in a similar way as for OpenCL. When you call the compiler with --use-opencl, those checks and moves are performed, so we could also add a --use-mpi flag here. The only problem would be if someone generated the .hpp file without STAN_MPI and then compiled the model with the flag. That could be solved by adding make/local to the .hpp file's dependencies, which would force a rerun of stanc3.

So we could move the data to the MPI nodes in the same way (it's much cleaner than any sort of caching), as long as we have the restriction that only data can be used.

@rok-cesnovar
Member

rok-cesnovar commented Jan 24, 2020

Example:

  • additional matrix_cl typed vars are defined here
  • copy happens here
  • and then used here

For MPI I think we can go with the same idea. If you have an array of N vectors like in this x case, you have N copies and that is it. But again, only data can then be used.

Am I making any sense? :)

@wds15

wds15 commented Jan 24, 2020

Thanks @rok-cesnovar ...it does make sense from what you write. I would need to dig into the material you link...can do that later.

... but in all honesty... I am questioning a little whether it is really worth all this effort for a feature like map_rect MPI. At least I will always have this in my head, given that threading works just fine and does not require all these hoops.

@rok-cesnovar
Member

Yeah, it might not be the biggest priority, but I think it should be fixed. It would also simplify the map_rect MPI code in Math. I am going to have to dig deeper into stanc3 anyway for our OpenCL backend, so I can also tackle this one once I gather enough knowledge.
