
rstan model run time too slow (while CPU and RAM are available) #158

Closed
lcumra opened this issue Apr 25, 2015 · 4 comments

Comments

@lcumra

lcumra commented Apr 25, 2015

Dear all,
I am a newbie to the Rstan world, but I really need it for my thesis. I am using the script and a similar dataset from a researcher at NYU, who reports an estimated run time of about 18 hours. However, when I try to run my model it won't get past 10% in 18 hours. So I'm asking for a little help to understand what I am doing wrong and how to improve efficiency.

I am running a model with 500 iterations, 100 warmup, and 2 chains, with a bernoulli_logit likelihood over 4 parameters, trying to estimate 2 of them via the No-U-Turn sampler (NUTS). My data is a 10,000 × 1,004 matrix of 0s and 1s. In short, the data records which people follow which politicians on Twitter, and I want to estimate their political positions given who they follow. I run the model in RStudio with R x64 3.1.1 on Windows 8 Professional, 64-bit, on an i7 quad core with 16 GB of RAM.
Checking performance, rsession uses no more than 14% CPU and 6 GB of RAM, even though 7 more GB are free. When I subsample to a 10,000 × 250 matrix, it uses below 1.5 GB instead. I have also tried the procedure with a 50 × 50 dataset and it worked just fine, so there is no mistake in the procedure itself.
Rsession opens 8 threads; I see activity on each core, but none is fully occupied.
I wonder why my PC does not work at the best of its capabilities, and whether there might be some bottleneck, cap, or setting that prevents it from doing so.

This is what happens when I compile it:

    Iteration: 1 / 1 [100%]  (Sampling)
    #  Elapsed Time: 0 seconds (Warm-up)
    #                11.451 seconds (Sampling)
    #                11.451 seconds (Total)

SAMPLING FOR MODEL 'stan.code' NOW (CHAIN 2).

Iteration: 1 / 1 [100%]  (Sampling)
#  Elapsed Time: 0 seconds (Warm-up)
#                12.354 seconds (Sampling)
#                12.354 seconds (Total)

When I actually run it, it works for hours but never gets beyond 10% of the first chain (mainly because I interrupted it once my PC was about to melt down):

     Iteration:   1 / 500 [  0%]  (Warmup)

The model estimates the following function:

    y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - gamma * square( theta[jj[n]] - phi[kk[n]] ) );

and is run with these calls
(N being about 10 million):

   stan.model <- stan(model_code=stan.code, data = stan.data, init=inits, iter=1, warmup=0, chains=2)

    ## running model
  stan.fit <- stan(fit=stan.model, data = stan.data, iter=500, warmup=100, chains=2, thin=thin, init=inits)

Please help me find what is slowing down the procedure (and, if nothing wrong is happening, what I can tweak to still get a reasonable result in less time).

I thank you in advance,
ML

@lcumra lcumra changed the title rstan model slow in running rstan model run time too slow (while CPU and RAM are avaiable) Apr 25, 2015
@bob-carpenter

This is something that should be on the users list (which I'm cc-ing),

http://mc-stan.org/groups.html

not on the issue tracker, but we can discuss it here and then close
this issue.

But you need to show us your full model and how you called it.
And have you modified the C++ options? You want to call
set_cppo(mode="fast") to make sure you have optimization turned
on (it's on by default unless something other than Stan modified
your C++ compilation settings for R).

Stan will only use as many cores as you have parallel chains running.
I'm not sure how to run parallel chains on Windows, but someone else
reading this should be able to tell you how to do that.
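
For reference, a minimal sketch of one way to run the two chains in parallel on Windows with the rstan of that era, using the base parallel package; the seed/chain-ID handling here is an assumption on my part, not something confirmed in this thread:

    library(rstan)
    library(parallel)

    ## mclapply does not fork on Windows, so use a PSOCK cluster;
    ## compile once (stan.model), then run each chain in its own worker
    cl <- makeCluster(2)
    clusterExport(cl, c("stan.model", "stan.data", "inits"))
    fits <- parLapply(cl, 1:2, function(chain_id) {
      library(rstan)
      stan(fit = stan.model, data = stan.data, chains = 1,
           chain_id = chain_id, iter = 500, warmup = 100,
           init = list(inits[[chain_id]]))
    })
    stopCluster(cl)

    ## combine the single-chain fits into one stanfit object
    stan.fit <- sflist2stanfit(fits)

This only uses as many cores as there are chains, so it caps out at 2× here; it does not make a single chain faster.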

  • Bob


@lcumra
Author

lcumra commented Apr 26, 2015

Sure, we can move elsewhere, as you prefer. Sorry for writing in the wrong place.

Here's the model:

n.iter <- 500
n.warmup <- 100
thin <- 2 ## this will give up to 200 effective samples for each chain and par

Adjmatrix <- read.csv("D:/TheMatrix/Adjmatrix_1004by10000_20150424.txt", header=FALSE)  
##10.000x1004 matrix of {0, 1} with the relationship "user i follows politician j"
StartPhi <- read.csv("D:/TheMatrix/StartPhi_20150424.txt", header=FALSE)  
##1004 vector of values [-1, 1] that should be a good prior for the Phi I want to estimate

start.phi<-ba<-c(do.call("cbind",StartPhi))
y<-Adjmatrix

J <- dim(y)[1]
K <- dim(y)[2]
N <- J * K
jj <- rep(1:J, times=K)
kk <- rep(1:K, each=J)

stan.data <- list(J=J, K=K, N=N, jj=jj, kk=kk, y=c(as.matrix(y)))

## rest of starting values
colK <- colSums(y)
rowJ <- rowSums(y)
normalize <- function(x){ (x-mean(x))/sd(x) }

inits <- rep(list(list(alpha=normalize(log(colK+0.0001)), 
                   beta=normalize(log(rowJ+0.0001)),
                   theta=rnorm(J), phi=start.phi,mu_beta=0, sigma_beta=1, 
                   gamma=abs(rnorm(1)), mu_phi=0, sigma_phi=1, sigma_alpha=1)),2)
##alpha and beta are the popularity of the politician j and the propensity to follow people of user i;
##phi and theta are the position on the political spectrum of pol j and user i; phi has a prior given by expert surveys
##gamma is just a weight on the importance of political closeness

library(rstan)

stan.code <- '
data {
int<lower=1> J; // number of twitter users
int<lower=1> K; // number of elite twitter accounts
int<lower=1> N; // N = J x K
int<lower=1,upper=J> jj[N]; // twitter user for observation n
int<lower=1,upper=K> kk[N]; // elite account for observation n
int<lower=0,upper=1> y[N]; // dummy if user i follows elite j
}
parameters {
vector[K] alpha;
vector[K] phi;
vector[J] theta;
vector[J] beta;
real mu_beta;
real<lower=0.1> sigma_beta;
real mu_phi;
real<lower=0.1> sigma_phi;
real<lower=0.1> sigma_alpha;
real gamma;
}
model {
alpha ~ normal(0, sigma_alpha);
beta ~ normal(mu_beta, sigma_beta);
phi ~ normal(mu_phi, sigma_phi);
theta ~ normal(0, 1); 
for (n in 1:N)
y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - 
gamma * square( theta[jj[n]] - phi[kk[n]] ) );
}
'

## compiling model
stan.model <- stan(model_code=stan.code, 
data = stan.data, init=inits, iter=1, warmup=0, chains=2)

## running model
stan.fit <- stan(fit=stan.model, data = stan.data, 
iter=n.iter, warmup=n.warmup, chains=2, 
thin=thin, init=inits)

samples <- extract(stan.fit, pars=c("alpha", "phi", "gamma", "mu_beta",
                                "sigma_beta", "sigma_alpha"))

I have not modified any of the options.
I tried launching it again and now it seems to use more RAM, up to 9 GB (but still low on CPU).
If you have experience with similar models, how much time would you consider reasonable to run this? It has been working for 12 hours now and is still at iteration 50/500 of the first chain. What do you suggest to reduce the running time while keeping acceptable results? Subsampling the dataset from 10K × 1,000 down to 10,000 × 250 did not help much in terms of time, and I have not tried to reduce the number of rows, as I was afraid it could give bad results. I use 10K rows because the original study does, but I can subsample if you think it won't compromise the results.
One last silly question: does the time per iteration decline as the procedure goes on, or is the computing time per iteration always the same?

@bob-carpenter

I think the best thing you can do for speed is add priors for
all of your hyperparameters (mu_beta, mu_phi, sigma_alpha, sigma_beta, and sigma_phi)
and for gamma.
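
As a sketch of what that could look like (the specific scales here are illustrative assumptions, not something from this thread), the model block would gain lines such as:

    model {
      // weakly informative priors on the hyperparameters (scales are assumptions)
      mu_beta ~ normal(0, 5);
      mu_phi ~ normal(0, 5);
      sigma_alpha ~ cauchy(0, 2.5);
      sigma_beta ~ cauchy(0, 2.5);
      sigma_phi ~ cauchy(0, 2.5);
      gamma ~ normal(0, 2);
      // ... rest of the model block unchanged
    }

Tightening these keeps the sampler out of the extreme regions of the posterior that make adaptation slow.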

The form of your likelihood

y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]]
- gamma * square( theta[jj[n]] - phi[kk[n]] ) );

can also be problematic because you are going to get strong correlations
among all the parameters (alpha, beta, theta, phi, and gamma). So you might
want to try eliminating some of the terms --- for instance, just alpha and
beta should fit fine. You have alpha centered, so it should be loosely
identified by the prior, though the scale can float with sigma_alpha being
a parameter, which is why you want to put a prior on sigma_alpha.

  • Bob


@lcumra
Author

lcumra commented Apr 26, 2015

First, let me thank you for your help, Bob. I really need it!

Actually, the priors should already be there in the inits (all normals should come from a N(0,1)). Anyway, I have explicitly removed those variables from the model and put in 0 and 1 instead.
I will try using alpha's and beta's priors as constants for the whole model; that should indeed reduce the computation time. What I really need are phi and theta, and I think I can safely use the indegrees as measures of popularity (alpha and beta).

Compiling that, I had:

Iteration: 1 / 1 [100%]  (Sampling)
#  Elapsed Time: 0 seconds (Warm-up)
#                8.247 seconds (Sampling)
#                8.247 seconds (Total)


SAMPLING FOR MODEL 'stan.code' NOW (CHAIN 2).

Iteration: 1 / 1 [100%]  (Sampling)
#  Elapsed Time: 0 seconds (Warm-up)
#                7.623 seconds (Sampling)
#                7.623 seconds (Total)
SAMPLING FOR MODEL 'stan.code' NOW (CHAIN 1).

which is slightly better than what I had when estimating the whole thing (11 seconds).

In terms of warmup and iter, do you think there's room to reduce either of them to speed things up?
Do you think that subsampling and reducing the dimensions of the y matrix could help? I'd like to estimate positions for as many politicians as possible, but I could draft some first results on the top 20%, for instance, or reduce the number of sampled followers from 10K to 5K, if you think this won't hurt the estimates badly.

The model now goes like this:

stan.code <- '
data {
int<lower=1> J; // number of twitter users
int<lower=1> K; // number of elite twitter accounts
int<lower=1> N; // N = J x K
int<lower=1,upper=J> jj[N]; // twitter user for observation n
int<lower=1,upper=K> kk[N]; // elite account for observation n
int<lower=0,upper=1> y[N]; // dummy if user i follows elite j
real alpha[K];
real beta [J];
}
parameters {
vector[K] phi;
vector[J] theta;
real gamma;
}
model {
phi ~ normal(0, 1);
theta ~ normal(0, 1); 
for (n in 1:N)
y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - gamma * square( theta[jj[n]] - phi[kk[n]] ) );
}
'
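
As a side note on speed (my suggestion, not something raised in the thread so far): the likelihood loop above can be rewritten so the sampling statement is vectorized, which shrinks the autodiff expression tree Stan has to build each leapfrog step. A sketch of the model block under the same data and parameter declarations, using the `<-` assignment of the Stan version current at the time:

    model {
    phi ~ normal(0, 1);
    theta ~ normal(0, 1);
    {
    // build the linear predictor once, then one vectorized likelihood statement
    vector[N] eta;
    for (n in 1:N)
    eta[n] <- alpha[kk[n]] + beta[jj[n]] - gamma * square( theta[jj[n]] - phi[kk[n]] );
    y ~ bernoulli_logit(eta);
    }
    }

The loop still fills eta elementwise, but replacing N separate sampling statements with one vectorized one is usually a noticeable win at N around 10 million.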

By the way, I wonder how the author claims to run the model in only 18 hours with his own dataset (which is very similar in structure to mine). Is it just a matter of having bigger, better computers, or might there be something wrong with my data?

@jgabry jgabry closed this as completed Aug 5, 2016