HMM<GMM < > > does not scale to more than 7 dimensions for the observations for unsupervised training #301

Closed
rcurtin opened this Issue Dec 29, 2014 · 8 comments

Projects

None yet

1 participant

@rcurtin
Member
rcurtin commented Dec 29, 2014

Reported by fleischhauf on 29 May 44034775 16:30 UTC
Attached file creates an HMM<GMM<> > with 1 to 12 dimensions and generates random training data for unsupervised training. It runs for 1-7 dimensions, from the 8th onward it throws a double or free corruption error.

cheers,
Nik

Migrated-From: http://trac.research.cc.gatech.edu/fastlab/ticket/314

@rcurtin rcurtin self-assigned this Dec 29, 2014
@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by fleischhauf on 11 Jan 44038327 01:39 UTC
ok, i just tried it on a different system and it seems to work there, maybe its some problem with the libraries installed.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 23 Oct 44038371 19:12 UTC
I was looking at this earlier today before I had to move to different work, and I think it's definitely a problem; valgrind was reporting that uninitialized memory is being used in various places, which would cause your issue. So before this can be resolved those memory issues will need to be worked out (both in the HMM and GMM classes).

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 13 Apr 44040846 13:29 UTC
Ok, as of r16139, r16150, and r16151, I have fixed all the memory issues valgrind was complaining about. So I suspect that if you were to try this again on the system you were having a problem on, I think it will be fixed. If not, feel free to reopen. :)

Thanks for reporting this!

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by fleischhauf on 6 Nov 44042814 14:37 UTC
Hey,
thanks for fixing this so fast!
The double or free corruption disappeared.
Unfortunately there is a segmentation fault now :/
(for dimensions 8, i think in the first iteration)

happens at:
hmm_impl.hpp->163:emission[emissionProbstate;
gmm_impl.hpp->349:fitter.Estimate(observations, probabilities, means, covariances, weights,useExistingModel);
em_fit_impl.hpp->199: l = LogLikelihood(observations, means, covariances, weights);
em_fit_impl.hpp->282: phi(observations, means[covariancesi, phis);
phi.hpp->135: arma::mat rhs = -0.5 * inv(cov) * diffs;

if that helps..

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 25 May 44043045 21:42 UTC
Ok, I can't reproduce this, so can I get a little more information?

  • the architecture and OS (and its version) of the system you're running on
  • the output when you compile both mlpack and your executable with debugging flags (when you use CMake to build mlpack, add -DDEBUG=ON, and when you build your executable add -g -DDEBUG to the compiler command line)
  • the compiler you're using

That should shed some more light on what's going on. My current guess is that the covariance matrix isn't invertible, because of the line your problem is coming from, but that situation shouldn't happen...

Thanks,

Ryan

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by fleischhauf on 26 Jun 44043883 19:48 UTC
Hey Ryan,
Im using:

a Intel Core i7 CPU Q 720 @ 1.60GHz 8
3.8 GiB ram.
Ubuntu 13.10 x64.

compiler version should be:
g++ (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1

I've attached the application output (for the last two runs 7 and 8 dimensions)
cmake output for mlpack, make output mlpack and compile output of the test program (in case its useful).

cheers from amsterdam,
nik

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 22 Aug 44543833 00:29 UTC
Hi Nik,

I've spent the past couple days trying to reproduce this issue, but I can't. I found a system with the same configuration (Ubuntu 13.10, x86_64, same gcc version, same Boost version, same Armadillo version; different processor though). I ran with valgrind in both debug and non-debug mode, but valgrind reported no invalid accesses or memory issues. In addition, I couldn't get it to segfault.

Then, I slightly modified the main executable to set the random seed to std::time(NULL), to see if it was an odd problem caused by the particular way the random data was created. I tried running this for the past couple of days (probably hundreds of runs) but was unable to ever produce a random seed that could cause a segfault. If the issue was an Armadillo issue, it should have issued an error because the program was compiled in debug mode, but I don't see any Armadillo error either.

Without any ability to reproduce the issue, I'm leaning towards this being a hardware issue; maybe a bit of bad RAM or something like that. I'm sorry I can't give a better answer than that, but unless you have a way to reproduce the issue on multiple machines I can't actually debug it any further.

Alternately, you could run the program in gdb, and procure a backtrace when the segfault occurs. That is what I was going to do if I could reproduce the issue.

Sorry for the long delay in response to this; it took me a while to find time to dig up a system with the same specs.

@rcurtin rcurtin removed their assignment Dec 30, 2014
@rcurtin
Member
rcurtin commented Jan 9, 2015

Can't reproduce. Closing for activity.

@rcurtin rcurtin closed this Jan 9, 2015
@rcurtin rcurtin added the R: inactive label Jan 9, 2015
@ShangtongZhang ShangtongZhang pushed a commit to ShangtongZhang/mlpack that referenced this issue Mar 11, 2015
@rcurtin rcurtin Modified patch from Saheb for #301; this unifies the constructors for
NeighborSearch so they work with all tree types.  The modifications I've made
make it so that the referenceCopy and queryCopy matrices aren't full copies of
the referenceSet and querySet matrices when the tree doesn't modify them (in the
case where they aren't modified, it's not necessary to copy them, that's just a
waste of memory).
b8747ec
@ShangtongZhang ShangtongZhang pushed a commit to ShangtongZhang/mlpack that referenced this issue Mar 11, 2015
@rcurtin rcurtin Patch from Saheb for #301; refactor RangeSearch constructors so that …
…leafSize is

not a parameter.
0081a00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment