New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KMEANS in dlibml #97

Merged
merged 6 commits into from Aug 14, 2017

Conversation

Projects
None yet
3 participants
@Iron-Stark
Contributor

Iron-Stark commented Jul 31, 2017

@zoq @rcurtin

As discussed I have opened this PR for figuring out the problems with the Makefile.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jul 31, 2017

Member

Looks like we have to link against blas, the only problem I see is, that we have to build against blas or openblas so either -lblas or -lopenblas. I thought we could use pkg-config --libs dlib-1, unfortunately, that does not work. I guess another solution would be to write a simple CMake file, I think that would also make the build script process cleaner. Anyway, for now, let's link against openblas to solve the build issue on the benchmark system.

@Iron-Stark let us know if that solves the build problem on your machine.

Member

zoq commented Jul 31, 2017

Looks like we have to link against blas, the only problem I see is, that we have to build against blas or openblas so either -lblas or -lopenblas. I thought we could use pkg-config --libs dlib-1, unfortunately, that does not work. I guess another solution would be to write a simple CMake file, I think that would also make the build script process cleaner. Anyway, for now, let's link against openblas to solve the build issue on the benchmark system.

@Iron-Stark let us know if that solves the build problem on your machine.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 31, 2017

Contributor

@zoq

I worked but I also had to build against lapack. So finally adding -llapack and -lblas solved the issue.

Contributor

Iron-Stark commented Jul 31, 2017

@zoq

I worked but I also had to build against lapack. So finally adding -llapack and -lblas solved the issue.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Jul 31, 2017

Contributor

@zoq

When I do make run BLOCK=dlibml METHODBLOCK=KMEANS, I get could not execute command error. How do I rectify it. I am finding it difficult to debug the code.

Contributor

Iron-Stark commented Jul 31, 2017

@zoq

When I do make run BLOCK=dlibml METHODBLOCK=KMEANS, I get could not execute command error. How do I rectify it. I am finding it difficult to debug the code.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Jul 31, 2017

Member

You could use gdb to step through each line of:

LD_LIBRARY_PATH="libraries/lib/" methods/dlibml/dlibml_kmeans -r datasets/waveform.csv -q datasets/waveform_centroids.csv -v -k 2

also if you run the command above it will return with a segmentation fault, and it turns out you missed the set_size function, so if you do

sample_type m;
m.set_size(referenceData.n_rows);

and

sample_type centers;
centers.set_size(referenceData.n_rows);

it should work.

Member

zoq commented Jul 31, 2017

You could use gdb to step through each line of:

LD_LIBRARY_PATH="libraries/lib/" methods/dlibml/dlibml_kmeans -r datasets/waveform.csv -q datasets/waveform_centroids.csv -v -k 2

also if you run the command above it will return with a segmentation fault, and it turns out you missed the set_size function, so if you do

sample_type m;
m.set_size(referenceData.n_rows);

and

sample_type centers;
centers.set_size(referenceData.n_rows);

it should work.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Aug 1, 2017

Contributor

@zoq

When I use the line:

std::vector assignments = spectral_cluster(kernel_type(0.1), samples, k);

the code gets stuck but without this it is running fine but the assignments being made need to be timed. Do you observe any mistake in this line?

Contributor

Iron-Stark commented Aug 1, 2017

@zoq

When I use the line:

std::vector assignments = spectral_cluster(kernel_type(0.1), samples, k);

the code gets stuck but without this it is running fine but the assignments being made need to be timed. Do you observe any mistake in this line?

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Aug 1, 2017

Member

The spectral_cluster function is a graph based clustering method, @rcurtin correct me if I'm wrong; so you basically time two methods here. You can remove the function from the script and loop over the samples and use test(sample); to get the assignments.

Member

zoq commented Aug 1, 2017

The spectral_cluster function is a graph based clustering method, @rcurtin correct me if I'm wrong; so you basically time two methods here. You can remove the function from the script and loop over the samples and use test(sample); to get the assignments.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Aug 1, 2017

Contributor

@zoq

Thanks. Also I am getting can't parse the times : wrong format error while doing parseTimer(s). What can be a possible reason? because I don't see an error in the output. It is:

[INFO ] Loading 'datasets/iris.csv' as CSV data. Size is 4 x 150.
[INFO ] Loaded reference data from 'datasets/iris.csv' (4 x 150).
[INFO ] Loading 'datasets/iris_centroids.csv' as CSV data. Size is 4 x 3.
[INFO ] Loaded query data from 'datasets/iris_centroids.csv' (4 x 3).
[INFO ]
[INFO ] Execution parameters:
[INFO ] help: false
[INFO ] info: ""
[INFO ] k: 3
[INFO ] query_file: datasets/iris_centroids.csv
[INFO ] reference_file: datasets/iris.csv
[INFO ] verbose: true
[INFO ] version: false
[INFO ]
[INFO ] Program timers:
[INFO ] clustering: 0.005752s
[INFO ] loading_data: 0.000510s
[INFO ] total_time: 0.006750s

and this looks like the right format.

Contributor

Iron-Stark commented Aug 1, 2017

@zoq

Thanks. Also I am getting can't parse the times : wrong format error while doing parseTimer(s). What can be a possible reason? because I don't see an error in the output. It is:

[INFO ] Loading 'datasets/iris.csv' as CSV data. Size is 4 x 150.
[INFO ] Loaded reference data from 'datasets/iris.csv' (4 x 150).
[INFO ] Loading 'datasets/iris_centroids.csv' as CSV data. Size is 4 x 3.
[INFO ] Loaded query data from 'datasets/iris_centroids.csv' (4 x 3).
[INFO ]
[INFO ] Execution parameters:
[INFO ] help: false
[INFO ] info: ""
[INFO ] k: 3
[INFO ] query_file: datasets/iris_centroids.csv
[INFO ] reference_file: datasets/iris.csv
[INFO ] verbose: true
[INFO ] version: false
[INFO ]
[INFO ] Program timers:
[INFO ] clustering: 0.005752s
[INFO ] loading_data: 0.000510s
[INFO ] total_time: 0.006750s

and this looks like the right format.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Aug 1, 2017

Member

I hope my comment above is helpful.

Member

zoq commented Aug 1, 2017

I hope my comment above is helpful.

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Aug 1, 2017

Contributor

@zoq
Thanks a lot. The comments were very helpful.

@zoq @rcurtin
This PR is ready for review once the test passes. I will open up another PR for the other implementations.

Contributor

Iron-Stark commented Aug 1, 2017

@zoq
Thanks a lot. The comments were very helpful.

@zoq @rcurtin
This PR is ready for review once the test passes. I will open up another PR for the other implementations.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Aug 2, 2017

Member

Can you validate that the dlib assignments match with mlpack's?

Member

zoq commented Aug 2, 2017

Can you validate that the dlib assignments match with mlpack's?

Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Aug 2, 2017

Contributor

@zoq @rcurtin

How do I write the final centers (stored in a vector initial_centers) and the assignments (stored in assignments) to a csv file (output.csv) in the same way it is written in mlpack implementations. I printed out the outputs and and observed that the mlpack and dlibml implementations are producing the same assignments but it would be better if I could write the outputs produced here into a csv file.

Contributor

Iron-Stark commented Aug 2, 2017

@zoq @rcurtin

How do I write the final centers (stored in a vector initial_centers) and the assignments (stored in assignments) to a csv file (output.csv) in the same way it is written in mlpack implementations. I printed out the outputs and and observed that the mlpack and dlibml implementations are producing the same assignments but it would be better if I could write the outputs produced here into a csv file.

@zoq

This comment has been minimized.

Show comment
Hide comment
@zoq

zoq Aug 2, 2017

Member

A simple solution would be to store the assignments in arma::mat and to use data::Save to save the results to csv. Here is a simple sketch:

arma::mat assignments(1, samples.size());
for(size_t i = 0; i < samples.size(); i++)
{
  assignments(i) = nearest_center(initial_centers, samples[i]); 
}

data::Save("assignments.csv", assignments);
Member

zoq commented Aug 2, 2017

A simple solution would be to store the assignments in arma::mat and to use data::Save to save the results to csv. Here is a simple sketch:

arma::mat assignments(1, samples.size());
for(size_t i = 0; i < samples.size(); i++)
{
  assignments(i) = nearest_center(initial_centers, samples[i]); 
}

data::Save("assignments.csv", assignments);
@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Aug 3, 2017

Contributor

@zoq @rcurtin

The dlibml outputs:
https://paste.ubuntu.com/25235295/

The mlpack outputs:
https://paste.ubuntu.com/25235300/

They look fine to me. Please let me know if there are any errors.

Contributor

Iron-Stark commented Aug 3, 2017

@zoq @rcurtin

The dlibml outputs:
https://paste.ubuntu.com/25235295/

The mlpack outputs:
https://paste.ubuntu.com/25235300/

They look fine to me. Please let me know if there are any errors.

@rcurtin

Looks pretty good to me; thank you for fixing this to be regular k-means and not kernel k-means. The assignments you posted look correct, so I think the last thing to do is save the assignments to a file. Once that is done I think this is basically ready.

Show outdated Hide outdated Makefile
# Parse options into string.
optionsStr = ""
if "clusters" in options:
optionsStr = "-k " + str(options.pop("clusters"))

This comment has been minimized.

@rcurtin

rcurtin Aug 6, 2017

Member

I think the clusters option should be required; the algorithm can't work without it.

@rcurtin

rcurtin Aug 6, 2017

Member

I think the clusters option should be required; the algorithm can't work without it.

Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
Show outdated Hide outdated methods/dlibml/KMEANS.py
Show outdated Hide outdated methods/dlibml/src/KMEANS.cpp
@zoq

zoq approved these changes Aug 11, 2017

Looks good for me. Thanks for the hard work!

@Iron-Stark

This comment has been minimized.

Show comment
Hide comment
@Iron-Stark

Iron-Stark Aug 11, 2017

Contributor

@zoq

Thanks for appreciating the work. :)

Contributor

Iron-Stark commented Aug 11, 2017

@zoq

Thanks for appreciating the work. :)

@rcurtin rcurtin merged commit eef1245 into mlpack:master Aug 14, 2017

1 check passed

Benchmarks Checks Build finished.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment