Win build and app tutorials #1436

Merged: 11 commits merged on Jul 6, 2018

Contributor

gmanlan commented Jun 18, 2018

  • Tutorial 1: Up-to-date (VS2017/mlpack3.0.2) Windows Build Guide (doc/guide/build_windows.hpp)
  • Tutorial 2: Sample ML C++ App for Windows (doc/guide/sample_ml_app.hpp)
  • VS project: Sample App for Windows (doc/examples/sample-ml-app)

Tested using Win10, VS2017, latest version of mlpack, armadillo 8.500, boost 1.66

Member

zoq left a comment

This is awesome, the windows workflow sounds so much easier this way.

and make sure you can use it from the Command Prompt (may need to add to the PATH)

- Download the latest mlpack release from here:
<a href="http://www.mlpack.org/files/mlpack-3.0.2.tar.gz">mlpack-3.0.2</a>

Member

I was wondering whether it might be useful to provide an alias for the latest package, so that we don't have to update this tutorial each time there is a new release. @rcurtin what do you think?

Contributor Author

Changed the fixed path - now using the generic download path so users can grab the latest stable release

Member

I already go through doc/ for every release and increment any version numbers, so it's not a huge issue either way. Aliases are nice if we can do it; do you have a suggested way to do it?


@section build_instructions Windows build instructions

- Unzip mlpack to "C:\mlpack\mlpack-3.0.2"

Member

Is there any way to use an existing home directory? Something like ~/ on Linux.

Contributor Author

This is tricky. We could use %userprofile% on Windows, but that directory is not very friendly for an easy-to-follow tutorial (and it changes depending on the Windows version). I think for simplicity we should use the most basic directory we can imagine (there is also a note that says this path is just for reference). Long paths or paths with spaces may cause issues on Windows, so we are aiming for the safest option.

Let me know if you still want to change it.
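
The trade-off discussed here can be sketched in a few lines of standard C++. This is only an illustration and is not part of the tutorial or of mlpack: the helper names `ResolveInstallDir` and `DefaultInstallDir` are hypothetical, and the fallback `C:\mlpack` is taken from the path used in the guide. The sketch prefers the user's profile directory but falls back to the simple path when the variable is unset or the path contains spaces.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of mlpack or the tutorial).
// userProfile is the value of %USERPROFILE%, or null if it is not set.
std::string ResolveInstallDir(const char* userProfile)
{
  // The simple reference path used by the build guide.
  const std::string fallback = "C:\\mlpack";

  // No home directory available: use the simple path.
  if (userProfile == nullptr || *userProfile == '\0')
    return fallback;

  const std::string home(userProfile);

  // Paths with spaces may cause issues on Windows, so avoid them here.
  if (home.find(' ') != std::string::npos)
    return fallback;

  return home + "\\mlpack";
}

// Convenience wrapper that reads the environment of the current process.
std::string DefaultInstallDir()
{
  const std::string dir = ResolveInstallDir(std::getenv("USERPROFILE"));
  assert(!dir.empty());
  return dir;
}
```

Even with such a helper, a fixed path like `C:\mlpack` is easier to follow in written instructions, which is the argument made above.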

(i.e.: mlpack/tests/data/german.csv), assuming the labels don't require normalization.

@code
bool loaded = mlpack::data::Load("data/german.csv", dataset);

Member

Should we define dataset and labels first? I guess people might copy each line and end up with some errors.

Contributor Author

Yep, added the missing dataset definition.

Row<size_t> predictions;
rf.Classify(dataset, predictions);
const size_t correct = arma::accu(predictions == labels);
printf("\nTraining Accuracy: %f", (double(correct) / double(labels.n_elem)));

Member

hm, should we use std::cout here? Usually we don't use printf in the codebase.

Contributor Author

Changed all the printf to cout - thanks
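
For reference, a minimal sketch of the stream-based version of the accuracy message. To stay self-contained it uses `std::vector` instead of the Armadillo `Row<size_t>` and `arma::accu` from the tutorial, and the function names `TrainingAccuracy` and `PrintTrainingAccuracy` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Fraction of predictions that match the labels.
double TrainingAccuracy(const std::vector<std::size_t>& predictions,
                        const std::vector<std::size_t>& labels)
{
  assert(predictions.size() == labels.size());
  assert(!labels.empty());

  std::size_t correct = 0;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    if (predictions[i] == labels[i])
      ++correct;
  }

  return double(correct) / double(labels.size());
}

// Same message as the printf() call under review, written to a stream.
void PrintTrainingAccuracy(std::ostream& out,
                           const std::vector<std::size_t>& predictions,
                           const std::vector<std::size_t>& labels)
{
  out << "\nTraining Accuracy: " << TrainingAccuracy(predictions, labels);
}
```

Calling `PrintTrainingAccuracy(std::cout, predictions, labels)` writes the message to standard output. Note that the default stream formatting differs from `%f`: a value of 0.75 is printed as `0.75` rather than `0.750000`.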

Now that our model is trained and validated, we save it to a file so we can use it later.

@code
mlpack::data::Save("mymodel.xml", "model", rf, false,

Member

Save should be able to derive the format from the filename.

Contributor Author

Right, removed the unnecessary parameter
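
To illustrate what "deriving the format from the filename" can look like, here is a small standard-library sketch. It is not mlpack's implementation: `ModelFormat` and `DetectFormat` are hypothetical names, and the extension-to-format mapping (`.xml`, `.bin`, `.txt`) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical set of serialization formats (assumed for this sketch).
enum class ModelFormat { Xml, Binary, Text, Unknown };

// Hypothetical helper, not mlpack's implementation: pick a format from
// the part of the filename after the last '.'.
ModelFormat DetectFormat(const std::string& filename)
{
  assert(!filename.empty());

  const std::size_t dot = filename.find_last_of('.');
  if (dot == std::string::npos)
    return ModelFormat::Unknown;

  // Lowercase the extension so that "MODEL.XML" is treated like "model.xml".
  std::string ext = filename.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(),
      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

  if (ext == "xml") return ModelFormat::Xml;
  if (ext == "bin") return ModelFormat::Binary;
  if (ext == "txt") return ModelFormat::Text;
  return ModelFormat::Unknown;
}
```

With detection of this kind in place, the caller only has to choose the filename, which is why an explicit format argument becomes unnecessary.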

Member

rcurtin left a comment

Hey there Germán,

Thanks so much for taking the time to write this. I think it's a huge improvement to documentation and will help a lot of Windows users. I think it's really nice to have the Windows project in the repo ready to go too, so that users can base their own code off the example solution you've given. So definitely this is a much needed improvement to the state of the documentation.

Some minor comments:

  • Do you want to add the BSD license to the various code files? You could also add your name as '@author' if you like.

  • Would it be possible to use AppVeyor to ensure that the example can build? This would really help us ensure that the code doesn't go out of date, which will happen over the years as Visual Studio changes versions, etc.

Thanks again! 👍 (or should I use the 🚀 emoji? I am still figuring these things out)

@@ -0,0 +1,95 @@
/*! @page build_windows Building mlpack From Source

@section build_intro Introduction

Member

I think (I would have to check) that Doxygen section identifiers are not scoped to individual pages, so the build_intro label here will collide with the one on the existing build page. I guess we should rename these to, e.g., build_windows_intro, etc.

Contributor Author

👍
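
A sketch of what the renamed labels could look like in doc/guide/build_windows.hpp. Only `build_windows_intro` is named in the review; `build_windows_instructions` is a hypothetical name that follows the same prefix pattern, and the `...` lines stand for the unchanged body text.

```cpp
/*! @page build_windows Building mlpack From Source

@section build_windows_intro Introduction

...

@section build_windows_instructions Windows build instructions

...

*/
```

Prefixing every label with the page name keeps the identifiers unique across the whole documentation set.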

@@ -0,0 +1,95 @@
/*! @page build_windows Building mlpack From Source

Member

Should we add "On Windows" to the page title?

Contributor Author

👍

@@ -0,0 +1,197 @@
/*! @page sample_ml_app Sample C++ ML App

Member

Might be a good idea to add something about Windows here too.

Contributor Author

👍

@section sample_crossvalidation Cross-Validating

To evaluate the classifier, we use K-Fold cross-validation. We also define which metric to use in order
to assess the quality of the trained model.

Member

Hm, so the Random Forest rf isn't actually used in this section. So maybe it would be better to restate it as

Instead of training the Random Forest directly, we could also use k-fold cross-validation for training, which will give us a measure of performance on a held-out test set.  This can give us a better estimate of how the model will perform when given new data.

(or something like that?)

Contributor Author

👍

KFoldCV<RandomForest<GiniGain, RandomDimensionSelect>, Accuracy> cv(k,
dataset, labels, numClasses);
double cvAcc = cv.Evaluate(numTrees, minimumLeafSize);
cout << "\nKFoldCV Accuracy: " << cvAcc;

Member

Should we extract one of the models trained on cross-validation?

rf = cv.Model(); // this will get the model trained on the last fold

Alternately I guess we could mention that we could train on all of the training data, and the k-fold CV is just to get an idea of the performance. I'm not picky, I'm just hoping to ensure that the example doesn't confuse anyone.

Contributor Author

Right. Added a note for that.
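
To make the distinction above concrete, here is a small sketch of the bookkeeping behind k-fold cross-validation, using only the standard library. It is not how mlpack's KFoldCV is implemented: `FoldRange`, `KFoldRanges` and `MeanScore` are hypothetical names, and contiguous near-equal folds are an assumption of this example. Each fold is held out once for validation while a model is trained on the remaining points, and the per-fold scores are then averaged.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second) of the points held out in one fold.
using FoldRange = std::pair<std::size_t, std::size_t>;

// Split n points into k contiguous validation folds of near-equal size.
std::vector<FoldRange> KFoldRanges(const std::size_t n, const std::size_t k)
{
  assert(k >= 2 && k <= n);

  const std::size_t base = n / k;   // minimum size of every fold
  const std::size_t extra = n % k;  // the first 'extra' folds get one more

  std::vector<FoldRange> folds;
  std::size_t begin = 0;
  for (std::size_t i = 0; i < k; ++i)
  {
    const std::size_t size = base + (i < extra ? 1 : 0);
    folds.emplace_back(begin, begin + size);
    begin += size;
  }
  return folds;
}

// Average of the per-fold validation scores.
double MeanScore(const std::vector<double>& foldScores)
{
  assert(!foldScores.empty());

  double sum = 0.0;
  for (const double s : foldScores)
    sum += s;
  return sum / double(foldScores.size());
}
```

Because every model in this procedure sees only part of the data, the averaged score is an estimate of performance on new data; a final model can still be trained on the full training set afterwards, as noted in the review.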

Now that our model is trained and validated, we save it to a file so we can use it later.

@code
mlpack::data::Save("mymodel.xml", "model", rf, false);

Member

It might be worth pointing out that we could also save as mymodel.bin which will be much smaller. The XML saves are huge because of all the XML tags :(

Contributor Author

👍

Contributor Author

gmanlan commented Jun 21, 2018

I'm thinking that using AppVeyor for the example VS project would require modifying the project configuration (i.e. paths) so that it works with the build system. However, this would break the consistency between the example project and the 'sample_project_config' section of the 'sample_ml_app.hpp' tutorial. What do you think?

Member

zoq commented Jun 23, 2018

I guess we could use mlpack-latest instead of mlpack-3.0.2 or something similar to get the same paths, do you think that would be reasonable, or is there anything else I missed? If the path is the main reason, we could also create an alias as part of the build script.

Member

rcurtin commented Jun 25, 2018

I can set up an mlpack-latest.tar.gz link on mlpack.org for the sake of documentation, but I don't think it's a problem to hardcode mlpack-3.0.2 and then update it after each release. What do you think, would that work?

Contributor Author

gmanlan commented Jun 26, 2018

To reduce the overhead of updating the doc each time a new release is available, I have changed the mlpack download path to "http://www.mlpack.org/download.html" so it always refers to the latest version.

Member

rcurtin commented Jul 3, 2018

I talked to @gmanlan over email; I think maybe the best thing to do here is merge as-is, and create an issue for the VS build of the tutorial.

For that build, I think the best idea is to have a special VS project configuration that we can keep in some directory like .appveyor/. Then in appveyor.yml we can just copy that configuration into place, overwriting the existing configuration, and then run the example build as an extra step.

Member

zoq commented Jul 3, 2018

Sounds reasonable to me, no need to delay this really helpful tutorial any longer.

Member

zoq left a comment

Looks good to me, no more comments from my side.

Member

ShikharJ left a comment

I am +1 for this as well.

Member

rcurtin left a comment

Great, I'll leave 3 days before merge for any more comments. When I merge, I'll open an issue for the AppVeyor build that we can handle some other time.

rcurtin merged commit e3fe135 into mlpack:master on Jul 6, 2018

Member

rcurtin commented Jul 6, 2018

Thanks again for the contribution! I really appreciate it. I forgot to add: if you'd like to add your name to src/mlpack/core.hpp and COPYRIGHT.txt, please feel free and I will merge it! And if you'd like some mlpack stickers to put on your laptop, feel free to send Marcus or me an email with your mailing address and we will get them sent. :)

I opened #1463 for the build part.

Contributor Author

gmanlan commented Jul 7, 2018

Great - I'm glad it helps. I just realized that we have not linked/updated the main doc/tutorials page at http://www.mlpack.org/docs/mlpack-3.0.2/doxygen/tutorials.html, so I will be updating this soon to make sure users can find both Linux and Windows tutorials.
