-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Win build and app tutorials #1436
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome, the windows workflow sounds so much easier this way.
doc/guide/build_windows.hpp
Outdated
and make sure you can use it from the Command Prompt (may need to add to the PATH) | ||
|
||
- Download the latest mlpack release from here: | ||
<a href="http://www.mlpack.org/files/mlpack-3.0.2.tar.gz">mlpack-3.0.2</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking if it might be useful to provide an alias for the latest package so that we don't have to update this tutorial once we have a new release. @rcurtin what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the fixed path - now using the generic download path so users can grab the latest stable release
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already go through doc/
when I do every release and increment any version numbers, so it's not a huge issue either way. Aliases are nice if we can do it; do you have a suggested way to do it?
doc/guide/build_windows.hpp
Outdated
|
||
@section build_instructions Windows build instructions | ||
|
||
- Unzip mlpack to "C:\mlpack\mlpack-3.0.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to use an existing home directory? Something like ~/
on linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky - we could use %userprofile% in Windows, but that directory is not very friendly for the purpose of an easy-to-follow tutorial (and it changes depending on the windows version) - I think for simplicity we should use the most basic directory we can imagine (there is also a note that says this path is just for reference). Long paths or paths with spaces may produce issues in Windows so we are aiming for the safest option.
Let me know if you still want to change it.
doc/guide/sample_ml_app.hpp
Outdated
(i.e.: mlpack/tests/data/german.csv), assuming the labels don't require normalization. | ||
|
||
@code | ||
bool loaded = mlpack::data::Load("data/german.csv", dataset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we define dataset, labels
first? I guess people might copy each line, and end up with some errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, added the missing dataset definition.
doc/guide/sample_ml_app.hpp
Outdated
Row<size_t> predictions; | ||
rf.Classify(dataset, predictions); | ||
const size_t correct = arma::accu(predictions == labels); | ||
printf("\nTraining Accuracy: %f", (double(correct) / double(labels.n_elem))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, should we use std::cout here? Usually we don't use printf
in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed all the printf to cout - thanks
doc/guide/sample_ml_app.hpp
Outdated
Now that our model is trained and validated, we save it to a file so we can use it later. | ||
|
||
@code | ||
mlpack::data::Save("mymodel.xml", "model", rf, false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Save should be able to derive the format from the filename.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, removed the unnecessary parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there Germán,
Thanks so much for taking the time to write this. I think it's a huge improvement to documentation and will help a lot of Windows users. I think it's really nice to have the Windows project in the repo ready to go too, so that users can base their own code off the example solution you've given. So definitely this is a much needed improvement to the state of the documentation.
Some minor comments---
-
Do you want to add the BSD license to the various code files? You could also add your name as '@author' if you like.
-
Would it be possible to use AppVeyor to ensure that the example can build? This would really help us ensure that the code doesn't go out of date, which will happen over the years as Visual Studio changes versions, etc.
Thanks again! 👍 (or should I use the 🚀 emoji? I am still figuring these things out)
doc/guide/build_windows.hpp
Outdated
@@ -0,0 +1,95 @@ | |||
/*! @page build_windows Building mlpack From Source | |||
|
|||
@section build_intro Introduction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think (I would have to check) that Doxygen doesn't use unique identifiers for individual pages, so this reference build_intro
will collide with the one from the build page, so I guess we should change these to, e.g., build_windows_intro
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
doc/guide/build_windows.hpp
Outdated
@@ -0,0 +1,95 @@ | |||
/*! @page build_windows Building mlpack From Source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add "On Windows" to the page title?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
doc/guide/sample_ml_app.hpp
Outdated
@@ -0,0 +1,197 @@ | |||
/*! @page sample_ml_app Sample C++ ML App |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be a good idea to add something about Windows here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
doc/guide/sample_ml_app.hpp
Outdated
@section sample_crossvalidation Cross-Validating | ||
|
||
To evaluate the classifier, we use K-Fold cross-validation. We also define which metric to use in order | ||
to assess the quality of the trained model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, so the Random Forest rf
isn't actually used in this section. So maybe it would be better to restate it as
Instead of training the Random Forest directly, we could also use k-fold cross-validation for training, which will give us a measure of performance on a held-out test set. This can give us a better estimate of how the model will perform when given new data.
(or something like that?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
KFoldCV<RandomForest<GiniGain, RandomDimensionSelect>, Accuracy> cv(k, | ||
dataset, labels, numClasses); | ||
double cvAcc = cv.Evaluate(numTrees, minimumLeafSize); | ||
cout << "\nKFoldCV Accuracy: " << cvAcc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we extract one of the models trained on cross-validation?
rf = cv.Model(); // this will get the model trained on the last fold
Alternately I guess we could mention that we could train on all of the training data, and the k-fold CV is just to get an idea of the performance. I'm not picky, I'm just hoping to ensure that the example doesn't confuse anyone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Added a note for that.
doc/guide/sample_ml_app.hpp
Outdated
Now that our model is trained and validated, we save it to a file so we can use it later. | ||
|
||
@code | ||
mlpack::data::Save("mymodel.xml", "model", rf, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth pointing out that we could also save as mymodel.bin
which will be much smaller. The XML saves are huge because of all the XML tags :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I'm thinking that using AppVeyor for the example VS project would require to modify the project configuration (i.e. paths) so it works with the build system - however this would break the consistency between the example project and the 'sample_project_config' section of the 'sample_ml_app.hpp' tutorial ... what do you think? |
I guess we could use |
I can set up an |
To reduce the overhead of updating the doc each time a new release is available, I have changed the mlpack download path to "http://www.mlpack.org/download.html" so it always refers to the latest version. |
I talked to @gmanlan over email, I think maybe the best thing to do here is merge as-is, and create an issue for the VS build of the tutorial. For that build, I think the best idea is to have a special VS project configuration that we can keep in some directory like |
Sounds reasonable to me, no need to delay this really helpful tutorial any longer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, no more comments from my side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am +1 for this as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I'll leave 3 days before merge for any more comments. When I merge, I'll open an issue for the AppVeyor build that we can handle some other time.
Thanks again for the contribution! I really appreciate it. I forgot to add, if you'd like to add your name to I opened #1463 for the build part. |
Great - I'm glad it helps. I just realized that we have not linked/updated the main doc/tutorials page at http://www.mlpack.org/docs/mlpack-3.0.2/doxygen/tutorials.html, so I will be updating this soon to make sure users can find both Linux and Windows tutorials. |
Tested using Win10, VS2017, latest version of mlpack, armadillo 8.500, boost 1.66