Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added install.R. runtime.txt, and R notebook #129

Closed

Conversation

abernauer
Copy link

  1. Added the R notebook
  2. Updated the binder directory with a runtime.txt file containing the R version.
  3. Updated the binder directory with a install.R file with packages to install.

@mlpack-bot
Copy link

mlpack-bot bot commented Oct 21, 2020

Thanks for opening your first pull request in this repository! Someone will review it when they have a chance. In the mean time, please be sure that you've handled the following things, to make the review process quicker and easier:

  • All code should follow the style guide
  • Documentation added for any new functionality
  • Tests added for any new functionality
  • Tests that are added follow the testing guide
  • Headers and license information added to the top of any new code files
  • HISTORY.md updated if the changes are big or user-facing
  • All CI checks should be passing

Thank you again for your contributions! 👍

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

binder/install.R Outdated
@@ -0,0 +1,6 @@
install.packages("mlpack")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong but I think mlpack isn't in CRAN - mlpack/mlpack#2636. I guess the easiest workaround for now is to enable the R bindings in the conda package. On your local setup you manually build the R bindings?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I forgot about this when writing that line. You can install r packages on the command line using R CMD INSTALL mlpack.tar.gz. I got the bindings from a git hub archive of the R bindings from yashwant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that line should be removed. Yeah enabling the R bindings in the conda package is probably the easiest. The other work around is installing the package with what I mentioned above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess since installing the necessary packages takes a while it makes sense to include the files in the conda package. Will update the package later and post an update here.

Copy link
Author

@abernauer abernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad the comments were pending.

binder/install.R Outdated
@@ -0,0 +1,6 @@
install.packages("mlpack")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I forgot about this when writing that line. You can install r packages on the command line using R CMD INSTALL mlpack.tar.gz. I got the bindings from a git hub archive of the R bindings from yashwant.

binder/install.R Outdated
@@ -0,0 +1,6 @@
install.packages("mlpack")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that line should be removed. Yeah enabling the R bindings in the conda package is probably the easiest. The other work around is installing the package with what I mentioned above.

@birm
Copy link
Member

birm commented Oct 26, 2020

If unintentional, this works towards #97. 🥇
This is awesome, but I don't feel I can give an approval since I learned exactly enough R to marginally pass assignments where it was needed 🙃

@abernauer
Copy link
Author

@birm Yes sort of unintentional. Though I had fun writing the example and would love to adapt some of the other examples for the R bindings.

@abernauer
Copy link
Author

Pushed the commit addressing the comments by @zoq 😄

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be really great if you added a quick header comment describing what this example does. You could use the exact same comment from the other notebooks.


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ratings-only.csv.gz file is 62MB, so this is a fairly large example. Maybe it would be good to mention that in case a user thinks that this notebook will only take a second to run? :)


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should switch to the small data set, as done for the the other notebook. We can use the same settings, but it's much faster. In movie-lens-cf-py.ipynb I used https://lab.mlpack.org/data/MovieLens-small.zip which is < 1MB.

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should put this comment just before we load the data? Also, it's probably good to introduce the whole idea of the MovieLens 20M dataset first. Marcus has a nice comment describing it in the C++ notebook. (Where 20M refers to the number of ratings.)

Also some quick formatting changes might be helpful. For instance, "userId an integer id for the user" might be clearer as "userId , an integer id representing the user".


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's :)


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Peek not Peak :)


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice---if you want to use descriptive statistics, it's probably good to add in a comment that this is mlpack support. Specifically we are using mlpack to compute some nice statistics about each dimension in our data, but here there is only one dimension (the rating itself), so we get some nice statistics on the ratings.


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above these cells it could be nice to add a comment block describing that we will train the model, that we want to do a split into a train/test set first, and we'll use mlpack tools for both of these tasks, etc. etc. Basically the reader will be scrolling through, and it's really helpful for them if we can describe why we're doing something instead of making them try and figure out why we did it. :)


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like maybe one more space is needed so the arguments to cf() align? Or maybe that is just my browser.

What does the # model comment mean?


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's :)


Reply via ReviewNB

@@ -0,0 +1,933 @@
{
Copy link
Member

@rcurtin rcurtin Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't recognize a single one of these movies (that doesn't mean they aren't good 😄). I see that in the Python notebook Marcus used user 2; maybe user 2 has less obscure taste? :) (Marcus also printed all of the movies that the user had previously rated, so that the reader can see the correlation between the two. That would probably also be helpful here.)


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same I don't know any of the listed movies, maybe that means I should watch more movies, also the movies are quite random, don't really see a pattern :)

@rcurtin
Copy link
Member

rcurtin commented Nov 11, 2020

Hey @abernauer, thanks for taking the time to implement this! It looks great. I left a bunch of comments through ReviewNB; hopefully they're helpful. I do have some slight (but not serious) concern about the CF model itself; I wonder if we need to tune it with different parameters so that the movies it shows are more common. Probably the rank parameter is what would need to be tuned. Maybe worth a shot to see what happens and if the results look qualitatively better or worse? 👍

@shrit
Copy link
Member

shrit commented Jul 3, 2024

@abernauer this is nice work, but I will have to close this PR, we did a major refactoring of this repository and I would doubt that it will be easy to solve all the merge conflicts. Therefore, if you are still interested feel free to open this PR again.

@shrit shrit closed this Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants