
Implementation of Tests? #57

Closed · kartikdutt18 opened this issue Feb 29, 2020 · 22 comments
@kartikdutt18 (Member) commented Feb 29, 2020

Hi everyone, this is in reference to PR #50. If models are added as libraries rather than .cpp files, so that they can be included in other .cpp files, then rather than testing only on the system of the person who created the PR, does it make sense to add a test that does a single pass on a given dataset?
Another test would be loading the weights for a given model.
I am not sure if this is needed, but I think that if changes are made in mlpack that might affect the models here, a testing unit can help catch them.
This would also prevent the addition of models that have bugs, and in the long run it might streamline the addition of new models.
I would love to hear your opinions.
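For concreteness, here is a minimal sketch of the single-pass check I have in mind. The tiny FFN is just a stand-in for whatever model class a PR adds; only the output shape is verified, so no training is needed:

```cpp
// Minimal sketch: run one forward pass on a small random batch and check
// the output dimensions. The network below is a placeholder, not a real
// model from this repo.
#include <cassert>
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>

using namespace mlpack::ann;

int main()
{
  FFN<NegativeLogLikelihood<>> model;
  model.Add<Linear<>>(28 * 28, 10);  // e.g. flattened MNIST -> 10 classes.
  model.Add<LogSoftMax<>>();

  arma::mat input(28 * 28, 16, arma::fill::randu);  // 16 random "images".
  arma::mat output;
  model.Predict(input, output);  // Parameters are randomly initialized.

  // One score per class and one column per input sample.
  assert(output.n_rows == 10);
  assert(output.n_cols == input.n_cols);
  return 0;
}
```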
Thanks.

@prince776 commented Feb 29, 2020

I agree. If we have a models library (like keras.models) with and without pretrained weights, then transfer learning will also become easier to implement. I'd be happy to help with this.

@kartikdutt18 (Member Author)

Great. If this is something worth implementing, we can split up the task. I have a fairly clear idea of the implementation; it would end up requiring three changes, and we could split those up so that we get more done.

@zoq (Member) commented Mar 3, 2020

Good idea. I guess an easy way to test the models is to provide predefined weights for each model, let them train for a number of iterations, and see if they return the correct results.
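To sketch what that could look like (Catch2 is used purely for illustration, and the LeNet wrapper, weight file, and dataset path are all hypothetical):

```cpp
// Hypothetical test: start from saved weights, train briefly, and require
// a baseline accuracy. None of the file names or the LeNet class are
// settled API; this only illustrates the shape of such a test.
#include <catch2/catch.hpp>
#include <mlpack/core.hpp>
#include "models/lenet.hpp"  // hypothetical header

TEST_CASE("LeNet reaches a baseline accuracy", "[models]")
{
  arma::mat data;
  mlpack::data::Load("data/mnist_train_small.csv", data, true);
  arma::mat labels = data.row(0) + 1;  // mlpack ANN labels are 1-indexed.
  data.shed_row(0);

  LeNet model(10);                          // hypothetical wrapper class
  model.LoadWeights("weights/lenet.bin");   // hypothetical helper
  model.Train(data, labels, /* epochs */ 5);

  REQUIRE(model.Evaluate(data, labels) >= 0.90);  // reasonable baseline
}
```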

@favre49 (Member) commented Mar 3, 2020

I think this ties into the more general question of how to structure this repository, because as it stands it isn't as nice and clean as our other repositories. In my opinion, we should decide on a clear file hierarchy before we continue.

Doesn't this also mean we should change the VAE? Also, for the Kaggle folder we have "examples" rather than these extensible "models", so maybe we should make an examples folder for cases like that? Maybe the .cpp files for the models could live in "examples"?

@kartikdutt18 (Member Author) commented Mar 4, 2020

I completely agree. I wanted to implement the following changes for at least all the models that I add; if it's okay, it might be a nice idea to do this for all the code in the models repo.
Here are the changes that I think might be useful:

  1. Models should be included as libraries / classes (a sample can be seen in PR Adding AlexNet #50), so a user can include models/alexnet.hpp and use AlexNet as the base network for something else (see the sketch below).

  2. A DataLoaders folder (I will be pushing changes for it this week in PR #50). Here we should provide a wrapper around data::Load, scalers, and data::Split. This is especially useful for known datasets such as MNIST, and it will also include support for resizing, etc.

  3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to Adding AlexNet #50), where a call to each class is demonstrated, e.g. an object detection sample with code for training and validation on a dataset, a similar file for object localization, and so on.

  4. A testing folder (this will be for contributors), where they add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure that only valid models are added, as well as streamline the addition of models.

  5. All data in one folder. I would, however, prefer that we make changes in CMake to download the data rather than storing it in the repo, to keep the models repo lighter.

  6. A weights folder in the models folder, where the weights of each model are stored.

Why it might be useful:

  1. Shorter and cleaner code for the user (load data using the data loaders, import a model, and train).

  2. Easier addition of models.

  3. A more refined user interface.

  4. Helpful for new users to easily train models.
This is also the (basic) API that I was talking about in my GSoC proposal for ANN algorithms on the mailing list.
Does this seem okay?

@favre49 (Member) commented Mar 4, 2020

You've pretty much read my mind as to the structure of the codebase. This would mean some restructuring of the code already in the repo, but I don't see any problem with that.

@kartikdutt18 (Member Author)

Great, this should be fun to work on. @zoq, what do you think?

@prince776

I was also imagining it like this, @kartikdutt18; I would love to help with this. I'll also add some new models like LeNet and VGG.

@zoq (Member) commented Mar 4, 2020

> I completely agree. I wanted to implement the following changes for at least all the models that I add; if it's okay, it might be a nice idea to do this for all the code in the models repo. Here are the changes that I think might be useful:
>
> 1. Models should be included as libraries / classes (a sample can be seen in PR #50), so a user can include models/alexnet.hpp and use AlexNet as the base network for something else.

A unified interface would be useful. That said, I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples. So I'm not sure CompileModel is needed; we could just construct the model in the constructor. And I guess SaveModel is another function that would probably just be a wrapper around data::Save. We don't necessarily have to replicate the interface of other libraries, but those are details we can discuss on the PR itself.

Also, I think it would be useful to use the CLI implementation from mlpack; this would bring us one step closer to using the bindings framework, and so to support for Python, Julia, etc.
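Something like this minimal sketch is what I mean; the class and layer choices are placeholders, and only data::Save is the real mlpack call:

```cpp
// Sketch: build the network in the constructor (no CompileModel step) and
// keep SaveModel as a thin wrapper around data::Save. The layers below
// are placeholders, not a real model definition.
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>

class SmallNet
{
 public:
  SmallNet(const size_t numClasses = 10)
  {
    // 28x28 grayscale input -> 6 feature maps -> pooled -> classifier.
    model.Add<mlpack::ann::Convolution<>>(1, 6, 5, 5, 1, 1, 2, 2, 28, 28);
    model.Add<mlpack::ann::ReLULayer<>>();
    model.Add<mlpack::ann::MaxPooling<>>(2, 2, 2, 2);
    model.Add<mlpack::ann::Linear<>>(6 * 14 * 14, numClasses);
    model.Add<mlpack::ann::LogSoftMax<>>();
  }

  // Thin wrapper around data::Save; no extra abstraction needed.
  void SaveModel(const std::string& path)
  {
    mlpack::data::Save(path, "SmallNet", model, false);
  }

 private:
  mlpack::ann::FFN<mlpack::ann::NegativeLogLikelihood<>> model;
};
```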

> 2. A DataLoaders folder (changes coming this week in PR #50). Here we should provide a wrapper around data::Load, scalers, and data::Split. This is especially useful for known datasets such as MNIST, and it will also include support for resizing, etc.

Sounds good. So I guess I could just provide a dataset name and it would return the necessary data structures?

> 3. A unified samples folder or, as @favre49 suggested, an examples folder (refer to #50), where a call to each class is demonstrated, e.g. an object detection sample with code for training and validation on a dataset, a similar file for object localization, and so on.

Not sure what that means; I think each model could serve as an example?

> 4. A testing folder (this will be for contributors), where they add tests that load weights, train for 5 epochs, and set a reasonable accuracy as a baseline. This will ensure that only valid models are added, as well as streamline the addition of models.

Sounds good.

> 5. All data in one folder. I would, however, prefer that we make changes in CMake to download the data rather than storing it in the repo, to keep the models repo lighter.

Yes, we can do that.
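Just to sketch the idea; the URL and paths are placeholders:

```cmake
# Sketch: fetch datasets at configure time instead of committing them.
# The URL and archive name below are placeholders.
option(DOWNLOAD_DATASETS "Download datasets used by the models." ON)

if (DOWNLOAD_DATASETS)
  file(DOWNLOAD
      http://www.example.com/datasets/mnist.tar.gz  # placeholder URL
      "${CMAKE_BINARY_DIR}/data/mnist.tar.gz"
      SHOW_PROGRESS)
  # Unpack next to the build so tests and examples can find the data.
  execute_process(
      COMMAND ${CMAKE_COMMAND} -E tar xzf
          "${CMAKE_BINARY_DIR}/data/mnist.tar.gz"
      WORKING_DIRECTORY "${CMAKE_BINARY_DIR}/data")
endif ()
```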

> 6. A weights folder in the models folder, where the weights of each model are stored.

Right, same as for the data, this can be stored at some external location.

@favre49 (Member) commented Mar 5, 2020

> Not sure what that means; I think each model could serve as an example?

What I was thinking of for the examples folder is a set of programs that a user could run directly, using the models defined in the header files. This is also better in cases like the digit recognizer, where I don't see the point of making a model class; it just serves as a popular basic example.

@kartikdutt18 (Member Author) commented Mar 5, 2020

> I also don't want to introduce too much boilerplate code; I see the code as a neat way to show some real-world examples.

Agreed, it would be much cleaner and simpler that way.

> Sounds good. So I guess I could just provide a dataset name and it would return the necessary data structures?

Yes, the user could provide the name of a supported dataset (like mnist) or the path to their own dataset (the README would describe the structure the CSV should be in).
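Roughly like this; the DataLoader class, its constructor, and the accessors are illustrative only, not a settled API:

```cpp
// Hypothetical DataLoader usage; none of these names are final.
#include <mlpack/core.hpp>
#include "dataloader/dataloader.hpp"  // hypothetical header

void LoadExamples()
{
  // Known dataset by name: downloaded if needed, scaled, and split.
  DataLoader mnist("mnist", /* validRatio */ 0.1);

  // Or a user-supplied CSV in the layout documented in the README.
  DataLoader custom("path/to/dataset.csv", /* validRatio */ 0.2);

  arma::mat trainX = mnist.TrainFeatures();  // assumed accessors
  arma::mat trainY = mnist.TrainLabels();
}
```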

> Not sure what that means; I think each model could serve as an example?

What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (one the models repo supports), and other parameters, and it would run training and validation. Internally it would have all the models included (object_detection.cpp). We could also add a file such as mnist_cnn_tutorial.cpp to demonstrate how to add a model and train it on a popular dataset like MNIST. A rough sketch of such a driver follows.
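This uses plain argv for brevity; the real version could use mlpack's binding macros instead, and the helper and model names are placeholders:

```cpp
// Hypothetical object_detection.cpp driver: pick a model and dataset from
// the command line, then hand off to a shared train/validate helper.
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
  const std::string model = (argc > 1) ? argv[1] : "lenet";
  const std::string dataset = (argc > 2) ? argv[2] : "mnist";

  if (model == "lenet")
  {
    // TrainAndValidate(LeNet(10), dataset);  // hypothetical helper
  }
  else if (model == "alexnet")
  {
    // TrainAndValidate(AlexNet(1000), dataset);  // hypothetical helper
  }
  else
  {
    std::cerr << "Unknown model: " << model << std::endl;
    return 1;
  }
  return 0;
}
```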
What do you think: should I do this, one file for each model, or something different?

@kartikdutt18 (Member Author)

@prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up with different styles of implemented models and data loaders. Once I open the PR and the basic structure is approved, you can modify your code into that structure and we can then add those models to the repo. Until then, a good way to start might be implementing and testing the models locally. For the restructuring there is a lot that needs to be done, so I'll tag you in the PR, and the things that can run in parallel can be done by both of us or some more contributors. I think we will figure it out.

@prince776

> @prince776, that would be great. There is a PR open for VGG already. I think there should be one PR to restructure (for now); otherwise we might end up with different styles of implemented models and data loaders. Once I open the PR and the basic structure is approved, you can modify your code into that structure and we can then add those models to the repo. Until then, a good way to start might be implementing and testing the models locally. For the restructuring there is a lot that needs to be done, so I'll tag you in the PR, and the things that can run in parallel can be done by both of us or some more contributors. I think we will figure it out.

Ok, sure. There are a lot of things I want to do; I fell a little behind due to exams, but it's ok.

@kartikdutt18 (Member Author)

> There are a lot of things I want to do; I fell a little behind due to exams, but it's ok.

I am in a similar situation right now.

@zoq (Member) commented Mar 5, 2020

> What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (one the models repo supports), and other parameters, and it would run training and validation. Internally it would have all the models included (object_detection.cpp). We could also add a file such as mnist_cnn_tutorial.cpp to demonstrate how to add a model and train it on a popular dataset like MNIST.
>
> What do you think: should I do this, one file for each model, or something different?

I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with extra dependencies like OpenCV that are just not needed. But I think we could do both, and start with one?
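For example, something along these lines in CMake; the target names, paths, and library variables are illustrative:

```cmake
# Sketch: one standalone target per model, so a user who only wants one
# model does not pull in the dependencies of the others.
add_executable(alexnet models/alexnet/alexnet_main.cpp)
target_link_libraries(alexnet PRIVATE
    ${MLPACK_LIBRARIES} ${ARMADILLO_LIBRARIES})

# Only this target needs OpenCV; building alexnet alone never requires it.
add_executable(yolo models/yolo/yolo_main.cpp)
target_link_libraries(yolo PRIVATE
    ${MLPACK_LIBRARIES} ${ARMADILLO_LIBRARIES} ${OpenCV_LIBS})
```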

@zoq (Member) commented Mar 5, 2020

> There are a lot of things I want to do; I fell a little behind due to exams, but it's ok.

> I am in a similar situation right now.

No worries, best of luck with your exams.

@kartikdutt18 (Member Author)

> What I had in mind was that in the examples folder we could have code for problems such as time series, object detection, etc. A user would use the CLI to specify the model name, the path to a dataset or the name of a dataset (one the models repo supports), and other parameters, and it would run training and validation. Internally it would have all the models included (object_detection.cpp). We could also add a file such as mnist_cnn_tutorial.cpp to demonstrate how to add a model and train it on a popular dataset like MNIST.
>
> What do you think: should I do this, one file for each model, or something different?

> I think the advantage of the current approach is that we can build each model independently, so if a user is only interested in a specific model there is no need to build everything else, which might come with extra dependencies like OpenCV that are just not needed. But I think we could do both, and start with one?

Agreed. This was also discussed in the video meetup yesterday; some other points that were mentioned are:

  1. Ensure optional download of the weights (testing is optional for the user).
  2. The same for the datasets.

I'll start with one thing and take it from there. I think this would be great.
Thanks.

@zoq (Member) commented Mar 6, 2020

Sounds good, I think it's easier to discuss changes if we have a minimal example.

@kartikdutt18 (Member Author) commented Mar 8, 2020

Hi everyone, I changed the DigitRecognizerCNN a bit to support LeNet1, LeNet4, and LeNet5.

Here are some initial results for LeNet4 (I ran this on my local machine; I will run more epochs on my college workstation):

Epoch 1/150 loss: 10.2334
Epoch 2/150 loss: 1.66515
Epoch 3/150 loss: 0.953763
Validation Accuracy : 89.5476

I haven't added comments yet and there are some style issues. Once I fix them, I hope to open a PR for this issue tomorrow.

Also, in the new script (without comments), it takes only 10 lines to train and test a model, along the lines of the sketch below.
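Roughly like this, using the hypothetical DataLoader and LeNet wrappers sketched earlier in the thread; this is illustrative, not the exact script:

```cpp
// Illustrative ~10-line train-and-test script; all names hypothetical.
#include <iostream>
#include "dataloader/dataloader.hpp"  // hypothetical
#include "models/lenet.hpp"           // hypothetical

int main()
{
  DataLoader data("mnist", /* validRatio */ 0.1);  // load, scale, split
  LeNet model(10);                                 // LeNet4-style wrapper
  model.Train(data.TrainFeatures(), data.TrainLabels(), /* epochs */ 3);
  std::cout << "Validation Accuracy: "
            << model.Evaluate(data.ValidFeatures(), data.ValidLabels())
            << std::endl;
  return 0;
}
```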
Thanks.

@mlpack-bot (bot) commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

mlpack-bot added the "s: stale" label on Apr 7, 2020.
@kartikdutt18 (Member Author)

Keeping this open (at least until mlpack/models#3 is merged).

mlpack-bot removed the "s: stale" label on Apr 9, 2020.
@kartikdutt18 (Member Author)

Tests have been implemented in the models repo. Thanks, everyone, for all the help!
