
New design of model structure #363

Closed
beam2d opened this issue Aug 31, 2015 · 29 comments
Labels
cat:feature Implementation that introduces new interfaces.

@beam2d
Member

beam2d commented Aug 31, 2015

I am thinking about redesigning the model structure of Chainer. The redesign aims at the following goals:

  • Provide "the standard way" to define a structured model
  • Provide a safe and flexible way to save and load models
  • Make Optimizer more flexible

I wrote a document on a new design:
https://docs.google.com/document/d/1iO30OCnMFvn48Q6-2BOgKsJQXIKe0EQuCGwRdDJlgaE/edit#.
In this document I also describe several other issues that I want to resolve with the new design. I will try to implement it in a new branch as soon as possible. Please comment on the document or this issue if you have any questions, suggestions, or discussion items.

@jheymann85
Contributor

Catching up on a recent comment from karpathy (see here from 'badmephisto'), I would suggest at least thinking about making the functions/layers and the parameters independent.
E.g. there would be a Linear function taking a weight and optionally a bias when being applied. The weight itself would be abstracted in a parameter class in order to keep track of its gradient.
The parameters could then be organized as a ParameterSet for saving/loading/optimization.
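
As a hedged illustration of this suggestion (the Parameter class, linear function, and ParameterSet layout here are hypothetical, not an existing Chainer API), a decoupled design might look like:

```python
import numpy as np

# Hypothetical sketch: a stateless linear op plus a Parameter wrapper
# that tracks its own gradient. Names are illustrative only.
class Parameter:
    def __init__(self, data):
        self.data = data
        self.grad = np.zeros_like(data)

def linear(x, W, b=None):
    # Stateless function: all learnable state lives in Parameter objects.
    y = x.dot(W.data.T)
    if b is not None:
        y += b.data
    return y

# Parameters organized separately for saving/loading/optimization,
# a minimal stand-in for the proposed ParameterSet.
W = Parameter(np.random.randn(3, 5).astype(np.float32))
b = Parameter(np.zeros(3, dtype=np.float32))
params = {'linear/W': W, 'linear/b': b}

y = linear(np.ones((2, 5), dtype=np.float32), W, b)
```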

@beam2d
Member Author

beam2d commented Sep 7, 2015

Thank you for the suggestion and the reddit post (I missed that!). It seems like an important point for improving flexibility. Linear and Convolution2D already have such variants (linear and convolution_2d combined with Parameter), though the other parameterized functions do not. I did not plan to change the design around computational graphs in this issue, but this might be a good chance to make things better. I will think about making the functions and the parameters independent by default through this issue.
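
For reference, a minimal sketch of the existing combination mentioned above, assuming the Parameter and linear functions of Chainer v1.x (exact signatures may differ):

```python
import numpy as np
import chainer.functions as F

# Parameter wraps the weight as a parameterized Function; calling it
# with no arguments yields the weight as a Variable on the graph.
W = F.Parameter(np.random.randn(3, 5).astype(np.float32))
b = F.Parameter(np.zeros(3, dtype=np.float32))

def forward(x):
    # The stateless linear function takes the weight and bias as inputs,
    # so their gradients flow like those of any other graph input.
    return F.linear(x, W(), b())
```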

On the other hand, I think it is still useful to provide some abstraction layer that combines procedures and parameters (as benanne wrote in the comment on the reddit post). In the new design, the Model class does exactly that. Existing parameterized functions might be implemented as callable Models even after the separation of functions and parameters. I think that is acceptable.

@beam2d
Member Author

beam2d commented Sep 7, 2015

I updated the document. I will update it further for Variable support on parameters.

@tkng
Contributor

tkng commented Sep 10, 2015

Hello, I'm excited that we will be able to save/load models in a portable manner. I'm definitely looking forward to it.

By the way, I want to discuss the design of Model/ModelDict/ModelList. When I first read the design document, it seemed a bit difficult to use ModelDict/ModelList properly; i.e., when I should use ModelDict and when I should use ModelList is not obvious to me.

IMHO, the name 'Model' is a bit confusing. What the current Model does is save/load parameters, (make it possible to) share parameters, and regularize parameters. (Sorry if I misunderstood.) Model has little to do except handle parameters. Therefore I propose renaming 'Model' to 'Parameter'. Following this change, ModelDict/ModelList could be renamed to ParameterDict/ParameterList. Of course there may be a better idea, but it is still not clear to me.

What do you think?

@beam2d
Member Author

beam2d commented Sep 10, 2015

Thank you for the comment and suggestion! I describe how to use ModelDict/ModelList (which might be renamed as you suggest) in the following lines.

I intended them to be used as base classes of "model" definitions. For example, the current Linear function will be defined as a subclass of Model (that uses a "pure" Function inside of it). A multi-layer perceptron with an arbitrary number of layers might be defined as a subclass of ModelList (since it is like a list of multiple submodels). An AutoEncoder might be defined as a subclass of ModelDict (since it uses named submodels like 'encoder' and 'decoder'). Model is not just a container of parameters. It defines a hierarchy of models, each of which can have parameters (though at least the leaves of the hierarchy should have parameters).
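
A hedged sketch of what such definitions might look like (Model/ModelList/ModelDict are the proposed names from the design doc, not a released API; the stand-ins below omit the real machinery such as save/load and gradient handling):

```python
import numpy as np

# Minimal stand-ins for the proposed container classes.
class ModelList(list):
    def params(self):
        for child in self:
            yield from child.params()

class ModelDict(dict):
    def params(self):
        for child in self.values():
            yield from child.params()

class LinearModel:
    """A leaf model owning its own parameters."""
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_out, n_in).astype(np.float32)
        self.b = np.zeros(n_out, dtype=np.float32)

    def params(self):
        yield self.W
        yield self.b

    def __call__(self, x):
        return x.dot(self.W.T) + self.b

class MLP(ModelList):
    """'A list of multiple submodels': layers addressed by index."""
    def __init__(self, sizes):
        super().__init__(LinearModel(i, o) for i, o in zip(sizes, sizes[1:]))

    def __call__(self, x):
        for layer in self:
            x = np.maximum(layer(x), 0)  # ReLU after every layer, for brevity
        return x

class AutoEncoder(ModelDict):
    """Named submodels ('encoder', 'decoder') define the hierarchy."""
    def __init__(self, n_in, n_hidden):
        super().__init__(encoder=MLP([n_in, n_hidden]),
                         decoder=MLP([n_hidden, n_in]))
```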

BTW I agree that "Model" is a confusing name. I want a more expressive name, but I do not want to make it just a container of parameters. Any ideas?

@beam2d
Member Author

beam2d commented Sep 13, 2015

How about "Learnable"?

@beam2d
Member Author

beam2d commented Sep 14, 2015

Current options for the name:

  • Parameter / ParameterDict / ParameterList
  • Learnable / LearnableDict / LearnableList
  • Trainable / TrainableDict / TrainableList
  • Parameterized / ParameterizedDict / ParameterizedList

@tkng
Contributor

tkng commented Sep 14, 2015

Sorry for my late reply.

I think avoiding the use of 'Model' is a good idea, but it would be even better if we could give a clear definition for the word 'Model', because avoiding the use of Model alone will not address the problem: "There is no standard way to define one's neural net models."

Below is not my final opinion, but my current thinking. Sorry that it reaches no conclusion and is not well worded.

As a user, I expect the following from the Model class:

  • can be used (e.g. predict, classify; maybe generalized as forward)
  • can be trained
  • can be saved to a file
  • can be loaded from a file

I read the scikit-learn documentation, and the above expectations are not so unusual, I think.

What do I expect from the Model class when developing a new model?

  • provides a way to implement save/load easily
  • does not prevent composition/enhancement of existing models
  • provides coding standards as APIs

What I do not expect from the Model class:

  • callable
  • internal model hierarchy

Though I cannot yet describe why, the idea of a callable model is not comfortable for me. An internal model hierarchy is also not comfortable for me, because I'm not confident that I can always define a clear model hierarchy for my current neural network.

From another point of view, are LearnableList/LearnableDict adequate for describing a model hierarchy? LearnableDict alone may suffice.

@beam2d
Member Author

beam2d commented Sep 14, 2015

(I continue using the name "Model" in this comment since the name is not determined yet; it does not mean the name is my final decision.)

In my view, the internal model hierarchy is just a standard way to define composite models. If model definitions are based on ModelDict/ModelList, then users can easily identify the "children" models by standard path names (so the relative path becomes a part of the interface of user-defined composite models). I believe it's not a complicated thing; just use ModelDict/ModelList to hold external models inside your models, and they automatically define the model hierarchy.

I agree that "callable models" do not look good in general. I think, though, there might be models that behave like functions. This is just a design decision for each model, like "should this functionality be defined by __call__ or not?"

As you wrote, ModelDict is enough for defining the model hierarchy. ModelList is just for convenience. I (and some other users) feel much frustration when defining a FunctionSet with a sequence of parameterized Functions (like a general MLP with a configurable number of layers), since it requires explicit handling of lengths and indexes (e.g. we cannot iterate over the dict in the correct order). I think this is not Pythonic, so I added ModelList. It's not mandatory, though.
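
To illustrate the frustration, a hedged sketch using the existing FunctionSet API (the ModelList side is the proposed, not yet released, design):

```python
import chainer
import chainer.functions as F

# Today: a FunctionSet keys layers by name, so a variable-depth MLP has
# to reconstruct the ordering from numbered attribute names.
n_layers = 4
model = chainer.FunctionSet(**{
    'l{}'.format(i): F.Linear(100, 100) for i in range(n_layers)})

def forward(x):
    for i in range(n_layers):  # explicit index handling
        x = F.relu(getattr(model, 'l{}'.format(i))(x))
    return x

# Proposed: a ModelList subclass would keep order naturally
# (hypothetical API):
#
#     for layer in self:
#         x = F.relu(layer(x))
```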

@tkng
Contributor

tkng commented Sep 15, 2015

Now I think I understand what causes my discomfort with ModelDict/ModelList.

For me, Model sounds 'utilizable', but the current Model interface does not guarantee that. What I expect from the word 'utilizable' is predict, train, save, load, etc. There are two mindset problems (for me):

  1. Model does not provide 'predict'. Predict or a similar feature is provided by Function.
  2. Model does not provide 'train'. What you can do with the Model is only addgrads/resetgrads.

ModelDict was very uncomfortable for me, because I could not imagine what happens when I call predict (forward) on a ModelDict. But the answer is very simple: since ModelDict does not provide such a feature, there's no problem.

There are two solutions. One is already mentioned: we do not use the name 'Model'.

The other is for the Model class to inherit from the Function class. In this case, calling a ModelList would be straightforward, but calling a ModelDict is not trivial. But I believe you are smart enough to solve this problem. In the design doc, you noted that the Inception module should not inherit from the Function class, but I think it's OK for it to inherit Function, because the Inception module modifies the computation graph, even though it may not itself participate in the computation graph.

This second solution is apparently far from perfect. I hope it helps you, but if it does not, please ignore it and go ahead. Anyway, since I found the root cause of my discomfort, I'm happy now.

@beam2d
Member Author

beam2d commented Sep 15, 2015

Thank you for the further suggestion! I feel the design is getting much clearer. I'm not smart enough to brush up your second solution in a few days, though :)

calling of ModelList would be straightforward

Do you mean making ModelList a sequence of functions? I do not want to let one define a network by data. This is a core concept of Chainer: networks should be code, not data.

I don't agree with making Model a subclass of Function. A Model might have multiple forwarding methods, and there might be cases in which one cannot decide which should be the main predict method. For example, think of writing a Variational AutoEncoder as a model. It might have methods that compute the reconstruction loss, the KL loss, the whole loss, the distribution of hidden variables, samples of hidden variables, the distribution of the reconstruction, etc. Which one should be the main predict method?

The main difference from the models of, e.g., scikit-learn is that neural networks are far more composable than other machine learning models. In order to provide the power of composition, we sometimes have to keep flexibility instead of usability. The VAE example above is such a case. We could choose one method as predict, but that might lose chances to use it in other ways. OK, we could still provide the other methods on this class, but then I do not want to call it a function, since it has multiple roles.
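
To make the point concrete, a hypothetical sketch of such a multi-method VAE (all class and method names here are illustrative, not from any released API):

```python
class VAE:
    """Hypothetical sketch: no single method is obviously 'predict'."""
    def __init__(self, encoder, decoder):
        self.encoder = encoder  # e.g. submodels in a Model hierarchy
        self.decoder = decoder

    def hidden_distribution(self, x):
        return self.encoder(x)          # distribution of hidden variables

    def reconstruction_distribution(self, z):
        return self.decoder(z)          # distribution of the reconstruction

    def reconstruction_loss(self, x):
        ...                             # omitted in this sketch

    def kl_loss(self, x):
        ...                             # omitted in this sketch

    def loss(self, x):
        return self.reconstruction_loss(x) + self.kl_loss(x)
```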

I agree that we should rename the class. I've come up with another name (while writing this comment): Network/NetworkDict/NetworkList.

@tkng
Contributor

tkng commented Sep 16, 2015

Ah, Network/NetworkDict/NetworkList looks much better!

I also found that in the design doc, the SimpleLinear class does not inherit from Function; instead, it uses composition. (I had misunderstood that point.) I think that is better than inheriting from the Function class.

@unnonouno unnonouno added this to the v1.4.0 milestone Sep 16, 2015
@beam2d
Member Author

beam2d commented Sep 18, 2015

Now I think the names "ParameterizedObject / ParameterizedDict / ParameterizedList" are the most suitable for this design, since they accurately describe what these classes provide. The concept of "models" that I used is renamed to "parameterized objects" in this case, and users can define a hierarchy of parameterized objects.

At first I thought the name "Network" was comparatively good, though now I think it gives users the incorrect impression that it must define one solid network structure, while its interface does not actually define anything network-like.

@tkng
Contributor

tkng commented Sep 20, 2015

Hmm, your concern seems reasonable to me.

Now I'm thinking that the words NetworkDict/NetworkList have a weak point: the exact meaning of the word is not clear. From this point of view, ParameterizedObject seems better.

However, the meaning of ParameterizedDict is not obvious to me, because it sounds like the dictionary itself is parameterized. How about ParameterizedObjectDict? Is it too long?

@beam2d
Member Author

beam2d commented Sep 21, 2015

Even I think ParameterizedObject is already too long... I'll keep searching for another word.

@choni
Contributor

choni commented Sep 23, 2015

tkng wrote

Now I'm thinking that the words NetworkDict/NetworkList have a weak point: the exact meaning of the word is not clear. From this point of view, ParameterizedObject seems better.

Can we define the word network in Chainer's documentation to avoid ambiguity?
e.g. "We use the word network for ........."

I think Parameterized* is not suitable, because most objects in computing are parameterized.
The word Parameterized is not wrong, but it says nothing.

@beam2d
Member Author

beam2d commented Sep 24, 2015

I rather want to use "parameter" as a special word in this framework, meaning values to be learned, though that does not mean the word Parameterized is suitable. I now think Network is misleading even if we define its meaning somewhere. Maybe a more abstract word with a precise definition is required, like "Module" in Torch? (That word actually cannot be used in our case since it is confusing in the Python context, though.)

@beam2d
Member Author

beam2d commented Sep 25, 2015

New name candidates: Block, Box, Gear, Modelet. Block could be confused with another framework, though.

@beam2d
Member Author

beam2d commented Sep 25, 2015

If none of these is acceptable, I will think further about restructuring the design.

@jheymann85
Contributor

Yet another suggestion, taking the framework name into account: Link, as a basic building block of a chain.
Its meaning has to be explained first, though. But so do the other suggestions like Block or Gear. I think once the analogy is clear, it's a suitable and short name, imho.

@beam2d
Member Author

beam2d commented Sep 28, 2015

Link sounds good. Thank you for the suggestion.

@beam2d
Member Author

beam2d commented Sep 29, 2015

I'll use the name Link in the documentation. Thank you to everyone (@tkng, @choni, @jheymann85, and the core dev team) who helped me settle on the name!!!

@beam2d
Member Author

beam2d commented Oct 8, 2015

Based on an offline discussion with the dev team, I decided to make the link usage more concrete. It will make the concept of a "link" more understandable. Since it requires more design and implementation effort, I want to defer the release. Sorry for the inconvenience. I will upload a new design doc as soon as possible.

@beam2d
Member Author

beam2d commented Oct 14, 2015

@tscohen tscohen mentioned this issue Oct 15, 2015
@beam2d
Member Author

beam2d commented Oct 16, 2015

I'm starting to implement the new design doc. I noticed some parts should be fixed, so the doc will be updated a bit during implementation. Sorry for the inconvenience.

@tscohen
Contributor

tscohen commented Oct 27, 2015

I think having a purely functional API (stateless Functions) and, on top of that, a layer of abstraction (Links and Chains) that manages parameters is really nice. However, I would also like to see a way to use parameters in the purely functional API. For example, my colleague just asked how to use a log-variance parameter (which has to be exponentiated and then multiplied by some tensor computed from the inputs). I think right now the only way is to implement a Function for this. If we could designate the log-variance as a parameter and then call exp() and mul(), this would be much easier and would avoid bloat in the library of supported functions.

Perhaps we can have Variables maintain a list of dependencies (i.e. the inputs from which they were computed). An input to the graph as well as a parameter would have an empty list of dependencies (they are free variables). When backpropagating, the user could specify a wrt (with-respect-to) parameter, a list of tensors to compute the gradient with respect to. It could include inputs, internal variables, and parameters.

To stay backwards compatible, perhaps we can also have an is_param flag that designates a tensor as a parameter. The default behaviour of backward() when no wrt is given would be to compute gradients with respect to all parameters.
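
A toy sketch of this proposal (the Variable class, its deps list, the is_param flag, and the wrt argument below are all hypothetical illustrations of the suggestion, not Chainer's API):

```python
import numpy as np

class Variable:
    def __init__(self, data, is_param=False):
        self.data = data
        self.is_param = is_param
        self.deps = []   # (input_variable, local_gradient) pairs
        self.grad = None

    def backward(self, wrt=None, upstream=None):
        # No de-duplication of shared subgraphs; a toy sketch only.
        if upstream is None:
            upstream = np.ones_like(self.data)
        wanted = self.is_param if wrt is None else self in wrt
        if wanted:
            self.grad = upstream if self.grad is None else self.grad + upstream
        for inp, local_grad in self.deps:
            inp.backward(wrt=wrt, upstream=upstream * local_grad)

def exp(v):
    out = Variable(np.exp(v.data))
    out.deps = [(v, np.exp(v.data))]  # d exp(v)/dv = exp(v)
    return out

def mul(a, b):
    out = Variable(a.data * b.data)
    out.deps = [(a, b.data), (b, a.data)]
    return out

# A log-variance parameter used directly in the functional API:
log_var = Variable(np.zeros(3, dtype=np.float32), is_param=True)
x = Variable(np.array([1.0, 2.0, 3.0], dtype=np.float32))
y = mul(exp(log_var), x)

y.backward()                  # default: gradients w.r.t. all parameters
y.backward(wrt=[log_var])     # or only w.r.t. an explicit list
```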

@beam2d
Member Author

beam2d commented Oct 29, 2015

As for your colleague's situation, isn't it enough to write a chain that passes the log-variance to the linear function? (I might not be grasping what your colleague wants to do.) Or do you want to make existing links able to use log-variance parameters?

Distinguishing parameters from non-parameter terminal variables in order to skip unnecessary backward computations seems like a good direction for the next step.

@tscohen
Contributor

tscohen commented Nov 15, 2015

Yes, we figured we can just use a Parameter Function for this.

In the new design, what is the recommended way to manage constants? I'm thinking of a fixed set of weights and indices used for bilinear interpolation. If I create a Function BilinearInterpolate, can I take the constant weight and index matrices as inputs to the constructor and store them as members? The alternative is to pass them as inputs to the Function itself (passed to the forward method), but in that case Chainer would expect a gradient for those inputs as well, which I don't need.

While parameters should not be stored in a Function, perhaps we can make an exception for constants. Or is there a reason we shouldn't do this?

@beam2d
Member Author

beam2d commented Nov 16, 2015

Fixing them in the initializer is the currently recommended way to manage constants in the new design. It would be really desirable to support writing Functions with optional gradient computations, though I have not come up with a good way to support this yet. So I agree with you. Think of constants as part of the configuration.
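
A hedged sketch of this pattern (the BilinearInterpolate name and its toy math are illustrative only; the forward/backward signatures follow the Chainer v1.x Function API as I understand it):

```python
import numpy as np
import chainer

class BilinearInterpolate(chainer.Function):
    def __init__(self, weights, indices):
        self.weights = weights    # constant: part of the configuration
        self.indices = indices    # constant: never receives a gradient

    def forward(self, inputs):
        x, = inputs
        # Gather input columns and combine them with the fixed weights.
        y = (x[:, self.indices] * self.weights).sum(axis=-1)
        return y,

    def backward(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        gx = np.zeros_like(x)
        # Scatter the weighted upstream gradient back to the positions
        # that were gathered in forward; only the real input gets a grad.
        np.add.at(gx, (slice(None), self.indices),
                  gy[..., None] * self.weights)
        return gx,
```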
