Proposal: Unbundling Keras modules into separate repositories #9256
Comments
We could also do the same with |
Minor concern: We should define clear boundaries for each repo. Also, will other repos be allowed to add dependencies? And most examples in |
They would just depend on Keras. Preprocessing/etc should still be importable from Keras. This change would be a pure refactoring, where the code is moved to a different place but the APIs don't change.
For now I would suggest Keras contrib. But conceptually a layer would definitely be in Keras, not keras-processing, regardless of what it does. The question is thus whether we want data augmentation layers as part of the core Keras API. |
Okay, all fine then! |
If anybody hasn't begun working on this, I would like to give it a go! |
The two new repositories will need "owners". An owner is someone who has write rights and bears responsibility for reviewing PRs (and merging them if they think they should be merged), and to a large extent, for answering issues. I will still give guidance on API choices, but that will be limited to user-facing API decisions. @srjoglekar246 which repo are you interested in working on, @taehoonlee would you be interested in becoming owner of the |
@farizrahman4u @ozabluda @ahundt do you have any interest in getting involved too (see comment above)? |
keras-processing sounds good then! |
@fchollet, I'd love to! I wonder if you intend to separate tests for the cores and the applications (as described in "Faster CI runs, better test coverage"). My minor concern is that the application tests will not be triggered whenever the cores are updated if the applications are unbundled. |
Testing applications when changing layers would fall into "integration tests", not unit tests. Issues with the layers should be caught by the layers unit tests. We could add one application-related integration test to fix this. |
OK, I understand. |
@fchollet Does it make sense to start work on this with a new personal repo and then move it all to one owned by keras-team? |
For applications, this would be the other way around, right? keras would be a dependency of applications. There is some "preprocessing" code in applications as well (imagenet_utils). Would that be moved to the applications repo or the preprocessing repo? What about the examples? Separate repo? Overall I think this is great. I use the numpy-only stuff in projects that do not actually use keras, so refactoring it into a separate repo would be really useful. |
@farizrahman4u I see that you maintain one of Keras' spin-offs. I will probably look over the commits in your repo to get an idea of TODOs :-P |
The split modules need to be importable from Keras, so they would be a dependency of Keras. We're not changing any API. That means that
That's application-specific, so it would stay in the keras-applications repo.
I think the examples are fine as they are, we don't have the same constraints there. They're already well-separated from the codebase itself (and the API), and they're not tested so they don't inflict additional load on CI. They're also not replicated in |
A lot of people still use this function for their preprocessing. |
In terms of timeline, I'll do the split, and it will take place in late March or early April. |
This process can be managed fairly smoothly if you'd like to do it, but I'd like to provide some food for thought based on lessons I've taken from seeing this process unfold in several other widely used projects (boost, homebrew, and ROS are examples). Here are some questions I'd ask and answer:
At this time, the effects of items like "easier to sync with tf.keras" seem murky with respect to users and might benefit from better definition and/or alternative solutions. What if CI problems were solved with backend infrastructure changes that would be invisible to users? Could "owners" simply be assigned keras subdirectories? Everything might be totally worthwhile in the end, I'm just hoping to give perspective on alternative options. The project might also consider taking the following steps, if appropriate. I believe they turned out to be enormously helpful on other projects:
Some of the above can probably come from pip, so perhaps it isn't as big a deal in this case when compared to others I'm familiar with. Example issues that might come up include:
Hopefully this information is helpful with navigating the process for any approach you decide to take! |
@ahundt thanks for the detailed thoughts. These are all very good points.
The idea is that this is a refactoring change (refactoring to get less code duplication): it makes life easier for devs (which ultimately has indirect user benefits) while not affecting the experience at all for end users. Specifically, we're trying to:
I agree that we need a clear plan to make sure that the user experience is not affected. I think automatic dependency management (pip) solves most issues.
We should push out new releases of Let's say current Keras is 2.1.8 and current
This implies that the bleeding edge version of
Thankfully we're in a situation where this will not happen. If this was a risk we would not do it. In our case What we should do is CI-test the new modules both against the last release of Keras, and against the bleeding edge version of Keras. That way we ensure that changes in bleeding-edge modules don't break any existing users, and that they are ready to be used with bleeding-edge Keras. This guarantees an additional layer of stability. (the reverse is not a problem since the Keras logic does not depend on the new modules, the pip dependency will only be to keep the existing namespace
I think this is solved by having the new modules be compatible with both the bleeding edge and the latest release.
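The dual-target CI idea above (test the split modules against both the last Keras release and bleeding-edge Keras) could be sketched roughly as follows. This is only an illustration: the helper name `keras_install_commands` is hypothetical, the bleeding-edge target is assumed to be the default branch of keras-team/keras installed through pip's VCS support, and 2.1.8 is the example version used in the comment above.

```python
"""Sketch: build the install commands for a two-target CI matrix."""

# Assumption: "bleeding edge" means the default branch of keras-team/keras,
# installed through pip's git+https support.
KERAS_GIT_URL = "git+https://github.com/keras-team/keras.git"


def keras_install_commands(last_release):
    """Return one pip command per Keras target the split module is tested against.

    `last_release` is the most recent published Keras version, e.g. "2.1.8".
    """
    return {
        "last-release": ["pip", "install", "keras==" + last_release],
        "bleeding-edge": ["pip", "install", KERAS_GIT_URL],
    }


if __name__ == "__main__":
    # Print one command per CI job; a real CI config would run the test
    # suite of the split module after each install.
    for target, command in keras_install_commands("2.1.8").items():
        print(target + ": " + " ".join(command))
```

Each CI job would run one of these commands and then the split module's own test suite, so a failure in either job points to the Keras target that broke.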
If we have intractable issues, we can always revert to the current state of affairs at any time, since:
|
This sounds like a pretty reasonable plan to me. I'll just mention a couple additional potential pain points to consider based on your comments:
This would definitely mitigate many potential issues. I think to make this happen the following steps might be worthwhile:
I'll mention a couple specific examples of conflicts that showed up later in travis. keras-contrib actually broke quite a few times over the first few months due to internal keras changes for which internal calls were not easily avoidable without duplication. Examples include:
Everything can be resolved if there are cycles available to fix it, I just wanted to give a couple of specific examples of pain point categories that will multiply as the number of repositories increases.
Someone, or many individuals, might deploy code in production and not be able to update for one valid reason or another. Periodic LTS releases are the typical way to meet the needs of groups like that, but they aren't the only way to go. I hope that helps; overall your proposed plan sounds reasonable! Thanks for taking my ideas into consideration. |
I've been delaying doing this because I wanted to think about it for a while and make sure this was the right thing to do. Thanks everyone for the various comments and feedback! Now, let's make a decision. @Dref360 @taehoonlee: should we do this split of We'll make a decision asap based on your answers, and do the split next week if the decision is to do it. |
We should definitely do the split for keras.applications, which would be the "official" keras-zoo. For keras.preprocessing, I think your motivations are valid, and the community will be much easier to manage with it. I'm kinda worried about the
We should be very cautious with that. Also, we would need CI for multiple keras versions, and we would need to deprecate versions over time. So overall I would like the repo to move forward with the split, and we should follow all of @ahundt's advice. I think all of it raises valid concerns. On the ownership, I can take care of keras.preprocessing. I would need some help with keras.preprocessing.sequence since it's not my main area of research. I am open to getting help from other contributors as needed. :) |
Cool, thanks @Dref360! @taehoonlee, any thoughts? |
Also, we have tons of PRs waiting for those modules. When the split is done, we would have to go through them. |
@fchollet, It would be nice if we could do the split. Especially, I strongly agree with:
There are no benefits as @ahundt pointed out. However, in my opinion, this is not a problem because the intention of the split is to facilitate management without harming the user experience. Of course, the potential issues may actually occur. I think these issues are not reasons to avoid the split, but matters we will need to take care of in the future. |
And, I'd like to take ownership of the |
keras-contrib currently has a specific, clear example (travis link) of the sort of breakage I mentioned might be a concern in unbundled modules. I believe master moved some files around and removed some legacy code, and now the keras-contrib build doesn't work. I haven't figured out a fix yet, but haven't spent much time on it either. This is just for awareness that this sort of thing would become much more common, and the person who makes the change may not realize or have a chance to fix the issue downstream. This sort of thing also doesn't show up until travis automatically re-runs the dependency, which in the case of keras-contrib is up to a 24-hour delay. |
@ahundt, the first one arose due to a dependency on legacy code and seems to have already been resolved by you. The preprocessing and applications modules use public and stable APIs and do not depend on legacy code. The second one is orthogonal to the split. It even fails to import keras ( |
@taehoonlee yes, thanks. The purpose of my post was to provide a clear example so tradeoffs are clear in a more practical capacity than vague what-ifs. These are admittedly very solvable problems that will need to be resolved around |
We'll do it then. There's a technical problem we'll need to solve:
A possible solution is to ship |
For the record, I'm thinking of something like: Contents of
import sys

try:
    import keras_preprocessing  # Note: what should we name these modules?
    # _check_version is a helper that is not shown here.
    _check_version(keras_preprocessing)
    # Hand the current (Keras-side) module object to the split package.
    keras_preprocessing.set_keras_module(sys.modules[__name__])
except ImportError:
    raise ImportError('...')

Iterator = keras_preprocessing.image.Iterator
# etc... |
Nevermind, we can structure the code in a way such that we can import the submodule and call I don't think it matters, because presumably the module would get used from Keras almost every time ( |
Side note: after consideration I think it would have been a bad idea to package
|
I've implemented this initial plan, but I think we're going to have to go with a different design to make it possible to use |
I've tried a couple different designs, but things are turning out to be incredibly complicated in practice when you start converting Keras to use I'm currently leaning towards a more verbose / advanced design, that would be hopefully safe:
|
Implemented that last design (more or less) and I believe it's safe. One side-effect is that
import keras
from keras_preprocessing import image  # for instance
Or (preferred):
from keras import preprocessing  # this is literally keras_preprocessing |
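To illustrate how a name imported from one package can be "literally" another package, here is a small self-contained sketch using stand-in modules. The names `host` and `split_preprocessing` are placeholders invented for this example; this is not the actual Keras implementation, only a demonstration of plain Python module aliasing.

```python
import sys
import types

# Stand-ins for the real packages, so the sketch runs without Keras installed.
split_pkg = types.ModuleType("split_preprocessing")
split_pkg.image = types.ModuleType("split_preprocessing.image")

host_pkg = types.ModuleType("host")

# The host package re-exports the split module under its historical name.
host_pkg.preprocessing = split_pkg

# Register everything so that normal import statements can find the modules.
sys.modules["host"] = host_pkg
sys.modules["host.preprocessing"] = split_pkg
sys.modules["split_preprocessing"] = split_pkg
sys.modules["split_preprocessing.image"] = split_pkg.image

from host import preprocessing
from host.preprocessing import image
import split_preprocessing

# Both import paths resolve to the very same module objects.
print(preprocessing is split_preprocessing)
print(image is split_preprocessing.image)
```

Because both names point at the same module object, anything set on one is visible through the other, which is what lets existing user code keep importing from the old namespace after a split.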
The first stage of the split is executed (creating independent modules for @taehoonlee in the process I've run into a NASNet bug -- error wrt incorrect number of layers when loading ImageNet weights. I've verified that the weights files are up to date. |
@fchollet, The error has been resolved in keras-team/keras-applications#1. |
Thanks for the PR, merged! @Dref360 @taehoonlee as you have noticed the repos are created, and you have write rights on them! Here are a few TODOs I haven't done yet (true for both repos), that you can work on if you like:
And please check out and review any PRs that our contributors send out 👍 |
Maybe it is time to release 2.1.7, @fchollet? |
@taehoonlee yes, I will release this week. It will be 2.2.0 due to the large amount of changes in this release. |
We should consider moving the modules preprocessing and applications to separate repositories under keras-team: keras-team/preprocessing and keras-team/applications.
They would be listed as a dependency of keras, and would still be importable from e.g. keras.preprocessing.
Why?
keras-team/keras and tf.keras (shared dependencies instead of code duplication)
Any comments or concerns? Who would be interested in taking greater ownership of these repos, should they materialize?