Trim the build for Play! 2 applications #48

Merged
merged 5 commits into from Oct 10, 2013

5 participants
@jeantil
Contributor

jeantil commented Aug 15, 2013

Drop the following from the slug

  • ${BUILD_DIR}/project/boot
  • ${BUILD_DIR}/.jdk
  • ${BUILD_DIR}/target/scala-*
  • ${BUILD_DIR}/target/streams
  • ${BUILD_DIR}/target/resolution-cache

This reduced my slug size from 142MB to 39MB. By comparison, sbt dist creates a 34MB zip file.
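
A minimal sketch of that trimming step, assuming it runs at the end of the buildpack's compile script with the build directory as its argument. The function name trim_play_slug is hypothetical and this is not the code from the pull request; it only mirrors the list above, including .jdk.

```shell
# Sketch only: trim_play_slug is a hypothetical name, not the buildpack's code.
# It deletes the directories proposed above from the given build directory.
trim_play_slug() {
  local build_dir="$1"
  # Refuse to run without an existing directory so rm -rf stays scoped to it.
  if [ -z "$build_dir" ] || [ ! -d "$build_dir" ]; then
    echo "usage: trim_play_slug BUILD_DIR" >&2
    return 1
  fi

  rm -rf "$build_dir/project/boot"
  rm -rf "$build_dir/.jdk"
  rm -rf "$build_dir/target/streams"
  rm -rf "$build_dir/target/resolution-cache"
  # The glob sits outside the quotes so that target/scala-2.10 and similar match.
  rm -rf "$build_dir"/target/scala-*
}
```

Calling trim_play_slug "$BUILD_DIR" once sbt has finished staging would be one way to wire it in.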

@technomancy

technomancy Aug 17, 2013

Removing the JDK means your app will depend on the stack JDK, which does not get upgraded very frequently. Keeping it in your slug is recommended in order to stay on top of security issues.

@jeantil

jeantil Aug 19, 2013

Contributor

Hmm, it seems odd that the stack is not as secure as it can be by default. And if we must keep the JDK/JRE in the slug, could it somehow not count against the slug size limit?

I made this pull request because I have a very small Play! 2 app (sbt dist gives 35MB) that reaches a 142MB slug size (see http://blog.byjean.eu/2013/08/15/bring-your-play2-slug-size/ for details), which seems insane. 87c80b0 will reduce that size by 55MB, which leaves 77MB of JDK in the slug. Even if the slug size limit has been increased to 300MB (I didn't know that; I was convinced it was still 200MB), it seems like a big waste.

@ryanbrainard

ryanbrainard Aug 19, 2013

Contributor

Beyond security, the other reason the JDK is vendored directly into the slug is to allow users to choose their own major JDK version:

https://devcenter.heroku.com/articles/scala#optionally-choose-a-jdk

If an app is relying on the JDK in the stack, it could change and break the app.

@jeantil

jeantil Aug 19, 2013

Contributor

I have reworked the commits of the pull request to leave the JDK in the slug. That means that the minimum slug size for a JVM-based application is 77MB (the size of the JDK).

+if is_play $BUILD_DIR ; then
+ if [ -d $SBT_USER_HOME/.ivy2 ]; then
+ echo "-----> Dropping ivy cache from the slug"
+ rm -rf $SBT_USER_HOME/.ivy2

@jsuereth

jsuereth Sep 28, 2013

+1 from me for this.

@jsuereth

jsuereth Sep 28, 2013

This looks great to me. I think in Play 2.2 (and maybe prior?) you should be able to remove everything in target except the "stage" directories (wherever they land), which means Play + JDK should be the only sources of size for your slugs. For now, I think you've nailed the biggest disk-consuming directories.

Thanks for the effort! It helps all of us using Play on Heroku :)
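
One way to sketch the "keep only the stage directories" idea is a helper that empties target except for named entries. The name prune_target and its arguments are hypothetical, and which entries hold the staged output depends on the Play and sbt versions, so the names to keep are passed in rather than hard-coded.

```shell
# Sketch only: prune_target is a hypothetical helper, not part of the buildpack.
# Usage: prune_target BUILD_DIR NAME...
# Removes every top-level entry of BUILD_DIR/target except the given names.
prune_target() {
  local target_dir="$1/target" entry name keep wanted
  shift
  [ -d "$target_dir" ] || return 0

  # The second glob picks up dot-entries such as target/.history.
  for entry in "$target_dir"/* "$target_dir"/.[!.]*; do
    [ -e "$entry" ] || continue   # skip a glob that matched nothing
    name=${entry##*/}
    keep=no
    for wanted in "$@"; do
      if [ "$name" = "$wanted" ]; then keep=yes; fi
    done
    if [ "$keep" = no ]; then rm -rf "$entry"; fi
  done
}
```

For example, if a build stages into target/universal/stage (an assumption about the layout, to be checked for the Play version in use), prune_target "$BUILD_DIR" universal would keep that tree and drop the rest of target.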

@joost-de-vries

joost-de-vries Oct 8, 2013

It took me quite a while to work out why my Play app didn't fit within the 300MB limit. It turns out that the Heroku documentation is out of date (.slugignore and .gitignore have no influence), and that this buildpack doesn't support composite projects.
The structure of my Play app is
web/app web/conf [..etc] core integration project/Build.scala
It turns out I have to create a dummy conf/application.conf file. That made my slug size drop from 290MB to 140MB.
This feels like a non-obvious and not very elegant solution...
To be honest, I was evaluating other PaaS options because fixing this took up too much time. Thank deity I got it working.

@jsuereth

jsuereth Oct 8, 2013

I think, if we can convince folks, we can migrate to using the native-packager as a solution for all Heroku buildpacks using sbt, so the conf/application.conf detection goes away (which, as you say, is only a 70% solution). We can also use this to reduce slug size by dumping the sbt build + source files themselves, keeping only the binaries, similar to what this pack does. I haven't had time to dig into forking my own buildpack with this behavior, but I'm willing to help anyone who would have time for this.

@jeantil

jeantil Oct 8, 2013

Contributor

I am very interested in an alternative solution. I will start looking on my own, but if you have any pointers regarding the native-packager I would love to hear them!

@jsuereth

jsuereth Oct 8, 2013

@jeantil I guess possibly the simplest solution would be to have the stage task generate ".slugignore" or ".slugkeep" files which the buildpack could use to determine what needs to be removed. These would have to be "found" in the case of multi-project builds. We can attach the generation of such files to the "stage" task, then create a forked buildpack which has knowledge of this. I think that's the laziest setup, but perhaps not the best? WDYT?

@jeantil

jeantil Oct 8, 2013

Contributor

I don't think that would work. Currently, you push your source code to Heroku, which runs sbt stage and generates the executable jars.

The .slugignore file causes files to be removed after you push code to Heroku and before the buildpack runs.

This means you would have to generate the executable jars locally, then add them to git and push them to Heroku's repo.

@jsuereth

jsuereth Oct 8, 2013

Ah, I was thinking we could minimize the slug size after building on Heroku. Are you saying that for some projects we don't even have room to build on Heroku?

@jeantil

jeantil Oct 8, 2013

Contributor

If I understand the documentation correctly, here is a high-level order of operations when pushing to Heroku:

 git push to heroku
 git post-commit hook on heroku
 process .slugignore file, deleting any listed file 
 run buildpack
   |-> run sbt stage
 zip build directory (and call it slug)
 deploy slug on 

As you can see, generating a .slugignore in the stage task would be useless, as the .slugignore file is not used to filter the slug itself but to delete "big" binary files which are useless for the build (for instance, if you store your project's technical specifications and other documentation in your project's git repository, you may want to remove them right from the start).

It makes sense when you consider web applications written in a dynamic language (such as PHP or Ruby), in which case you don't have a compilation stage and you can use the .slugignore to remove files which are not used when running the application (example use cases include removing the tests and static assets hosted somewhere else; more use cases at http://matthodan.com/2010/08/20/exclude-static-assets-from-heroku-slug.html). Basically, checking out the repository is enough to get the executable application.

Following this logic, the "correct" way to package a Java/Scala app for Heroku would be to check in and push the executable jars to Heroku, not the source code.

@jsuereth

jsuereth Oct 8, 2013

Hmm, perhaps. I was thinking we could, for an sbt-specific buildpack, have some kind of file the build would write that the script can read to know how to clean the slug before deploying.

The idea is that we'd use the build tasks to ensure the smallest slug. To do so, we'd have to communicate from sbt (a Scala process) to the buildpack (a shell script). Hence, some kind of easily shell-parsable file.

If we need to drop the size of data hitting the Heroku build server, we're somewhat limited in options unless we do the "check in binaries" thing.

I guess my question is if we improve the communication between the "stage" task and the buildpack, can we get smaller slugs without having to gunk up git repositories with binaries? I'd like to think we could....

Specifically, this flow:

 git push to heroku
 git post-commit hook on heroku
 process .slugignore file, deleting any listed file (perhaps we can remove src/test from all projects?)
 run buildpack
   |-> run sbt stage.   This outputs .sbtkeep files denoting which files are needed to run your applications.
   |-> search for .sbtkeep files, delete any file not listed in one .sbtkeep file.
 zip build directory (and call it slug)
 deploy slug on 

I think it's a pretty "light" change that could bring some good benefit, and it may be more general than looking for conf/application.conf, i.e. it'd be good for any sbt project, not just single-project Play applications.
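
The second buildpack step in that flow could be sketched as follows. Everything about the .sbtkeep format here is an assumption, since the thread does not define it: one path per line, relative to the directory holding the .sbtkeep file, where a listed directory keeps everything beneath it. The name apply_sbtkeep is hypothetical.

```shell
# Sketch only: apply_sbtkeep and the .sbtkeep format are assumptions.
# Assumed format: one path per line, relative to the .sbtkeep file's directory.
apply_sbtkeep() {
  local build_dir="$1" keep_list
  keep_list=$(mktemp) || return 1

  # Gather every listed path, resolved against the directory of its .sbtkeep.
  find "$build_dir" -name .sbtkeep -type f | while IFS= read -r keep_file; do
    dir=$(dirname "$keep_file")
    while IFS= read -r rel || [ -n "$rel" ]; do
      if [ -n "$rel" ]; then printf '%s\n' "$dir/$rel"; fi
    done < "$keep_file"
  done > "$keep_list"

  # With nothing listed, leave the tree alone instead of deleting everything.
  if [ ! -s "$keep_list" ]; then rm -f "$keep_list"; return 0; fi

  # Delete regular files that are neither listed nor inside a listed directory.
  find "$build_dir" -type f | while IFS= read -r file; do
    keep=no
    while IFS= read -r kept; do
      case "$file" in
        "$kept"|"$kept"/*) keep=yes; break ;;
      esac
    done < "$keep_list"
    if [ "$keep" = no ]; then rm -f "$file"; fi
  done

  # Remove directories left empty by the pass above.
  find "$build_dir" -depth -type d -empty -exec rmdir {} \; 2>/dev/null
  rm -f "$keep_list"
}
```

Under these assumptions, anything the runtime needs, such as .jdk, has to be listed in some .sbtkeep file as well, or the sketch would delete it.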

@joost-de-vries

joost-de-vries Oct 8, 2013

Sounds like a solution that's resilient to differing directory layouts and
to differences between Play apps and other sbt projects. Good idea imo.

@jeantil

jeantil Oct 8, 2013

Contributor

It sounds complex with regard to the problem we are trying to solve.

In Play apps we have the dist command, which packages everything needed to run the app in a single file. I don't know how standard that is, but it seems like a cleaner way to address the problem. If it can be generalized to every sbt build (and I don't see why not), wouldn't that be better?

Then the buildpack would consist of:

Run sbt dist
Remove everything except .jdk and dist artefact
Unpack dist artefact

+ fi
+ if [ -d $BUILD_DIR/target ] ; then
+ echo "-----> Dropping compilation artifacts from the slug"
+ rm -rf $BUILD_DIR/target/scala-*

@ryanbrainard

ryanbrainard Oct 8, 2013

Contributor

Are we sure this won't break anything? Will nothing at runtime depend on these scala versions?

@jeantil

jeantil Oct 9, 2013

Contributor

I have been using this for one of my apps. project/boot and target/scala-* are only intermediate compilation artifacts, not the final jar files. At least that's true for simple Play 2 apps with the normal structure.

@ryanbrainard

ryanbrainard Oct 8, 2013

Contributor

I agree it would be good to do this in a way that would work for all SBT apps. If the native packager plugin can do that for us in a way that is backward compatible and removes the special casing for Play, that would be ideal. Does anyone know if that would be possible?

If we instead go with a special file like .sbtkeep (or its inverse .sbtignore), that would work too, but if SBT is supposed to emit that file, it seems like it could just do the cleanup itself, perhaps with a plugin that runs after stage. It could maybe be part of the HerokuPlugin so that it only runs on Heroku?

With that said, since it seems like Play apps are probably feeling the most pain with slug sizes, this change probably makes sense as is (except the question I have above about dropping $BUILD_DIR/target/scala-* and additional testing) with the idea that we will work toward a more general, long-term solution.

@ryanbrainard ryanbrainard merged commit bb9d0c4 into heroku:master Oct 10, 2013

@ryanbrainard

ryanbrainard Oct 10, 2013

Contributor

I did additional testing on this and it looks good. There do seem to be some additional dirs that can be dropped as well to slim things even more, but since we got the majority of the big ones and we're planning to explore the native packager plugin going forward, I pulled this in and deployed it to Heroku. Thanks again and sorry for the delay.
