Build fully automated build chain for TensorFlow and all other kernels #74

Open
6 tasks
achimnol opened this issue Apr 3, 2018 · 1 comment
Labels
enhancement major Major issue to solve in current milestone.

@achimnol
Member

achimnol commented Apr 3, 2018

TensorFlow v1.7 will be the last version that supports our current CUDA 8.0 + cuDNN 6.0 build chain.
Starting with TensorFlow v1.8, we need to upgrade CUDA.

Currently we build the images on a dedicated physical machine, which has only a single CUDA version installed.
For maximum stability and automation, it would be nice to run our builds on spot p2.xlarge/p3.2xlarge instances with an appropriate AWS Deep Learning Base AMI (v4.0 or v6.0).

Let's write scripts to do this.

  • Create a cloud build configuration that supports:
    • Automatically trigger builds on git pushes to this repository and on new kernel runner releases on PyPI
    • Rebuild only modified Dockerfiles, but with dependency checks so that images built on top of a changed base image are rebuilt too
    • Ability to force-rebuild specific images (manual trigger)
    • Push rebuilt images to Docker Hub and to designated private Docker registries (for enterprise customers)
  • Optional but good to have
    • Save/load Docker images as tarballs to warm the build cache (maybe from/to S3, or using EFS) -> a comparison test is required
    • Automatically run basic code execution tests against newly built images
      • maybe using Ansible, Puppet, Vagrant, etc. on temporary p2/p3 instances
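A minimal sketch of the "rebuild only modified Dockerfiles, with dependency checks" item plus the manual force-rebuild trigger. It assumes each image's Dockerfile is available as text keyed by its full image reference, and that base images are named literally in `FROM` lines (no `ARG`-based substitution). The function names and image names below are hypothetical, for illustration only.

```python
import re
from collections import defaultdict, deque

# Matches the image reference in a FROM line, skipping an optional --platform flag.
FROM_RE = re.compile(r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?(\S+)",
                     re.IGNORECASE | re.MULTILINE)


def parse_base_images(dockerfile_text):
    """Return the image references named in FROM lines (hypothetical helper)."""
    return FROM_RE.findall(dockerfile_text)


def images_to_rebuild(dockerfiles, changed, forced=()):
    """Return the images to rebuild, base images before their dependents.

    dockerfiles: image reference -> Dockerfile text (assumed layout)
    changed:     images whose Dockerfile was modified by the push
    forced:      images requested via a manual trigger
    """
    requested = set(changed) | set(forced)
    unknown = requested - set(dockerfiles)
    if unknown:
        raise KeyError(f"no Dockerfile for: {sorted(unknown)}")

    # Build the dependency graph, restricted to images we build ourselves.
    children = defaultdict(set)
    parents = defaultdict(set)
    for tag, text in dockerfiles.items():
        for base in parse_base_images(text):
            if base in dockerfiles:
                children[base].add(tag)
                parents[tag].add(base)

    # Anything built on top of a dirty image is dirty as well.
    dirty = set(requested)
    queue = deque(dirty)
    while queue:
        for child in children[queue.popleft()]:
            if child not in dirty:
                dirty.add(child)
                queue.append(child)

    # Kahn's topological sort over the dirty subgraph (sorted for determinism).
    indegree = {t: len(parents[t] & dirty) for t in dirty}
    ready = sorted(t for t in dirty if indegree[t] == 0)
    order = []
    while ready:
        tag = ready.pop(0)
        order.append(tag)
        for child in children[tag] & dirty:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
        ready.sort()
    if len(order) != len(dirty):
        raise ValueError("dependency cycle among Dockerfiles")
    return order


# Hypothetical image set used for illustration.
dockerfiles = {
    "example/base:cuda9": "FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04\n",
    "example/tf:1.8-gpu": "FROM example/base:cuda9\nRUN pip install tensorflow-gpu==1.8.0\n",
    "example/tf:1.8-serving": "FROM example/tf:1.8-gpu\n",
    "example/r:3.5": "FROM ubuntu:16.04\n",
}

if __name__ == "__main__":
    print(images_to_rebuild(dockerfiles, changed={"example/base:cuda9"}))
```

Touching the CUDA base image pulls in both TensorFlow images in build order, while forcing `example/r:3.5` rebuilds only that image. In a real pipeline, `changed` would come from the pushed commits' diff and `forced` from the manual trigger; the returned order could then drive the build-and-push steps.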
@achimnol achimnol changed the title Build fully automated build chain for TensorFlow Build fully automated build chain for TensorFlow and all other kernels Sep 11, 2018
@inureyes inureyes added this to the 19.06 milestone Jun 10, 2019
@inureyes inureyes added the major Major issue to solve in current milestone. label Jun 10, 2019
@inureyes
Member

@hephaex Please check this issue.


3 participants