Reuse Build Workspace in PR based Pipeline #66
Comments
I'm assuming this will be public read access for contributors. We should also include this access as part of the cost, and we may need to set up monitoring to avoid unexpected costs from high S3 transfer volumes.
Agreed, the transfer cost to the internet is much higher than to any AWS AZ; we should limit the artifact size that can be downloaded by the public, and probably require an approval before the workspace can be downloaded publicly.
What's the additional build time added to the AR run when using S3SIS to download artifacts? Or the average download time for the entire workspace?
Just curious, does this mean that only the first build will use the S3SIS tool? Will other builds skip this step?
You mentioned in the meeting that we can disable this with a Jenkins parameter when S3SIS won't help much. I think this is fine for the initial version. Later on, we could add a step in the pipeline to determine which files changed and enable/disable S3SIS based on that.
Yeah, only the first build will use S3SIS to sync the workspace; the following builds will reuse artifacts that were already built in the first build.
Agreed, we should investigate automation for this |
Adding a note for a discussion point made regarding developer access to the artifacts uploaded by S3SIS. This will be a feature implemented later on so we have time to consider and review the impact of making read access public for O3DE developers. |
We should probably add some data on this solution vs other build caching solutions (ccache, sccache, etc.) |
May want to avoid enabling this for non-Profile builds, due to the significant increase in size from artifacts such as PDBs. This seems adequately covered by the staged rollout plan. Recommend the SIS upload is performed before any tests execute, as tests (like any other tool) can corrupt the workspace. It should definitely persist after build, and perhaps processed assets should also be included? Is this going to persist everything on the entire drive, just the GitHub repo, or only specific artifact folders within the repo? The latter options would reduce size, but may also require more steps to synchronize. We may also need to make sure that nothing containing cached AWS instance or Jenkins secrets gets picked up. What is the proposed security on the S3 bucket(s) which hold the SIS resources?
Sure, will add it |
We can specify which folders to include/exclude for download/upload. In my testing, I included the entire workspace because we need the .git folder to check out the commit, and we also need to preserve the timestamps of all source code and build outputs. No credentials will be stored on build nodes since S3SIS will use the build node's IAM role. In the first phase, we will restrict S3 access to the build nodes only. "Download build workspace to local machine" will be a future improvement; we will set up a CloudFront endpoint and come up with an approval process for it.
Related to test files left in the workspace. When enabling TIAF in the AR pipeline, they discovered an issue where tests would leave behind files that caused the next build to fail. One of those tests was fixed here: o3de/o3de#13049 This was mainly related to temp files being generated in the wrong locations so they were not cleaned up. One solution brought up to prevent this was to check for temp files in the workspace (files not committed to the repo and not build artifacts) prior to the run. This may be something to investigate as a check prior to uploading in a later version. |
Reuse Build Workspace in PR based Pipeline
Summary:
Each pull request has to pass PR based build pipeline in order to be merged. This proposal is to reduce PR based build time by reusing more relevant build workspace from previous builds, and thus increase O3DE contributors' work efficiency.
Problem this tries to resolve
The O3DE Jenkins build workspace is located on an EBS volume created from a daily EBS snapshot. Currently, more than half of O3DE PR based builds take over 3 hours to finish, mainly because:
What is the relevance of this feature?
This is important because it reduces O3DE contributors' waiting time to merge pull requests and enables the following:
Feature design description:
S3SIS (S3 Single Instance Storage) will be used to share the build workspace between different build pipelines. It reduces file transfer time and cost by transferring only the delta files. In addition, files with the same content hash are deduplicated, which significantly reduces S3 storage cost.
After each non-PR based AR build succeeds, the build workspace will be uploaded to S3 using S3SIS. PR based builds will download and reuse that workspace.
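The deduplication idea behind S3SIS can be sketched as follows. This is not the actual S3SIS implementation, just a minimal illustration of content-hash deduplication; the function names are hypothetical:

```python
import hashlib

def object_key(data: bytes) -> str:
    # Identical file contents hash to the same key, so they are stored on S3 only once.
    return hashlib.sha256(data).hexdigest()

def plan_upload(files: dict, already_on_s3: set) -> dict:
    """Return {object_key: example_file_name} for content not yet on S3 (the delta)."""
    to_upload = {}
    for name, data in files.items():
        key = object_key(data)
        if key not in already_on_s3 and key not in to_upload:
            to_upload[key] = name
    return to_upload
```

For example, three files where two share identical content produce only two uploads, and a second run against the same set of stored keys uploads nothing.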
Use cases:
Technical design description:
Install S3SIS CLI on Build Image
Due to an issue where CLI tools installed with the O3DE-bundled Python cannot be executed, the S3SIS CLI needs to be installed with the system Python on build nodes.
Steps to install S3SIS CLI:
1. Run git clone https://github.com/aws-lumberyard/s3sis.git or download the source code.
2. Run python setup.py install to install the S3SIS CLI.
3. Run s3siscli configure to configure the S3SIS CLI.
The S3SIS CLI should be part of the build image, so it doesn't need to be installed and configured in every build.
Build Workspace Label
S3SIS requires a label for upload and download; the label is used to group a set of file objects on S3.
To let a Jenkins build find the correct workspace to sync, the label needs to include the pipeline name, platform, build configuration, and commit id.
Label format used to upload/download workspace using S3SIS:
{pipeline_name}_{platform}_{build_configuration}_{commit_id}
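A trivial helper showing how the four fields compose into a label (the function name is illustrative, not part of S3SIS):

```python
def workspace_label(pipeline_name: str, platform: str,
                    build_configuration: str, commit_id: str) -> str:
    # Joins the four fields with underscores, matching the label format above.
    return f"{pipeline_name}_{platform}_{build_configuration}_{commit_id}"

# e.g. workspace_label("development", "Linux", "profile_nounity", "abc1234")
# -> "development_Linux_profile_nounity_abc1234"
```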
S3SIS Manifest
The S3SIS manifest file stores the necessary file information; it is the main file consulted during upload and download.
Upload Workspace
To reduce the number of workspaces uploaded, the workspace is uploaded only by non-PR based AR builds on success, such as https://jenkins.build.o3de.org/job/O3DE/job/development/. The upload won't add any PR build time.
If the build contains multiple commits, s3siscli upload should run once per commit id. After the first upload all files are already on S3, so each subsequent upload command transfers nothing but a manifest file.
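The per-commit upload described above can be sketched as building one CLI invocation per commit id. This is a hypothetical helper; the real pipeline would run these commands from a Jenkins step:

```python
def upload_commands(commit_ids, pipeline_name, platform, build_configuration):
    """Build one s3siscli upload invocation per commit id.

    Only the first command transfers file data; later ones write only a manifest,
    since identical content is already deduplicated on S3.
    """
    return [
        ["s3siscli", "upload", "--label",
         f"{pipeline_name}_{platform}_{build_configuration}_{commit_id}"]
        for commit_id in commit_ids
    ]
```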
Upload workflow:
1. Use currentBuild.changeSets to find the commit ids built in the current build.
2. Run s3siscli upload --label {pipeline_name}_{platform}_{build_configuration}_{commit_id} for each commit id.
Download Workspace
Most build systems use timestamps to decide when to rebuild. To reuse the build workspace, all files' timestamps and attributes must be preserved during the download process.
Ninja is used for the Linux build, and it tracks file timestamps at nanosecond precision: Ninja will rebuild a file whenever its timestamp changes at the nanosecond level. Therefore all files' timestamps must be preserved at nanosecond precision.
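The nanosecond requirement is why ordinary second-resolution copies are not enough. A minimal sketch of preserving timestamps at full precision with Python's standard library:

```python
import os

def copy_mtime_ns(src: str, dst: str) -> None:
    """Copy access/modification times at nanosecond precision, the granularity
    Ninja compares when deciding whether to rebuild."""
    st = os.stat(src)
    # os.utime with the ns= keyword sets times as integer nanoseconds,
    # avoiding the float rounding of the (atime, mtime) seconds form.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))
```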
Only the first PR based build will sync the workspace from the closest commit id; all following builds will reuse the workspace from previous builds.
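Finding the "closest commit id" can be sketched as walking the PR's parent commits from newest to oldest and taking the first one with an uploaded workspace. This helper is hypothetical, illustrating the lookup only:

```python
def find_closest_label(parent_commit_ids, available_labels,
                       pipeline_name, platform, build_configuration):
    """Walk parent commits newest-first and return the first label for which a
    workspace was uploaded, or None if no ancestor has one."""
    for commit_id in parent_commit_ids:
        label = f"{pipeline_name}_{platform}_{build_configuration}_{commit_id}"
        if label in available_labels:
            return label
    return None
```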
Download workflow:
1. Look up the label {pipeline_name}_{platform}_{build_configuration}_{parent_commit_id}.
2. Run the s3siscli download command with --label {label} --preserve-timestamp --preserve-attributes --preserve-empty-folders --cleanup to sync the workspace.
File Cleanup
Stale files should be cleaned up regularly to avoid high S3 storage costs.
Each file on S3 is linked to one or more S3SIS manifest files, and each manifest file is linked to a commit id. A manifest is considered stale if its commit is over 1 month old.
Use a DynamoDB table to track the number of manifests each file is linked to, with the object hash as the primary key to reduce lookup time. For example:
Run a daily Lambda function to look up manifest files, update the table, and delete objects whose ref_count is 0. Run a weekly Lambda function to ensure the S3 objects, manifest files, and DynamoDB table stay in sync.
Daily Cleanup Lambda Workflow:
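A minimal sketch of the daily cleanup logic, assuming an in-memory stand-in for the DynamoDB ref counts; the manifest field names here are hypothetical:

```python
from datetime import datetime, timedelta

def daily_cleanup(manifests, ref_count, now, max_age=timedelta(days=30)):
    """Expire manifests whose commit is older than max_age, decrement the
    ref_count of every object they reference, and return the object hashes
    whose count reached zero (these are safe to delete from S3)."""
    deletable = []
    live = {}
    for label, manifest in manifests.items():
        if now - manifest["commit_date"] > max_age:
            for obj_hash in manifest["objects"]:
                ref_count[obj_hash] -= 1
                if ref_count[obj_hash] == 0:
                    deletable.append(obj_hash)
        else:
            live[label] = manifest
    return live, deletable
```

Note how an object shared by a stale and a fresh manifest survives: its count drops to 1, not 0, so only truly unreferenced content is deleted.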
Cost
The increased cost comes mainly from S3 storage and S3 transfer. However, EC2 cost is reduced when builds take less time.
To limit cost initially, this feature will only be enabled for the bottleneck build (Linux), since it directly impacts the overall pipeline time. The feature can be rolled out gradually if it produces good results.
S3 Cost Increased:
The Linux build workspace is 180GB. Assume there are 150 AR builds and 200 pull requests created per month.
In the worst case scenario, where all 180GB of files are transferred for every build, the S3 cost would be $1341 per month.
Because S3SIS only transfers delta files, the amount transferred will be significantly smaller. Assuming 30GB transferred per build on average (in practice it could be less, since a no-op build transfers 0 files), the cost would be $223 per month.
EC2 Cost Reduced:
The Linux node type is c4.4xlarge, priced at $0.796 per hour.
Assuming 1 hour is saved on average per PR based Linux build, 200 PR based builds would save $159.20 per month.
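A quick check of the EC2 savings arithmetic above:

```python
def ec2_savings(builds_per_month: int, hours_saved_per_build: float,
                price_per_hour: float) -> float:
    # 200 builds * 1 hour saved * $0.796/hour = $159.20 per month
    return builds_per_month * hours_saved_per_build * price_per_hour
```

The actual savings scale linearly with the average hours saved, which is an assumption until measured on real PR builds.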
What are the advantages of the feature?
What are the disadvantages of the feature?
How will this be implemented or integrated into the O3DE environment?
First, integrate this with the O3DE bottleneck build (the Linux profile_nounity build), because it directly impacts the overall PR based build time. If it produces good results, gradually roll it out to other builds.
Are there any alternatives to this feature?
Yes, using distributed builds or a build cache can also reduce overall build time.
How will users learn this feature?
Are there any open questions?