Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Native] Reduce CI time for native code #18627

Open
majetideepak opened this issue Nov 4, 2022 · 16 comments
Open

[Native] Reduce CI time for native code #18627

majetideepak opened this issue Nov 4, 2022 · 16 comments
Assignees
Labels
prestissimo Presto Native Execution

Comments

@majetideepak
Copy link
Collaborator

majetideepak commented Nov 4, 2022

A change, say to a build script should not result in a total build.
However, that is the case today. This needs an investigation.

https://github.com/prestodb/presto/actions/runs/3386920136/jobs/5628225267

Documentation on cache https://github.com/marketplace/actions/cache

@majetideepak majetideepak added the prestissimo Presto Native Execution label Nov 4, 2022
@mshang816
Copy link
Contributor

@majetideepak
Did you mean we don't have to run "Maven install" every time ?
We need to rebuild the whole maven project everytime or we may run the tests using older presto java code.
I'm not sure if we have a way to avoid such run when the changes are not on java code and maven build files.

@mshang816
Copy link
Contributor

Ah, you are taking about ccache. Currently I reset the cache key daily, Is it because it's the first such build for today ?

@mshang816
Copy link
Contributor

mshang816 commented Nov 4, 2022

I checked the test run. It looks like we did have a github action cache hit:

usr/bin/docker exec  1ccee4dd853e172e616ac5efe483eec6f6cdd24cfc5c7db5ea3a6608a76860fa sh -c "cat /etc/*release | grep ^ID"
Received 58720256 of 64406778 (91.2%), 55.9 MBs/sec
Received 64406778 of 64406778 (100.0%), 52.5 MBs/sec
Cache Size: ~61 MB (64406778 B)
/usr/bin/tar -z -xf /__w/_temp/02a5fb26-1ff9-4b69-8d32-83bfac57511a/cache.tgz -P -C /__w/presto/presto
Cache restored successfully
Cache restored from key: Linux-presto-native-ccache-[20](https://github.com/prestodb/presto/actions/runs/3386920136/jobs/5628225267#step:6:21)2[21](https://github.com/prestodb/presto/actions/runs/3386920136/jobs/5628225267#step:6:22)103

but it seems that the cache is not warm so the ccache hit rate is zero:

cache directory                     /github/home/.ccache
primary config                      /github/home/.ccache/ccache.conf
secondary config      (readonly)    /etc/ccache.conf
stats updated                       Thu Nov  3 17:13:04 2022
cache hit (direct)                     0
cache hit (preprocessed)               0
cache miss                          1136
cache hit rate                      0.00 %
cleanups performed                     0
files in cache                      3370
cache size                           4.0 GB
max cache size                       5.4 GB

I think we should look into that

@majetideepak
Copy link
Collaborator Author

@mshang816 I agree. If we can resolve this, we can significantly reduce the test native CI job duration.

@majetideepak
Copy link
Collaborator Author

majetideepak commented May 9, 2023

test-native total time is very long now (~3 hours) and is impacting productivity. A significant portion of this is build time.
https://github.com/prestodb/presto/actions/workflows/test-native.yml

In the build, the cache is only effective for the same Pull Request(PR) and not across PRs.
Cache miss build time is ~80 minutes vs ~16 minutes with a cache hit.

Looking at the documentation, https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#example-of-search-priority
the search priority is the PR (feature branch) followed by the default branch (master).

If we enable test-native CI runs on master, we might be able to improve the build times.

@mbasmanova who can help with this? Thanks.

@mbasmanova
Copy link
Contributor

@majetideepak This is indeed a serious problem that needs attention.

@amitkdutta @mshang816 Amit, Michael, would you help take a look?

@majetideepak Another option is to switch to using CircleCI which has beefier machines which allow for multi-threaded build. This is something TSC may need to decide on.

CC: @tdcmeehan @arunthirupathi

@tdcmeehan
Copy link
Contributor

@mbasmanova Last I checked, @mshang816 was working with Meta OSS team to have Presto Foundation use Meta's larger account to run CircleCI tests for this reason. @mshang816, please let me know if you have run into any difficulty with this.

@mbasmanova
Copy link
Contributor

CC: @pedroerp

@mbasmanova mbasmanova changed the title [Native] ccache seems to be not helping with build [Native] Reduce CI time for native code May 9, 2023
@majetideepak
Copy link
Collaborator Author

In the TSC meeting, it was decided that the test-native will be moved to CircleCI. I can take a stab at this.

@mbasmanova
Copy link
Contributor

@majetideepak That's great news. Deepak, do you have a timeline for this?

@majetideepak
Copy link
Collaborator Author

@mbasmanova I don't yet. This space is new to me. If anyone from Meta can give some pointers, it will be very helpful.
But this is on top of my list. I will investigate this tomorrow and update you here.

@mbasmanova
Copy link
Contributor

@mshang816 Michael, can you help provide some guidance for Deepak?

@majetideepak
Copy link
Collaborator Author

majetideepak commented May 12, 2023

@mbasmanova, @tdcmeehan, @pedroerp A couple of posts on CircleCI seem to say that enabling jobs must be done by the account owner of CircleCI. So Meta OSS team must enable this.
There is already a .circleci folder with some job descriptions in the Presto project. Once CI is enabled, I can port test-native workflow to CI.

@pedroerp
Copy link
Contributor

@mshang816 @amitkdutta are asking our internal team to get the right credentials.

@mshang816
Copy link
Contributor

@mbasmanova, @tdcmeehan, @pedroerp A couple of posts on CircleCI seem to say that enabling jobs must be done by the account owner of CircleCI. So Meta OSS team must enable this. There is already a .circleci folder with some job descriptions in the Presto project. Once CI is enabled, I can port test-native workflow to CI.

@majetideepak It's a long story. I created the circle ci runs last year but we didn't have enough credits for our daily runs so I believe @tdcmeehan turned off the circle ci runs. And a couple weeks ago, we finally are able to be added to Meta's enterprise account which gives us way more credits to burn so yes it's time to migrate github actions to circle ci again.
Actually I had a new partner introduced to me who will work on the migration piece from next week. I've also ask our internal contact on how we can enable circle ci again. But I think the migration itself is not that hard but we also need to optimize the process to make it run as fast as it can. Additionally, we also need to enable the MacOS runs that we disabled in github action due to some problems. So let me finish the first round of migration then we can brainstorm if we need further optimization.

c.c. @ericyuliu @amitkdutta @v-jizhang @pedroerp

@majetideepak
Copy link
Collaborator Author

@mshang816 thanks for the update! Let me know if I can help in any way.

@majetideepak majetideepak removed their assignment May 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
prestissimo Presto Native Execution
Projects
None yet
Development

No branches or pull requests

5 participants