Parallel builds from CircleCI don't aggregate correctly #1341
Will this be looked at? It's blocking most people that are using parallel runs in CircleCI, I believe. @afinetooth
@kelvintyb this is being looked at. The team is aware, and I will try to reproduce to gain further insight. No ETA yet, but I will report back ASAP.
Hi @kelvintyb and @danadaldos, could you please post your config? If you'd prefer not to post here publicly, you could email it to us at support@coveralls.io
These are my current setup files, which do not use Coveralls.io. They are currently NOT CONFIGURED TO USE COVERALLS. See the steps listed below for the changes I made in order to configure Coveralls within this setup.

circle/config.yml:

test script:

They have changed somewhat since I posted this issue 8 months ago. Namely, we are now using CircleCI workflows, and when we made that change and introduced parallelization on CircleCI, I tried reinstating Coveralls.io (this was about a month ago). I made the following changes:

Configuration Steps
I have confirmed via SSH into Circle builds that this is set correctly in the environment.
I have tried calling this in various ways and from various places, including a separate job in CircleCI that posts the correct values. I am hesitant to post any of the specific details here publicly.
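For context on what such a separate "notify Coveralls" job might run, here is a hedged sketch. The endpoint and the `payload[...]` fields follow the parallel-build webhook docs referenced in this thread; the `COVERALLS_REPO_TOKEN` and `CIRCLE_BUILD_NUM` values below are placeholders, and the command is echoed rather than executed so it can be inspected:

```shell
# Placeholders for running this snippet outside CI; on CircleCI these come
# from the project's environment settings and built-in variables.
COVERALLS_REPO_TOKEN="${COVERALLS_REPO_TOKEN:-example-token}"
CIRCLE_BUILD_NUM="${CIRCLE_BUILD_NUM:-22043}"

build_webhook_cmd() {
  # Compose (but do not execute) the curl invocation that would signal to
  # Coveralls that all parallel containers have finished uploading.
  echo "curl -X POST 'https://coveralls.io/webhook?repo_token=${COVERALLS_REPO_TOKEN}'" \
       "-d 'payload[build_num]=${CIRCLE_BUILD_NUM}&payload[status]=done'"
}

build_webhook_cmd
```

Dropping the `echo` wrapper and running the composed `curl` directly would perform the actual notification.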
@danadaldos I believe this may be the issue. The Elixir library derives the job ID like this:

```elixir
defp get_job_id do
  # When using workflows, each job has a separate `CIRCLE_BUILD_NUM`, so this needs to be used as the Job ID and not
  # the Job Number.
  System.get_env("CIRCLE_BUILD_NUM")
end
```

https://github.com/parroty/excoveralls/blob/master/lib/excoveralls/circle.ex#L65

It's expecting to be run within a workflow so that this is unique per parallel job. Here's how our Ruby library handles it:

```ruby
config[:service_job_number] = ENV['CIRCLE_NODE_INDEX']
```

https://github.com/lemurheavy/coveralls-ruby/blob/master/lib/coveralls/configuration.rb#L66

Otherwise, Coveralls thinks that it's a duplicate job, since the build number and job ID match, so it removes them. That is why you're seeing multiple jobs at first in our UI, which are then removed shortly after.

The Elixir lib may need to be updated to support this non-workflow-based Circle parallel setup by using `CIRCLE_NODE_INDEX`, e.g.:

```elixir
defp get_job_id do
  "#{System.get_env("CIRCLE_BUILD_NUM")}-#{System.get_env("CIRCLE_NODE_INDEX")}"
end
```
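The same composite-ID idea can be sketched outside Elixir. This assumes the standard CircleCI variables `CIRCLE_BUILD_NUM` and `CIRCLE_NODE_INDEX`; the fallback values are only so the snippet runs outside CI:

```shell
# Compose a per-container job id so parallel containers within the same build
# do not collide, mirroring the Elixir suggestion above.
CIRCLE_BUILD_NUM="${CIRCLE_BUILD_NUM:-22043}"     # placeholder outside CI
CIRCLE_NODE_INDEX="${CIRCLE_NODE_INDEX:-0}"       # CircleCI sets 0..N-1 per container
job_id="${CIRCLE_BUILD_NUM}-${CIRCLE_NODE_INDEX}"
echo "$job_id"
```

Because each container gets a distinct `CIRCLE_NODE_INDEX`, the resulting IDs are unique even when every container shares the same `CIRCLE_BUILD_NUM`.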
@nickmerwin Yes, please see my most recent comment. I updated the information. We are using workflows now and I am sending …
Thanks @danadaldos, can you link me to your most recent test build using the new workflows setup?
@nickmerwin Here is our most recent build on Coveralls.io: https://coveralls.io/builds/29361557 To be clear, this build was set up exactly as I mentioned:
Thanks @danadaldos, could you SSH into a build and confirm that … is set? It appears that …
@nickmerwin Yeah, it will take me a minute to get things reconfigured for Coveralls.
@nickmerwin I'm sorry, I was giving you wrong information. I actually did get this to build correctly. The issue that I'm now having is that in order for this to build correctly, it takes 30 minutes to do so. Our test suite on CircleCI normally finishes after 5 minutes. Here is a link to a build around the same time that built correctly with all containers: https://coveralls.io/builds/29358320 Any insight as to why it takes so long? I have a build currently running on CircleCI that I will link to once it reports to Coveralls.
And just to be thorough, my working setup is:
These are the only changes between a 5 minute build and a 30 minute build with Coveralls. |
Most recent job finished after 40 minutes: https://coveralls.io/builds/29978345 Here's a screenshot of our workflows for comparison's sake:
@danadaldos I checked the calculation time for that build on our side and it was only 2.2 seconds after the webhook came in. I suspect Circle may have queued up the webhook for those 40 minutes. Perhaps you could add another webhook receiver like https://requestbin.com to confirm the delay. We keep metrics on how quickly our background processors dequeue jobs here:

Since the original issue of parallel coverage runs not merging is resolved, I'm closing the issue for now, but will monitor the thread for any other questions that arise. Thank you!
@nickmerwin Thank you for following up on this. It's extremely helpful to know how long it took on your end, and now I can take that information to CircleCI to see what's up. Thank you again!

Update: I did try sending the webhook to Requestbin.com and there was no delay. The CI job finished in ~5 minutes and Requestbin received the message right when it finished.

Finally, if I still have your ear @nickmerwin, please address the documentation found here: https://docs.coveralls.io/parallel-build-webhook
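A rough sketch of the kind of timing check described above: POST the webhook payload to an inspection endpoint and report how long the request itself took. `BIN_URL` is a placeholder for a requestbin.com (or similar) bin; when it is unset the snippet skips the network call. If the POST itself is fast but the build appears on Coveralls much later, the delay is not in the sender:

```shell
# Time the webhook POST to an inspection endpoint. BIN_URL is a placeholder;
# substitute a real bin URL to actually send the request.
BIN_URL="${BIN_URL:-}"
start=$(date +%s)
if [ -n "$BIN_URL" ]; then
  curl -sS -X POST "$BIN_URL" -d "payload[status]=done" >/dev/null
fi
end=$(date +%s)
elapsed=$((end - start))
echo "webhook POST took ${elapsed}s"
```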
Context:

As I understand it, from looking at documentation and replies to other issues, Coveralls.io has three requirements in order to correctly record and aggregate parallel builds (please correct me if I'm wrong):

1. JSON data for submitted jobs needs to have `parallel: true` set, either via the env var `COVERALLS_PARALLEL=true` or `mix coveralls.circle --parallel` (they accomplish the same thing). This suspends the final analysis until the webhook arrives.
2. Incoming jobs must share the same `service_number`. As long as they are coming in marked "parallel", Coveralls designates these with the shared build ID and a decimal showing each job's place (i.e. `21989.3`, `21989.4`).
3. A POST to the webhook with the repo token, in order to signal that the build is finished. The docs recommended `https://coveralls.io/webhook repo_token=$COVERALLS_REPO_TOKEN`; other sources recommended adding `https://coveralls.io/webhook?repo_token=$COVERALLS_REPO_TOKEN -d "payload[build_num]=$BUILD_NUMBER&payload[status]=done"` to explicitly send the status and the build number.

Description:
I have parallelization on CircleCI building correctly with 4 containers and reporting to Coveralls.io via the `excoveralls` library.

I have set the `--parallel` flag when excoveralls runs, which correctly adds the `parallel: true` param to the JSON (see below). I also have `COVERALLS_PARALLEL=true` set in various places just to be sure.

As the build runs, I see jobs reporting with the expected `22043.1`, `22043.2` designations, but then the jobs are replaced with the later job `22043.3`, and finally `22043.4`, which is the final job and the one that remains on the build. The results do not aggregate correctly and we see a massive drop in coverage over `master`. Each container on CircleCI ends with `Successfully uploaded the report to 'https://coveralls.io'.`

A sanity check with 1 container (`parallelization: 1`) on CircleCI showed that the splitting and building was working correctly @ ~93% coverage: https://coveralls.io/builds/25348343

I have tried a number of different webhook calls, including the documented:
As well as one that explicitly includes the `done` status. Notice that `[build_num]` has been replaced with `[service_number]`; I have tried both ways:

Neither this, nor manual calls to `curl -k https://coveralls.io/webhook...` from the terminal, have caused the resulting build to work correctly. Posting to either manually from the terminal gave a response of `{"done":true}%`, which tells me that it's working correctly (other variations resulted in errors).

Note: I am not using workflows; the `"service_job_id"` and the `"service_number"` in the JSON payload are the same number, namely `$CIRCLE_BUILD_NUMBER` (see JSON below).

JSON:
Screenshots:

Build `22043` showing first two jobs:

Same build showing only the final job and skewed results:
Related Issues:
#1191
#1178
#1093