Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1153: Some emails still not posted as comments in PR #1214

wants to merge 3 commits into from


Copy link

@erikj79 erikj79 commented Sep 2, 2021

After fixing SKARA-1148, there were still some emails that weren't being posted as comments in PRs. The underlying cause this time was emails that Skara thinks it has sent, but they never were, so they weren't present in the mail archive. Emails sent in response to such missing emails have bad In-Reply-To headers as they point to emails that aren't in the archive. When the Mbox parser tries to piece together conversations, it fails to find the correct parents in these cases.

The reason we need to piece conversations together is to find the original RFR email, which contains the pull request link. This is how each email is associated with the correct pull request.

The fix for this is to also look at the References header as a backup. The References header contains the whole ancestor chain of email IDs, so using that, we can connect conversations regardless of missing emails in between, especially since we only really care about the first email in the conversation. Looking at the pipermail thread view, this is what it must be doing as it would otherwise fail miserably. I also tweaked the loops here to hopefully do a bit less work, which I think is worth it due to the pretty large number of emails being parsed here.

In addition to this small change to Mbox, I'm also tinkering with some other things that are related. These were things I stumbled over and needed to properly debug and test the issue.

As I noted in SKARA-1148 already, the MailingListArchiveReaderBot is doing a lot of redundant work. I blame this on SKARA-843, where this bot was changed from one global instance to running one instance for each configured repository. There were two problems introduced with this change.

  1. Each instance still has all the configured repositories, so every found conversation is evaluated against every repository in every instance. This creates an unnecessary N^2 complexity in number of configured repositories.
  2. Each instance has its own MailingListReader, configured for exactly the set of mailing lists used for the repository. We have a lot of repositories that share the exact same mailing list config. This means that each of these instances will read all the archives for themselves, with no sharing of this data. The reader already does caching, so after the first time around, it's much faster. Still, the first time around we read jdk-updates-dev a pretty large number of times.

My solution for 1 is to only have one single repository in each MailingListArchiveReaderBot. This looks like a simple oversight in the previous patch.

For 2, I make sure to only create one MailingListReader for each unique set of mailing lists. This will not remove all the redundant reads, but it will bring them down significantly. I also think it's the functionally correct solution as we will then only consider email threads on the relevant mailing lists for each repository, so less unnecessary (and potentially bad) cross evaluation between mailing lists and repositories that aren't configured as related.

To get 2 to actually work correctly, I needed to tweak the logic that protects us from running the wrong WorkItems concurrently. To be able to share MailingListReader between multiple WorkItems, we have WorkItem::concurrentWith. Unfortunately, this method was also used to detect duplicate WorkItems in the pending pool of the scheduler. This is correct in most cases, but not here. An ArchiveReaderWorkItem for jdk11u and jdk15u shares MailingListReader, but queueing one should not replace the other. To solve this I added another method WorkItem::replaces for this particular check (with a default method to preserve the current behavior for all other WorkItems).

Finally I also changed ArchiveReaderWorkItem::toString to print the repository name rather than the list of mailing lists. This made a lot more sense to me while debugging as we now create an instance per repository rather than based on mailinglists.


  • Change must not contain extraneous whitespace
  • Change must be properly reviewed


  • SKARA-1153: Some emails still not posted as comments in PR



Using git

Checkout this PR locally:
$ git fetch pull/1214/head:pull/1214
$ git checkout pull/1214

Update a local copy of the PR:
$ git checkout pull/1214
$ git pull pull/1214/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 1214

View PR using the GUI difftool:
$ git pr show -t 1214

Using diff file

Download this PR as a diff file:

Copy link

@bridgekeeper bridgekeeper bot commented Sep 2, 2021

👋 Welcome back erikj! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title SKARA-1153 1153: Some emails still not posted as comments in PR Sep 2, 2021
@openjdk openjdk bot added the rfr label Sep 2, 2021
Copy link

@mlbridge mlbridge bot commented Sep 2, 2021


Copy link

@kevinrushforth kevinrushforth left a comment

This looks good as far as I can tell. I left a couple questions.

Is this something you can test before deploying?

Copy link

@openjdk openjdk bot commented Sep 2, 2021

@erikj79 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file for details.

🔍 One or more changes in this pull request modifies files in areas of the source code that often require two reviewers. Please consider if this is the case for this pull request, and if so, await a second reviewer to approve this pull request before you integrate it.

🌎 Applicable reviewers for one or more changes in this pull request are spread across multiple different time zones. Please consider waiting with integrating this pull request until it has been out for review for at least 24 hours to give all reviewers a chance to review the pull request.

After integration, the commit message for the final commit will be:

1153: Some emails still not posted as comments in PR

Reviewed-by: kcr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Sep 2, 2021
Copy link
Member Author

@erikj79 erikj79 commented Sep 3, 2021

Is this something you can test before deploying?

With a local edit of the BotRunner, I have manually tested the MailingListArchiveReaderBot and ArchiveReaderWorkItem by running them with the full production configuration, but prevented any other WorkItems from actually running. I've then inspected the set of CommentPosterWorkItems that get scheduled.

Copy link
Member Author

@erikj79 erikj79 commented Sep 3, 2021

Thanks Kevin!

Copy link

@openjdk openjdk bot commented Sep 3, 2021

Going to push as commit 3293f7a.

@openjdk openjdk bot closed this Sep 3, 2021
@openjdk openjdk bot added integrated and removed ready labels Sep 3, 2021
Copy link

@openjdk openjdk bot commented Sep 3, 2021

@erikj79 Pushed as commit 3293f7a.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@openjdk openjdk bot removed the rfr label Sep 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants