What Do We Want and Need from Smoke Testing?
In the past two years there have been steps both forward and backward with respect to smoke-testing of the core distribution. I propose that at the hackathon we discuss what we want, what we need, and where we go from here.
But first, let's review what we used to have and what we have now.
What We Used to Have
Let's start by noting what we used to have but have no longer.
We used to have a Jenkins server, maintained by Dennis Kaarsemaker and presumably run on equipment donated by Booking, which tested the core distribution on (I think) a per-commit basis. IIRC the Jenkins configuration included a Linux run, a threaded Linux run and a Windows run. "Build failed in Jenkins" reports went to the p5p list, followed by "Jenkins build is back to normal" messages.
IMO the best thing about the Jenkins setup was that it quickly caught the most obvious errors and, by reporting them to the list, forced us to address them. What are the "most obvious errors"? Those that you, the committer, would have caught if you had not skipped running make test_harness or even …
P5P has a strong tendency to ignore Windows and fail to address failing tests thereon. So it is not surprising that the worst thing about the Jenkins setup was the high volume of failure messages generated by the Windows runs.
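The "Build failed" / "back to normal" reporting pattern described above can be sketched as a tiny state machine. This is a minimal Python illustration, not Jenkins's actual implementation. It assumes a notifier that mails on every failed build and once more on the first success after a failure, and that results arrive in order as the strings "SUCCESS" and "FAILURE". The function name and message wording are hypothetical.

```python
from typing import Iterable, List, Optional


def notifications(results: Iterable[str]) -> List[str]:
    """Return the messages a list-mailing notifier would send.

    Hypothetical helper: one message per failed build, plus one
    "back to normal" message when a success follows a failure.
    A success that follows a success sends nothing.
    """
    messages: List[str] = []
    previous: Optional[str] = None
    for build_number, result in enumerate(results, start=1):
        if result == "FAILURE":
            messages.append(f"Build failed: #{build_number}")
        elif result == "SUCCESS" and previous == "FAILURE":
            messages.append(f"Build is back to normal: #{build_number}")
        previous = result
    return messages


if __name__ == "__main__":
    for message in notifications(["SUCCESS", "FAILURE", "FAILURE", "SUCCESS", "SUCCESS"]):
        print(message)
```

Under this scheme a platform whose tests fail on every build produces one failure message per build, which is how a persistently failing Windows run could generate so much mail.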
The last report I can find from Jenkins was generated on Jan 30 2017. Karl queried the absence of Jenkins in early June; Dennis replied, "It was deemed useless and shut down." Who did the deeming is not specified. In July Karl requested the restoration of the Linux run and I seconded the motion. We have heard nothing since.
George Greer's smokers
This is really a special case of what we still have, namely, individually maintained smoke-testing rigs using the CPAN distribution Test-Smoke. George had one rig set up for Debian Linux and another for Windows. Failure reports were sent to the p5p list. The main problem: the Windows version was "MSWin32 Win2000 SP4", too old a version to warrant an investment of p5p's time and attention. The last report I could find was dated Oct 01 2016.
What We Have Right Now
Right now we mainly rely on Test-Smoke-based smoke-testing rigs maintained by a variety of individuals. You can search reports at either develop-help.com or test-smoke.org. Tony Cook and Tux are the stand-out contributors here.
Until recently the overwhelming majority of smoke-test reports came from Linux; the majority still do. As I discussed in my TPC::NA talk in June ("How Do We Assess and Maintain the Health of the Perl 5 Codebase?"; slides also available), the risk inherent in this became apparent in 2016, once I set up VMs on my laptop running two versions of FreeBSD. New code in blead that had been testing perfectly well on Linux for five months experienced difficult-to-correct failures on FreeBSD.
In that TPC talk I encouraged people to set up smoke-testing rigs on a wider variety of operating systems. Much to my surprise, some people actually did just that! I particularly want to cite the efforts of Carlos Guevara, who now submits reports for three different versions of FreeBSD, three different versions of Linux, NetBSD and OpenBSD.
What We Lack Right Now
While the smoke-testing situation is better than it was a few months ago, it still has a number of obvious weaknesses.
Little testing of branches
While blead is well exercised by our smoke-testing cohort, smoke-testing of branches is very scanty. AFAICT, the only automated smoke-testing of smoke-me branches is done by a pair of rigs run by Tony Cook on Darwin and Solaris. (I sometimes do manually initiated smoke-testing runs of branches in my FreeBSD VMs.) So suppose someone wants to do the kind of core hacking that ought to be done in a branch or series of branches (e.g., Karl's work on locales; John P Linderman's current work on sort). That core hacker has to explicitly beg for testing on platforms other than those he or she has available.
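Automated branch smoking depends on each rig knowing which branches to pick up and when they have changed. Here is a minimal Python sketch under these assumptions: branches meant for smoking live under a smoke-me/ prefix, and the rig reads the output of git ls-remote --heads (one "<sha><TAB>refs/heads/<name>" line per branch). The function names and sample branch names are hypothetical, not part of Test-Smoke.

```python
from typing import Dict, List

HEADS = "refs/heads/"


def smoke_me_branches(ls_remote_output: str, prefix: str = "smoke-me/") -> Dict[str, str]:
    """Map each branch under `prefix` to the SHA of its head commit.

    Expects `git ls-remote --heads` output: one "<sha><TAB><ref>" line per branch.
    """
    branches: Dict[str, str] = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sha, ref = line.strip().split("\t", 1)
        if not ref.startswith(HEADS):
            continue  # ignore anything that is not a branch head
        name = ref[len(HEADS):]
        if name.startswith(prefix):
            branches[name] = sha
    return branches


def branches_to_smoke(current: Dict[str, str], last_smoked: Dict[str, str]) -> List[str]:
    """Branches that are new, or whose head moved, since the last smoke run."""
    return sorted(name for name, sha in current.items() if last_smoked.get(name) != sha)


if __name__ == "__main__":
    # Hypothetical sample output; real SHAs are 40 hex digits.
    sample = (
        "aaa\trefs/heads/blead\n"
        "bbb\trefs/heads/smoke-me/locale-work\n"
        "ccc\trefs/heads/smoke-me/sort-work\n"
    )
    current = smoke_me_branches(sample)
    print(branches_to_smoke(current, {"smoke-me/locale-work": "bbb"}))
```

Comparing head SHAs against the last run keeps a rig from re-smoking branches that have not moved, which matters on slower non-Linux hardware.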
Little capacity to follow up FAILs on non-Linux platforms
It's good that we are now steadily receiving smoke-testing reports on the various BSDs, AIX, and so forth. But we do not and cannot expect the individuals running those rigs to be the "Perl maintainers" on those platforms or to have the time and energy to fix test failures thereon. We're not adequately doing the job which the name Perl 5 Porters implies.
Reliance on the kindness of strangers and friends
AFAIK we have no organizational support (from companies using Perl or from the Perl Foundation) for any smoke-testing of Perl 5. When "life happens" to any of the individuals doing smoke-testing, we usually lose a resource with no forewarning or explanation.
What Do We Want?
To be discussed.
What Do We Need?
Well, what I, at least, think we need is a way to address the three bullet-points under "What We Lack Right Now" above.
Where Do We Go From Here?
To be discussed.
SUMMARY OF DISCUSSION
This topic was discussed at the hackathon on Saturday, October 14.
ACTION ITEM: Do we want to restore Jenkins, perhaps only with Linux non-threaded and threaded builds?
RECOMMENDATION: It was agreed that we need a testing rig which functions as an early warning system for build or test failures which a committer failed to detect before pushing. This rig should be very simple, probably just Linux non-threaded and threaded builds. We will first explore a Travis configuration which performs these builds and reports failures to the perl5-porters mailing list.
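The two builds recommended above amount to a two-entry build matrix. The Python sketch below shows the commands such a job might run. It is not a Travis configuration, and the job names, the parallel make, and the choice of make test_harness as the test target are assumptions. It uses the standard Configure options -des (accept defaults without prompting), -Dusedevel (needed to build a development version such as blead), and -Dusethreads for the threaded variant. Mailing failures to the list is left out.

```python
import subprocess
from typing import Dict, List

# Options shared by both variants.
CONFIGURE_BASE = ["sh", "./Configure", "-des", "-Dusedevel"]

# Hypothetical job names: one non-threaded and one threaded Linux build.
JOBS: Dict[str, List[str]] = {
    "linux-nonthreaded": [],
    "linux-threaded": ["-Dusethreads"],
}


def job_commands(extra_flags: List[str], jobs: int = 4) -> List[List[str]]:
    """Return the configure, build, and test commands for one variant."""
    return [
        CONFIGURE_BASE + extra_flags,  # builds a new list; CONFIGURE_BASE is unchanged
        ["make", f"-j{jobs}"],
        ["make", "test_harness"],
    ]


def run_job(name: str) -> bool:
    """Run one variant in the current perl5 checkout.

    Returns True only if every step exits with status 0.
    Each variant needs a fresh checkout (or `make distclean` in between).
    """
    for cmd in job_commands(JOBS[name]):
        if subprocess.run(cmd).returncode != 0:
            print(f"{name}: FAILED at: {' '.join(cmd)}")
            return False
    print(f"{name}: OK")
    return True
```

In a fresh clone, run_job("linux-threaded") would configure, build, and test the threaded variant. Keeping the matrix this small is deliberate: the goal is to catch the obvious breakage a committer missed, not to replace the wider smoke-testing cohort.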
ACTION ITEM: Do we want to get more automated smoke testing of branches (other than blead)?
RECOMMENDATION: Yes, we do want to get more automated smoke testing of non-blead branches. We will first explore whether this can be done with a Travis configuration.
ACTION ITEM: Do we want to discourage sending smoke reports to p5p list?
RECOMMENDATION: Yes. Jim Keenan will prepare a message, to be reviewed by Tux, which will be sent to those smoke-testers sending reports to perl5-porters. That message will recommend that the testers upgrade to the latest version of Test::Smoke and unset the configuration option which directs mail to perl5-porters.
ACTION ITEM: What priority do we attach to getting smoke reports from Darwin and Windows?
RECOMMENDATION: Due to a shortage of human resources skilled in addressing problems on these operating systems, we haven't been able to respond to the very few reports we actually receive. Mauke reported that he has tried to perform smoke testing with Strawberry Perl but has experienced build failures. Since the Strawberry Perl team evidently have had success in this area, we will first attempt to communicate with them to set up a working smoke rig using that distribution.
ACTION ITEM: Do we want an "official" BBC (Blead Breaks CPAN) testing platform? All of CPAN? Prioritize the top distributions from the CPAN River?
RECOMMENDATION: It was felt that Perl 5 Porters should explore additional ways to obtain BBC failure data and respond to it. It was suggested that we explore whether we can use the data which CPANtesters.org collects on tests of CPAN libraries against the monthly development releases of Perl, and build our efforts on that data. Jim Keenan has initiated that exploration with a message to the cpan-testers-discuss mailing list (see https://www.nntp.perl.org/group/perl.cpan.testers.discuss/2017/10/msg4172.html).
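One way to use such data is to look for distributions that pass on a stable perl but fail on a development release, the signature of a possible BBC. The Python sketch below assumes the reports have already been reduced to (distribution, perl version, grade) tuples. That record format and the sample data are hypothetical and do not reflect any actual CPAN Testers API. It relies on the Perl 5 convention that odd minor versions (e.g. 5.27.x) are development releases.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical record shape: (distribution, perl_version, grade)
Report = Tuple[str, str, str]


def is_dev_release(perl_version: str) -> bool:
    """Perl 5 development releases have an odd minor version, e.g. 5.27.4."""
    minor = int(perl_version.split(".")[1])
    return minor % 2 == 1


def bbc_candidates(reports: Iterable[Report]) -> List[str]:
    """Distributions with a PASS on a stable perl and a FAIL on a dev perl."""
    passes_stable: Set[str] = set()
    fails_dev: Set[str] = set()
    for dist, version, grade in reports:
        if grade == "PASS" and not is_dev_release(version):
            passes_stable.add(dist)
        elif grade == "FAIL" and is_dev_release(version):
            fails_dev.add(dist)
    return sorted(passes_stable & fails_dev)


if __name__ == "__main__":
    sample = [
        ("Foo-Bar", "5.26.1", "PASS"),
        ("Foo-Bar", "5.27.4", "FAIL"),   # passes on stable, fails on dev
        ("Baz-Qux", "5.26.1", "PASS"),
        ("Baz-Qux", "5.27.4", "PASS"),
        ("Old-Thing", "5.26.1", "FAIL"),  # fails everywhere: not a BBC
        ("Old-Thing", "5.27.4", "FAIL"),
    ]
    print(bbc_candidates(sample))
```

A single FAIL can be environmental, so a real triage step would also compare platforms and report counts before anyone files a BBC ticket.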