| 1 | +--- |
| 2 | +title: Continuous Integration Log Explorer |
| 3 | +--- |
| 4 | + |
| 5 | +## Goals |
| 6 | + |
| 7 | +Create a web-based tool for exploring continuous integration (CI) test logs.
| 8 | +It should suit large projects whose big workflows are susceptible to rare,
| 9 | +intermittent failures.
| 10 | + |
| 11 | +There are two components to this goal. |
| 12 | + |
| 13 | +1. Create a service that automatically inserts test logs into a full text search |
| 14 | + database. |
| 15 | + |
| 16 | +2. Create a web tool for querying the full text search database and visualizing |
| 17 | + results. |
| 18 | + |
| 19 | +## Background |
| 20 | + |
| 21 | +The Haskell compiler GHC has an old testsuite that is slowly lumbering into the |
| 22 | +modern era. As more aspects of GHC are tested automatically, rare intermittent |
| 23 | +failures that cause spurious test results are uncovered. As more infrastructure |
| 24 | +is added to support automation, the surface area for such spurious failures |
| 25 | +increases. Collectively, the intermittent failures affect many CI runs and can |
| 26 | +create a frustrating experience for would-be GHC contributors. |
| 27 | + |
| 28 | +One successful technique for combating intermittent failures is to collect data |
| 29 | +from many test runs and look for patterns. By finding the "fingerprint" of a |
| 30 | +particular failure, we can identify whether it is indeed spurious, what |
| 31 | +circumstances accompany the failure, and how frequently it occurs. This |
| 32 | +information can be used to identify the root cause and fix the failure. At the |
| 33 | +very least, it can be used to recover from the failure automatically, giving |
| 34 | +contributors a smoother experience. |
| 35 | + |
| 36 | +### Existing Tooling |
| 37 | + |
| 38 | +Some tooling to support this technique is found at |
| 39 | +https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/local-tooling. |
| 40 | +It requires the user to manually download all job logs, and the "interface" is |
| 41 | +nothing more than an SQLite database. This project will improve on the idea.
| 42 | + |
| 43 | +There is already a service that listens to job events, found at |
| 44 | +https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/spuriobot. |
| 45 | +Therefore, the first component of the project goal (creating a service that |
| 46 | +automatically inserts test logs into a full text search database) will only need |
| 47 | +to extend that service with the log-insertion feature. |
| 48 | + |
| 49 | +## Outcomes |
| 50 | + |
| 51 | +Phase 1: The tool will be implemented and brought online with a basic user |
| 52 | +interface. It will only support GHC. |
| 53 | + |
| 54 | +Phase 2, option 1: Guided by user feedback, better visualizations will be added |
| 55 | +to the UI. |
| 56 | + |
| 57 | +Phase 2, option 2: The service that automatically inserts test logs into a full |
| 58 | +text search database will be extended to support GitHub workflows, allowing the
| 59 | +tool to be used much more widely. |
| 60 | + |
| 61 | +Phase 2, option 3: *Use* the tool to characterize spurious failures in GHC. |
| 62 | +There is a large list of potential spurious failures that can be investigated
| 63 | +and, where possible, fixed.
| 64 | + |
| 65 | +## Size |
| 66 | + |
| 67 | +The first deliverable, described in Phase 1, is **small**. By choosing from the |
| 68 | +Phase 2 options, however, the project can be extended to **medium** or **large** |
| 69 | +as suits the circumstances. |
| 70 | + |
| 71 | +## Required Skills |
| 72 | + |
| 73 | +* Read and write technical English |
| 74 | +* Haskell programming basics |
| 75 | + |
| 76 | +## Suitable for the Following Interests |
| 77 | + |
| 78 | +* devops |
| 79 | +* Haskell tooling |
| 80 | +* web app development |
| 81 | +* web services |
| 82 | +* data visualization |
| 83 | + |
| 84 | +## Project Mentor |
| 85 | + |
| 86 | +* Bryan Richter, Haskell Foundation DevOps engineer and author of existing tooling |