Commit 63f360e

Add ci log explorer idea (#184)
1 parent 1eb9405 commit 63f360e

1 file changed: 86 additions, 0 deletions

content/ideas/ci-log-explorer.md

---
title: Continuous Integration Log Explorer
---

## Goals

Create a web-based tool for exploring continuous integration (CI) test logs.
The tool should suit large projects with big workflows that are susceptible to
rare intermittent failures.

There are two components to this goal.

1. Create a service that automatically inserts test logs into a full-text search
   database.

2. Create a web tool for querying the full-text search database and visualizing
   the results.
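
To make the two components concrete, here is a toy Python sketch of what "inserting" and "querying" mean for a full-text index. It is only an illustration under stated assumptions: `LogIndex`, `insert_log`, and `search` are hypothetical names, the index lives in memory, and a real service would use an actual full-text search database (with ranking, stemming, and persistence) rather than Python dictionaries.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of word characters. A real
# full-text engine would add stemming, phrase queries, and ranking.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return [token.lower() for token in TOKEN_RE.findall(text)]


class LogIndex:
    """Toy in-memory inverted index: token -> {(job_id, line_no), ...}."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[tuple[int, int]]] = defaultdict(set)
        self.lines: dict[tuple[int, int], str] = {}

    def insert_log(self, job_id: int, log_text: str) -> None:
        """Component 1: index every line of one job's log."""
        for line_no, line in enumerate(log_text.splitlines(), start=1):
            key = (job_id, line_no)
            self.lines[key] = line
            for token in tokenize(line):
                self.postings[token].add(key)

    def search(self, query: str) -> list[tuple[int, int, str]]:
        """Component 2 (back end): log lines containing every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [(job, n, self.lines[(job, n)]) for job, n in sorted(hits)]
```

For example, after `index.insert_log(101, "Running T1234\nT1234 failed: timeout")` and `index.insert_log(102, "T1234 passed")`, the query `index.search("T1234 failed")` returns `[(101, 2, 'T1234 failed: timeout')]`, because only that line contains both tokens.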

## Background

The Haskell compiler GHC has an old testsuite that is slowly lumbering into the
modern era. As more aspects of GHC are tested automatically, rare intermittent
failures that cause spurious test results are uncovered. As more infrastructure
is added to support automation, the surface area for such spurious failures
increases. Collectively, the intermittent failures affect many CI runs and can
create a frustrating experience for would-be GHC contributors.

One successful technique for combating intermittent failures is to collect data
from many test runs and look for patterns. By finding the "fingerprint" of a
particular failure, we can identify whether it is indeed spurious, what
circumstances accompany the failure, and how frequently it occurs. This
information can be used to identify the root cause and fix the failure. At the
very least, it can be used to recover from the failure automatically, giving
contributors a smoother experience.
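
As a rough illustration of a failure "fingerprint", the Python sketch below masks run-specific details (hex addresses and standalone numbers) so that lines describing the same failure collapse to one string, then counts how many jobs each fingerprint appears in. The masking rules and the names `fingerprint` and `failure_frequency` are assumptions for illustration only; real fingerprints would likely need project-specific patterns.

```python
import re
from collections import Counter

# Hypothetical masking rules: replace details that change from run to run
# with placeholders, so that the "same" failure yields the same string.
VOLATILE = [
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),  # memory addresses
    (re.compile(r"\b\d+\b"), "<n>"),  # standalone numbers (timings, PIDs)
]


def fingerprint(line: str) -> str:
    for pattern, placeholder in VOLATILE:
        line = pattern.sub(placeholder, line)
    return line.strip()


def failure_frequency(job_logs: dict[int, list[str]]) -> Counter[str]:
    """Count how many jobs each fingerprint occurs in (at most once per job)."""
    counts: Counter[str] = Counter()
    for failure_lines in job_logs.values():
        counts.update({fingerprint(line) for line in failure_lines})
    return counts
```

Because `\b\d+\b` only matches standalone numbers, a test name such as `T1234` survives masking, while a duration such as `312 seconds` becomes `<n> seconds`. Counting once per job, rather than once per line, approximates "how frequently it occurs" across CI runs.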

### Existing Tooling

Some tooling that supports this technique can be found at
https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/local-tooling.
It requires the user to download all job logs manually, and its "interface" is
nothing more than an SQLite database. This project will improve on that idea.

There is already a service that listens to job events, found at
https://gitlab.haskell.org/chreekat/spurious-failures/-/tree/master/spuriobot.
Therefore, the first component of the project goal (creating a service that
automatically inserts test logs into a full-text search database) will only need
to extend that service with a log-insertion feature.
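
One way a log-insertion feature could hook into a job-event listener is sketched below in Python. This does not describe how spuriobot is actually written. It assumes GitLab's job-event webhook payload (`object_kind` equal to `"build"`, with `build_id`, `build_status`, and `project_id` fields) and GitLab's `GET /projects/:id/jobs/:job_id/trace` endpoint for a job's raw log. The `fetch` and `insert_log` callbacks are hypothetical placeholders for an HTTP client and the full-text database; authentication, retries, and error handling are omitted.

```python
from typing import Any, Callable

GITLAB_API = "https://gitlab.haskell.org/api/v4"
# Job statuses after which the log will no longer grow.
FINISHED = {"success", "failed", "canceled"}


def trace_url(project_id: int, job_id: int) -> str:
    # GitLab REST endpoint that returns a job's raw log ("trace").
    return f"{GITLAB_API}/projects/{project_id}/jobs/{job_id}/trace"


def handle_job_event(
    payload: dict[str, Any],
    fetch: Callable[[str], str],
    insert_log: Callable[[int, str], None],
) -> bool:
    """Index the log of a finished job; return True if it was indexed."""
    if payload.get("object_kind") != "build":
        return False  # not a job event
    if payload.get("build_status") not in FINISHED:
        return False  # job still queued or running
    job_id = payload["build_id"]
    log_text = fetch(trace_url(payload["project_id"], job_id))
    insert_log(job_id, log_text)
    return True
```

Passing `fetch` and `insert_log` in as functions keeps the event logic independent of any particular HTTP library or search backend, which also makes it easy to test with fakes.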

## Outcomes

Phase 1: The tool will be implemented and brought online with a basic user
interface. Initially, it will support only GHC.

Phase 2, option 1: Guided by user feedback, better visualizations will be added
to the UI.

Phase 2, option 2: The service that automatically inserts test logs into a
full-text search database will be extended to support GitHub workflows, allowing
the tool to be used much more widely.
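
Supporting GitHub mainly changes where a job's log comes from. The hedged Python sketch below shows one possible provider-neutral job description; the class and function names are hypothetical, and the URLs assume GitLab's job-trace endpoint and GitHub's "download job logs for a workflow run" endpoint (`GET /repos/{owner}/{repo}/actions/jobs/{job_id}/logs`). Authentication is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class CIJob(Protocol):
    """Anything that can say where its raw log lives."""

    def log_url(self) -> str: ...


@dataclass(frozen=True)
class GitLabJob:
    base_url: str  # e.g. "https://gitlab.haskell.org"
    project_id: int
    job_id: int

    def log_url(self) -> str:
        # GitLab REST API: raw log ("trace") of a single job.
        return (
            f"{self.base_url}/api/v4/projects/{self.project_id}"
            f"/jobs/{self.job_id}/trace"
        )


@dataclass(frozen=True)
class GitHubJob:
    owner: str
    repo: str
    job_id: int

    def log_url(self) -> str:
        # GitHub REST API: answers with a redirect to the job's plain-text log.
        return (
            f"https://api.github.com/repos/{self.owner}/{self.repo}"
            f"/actions/jobs/{self.job_id}/logs"
        )


def ingest(job: CIJob, fetch: Callable[[str], str],
           insert_log: Callable[[str, str], None]) -> None:
    """Provider-neutral ingestion: the log URL doubles as a unique key."""
    url = job.log_url()
    insert_log(url, fetch(url))
```

Keying the index on the log URL is one simple way to keep numeric job IDs from different providers, or different GitLab instances, from colliding.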

Phase 2, option 3: *Use* the tool to characterize spurious failures in GHC.
There is a large list of potential spurious failures that can be investigated,
and perhaps even fixed!

## Size

The first deliverable, described in Phase 1, is **small**. By choosing from the
Phase 2 options, however, the project can be extended to **medium** or **large**
as suits the circumstances.

## Required Skills

* Read and write technical English
* Haskell programming basics

## Suitable for the Following Interests

* devops
* Haskell tooling
* web app development
* web services
* data visualization

## Project Mentor

* Bryan Richter, Haskell Foundation DevOps engineer and author of the existing
  tooling
