Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub as event stream proof-of-concept #44

Closed
wants to merge 5 commits into from

Conversation

bryanburgers
Copy link
Contributor

We currently have two sources-of-truth when syncing on startup: the
database and issue comments.

This proof of concept tries to bring us down to a single source-of-truth
by effectively elimating the local database (at least for pull request
state; other aspects like treeclosed are not addressed).

In order to do this, we use GitHub's "timeline"
(v3,
v4)
feature to get a fuller view of what has happened to a pull request so
we can reify state in a more complete manner than using issue comments
alone. For example, the timeline API interleaves issue comments with
things like AssignedEvent, HeadRefForcePushedEvent, and
RenamedTitleEvent in the order they happened so we can reliably build
the full state from this historical record.

The general concept is that in both synchronizing and realtime mode,
we have the pull request object respond to the event to update the
state. Side effects like comments and label updates are returned, and
the code can use or drop those depending on which mode Homu in.

PR state --------\
                 |     +---------------+
                 \---> |               | -----> Modified PR state
                       | process_event |
                 /---> |               | --\
                 |     +---------------+   +--> Realtime comments
GitHub Event ----/                         +--> Label updates
                                           \--> Database update

The good thing is that we can model these timeline items and GitHub's
event webhooks (which we use when processing in realtime mode) in much
the same way, so that we can unify them in a way that makes it possible
to effectively test the effect of events on a pull request.

This particular pull request does not attempt to integrate these changes
into the codebase yet, because I wanted to get feedback that this is
indeed a viable path to head down.

Instead, this pull request (which should not be merged) provides a new
executable that demonstrates this ability on existing pull requests in a
read-only mode. The output is a series of relevant timeline items, and
the state of the pull request as it changes.

pip install -e homu
homu-sync-test --repo rust-lang/rust 61419

image

Start adding some tests to Homu by testing the issue comment
command-parsing logic.

Take `parse_commands` and break it apart into two phases

* Parsing phase
* Execution phase

The parsing phase returns a list of commands with action names (ideally,
this would be a Rust enum, but to simulate that, we use action names on
a single class) that are returned regardless of the current state.  So
for example, `@bors retry` will return a `retry` command regardless of
the current state of `realtime`.

The execution phase then inspects the list of commands and decides what
to do with them. So for example, the `retry` command will be skipped if
`realtime == False`.

This has the positive result of having a parsing phase that has no
side-effects, which makes it much easier to test. This can lead to
higher confidence that the code works as expected without the high cost
of testing in production and possibly disrupting the build flow.

This has the negative result of adding a lot of lines of code to achieve
command parsing, which we already do successfully without the tests.
Use `python setup.py test` to run the tests, and a few minor changes
required to support that.
We currently have two sources-of-truth when syncing on startup: the
database and issue comments.

This proof of concept tries to bring us down to a single source-of-truth
by effectively elimating the local database (at least for pull request
state; other aspects like `treeclosed` are not addressed).

In order to do this, we use GitHub's "timeline"
([v3](https://developer.github.com/v3/issues/timeline/),
[v4](https://developer.github.com/v4/object/pullrequest/#timelineitems))
feature to get a fuller view of what has happened to a pull request so
we can reify state in a more complete manner than using issue comments
alone. For example, the timeline API interleaves issue comments with
things like `AssignedEvent`, `HeadRefForcePushedEvent`, and
`RenamedTitleEvent` in the order they happened so we can reliably build
the full state from this historical record.

The general concept is that in both `synchronizing` and `realtime` mode,
we have the pull request object respond to the event to update the
state. Side effects like comments and label updates are returned, and
the code can use or drop those depending on which mode Homu in.

    PR state --------\
                     |     +---------------+
                     \---> |               | -----> Modified PR state
                           | process_event |
                     /---> |               | --\
                     |     +---------------+   +--> Realtime comments
    GitHub Event ----/                         +--> Label updates
                                               \--> Database update

The good thing is that we can model these timeline items and GitHub's
event webhooks (which we use when processing in `realtime` mode) in much
the same way, so that we can unify them in a way that makes it possible
to effectively test the effect of events on a pull request.

This particular pull request does not attempt to integrate these changes
into the codebase yet, because I wanted to get feedback that this is
indeed a viable path to head down.

Instead, this pull request (which should not be merged) provides a new
executable that demonstrates this ability on existing pull requests in a
read-only mode. The output is a series of relevant timeline items, and
the state of the pull request as it changes.

    pip install -e homu
    homu-sync-test --repo rust-lang/rust 61419
@bryanburgers
Copy link
Contributor Author

@kennytm @pietroalbini @Manishearth (not sure who else) I'd love to get some 👀 on this, and your thoughts on this approach.

Might be some significant cutting to get there so I wanted to get feedback first, but I think it could have some real tangible benefits in the future, too.

@kennytm
Copy link
Member

kennytm commented Jun 4, 2019

+1 for reducing the SQLite dependency

@pietroalbini
Copy link
Member

This approach seems awesome!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants