Create a Discourse Plugin #865

Open
decentralion opened this issue Sep 20, 2018 · 3 comments

@decentralion

commented Sep 20, 2018

Today, SourceCred gets data from two plugins: the git and github plugins.

The plugins load data from Git and GitHub, and map that data into a graph showing how contributions in the domain are interrelated.
Contributions, like Git commits or GitHub pull requests, become nodes in that graph, and relationships between them (authorship, references, etc.) are edges.

SourceCred uses Discourse for its community forum, available at https://discourse.sourcecred.io.
We should add a third plugin, the Discourse plugin, which loads data from arbitrary Discourse instances.

To start, it should have the following nodes and edges:

Nodes:

  • USER, a Discourse login identity
  • THREAD, a top-level thread
  • POST, an individual post in a thread (including the first message in the thread)
  • CATEGORY, a Discourse category of threads

Edges:

  • AUTHORS, user authors a post or thread
  • REFERENCES, a post references a user, thread, or post (via URL or @-reference)
  • QUOTES, a post quotes another post (there's an explicit tag)
  • CONTAINS, categories contain threads and threads contain posts
  • LIKES, a user likes a post
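
As a rough illustration of the node and edge types listed above, here is a minimal sketch in Python. The class names, the `id` field, and the `ALLOWED_ENDPOINTS` table are hypothetical and only encode the endpoint pairs the lists describe; this is not SourceCred's actual graph API.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    USER = "USER"
    THREAD = "THREAD"
    POST = "POST"
    CATEGORY = "CATEGORY"


class EdgeType(Enum):
    AUTHORS = "AUTHORS"
    REFERENCES = "REFERENCES"
    QUOTES = "QUOTES"
    CONTAINS = "CONTAINS"
    LIKES = "LIKES"


# (source type, destination type) pairs each edge may connect,
# read off the node and edge lists in this issue.
ALLOWED_ENDPOINTS = {
    EdgeType.AUTHORS: {
        (NodeType.USER, NodeType.POST),
        (NodeType.USER, NodeType.THREAD),
    },
    EdgeType.REFERENCES: {
        (NodeType.POST, NodeType.USER),
        (NodeType.POST, NodeType.THREAD),
        (NodeType.POST, NodeType.POST),
    },
    EdgeType.QUOTES: {(NodeType.POST, NodeType.POST)},
    EdgeType.CONTAINS: {
        (NodeType.CATEGORY, NodeType.THREAD),
        (NodeType.THREAD, NodeType.POST),
    },
    EdgeType.LIKES: {(NodeType.USER, NodeType.POST)},
}


@dataclass(frozen=True)
class Node:
    type: NodeType
    id: str  # hypothetical identifier, e.g. a username or a numeric id as text


@dataclass(frozen=True)
class Edge:
    type: EdgeType
    src: Node
    dst: Node

    def __post_init__(self):
        # Reject edges whose endpoints do not match the table above.
        if (self.src.type, self.dst.type) not in ALLOWED_ENDPOINTS[self.type]:
            raise ValueError(
                f"{self.type.name} cannot connect "
                f"{self.src.type.name} to {self.dst.type.name}"
            )
```

For example, `Edge(EdgeType.LIKES, Node(NodeType.USER, "alice"), Node(NodeType.POST, "1"))` is accepted, while a QUOTES edge from a user to a post raises `ValueError`.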

This is a relatively involved undertaking; whoever decides to do this, please feel free to drop by our Discord to chat about how the architecture works.

@decentralion

commented Jan 4, 2019

Closing, because we aren't making active use of Discourse atm.

@BrianLitwin BrianLitwin reopened this Mar 29, 2019

@decentralion

commented Mar 29, 2019

I spent a bit of time today looking at the Discourse API.

Some thoughts:

  • Getting data from the API requires a key created by the forum maintainers. This means that, in contrast to GitHub, it will be impossible to generate cred for a Discourse forum without explicit buy-in from the admins.
  • There doesn't seem to be any way to scope API keys to specific permissions, so they're quite sensitive. We should be careful with these.
  • The Topics API has endpoints for getting recent topics and top topics, but not for getting all topics. This could be a problem, except:
  • There's a get Topic API by integer ID, and looking at our instance, it appears that topics are assigned incrementing ids, starting at 1. So we could retrieve all the topics by getting the latest id, and then iterating from 1 to the latest id.
  • Similarly, while there is no method to get all the posts on a topic, the posts have incrementing integer ids, and we can get a post for a topic by id.
  • Much like the GitHub plugin, we should cache the results locally to avoid needing to download everything every time we re-run SourceCred. I think using sqlite again to build the cache will make sense. This time, I expect we will generate the graph directly from the sqlite cache rather than going through an extra layer of indirection.
  • It would be helpful if Discourse were willing to give us a free instance to use as a test Discourse, with a few canonical posts and replies, which we can use for E2E snapshot testing, much like example-github.
  • We should take some care to make sure that SourceCred doesn't leak details about private posts. The simplest way to do this would be to skip all topics that have "visible": false set.
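
The "iterate from 1 to the latest id" idea above, combined with skipping non-visible topics, could look roughly like the sketch below. The `get_topic` callable is a hypothetical stand-in for whatever wraps the get-topic-by-id endpoint, and the assumption that it returns `None` for ids with no retrievable topic is mine, not something confirmed here.

```python
from typing import Callable, Dict, Iterator, Optional

Topic = Dict[str, object]  # raw JSON object for a single topic


def iter_public_topics(
    latest_id: int,
    get_topic: Callable[[int], Optional[Topic]],
) -> Iterator[Topic]:
    """Yield every visible topic with an id in 1..latest_id."""
    for topic_id in range(1, latest_id + 1):
        topic = get_topic(topic_id)
        if topic is None:
            # Assumed: some ids have no retrievable topic (e.g. deleted).
            continue
        if topic.get("visible") is not True:
            # Skips "visible": false as proposed above. Treating a missing
            # field as private too is a conservative choice of this sketch.
            continue
        yield topic
```

Passing the fetch function in as a parameter keeps the loop testable without network access or an API key.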
@decentralion

commented Apr 10, 2019

We now have a test discourse instance: https://sourcecred-test.discourse.group/. Thanks @erlend-sh!

decentralion added a commit that referenced this issue Aug 6, 2019

Add class for fetching data from Discourse
The `DiscourseFetcher` class abstracts over fetching from the Discourse
API, and post-processing and filtering the result into a form that's
convenient for us.

Testing is a bit tricky because the Discourse API keys are sensitive
(they are admin keys) and so I'm reluctant to commit them, even for our
test instance. As a workaround, I've added a shell script which
downloads some data from the SourceCred test instance, and saves it with
a filename which is an encoding of the actual endpoint. Then, in
testing, we can use a mocked fetch which actually hits the snapshots
directory, and thus validate the processing logic on "real" data from
the server. We also test that the fetch headers are set correctly, and
that we handle non-200 error codes appropriately.
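
A minimal sketch of the snapshot approach described above, written in Python for illustration. The percent-encoding of the endpoint into a filename and the `(status, body)` return shape are assumptions of this sketch; the actual script and mocked fetch may encode and return things differently.

```python
import json
from pathlib import Path
from urllib.parse import quote


def snapshot_path(snapshot_dir: Path, endpoint: str) -> Path:
    # Assumed encoding: percent-encode the endpoint so that it is a single
    # valid filename (slashes become %2F).
    return snapshot_dir / quote(endpoint, safe="")


def make_snapshot_fetch(snapshot_dir: Path):
    """Return a fetch-like function that reads snapshots instead of the network."""

    def fetch(endpoint: str):
        path = snapshot_path(snapshot_dir, endpoint)
        if not path.exists():
            # Endpoints without a snapshot behave like a missing resource.
            return 404, None
        return 200, json.loads(path.read_text())

    return fetch
```

With a snapshot saved for `/t/11.json`, `make_snapshot_fetch(dir)("/t/11.json")` returns status 200 and the parsed JSON, and any endpoint without a snapshot returns 404, which gives the error-handling path something to exercise.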

Test plan: In addition to the included tests, I have an end-to-end test
which actually uses this fetcher to fully populate the mirror and then
generate a valid SourceCred graph.

This builds on API investigations
[here](#865 (comment)),
and is general progress towards #865. Thanks to @erlend-sh, without whom
we wouldn't have a test instance.

decentralion added a commit that referenced this issue Aug 6, 2019

Add a Discourse API mirror
The mirror wraps a SQLite database which will store all of the data we
download from Discourse.

On a call to `update`, it downloads new data from the server and stores
it. Then, when it is asked for information like the topics and posts, it
can just pull from its local copy. This means that we don't need to
re-download the content every time we load a Discourse instance, which
makes the load more performant, more robust to network failures, etc.
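
To make the `update` behaviour concrete, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the `get_latest_id` and `get_topic` callables, and the choice to resume from the highest stored id are all assumptions for illustration, not the mirror's actual schema or logic.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple


class Mirror:
    """Stores topics in SQLite so repeated loads avoid re-downloading."""

    def __init__(
        self,
        db: sqlite3.Connection,
        get_latest_id: Callable[[], int],  # hypothetical fetcher method
        get_topic: Callable[[int], Optional[Dict]],  # hypothetical fetcher method
    ):
        self._db = db
        self._get_latest_id = get_latest_id
        self._get_topic = get_topic
        db.execute(
            "CREATE TABLE IF NOT EXISTS topics "
            "(id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
        )

    def update(self) -> None:
        # Only request ids above the highest one already stored.
        (max_stored,) = self._db.execute(
            "SELECT COALESCE(MAX(id), 0) FROM topics"
        ).fetchone()
        for topic_id in range(max_stored + 1, self._get_latest_id() + 1):
            topic = self._get_topic(topic_id)
            if topic is not None:
                self._db.execute(
                    "INSERT INTO topics (id, title) VALUES (?, ?)",
                    (topic_id, topic["title"]),
                )
        self._db.commit()

    def topics(self) -> List[Tuple[int, str]]:
        # Served entirely from the local copy; no network access.
        return self._db.execute(
            "SELECT id, title FROM topics ORDER BY id"
        ).fetchall()
```

Note that this simplified version never re-fetches a topic it has already stored, so edits to existing topics would go unnoticed; a real mirror would need some policy for that.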

Thanks to @wchargin, whose work on the GraphQL mirror for GitHub (#622)
inspired this mirror.

Test plan: I've written unit tests that use a mock fetcher to validate
the update logic. I've also used this to do a full load of the real
SourceCred Discourse instance, and to create a corresponding graph
(using subsequent commits).

Progress towards #865.

decentralion added a commit that referenced this issue Aug 15, 2019

Add class for fetching data from Discourse (#1265)
The `DiscourseFetcher` class abstracts over fetching from the Discourse
API, and post-processing and filtering the result into a form that's
convenient for us.

Testing is a bit tricky because the Discourse API keys are sensitive
(they are admin keys) and so I'm reluctant to commit them, even for our
test instance. As a workaround, I've added a shell script which
downloads some data from the SourceCred test instance, and saves it with
a filename which is an encoding of the actual endpoint. Then, in
testing, we can use a mocked fetch which actually hits the snapshots
directory, and thus validate the processing logic on "real" data from
the server. We also test that the fetch headers are set correctly, and
that we handle non-200 error codes appropriately.

Test plan: In addition to the included tests, I have an end-to-end test
which actually uses this fetcher to fully populate the mirror and then
generate a valid SourceCred graph.

This builds on API investigations
[here](#865 (comment)),
and is general progress towards #865. Thanks to @erlend-sh, without whom
we wouldn't have a test instance.

decentralion added a commit that referenced this issue Aug 15, 2019

Add a Discourse API mirror (#1266)
The mirror wraps a SQLite database which will store all of the data we
download from Discourse.

On a call to `update`, it downloads new data from the server and stores
it. Then, when it is asked for information like the topics and posts, it
can just pull from its local copy. This means that we don't need to
re-download the content every time we load a Discourse instance, which
makes the load more performant, more robust to network failures, etc.

Thanks to @wchargin, whose work on the GraphQL mirror for GitHub (#622)
inspired this mirror.

Test plan: I've written unit tests that use a mock fetcher to validate
the update logic. I've also used this to do a full load of the real
SourceCred Discourse instance, and to create a corresponding graph
(using subsequent commits).

Progress towards #865.