A GitHub Action to suggest related or similar issues, documents, and links. Based on the power of NLP and fastText.
Users sometimes create a new issue that is similar to issues that are already open or closed. When that happens, you need to find the similar issues or related links and post them as a comment. That process is a pain, right? This action can do it for you!
Create your YAML workflow file as follows.
e.g. .github/workflows/suggest-related-links.yml
name: 'Suggest Related Links'

on:
  issues:
    types:
      - opened
      - edited
  workflow_dispatch:
  schedule:
    - cron: '13 13 * * */7'

jobs:
  action:
    runs-on: ubuntu-18.04
    steps:
      - name: Cache dependencies
        uses: actions/cache@v2
        with:
          path: ~/actions-suggest-related-links-tmp
          key: ${{ runner.os }}-action-${{ hashFiles('~/actions-suggest-related-links-tmp/training-data.json') }}
          restore-keys: |
            ${{ runner.os }}-action-

      - uses: peaceiris/actions-suggest-related-links@v1.1.1

      - uses: peaceiris/actions-suggest-related-links/models/fasttext@v1.1.1
        if: github.event_name == 'issues'
        with:
          version: v1.1.1

      - uses: peaceiris/actions-suggest-related-links@v1.1.1
        with:
          mode: 'suggest'
          repository: 'peaceiris/actions-gh-pages'
          unclickable: true
Run it manually the first time only, to save the issue data.
After the first run, it is executed automatically according to the schedule.cron setting.
This action lists related links that are similar to the body of the created issue.
Our GitHub Action is actions-suggest-related-links, which suggests related or similar issues, documents, and links. This action mainly consists of 5 parts: Data Collection, Preprocessing, Train model, Find similar issues, and Suggest Issues.
All issues in the repository are collected with the GitHub API, including each issue's title, body, and comments. The training data is collected regularly by the scheduled workflow run, output as an artifact, and saved as a cache.
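As a rough illustration of the collection step, the sketch below flattens one issue and its comments into a single training record. The field names `number`, `html_url`, `title`, and `body` follow the GitHub REST API issue payload; the function name `build_training_record` and the output layout are assumptions made for this example, not the action's actual training-data format.

```python
from __future__ import annotations

import json


def build_training_record(issue: dict, comments: list[dict]) -> dict:
    """Merge an issue's title, body, and comment bodies into one record.

    The output layout is hypothetical; the real training-data.json
    produced by the action may look different.
    """
    # The API returns null (None) for empty bodies, so fall back to "".
    parts = [issue.get("title") or "", issue.get("body") or ""]
    parts.extend(comment.get("body") or "" for comment in comments)
    return {
        "number": issue["number"],
        "html_url": issue["html_url"],
        "text": "\n".join(part for part in parts if part),
    }


# Sample payloads shaped like GitHub REST API responses (values are made up).
issue = {
    "number": 1,
    "html_url": "https://github.com/OWNER/REPO/issues/1",
    "title": "Deploy fails",
    "body": None,
}
comments = [{"body": "Same here."}]

record = build_training_record(issue, comments)
print(json.dumps(record, indent=2))
```

Keeping the issue URL next to the text means a later step can post a link back to the matching issue without another API call.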
The Markdown is converted to plain text with unified. During this step, symbols that are not alphabetic characters are deleted.
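The symbol-removal part of this step can be sketched in a few lines. This is only an illustration: the action itself uses unified for the Markdown conversion, which the sketch does not attempt, and the function name `strip_non_alphabetic` is hypothetical. The sketch also assumes that only ASCII letters are kept and that removed characters are replaced by a single space so that neighbouring words do not merge.

```python
import re


def strip_non_alphabetic(text: str) -> str:
    """Replace every run of non-alphabetic characters with one space."""
    # Assumption of this sketch: only ASCII letters count as alphabetic.
    cleaned = re.sub(r"[^A-Za-z]+", " ", text)
    return cleaned.strip()


sample = "Error: `npm run build` failed (exit code 1)!"
print(strip_non_alphabetic(sample))
# → Error npm run build failed exit code
```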
When a new issue is created or updated, the fastText model is trained. True to its name, fastText has the advantage of very short inference times.
We expect that training time on GitHub Actions runners will not be a problem. For the GitHub Actions for GitHub Pages repository, training takes 1 second and the total execution time is about 30 seconds.
fastText computes word vectors for both the training data and the posted data. Cosine similarity is then used to determine which training-data vectors are closest to the vectors of the posted data. The higher the cosine similarity, the more similar the two texts are.
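To make the similarity step concrete, here is a minimal sketch of cosine similarity and a ranking built on it. The vectors are tiny hand-written stand-ins rather than real fastText output, and the names `cosine_similarity`, `rank_by_similarity`, and the `issue-N` keys are hypothetical.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return cos(a, b) = (a . b) / (|a| * |b|), or 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], candidates: dict[str, list[float]], top_n: int = 3
) -> list[tuple[str, float]]:
    """Return the top_n candidates, most similar to the query first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy 3-dimensional vectors; real fastText vectors have many more dimensions.
candidates = {
    "issue-1": [1.0, 0.0, 0.0],
    "issue-2": [0.0, 1.0, 0.0],
    "issue-3": [1.0, 1.0, 0.0],
}
query = [1.0, 0.1, 0.0]

ranked = rank_by_similarity(query, candidates)
for key, score in ranked:
    print(f"{key}: {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, a long issue body and a short one can still score as close when they point in the same direction.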