Merge branch 'main' into feature/tree-recombination
tc85324 committed Sep 4, 2024
2 parents f0cdf80 + 2bb9990 commit 7fecf80
Showing 53 changed files with 2,548 additions and 515 deletions.
13 changes: 13 additions & 0 deletions .copyright-template
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# © Crown Copyright GCHQ
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
1 change: 1 addition & 0 deletions .cspell/custom_misc.txt
@@ -27,6 +27,7 @@ kernelized
KSD
linewidth
mapsto
Matern
ml.p3.8xlarge
ndmin
parsable
5 changes: 5 additions & 0 deletions .cspell/library_terms.txt
@@ -56,6 +56,7 @@ jaxlib
jaxopt
jaxtyping
jumanjihouse
keepends
linalg
linkcheck
literalinclude
@@ -65,9 +66,11 @@ mathbb
mathbf
mathrm
matplotlib
maxval
meshgrid
mimread
mimsave
minval
modindex
myst
nabla
@@ -76,6 +79,7 @@ nanargmin
ndarray
ndim
newaxis
nobs
nonzero
numpy
opencv
@@ -121,6 +125,7 @@ toctree
tomli
tqdm
triu
ttest
tuplegetter
typehints
undoc
1 change: 1 addition & 0 deletions .cspell/people.txt
@@ -20,6 +20,7 @@ Litterer
Loera
Lyons
Martinsson
Matérn
Meunier
Motonobu
Nabil
69 changes: 69 additions & 0 deletions .github/workflows/performance.yml
@@ -0,0 +1,69 @@
name: Performance
# Monitor performance of Coreax code.

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - "**"

jobs:
  performance-check:
    name: Check performance
    runs-on: ubuntu-latest
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      gist_id: 3707a122b3697109068a3e55487de4fc
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip
          cache-dependency-path: pyproject.toml
      - name: Upgrade pip
        run: python -m pip install --upgrade pip
      - name: Install package and dependencies
        run: pip install -e .
      - name: Assess performance
        run: python tests/performance/run.py --output-file $RUNNER_TEMP/performance.json
      - name: Download historic performance data
        if: github.event_name == 'pull_request'
        run: gh gist clone ${{ env.gist_id }} $RUNNER_TEMP/historic
      - name: Compare performance against historic data
        if: github.event_name == 'pull_request'
        run: |
          # save the commit subject to a file in case it contains any shell
          # special characters
          git log -1 --pretty=%s > $RUNNER_TEMP/commit_subject.txt
          python tests/performance/compare.py \
            $RUNNER_TEMP/performance.json \
            $RUNNER_TEMP/historic \
            --commit-short-hash $(git log -1 --pretty=%h) \
            --commit-subject-file $RUNNER_TEMP/commit_subject.txt \
            > $RUNNER_TEMP/comment.md
          cat $RUNNER_TEMP/comment.md
      - name: Comment performance update on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            var fs = require('fs');
            const RUNNER_TEMP = process.env.RUNNER_TEMP
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: fs.readFileSync(`${RUNNER_TEMP}/comment.md`, "utf8")
            })
      - name: Save performance data to Gist
        if: github.event_name == 'push'
        env:
          # this is the only step that should actually need write permissions
          GITHUB_TOKEN: ${{ secrets.COVERAGE_GIST_KEY }}
        run: |
          OUT_NAME="performance-$(date --utc +%Y-%m-%d--%H-%M-%S)--$GITHUB_SHA--v1.json"
          gh gist edit ${{ env.gist_id }} -a $OUT_NAME $RUNNER_TEMP/performance.json
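
The last step of the workflow stores each run under a name that encodes the UTC timestamp, the full commit SHA and a format version, so historic results can be ordered without opening the files. The following is a minimal sketch of how such a name could be parsed. The helper `parse_performance_filename` is hypothetical and is not taken from `tests/performance/compare.py`, whose contents are not shown in this diff; the sketch also assumes `$GITHUB_SHA` is a 40-character hexadecimal SHA-1.

```python
import re
from datetime import datetime, timezone

# Pattern mirroring OUT_NAME in the workflow:
# performance-<UTC date>--<UTC time>--<commit SHA>--v<version>.json
_NAME_PATTERN = re.compile(
    r"^performance-(?P<stamp>\d{4}-\d{2}-\d{2}--\d{2}-\d{2}-\d{2})"
    r"--(?P<sha>[0-9a-f]{40})--v(?P<version>\d+)\.json$"
)


def parse_performance_filename(name: str) -> tuple[datetime, str, int]:
    """Hypothetical helper: split a stored file name into its parts."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognised performance file name: {name!r}")
    # The workflow calls `date --utc`, so the timestamp is interpreted as UTC
    timestamp = datetime.strptime(match["stamp"], "%Y-%m-%d--%H-%M-%S")
    return (
        timestamp.replace(tzinfo=timezone.utc),
        match["sha"],
        int(match["version"]),
    )
```

Sorting the parsed timestamps would give the runs in chronological order; the version suffix leaves room for a later change of file format.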
10 changes: 8 additions & 2 deletions .pre-commit-config.yaml
@@ -60,7 +60,7 @@ repos:
- id: forbid-tabs
exclude: documentation/make.bat|documentation/Makefile
- repo: https://github.com/streetsidesoftware/cspell-cli
rev: v8.13.2
rev: v8.13.3
hooks:
# Run a spellcheck (words pulled from cspell.config.yaml)
- id: cspell
@@ -88,7 +88,7 @@ repos:
# Enforce that type annotations are used instead of type comments
- id: python-use-type-annotations
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.5.7
rev: v0.6.3
hooks:
# Run the linter.
- id: ruff
@@ -112,3 +112,9 @@ repos:
- "-sn" # Don't display the score
- "--rcfile=.pylintrc" # pylint configuration file
exclude: "documentation/source/snippets"
- id: check-copyright
name: Check for copyright notice
description: Ensure a copyright notice is present at the top of each Python file
entry: python pre_commit_hooks/check_copyright.py
types: [ python ]
language: python
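
The new `check-copyright` hook runs `pre_commit_hooks/check_copyright.py` on Python files, and pre-commit passes the matching file names to the entry point as arguments. That script is not part of the visible diff, so the following is only a sketch of what such a check might look like, assuming the notice must match `.copyright-template` at the top of each file and that a leading shebang line is tolerated. The function names are hypothetical.

```python
from pathlib import Path

# Assumed location of the template added in this commit
TEMPLATE_PATH = Path(".copyright-template")


def has_copyright_notice(source: str, template: str) -> bool:
    """Return True if ``source`` starts with ``template``, after any shebang."""
    lines = source.splitlines(keepends=True)
    if lines and lines[0].startswith("#!"):
        # Skip the shebang so scripts may keep it on the first line
        lines = lines[1:]
    return "".join(lines).startswith(template.strip())


def main(filenames: list[str]) -> int:
    """Return a non-zero exit code if any listed file lacks the notice."""
    template = TEMPLATE_PATH.read_text(encoding="utf-8")
    missing = [
        name
        for name in filenames
        if not has_copyright_notice(Path(name).read_text(encoding="utf-8"), template)
    ]
    for name in missing:
        print(f"Missing copyright notice: {name}")
    return 1 if missing else 0
```

A script built this way would typically end by passing `sys.argv[1:]` to `main` and using the result as its exit code, since pre-commit treats a non-zero exit code as a failed hook.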
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -17,6 +17,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added supervised coreset construction algorithm in `coreax.solvers.GreedyKernelPoints`
- Added `coreax.kernels.PowerKernel` to replace repeated calls of `coreax.kernels.ProductKernel`
within the `**` magic method of `coreax.kernel.ScalarValuedKernel`
- Added scalar-valued kernel functions `coreax.kernels.PoissonKernel` and `coreax.kernels.MaternKernel`
- Added `progress_bar` attribute to `coreax.score_matching.SlicedScoreMatching` to enable or
disable tqdm progress bar terminal output. Defaults to disabled (`False`).


### Fixed
115 changes: 78 additions & 37 deletions CONTRIBUTING.md
@@ -2,9 +2,12 @@

## Getting started

If you would like to contribute to the development of coreax, you can do so in a number of ways:
- Highlight any bugs you encounter during usage, or any feature requests that would improve coreax by raising appropriate issues.
- Develop solutions to open issues and create pull requests (PRs) for the development team to review.
If you would like to contribute to the development of coreax, you can do so in a number
of ways:
- Highlight any bugs you encounter during usage, or any feature requests that would
improve coreax by raising appropriate issues.
- Develop solutions to open issues and create pull requests (PRs) for the development
team to review.
- Implement optimisations in the codebase to improve performance.
- Contribute example usages & documentation improvements.
- Increase awareness of coreax with other potential users.
@@ -21,15 +24,19 @@ Developers should install additional packages required for development using
- Open a new issue:
- For bugs use the [bug report issue template][gh-bug-report].
- For features use the [feature request issue template][gh-feature-request].
- This will make the issue a candidate for inclusion in future sprints, as well as open to the community to address.
- If you are able to fix the bug or implement the feature, [create a pull request](#pull-requests) with the relevant changes.
- This will make the issue a candidate for inclusion in future sprints, as well as
open to the community to address.
- If you are able to fix the bug or implement the feature,
[create a pull request](#pull-requests) with the relevant changes.

## Pull requests

Currently, we are using a [GitHub Flow][github-flow] development approach.

- To avoid duplicate work, [search existing pull requests][gh-prs].
- All pull requests should relate to an existing issue.
- If the pull request addresses something not currently covered by an issue, create a new issue first.
- If the pull request addresses something not currently covered by an issue, create a
new issue first.
- Make changes on a [feature branch][git-feature-branch] instead of the main branch.
- Branch names should take one of the following forms:
- `feature/<feature-name>`: for adding, removing or refactoring a feature.
@@ -40,24 +47,31 @@ Currently, we are using a [GitHub Flow][github-flow] development approach.
- Delete your branch once it has been merged.

### Pull request process
- Create a [Draft pull request][pr-draft] while you are working on the changes to allow others to monitor progress and see the issue is being worked on.

- Create a [Draft pull request][pr-draft] while you are working on the changes to allow
others to monitor progress and see the issue is being worked on.
- Pull in changes from upstream often to minimise merge conflicts.
- Make any required changes.
- Resolve any conflicts with the target branch.
- [Change your PR to ready][pr-ready] when the PR is ready for review. You can convert back to Draft at any time.
- [Change your PR to ready][pr-ready] when the PR is ready for review. You can convert
back to Draft at any time.

Do **not** add labels like `[RFC]` or `[WIP]` to the title of your PR to indicate its state.
Non-Draft PRs are assumed to be open for comments; if you want feedback from specific people, `@`-mention them in a comment.
Do **not** add labels like `[RFC]` or `[WIP]` to the title of your PR to indicate its
state.
Non-Draft PRs are assumed to be open for comments; if you want feedback from specific
people, `@`-mention them in a comment.

### Pull request commenting process

- Use a comment thread for each required change.
- Reviewer closes the thread once the comment has been resolved.
- Only the reviewer may mark a thread they opened as resolved.

### Commit messages

Follow the [conventional commits guidelines][conventional_commits] to *make reviews easier* and to make the git logs more valuable.
An example commit, including reference to some GitHub issue #123, might take the form:
Follow the [conventional commits guidelines][conventional_commits] to
*make reviews easier* and to make the git logs more valuable. An example commit,
including reference to some GitHub issue #123, might take the form:

```
feat: add gpu support for matrix multiplication
@@ -109,8 +123,9 @@ def my_new_function(x: int) -> int:

## Code

Code must be documented, adequately tested and compliant in style prior to merging into the main branch. To
facilitate code review, code should meet these standards prior to creating a pull request.
Code must be documented, adequately tested and compliant in style prior to merging
into the main branch. To facilitate code review, code should meet these standards prior
to creating a pull request.

Some of the following points are checked by pre-commit hooks, although others require
manual implementation by authors and reviewers. Conversely, further style points that
@@ -121,7 +136,8 @@ need to be aware of them.

A high level overview of the expected style is:
- Follow [PEP 8][pep-8] style where possible.
- Use clear naming of variables rather than mathematical shorthand (e.g. kernel instead of k).
- Use clear naming of variables rather than mathematical shorthand (e.g. kernel instead
of k).
- [Black][black] will be applied by the pre-commit hook but will not reformat strings,
comments or docstrings. These must be manually checked and limited to 88 characters
per line starting from the left margin and including any indentation.
@@ -146,52 +162,70 @@ to avoid inadvertently permitting spelling errors elsewhere, e.g. add `Blu-Tack`
instead of `Blu`.

### External dependencies
Use the standard library and existing well-maintained external libraries where possible. New external libraries should be permissively licensed (e.g. [MIT][mit]) or weak copyleft (e.g. [LGPL][lgpl]).

Use the standard library and existing well-maintained external libraries where possible.
New external libraries should be permissively licensed (e.g. [MIT][mit]) or weak
copyleft (e.g. [LGPL][lgpl]).

### Testing

All tests are run via the following [Pytest][pytest] command:
```bash
pytest tests/
```
Either [Pytest][pytest] or [Unittest][unittest] can be used to write tests for coreax.
[Pytest][pytest] is recommended where it would simplify code, such as for parameterized tests. As much effort should be put into developing tests as is put into developing the code.
Tests should be provided to test functionality and also to ensure exceptions and warnings are raised or managed appropriately. This includes:
[Pytest][pytest] is recommended where it would simplify code, such as for parameterized
tests. As much effort should be put into developing tests as is put into developing the
code.
Tests should be provided to test functionality and also to ensure exceptions and
warnings are raised or managed appropriately. This includes:
- Unit testing of new functions added to the codebase
- Verifying all existing tests pass with the integrated changes

Keep in mind the impact on runtime when writing your tests. Favour more tests that are smaller rather than a few large
tests with many assert statements unless it would significantly affect run time, e.g. due to excess set up or duplicated
function calls.
Keep in mind the impact on runtime when writing your tests. Favour more tests that are
smaller rather than a few large tests with many assert statements unless it would
significantly affect run time, e.g. due to excess set up or duplicated function calls.

Use the form: (actual, expected) in asserts, e.g.
```python
assertEqual(actualValue, expectedValue)
```
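
The guidance above on small tests and argument order can be sketched as follows. This is an illustrative example only: `double` is a hypothetical function under test and is not part of coreax. Each test makes a single assertion, with the actual value first and the expected value second.

```python
import unittest


def double(x: int) -> int:
    """Hypothetical function under test."""
    return 2 * x


class TestDouble(unittest.TestCase):
    """Small, focused tests, each passing (actual, expected) to the assert."""

    def test_positive_input(self) -> None:
        """Check that a positive input is doubled."""
        self.assertEqual(double(3), 6)

    def test_zero_input(self) -> None:
        """Check that zero is left unchanged."""
        self.assertEqual(double(0), 0)
```

Because the tests are collected by class and method name, the same file runs unchanged under `pytest tests/`.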

### Abstract functions
Abstract methods, functions and properties should only contain a docstring. They should not contain a `pass` statement.

Abstract methods, functions and properties should only contain a docstring. They should
not contain a `pass` statement.
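
A short sketch of this rule, using a hypothetical `Shape` hierarchy that is not part of coreax. The docstring alone is a valid function body, so no `pass` statement is needed in the abstract method.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical base class used only for illustration."""

    @abstractmethod
    def area(self) -> float:
        """Compute the area of the shape."""


class Square(Shape):
    """
    Concrete subclass that implements the abstract method.

    :param side: Length of each side of the square
    """

    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        """Compute the area of the square."""
        return self.side**2
```

Instantiating `Shape` directly raises a `TypeError`, whereas `Square(3.0).area()` evaluates the concrete implementation.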

### Exceptions and error messages
Custom exceptions should be derived from the most specific relevant Exception class. Custom messages should be succinct and, where easy to implement, offer suggestions to the user on how to rectify the exception.

Avoid stating how the program will handle the error, e.g. avoid Aborting, since it will be evident that the program has terminated. This enables the exception to be caught and the program to continue in the future.
Custom exceptions should be derived from the most specific relevant Exception class.
Custom messages should be succinct and, where easy to implement, offer suggestions to
the user on how to rectify the exception.

Avoid stating how the program will handle the error, e.g. avoid Aborting, since it will
be evident that the program has terminated. This enables the exception to be caught and
the program to continue in the future.
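
As a sketch of these points, the example below derives a custom exception from `ValueError` rather than from `Exception`, and its message says what was wrong and how to fix it without stating what the program will do next. The names `CoresetSizeError` and `validate_coreset_size` are hypothetical and do not come from coreax.

```python
class CoresetSizeError(ValueError):
    """Hypothetical exception raised when a requested coreset size is invalid."""


def validate_coreset_size(coreset_size: int, data_size: int) -> int:
    """Return ``coreset_size`` if it lies in the valid range, otherwise raise."""
    if not 0 < coreset_size <= data_size:
        # Succinct message with a suggested fix and no mention of aborting
        raise CoresetSizeError(
            f"coreset_size must be between 1 and {data_size}, got {coreset_size}; "
            "choose a value no larger than the number of data points"
        )
    return coreset_size
```

Because the exception subclasses `ValueError`, callers that already handle `ValueError` can catch it and continue.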

### Docstrings

Docstrings must:
- Be written for private functions, methods and classes where their purpose or usage is not immediately obvious.
- Be written in [reStructured Text][sphinx-rst] ready to be compiled into documentation via [Sphinx][sphinx].
- Be written for private functions, methods and classes where their purpose or usage is
not immediately obvious.
- Be written in [reStructured Text][sphinx-rst] ready to be compiled into documentation
via [Sphinx][sphinx].
- Follow the [PEP 257][pep-257] style guide.
- Not have a blank line inserted after a function or method docstring unless the following statement is a function, method or class definition.
- Start with a capital letter unless referring to the name of an object, in which case match that case sensitively.
- Not have a blank line inserted after a function or method docstring unless the
following statement is a function, method or class definition.
- Start with a capital letter unless referring to the name of an object, in which case
match that case sensitively.
- Have a full stop at the end of the one-line descriptive sentence.
- Use full stops in extended paragraphs of text.
- Not have full stops at the end of parameter definitions.
- If a `:param:` or similar line requires more than the max line length, use multiple lines. Each additional line should
be indented by a further 4 spaces.
- Class `__init__` methods should not have docstrings. All constructor parameters should be listed at the end of the class
docstring. `__init__` docstrings will not be rendered by Sphinx. Any developer comments should be contained in a regular
comment.
- If a `:param:` or similar line requires more than the max line length, use multiple
lines. Each additional line should be indented by a further 4 spaces.
- Class `__init__` methods should not have docstrings. All constructor parameters should
be listed at the end of the class docstring. `__init__` docstrings will not be
rendered by Sphinx. Any developer comments should be contained in a regular comment.

Each docstring for a public object should take the following structure:
```
@@ -211,20 +245,27 @@ If the function does not return anything, the return line above can be omitted.
### Comments

Comments must:
- Start with a capital letter unless referring to the name of an object, in which case match that case sensitively.
- Start with a capital letter unless referring to the name of an object, in which case
match that case sensitively.
- Not end in a full stop for single-line comments in code.
- End with a full stop for multi-line comments.

### Maths overflow

Prioritise overfull lines for mathematical expressions over artificially splitting them into multiple equations in both comments and docstrings.
Prioritise overfull lines for mathematical expressions over artificially splitting them
into multiple equations in both comments and docstrings.

### Thousands separators

For hardcoded integers >= 1000, an underscore should be written to separate the thousands, e.g. 10_000 instead of 10000.
For hardcoded integers >= 1000, an underscore should be written to separate the
thousands, e.g. 10_000 instead of 10000.

### Documentation and references
The coreax documentation should reference papers and mathematical descriptions as appropriate. New references should be placed in the [`references.bib`](references.bib) file. An entry with key word `RefYY` can then be referenced within a docstring anywhere with `[RefYY]_`.

The coreax documentation should reference papers and mathematical descriptions as
appropriate. New references should be placed in the [`references.bib`](references.bib)
file. An entry with key word `RefYY` can then be referenced within a docstring anywhere
with `` :cite:`RefYY` ``.

### Generating docs with Sphinx
