Merged
65 commits
9e6ed41
Add draft for CfS
fsschneider Aug 10, 2023
d326958
update
fsschneider Aug 11, 2023
06238ae
Merge branch 'dev' into CfS
fsschneider Aug 11, 2023
705c9a5
Update and rename CALL_FOR_SUBMISSIONS.md to SUBMISSION_PROCESS_RULES.md
fsschneider Aug 17, 2023
2057116
Fix filename
fsschneider Oct 3, 2023
2ff2431
Multiple submissions & additional baselines
fsschneider Oct 3, 2023
3298b43
winning hyperparameter configuration
fsschneider Oct 3, 2023
f9b5048
Specify challenging submissions
fsschneider Oct 3, 2023
0195e79
Prize money and challenge deadline
fsschneider Oct 3, 2023
e2043ea
Publication -> Announcement of results
fsschneider Oct 3, 2023
0f106a3
Remove todo for spirit jury
fsschneider Oct 3, 2023
0143c5e
Merge branch 'dev' into CfS
fsschneider Oct 3, 2023
e3f445d
Update dates
fsschneider Oct 3, 2023
4b18cb7
Add CfS placeholder
fsschneider Oct 3, 2023
1551018
Formatting
fsschneider Oct 3, 2023
ca921d1
Formatting
fsschneider Oct 3, 2023
60bdde2
Formatting & increment
fsschneider Oct 3, 2023
a8506dd
Formatting
fsschneider Oct 3, 2023
c6e52d7
Update ToC
fsschneider Oct 3, 2023
5a39cf8
Update rules to exclude test set in scoring
fsschneider Oct 3, 2023
f0e280a
Add link to Google Form
fsschneider Oct 3, 2023
f155287
Merge commit 'f0e280a3f0797838545b1a78250c67fa46c27565' into UpdateTu…
fsschneider Oct 3, 2023
e7a907c
Rename Jury Award
fsschneider Oct 10, 2023
7976442
specify ineligible entities and associated institutions
fsschneider Oct 10, 2023
1bb3854
rephrase "register submission" to "intent to submit"
fsschneider Oct 18, 2023
515dc09
add loss metric to min_eval_metrics registry
priyakasimbeg Oct 24, 2023
1b8c1dc
debugging
priyakasimbeg Oct 24, 2023
4441f32
add loss to scoring registry
priyakasimbeg Oct 24, 2023
5fd528a
fix index
priyakasimbeg Oct 24, 2023
0146ecb
add map to max eval metrics
priyakasimbeg Oct 24, 2023
a415e57
add blue to max eval metrics
priyakasimbeg Oct 24, 2023
0d836cf
remove print statement
priyakasimbeg Oct 24, 2023
8a4f8fb
debugging
priyakasimbeg Oct 24, 2023
d70f432
df
priyakasimbeg Oct 24, 2023
03ad1df
fix
priyakasimbeg Oct 24, 2023
8a937ee
fix
priyakasimbeg Oct 24, 2023
87e0762
verbosity
priyakasimbeg Oct 24, 2023
6723232
fix
priyakasimbeg Oct 24, 2023
e9d3c7d
debugging print statements
priyakasimbeg Oct 25, 2023
b4f6f8c
Merge branch 'dev' into CfS
fsschneider Oct 26, 2023
b616fd2
Merge branch 'CfS' of https://github.com/fsschneider/algorithmic-effi…
fsschneider Oct 26, 2023
4df526b
remove debugging print statements
priyakasimbeg Oct 30, 2023
152cf64
update fastmri targets (#548)
priyakasimbeg Oct 30, 2023
119f8d7
add flag for setting max split size
priyakasimbeg Oct 31, 2023
de45bf7
add documentation
priyakasimbeg Oct 31, 2023
fa23fe8
formatting
priyakasimbeg Oct 31, 2023
9b958c3
revert formatting
priyakasimbeg Oct 31, 2023
ec876fa
formatting
priyakasimbeg Nov 2, 2023
691e2c8
nits
priyakasimbeg Nov 2, 2023
4fe66bf
Merge pull request #559 from mlcommons/pytorch_flag
priyakasimbeg Nov 2, 2023
1cb02ca
remove tabulate import
priyakasimbeg Nov 2, 2023
8736a30
Merge pull request #558 from mlcommons/scoring_fixes
priyakasimbeg Nov 2, 2023
dfb7701
Update README.md
priyakasimbeg Nov 2, 2023
3b47594
Merge pull request #560 from mlcommons/readme_fixes
priyakasimbeg Nov 2, 2023
9bbd933
Remove tabulate requirement
runame Nov 3, 2023
ea2e7fc
Test warnings in get_experiment_df
runame Nov 3, 2023
74b961b
Add warnings when not all workloads or trials are present
runame Nov 3, 2023
c3a6f43
Fix bugs in scoring calculation
runame Nov 3, 2023
4151e09
Fix imports
runame Nov 3, 2023
931f71f
Merge pull request #476 from fsschneider/CfS
priyakasimbeg Nov 3, 2023
0943802
Merge pull request #561 from runame/scoring
priyakasimbeg Nov 3, 2023
d055102
Remove unused hparam from ogbg target-setting run config
runame Nov 4, 2023
d31489e
Merge branch 'dev' into UpdateTuningRules
fsschneider Nov 6, 2023
77c86c2
Merge pull request #528 from fsschneider/UpdateTuningRules
priyakasimbeg Nov 6, 2023
b8afd87
Merge pull request #562 from runame/minor-fix
priyakasimbeg Nov 6, 2023
3 changes: 3 additions & 0 deletions CALL_FOR_SUBMISSIONS.md
@@ -0,0 +1,3 @@
# MLCommons™ AlgoPerf: Call for Submissions

🚧 **Coming soon!** 🚧
141 changes: 92 additions & 49 deletions CONTRIBUTING.md
@@ -1,4 +1,28 @@
# Contributing
# MLCommons™ AlgoPerf: Contributing

## Table of Contents <!-- omit from toc -->

- [Setup](#setup)
- [Setting up a Linux VM on GCP](#setting-up-a-linux-vm-on-gcp)
- [Installing GPU Drivers](#installing-gpu-drivers)
- [Authentication for Google Cloud Container Registry](#authentication-for-google-cloud-container-registry)
- [Installation](#installation)
- [Docker workflows](#docker-workflows)
- [Pre-built Images on Google Cloud Container Registry](#pre-built-images-on-google-cloud-container-registry)
- [Trigger rebuild and push of maintained images](#trigger-rebuild-and-push-of-maintained-images)
- [Trigger build and push of images on other branch](#trigger-build-and-push-of-images-on-other-branch)
- [GCP Data and Experiment Integration](#gcp-data-and-experiment-integration)
- [Downloading Data from GCP](#downloading-data-from-gcp)
- [Saving Experiments to GCP](#saving-experiments-to-gcp)
- [Getting Information from a Container](#getting-information-from-a-container)
- [Mounting Local Repository](#mounting-local-repository)
- [Submitting PRs](#submitting-prs)
- [Testing](#testing)
- [Style Testing](#style-testing)
- [Unit and integration tests](#unit-and-integration-tests)
- [Regression tests](#regression-tests)

We invite everyone to look through our rules and codebase and submit issues and pull requests, e.g. for rules changes, clarifications, or any bugs you might encounter. If you are interested in contributing to the work of the working group and influencing the benchmark's design decisions, please [join the weekly meetings](https://mlcommons.org/en/groups/research-algorithms/) and consider becoming a member of the working group.

The best way to contribute to MLCommons is to get involved with one of our many project communities. You can find more information about getting involved with MLCommons [here](https://mlcommons.org/en/get-involved/#getting-started).

@@ -8,29 +32,25 @@ To get started contributing code, you or your organization needs to sign the MLC

MLCommons project work is tracked with issue trackers and pull requests. Modify the project in your own fork and issue a pull request once you want other developers to take a look at what you have done and discuss the proposed changes. Ensure that cla-bot and other checks pass for your pull requests.

# Table of Contents
- [Setup](#setup)
- [Installation](#installation)
- [Docker workflows](#docker-workflows)
- [Submitting PRs](#submitting-prs)
- [Testing](#testing)
## Setup

### Setting up a Linux VM on GCP

# Setup
## Setting up a Linux VM on GCP
If you want to run containers on GCP VMs or store and retrieve Docker images from the Google Cloud Container Registry, please read ahead.
If you'd like to use a Linux VM, you will have to install the correct GPU drivers and the NVIDIA Docker toolkit.
We recommend using the Deep Learning on Linux image. Further instructions are based on that.

### Installing GPU Drivers

You can use the `scripts/cloud-startup.sh` as a startup script for the VM. This will automate the installation of the NVIDIA GPU Drivers and NVIDIA Docker toolkit.

### Authentication for Google Cloud Container Registry

To access the Google Cloud Container Registry, you will have to authenticate to the repository whenever you use Docker.
Use the gcloud credential helper as documented [here](https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling#cred-helper).

## Installation

# Installation
If you have not installed the package and dependencies yet, see [Installation](./README.md#installation).

To use the development tools such as `pytest` or `pylint` use the `dev` option:
@@ -42,72 +62,75 @@ pre-commit install

To get an installation with the requirements for all workloads and development, use the argument `[full_dev]`.

## Docker workflows

We recommend developing in our Docker image to ensure a consistent environment between developing, testing and scoring submissions.

# Docker workflows
We recommend developing in our Docker image to ensure a consistent environment between developing, testing and scoring submissions.
To get started see also:

To get started see:
- [Installation with Docker](./README.md#docker)
- [Installation with Docker](./README.md#docker)
- [Running a submission inside a Docker Container](./getting_started.md#run-your-submission-in-a-docker-container)

Other resources:
- [Pre-built Images on Google Cloud Container Registry](#pre-built-images-on-google-cloud-container-registry)
- [GCP Data and Experiment Integration](#gcp-integration)
- [Downloading Data from GCP](#downloading-data-from-gcp)
- [Saving Experiments Results to GCP](#saving-experiments-to-gcp)
- [Getting Information from a Container](#getting-information-from-a-container)
- [Mounting local repository](#mounting-local-repository)
### Pre-built Images on Google Cloud Container Registry


## Pre-built Images on Google Cloud Container Registry
If you want to maintain or use images stored on our Google Cloud Container Registry read this section.
You will have to use an authentication helper to set up permissions to access the repository:
```

```bash
ARTIFACT_REGISTRY_URL=us-central1-docker.pkg.dev
gcloud auth configure-docker $ARTIFACT_REGISTRY_URL
```

To pull the latest prebuilt image:

```
```bash
docker pull us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/<image_name>
```
The naming convention for `image_name` is `algoperf_<framework>_<branch>`.

The naming convention for `image_name` is `algoperf_<framework>_<branch>`.
Currently maintained images on the repository are:

- `algoperf_jax_main`
- `algoperf_pytorch_main`
- `algoperf_both_main`
- `algoperf_jax_dev`
- `algoperf_pytorch_dev`
- `algoperf_both_dev`

To reference the pulled image you will have to use the full `image_path`, e.g.
To reference the pulled image you will have to use the full `image_path`, e.g.
`us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_jax_main`.
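
The naming convention above can be captured in a small shell helper. This is only a sketch: `algoperf_image_path` and `REGISTRY_PREFIX` are hypothetical names that do not exist in the repository, and the helper simply concatenates the registry path quoted above with `algoperf_<framework>_<branch>`.

```bash
# Hypothetical helper (not part of the repository): compose the full image
# path from a framework and a branch, following algoperf_<framework>_<branch>.
REGISTRY_PREFIX="us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo"

algoperf_image_path() {
  framework="$1"  # one of: jax, pytorch, both
  branch="$2"     # one of: main, dev
  echo "${REGISTRY_PREFIX}/algoperf_${framework}_${branch}"
}

# Print the full path for the JAX image built from the main branch.
algoperf_image_path jax main
```

Once you have authenticated as described above, the result can be passed to Docker, for example `docker pull "$(algoperf_image_path jax main)"`.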

### Trigger rebuild and push of maintained images

To build and push all images (`pytorch`, `jax`, `both`) on maintained branches (`dev`, `main`), run:
```

```bash
bash docker/build_docker_images.sh -b <branch>
```

#### Trigger build and push of images on other branch
You can also use the above script to build images from a different branch.

You can also use the above script to build images from a different branch.

1. Push the branch to `mlcommons/algorithmic-efficiency` repository.
2. Run
```

```bash
bash docker/build_docker_images.sh -b <branch>
```

## GCP Data and Experiment Integration
The Docker entrypoint script can transfer data to and from
### GCP Data and Experiment Integration

The Docker entrypoint script can transfer data to and from
our GCP buckets on our internal GCP project. If
you are an approved contributor you can get access to these resources to automatically download the datasets and upload experiment results.
you are an approved contributor you can get access to these resources to automatically download the datasets and upload experiment results.
You can use these features by setting the `--internal_contributor` flag to 'true' for the Docker entrypoint script.

### Downloading Data from GCP

To run a Docker container that will only download data (if not found on the host):
```

```bash
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
@@ -120,15 +143,18 @@ docker run -t -d \
--keep_container_alive <keep_container_alive> \
--internal_contributor true
```

If `keep_container_alive` is `true` the main process on the container will persist after finishing the data download.
This run command is useful if you are developing or debugging.
This run command is useful if you are developing or debugging.

### Saving Experiments to GCP

If you set the `--internal_contributor` flag to `true`,
experiments will also be automatically uploaded to our GCP bucket under `gs://mlcommons-runs/<experiment_name>`.

Command format:
```

```bash
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
@@ -146,27 +172,33 @@ docker run -t -d \
--internal_contributor true
```
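
To locate the uploaded results afterwards, the bucket location can be derived from the experiment name. The helper below is a minimal sketch: `experiment_uri` is a hypothetical name, and it assumes only the `gs://mlcommons-runs/<experiment_name>` layout stated above.

```bash
# Hypothetical helper: build the bucket URI for a given experiment name,
# assuming the gs://mlcommons-runs/<experiment_name> layout described above.
experiment_uri() {
  echo "gs://mlcommons-runs/$1"
}

# Print the URI for an example experiment name (the name is a placeholder).
experiment_uri my_experiment
```

If `gsutil` is installed and you have access to the internal project, something like `gsutil ls "$(experiment_uri my_experiment)"` should list the uploaded files.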

## Getting Information from a Container
### Getting Information from a Container

To find the container IDs of running containers:
```

```bash
docker ps
```

To see the logging output:
```

```bash
docker logs <container_id>
```

To enter a bash session in the container:
```

```bash
docker exec -it <container_id> /bin/bash
```
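
When only one container is of interest, the commands above can be combined so that the container ID does not have to be copied by hand. This is a sketch under assumptions: the function names are hypothetical, and it relies on the standard `docker ps -l -q` and `docker logs -f` options.

```bash
# Hypothetical convenience wrappers around the docker commands above.

# `docker ps -l -q` prints only the ID of the most recently created
# container (in any state, not just running ones).
latest_container_id() {
  docker ps -l -q
}

# Stream the logs of the most recently created container until interrupted.
follow_latest_logs() {
  docker logs -f "$(latest_container_id)"
}
```

After sourcing these definitions, `follow_latest_logs` streams the output of the container you started last. If several containers are running at once, prefer the explicit `<container_id>` form shown above.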

## Mounting Local Repository
### Mounting Local Repository

Rebuilding the docker image can become tedious if
you are making frequent changes to the code.
To have changes in your local copy of the algorithmic-efficiency repo be reflected inside the container you can mount the local repository with the `-v` flag.
```
To have changes in your local copy of the algorithmic-efficiency repo be reflected inside the container you can mount the local repository with the `-v` flag.

```bash
docker run -t -d \
-v $HOME/data/:/data/ \
-v $HOME/experiment_runs/:/experiment_runs \
@@ -178,33 +210,40 @@ docker run -t -d \
--keep_container_alive true
```

# Submitting PRs
## Submitting PRs

New PRs will be merged on the dev branch by default, given that they pass the presubmits.

# Testing
## Testing

We run tests with GitHub Actions, configured in the [.github/workflows](https://github.com/mlcommons/algorithmic-efficiency/tree/main/.github/workflows) folder.

## Style Testing
### Style Testing

We run yapf and linting tests on PRs. You can view and fix offending errors with these instructions.

To run the below commands, use the versions installed via `pip install -e '.[dev]'`.

To automatically fix formatting errors, run the following (*WARNING:* this will edit your code, so it is suggested to make a git commit first!):

```bash
yapf -i -r -vv -p algorithmic_efficiency baselines datasets reference_algorithms tests *.py
```

To sort all import orderings, run the following:

```bash
isort .
```

To just print out all offending import orderings, run the following:

```bash
isort . --check --diff
```
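
If you would rather inspect problems before letting the tools edit your code, the formatting and import checks above can be run in a check-only mode. The wrapper below is a sketch, not part of the repository: `run_style_checks` is a hypothetical name, and it assumes that `yapf --diff` exits with a non-zero status when it would change a file, as the yapf documentation describes.

```bash
# Hypothetical check-only wrapper; it reports problems without editing files.
# Assumes yapf and isort are installed via `pip install -e '.[dev]'`.
run_style_checks() {
  failures=0

  # `yapf --diff` prints the changes it would make instead of applying them.
  yapf --diff -r -p algorithmic_efficiency baselines datasets reference_algorithms tests *.py \
    || failures=$((failures + 1))

  # Same isort check as above: list offending import orderings without fixing them.
  isort . --check --diff || failures=$((failures + 1))

  # The return status is the number of checks that reported problems.
  return "$failures"
}
```

Run `run_style_checks` from the repository root. A return status of 0 means both checks passed.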

To print out all offending pylint issues, run the following:

```bash
pylint algorithmic_efficiency
pylint baselines
@@ -218,16 +257,20 @@ pylint tests
We run unit tests and integration tests as part of the GitHub Actions as well.
You can also use `python tests/reference_algorithm_tests.py` to run a single model update and two model evals for each workload using the reference algorithm in `reference_algorithms/target_setting_algorithms/`.

## Regression tests
### Regression tests

We also have regression tests available in [.github/workflows/regression_tests.yml](https://github.com/mlcommons/algorithmic-efficiency/tree/main/.github/workflows/regression_tests.yml) that can be run semi-automatically.
The regression tests are shorter end-to-end submissions run in a containerized environment across all 8 workloads, in both the jax and pytorch frameworks.
The regression tests are shorter end-to-end submissions run in a containerized environment across all 8 workloads, in both the jax and pytorch frameworks.
The regression tests run on self-hosted runners and are triggered for pull requests that target the main branch. Typically these PRs will be from the `dev` branch,
so the tests will run containers based on images built from the `dev` branch.
To run a regression test:

1. Build and upload latest Docker images from dev branch.
```

```bash
bash ~/algorithmic-efficiency/docker/build_docker_images.sh -b dev
```

2. Turn on the self-hosted runner.
3. Run the self-hosted runner application for the runner to accept jobs.
4. Open a pull request into main to trigger the workflow.