Merged
78 commits
0f608e5
Add parameter type for fused KV attention
runame Aug 15, 2023
4187b0c
Refactor MultiheadAttention module
runame Aug 15, 2023
4490c8e
Fix pylint and yapf
runame Aug 15, 2023
49292e4
Fix for old yapf version
runame Aug 15, 2023
12578fd
Fix param shapes test for fused attention layers in PyTorch WMT workload
runame Aug 15, 2023
8d5463a
Fix param types test for fused attention layers in PyTorch WMT workload
runame Aug 16, 2023
7308c91
Fix arrangement of elements for attention
runame Aug 16, 2023
8d99d81
Merge branch 'dev' into wmt-speed
runame Aug 16, 2023
04ffcc0
Merge branch 'dev' into wmt-speed
runame Aug 29, 2023
dc92fed
Remove redundant dtype conversions
runame Aug 29, 2023
4a2b58a
update datasetup documentation
priyakasimbeg Aug 30, 2023
14089b4
fastmri datasetup fix
priyakasimbeg Aug 30, 2023
1f12ac4
make data_dir consistent
priyakasimbeg Aug 30, 2023
d43264c
fix fastmri datasetup
priyakasimbeg Aug 30, 2023
a5f42c2
fix end str
priyakasimbeg Aug 30, 2023
8a73022
fix extract method
priyakasimbeg Aug 30, 2023
426a5b8
Add torch.cuda.synchronize() to profiler
runame Aug 30, 2023
96361d3
change librispeech datasetup folder names
priyakasimbeg Aug 31, 2023
d45d7bf
imagenet debugging
priyakasimbeg Aug 31, 2023
66395f6
imagenet fix
priyakasimbeg Aug 31, 2023
239cb5b
download fix
priyakasimbeg Aug 31, 2023
6bd1087
imagenet download fix
priyakasimbeg Aug 31, 2023
9a2c6d2
remove expand user from download_url
priyakasimbeg Aug 31, 2023
8c07812
fix
priyakasimbeg Aug 31, 2023
86ed7de
string fix
priyakasimbeg Aug 31, 2023
5f9005e
debugging
priyakasimbeg Aug 31, 2023
519cd5d
fix expanduser condition
priyakasimbeg Aug 31, 2023
541ce57
remove set resource limit
priyakasimbeg Aug 31, 2023
2cb78b1
formatting
priyakasimbeg Sep 1, 2023
0c33882
formatting
priyakasimbeg Sep 1, 2023
3c418e7
move imagenet_v2 folder
priyakasimbeg Sep 1, 2023
024b7e4
update librispeech instructions
priyakasimbeg Sep 1, 2023
fc1a6a8
documentation fix
priyakasimbeg Sep 1, 2023
254fdb9
fix
priyakasimbeg Sep 1, 2023
ae93bce
undo unintentional criteo download changes
priyakasimbeg Sep 1, 2023
e65822c
fix
priyakasimbeg Sep 2, 2023
260bfba
Update README.md
priyakasimbeg Sep 2, 2023
8c44fa5
Merge pull request #506 from mlcommons/workflow_updates
priyakasimbeg Sep 2, 2023
9ecc7cf
criteo traindiff fix
chandramouli-sastry Sep 7, 2023
711c579
Merge branch 'mlcommons:dev' into dev
chandramouli-sastry Sep 7, 2023
de74743
style fix
chandramouli-sastry Sep 7, 2023
11997e9
fixes
chandramouli-sastry Sep 7, 2023
6055457
Merge pull request #489 from runame/wmt-speed
znado Sep 7, 2023
722a874
Merge pull request #507 from chandramouli-sastry/dev
znado Sep 7, 2023
c15adb0
fix wmt comparator
chandramouli-sastry Sep 9, 2023
ec96fbe
comparator fix
chandramouli-sastry Sep 9, 2023
04a8380
fix arg for deletion prompt
priyakasimbeg Sep 13, 2023
2f76cb9
Simplify pad function
runame Sep 13, 2023
1a3679d
move delete prompt to end of criteo download
priyakasimbeg Sep 13, 2023
35c8736
Always pad to global_batch_size when it is provided
runame Sep 14, 2023
ad64fd1
Fix pad_size in pad function
runame Sep 14, 2023
efdd670
librispeech processing
priyakasimbeg Sep 14, 2023
26713bc
fix
priyakasimbeg Sep 14, 2023
f3881da
librispeech fix
priyakasimbeg Sep 14, 2023
fd710ab
syntax fix
priyakasimbeg Sep 14, 2023
fa08626
fix
priyakasimbeg Sep 14, 2023
e9119b9
documentation
priyakasimbeg Sep 14, 2023
b629f3b
Merge pull request #508 from chandramouli-sastry/dev
priyakasimbeg Sep 15, 2023
ae9d46f
typo fix
priyakasimbeg Sep 15, 2023
241e546
add test-other counts to librispeech preprocessing
priyakasimbeg Sep 15, 2023
c48d3a9
Merge pull request #510 from mlcommons/setup_debugging
znado Sep 18, 2023
df542c2
add rng_seed flag and save seed to metadata
priyakasimbeg Sep 18, 2023
5ff2ec2
fix
priyakasimbeg Sep 18, 2023
df01623
fix
priyakasimbeg Sep 19, 2023
87ecd5b
debug
priyakasimbeg Sep 19, 2023
828765c
fix
priyakasimbeg Sep 19, 2023
f861353
lint fix
priyakasimbeg Sep 19, 2023
18a8c20
pylint
priyakasimbeg Sep 19, 2023
25b05b8
formatting
priyakasimbeg Sep 19, 2023
d54b866
pass rng_seed arg for self-tuning submission as well
priyakasimbeg Sep 20, 2023
a7b60fa
pin ml_dtypes version
priyakasimbeg Sep 21, 2023
d3fcbb6
Merge pull request #514 from mlcommons/rng_seed_flag
priyakasimbeg Sep 21, 2023
a6d06df
Merge pull request #515 from runame/fix-padding
priyakasimbeg Sep 21, 2023
33a8a9f
add guards for cuda context initialization
priyakasimbeg Sep 21, 2023
92b2d1d
Merge pull request #519 from mlcommons/profiler_oom_fix
priyakasimbeg Sep 22, 2023
cc8b820
minor
pomonam Sep 25, 2023
86ad0af
Add num_batch configs
pomonam Sep 25, 2023
ae3587d
Merge pull request #520 from mlcommons/juhan/criteo_size_fix
priyakasimbeg Sep 25, 2023
4 changes: 2 additions & 2 deletions README.md
@@ -113,7 +113,7 @@ See instructions [here](https://github.com/NVIDIA/nvidia-docker).
 
 ### Running Docker Container (Interactive)
 To use the Docker container as an interactive virtual environment, you can run a container mounted to your local data and code directories and execute the `bash` program. This may be useful if you are in the process of developing a submission.
-1. Run detached Docker Container. The container_id will be printed if the container is run successfully.
+1. Run detached Docker Container. The `container_id` will be printed if the container is running successfully.
 ```bash
 docker run -t -d \
 -v $HOME/data/:/data/ \
@@ -122,7 +122,7 @@ To use the Docker container as an interactive virtual environment, you can run a
 -v $HOME/algorithmic-efficiency:/algorithmic-efficiency \
 --gpus all \
 --ipc=host \
-<docker_image_name>
+<docker_image_name> \
 --keep_container_alive true
 ```
 2. Open a bash terminal
31 changes: 15 additions & 16 deletions algorithmic_efficiency/data_utils.py
@@ -28,8 +28,15 @@ def shard_and_maybe_pad_np(
   inputs = batch['inputs']
   current_batch_size = inputs[0].shape[0] if isinstance(
       inputs, tuple) else inputs.shape[0]
+  if global_batch_size is not None:
+    assert global_batch_size >= current_batch_size, \
+        'global_batch_size must be larger than or equal to current_batch_size.'
+    # Always pad to global_batch_size if it is provided.
+    pad_to_global_batch_size = global_batch_size > current_batch_size
+  else:
+    pad_to_global_batch_size = False
   remainder_size = current_batch_size % local_device_count
-  if remainder_size != 0:
+  if remainder_size != 0 or pad_to_global_batch_size:
     if global_batch_size is not None:
       pad_size = global_batch_size - current_batch_size
     else:
@@ -50,8 +57,8 @@ def _prepare(x):
       x = x._numpy()  # pylint: disable=protected-access
 
     # Pad if remainder_size != 0 (should only be possible during evaluation).
-    if remainder_size != 0:
-      x = pad(x, pad_size, 'jax', padding_value=padding_value)
+    if remainder_size != 0 or pad_to_global_batch_size:
+      x = pad(x, pad_size, padding_value=padding_value)
 
     # Reshape (global_batch_size, ...) to
     # (local_device_count, per_device_batch_size, ...).
@@ -61,21 +68,13 @@ def _prepare(x):
   return jax.tree_map(_prepare, batch)
 
 
-def pad(tensor: spec.Tensor,
+def pad(tensor: np.ndarray,
         pad_size: int,
-        framework: str,
-        padding_value: int = 0) -> spec.Tensor:
-  if len(tensor) > 1:
+        padding_value: int = 0) -> np.ndarray:
+  if tensor.ndim > 1:
     pad_size = (pad_size, *tensor.shape[1:])
-  if framework == 'pytorch':
-    padding = torch.full(
-        pad_size, padding_value, dtype=tensor.dtype, device=tensor.device)
-    padded_tensor = torch.cat((tensor, padding), dim=0)
-  elif framework == 'jax':
-    padding = np.full(pad_size, padding_value, dtype=tensor.dtype)
-    padded_tensor = np.concatenate((tensor, padding), axis=0)
-  else:
-    raise ValueError(f'Framework has to be pytorch or jax, but is {framework}.')
+  padding = np.full(pad_size, padding_value, dtype=tensor.dtype)
+  padded_tensor = np.concatenate((tensor, padding), axis=0)
   return padded_tensor
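For reference on the behavior change: `shard_and_maybe_pad_np` now pads up to `global_batch_size` whenever it is provided, not only when the batch fails to divide evenly across devices, and `pad` is NumPy-only since the PyTorch branch was unused. A minimal runnable sketch of the simplified helper (copied from the diff above) with an illustrative batch:

```python
import numpy as np


def pad(tensor: np.ndarray, pad_size: int, padding_value: int = 0) -> np.ndarray:
  # For >1-D inputs, pad along the leading (batch) axis only.
  if tensor.ndim > 1:
    pad_size = (pad_size, *tensor.shape[1:])
  padding = np.full(pad_size, padding_value, dtype=tensor.dtype)
  return np.concatenate((tensor, padding), axis=0)


# E.g., a final eval batch of 6 examples padded up to a global batch size of 8.
batch = np.ones((6, 3), dtype=np.float32)
padded = pad(batch, pad_size=8 - batch.shape[0])
assert padded.shape == (8, 3)
assert (padded[6:] == 0).all()  # padded rows are filled with padding_value
```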
6 changes: 6 additions & 0 deletions algorithmic_efficiency/logger_utils.py
@@ -275,6 +275,12 @@ def get_meta_data(workload: spec.Workload) -> dict:
   return meta_data
 
 
+def save_meta_data(workload: spec.Workload, rng_seed: int, meta_file_name: str):
+  meta_data = get_meta_data(workload)
+  meta_data.update({'rng_seed': rng_seed})
+  write_json(meta_file_name, meta_data)
+
+
 class MetricLogger(object):
   """Used to log all measurements during training.
 
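This helper ties into the `rng_seed` flag added later in this PR (#514): run metadata now records the seed that was used. A hedged sketch of the intent; `get_meta_data` and `write_json` are the repo's own helpers, stubbed here only so the snippet runs standalone:

```python
import json
from typing import Any, Dict


def get_meta_data(workload: Any) -> Dict[str, Any]:
  # Stub standing in for the repo's get_meta_data (workload/system info).
  return {'workload': type(workload).__name__}


def write_json(file_name: str, data: Dict[str, Any]) -> None:
  # Stub standing in for the repo's write_json helper.
  with open(file_name, 'w') as f:
    json.dump(data, f, indent=2)


def save_meta_data(workload: Any, rng_seed: int, meta_file_name: str):
  meta_data = get_meta_data(workload)
  meta_data.update({'rng_seed': rng_seed})
  write_json(meta_file_name, meta_data)


# The written JSON now includes {"rng_seed": 1996} alongside the usual fields.
save_meta_data(workload=object(), rng_seed=1996, meta_file_name='meta_data.json')
```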
6 changes: 4 additions & 2 deletions algorithmic_efficiency/param_utils.py
@@ -41,6 +41,10 @@ def pytorch_param_types(
     elif 'attn' in name or 'attention' in name:
       if 'bias' in name:
         param_types[name] = spec.ParameterType.ATTENTION_BIAS
+      elif 'in_proj' in name:
+        param_types[name] = spec.ParameterType.ATTENTION_QKV
+      elif 'kv_proj' in name:
+        param_types[name] = spec.ParameterType.ATTENTION_KV
       elif 'k_proj' in name or 'key' in name:
         param_types[name] = spec.ParameterType.ATTENTION_K
       elif 'q_proj' in name or 'query' in name:
@@ -51,8 +55,6 @@ def pytorch_param_types(
         param_types[name] = spec.ParameterType.ATTENTION_OUT
       elif 'scale' in name:
         param_types[name] = spec.ParameterType.WEIGHT
-      elif 'in_proj_weight' in name:
-        param_types[name] = spec.ParameterType.ATTENTION_QKV
       else:
         raise ValueError(f'Unrecognized attention parameter: {name}.')
     elif 'bias' in name:
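A minimal sketch of the matching logic above and why the branch order matters: 'kv_proj' contains 'v_proj' as a substring, so the fused branches must be tested before the per-projection checks (the 'v_proj'/'value' branch sits in the elided lines of this diff). The example parameter names are illustrative only, not from the repo:

```python
def classify(name: str) -> str:
  # Fused projections first, so they are not caught by the substring checks below.
  if 'in_proj' in name:
    return 'ATTENTION_QKV'
  elif 'kv_proj' in name:
    return 'ATTENTION_KV'
  elif 'k_proj' in name or 'key' in name:
    return 'ATTENTION_K'
  elif 'q_proj' in name or 'query' in name:
    return 'ATTENTION_Q'
  elif 'v_proj' in name or 'value' in name:
    return 'ATTENTION_V'
  return 'WEIGHT'


assert classify('self_attn.kv_proj.weight') == 'ATTENTION_KV'  # not ATTENTION_V
assert classify('self_attn.in_proj_weight') == 'ATTENTION_QKV'
```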
15 changes: 11 additions & 4 deletions algorithmic_efficiency/profiler.py
@@ -11,6 +11,13 @@
 from typing import Dict, Generator, List, Optional, Tuple
 
 import numpy as np
+import torch
+
+
+def _get_monotonic_time() -> float:
+  if torch.cuda.is_available() and torch.cuda.is_initialized():
+    torch.cuda.synchronize()
+  return time.monotonic()
 
 
 class Profiler:
@@ -20,7 +27,7 @@ def __init__(self, local_rank: Optional[int] = None) -> None:
 
     self.current_actions: Dict[str, float] = {}
     self.recorded_durations = defaultdict(list)
-    self.start_time = time.monotonic()
+    self.start_time = _get_monotonic_time()
 
   def set_local_rank(self, local_rank: int) -> None:
     self._local_rank = local_rank
@@ -35,12 +42,12 @@ def start(self, action_name: str) -> None:
     if action_name in self.current_actions:
       raise ValueError(
           f'Attempted to start {action_name} which has already started.')
-    self.current_actions[action_name] = time.monotonic()
+    self.current_actions[action_name] = _get_monotonic_time()
 
   def stop(self, action_name: str) -> None:
     if self.local_rank != 0:
       pass
-    end_time = time.monotonic()
+    end_time = _get_monotonic_time()
     if action_name not in self.current_actions:
       raise ValueError(f'Attempting to stop recording an action '
                        f'({action_name}) which was never started.')
@@ -59,7 +66,7 @@ def profile(self, action_name: str) -> Generator:
   def _make_report(
       self
   ) -> Tuple[List[Tuple[str, float, float, int, float, float]], int, float]:
-    total_duration = time.monotonic() - self.start_time
+    total_duration = _get_monotonic_time() - self.start_time
     report = [(str(a),
                float(np.mean(d)),
                float(np.std(d)),
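Why this matters: CUDA kernels are launched asynchronously, so `time.monotonic()` on its own can return before queued GPU work has finished, under-reporting durations. The `torch.cuda.is_initialized()` guard avoids creating a CUDA context just to read the clock (cf. the "guards for cuda context initialization" commit above). A small sketch of the effect, assuming a CUDA device is available (it falls back to CPU otherwise):

```python
import time

import torch


def _get_monotonic_time() -> float:
  # Same helper as in the diff: flush queued CUDA work before reading the clock.
  if torch.cuda.is_available() and torch.cuda.is_initialized():
    torch.cuda.synchronize()
  return time.monotonic()


device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(4096, 4096, device=device)

start = _get_monotonic_time()
for _ in range(10):
  x = x @ x  # on CUDA these matmuls are only queued, not yet executed
elapsed = _get_monotonic_time() - start  # the sync makes this a true duration
print(f'10 matmuls took {elapsed:.4f}s on {device}')
```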
7 changes: 4 additions & 3 deletions algorithmic_efficiency/spec.py
@@ -39,9 +39,10 @@ class ParameterType(enum.Enum):
   ATTENTION_V = 10
   ATTENTION_OUT = 11
   ATTENTION_QKV = 12  # This is used for implementations that fuse QKV together.
-  # We need to split this out because otherwise fused QKV models will have a
-  # different number of biases.
-  ATTENTION_BIAS = 13
+  ATTENTION_KV = 13  # This is used for implementations that fuse KV together.
+  # We sometimes need to split this out because otherwise fused models will have
+  # a different number of biases.
+  ATTENTION_BIAS = 14
 
 
 # Of course, Tensor knows its shape and dtype.
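For intuition on the new enum member: a fused KV implementation computes K and V with a single projection, so it has one weight (and at most one bias) where an unfused module has two of each; without a dedicated type, parameter-count checks across implementations would mismatch. A hedged sketch (the module and names are illustrative, not the repo's):

```python
import torch

embed_dim = 16
# One fused projection produces both K and V...
kv_proj = torch.nn.Linear(embed_dim, 2 * embed_dim, bias=True)

x = torch.randn(4, 10, embed_dim)  # (batch, seq_len, embed_dim)
k, v = kv_proj(x).chunk(2, dim=-1)  # ...split apart after the matmul
assert k.shape == v.shape == (4, 10, embed_dim)

# ...so the fused module has 1 weight + 1 bias where separate K and V
# projections have 2 + 2: hence the distinct ATTENTION_KV parameter type.
assert len(list(kv_proj.parameters())) == 2
```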
2 changes: 1 addition & 1 deletion algorithmic_efficiency/workloads/criteo1tb/criteo1tb_pytorch/workload.py
@@ -233,7 +233,7 @@ def _eval_batch(self,
     summed_loss = self.loss_fn(
         label_batch=batch['targets'], logits_batch=logits,
         mask_batch=weights)['summed']
-    return summed_loss
+    return summed_loss.to(dtype=torch.float64)
 
 
 class Criteo1TbDlrmSmallTestWorkload(Criteo1TbDlrmSmallWorkload):
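Why cast the summed loss to float64: per-batch sums are accumulated over tens of millions of Criteo eval examples, and a float32 accumulator stops increasing once the running total passes 2**24, silently biasing the reported loss. A tiny illustration of the failure mode:

```python
import numpy as np

total = np.float32(2.0**24)  # 16,777,216: the limit of exact float32 integers
print(total + np.float32(1.0) == total)  # True -- the increment is lost
print(np.float64(total) + np.float64(1.0) == np.float64(total))  # False
```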
4 changes: 2 additions & 2 deletions algorithmic_efficiency/workloads/criteo1tb/workload.py
@@ -63,11 +63,11 @@ def num_eval_train_examples(self) -> int:
 
   @property
   def num_validation_examples(self) -> int:
-    return 89_000_000
+    return 83_274_637
 
   @property
   def num_test_examples(self) -> int:
-    return 89_274_637
+    return 95_000_000
 
   @property
   def train_mean(self):