[util] save intermediate tests & fallback to fuzz.crash_safe=false #58

Merged
merged 5 commits into from
Oct 3, 2022
2 changes: 1 addition & 1 deletion doc/known-issues.md
@@ -1,6 +1,6 @@
## Incompatibility of TensorFlow-GPU over fork-based crash safety

-Currently we enabled `fuzz.crash_safe=true` by default where we run the compilation & execution in a forked process as a sandbox to catch crash and timeout. However, CUDA runtime is not compatible with fork. In tensorflow, the symptom is crash in forked subprocess:
+`fuzz.crash_safe=true` allows running compilation & execution in a forked process as a sandbox to catch crash and timeout. However, CUDA runtime is not compatible with fork. In tensorflow, the symptom is crash in forked subprocess:

```txt
F tensorflow/stream_executor/cuda/cuda_driver.cc:219] Failed setting context: CUDA_ERROR_NOT_INITIALIZED: initialization error
```
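
The fork-based sandbox described in this known-issue note can be sketched roughly as follows. This is a minimal illustration of the general technique, not NNSmith's actual implementation: `run_sandboxed` and its `"ok"`/`"crash"`/`"timeout"` return values are hypothetical names, and the sketch assumes a Unix-like system where the `fork` start method is available.

```python
import multiprocessing as mp
import os
import time


def run_sandboxed(fn, timeout_s):
    """Run `fn` in a forked child process.

    Returns "ok" if the child exits cleanly, "crash" if it exits with a
    non-zero status or is killed by a signal, and "timeout" if it is still
    running after `timeout_s` seconds. (Hypothetical helper for illustration.)
    """
    ctx = mp.get_context("fork")  # fork is only available on Unix-like systems
    proc = ctx.Process(target=fn)
    proc.start()
    proc.join(timeout_s)
    if proc.is_alive():
        # The child exceeded its time budget: kill it and reap it.
        proc.kill()
        proc.join()
        return "timeout"
    # A negative exit code means the child was terminated by a signal.
    return "ok" if proc.exitcode == 0 else "crash"


ok_status = run_sandboxed(lambda: None, timeout_s=5)
crash_status = run_sandboxed(os.abort, timeout_s=5)  # simulates a fatal abort
timeout_status = run_sandboxed(lambda: time.sleep(30), timeout_s=0.2)
```

Because the child is a forked copy of the parent, any state the parent has already initialized is inherited rather than set up afresh, which is where the CUDA incompatibility noted above comes from.
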
12 changes: 12 additions & 0 deletions nnsmith/cli/fuzz.py
@@ -148,6 +148,11 @@ def __init__(
             self.timeout_s, int
         ), "`fuzz.time` must be an integer (with `s` (default), `m`/`min`, or `h`/`hr`)."

+        self.save_test = cfg["fuzz"]["save_test"]
+        if isinstance(self.save_test, str):  # path of root dir.
+            FUZZ_LOG.info(f"Saving all intermediate testcases to {self.save_test}")
+            mkdir(self.save_test)
+
     def make_testcase(self, seed) -> TestCase:
         mgen_cfg = self.cfg["mgen"]
         gen = random_model_gen(
@@ -210,6 +215,13 @@ def run(self):

             if not self.validate_and_report(testcase):
                 FUZZ_LOG.warning(f"Failed model seed: {seed}")
+
+            if self.save_test:
+                testcase_dir = os.path.join(
+                    self.save_test, f"{time.time() - start_time:.3f}"
+                )
+                mkdir(testcase_dir)
+                testcase.dump(testcase_dir)
             self.status.n_testcases += 1
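
The directory-naming scheme in this hunk can be tried in isolation with a small sketch. It assumes only what the diff shows: a root path (or `null`/`None` to disable saving) and a sub-directory named after the seconds elapsed since fuzzing started, formatted to three decimals. `make_testcase_dir` is a hypothetical helper, and `os.makedirs` stands in for the project's own `mkdir` utility, whose exact behavior is not shown here.

```python
import os
import tempfile
import time


def make_testcase_dir(save_root, start_time):
    """Create and return a directory under `save_root` named by elapsed seconds.

    Returns None when saving is disabled, i.e. `save_root` is None.
    (Hypothetical helper; not part of the PR.)
    """
    if not save_root:
        return None
    path = os.path.join(save_root, f"{time.time() - start_time:.3f}")
    os.makedirs(path, exist_ok=True)  # stand-in for the `mkdir` helper in the diff
    return path


root = tempfile.mkdtemp()
start = time.time()
saved = make_testcase_dir(root, start)    # e.g. <root>/0.000
skipped = make_testcase_dir(None, start)  # saving disabled
```

With millisecond-precision names, two test cases that finish within the same millisecond would map to the same directory, so this scheme relies on each iteration taking longer than that.
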


9 changes: 5 additions & 4 deletions nnsmith/config/main.yaml
@@ -50,17 +50,18 @@ fuzz:
   time: 14400
   root: "???"
   seed: null
-  crash_safe: true
-  test_timeout: null # second.
+  crash_safe: false
+  test_timeout: null
+  save_test: null

 filter:
   type: []
   patch: []

 cmp:
-  equal_nan: true # skip regarding it as a bug if with fp exception values.
+  equal_nan: true # skip regarding it as a bug if with fp exception values.

-  raw_input: null # path to raw input data (Dict[str, np.ndarray])
+  raw_input: null # path to raw input data (Dict[str, np.ndarray])

   oracle: "auto"
   # "auto": use `oracle.pkl` in local path;
15 changes: 11 additions & 4 deletions nnsmith/materialize/torch/forward.py
@@ -21,10 +21,17 @@

 @operator_impl(Constant)
 def forward_fn(op: Constant):
-    data = torch.randn(op.abs_tensor.shape).to(op.abs_tensor.dtype.torch())
-    return lambda: torch.nn.parameter.Parameter(
-        data, requires_grad=data.is_floating_point()
-    )
+    class ConstFn(torch.nn.Module):
+        def __init__(self, data) -> None:
+            super().__init__()
+            self.data = torch.nn.parameter.Parameter(
+                data, requires_grad=data.is_floating_point()
+            )
+
+        def forward(self):
+            return self.data
+
+    return ConstFn(torch.randn(op.abs_tensor.shape).to(op.abs_tensor.dtype.torch()))


 @operator_impl(ReLU)