Standardize fast pipeline tests with PipelineTestMixin #1526
Conversation
`tests/test_pipelines_common.py` (outdated)

```python
max_diff = np.abs(output - output_loaded).max()
self.assertLessEqual(max_diff, 1e-5)

def test_tuple_output(self):
```
Nice! That's very good for now! In the future we could expand this test to automatically make sure that not only the first tuple element but all the others match as well. For now this would apply, e.g., to the stable diffusion pipeline.
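A rough sketch of what that extended check could look like, comparing every element rather than just the first (the `get_dummy_components` / `get_dummy_inputs` helper names and an OrderedDict-style dict output are assumptions about the mixin API, not code from this PR):

```python
import numpy as np

def test_dict_tuple_outputs_match(self):
    # Sketch only: compare every field of the dict output against the
    # corresponding element of the tuple output, not just the first one.
    pipe = self.pipeline_class(**self.get_dummy_components())  # assumed helper
    pipe.to(torch_device)

    # get_dummy_inputs is assumed to re-seed its generator on every call,
    # so both pipeline runs below are deterministic and comparable.
    output_dict = pipe(**self.get_dummy_inputs(torch_device), return_dict=True)
    output_tuple = pipe(**self.get_dummy_inputs(torch_device), return_dict=False)

    # Assumes the dict output behaves like an OrderedDict (diffusers BaseOutput).
    for dict_value, tuple_value in zip(output_dict.values(), output_tuple):
        max_diff = np.abs(np.array(dict_value) - np.array(tuple_value)).max()
        self.assertLess(max_diff, 1e-5)
```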
I like the approach in principle; I think it works well to simplify things and make the tests coherent!
Design looks nice to me! Just think we could add some more common tests. Note that the more tests we add here, the more "safety" we get for free. Some additional tests we could add:

- We can for now assume that every pipeline has a `num_inference_steps` parameter IMO. I'd also add a test that runs every pipeline with 2-3 different `num_inference_steps` values and makes sure that a) the output sizes are always the same, and b) runtime is lower with fewer steps, ... (see the sketch after this comment)
- `from_pretrained` from the Hub and locally -> IMO every pipeline should also define a parameter `dummy_components_on_hub` in addition to `get_common_pipeline_components()`. These dummy components could then be used to check that loading from the Hub and saving/loading locally give the same results.
- Use `dummy_components_on_hub` and add an `EXPECTED_SLICE` to create a common test that makes sure that dummy weights on the Hub give the expected results. This should help us get rid of a lot of boilerplate fast tests that check for numerical equivalence.
- The `def components` function can/should also be tested here. E.g. just check that `components` is the same as `get_common_pipeline_components` and also the same as the init signature without the optional arguments.
- Enabling/disabling the progress bar can be checked.
- Load/save with safetensors.

=> Contrary to `src/diffusers`, I'm really happy to go full abstraction mode here, so the more "common" tests we add here, the faster we'll be able to add new pipelines going forward.

Also, some signature tests would be nice, e.g. for now we could maybe "force" every pipeline to have a …
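For illustration, a minimal sketch of the first suggested test (the `get_dummy_components` / `get_dummy_inputs` helper names are assumptions about the mixin API, not part of this PR):

```python
import numpy as np

def test_num_inference_steps(self):
    # Sketch: the output shape must not depend on the number of denoising steps.
    pipe = self.pipeline_class(**self.get_dummy_components())  # assumed helper
    pipe.to(torch_device)
    pipe.set_progress_bar_config(disable=None)

    shapes = []
    for num_steps in (2, 3):
        inputs = self.get_dummy_inputs(torch_device)  # assumed helper
        inputs["num_inference_steps"] = num_steps
        # Assumes the dummy inputs request numpy output so shapes are comparable.
        shapes.append(np.array(pipe(**inputs)[0]).shape)

    self.assertEqual(shapes[0], shapes[1], "Output size should not depend on num_inference_steps.")
```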
Added the requested tests, except for: …
```python
output_loaded = pipe_loaded(**inputs)[0]

max_diff = np.abs(output - output_loaded).max()
self.assertLess(max_diff, 3e-3, "The output of the fp16 pipeline changed after saving and loading.")
```
We might have an opportunity to make fp16 more stable somewhere; the error of about 0.002 even when running the same pipeline twice (without saving) seems suspicious to me.
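One way to narrow this down would be to run the identical fp16 pipeline twice with the same seeds and no save/load in between; if the ~2e-3 drift already shows up there, the fp16 kernels rather than serialization are the cause. A sketch (passing a dtype to `to()` and the dummy helpers are assumptions about the pipeline/mixin API):

```python
import numpy as np
import torch

def test_float16_run_to_run(self):
    # Sketch: isolate run-to-run fp16 variance from save/load effects.
    pipe = self.pipeline_class(**self.get_dummy_components())  # assumed helper
    pipe.to(torch_device, torch.float16)  # assumes to() accepts a dtype

    # get_dummy_inputs is assumed to re-seed its generator on every call.
    output_1 = pipe(**self.get_dummy_inputs(torch_device))[0]
    output_2 = pipe(**self.get_dummy_inputs(torch_device))[0]

    max_diff = np.abs(np.array(output_1) - np.array(output_2)).max()
    # If this is already ~2e-3, the 3e-3 tolerance above is absorbing fp16
    # drift, not serialization error.
    self.assertLess(max_diff, 3e-3)
```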
```diff
 # TODO: address the non-deterministic text encoder (fails for save-load tests)
 # torch.manual_seed(0)
 # text_encoder_config = RobertaSeriesConfig(
 #     hidden_size=32,
 #     project_dim=32,
 #     intermediate_size=37,
 #     layer_norm_eps=1e-05,
 #     num_attention_heads=4,
 #     num_hidden_layers=5,
 #     vocab_size=5002,
 # )
 # text_encoder = RobertaSeriesModelWithTransformation(text_encoder_config)

 torch.manual_seed(0)
-config = RobertaSeriesConfig(
+text_encoder_config = CLIPTextConfig(
     bos_token_id=0,
     eos_token_id=2,
```
@patil-suraj the AltDiffusion pipeline produces non-matching outputs if I run it with the same inputs twice. Replacing `RobertaSeriesModelWithTransformation` with `CLIPTextModel` helped, so the Roberta-based text encoder is probably the culprit.
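A minimal repro for isolating the module could look like this (hypothetical sketch: the config mirrors the dummy one above, and indexing the first output field is an assumption about the model's return type):

```python
import torch

# Hypothetical determinism check for the suspect text encoder.
torch.manual_seed(0)
text_encoder = RobertaSeriesModelWithTransformation(text_encoder_config)
text_encoder.eval()  # rule out dropout as the source of randomness

input_ids = torch.tensor([[0, 5, 6, 7, 2]])  # token ids within vocab_size
with torch.no_grad():
    out_1 = text_encoder(input_ids)[0]  # assumes the first field is the projection
    out_2 = text_encoder(input_ids)[0]

# If this fails, the encoder itself is non-deterministic at inference time.
assert torch.allclose(out_1, out_2), "text encoder is non-deterministic"
```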
Hmmm weird, could you maybe open a separate issue for this? I can look into it :-) We should test with `RobertaSeriesConfig` here IMO :-)
```python
def test_save_load_local(self):
    if torch_device == "mps" and self.pipeline_class in (
        DanceDiffusionPipeline,
        CycleDiffusionPipeline,
        StableDiffusionImg2ImgPipeline,
    ):
        # FIXME: inconsistent outputs on MPS
        return
```
cc @pcuenca I might need your expertise with these, whenever you have time 😅
As discussed offline, let's maybe open an issue here. But it's very nice that we spotted these bugs :-)
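If these exclusions stay around for a while, `skipTest` would keep them visible as skips in the test report rather than silently passing via an early return. A sketch (the tuple name is made up here):

```python
def test_save_load_local(self):
    # Hypothetical grouping of the pipelines with inconsistent MPS outputs.
    mps_inconsistent = (DanceDiffusionPipeline, CycleDiffusionPipeline, StableDiffusionImg2ImgPipeline)
    if torch_device == "mps" and self.pipeline_class in mps_inconsistent:
        # unittest reports this as "skipped" instead of a silent pass.
        self.skipTest("FIXME: inconsistent outputs on MPS")
    ...
```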
```diff
 torch.backends.cuda.matmul.allow_tf32 = False


-class KarrasVePipelineFastTests(PipelineTesterMixin, unittest.TestCase):
+class KarrasVePipelineFastTests(unittest.TestCase):
```
Guess this whole class needs an API update, no? Could we maybe also open an issue for this?
```python
@property
def pipeline_class(self) -> Union[Callable, DiffusionPipeline]:
    raise NotImplementedError(
```
very nice! This forces new models to be nicely tested :-)
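A concrete subclass then just overrides the property, e.g. (a sketch; the real child classes may use a plain class attribute instead):

```python
import unittest

from diffusers import DanceDiffusionPipeline

class DanceDiffusionPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    @property
    def pipeline_class(self):
        # Without this override, every inherited common test would raise
        # NotImplementedError, so a new pipeline cannot forget to register itself.
        return DanceDiffusionPipeline
```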
```python
max_diff = np.abs(output_with_offload - output_without_offload).max()
self.assertLess(max_diff, 1e-5, "XFormers attention should not affect the inference results")

def test_progress_bar(self):
```
nice!
```diff
@@ -9,4 +27,347 @@ class PipelineTesterMixin:
     equivalence of dict and tuple outputs, etc.
     """

-    pass
+    # set these parameters to False in the child class if the pipeline does not support the corresponding functionality
+    test_attention_slicing = True
```
very cool to reuse the transformers mechanism here!
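The opt-out pattern would then look roughly like this (a sketch of the mechanism only; the guard inside the common test and the child class are assumptions for illustration):

```python
import unittest

class PipelineTesterMixin:
    # Default: every pipeline is expected to support attention slicing.
    test_attention_slicing = True

    def test_attention_slicing_forward_pass(self):
        if not self.test_attention_slicing:
            return  # the child class opted out of this common test
        ...  # run the pipeline with and without slicing and compare outputs


class MyPipelineFastTests(PipelineTesterMixin, unittest.TestCase):
    # Hypothetical pipeline without attention layers: opt out of the check.
    test_attention_slicing = False
```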
Great work! Let's merge it!
Merge commit message (#1526):

* [WIP] Standardize fast pipeline tests with PipelineTestMixin
* refactor the sd tests a bit
* add more common tests
* add xformers
* add progressbar test
* cleanup
* upd fp16
* CycleDiffusionPipelineFastTests
* DanceDiffusionPipelineFastTests
* AltDiffusionPipelineFastTests
* StableDiffusion2PipelineFastTests
* StableDiffusion2InpaintPipelineFastTests
* StableDiffusionImageVariationPipelineFastTests
* StableDiffusionImg2ImgPipelineFastTests
* StableDiffusionInpaintPipelineFastTests
* remove unused mixins
* quality
* add missing inits
* try to fix mps tests
* fix mps tests
* add mps warmups
* skip for some pipelines
* style
* Update tests/test_pipelines_common.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>