[fx/tuning] tune performance on rotor with meta info. #1599
Conversation
@@ -50,7 +54,7 @@ def _is_sink() -> bool:
        bool
    """

-    return not sum([v for _, v in deps.items()])
+    return not sum([v for _, v in deps.items()]) and not any(map(is_inplace, n.users))
Could you add a simple example here to show the difference between the new linearize and the older version?
[input] 15 15 15
[conv1] 78 78 78
[bn1, relu] 78 78 78
[maxpool] 20 78 59
[layer1_0_conv1, layer1_0_bn1, layer1_0_relu, layer1_0_conv2, layer1_0_bn2, add, layer1_0_relu_1] 20 78 59
[layer1_1_conv1, layer1_1_bn1, layer1_1_relu, layer1_1_conv2, layer1_1_bn2, add_1, layer1_1_relu_1] 20 78 39
[layer2_0_conv1, layer2_0_bn1, layer2_0_relu, layer2_0_conv2, layer2_0_bn2, layer2_0_downsample_0, layer2_0_downsample_1, add_2, layer2_0_relu_1] 10 49 30
[layer2_1_conv1, layer2_1_bn1, layer2_1_relu, layer2_1_conv2, layer2_1_bn2, add_3, layer2_1_relu_1] 10 39 20
[layer3_0_conv1, layer3_0_bn1, layer3_0_relu, layer3_0_conv2, layer3_0_bn2, layer3_0_downsample_0, layer3_0_downsample_1, add_4, layer3_0_relu_1] 5 25 15
[layer3_1_conv1, layer3_1_bn1, layer3_1_relu, layer3_1_conv2, layer3_1_bn2, add_5, layer3_1_relu_1] 5 20 10
[layer4_0_conv1, layer4_0_bn1, layer4_0_relu, layer4_0_conv2, layer4_0_bn2, layer4_0_downsample_0, layer4_0_downsample_1, add_6, layer4_0_relu_1] 3 13 12
[layer4_1_conv1, layer4_1_bn1, layer4_1_relu, layer4_1_conv2, layer4_1_bn2, add_7, layer4_1_relu_1] 0 8 3
[avgpool] 0 0 1
[flatten] 1 1 1
[fc] 1 1 0
[input] 15 15 15
[conv1] 78 78 78
[bn1] 0 0 0
[relu] 78 78 78
[maxpool] 20 78 78
[layer1_0_conv1, layer1_0_bn1, layer1_0_relu, layer1_0_conv2, layer1_0_bn2, add] 0 58 0
[layer1_0_relu_1] 20 20 78
[layer1_1_conv1, layer1_1_bn1, layer1_1_relu, layer1_1_conv2, layer1_1_bn2, add_1] 0 58 0
[layer1_1_relu_1] 20 20 49
[layer2_0_conv1, layer2_0_bn1, layer2_0_relu, layer2_0_conv2, layer2_0_bn2, layer2_0_downsample_0, layer2_0_downsample_1, add_2] 0 39 0
[layer2_0_relu_1] 10 10 39
[layer2_1_conv1, layer2_1_bn1, layer2_1_relu, layer2_1_conv2, layer2_1_bn2, add_3] 0 29 0
[layer2_1_relu_1] 10 10 25
[layer3_0_conv1, layer3_0_bn1, layer3_0_relu, layer3_0_conv2, layer3_0_bn2, layer3_0_downsample_0, layer3_0_downsample_1, add_4] 0 20 0
[layer3_0_relu_1] 5 5 20
[layer3_1_conv1, layer3_1_bn1, layer3_1_relu, layer3_1_conv2, layer3_1_bn2, add_5] 0 15 0
[layer3_1_relu_1] 5 5 13
[layer4_0_conv1, layer4_0_bn1, layer4_0_relu, layer4_0_conv2, layer4_0_bn2, layer4_0_downsample_0, layer4_0_downsample_1, add_6] 0 10 0
[layer4_0_relu_1] 3 3 12
[layer4_1_conv1, layer4_1_bn1, layer4_1_relu, layer4_1_conv2, layer4_1_bn2, add_7] 0 8 0
[layer4_1_relu_1] 0 0 3
[avgpool] 0 0 1
[flatten] 1 1 1
[fc] 1 1 0
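The two listings above can be read as the node partitions produced by the two versions of linearize. As a toy illustration only (this is not the project's code, and the sink predicate here is made up), a linearize-style pass walks nodes in order and closes the current group whenever a sink is reached:

```python
def linearize(nodes, is_sink):
    # Walk nodes in topological order, closing the current group at each sink.
    groups, current = [], []
    for n in nodes:
        current.append(n)
        if is_sink(n):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

# With a different sink predicate, the same node sequence is cut differently,
# which is exactly the kind of difference shown in the two listings above:
nodes = ["conv1", "bn1", "relu", "maxpool"]
print(linearize(nodes, lambda n: n in ("relu", "maxpool")))
# [['conv1', 'bn1', 'relu'], ['maxpool']]
```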
Great Work!
@@ -381,7 +326,7 @@ def solver_rotor(gm: ColoGraphModule,
                  mem_limit: int,
                  mem_slots: int = 500,
                  cnode: List[str] = None,
-                 eps: float = 0.02) -> ColoGraphModule:
+                 eps: float = 0.0) -> ColoGraphModule:
should eps be a very small but non-zero value? e.g. 1e-6
So eps = 0.0 means no memory decay?
Yes, the default setting is 0.0
ok
And the memory decay is calculated by
Actually, eps will be something around 0.05 or less; 1e-6 is too small, as the memory will be discretized.
So decay is actually unnecessary if I can estimate the memory accurately. This can be removed in the future once I have tested the performance of all models.
I think eps is OK; you can just provide the equation for memory decay in line 338 to explain how eps affects memory decay.
I think this option should be exposed to users: we might not be able to keep up with every model in reality, so there may be cases where our meta info gives bad estimations. With this option, users can tune the solver themselves if necessary.
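As a rough illustration of how eps could act as a memory-decay factor once the budget is split into `mem_slots` units (this is a hedged sketch; the function name and formula are assumptions, not the actual `solver_rotor` internals):

```python
def discretized_budget(mem_limit: int, mem_slots: int = 500, eps: float = 0.0) -> int:
    # Shrink the usable budget by a factor of (1 - eps), then split it into
    # mem_slots units. eps = 0.0 means no decay: the estimate is fully trusted.
    mem_unit = mem_limit * (1.0 - eps) // mem_slots
    return int(mem_unit)

print(discretized_budget(10_000_000, eps=0.0))   # 20000
print(discretized_budget(10_000_000, eps=0.05))  # 19000
```

This also shows why a value like 1e-6 is pointless here: after discretization it changes the unit size by at most one, while something around 0.05 leaves a visible safety margin against estimation error.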
What's new?
- Phase.LOSS in op-level estimation: use torch.autograd.backward() instead.
Tests
- ResNet
- DenseNet