
Should I regenerate the savedmodel when I use a new project #10

Closed
Colibrow opened this issue Jun 7, 2021 · 22 comments

Comments

@Colibrow commented Jun 7, 2021

Hi @yundiqian, I have migrated the demo setup to the chrome/v8 project and got a 5% reduction in binary size. I'd like to know whether I need to regenerate the saved model or whether I can reuse the exact one generated for Fuchsia, etc.

@Colibrow (Author) commented Jun 7, 2021

After testing, the original saved model could not be reused, so I'm closing this!

@Colibrow closed this as completed Jun 7, 2021
@mtrofin (Collaborator) commented Jun 7, 2021

When you say "the original saved model could not be reused", do you mean you could not build the compiler with it embedded, or that its performance wasn't as good as that of the model you trained on chrome/v8?

@Colibrow (Author) commented Jun 8, 2021

> When you say "the original saved model could not be reused", do you mean you could not build the compiler with it embedded, or that its performance wasn't as good as that of the model you trained on chrome/v8?

The compiler builds okay, but the binary size is bigger than the one that doesn't use the model, so I am training a new model for my personal project.

@yundiqian (Collaborator) commented:

I see, that is possible: Fuchsia code may be quite different from the v8 code, so the model trained on Fuchsia does not work well on v8.

@Colibrow (Author) commented Jun 9, 2021

> I see, that is possible: Fuchsia code may be quite different from the v8 code, so the model trained on Fuchsia does not work well on v8.

@yundiqian @mtrofin
Sadly, I found that the model trained specifically for this project didn't work either: the .so size was bigger than the one built without the model. Is there any way to find out which compile flag influences the result? Do I need to disable the -flto flag?

@mtrofin (Collaborator) commented Jun 9, 2021

We don't support LTO currently.

The model included with llvm is a reasonable reference, but we didn't use an overly comprehensive codebase when we trained it; that's why Fuchsia, for example, builds their own, which holds up well over time (as their codebase and as the compiler evolve).

@Colibrow (Author) commented Jun 9, 2021

> We don't support LTO currently.
>
> The model included with llvm is a reasonable reference, but we didn't use an overly comprehensive codebase when we trained it; that's why Fuchsia, for example, builds their own, which holds up well over time (as their codebase and as the compiler evolve).

So do I need to regenerate the model with an LLVM build that has LTO disabled?

@mtrofin (Collaborator) commented Jun 9, 2021

Yes, when training your own model, disable LTO, and (of course) make sure you're passing -Oz to clang.
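
For reference, a minimal sketch of that flag setup (file names here are placeholders; -Oz and -flto are the flags discussed above):

```sh
# Size-optimized build used while training: keep -Oz, no LTO.
clang++ -Oz -c foo.cpp -o foo.o

# Avoid this while training -- LTO isn't supported by the training flow:
# clang++ -Oz -flto -c foo.cpp -o foo.o
```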

@Colibrow (Author) commented Jun 9, 2021

I'll try it. Thanks!

@yundiqian (Collaborator) commented:

> Hi @yundiqian, I have migrated the demo setup to the chrome/v8 project and got a 5% reduction in binary size. I'd like to know whether I need to regenerate the saved model or whether I can reuse the exact one generated for Fuchsia, etc.

I'm a little confused. To be clear, which model caused the 5% size reduction, on which binary?

@Colibrow (Author) commented Jun 9, 2021

> Hi @yundiqian, I have migrated the demo setup to the chrome/v8 project and got a 5% reduction in binary size. I'd like to know whether I need to regenerate the saved model or whether I can reuse the exact one generated for Fuchsia, etc.
>
> I'm a little confused. To be clear, which model caused the 5% size reduction, on which binary?

emm.. I have tried three projects using ml-compiler-opt: I got a 7% size reduction on the Fuchsia demo, 5% on the chrome/v8 build (trained 100*2000 to save time), and -2% on my personal project (-.-)
I am retraining the third model because I trained it with -flto last time.
Plus, I am migrating the model into an Android CMake project whose stripped binary size is 1.7 MB or so.

@yundiqian (Collaborator) commented:

> Hi @yundiqian, I have migrated the demo setup to the chrome/v8 project and got a 5% reduction in binary size. I'd like to know whether I need to regenerate the saved model or whether I can reuse the exact one generated for Fuchsia, etc.
>
> I'm a little confused. To be clear, which model caused the 5% size reduction, on which binary?
>
> emm.. I have tried three projects using ml-compiler-opt: I got a 7% size reduction on the Fuchsia demo, 5% on the chrome/v8 build (trained 100*2000 to save time), and -2% on my personal project (-.-)
> I am retraining the third model because I trained it with -flto last time.
> Plus, I am migrating the model into an Android CMake project whose stripped binary size is 1.7 MB or so.

Got it, so it's 3 projects instead of 2 :) Is the "Android CMake project whose stripped binary size is 1.7 MB or so" a 4th project, different from your personal project?

In addition to retraining without -flto, you can also try our model included in llvm --- this is a model that we found generalizable across SPEC, so probably generalizable to your project as well.

@Colibrow (Author) commented Jun 9, 2021

Okay

> In addition to retraining without -flto, you can also try our model included in llvm --- this is a model that we found generalizable across SPEC, so probably generalizable to your project as well.

I will try it ASAP. So many thanks~

@Colibrow (Author) commented Jun 9, 2021

> In addition to retraining without -flto, you can also try our model included in llvm --- this is a model that we found generalizable across SPEC, so probably generalizable to your project as well.

Unfortunately, the size after training is bigger than the original one, which is built with -flto -faddrsig / -flto -Wl,-z,norelro,-z,lazy,--icf=all, by about 3% or so.

@mtrofin (Collaborator) commented Jun 9, 2021

To make sure I understand: you trained a model on your project (without LTO, but with -Oz), and then built with that model (also without LTO, and with -Oz)?

How does that size compare, all other options being the same, with building with the default heuristic?

@Colibrow (Author) commented Jun 9, 2021

Here are the approaches (a rough sketch of these build configurations follows the list):

  1. Build the project normally with nothing changed: the binary is 1705 kb.
  2. Delete -flto, with the other flags unchanged: the binary is 1755 kb.
  3. Build LLVM with the latest model in llvm-project, with LLVM_ENABLE_LTO false and TENSORFLOW_AOT_PATH set;
     then delete -flto and add -mllvm -enable-ml-inliner=release: the binary is 1823 kb.
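
A minimal sketch of what configuration 3 could look like, assuming hypothetical paths (LLVM_ENABLE_LTO, TENSORFLOW_AOT_PATH, and -mllvm -enable-ml-inliner=release are the flags named in the list):

```sh
# Build clang with the released inlining model AOT-compiled in.
cmake -G Ninja ../llvm \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_LTO=OFF \
  -DTENSORFLOW_AOT_PATH=/path/to/tensorflow    # hypothetical: where the tensorflow package lives
ninja clang

# Build the project with that clang: keep -Oz, drop -flto, enable the ML inliner.
clang++ -Oz -mllvm -enable-ml-inliner=release -c foo.cpp -o foo.o    # foo.cpp is a placeholder
```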

@Colibrow (Author) commented Jun 9, 2021

I haven't built the project-specific model yet because it needs a lot of time to train; when it's done, I'll post the result here~

@mtrofin (Collaborator) commented Jun 9, 2021

I see now - thanks!

(fwiw - LLVM_ENABLE_LTO can be enabled for clang - just no -flto for your project)
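
In other words, something like the following (paths hypothetical):

```sh
# LTO for the clang binary itself is fine:
cmake -G Ninja ../llvm -DLLVM_ENABLE_LTO=ON -DTENSORFLOW_AOT_PATH=/path/to/tensorflow
# ...as long as the project compiled by that clang does not pass -flto.
```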

@Colibrow (Author) commented:

FYI, I've tested my personal project again, with and without the SPEC model, and found that the project-specific model is better than the SPEC one but still worse than the original build.
Here is some data (sizes in bytes):
original build (nothing changed): 1712312
-flto removed: 1763256
-flto removed, project-specific model (enable-ml-inliner): 1757576
-flto kept, project-specific model: 1716344

@yundiqian (Collaborator) commented:

Hmm... interesting; we need to look into what happens during training to debug this.

Can you share your log file from training with tensorboard.dev, following the instructions here: https://tensorboard.dev/#get-started? (It's basically running two command lines.)

When running "tensorboard dev upload --logdir logs...", set the --logdir flag to the root_dir flag you used when running train_locally.py.
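
For reference, the two command lines from the tensorboard.dev get-started page are roughly the following (the log directory below is a placeholder for your actual root_dir):

```sh
pip install -U tensorboard
tensorboard dev upload --logdir /path/to/root_dir    # the same directory passed as root_dir to train_locally.py
```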

@Colibrow (Author) commented:

> Hmm... interesting; we need to look into what happens during training to debug this.
>
> Can you share your log file from training with tensorboard.dev, following the instructions here: https://tensorboard.dev/#get-started? (It's basically running two command lines.)
>
> When running "tensorboard dev upload --logdir logs...", set the --logdir flag to the root_dir flag you used when running train_locally.py.

Okay, I'll try it.

@Colibrow (Author) commented:

I've also tried the cronet project, and the reduction is likewise not obvious... I suspect my training process may be wrong?
Here are the details of how I applied the model: https://gist.github.com/Colibrow/9d2b31bc7eff127cfe74c807fce86451
I also found that using -flto alone may reduce size more than applying the trained model alone... I will post the log file later~
