[MLGO] Upstream the corpus extraction tooling #72319

Merged

@boomanaiden154 (Contributor) commented Nov 14, 2023

This patch upstreams some of the MLGO utilities, particularly the corpus extraction tooling, into LLVM proper. The motivation for this patch is available in the RFC.

https://discourse.llvm.org/t/rfc-upstreaming-elements-of-the-mlgo-tooling/74939

github-actions bot commented Nov 14, 2023

✅ With the latest revision this PR passed the Python code formatter.

@boomanaiden154 marked this pull request as ready for review on January 15, 2024 06:35

@boomanaiden154 (Contributor, Author)

After this lands, I plan to work on getting CI up and running, both to run the tests and to publish the package.

@mtrofin (Member) left a comment

I'm assuming the files were drop-in copied from google/ml-compiler-opt, so no need to comment there; also that we would subsequently delete them from there and depend on this package.

Review threads on llvm/utils/mlgo/README.md and llvm/utils/mlgo/mlgo/__init__.py (outdated, resolved).

@boomanaiden154 (Contributor, Author)

They were mostly copied. I modified combine_training_corpus_lib.py to remove the tensorflow dependency, since that dependency felt fairly prescriptive (at least for the use case it was there for).

My plan was to do exactly as you mention: delete them from the google/ml-compiler-opt repository and make that project depend on this library.
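To illustrate what dropping the tensorflow dependency can look like, here is a minimal sketch that merges per-subdirectory corpus descriptions using only `json` and `os`, assuming the TensorFlow usage was limited to file handling (for example `tf.io.gfile`). It is not the code from combine_training_corpus_lib.py; the function name `combine_corpus_sketch`, the `corpus_description.json` layout, and the `modules` key are assumptions made for illustration.

```python
"""Hypothetical sketch: merge corpus descriptions with the standard library only."""
import json
import os

# Assumed file name for each sub-corpus description.
CORPUS_DESCRIPTION = "corpus_description.json"


def combine_corpus_sketch(root_dir):
    """Merge root_dir/*/corpus_description.json into root_dir/corpus_description.json.

    Assumptions (not taken from the upstream library): every description has a
    "modules" list, and all other keys agree across subdirectories, so the keys
    of the first description found are kept as-is.
    """
    combined = None
    for entry in sorted(os.listdir(root_dir)):
        desc_path = os.path.join(root_dir, entry, CORPUS_DESCRIPTION)
        if not os.path.isfile(desc_path):
            continue  # skip plain files and subdirectories without a description
        with open(desc_path, encoding="utf-8") as f:
            desc = json.load(f)
        # Prefix each module with its subdirectory so paths stay relative to root_dir.
        modules = [os.path.join(entry, m) for m in desc["modules"]]
        if combined is None:
            combined = dict(desc, modules=modules)
        else:
            combined["modules"].extend(modules)
    if combined is None:
        raise ValueError(f"no {CORPUS_DESCRIPTION} found under {root_dir}")
    with open(os.path.join(root_dir, CORPUS_DESCRIPTION), "w", encoding="utf-8") as f:
        json.dump(combined, f, indent=2)
    return combined
```

Prefixing each module path with its subdirectory name keeps the merged description valid relative to the root directory without needing any framework-specific file APIs.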

@petrhosek (Member)

Would it also be possible to remove the dependency on Abseil? None of the existing scripts in LLVM use it, and I don't think we should be introducing this dependency. It looks like Abseil is only used for flag parsing, logging, and testing; those should be straightforward to replace with standard libraries like argparse, logging, or unittest.
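As a rough illustration of the swap suggested above, the sketch below uses `argparse` where absl-based code would use `absl.flags`, and the standard `logging` module where it would use `absl.logging`. The flag names (`--input_dir`, `--output_dir`, `--num_workers`) and the `parse_args`/`main` helpers are hypothetical and not taken from the MLGO utilities.

```python
"""Hypothetical sketch: standard-library replacements for absl.flags and absl.logging."""
import argparse
import logging


def parse_args(argv=None):
    # argparse stands in for absl.flags; these flag names are illustrative only.
    parser = argparse.ArgumentParser(description="Example MLGO-style utility.")
    parser.add_argument("--input_dir", required=True, help="Directory to read from.")
    parser.add_argument("--output_dir", required=True, help="Directory to write to.")
    parser.add_argument("--num_workers", type=int, default=4,
                        help="Number of parallel workers.")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The standard logging module stands in for absl.logging.
    logging.basicConfig(level=logging.INFO)
    logging.info("Reading from %s, writing to %s with %d workers",
                 args.input_dir, args.output_dir, args.num_workers)
    return args


if __name__ == "__main__":
    main()
```

Tests written against `absltest.TestCase` can usually move to `unittest.TestCase` in the same spirit, since absltest builds on unittest.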

@boomanaiden154 (Contributor, Author)

Yes, I plan to remove the dependency on abseil as well. My plan was to land this with all the infrastructure set up and the code copied more or less directly, then remove the abseil dependency in a follow-up patch so that the different pieces each get reviewed appropriately.

@mtrofin (Member) commented Jan 17, 2024

Ah, if you can drop the abseil dependency, the dependency problem for tests goes away. Might it be worth dropping abseil in this patch, too?

@boomanaiden154 (Contributor, Author)

I believe this is ready to go now. All the tests have been converted to lit-style tests and everything has been wired up into a check-mlgo-utils CMake target.

I talked with Mircea about the absl dependency. The tests no longer use it, and the plan is to refactor the remaining utilities that use absl to the standard Python equivalents in a subsequent commit. For now, the tests are gated on absl actually being installed.
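The patch gates its lit tests on absl being installed; the exact lit mechanism isn't shown in this thread, so the following is only a sketch of the same idea in plain `unittest`: detect absl with `importlib.util.find_spec` and skip the dependent tests when it is missing. The `HAVE_ABSL` constant and the `AbslDependentTest` class are hypothetical names.

```python
"""Hypothetical sketch: skip absl-dependent tests when absl is not installed."""
import importlib.util
import unittest

# True only if the absl package can be found in this environment.
HAVE_ABSL = importlib.util.find_spec("absl") is not None


@unittest.skipUnless(HAVE_ABSL, "requires absl-py")
class AbslDependentTest(unittest.TestCase):
    def test_flags_module_importable(self):
        # Imported inside the test so loading this module never requires absl.
        from absl import flags
        self.assertTrue(hasattr(flags, "FLAGS"))


if __name__ == "__main__":
    unittest.main()
```

Checking with `find_spec` instead of a top-level `import absl` keeps the test module importable on machines without absl-py, so the test is reported as skipped rather than erroring.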

@mtrofin (Member) left a comment

Thanks for adding license info!

@boomanaiden154 merged commit a387bce into llvm:main on Jan 20, 2024
3 of 4 checks passed