Open source of domain to topic model. #79

Open
ydennisy opened this issue Jul 5, 2022 · 4 comments

Comments

@ydennisy

ydennisy commented Jul 5, 2022

Hi!

I have been checking out chrome://topics-internals/. Thank you for this tool; it is very useful!

I was wondering if you could go one step further and open up the domain to topic model itself?

I can see the model path and version here:

Model version: 2206021246
Model file path: /Users/*******/Library/Application Support/Google/Chrome/Profile 2/optimization_guide_prediction_model_downloads/64f1ed1f-3fb6-4a8f-aaa0-fe2c58dc1723/model.tflite

Would you consider open-sourcing the model, or at least providing a guide or the dataset it was trained on?

Any information on this would be much appreciated!

@ydennisy
Author

ydennisy commented Jul 6, 2022

tagging @jkarlin

@jkarlin
Collaborator

jkarlin commented Jul 6, 2022

Hi Dennis. The model can be loaded via TensorFlow Lite. I know a number of people have been asking how they can run it manually and so I gave it a spin myself. Here is what I did:

I basically did what's listed here.

```shell
git clone https://github.com/tensorflow/tflite-support.git
cd tflite-support
sudo apt-get install bazel-5.1.1
bazel run -c opt tensorflow_lite_support/examples/task/text/desktop:bert_nl_classifier_demo -- --model_path=<path to your tflite file as shown in chrome://topics-internals> --text="example com"
```
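If you would rather not copy the model path by hand, a small helper along these lines can list candidate files for `--model_path`. It is only a sketch: it assumes the macOS profile layout from the path quoted at the top of this issue (the `optimization_guide_prediction_model_downloads` folder), `list_tflite_models` is a hypothetical name, other operating systems keep Chrome profiles elsewhere, and the folder may hold other models too, so check the result against the path shown in chrome://topics-internals.

```shell
# Hypothetical helper: list every model.tflite file under a Chrome profile's
# optimization_guide_prediction_model_downloads folder (layout assumed from
# the macOS path in this issue). Errors, e.g. a missing folder, are silenced.
list_tflite_models() {
  profile_dir="$1"
  find "$profile_dir/optimization_guide_prediction_model_downloads" \
    -type f -name 'model.tflite' 2>/dev/null
}

# Example (macOS, "Profile 2" as in the issue):
#   list_tflite_models "$HOME/Library/Application Support/Google/Chrome/Profile 2"
```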

Replace any '-', '_', '.', and '+' characters in your input text with whitespace (' '). For example, the domain example.com becomes "example com", as in the command above.
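A minimal sketch of that substitution using the standard `tr` utility; `normalize_domain` is a hypothetical helper name, not part of tflite-support. The '-' is placed last in the character set so `tr` treats it literally rather than as a range.

```shell
# Hypothetical helper: replace '.', '_', '+' and '-' with spaces,
# one space per character, as described above.
normalize_domain() {
  printf '%s\n' "$1" | tr '._+-' '    '
}

# Usage with the demo command from above (same flags as in that command):
#   bazel run -c opt tensorflow_lite_support/examples/task/text/desktop:bert_nl_classifier_demo -- \
#     --model_path="$MODEL" --text="$(normalize_domain "example.com")"
```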

Note that the first run is slow because Bazel has to download and build a number of third-party libraries.

The output is a list of topic IDs and their scores. You can map the IDs to topic names using the taxonomy file.
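A sketch of that lookup with `awk`. The file formats here are assumptions for illustration, not the demo's actual output or the taxonomy file's published layout: it expects a taxonomy file with one `<id><TAB><name>` line per topic and a scores file with one `<id> <score>` line per result, so you would first need to extract the IDs and scores from the demo's output. `map_topic_ids` is a hypothetical name.

```shell
# Hypothetical helper: join topic IDs and scores with topic names.
#   $1: taxonomy file, lines of "<topic id><TAB><topic name>" (assumed format)
#   $2: scores file, lines of "<topic id> <score>" (assumed format)
# Prints "<id><TAB><score><TAB><name>", with UNKNOWN for IDs not in the taxonomy.
map_topic_ids() {
  taxonomy="$1"
  scores="$2"
  awk 'NR == FNR { split($0, f, "\t"); name[f[1]] = f[2]; next }
       { print $1 "\t" $2 "\t" (($1 in name) ? name[$1] : "UNKNOWN") }' \
    "$taxonomy" "$scores"
}
```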

Regarding the training data, I don't believe there are currently any plans to release it.

@jkarlin
Collaborator

jkarlin commented Jul 15, 2022

A colleague has also developed a Python-based Colab notebook that folks can use to try running the model manually.

Edit: changed the link

@ydennisy
Author

ydennisy commented Aug 5, 2022

Hey @jkarlin that is great! Thanks very much for this!
