Add LLM token classification example #4541

Merged
merged 11 commits into main from leo/token_embedding_example on Dec 18, 2023

roym899
Collaborator

@roym899 roym899 commented Dec 14, 2023

What

Adds an example that tokenizes a text, visualizes the embedding of each token (projected to 3D with UMAP), logs the text tokens with links to their corresponding embeddings, and classifies each token as a named entity (person, location, organization, or misc). The unique named entities found are also logged.
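For readers unfamiliar with token classification, here is a minimal sketch of how per-token labels can be turned into a list of unique named entities. It assumes BIO-style tags (`B-PER`, `I-PER`, `O`, ...), which is common for NER models but is an assumption here, not something taken from `main.py`. The function names `group_entities` and `unique_entities` are hypothetical.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge consecutive B-/I- tagged tokens into (text, entity_type) pairs.

    Assumes BIO tagging: "B-X" starts an entity of type X, "I-X" continues it,
    and "O" marks a token outside any entity.
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    current: Optional[str] = None

    def flush() -> None:
        nonlocal words, current
        if words and current is not None:
            entities.append((" ".join(words), current))
        words, current = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            words, current = [token], tag[2:]
        elif tag.startswith("I-") and current == tag[2:]:
            words.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity,
            # closes the open entity; the stray token itself is dropped.
            flush()
    flush()
    return entities


def unique_entities(entities: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Deduplicate entities while keeping first-seen order."""
    return list(dict.fromkeys(entities))


# Hypothetical input, for illustration only.
tokens = ["Ada", "Lovelace", "visited", "London", "and", "Ada", "Lovelace", "stayed"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-PER", "I-PER", "O"]
print(unique_entities(group_entities(tokens, tags)))
```

A real tokenizer usually produces sub-word pieces, so joining with spaces is a simplification.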

Also removed some newlines in manifest.yml to make it more consistent.

[Screenshot: llm_embedding_ner]

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
    • Full build: app.rerun.io
    • Partial build: app.rerun.io - Useful for quick testing when changes do not affect examples in any way
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

@roym899 roym899 added the "examples" (Issues relating to the Rerun examples) and "include in changelog" labels Dec 14, 2023
@teh-cmc teh-cmc self-requested a review December 15, 2023 08:12
Member

@teh-cmc teh-cmc left a comment

Works great, looks great.

Please don't spawn debugging shells without my consent though 😛

examples/python/llm_embedding_ner/main.py (5 review comments, all outdated and resolved)
@teh-cmc
Member

teh-cmc commented Dec 15, 2023

Also: at least on my machine, there's a bunch of wait time before the first logging calls arrive and again while computing the embeddings:

23-12-15_09.24.23.patched.mp4

It'd be nice if the script mentioned what it was doing in its standard output during those.

(Man, I really wish we could log a spinner thing...)

@roym899
Collaborator Author

roym899 commented Dec 15, 2023

Added a print.
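For illustration, a minimal sketch of one way to report slow steps on standard output. This is not the code that was added to `main.py`; the helper name `announce` and its `log` parameter are hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


def _say(message: str) -> None:
    # Flush so the message shows up before the slow step starts.
    print(message, flush=True)


@contextmanager
def announce(step: str, log: Callable[[str], None] = _say) -> Iterator[None]:
    """Report a step before it starts and how long it took once it ends."""
    log(f"{step}...")
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log(f"{step} done in {elapsed:.1f}s")


# Usage: wrap each slow phase; the sleep stands in for real work.
with announce("Computing embeddings"):
    time.sleep(0.01)
```

Passing `log` explicitly makes it easy to route the messages elsewhere, for example into a list in a test.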

Regarding runtime, the embeddings are currently computed twice: once for logging and once as part of the whole pipeline. Not sure if it's worth changing this. The pipeline does a bit of extra work beyond a single function call that takes the embeddings, so avoiding the duplicate computation would add some complexity to the example. I added a note for now.
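To make the trade-off concrete, here is a toy sketch of the "compute once, reuse" alternative. It does not reflect how the actual pipeline is structured; `toy_embed`, `CountingEmbedder`, and `classify` are hypothetical stand-ins, and the real pipeline would need more plumbing than a single optional argument.

```python
from typing import Callable, List, Optional, Sequence

Embedding = List[float]


def toy_embed(tokens: Sequence[str]) -> List[Embedding]:
    # Stand-in for an expensive model forward pass (hypothetical).
    return [[float(len(t)), float(sum(map(ord, t)) % 7)] for t in tokens]


class CountingEmbedder:
    """Wraps an embedding function and counts how often it is called."""

    def __init__(self, embed_fn: Callable[[Sequence[str]], List[Embedding]]) -> None:
        self.embed_fn = embed_fn
        self.calls = 0

    def __call__(self, tokens: Sequence[str]) -> List[Embedding]:
        self.calls += 1
        return self.embed_fn(tokens)


def classify(
    tokens: Sequence[str],
    embedder: CountingEmbedder,
    embeddings: Optional[List[Embedding]] = None,
) -> List[str]:
    """Toy classifier that reuses precomputed embeddings when they are given."""
    if embeddings is None:
        embeddings = embedder(tokens)
    return ["LONG" if e[0] > 4 else "SHORT" for e in embeddings]


embedder = CountingEmbedder(toy_embed)
tokens = ["Rerun", "is", "great"]
embeddings = embedder(tokens)  # computed once, e.g. for logging
labels = classify(tokens, embedder, embeddings=embeddings)  # reused here
print(labels, "embedding calls:", embedder.calls)
```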

@teh-cmc
Member

teh-cmc commented Dec 18, 2023

Any reason we're not merging this, @roym899?

@roym899
Collaborator Author

roym899 commented Dec 18, 2023

Making some small adjustments after talking to @nikolausWest

@roym899 roym899 merged commit 471af6d into main Dec 18, 2023
40 checks passed
@roym899 roym899 deleted the leo/token_embedding_example branch December 18, 2023 12:01
Labels
"examples" (Issues relating to the Rerun examples), "include in changelog"

2 participants