how to do vector arithmetic? #160

Open
sleepandpancakes opened this issue Oct 13, 2023 · 4 comments
Labels
usage General usage

Comments

@sleepandpancakes

how do i use the API to do manual vector arithmetic on vectorized words/phrases?
for example, adding an arbitrary vector to the vector corresponding to a word and returning the result?
or linearly interpolating between two vectorized words and converting the result to the corresponding word?

@rmitsch rmitsch added the usage General usage label Oct 13, 2023
@rmitsch

rmitsch commented Oct 13, 2023

You can obtain the vectors like this (see example in the readme):

import spacy

nlp = spacy.load("en_core_web_sm")
s2v = nlp.add_pipe("sense2vec")
s2v.from_disk("/path/to/s2v_reddit_2015_md")

doc = nlp("A sentence about natural language processing.")
vector = doc[3:6]._.s2v_vec

You can then use e.g. NumPy to do whatever vector arithmetic you need on the embeddings you obtained.
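For the two operations asked about (adding an arbitrary vector, and linear interpolation), a minimal NumPy sketch could look like the following. The arrays are made-up toy values standing in for vectors obtained via `._.s2v_vec`, and `lerp` is a hypothetical helper, not part of sense2vec.

```python
import numpy as np

# Toy stand-ins for vectors you would get from `._.s2v_vec` (made-up values).
vec_a = np.array([1.0, 0.0, 2.0], dtype=np.float32)
vec_b = np.array([0.0, 2.0, 2.0], dtype=np.float32)
offset = np.array([0.5, 0.5, 0.5], dtype=np.float32)

# Add an arbitrary vector to a word vector (element-wise).
shifted = vec_a + offset


def lerp(start, end, t):
    """Hypothetical helper: t=0 returns start, t=1 returns end."""
    return (1.0 - t) * start + t * end


# Point halfway between the two vectors.
midpoint = lerp(vec_a, vec_b, 0.5)
```

The results are plain arrays with no word attached to them, so mapping one back to a vocabulary entry is a separate lookup step.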

@sleepandpancakes
Author

thank you. is there a way to take an arbitrary vector and find the closest corresponding word in the vocab? i'm still having a bit of trouble understanding how i would do this

@rmitsch

rmitsch commented Oct 16, 2023

What you're looking for is a nearest neighbor search. sense2vec doesn't expose this in the public API, but there are a lot of tools for this - sorted by complexity/overhead/capabilities from low to high:
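As an illustration of the low-overhead end of that range, a brute-force cosine-similarity search in plain NumPy might look like this. It is only a sketch under assumptions: `nearest_neighbors` is a hypothetical helper, and the keys and vectors are toy data written in the `word|SENSE` style, not values loaded from a real sense2vec model.

```python
import numpy as np


def nearest_neighbors(query, keys, vectors, n=3):
    """Return the n (key, cosine similarity) pairs closest to `query`."""
    vectors = np.asarray(vectors, dtype=np.float64)
    query = np.asarray(query, dtype=np.float64)
    # Guard against division by zero for all-zero vectors.
    row_norms = np.linalg.norm(vectors, axis=1)
    row_norms[row_norms == 0] = 1.0
    query_norm = np.linalg.norm(query) or 1.0
    # Cosine similarity between the query and every row.
    sims = (vectors @ query) / (row_norms * query_norm)
    # Indices of the n highest similarities, best first.
    top = np.argsort(-sims)[:n]
    return [(keys[i], float(sims[i])) for i in top]


# Toy vocabulary (made-up keys and 2-d vectors for illustration only).
keys = ["cat|NOUN", "dog|NOUN", "car|NOUN"]
vectors = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])

query = np.array([0.9, 0.1])
results = nearest_neighbors(query, keys, vectors, n=2)
```

This compares the query against every row, which is fine for small vocabularies; for large ones, an approximate nearest neighbor index is the usual next step.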

@sleepandpancakes
Author

thank you again
