Word Embedding question 2 #28

Closed
mrJezy opened this issue Sep 30, 2020 · 1 comment

mrJezy commented Sep 30, 2020

Hi,

In the tutorial "Part 3: Build an Embeddings index from a data source", at the step where the word vectors are built, I checked the generated txt file. It contains vector representations of individual letters rather than words. Is this on purpose? Correct me if I'm wrong, but I think the file should list the words, each with its 300-dimensional vector.

Kind regards,
mrJezy

@davidmezzetti (Member)

Thank you for reporting this issue. It was not intentional; it is an error in the notebook. The function call that tokenizes the input data before it is written to the word vector input file was missing.
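
For illustration, here is a minimal sketch of how a missing tokenize step can produce per-letter entries. It is not the notebook's actual code: `simple_tokenize` and `to_training_line` are hypothetical helpers, and the regex tokenizer is only a stand-in for whatever tokenizer the tutorial uses. The point is that a Python string is itself an iterable of characters, so joining it directly writes one "token" per letter.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer: lowercase, keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_training_line(tokens):
    """Join an iterable of tokens into one space-separated training line."""
    return " ".join(tokens)


text = "Word vectors need words"

# Without tokenizing: the string is iterated character by character,
# so every letter ends up as its own "word" in the training file.
buggy = to_training_line(text)

# With tokenizing: whole words are written, which is what a word
# vector model expects as input.
fixed = to_training_line(simple_tokenize(text))

print(buggy.split())
print(fixed.split())
```

A word vector model trained on the first form would learn one vector per character, which matches the symptom described in the question.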

davidmezzetti self-assigned this Oct 1, 2020
asysc2020 pushed a commit to asysc2020/txtai that referenced this issue Dec 20, 2020
davidmezzetti added this to the v1.3.0 milestone May 13, 2021