
Pickle article IDs along with their tf-idf vectors #1

Open
martinthenext opened this issue Oct 24, 2014 · 1 comment

Comments

@martinthenext
Owner

Shocking: the correspondence between vectors in the pickled matrix and the articles for a particular source in the database is by index only. This means that when we look for the most relevant article in Fox News, we search for the closest vector in tfidf_<foxnews_source_id>.pkl. Suppose vector 234 turns out to be the closest. Then we look up Article.objects.filter(source=Fox News).all()[234], which is only correct if the queryset comes back in exactly the order the matrix was built in. It would be better to pickle a list of article database IDs along with every matrix, so that we could find the ID of the closest vector in O(1) and retrieve it from the database in O(1).
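
A minimal sketch of what this could look like, not the project's actual code. The function names (`save_index`, `closest_article_id`), the dict layout of the pickle, and the use of plain Python lists with cosine similarity are all assumptions for illustration; the real matrices are presumably some array or sparse type, and the real similarity measure may differ.

```python
import io
import math
import pickle


def save_index(fileobj, article_ids, vectors):
    """Pickle the ids and the vectors together so they cannot drift apart.

    Hypothetical helper: row i of `vectors` belongs to `article_ids[i]`.
    """
    if len(article_ids) != len(vectors):
        raise ValueError("need exactly one id per vector")
    pickle.dump({"ids": list(article_ids), "vectors": vectors}, fileobj)


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_article_id(fileobj, query):
    """Return the stored id whose vector is most similar to `query`."""
    index = pickle.load(fileobj)
    scores = [cosine(row, query) for row in index["vectors"]]
    best_row = max(range(len(scores)), key=scores.__getitem__)
    # The row number maps straight to an id; no dependence on queryset order.
    return index["ids"][best_row]


# Toy data standing in for one source's tf-idf matrix (assumed values).
buf = io.BytesIO()
save_index(
    buf,
    [101, 57, 9000],
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]],
)
buf.seek(0)
best_id = closest_article_id(buf, [0.0, 0.9, 0.1])
```

With the ID in hand, the article could be fetched with something like Article.objects.get(pk=best_id) instead of slicing a queryset by position. The stored key does not have to be a database ID; any unique, stable identifier would work the same way.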

@go1dshtein
Collaborator

Please use the article URL as its unique ID, not the SQL primary key.
