how to run simbase? #36
Please check the directory.
I ran the java -server -jar command directly in a terminal, without the start file, and then connected via redis-cli. Is there any way to import data from .txt files? Also, if I add more than 5 values to a vector (vadd) and then run "vget", it prints "(error) Unknown server error!". Is there a limit of 5?
You should set up the schemas first, just as with any RDBMS such as MySQL. Have you followed the steps described in https://github.com/guokr/simbase#a-general-application-case ? Could you describe your setup scripts and your vget command in detail? As for importing, we currently do not have such a tool, but it is very easy to write a script.
Is basis a relation and article an attribute? What I want to do is import bags of words (4623 rows, each with 10,000 vectors) and then compare them for similarity. I do not see any example scripts in the documentation. Any hints would be welcome.
Conceptually, I am not sure about your case, since I have not seen the details.
I create a bag-of-words representation from a vocabulary with Python and export it to a text file. The file looks like this (for the 10 most common words, 1 vector for each most common word):
Then I import the vectors (without 1, 2, 3, 4, 5; those are just sentence IDs) into Postgres. In Postgres I use a query to compare two bag-of-words vectors for similarity:
There are 4623 sentences, so 4623 rows. The total number of comparisons is 10,683,753. For the 10 most common words, execution takes about 20 minutes. My goal is to compare bags of words built from the 10,000 most common words, so 10,000 vectors. Judging by how long it takes for 10 vectors, the 10,000-vector case will take about 24 hours. Since Simbase works on vectors, I thought it could do these calculations faster.
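As a quick check on the comparison count, assuming every unordered pair of distinct sentences is compared exactly once, the figure can be reproduced like this:

```python
from math import comb

n_sentences = 4623

# Each unordered pair of distinct sentences is compared once: n * (n - 1) / 2.
n_pairs = comb(n_sentences, 2)
print(n_pairs)  # 10683753
```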
For the second case I wanted cosine distance, but Postgres does not have a function for it. Maybe Simbase can help with that?
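For reference, a minimal plain-Python sketch of the measure I mean. This is only an illustration of the cosine formula, not how Simbase or Postgres computes anything; the function names and the sample vectors are made up, and returning 0.0 for an all-zero vector is a convention chosen here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for all-zero vectors (an assumption)
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Distance form: 0 for identical directions, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(a, b)


# Made-up word-count vectors for two sentences.
s1 = [1, 0, 2, 0]
s2 = [2, 0, 4, 0]
print(cosine_similarity(s1, s2))
print(cosine_distance([1, 0], [0, 1]))
```

The first pair points in the same direction, so its similarity should come out at (or within rounding error of) 1; the second pair shares no words, so its distance is 1.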
For the 10-common-words example, try steps like the ones below in redis-cli. Setup:
Fill data:
Query:
Thanks. But is there any way to import vectors with a script? Entering them manually is impossible.
For the GloVe data, it is very neat to add a head to each row:
=>
Save it as a shell script, then execute it. For your text file, there are two ways:
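One possible way to script this, as a rough sketch only: generate one vadd line per row of the text file. It assumes each row has the form "id v1 v2 ..." separated by whitespace, and that the command has the shape `vadd <vector set> <id> <components...>`; the helper name `to_vadd_commands`, the vector set name `sentence`, and the sample rows are hypothetical.

```python
def to_vadd_commands(lines, vector_set="sentence"):
    """Turn rows of the form 'id v1 v2 ...' into one vadd command per row.

    The row format and the vadd argument order are assumptions.
    """
    commands = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row_id, values = fields[0], fields[1:]
        commands.append(" ".join(["vadd", vector_set, row_id, *values]))
    return commands


# Made-up rows: a sentence ID followed by word counts.
sample_rows = ["1 3 0 1", "", "2 0 2 5"]
for command in to_vadd_commands(sample_rows):
    print(command)
```

With real data, the rows would come from reading the exported text file instead of `sample_rows`. The printed commands could then be saved to a file and piped into redis-cli, which reads commands from standard input, for example with `-p 7654` if the server listens on that port.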
Thanks, importing should suit my case. And what about comparing every sentence with every other sentence?
Does it compare to other sentences or to itself? Also: `127.0.0.1:7654> rrec sentence 1 sentence`
The result is the list of IDs of the nearest sentences, ordered by distance. I am not sure whether you have read our documentation; please read it before asking basic questions. Thank you.
I installed Simbase (not sure if correctly, but there were no errors). I have an SSH connection to the server where I installed Simbase, so I can work directly on the server machine. I have no root access, but I can use sudo. When I run bin/start or sudo bin/start, it reports "command not found". Any ideas?