Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I saw sphinx_internal_id, wanting to use as UUID as ID but can't find anything online. #134

Closed
unisqu opened this issue Dec 5, 2018 · 17 comments

Comments

@unisqu
Copy link

unisqu commented Dec 5, 2018

sphinx_internal_id <- there's no documentation i can find for this.

How do I use UUID as Primary ID / Key for the search to happen? Please help. Thx

@tomatolog
Copy link
Contributor

tomatolog commented Dec 5, 2018

you might add any UUID / Primary ID / Key from any DB source as regular attribute sql_attr_bigint \ rt_attr_bigint and use that in your queries

source src : base
{
  type = mysql
  sql_query = SELECT id, text, UUID as sphinx_internal_id FROM test_table
  sql_attr_bigint = sphinx_internal_id
}

@isqad
Copy link
Contributor

isqad commented Dec 5, 2018

UUIDs is usually has type of String. I know that sphinx had limited size of store in 4Gb for strings. What about manticore?

@unisqu
Copy link
Author

unisqu commented Dec 5, 2018

my question is then

INSERT INTO rt VALUES ( [UUID], 'first record', 'test one', 123 );

is this possible? inserting UUID as primary key

@tomatolog
Copy link
Contributor

tomatolog commented Dec 5, 2018

UUIDs is usually has type of String

unlimited storage of strings \ MVA \ JSON is in progress.
However you might convert your UUID to number via any fold functions - ever CRC32 could work

@tomatolog
Copy link
Contributor

INSERT INTO rt VALUES ( [UUID], 'first record', 'test one', 123 );
is this possible? inserting UUID as primary key

you might insert your UUID in case it fits restrictions

IDS MUST BE UNIQUE UNSIGNED NON-ZERO INTEGER NUMBERS 64-BIT WIDE

otherwise you have convert your UUID to such number via any fold functions available

@unisqu
Copy link
Author

unisqu commented Dec 5, 2018

IDS MUST BE UNIQUE UNSIGNED NON-ZERO INTEGER NUMBERS 64-BIT WIDE

so i guess there's no workaround for 128 bit UUID in this case? I still don't really understand the application in this...

source src : base
{
type = mysql
sql_query = SELECT id, text, UUID as sphinx_internal_id FROM test_table
sql_attr_bigint = sphinx_internal_id
}

@tomatolog
Copy link
Contributor

so i guess there's no workaround for 128 bit UUID in this case? I still don't really understand the application in this...

you have to fold your 128 bit UUID into 64 bit in your application in case you push data into RT index, like this

int64 id = int64( ( int64(UUID>>64) ) ^ UUID);
INSERT INTO rt ( id, title, content, attr1, ref_id ) VALUES ( id, 'first record', 'test one', 123, UUID );

or use any fold functions available in your application

@unisqu
Copy link
Author

unisqu commented Dec 5, 2018

wont this allow for collision in the ID value?

   int64 id = int64( ( int64(UUID>>64) ) ^ UUID);

this is definitely not going to make the id unique... right?

@tomatolog
Copy link
Contributor

tomatolog commented Dec 5, 2018

this is definitely not going to make the id unique... right?

yes, however INSERT got failed in case document with same ID exists and you could rehash UUID with another function or might use hash function that makes less collisions, for example FNV64 should fits your need and makes no collisions.

or you might hash more data, like this

string sTitle = 'first record';
int64 id = FNV64 ( UUID, FNV64_SEED );
id = FNV64 ( sTitle, id );
INSERT INTO rt ( id, title, content, attr1, ref_id ) VALUES ( id, sTitle, 'test one', 123, UUID );

to make sure collision does not happens, however I sure id = FNV64 ( UUID ) could be enough.

@unisqu
Copy link
Author

unisqu commented Dec 5, 2018

i'm using uuid as my primary id and i understand the solution you provided.

  1. so when will int128 or uuid as id be out or in dev roadmap?

@manticoresearch
Copy link
Contributor

Hi

We don't have plans to add int128 IDs since 64bits should be enough for now. If you look at alternatives, e.g. ElasticSearch - their internal GUID is also 64 bits.

MySQL provides UUID_SHORT() which is already 64-bit int and can be used w/o any hassle with folding etc and can be sufficient in most cases (see https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_uuid-short for more details when the uniqueness is guaranteed).

@hiqbn
Copy link

hiqbn commented May 25, 2019

@tomatolog
unlimited storage of strings \ MVA \ JSON is in progress.
still not supported yet for uuid etc?

@githubmanticore
Copy link
Contributor

➤ Ilya Kuznetsov commented:

There is no more 4gb string limit in 3.0.

@hiqbn
Copy link

hiqbn commented May 25, 2019

@githubmanticore what about uuid for id or using some unique text as id?

@tomatolog
Copy link
Contributor

we already implemented daemon wide \ cluster wide auto id for percolate index and might add it into RT index too.

However have no plains for 128 wide bits numbers or any expressions at INSERT \ REPLACE statements.

Also at Manticore3 version id is regular bigint attribute that you might set and use as you wish.

@stale
Copy link

stale bot commented Dec 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Feel free to re-open the issue in case it becomes actual.

@stale stale bot added the waiting Waiting for the original poster (in most cases) or something else label Dec 17, 2019
@manticoresearch
Copy link
Contributor

Auto-id for RT has been implemented. It's available on Github and will be packaged and released in few days. Closing as non-actual, feel free to re-open in case you have any question.

@manticoresearch manticoresearch removed the waiting Waiting for the original poster (in most cases) or something else label Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants