Multivalue attributes #36

Closed
manuelportela opened this issue Feb 11, 2019 · 11 comments

@manuelportela

What is the best way to handle documents with multi-value attributes?
For example, a document with an m:n relation to another entity.

@ts-thomas
Contributor

Can you provide me a short example of a document?

@manuelportela
Author

manuelportela commented Feb 12, 2019

Sure,

const doc = [{
    id: 1,
    description: 'Doc 1 description',
    tags: [{
        id: 1,
        name: 'tag 1'
    },
    {
        id: 2,
        name: 'tag 2'
    }]
}]

I guess I could index with index.add(doc), then map over doc and add a serialized string of the corresponding "tags", e.g. tags: 'tag 1 tag 2' or tags: 'id:1 id:2'. But this could clash with a more sophisticated tokenizer, where a search for 'id:2' might also match tags like 'id:22'.

I may end up using array.filter on the fuzzy results as a second pass. I'd rather use index.search with where.
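The clash described above can be sketched in plain JavaScript, without FlexSearch: the tags are serialized into a space-separated string (the 'id:1 id:2' format from the comment), a prefix match stands in for a forward-style tokenizer, and Array.prototype.filter performs the exact second pass. The helper names `prefixMatch` and `exactFilter` are hypothetical.

```javascript
// Hypothetical illustration: tags serialized into one string per document.
const docs = [
  { id: 1, tags: "id:1 id:2" },
  { id: 2, tags: "id:22" },
];

// Prefix ("forward"-style) match on whitespace-separated tokens:
// "id:2" is a prefix of "id:22", so document 2 matches too.
function prefixMatch(docs, query) {
  return docs
    .filter((doc) => doc.tags.split(" ").some((tok) => tok.startsWith(query)))
    .map((doc) => doc.id);
}

// Second pass: keep only the ids whose document contains the query token exactly.
function exactFilter(docs, ids, query) {
  return ids.filter((id) => {
    const doc = docs.find((d) => d.id === id);
    return doc.tags.split(" ").includes(query);
  });
}

const fuzzy = prefixMatch(docs, "id:2"); // [1, 2]
const exact = exactFilter(docs, fuzzy, "id:2"); // [1]
```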

kindest regards

@ts-thomas
Contributor

ts-thomas commented Feb 12, 2019

Nice example; it opens up new requirements for indexing complex documents.

You can actually solve it like this:

const index = new FlexSearch({
    doc: {
        id: "id",
        field: {
            description: { // or use a preset
                encode: "simple",
                tokenize: "forward"
            },
            tags: {
                encode: false,
                tokenize: function(val){
                    return val.split(/[\[{,}\]]+/);
                }
            }
        }
    }
});
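To see what the custom tokenize function above works with, here is a plain JavaScript sketch that applies the same regular expression to a JSON-serialized tags array. The `.filter(Boolean)` step, which drops the empty strings produced by the leading `[{` and trailing `}]`, is an addition not present in the original config; `tokenizeTags` is a hypothetical name.

```javascript
// Same split as the "tags" tokenizer in the config above, plus an
// (added) filter that removes the empty leading/trailing tokens.
function tokenizeTags(val) {
  return val.split(/[\[{,}\]]+/).filter(Boolean);
}

const serialized = JSON.stringify([
  { id: 2, name: "tag 1" },
  { id: 222, name: "tag 2" },
]);
// serialized is '[{"id":2,"name":"tag 1"},{"id":222,"name":"tag 2"}]'

const tokens = tokenizeTags(serialized);
// ['"id":2', '"name":"tag 1"', '"id":222', '"name":"tag 2"']
```

Each serialized key/value pair becomes one whole token, so, assuming the index matches tokens exactly as the function returns them, `"id":2` and `"id":222` stay distinct.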

I used this doc for the test:

const doc = [{
    id: 1,
    description: 'Doc 1 description',
    tags: [{
        id: 2,
        name: "tag 1"
    },
    {
        id: 222,
        name: "tag 2"
    }]
},{
    id: 2,
    description: 'Doc 1 description',
    tags: [{
        id: 22,
        name: "tag 1"
    },
    {
        id: 2222,
        name: "tag 2"
    }]
}];

Then loop over the docs and serialize the "tags" field (supporting this via the document descriptor would be a nice improvement):

doc.forEach(function(val){
    val.tags = JSON.stringify(val.tags);
});

Add to index:

index.add(doc);

Search:

const results1 = index.search('"id":2', {field: "tags"}); // --> doc[0];
const results2 = index.search('"id":22', {field: "tags"}); // --> doc[1];

Alternatively:

const results = index.search(JSON.stringify({id:22}), {field: "tags"});
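Why these searches separate `"id":2` from `"id":22` can be illustrated with a tiny exact-token inverted index in plain JavaScript. This is a hypothetical stand-in, not FlexSearch's internal implementation; it assumes queries pass through the same tokenizer as indexed values, which is why `JSON.stringify({id:22})` (that is, `{"id":22}`) reduces to the same token as `"id":22`.

```javascript
// Same regex as the "tags" tokenizer, with empty tokens dropped.
function tokenizeTags(val) {
  return val.split(/[\[{,}\]]+/).filter(Boolean);
}

const docs = [
  { id: 1, tags: [{ id: 2, name: "tag 1" }, { id: 222, name: "tag 2" }] },
  { id: 2, tags: [{ id: 22, name: "tag 1" }, { id: 2222, name: "tag 2" }] },
];

// Hypothetical inverted index: token -> Set of document ids.
const tagIndex = new Map();
for (const doc of docs) {
  for (const token of tokenizeTags(JSON.stringify(doc.tags))) {
    if (!tagIndex.has(token)) tagIndex.set(token, new Set());
    tagIndex.get(token).add(doc.id);
  }
}

// Tokenize the query the same way; a document must contain every query token.
function search(query) {
  const tokens = tokenizeTags(query);
  if (tokens.length === 0) return [];
  let result = [...(tagIndex.get(tokens[0]) || [])];
  for (const token of tokens.slice(1)) {
    const ids = tagIndex.get(token) || new Set();
    result = result.filter((id) => ids.has(id));
  }
  return result;
}

const a = search('"id":2'); // [1]
const b = search(JSON.stringify({ id: 22 })); // [2]
```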

Using where is another possible (but slower) solution:

const index = new FlexSearch({
    doc: {
        id: "id",
        field: "description"
    }
});

const results = index.search("description", {
    field: "description",
    where: function(doc){
        for(let i = 0; i < doc.tags.length; i++){
            if(doc.tags[i].id === 22) return true;
        }
        return false; // no tag with id 22
    }
});

This approach does not require serializing the tags in the documents, but it is slower.
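The where callback above is just a per-document predicate. Here is a sketch of the same check as a standalone function over the test documents, using Array.prototype.some, which always returns a boolean; `hasTagId` is a hypothetical name. Because such a predicate presumably has to run against every candidate document rather than using an index lookup, this helps explain why the approach is slower.

```javascript
const docs = [
  { id: 1, description: "Doc 1 description", tags: [{ id: 2, name: "tag 1" }, { id: 222, name: "tag 2" }] },
  { id: 2, description: "Doc 1 description", tags: [{ id: 22, name: "tag 1" }, { id: 2222, name: "tag 2" }] },
];

// True if any tag on the document has the given id.
function hasTagId(doc, tagId) {
  return doc.tags.some((tag) => tag.id === tagId);
}

// Linear scan: the predicate runs once per document.
const matches = docs.filter((doc) => hasTagId(doc, 22)).map((doc) => doc.id);
```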

@manuelportela
Author

Wow,

thank you very much. That looks very promising; I still need to check it. I couldn't tell from the documentation that every field can be configured with its own tokenizer, which is great.

@ScreamZ

ScreamZ commented Jun 25, 2019

@ts-thomas Not sure if this is the same issue; I'm having some trouble understanding.

My use case is:

const docs = [
    {
      id: 1,
      materials: ["wood", "steel"],
    },
    {
      id: 2,
      materials: ["plastic"],
    },
    {
      id: 3,
      materials: ["wood"],
    },
  ];

How can I use text search to match everything that starts with or looks like "woo"?

I tried:

materials: {
          encode: false,
          tokenize(val) {
            return val.split(/\s/); // split on whitespace
          },
        },

But it doesn't seem to work...

Thanks in advance :)

@ts-thomas
Contributor

@ScreamZ Hello, just switch the tokenizer and encoder to:

materials: {
    encode: "advanced",
    tokenize: "forward"
},

The encoder covers "looks like woo"; the tokenizer covers "everything that starts with woo".

Another example:

materials: {
    encode: "extra",
    tokenize: "full"
},

This encoder covers "sounds like woo"; the tokenizer covers everything that starts with, ends with, or contains "woo".
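A rough illustration of what the "forward" and "full" tokenizers mean for the word "wood": forward indexes every prefix, full indexes every substring. This sketch is an assumption about the general technique, not FlexSearch's actual tokenizer code (which may, for example, apply minimum token lengths), and it leaves out the encoders entirely.

```javascript
// Every prefix of the word: "w", "wo", "woo", "wood".
function forwardTokens(word) {
  const tokens = [];
  for (let end = 1; end <= word.length; end++) {
    tokens.push(word.slice(0, end));
  }
  return tokens;
}

// Every distinct substring of the word, in first-seen order.
function fullTokens(word) {
  const tokens = new Set();
  for (let start = 0; start < word.length; start++) {
    for (let end = start + 1; end <= word.length; end++) {
      tokens.add(word.slice(start, end));
    }
  }
  return [...tokens];
}
```

Under forward tokenization a query for "woo" matches "wood" because "woo" is one of its prefixes; under full tokenization "ood" would match as well.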

@freder

freder commented Oct 6, 2019

@ts-thomas I just tried your suggestion with @ScreamZ's example, and it does not work for me:

const index = flexsearch.create({
	doc: {
		id: 'id',
		field: {
			materials: {
				encode: 'advanced',
				tokenize: 'forward'
			},
		},
	},
});

index.add([
	{
		id: 1,
		materials: ['wood', 'steel'],
	},
	{
		id: 2,
		materials: ['plastic'],
	},
	{
		id: 3,
		materials: ['wood'],
	},
]);

const results = index.search('woo');
console.log(results); // → []

I have essentially the same problem: I want to search documents that have array fields.
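One possible workaround, assuming the index only handles string field values (the thread suggests this but never confirms it), is to join array-valued fields into a space-separated string before calling index.add. The sketch below uses a hypothetical `flattenArrayFields` helper and a simple word-prefix search standing in for a "forward" tokenized field; note that multi-word values such as "tag 1" would be split into separate words.

```javascript
// Hypothetical helper: join array values into one space-separated string.
function flattenArrayFields(doc) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    out[key] = Array.isArray(value) ? value.join(" ") : value;
  }
  return out;
}

const docs = [
  { id: 1, materials: ["wood", "steel"] },
  { id: 2, materials: ["plastic"] },
  { id: 3, materials: ["wood"] },
].map(flattenArrayFields);
// docs[0].materials is now "wood steel"

// Stand-in for a forward-tokenized field: match any word that starts with the query.
function prefixSearch(docs, field, query) {
  return docs
    .filter((doc) => String(doc[field]).split(/\s+/).some((word) => word.startsWith(query)))
    .map((doc) => doc.id);
}

const hits = prefixSearch(docs, "materials", "woo"); // [1, 3]
```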

@edsu

edsu commented Feb 19, 2020

I'm assuming that this issue is still open because there isn't a workable solution? I tried the proposed solutions above and none of them seemed to work.

@chenningg

Is there any update on this? It seems like recent issues haven't been receiving replies... I'm trying to handle the same use case with tags:

index.add([
  {
    id: 1,
    tags: ['wood', 'steel'],
  },
  {
    id: 2,
    tags: ['wool', 'cloth'],
  },
  {
    id: 3,
    tags: ['nylon'],
  },
]);

But nothing seems to work (or even be indexed properly). Additionally, users can choose which fields to index, making custom tokenization a bit of a pain to do dynamically.
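For the dynamic case, where users pick which fields to index, the document descriptor can be built at runtime. The sketch below assumes the same descriptor shape used earlier in the thread (doc.id plus a doc.field object with per-field encode/tokenize options); `buildDocConfig` and its `overrides` parameter are hypothetical. It does not by itself fix the array-indexing problem described above.

```javascript
// Hypothetical helper: build a doc descriptor for user-selected fields.
function buildDocConfig(fields, overrides = {}) {
  // Default per-field options, taken from the suggestions earlier in the thread.
  const defaults = { encode: "advanced", tokenize: "forward" };
  const field = {};
  for (const name of fields) {
    // Each field gets its own object, so overriding one leaves the others untouched.
    field[name] = { ...defaults, ...(overrides[name] || {}) };
  }
  return { doc: { id: "id", field } };
}

const config = buildDocConfig(["tags", "description"], {
  description: { encode: "simple" },
});
```

The resulting object would be passed to the constructor, as in new FlexSearch(config) in the earlier examples.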

@carlinmack

carlinmack commented Nov 6, 2020

I think there's been a bit of confusion, and I hope to clarify a few things, as I see that "support of nested arrays in documents" is yet to be released.

The original question was about searching for key: value pairs in a structure like so:

const docs = [{
    id: 1,
    description: 'Doc 1 description',
    tags: [{
        id: 1,
        name: 'tag 1'
    },
    {
        id: 2,
        name: 'tag 2'
    }]
}]

as in, you want to search the block of JSON with queries such as "id":2.

Subsequent commenters ask to store and search for information in arrays, like so:

const docs = [{
    id: 1,
    description: 'Doc 1 description',
    tags: [ 'tag 1', 'tag 2']
}]

as in, you want to search for docs which have a certain tag.

There doesn't seem to be support or a solution for this. I thought the tokenize function would allow parsing [ 'tag 1', 'tag 2'] as "tag 1", "tag 2", but the tokenize function tokenizes the search query, not the tags when they are indexed.

Unfortunately I don't think you can store the doc as

const docs = [{
    id: 1,
    description: 'Doc 1 description',
    tags: {
         name: "tag 1",
         name: "tag 2"
    }
}]

because JSON objects can't hold duplicate keys; the second "name" would overwrite the first.
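The problem with that structure can be checked directly: a JavaScript object literal with a repeated key keeps only the last value, and JSON.parse behaves the same way, so only one of the two tags survives.

```javascript
// Duplicate keys: the last value wins and only one key remains.
const literal = { name: "tag 1", name: "tag 2" };
const parsed = JSON.parse('{"name": "tag 1", "name": "tag 2"}');
```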

I think one solution, but with caveats, would be to create a secondary index for tags like so:

var tags = [{
    id: 1,
    docID: 1,
    tag: "tag 1"
},{
    id: 2,
    docID: 1,
    tag: "tag 2"
}];
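These secondary records can be derived from documents that store tags as an array of strings. A minimal sketch, with a hypothetical `toTagRecords` helper that numbers the records sequentially; for the single document from the earlier example it yields the two records listed above.

```javascript
// Hypothetical helper: one record per (document, tag) pair.
function toTagRecords(docs) {
  const records = [];
  let nextId = 1;
  for (const doc of docs) {
    for (const tag of doc.tags) {
      records.push({ id: nextId++, docID: doc.id, tag });
    }
  }
  return records;
}

const docs = [{ id: 1, description: "Doc 1 description", tags: ["tag 1", "tag 2"] }];
const tags = toTagRecords(docs);
```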

However, I'm not sure how easy it is to join two result sets, although the searching would be efficient. I'm going to investigate this last approach and will report back.

@carlinmack

carlinmack commented Nov 6, 2020

Since we have index.find(ID) to get a document by ID, it is actually fairly trivial to merge the two result sets. Unfortunately, ranking the results is less trivial. I would assume an ordering of:

  1. matches both description and tag
  2. matches description
  3. matches tag

This means you would iterate through the description matches, checking whether each one has a tag match (O(n×m)), and then find the documents for the remaining tag matches (O(m)). This is fairly trivial for the result sets I am using, but it's a shame there is no native way to do this.

Edit: it doesn't really matter what the ordering above is; you'll still be iterating through both lists each time, which wouldn't be necessary if you could store an array in the same object.
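The ordering described above can be sketched in plain JavaScript. This assumes the description search returns an array of document ids and the tag search returns records with a docID field, as in the secondary index above; `mergeResults` is hypothetical. Using a Set for the membership checks makes the merge roughly O(n + m) instead of the O(n×m) nested iteration, though it still walks both lists. The merged ids could then be resolved with index.find(ID).

```javascript
// Order: (1) docs matching both, (2) description only, (3) tag only.
function mergeResults(descriptionIds, tagHits) {
  const tagDocIds = new Set(tagHits.map((hit) => hit.docID)); // deduplicated
  const descSet = new Set(descriptionIds);
  const both = descriptionIds.filter((id) => tagDocIds.has(id));
  const descOnly = descriptionIds.filter((id) => !tagDocIds.has(id));
  const tagOnly = [...tagDocIds].filter((id) => !descSet.has(id));
  return [...both, ...descOnly, ...tagOnly];
}

const ranked = mergeResults(
  [3, 1, 5],
  [{ id: 10, docID: 1 }, { id: 11, docID: 7 }, { id: 12, docID: 1 }, { id: 13, docID: 5 }]
); // [1, 5, 3, 7]
```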


7 participants