Skip to content
This repository has been archived by the owner on Apr 9, 2023. It is now read-only.

Update search infrastructure #35

Merged
merged 1 commit into from Dec 12, 2016
Merged

Update search infrastructure #35

merged 1 commit into from Dec 12, 2016

Conversation

vangheem
Copy link
Member

@vangheem vangheem commented Dec 6, 2016

Other things in here...

  • Added documentation on addons, applications, configuration, design
  • remove local site components: saves a lookup and then we aren't storing a persistent registry. If you want to customize per site, use layers and register adapters against layers.
  • added @catalog endpoint with POST and DELETE handlers to create/delete index
  • update plone.server application interface: app_settings, includeme, zcml--see details in docs
  • update index directive to be able to provide custom indexers apart from interface definition as well as a decorator to use a custom accessor to retrieve index data

@vangheem vangheem changed the title change catalog utility to an adapter WIP: change catalog utility to an adapter Dec 6, 2016
@vangheem
Copy link
Member Author

vangheem commented Dec 6, 2016

@bloodbare one of the problems is we have a bit of elasticsearch specific ideas mixed into core here.

For example:

    catalog(creators='text')
    catalog(subject='text')
    catalog(contributors='text')
    index(contributors='non_analyzed')
    index(creators='non_analyzed')
    index(subject='non_analyzed')

This seems confusing to me. I think we should only have an "index" directive which has sane defaults. Then further customizations can be provided with the pserver.elasticsearch package to define ways to customize the mappings.

Also, pserver.elasticsearch will need a query converter/generator to transform term query and complex queries into something meaningful for elasticsearch.

@bloodbare
Copy link
Member

As I'm seeing it, the catalog and the index have a global semantic meaning. Catalog is what in ZCatalog we have the metadata, meaning that we want to catalog that attribute. Index is defining the kind of index that we want to use.

Both are generic and maybe the problem is choosing non_analyzed and text, maybe we can define some generic way to define this directives but I think that they need to be separated and clear (cataloging vs indexing)

@vangheem
Copy link
Member Author

vangheem commented Dec 7, 2016

That seems to me more like a different between stored and not stored in the catalog right? So for elasticsearch we'd use stored: false?

Or do you mean that the distinction is if that value will be a full text type of query instead of using exact matching?

@bloodbare
Copy link
Member

My idea is:

We have a catalog metadata on the field , we need to catalog. In order to catalog we need to know which kind of type the field is it ( and its hard to get that from the fieldtype as there may be different catalog fields based on the semantic meaning of the field). The index metadata is to define if we want to store it as text of we want to analyze with something.

In ES there is the keyword / text for defining which kind of data :
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

So the main problem may be with the index directive, there would be great to allow to define which index names we want to map with which analyzers we want to apply. Looking for which generalization we can do is hard, but I'm sure that we can end with something like :

  • catalog ( metadata cataloging with the kind of field )
  • index ( field index analyzer )

Creating fake @Property schema attributes that are defined on the object to add new fields that may be analyzed with different analyzers ( or in non-ES environment with different index )

@bloodbare
Copy link
Member

Btw I don't know if I like the idea of configuring them as adapters for layers, I like the idea of utilities defined on config.json that are started with the app, less state and less steps to enable one catalog or another.

@vangheem
Copy link
Member Author

vangheem commented Dec 7, 2016

Maybe we need to have a meeting about this as I'm still a bit confused at the distinctions and how this can be done correctly.

By utilities that are started with the app, do you mean async utilities that run on the loop outside the context of the request? I guess that's fine. I'm trying to decide what is better overall for design of plone.server and that includes how we decide addons are going to be integrated. The utility route is okay but then you can't have only specific plone sites that are using the catalog and some that are not in the same install.

@vangheem
Copy link
Member Author

vangheem commented Dec 7, 2016

For anyone else paying attention. Ramon and I had a chat.

Here are the prevailing issues that need to be thought about in designing this:

  • need a way to store data in catalog but not have it indexed
  • need a way to provide indexing hints in a generalized way with the directive
  • closely align with elasticsearch terminology where it makes sense since they have thought about these things a lot
  • do not use adapters with request layers from addons because it adds overhead with the more layers we have on the request.
  • async utilities have the advantage of being outside the request, faster
  • utility search api for this isn't great--see if we can improve

@vangheem vangheem force-pushed the catalog-adapter branch 2 times, most recently from 895593f to dc90d2e Compare December 10, 2016 10:59
vangheem added a commit to pyrenees/pserver.elasticsearch that referenced this pull request Dec 10, 2016
- Added documentation on addons, applications, configuration, design
- remove local site components: saves a lookup and then we aren't storing a persistent registry. If you want to customize per site, use layers and register adapters against layers.
- added @catalog endpoint with POST and DELETE handlers to create/delete index
- update `plone.server` application interface: app_settings, includeme, zcml--see details in docs
- update index directive to be able to provide custom indexers apart from interface definition as well as a decorator to use a custom accessor to retrieve index data
@vangheem
Copy link
Member Author

@bloodbare this is now ready for review. I'm sorry for such a large PR. I've updated the ticket.

@vangheem vangheem changed the title WIP: change catalog utility to an adapter Update search infrastructure Dec 10, 2016
@bloodbare bloodbare merged commit c000b9f into master Dec 12, 2016
@gforcada gforcada deleted the catalog-adapter branch December 12, 2016 21:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants