SolrException: Undefined field text_edge #115

Closed
GreenArchon opened this issue Dec 31, 2016 · 7 comments

@GreenArchon

Hello,

I just installed Nextant (v1.0.3) & ran a first successful index on a Solr 6.3.0 instance. However, I don't seem to get any results on search (except for the default slow Nextcloud search on filenames), and when looking at the Solr logs for a search of "foobar" in Nextcloud, I get the following:

2016-12-31 20:40:18.683 ERROR (qtp606548741-58460) [   x:nextant] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: undefined field text_edge
        at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1308)
        at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:452)
        at org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:84)
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:191)
        at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:206)
        at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:371)
        at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:741)
        at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:384)
        at org.apache.solr.parser.SolrQueryParserBase.handleQuotedTerm(SolrQueryParserBase.java:543)
        at org.apache.solr.parser.QueryParser.Term(QueryParser.java:413)
        at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:180)
        at org.apache.solr.parser.QueryParser.Query(QueryParser.java:101)
        at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:184)
        at org.apache.solr.parser.QueryParser.Query(QueryParser.java:101)
        at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:184)
        at org.apache.solr.parser.QueryParser.Query(QueryParser.java:101)
        at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:90)
        at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:152)
        at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
        at org.apache.solr.search.QParser.getQuery(QParser.java:140)
        at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:161)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:269)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:153)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2213)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:518)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
        at java.lang.Thread.run(Thread.java:745)

2016-12-31 20:40:18.683 INFO  (qtp606548741-58460) [   x:nextant] o.a.s.c.S.Request [nextant]  webapp=/solr path=/select params={json.nl=flat&hl=true&fl=id,nextant_deleted,nextant_path,nextant_source,nextant_owner,nextant_mtime,nextant_attr_content_type,score&start=0&hl.fragsize=70&fq=nextant_owner:"user1"+OR+nextant_share:"user1"+OR++nextant_sharegroup:[a bunch of groups...]&rows=25&hl.snippets=4&q=((text_edge:"foobar"^150)+OR+(text:"foobar"^1)+OR+(text_edge:"foobar"^5))%0a+OR+(nextant_path:"foobar"^1000+%0a)&omitHeader=true&hl.maxAnalyzedChars=100000&hl.fl=text_edge&wt=json} status=400 QTime=0

Querying Solr manually, I can see that the files have a nextant_attr_text_edge field (and I can successfully query on it).

However, simply modifying the query Nextant sends so that text_edge is prefixed with nextant_attr_ then leads to an undefined field text.

Thanks, and happy new year!

@ArtificialOwl
Member

Can you try running ./occ nextant:check --fix?

If the result is not fully green, run it several times.

@GreenArchon
Author

That helped, with:

[...]
 * Checking field-type 'text_general' : fail
   -> Fixing field-type 'text_general' ok
 * Checking field-type 'text_general_edge' : fail
   -> Fixing field-type 'text_general_edge' ok
 * Checking field-type 'text_general_word' : fail
   -> Fixing field-type 'text_general_word' ok
 * Checking field '_version_' : fail
   -> Fixing field '_version_' ok
 * Checking field 'id' : ok
 * Checking field 'text' : fail
   -> Fixing field 'text' ok
 * Checking field 'text_edge' : fail
   -> Fixing field 'text_edge' ok
 * Checking field 'text_word' : fail
   -> Fixing field 'text_word' ok
 * Checking field 'nextant_path' : fail
   -> Fixing field 'nextant_path' ok
 * Checking field 'nextant_owner' : fail
   -> Fixing field 'nextant_owner' ok
 * Checking field 'nextant_mtime' : fail
   -> Fixing field 'nextant_mtime' ok
 * Checking field 'nextant_share' : fail
   -> Fixing field 'nextant_share' ok
 * Checking field 'nextant_sharegroup' : fail
   -> Fixing field 'nextant_sharegroup' ok
 * Checking field 'nextant_deleted' : fail
   -> Fixing field 'nextant_deleted' ok
 * Checking field 'nextant_source' : fail
   -> Fixing field 'nextant_source' ok
 * Checking field 'nextant_tags' : fail
   -> Fixing field 'nextant_tags' ok
 * Checking field 'nextant_extracted' : fail
   -> Fixing field 'nextant_extracted' ok
 * Checking field 'nextant_ocr' : fail
   -> Fixing field 'nextant_ocr' ok
 * Checking field 'nextant_unmounted' : fail
   -> Fixing field 'nextant_unmounted' ok
 * Checking dynamic-field 'ignored_*' : ok
 * Checking dynamic-field 'nextant_attr_*' : fail
   -> Fixing dynamic-field 'nextant_attr_*' ok
 * Checking copy-field 'text_edge/text' : fail
   -> Fixing copy-field 'text_edge/text' ok
 * Checking copy-field 'text_edge/text_word' : fail
   -> Fixing copy-field 'text_edge/text_word' ok
[...]

All is green now.

However, it seems I'm not out of the woods yet, since all queries still return 0 hits, and I get something like this in the logs:

2016-12-31 22:50:02.792 INFO  (qtp606548741-58453) [   x:nextant] o.a.s.c.S.Request [nextant]  webapp=/solr path=/select params={json.nl=flat&hl=true&fl=id,nextant_deleted,nextant_path,nextant_source,nextant_owner,nextant_mtime,nextant_attr_content_type,score&start=0&hl.fragsize=70&fq=nextant_owner:"user1"+OR+nextant_share:"user1"+OR++nextant_sharegroup:"group1"+OR++nextant_sharegroup:"__all"&rows=25&hl.snippets=4&q=((text_edge:"foobar"^150)+OR+(text:"foobar"^1)+OR+(text_edge:"foobar"^5))%0a+OR+(nextant_path:"foobar"^1000+%0a)&omitHeader=true&hl.maxAnalyzedChars=100000&hl.fl=text_edge&wt=json} hits=0 status=0 QTime=0

Playing with it a bit, I noticed that if I manually modify the query, changing +OR+ to %2BOR%2B (i.e. percent-encoding the plus signs), I get results...
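
The difference between the two forms can be seen by decoding them the way a form-encoded query string is normally decoded. The following is a minimal Python sketch using only the standard library. It assumes Solr applies the usual rule that a bare + in a query string stands for a space, and the shortened q fragment is an illustration, not the full query from the log. It does not by itself explain the differing hit counts.

```python
from urllib.parse import unquote_plus

# Shortened fragment of the q parameter as it appears in the Solr request log.
logged = '(text:"foobar"^1)+OR+(nextant_path:"foobar"^1000)'
# The same fragment with the plus signs percent-encoded by hand.
escaped = '(text:"foobar"^1)%2BOR%2B(nextant_path:"foobar"^1000)'

# unquote_plus turns '+' into a space and '%2B' into a literal '+'.
decoded_logged = unquote_plus(logged)
decoded_escaped = unquote_plus(escaped)

print(decoded_logged)
print(decoded_escaped)
```

Under that assumption, the logged request already reaches the query parser with spaces around OR, while the hand-edited request delivers literal plus signs. The two are therefore different query strings, so results from the edited one are not a reliable check of the original.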

@ArtificialOwl
Member

Did you reindex after running nextant:check?

@GreenArchon
Author

Doing it now, and I'm starting to get results. I'll confirm in a few days, once it's done. The reindex seems to take ~10x longer than the first index for the same files; is that to be expected?

@ArtificialOwl
Member

Well, the schema of your Solr core was not OK, so it may be normal that your files were not fully extracted during the first index.
Now, how many files do you have on your cloud, and what kind of hardware is running it?

@GreenArchon
Author

It's currently indexing ~155k files and has done only ~20k in 10 hours, versus 13 hours for the whole process the first time. Both the files and Solr are local, and the bottleneck is mostly PHP maxing out one core, so there is not much to improve here.

Anyway, I don't mind it taking a few days for the first index, I'll just leave it running in its screen session.
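
As a rough cross-check of the figures above, the slowdown and the expected total duration can be estimated with a few lines of Python. This is a back-of-the-envelope sketch: it assumes the "13" for the first index means 13 hours, and that the reindex keeps its current rate, which may not hold across different file types.

```python
total_files = 155_000

# Reindex progress so far: ~20k files in 10 hours.
reindex_rate = 20_000 / 10          # files per hour
# First index: all files in 13 hours (assumption: "13" means hours).
first_rate = total_files / 13       # files per hour

# How much slower the reindex is, and how long it would take at this rate.
slowdown = first_rate / reindex_rate
reindex_total_hours = total_files / reindex_rate

print(f"slowdown: {slowdown:.1f}x")
print(f"estimated full reindex: {reindex_total_hours:.1f} h "
      f"(~{reindex_total_hours / 24:.1f} days)")
```

With these numbers the reindex comes out at roughly 6x slower than the first pass, and a full run lands at a little over three days, which matches the "few days" expectation.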

@GreenArchon
Author

After a few retries I finally got a working index, so all is good now, thanks.

Looking back, what I did was try to index, run into issues, then drop the core and recreate it. Of course Nextant didn't know about that and assumed the schema was still OK... Sorry for the trouble.
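
For anyone who recreates a core in the same way, one way to spot a stale schema before reindexing is to compare the fields Solr reports against the ones Nextant expects. The sketch below is an assumption-laden illustration, not part of Nextant: it relies on the Solr Schema API endpoint /schema/fields returning a JSON object with a "fields" list, the helper names are hypothetical, and the field list is a subset copied from the nextant:check output earlier in this thread.

```python
import json
from urllib.request import urlopen

# Subset of the field names listed in the nextant:check output in this thread.
REQUIRED_FIELDS = ["text", "text_edge", "text_word", "nextant_path", "nextant_owner"]


def missing_fields(schema_response, required):
    """Return the required field names absent from a Schema API 'fields' response."""
    defined = {field["name"] for field in schema_response.get("fields", [])}
    return [name for name in required if name not in defined]


def fetch_fields(base_url, core):
    """Fetch the explicit field list of a core through the Solr Schema API."""
    with urlopen(f"{base_url}/{core}/schema/fields?wt=json") as resp:
        return json.load(resp)


# Offline example: a freshly recreated core that only defines 'id'.
fresh_core = {"fields": [{"name": "id", "type": "string"}]}
print(missing_fields(fresh_core, REQUIRED_FIELDS))
```

Against a live instance this could be called as missing_fields(fetch_fields("http://localhost:8983/solr", "nextant"), REQUIRED_FIELDS), where the host and port are assumed defaults and the core name is the one shown in the logs. A non-empty result suggests running ./occ nextant:check --fix again before reindexing.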
