Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elasticsearch Indexing Fails with error "converting NULL to string is unsupported" #15357

Closed
shieldsjared opened this issue Aug 27, 2020 · 13 comments

Comments

@shieldsjared
Copy link
Contributor

Summary

Unable to create elasticsearch index. Error encountered:

Unable to get the posts batch for indexing. — sql: Scan error on column index 18, name "TeamId": converting NULL to string is unsupported

(Checked Jira. Did not see anything relevant. Found others reporting same error message when attempting to use Bleve search #14929, but since that is experimental and potentially a different implementation, reporting this issue separately).

Steps to reproduce

Utilizing MM 5.26.0 (E20) and ElasticSearch 7.9.0. Unable to determine exact conditions that cause this to occur, but it happens each time I attempt to re-index (and fails).

  1. We recently upgraded to MM 5.26 and had been using database-driven searching
  2. Attempted to enable Bleve indexing, and ran into issue described in Bleve index fails with "converting NULL to string is unsupported" #14929
  3. Installed/Configured ElasticSearch 7.9
  4. Initiated bulk indexing, and after 866 minutes, ran into error described above.

Expected behavior

Indexing should complete successfully.

Observed behavior (that appears unintentional)

When executing the bulk indexing, this occurs after the indexing has been running for quite a long time.

image

Possible fixes

Checked for the code supporting this to see if I could troubleshoot further, but implementation appears to be in private enterprise repository.

@amyblais
Copy link
Member

@shieldsjared

  1. Was the indexing recreated as described in the important upgrade notes for v5.26?
  2. There are some known issues with Bleve as described here. Would you be able to disable Bleve to see if it helps with the issue?
  3. Are there any additional server logs or config settings (with confidential info removed) that you would be able to help share?

@shieldsjared
Copy link
Contributor Author

shieldsjared commented Aug 28, 2020

@amyblais

  1. This is a brand new configuration of elastic search... We had not used it prior to 5.26... So that step sounds like it doesn't apply (there's no prior index to purge first).
  2. Bleve is/was not enabled. We tested Bleve first independently, and when that gave us the error, we fully disabled that, and moved on to trying out Elasticsearch.
  3. Below are some log entries that seem to stand out.
{"level":"warn","ts":1598597578.0817766,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597593.093277,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597608.0984693,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597623.118078,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597638.1300888,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597653.136984,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597668.1446245,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597683.1566584,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597698.161478,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"warn","ts":1598597713.1781416,"caller":"elasticsearch/indexing_job.go:322","msg":"Failed to get posts batch for indexing. Retrying.","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"error","ts":1598597728.1904867,"caller":"elasticsearch/indexing_job.go:267","msg":"Worker: Failed to index batch for job","workername":"EnterpriseElasticsearchIndexer","job_id":"3o5wemr3ttye9era9q8z4nyy8h","error":"SqlPostStore.GetPostContext: Unable to get the posts batch for indexing., sql: Scan error on column index 18, name \"TeamId\": converting NULL to string is unsupported"}
{"level":"info","ts":1598598051.135663,"caller":"elasticsearch/aggregation_job.go:106","msg":"Worker: Aggregation job claimed by worker","workername":"EnterpriseElasticsearchAggregator","job_id":"ko7mnr8q3jn93pn87hyc4kqa4y"}
{"level":"info","ts":1598598051.3511677,"caller":"elasticsearch/aggregation_job.go:169","msg":"Worker: Aggregation job reindexing","start_date":"2019-08-28","end_date":"2019-08-30"}
{"level":"info","ts":1598598066.224554,"caller":"elasticsearch/indexing_job.go:162","msg":"Worker: Indexing job claimed by worker","workername":"EnterpriseElasticsearchIndexer","job_id":"1s7w54p4a3g6uktreoefeqaiiw"}
{"level":"info","ts":1598598092.4688087,"caller":"elasticsearch/indexing_job.go:289","msg":"Worker: Indexing job finished successfully","workername":"EnterpriseElasticsearchIndexer","job_id":"1s7w54p4a3g6uktreoefeqaiiw"}
{"level":"info","ts":1598598096.8524206,"caller":"elasticsearch/aggregation_job.go:213","msg":"Worker: Aggregation job finished successfully","workername":"EnterpriseElasticsearchAggregator","job_id":"ko7mnr8q3jn93pn87hyc4kqa4y"}

And our configuration:

"ElasticsearchSettings": {
        "ConnectionUrl": "REMOVED",
        "Username": "",
        "Password": "",
        "EnableIndexing": true,
        "EnableSearching": false,
        "EnableAutocomplete": false,
        "Sniff": true,
        "PostIndexReplicas": 1,
        "PostIndexShards": 1,
        "ChannelIndexReplicas": 1,
        "ChannelIndexShards": 1,
        "UserIndexReplicas": 1,
        "UserIndexShards": 1,
        "AggregatePostsAfterDays": 365,
        "PostsAggregatorJobStartTime": "03:00",
        "IndexPrefix": "",
        "LiveIndexingBatchSize": 1,
        "BulkIndexingTimeWindowSeconds": 3600,
        "RequestTimeoutSeconds": 30,
        "SkipTLSVerification": false,
        "Trace": ""
    },

This morning, I checked again and this time it appears to have succeeded.... although if you look at the screenshot... the results seem odd that after 862 minutes, it fails, but then succeeds on the next run of only 26 seconds. I enabled search and theres a period of messages that are not returning in the search results... so although it says it completed, I don't think it really did.
image

Appreciate your help, @amyblais!

@jasonblais
Copy link
Contributor

Hey @shieldsjared! :)

Covering for Amy while she's away. @ethervoid or @svelle would you be open to help review the above and help with troubleshooting?

@ethervoid
Copy link
Contributor

ethervoid commented Sep 2, 2020

The error comes from here and looks like there are posts that have channelId pointing to a non-existing channel or it's null.

Can you verify that doing a query like this:

SELECT count(*) FROM Posts p LEFT JOIN Channels c ON p.ChannelId = c.Id WHERE p.ChannelId IS NULL OR c.Id is NULL OR p.ChannelId = '' ;

This way we can verify those cases and check if it's something that shouldn't have happened or a bug in our code

@shieldsjared
Copy link
Contributor Author

shieldsjared commented Sep 2, 2020

Hey @jasonblais!

@ethervoid - You're quick! Indeed, I got back 1071. Appear to be pointing to a non-existent channel. I'm actually OK with just purging these posts... safe to just delete these posts from the Posts table, or are there any other places that will also require cleanup? I think I know how this happened, and assuming my hunch is correct - it's not as a result of a bug in MM.

@ethervoid
Copy link
Contributor

@shieldsjared :) probably the best way to deal with it is calling this endpoint so you get rid of all the intermediate data. Try that and if you get an error we can think about another solution

@shieldsjared
Copy link
Contributor Author

shieldsjared commented Sep 2, 2020

@ethervoid - Getting 403 (I'm a sysadmin... tested on other known posts to confirm my call to the API is correct). :/

@ethervoid
Copy link
Contributor

@shieldsjared can you share with me the error message you're seeing in the logs for that 403 so we can see if that is something we can solve?

@shieldsjared
Copy link
Contributor Author

shieldsjared commented Sep 2, 2020

Just getting this...

{"level":"error","ts":1599063946.9764547,"caller":"mlog/log.go:190","msg":"You do not have the appropriate permissions.","path":"/api/v4/posts/16u14w3p13rjzfnsx3k377q6uc","request_id":"9uizh973sfby9js5eig4hqtwye","ip_addr":"X.X.X.X","user_id":"faor17yjn3gbfqiu4q4mqtj99c","method":"DELETE","err_where":"Permissions","http_code":403,"err_details":"userId=faor17yjn3gbfqiu4q4mqtj99c, permission=delete_post"}

Perhaps because this posts are referencing a channel id of a channel that does not exist?

Greatly appreciate your help @ethervoid!

I checked and I see no other references to the channel id anywhere else throughout. All I could find were some lingering reactions entries (presumably related to these posts)... I know its not recommended, but i'm not opposed to just wiping these...

@ethervoid
Copy link
Contributor

Yes, sadly that is because is trying to check for permissions on that channel. The only way I can think of is to delete them by hand.

First of all, I encourage you to make a backup of the database or even better duplicate your installation and test this in a controlled environment to see if this is the cause of your problems and don't have side-effects.

The tables with posts relationships are Posts, Reactions, and Fileinfo. So you should make a query that deletes the posts and their relationships.

@shieldsjared
Copy link
Contributor Author

Definitely agree on backup and testing! Thanks for confirming the relationships. Appreciate you @ethervoid!

@shieldsjared
Copy link
Contributor Author

@ethervoid - Thanks for the help! That solved our issue :) Closing this out. I'm guessing that the folks on #14929 are likely having the same issue w/ Bleve... Thanks again!

@ethervoid
Copy link
Contributor

Really glad to know that this fixed the issue :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants