Solr #2

Open
realguoshuai opened this issue Nov 19, 2018 · 10 comments

Comments

@realguoshuai
Owner

No description provided.

@realguoshuai
Owner Author

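Today I noticed that the production quick vehicle search data for NC city had dropped to only 200,000-500,000 records per day, clearly fewer than expected. The cause turned out to be a duplicate Kafka consumer group name: the indexing program's consumer group had the same name as another one.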
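
Why a duplicate name loses data: Kafka divides a topic's partitions among all consumers that share a `group.id`, so two unrelated programs using the same group each receive only part of the stream. A minimal sketch of giving the indexer its own group follows. The naming scheme, the `solr-indexer` and `nc` values, and the broker address are hypothetical; `bootstrap.servers` and `group.id` are the standard Kafka consumer config keys.

```java
import java.util.Properties;

final class IndexerConsumerConfig {
    // Hypothetical naming scheme: application name plus deployment name, so that
    // two different programs do not end up sharing a consumer group by accident.
    static String groupIdFor(String application, String deployment) {
        return application + "-" + deployment;
    }

    // Builds the consumer properties; these would be passed to a KafkaConsumer.
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and names, for illustration only.
        Properties props = consumerProps("localhost:9092", groupIdFor("solr-indexer", "nc"));
        System.out.println(props.getProperty("group.id"));
    }
}
```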

@realguoshuai
Owner Author

Solr indexing was missing data. The final fix was to use ShutdownableThread.scala, the utility class that ships with Kafka.

@realguoshuai
Owner Author

Added Solr indexing from Hive, with dictionary conversion, implemented using Fiber.

@realguoshuai
Owner Author

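Data from 2 Hive tables has to be converted through 4 Hive dictionary tables before it is stored in Solr. The approach is to read each dictionary table into its own HashMap and do the conversion before creating the index.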
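
The HashMap approach can be sketched roughly as follows. This is an illustration only, not the actual job: the class name, the field names, and the choice to keep unknown codes unchanged are assumptions, and reading the dictionary rows from Hive is left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class DictionaryConverter {
    // One lookup map per dictionary-backed field: field name -> (code -> label).
    private final Map<String, Map<String, String>> dictionaries = new HashMap<>();

    // Registers the rows of one dictionary table (already read from Hive) for a field.
    void addDictionary(String field, Map<String, String> codeToLabel) {
        dictionaries.put(field, new HashMap<>(codeToLabel));
    }

    // Returns a copy of the row with dictionary-backed fields converted.
    // Codes missing from a dictionary are kept unchanged (an assumption of this sketch).
    Map<String, String> convert(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>(row);
        for (Map.Entry<String, Map<String, String>> e : dictionaries.entrySet()) {
            String code = out.get(e.getKey());
            if (code != null) {
                out.put(e.getKey(), e.getValue().getOrDefault(code, code));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        DictionaryConverter converter = new DictionaryConverter();
        // Hypothetical field and dictionary values.
        converter.addDictionary("plate_color", Map.of("1", "blue", "2", "yellow"));
        Map<String, String> row = new LinkedHashMap<>();
        row.put("plate_no", "A12345");
        row.put("plate_color", "2");
        System.out.println(converter.convert(row));
    }
}
```

Loading the dictionaries once up front keeps the per-row cost to a few in-memory lookups, which is why this works well when the dictionary tables are small compared with the data tables.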

@realguoshuai
Owner Author

Rewrote the company's Solr query interface. The code was changed to replace prefix wildcards in queries. See Solr前缀匹配优化.txt for reference.

@realguoshuai
Owner Author

realguoshuai commented Feb 16, 2019

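Solr will soon need to ingest 30 million vehicle-passing records per day from NC city in real time. The test cluster is currently being tested at a scale of ten billion records.
The REST interface for Solr queries is implemented with Spring Boot.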

@realguoshuai
Owner Author

realguoshuai commented Jun 19, 2019

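In production, one provincial capital adds 30 million records per day. Writes lag behind during the morning peak and only catch up around noon. The write delay is caused by index rebuilding, and it will keep getting longer as the stored data grows. The plan is to replace Solr with ES later on.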

@realguoshuai
Owner Author

realguoshuai commented Jun 28, 2019

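The Solr index is too large (2.5 billion documents, growing by 30 million per day). Real-time indexing has hit a bottleneck, and delays show up during the morning and evening peaks. After thinking it over, I decided to split the data into a history table and a real-time table (28 shards, holding 3 months of data), with a check added in the REST layer to choose between them.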
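
The check in the REST layer might look something like the sketch below. It assumes the history table holds everything older than the 3-month window and that queries carry a date range with `from` not after `to`; the collection names and the class itself are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class CollectionRouter {
    // Hypothetical collection names; the comment above only says "history" and "real-time".
    static final String REALTIME = "vehicle_realtime";
    static final String HISTORY = "vehicle_history";
    // The real-time table keeps 3 months of data.
    static final int RETENTION_MONTHS = 3;

    // Picks the collections a query over [from, to] has to hit, given today's date.
    static List<String> collectionsFor(LocalDate from, LocalDate to, LocalDate today) {
        LocalDate cutoff = today.minusMonths(RETENTION_MONTHS);
        List<String> targets = new ArrayList<>();
        if (from.isBefore(cutoff)) {
            targets.add(HISTORY);   // range reaches data older than the window
        }
        if (!to.isBefore(cutoff)) {
            targets.add(REALTIME);  // range reaches data inside the window
        }
        return targets;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2019, 6, 28);
        System.out.println(collectionsFor(LocalDate.of(2019, 6, 1), today, today));
        System.out.println(collectionsFor(LocalDate.of(2019, 1, 1), today, today));
    }
}
```

Queries that stay inside the retention window then touch only the smaller real-time table, which is the point of the split.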

@realguoshuai
Owner Author

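Solution to the previous issue: testing showed that rebuilding the index takes the same time whether a batch commit contains a few thousand or tens of thousands of documents. So a BlockingQueue is used to control batch commits: commit once when more than 1,000 documents are buffered or more than 5 s have passed since the last commit.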
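
A rough sketch of that size-or-time rule, assuming a single consumer thread that calls `drainOnce` in a loop (so "since the last commit" is approximated by "since the previous call returned"). The class and method names are hypothetical, and the `Consumer<List<T>>` callback stands in for the real Solr add and commit calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class BatchCommitter<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final int maxBatch;
    private final long maxWaitMillis;
    private final Consumer<List<T>> commit; // stand-in for the Solr add + commit call

    BatchCommitter(int maxBatch, long maxWaitMillis, Consumer<List<T>> commit) {
        this.maxBatch = maxBatch;
        this.maxWaitMillis = maxWaitMillis;
        this.commit = commit;
    }

    // Producers hand documents over here; the queue is unbounded in this sketch.
    void submit(T doc) {
        queue.add(doc);
    }

    // Collects one batch: stops when maxBatch documents are buffered or when
    // maxWaitMillis have elapsed since the call started, then commits whatever
    // was collected. Returns the number of documents committed.
    int drainOnce() {
        List<T> batch = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        try {
            while (batch.size() < maxBatch) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break;
                }
                T doc = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (doc == null) {
                    break; // timed out waiting for more documents
                }
                batch.add(doc);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag, flush what we have
        }
        if (!batch.isEmpty()) {
            commit.accept(batch);
        }
        return batch.size();
    }

    public static void main(String[] args) {
        // 200 ms keeps the demo short; the comment above uses 5 s.
        BatchCommitter<String> committer = new BatchCommitter<>(1000, 200,
                batch -> System.out.println("committing " + batch.size() + " docs"));
        for (int i = 0; i < 2500; i++) {
            committer.submit("doc-" + i);
        }
        while (committer.drainOnce() > 0) {
            // keep draining until a call times out with nothing buffered
        }
    }
}
```

The timeout bounds how stale a small batch can get during quiet periods, while the size cap keeps any single commit from growing without limit during peaks.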

@realguoshuai
Owner Author

realguoshuai commented Aug 20, 2019

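Queries for NC city are very slow in the evening (30-50 s) but fast in the morning (2 s). These are queries for the current day's data against the full stored data set.
Memory had been filled up by other queries and was not released for a while, so the queries were served from disk.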
