ISSUE-39 Prefer Elasticsearch TransportClient over Node #42
I will push another bunch of commits to stabilize this branch. |
@Yongyao @quintinali In addition, the most recent commit upgrades Elasticsearch from 1.7.X to 2.3.X. |
@quintinali Can you help test this pull request tomorrow? Thanks |
Any comments here folks? |
@lewismc I cannot run mudrod successfully with the program arguments "-f -logDir E:\mudrodCoreTestData"; it throws java.lang.NullPointerException. Through debugging I found that the values of es and spark are null. If I add three lines to the main function as follows, the bug is solved. But I am not sure whether you want to add these initialization calls there?
|
@lewismc We used "private transient Node node" in ESdriver before, and many functions in ESdriver use this field. Since node is not initialized when host is not null in this latest version, functions that invoke node, such as refreshIndex(), will fail. |
@quintinali thank you very much for the feedback. I will fix this now. |
OK |
Current coverage is 0.00% (diff: 0.00%)
@@            master    #42   diff @@
====================================
  Files         49     49
  Lines       2689   2704    +15
  Methods        0      0
  Messages       0      0
  Branches     317    318     +1
====================================
  Hits           0      0
- Misses      2689   2704    +15
  Partials       0      0
|
@Yongyao @quintinali please can both of you try this PR out again? I've made a number of updates which further clean up the codebase. |
@Yongyao @quintinali I just pushed another commit. Please test it against an Elasticsearch server v2.3.5 running on localhost:9200, following the instructions below. When I invoke the MudrodEngine with the following parameters
The files I have locally are of type .json not of file suffix Please tag me here once you have debugged the code and found any issues. |
@lewismc That's correct. The logDir I shared with you contains the intermediate results, which means you don't need the raw logs anymore and can run -f -logDir /Users/lmcgibbn/Desktop/logDir/ directly. I just tested the code with -f -logDir /Users/lmcgibbn/Desktop/logDir/. There are still some ES upgrade issues. java.lang.NullPointerException As you can see, most of the problems are in ESDriver. Please let me know if you need help. |
I think the code below is what is causing the issue. A lot of methods use "Node", which is null because execution always goes into the first if condition. // Prefer TransportClient |
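The failure mode described in the comments above can be sketched without Elasticsearch on the classpath. Everything in this block is an assumption for illustration: `SearchClient`, `EmbeddedNode`, and `DriverSketch` are hypothetical stand-ins, not Mudrod or Elasticsearch classes, and the real ESDriver code may be organised differently. The sketch only shows why helpers that dereference the node break once the remote-client branch is taken, and how routing every helper through one client field avoids that.

```java
import java.util.Objects;

// Hypothetical stand-in for a search client; not an Elasticsearch type.
interface SearchClient {
    String refresh(String index);
}

// Hypothetical stand-in for an embedded node that can hand out a client.
final class EmbeddedNode {
    SearchClient client() {
        return index -> "refreshed " + index + " via embedded node";
    }
}

// Hypothetical driver illustrating the two initialisation paths.
final class DriverSketch {
    private EmbeddedNode node;    // stays null when a remote host is configured
    private SearchClient client;  // set on both paths by the constructor

    DriverSketch(String host) {
        if (host != null) {
            // Prefer a remote (transport-style) client; no node is started here.
            client = index -> "refreshed " + index + " via " + host;
        } else {
            // Fall back to an embedded node and take its client.
            node = new EmbeddedNode();
            client = node.client();
        }
    }

    // Fragile: dereferences node, which is null on the remote-client path.
    String refreshViaNode(String index) {
        return node.client().refresh(index);
    }

    // Safer: helpers depend only on the client, whichever path created it.
    String refreshViaClient(String index) {
        return Objects.requireNonNull(client, "client not initialised").refresh(index);
    }
}
```

With a host configured, `refreshViaClient` works while `refreshViaNode` throws a NullPointerException, which matches the symptom reported for methods such as refreshIndex().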
Thanks, I'll scope this later today.
Lewis |
The code is correct. We should always prefer to use a client rather than a This patch has turned out to be a blocker for my ontology development. The What error is thrown when you execute a full log ingest? Are you running
Lewis |
@lewismc I have already fixed the problem in ESdriver. Please see my commit I have also tested the -f command. The results are correct. I will continue to test the -l command (preprocessing) this afternoon, which very likely has some errors since the ES API has changed. I will report the result here later today. Also, some minor issues here.
|
Ok, please keep testing. I need you and Yun to keep testing so that we can
Lewis |
@lewismc I am testing the -l command. It works for very small logs, but it takes much longer to ingest the same amount of logs. Just to give you an idea of what it is like, the log file that used to take about 1 hr now takes more than 1 hour (still running). There was a mistake in importLog.java, but I have fixed it. I will keep testing and let you know tomorrow morning at the latest. BTW, there is a crawler detection problem.
Do you think it is good practice to do so? I have found a lot of new agents in the 2016 logs. Maybe change "equals()" to "contains("crawler") or contains("bot")"? |
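The substring-based check suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's crawler detection code: the class name `CrawlerCheckSketch`, the method `isCrawler`, and the lower-casing step are all hypothetical, and the keyword list contains only the two terms mentioned in the comment.

```java
import java.util.Locale;

// Hypothetical helper; names and keyword list are illustrative assumptions.
final class CrawlerCheckSketch {
    // Only the two keywords proposed in the discussion.
    private static final String[] KEYWORDS = {"crawler", "bot"};

    // Returns true when the user-agent string contains any keyword,
    // compared case-insensitively. A null agent is treated as not a crawler.
    static boolean isCrawler(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String normalized = userAgent.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (normalized.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

Substring matching catches agent names that an exact comparison against a fixed list would miss, at the cost of possible false positives for any agent whose name merely happens to contain "bot".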
Well, ingestion should not be slower, and we need to write ingestion tests to Regarding crawler detection, the current crawler HTTP agent name detection Thanks
Lewis |
Hi @Yongyao @quintinali I just pushed some more commits to the ISSUE-39 branch. I am able to perform full ingest of local log files as well as other linkage information. |
Yes, there is no problem with the -f command right now, but this is not a real full ingest. In order not to block your further work, I made it skip the preprocessing step last week. The 246K hits are not logs. I think you can continue the ontology work based on this code. In the meantime, I will test the real preprocessing code and try to improve it.
|
I do not understand this. Can you please explain what else needs to be done for a full ingest to take place? Maybe we can discuss tomorrow @Yongyao, as I am clearly missing some information here. Can we work to implement the full ingest code after this has been merged into master? |
I agree. It's fine to merge it into master. I'm happy to talk tomorrow; 11am-12pm or 1:30-4pm works for me.
|
Cool. I will try you at the earlier time. Thanks @Yongyao and @quintinali for reviews of this code. I appreciate it. |
@Yongyao @quintinali this PR addresses a number of issues