Using Elasticsearch-hadoop library to import and process logs in parallel #62
Branch name: cc_es_hadoop
2. I changed the terms aggregation parameter, since `size: 0` is no longer valid for terms aggregations. Could @Yongyao or @lewismc check whether there is a better implementation with higher performance?
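For context, a sketch of the kind of change involved (the aggregation and field names here are illustrative, not Mudrod's actual query): in Elasticsearch 5.x, `"size": 0` on a terms aggregation is rejected, so an explicit bucket count has to be supplied instead:

```json
{
  "aggs": {
    "unique_terms": {
      "terms": {
        "field": "keywords",
        "size": 10000
      }
    }
  }
}
```

A large fixed size can be memory-hungry on high-cardinality fields, so it may be worth evaluating alternatives (e.g. terms partitioning, available from Elasticsearch 5.2) when assessing performance.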
I'll scope this out tomorrow.
@lewismc I have upgraded Elasticsearch from 2.x to 5.x and now import logs in parallel using the Elasticsearch-Hadoop library. I have set up a cluster and want to test the whole process on it, but I don't know which package in the target directory can be used with the spark-submit command. I tried to execute the following command in the mudrod-core target directory
and got this error: I think the reason is that the dependency packages are not included in the built package. Is that correct? Do we have to modify pom.xml to generate another package that includes all dependencies?
Hi @quintinali
Yes, that's correct. This is trivial to fix and is described at https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/missing_dependencies_in_jar_files.html
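For reference, one common way to build such an uber jar with Maven is the shade plugin; a minimal sketch (the plugin version shown is an assumption, check for the current release):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <!-- bundle all runtime dependencies into the jar at package time -->
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The jar produced under target/ can then be passed to spark-submit, e.g. `spark-submit --class <main-class> --master <master-url> target/<uber-jar>.jar` (the class and jar names are placeholders, not Mudrod's actual ones).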
@quintinali can you please create a pull request for your work on the following branch: https://github.com/mudrod/mudrod/tree/cc_es_hadoop
@quintinali please see my mail on the Mudrod mailing list for guidance on creating pull requests.
Main modifications in this branch:
Issue resolved in #77
We are using Elasticsearch version 2.3, which only supports Spark up to 1.6.1. If we want Spark 2.0 compatibility, we will need to switch to the 5.0.0-alpha5 version of the connector. However, it is an alpha release and should be used for testing purposes only (i.e. not in production).
Do you have any suggestions?
Reference: https://discuss.elastic.co/t/write-es-error-with-spark-2-0-release/56967/3
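For what it's worth, the two options map to different Maven artifacts; a sketch of the two dependency declarations (artifact names and versions are my assumptions from memory and should be verified against Maven Central):

```xml
<!-- Option 1: stay on the stable 2.x line (Spark up to 1.6.x) -->
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark_2.10</artifactId>
  <version>2.3.4</version>
</dependency>

<!-- Option 2: alpha line with Spark 2.0 support (testing only) -->
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-20_2.11</artifactId>
  <version>5.0.0-alpha5</version>
</dependency>
```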