
F#2352 enable prep spark engine #2464

Merged
merged 14 commits into from Aug 20, 2019

Conversation

@joohokim1 (Contributor) commented Aug 19, 2019

Description

Now we can use Apache Spark for data preparation.

Related Issue : #2352

How Has This Been Tested?

  1. Download the discovery spark engine: https://github.com/metatron-app/discovery-spark-engine
  2. Build it: `mvn -DskipTests clean install`
  3. Put configuration like below:

```yaml
polaris:
  dataprep:
    etl:
      spark:
        jar: <discovery-prep-spark-engine-1.2.0.jar path>
        port: <port>
```

(example)

```yaml
polaris:
  dataprep:
    etl:
      spark:
        jar: $HOME/git-repos/discovery-spark-engine/discovery-prep-spark-engine/target/discovery-prep-spark-engine-1.2.0.jar
        port: 5300
```

  4. Run `./run-prep-spark-engine.sh`
  • If no config path is given, $METATRON/conf/application-config.yaml is used.
  • ex> `./run-prep-spark-engine.sh $METATRON_HOME/discovery-server/src/main/resources/application.yaml`
  5. Run the discovery server as usual.
  6. Import & wrangle s5k_1.csv.txt
  7. Click "Snapshot"
  8. Open "Advanced options"
  9. Select the "Spark" option.
  10. Click "Done"
  11. Check the resulting snapshot.
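The default-config behavior of `run-prep-spark-engine.sh` in the run step above can be sketched with the usual shell default-argument pattern. This is a minimal sketch, not the actual script: the `METATRON` variable and its `/opt/metatron` fallback are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical sketch of the argument handling: take the config file from the
# first argument, falling back to $METATRON/conf/application-config.yaml.
METATRON="${METATRON:-/opt/metatron}"   # assumed install dir, for illustration
CONF="${1:-$METATRON/conf/application-config.yaml}"
echo "config: $CONF"
# The real script would go on to launch the prep spark engine jar using $CONF.
```

Because `${1:-...}` only substitutes when no argument is passed, calling the script with an explicit YAML path (as in the example above) overrides the default.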

NOTICE:

  1. Although the jar location lives in the Spring framework's property file, it is parsed by a bash shell script.
    This keeps the configuration of the metatron programs unified.
    Therefore, ${user.home} does not work in this case; $HOME will.
  2. The discovery spark engine does not need any kind of legacy Spark cluster. All it needs is HCatalog, which is configured by polaris.storage.stagedb.metastore.uri
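Since notice 1 says the YAML property is read by a bash script rather than by Spring, the extraction might look roughly like the sketch below. This is a hypothetical illustration, not the actual `run-prep-spark-engine.sh`: the `sed` approach, file paths, and sample config are all assumptions; it also shows why a shell variable like `$HOME` expands while a Java-style `${user.home}` would not.

```shell
#!/bin/bash
# Hypothetical sketch: extract polaris.dataprep.etl.spark.jar from a YAML
# config using plain shell tools (the real script may parse it differently).

CONF="${1:-/tmp/application-config.yaml}"

# Sample config written here only so the sketch is self-contained; the real
# file would live under the metatron configuration directory.
cat > "$CONF" <<'EOF'
polaris:
  dataprep:
    etl:
      spark:
        jar: $HOME/discovery-spark-engine/target/discovery-prep-spark-engine-1.2.0.jar
        port: 5300
EOF

# Pull the value of the first "jar:" key; a YAML-aware parser would be safer.
SPARK_JAR=$(sed -n 's/^[[:space:]]*jar:[[:space:]]*//p' "$CONF" | head -n 1)

# Let the shell expand variables like $HOME embedded in the value. A Java
# property placeholder such as ${user.home} would NOT be expanded here,
# which is exactly the caveat in notice 1.
SPARK_JAR=$(eval echo "$SPARK_JAR")
echo "jar: $SPARK_JAR"
```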

Need additional checks?

Yes, please.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have added tests to cover my changes.
@joohokim1 joohokim1 added the testbed4 label Aug 19, 2019
@joohokim1 (Contributor, Author) commented Aug 19, 2019

run build
deploy to 4

@joohokim1 (Contributor, Author) commented Aug 19, 2019

Working with .txt files has a problem right now. It's a very simple problem, but I have to resolve it in https://github.com/metatron-app/discovery-spark-engine (it takes a bit of time).
Therefore, when you test, please rename .txt to .csv.
I cannot understand why GitHub prohibits uploading .csv.

@joohokim1 joohokim1 removed the testbed4 label Aug 19, 2019
@joohokim1 joohokim1 merged commit a8cb97f into master Aug 20, 2019
1 check passed
CodeFactor 7 issues fixed. 1 issue found.
minhyun2 added a commit that referenced this pull request Sep 27, 2019
ufoscw added a commit that referenced this pull request Sep 30, 2019
* #2463 pause and resume batch ingestion

* #2464 add batch status
ufoscw added a commit that referenced this pull request Oct 7, 2019
* #2463 pause and resume batch ingestion

* #2464 add batch status
ufoscw added a commit that referenced this pull request Oct 7, 2019
* #2463 pause and resume batch ingestion

* #2464 add batch status
@alchan-lee alchan-lee deleted the f#2352-enable_prep_spark_engine branch Oct 7, 2019
4 participants