- Shows a demo that builds, deploys, and runs your Spark application on an Oracle OCI Data Flow Spark cluster, with the application deployed to an OCI bucket.
# Setup Python Virtual Environment

```bash
mkdir .venv
python3 -m venv .venv
source .venv/bin/activate
python -m pip install pip setuptools --upgrade
python -m pip install wheel
python -m pip install spark-etl
python -m pip install oci-core
```
# Check out demos

```bash
git clone https://github.com/stonezhong/spark_etl.git
cd spark_etl/examples/oci_dataflow1
```
```bash
etl -a build -p demo01
```
- It builds the application `demo01`.
- The config file is `config.json`, unless specified with the `-c` option.
- Since `apps_dir=apps` in the config, it locates application `demo01` at directory `apps/demo01`.
- Since `builds_dir=.builds` in the config, the build result will be in `.builds/demo01`.
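Putting the settings above together, a minimal `config.json` might look like the sketch below. The `apps_dir`, `builds_dir`, and `profiles_dir` values come from this walkthrough; any other fields your setup needs are not shown, and the exact key names should be checked against the demo's own `config.json`.

```json
{
    "apps_dir": "apps",
    "builds_dir": ".builds",
    "profiles_dir": ".profiles"
}
```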
# Setup OCI Config

You can change the config in profile `.profiles/main.json` if needed.
- Your API key is stored in file `~/.oci/oci_api_key.pem`.
- Your API key fingerprint is `2b:3d:75:f3:00:10:60:32:94:9b:82:56:82:e2:c1:a4`.
- You are using Data Flow in region `us-ashburn-1`.
- Your tenancy ID is `ocid1.tenancy.oc1..aaaaaaaax7td4zfyexbwdz3tvcgsolgtw5okcvmnzpjryfzfgpvoamk74t3a`.
- Your user ID is `ocid1.user.oc1..aaaaaaaa7w622vhkumwop4dasnbx2pfoluzlzojmjwuhim733hhd2vtaiqxq`.
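Collecting the values above, `.profiles/main.json` might look roughly like the sketch below. The values are the ones shown in this walkthrough, but the key names (`region`, `key_file`, `fingerprint`, `tenancy`, `user`, `deploy_base`) are illustrative assumptions; verify them against the profile file shipped with the demo.

```json
{
    "region": "us-ashburn-1",
    "key_file": "~/.oci/oci_api_key.pem",
    "fingerprint": "2b:3d:75:f3:00:10:60:32:94:9b:82:56:82:e2:c1:a4",
    "tenancy": "ocid1.tenancy.oc1..aaaaaaaax7td4zfyexbwdz3tvcgsolgtw5okcvmnzpjryfzfgpvoamk74t3a",
    "user": "ocid1.user.oc1..aaaaaaaa7w622vhkumwop4dasnbx2pfoluzlzojmjwuhim733hhd2vtaiqxq",
    "deploy_base": "oci://dataflow-apps@idrnu3akjpv5/spark-etl-lab/apps"
}
```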
```bash
etl -a deploy -p demo01 -f main
```
- This command deploys the application `demo01`.
- It uses profile `main`.
- Since `profiles_dir=.profiles` in `config.json`, it loads profile `main` from file `.profiles/main.json`.
- It deploys to directory `oci://dataflow-apps@idrnu3akjpv5/spark-etl-lab/apps/demo01/1.0.0.0`, based on `deploy_base` in profile `main` and the application version `1.0.0.0` from its manifest file.
```bash
etl -a run -p demo01 -f main --run-args input.json
```
- It runs the application `demo01`, using profile `main`.
- It passes the content of `input.json` as parameters to the data application.
- Based on the cmds in `input.json`, it will save a parquet file to `oci://spark-etl-lab@idrnu3akjpv5/data/trade.parquet`.
- The application returns a dict: `{"result": "ok"}`.
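For reference, an `input.json` driving the behavior above might be shaped like the sketch below. The schema of `input.json` is defined by the demo application itself; only the `cmds` key and the output location are taken from this walkthrough, while the command structure (`action`, `location`) is an assumption to illustrate the idea.

```json
{
    "cmds": [
        {
            "action": "save_parquet",
            "location": "oci://spark-etl-lab@idrnu3akjpv5/data/trade.parquet"
        }
    ]
}
```

Check the `input.json` in `spark_etl/examples/oci_dataflow1` for the actual format the demo application expects.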