Commit
docs: add instructions for generic spark (#103)
1 parent
efa7cc5
commit 6a2ed90
Showing 2 changed files with 83 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Spark integration | ||
|
||
You can integrate sidekick with Apache Spark by adding [sidekick_service_init.sh](./sidekick_service_init.sh) as an [init script]() in your Spark clusters. Configure this init script to run on all Spark nodes. | ||
|
||
Briefly, the init script does the following: | ||
- Installs sidekick on the Spark node | ||
- Configures the S3 endpoint (for specific buckets) to point to sidekick | ||
- Sets up a systemd service (managed with `systemctl`) to run sidekick as a daemon | ||
|
||
## Configuration | ||
|
||
To get started, download the sample [init script]() and make the following changes. | ||
|
||
1. Add the endpoint and region of each bucket that will be accessed via sidekick to this section of the init script. | ||
|
||
```bash | ||
cat > /opt/spark/conf/style-path-spark-conf.conf <<EOL | ||
[driver] { | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET1>.endpoint" = "http://localhost:7075" | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET1>.endpoint.region" = <AWS_REGION_OF_BUCKET1> | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET2>.endpoint" = "http://localhost:7075" | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET2>.endpoint.region" = <AWS_REGION_OF_BUCKET2> | ||
} | ||
EOL | ||
``` | ||
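When many buckets are involved, the per-bucket lines can be generated instead of typed by hand. The following is a minimal sketch, not part of the sidekick repository: the helper name `sidekick_bucket_conf`, the `BUCKETS` variable, and the example bucket names and regions are all assumptions made for illustration. It prints the section to stdout so it can be reviewed before being pasted into the init script.

```bash
#!/bin/bash
# Hypothetical helper (not shipped with sidekick): print the two per-bucket
# S3A settings that route one bucket through sidekick on localhost:7075.
sidekick_bucket_conf() {
  local bucket="$1" region="$2"
  printf '  "spark.hadoop.fs.s3a.bucket.%s.endpoint" = "http://localhost:7075"\n' "$bucket"
  printf '  "spark.hadoop.fs.s3a.bucket.%s.endpoint.region" = %s\n' "$bucket" "$region"
}

# Example bucket:region pairs (assumed values); replace with your own.
BUCKETS="my-bucket-1:us-east-1 my-bucket-2:us-west-2"

# Emit the whole [driver] section, one pair of lines per bucket.
echo '[driver] {'
for pair in $BUCKETS; do
  # ${pair%%:*} is the text before the colon, ${pair#*:} the text after it.
  sidekick_bucket_conf "${pair%%:*}" "${pair#*:}"
done
echo '}'
```

The output follows the same layout as the hand-written section above, so it can replace the body of the heredoc once the bucket list is correct.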
|
||
2. Define the environment variables by adding these lines to the [sidekick service init script](./sidekick_service_init.sh): | ||
|
||
```bash | ||
export SIDEKICK_APP_CLOUDPLATFORM="<AWS|GCP>" | ||
``` |
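Because the placeholder is easy to leave unreplaced, a small guard in the init script can catch a bad value before the service is created. This is a sketch under assumptions: `validate_cloud_platform` is a hypothetical helper, and it assumes the only accepted values are the uppercase `AWS` and `GCP` listed above; whether sidekick itself accepts other spellings is not covered here.

```bash
#!/bin/bash
# Hypothetical check (not part of sidekick): accept only the two values
# listed in the README, and report anything else on stderr.
validate_cloud_platform() {
  case "$1" in
    AWS|GCP) return 0 ;;
    *)
      echo "SIDEKICK_APP_CLOUDPLATFORM must be AWS or GCP, got: '$1'" >&2
      return 1
      ;;
  esac
}

# Example value for this sketch; use the platform your buckets live on.
export SIDEKICK_APP_CLOUDPLATFORM="AWS"
validate_cloud_platform "$SIDEKICK_APP_CLOUDPLATFORM" || exit 1
```

Placing the check right after the `export` line makes the init script fail early rather than start a misconfigured daemon.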
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
#!/bin/bash | ||
set -ex | ||
|
||
# Download the sidekick binary unless it is already installed | ||
SIDEKICK_BIN=/usr/bin/sidekick | ||
if [ -f "$SIDEKICK_BIN" ]; then | ||
echo "$SIDEKICK_BIN already installed." | ||
else | ||
wget https://github.com/project-n-oss/sidekick/releases/latest/download/sidekick-linux-amd64.tar.gz | ||
tar -xzvf sidekick-linux-amd64.tar.gz -C /usr/bin | ||
fi | ||
chmod +x "$SIDEKICK_BIN" | ||
"$SIDEKICK_BIN" --help > /dev/null | ||
|
||
cat > /opt/spark/conf/style-path-spark-conf.conf <<EOL | ||
[driver] { | ||
"spark.hadoop.fs.s3a.path.style.access" = "true" | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET1>.endpoint" = "http://localhost:7075" | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET1>.endpoint.region" = <AWS_REGION_OF_BUCKET1> | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET2>.endpoint" = "http://localhost:7075" | ||
"spark.hadoop.fs.s3a.bucket.<MY_BUCKET2>.endpoint.region" = <AWS_REGION_OF_BUCKET2> | ||
} | ||
EOL | ||
|
||
# Add any spark or env config here: | ||
# -------------------------------------------------- | ||
|
||
# -------------------------------------------------- | ||
|
||
export SIDEKICK_APP_CLOUDPLATFORM="<AWS|GCP>" | ||
|
||
# Create service file for the sidekick process | ||
SERVICE_FILE="/etc/systemd/system/sidekick.service" | ||
touch "$SERVICE_FILE" | ||
|
||
cat > "$SERVICE_FILE" << EOF | ||
[Unit] | ||
Description=Sidekick service file | ||
[Service] | ||
Environment=SIDEKICK_APP_CLOUDPLATFORM=$SIDEKICK_APP_CLOUDPLATFORM | ||
ExecStart=$SIDEKICK_BIN serve -p 7075 | ||
Restart=always | ||
[Install] | ||
WantedBy=multi-user.target | ||
EOF | ||
|
||
systemctl daemon-reload | ||
systemctl enable sidekick | ||
systemctl start sidekick | ||
systemctl status sidekick --no-pager
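
One detail of the script worth illustrating: the service file is written with an unquoted heredoc delimiter (`EOF`), so the shell expands `$SIDEKICK_BIN` and `$SIDEKICK_APP_CLOUDPLATFORM` when the file is written, and the unit file ends up containing literal values. The sketch below prints an equivalent unit to stdout so the result can be inspected without root access. The function name `render_sidekick_unit` and its parameters are hypothetical and are not part of the script above.

```bash
#!/bin/bash
# Hypothetical helper: print the unit file that the init script would write,
# given a binary path, a cloud platform, and a port (default 7075).
render_sidekick_unit() {
  local bin="$1" platform="$2" port="${3:-7075}"
  # Unquoted delimiter: $bin, $platform and $port are expanded right here,
  # so the generated text contains their values, not the variable names.
  cat <<EOF
[Unit]
Description=Sidekick service file
[Service]
Environment=SIDEKICK_APP_CLOUDPLATFORM=$platform
ExecStart=$bin serve -p $port
Restart=always
[Install]
WantedBy=multi-user.target
EOF
}

# Preview the unit for an AWS setup with the default port.
render_sidekick_unit /usr/bin/sidekick AWS
```

Reviewing this output is a quick way to confirm that the `<AWS|GCP>` placeholder was replaced before the real file under `/etc/systemd/system` is created.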