Pachyderm Spatial Demo with R

Step 1 - push your image to a container registry

If you already have your container image in a container registry, go to step 2.

These instructions are for Google Container Registry, which we just happened to use for the demo. It will vary slightly if you use other infrastructure like AWS, Oracle Cloud or IBM Bluemix.

Push your container to the spatial_demo directory in your container registry (you will need sufficient permissions to do so).

In our example it would look like this.

docker build -t gcr.io/ixplaza/spatial_demo .
docker push gcr.io/ixplaza/spatial_demo
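The two commands above can be parameterised so that the same script works for another project or registry. The following is a minimal sketch, not part of the demo repository: the variable names, the `latest` tag and the `run` dry-run helper are assumptions made for illustration. With Google Container Registry you will most likely also need `gcloud auth configure-docker` once, so that Docker can authenticate.

```shell
# Hypothetical settings; replace them with your own registry, project and image.
REGISTRY="gcr.io"
PROJECT="ixplaza"
IMAGE="spatial_demo"
TAG="latest"   # assumption: the demo does not pin a tag, so Docker uses "latest"

IMAGE_REF="${REGISTRY}/${PROJECT}/${IMAGE}:${TAG}"

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run gcloud auth configure-docker      # one-time Docker credential setup for GCR
run docker build -t "$IMAGE_REF" .
run docker push "$IMAGE_REF"
```

Run it once as is to review the commands, then again with `DRY_RUN=0` to actually build and push.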

Step 2 - Get a cluster on Pachyderm Hub

Go to hub.pachyderm.com and create a cluster. If you want to run the demo for free, choose “Create a 4-hr Workspace”. You can also set up your own cluster locally (for instance with microk8s) or in the cloud.
The Pachyderm Hub interface will give you the connection instructions under the connect option.

Validate your connection with pachctl version. You should get a response with version numbers for both pachctl and pachd. If this works, you can start pachctl shell, a utility that gives you autocomplete. For readability we will assume you are in the shell (otherwise just prepend pachctl to each command below).
We will work in the spatial_demo directory, and the following commands assume that you are there.

Step 3 - Put files on the cluster

We are going to create the data repository to put the shapefiles in. We will call it shapes.

create repo shapes

Validate that the repo is there by running list repo.

We have found it easiest to work in a team with the data in a bucket (on GCP for this demo). Note that when we try to copy from our demo-data bucket, Pachyderm will report the name of the service account that needs permissions.

put file shapes@master -r -f gs://pachyderm_demo/shapes/

You will need to give this service account Storage Object Viewer permissions, and then repeat the command above.
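One way to grant that permission from the command line is `gsutil iam ch`. The sketch below only builds and prints the commands, so you can check them before running anything. The service account name is a placeholder (use the one Pachyderm reports), and the bucket name is taken from the put file command above. The `-d` variant removes the same binding, which is useful once you are done with the demo.

```shell
# Placeholder values: substitute the service account Pachyderm reports and your bucket.
SERVICE_ACCOUNT="pachyderm-hub-sa@example-project.iam.gserviceaccount.com"
BUCKET="gs://pachyderm_demo"

# Print the command that grants Storage Object Viewer on the bucket.
grant_cmd() {
  echo "gsutil iam ch serviceAccount:${SERVICE_ACCOUNT}:objectViewer ${BUCKET}"
}

# Print the command that removes the same binding again (-d deletes it).
revoke_cmd() {
  echo "gsutil iam ch -d serviceAccount:${SERVICE_ACCOUNT}:objectViewer ${BUCKET}"
}

grant_cmd
revoke_cmd
```

Copy the printed line into a terminal where `gsutil` is authenticated with an account that may change the bucket's IAM policy.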

Step 4 - Launch a pipeline

The key component of the pipeline is the container image that runs the code we need. For this demo we made the image public; in a production environment you will need to set up some additional permissions.
Let's put the first pipeline to work.

create pipeline -f pipelines/pipeline_separate.json
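The contents of pipelines/pipeline_separate.json are not reproduced in this README, so the following is only a sketch of what a spec for such a pipeline might look like. The pipeline name, image and input repo come from this walkthrough; the R script path and the glob pattern are assumptions. Pachyderm mounts the input repo under /pfs/shapes inside the container and collects whatever the code writes to /pfs/out.

```shell
# Write a hypothetical pipeline spec to a scratch file for inspection.
SPEC_FILE="${TMPDIR:-/tmp}/pipeline_separate_example.json"

cat > "$SPEC_FILE" <<'EOF'
{
  "pipeline": { "name": "separate_shape" },
  "transform": {
    "image": "gcr.io/ixplaza/spatial_demo",
    "cmd": ["Rscript", "/R/separate.R"]
  },
  "input": {
    "pfs": { "repo": "shapes", "glob": "/" }
  }
}
EOF

# Show what was written; compare it with the real file in the pipelines directory.
cat "$SPEC_FILE"
```

A glob of `/` treats the whole repo as a single datum, which seems a reasonable guess for a step that splits one set of shapefiles; the real spec may differ.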

We can see what is happening now on the Pachyderm Dashboard or on the command line with:

logs --pipeline=separate_shape
inspect job <job_id>    # get <job_id> with list job

Once the separate_shape pipeline has finished, we will use the playas.rds file for each of the segments.

list file separate_shape@master:/shapes_segmentos

Next we cross two input repositories so that a single pipeline can read from more than one. Pachyderm combines each datum in one repo with every datum in the other, so all beaches are available for each segment datum. To set this up, we create a new repo called beaches and copy the playas.rds file into it.

create repo beaches
get file separate_shape@master:/shapes_segmentos/playas/playas.rds -o playas.rds
put file beaches@master -f playas.rds

On your local file system you can now remove the file we downloaded from the cluster:

rm playas.rds

We can now start the pipeline that calculates the distances:

create pipeline -f pipelines/pipeline_distances.json
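To make the cross described above concrete, the loop below enumerates the datum pairs that a cross of two repos produces. In a pipeline spec this is expressed with a `cross` input listing both repos. The segment names are invented for illustration; the real repos hold .rds files.

```shell
# Hypothetical datum names standing in for the files in each repo.
segments="segment_a segment_b segment_c"
beaches="playas.rds"

pairs=0
for s in $segments; do
  for b in $beaches; do
    # Each (segment, beaches) pair is processed as one datum.
    echo "datum: ($s, $b)"
    pairs=$((pairs + 1))
  done
done
echo "total datums: $pairs"   # 3 segments x 1 beaches file = 3
```

Because the beaches repo holds a single file, the cross yields one datum per segment, and each of them sees the full set of beaches.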

We can check how the pipeline run went by looking at the dashboard or by inspecting the job. We get the job ID with list job and then run inspect job <job_id>.

And now we run the last pipeline, which joins all the files into one.

create pipeline -f pipelines/pipeline_join.json

When this is done, we can download the final file to our local machine. This is the output file of the pipeline that joins the segments.

get file join_segments@master:/segmentos_playas.rds -o final_result.rds
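Once the file is on your machine, a quick look at its structure confirms that the download worked. The helper below is a sketch that assumes `Rscript` is installed locally; the function name `inspect_result` is made up for this example.

```shell
# Print the structure of an .rds file, defaulting to the file downloaded above.
inspect_result() {
  file="${1:-final_result.rds}"
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  # The file path is passed as a trailing argument to the R expression.
  Rscript -e 'x <- readRDS(commandArgs(trailingOnly = TRUE)[1]); str(x)' "$file"
}
```

Run `inspect_result` in the directory where you saved final_result.rds, from a regular terminal rather than inside pachctl shell.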

NOTE: We suggest you remove any access you may have given the Pachyderm Hub service account to any buckets you have used.

What if something goes wrong

Can we see whether any of the datums failed? Yes, we can ask for a list of datums and check their status. If you do this on the command line, you can grep (search) through the list to narrow it down.

list datum <job_id>
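The grep step mentioned above can be wrapped in a small filter. This is a sketch: the helper name `failed_datums` is invented, and the sample lines only imitate the listing so the filter can be tried without a cluster. The real column layout may differ.

```shell
# Keep only the lines that mention a failed status (case-insensitive).
failed_datums() {
  grep -i 'failed' || true   # do not treat "no matches" as an error
}

# Illustrative sample only, not real pachctl output.
sample='ID STATUS TIME
aaa success 2s
bbb failed 1s'

result=$(printf '%s\n' "$sample" | failed_datums)
echo "$result"
```

From a regular terminal the same filter would be used as `pachctl list datum <job_id> | failed_datums`.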
