[DOC]Add a spout and service description (#3871)
* Add a spout and service description
* Added spouts description to the pipeline spec
svekars committed Jun 28, 2019
1 parent 0bc474d commit 2e0cd2a
Showing 2 changed files with 53 additions and 4 deletions.
21 changes: 21 additions & 0 deletions doc/fundamentals/spouts.md
@@ -282,3 +282,24 @@ setting it to `true` would be like having the `--overwrite` flag specified on ev
With the spec written, we would then use `pachctl create pipeline -f my-spout.json` to install the spout.
It would begin processing messages
and placing them in the `my-spout` repo.
## Combine a Spout with a Service
You can create a pipeline that acts as both a spout and a service.
To combine the two, add a `service` field to the `spout` section of
the pipeline specification:
```json
"spout": {
    "overwrite": false,
    "service": {
        "internal_port": 8200,
        "external_port": 31467,
        "annotations": {
            "foo": "bar"
        }
    }
}
```
This specification creates an endpoint that can read and serve
data from Pachyderm and write data back into a Pachyderm repository.
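
As a rough illustration of where the fragment above sits in a complete spec, the following Python sketch builds a spec dictionary and serializes it to JSON. The image name and command are hypothetical placeholders, and the `pipeline` and `transform` fields are assumed to follow the usual pipeline spec layout; only the `spout` section is taken from the example above.

```python
import json


def build_spout_service_spec(name, image, cmd, internal_port, external_port):
    """Return a pipeline spec dict with a service nested inside the spout."""
    return {
        "pipeline": {"name": name},
        "transform": {"image": image, "cmd": cmd},
        "spout": {
            "overwrite": False,
            # The service lives inside `spout`, not at the top level.
            "service": {
                "internal_port": internal_port,
                "external_port": external_port,
            },
        },
    }


spec = build_spout_service_spec(
    name="my-spout",
    image="example/my-spout:latest",   # hypothetical image name
    cmd=["python3", "/app/serve.py"],  # hypothetical command
    internal_port=8200,
    external_port=31467,
)

# Serialize the spec; the result could be saved as my-spout.json.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Saving the printed JSON as `my-spout.json` would give a file that can be passed to `pachctl create pipeline -f my-spout.json`.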
36 changes: 32 additions & 4 deletions doc/reference/pipeline_spec.md
@@ -75,6 +75,17 @@ create pipeline](../pachctl/pachctl_create_pipeline.html) doc.
"internal_port": int,
"external_port": int
},
"spout": {
"overwrite": bool,
// Optionally, you can combine a spout with a service:
"service": {
"internal_port": int,
"external_port": int,
"annotations": {
"foo": "bar"
}
}
},
"max_queue_size": int,
"chunk_spec": {
"number": int,
@@ -129,6 +140,8 @@ create pipeline](../pachctl/pachctl_create_pipeline.html) doc.
etc...
]



------------------------------------
"cron" input
------------------------------------
@@ -631,13 +644,28 @@ in the input repos.
`service` specifies that the pipeline should be treated as a long-running
service rather than a data transformation. This means that `transform.cmd` is
not expected to exit; if it does, it is restarted. Furthermore, the service
-will be exposed outside the container using a kubernetes service.
+is exposed outside the container using a Kubernetes service.
 `"internal_port"` should be a port that the user code binds to inside the
-container, `"external_port"` is the port on which it is exposed, via the
-NodePorts functionality of kubernetes services. After a service has been
-created you should be able to access it at
+container, `"external_port"` is the port on which it is exposed through the
+`NodePorts` functionality of Kubernetes services. After a service has been
+created, you should be able to access it at
`http://<kubernetes-host>:<external_port>`.
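
The URL pattern above can be sketched in Python as follows. This is a minimal illustration, not part of Pachyderm: the helper name `service_url` and the host address are hypothetical, and the range check assumes the cluster uses the default Kubernetes NodePort range of 30000 to 32767, which a cluster administrator may have changed.

```python
def service_url(host, external_port, node_port_range=(30000, 32767)):
    """Build the URL at which a service pipeline is expected to be reachable.

    node_port_range defaults to the Kubernetes default NodePort range;
    pass a different tuple if your cluster is configured otherwise.
    """
    low, high = node_port_range
    if not low <= external_port <= high:
        raise ValueError(
            f"external_port {external_port} is outside the NodePort range "
            f"{low}-{high}"
        )
    return f"http://{host}:{external_port}"


# 192.168.99.100 is a hypothetical Kubernetes host address.
print(service_url("192.168.99.100", 31467))
```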

### Spout (optional)

`spout` is a type of pipeline that processes streaming data.
Unlike a union or cross pipeline, a spout pipeline does not have
a PFS input. Instead, it opens a Linux *named pipe* into the source of the
streaming data. A pipeline can be either a spout or a service, but not both:
if you add `service` as a top-level object in your pipeline specification,
you cannot also add `spout`. However, you can expose a service from inside
a spout pipeline by specifying `service` as a field in the `spout` spec.
Kubernetes then creates a service endpoint that you can expose externally.
To get information about the service, run `kubectl get services`.

For more information, see [Spouts](../fundamentals/spouts.html).
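
The rule described in this section (a top-level `service` and a `spout` cannot coexist, but `service` may be nested inside `spout`) can be illustrated with a small check. This is only a sketch of the rule as stated here; `find_spec_conflicts` is a hypothetical helper, not a Pachyderm function, and the actual validation that `pachctl` performs may differ.

```python
def find_spec_conflicts(spec):
    """Return a list of messages describing spout/service conflicts in a spec."""
    conflicts = []
    # A top-level `service` and a `spout` are mutually exclusive.
    if "spout" in spec and "service" in spec:
        conflicts.append(
            "top-level `service` cannot be combined with `spout`; "
            "nest `service` inside `spout` instead"
        )
    return conflicts


# Allowed: the service is a field of the spout.
nested = {
    "pipeline": {"name": "my-spout"},
    "spout": {
        "overwrite": False,
        "service": {"internal_port": 8200, "external_port": 31467},
    },
}

# Not allowed: `service` and `spout` are both top-level objects.
top_level = {
    "pipeline": {"name": "my-spout"},
    "service": {"internal_port": 8200, "external_port": 31467},
    "spout": {"overwrite": False},
}

print(find_spec_conflicts(nested))
print(find_spec_conflicts(top_level))
```

The first call prints an empty list, and the second prints a list with one conflict message.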

### Max Queue Size (optional)
`max_queue_size` specifies the maximum number of datums that a worker should
hold in its processing queue at a given time (after processing its entire