[DOC] Add a spout and service description (#3871)

* Add a spout and service description
* Added spouts description to the pipeline spec
pachyderm · Jun 28, 2019 · 2e0cd2a
1 parent 0bc474d
Showing 2 changed files with 53 additions and 4 deletions.
diff --git a/doc/fundamentals/spouts.md b/doc/fundamentals/spouts.md
@@ -282,3 +282,24 @@ setting it to `true` would be like having the `--overwrite` flag specified on ev
 With the spec written, we would then use `pachctl create pipeline -f my-spout.json` to install the spout.
 It would begin processing messages
 and placing them in the `my-spout` repo.
+
+## Combine a Spout with a Service
+
+You can create a pipeline that acts as both a spout and a service.
+To combine these two functionalities, add a `service` object inside
+the `spout` object of the pipeline specification:
+
+```json
+"spout": {
+    "overwrite": false,
+    "service": {
+        "internal_port": 8200,
+        "external_port": 31467,
+        "annotations": {
+            "foo": "bar"
+        }
+    }
+}
+```
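+This specification creates a Kubernetes service endpoint that is reachable
+on `external_port` (`31467` in this example) and forwards requests to
+`internal_port` (`8200`) inside the spout container. Through this endpoint,
+your code can read and serve data from Pachyderm and write data back into
+a Pachyderm repository.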
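The following is a minimal sketch of user code for a spout pipeline that also runs a service, using the `internal_port` of `8200` from the example spec above. It assumes that the spout's user code delivers data by writing a tar stream to the named pipe `/pfs/out`. The `SpoutHandler` class, the `build_tar` helper, and the one-file-per-POST behavior are hypothetical illustrations, not part of Pachyderm.

```python
import io
import tarfile
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: the spout's output is the named pipe /pfs/out, which expects tar data.
SPOUT_OUTPUT = "/pfs/out"
# Must match "internal_port" in the "service" object nested inside "spout".
INTERNAL_PORT = 8200


def build_tar(name: str, data: bytes) -> bytes:
    """Pack a single file into an in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        info.mtime = int(time.time())
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


class SpoutHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: each POST body becomes one file in the spout's repo."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Use a timestamp-based name so concurrent records do not collide.
        name = f"record-{time.time_ns()}"
        # Write one tar stream per request to the named pipe.
        with open(SPOUT_OUTPUT, "wb") as pipe:
            pipe.write(build_tar(name, body))
        self.send_response(201)
        self.end_headers()


def main() -> None:
    """Entry point for transform.cmd: serve on the internal port and never exit."""
    HTTPServer(("", INTERNAL_PORT), SpoutHandler).serve_forever()
```

With the example spec, a client outside the cluster could then send data with something like `curl --data-binary @event.json http://<kubernetes-host>:31467/`, and each request would land as a separate file in the spout's output repository.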
diff --git a/doc/reference/pipeline_spec.md b/doc/reference/pipeline_spec.md
@@ -75,6 +75,17 @@ create pipeline](../pachctl/pachctl_create_pipeline.html) doc.
     "internal_port": int,
     "external_port": int
   },
+  "spout": {
+    "overwrite": bool,
+    // Optionally, you can combine a spout with a service:
+    "service": {
+      "internal_port": int,
+      "external_port": int,
+      "annotations": {
+        "foo": "bar"
+      }
+    }
+  },
   "max_queue_size": int,
   "chunk_spec": {
     "number": int,
@@ -129,6 +140,8 @@ create pipeline](../pachctl/pachctl_create_pipeline.html) doc.
   etc...
 ]
 
+
+
 ------------------------------------
 "cron" input
 ------------------------------------
@@ -631,13 +644,28 @@ in the input repos.
 `service` specifies that the pipeline should be treated as a long running
 service rather than a data transformation. This means that `transform.cmd` is
 not expected to exit; if it does, it is restarted. Furthermore, the service
-will be exposed outside the container using a kubernetes service.
+is exposed outside the container using a Kubernetes service.
 `"internal_port"` should be a port that the user code binds to inside the
-container, `"external_port"` is the port on which it is exposed, via the
-NodePorts functionality of kubernetes services. After a service has been
-created you should be able to access it at
+container, and `"external_port"` is the port on which it is exposed through the
+`NodePort` functionality of Kubernetes services. After a service has been
+created, you should be able to access it at
 `http://<kubernetes-host>:<external_port>`.
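As a minimal sketch, assuming the pipeline spec has been loaded from JSON into a Python dictionary, the hypothetical helper below builds the access URL from a top-level `service` object. The `service_url` name, the pipeline name, the ports, and the host address are illustrative only; the user code would bind to `internal_port` inside the container, while clients use `external_port`.

```python
import json


def service_url(spec: dict, kubernetes_host: str) -> str:
    """Return http://<kubernetes-host>:<external_port> for a top-level service."""
    service = spec.get("service")
    if service is None:
        raise ValueError("pipeline spec has no top-level 'service' object")
    return f"http://{kubernetes_host}:{service['external_port']}"


# Hypothetical spec: user code binds to port 8000; NodePort exposes it on 30080.
spec = json.loads("""
{
  "pipeline": {"name": "my-service"},
  "service": {"internal_port": 8000, "external_port": 30080}
}
""")
print(service_url(spec, "192.168.99.100"))  # prints http://192.168.99.100:30080
```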
 
+### Spout (optional)
+
+`spout` is a type of pipeline that processes streaming data.
+Unlike a union or cross pipeline, a spout pipeline does not have
+a PFS input. Instead, it opens a Linux *named pipe* into the source of the
+streaming data. A pipeline
+can be either a spout or a service, but not both. Therefore, if you add
+`service` as a top-level object in your pipeline, you cannot add `spout`.
+However, you can expose a service from inside a spout pipeline by
+specifying it as a field in the `spout` spec. Kubernetes then creates
+a service endpoint that you can expose externally. To get information
+about the service, run `kubectl get services`.
+
+For more information, see [Spouts](../fundamentals/spouts.html).
+
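The rule above (a top-level `service` and a `spout` are mutually exclusive, but a `service` may be nested inside `spout`) can be expressed as a small check. This is a hypothetical sketch that mirrors only the stated rule; it is not Pachyderm's own validation code, and `find_service` is an illustrative name.

```python
from typing import Optional


def find_service(spec: dict) -> Optional[dict]:
    """Return the service config of a pipeline spec, enforcing the spout/service rule.

    Raises ValueError if both a top-level "service" and a "spout" are present.
    Returns the nested service for a spout pipeline, the top-level service
    otherwise, or None if the pipeline exposes no service.
    """
    top_level = spec.get("service")
    spout = spec.get("spout")
    if top_level is not None and spout is not None:
        raise ValueError("a pipeline can be a spout or a service, but not both")
    if spout is not None:
        # A spout may optionally expose a service through its nested field.
        return spout.get("service")
    return top_level
```

For example, the spout spec from the Spouts page, `{"spout": {"overwrite": false, "service": {...}}}`, passes the check and returns the nested `service` object, while a spec with both top-level keys is rejected.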
 ### Max Queue Size (optional)
 `max_queue_size` specifies the maximum number of datums that a worker should
 hold in its processing queue at a given time (after processing its entire