### Create a new subscription or use an existing subscription

Follow [the instructions](https://cloud.google.com/pubsub/lite/docs/quickstart#create_a_lite_subscription) to create a new subscription, or use an existing one. If you use an existing subscription, the connector reads messages starting from the oldest unacknowledged message.
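The connector identifies the subscription by its full path, which is the value the `pubsublite.subscription` option expects. The sketch below assembles that path in shell, assuming the Pub/Sub Lite path format `projects/<project number>/locations/<zone>/subscriptions/<subscription ID>`. The project number, zone, and subscription ID are hypothetical placeholders; substitute your own.

```shell
# Hypothetical values: replace with your own project number, zone and subscription ID.
PROJECT_NUMBER=123456789
ZONE=us-central1-a
SUBSCRIPTION_ID=my-lite-subscription

# Assumed Pub/Sub Lite path format, used as the value of pubsublite.subscription.
SUBSCRIPTION_PATH="projects/${PROJECT_NUMBER}/locations/${ZONE}/subscriptions/${SUBSCRIPTION_ID}"
echo "$SUBSCRIPTION_PATH"
```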
### Create a Google Cloud Dataproc cluster (Optional)

If you do not have an Apache Spark environment, you can create a Cloud Dataproc cluster with preconfigured authentication. The following examples assume you are using Cloud Dataproc, but you can use `spark-submit` on any cluster.

```shell
# Name of the Dataproc cluster to create.
MY_CLUSTER=...
# Region for the cluster; recent gcloud releases require --region
# unless the dataproc/region property is set.
MY_REGION=...
gcloud dataproc clusters create "$MY_CLUSTER" --region="$MY_REGION"
```
## Downloading and Using the Connector

<!--- TODO(jiangmichael): Add jar link for spark-pubsublite-latest.jar -->
The latest version of the connector (Scala 2.11) is publicly available in

Note that the connector supports both MicroBatch Processing and [Continuous Processing](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#continuous-processing).
### Properties

The connector supports the following options to configure the read:

| Option | Type | Required | Meaning |
| ------ | ---- | -------- | ------- |
| pubsublite.subscription | String | Y | Full subscription path that the connector reads from. |
| pubsublite.flowcontrol.byteoutstandingperpartition | Long | N | Maximum number of bytes per partition that are cached in workers before Spark processes the messages. Defaults to 50000000 bytes. |
| pubsublite.flowcontrol.messageoutstandingperpartition | Long | N | Maximum number of messages per partition that are cached in workers before Spark processes the messages. Defaults to Long.MAX_VALUE. |
| gcp.credentials.key | String | N | Service account JSON key, base64-encoded. Defaults to [Application Default Credentials](https://cloud.google.com/docs/authentication/production#automatically). |
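To supply explicit credentials through `gcp.credentials.key` instead of relying on Application Default Credentials, the service account JSON key has to be base64-encoded. A minimal sketch, assuming you have already downloaded a JSON key file for your service account; the `encode_credentials` function name is hypothetical:

```shell
# Hypothetical helper: print the base64 encoding of a service account JSON key
# file as a single line, suitable as the value of the gcp.credentials.key option.
encode_credentials() {
  local key_file="$1"
  # tr removes any line breaks the base64 implementation inserts into its output.
  base64 < "$key_file" | tr -d '\n'
}
```

For example, `GCP_CREDENTIALS_KEY="$(encode_credentials service-account-key.json)"` (the file name is a placeholder) produces a value you could pass as the `gcp.credentials.key` option. Removing newlines matters because some `base64` implementations, such as GNU coreutils, wrap their output at 76 columns by default.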
### Data Schema

The connector uses the following fixed data schema:

| Data Field | Spark Data Type | Notes |
| ---------- | --------------- | ----- |
| subscription | StringType | Full subscription path |