# k8s StreamSets Deployment on HPE Container Platform
![image](images/maprstreamsetlogo.png)

StreamSets provides Continuous Ingest technology for the next generation of big data applications. Its enterprise-grade infrastructure accelerates time-to-analysis by bringing unprecedented transparency and event processing to data in motion.
For more information, visit http://www.streamsets.com/




The following tutorial describes how to use StreamSets on the HPE Container Platform for data integration between external data sources and the integrated HPE Data Fabric.

The Data Fabric provides multiple data services:

* File Data Service
  Store or read files in different data formats using the MapR-FS connector.

* Event Store Service
  Integrate data from streaming sources, as well as change data capture (CDC), using the MapR-Streams connector.

* Document Database Service
  Create, read, update, and delete records in the integrated document database using MapR-DB JSON for operational analytics.

* Wide Column Store Database
  Create, read, update, and delete records in the integrated wide-column store using MapR-DB Binary (HBase).

![image](images/streamsetspipeline.png)


Get a list of supported MapR (HPE Data  

## Prepare your Environment

### List the available Container Platform Clusters in your Environment

In [1]:
kubectx

development


### Set the Cluster that you want to use for Application Deployment

In [None]:
kubectx development

### Set the tenant where you want to deploy the K8S Application

In [5]:
kubens researchgroup1

Context "development" modified.
Active namespace is "researchgroup1".
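
If `kubectx` and `kubens` are not installed, the same two steps can be done with plain `kubectl config`. The following is a minimal sketch; the function names `use_cluster` and `use_tenant` are hypothetical helpers introduced here for illustration, not part of the original notebook.

```shell
# Switch the active kubeconfig context (the effect of `kubectx <name>`)
use_cluster() {
  kubectl config use-context "$1"
}

# Set the default namespace of the current context (the effect of `kubens <name>`)
use_tenant() {
  kubectl config set-context --current --namespace="$1"
}
```

With the names used in this tutorial, the calls would be `use_cluster development` followed by `use_tenant researchgroup1`.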


### Set up a Service Account for the K8S Application
You must create a Secret that contains the MapR ticket (maprticket) so that the application can authenticate with the internal Data Fabric.

Note: The following procedure describes how to get the admin ticket for the internal Data Fabric. Use it only for demos, never for a production installation.

You can get the Data Fabric admin ticket by running the following command on the HPE Container Platform controller node: `echo -n $(cat /opt/bluedata/mapr/conf/mapruserticket ) | base64 -w 0`

Copy the base64-encoded string into the CONTAINER_TICKET field below.
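
Before pasting the encoded ticket, it can help to confirm that the encoding step produces a single unwrapped line that decodes back to the original text. The sketch below assumes GNU coreutils `base64` (for `-w 0` and `-d`) and uses a temporary file with dummy content as a stand-in for the real ticket file. It uses `printf '%s'` in place of `echo -n`, which should behave the same for a single-line ticket.

```shell
# Stand-in for /opt/bluedata/mapr/conf/mapruserticket (dummy content, not a real ticket)
ticket_file="$(mktemp)"
printf 'demo.cluster.name dummy-ticket-payload\n' > "$ticket_file"

# Encode without the trailing newline and without line wrapping
encoded="$(printf '%s' "$(cat "$ticket_file")" | base64 -w 0)"

# Decode again to confirm the value survives the round trip
decoded="$(printf '%s' "$encoded" | base64 -d)"

echo "encoded: $encoded"
echo "decoded: $decoded"

rm -f "$ticket_file"
```

If the decoded line differs from the file content, the value pasted into the Secret would not be a usable ticket either.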

In [14]:
cat << 'EOF' | kubectl create -f -
# MapR Apps ServiceAccount
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: streamset-app-sa

# Ticket to authenticate with HPE Container Platform internal Data Fabric
---
apiVersion: v1
kind: Secret
metadata:
  name: maprticket
type: Opaque
data:
# Note: for testing only, you can generate this value on the controller node with: echo -n $(cat /opt/bluedata/mapr/conf/mapruserticket ) | base64 -w 0

  CONTAINER_TICKET: aGNwLm1hcHIuY2x1c3RlciBLTzBnRTRRKzQ3WnFhdy9URDQ2WGlqSzhPMHdGOW4waEY1NEh3WEVQOEgyZ3RBZHhYMHluVXFSYnMrSGVDOTBpRzk0MW9hYkw3dG5jRDh2ekJyQnY1YmJRb1Bwbk9PdnNUMC95bWw5TGhGOER6SUNkL05Kakl3LzRKRzM2MlN0Q2I0NVdLL0FTK1NIdHdFenVRU011SkRDRTY1L1pDTVNDMWN6UWFBSXNFRWVqaVJvR2pxK2hkQVFQN2duSmRJVGM3TEY4Q1BCWW8xdHU1NHltVDJqbGEySDA4YldENDVKVUdHNWh1Q0U1NGhxc1dPRzB2UUliNEJpRUtiakNBd3B0c3g4V1dud1h3bHIxcWNWdE5uUmpmMy9icE1PamhFeWgzWEhs
---
EOF

serviceaccount/streamset-app-sa created
secret/maprticket created
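
To confirm that the Secret holds the intended ticket, the stored value can be read back and decoded. This is a minimal sketch; `secret_value` is a hypothetical helper name, and it assumes GNU `base64 -d` is available on the machine running `kubectl`.

```shell
# Print the decoded value of one key of a Secret in the active namespace
# $1 = Secret name, $2 = key under .data
secret_value() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 -d
}
```

For this tutorial the call would be `secret_value maprticket CONTAINER_TICKET`. In the example ticket used above, the decoded text starts with the cluster name (`hcp.mapr.cluster`), which gives a quick sanity check. Avoid printing real tickets in shared terminals or logs.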


### Set up a Persistent Volume Claim for the StreamSets Data Collector application

In [24]:
cat << 'EOF' | kubectl create -f -

# Create PVC for internal Data Fabric
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-streamset-app
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 5G
    
--- 
EOF

persistentvolumeclaim/pvc-streamset-app created
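
A created claim is not necessarily bound yet. The sketch below polls the claim's phase until it reports `Bound`; the helper names `pvc_phase` and `wait_for_pvc` and the 2-second polling interval are assumptions made for illustration. Note that with a storage class that uses `WaitForFirstConsumer` binding, the claim stays `Pending` until a pod mounts it, so a timeout here is not always an error.

```shell
# Print the phase of a PersistentVolumeClaim (for example Pending or Bound)
pvc_phase() {
  kubectl get pvc "$1" -o 'jsonpath={.status.phase}'
}

# Poll until the claim is Bound; $2 = maximum number of attempts (default 30)
wait_for_pvc() {
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    if [ "$(pvc_phase "$1")" = "Bound" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}
```

For the claim created above, the call would be `wait_for_pvc pvc-streamset-app`.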


## Create the Deployment YAML for the StreamSets Data Collector container
Modify the environment variables to match your HPE Container Platform Data Fabric settings.

In [37]:
cat << 'EOF' | kubectl create -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: streamset-app
spec:
  selector:
    matchLabels:
      app: streamset-app
      tier: frontend
  template:
    metadata:
      labels:
        app: streamset-app
        tier: frontend
    spec:
     
      containers:
        - name: streamset-app
          image: mkieboom/mapr-pacc-streamsets-docker:streamsets381_mapr610_mep600
          ports:
            - containerPort: 18630
              name: streamset-port
          securityContext:
  
          command: ["/bin/sh","-c"]
          args: ["sleep 2 && /opt/mapr/installer/docker/mapr-setup.sh container && sudo -E -u $SDC_USER $SDC_DIST/bin/streamsets dc"]
              
          volumeMounts:

          - name: maprticket-volume
            mountPath: "/tmp/ticket/"
            
          - name: streamset-data
            mountPath: "/data"
            
          - mountPath: /mapr/hcp.mapr.cluster
            name: maprvolume
            
          env:  
           - name: MAPR_CLUSTER
             value: "hcp.mapr.cluster"  ##### <----- Enter your MapR cluster name
           - name: MAPR_CLDB_HOSTS
             value: "10.1.0.195"        ##### <----- Enter your MapR CLDB nodes; to list them, run on the controller: bdmapr maprcli node listcldbs
           - name: MAPR_CONTAINER_USER
             value: "mapr"              ##### <----- Enter your MapR cluster user
           - name: MAPR_CONTAINER_GROUP
             value: "mapr"              ##### <----- Enter your MapR cluster user's group
           - name: MAPR_CONTAINER_UID
             value: "5000"              ##### <----- Enter your MapR user UID
           - name: MAPR_CONTAINER_GID
             value: "5000"              ##### <----- Enter your MapR user GID
           - name: SDC_JAVA_OPTS
             value: "-Dmaprlogin.password.enabled=true"
           - name: MAPR_TICKETFILE_LOCATION
             value: "/tmp/ticket/CONTAINER_TICKET"

      volumes:
          
      - name: maprticket-volume
        secret:
          secretName: maprticket
          
      - name: streamset-data
        hostPath:
          path: /tmp
          type: DirectoryOrCreate
          
      - name: maprvolume
        persistentVolumeClaim:
          claimName: pvc-streamset-app
          
---

apiVersion: v1
kind: Service
metadata:
  name: streamset-app 
spec:
  selector: 
    app: streamset-app
  ports:
  - name: http-streamset
    protocol: TCP
    port: 18630
    targetPort: 18630
  type: NodePort
          
          
EOF

deployment.apps/streamset-app created
service/streamset-app created
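
The `created` messages only confirm that the objects were accepted, not that the container has started. One way to block until the Deployment is actually available is `kubectl rollout status`. The helper name `wait_for_rollout` and the 180-second default timeout are assumptions in this sketch.

```shell
# Wait until the Deployment reports all replicas available
# $1 = Deployment name, $2 = timeout (default 180s)
wait_for_rollout() {
  kubectl rollout status "deployment/$1" --timeout="${2:-180s}"
}
```

For this tutorial the call would be `wait_for_rollout streamset-app`. A non-zero exit status means the rollout did not complete within the timeout.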


### Check if the StreamSets Data Collector pod is running

In [48]:
kubectl get pods -l app=streamset-app

NAME                             READY   STATUS    RESTARTS   AGE
streamset-app-6b65bc99c6-k2sh4   1/1     Running   0          3m47s
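
If the pod is not `Running`, or restarts repeatedly, the container log usually shows whether the MapR client setup or the Data Collector start failed. A minimal sketch, with `sdc_logs` as a hypothetical helper name and 50 lines as an arbitrary default:

```shell
# Show the most recent log lines of all pods carrying the given app label
# $1 = value of the "app" label, $2 = number of lines (default 50)
sdc_logs() {
  kubectl logs -l "app=$1" --tail="${2:-50}"
}
```

For the Deployment above, the call would be `sdc_logs streamset-app`.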


### Check the Service port the StreamSets Data Collector pod is mapped to (annotation link for the HPE gateway)

In [50]:
kubectl describe svc streamset-app -n researchgroup1

Name:                     streamset-app
Namespace:                researchgroup1
Labels:                   hpecp.hpe.com/hpecp-internal-gateway=true
Annotations:              hpecp-internal-gateway/18630: ec2-15-236-36-5.eu-west-3.compute.amazonaws.com:10022
Selector:                 app=streamset-app
Type:                     NodePort
IP:                       10.96.62.3
Port:                     http-streamset  18630/TCP
TargetPort:               18630/TCP
NodePort:                 http-streamset  32165/TCP
Endpoints:                10.192.2.112:18630
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason  Age    From         Message
  ----    ------  ----   ----         -------
  Normal  HpeCp   4m58s  hpecp-agent  Created HPECP K8S service
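
The two values of interest in this output are the gateway host and port in the `hpecp-internal-gateway/18630` annotation and the NodePort. The sketch below pulls them out for use in scripts. The helper names are hypothetical, and the `awk` filter assumes the annotation is printed in the same layout as in the output shown here.

```shell
# Read `kubectl describe svc` output on stdin and print the gateway host:port
gateway_endpoint() {
  awk '/hpecp-internal-gateway\/[0-9]+:/ { print $NF }'
}

# Print the NodePort of the first port of a Service
node_port() {
  kubectl get svc "$1" -o 'jsonpath={.spec.ports[0].nodePort}'
}
```

For this tutorial, `kubectl describe svc streamset-app -n researchgroup1 | gateway_endpoint` would print the gateway address to open in a browser, and `node_port streamset-app` would print the NodePort.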


# Access the StreamSets UI from Container Platform
![image](images/ContainerPlatform_Service.png)
# Login Screen
* Default user: admin
* Default password: admin

![image](images/streamset-adminui.png)