# Batch processing with Spark RDDs

This notebook shows how to run batch processing against a Seldon Core model using Spark RDDs.

Dependencies:

* Seldon Core installed as per the docs, with an ingress
* Spark and PySpark installed (the code below also imports `findspark`)


### Setup

#### Install Seldon Core
Use the setup notebook to [set up Seldon Core with Ambassador or Istio Ingress](https://docs.seldon.io/projects/seldon-core/en/latest/examples/seldon_core_setup.html).

Then deploy the example scikit-learn iris classifier as a `SeldonDeployment`:

In [2]:
%%bash
kubectl apply -f - << END
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: iris
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: gs://seldon-models/sklearn/iris
      name: classifier
    name: default
    replicas: 1
END

seldondeployment.machinelearning.seldon.io/sklearn created
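
Once the deployment is ready, the model can be queried over REST. The following is a minimal sketch, assuming the Seldon Core v1 REST protocol, an ingress reachable at `localhost:8003`, and the `default` namespace (both are assumptions; adjust them for your cluster). The helper names `prediction_url`, `build_payload`, and `predict` are hypothetical and used only for illustration.

```python
import json
import urllib.request

# Assumptions: the ingress address and namespace are placeholders for your cluster.
INGRESS = "localhost:8003"
NAMESPACE = "default"
DEPLOYMENT = "sklearn"  # metadata.name from the manifest above


def prediction_url(ingress=INGRESS, namespace=NAMESPACE, deployment=DEPLOYMENT):
    """Build the Seldon v1 REST prediction URL for a deployment."""
    return f"http://{ingress}/seldon/{namespace}/{deployment}/api/v1.0/predictions"


def build_payload(rows):
    """Wrap a list of feature rows in the Seldon `ndarray` request format."""
    return {"data": {"ndarray": [list(map(float, row)) for row in rows]}}


def predict(rows, url=None, timeout=10):
    """POST the rows to the model and return the decoded JSON response."""
    body = json.dumps(build_payload(rows)).encode("utf-8")
    request = urllib.request.Request(
        url or prediction_url(),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (needs a running cluster): one iris sample with four features.
# predict([[5.1, 3.5, 1.4, 0.2]])
```

Keeping URL and payload construction separate from the network call makes both easy to check without a cluster.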


In [None]:
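import findspark
findspark.init()  # locate the Spark installation and add PySpark to sys.path

import pyspark
import random

sc = pyspark.SparkContext(appName="seldon")
num_samples = 100000000

def inside(_):
    # Draw a random point in the unit square; True if it lies inside the quarter circle.
    x, y = random.random(), random.random()
    return x*x + y*y < 1

# Monte Carlo estimate of pi: the fraction of points inside the quarter circle approaches pi/4.
count = sc.parallelize(range(0, num_samples)).filter(inside).count()

pi = 4 * count / num_samples
print(pi)

sc.stop()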
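
The same RDD pattern can drive batch inference by scoring each partition in mini-batches. The following is a sketch under stated assumptions, not a definitive implementation: `predict_fn` is a hypothetical callable that takes a list of feature rows and returns one prediction per row (for example, a thin wrapper around a REST call to the `sklearn` deployment), and `chunked`, `score_partition`, and `score_rdd` are illustrative names.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def score_partition(rows, predict_fn, batch_size=100):
    """Score one partition in mini-batches, yielding (row, prediction) pairs."""
    for batch in chunked(rows, batch_size):
        predictions = predict_fn(batch)  # assumed: one prediction per input row
        yield from zip(batch, predictions)


def score_rdd(rdd, predict_fn, batch_size=100):
    """Apply score_partition to every partition of an RDD (lazy transformation)."""
    return rdd.mapPartitions(
        lambda rows: score_partition(rows, predict_fn, batch_size)
    )


# Usage with Spark (requires a live SparkContext `sc` and a real predict_fn):
# scored = score_rdd(sc.parallelize(samples, numSlices=8), predict_fn)
# scored.take(5)
```

Using `mapPartitions` rather than `map` means each partition sends a handful of batched requests instead of one request per row, which usually reduces network overhead.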