## Spark Job with Spark Operator
This example uses the Spark operator to run a Spark job on Kubernetes (k8s).
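
The function name passed to `mlrun.new_function` in the cell below must follow the k8s naming convention (no capital letters). As a rough pre-flight check, the sketch below tests a candidate name against the Kubernetes DNS-1123 label rules (lowercase alphanumerics and `-`, starting and ending with an alphanumeric, at most 63 characters). The helper name `is_valid_k8s_name` is hypothetical and not part of MLRun, and MLRun may enforce additional or stricter rules of its own.

```python
import re

# DNS-1123 label pattern: lowercase alphanumerics or '-',
# starting and ending with an alphanumeric character.
_DNS_1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_valid_k8s_name(name: str) -> bool:
    """Return True if `name` is a valid DNS-1123 label (hypothetical helper)."""
    return len(name) <= 63 and _DNS_1123_LABEL.fullmatch(name) is not None


print(is_valid_k8s_name("sparkreadcsv"))   # True
print(is_valid_k8s_name("SparkReadCSV"))   # False: contains capital letters
```

Lowercasing a name with `str.lower()` before passing it on is usually enough to fix the capital-letter case, but underscores and other characters still need to be replaced by hand.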

In [1]:
import mlrun
import os

# Set up a new Spark function that runs on the Spark operator.
# `command` points to the Spark script, which must be located on the file system.
# `name` must not contain capital letters (k8s naming convention).
sj = mlrun.new_function(kind='spark', command='/User/sparkreadCSV.py', name='sparkreadcsv')

# Set the Spark driver config (gpu_type and gpus=<number_of_gpus> are supported too)
sj.with_driver_limits(cpu="1300m")
sj.with_driver_requests(cpu=1, mem="512m")

# Set the Spark executor config (gpu_type and gpus=<number_of_gpus> are supported too)
sj.with_executor_limits(cpu="1400m")
sj.with_executor_requests(cpu=1, mem="512m")

# Add FUSE, daemon, and Iguazio JAR support
sj.with_igz_spark()

# Arguments for the Spark script are also supported
sj.spec.args = ['-spark.eventLog.enabled', 'true']

# Install an additional Python module during the image build
sj.spec.build.commands = ['pip install matplotlib']

# Number of executors
sj.spec.replicas = 2 

# Rebuild the image with MLRun; this is needed to support artifact logging, etc.
sj.deploy()

# Run the task, setting the artifact path where the run artifacts (if any) are saved
sj.run(artifact_path='/User')

> 2020-12-21 14:07:14,144 [info] starting remote build, image: .mlrun/func-default-sparkreadcsv-latest
[36mINFO[0m[0020] Retrieving image manifest datanode-registry.iguazio-platform.app.hsbctesting3.iguazio-cd0.com:80/iguazio/spark-app:3.0_katyak_debug_b1089_20201214154653 
[36mINFO[0m[0020] Retrieving image manifest datanode-registry.iguazio-platform.app.hsbctesting3.iguazio-cd0.com:80/iguazio/spark-app:3.0_katyak_debug_b1089_20201214154653 
[36mINFO[0m[0020] Built cross stage deps: map[]                
[36mINFO[0m[0020] Retrieving image manifest datanode-registry.iguazio-platform.app.hsbctesting3.iguazio-cd0.com:80/iguazio/spark-app:3.0_katyak_debug_b1089_20201214154653 
[36mINFO[0m[0020] Retrieving image manifest datanode-registry.iguazio-platform.app.hsbctesting3.iguazio-cd0.com:80/iguazio/spark-app:3.0_katyak_debug_b1089_20201214154653 
[36mINFO[0m[0020] Executing 0 build triggers                   
[36mINFO[0m[0020] Unpacking rootfs as cmd RUN pip install matplotli

| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...7c4b8ef6 | 0 | Dec 21 14:10:33 | completed | sparkreadcsv | v3io_user=admin, kind=spark, owner=admin, mlrun/job=sparkreadcsv-d00ee42e, host=sparkreadcsv-d00ee42e-driver | | | | df_sample |


to track results use .show() or .logs() or in CLI: 
!mlrun get run ddb40aabe2ca48a78b9b09437c4b8ef6 --project default , !mlrun logs ddb40aabe2ca48a78b9b09437c4b8ef6 --project default
> 2020-12-21 14:11:11,513 [info] run executed, status=completed


<mlrun.model.RunObject at 0x7ff9a28e1790>
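
The driver and executor settings in the cell above mix two ways of writing CPU amounts: millicore strings such as `"1300m"` for limits and whole cores such as `1` for requests. The sketch below converts both forms to integer millicores so that a request can be compared with its limit before submitting the job. It assumes Kubernetes-style CPU quantities; the helper names `cpu_to_millicores` and `request_fits_limit` are hypothetical and not part of MLRun or the Spark operator.

```python
from decimal import Decimal


def cpu_to_millicores(cpu) -> int:
    """Convert a CPU quantity such as "1300m", "0.5", or 1 to integer millicores."""
    text = str(cpu).strip()
    if text.endswith("m"):
        # Already expressed in millicores, e.g. "1300m" is 1300 millicores.
        return int(text[:-1])
    # Whole or fractional cores, e.g. 1 is 1000 millicores.
    return int(Decimal(text) * 1000)


def request_fits_limit(request, limit) -> bool:
    """Return True if the requested CPU does not exceed the CPU limit."""
    return cpu_to_millicores(request) <= cpu_to_millicores(limit)


# Values taken from the driver and executor settings in the cell above.
print(request_fits_limit(1, "1300m"))  # True: 1000 <= 1300
print(request_fits_limit(1, "1400m"))  # True: 1000 <= 1400
```

Using integers rather than floats avoids rounding surprises when comparing values such as `"0.1"` and `"100m"`.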