# How to execute large jobs: using batch jobs

Most simple, basic openEO usage examples download results synchronously: you submit a process graph with an HTTP POST request and receive the result as the direct response to that same request. This only works well if the processing does not take too long (a few seconds, or a couple of minutes at most).

For heavier work (larger regions of interest, longer time series, more intensive processing, …) you have to use batch jobs.

This notebook shows how to programmatically create and interact with batch jobs using the openEO Python client library.

```python
import openeo

# Connect to the back-end and authenticate
connection = openeo.connect(url="openeo-staging.creo.vito.be")
connection.authenticate_oidc()
```

```
Authenticated using refresh token.
<Connection to 'https://openeo-staging.creo.vito.be/openeo/1.1/' with OidcBearerAuth>
```

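```python
# Load your data collection
cube = connection.load_collection(
    "SENTINEL2_L2A",
    bands=["B04", "B03", "B02"],
    temporal_extent=("2022-05-01", "2022-05-30"),
    spatial_extent={
        "west": 3.202609, "south": 51.189474,
        "east": 3.254708, "north": 51.204641,
        "crs": "EPSG:4326",
    },
    max_cloud_cover=80,
)

# Optional: reduce the time dimension to a maximum composite.
# DataCube methods return a new cube, so the result must be assigned to take
# effect; it is not applied here, which matches the per-date results below.
# cube = cube.max_time()
```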

```python
# Store the raster data as GeoTIFF files
cube = cube.save_result(format="GTiff")
```

While not required, it is recommended to give your batch job a descriptive title, so it is easier to identify in your job listing.

```python
job = cube.create_job(title="Testing batch job in openeo")
```

The object returned by `create_job()` is a `BatchJob`: a client-side reference to a batch job that exists on the back-end, which lets you interact with that job. Creating the job does not start it yet.

Starting a batch job is straightforward with `start_job()`. If you also want to monitor it, use `start_and_wait()` instead: it starts the job, polls its status until the job ends, and prints progress messages along the way.

```python
# start_and_wait() blocks until the job ends and returns the same BatchJob object
result = job.start_and_wait()
```

```
0:00:00 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': send 'start'
0:00:11 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': created (progress N/A)
0:00:16 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': created (progress N/A)
0:00:23 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': created (progress N/A)
0:00:31 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': created (progress N/A)
0:00:41 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': created (progress N/A)
0:00:53 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': running (progress N/A)
0:01:08 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': running (progress N/A)
0:01:27 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': running (progress N/A)
0:01:51 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': running (progress N/A)
0:02:21 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': running (progress N/A)
0:02:59 Job 'j-aae1404e0e27402f9ffe6a3a1bc28c77': finished (progress N/A)
```


A batch job on a back-end is fully identified by its job ID.

```python
job.job_id
```

```
'j-aae1404e0e27402f9ffe6a3a1bc28c77'
```

Depending on your use case, make sure to take note of the batch job ID. It allows you to “reconnect” to your job on the back-end, even if it was created at another time, by another script or notebook, or even with another openEO client: use `Connection.job("your job id")` to create a `BatchJob` object for an existing batch job.
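A minimal sketch of persisting the job ID so that a later session can reconnect, assuming a local JSON file is an acceptable place to keep it. The file name `batch_job.json` and the helpers `save_job_id`, `load_job_id` and `reconnect` are hypothetical; only `job.job_id` and `Connection.job()` come from the openEO client.

```python
import json
from pathlib import Path

# Hypothetical location for remembering the job ID between sessions.
JOB_FILE = Path("batch_job.json")


def save_job_id(job, path=JOB_FILE):
    """Write the ID of a BatchJob-like object (anything with `.job_id`) to a JSON file."""
    Path(path).write_text(json.dumps({"job_id": job.job_id}))


def load_job_id(path=JOB_FILE):
    """Read back a job ID previously stored with save_job_id()."""
    return json.loads(Path(path).read_text())["job_id"]


def reconnect(connection, path=JOB_FILE):
    """Recreate a client-side BatchJob reference with `Connection.job(job_id)`."""
    return connection.job(load_job_id(path))
```

For example, call `save_job_id(job)` right after `create_job()`, and `job = reconnect(connection)` in a later script or notebook once you have connected and authenticated again.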

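A batch job typically takes some time to finish. You can check its current status with the `status()` method, which returns a string such as `"created"`, `"queued"`, `"running"`, `"finished"`, `"error"` or `"canceled"`.

```python
job.status()
```

```
'finished'
```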
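`start_and_wait()` handles this polling for you. If you start a job with `start_job()` instead, a minimal polling loop on top of `status()` could look like the sketch below. The helper `wait_for_job`, its growing-interval strategy and its default values are assumptions for illustration (the progress log of `start_and_wait()` does suggest it also polls less often as time goes on); the `sleep` parameter only exists so the loop can be tested without real waiting.

```python
import time

# Terminal states of the openEO job status model;
# "created", "queued" and "running" are non-terminal.
TERMINAL_STATES = {"finished", "error", "canceled"}


def wait_for_job(job, initial_interval=5.0, max_interval=60.0, factor=1.25, sleep=time.sleep):
    """Poll `job.status()` until the job reaches a terminal state.

    The delay between polls grows geometrically (capped at `max_interval`),
    so a long-running job is not polled needlessly often.
    Returns the final status string.
    """
    interval = initial_interval
    while True:
        status = job.status()
        if status in TERMINAL_STATES:
            return status
        sleep(interval)
        interval = min(interval * factor, max_interval)
```

Usage: `job.start_job()` followed by `final = wait_for_job(job)`; if `final` is `"error"`, the job logs are the next place to look.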

Batch job logs can be fetched with `job.logs()`.

```python
job.logs()
```
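When a job fails, the log list can get long. A minimal sketch for extracting only the messages you care about, assuming (as in the openEO API log entry model) that each entry is dict-like with `level` and `message` keys. The helpers `log_messages` and `level_counts` are hypothetical, not part of the client library.

```python
from collections import Counter


def log_messages(logs, levels=("error",)):
    """Return the messages of the log entries whose level is in `levels`.

    Assumes each entry is dict-like with "level" and "message" keys,
    following the openEO API log entry model.
    """
    return [entry["message"] for entry in logs if entry.get("level") in levels]


def level_counts(logs):
    """Count log entries per level, e.g. to spot warnings at a glance."""
    return Counter(entry.get("level") for entry in logs)
```

For example, `print("\n".join(log_messages(job.logs())))` prints only the error messages of the job, if there are any.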

Once a batch job has finished, you can get a handle to its results (a single file or multiple files) and their metadata with `get_results()`.
The result metadata describes the spatio-temporal properties of the result and is in fact a valid STAC item.

```python
results = job.get_results()
results.get_metadata()
```

```
{'assets': {'openEO_2022-05-03Z.tif': {'eo:bands': [{'center_wavelength': 0.6645,
     'name': 'B04'},
    {'center_wavelength': 0.56, 'name': 'B03'},
    {'center_wavelength': 0.4966, 'name': 'B02'}],
   'file:nodata': [0],
   'href': 'https://openeo-staging.creo.vito.be/openeo/1.1/jobs/j-aae1404e0e27402f9ffe6a3a1bc28c77/results/assets/M2UyNGUyNTEtMmU5YS00MzhmLTkwYTktZDQ1MDBlNTc2NTc0/aff72f67c43208d79b576d8451f6bde7/openEO_2022-05-03Z.tif?expires=1684238747',
   'roles': ['data'],
   'title': 'openEO_2022-05-03Z.tif',
   'type': 'image/tiff; application=geotiff'},
  'openEO_2022-05-08Z.tif': {'eo:bands': [{'center_wavelength': 0.6645,
     'name': 'B04'},
    {'center_wavelength': 0.56, 'name': 'B03'},
    {'center_wavelength': 0.4966, 'name': 'B02'}],
   'file:nodata': [0],
   'href': 'https://openeo-staging.creo.vito.be/openeo/1.1/jobs/j-aae1404e0e27402f9ffe6a3a1bc28c77/results/assets/M2UyNGUyNTEtMmU5YS00MzhmLTkwYTktZDQ1MDBlNTc2NTc0/4c78b026d08dce60b943479e75e0e055/openEO_2022-05-08
```
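Because the metadata is a STAC item, its `assets` dictionary can be inspected with plain Python. A minimal sketch, assuming the structure visible in the metadata output above (`assets`, then asset name, then `href` and `eo:bands`); the helper `summarize_assets` is hypothetical. Note that the asset URLs carry an `expires` parameter, so they appear to be valid only for a limited time.

```python
def summarize_assets(metadata):
    """Map each asset name to its band names and download URL.

    `metadata` is the STAC-item-like dict returned by `results.get_metadata()`;
    assets without an "eo:bands" entry get an empty band list.
    """
    summary = {}
    for name, asset in metadata.get("assets", {}).items():
        bands = [band.get("name") for band in asset.get("eo:bands", [])]
        summary[name] = {"bands": bands, "href": asset.get("href")}
    return summary
```

Usage: `summarize_assets(results.get_metadata())` gives one entry per GeoTIFF, listing the bands `B04`, `B03`, `B02` for each date in this example.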

In the general case, with one or more result files (also called “assets”), the easiest way to download them is `download_files()` (plural). You just specify a download folder; if you omit it, the current working directory is used.

```python
results.download_files("data/out")
```
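If you only need some of the assets, you can download them individually instead. A minimal sketch, assuming the client's `JobResults.get_assets()` returns asset objects with a `name` attribute and a `download(target)` method (check the API reference of your client version); the helper `download_matching` and its glob-based filtering are illustrative, not part of the library.

```python
import fnmatch
from pathlib import Path


def download_matching(results, pattern, target_dir="."):
    """Download only the result assets whose name matches a glob `pattern`.

    Assumes `results` behaves like openEO's JobResults: `get_assets()` returns
    asset objects with a `.name` attribute and a `.download(target)` method.
    Returns the names of the downloaded assets.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    downloaded = []
    for asset in results.get_assets():
        if fnmatch.fnmatch(asset.name, pattern):
            asset.download(target / asset.name)
            downloaded.append(asset.name)
    return downloaded
```

For example, `download_matching(results, "openEO_2022-05-0*.tif", "data/out")` would fetch only the GeoTIFFs from the first days of May in this job.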