## Retrieving data from FDB

In [1]:
import earthkit.data

FDB (Fields DataBase) is a domain-specific object store developed at ECMWF for storing, indexing and retrieving GRIB data. For more information on FDB please consult the following pages:

- [FDB](https://fields-database.readthedocs.io/en/latest/)
- [pyfdb](https://pyfdb.readthedocs.io/en/latest/)

This example requires access to an FDB, and the <b>FDB_HOME</b> environment variable must be set correctly.
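
Before running the cells below, it can help to confirm that the variable is actually visible to the Python process. The following is a minimal sketch using only the standard library; the helper name `fdb_home_status` is hypothetical, and the check that the value points to an existing directory is an assumption about a typical setup. It does not verify that the FDB configuration itself is valid.

```python
import os


def fdb_home_status(env=None):
    """Return (ok, message) describing whether FDB_HOME looks usable.

    Hypothetical helper: it only checks that the variable is set and,
    as an assumption, that it points to an existing directory.
    """
    env = os.environ if env is None else env
    path = env.get("FDB_HOME")
    if not path:
        return False, "FDB_HOME is not set"
    if not os.path.isdir(path):
        return False, f"FDB_HOME points to a missing directory: {path}"
    return True, f"FDB_HOME={path}"


ok, message = fdb_home_status()
print(message)
```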

The following request was written to retrieve data from the operational FDB at ECMWF. Please note that the **date** must be adjusted, since the FDB at ECMWF only stores the most recent dates.

In [2]:
request = {
    'class': 'od',
    'expver': '0001',
    'stream': 'oper',
    'date': '20230524',
    'time': [0, 12],
    'domain': 'g',
    'type': 'an',
    'levtype': 'sfc',
    'step': 0,
    'param': [151, 167]
}
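
Since the hard-coded date goes stale quickly, one option is to derive it at run time. The sketch below uses only the standard library; the helper name `with_recent_date` and the default of one day back are assumptions made for illustration, as the text above does not say how many days the operational FDB retains.

```python
from datetime import date, timedelta


def with_recent_date(request, days_back=1, today=None):
    """Return a copy of `request` whose 'date' is `days_back` days before `today`.

    Hypothetical helper: `days_back=1` is an assumed default, not a
    documented retention period of the operational FDB.
    """
    today = date.today() if today is None else today
    recent = today - timedelta(days=days_back)
    # Build a new dict so the original request is left untouched
    return {**request, "date": recent.strftime("%Y%m%d")}


example = with_recent_date({"class": "od", "date": "20230524"})
print(example["date"])
```

The returned dictionary can then be passed to *from_source()* in place of the original request.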

### Reading as a stream

We can retrieve data from FDB as a stream.

#### Stream: iteration with one field at a time in memory

When we use the default arguments, the resulting object can only be used for iteration, and only one field is kept in memory at a time. Fields created during the iteration are deleted when they go out of scope.

In [3]:
ds = earthkit.data.from_source("fdb", request)
for f in ds:
    print(f)

GribField(msl,None,20230524,0,0,0)
GribField(2t,None,20230524,0,0,0)
GribField(msl,None,20230524,1200,0,0)
GribField(2t,None,20230524,1200,0,0)


Once the iteration is completed, there is nothing left in *ds*.

In [4]:
for f in ds:
    print(f)
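
This one-pass behaviour is the same as that of a plain Python generator. The sketch below is only an analogy built with the standard library, not a description of how earthkit-data implements streams; `field_stream` is a hypothetical stand-in that yields parameter names instead of GRIB fields.

```python
def field_stream(names):
    """Yield one item at a time, like a stream that cannot be rewound."""
    for name in names:
        yield name


stream = field_stream(["msl", "2t", "msl", "2t"])

first_pass = list(stream)   # consumes every item in the stream
second_pass = list(stream)  # the stream is exhausted, so nothing is left

print(first_pass)   # ['msl', '2t', 'msl', '2t']
print(second_pass)  # []
```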

#### Stream: using batch_size

We can read multiple fields into memory from the stream at a time by using **batch_size** in *from_source()*:

In [5]:
ds = earthkit.data.from_source("fdb", request, batch_size=2)
for f in ds:
    # f is a FieldList containing 2 fields. It gets deleted when going out of scope
    print(len(f))
    print(f.metadata("param"))

2
['msl', '2t']
2
['msl', '2t']
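
The grouping seen in this output can be pictured as chunking an iterator. The following is a conceptual sketch using `itertools.islice`, with parameter names standing in for fields; the `batched` helper is hypothetical and is not the way earthkit-data implements **batch_size**. In particular, it does not model the special meaning of **batch_size=0**.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield lists of up to `batch_size` consecutive items from `iterable`."""
    iterator = iter(iterable)
    while True:
        # Take the next `batch_size` items; the last batch may be shorter
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


for batch in batched(["msl", "2t", "msl", "2t"], 2):
    print(len(batch), batch)
```

Only one batch is held by the loop variable at a time, which mirrors the trade-off described above: a larger batch means fewer iterations but more data in memory per step.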


#### Stream: storing all the fields in memory

When we use **batch_size=0**, all the fields are loaded into memory and the resulting object will behave like a FieldList.

In [6]:
ds = earthkit.data.from_source("fdb", request, batch_size=0)

Nothing is read at this moment:

In [7]:
print(f"stored fields count={len(ds._reader._fields)}")

stored fields count=0


If we call any function on the fieldlist, it reads the messages into memory:

In [8]:
len(ds)

4

In [9]:
print(f"stored fields count={len(ds._reader._fields)}")

stored fields count=4


In [10]:
ds.ls()

Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,msl,surface,0,20230524,0,0,an,0,reduced_gg
1,ecmf,2t,surface,0,20230524,0,0,an,0,reduced_gg
2,ecmf,msl,surface,0,20230524,1200,0,an,0,reduced_gg
3,ecmf,2t,surface,0,20230524,1200,0,an,0,reduced_gg


In [11]:
ds.sel(param="2t").ls()


Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,2t,surface,0,20230524,0,0,an,0,reduced_gg
1,ecmf,2t,surface,0,20230524,1200,0,an,0,reduced_gg


In [12]:
ds.to_xarray()

### Reading into a file

We can retrieve data from FDB into a file, which is located in the cache: 

In [13]:
ds = earthkit.data.from_source("fdb", request, stream=False)

In [14]:
ds.ls()

Unnamed: 0,centre,shortName,typeOfLevel,level,dataDate,dataTime,stepRange,dataType,number,gridType
0,ecmf,msl,surface,0,20230524,0,0,an,0,reduced_gg
1,ecmf,2t,surface,0,20230524,0,0,an,0,reduced_gg
2,ecmf,msl,surface,0,20230524,1200,0,an,0,reduced_gg
3,ecmf,2t,surface,0,20230524,1200,0,an,0,reduced_gg


The data is now cached. Subsequent retrievals will use the cached file directly.