# Search Functionality

## How to use both regular expressions and wildcards in `search()`

It is often useful to search a catalog with a wildcard or a regular expression
instead of an exact value.


## Use Case #1 - Inconsistent First Member IDs

Some models don't have the member number **r1i1p1f1**. For example,
**CNRM-CM6-1**'s first member is **r1i1p1f2**.

A typical query looks like this:

```python
col.search(
    experiment_id=["historical"],
    table_id="Amon",
    variable_id="tas",
    member_id="r1i1p1f1",
)
```

This query returns only the models that strictly meet these criteria, excluding
all models, such as **CNRM-CM6-1**, that don't have this first member.

If you want to include models whose first member has a different id, you can
use a wildcard (`*`) in your search.

```python

col.search(
    experiment_id=["historical"],
    table_id="Amon",
    variable_id="tas",
    member_id="r1i1p1f*",
)

```

This search returns members such as **r1i1p1f1** and **r1i1p1f2**, so
**CNRM-CM6-1** is no longer excluded.
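
To make the effect of such a pattern concrete, the sketch below applies `r1i1p1f*` to a small, made-up list of member ids using shell-style wildcard matching from Python's standard library, then repeats the selection with a regular expression. It only illustrates which ids a pattern of this shape selects. The ids are hypothetical, and this is not necessarily how intake-esm performs the match internally.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical member ids, for illustration only (not read from a catalog).
member_ids = ["r1i1p1f1", "r1i1p1f2", "r2i1p1f1", "r1i1p1f3", "r10i1p1f1"]

# Shell-style wildcard: `*` stands for any run of characters.
wildcard_hits = [m for m in member_ids if fnmatchcase(m, "r1i1p1f*")]

# A regular expression for the same prefix, here requiring one or more
# digits after it.
regex = re.compile(r"r1i1p1f\d+")
regex_hits = [m for m in member_ids if regex.fullmatch(m)]

print(wildcard_hits)
print(regex_hits)
```

Both lists contain the three ids that start with `r1i1p1f`; `r2i1p1f1` and `r10i1p1f1` are left out.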


## Use Case #2 - Non-CF Standard Names

In some datasets the long names are **not** CF Standard Names, but names
specified in some other documentation. For this reason the user may not know
exactly what name to search for without listing all names.

```python
uniques = col.unique(columns=["long_name"])
nameList = sorted(uniques["long_name"]["values"])
print(*nameList, sep="\n")  # note: *nameList unpacks each item for print
```

The above code block finds all unique names in the collection, sorts them
alphabetically, and prints one per line. Note that the `*` here is Python's
unpacking operator, not a search wildcard.
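
Once the names are listed, a case-insensitive regular expression can narrow them down to a few candidates. The sketch below does this with Python's `re` module on a handful of long names copied from the CESM-LENS listing in this document. `find_names` is a hypothetical helper written for this example, not part of intake-esm.

```python
import re


def find_names(names, pattern):
    """Return the names that contain `pattern` (a regular expression), ignoring case."""
    matcher = re.compile(pattern, re.IGNORECASE)
    return [name for name in names if matcher.search(name)]


# A few long names taken from the listing in this document.
name_list = [
    "Clearsky net longwave flux at surface",
    "Dissolved Oxygen",
    "Meridional wind",
    "Mixed-Layer Depth",
    "Net longwave flux at surface",
]

candidates = find_names(name_list, r"longwave")
print(*candidates, sep="\n")
```

An exact name found this way could then be used as the `long_name` value in a search.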


### The longer example

Import the required packages and load a typical enhanced collection description file


```python
import pprint

import intake
import pandas as pd
from IPython.display import HTML
```

```python
cat_url = "https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le-enhanced.json"
col = intake.open_esm_datastore(cat_url)
col
```

|           | unique |
|-----------|--------|
| component | 5      |
| dim       | 2      |
| frequency | 5      |
| experiment | 6     |
| start     | 10     |
| end       | 11     |
| variable  | 75     |
| long_name | 75     |
| path      | 365    |


Take a look at the first few lines of the enhanced catalog


```python
print(col.esmcol_data["description"])
print("Catalog file:", col.esmcol_data["catalog_file"])
print(col)

HTML(col.df.head(10).to_html(index=False))
```

This is an inventory of the Community Earth System Model (CESM) Large Ensemble (LENS) dataset in Zarr format publicly available on Amazon S3 (https://doi.org/10.26024/wt24-5j82)
Catalog file: https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le-enhanced.csv
<aws-cesm1-le catalog with 27 dataset(s) from 365 asset(s)>


| component | dim | frequency | experiment | start | end | variable | long_name | path |
|-----------|-----|-----------|------------|-------|-----|----------|-----------|------|
| atm | 2D | monthly | HIST | 1850-01 | 1919-12 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-HIST-FLNS.zarr |
| atm | 2D | monthly | 20C | 1920-01 | 2005-12 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FLNS.zarr |
| atm | 2D | daily | 20C | 1920-01-01 | 2005-12-31 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/daily/cesmLE-20C-FLNS.zarr |
| atm | 2D | monthly | RCP85 | 2006-01 | 2100-12 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-RCP85-FLNS.zarr |
| atm | 2D | daily | RCP85 | 2006-01-01 | 2100-12-31 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/daily/cesmLE-RCP85-FLNS.zarr |
| atm | 2D | monthly | CTRL | 0400-01 | 2200-12 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-CTRL-FLNS.zarr |
| atm | 2D | monthly | CTRL_AMIP | 0001-01 | 2600-12 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-CTRL_AMIP-FLNS.zarr |
| atm | 2D | monthly | CTRL_SLAB_OCN | 0001-01 | 1000-12 | FLNS | Net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-CTRL_SLAB_OCN-FLNS.zarr |
| atm | 2D | monthly | HIST | 1850-01 | 1919-12 | FLNSC | Clearsky net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-HIST-FLNSC.zarr |
| atm | 2D | monthly | 20C | 1920-01 | 2005-12 | FLNSC | Clearsky net longwave flux at surface | s3://ncar-cesm-lens/atm/monthly/cesmLE-20C-FLNSC.zarr |


Display all of the `long_name` variable options


```python
uniques = col.unique(columns=["long_name"])
nameList = sorted(uniques["long_name"]["values"])
print(nameList, sep="\n")
```

['Clearsky net longwave flux at surface', 'Clearsky net solar flux at surface', 'Convective precipitation rate (liq + ice)', 'Convective snow rate (water equivalent)', 'Dissolved Inorganic Carbon', 'Dissolved Organic Carbon', 'Dissolved Oxygen', 'Fraction of sfc area covered by sea-ice', 'Free-Surface Residual Heat Flux', 'Free-Surface Residual Salt Flux', 'Freshwater Flux', 'Geopotential Height (above sea level)', 'Geopotential Z at 500 mbar pressure surface', 'Heat Flux across top face', 'Heat Flux in grid-x direction', 'Heat Flux in grid-y direction', 'Horizontal total wind speed average at the surface', 'Internal Ocean Heat Flux Due to Ice Formation', 'Large-scale (stable) precipitation rate (liq + ice)', 'Large-scale (stable) snow rate (water equivalent)', 'Lowest model level zonal wind', 'Maximum (convective and large-scale) precipitation rate (liq+ice)', 'Maximum reference height temperature over output period', 'Meridional wind', 'Minimum reference height temperature over outpu

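Printing the list directly shows it on a single line, because `sep` only applies
between separate arguments. To print each name on its own line, unpack the list
with Python's `*` operator so that every item is passed to `print` as its own
argument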


```python
print(*nameList, sep="\n")
```

Clearsky net longwave flux at surface
Clearsky net solar flux at surface
Convective precipitation rate (liq + ice)
Convective snow rate (water equivalent)
Dissolved Inorganic Carbon
Dissolved Organic Carbon
Dissolved Oxygen
Fraction of sfc area covered by sea-ice
Free-Surface Residual Heat Flux
Free-Surface Residual Salt Flux
Freshwater Flux
Geopotential Height (above sea level)
Geopotential Z at 500 mbar pressure surface
Heat Flux across top face
Heat Flux in grid-x direction
Heat Flux in grid-y direction
Horizontal total wind speed average at the surface
Internal Ocean Heat Flux Due to Ice Formation
Large-scale (stable) precipitation rate (liq + ice)
Large-scale (stable) snow rate (water equivalent)
Lowest model level zonal wind
Maximum (convective and large-scale) precipitation rate (liq+ice)
Maximum reference height temperature over output period
Meridional wind
Minimum reference height temperature over output period
Mixed-Layer Depth
Net longwave flux at surface
Net solar flux at 

**Note**: For the wildcard search to work, you will need at least intake-esm
v2020.08.15.
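
One way to check this requirement is to compare the installed version against the minimum. The sketch below assumes the package is installed under the distribution name `intake-esm` and uses calendar-style version strings. `calver_tuple` and `supports_wildcards` are hypothetical helpers written for this example.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (2020, 8, 15)  # v2020.08.15, the version named in the note above


def calver_tuple(version_string):
    """Turn a string such as '2020.08.15' into (2020, 8, 15)."""
    numbers = re.findall(r"\d+", version_string)[:3]
    return tuple(int(n) for n in numbers)


def supports_wildcards(version_string):
    """Return True if the version is at least the documented minimum."""
    return calver_tuple(version_string) >= MINIMUM


try:
    installed = version("intake-esm")
    print(installed, "supports wildcards:", supports_wildcards(installed))
except PackageNotFoundError:
    print("intake-esm is not installed")
```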
