# Pattern as Path - CSV

In this notebook we'll use a data source that encodes multiple variables in the filename. We'll demonstrate how we can quickly load and visualize these data.

In [None]:
import intake

First we'll open the catalog and take a look at the data sources defined within:

In [None]:
cat = intake.Catalog('catalog.yml')
list(cat)

For this example we will be using `southern_rockies`. We can learn more about the data source by reading the `description`.

In [None]:
southern_rockies = cat.southern_rockies()
print(southern_rockies.description)

We can also inspect the `pattern` property of the data source:

In [None]:
southern_rockies.pattern
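
To make the idea of a path pattern concrete, here is a minimal sketch of how a template with named fields can be matched against a filename. It uses only the standard library and is not Intake's own implementation. The helpers `pattern_to_regex` and `parse_filename` are hypothetical names introduced for illustration, and the template is assumed to be of the form `SRLCC_{emissions}_Precip_{model}.csv`, the one passed to `open_csv` later in this notebook.

```python
import re
from pathlib import PurePosixPath


def pattern_to_regex(pattern):
    """Turn a '{field}' template into a compiled regex with named groups."""
    # Splitting on a capturing group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text, matched exactly
        else:
            regex += f"(?P<{part}>.+?)"       # named field, matched lazily
    return re.compile(regex)


def parse_filename(path, pattern):
    """Return the field values encoded in the filename, or {} if no match."""
    name = PurePosixPath(path).name           # ignore the directory part
    match = pattern_to_regex(pattern).fullmatch(name)
    return match.groupdict() if match else {}


pattern = "SRLCC_{emissions}_Precip_{model}.csv"
fields = parse_filename("data/SRLCC_b1_Precip_ECHAM5-MPI.csv", pattern)
print(fields)
```

Each field name in the template becomes a key in the resulting dictionary, which is the information a loader needs to tag the rows read from that file.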

We will use the `read` method to load the data.

In [None]:
df = southern_rockies.read()
df.sample(5)

The values for `emissions` and `model` are parsed from the filenames and added to the data. By inspecting more closely we can see that these new columns have categorical datatypes:

In [None]:
df.dtypes
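
The effect described above can be reproduced by hand with pandas. The sketch below is an illustration rather than what Intake does internally: the file names and values are made up, the files live in memory, and the `emissions` label is extracted with a simple string split instead of a pattern.

```python
import io

import pandas as pd

# Two tiny in-memory "files"; names and values are invented for illustration.
files = {
    "SRLCC_a1b_Precip_ECHAM5-MPI.csv": "2000-01-01,1.5\n2000-01-02,0.0\n",
    "SRLCC_b1_Precip_ECHAM5-MPI.csv": "2000-01-01,2.5\n2000-01-02,4.0\n",
}

frames = []
for name, text in files.items():
    part = pd.read_csv(io.StringIO(text), names=["time", "precip"], parse_dates=["time"])
    # Tag every row with the value taken from the filename.
    part["emissions"] = name.split("_")[1]
    frames.append(part)

combined = pd.concat(frames, ignore_index=True)
# Convert after concatenating so all labels share one set of categories.
combined["emissions"] = combined["emissions"].astype("category")
print(combined.dtypes)
```

A categorical column stores each distinct label once and refers to it by code, which suits columns that repeat a handful of values across many rows.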

This is useful because categorical columns make the data easy to group and facet when plotting.

## Visualization
Intake provides a plotting API, built on `hvplot`, that lets plots be declared in the catalog itself. This API can be used to set default values for a particular data source and to declare specific plots.

In [None]:
import hvplot.intake

intake.output_notebook()

In this case the catalog specifies some defaults to make it easy to produce plots quickly. You'll find these lines in the catalog file:
```
metadata:
  plot:
    x: time
    y: precip
```

In [None]:
southern_rockies.plot(groupby='emissions', by='model')

In addition to declaring defaults, the catalog author can specify complete plots:
```
metadata:
  plots:
    model_emissions_grid:
      col: model
      row: emissions
      width: 300
      height: 200
```

In [None]:
southern_rockies.plot.model_emissions_grid()
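
One way to picture how the two metadata blocks combine is as plain dictionaries: the `plot` defaults supply baseline keyword arguments, a named entry under `plots` adds to them, and anything passed at call time is applied last. The merge order shown here is an assumption made for illustration, and `resolve_plot_kwargs` is a hypothetical helper that is not part of Intake or hvplot.

```python
# The two metadata fragments from the catalog, written as a Python dict.
metadata = {
    "plot": {"x": "time", "y": "precip"},
    "plots": {
        "model_emissions_grid": {
            "col": "model",
            "row": "emissions",
            "width": 300,
            "height": 200,
        },
    },
}


def resolve_plot_kwargs(metadata, name=None, **overrides):
    """Combine defaults, an optional named plot, and call-time overrides."""
    kwargs = dict(metadata.get("plot", {}))        # start from the defaults
    if name is not None:
        kwargs.update(metadata["plots"][name])     # add the named plot's options
    kwargs.update(overrides)                       # call-time arguments win (assumed)
    return kwargs


grid_kwargs = resolve_plot_kwargs(metadata, "model_emissions_grid")
adhoc_kwargs = resolve_plot_kwargs(metadata, groupby="emissions", by="model")
print(grid_kwargs)
print(adhoc_kwargs)
```

Under this reading, `grid_kwargs` contains the same options that have to be written out by hand when a source is opened inline without a catalog.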

This is a great way to get a quick sense of the data, and the interactivity makes it straightforward to zoom into an area of interest. We can also use the `pandas.DataFrame` directly to do computations and make more visualizations.

In [None]:
import hvplot.pandas

In [None]:
unit = southern_rockies.metadata['fields']['precip']['unit']

In [None]:
thresh = 3
years = 20
label = f'Days per {years} years with precip ({unit}) greater than {thresh}'

(df[df['precip'] > thresh]
    .groupby('emissions').resample(f'{years}y', on='time').sum()
    .rename(columns={'precip': 'count'}) 
    .hvplot.bar(by='emissions', x='time') 
    .relabel(label))
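
Note that the cell above aggregates with `.sum()`, which totals the precipitation on the selected days. If the number of days above the threshold is what is wanted, counting rows is one alternative. The sketch below shows that variant on synthetic data, so every value in it is made up; the names `toy`, `toy_thresh`, and `counts` are introduced only for this example, and `pd.Grouper` is used in place of `resample`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series covering 40 years for two invented scenarios.
times = pd.date_range("2000-01-01", "2039-12-31", freq="D")
toy = pd.concat(
    [
        # Scenario "a": a wet day on the first of every month.
        pd.DataFrame({"time": times,
                      "precip": np.where(times.day == 1, 5.0, 0.0),
                      "emissions": "a"}),
        # Scenario "b": a wet day only on the first of January.
        pd.DataFrame({"time": times,
                      "precip": np.where((times.month == 1) & (times.day == 1), 5.0, 0.0),
                      "emissions": "b"}),
    ],
    ignore_index=True,
)

toy_thresh = 3
counts = (
    toy[toy["precip"] > toy_thresh]
    # Bin the remaining rows by scenario and by 20-year period.
    .groupby(["emissions", pd.Grouper(key="time", freq="20YS")])
    .size()                     # number of rows, i.e. days above the threshold
    .rename("count")
)
print(counts)
```

The result is a Series indexed by scenario and period start, which can be passed to `.hvplot.bar` in the same way as the frame in the cell above.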

## Using a list of paths 

For a change of pace, this example passes a list of files directly to `intake.open_csv` instead of going through the catalog.


In [None]:
paths = ['./data/SRLCC_b1_Precip_ECHAM5-MPI.csv', 'data/SRLCC_b1_Precip_MIROC3.2(medres).csv']

southern_rockies_list = intake.open_csv(urlpath=paths,
                path_as_pattern='SRLCC_{emissions}_Precip_{model}.csv',
                csv_kwargs=dict(
                    skiprows=3,
                    names=['time', 'precip'],
                    parse_dates=['time']))

In [None]:
df = southern_rockies_list.read()
df.sample(5)

Since we are loading inline rather than through the catalog, we need to declare every feature of our plot inline.

In [None]:
southern_rockies_list.hvplot(x='time', y='precip', col='model', row='emissions', width=300, height=200)