## Ingesting Data with Intake

Intake provides an easy way to find your data, whether it lives locally, in a cloud service, or on an Intake server. It lets users load analysis-ready data without worrying about the specifics of the loading package, format, and storage backend.

In [None]:
import intake
intake.output_notebook()

## Catalogs

The usual starting point for finding and inspecting datasets is a catalog: a collection of entries, each corresponding to a specific dataset. Entries have names, descriptions, and metadata, so you can search and filter them to find the data that solves a particular problem.

In [None]:
local_cat = intake.open_catalog('catalog.yml')
list(local_cat)
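
To make the idea of searchable entries concrete, here is a toy sketch that models a catalog as a plain dictionary and filters it by keyword. The entries, their descriptions and tags, and the `find_entries` helper are all made up for illustration; this is not Intake's catalog class or its search implementation.

```python
# Toy stand-in for a catalog: entry name -> description and metadata.
# All entries and field values here are hypothetical.
toy_catalog = {
    "southern_rockies": {
        "description": "Modeled precipitation for the Southern Rockies",
        "metadata": {"tags": ["climate", "precip"]},
    },
    "city_bikes": {
        "description": "Bike share trips by station",
        "metadata": {"tags": ["transport"]},
    },
}


def find_entries(catalog, keyword):
    """Return names of entries whose name, description, or tags mention keyword."""
    keyword = keyword.lower()
    matches = []
    for name, entry in catalog.items():
        tags = entry["metadata"].get("tags", [])
        # Join every searchable field into one lowercase string.
        haystack = " ".join([name, entry["description"], *tags]).lower()
        if keyword in haystack:
            matches.append(name)
    return matches


print(find_entries(toy_catalog, "precip"))
```

The same pattern (match on names, descriptions, and metadata) is what makes a well-described catalog easy to navigate.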

In addition to loading catalogs stored locally, you can use Intake to explore remote catalogs. For instance, we can take a look at the Intake catalog of downloads from anaconda.org.

In [None]:
url = 'https://raw.githubusercontent.com/ContinuumIO/anaconda-package-data/master/catalog/anaconda_package_data.yaml'
remote_cat = intake.open_catalog(url)
list(remote_cat)

If you have access to an [intake-server](distributing_data.ipynb), you can also access catalogs at the server URL. In the cell below, set `MY_SERVER_URL` to the URL at which you have deployed your server, with "https" replaced by "intake".

<div class="alert alert-warning" role="alert">
    
For instance, if you deployed a server at `https://server.aip.anaconda.com`, then you will use `intake://server.aip.anaconda.com`.

</div>

In [None]:
# replace this value with your own
MY_SERVER_URL = 'intake://server.aip.anaconda.com'  

server_cat = intake.open_catalog(MY_SERVER_URL, http_args={'ssl': True})
list(server_cat)
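
The scheme swap described above can also be done programmatically. The helper below is a minimal sketch using only the standard library; `to_intake_url` is a hypothetical name, not part of Intake, and it assumes the deployed server address starts with `https`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_intake_url(server_url):
    """Swap the https scheme for the intake scheme, leaving the rest intact."""
    parts = urlsplit(server_url)
    if parts.scheme != "https":
        raise ValueError(f"expected an https URL, got {server_url!r}")
    # Rebuild the URL with the same host, path, query, and fragment.
    return urlunsplit(
        ("intake", parts.netloc, parts.path, parts.query, parts.fragment)
    )


print(to_intake_url("https://server.aip.anaconda.com"))
```

Parsing the URL rather than calling `str.replace` avoids accidentally rewriting an "https" that appears later in the path or query string.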

If you have installed a data package into your conda environment (for instance with `conda install -c intake us_crime`), it will be available at `intake.cat`. Otherwise, the following call will return an empty list.

In [None]:
list(intake.cat)

<div class="alert alert-info" role="alert">
    <b>NOTE:</b> To learn how to create your own data package, see <a href="distributing_data.ipynb">distributing_data.ipynb</a>.
</div>

## Data Sources

No matter how you load your catalog, you can use it to access the data sources within. You can think of data sources as recipes for how to get the data: each one holds a pointer to the data, information on how to load them, and additional metadata. For information about how to create data sources, see [distributing_data.ipynb](distributing_data.ipynb). For instance, data sources often have a description:

In [None]:
source = local_cat.southern_rockies
print(source.description)

Although we have read some metadata, the data themselves have not been downloaded yet if they are remote. The next step reads them into a `pandas.DataFrame`.

In [None]:
df = source.read()
df.head()
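
The "recipe" idea can be illustrated without Intake at all. The sketch below defines a hypothetical `ToySource` class whose description is available immediately, while the rows are parsed only when `read()` is called. The class, its fields, and the sample CSV text are invented for illustration and say nothing about how Intake implements its sources.

```python
import csv
import io


class ToySource:
    """Hypothetical stand-in for a data source: metadata now, data on demand."""

    def __init__(self, description, opener):
        self.description = description  # cheap metadata, available at once
        self._opener = opener           # callable returning a text file object
        self.read_count = 0             # how many times the data was loaded

    def read(self):
        """Open the underlying data and parse every row into a dict."""
        self.read_count += 1
        with self._opener() as handle:
            return list(csv.DictReader(handle))


# Made-up sample data; a real source would point at a file or URL instead.
SAMPLE_CSV = "model,precip\nA,1.5\nB,2.5\n"
toy_source = ToySource("Toy precipitation table", lambda: io.StringIO(SAMPLE_CSV))

print(toy_source.description)   # metadata only, nothing parsed yet
print(toy_source.read_count)
toy_rows = toy_source.read()    # the data is loaded only here
print(len(toy_rows), toy_source.read_count)
```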

This is just a regular pandas DataFrame, which we can use for all the usual pandas operations.

In [None]:
mean_df = df.groupby(['emissions', 'model'])['precip'].mean()
mean_df
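
For readers less familiar with `groupby`, the aggregation above amounts to the following plain-Python computation. The column names match the cell above, but the rows and their values are made up so the arithmetic is easy to follow.

```python
from collections import defaultdict

# Made-up rows using the same column names as the cell above.
sample_rows = [
    {"emissions": "high", "model": "A", "precip": 2.0},
    {"emissions": "high", "model": "A", "precip": 4.0},
    {"emissions": "low", "model": "A", "precip": 1.0},
    {"emissions": "low", "model": "B", "precip": 3.0},
]

# Collect the precip values that share an (emissions, model) pair.
groups = defaultdict(list)
for row in sample_rows:
    groups[(row["emissions"], row["model"])].append(row["precip"])

# Average each group, which is what .mean() does per group.
mean_precip = {key: sum(values) / len(values) for key, values in groups.items()}

for key in sorted(mean_precip):
    print(key, mean_precip[key])
```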

In [None]:
import hvplot.pandas  # registers the .hvplot accessor on pandas objects

mean_df.hvplot.barh()

If our data are large, we might not want to read them all straight away. In that case, we can use the `to_dask` method to read the metadata and set up tasks to be performed on our larger dataset. 

In [None]:
ddf = source.to_dask()
ddf
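
The benefit of deferring work can be shown with a small generator-based sketch. This is not Dask and does not use its API; it only illustrates the general idea of describing a computation over partitions first and running it later, one partition at a time. The partition contents and the helper names are hypothetical.

```python
def iter_partitions():
    """Yield made-up partitions one at a time instead of one big list."""
    yield [1.0, 2.0, 3.0]
    yield [4.0, 5.0]
    yield [6.0]


def lazy_mean(partitions):
    """Compute a mean while holding only one partition in memory at a time."""
    total = 0.0
    count = 0
    for chunk in partitions:
        total += sum(chunk)
        count += len(chunk)
    return total / count


pending = iter_partitions()       # nothing has been produced yet
result = lazy_mean(pending)       # partitions are consumed only here
print(result)
```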

## Plotting the data

Catalogs can contain predefined plots that are always available for a data source. You can list them:

In [None]:
source.plots

In [None]:
source.hvplot.violin_example()

You can also create custom plots using `.hvplot`:

In [None]:
source.hvplot.line(x='time', y='precip', groupby=['emissions', 'model'])

## Graphical interface

You can also explore the catalog using the GUI. Since the GUI depends on [Panel](https://panel.pyviz.org), you can use it in the notebook, or deploy an instance of the GUI for `catalog.yml` by clicking the "Deploy" button and choosing the command: `gui`.

In [None]:
local_cat.gui

To learn more about Intake, see [distributing_data.ipynb](distributing_data.ipynb) or visit the [Intake docs](https://intake.readthedocs.io).