# General template for creating a derived dataset *(a.k.a. adding an edge to the DatasetGraph)*

This example creates the dataset from the [`Add-csv-template.ipynb`](https://cookiecutter-easydata.readthedocs.io/en/latest/Add-csv-template/) example, but does so in full generality, without using the `workflow` helper function. It builds on the `New-Dataset-Template.ipynb` example. Any derived dataset can be added this way as an *edge* in the `DatasetGraph`.

## Basic imports

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
# Basic utility functions
import logging
import os
import pathlib
from src.log import logger
from src import paths
from src.utils import list_dir
from functools import partial

# data functions
from src.data import DataSource, Dataset, DatasetGraph
from src import workflow

In [None]:
# Optionally set to debug log level
logger.setLevel(logging.DEBUG)

## Source dataset



In [None]:
source_ds_name = 'covid-19-epidemiology-raw'

In [None]:
source_ds = Dataset.load(source_ds_name)

In [None]:
source_ds.EXTRA

## Create and add your transformer function
Here we'll use a pre-built transformer function `csv_to_pandas`, but normally you would place your new transformer function in `{your_project_module}/data/transformer_functions.py` as in the [`Add-Derived-Dataset.ipynb`](https://cookiecutter-easydata.readthedocs.io/en/latest/Add-derived-dataset/) example. 

Transformer functions take a dict of Datasets of the form `{ds_name: ds}` as input and output a new dict of Datasets of the same form.
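
To make the `{ds_name: ds}` contract concrete, here is a minimal, self-contained sketch. It does not use the real `Dataset` class: `ToyDataset` and `drop_empty_rows` are hypothetical names invented for illustration, and only the dict-in, dict-out shape is taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a Dataset: a name, some data, and metadata."""
    name: str
    data: Any
    metadata: Dict[str, Any] = field(default_factory=dict)


def drop_empty_rows(dsdict: Dict[str, ToyDataset], *, suffix: str = "-clean") -> Dict[str, ToyDataset]:
    """Toy transformer: takes {ds_name: ds} and returns a new {ds_name: ds}."""
    new_dsdict = {}
    for ds_name, ds in dsdict.items():
        # Build new objects instead of mutating the inputs
        cleaned = [row for row in ds.data if row]
        new_name = ds_name + suffix
        new_dsdict[new_name] = ToyDataset(name=new_name,
                                          data=cleaned,
                                          metadata=dict(ds.metadata))
    return new_dsdict


raw = {"toy-raw": ToyDataset("toy-raw", [[1, 2], [], [3, 4]], {"license": "CC0"})}
derived = drop_empty_rows(raw)
print(list(derived))
```

The input dict is left untouched and a fresh dict is returned, which is the property that lets transformers be chained one after another.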



In [None]:
from src.data.transformer_functions import csv_to_pandas
from src.data import create_transformer_pipeline

In [None]:
## Fill this in for your dataset
ds_name = 'covid-19-epidemiology'
transformers = [partial(csv_to_pandas,
                        output_map={ds_name:'epidemiology.csv'})]
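
The cell above uses `partial` to bind keyword arguments ahead of time, so that every entry in `transformers` can later be called with just a dict of datasets. The sketch below illustrates that idea with toy functions only. `rename_keys`, `add_row_count`, and `compose` are hypothetical helpers, the toy `output_map` semantics (new name to existing key) are an assumption that differs from `csv_to_pandas`, and `compose` is not a description of how `create_transformer_pipeline` works internally.

```python
from functools import partial, reduce


def rename_keys(dsdict, *, output_map):
    """Toy transformer: output_map maps a new dataset name to an existing key."""
    return {new_name: dsdict[old_name] for new_name, old_name in output_map.items()}


def add_row_count(dsdict):
    """Toy transformer: annotate each (dict-based) toy dataset with its row count."""
    return {name: {**ds, "n_rows": len(ds["rows"])} for name, ds in dsdict.items()}


def compose(transformers):
    """Return a function that applies the transformers in list order."""
    def pipeline(dsdict):
        return reduce(lambda acc, fn: fn(acc), transformers, dsdict)
    return pipeline


# partial fixes the keyword argument now; the dict of datasets is supplied later
toy_transformers = [partial(rename_keys, output_map={"toy": "toy-raw"}),
                    add_row_count]

toy_pipeline = compose(toy_transformers)
result = toy_pipeline({"toy-raw": {"rows": [1, 2, 3]}})
print(result)
```

Because each list entry accepts a single dict and returns a dict, the order of the list is the order in which the transformations are applied.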

## Create the new edge in the transformer graph

In [None]:
dag = DatasetGraph(catalog_path=paths['catalog_path'])

In [None]:
dag.add_edge(input_dataset=source_ds_name,
             output_dataset=ds_name,
             transformer_pipeline=create_transformer_pipeline(transformers),
             force=True)

In [None]:
%%time
ds = Dataset.from_catalog(ds_name)

In [None]:
%%time
ds = Dataset.load(ds_name)

In [None]:
print(ds.DESCR)

In [None]:
print(ds.LICENSE)

In [None]:
ds.data.shape

In [None]:
ds.data.head()

## Check in the new dataset
Finally, check in the new catalog files. 