# How to add a dataset to Layer

[![Open in Layer](https://development.layer.co/assets/badge.svg)](https://app.layer.ai/layer/iris/) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/layerai/examples/blob/main/tutorials/add-datasets-to-layer/how_to_add_dataset_to_layer.ipynb) [![Layer Examples Github](https://badgen.net/badge/icon/github?icon=github&label)](https://github.com/layerai/examples/tree/main/tutorials/add-datasets-to-layer)


In this notebook we'll look at how to build and add datasets to [Layer](https://layer.ai).

## Install Layer

Ensure that you have the latest version of Layer installed. 

In [None]:
!pip install layer --upgrade -qqq
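
If you want to confirm which version ended up installed, one option is the standard library's `importlib.metadata`. The sketch below is not part of the Layer SDK; `installed_version` is a hypothetical helper name, and it only reports the locally installed version rather than checking whether it is the newest release.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed Layer version, or None when the package is missing
print(installed_version("layer"))
```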

## Authenticate your Layer account 

Once Layer is installed, log in to your Layer account. This step is required because the data you create is stored under this account.

In [None]:
import layer
layer.login()

## Create a project
The next step is to create a project. The dataset will be saved under this project. In Layer, you create a project by calling `layer.init` with the name of the project.

In [None]:
# Initialize the Layer project
layer.init("iris")

## Saving the data to Layer
We can interact with Layer using decorators. Layer has built-in decorators for different purposes. In this case, we are interested in the [@dataset](http://docs.app.layer.ai/docs/sdk-library/dataset-decorator) decorator that we can use to create new datasets. 
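
If you are new to Python decorators, the general mechanism can be sketched in plain Python. The example below is an illustration only: `toy_dataset` and `REGISTRY` are hypothetical names invented for this sketch, and the code does not reflect how the Layer SDK is implemented. It simply shows how a decorator that takes a name can wrap a function and store its result under that name.

```python
# Illustration only: a toy decorator, NOT the Layer SDK implementation.
import functools

REGISTRY = {}  # hypothetical in-memory store standing in for a remote service


def toy_dataset(name):
    """Return a decorator that stores the wrapped function's result under `name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # build the data as usual
            REGISTRY[name] = result         # side effect: keep it under the given name
            return result
        return wrapper
    return decorator


@toy_dataset("numbers")
def build_numbers():
    return [1, 2, 3]


build_numbers()
print(REGISTRY["numbers"])  # prints [1, 2, 3]
```

The function body stays ordinary Python; the decorator adds the storing step around it.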

Let's demonstrate how to use the [@dataset](http://docs.app.layer.ai/docs/sdk-library/dataset-decorator) decorator by saving the Iris dataset from `sklearn`. We can do this by creating a function that returns the dataset as a Pandas DataFrame.

### Decorate the function 

Saving the dataset to Layer is done by decorating the function with the [@dataset](http://docs.app.layer.ai/docs/sdk-library/dataset-decorator) decorator. The decorator only expects the name you'd like to give your dataset. 

In [32]:
import layer
from layer.decorators import dataset

@dataset('iris_data')  # the name the dataset will have in Layer
def save_iris():
  from sklearn import datasets
  import pandas as pd
  # Load the Iris features and wrap them in a DataFrame
  iris = datasets.load_iris()
  df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
  return df

When you execute this function, the data will be stored in Layer under the project you just initialized.

You can execute this function in two ways.

### Run the function locally

Calling the decorated function directly, for example `save_iris()`, runs it on your local infrastructure. The resulting DataFrame is still saved to Layer, and Layer prints a link that you can use to view the data immediately.

### Run the function on Layer infrastructure 

You can also execute the function on Layer's infrastructure. This is especially useful when working with large datasets that require a lot of computing power.

To run functions on Layer's infrastructure, pass them to `layer.run`, which expects a list of functions.

In [None]:
layer.run([save_iris])

## Saving files to Layer
If your dataset depends on a file, such as a CSV file, you can bundle it with your decorated function using the [@resources](https://docs.app.layer.ai/docs/sdk-library/resources-decorator) decorator. The decorator expects the path to the data file, and Layer automatically uploads your local file.

In [41]:
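import layer
from layer.decorators import dataset, resources

# Local file that is bundled with the function
data_file = 'iris.csv'

@dataset('iris_data')
@resources(data_file)
def save_iris():
  import pandas as pd
  # Read the bundled CSV file into a DataFrame
  iris = pd.read_csv(data_file)
  return iris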

In [None]:
layer.run([save_iris])
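
Because the resources decorator expects a path to a local file, it can be worth confirming that the file exists before running the function. The sketch below uses only the standard library; `missing_files` is a hypothetical helper introduced here for illustration and is not part of the Layer SDK.

```python
from pathlib import Path


def missing_files(paths):
    """Return the subset of `paths` that are not existing regular files."""
    return [p for p in paths if not Path(p).is_file()]


data_file = 'iris.csv'
missing = missing_files([data_file])
if missing:
    print(f"Missing resource files: {missing}")
else:
    print("All resource files found.")
```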