# EXPERIMENT TRACKING FOR ML PROJECTS

In this tutorial, we will discuss what experiment tracking means within the Machine Learning context, why it is important to keep track of your project journey and how you could make use of [Layer](https://layer.ai/) to do it.<br><br>
ML project development is experimental by its nature. It means that we need to run many recurrent experiments to end up with the best performing model. These experiments might include various tasks such as parameter searching to optimise a model or transforming a dataset in an incremental manner. You could think of these experiments as your Python functions in your notebook designed for conducting a specific task.<br><br>

***[Experiment Tracking](https://layer.ai/):***<br> 
*\"During development of an ML project, being able to save entire project history and thus revert to any previous project snapshot when needed.\"*<br><br>
Let's list possible actions data scientists should take during an ML project development and experiment tracking:<br><br>
- **Versioning:** Just as GitHub does for code, datasets and models should be versioned automatically each time you build them, so that any entity can be reverted to a previous version whenever needed. It is a way of applying the CI/CD software principle to data projects as well.<br>
- **Monitoring:** Display information about datasets and models, such as model performance metrics, model parameters or data profiling. This provides continuous observability of entities and makes it possible to compare different versions.<br>
- **Testing:** Sanity-check datasets and models by running simple unit tests to get a more reliable pipeline. It can be thought of as an alert system for data projects.<br>
- **Documentation:** Write descriptions of datasets and models as well as a detailed project report. This tells other people what a project is all about and breaks down silos.<br>
- **Using more computing power:** Processing big data and training complicated algorithms generally require more resources, such as additional CPU/GPU capacity.<br>
- **Collecting everything in a single place:** Keep everything (datasets, models, project documentation) in a single place, including the list of all experiments (runs) conducted by all team members. This enables collaborative working within the team.

Now, let me walk you through a real but simple ML project development cycle and show how you could do all the things we listed above with the help of [Layer](https://layer.ai/).

---


# CASE STUDY: COMPREHENSIVE GETTING STARTED DEMO WITH [LAYER](https://layer.ai/)

This step-by-step tutorial walks you through a simple yet comprehensive demo to get started with [Layer](https://layer.ai/).

- **Step I:** Install [Layer](https://layer.ai/) and Authentication
- **Step II:** Initialise a project and import some [Layer](https://layer.ai/) modules
- **Step III:** Learning more about Dataset Versioning on [Layer](https://layer.ai/)
- **Step IV:** Learning more about Model Versioning on [Layer](https://layer.ai/)
- **Step V:** Run multiple-functions experiments on [Layer](https://layer.ai/)
- **Step VI:** Upload and Create Dynamic Project Documentation
- **Step VII:** Run [Layer](https://layer.ai/) in local mode
<br><br>

You will learn more about these decorators and functions in this exact order:
- layer.init() function
- @dataset decorator
- @assert_not_null decorator
- @resources decorator
- layer.log() function
- layer.run() function
- @model decorator
- @fabric decorator
- @assert_true decorator
- get_dataset() function
- get_model() function
<br><br>

You are also about to learn more about these important concepts and features of [Layer](https://layer.ai/):
- How Dataset Versioning works on [Layer](https://layer.ai/)
- How Model Versioning works on [Layer](https://layer.ai/)
- How to compare different model versions
- How to share and import a Dataset or Model
- How to run and manage multiple experiments on [Layer](https://layer.ai/)




# **Step I:** Install Layer and Authentication
---

With just 3 lines of code, you will be done with installation and authentication in your notebook.<br><br>
Once you run the code cell below, it will prompt you with a link like the one below.

![img](https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.20.58.png?raw=true)

---

If you are a first time user on [Layer](https://layer.ai/), then you will see these 5 authentication pages in the exact order below when you open the link in the screenshot above.


<p float="left">
  <img src="https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.21.19.png?raw=true" width="300" />
  <img src="https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.21.40.png?raw=true" width="300" />
  <img src="  https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.21.52.png?raw=true" width="300" /> 
  <img src="  https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.22.17.png?raw=true" width="500" height="300" /> 
  <img src="  https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.22.57.png?raw=true" width="500" height="300" />  
</p>

---



In [None]:
# Clone project github repo into your Colab environment
!rm -rf examples
!git clone https://github.com/layerai/examples/

In [None]:
# Step I: Install Layer and Authentication

# Run the shell command to install Layer Python package
!pip install layer -U
# Import Layer package
import layer
# Call Layer's login function to authenticate
layer.login()


# **Step II:** Initialise a project and import some Layer modules
---

Once you paste the authentication code into the Python prompt on your notebook, you will successfully log into [Layer](https://layer.ai/)! Click on the link https://app.layer.ai and you will see the landing page of [Layer](https://layer.ai/).

  <img src="  https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2018.56.52.png?raw=true" width="700" height="400" />

Click on the **My Projects** tab on the top bar. Once you create several projects using [Layer](https://layer.ai/), they will all be listed here. Click on the **New project** button and you will see a pop-up page to enter some details of your project.

  <img src="  https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-07%20at%2019.08.51.png?raw=true" width="700" height="400" />

  Once you create your project, you should see the empty project page of your project named **my-first-project** as seen in the image below.

  <img src="  https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2010.59.27.png?raw=true" width="700" height="400" />

---



### Layer Decorators or Functions Used In Step-II
---

**layer.init():**<br>
To start logging into the ***my-first-project*** project, [Layer](https://layer.ai/)'s init function should be called with the name of the project, as shown in the first line of code below. It tells [Layer](https://layer.ai/) that everything will be saved in that specific project. If a project with the given name already exists, it will be matched and synced; otherwise, a new project with that name will be created under your organisation or personal account. This function also has a ***pip_packages*** parameter which takes a list of Python packages. Some libraries are already pre-installed on Layer's remote machines. Check them out here: https://docs.app.layer.ai/docs/reference/fabrics#preinstalled-libraries <br>You are expected to list all other libraries, or different versions of the pre-installed ones, that your project uses in the ***pip_packages*** argument. [Layer](https://layer.ai/) will install those Python packages first to make your project environment ready on the [Layer](https://layer.ai/) infrastructure. <br><br>

For more information about this function:<br>
https://docs.app.layer.ai/docs/sdk-library/layer-init

---

In [None]:
# Step II: Project initialization and import some Layer packages

# Match and sync with the project we created before and install the required Python packages on the Layer remote
# Note: the PyPI name of the sklearn library is 'scikit-learn'
layer.init("my-first-project",pip_packages=['matplotlib','scikit-learn'])
# Import functions from Layer
from layer.decorators import dataset, model, resources, fabric
from layer.decorators.assertions import assert_not_null, assert_true
from layer import Dataset,Model

# **Step III:** Learning more about Dataset Versioning on Layer
---
Every time you run your data build function, [Layer](https://layer.ai/) will automatically generate a new version of your dataset depending on changes in your function.<br><br>
This is how [Layer](https://layer.ai/) does dataset versioning:<br><br>
**- If schema of a dataset is changed, then bump up the major version number**<br>
*(e.g v1.3 → v2.1)* <br>
**- Otherwise, bump up the minor version number**<br>
*(e.g v1.3 → v1.4)*
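
The rule above can be written down as a small function. This is only a sketch of the rule as stated in this tutorial, not Layer's actual implementation: `next_dataset_version` and its arguments are hypothetical names, and the schema is modelled simply as a list of column names.

```python
def next_dataset_version(version, old_columns, new_columns):
    """Return the next (major, minor) version under the rule described above."""
    major, minor = version
    if list(old_columns) != list(new_columns):
        # Schema changed: bump the major number and restart the minor at 1.
        return (major + 1, 1)
    # Same schema: bump only the minor number.
    return (major, minor + 1)


three_features = ["feat1", "feat2", "feat3", "target"]
four_features = ["feat1", "feat2", "feat3", "feat4", "target"]

same_schema = next_dataset_version((1, 1), three_features, three_features)
new_schema = next_dataset_version((1, 2), three_features, four_features)
print(same_schema, new_schema)
```

The two calls mirror the experiments in this step: rebuilding with an unchanged schema, then rebuilding after a fourth feature column is added.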



### Layer Decorators and Functions Used in Step-III

---




**@dataset:**<br> 
It is a decorator which registers the Pandas data frame ***training_df*** returned by the function as a [Layer](https://layer.ai/) dataset named ***my_first_dataset*** under the project ***my-first-project***. [Layer](https://layer.ai/) starts tracking this dataset automatically every time you run this function. In other words, [Layer](https://layer.ai/) will version your dataset and log any other data you want [Layer](https://layer.ai/) to store along with it. Note that this decorator must be placed on top of a Python function that returns a Pandas data frame.<br><br>
For more information about the dataset decorator:<br>
https://docs.app.layer.ai/docs/sdk-library/dataset-decorator <br><br>

**@assert_not_null:**<br>
It is one of the pre-defined assertion decorators on [Layer](https://layer.ai/). It tests whether the columns listed in its parameter contain any null values. You could think of this as defining unit tests for your ML metadata. It alerts you right away whether the tests have passed or failed. Assertions ensure that nothing unexpected happens in your dataset without your supervision.<br><br> 
For more information about the assertions: <br>
https://docs.app.layer.ai/docs/sdk-library/assertions<br><br>
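
To make the idea concrete, here is a plain-Python sketch of the kind of check such an assertion performs. It is an illustration only and does not use or reproduce Layer's code: `columns_with_nulls` is a hypothetical helper, and the rows are toy values rather than data from this project.

```python
import math


def columns_with_nulls(rows, columns):
    """Return the names from `columns` that hold at least one missing value."""

    def is_null(value):
        # Treat None and float NaN as missing, as dataframes commonly do.
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [col for col in columns if any(is_null(row.get(col)) for row in rows)]


# Toy rows standing in for a dataset; the second row has a missing feat1.
rows = [
    {"feat1": 0.4, "target": 1.2},
    {"feat1": None, "target": 0.7},
]
failed = columns_with_nulls(rows, ["target", "feat1"])
print(failed)
```

An empty result would mean the check passes; any column name in the result corresponds to a failed assertion.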

**@resources:**<br> 
It's a decorator that is used to upload local files to [Layer](https://layer.ai/) remote machines. In this example, it is used to upload files under the ***data*** directory on the Google Colab file system to the [Layer](https://layer.ai/) Cloud. The folder structure on the cloud side will be exactly the same as on your local machine, so you don't have to change any hard-coded file paths in your code in order to run your function remotely.<br><br> 
For more information about the resources decorator:<br>
https://docs.app.layer.ai/docs/sdk-library/resources-decorator<br><br>


**layer.log():**<br>
In the function body, there is also one more [Layer](https://layer.ai/) supported function which is used to log arbitrary data along with the dataset. This function supports many different data types such as primitives, dataframes or images. In this example, we log a box plot of the column named ***target***. In general, you could log anything attached with the current build version of your dataset or model by using this function.
<br><br> 
For more information about the layer.log function:<br>
https://docs.app.layer.ai/docs/sdk-library/layer-log<br><br>

**layer.run():**<br>
It runs the list of functions passed as its parameter remotely on the [Layer](https://layer.ai/) infrastructure. In the case below, since we have a single function named ***create_dataset***, we just put the function into a list and pass it to ***layer.run()***.

Check out the link to learn more:<br>
https://docs.app.layer.ai/docs/sdk-library/layer-run
 

---

In [None]:
# --- Create your first dataset ---
@dataset("my_first_dataset")
@assert_not_null(["target","feat1"])
@resources(path="./examples/comprehensive-getting-started/data")
def create_dataset():
  from matplotlib import pyplot as plt
  from sklearn.datasets import make_regression
  import pandas as pd
  
  X_train, Y_train = make_regression(n_samples=100, n_features=3, n_targets=1, noise=0.5)

  features = pd.DataFrame(X_train, columns = ['feat1', 'feat2', 'feat3'])
  target = pd.DataFrame(Y_train, columns = ['target'])
  training_df= pd.concat([features, target], axis=1)

  external_df = pd.read_csv("examples/comprehensive-getting-started/data/external.csv",names=['feat1','feat2','feat3','target'],header=None)
  # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
  final_training_df = pd.concat([training_df, external_df])

  training_df[['target']].plot(kind='box', title='Target Quartile Analysis')
  layer.log({"Box plots": plt})

  plt.close()
  
  return training_df 

Now, we are ready to take one more step to run the function above on [Layer](https://layer.ai/) Cloud. <br>

When you run the cell below for the first time, it will create your dataset ***my_first_dataset v1.1*** and also print a link to that dataset's page on the [Layer](https://layer.ai/) Web UI. Once you click on the generated link, you will see a dataset page that looks like the one below:

  <img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2012.26.03.png?raw=true" width="700" height="500" />

Once you run the cell below for the second time without changing anything in the ***create_dataset*** function's code, [Layer](https://layer.ai/) will generate a new version of the same dataset and bump up only the minor version number *(v1.1 --> v1.2)*, since there is no change in the schema of the dataset. When you click on the newly generated link, you will be directed to the page of ***my_first_dataset v1.2***.

  <img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2012.28.22.png?raw=true" width="700" height="500" />

In [None]:
# --- Run your function: create_dataset remotely on Layer Infra ---

# Run this cell twice to create my_first_dataset v1.1 & v1.2 
# and check them out on Layer Web UI by clicking on the links generated in the output of the cell 
layer.run([create_dataset])

Now, let's make some changes in the function body that affect the schema of the returned dataset, such as adding one more column. Assume that you have come up with a new feature that you think will help improve your ML model's prediction performance. You would add this new feature to your dataset in the next iteration (or experiment) and train your model using the new version of your dataset. In the case below, we will simulate it by increasing the number of features ***n_features*** in the ***make_regression*** function from 3 to 4.<br>

We copied the code of the function ***create_dataset***, pasted it into the next cell and made that change accordingly.

In [None]:
# Change your function create_dataset: Adding a new feature into the dataset and increasing number of features to 4. 
# See changes and compare the function with the one above.
@dataset("my_first_dataset")
@assert_not_null(["target","feat1"])
@resources(path="./examples/comprehensive-getting-started/data")
def create_dataset():
  from matplotlib import pyplot as plt
  from sklearn.datasets import make_regression
  import pandas as pd
  
  X_train, Y_train = make_regression(n_samples=100, n_features=4, n_targets=1, noise=0.5)

  features = pd.DataFrame(X_train, columns = ['feat1', 'feat2', 'feat3','feat4'])
  target = pd.DataFrame(Y_train, columns = ['target'])
  training_df= pd.concat([features, target], axis=1)

  external_df = pd.read_csv("examples/comprehensive-getting-started/data/external.csv",names=['feat1','feat2','feat3','feat4','target'],header=None)
  # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
  final_training_df = pd.concat([training_df, external_df])

  training_df[['target']].plot(kind='box', title='Target Quartile Analysis')
  layer.log({"Box plots": plt})

  plt.close()
  
  return training_df 

Once you run the function again in the next cell after the changes, [Layer](https://layer.ai/) will create and register a new version of the same dataset, but this time it will automatically bump up the major version (v1.2 --> v2.1) since there is a change in the schema of ***my_first_dataset***. You will see a page similar to the one below when you click on the generated link.

  <img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2012.47.22.png?raw=true" width="700" height="500" />

Recall that we also logged a boxplot of the ***target*** column along with this dataset. You will see those logged data in the **Logged data** tab. Here is a screenshot taken from that tab.

  <img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2012.48.01.png?raw=true" width="700" height="500" />




In [None]:
# --- Run your function remotely ---

# Run this cell once to create my_first_dataset v2.1
# and check it out on Layer Web UI by clicking on the link generated in the output of the cell 
layer.run([create_dataset])

# **Step IV:** Learning more about Model Versioning on [Layer](https://layer.ai/)
---

Now, we are ready to fit our very first model on [Layer](https://layer.ai/) using the datasets generated in the previous step. [Layer](https://layer.ai/) will do automatic versioning on your models as well once you start building ML models by running your model train function several times.<br><br>
This is how model versioning works on [Layer](https://layer.ai/):<br><br>
**- If source code of a model is changed, then bump up the major version number**<br>
*(e.g v2.2 → v3.1)*<br>
**- Otherwise, bump up the minor version number**<br>
*(e.g v2.2 → v2.3)*
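
The same rule can be sketched for models. This tutorial does not describe how Layer detects a source code change, so the sketch below simply assumes the text of the train function is fingerprinted with a hash and compared between builds. `source_fingerprint` and `next_model_version` are hypothetical helpers, not part of the Layer SDK.

```python
import hashlib


def source_fingerprint(source):
    """Hash the text of a train function so two builds can be compared."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def next_model_version(version, old_source, new_source):
    """Return the next (major, minor) version under the rule described above."""
    major, minor = version
    if source_fingerprint(old_source) != source_fingerprint(new_source):
        # Source code changed: bump the major number and restart the minor at 1.
        return (major + 1, 1)
    # Same source code: bump only the minor number.
    return (major, minor + 1)


# Stand-ins for two versions of a train function's source code.
old_code = "gbr_params = {'learning_rate': 0.1}"
new_code = "gbr_params = {'learning_rate': 0.01}"

unchanged = next_model_version((1, 1), old_code, old_code)
changed = next_model_version((1, 2), old_code, new_code)
print(unchanged, changed)
```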


### [Layer](https://layer.ai/) Decorators and Functions Used In Step-IV

---
**@model:**<br>
It is a decorator which registers the sklearn Gradient Boosting Regressor model ***gbr*** returned by the function as a [Layer](https://layer.ai/) model named ***my_first_model*** under the project ***my-first-project***. Once you wrap your model's train function with this decorator, [Layer](https://layer.ai/) will start versioning your model automatically every time you run your function. You will also see a parameter named ***dependencies*** which expects a list of the [Layer](https://layer.ai/) Datasets and Models that the function depends on. [Layer](https://layer.ai/) will optimise the build process accordingly. <br><br> 
For more information about the model decorator:<br>
https://docs.app.layer.ai/docs/sdk-library/model-decorator<br><br>
**@fabric:**<br>
It is a special [Layer](https://layer.ai/) decorator that is used to set the type of remote machine, and therefore the amount of CPU/GPU resources, allocated to run your function. In the case below, we will use the ***f-medium*** machine type to run our function ***create_model*** remotely. [Layer](https://layer.ai/) has many more pre-defined fabrics.<br><br>

Check out other fabrics on [Layer](https://layer.ai/):<br>
https://docs.app.layer.ai/docs/sdk-library/fabric-decorator<br><br>


**@assert_true:**<br>
This is a special type of assertion decorator which takes a custom function with a boolean return type as its parameter. It runs that function and checks whether the returned result is true. As you will see in the code cell below, we have a test function named ***model_test_function*** which checks that the train loss never increases from one iteration to the next. The train loss is expected to decrease as new trees are added to the ensemble. If you need to do such a sanity check on your model, all you need to do is pass this function as a parameter to the assert_true decorator.<br><br>

You could see more about the assert_true decorator here:<br>
https://docs.app.layer.ai/docs/sdk-library/assert-true<br><br>
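
The check performed by such a test function can be tried out without training anything. The sketch below is an equivalent, more compact formulation of the loss check used in this step. The `SimpleNamespace` objects are stand-ins for a fitted model and their scores are made-up values. Note that equal consecutive values pass, so the check is "never increases" rather than "strictly decreases".

```python
from types import SimpleNamespace


def train_loss_never_increases(predictor):
    """Return True if each train score is >= the one that follows it."""
    scores = predictor.train_score_
    return all(earlier >= later for earlier, later in zip(scores, scores[1:]))


# Stand-ins for fitted models: only the train_score_ attribute is needed here.
improving = SimpleNamespace(train_score_=[0.9, 0.5, 0.5, 0.2])
diverging = SimpleNamespace(train_score_=[0.9, 0.5, 0.6])

print(train_loss_never_increases(improving))
print(train_loss_never_increases(diverging))
```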

**get_dataset():**<br>
This function retrieves a dataset from the Datasets you previously defined on [Layer](https://layer.ai/). It returns a [Layer](https://layer.ai/) Dataset object, on which you should call the ***to_pandas()*** function to convert it to a Pandas data frame. In the case below, we fetch ***my_first_dataset v1.2*** to use as training data to build the model.<br><br>

Learn more about this function:<br>
https://docs.app.layer.ai/docs/sdk-library/get-dataset<br><br>
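
In this tutorial, a specific version is requested with a reference string of the form `name:major.minor`, for example `my_first_dataset:1.2`. The helper below is a hypothetical illustration of how such a string breaks down into its parts. It is not part of the Layer SDK, and how Layer itself resolves a reference without a version is not covered here.

```python
def parse_entity_reference(reference):
    """Split 'name:major.minor' into (name, (major, minor)).

    A reference without a version part is returned as (name, None).
    """
    name, _, version = reference.partition(":")
    if not version:
        return name, None
    major, minor = version.split(".")
    return name, (int(major), int(minor))


pinned = parse_entity_reference("my_first_dataset:1.2")
unpinned = parse_entity_reference("my_first_dataset")
print(pinned, unpinned)
```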


---

In [None]:
# --- Create your first model ---

# A user-defined function to test the model: Check if train loss is decreasing in each iteration
def model_test_function(predictor):
  score_arr = predictor.train_score_
  for i in range( len(score_arr) - 1 ):
    if score_arr[i] < score_arr[i+1]:
      return False
  return True

# Model build function
@model("my_first_model" , dependencies=[Dataset('my_first_dataset')])
@fabric("f-medium")
@assert_true(model_test_function)
def create_model():
  from sklearn.ensemble import GradientBoostingRegressor
  from matplotlib import pyplot as plt
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import mean_squared_error
  from sklearn.preprocessing import StandardScaler
  
  # Read dataset from Layer and split into train and test pandas dataframes
  data = layer.get_dataset("my_first_dataset:1.2").to_pandas()
  X_train, X_test, y_train, y_test = train_test_split(data.iloc[:,:-1], data.target, random_state=42, test_size=0.1)

  # Standardize the dataset
  sc = StandardScaler()
  X_train_std = sc.fit_transform(X_train)
  X_test_std = sc.transform(X_test)

  # Hyperparameters for GradientBoostingRegressor
  gbr_params = {'n_estimators': 300,
                'max_depth': 3,
                'learning_rate': 0.1,
                'min_samples_split' : 5,
                # 'ls' was renamed to 'squared_error' in scikit-learn 1.0
                'loss': 'squared_error'}

  # Log model parameters to Layer
  layer.log(gbr_params)

  # Create an instance of gradient boosting regressor
  gbr = GradientBoostingRegressor(**gbr_params)
  # Fit the model
  gbr.fit(X_train_std, y_train)

  # Log loss scores incrementally using the step parameter
  for i, y_pred in enumerate(gbr.staged_predict(X_test_std)):
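    # gbr.loss_ was removed in scikit-learn 1.3; for squared-error loss, MSE is the same quantity
    loss = mean_squared_error(y_test, y_pred)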
    layer.log({"Loss":loss},step = i)

  # Log Coefficient of determination R^2
  r2 = gbr.score(X_test_std, y_test)
  # Create the mean squared error
  mse = mean_squared_error(y_test, gbr.predict(X_test_std))
  # Get Feature importance data using feature_importances_ attribute
  sorted_idx = gbr.feature_importances_.argsort()
  plt.barh(data.iloc[:,:-1].columns[sorted_idx], gbr.feature_importances_[sorted_idx])
  plt.xlabel("Gradient Boosting Regressor Feature Importance")

  # Log performance metrics and feature importance plot to Layer
  layer.log({"R2 Score": r2,
             "Mean Squared Error": mse,
             "Feature Importance Plot": plt
             })
  
  plt.close()

  return gbr

Now, we are ready to execute our function by using the ***layer.run()***.<br>
Once you run the cell below, it will create the very first version of your model and you can visit its page by clicking on the generated link in the cell's output. You will see a model page for your ***my_first_model v1.1*** like the one below:

  <img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2013.40.41.png?raw=true" width="700" height="500" />

You can also see the other data we logged along with the model using the ***layer.log*** function, such as the model parameters, the R2 score, the mean squared error and a feature importance bar plot.<br>

In the next experiment, run the cell below again without changing anything in the model's train function code. Since there is no change in the model's function, [Layer](https://layer.ai/) will train and register a model automatically and bump up only the minor version number *(v1.1 --> v1.2).* This time, you will see the ***my_first_model v1.2*** page once you click on the generated link.

  <img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2013.41.17.png?raw=true" width="700" height="500" />

In [None]:
# --- Run your function remotely ---

# Run this cell twice to create my_first_model v1.1 & v1.2 
# and check them out on Layer Web UI by clicking on the links generated in the output of the cell
layer.run([create_model])

Let's copy the ***create_model*** function's code and paste it into the next cell as it is. Now, make a change within the source code. To keep the exercise simple, we will just change the ***learning_rate*** parameter from *0.1* to *0.01*. Since this is a change within the model function's source code, [Layer](https://layer.ai/) will detect it automatically and bump up the major version number *(v1.2 --> v2.1)* once you run the function again using ***layer.run()***. You will be directed to the page of ***my_first_model v2.1*** when you click on the generated link.  

<img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2013.58.15.png?raw=true" width="700" height="500" />

---



In [None]:
# --- Change the learning_rate parameter ---

# A user-defined function to test the model: Check if train loss is decreasing in each iteration
def model_test_function(predictor):
  score_arr = predictor.train_score_
  for i in range( len(score_arr) - 1 ):
    if score_arr[i] < score_arr[i+1]:
      return False
  return True

# Model build function
@model("my_first_model" , dependencies=[Dataset('my_first_dataset')])
@fabric("f-medium")
@assert_true(model_test_function)
def create_model():
  from sklearn.ensemble import GradientBoostingRegressor
  from matplotlib import pyplot as plt
  from sklearn.model_selection import train_test_split
  from sklearn.metrics import mean_squared_error
  from sklearn.preprocessing import StandardScaler
  
  # Read dataset from Layer and split into train and test pandas dataframes
  data = layer.get_dataset("my_first_dataset:1.2").to_pandas()
  X_train, X_test, y_train, y_test = train_test_split(data.iloc[:,:-1], data.target, random_state=42, test_size=0.1)

  # Standardize the dataset
  sc = StandardScaler()
  X_train_std = sc.fit_transform(X_train)
  X_test_std = sc.transform(X_test)

  # Hyperparameters for GradientBoostingRegressor
  gbr_params = {'n_estimators': 300,
                'max_depth': 3,
                # CHANGED HERE: 0.1 --> 0.01 
                'learning_rate': 0.01,
                'min_samples_split' : 5,
                # 'ls' was renamed to 'squared_error' in scikit-learn 1.0
                'loss': 'squared_error'}

  # Log model parameters to Layer
  layer.log(gbr_params)

  # Create an instance of gradient boosting regressor
  gbr = GradientBoostingRegressor(**gbr_params)
  # Fit the model
  gbr.fit(X_train_std, y_train)

  # Log loss scores incrementally using the step parameter
  for i, y_pred in enumerate(gbr.staged_predict(X_test_std)):
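    # gbr.loss_ was removed in scikit-learn 1.3; for squared-error loss, MSE is the same quantity
    loss = mean_squared_error(y_test, y_pred)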
    layer.log({"Loss":loss},step = i)

  # Log Coefficient of determination R^2
  r2 = gbr.score(X_test_std, y_test)
  # Create the mean squared error
  mse = mean_squared_error(y_test, gbr.predict(X_test_std))
  # Get Feature importance data using feature_importances_ attribute
  sorted_idx = gbr.feature_importances_.argsort()
  plt.barh(data.iloc[:,:-1].columns[sorted_idx], gbr.feature_importances_[sorted_idx])
  plt.xlabel("Gradient Boosting Regressor Feature Importance")

  # Log performance metrics and feature importance plot to Layer
  layer.log({"R2 Score": r2,
             "Mean Squared Error": mse,
             "Feature Importance Plot": plt
             })
  
  plt.close()

  return gbr

In [None]:
# --- Run your function remotely ---

# Run this cell once to create 'my_first_model v2.1' 
# and check it out on Layer Web UI by clicking on the link generated in the output of the cell
layer.run([create_model])



Before wrapping up the model part, let me show you 2 more important actions you can take on this page.<br>

**- Compare model versions**

When you select multiple model versions on the left panel, you will see all the logged data belonging to these versions side by side. You can also copy links to these plots to share them with someone easily or to put them into the project readme file, as described in a later section. In the screenshot below, we compare the model ***my_first_model v2.1*** with its previous version ***v1.2***.

<img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2014.00.24.png?raw=true" width="700" height="500" />

**- Share and import a model**

If you would like to share or import a specific version of your ML model into another project or your notebook, you could click on the **'</> Use from Layer'** button and copy the code snippet. You could do the same for datasets as well.

<img src=" https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2014.00.55.png?raw=true" width="700" height="500" />

# **Step V:** Run multiple-functions experiments on [Layer](https://layer.ai/)
---

Let's create one more [Layer](https://layer.ai/) dataset which holds the predictions we get on separate test data using the previously trained and registered ***my_first_model v2.1*** model.<br>

### [Layer](https://layer.ai/) Decorators and Functions Used In Step-V

---
**get_model():**<br>
This function retrieves a model from the Models you previously registered on [Layer](https://layer.ai/). It returns a [Layer](https://layer.ai/) Model object, on which you should call the ***get_train()*** function to get a regular sklearn model. In the case below, we fetch ***my_first_model v2.1*** to make predictions on the test data.<br><br>

Learn more about this function:<br>
https://docs.app.layer.ai/docs/sdk-library/get-model<br><br>

In [None]:
# Create another dataset named 'predictions' by fetching the registered my_first_model v2.1 on Layer
@dataset("predictions", dependencies=[Model('my_first_model')])
def generate_predictions():
  from sklearn.datasets import make_regression
  import pandas as pd
  from sklearn.metrics import r2_score
  
  # Only the features are needed here; the generated targets are discarded
  X_test, _ = make_regression(n_samples=10, n_features=3, n_targets=1, noise=0.5)

  X_test = pd.DataFrame(X_test, columns = ['feat1', 'feat2', 'feat3'])
  
  my_model = layer.get_model("my_first_model:2.1").get_train()
  Y_pred_arr = my_model.predict(X_test)
  Y_pred = pd.DataFrame(Y_pred_arr, columns = ['prediction'])

  predictions_df = pd.concat([X_test, Y_pred], axis=1)

  return predictions_df

In [None]:
# --- Run your function remotely ---

# Run this cell once to create dataset 'predictions v1.1' 
# and check it out on Layer Web UI by clicking on the link generated in the output of the cell 
layer.run([generate_predictions])

---
We have defined 3 different functions so far:
- create_dataset()
- create_model()
- generate_predictions()

So far, we have passed these functions to ***layer.run()*** individually and run them one by one. However, we can also pass multiple functions to ***layer.run()*** and run them together. The order in which these functions run is determined by [Layer](https://layer.ai/) automatically, based on the dependencies between them, so you don't have to state it explicitly. [Layer](https://layer.ai/) will run functions in parallel whenever possible to optimise running time. Now, let's run these 3 functions all together in a single run by executing the single line of code below.

[Layer](https://layer.ai/) will run these functions in this exact order: ***create_dataset*** → ***create_model*** → ***generate_predictions***, given their ***dependencies*** parameters. This is because the ***create_model*** function uses a dataset generated by the ***create_dataset*** function, and the ***generate_predictions*** function uses a model created by the ***create_model*** function. Once the run is done, you will end up with newer versions of your datasets ***my_first_dataset*** and ***predictions*** as well as a new version of your model ***my_first_model***.<br><br>
To see all the experiments we have run so far, click on the ***'Runs'*** tab on the project page to see the list of runs and the entities generated by each run. As you can see at the top of the screenshot below, the latest run creates new versions of 2 datasets, ***my_first_dataset v2.2*** and ***predictions v1.2***, and 1 model, ***my_first_model v2.2***. The second run from the top, however, creates only the very first version of the ***predictions*** dataset. You can also click on those entities to go to their respective pages for more details.

<img src="https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2015.00.32.png?raw=true" width="700" height="500" />
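
As an illustration of how an execution order can be derived from dependencies alone, here is a minimal sketch using `graphlib` from the Python standard library (Python 3.9+). It is not how Layer is implemented; the `dependencies` dictionary is a hypothetical stand-in for the ***dependencies*** parameters of the three functions.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map (an assumption for illustration, not Layer's API):
# each function name maps to the functions whose outputs it needs.
dependencies = {
    "create_dataset": set(),
    "create_model": {"create_dataset"},
    "generate_predictions": {"create_model"},
}

# static_order() yields every node only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(" -> ".join(run_order))
```

Because each function here depends on the previous one, only one order is valid and nothing can run in parallel. Functions with no dependency path between them could be scheduled at the same time.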

In [None]:
# --- Run multiple functions remotely ---

# Run this cell once and then check out the 'Runs' tab to see list of runs we've had so far.
layer.run([create_dataset, create_model, generate_predictions])

# **Step VI:** Upload and Create Dynamic Project Documentation
---

Another useful feature of [Layer](https://layer.ai/) is dynamic project readme documentation. It is basically a regular markdown file; however, you can also add links to datasets, models or any logged data to it. [Layer](https://layer.ai/) shows these links as clickable entity cards that take readers to the respective entity pages. That's what makes this documentation 'dynamic' and interactive.<br><br>
All you need to do is create a project readme markdown file with your favourite text editor and put it into the working directory of your Google Colab notebook. Once you upload that file and run the ***layer.init*** function again with the same project name, [Layer](https://layer.ai/) will render it on the project main page, similar to the screenshot below.

We have already created a README.md file for you. It is located in the directory */content/examples/comprehensive-getting-started/*. All you have to do is make sure it is moved into the project root directory, which is */content* for Google Colab users.
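
This step can also be done from a notebook cell. The snippet below is a minimal sketch using only the Python standard library. The helper `copy_readme` is a hypothetical name introduced here, and the paths are the Google Colab locations mentioned above, so adjust them if your setup differs. Note that it copies rather than moves the file, so the original stays in the examples directory.

```python
import shutil
from pathlib import Path


def copy_readme(src_dir, dest_dir, filename="README.md"):
    """Copy the readme file from src_dir into dest_dir and return the new path."""
    src = Path(src_dir) / filename
    dest = Path(dest_dir) / filename
    shutil.copy(src, dest)
    return dest


# Paths taken from the tutorial text (Google Colab layout).
source_dir = Path("/content/examples/comprehensive-getting-started")
project_root = Path("/content")

# Only copy when the source file is present, so the cell fails gracefully elsewhere.
if (source_dir / "README.md").exists():
    print("Copied to", copy_readme(source_dir, project_root))
else:
    print("README.md not found in", source_dir)
```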

Check out this page to learn more about Project cards on Layer:<br>
https://docs.app.layer.ai/docs/projects/card <br><br>


<img src="https://github.com/layerai/examples/blob/main/comprehensive-getting-started/images/Screenshot%202022-07-14%20at%2015.05.48.png?raw=true" width="450" height="600" />





In [None]:
# --- Run this cell once after you upload your README.md file into your project working directory ---

layer.init("my-first-project")

# **Step VII:** Run [Layer](https://layer.ai/) in local mode
---

To use [Layer](https://layer.ai/) to store your ML metadata, you don't have to use ***layer.run()*** to run your functions remotely on the [Layer](https://layer.ai/) Cloud. You can also run your functions locally on your own resources and still have [Layer](https://layer.ai/) store the results they return. <br><br>

In the cell below, we just call functions locally in their correct order and [Layer](https://layer.ai/) will still store generated results returned from those functions. Click on the respective links for each entity to see them on the [Layer](https://layer.ai/) Web UI.

In [None]:
# --- Run your functions locally using your own resources and machines ---

# my_first_dataset
create_dataset()

# my_first_model
create_model()

# predictions
generate_predictions()

# CONCLUSION

Congratulations, you are done! You have completed the comprehensive Layer Demo by building an end-to-end ML pipeline.<br><br>

To learn more about using Layer, you can:

- Join our [Slack Community](https://layer-community.slack.com/)
- Visit [Layer Examples Repo](https://github.com/layerai/examples) for more examples
- Browse Trending Layer Projects on our [main page](https://layer.ai/)
- Check out [Layer Documentation](https://docs.app.layer.ai/docs/) to learn more