
Model Evaluation and Diagnosis Display

Model evaluation is the process through which we quantify the quality of a system's predictions. To do this, we measure the newly trained model's performance on a new and independent dataset, comparing the labeled data with the model's own predictions. Model evaluation metrics tell us:

  • How well our model is performing
  • Whether our model is accurate enough to put into production
  • Whether a larger training set would improve our model's performance
  • Whether our model is under-fitting or over-fitting
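
As a concrete example, the following is a minimal sketch (using scikit-learn with toy data; it is illustrative and not code from this repository) of comparing a model's own predictions against labeled data on an independent held-out set:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy data standing in for your own labeled dataset
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train on one split, evaluate on the independent held-out split
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)                         # the model's own predictions
print("Accuracy:", accuracy_score(y_test, y_pred))   # compared against the true labels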

Problem Statement

The goal of the project is to build a solution with context-based visualisation capabilities that helps data scientists easily compare relevant metrics of different machine learning models and evaluate their performance on unseen data.

This project focuses on evaluating machine learning models, given a model and a test dataset. It also provides the capability to compare model evaluations on the basis of different metrics. However, comparison of evaluations is only possible in the following two scenarios:

  • Evaluation reports for one model against different validation datasets (having the same schema)
  • Evaluation reports for multiple models (two or more) generated against the same validation dataset

Table Schemas

Model

Column         Data Type
Model ID       integer
Name           string
Metadata       object
Model path     string
Date created   date

Dataset

Column         Data Type
Dataset ID     integer
Name           string
Metadata       object
Dataset path   string
Date created   date

Evaluation

Column         Data Type
Evaluation ID  integer
Name           string
Metadata       object
Model ID       integer
Dataset ID     integer

Workflow

(Workflow diagram screenshot)

REST API Endpoints

(Screenshots of the REST API endpoints)

How to Use?

There are two methods to run the application.

Note:

  • To run the application using Method 1, make sure the proxy is set to "http://localhost:5000/" in the client/package.json file
  • To run the application using Method 2, make sure the proxy is set to "http://api:5000/" in the client/package.json file
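
For reference, the proxy is a single field in client/package.json, which should look roughly like this (Method 1 value shown; swap in "http://api:5000/" for Method 2):

"proxy": "http://localhost:5000/"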

Method 1:

  • Clone the repository (git clone https://github.com/ksrrock/Model-Evaluation-and-Diagnosis-Display.git)
  • Change directory into the cloned repo (cd .\Model-Evaluation-and-Diagnosis-Display\)
  • Change directory to the client folder (cd client)
  • Install the necessary dependencies (yarn install)
  • Run the Flask server at port 5000 (yarn start-api)
  • Open another terminal/cmd window and run the client side at port 3000 (yarn start)
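
Putting the Method 1 steps together, the terminal commands are:

git clone https://github.com/ksrrock/Model-Evaluation-and-Diagnosis-Display.git
cd Model-Evaluation-and-Diagnosis-Display
cd client
yarn install
yarn start-api      # terminal 1: Flask server on port 5000
yarn start          # terminal 2: React client on port 3000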

Your application is now up and running; to see it, head over to http://localhost:3000

Method 2:

The application is dockerized, so follow these steps to run it using Docker.

First, download Docker Desktop and follow its instructions to install it. This allows us to start using Docker containers.

Create a local copy of this repository and run

docker-compose build

This spins up Compose and builds a local development environment according to our specifications in docker-compose.yml.

After the containers have been built (this may take a few minutes), run

docker-compose up

This one command boots up a local server for Flask (on port 5000) and React (on port 3000). Head over to

http://localhost:3000/ 

to view the React web page, which triggers REST API calls to our Flask server.

The API endpoints can be tweaked easily in api/app.py. The front-end logic for consuming our API is contained in client/src/. The code contained within these files simply exists to demonstrate how our front-end might consume our back-end API.
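
As a rough illustration of the kind of Flask code that lives in api/app.py (the route below is a hypothetical example, not one of the repository's actual endpoints):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/ping')          # hypothetical endpoint, for illustration only
def ping():
    return jsonify(status='ok')

if __name__ == '__main__':
    app.run(port=5000)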

Finally, to gracefully stop running our local servers, you can run

docker-compose down

in a separate terminal window or press control + C.

You can view the REST endpoints through the Swagger UI by following these steps:

  • Clone the repo by git clone https://github.com/ksrrock/Model-Evaluation-and-Diagnosis-Display.git
  • cd .\Model-Evaluation-and-Diagnosis-Display\
  • cd client
  • yarn install
  • yarn start-api
  • Then go to http://127.0.0.1:5000/swagger

Alternatively, you can just visit https://ksrrock.github.io/swagger-ui/ to visualise the REST endpoints.

Steps to build Evaluations:

Step 1:

Bring your own test dataset and serialised model

Train your model in a Jupyter Notebook, Kaggle, or Google Colab, and obtain the serialised model and dataset as follows:

import pickle
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# X and y are your feature matrix and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
df1 = pd.DataFrame(X_test)
df2 = pd.DataFrame(y_test)
result = pd.concat([df1, df2], axis=1)

# Save the testing data used for evaluation
result.to_csv('test_data_DecisionTree.csv', index=False)

# Train the model
LR = LogisticRegression(solver='lbfgs', max_iter=10000)
LR.fit(X_train, y_train)

# Obtain the serialised model
filename = 'finalized_model.sav'
pickle.dump(LR, open(filename, 'wb'))

Step 2:

Clone the repo and run the command yarn install to install the necessary dependencies.

Step 3:

Run the code on your local machine (you can either run the application via Docker, or run the client side and server side individually).
If you want to run the application via Docker, make sure the proxy set in client/package.json is http://api:5000/.
If you want to run the client side and server side individually, change the proxy to http://localhost:5000 and follow the instructions ahead.

Server Side

The api folder is included in this repository itself. To spin up the server at http://localhost:5000, open a terminal, navigate to the client folder inside the folder where you cloned this repository, and run:

yarn start-api

Client Side

In the project directory, go inside the client folder and, to spin up the client at http://localhost:3000, run:

yarn start

As soon as both the server and client are up and running, you will be able to browse the site and call API endpoints. The React front end has a proxy set up to port 5000, so URLs that are not recognised on port 3000 get redirected to port 5000, invoking the endpoints if they have been defined in the backend.

Step 4:

While the server side is running, populate your models and datasets using Postman, triggering the following API endpoints:
(Screenshots of the model and dataset registration endpoints)
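
The exact endpoint paths and payloads are shown in the screenshots and in the Swagger UI. Purely as an illustration, a registration call from Python could look like the sketch below; the /models path and field names are hypothetical, inferred from the table schemas above rather than taken from the actual API, so check Swagger for the real ones.

import requests

# Hypothetical endpoint and field names -- consult the Swagger UI for the real ones
payload = {
    "name": "finalized_model",
    "metadata": {"framework": "scikit-learn"},
    "model_path": "models/finalized_model.sav",
}
response = requests.post("http://localhost:5000/models", json=payload)
print(response.status_code, response.json())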

(Technology badges: front end, back end, and others)

Quick Start

Some general information

  1. There are three types of tables in the project:

    • Interactive
      • Can toggle padding
      • Change number of rows per page
      • Navigate between table pages
      • Checkbox to select one or multiple rows
      • Delete button to delete selected row(s)
      • Sort according to column
    • Non-Interactive: Plain simple table to display data
    • Semi-Interactive: Same as Interactive, except the following are removed:
      • Delete option
      • Checkboxes to select
  2. Only plots shown in a single evaluation have a slider. Sliders have not been added to comparisons, given their limited utility in that case.

  3. Datasets and models are currently registered using either Postman or the Swagger UI. Evaluations can be registered using the UI.

  4. Comparison is only possible in two cases:

    • Multiple models trained on the same dataset
    • Same model tested on multiple datasets
  5. Only models provided by the scikit-learn library are supported

  6. Model files are unpickled and used, so only one extension, .sav, is supported
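
For reference, a model serialised as in Step 1 above can be loaded back with pickle, for example:

import pickle

# Load a scikit-learn model previously saved with pickle.dump(...)
with open('finalized_model.sav', 'rb') as f:
    model = pickle.load(f)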

Homepage

The Homepage consists of a table in which all the evaluations are listed. The table is fully interactive. By clicking on the Evaluation ID, the user can see the visualizations related to it. The Compare button can be used to compare two or more evaluations. Refer to the general information section above to see how these work. Clicking on the Evaluation ID triggers the model evaluation if the metrics are not already present. Each row has the following information:

  • Evaluation ID
  • Evaluation name
  • Model Type
  • Model
  • Dataset
  • Date Created

Evaluation Post Form

This page contains the form that helps the user to register an evaluation. The user enters the following information here:

  • Evaluation name
  • Model Type (selection)
  • Dataset (selection)
  • Model (selection)
  • Description (optional)

On submitting, the evaluation gets stored in the table, without the metadata.

Single Model Evaluation

The evaluation metrics for a single model can be visualized by clicking on the button encircling the Evaluation ID in the table on the Homepage. This sends a GET request for the evaluation and, based on the received payload, renders the visualisations as follows:

Note: All tables rendered in this scenario are semi-interactive, except the table for feature importance.

Evaluation Metrics

The following evaluation metrics will be visible to the user in tabular, bar chart and line chart format:

Classification   Regression
Accuracy         MAE
Precision        MSE
Recall           RMSE
F1-Score         RMSLE
Log-Loss         R-squared
----             Adjusted R-squared
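
For reference, here is a minimal sketch (using scikit-learn and NumPy on toy arrays; illustrative only, not this repository's code) of how these classification and regression metrics are typically computed:

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             log_loss, mean_absolute_error, mean_squared_error,
                             mean_squared_log_error, r2_score)

# --- Classification metrics on toy labels/probabilities ---
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9])    # predicted probability of class 1
y_pred = (y_prob >= 0.5).astype(int)
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-Score :", f1_score(y_true, y_pred))
print("Log-Loss :", log_loss(y_true, y_prob))

# --- Regression metrics on toy targets/predictions ---
t = np.array([3.0, 5.0, 2.5, 7.0])
p = np.array([2.8, 5.4, 2.9, 6.4])
n, k = len(t), 2                                 # k = number of features (assumed)
r2 = r2_score(t, p)
print("MAE   :", mean_absolute_error(t, p))
print("MSE   :", mean_squared_error(t, p))
print("RMSE  :", np.sqrt(mean_squared_error(t, p)))
print("RMSLE :", np.sqrt(mean_squared_log_error(t, p)))
print("R2    :", r2)
print("Adj R2:", 1 - (1 - r2) * (n - 1) / (n - k - 1))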

Curves and Charts

The following curves and charts will be shown to the user when they select this option:

Classification         Regression
ROC                    Residual vs Observed
Precision-Recall       Observed vs Predicted
Confusion Matrix       Residual vs Predicted
Gain and Lift Charts   ---
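
The points behind these plots can be obtained with standard scikit-learn utilities; a minimal illustrative sketch (not this repository's code):

import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve, confusion_matrix

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.9, 0.7, 0.4, 0.8, 0.3, 0.6, 0.2])   # predicted P(class 1)

fpr, tpr, roc_thresholds = roc_curve(y_true, y_prob)           # ROC curve points
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_prob)
cm = confusion_matrix(y_true, (y_prob >= 0.5).astype(int))     # at a 0.5 cutoff

# Cumulative gain: fraction of positives captured when targeting the top-k scored samples
order = np.argsort(-y_prob)
gain = np.cumsum(y_true[order]) / y_true.sum()

print(fpr, tpr)
print(precision, recall)
print(cm)
print(gain)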

Along with the plots, there are several ways to interact with them:

  1. Zooming in and out using scrolling
  2. Select
  3. Lasso select
  4. Slider for the cutoff value
  5. Button to reset to initial state
  6. Drag and move
  7. Save as PNG

Dataset Information

The following data is shown about the test dataset used by the model for the prediction:

  1. Dataset statistics are shown in tabular format as well as in line chart format (see the sketch after this list). The following statistics are displayed for each column of the dataset:
    • Mean
    • Standard deviation
    • Minimum value
    • Maximum value
    • First Quartile
    • Second Quartile
    • Third Quartile
    • IQR
    • Number of missing values
  2. Feature importances are shown in bar chart as well as tabular format
  3. Class imbalance is shown in pie-chart format
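
A rough sketch (using pandas and scikit-learn on a toy dataset; illustrative, not the repository's exact implementation) of how these dataset statistics, feature importances, and class-imbalance figures can be derived:

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Toy dataset standing in for the uploaded test CSV
data = load_iris(as_frame=True)
df, target = data.frame.drop(columns="target"), data.target

stats = df.describe().T                       # mean, std, min, max, quartiles per column
stats["IQR"] = stats["75%"] - stats["25%"]    # interquartile range
stats["missing"] = df.isna().sum()            # number of missing values
print(stats)

# Feature importances from a fitted tree-based model
model = DecisionTreeClassifier(random_state=0).fit(df, target)
print(pd.Series(model.feature_importances_, index=df.columns))

# Class imbalance (shares that would feed a pie chart)
print(target.value_counts(normalize=True))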

Model Information

This section gives a tabular view of the parameters and attributes associated with the trained model.
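
For scikit-learn estimators, such values typically come from the model object itself; a small illustrative sketch (not the repository's exact code):

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='lbfgs', max_iter=10000)
print(model.get_params())   # hyper-parameters set on the estimator

# Fitted attributes (available after training), such as coef_, intercept_ and n_iter_,
# can be listed in the same way and shown alongside the parameters.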

Details

Each of the above tabs has a Details tab that gives general information about the evaluation, as well as some information about the dataset and the model used in it.

Add Custom Metrics

Here, the user can persist additional metrics calculated externally in the database. Clicking on the '+' button at the bottom right opens a dialog box with a form that handles the addition of key-value pairs of additional custom metrics. They are shown alongside the principal metrics in a table format.

Multiple Model Single Dataset Comparison

For both regression and classification, there are five types of components rendered:

  • Metrics
  • Dataset Information
  • Model information
  • Curves
  • Details (part of each tab panel)

Metrics

A semi-interactive table, along with both bar and line charts, is rendered in this tab. The metrics are the same as those in the Single Model Evaluation section above. The table can be sorted by metrics to compare the models.

Dataset information

Since the evaluations considered in this case must share the same dataset, the same component that was used for the single-evaluation case is reused here.

Model Information

Multiple tables listing out parameters and attributes of each model are rendered.

Curves

The plots mentioned in the 'Single Model Evaluation' section are rendered, with the traces of the other models in the same graph, with the exception of Gain and Lift charts in the case of binary classification.

Details

Every tab panel has it. It's the same as in a single evaluation, except that it now has tabs for all the evaluations selected by the user.

Single Model Multiple Datasets

For both regression and classification, there are five types of components rendered:

  • Metrics
  • Dataset Information
  • Model information
  • Curves
  • Details (part of each tab panel)

Metrics

A semi-interactive table, along with both bar and line charts, is rendered in this tab. The metrics are the same as those in the Single Model Evaluation section above. The table can be sorted by metrics to compare the datasets.

Dataset information

The dataset statistics for all datasets are shown tab-wise. The user can switch between statistics and compare the datasets based on them. The user can also switch between the tabular view and the line chart view. Along with this, the tab also contains information about the feature importances of the datasets in chart and tabular format, as well as the class imbalance.

Model Information

Since the model being used is the same, there is a single table listing all the parameters and attributes of the trained model.

Curves

The plots mentioned in the 'Single Model Evaluation' section are rendered, with the traces of the other datasets in the same graph or in subplots, with the exception of Gain and Lift charts in the case of binary classification.

Details

Every tab panel has it. It's the same as in a single evaluation, except that it now has tabs for all the evaluations selected by the user.

Demo

Homepage Features

Homepage.Features.mp4

Adding an Evaluation

Add.an.evaluation.mp4

Single Model Visualization - Linear Regression on the Boston Housing Dataset

Regression.single.evaluation.mp4

Multiple Models Single Dataset - on the Diabetes Dataset

multiple.models.single.dataset.comaprison.classification.mp4

Single Model Multiple Datasets - Linear Regression on the Boston Housing Dataset

Single.model.multiple.datasets.regression.mp4
