How to configure a SI Assessment Model

martimanzano edited this page Oct 28, 2019 · 1 revision

Introduction

The following figure provides an overview of the steps to build a BN assessment model for a SI:

(Figure: Structure)

There is a preliminary step consisting of choosing which factors will impact the SI and hence be part of the assessment model. After that, collection of these factors should start beforehand, since enough historical data and knowledge about them is needed for the model's creation. Waiting one month before continuing to the first step is recommended. Each step is summarized as follows:

  1. Model's structure. In this first step, the assessment model's structure is defined according to the cause-effect relations between the variables of the model (the factors and the SI). The discrete categories for each node of the model, and the associated binning intervals in the case of the factor nodes (which are the input nodes of the BN model), should also be defined in this step. Lastly, in this step the probabilities of the parent factor nodes are quantified, applying frequency quantification to the historical data gathered for the factors associated with these nodes, using the defined categories and binning intervals.
  2. Quantifying impact of factors on the SI. In the second step, probabilities are quantified according to the impact the factors have on the SI. This is achieved in a semi-automatic fashion, using developed tools to automatically gather and specify part of the SI's associated Node Probability Table (NPT), and the Weighted Sum Algorithm (WSA) to fill in the complete NPT.
  3. Generation of the SI estimation model. Taking as input the information collected in the two previous steps, the BN assessment model is generated and becomes usable.
  4. Validation of the SI estimation model. This last step comprises the assessment model's validation, to assess its accuracy and recalibrate it if required. Two kinds of validation are considered: Model Walkthrough aims to validate the model using hypothetical scenarios and the users' perception, while Outcome Adequacy uses past assessments of the SI (for instance, ones done manually) to compare them with the model's outcome.

Model's Structure

This step has the objective of gathering the structure of the SI estimation model, the categories for each variable, and the corresponding numeric ranges for each category.

  1. Definition of ordinal categories or labels for the factors of the SI: the resulting model will deal with categories for every variable; therefore, it is necessary to elicit meaningful categories with a sufficient level of granularity, corresponding to the desired discretization function for the factors. For instance:
  • Low, Medium, High
  • Very Low, Low, Medium, High, Very High
  2. Translation of numbers to categories or labels. As factors are measured in the [0, 1] range, it is necessary to define the "translation" formula from numerical values to the categories/labels elicited in step 1. This function can be as simple as a range for each category. For example, the "Code Quality" factor could have the following categories:
  • Low [0, 0.3), Medium [0.3, 0.7) and High [0.7, 1].
  • Very Low [0, 0.3), Low [0.3, 0.5), Medium [0.5, 0.7), High [0.7, 0.9) and Very High [0.9, 1].
  3. Assignment of probabilities to each category of the parent factors. These probabilities can be computed using the historical data collected up to the moment this step is performed. The historical data considered have to belong to the product/project for which the new SI is being created. This process can be carried out with the SI assessment Java project.

NOTE: In order to prepare the Outcome Adequacy validation step later on, it is recommended to quantify the probabilities using approximately 70% of the data, reserving the remaining 30% for the validation. This 30% has to belong to the data with manually assessed states of the SI.

  4. Definition of the categorical states of the SI. Considering that SIs are usually qualitative, no numeric ranges are needed, only ordinal or nominal labels. For example, the "Product Quality" SI could have 3 categories: Low, Medium and High, or Bad, Neutral and Good.
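The binning and frequency quantification described above can be sketched in Python. This is a minimal illustration only (the actual process is performed with the SI assessment Java project); the bin edges follow the "Code Quality" example, while the historical data and the 70/30 split point are invented:

```python
# Illustrative sketch: translate numeric factor values in [0, 1] into
# ordinal categories and quantify parent-node probabilities by frequency.
# Bin edges mirror the example above; the sample data are made up.

BINS = {"Low": (0.0, 0.3), "Medium": (0.3, 0.7), "High": (0.7, 1.0)}

def to_category(value, bins=BINS):
    """Map a numeric value in [0, 1] to its ordinal category."""
    for label, (lo, hi) in bins.items():
        # Upper bound is exclusive except for the last interval,
        # which closes at 1.0: Low [0, 0.3), Medium [0.3, 0.7), High [0.7, 1].
        if lo <= value < hi or (hi == 1.0 and value == 1.0):
            return label
    raise ValueError(f"value {value} outside [0, 1]")

def frequency_probabilities(values, bins=BINS):
    """Frequency quantification: P(category) = count / total."""
    counts = {label: 0 for label in bins}
    for v in values:
        counts[to_category(v, bins)] += 1
    total = len(values)
    return {label: counts[label] / total for label in bins}

# Invented historical measurements of one factor, e.g. "Code Quality".
history = [0.12, 0.25, 0.41, 0.55, 0.62, 0.68, 0.74, 0.81, 0.9, 1.0]

# Reserve ~30% of the data for the Outcome Adequacy validation.
split = int(len(history) * 0.7)
training, held_out = history[:split], history[split:]

priors = frequency_probabilities(training)
```

The resulting `priors` dictionary corresponds to the probabilities that would be entered for one parent factor node of the BN.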

Quantifying impact of factors on the SI

The objective of this step is to get the required input to train the estimation model. To do so, several scenarios are configured and rated in order to define probabilities for the indicator.

The execution of this step involves the following activities:

  1. Define the combination of scenarios and their corresponding probabilities for the SI. The scenarios can be automatically gathered using the SI assessment tools.
  2. Define weights for each input factor. These weights have to be assigned according to the relative impact of each factor on the SI, and they must add up to 1. The weights can be elicited informally or with the AHP technique (AHP Online Calculator). In the latter case, compare the importance of the factors associated with the new SI in pairs, on a scale from 1 to 9, and the weights will be computed automatically.

For instance, in the following figure, associated with the SI On-Time Delivery (with the categories Bad, Neutral and Good) and the factors Issues' Estimation Accuracy, Issues' Development Status, Issues' Due Date Accuracy and Blocking (all of them with categories Low, Medium and High), the DEs would have to fill in the factors' categories marked as x and the probabilities marked as y.

(Figure: scenarios)
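As an illustration of how AHP turns pairwise comparisons into normalized weights, the sketch below uses the geometric-mean approximation (the AHP Online Calculator computes the principal eigenvector, which gives close results for consistent matrices). The comparison values below are invented:

```python
# Illustrative sketch: derive factor weights from an AHP pairwise
# comparison matrix via the geometric-mean approximation.
# The factor names and 1-9 comparison values are made up.

import math

# pairwise[i][j]: how much more important factor i is than factor j,
# on the 1-9 AHP scale; pairwise[j][i] is its reciprocal.
factors = ["Estimation Accuracy", "Development Status",
           "Due Date Accuracy", "Blocking"]
pairwise = [
    [1,     3,     5,     7],
    [1 / 3, 1,     3,     5],
    [1 / 5, 1 / 3, 1,     3],
    [1 / 7, 1 / 5, 1 / 3, 1],
]

# Geometric mean of each row, then normalize so the weights sum to 1.
geo_means = [math.prod(row) ** (1 / len(row)) for row in pairwise]
total = sum(geo_means)
weights = [g / total for g in geo_means]
```

The normalized `weights` list is what would be supplied to the WSA in the next step; it always sums to 1 by construction.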

Generation of the SI estimation model

The objective of this step is to train the SI estimation model using the information gathered in the previous stages. To do so, perform the following activities:

  1. Build the Bayesian Network with the Netica GUI. Add one node per factor and one node for the SI. Factor nodes' names need to match the corresponding "factor" term in the Elasticsearch documents. To enter the name and the categories, right-click on the node and edit it from "Properties". To enter the CPT, right-click on the node and then click on "Table". For nodes on which the WSA will be executed, enter only the partial CPT.
  2. Run the WSA (Weighted Sum Algorithm) to automatically fill in the unfilled rows of the CPT. Run the Java project qrapids-wsa, which can be found on the q-rapids GitHub. See the README file for instructions.
  3. Check the resulting file using the Netica GUI. The full CPT should be present for the nodes on which the WSA has been run.
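The weighted-sum idea used to complete the CPT can be sketched as follows. This is a simplified illustration of the technique, not the qrapids-wsa implementation; the parent names, weights and elicited distributions are invented:

```python
# Illustrative sketch of the weighted-sum idea: each full CPT row is
# blended from the elicited per-parent distributions, weighted by the
# factors' weights. All names and numbers below are made up.

import itertools

CHILD_STATES = ["Bad", "Neutral", "Good"]
PARENT_STATES = ["Low", "Medium", "High"]

# Elicited partial CPT: for each parent, P(SI | that parent's state).
elicited = {
    "Blocking":   {"Low": [0.7, 0.2, 0.1],
                   "Medium": [0.2, 0.6, 0.2],
                   "High": [0.1, 0.2, 0.7]},
    "Dev Status": {"Low": [0.8, 0.15, 0.05],
                   "Medium": [0.3, 0.5, 0.2],
                   "High": [0.05, 0.25, 0.7]},
}
weights = {"Blocking": 0.4, "Dev Status": 0.6}  # must sum to 1

def wsa_row(combo):
    """CPT row for one combination of parent states, as a weighted sum
    of the elicited single-parent distributions."""
    row = [0.0] * len(CHILD_STATES)
    for parent, state in combo.items():
        for k, p in enumerate(elicited[parent][state]):
            row[k] += weights[parent] * p
    return row

# Build the full CPT over all combinations of parent states.
cpt = {
    combo: wsa_row(dict(zip(elicited, combo)))
    for combo in itertools.product(PARENT_STATES, repeat=len(elicited))
}
```

Because the weights and each elicited distribution sum to 1, every generated row is itself a valid probability distribution.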

The next figure shows an example of a generated model for the Product Quality Strategic Indicator.

(Figure: netica)

Validation of the model

The objective of this step is to validate the model built in the previous step.

There are two types of validation processes considered:

  • Model walkthrough to assess the experts' perception on the adequacy of the resulting output probabilities from the model.
  • Outcome adequacy to assess if the resulting estimation model provides similar values to those obtained from real past data. It is important to remark that, in order to perform this validation, it is required to have available past data on combinations of factors' values and states of the SI that were not used for the model's creation (e.g., 30% of the total data).

Both validation parts can be performed using the graphical interface of Netica software, in order to ease the gathering of scenarios.

Model walkthrough Validation

To perform the model walkthrough validation:

  1. Prepare ~10 hypothetical scenarios, each one composed of a combination of factors' states, together with the most probable resulting state for the SI.
  2. Input the scenarios one by one and compare the model's output label with the highest probability against the experts' perception. If recalibration of a specific row of the CPT is needed, the row has to be refined before proceeding to the next scenario. If the recalibration involves tuning the parents' probabilities, then the already assessed scenarios of this validation need to be validated again. For example, consider a scenario consisting of states for three factors: two in Medium and one in High. This scenario would be entered into the model to validate its output with the experts. The output probabilities for the new SI could be Bad = 65, Neutral = 35 and Good = 0, but the experts could perceive this situation as Good for the SI. The CPT of the model would then need a refinement to reflect the experts' belief, increasing the probability of the Good category for the SI. The validation process finishes when the experts agree on the model's outputs for the selected validation scenarios.
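The walkthrough loop can be sketched as follows. Here `model_output` is a hypothetical stand-in for querying the BN in Netica, and the scenario data reproduce the example above with invented factor names:

```python
# Illustrative sketch of the walkthrough loop: compare the model's
# most probable SI state with the experts' expectation per scenario,
# flagging CPT rows that need refinement. model_output and the
# scenario data are stand-ins, not real Netica queries.

def most_probable(distribution):
    """Return the SI state with the highest probability."""
    return max(distribution, key=distribution.get)

def walkthrough(scenarios, model_output):
    """Yield the scenarios whose model output disagrees with the experts."""
    for factor_states, expected in scenarios:
        output = model_output(factor_states)
        if most_probable(output) != expected:
            # This CPT row would need refinement before continuing.
            yield factor_states, expected, output

# Hypothetical stand-in for the BN query (invented probabilities).
def model_output(states):
    return {"Bad": 0.65, "Neutral": 0.35, "Good": 0.0}

# One scenario: two factors in Medium, one in High; experts expect Good.
scenarios = [
    ((("F1", "Medium"), ("F2", "Medium"), ("F3", "High")), "Good"),
]
disagreements = list(walkthrough(scenarios, model_output))
```

In this toy run the single scenario is flagged, mirroring the example in which the experts perceive Good while the model outputs Bad with the highest probability.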

Outcome Adequacy Validation

To validate the SI estimation model with real data from a past period, the following steps need to be performed:

  1. Compile the values of the factors for the 30% of the data excluded when quantifying probabilities, together with their corresponding categories, into 8-10 scenarios to validate the model, using the ranges of the factors elicited in the first step. For each combination of factors, compile the resulting past assessment for the SI.
  2. Input the scenarios to test one by one and compare the model's output with the past-assessed SI. If recalibration of a specific row of the CPT is needed, refine it and proceed to the next scenario. If the recalibration involves tuning the parents' probabilities, then the already assessed scenarios of this validation need to be validated again.
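The outcome adequacy comparison can be sketched as follows. Here `model_output` is a hypothetical stand-in for the trained BN, and the held-out scenarios and distributions are invented:

```python
# Illustrative sketch of Outcome Adequacy: compare the model's most
# probable SI state with the manually assessed past state for each
# held-out scenario and report the agreement rate. All data invented.

def agreement_rate(held_out, model_output):
    """Fraction of held-out scenarios where the model's most probable
    state matches the past manual assessment."""
    hits = 0
    for factor_states, past_state in held_out:
        output = model_output(factor_states)
        predicted = max(output, key=output.get)
        hits += predicted == past_state
    return hits / len(held_out)

# Hypothetical stand-in for querying the trained BN.
def model_output(states):
    table = {
        ("Low", "Low"):   {"Bad": 0.8, "Neutral": 0.15, "Good": 0.05},
        ("High", "High"): {"Bad": 0.05, "Neutral": 0.2, "Good": 0.75},
    }
    return table[states]

# Held-out scenarios: (factor categories, past manual SI assessment).
held_out = [
    (("Low", "Low"), "Bad"),
    (("High", "High"), "Good"),
]
rate = agreement_rate(held_out, model_output)
```

A low agreement rate would indicate that specific CPT rows, or the parents' probabilities, need the recalibration described in step 2.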