<div>
<center><img src="img/logistigate_logo2.png" alt="Drawing" width="700"></center>
</div>

## Overview of `logistigate`
Generally speaking, the `logistigate` methods infer aberration likelihoods at entities within a two-echelon supply chain, using only testing data from sample points taken from entities of the lower echelon. It is assumed that each product originates within the system at one entity of the upper echelon and is procured by one entity of the lower echelon. The likelihood of a lower-echelon entity obtaining product from each of the upper-echelon entities is stored in what is termed the "transition matrix" for that system. Testing of products at the lower echelon yields aberrational (recorded as "1") or acceptable ("0") results. We then divide the possible information-availability settings into two categories, *Tracked* and *Untracked*:
 - In the *Tracked* case, both the upper-echelon and lower-echelon entities traversed by the tested product are known upon testing.
 - In the *Untracked* case, only the lower-echelon entity is entirely known, in addition to the system's transition matrix.

It is further assumed that products are aberrational at their origin in the upper echelon with some entity-specific fixed probability, and that products acceptable at the upper echelon become aberrational at the destination in the lower echelon with some other entity-specific fixed probability. It is these fixed probabilities that the `logistigate` methods attempt to infer.

More specifically, the `logistigate` methods were developed with the intent of inferring sources of substandard or falsified products within a pharmaceutical supply chain. Entities of the upper echelon are referred to as *importers*, and entities of the lower echelon are referred to as *outlets*. This terminology is used interchangeably throughout the `logistigate` package.

## Installation
Before using `logistigate`, make sure it is installed in your Python environment. To do this, open a command prompt or terminal and run the following `pip` command:

`pip install git+https://github.com/eugenewickett/logistigate.git#egg=logistigate`

([See here](https://pip.pypa.io/en/stable/installing/) for help installing `pip` if you do not already have it.)

## Example
In this example, we will illustrate `logistigate`'s capabilities through a toy problem. In this problem, we have a supply system that consists of 3 importers (upper-echelon entities) and 12 outlets (lower-echelon entities). Outlets procure products from importers according to a *transition matrix*, which may or may not be known, and distribute these products to consumers. The transition matrix for this example is shown here, in percentages:

| . | Importer 1 | Importer 2 | Importer 3 |
| --- | --- | --- | --- |
|Outlet 1| 69 | 22 | 9 |
|Outlet 2| 13 | 32 | 55 |
|Outlet 3| 39 | 51 | 10 |
|Outlet 4| 1 | 92 | 7 |
|Outlet 5| 28 | 22 | 50 |
|Outlet 6| 84 | 4 | 12 |
|Outlet 7| 11 | 60 | 29 |
|Outlet 8| 43 | 25 | 32 |
|Outlet 9| 11 | 65 | 24 |
|Outlet 10| 60 | 5 | 35 |
|Outlet 11| 5 | 15 | 80 |
|Outlet 12| 40 | 15 | 45 |

For instance, a product procured from Outlet 6 has an $84\%$ chance of originating from Importer 1, a $4\%$ chance of originating from Importer 2, and a $12\%$ chance of originating from Importer 3.
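
As a quick sanity check, the table above can be written as an array and each row confirmed to sum to $100\%$. This is a minimal sketch using NumPy; the variable names `trans_pct` and `trans_mat` are illustrative and are not part of the `logistigate` API.

```python
import numpy as np

# Transition matrix from the table above, in percentages
# (rows: Outlets 1-12, columns: Importers 1-3).
trans_pct = np.array([
    [69, 22,  9],
    [13, 32, 55],
    [39, 51, 10],
    [ 1, 92,  7],
    [28, 22, 50],
    [84,  4, 12],
    [11, 60, 29],
    [43, 25, 32],
    [11, 65, 24],
    [60,  5, 35],
    [ 5, 15, 80],
    [40, 15, 45],
])

# Convert percentages to probabilities; every row should sum to 1,
# since each outlet's product must come from one of the importers.
trans_mat = trans_pct / 100.0
assert np.allclose(trans_mat.sum(axis=1), 1.0)

# Sourcing probabilities for Outlet 6 (row index 5).
print(trans_mat[5])
```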

In pharmaceutical post-market surveillance, regulators only test products at the consumer-facing end of the supply chain, i.e., from the outlets. We refer to the detection of a poor-quality product as an *aberration*. When testing only occurs at the outlet level, it is not apparent upon the detection of an aberration whether the aberration originated at the importer or at the outlet. **The crux of `logistigate` is to attempt to infer the aberration levels at entities at both levels of the supply chain, only using testing information from the lower echelon and some degree of supply chain information.**

For our example, let us assume that importers and outlets generate aberrational products as listed in the following tables (again, in percentages):

| Importer 1 | Importer 2 | Importer 3 |
| --- | --- | --- |
| 40 | 30 | 20 |



| Outlet 1 | Outlet 2 | Outlet 3 | Outlet 4 | Outlet 5 | Outlet 6 | Outlet 7 | Outlet 8 | Outlet 9 | Outlet 10 | Outlet 11 | Outlet 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 71 | 3 | 10 | 38 | 28 | 62 | 50 | 9 | 20 | 13 | 80 | 6 |



For example, a product procured from Outlet 3 that originated from Importer 2 has a $0.3 + (1-0.3) \times 0.1 = 0.37$ probability of being aberrational: it is either aberrational on leaving the importer or, if acceptable there, becomes aberrational at the outlet. The goal of `logistigate` is to use binary (pass/fail) testing results from the outlets to estimate these different aberration rates at both the importer and outlet levels.
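
The path calculation above can be written as a small helper. This is an illustrative sketch only; `path_aberration_prob` is a hypothetical name, not a `logistigate` function.

```python
def path_aberration_prob(importer_rate, outlet_rate):
    """Probability that a product on one importer-outlet path is aberrational.

    The product is aberrational at the importer with probability
    importer_rate; otherwise it becomes aberrational at the outlet
    with probability outlet_rate.
    """
    return importer_rate + (1.0 - importer_rate) * outlet_rate


# Importer 2 (30%) supplying Outlet 3 (10%), as in the text above.
p = path_aberration_prob(0.30, 0.10)
print(round(p, 2))  # 0.37
```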

In the event that the importer is known upon the procurement of product to test, we refer to this setting as the *Tracked* framework. The Tracked setting implies that the importer-outlet path is known for all tested products. In the event that the importer for a tested product is not precisely known, and only the transition matrix is known, we refer to this setting as the *Untracked* framework. Note that knowledge of the transition matrix is not necessary within the Tracked setting, but is required for the Untracked setting.
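
To illustrate the difference between the two settings, the sketch below computes the aberration probability for a product tested at Outlet 6. In the Tracked setting the importer is known, so one path probability applies; in the Untracked setting the importer is unknown, so the path probabilities are averaged using the Outlet 6 row of the transition matrix. The variable names are illustrative assumptions, and the calculation ignores diagnostic error.

```python
import numpy as np

importer_rates = np.array([0.40, 0.30, 0.20])    # Importers 1-3
outlet6_rate = 0.62                              # Outlet 6
outlet6_sourcing = np.array([0.84, 0.04, 0.12])  # transition-matrix row for Outlet 6

# Tracked view: aberration probability along each importer -> Outlet 6 path.
path_probs = importer_rates + (1.0 - importer_rates) * outlet6_rate

# Untracked view: the importer is unknown, so weight each path
# by the probability that Outlet 6 sourced from that importer.
untracked_prob = float(outlet6_sourcing @ path_probs)

print(path_probs)
print(untracked_prob)
```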



### Tracked framework
Let us first consider how we might use `logistigate` in the Tracked setting. As described above, Tracked data should consist of three different features: outlet, importer, and test result. The following displays the first ten rows of `example1TestData.csv` included in the `logistigate` package:

| Outlet Name | Importer Name | Test Result |
| --- | --- | --- |
| Outlet_03 | Importer_1 | 1 |
| Outlet_09 | Importer_2 | 0 |
| Outlet_12 | Importer_3 | 1 |
| Outlet_01 | Importer_3 | 1 |
| Outlet_05 | Importer_3 | 0 |
| Outlet_09 | Importer_2 | 1 |
| Outlet_01 | Importer_1 | 1 |
| Outlet_03 | Importer_1 | 0 |
| Outlet_09 | Importer_1 | 0 |
| Outlet_08 | Importer_1 | 1 |


In total, `example1TestData.csv` contains 4,000 test rows, generated for this example in the following manner:
 - Randomly (and uniformly) choose one of the 12 outlets; this is our "current" outlet.
 - Randomly choose one of the 3 importers, weighted by the transition matrix at the row of the current outlet; this is our "current" importer.
 - Randomly generate a test result as a function of the current outlet aberration rate, the current importer aberration rate, and the specifications (sensitivity and specificity) of the diagnostic tool.
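
The three steps above can be sketched as follows. This is not the script used to create `example1TestData.csv`; it is a minimal reconstruction under stated assumptions. In particular, the way diagnostic error enters (a truly aberrational product tests positive with probability equal to the sensitivity, and an acceptable product tests positive with probability one minus the specificity) is an assumption, as are the function name `simulate_tests`, the seed, and the use of the $90\%$ sensitivity and $99\%$ specificity reported for this example's diagnostic device.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

importer_rates = np.array([0.40, 0.30, 0.20])
outlet_rates = np.array([0.71, 0.03, 0.10, 0.38, 0.28, 0.62,
                         0.50, 0.09, 0.20, 0.13, 0.80, 0.06])
trans_mat = np.array([
    [69, 22, 9], [13, 32, 55], [39, 51, 10], [1, 92, 7],
    [28, 22, 50], [84, 4, 12], [11, 60, 29], [43, 25, 32],
    [11, 65, 24], [60, 5, 35], [5, 15, 80], [40, 15, 45],
]) / 100.0
sens, spec = 0.90, 0.99  # assumed diagnostic sensitivity and specificity


def simulate_tests(num_tests):
    """Generate Tracked-style rows: [outlet name, importer name, result]."""
    rows = []
    for _ in range(num_tests):
        # Step 1: choose the current outlet uniformly at random.
        outlet = rng.integers(len(outlet_rates))
        # Step 2: choose the current importer, weighted by the outlet's row.
        importer = rng.choice(len(importer_rates), p=trans_mat[outlet])
        # Step 3: true aberration probability along this path...
        imp_rate = importer_rates[importer]
        true_prob = imp_rate + (1 - imp_rate) * outlet_rates[outlet]
        # ...adjusted for diagnostic error (assumed form, see lead-in).
        pos_prob = sens * true_prob + (1 - spec) * (1 - true_prob)
        result = int(rng.random() < pos_prob)
        rows.append([f"Outlet_{outlet + 1:02d}", f"Importer_{importer + 1}", result])
    return rows


sample_rows = simulate_tests(4000)
print(sample_rows[:3])
```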

Given these testing data, `logistigate` will provide estimates of the aberration rates at each of the outlets and importers. 

First, import necessary modules from `logistigate`, as well as the `pkg_resources` module which will allow us to use the CSV file that comes with `logistigate`.

In [1]:
from logistigate import utilities as util
from logistigate import methods, lg
import pkg_resources


Next, we pass the CSV file path to the `TestResultsFileToTable()` function, which returns a data dictionary that can be used with other `logistigate` methods.

In [4]:
csv1Path = pkg_resources.resource_filename('logistigate','data/example1TestData.csv')
dataTblDict = util.TestResultsFileToTable(csv1Path)

Let us look at the keys of this `dataTblDict` dictionary:

In [5]:
print(dataTblDict.keys())

dict_keys(['type', 'transMat', 'dataTbl', 'outletNames', 'importerNames', 'N', 'Y'])


`'type'` is either Tracked (three input data columns) or Untracked (two input data columns: outlet name and test result). If providing Untracked data, then `'transMat'` is also required as an input to `TestResultsFileToTable()`; for Tracked data, `'transMat'` is an empty list. `'dataTbl'` is a Python-friendly list form of the testing data CSV. `'outletNames'` and `'importerNames'` are sorted lists of the entered entity names. `'N'` and `'Y'` are matrix (Tracked) or vector (Untracked) summaries of the testing results in `'dataTbl'`, and constitute the principal objects of analysis within the `logistigate` methods. `'N'` contains the total number of tests conducted at each outlet-importer path (Tracked) or outlet (Untracked), while `'Y'` contains the number of positive tests at each outlet-importer path or outlet.
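
To make the `'N'` and `'Y'` summaries concrete, the sketch below tallies them by hand from the ten Tracked rows shown earlier. It does not call `logistigate`; the orientation (outlets as rows, importers as columns) and the variable names are assumptions made for illustration.

```python
import numpy as np

# First ten rows of the example data: [outlet, importer, test result].
data_rows = [
    ["Outlet_03", "Importer_1", 1],
    ["Outlet_09", "Importer_2", 0],
    ["Outlet_12", "Importer_3", 1],
    ["Outlet_01", "Importer_3", 1],
    ["Outlet_05", "Importer_3", 0],
    ["Outlet_09", "Importer_2", 1],
    ["Outlet_01", "Importer_1", 1],
    ["Outlet_03", "Importer_1", 0],
    ["Outlet_09", "Importer_1", 0],
    ["Outlet_08", "Importer_1", 1],
]

# Sorted entity names, mirroring 'outletNames' and 'importerNames'.
outlet_names = sorted({row[0] for row in data_rows})
importer_names = sorted({row[1] for row in data_rows})

# N[i, j]: tests on the path (outlet i, importer j); Y[i, j]: positives.
N = np.zeros((len(outlet_names), len(importer_names)), dtype=int)
Y = np.zeros_like(N)
for outlet, importer, result in data_rows:
    i = outlet_names.index(outlet)
    j = importer_names.index(importer)
    N[i, j] += 1
    Y[i, j] += result

print(outlet_names)
print(N)
print(Y)
```

In the Untracked case the importer column is absent, so the same tally would collapse to one count per outlet.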

Once the testing data are in a format (`dataTblDict`) suitable for `logistigate`, we need to supply a few more pieces of information for the estimation procedures.

In [6]:
dataTblDict.update({'diagSens':0.90, 'diagSpec':0.99, 'numPostSamples':500, 'prior':methods.prior_normal()})

`'diagSens'` and `'diagSpec'` signify the sensitivity and specificity, respectively, of the diagnostic tool used to collect the testing data. For our example, these data were generated using a "device" with $90\%$ sensitivity and $99\%$ specificity.

`'numPostSamples'` refers to the desired number of samples to draw from the posterior distribution. These samples are constructed using the testing data provided (as well as a prior "belief" regarding the underlying aberration rates). Each particular sample provides a possible underlying set of aberration rates that might have generated the observed testing data. Taken as a set, these samples provide a picture of likely aberration rate scenarios for each outlet and importer, under the data provided. The number of samples to generate is up to the user, but more samples will require longer computing time.
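
As an illustration of how a set of posterior samples can be read, the sketch below summarizes a samples array with percentile intervals, one interval per entity. The array here holds placeholder draws from a Beta distribution and is not `logistigate` output; the assumed layout (one row per sample, one column per entity) and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
num_post_samples = 500          # matches 'numPostSamples' in this example
num_entities = 15               # 3 importers + 12 outlets

# Placeholder draws standing in for posterior samples (NOT real results):
# each row is one possible set of aberration rates for all entities.
samples = rng.beta(2.0, 5.0, size=(num_post_samples, num_entities))

# Per-entity 5th, 50th, and 95th percentiles across the samples.
lower, median, upper = np.percentile(samples, [5, 50, 95], axis=0)

for k in range(3):
    print(f"entity {k}: 90% interval ({lower[k]:.2f}, {upper[k]:.2f}), median {median[k]:.2f}")
```

A larger number of samples gives smoother interval estimates at the cost of longer computing time.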

`'prior'` signifies the prior distribution used in forming the posterior distribution. It should capture the user's beliefs, held before collecting data, about the aberration rates of the different entities. `logistigate` includes Normal and Laplace prior distributions; see `methods` for more details on setting prior distributions.