Name: Sachin Mathur <br>
Team Members: Jeff Winchell, Charlie Flanagan

# Fruit Inspection Challenge

## CSCI E82A

>**Make sure** you include your name along with the name of your team and team members in the notebook you submit. 

## Introduction

In a previous homework assignment you computed the utility of various approaches to fruit inspection using two unreliable sensors along with human inspection. This challenge exercise differs in from the homework assignment in the following ways:
1. Most importantly, this is not a guided lab, but rather you are free to apply the methods of your choice
2. The parameters for the CPDs must be estimated from data samples.
3. There are a larger number of CPDs.
4. You will perform a query on your graphical model.  

###  Background

Bob's Orchards is a premium seller of apples and pears. Bob's customers pay a substantial premium for superior fruit. To satisfy these customers, Bob's must ensure that the fruit delivered is correctly packed and perfectly ripe. 

Like many legacy industries requiring specialized human skills, Bob's is facing a talent problem. Many of the human inspectors who expertly check each piece of fruit shipped for ripeness are retiring. Management's attempts to recruit and train younger people to apprentice as fruit inspectors have been mixed. In fact, not only is it difficult to recruit people willing to train as inspectors but, it is believed that the newly trained inspectors are prone to errors. Therefore, it has become imperative to find some type of automated system which can reduce the workload on the diminishing number of human inspectors. To address this problem, Bob's has deployed technology from Robots R Us.

The first robotic system to be deployed at Bob's uses a multi-sensor array to determine if the fruit being shipped is at the correct ripeness. There are two sensors, a vision system that examines the fruit for spots or damage, indicating the fruit is over ripe, and a smell sensor that determines if the fruit is not ripe enough. If either sensor indicates the fruit is bad it is sent to a human inspector. In addition customers may reject even perfect fruit for no apparent reason, whereas others seem perfectly happy with less than perfect fruit.   



## Scenario 

In order to better understand the fruit inspection process and customer acceptance of the fruit, Bob's management has authorized the shipment of 1,000 randomly selected orders. All available inspection methods will be applied to each order. Further, a team of the most experienced inspectors will provide an absolute baseline on order quality. The orders will be shipped to customers regardless of the outcome of the inspections. 

Shipping orders regardless of inspection outcome is a significant departure from long-held beliefs and traditions at Bobs. However, the data collected provide a powerful source of information for improving Bob's overall customer satisfaction, which is highly valued by Bob's management.   

Your goal, as the consulting team, is to determine which inspection methods and any other possible process improvement Bob's should apply to maximize customer satisfaction as measured by utility. You will use the data collected from the 1,000 orders to 

### Data description 

For the 1,000 orders in the test sample a number of attributes have been collected. These data are in the `fruit_data.csv` file. The columns in the data set are:
1.  **weather:** indicates the weather conditions the day before the fruit is harvested; 0 = wet, 1 = dry. Prior information indicates that the statistics of weather are constant over the harvest period. 
2. **week:** indicates the week the fruit is harvested; 0 = week 1, 1 = week2. There is a two week harvest season for Bob's orchard where the fruit comes from. 
3. **good_bad** is the quality assigned to the fruit shipment by an independent inspection team of highly experienced inspectors. At least three inspectors has agreed on the fruit quality and these indicators are believed to have absolute accuracy. 
4. **smell_sensor:** are the indicators emitted by the smell fruit inspection sensor; 0 = bad, 1 = good.
5. **visual_sensor:** are the indicators emitted by the visual fruit inspection sensor; 0 = bad, 1 = good.
6. **inspector:** are the indicators determined by the single entry-level fruit inspectors; 0 = bad, 1 = good.
7. **accepted:** indicates if the customer accepted the order as received, or complained and requested an adjustment; 0 = not accepted, 1 = accepted. 

### Bayesian Graph Representation

A directed acyclic graph (DAG) representing the fruit quality process is shown in the diagram below.  

<img src="FruitQualityGraph.JPG" alt="Drawing" style="width:800px; height:450px"/>
<center> **DAG fruit quality process**    
Decision nodes are not shown for simplicity</center>

The representation shown in the diagram illustrates the CDPs in a DAG. There are a number of utility nodes shown. Notice, that the multiple decision nodes are not shown.   

There are two utility functions in this problem. The **utility of a human inspection** is -10.0. And the **utility of the satisfied and unsatisfied customers** is:

|  | Satisfied | Not Satisfied |
|----|----|----|
|Utility | 20 | -40 |

Notice that the DAG shows causality between the CDPs. **Consider how this causality is important in the representation of this problem**. 

## Approach
Use the raw data to compute CPDs. This can be calculated by using the raw data given in the table
Data given:
1. Week
2. Weather
3. Smell
4. Visual
5. Customer Acceptance
6. Inspector Accuracy (a single inspector)
7. Good_bad --- The quality derived by the 3 inspectors

U2: Utility when there is no inspection given good/bad fruit
U1: Cost associated with fruit that was not detected as either not-Week or not-Weather or both


In [1]:
import numpy as np
import numpy.random as nr
import pandas as pd

## Read Data

In [2]:
data = pd.read_csv("fruit_data.csv")
data.head()

Unnamed: 0,weather,week,good_bad,smell_sensor,visual_sensor,inspector,accepted
0,1,1,1,1,1,1,1
1,1,0,1,1,0,1,1
2,1,0,1,1,0,1,1
3,0,1,0,1,0,0,0
4,1,0,1,1,1,1,1


In [3]:
# calculation of fruit quality
data['fruit_quality'] = np.where(((data['weather'] == 1) & (data['week'] == 1 ) & (data['good_bad'] == 1 )), 1, 0)

# calculation of sensor_inspect node
data['sensor_inspect'] = np.where(((data['smell_sensor'] == 1) & (data['visual_sensor'] == 1 ) & (data['fruit_quality'] == 1 )), 1, 0)

# calculation of inspector accuracy
data['inspector_accuracy'] = np.where(((data['inspector'] == data['good_bad'] )), 1, 0)

# calculation of manual inspection
data['manual_inspection'] = np.where(((data['fruit_quality'] == 0 )), 1, 0)

# calculation of manual inspection acceptance
data['manual_inspection_accept'] = np.where(((data['manual_inspection'] == 1 )), 1, 0)

# calculation of no inspection acceptance
data['no_inspection_accept'] = np.where(((data['fruit_quality'] == 1) & (data['accepted'] == 1 )), 1, 0)

# calculation of sensor impact acceptance
data['sensor_impact_accept'] = np.where(((data['sensor_inspect'] == 1) & (data['accepted'] == 1 )), 1, 0)

# calculation of sensor manual inspection
data['sensor_manual_inspect'] = np.where(((data['sensor_inspect'] == 0) & (data['inspector_accuracy'] == 0 )), 1, 0)

# calculation of acceptance after manual inspection
data['sensor_manual_inspect_accept'] = np.where(((data['sensor_manual_inspect'] == 1) & (data['accepted'] == 1 )), 1, 0)


In [4]:
#Fruit Quality
print(np.where(((data['weather'] == 0) & (data['week'] == 0 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['weather'] == 0) & (data['week'] == 1 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['weather'] == 1) & (data['week'] == 0 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['weather'] == 1) & (data['week'] == 1 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['weather'] == 0) & (data['week'] == 0 ) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['weather'] == 0) & (data['week'] == 1 ) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['weather'] == 1) & (data['week'] == 0 ) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['weather'] == 1) & (data['week'] == 1 ) & (data['good_bad'] == 1 )), 1, 0).sum())

#acccept_good_bad_matrix
print("acccept_good_bad_matrix")
print(np.where(((data['accepted'] == 0) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['accepted'] == 0) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['accepted'] == 1) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['accepted'] == 1) & (data['good_bad'] == 1 )), 1, 0).sum())

40
48
47
27
123
126
311
278
acccept_good_bad_matrix
142
37
20
801


In [14]:
U2 = [20, -40]
CustomerAccep_given_GoodQuality = 801/(801+37)
CustomerAccep_given_BadQuality = 20/(20+142)
CustomerNonAccep_given_GoodQuality = 37/(801+37)
CustomerNonAccep_given_BadQuality = 142/(20+142)
bad_quality = 0.162
good_quality = 0.838
quality = np.array([bad_quality, good_quality])

Customer_quality = np.array([[CustomerAccep_given_GoodQuality, CustomerAccep_given_BadQuality],
                            [CustomerNonAccep_given_GoodQuality, CustomerNonAccep_given_BadQuality]])
col_names = ['Accept', 'Not-Accept']
row_names = ['Good', 'Bad']
print(pd.DataFrame(Customer_quality, columns = col_names, index = row_names))
#print(np.transpose(quality))

customerAccep_GoodQuality = 20 * (CustomerAccep_given_GoodQuality * good_quality + \
                                  CustomerAccep_given_BadQuality * bad_quality)

customerAccep_BadQuality = -40 * (CustomerNonAccep_given_GoodQuality * good_quality + \
                                  CustomerNonAccep_given_BadQuality * bad_quality)

print(customerAccep_GoodQuality)
print(customerAccep_BadQuality)
u2_utility_cost = customerAccep_GoodQuality + customerAccep_BadQuality
print("U2_utility_Cost = ",u2_utility_cost)


        Accept  Not-Accept
Good  0.955847    0.123457
Bad   0.044153    0.876543
16.42
-7.160000000000001
U2_utility_Cost 9.260000000000002


In [20]:
#Calculation of utility-1
#Using only the marginals where week and weather do not agree --- only that fruit is sent for inspection
Bad_notagree_or_both0 = (40+48+47)/695
print(Bad_notagree_or_both0)
Good_notagree_or_both0 = (123+126+311)/695
print(Good_notagree_or_both0)
Bad_agree = 27/(27+278)
Good_agree = 278/(27+278)

#Cost for manual inspection = -10
#print(np.where((data['inspector'] == 0),1,0).sum())
#print(np.where((data['inspector'] == 1),1,0).sum())
inspector_bad = np.where((data['inspector'] == 0),1,0).sum()/1000
inspector_good = np.where((data['inspector'] == 1),1,0).sum()/1000

inspector = [inspector_bad, inspector_good]
weather_week_agree = np.array([[Bad_notagree_or_both0, Bad_agree],
                               [Good_notagree_or_both0, Good_agree]])
print(weather_week_agree)

inspector_weather_week = np.transpose(np.multiply(np.transpose(weather_week_agree),inspector))
print(inspector_weather_week)
inspector_weather_week_marginal = np.sum(inspector_weather_week, axis = 0)

#Compute U1 cost ---- only the fruit that is in disagreement is sent for inspection
u1_cost = -10 * (0.1942446 + 0.08852459)
print("Inspection cost - U1 Cost = ",u1_cost)

0.19424460431654678
0.8057553956834532
[[0.1942446  0.08852459]
 [0.8057554  0.91147541]]
[[0.03438129 0.01566885]
 [0.66313669 0.75014426]]
Inspection cost - U1 Cost =  -2.8276919


In [15]:
#Compute U3
print(Customer_quality)
print(inspector_weather_week_marginal)

customerAccep_GoodQuality_givenInspectorFinding = 20 * (CustomerAccep_given_GoodQuality * 0.89276297 + \
                                  CustomerAccep_given_BadQuality * 0.10723703)

customerAccep_BadQuality_givenInspectorFinding = -40 * (CustomerNonAccep_given_GoodQuality * 0.89276297 + \
                                  CustomerNonAccep_given_BadQuality * 0.10723703)
print(customerAccep_GoodQuality_givenInspectorFinding)
print(customerAccep_BadQuality_givenInspectorFinding)
print("U3_Cost = ", customerAccep_GoodQuality_givenInspectorFinding + customerAccep_BadQuality_givenInspectorFinding)

[[0.95584726 0.12345679]
 [0.04415274 0.87654321]]
[0.10723703 0.89276297]
17.331683481531574
-5.336633036936858
U3_Cost 11.995050444594716


In [17]:
#Sensor Inspect
print(np.where(((data['smell_sensor'] == 0) & (data['visual_sensor'] == 0 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 0) & (data['visual_sensor'] == 1 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 1) & (data['visual_sensor'] == 0 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 1) & (data['visual_sensor'] == 1 ) & (data['good_bad'] == 0 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 0) & (data['visual_sensor'] == 0 ) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 0) & (data['visual_sensor'] == 1 ) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 1) & (data['visual_sensor'] == 0 ) & (data['good_bad'] == 1 )), 1, 0).sum())
print(np.where(((data['smell_sensor'] == 1) & (data['visual_sensor'] == 1 ) & (data['good_bad'] == 1 )), 1, 0).sum())

81
10
67
4
15
214
77
532


In [25]:
SV_Bad_notagree_or_both0 = (81+10+67)/464
print(Bad_notagree_or_both0)
SV_Good_notagree_or_both0 = (15+214+77)/464
print(Good_notagree_or_both0)
SV_Bad_agree = 4/(4+532)
SV_Good_agree = 532/(4+532)

smell_visual_agree = np.array([[SV_Bad_notagree_or_both0, SV_Bad_agree],
                               [SV_Good_notagree_or_both0, SV_Good_agree]])
print(smell_visual_agree)
inspector_smell_visual = np.transpose(np.multiply(np.transpose(smell_visual_agree),inspector))
print(inspector_smell_visual)
inspector_smell_visual_marginal = np.sum(inspector_smell_visual, axis = 0)
print(inspector_smell_visual_marginal)
#Normalize the marginals
inspector_smell_visual_marginal = inspector_smell_visual_marginal/np.sum(inspector_smell_visual_marginal)
print(inspector_smell_visual_marginal)

#Compute U4

customerAccep_GoodQuality_givenInspector_SVFinding = 20 * (CustomerAccep_given_GoodQuality * 0.57569395 + \
                                  CustomerAccep_given_BadQuality * 0.42430605)

customerAccep_BadQuality_givenInspector_SVFinding = -40 * (CustomerNonAccep_given_GoodQuality * 0.57569395 + \
                                  CustomerNonAccep_given_BadQuality * 0.42430605)
print(customerAccep_GoodQuality_givenInspector_SVFinding)
print(customerAccep_BadQuality_givenInspector_SVFinding)
print("U4_Cost = ", customerAccep_GoodQuality_givenInspector_SVFinding + customerAccep_BadQuality_givenInspector_SVFinding)



0.19424460431654678
0.8057553956834532
[[0.34051724 0.00746269]
 [0.65948276 0.99253731]]
[[0.06027155 0.0013209 ]
 [0.54275431 0.81685821]]
[0.60302586 0.8181791 ]
[0.42430605 0.57569395]
12.053178900070716
-15.893642199858569
U4_Cost -3.840463299787853


In [None]:
#U5

