# Fundamental Concepts in Data Insight: 
## <font color=indigo>Demo: Automating Insight</font>

### Fundamentals for a General Audience
---

QA Ltd. owns the copyright and other intellectual property rights of this material and asserts its moral rights as the author. All rights reserved.

The following libraries are used in this project:

In [1]:
import pandas as pd

In [109]:
from statistics import mean, median, mode
from random import sample
from operator import itemgetter

value = itemgetter(1)  # helper: returns the item at index 1 of a sequence

---

## The Simulation

Suppose we're trying to predict the risk of victimization. 

One approach is to keep a table of risk factors that can be applied to any individual. The *weights* of these factors can be estimated from historical datasets or set by expert judgment. 

In [3]:
risk_factors = pd.Series({
    "name":    0.0,
    "arrests": 0.5,
    "age":     0.2
})



In [4]:
risk_factors

name       0.0
arrests    0.5
age        0.2
dtype: float64

To score a person, we multiply each weight by the value observed for that person and average the results. First, the observations for one person:

In [5]:
alice = pd.Series({"name": "Alice", "arrests": 10, "age": 18})

In [6]:
alice

name       Alice
arrests       10
age           18
dtype: object

In [11]:
(
    risk_factors["arrests"] * alice["arrests"] + 
    risk_factors["age"]     * alice["age"] 
)/2


4.3
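The calculation above can be wrapped in a small function so that it works for any set of weights. This is a minimal sketch rather than part of the original demo: the name `risk_score` is an assumption, and the zero-weight `name` entry is left out because it contributes nothing to the score.

```python
import pandas as pd

# Weights and observations copied from the cells above (without "name").
weights = pd.Series({"arrests": 0.5, "age": 0.2})
alice = pd.Series({"name": "Alice", "arrests": 10, "age": 18})


def risk_score(weights, person):
    """Average of weight * observed value over the weighted factors.

    `risk_score` is a hypothetical helper name used for illustration.
    """
    # Select only the fields that have a weight, and make them numeric
    # (the Series holding Alice has dtype object because of her name).
    observed = person[weights.index].astype(float)
    return float((weights * observed).mean())


alice_score = risk_score(weights, alice)
print(alice_score)
```

Dividing by the number of factors, as the original cell does, keeps the score on a comparable scale for a fixed set of factors.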

We can generalise this to a table of people:

In [27]:
pd.DataFrame?

In [37]:
people = pd.DataFrame([
    {"name": "Alice", "arrests": 10, "age": 18},
    {"name": "Bob",   "arrests": 10, "age": 21},
    {"name": "Eve",   "arrests": 10, "age": 35},
    {"name": "Lucie", "arrests": 10, "age": 35},
    {"name": "Alex",  "arrests": 10, "age": 35},
]).set_index('name')

factors = pd.Series({
    "arrests": 0.5,
    "age":     0.2
})



In [38]:
people

| name | arrests | age |
|---|---|---|
| Alice | 10 | 18 |
| Bob | 10 | 21 |
| Eve | 10 | 35 |
| Lucie | 10 | 35 |
| Alex | 10 | 35 |


In [41]:
results = (people * factors).mean(axis=1)  # weight each column, then average across each row

In [42]:
results

name
Alice    4.3
Bob      4.6
Eve      6.0
Lucie    6.0
Alex     6.0
dtype: float64
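It can help to see the intermediate step that `people * factors` performs. The sketch below, under the same data as above, uses `DataFrame.mul` to make the label matching explicit: each column is multiplied by the weight whose index label equals the column name. The variable name `contributions` is an assumption for illustration.

```python
import pandas as pd

people = pd.DataFrame([
    {"name": "Alice", "arrests": 10, "age": 18},
    {"name": "Bob",   "arrests": 10, "age": 21},
    {"name": "Eve",   "arrests": 10, "age": 35},
    {"name": "Lucie", "arrests": 10, "age": 35},
    {"name": "Alex",  "arrests": 10, "age": 35},
]).set_index("name")

factors = pd.Series({"arrests": 0.5, "age": 0.2})

# Match the Series index against the column labels and multiply.
contributions = people.mul(factors, axis="columns")

# Average the weighted contributions across each row (one score per person).
results = contributions.mean(axis=1)

print(contributions)
print(results)
```

Inspecting `contributions` shows how much each factor adds to a person's score before the average is taken.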

## Descriptive Analytics

These are the types of metrics we would include in a report.

Highest-risk person:

In [45]:
results.idxmax(), results.max()

('Eve', 6.0)
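Note that `idxmax` returns only the first label at the maximum, and in this data Lucie and Alex have the same score as Eve. A minimal sketch for listing everyone tied at the top, assuming the `results` values computed earlier (the name `top` is illustrative):

```python
import pandas as pd

# Scores copied from the `results` output above.
results = pd.Series(
    {"Alice": 4.3, "Bob": 4.6, "Eve": 6.0, "Lucie": 6.0, "Alex": 6.0}
)

# Keep every entry whose score equals the maximum, not just the first.
top = results[results == results.max()]
print(top)
```

For a report, listing all tied entries avoids implying that one person stands out when several share the same score.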

Lowest-risk person:

In [46]:
results.idxmin(), results.min()

('Alice', 4.3)

Median risk:

In [48]:
results.median()

6.0

People whose risk equals the median:

In [51]:
results[ results == results.median() ]

name
Eve      6.0
Lucie    6.0
Alex     6.0
dtype: float64

A random sample of three people (the output varies between runs):

In [50]:
results.sample(3)

name
Lucie    6.0
Bob      4.6
Alex     6.0
dtype: float64
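Because `sample` draws at random, a report built from it will change on every run. If a repeatable sample is wanted, pandas accepts a `random_state` seed. The sketch below assumes the `results` values from above; the seed value 42 is arbitrary.

```python
import pandas as pd

# Scores copied from the `results` output above.
results = pd.Series(
    {"Alice": 4.3, "Bob": 4.6, "Eve": 6.0, "Lucie": 6.0, "Alex": 6.0}
)

# The same seed gives the same three people each time the cell is run.
first_draw = results.sample(3, random_state=42)
second_draw = results.sample(3, random_state=42)

print(first_draw)
print(first_draw.equals(second_draw))
```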

## How do you automate insight?

When building an automated system, we will often want to make a **decision** based on these types of measures:

In [62]:
risk_threshold = 5

for name, risk in results.items():
    if risk > risk_threshold:
        print(f"ALERT: {name} above threshold!")

ALERT: Eve above threshold!
ALERT: Lucie above threshold!
ALERT: Alex above threshold!
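The loop above prints its decisions directly. In an automated system it is often more convenient to return the flagged names so that another step can act on them. The following is a sketch under that assumption; `flag_high_risk` is a hypothetical helper name, and it keeps the same strict `>` comparison as the loop.

```python
import pandas as pd

# Scores copied from the `results` output above.
results = pd.Series(
    {"Alice": 4.3, "Bob": 4.6, "Eve": 6.0, "Lucie": 6.0, "Alex": 6.0}
)


def flag_high_risk(scores, threshold):
    """Return the names whose score is strictly above the threshold."""
    return scores[scores > threshold].index.tolist()


flagged = flag_high_risk(results, threshold=5)

for name in flagged:
    print(f"ALERT: {name} above threshold!")
```

Separating the decision (which names are flagged) from the action (printing an alert) makes the threshold logic easier to check on its own.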


## Exercise

* Revise the table of people to add further observables
* Update the factors so that every observable has a risk weight
* Adjust these weights until the resulting risk scores make sense
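One thing to watch for while working through the exercise: if a column is added to `people` without a matching entry in `factors`, the product for that column is `NaN`, and `mean` skips it by default, so the scores do not change. The sketch below illustrates this with a hypothetical observable, `prior_reports`, and a hypothetical weight of 1.0; both the column and its values are invented for illustration and are not part of the original data.

```python
import pandas as pd

# "prior_reports" is a hypothetical observable with made-up values.
people = pd.DataFrame([
    {"name": "Alice", "arrests": 10, "age": 18, "prior_reports": 2},
    {"name": "Bob",   "arrests": 10, "age": 21, "prior_reports": 0},
]).set_index("name")

factors = pd.Series({"arrests": 0.5, "age": 0.2})

# Columns that have no weight yet.
missing = people.columns.difference(factors.index).tolist()
print("No weight for:", missing)

# The unweighted column becomes NaN and is ignored by mean().
before = (people * factors).mean(axis=1)

# Hypothetical weight added for the new observable.
updated_factors = pd.Series({"arrests": 0.5, "age": 0.2, "prior_reports": 1.0})
after = (people * updated_factors).mean(axis=1)

print(before)
print(after)
```

Checking for unweighted columns before scoring is a simple way to confirm that every new observable actually affects the result.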