# The Caribbean Cruise Incident

You have received an anonymous tip that a purchase group employee in the purchasing group for industrial oils has been on a Caribbean cruise for the third time in the last two years. Since spending beyond one's means is a red flag, you begin an investigation.

You have requested a copy of the data from the SAP system. You focus on the **purchase orders**. Analyze the data to understand the irregularity. 

**Can you establish sufficient evidence?**

## Setup

Some initialization to make life easier. **Make sure to run the following cell before proceeding.**

In [1]:
#Allow multiple outputs for each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
#Show simple plots in the notebook
import matplotlib.pyplot as plt
%matplotlib inline

We use the following libraries:
* [Pandas](https://pandas.pydata.org) is the data analysis library we use to load, filter, and join the tables.
* [Altair](https://altair-viz.github.io) is a visualization library.

In [2]:
import pandas as pd
import altair as alt
#Format numbers without any decimals
pd.set_option('display.float_format', lambda x: '%.0f' % x)
#Show altair plots in the notebook
alt.renderers.enable('notebook')

You have received three tables from the SAP system that contain all the information of the `purchase order` step:
1. The header information of the purchase orders is stored in the table `EKKO`.
2. The line item information of the purchase orders is stored in the table `EKPO`.
3. The sets of applicable conditions associated with the purchase orders are stored in the table `KONV`.

In [3]:
ekko_table = pd.read_csv('https://raw.githubusercontent.com/mschermann/forensic_accounting/master/EKKO.csv')
ekpo_table = pd.read_csv('https://raw.githubusercontent.com/mschermann/forensic_accounting/master/EKPO.csv')
konv_table = pd.read_csv('https://raw.githubusercontent.com/mschermann/forensic_accounting/master/KONV.csv')

The tables come with a large number of columns.

## Understanding the Data

You can find the definition of all the columns in the SAP system using the transaction code `SE16`.

In [4]:
ekko_table.columns
ekpo_table.columns
konv_table.columns

Index(['MANDT', 'EBELN', 'BUKRS', 'BSTYP', 'BSART', 'BSAKZ', 'LOEKZ', 'STATU',
       'AEDAT', 'ERNAM',
       ...
       'OTB_RES_VALUE', 'OTB_SPEC_VALUE', 'SPR_RSN_PROFILE', 'BUDG_TYPE',
       'OTB_STATUS', 'OTB_REASON', 'CHECK_TYPE', 'CON_OTB_REQ',
       'CON_PREBOOK_LEV', 'CON_DISTR_LEV'],
      dtype='object', length=137)

Index(['MANDT', 'EBELN', 'EBELP', 'LOEKZ', 'STATU', 'AEDAT', 'TXZ01', 'MATNR',
       'EMATN', 'BUKRS',
       ...
       'FSH_SS', 'FSH_GRID_COND_REC', 'FSH_PSM_PFM_SPLIT', 'CNFM_QTY',
       'REF_ITEM', 'SOURCE_ID', 'SOURCE_KEY', 'PUT_BACK', 'POL_ID',
       'CONS_ORDER'],
      dtype='object', length=300)

Index(['MANDT', 'KNUMV', 'KPOSN', 'STUNR', 'ZAEHK', 'KAPPL', 'KSCHL', 'KDATU',
       'KRECH', 'KAWRT', 'KBETR', 'WAERS', 'KKURS', 'KPEIN', 'KMEIN', 'KUMZA',
       'KUMNE', 'KNTYP', 'KSTAT', 'KNPRS', 'KRUEK', 'KRELI', 'KHERK', 'KGRPE',
       'KOUPD', 'KOLNR', 'KNUMH', 'KOPOS', 'KVSL1', 'SAKN1', 'MWSK1', 'KVSL2',
       'SAKN2', 'MWSK2', 'LIFNR', 'KUNNR', 'KDIFF', 'KWERT', 'KSTEU', 'KINAK',
       'KOAID', 'ZAEKO', 'KMXAW', 'KMXWR', 'KFAKTOR', 'KDUPL', 'KFAKTOR1',
       'KZBZG', 'KSTBS', 'KONMS', 'KONWS', 'KAWRT_K', 'KWAEH', 'KWERT_K',
       'KFKIV', 'KVARC', 'KMPRS', 'PRSQU', 'VARCOND', 'STUFE', 'WEGXX',
       'KTREL', 'MDFLG', 'TXJLV', 'KBFLAG', 'KOLNR3', 'CPF_GUID', 'KAQTY'],
      dtype='object')

### The EKKO table

For our purposes, we use the following columns from `EKKO`:
* `EBELN` - Contains the purchase order number.
* `ERNAM` - Contains the user name of the purchase group employee who created the purchase order.
* `LIFNR` - Contains the unique identifier of the vendor that received the purchase order.
* `KNUMV` - Contains the link to the set of conditions associated with the purchase order.

### The EKPO table

For our purposes, we use the following columns from `EKPO`:
* `EBELN` - Contains the purchase order number.
* `EBELP` - Contains the line item identifier.
* `TXZ01` - Contains a textual description of the material.
* `MATNR` - Contains the unique identifier of the material.
* `MENGE` - Contains the quantity of material ordered.
* `NETPR` - Contains the effective net price of material ordered.
* `NETWR` - Contains the effective net value of material ordered (i.e., `MENGE * NETPR`).
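
The stated relationship between `MENGE`, `NETPR`, and `NETWR` can be checked directly. The following is a minimal sketch on a made-up table named `toy_ekpo`. The values are invented for illustration and are not taken from the SAP extract, and the sketch assumes that the price refers to one unit of material.

```python
import pandas as pd

# Hypothetical line items; all values are invented for illustration only.
toy_ekpo = pd.DataFrame({
    'EBELN': [4500000001, 4500000002],
    'EBELP': [10, 10],
    'MENGE': [20000.0, 19900.0],
    'NETPR': [45.0, 46.5],
    'NETWR': [900000.0, 925350.0],
})

# Recompute the net value and compare it with the stored value.
recomputed = toy_ekpo['MENGE'] * toy_ekpo['NETPR']
matches = (recomputed - toy_ekpo['NETWR']).abs() < 0.01

# True only if every line item is consistent.
print(matches.all())
```

Rows where the check fails would be worth a closer look before relying on `NETWR` for totals.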

### The KONV table

For our purposes, we use the following columns from `KONV`:
* `KNUMV` - Contains the unique identifier for the condition set.
* `KPOSN` - Contains the line item identifier.
* `KSCHL` - Contains the type of a condition.
* `KAWRT` - Contains the base value (baseline) of a condition.
* `KBETR` - Contains the effective price.

The following condition types are relevant for this case study:
* `PBXX` - Gross price
* `RB00` - Absolute discounts
* `NAVM` - Tax deductions
* `SKTO` - Cash discounts related to payment terms
* `WOTB` - Effective price

The following variable contains all relevant condition types.

In [5]:
cond_types = ['NAVM','PBXX','RB00','SKTO','WOTB']
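
A list like `cond_types` is typically used together with `Series.isin` to keep only the matching rows. The sketch below shows the idea on an invented table named `toy_konv`; the condition type `ZZZZ` is a made-up placeholder for a type that is not relevant here.

```python
import pandas as pd

cond_types = ['NAVM', 'PBXX', 'RB00', 'SKTO', 'WOTB']

# Hypothetical condition records; 'ZZZZ' is an invented, irrelevant type.
toy_konv = pd.DataFrame({
    'KNUMV': [1000, 1000, 1000, 1001],
    'KPOSN': [10, 10, 10, 10],
    'KSCHL': ['PBXX', 'ZZZZ', 'SKTO', 'PBXX'],
})

# Keep only the rows whose condition type appears in the list.
relevant = toy_konv[toy_konv['KSCHL'].isin(cond_types)]

# Count how often each remaining condition type occurs.
print(relevant['KSCHL'].value_counts())
```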

## Clean the data

**Your task:** Reduce the `EKKO` table to the columns of interest. Store the result in a variable called `ekko`.

**Your task:** Show the first rows of `ekko`.

**Your task:** Reduce the `EKPO` table to the columns of interest. Store the result in a variable called `ekpo`.

**Your task:** Show the first rows of `ekpo`.

**Your task:** Reduce the `KONV` table to the columns of interest. Store the result in a variable called `konv`.

**Your task:** Show the first five rows of `konv`.
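
As a hint for the tasks in this section, reducing a wide table to a few columns is a plain column selection. The sketch below uses an invented stand-in table named `toy_table`; the user names and values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table standing in for one of the SAP extracts.
toy_table = pd.DataFrame({
    'MANDT': [100, 100],
    'EBELN': [4500000001, 4500000002],
    'ERNAM': ['USER_A', 'USER_B'],
    'LIFNR': [2001, 2002],
    'KNUMV': [1000, 1001],
    'BUKRS': ['US00', 'US00'],
})

# Select the columns of interest by passing a list of column names.
# .copy() yields an independent table rather than a view of the original.
cols = ['EBELN', 'ERNAM', 'LIFNR', 'KNUMV']
toy_reduced = toy_table[cols].copy()

# head() returns the first five rows by default.
print(toy_reduced.head())
```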

## Focus on Brent Crude Oil

**Your task:** Filter the line items that contain orders of Brent Crude Oil (`MATNR`: `BRENTCRUDE`). Store the result in a variable called `ekpo_bco`.

**Your task:** How many purchase orders contain orders of Brent Crude Oil?
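
One possible approach is a boolean mask on `MATNR` followed by a count of distinct purchase order numbers. The sketch below runs on an invented table; `OTHEROIL` is a made-up material number used only for contrast.

```python
import pandas as pd

# Hypothetical line items; 'OTHEROIL' is an invented material number.
toy_ekpo = pd.DataFrame({
    'EBELN': [4500000001, 4500000001, 4500000002, 4500000003],
    'EBELP': [10, 20, 10, 10],
    'MATNR': ['BRENTCRUDE', 'OTHEROIL', 'BRENTCRUDE', 'OTHEROIL'],
})

# Keep only the line items for the material of interest.
toy_bco = toy_ekpo[toy_ekpo['MATNR'] == 'BRENTCRUDE']

# A purchase order can have several line items, so count distinct EBELN values.
n_orders = toy_bco['EBELN'].nunique()
print(n_orders)
```

Counting rows with `len(toy_bco)` would count line items instead, which differs whenever an order lists the material more than once.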

## The value of the Brent Crude Oil purchase orders

**Your task:** What is the overall value of all purchase orders of Brent Crude Oil?

**Your task:** Show how the effective net price (`NETPR`) of the purchase orders of Brent Crude Oil changes over time. *(Assume that the order of purchase orders represents time.)*

In [None]:
v_netpr = alt.Chart().mark_line().encode(
    x=alt.X('index', axis=alt.Axis(title='Purchase Orders')),
    y=alt.Y('NETPR', scale=alt.Scale(domain=(4100, 4700)), axis=alt.Axis(title='Effective Net Price of Brent Crude Oil'))
)
alt.layer(v_netpr, data = ekpo_bco.reset_index(), width=600, height=300)

**Reflect:** What is your interpretation of the effective net price?

## The volume of the Brent Crude Oil purchase orders

**Your task:** What is the average order volume for Brent Crude Oil?

**Your task:** Show how the order volume (`MENGE`) of the purchase orders of Brent Crude Oil changes over time. *(Assume that the order of purchase orders represents time.)*

In [None]:
v_menge = alt.Chart().mark_line().encode(
    x=alt.X('index', axis=alt.Axis(title='Purchase Orders')),
    y=alt.Y('MENGE', scale=alt.Scale(domain=(19500, 20500)), axis=alt.Axis(title='Order Volume of Brent Crude Oil'))
)
alt.layer(v_menge, data = ekpo_bco.reset_index(), width=600, height=300)

**Reflect:** What is your interpretation of the order volume?

## Integrated analysis of `EKKO` and `EKPO`

**Your task:** Left join the `ekpo_bco` and the `ekko` tables. Store the result in a variable called `ekko_ekpo_bco`.

**Your task:** Show a sample of the `ekko_ekpo_bco` table.
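
A left join keeps every line item and attaches the matching header information. The following sketch illustrates this with `DataFrame.merge` on two invented tables; the user names and vendor numbers are hypothetical.

```python
import pandas as pd

# Hypothetical line items (left table).
toy_items = pd.DataFrame({
    'EBELN': [4500000001, 4500000002, 4500000002],
    'EBELP': [10, 10, 20],
    'MENGE': [20000.0, 19900.0, 20100.0],
})

# Hypothetical headers (right table); one row per purchase order.
toy_headers = pd.DataFrame({
    'EBELN': [4500000001, 4500000002, 4500000003],
    'ERNAM': ['USER_A', 'USER_B', 'USER_A'],
    'LIFNR': [2001, 2002, 2001],
})

# how='left' keeps all line items; validate raises an error if a
# purchase order number occurs more than once in the header table.
toy_joined = toy_items.merge(
    toy_headers, on='EBELN', how='left', validate='many_to_one'
)

# sample() draws random rows; random_state makes the draw repeatable.
print(toy_joined.sample(2, random_state=0))
```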

**Your task:** How many vendors (`LIFNR`) deliver Brent Crude Oil?

**Your task:** How many purchase group employees (`ERNAM`) are responsible for ordering Brent Crude Oil?

**Your task:** Are there any differences in the average order volume between the purchase group employees (`ERNAM`)?

In [None]:
base = alt.Chart().mark_line().encode(
    x=alt.X('index', axis=alt.Axis(title='Purchase Orders')),
    y=alt.Y('MENGE', scale=alt.Scale(domain=(19760, 20250)), axis=alt.Axis(title='Order Volume of Brent Crude Oil')),
    color=alt.Color('ERNAM', legend=alt.Legend(title="Purchase Group Employee"))
)
alt.layer(base, data = ekko_ekpo_bco.reset_index(), width=600, height=300)
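
The questions in this section can also be answered numerically with `nunique` and `groupby`. The sketch below uses an invented joined table with hypothetical user names and vendor numbers, so its figures say nothing about the real data.

```python
import pandas as pd

# Hypothetical joined table; all values are invented for illustration.
toy_joined = pd.DataFrame({
    'ERNAM': ['USER_A', 'USER_A', 'USER_B', 'USER_B'],
    'LIFNR': [2001, 2002, 2001, 2001],
    'MENGE': [20000.0, 20100.0, 19900.0, 19800.0],
})

# Number of distinct vendors and employees.
n_vendors = toy_joined['LIFNR'].nunique()
n_employees = toy_joined['ERNAM'].nunique()

# Number of orders and average order volume per employee.
summary = toy_joined.groupby('ERNAM')['MENGE'].agg(['count', 'mean'])
print(n_vendors, n_employees)
print(summary)
```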

## Analysis of the Conditions

**Your task:** Filter the conditions for the purchase orders of Brent Crude Oil. Store the result in a variable called `konv_bco`.

**Your task:** Show the first five rows of `konv_bco`.

**Your task:** Which condition type is of interest?

**Your task:** Add dummies for the condition type to `konv_bco`. Store the result in a variable called `konv_bco_d`.

**Your task:** Show the first five rows of `konv_bco_d`.
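
Dummy variables turn each condition type into its own 0/1 column, which makes it easy to see which condition sets contain which types. The sketch below shows one way to do this with `pd.get_dummies` on an invented table; the combinations of condition types are made up and do not reflect the case data.

```python
import pandas as pd

# Hypothetical condition records for two condition sets.
toy_konv = pd.DataFrame({
    'KNUMV': [1000, 1000, 1001, 1001],
    'KPOSN': [10, 10, 10, 10],
    'KSCHL': ['PBXX', 'SKTO', 'PBXX', 'NAVM'],
})

# One 0/1 column per condition type, appended to the original columns.
dummies = pd.get_dummies(toy_konv['KSCHL'], dtype=int)
toy_konv_d = pd.concat([toy_konv, dummies], axis=1)

# Summing the dummies per condition set shows which types each set contains.
per_set = toy_konv_d.groupby('KNUMV')[['NAVM', 'PBXX', 'SKTO']].sum()
print(per_set)
```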

**Your task:** Filter the sets of conditions that contain the condition type of interest. Store the result in a variable named after the condition type.

**Your task:** Left join the condition type of interest to the other tables. Store the result in a variable called `eek_bco`.
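
Joining conditions to purchase order line items needs two keys whose names differ between the tables. The sketch below assumes that `KNUMV` links the condition set and that `KPOSN` in `KONV` corresponds to `EBELP` in `EKPO`, as the column descriptions above suggest. The tables and values are invented.

```python
import pandas as pd

# Hypothetical joined header and line item data.
toy_orders = pd.DataFrame({
    'EBELN': [4500000001, 4500000002],
    'EBELP': [10, 10],
    'KNUMV': [1000, 1001],
    'ERNAM': ['USER_A', 'USER_B'],
})

# Hypothetical records of a single condition type; only one order has it.
toy_cond = pd.DataFrame({
    'KNUMV': [1001],
    'KPOSN': [10],
    'KBETR': [-1.5],
})

# left_on/right_on pair up key columns that carry different names.
toy_eek = toy_orders.merge(
    toy_cond,
    how='left',
    left_on=['KNUMV', 'EBELP'],
    right_on=['KNUMV', 'KPOSN'],
)

# Orders without the condition have a missing KBETR after the left join.
has_condition = toy_eek['KBETR'].notna()
print(toy_eek[has_condition])
```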

## What is the fraud?

**Your task:** What does the fraudster exploit?

**Your task:** Who is the person of interest?

**Your task:** What is the financial damage?

**Reflect:** Can you explain the fraud?