# The Evaporating Inventory Incident

You have been contracted for your first investigation. The [COO](https://en.wikipedia.org/wiki/Chief_operating_officer) of a company has heard rumors about irregularities in its inventory. Nothing specific, but enough to make her curious.

You have requested a copy of the data from the company's SAP system and decide to focus on the **material documents**.

Your objective is to analyze the data to identify and understand the irregularities. **Can you establish sufficient evidence for fraud?**

## Setup

Some initialization to make life easier. **Make sure to run the following cell before proceeding.**

```python
# Allow multiple outputs for each cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
# Show simple plots in the notebook
import matplotlib.pyplot as plt
%matplotlib inline
```

We use the following libraries:
* [Pandas](https://pandas.pydata.org) is the most important workhorse in data analytics.
* [Altair](https://altair-viz.github.io) is a visualization library.

```python
import pandas as pd
import altair as alt
import numpy as np
# Display floats with four decimal places
pd.set_option('display.float_format', lambda x: '%.4f' % x)
# Show Altair plots in the classic Jupyter Notebook
alt.renderers.enable('notebook')
```

You have received two tables from the SAP system that contain information about the **material movements** in the company:
1. The header information of the material documents is stored in the table `MKPF`.
2. The line item information of the material documents is stored in the table `MSEG`.

```python
# Header data of the material documents (MKPF)
mkpf_table = pd.read_csv('https://raw.githubusercontent.com/mschermann/forensic_accounting/master/MKPK_EI.csv')
# Line items of the material documents (MSEG)
mseg_table = pd.read_csv('https://raw.githubusercontent.com/mschermann/forensic_accounting/master/MSEG_EI.csv', low_memory=False)
```

## Understanding the tables

### The MKPF table

For our purposes, we use the following columns from `MKPF`:
* `MBLNR` - Contains the material document number.
* `USNAM` - Contains the user name of the inventory employee who posted the material document.

### The MSEG table

For our purposes, we use the following columns from `MSEG`:
* `MBLNR` - Contains the material document number.
* `BWART` - Contains the movement type of the line item. This [link](https://wiki.scn.sap.com/wiki/display/ERPLO/Movement+types) contains information about the movement types.
* `MATNR` - Contains the material id of the material moved.
* `LGORT` - Contains the storage location.
* `DMBTR` - Contains the value of the material movement.
* `MENGE` - Contains the quantity of the movement in units of the material.
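The two tables are linked through `MBLNR`: one header row in `MKPF` can have several line items in `MSEG`. Here is a minimal sketch of that relationship. It uses a tiny hypothetical excerpt, and the document numbers, user names, and values are invented rather than taken from the case data:

```python
import pandas as pd

# Hypothetical miniature MKPF (headers) and MSEG (line items); values are invented.
mkpf = pd.DataFrame({
    'MBLNR': [4900000001, 4900000002],
    'USNAM': ['JDOE', 'ASMITH'],
})
mseg = pd.DataFrame({
    'MBLNR': [4900000001, 4900000001, 4900000002],
    'BWART': [101, 101, 551],
    'MATNR': ['GOLD', 'IRON', 'DIAMOND'],
    'LGORT': ['RM00', 'RM00', 'RM00'],
    'DMBTR': [1200.0, 300.0, 5000.0],
    'MENGE': [10.0, 30.0, 1.0],
})

# Attach the header to every line item; validate raises if a document
# number appears more than once in the header table.
docs = mseg.merge(mkpf, on='MBLNR', how='left',
                  validate='many_to_one', indicator=True)

# Line items without a matching header would be flagged 'left_only'.
orphans = docs[docs['_merge'] == 'left_only']
print(len(docs), len(orphans))  # 3 line items, 0 without a header
```

On the real tables, a non-empty `orphans` frame would indicate line items that cannot be attributed to an employee.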

## Understanding the company

After a meeting, you learn that the company has four locations:
* `DL00` - the factory in Dallas, TX
* `MI00` - the factory in Miami, FL
* `SD00` - the factory in San Diego, CA
* `SC00` - the factory in Santa Clara, CA

Furthermore, the company keeps the following **groups** of raw materials in its raw-material storage location (`RM00`) at each of its locations in the company code `US00`. You record the groups in the following variables.

**Marvelous Materials:**
* `ADAMANTIUM`
* `CARBONITE`
* `KRYPTONITE`

```python
marvel = ['ADAMANTIUM', 'CARBONITE', 'KRYPTONITE']
```

**Boring Metals:**
* `GOLD` 
* `IRON`
* `PALLADIUM`
* `SILVER`

```python
boring = ['GOLD', 'IRON', 'PALLADIUM', 'SILVER']
```

**Gems:**
* `ANGOLAN AMETHYST`
* `AUSTRALIAN AMETHYST`
* `BOTSWANA BLACK PE`
* `BRITAIN BLACK PEAR`
* `CANADIAN CRYSTAL`
* `CONGOCRYSTAL`
* `DIAMOND`
* `ROMANIAN RUBY`
* `RUBY`
* `RUSSIAN RUBY`
* `SA SAPPHIRE`
* `SWISS SAPPHIRE`

```python
gems = ['ANGOLAN AMETHYST', 'AUSTRALIAN AMETHYST', 'BOTSWANA BLACK PE',
        'BRITAIN BLACK PEAR', 'CANADIAN CRYSTAL', 'CONGOCRYSTAL',
        'DIAMOND', 'ROMANIAN RUBY', 'RUBY',
        'RUSSIAN RUBY', 'SA SAPPHIRE', 'SWISS SAPPHIRE']
```

**Essentials:**
* `CHRONIUM`
* `CONCRETE`
* `ICAN HEADMASK`
* `MAGICDUST`
* `ORANGE`
* `PAPER`
* `ROSE ESSENTIAL OIL`
* `TEST`
* `WALLET`

```python
essentials = ['CHRONIUM', 'CONCRETE', 'ICAN HEADMASK',
              'MAGICDUST', 'ORANGE', 'PAPER',
              'ROSE ESSENTIAL OIL', 'TEST', 'WALLET']
```

**Consumables:**
* `AAA LUBE` 
* `BLUEPAINT`
* `BOLT1000`
* `HEXNT`
* `LIQUID`
* `ZTESTHEXNT`

```python
consumables = ['AAA LUBE', 'BLUEPAINT', 'BOLT1000',
               'HEXNT', 'LIQUID', 'ZTESTHEXNT']
```
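To compare groups rather than individual materials, you can invert the five lists into a single lookup from material to group. This is a sketch that assumes the `MATNR` values match the names above exactly, apart from surrounding whitespace. The sample materials are hypothetical, including the unlisted `UNKNOWN`:

```python
import pandas as pd

# The material groups from above.
marvel = ['ADAMANTIUM', 'CARBONITE', 'KRYPTONITE']
boring = ['GOLD', 'IRON', 'PALLADIUM', 'SILVER']
gems = ['ANGOLAN AMETHYST', 'AUSTRALIAN AMETHYST', 'BOTSWANA BLACK PE',
        'BRITAIN BLACK PEAR', 'CANADIAN CRYSTAL', 'CONGOCRYSTAL',
        'DIAMOND', 'ROMANIAN RUBY', 'RUBY',
        'RUSSIAN RUBY', 'SA SAPPHIRE', 'SWISS SAPPHIRE']
essentials = ['CHRONIUM', 'CONCRETE', 'ICAN HEADMASK',
              'MAGICDUST', 'ORANGE', 'PAPER',
              'ROSE ESSENTIAL OIL', 'TEST', 'WALLET']
consumables = ['AAA LUBE', 'BLUEPAINT', 'BOLT1000',
               'HEXNT', 'LIQUID', 'ZTESTHEXNT']

groups = {'Marvelous Materials': marvel, 'Boring Metals': boring,
          'Gems': gems, 'Essentials': essentials, 'Consumables': consumables}

# Invert group -> materials into material -> group.
material_group = {m: g for g, materials in groups.items() for m in materials}
# Sanity check: no material belongs to two groups.
assert len(material_group) == sum(len(v) for v in groups.values())

# Hypothetical line items; 'UNKNOWN' stands for a material not in any list.
mseg = pd.DataFrame({'MATNR': [' GOLD', 'DIAMOND', 'ORANGE', 'UNKNOWN']})
mseg['GROUP'] = (mseg['MATNR'].str.strip()
                 .map(material_group)
                 .fillna('Unassigned'))
print(mseg)
```

Materials that end up as `Unassigned` are worth a look in their own right, because they fall outside the groups the company described.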

## Your investigation

### Reduce the tables to the relevant columns

**Your objective:** Focus on the columns that are important to your investigation.
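Here is a minimal sketch of this step. It runs on hypothetical stand-ins for `mkpf_table` and `mseg_table` so that it is self-contained. The extra columns `MJAHR` (document year) and `ZEILE` (line item number) only illustrate that the real extracts carry more fields than you need. In SAP, a material document is identified by `MBLNR` together with `MJAHR`. If the extract spans several years, joining on `MBLNR` alone could therefore mix documents.

```python
import pandas as pd

# Hypothetical stand-ins for the loaded tables (invented values).
mkpf_table = pd.DataFrame({
    'MBLNR': [4900000001, 4900000002],
    'MJAHR': [2019, 2019],
    'USNAM': ['JDOE', 'ASMITH'],
})
mseg_table = pd.DataFrame({
    'MBLNR': [4900000001, 4900000001, 4900000002],
    'ZEILE': [1, 2, 1],
    'BWART': [101, 101, 551],
    'MATNR': ['GOLD', 'IRON', 'DIAMOND'],
    'LGORT': ['RM00', 'RM00', 'RM00'],
    'DMBTR': [1200.0, 300.0, 5000.0],
    'MENGE': [10.0, 30.0, 1.0],
})

# Keep only the columns described above; .copy() gives independent frames
# so later column assignments do not trigger SettingWithCopyWarning.
mkpf = mkpf_table[['MBLNR', 'USNAM']].copy()
mseg = mseg_table[['MBLNR', 'BWART', 'MATNR', 'LGORT', 'DMBTR', 'MENGE']].copy()

# One row per line item, enriched with the employee who posted the document.
movements = mseg.merge(mkpf, on='MBLNR', how='left')
# Restrict to the raw-material storage location RM00.
movements = movements[movements['LGORT'] == 'RM00']
print(movements.columns.tolist())
```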

### Develop an overview of the material movements

**Your objective:** Identify potential irregularities in the material movement data.

Think along the following lines:

* What are the differences between the locations?
* What are the differences between the movement types?
* What are the relationships between the movement types?

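**Please note**: Altair cannot use numbers as column headers. If you pivot numeric values such as the movement types (`BWART`) into columns, convert them to strings first.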
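Movement types such as `101` are numbers, so pivoting `BWART` into columns produces numeric headers. The sketch below first turns them into string labels and then relates pairs of movement types. It assumes the common SAP conventions that `101` is a goods receipt, `102` its reversal, and `551` a withdrawal for scrapping (see the movement-type link above). The values are invented.

```python
import pandas as pd

# Hypothetical line items for two materials (invented values).
items = pd.DataFrame({
    'MATNR': ['GOLD', 'GOLD', 'GOLD', 'DIAMOND', 'DIAMOND'],
    'BWART': [101, 102, 551, 101, 551],
    'DMBTR': [1000.0, 100.0, 200.0, 5000.0, 4000.0],
})

# Prefix the movement type so every column header is a string, e.g. 'BWART_101'.
# (If BWART was read as float, e.g. 101.0, convert it to int first.)
items['BWART'] = 'BWART_' + items['BWART'].astype(str)

overview = items.pivot_table(index='MATNR', columns='BWART',
                             values='DMBTR', aggfunc='sum', fill_value=0)

# Relate movement types: receipts net of reversals, and the share scrapped.
net_receipts = overview['BWART_101'] - overview['BWART_102']
scrap_ratio = overview['BWART_551'] / net_receipts
print(pd.DataFrame({'net_receipts': net_receipts, 'scrap_ratio': scrap_ratio}))
```

A material that is scrapped at a much higher rate than the others, like the invented `DIAMOND` here, is a lead worth following, not proof of anything. Because all headers are now strings, `overview.reset_index()` can be passed to `alt.Chart`.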

## Is there fraud?

**Reflect:** Can you explain the fraud?
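To connect irregular movements to people, you can pivot the `USNAM` column from `MKPF` against the movement types. The sketch below uses invented users and values. It assumes that the movement types have already been converted to strings and that `551` (scrapping) is the movement type of interest. A high share flags an employee for follow-up; on its own, it does not establish fraud.

```python
import pandas as pd

# Hypothetical merged line items: who posted which movement, and its value.
docs = pd.DataFrame({
    'USNAM': ['JDOE', 'JDOE', 'ASMITH', 'ASMITH', 'ASMITH'],
    'BWART': ['101', '551', '101', '551', '551'],
    'DMBTR': [1000.0, 50.0, 2000.0, 900.0, 1100.0],
})

# Value posted by each employee per movement type.
by_user = docs.pivot_table(index='USNAM', columns='BWART', values='DMBTR',
                           aggfunc='sum', fill_value=0)

# Share of each employee's posted value that went to scrapping (551).
total = by_user.sum(axis=1)
by_user['scrap_share'] = by_user['551'] / total
print(by_user.sort_values('scrap_share', ascending=False))
```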