## iZettle Data Science Assignment - CPV ##

Thanks for taking the time to complete the iZettle data science assignment.
<br>
The assignment gives you an idea of the challenges we may face at iZettle and helps us understand how you would tackle them.
<br>
Feel free to scope the assignment as appropriate. Make sure to motivate what you did and why.

The goal is to estimate the Card Payment Volume (CPV) of each organization
during their first, second and third month after creation (i.e. the sum of the
amount of every transaction in each month). The prediction should be based
on the first 14 days of data after the organization was created.

You’ll find two CSV files attached with the following data:

data/organization_ids.csv:
- organization_id: Unique identifier for each organization
- organization_created: Date on which the organization was onboarded

data/payment_ids.csv:
- organization_id: Unique identifier for each organization
- card_payment_id: Unique identifier for a payment that was completed
- timestamp: Time of the event
- currency_id: Currency of the transaction
- amount: Amount of the transaction
- device_type: The device connected to the card reader
- card_payment_entry_mode: Payment technology
- payment_type: Whether the payment is valid or invalid
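
As a minimal sketch of the target described above, the snippet below computes the CPV for months 1 to 3 and a first-14-days total on made-up toy data that uses the column names from the two files. It assumes that a "month" is a 30-day window counted from `organization_created`; the assignment text does not define this, so calendar months would be an equally valid reading. The names `orgs`, `payments`, `targets` and `first_14` are illustrative only.

```python
import pandas as pd

# Toy data with the column names listed above (all values are made up).
orgs = pd.DataFrame({
    "organization_id": [1, 2],
    "organization_created": ["2017-01-01", "2017-01-10"],
})
payments = pd.DataFrame({
    "organization_id": [1, 1, 1, 2],
    "timestamp": ["2017-01-05 10:00:00", "2017-02-03 12:00:00",
                  "2017-03-15 09:30:00", "2017-01-12 08:00:00"],
    "amount": [100.0, 50.0, 25.0, 10.0],
})

df = payments.merge(orgs, on="organization_id")
df["timestamp"] = pd.to_datetime(df["timestamp"])
df["organization_created"] = pd.to_datetime(df["organization_created"])

# Assumption: month k covers days 30*(k-1) .. 30*k - 1 after creation.
days = (df["timestamp"] - df["organization_created"]).dt.days
df["month"] = days // 30 + 1

# Targets: summed amount per organization for months 1-3 (0 where no payments).
in_range = df[(days >= 0) & (df["month"] <= 3)]
targets = (
    in_range.groupby(["organization_id", "month"])["amount"].sum()
    .unstack(fill_value=0)
    .reindex(index=orgs["organization_id"], columns=[1, 2, 3], fill_value=0)
)

# Example feature: summed amount during the first 14 days after creation.
first_14 = (
    df[(days >= 0) & (days < 14)]
    .groupby("organization_id")["amount"].sum()
    .reindex(orgs["organization_id"], fill_value=0)
)

print(targets)
print(first_14)
```

Reindexing on the organization table keeps organizations with no payments in a given window, which then appear with a CPV of 0 rather than disappearing from the result.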

### Imports ###

In [1]:
import pandas as pd
import numpy as np

### Data Loading and Preparation ###

We'll load the two CSV files and merge them on the organization_id column.

In [29]:
payment_df = pd.read_csv('payment_ids.csv')

In [30]:
organization_df = pd.read_csv('organization_ids.csv')

In [34]:
master_df = pd.merge(payment_df, organization_df, on='organization_id')
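
One thing worth checking at this step: `pd.merge` defaults to an inner join, so any organization without a single payment would be dropped from the merged frame. Whether such organizations exist in this dataset is not shown here. The sketch below uses made-up toy frames (`orgs`, `payments`) to illustrate how a left join from the organization table, together with the `validate` and `indicator` parameters, would surface them.

```python
import pandas as pd

# Toy frames; organization 3 has no payments (values are made up).
orgs = pd.DataFrame({
    "organization_id": [1, 2, 3],
    "organization_created": ["2017-01-01", "2017-01-10", "2017-02-01"],
})
payments = pd.DataFrame({
    "organization_id": [1, 1, 2],
    "amount": [100.0, 50.0, 10.0],
})

# Default inner join, as in the cell above: organization 3 disappears.
inner = pd.merge(payments, orgs, on="organization_id")

# Left join from the organization table keeps every organization.
# validate raises a MergeError if organization_id is duplicated in orgs;
# indicator adds a _merge column marking rows without a matching payment.
left = pd.merge(orgs, payments, on="organization_id", how="left",
                validate="one_to_many", indicator=True)
missing = left.loc[left["_merge"] == "left_only", "organization_id"].tolist()

print(inner["organization_id"].nunique())  # organizations surviving the inner join
print(missing)                             # organizations with no payments
```

For a CPV prediction, organizations that appear only in the left join are still valid examples: their volume is simply 0.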

<br>
<br>
The amount column contains values in several currencies (GBP, SEK, EUR, NOK and DKK). Summing them as-is would be meaningless, so we convert every amount to euros. First, we define an approximate, fixed exchange rate to EUR for each non-euro currency. Then we apply the matching rate to each row, leaving EUR amounts unchanged, and store the result in a new amount_eur column of the dataframe.

In [35]:
# Approximate, fixed value in EUR of one unit of each currency
GBP = 1.14
NOK = 0.1
SEK = 0.1
DKK = 0.13

In [36]:
amount_eur = []

In [37]:
# Look up the EUR rate for each row's currency; currencies without a
# listed rate (i.e. EUR) keep a rate of 1, so their amount is unchanged.
rates = {'GBP': GBP, 'NOK': NOK, 'SEK': SEK, 'DKK': DKK}
amount_eur = master_df['amount'] * master_df['currency_id'].map(rates).fillna(1.0)

In [38]:
master_df['amount_eur'] = amount_eur

<br>
<br>
Let's take a quick look at the categorical columns, and see how they're distributed.

In [39]:
master_df.columns

Index(['organization_id', 'timestamp', 'amount', 'currency_id',
       'card_payment_entry_mode', 'device_type', 'payment_type',
       'organization_created', 'amount_eur'],
      dtype='object')

In [40]:
master_df['device_type'].value_counts()

IPAD       525853
IPHONE     233479
ANDROID    176066
Name: device_type, dtype: int64

In [41]:
master_df['payment_type'].value_counts()

Valid      873597
Invalid     61801
Name: payment_type, dtype: int64

In [42]:
master_df['card_payment_entry_mode'].value_counts()

EMV                      794062
CONTACTLESS_EMV          138415
MAGSTRIPE                  2700
MANUAL_ENTRY                102
CARD_HOLDER_ENTRY            69
CONTACTLESS_MAGSTRIPE        50
Name: card_payment_entry_mode, dtype: int64

In [43]:
master_df['currency_id'].value_counts()

GBP    318020
SEK    246520
EUR    181686
NOK    126994
DKK     62178
Name: currency_id, dtype: int64
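
Raw counts are easier to compare as proportions. From the output above, for example, invalid payments make up 61801 of 935398 rows, or roughly 6.6%. As a small sketch, `value_counts(normalize=True)` returns these shares directly, and a dictionary comprehension covers several columns in one pass. The frame `toy` below is made up for illustration; on the real data the same comprehension would run over the categorical columns of `master_df`.

```python
import pandas as pd

# Toy stand-in for the categorical columns of master_df (values are made up).
toy = pd.DataFrame({
    "device_type": ["IPAD", "IPAD", "IPHONE", "ANDROID"],
    "payment_type": ["Valid", "Valid", "Valid", "Invalid"],
})

# Share of rows per category, for every column in one pass.
shares = {col: toy[col].value_counts(normalize=True) for col in toy.columns}

for col, dist in shares.items():
    print(col)
    print(dist)
```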