## Generation of churn dummy data using Faker
This workbook generates dummy churn data using Faker for use in data visualization demos. 

The dataset consists of the following variables:  
- General demographic data
- Customer start date
- Whether the customer churned
- Churn date
- Churn probability
- Competitor that "stole" the customer
- Customer value drawn from a normal distribution (can be used as sales, portfolio value, etc.)

The locale can be changed to generate location-specific data. The default is Danish (`da_DK`).

In [1]:
from faker import Faker
import pandas as pd
import numpy as np
import datetime
from datetime import date
import random

### Parameters

In [16]:
# Locale for randomly generated data
faker = Faker('da_DK')

# Country for locations
location_country = 'DK'

# Number of records
records = 5000

# Minimum customer age
min_age = 18

# Company start date
start_date = datetime.date(2010, 1, 1)

# Probability of churn, in percent (0-100)
churn_rate = 25

In [17]:
# Custom competitor list
competitors = ['Insuralux','Arrowhead','Beacon','WeProtectYou','Unity','Capital Protectors','Smart Protect','Insurance For You','Smart Life Insurance','MutuTrust','InsCap']

### Generate data

In [18]:
# Generating dummy data
customer ={}

for n in range(records):
    customer[n]={}
    customer[n]['id']= faker.random_number(digits=5)
    customer[n]['name']= faker.name()
    customer[n]['address']= faker.address()
    customer[n]['city']= faker.city()
    customer[n]['post_code']= faker.postcode()
    # Danish locations on land
    customer[n]['location']= faker.local_latlng(location_country)
    customer[n]['email']= faker.email()
    customer[n]['phone']= faker.phone_number()
    # Only adult customers
    customer[n]['birth_date'] = faker.date_of_birth(minimum_age=min_age)
    customer[n]['start_date'] = faker.date_between(start_date)
    # % of churned customers
    customer[n]['churn']= faker.boolean(chance_of_getting_true=churn_rate)
    # Churn date is drawn on or after the customer's own start date
    customer[n]['churn_date'] = faker.date_between(customer[n]['start_date'])
    customer[n]['churn_probability'] = faker.pyfloat(min_value=0,max_value=1)
    customer[n]['competitor'] = random.choice(competitors)
    # Customer value assuming a normal distribution
    customer[n]['customer_value'] = np.random.normal(loc=25000,scale=10000)
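
Two properties of the loop above may matter in some demos. `faker.random_number(digits=5)` returns numbers with up to five digits and can return the same number for more than one record, and a normal distribution with mean 25000 and standard deviation 10000 yields a negative value roughly 0.6% of the time. The sketch below shows one possible way to handle both using only the standard library. `unique_ids` and `non_negative_value` are hypothetical helpers, not part of the notebook or of Faker, and `random.gauss` stands in for `np.random.normal`.

```python
import random


def unique_ids(count, digits=5, seed=None):
    """Return `count` distinct integers with exactly `digits` digits (hypothetical helper)."""
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits
    # random.sample draws without replacement, so no id can repeat
    return rng.sample(range(low, high), count)


def non_negative_value(mean=25000, sd=10000, rng=random):
    """Draw from a normal distribution, redrawing until the value is >= 0 (hypothetical helper)."""
    value = rng.gauss(mean, sd)
    while value < 0:
        value = rng.gauss(mean, sd)
    return value


ids = unique_ids(5000, seed=42)

value_rng = random.Random(7)
values = [non_negative_value(rng=value_rng) for _ in range(1000)]
```

Redrawing truncates the distribution at zero, which shifts the mean up slightly. Clipping with `max(value, 0)` would keep the draw count fixed but pile values up at exactly zero.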

In [19]:
# Convert the dictionary of records to a DataFrame (one row per customer)
churn_df = pd.DataFrame.from_dict(customer,orient='index')

### A little cleanup

In [20]:
# Customers who have not churned have no churn date
churn_df.loc[~churn_df['churn'], 'churn_date'] = np.nan
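
The cleanup cell clears only `churn_date`. The `competitor` column stays filled for every customer, including those who never churned. If a demo should show a competitor only for churned customers, both columns could be cleared together. A minimal sketch on a made-up three-row frame (`toy_df` and its values are hypothetical; the column names follow `churn_df`):

```python
import pandas as pd

# Toy frame with the same column names as churn_df (values are made up)
toy_df = pd.DataFrame({
    'churn': [True, False, True],
    'churn_date': ['2021-05-01', '2019-03-12', '2022-11-30'],
    'competitor': ['Beacon', 'Unity', 'InsCap'],
})

# Boolean mask selecting customers that did not churn
not_churned = ~toy_df['churn']

# Clear both churn-only columns for those rows in one assignment
toy_df.loc[not_churned, ['churn_date', 'competitor']] = None

print(toy_df)
```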

In [21]:
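def calculate_age(born):
    """Return the age in whole years as of today for a given birth date."""
    today = date.today()
    # Subtract 1 if this year's birthday has not happened yet
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))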
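
The subtraction in `calculate_age` relies on Python comparing tuples element by element: `(today.month, today.day) < (born.month, born.day)` is `True` when this year's birthday has not yet occurred, and `True` counts as 1 when subtracted from an integer. A small worked example, using a hypothetical variant `age_on` that takes the reference date as a parameter so the result does not depend on the day it is run:

```python
from datetime import date


def age_on(born, today):
    """Age in whole years on the date `today` (same formula as calculate_age)."""
    # True (counted as 1) when the birthday is still ahead in the current year
    birthday_still_ahead = (today.month, today.day) < (born.month, born.day)
    return today.year - born.year - birthday_still_ahead


born = date(2000, 6, 30)
print(age_on(born, date(2022, 6, 29)))  # 21: the birthday is one day away
print(age_on(born, date(2022, 6, 30)))  # 22: the birthday is today
```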

In [22]:
churn_df['age'] = churn_df['birth_date'].apply(calculate_age)

In [23]:
churn_df.head()

| | id | name | address | city | post_code | location | email | phone | birth_date | start_date | churn | churn_date | churn_probability | competitor | customer_value | age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7724 | Dr. Per Mathiasen | Rismosevej 4\n5982 Gentofte | Højslev | 8513 | (55.67938, 12.53463, Frederiksberg, DK, Europe... | borisjohansen@example.net | +45 49 78 57 40 | 1951-11-23 | 2013-12-01 | False | | 0.16067 | Beacon | 6742.754982 | 71 |
| 1 | 11706 | Inga Clausen-Holm | Enghave Allé 18\n7321 Sydals | Hobro | 7254 | (55.67938, 12.53463, Frederiksberg, DK, Europe... | eriksennicolai@example.org | +45 26979353 | 1993-08-16 | 2022-08-25 | False | | 0.39 | Smart Life Insurance | 23402.995223 | 29 |
| 2 | 30001 | Hr Ivan Bruun | Rødkløvervej 69\n2897 Mørkøv | Skovlunde | 6859 | (55.67938, 12.53463, Frederiksberg, DK, Europe... | dagnymikkelsen@example.com | 17423619 | 1977-12-13 | 2010-09-20 | False | | 0.877329 | MutuTrust | 19156.903423 | 45 |
| 3 | 7024 | Kent Winther | Lille Kannike Allé 9\n9850 Vig | Ballerup | 5345 | (55.67938, 12.53463, Frederiksberg, DK, Europe... | vestergaardmatthias@example.com | +45 55153274 | 1914-04-27 | 2010-04-07 | False | | 0.701186 | Insuralux | 14279.754803 | 108 |
| 4 | 32317 | Fru Ane Carlsen | Antoinettegade 5\n8796 Ølsted | Balle | 4439 | (55.67938, 12.53463, Frederiksberg, DK, Europe... | kjohansen@example.com | 3164 7839 | 2000-06-30 | 2016-07-01 | False | | 0.134293 | Arrowhead | 22715.562717 | 22 |


### Export to csv

In [24]:
# Export to csv
churn_df.to_csv('/Users/lars/OneDrive/Datasets/fakechurn/churn.csv',header=True)
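
By default `to_csv` also writes the DataFrame's row index as a first column without a header, and pandas names that column `Unnamed: 0` when the file is read back. If the index is not needed downstream, passing `index=False` leaves it out. A minimal sketch using an in-memory buffer instead of a file (`toy_df` and its values are made up for illustration):

```python
import io

import pandas as pd

toy_df = pd.DataFrame({'id': [7724, 11706], 'churn': [False, True]})

# Default export: the row index is written as a first column with no header
with_index = io.StringIO()
toy_df.to_csv(with_index, header=True)
with_index.seek(0)
columns_with_index = pd.read_csv(with_index).columns.tolist()
print(columns_with_index)  # ['Unnamed: 0', 'id', 'churn']

# index=False leaves the row index out of the file
without_index = io.StringIO()
toy_df.to_csv(without_index, header=True, index=False)
without_index.seek(0)
columns_without_index = pd.read_csv(without_index).columns.tolist()
print(columns_without_index)  # ['id', 'churn']
```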