### EDA (Exploratory Data Analysis) - Complete the following tasks to explore the data:

- Show DataFrame info.
- Describe DataFrame.
- Show a plot of the total number of responses.
- Show a plot of the response rate by the sales channel.
- Show a plot of the response rate by the total claim amount.
- Show a plot of the response rate by income.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [None]:
df = pd.read_csv('merged_clean_marketing_customer_analysis.csv')

Show DataFrame info.

In [None]:
df.info()

In [None]:
# 'Unnamed: 0' is most likely the old row index, written by an earlier to_csv() call without index=False
df = df.drop(columns=['Unnamed: 0'])
df.sample(10)
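A minimal sketch of where a column like `Unnamed: 0` typically comes from, assuming the CSV was produced by an earlier `to_csv()` call. The tiny frame `demo` is made up for illustration and has nothing to do with the real data. Writing with the default `index=True` stores the row index as a first column with an empty header, which `read_csv` then names `Unnamed: 0`.

```python
import io
import pandas as pd

# Illustrative stand-in frame, NOT the real dataset.
demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Default to_csv(): the row index is written as a first column with an empty header.
buf = io.StringIO()
demo.to_csv(buf)
buf.seek(0)
with_index = pd.read_csv(buf)
print(with_index.columns.tolist())     # ['Unnamed: 0', 'a', 'b']

# index=False leaves the row index out, so no extra column appears on reload.
buf = io.StringIO()
demo.to_csv(buf, index=False)
buf.seek(0)
without_index = pd.read_csv(buf)
print(without_index.columns.tolist())  # ['a', 'b']
```

If the file cannot be regenerated, `pd.read_csv(path, index_col=0)` is an alternative to dropping the column after loading.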

Describe DataFrame.

In [None]:
df.describe()

In [None]:
df.describe(include='all')

Show a plot of the total number of responses.

In [None]:
sns.countplot(x='response', data=df)  # categorical column: count per category instead of a numeric histogram
plt.show()

Show a plot of the response rate by the sales channel.

In [None]:
df['sales_channel'].unique()

In [None]:
df['response'].value_counts()

In [None]:
# Treat the former NaNs (stored as 'no data') as 'No': a missing response, like an explicit 'No', counts against the sales channel.
df['response'] = df['response'].replace('no data', 'No')
df['response'].unique()

In [None]:
plt.figure(figsize=(6, 6))
sns.countplot(x='sales_channel', hue='response', data=df)
plt.show()
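The count plot shows raw counts per channel, which makes channels of different sizes hard to compare. Below is a minimal sketch of the response rate itself, i.e. the share of 'Yes' answers per channel. It runs on a small made-up frame `demo` so that it is self-contained; the helper `response_rate_by` is a hypothetical name, and with the real data you would pass `df` instead of `demo`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative stand-in data, NOT values from the real dataset.
demo = pd.DataFrame({
    'sales_channel': ['Agent', 'Agent', 'Agent', 'Agent', 'Web', 'Web',
                      'Branch', 'Branch', 'Call Center', 'Call Center'],
    'response': ['Yes', 'No', 'No', 'No', 'Yes', 'No',
                 'No', 'No', 'Yes', 'Yes'],
})

def response_rate_by(data, column):
    """Share of 'Yes' responses within each value of `column` (hypothetical helper)."""
    is_yes = data['response'].eq('Yes')         # boolean Series: True where the customer said 'Yes'
    return is_yes.groupby(data[column]).mean()  # mean of booleans = fraction of True

channel_rate = response_rate_by(demo, 'sales_channel')

ax = channel_rate.sort_values(ascending=False).plot(kind='bar', figsize=(6, 4))
ax.set_xlabel('Sales Channel')
ax.set_ylabel('Response Rate')
plt.tight_layout()
plt.show()
```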

Show a plot of the response rate by the total claim amount.

In [None]:
sns.boxplot(x='total_claim_amount', data=df)
plt.show()

In [None]:
df['total_claim_amount_range'] = pd.cut(df['total_claim_amount'],
                                        bins=[0, 250, 500, 750, 1000, 3000],
                                        labels=['0-250', '250-500', '500-750', '750-1000', '>1000'])

In [None]:
plt.figure(figsize=(8, 6))
sns.countplot(x='total_claim_amount_range', hue='response', data=df)  # recent seaborn versions require x= as a keyword
plt.xlabel('Total Claim Amount')
plt.ylabel('Response Count')
plt.show()
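Because the claim-amount ranges contain very different numbers of customers, the counts alone do not show the rate. One possible way to get the share of each response per range is a row-normalised `pd.crosstab`. This is a sketch on made-up data (`demo`), reusing the bin edges and labels from the cell above; with the real data the same `crosstab` call would be applied to `df`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative stand-in data, NOT values from the real dataset.
demo = pd.DataFrame({
    'total_claim_amount': [100, 200, 300, 400, 600, 700, 800, 900, 1200, 2500],
    'response': ['No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No'],
})

# Same bin edges and labels as in the notebook.
demo['total_claim_amount_range'] = pd.cut(
    demo['total_claim_amount'],
    bins=[0, 250, 500, 750, 1000, 3000],
    labels=['0-250', '250-500', '500-750', '750-1000', '>1000'],
)

# normalize='index' makes every row sum to 1, so the 'Yes' column is the response rate.
claim_share = pd.crosstab(demo['total_claim_amount_range'], demo['response'],
                          normalize='index')

ax = claim_share.plot(kind='bar', stacked=True, figsize=(8, 6))
ax.set_xlabel('Total Claim Amount')
ax.set_ylabel('Share of Responses')
plt.tight_layout()
plt.show()
```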

Show a plot of the response rate by income.

In [None]:
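# include_lowest=True keeps income == 0 in 'Low'; otherwise pd.cut would turn it into NaN
df['income_group'] = pd.cut(df['income'], bins=[0, 25000, 50000, 75000, 100000], labels=['Low', 'Middle', 'High', 'Very High'], include_lowest=True)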

In [None]:
plt.figure(figsize=(8, 6))
sns.countplot(x='income_group', hue='response', data=df)  # recent seaborn versions require x= as a keyword
plt.xlabel('Income')
plt.ylabel('Response Count')
plt.show()
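As with the other count plots, the bars reflect group sizes as much as response behaviour. A minimal sketch of the response rate per income group follows, again on a made-up frame `demo`. It assumes that customers with an income of exactly 0 should be counted as 'Low', which is a modelling choice and not something the data dictates.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative stand-in data, NOT values from the real dataset.
demo = pd.DataFrame({
    'income': [0, 0, 20000, 30000, 45000, 60000, 70000, 90000],
    'response': ['No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes'],
})

demo['income_group'] = pd.cut(
    demo['income'],
    bins=[0, 25000, 50000, 75000, 100000],
    labels=['Low', 'Middle', 'High', 'Very High'],
    include_lowest=True,  # assumption: income == 0 belongs to 'Low' rather than becoming NaN
)

# Fraction of 'Yes' responses within each income group.
income_rate = (
    demo['response'].eq('Yes')
    .groupby(demo['income_group'], observed=False)
    .mean()
)

ax = income_rate.plot(kind='bar', figsize=(8, 6))
ax.set_xlabel('Income')
ax.set_ylabel('Response Rate')
plt.tight_layout()
plt.show()
```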

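Also interesting: the income distribution of customers who responded 'Yes', followed by that of customers who responded 'No'.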

In [None]:
filtered_df = df[df['response'] == 'Yes']

In [None]:
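# distplot is deprecated; histplot with kde=True and stat='density' is the closest replacement
sns.histplot(filtered_df['income'], bins=20, kde=True, stat='density')
plt.show()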

In [None]:
filtered_df = df[df['response'] == 'No']

In [None]:
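# distplot is deprecated; histplot with kde=True and stat='density' is the closest replacement
sns.histplot(filtered_df['income'], bins=10, kde=True, stat='density')
plt.show()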
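The two histograms use different bin counts and separate axes, which makes a direct comparison difficult. One option is to draw both groups on the same axes with shared bin edges and density normalisation, so that the shapes are comparable despite the different group sizes. The sketch below uses plain matplotlib and a made-up frame `demo`; the bin range of 0 to 100,000 is an assumption taken from the income bins used earlier in the notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative stand-in data, NOT values from the real dataset.
demo = pd.DataFrame({
    'income': list(range(0, 100_000, 5_000)),  # 20 evenly spaced incomes
    'response': ['Yes', 'No', 'No', 'No'] * 5,  # 5 'Yes' and 15 'No'
})

# Shared bin edges: 20 bins between 0 and 100,000 (assumed income range).
bin_edges = np.linspace(0, 100_000, 21)

fig, ax = plt.subplots(figsize=(8, 6))
for label, group in demo.groupby('response'):
    # density=True rescales each histogram to area 1, so group size does not dominate
    ax.hist(group['income'], bins=bin_edges, density=True, alpha=0.5, label=label)

ax.set_xlabel('Income')
ax.set_ylabel('Density')
ax.legend(title='Response')
plt.tight_layout()
plt.show()
```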