In [1]:
import altair as alt
import pandas as pd

# Data Overview

Source: [Kaggle - Marketing Campaign Dataset](https://www.kaggle.com/datasets/rodsaldanha/arketing-campaign)

This is an artificial dataset containing data about 5 different marketing campaigns.

## Content

- **AcceptedCmp1**: 1 if customer accepted the offer in the 1st campaign, 0 otherwise
- **AcceptedCmp2**: 1 if customer accepted the offer in the 2nd campaign, 0 otherwise
- **AcceptedCmp3**: 1 if customer accepted the offer in the 3rd campaign, 0 otherwise
- **AcceptedCmp4**: 1 if customer accepted the offer in the 4th campaign, 0 otherwise
- **AcceptedCmp5**: 1 if customer accepted the offer in the 5th campaign, 0 otherwise
- **Response (target)**: 1 if customer accepted the offer in the last campaign, 0 otherwise
- **Complain**: 1 if customer complained in the last 2 years
- **Dt_Customer**: date of customer’s enrolment with the company
- **Education**: customer’s level of education
- **Marital_Status**: customer’s marital status
- **Kidhome**: number of small children in customer’s household
- **Teenhome**: number of teenagers in customer’s household
- **Income**: customer’s yearly household income
- **MntFishProducts**: amount spent on fish products in the last 2 years
- **MntMeatProducts**: amount spent on meat products in the last 2 years
- **MntFruits**: amount spent on fruit products in the last 2 years
- **MntSweetProducts**: amount spent on sweet products in the last 2 years
- **MntWines**: amount spent on wine products in the last 2 years
- **MntGoldProds**: amount spent on gold products in the last 2 years
- **NumDealsPurchases**: number of purchases made with a discount
- **NumCatalogPurchases**: number of purchases made using a catalogue
- **NumStorePurchases**: number of purchases made directly in stores
- **NumWebPurchases**: number of purchases made through the company’s website
- **NumWebVisitsMonth**: number of visits to the company’s website in the last month
- **Recency**: number of days since the last purchase


In [2]:
df = pd.read_csv("data/marketing_campaign.csv", sep=";")


In [3]:
df.columns

Index(['ID', 'Year_Birth', 'Education', 'Marital_Status', 'Income', 'Kidhome',
       'Teenhome', 'Dt_Customer', 'Recency', 'MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', 'NumDealsPurchases', 'NumWebPurchases',
       'NumCatalogPurchases', 'NumStorePurchases', 'NumWebVisitsMonth',
       'AcceptedCmp3', 'AcceptedCmp4', 'AcceptedCmp5', 'AcceptedCmp1',
       'AcceptedCmp2', 'Complain', 'Z_CostContact', 'Z_Revenue', 'Response'],
      dtype='object')

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2240 entries, 0 to 2239
Data columns (total 29 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   ID                   2240 non-null   int64  
 1   Year_Birth           2240 non-null   int64  
 2   Education            2240 non-null   object 
 3   Marital_Status       2240 non-null   object 
 4   Income               2216 non-null   float64
 5   Kidhome              2240 non-null   int64  
 6   Teenhome             2240 non-null   int64  
 7   Dt_Customer          2240 non-null   object 
 8   Recency              2240 non-null   int64  
 9   MntWines             2240 non-null   int64  
 10  MntFruits            2240 non-null   int64  
 11  MntMeatProducts      2240 non-null   int64  
 12  MntFishProducts      2240 non-null   int64  
 13  MntSweetProducts     2240 non-null   int64  
 14  MntGoldProds         2240 non-null   int64  
 15  NumDealsPurchases    2240 non-null   i

In [5]:
df.head()

Unnamed: 0,ID,Year_Birth,Education,Marital_Status,Income,Kidhome,Teenhome,Dt_Customer,Recency,MntWines,...,NumWebVisitsMonth,AcceptedCmp3,AcceptedCmp4,AcceptedCmp5,AcceptedCmp1,AcceptedCmp2,Complain,Z_CostContact,Z_Revenue,Response
0,5524,1957,Graduation,Single,58138.0,0,0,2012-09-04,58,635,...,7,0,0,0,0,0,0,3,11,1
1,2174,1954,Graduation,Single,46344.0,1,1,2014-03-08,38,11,...,5,0,0,0,0,0,0,3,11,0
2,4141,1965,Graduation,Together,71613.0,0,0,2013-08-21,26,426,...,4,0,0,0,0,0,0,3,11,0
3,6182,1984,Graduation,Together,26646.0,1,0,2014-02-10,26,11,...,6,0,0,0,0,0,0,3,11,0
4,5324,1981,PhD,Married,58293.0,1,0,2014-01-19,94,173,...,5,0,0,0,0,0,0,3,11,0


# Discuss Data and Goals
## Week 1: Finding Your Data

**Task**:
Locate a dataset that you are interested in working with. The data should be sufficiently complex that you can ask lots of questions about it and engage in creative design techniques, but not so complex that you need specialized hardware or algorithmic approaches to analyze. While you are welcome to use any data you’d like, I recommend that your datasets are tabular (e.g., CSV, TSV, SQL, etc.), contain 5,000 or fewer datapoints (on the order of one hundred or so tends to be sufficiently interesting without causing lag in Altair), and is data that you’re comfortable discussing as part of the course (e.g., avoid data that is overly private or classified). 

Discuss your dataset, including the data’s source, key attributes/dimensions of the data, and your goals for working with that data (i.e., what are the key questions you want to answer). Identify existing relevant visualizations for working with that data (either using the same data, showing the same concepts, or just that might provide some inspiration) and critique those visualizations based on the practices from this module. What works well? What might need improvement or to change to answer your target questions? 



**Dataset**: As mentioned above, this dataset comes from Kaggle and describes customers who were targeted by a company's marketing campaigns, so it may contain useful insights about how those campaigns performed. Specifically,
we have **29 columns** and **2240 records**.

**Key dimensions**: From a visualization design point of view, the key dimensions in our data are demographic features such as *Age (derivable from Year_Birth), Education, Marital Status, Income* and business-specific features *(Recency, NumWebVisitsMonth, Complain)*.
 The most important variable for the business is Response (whether the customer accepted the offer in the last marketing campaign).
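
Age is not a column in the data, so it has to be derived. The sketch below is one possible way to do that, assuming age is measured at the enrolment date (`Dt_Customer`), since the dataset description does not state a reference year. The column name `Age_At_Enrolment` is a hypothetical name introduced here, and the three rows are copied from the `df.head()` output above.

```python
import pandas as pd

# Three rows copied from the df.head() output; in the notebook, use `df` instead.
sample = pd.DataFrame({
    "Year_Birth": [1957, 1965, 1984],
    "Dt_Customer": ["2012-09-04", "2013-08-21", "2014-02-10"],
})

# Parse the enrolment date (shown as YYYY-MM-DD in the head output).
sample["Dt_Customer"] = pd.to_datetime(sample["Dt_Customer"], format="%Y-%m-%d")

# Assumption: age is measured in the year of enrolment, not "today".
# `Age_At_Enrolment` is a hypothetical column name, not part of the dataset.
sample["Age_At_Enrolment"] = sample["Dt_Customer"].dt.year - sample["Year_Birth"]

print(sample)
```

A fixed reference year would work just as well; the point is to make the choice explicit so that the Age axis in later charts is reproducible.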

**Goals**: 
 
    1) Find out how education level affects income

    2) Discover how education level might affect sensitivity to the marketing campaign

    3) How do consumption preferences (amount spent on meat, wine, fish, fruits, etc.) reflect sensitivity to marketing?

**Relevant visualizations**:

    1) Bar chart of education level as a summary, plus separate histograms of Income for each education level
    
    2) Bar chart or scatterplot (to emphasize that the groups are quite imbalanced)

    3) Violin plots for the positive/negative response groups, plus a correlation plot for the different purchase types

# Visualization prototypes
### Week 2: Sketching Your Data

Your Module 1 discussion post identified some high-level goals for working with a dataset of interest to you. In this post, you will expand on those goals to characterize your target problem and develop some low-fidelity prototypes for working with that data. First, identify two to three tasks you would wish to complete with your data.

Then, sketch a set of preliminary low-fidelity prototypes for addressing these tasks with the given data. You may either sketch freeform or use the Five Design Sheets approach to generate these prototypes (hand-sketched on paper is fine). Upload a copy of your sketches as part of your post. 



#### Define visual tasks
  **Task 1**:
   Find out the relationship between a customer's education level and income
   
   ``Why`` is the task pursued? - to confirm the initial hypothesis that, among our customers, a higher education level corresponds to a higher income.

   ``How`` is the task conducted? - by looking at the average income of people with different education levels and by comparing the income distributions across levels

   ``What`` does the task seek to learn about the data? - whether our customers follow the rule "higher education, higher income" or whether the pattern differs because of the specifics of our business

   ``Where`` does the task operate? - the task operates in (Education level, Income) space

   ``When`` is the task performed? - the task is performed whenever management wants to confirm its initial hypotheses

   ``Who`` is executing the task? - Data analysts and managers who make strategic decisions

   5D tuple representation:
   **(confirmatory, compare,  distributions, attr(Education)| attr(Income), all)**
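
The "How" of Task 1 (compare average income and income distributions across education levels) could be checked numerically before charting. This is a minimal sketch on made-up rows, not on the real data; in the notebook, `df` would replace `toy`.

```python
import pandas as pd

# Made-up rows for illustration only; replace `toy` with `df` in the notebook.
toy = pd.DataFrame({
    "Education": ["Graduation", "Graduation", "PhD", "PhD", "PhD"],
    "Income": [40000.0, 60000.0, 50000.0, 70000.0, None],
})

# Per-level summary of Income; missing incomes are excluded explicitly.
income_by_edu = (
    toy.dropna(subset=["Income"])
    .groupby("Education")["Income"]
    .agg(["count", "mean", "median"])
    .sort_values("mean", ascending=False)
)

print(income_by_edu)
```

Reporting the count next to the mean and median helps avoid over-reading differences that come from very small groups.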


   ****

  **Task 2**:
   Find out how education level affects sensitivity to the marketing campaign
   
   ``Why`` is the task pursued? - to discover which part of the audience is affected most and to act on the result

   ``How`` is the task conducted? - by observing who makes up the positive response group and by comparing the average response across education levels

   ``What`` does the task seek to learn about the data? - the distribution of education level for positive and negative responses

   ``Where`` does the task operate? - the task operates in (Education level, Response) space

   ``When`` is the task performed? - the task is performed when marketing specialists analyze who the target group is

   ``Who`` is executing the task? - Managers and the marketing department

   5D tuple representation:
   **(explore, compare|search,  distributions, attr(Education)| attr(Response), all)**
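
Because `Response` is coded 0/1, its mean within a group is that group's acceptance rate, which is the "average response" Task 2 refers to. A minimal sketch on made-up rows follows; the output column names `customers` and `acceptance_rate` are hypothetical labels chosen here.

```python
import pandas as pd

# Made-up rows for illustration only; replace `toy` with `df` in the notebook.
toy = pd.DataFrame({
    "Education": ["Graduation"] * 4 + ["PhD"] * 2,
    "Response": [1, 0, 0, 0, 1, 0],
})

# Group size and share of customers who accepted the last campaign's offer.
response_by_edu = toy.groupby("Education")["Response"].agg(
    customers="size",
    acceptance_rate="mean",
)

print(response_by_edu)
```

Keeping the group size alongside the rate makes the imbalance between education levels visible, which is the reason the plan above considers a chart that emphasizes group sizes.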

   ****
   

  **Task 3**:
   Discover spending preferences among the groups with a positive/negative response
   
   ``Why`` is the task pursued? - to find out how to improve the marketing campaign by adjusting it to customer preferences

   ``How`` is the task conducted? - by comparing the distributions of money spent on different kinds of products (wine, meat, fish, sweets, etc.)

   ``What`` does the task seek to learn about the data? - the distribution of money spent among the positive/negative response groups

   ``Where`` does the task operate? - the task operates in (product spending, Response) space

   ``When`` is the task performed? - the task is performed when marketing specialists analyze who the target group is

   ``Who`` is executing the task? - Managers and the marketing department

   5D tuple representation:
   **(explore,
      compare|search,
      averages, 
      attrs('MntWines', 'MntFruits',
       'MntMeatProducts', 'MntFishProducts', 'MntSweetProducts',
       'MntGoldProds', "Response"), all)**
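
The averages named in this tuple could be computed as below. This is a sketch on made-up rows, not real results; `spend_cols` lists the six `Mnt*` columns from the dataset, and the names `Category` and `MeanSpent` are hypothetical labels introduced for the long-format table.

```python
import pandas as pd

spend_cols = [
    "MntWines", "MntFruits", "MntMeatProducts",
    "MntFishProducts", "MntSweetProducts", "MntGoldProds",
]

# Made-up rows for illustration only; replace `toy` with `df` in the notebook.
toy = pd.DataFrame({
    "Response":         [1, 1, 0, 0],
    "MntWines":         [600, 400, 100, 50],
    "MntFruits":        [80, 40, 10, 20],
    "MntMeatProducts":  [500, 300, 50, 30],
    "MntFishProducts":  [100, 60, 10, 0],
    "MntSweetProducts": [80, 20, 5, 15],
    "MntGoldProds":     [90, 70, 10, 30],
})

# Mean amount spent per product category, split by response group.
mean_spend = toy.groupby("Response")[spend_cols].mean()

# Long format (one row per group and category) is convenient for Altair encodings.
mean_spend_long = mean_spend.reset_index().melt(
    id_vars="Response", var_name="Category", value_name="MeanSpent"
)

print(mean_spend_long)
```

The long-format table maps directly onto a grouped bar chart, with `Category` on one axis and `Response` as the color channel.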

   ****
   

#### Hand-written sketches

<img src="sketches/task1.jpeg" alt="Image" width="600" height="400">
<img src="sketches/task2.jpeg" alt="Image" width="600" height="400">


<img src="sketches/task3.jpeg" alt="Image" width="600" height="400">
