In [None]:
import pandas as pd
import sqlite3
import numpy as np

# Load the Excel file into a pandas DataFrame
data = pd.read_excel('Markov.xlsx', sheet_name='Sheet1', usecols='B:K', skiprows=29, nrows=80)

# Connect to SQLite3 database and create a table
conn = sqlite3.connect('Markov.db')
data.to_sql('markov_data', conn, if_exists='replace', index=False)

# Define parameters
initial_recency = 1
original_frequency = 1
wacc = 0.03
mean_profit = 50
std_dev_profit = 10
mailing_cost = 5

# Function to simulate customer behavior for one period
def simulate_period(recency, frequency, is_active):
    if not is_active:
        return recency, frequency, is_active, 0, 0
    
    # Retrieve probability from SQLite3 based on recency and frequency
    # (parameterized query instead of string interpolation)
    query = "SELECT I FROM markov_data WHERE C = ? AND D = ?"
    prob = pd.read_sql_query(query, conn, params=(recency, frequency)).iloc[0, 0]
    
    # Determine if the customer orders
    orders = 1 if np.random.rand() < prob else 0
    
    # Book mailing cost and profit (inactive customers returned early above,
    # so the mailing cost always applies here)
    profit = np.random.normal(mean_profit, std_dev_profit) if orders else 0
    total_profit = profit - mailing_cost
    
    # Update recency and frequency
    recency = 1 if orders else recency + 1
    frequency = min(5, frequency + orders)
    is_active = recency < 24
    
    return recency, frequency, is_active, total_profit, orders

# Run the simulation for 80 periods
num_periods = 80
recency = initial_recency
frequency = original_frequency
is_active = True
profits = []

for period in range(num_periods):
    recency, frequency, is_active, total_profit, orders = simulate_period(recency, frequency, is_active)
    profits.append(total_profit)

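# Calculate the net present value of profits
# (np.npv was removed in NumPy 1.20; this keeps its convention of leaving the first period undiscounted)
npv = sum(p / (1 + wacc) ** t for t, p in enumerate(profits))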

print(f"Net Present Value of Customer: ${npv:.2f}")

Explanation:
1. Import necessary libraries: pandas, sqlite3, and numpy
2. Load the Excel file into a pandas DataFrame, specifying the sheet name, columns, and rows to read
3. Connect to an SQLite3 database and create a table from the DataFrame
4. Define simulation parameters
5. Create a function to simulate customer behavior for one period:
   - Retrieve the probability from SQLite3 based on recency and frequency
   - Determine if the customer orders using a random number and the probability
   - Book mailing cost and profit based on the order status
   - Update recency and frequency based on the order status
   - Check if the customer is still active (recency < 24)
6. Run the simulation for 80 periods, updating recency, frequency, and activity status in each period
7. Calculate the net present value of the profits by discounting each period's profit at the WACC
8. Print the net present value of the customer

The Python code loads the necessary data from the Excel file, converts it into an SQLite3 database, and performs the Monte Carlo simulation using the same logic as the Excel steps. It demonstrates how to integrate SQLite3 with Python to efficiently handle larger datasets and perform the required calculations.
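
The cell above follows a single customer path, so the NPV it prints is one random draw. A minimal sketch of averaging over many simulated customers is shown below. It assumes a hypothetical `order_probability` function in place of the `markov_data` lookup, and copies the WACC, profit, and mailing-cost values from the cell above; it is an illustration, not the workbook's actual probabilities.

```python
import random

WACC = 0.03
MEAN_PROFIT, STD_DEV_PROFIT, MAILING_COST = 50, 10, 5

def order_probability(recency, frequency):
    # Hypothetical stand-in for the markov_data lookup: the chance of an
    # order falls with recency and rises with frequency.
    return min(0.9, 0.1 * frequency / recency)

def simulate_customer(rng, num_periods=80):
    """Return the NPV of one simulated customer path."""
    recency, frequency, npv = 1, 1, 0.0
    for period in range(num_periods):
        if recency >= 24:  # customer is no longer mailed
            break
        orders = rng.random() < order_probability(recency, frequency)
        profit = rng.gauss(MEAN_PROFIT, STD_DEV_PROFIT) if orders else 0.0
        # The first period is left undiscounted
        npv += (profit - MAILING_COST) / (1 + WACC) ** period
        recency = 1 if orders else recency + 1
        frequency = min(5, frequency + int(orders))
    return npv

rng = random.Random(0)
npvs = [simulate_customer(rng) for _ in range(2000)]
mean_npv = sum(npvs) / len(npvs)
print(f"Average NPV over {len(npvs)} simulated customers: ${mean_npv:.2f}")
```

Averaging over many paths gives an estimate of the expected customer value, whereas a single path can land far above or below it.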

---

Title: Simulating Groupon Deal Outcomes in Python with SQLite3 and Matplotlib
Content:

In [None]:
import pandas as pd
import sqlite3
import numpy as np
import matplotlib.pyplot as plt

# Load the Excel file into a pandas DataFrame
data = pd.read_excel('Groupon.xlsx', sheet_name='Sheet1', usecols='B:G', skiprows=2, nrows=17)

# Connect to SQLite3 database and create tables
conn = sqlite3.connect('Groupon.db')
data.to_sql('parameters', conn, if_exists='replace', index=False)

# Define simulation parameters and results list
num_iterations = 10000
results = []

# Run the Monte Carlo simulation
for _ in range(num_iterations):
    # Retrieve random values from SQLite3
    query = "SELECT * FROM parameters WHERE ROWID = (ABS(RANDOM()) % (SELECT COUNT(*) FROM parameters) + 1)"
    params = pd.read_sql_query(query, conn)
    
    # Calculate net gain using run_simulation(), which must already be defined
    # in an earlier cell; note that params is retrieved above but not passed to it
    net_gain = run_simulation()
    results.append(net_gain)

# Calculate average profit and probability of success
avg_profit = np.mean(results)
prob_success = np.mean([r > 0 for r in results])

print(f"Average Profit per 100 Deal Takers: ${avg_profit:.2f}")
print(f"Probability of Profitable Deal: {prob_success:.2%}")

# Create a histogram of the results
plt.figure(figsize=(10, 6))
plt.hist(results, bins=20, edgecolor='black', alpha=0.7)
plt.axvline(x=0, color='red', linestyle='--', label='Breakeven')
plt.xlabel('Net Gain')
plt.ylabel('Frequency')
plt.title('Distribution of Simulated Groupon Deal Outcomes')
plt.legend()
plt.tight_layout()
plt.show()

**Explanation**:
- Import necessary libraries, load Excel data into a pandas DataFrame, and create an SQLite3 database table
- Define simulation parameters and create an empty list to store results
- Run the Monte Carlo simulation for the specified number of iterations:
  - Retrieve random parameter values from SQLite3
  - Calculate net gain using the previously defined run_simulation() function
  - Append the net gain to the results list
- Calculate and print the average profit per 100 deal takers and the probability of a profitable deal
- Create a histogram of the simulation results using Matplotlib:
  - Set the figure size and create a histogram with 20 bins
  - Add a vertical line representing the breakeven point
  - Label the axes and add a title
  - Display the legend and adjust the layout for better readability
  - Show the plot

The Python code follows the same logic as the Excel steps but utilizes SQLite3 for parameter storage and Matplotlib for histogram creation. By running the simulation 10,000 times and storing the results in a list, we can easily calculate summary statistics and visualize the distribution of outcomes.
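
Issuing one SQL query per iteration means 10,000 round trips to the database. A possible alternative, sketched below, reads the table once and samples rows in Python. The two-column `parameters` table and its contents are hypothetical and stand in for the real Groupon.xlsx data.

```python
import random
import sqlite3

# In-memory database with a hypothetical two-column parameters table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parameters (name TEXT, value REAL)")
conn.executemany(
    "INSERT INTO parameters VALUES (?, ?)",
    [("low", 0.1), ("base", 0.2), ("high", 0.3)],
)

# Read the table once instead of querying inside the loop
rows = conn.execute("SELECT name, value FROM parameters").fetchall()

rng = random.Random(42)
draws = [rng.choice(rows) for _ in range(10_000)]

# Each row should be drawn roughly one third of the time
counts = {name: sum(1 for d in draws if d[0] == name) for name, _ in rows}
print(counts)
```

Sampling from an in-memory list draws each row with equal probability, which is also what the `ABS(RANDOM()) % COUNT(*) + 1` expression aims for when ROWIDs run from 1 to the row count without gaps.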

---

### Title: Uncovering Customer Insights with RFM Analysis  
Subtitle: Predicting Response Rates and Profitability
Content:
- **What is RFM Analysis?**
  - Customers rated 1-5 on Recency, Frequency, and Monetary value
  - Higher scores indicate greater likelihood to purchase
  - Profitable RFM combinations targeted for future mailings
- **Limitations of Traditional RFM Analysis**
  - *Loss of valuable information due to 1-5 coding*
    - E.g., two "5" customers may have spent vastly different amounts
  - *Exact RFM Analysis: A more complex but profitable approach*
    - Utilizes precise R, F, and M values to guide customer selection


### Title: Calculating R, F, and M for Effective Customer Segmentation
Subtitle: A Step-by-Step Guide to RFM Analysis in Excel
Content:
- **Step 1: Compute key metrics for each customer**
  - Most recent transaction date
  - Number of transactions per year
  - Average amount purchased per year
- **Step 2: Determine customer rankings on R, F, and M**
  - Use RANK function to assign ranks based on metrics
  - Convert ranks to 1-5 ratings using VLOOKUP and a lookup table
- **Example: Analyzing J.Crew's customer data**
  - 5,000 customers, 100,000 sales transactions
  - RFMexample.xlsx used to illustrate the process


#### Explanation:
1. Load the Excel data into a pandas DataFrame and create an SQLite3 database table.
2. Use SQL queries to calculate the most recent transaction date, first transaction date, years with company, monetary value, and frequency for each customer.
3. Merge the resulting DataFrames and calculate average monetary value and frequency per year.
4. Rank customers on Recency, Frequency, and Monetary value using the rank() function.
5. Convert ranks to 1-5 ratings using a custom function.
6. Print the resulting RFM ratings for the first few customers.

This Python script demonstrates how to perform an RFM analysis using SQLite3 for efficient data manipulation and pandas for data analysis. The process follows the same general steps as the Excel approach but leverages the power of SQL queries and Python functions to streamline the calculations.

In [None]:
import pandas as pd
import sqlite3

# Load the Excel file into a pandas DataFrame
data = pd.read_excel('RFMexample.xlsx', sheet_name='Sheet1', usecols='F:J', skiprows=5, nrows=100000)

# Connect to SQLite3 database and create a table
conn = sqlite3.connect('RFMexample.db')
data.to_sql('transactions', conn, if_exists='replace', index=False)

# Calculate most recent transaction date for each customer
query = '''
SELECT Customer, MAX(Date) AS Most_recent 
FROM transactions
GROUP BY Customer
'''
most_recent = pd.read_sql_query(query, conn)

# Calculate first transaction date for each customer
query = '''
SELECT Customer, MIN(Date) AS Start_Date
FROM transactions
GROUP BY Customer
'''
start_date = pd.read_sql_query(query, conn)

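# Calculate years with company for each customer
# SQLite returns dates as text, so convert them before doing date arithmetic
merged = most_recent.merge(start_date, on='Customer')
merged['Most_recent'] = pd.to_datetime(merged['Most_recent'])
merged['Start_Date'] = pd.to_datetime(merged['Start_Date'])
merged['Present'] = pd.to_datetime('2014-01-01')
merged['Years_with_Us'] = (merged['Present'] - merged['Start_Date']).dt.days / 365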

# Calculate monetary value and frequency for each customer
query = '''
SELECT Customer, SUM(Amount) AS Monetary_Value, COUNT(*) AS Total_Transactions
FROM transactions
GROUP BY Customer
'''
monetary_freq = pd.read_sql_query(query, conn)

# Merge DataFrames and calculate average monetary value and frequency per year
rfm_data = merged.merge(monetary_freq, on='Customer')
rfm_data['Monetary_Value'] = rfm_data['Monetary_Value'] / rfm_data['Years_with_Us']
rfm_data['Frequency'] = rfm_data['Total_Transactions'] / rfm_data['Years_with_Us']

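# Rank customers on R, F, and M
# Ascending ranks give the best customers the highest ranks, so they receive
# the highest 1-5 ratings below (higher rating = more likely to purchase)
rfm_data['Rank_R'] = rfm_data['Most_recent'].rank(ascending=True)
rfm_data['Rank_F'] = rfm_data['Frequency'].rank(ascending=True)
rfm_data['Rank_M'] = rfm_data['Monetary_Value'].rank(ascending=True)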

# Convert ranks to 1-5 ratings
def rank_to_rating(rank):
    if rank <= 1000:
        return 1
    elif rank <= 2000:
        return 2
    elif rank <= 3000:
        return 3
    elif rank <= 4000:  
        return 4
    else:
        return 5

rfm_data['R'] = rfm_data['Rank_R'].apply(rank_to_rating)
rfm_data['F'] = rfm_data['Rank_F'].apply(rank_to_rating)
rfm_data['M'] = rfm_data['Rank_M'].apply(rank_to_rating)

print(rfm_data[['Customer', 'R', 'F', 'M']].head())

Title: Identifying Profitable RFM Combinations for Targeted Mailings
Subtitle: Maximizing ROI through Break-Even Analysis 
Content:
- **Break-Even Analysis in RFM Targeting**
  - Mail to segments where Response_Rate > Mailcost/Profit
  - *Example: J.Crew's break-even criteria*
    - Profit per order: $20, Mailing cost: $0.50
    - Break-even response rate: 0.50/20 = 2.5%
    - Mail to RFM combinations with >5% response rate (2x break-even)
- **Determining Profitable RFM Combinations**
  - List all 125 possible RFM combinations (1 1 1 to 5 5 5)
  - Calculate response rates for each combination
  - Highlight combinations meeting profitability criteria
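
The break-even arithmetic above can be checked with a few lines of Python. The function names are illustrative; the dollar figures are the ones from the J.Crew example.

```python
def break_even_response_rate(mailing_cost, profit_per_order):
    """Response rate at which the expected profit from a mailing is zero."""
    return mailing_cost / profit_per_order

def expected_profit_per_mailing(response_rate, mailing_cost, profit_per_order):
    """Expected profit from mailing one customer in a segment."""
    return response_rate * profit_per_order - mailing_cost

break_even = break_even_response_rate(0.50, 20)
threshold = 2 * break_even  # mail only to segments at twice break-even
profit_at_threshold = expected_profit_per_mailing(threshold, 0.50, 20)

print(f"Break-even: {break_even:.1%}, mailing threshold: {threshold:.1%}")
print(f"Expected profit per piece at the threshold: ${profit_at_threshold:.2f}")
```

At exactly the break-even rate the expected profit per piece is zero, which is why the example sets the mailing threshold at twice that rate.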


Title: Ensuring Statistical Significance in RFM Analysis
Subtitle: The Importance of Adequate Sample Sizes
Content:
- **The Challenge of Small Databases**
  - Limited observations in each RFM combination
  - Difficult to accurately estimate response rates
- **Solution: Reduce the Number of Categories**
  - Create terciles (3 categories) for R, F, and M
  - Ensures sufficient observations in each cell
  - Improves reliability of response rate estimates
- **Key Takeaways**
  - Sample size matters for statistical significance
  - Adapt analysis to database size and characteristics
  - Balance granularity and reliability in segmentation
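
To see why fewer categories help, the sketch below compares cell sizes for quintiles (125 cells) and terciles (27 cells). The 500-customer database is hypothetical, and ratings are drawn uniformly at random, which is a simplifying assumption since real R, F, and M ratings tend to be correlated.

```python
import random
from collections import Counter

def cell_counts(num_customers, levels, rng):
    """Assign each customer random R, F, M ratings and count customers per cell."""
    return Counter(
        (rng.randint(1, levels), rng.randint(1, levels), rng.randint(1, levels))
        for _ in range(num_customers)
    )

rng = random.Random(1)
num_customers = 500  # hypothetical small database
for levels in (5, 3):
    counts = cell_counts(num_customers, levels, rng)
    total_cells = levels ** 3
    # A cell missing from the Counter holds zero customers
    smallest = min(counts.values()) if len(counts) == total_cells else 0
    print(f"{levels} levels: {total_cells} cells, "
          f"average {num_customers / total_cells:.1f} customers per cell, "
          f"smallest cell {smallest}")
```

With 125 cells the average cell holds only 4 customers, so a single response moves the estimated rate by 25 percentage points; with 27 cells the average rises to about 18.5.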


Title: Identifying Profitable RFM Combinations in Excel
Content:
1. In the range AC14:AE138, list the 125 possible RFM combinations from 1 1 1 through 5 5 5.
2. In AG14, enter the formula =COUNTIFS(R_,AC14,F,AD14,M,AE14) and copy to AG15:AG138 to count the number of customers in each RFM category.
3. In AH14, enter the formula =COUNTIFS(R_,AC14,F,AD14,M,AE14,actualrresponse,1) and copy to AH15:AH138 to calculate the number of customers in each RFM combination that responded to the last mailing.
4. In AF14, enter the formula =IFERROR(AH14/AG14,0) and copy to AF15:AF138 to compute the response rate for each RFM combination.
5. Use Excel's Conditional Formatting to highlight RFM combinations with a response rate of at least 5%:
   a. Select the range AC14:AF138.
   b. Go to Home > Conditional Formatting > New Rule.
   c. Choose "Use a Formula to determine which cells to format."
   d. Enter the formula =$AF14>=0.05.
   e. Click "Format" and choose a fill color (e.g., orange) from the Fill tab.
6. Analyze the highlighted cells to identify profitable RFM combinations for targeted mailings.


Explanation:
1. Load the Excel data into a pandas DataFrame and create an SQLite3 database table.
2. Generate a list of all possible RFM combinations using a list comprehension.
3. For each RFM combination, query the database to calculate the total number of customers and responders.
4. Compute the response rate for each combination, handling cases where the total number of customers is zero.
5. Create a new DataFrame with the RFM combinations and their corresponding response rates.
6. Filter the DataFrame to identify profitable combinations (response rate >= 5%).
7. Print the profitable RFM combinations.

This Python script demonstrates how to perform the RFM profitability analysis using SQLite3 for data storage and pandas for data manipulation. By leveraging SQL queries and pandas filtering, we can efficiently identify the RFM combinations that meet our profitability criteria.

In [None]:
import pandas as pd
import sqlite3

# Load the Excel file into a pandas DataFrame
data = pd.read_excel('RFMexample.xlsx', sheet_name='Sheet1', usecols='O:AA', skiprows=5, nrows=5001)

# Connect to SQLite3 database and create tables
conn = sqlite3.connect('RFMexample.db')
data.to_sql('rfm_data', conn, if_exists='replace', index=False)

# List all possible RFM combinations
rfm_combinations = [(r, f, m) for r in range(1, 6) for f in range(1, 6) for m in range(1, 6)]

# Calculate response rates for each RFM combination
response_rates = []

for combo in rfm_combinations:
    r, f, m = combo
    query = '''
        SELECT 
            COUNT(*) AS total_customers,
            SUM(N) AS total_responders
        FROM rfm_data
        WHERE R_ = ? AND F = ? AND M = ?
    '''
    result = pd.read_sql_query(query, conn, params=combo)
    total_customers = result['total_customers'].values[0]
    total_responders = result['total_responders'].values[0]
    response_rate = total_responders / total_customers if total_customers > 0 else 0
    response_rates.append(response_rate)

# Create a DataFrame with RFM combinations and response rates
rfm_response_rates = pd.DataFrame({'R': [c[0] for c in rfm_combinations],
                                   'F': [c[1] for c in rfm_combinations],
                                   'M': [c[2] for c in rfm_combinations],
                                   'Response_Rate': response_rates})

# Highlight profitable RFM combinations (response rate >= 5%)
profitable_combinations = rfm_response_rates[rfm_response_rates['Response_Rate'] >= 0.05]

print("Profitable RFM Combinations:")
print(profitable_combinations)
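
The loop above runs 125 separate queries. As a possible alternative, a single `GROUP BY` query can return the same counts and response rates in one pass. The sketch below uses an in-memory table with made-up rows; the column names `R_`, `F`, `M`, and `N` are taken from the cell above and are assumed to match the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rfm_data (R_ INTEGER, F INTEGER, M INTEGER, N INTEGER)")
conn.executemany(
    "INSERT INTO rfm_data VALUES (?, ?, ?, ?)",
    [(5, 5, 5, 1), (5, 5, 5, 0), (5, 5, 5, 0), (5, 5, 5, 0),  # 25% response
     (1, 1, 1, 0), (1, 1, 1, 0),                              # 0% response
     (3, 2, 4, 1), (3, 2, 4, 1)],                             # 100% response
)

# One pass over the table: count customers and responders per combination,
# then keep only combinations at or above the 5% threshold
query = """
    SELECT R_, F, M,
           COUNT(*) AS total_customers,
           SUM(N)   AS total_responders,
           AVG(N)   AS response_rate
    FROM rfm_data
    GROUP BY R_, F, M
    HAVING AVG(N) >= 0.05
    ORDER BY R_, F, M
"""
profitable = conn.execute(query).fetchall()
for row in profitable:
    print(row)
```

One difference from the loop: combinations with no customers do not appear in the grouped result at all, whereas the loop records them with a response rate of 0.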

---

Title: Optimizing Direct Mail Campaigns
Subtitle: Leveraging Evolutionary Solver for Data-Driven Decisions

Content:
- How can we maximize revenue from targeted mailing campaigns?
  - Create a scoring rule based on customer purchase frequency and amount spent
  - Mail to the top 10% of customers based on their scores
- Evolutionary Solver finds optimal weights for the scoring rule
  - Objective: Maximize revenue from selected customers
  - Constraints: Weights between 0.01 and 10


Title: Implementing the Evolutionary Solver in Excel

Content:
1. Enter trial weights in cells C10 and D10
2. In cell F12, enter the formula: =$C$10*C12+$D$10*D12
   - Copy the formula to cells F13:F3045 to generate scores for each customer
3. In cell G8, enter the formula: =PERCENTILE(F12:F3045,0.9)
   - This determines the 90th percentile of the scores
4. In cell G12, enter the formula: =IF(F12>$G$8,1,0)
   - Copy the formula to cells G13:G3045 to identify top 10% scores
5. In cell G10, enter the formula: =SUMPRODUCT(G12:G3045, E12:E3045)
   - This computes the total revenue from the top 10% of scores
6. Set up the Evolutionary Solver:
   - Objective: Maximize cell G10 (revenue from top 10%)
   - Variables: Cells C10 and D10 (weights)
   - Constraints: C10 and D10 between 0.01 and 10
   - Solving method: Evolutionary
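
The optimization set up above can be approximated outside Excel. The sketch below uses a plain random search over the weights as a stand-in; it is not Excel's Evolutionary Solver algorithm, and the customer data is synthetic. The `percentile` helper interpolates linearly between sorted values, which is assumed to match the PERCENTILE formula in step 3.

```python
import random

def percentile(values, q):
    """Linear-interpolation percentile over the sorted values."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

rng = random.Random(7)
# Synthetic customers: (purchase count, amount bought, later revenue)
customers = []
for _ in range(300):
    freq = rng.randint(0, 10)
    amt = rng.uniform(0, 200)
    revenue = max(0.0, 15 * freq + 0.2 * amt + rng.gauss(0, 20))
    customers.append((freq, amt, revenue))

def top_decile_revenue(freq_weight, amt_weight):
    """Revenue from customers whose score exceeds the 90th percentile."""
    scores = [freq_weight * f + amt_weight * a for f, a, _ in customers]
    cutoff = percentile(scores, 0.9)
    return sum(rev for s, (_, _, rev) in zip(scores, customers) if s > cutoff)

# Random search: try weight pairs within the 0.01 to 10 bounds, keep the best
best_weights, best_revenue = (1.0, 1.0), top_decile_revenue(1.0, 1.0)
for _ in range(500):
    w = (rng.uniform(0.01, 10), rng.uniform(0.01, 10))
    rev = top_decile_revenue(*w)
    if rev > best_revenue:
        best_weights, best_revenue = w, rev

print(f"Best weights found: {best_weights[0]:.2f}, {best_weights[1]:.2f}")
print(f"Revenue from top 10%: ${best_revenue:,.2f}")
```

Because multiplying both weights by the same positive constant leaves the ranking of customers unchanged, only the ratio of the two weights affects which customers are selected.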



This Python script demonstrates how to:
1. Load the Excel data into a pandas DataFrame
2. Create a SQLite3 database and write the DataFrame to it
3. Retrieve data from the SQLite3 database using a SQL query
4. Define a scoring function based on the optimal weights found by the Evolutionary Solver
5. Apply the scoring function to each row in the DataFrame
6. Determine the 90th percentile score and flag customers in the top 10%
7. Calculate the total revenue generated by the top 10% of customers

By using Python and SQLite3, we can automate the process of scoring customers and selecting the top prospects for our direct mail campaign. This approach scales well to larger datasets and allows for easy integration with other data sources and marketing tools.

---

In [None]:
import pandas as pd
import sqlite3

# Load the Excel file into a DataFrame
data = pd.read_excel('RFMtop10%.xlsx', sheet_name='Sheet1', usecols='C:F', skiprows=11, nrows=3034)

# Create a SQLite3 database and write the DataFrame to it
conn = sqlite3.connect('rfm_data.db')
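# if_exists='replace' lets the cell be rerun without a "table already exists" error
data.to_sql('rfm_data', conn, if_exists='replace', index=False)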

# Retrieve data from the SQLite3 database
query = '''
    SELECT *
    FROM rfm_data
'''
df = pd.read_sql_query(query, conn)

# Define the scoring function
def score(freq_weight, amt_weight, freq, amt):
    return freq_weight * freq + amt_weight * amt

# Apply the scoring function to every customer (vectorized over the columns)
df['score'] = score(7.92, 0.13, df['Number of purchases Jan-June 2014'], df['Amt bought June 2014 $'])

# Determine the 90th percentile score
percentile_90 = df['score'].quantile(0.9)

# Flag customers with scores in the top 10%
df['top_10_pct'] = (df['score'] > percentile_90).astype(int)

# Calculate total revenue from the top 10% of customers
revenue_top_10_pct = df[df['top_10_pct'] == 1]['Amt spent July-Dec 2014 $'].sum()

print(f"Total revenue from top 10% of customers: ${revenue_top_10_pct:,.2f}")