### Feature engineering 
Capture customer behavior (spending, contract length)

Reflect customer type (demographics, dependents, partner)

Describe service usage (add-ons, internet type)

Add business logic insight (e.g., average spend, service bundles)

In [1]:
## Load the CSV File First
import pandas as pd

# Load your saved feature-engineered file
df = pd.read_csv("customer_clv_feature_engineered.csv")

### Not all customers have the same tenure. This feature tells us how much a customer spends on average per month, no matter how long they’ve stayed.

In [3]:
## Create AvgMonthlySpend
# Treat zero tenure as missing so the division yields NaN instead of inf,
# then fill those rows with 0. Assigning the result back avoids the
# chained `inplace=True` call, which pandas warns about.
df['AvgMonthlySpend'] = (df['TotalCharges'] / df['tenure'].where(df['tenure'] > 0)).fillna(0)
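
To illustrate why the zero-tenure case needs care, here is a minimal sketch on a toy frame (the three rows are made-up values, not taken from the dataset). In pandas, dividing a positive number by zero produces `inf` rather than raising an error, and `0 / 0` produces `NaN`, so `fillna(0)` on its own would leave `inf` behind.

```python
import pandas as pd

# Toy data (hypothetical values): the last two customers have zero tenure.
toy = pd.DataFrame({"TotalCharges": [600.0, 50.0, 0.0],
                    "tenure": [12, 0, 0]})

# Naive division: 50/0 gives inf and 0/0 gives NaN; no exception is raised.
naive = toy["TotalCharges"] / toy["tenure"]

# Safer variant: mask zero tenure to NaN first, then fill the result with 0.
safe_tenure = toy["tenure"].where(toy["tenure"] > 0)
safe = (toy["TotalCharges"] / safe_tenure).fillna(0)

print(naive.tolist())
print(safe.tolist())
```

Whether 0 is the right fill value for brand-new customers is a modelling choice; it is used here only because the notebook fills with 0.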


### tenure is numeric, but grouping it into ranges (e.g. 0–12 months, 13–24 months) shows customer lifecycle stages.

In [4]:
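##  3. Create TenureGroup (binned column)
# include_lowest=True puts tenure == 0 in the first bin; without it,
# pd.cut uses (0, 12] and leaves zero-tenure rows as NaN.
df['TenureGroup'] = pd.cut(df['tenure'],
                           bins=[0, 12, 24, 48, 60, 72],
                           labels=['0-12m', '13-24m', '25-48m', '49-60m', '61-72m'],
                           include_lowest=True)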
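
The bin edges matter at the boundaries. As a small sketch on made-up tenure values, the comparison below shows how `pd.cut` treats the lowest edge with and without `include_lowest=True`; by default the intervals are open on the left, so a tenure of exactly 0 falls outside the first bin.

```python
import pandas as pd

# Hypothetical tenure values chosen to sit on or near the bin edges.
tenure = pd.Series([0, 1, 12, 13, 72])
bins = [0, 12, 24, 48, 60, 72]
labels = ['0-12m', '13-24m', '25-48m', '49-60m', '61-72m']

# Default: intervals are (0, 12], (12, 24], ... so 0 is not covered.
default_cut = pd.cut(tenure, bins=bins, labels=labels)

# include_lowest=True makes the first interval include its left edge.
inclusive_cut = pd.cut(tenure, bins=bins, labels=labels, include_lowest=True)

print(default_cut.isna().tolist())
print(inclusive_cut.astype(str).tolist())
```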

### Machine learning models don’t understand text. They need numbers.

In [5]:
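##  4. Encode Add-on Service Columns
addon_cols = ['OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
              'TechSupport', 'StreamingTV', 'StreamingMovies']

# 'Yes' becomes 1; every other value becomes 0. A plain
# .map({'Yes': 1, 'No': 0}) would turn any label outside the dict into NaN.
for col in addon_cols:
    df[col] = df[col].eq('Yes').astype(int)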
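
A short sketch of how the two encoding styles behave when a column holds a label other than `Yes` or `No`. The third label below is an assumed example; check `df[col].unique()` to see which labels the data actually contains.

```python
import pandas as pd

# Toy column; 'No internet service' is an assumed extra label for illustration.
s = pd.Series(['Yes', 'No', 'No internet service'])

# Dictionary mapping: labels missing from the dict become NaN.
mapped = s.map({'Yes': 1, 'No': 0})

# Equality test: anything that is not 'Yes' becomes 0.
flag = s.eq('Yes').astype(int)

print(mapped.isna().tolist())
print(flag.tolist())
```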

### The more services a customer uses, the more invested they tend to be, which is usually associated with higher CLV.
### Instead of analyzing six separate features, we count how many optional services each customer uses.

In [6]:
##  5. Create TotalAddons
df['TotalAddons'] = df[addon_cols].sum(axis=1)
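
As a quick illustration of the row-wise count, here is a sketch with three hypothetical customers and only three of the six add-on columns, already encoded as 0/1.

```python
import pandas as pd

# Toy 0/1 add-on flags for three hypothetical customers.
toy = pd.DataFrame({
    'OnlineSecurity': [1, 0, 1],
    'TechSupport':    [1, 0, 0],
    'StreamingTV':    [1, 0, 1],
})
cols = ['OnlineSecurity', 'TechSupport', 'StreamingTV']

# axis=1 sums across columns, giving one total per row (customer).
toy['TotalAddons'] = toy[cols].sum(axis=1)

print(toy['TotalAddons'].tolist())
```

Note that `sum` skips `NaN` by default, so a column with unencoded values would silently count as 0 rather than raise an error.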

In [7]:
##  6. One-Hot Encode TenureGroup
df = pd.get_dummies(df, columns=['TenureGroup'], drop_first=True)
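
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy frame with three of the tenure groups. The first category is dropped and becomes the implicit baseline: a row in that group has every remaining indicator set to false.

```python
import pandas as pd

# Toy categorical column with an explicit category order.
groups = ['0-12m', '13-24m', '61-72m']
toy = pd.DataFrame({'TenureGroup': pd.Categorical(groups, categories=groups)})

# drop_first=True removes the indicator for the first category ('0-12m').
encoded = pd.get_dummies(toy, columns=['TenureGroup'], drop_first=True)

print(list(encoded.columns))
print(encoded.astype(int).values.tolist())
```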

In [8]:
### Create Binary Flags
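# Create binary flags from the raw categorical columns. The one-hot columns
# (e.g. 'InternetService_Fiber optic') do not exist yet at this point, because
# InternetService and PaymentMethod are only encoded in the next cell.
df['HasFiberOptic'] = (df['InternetService'] == 'Fiber optic').astype(int)
df['IsAutoPay'] = (df['PaymentMethod'] == 'Credit card (automatic)').astype(int)  # credit-card autopay only
df['IsSenior'] = df['SeniorCitizen']  # Already 0/1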
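
A flag like `HasFiberOptic` can be built either from the raw text column or from its one-hot column, but the one-hot column only exists after `get_dummies` has run. The sketch below, on made-up rows, shows that both routes give the same result.

```python
import pandas as pd

# Toy rows covering three internet service types (illustrative values).
toy = pd.DataFrame({'InternetService': ['DSL', 'Fiber optic', 'No']})

# Route 1: compare the raw column directly.
from_raw = (toy['InternetService'] == 'Fiber optic').astype(int)

# Route 2: one-hot encode first, then read the indicator column.
encoded = pd.get_dummies(toy, columns=['InternetService'], drop_first=True)
from_dummy = encoded['InternetService_Fiber optic'].astype(int)

print(from_raw.tolist())
print(from_dummy.tolist())
```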

In [10]:
# One-hot encode remaining categorical columns
categorical_cols = ['MultipleLines', 'Contract', 'PaymentMethod', 'InternetService']
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)

# Save final version
df.to_csv("customer_clv_model_ready.csv", index=False)
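
Before treating the saved file as model-ready, it may be worth checking that no text columns remain. The sketch below uses a hypothetical helper, `non_numeric_columns`, on a toy frame; the column names are illustrative and the same call could be applied to `df`.

```python
import pandas as pd

def non_numeric_columns(frame):
    """Return names of columns that are neither numeric nor boolean."""
    return frame.select_dtypes(exclude=['number', 'bool']).columns.tolist()

# Toy frame: 'gender' is still text, the other columns are already encoded.
toy = pd.DataFrame({
    'tenure': [1, 24],
    'gender': ['Male', 'Female'],
    'Contract_One year': [True, False],
})

leftover = non_numeric_columns(toy)
print(leftover)
```

Any column reported here would still need encoding (or dropping) before most scikit-learn style models could use it.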