For predicting customer churn from the Telco dataset, three widely used machine learning models that typically perform well for binary classification tasks are:

1. **Logistic Regression:**
   - **Why Choose It:** Logistic Regression is a straightforward model that works well for binary classification problems. It's interpretable, fast to train, and provides probabilities for outcomes, which can be useful for setting thresholds for actions.
   - **Suitability:** It performs well when the relationship between the independent variables and the log-odds of the dependent variable is linear, and it's a good baseline model to start with in binary classification.

2. **Random Forest Classifier:**
   - **Why Choose It:** Random Forest is a robust ensemble technique that averages the predictions of many decision trees, which reduces the risk of overfitting compared with a single tree. It works with numerical features and with categorical features once they are encoded (scikit-learn's implementation expects numeric input), is robust to outliers, and can model non-linear relationships.
   - **Suitability:** It's effective in high-dimensional spaces as well as on large datasets, and it provides feature importance scores, which help in understanding the most significant predictors of churn.

3. **Gradient Boosting Classifier (e.g., XGBoost, LightGBM):**
   - **Why Choose It:** Gradient Boosting Machines (GBMs) are powerful ensemble learning techniques that build trees one at a time, where each new tree helps to correct errors made by previously trained trees. They have been known to deliver high accuracy in many binary classification problems.
   - **Suitability:** GBMs can handle missing data and, like Random Forest, provide feature importance. They are highly flexible and can be optimized to achieve better performance through hyperparameter tuning.

### Considerations for Model Choice:
- **Data Size and Quality:** Logistic Regression requires fewer computational resources, making it suitable for smaller or less complex datasets, while Random Forest and GBMs tend to benefit more from larger datasets.
- **Interpretability:** Logistic Regression offers the best interpretability. If understanding the influence of each predictor is important, this might be preferable.
- **Accuracy vs Speed:** GBMs often provide the best accuracy but at the cost of increased computational time and complexity. Random Forest strikes a balance between accuracy and training speed.

Next steps would involve preparing the data for these models, selecting features based on importance and relevance to churn, and then tuning each model to compare their performance accurately.
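
As a rough illustration of comparing the three model families on equal footing, here is a minimal sketch using 5-fold cross-validation. It runs on synthetic data from `make_classification` (an assumption, not the Telco table), and it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost/LightGBM so the example needs no extra packages.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn table (assumption: roughly 27% positives).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.73, 0.27], random_state=42)

models = {
    # Scaling lives inside the pipeline so it is re-fitted on each CV fold.
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

cv_scores = {}
for name, model in models.items():
    # F1 on the positive class is more informative than accuracy when classes are imbalanced.
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    cv_scores[name] = scores.mean()
    print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Putting the scaler inside the pipeline is a design choice: it keeps each fold's preprocessing statistics separate from its validation rows.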

Neural networks are another powerful option for predicting customer churn, especially when dealing with large datasets and complex relationships among features. Here’s why you might consider using a neural network for churn prediction:

### Advantages of Using Neural Networks:
1. **Complex Pattern Recognition:**
   - Neural networks excel at identifying complex patterns and interactions between features that may not be easily captured by traditional machine learning models.
  
2. **Handling High-Dimensional Data:**
   - They can manage large volumes of data with many features, making them suitable for datasets that include various customer behaviors and attributes.
  
3. **Flexibility and Customization:**
   - The architecture of a neural network can be customized and tuned, including the number of hidden layers, the number of neurons in each layer, activation functions, etc., to optimize performance.

4. **Scalability:**
   - They scale well with additional data, often improving in accuracy as more training data becomes available.

### Considerations When Using Neural Networks for Churn Prediction:
1. **Data Requirements:**
   - Neural networks generally require larger datasets to perform well without overfitting compared to simpler models. They may also require more preprocessing, such as normalization of all input features.

2. **Complexity and Training Time:**
   - They are typically more computationally intensive and take longer to train than most traditional machine learning models, especially as you scale up the number of layers and units.

3. **Interpretability:**
   - Unlike models like logistic regression or decision trees, neural networks are often considered "black boxes" because it can be difficult to interpret exactly how the model is making its decisions.

4. **Overfitting Risk:**
   - There is a higher risk of overfitting with neural networks, particularly if the network is too complex or if not enough training data is available. Techniques such as dropout, regularization, and proper validation can help mitigate this risk.

### Typical Neural Network Setup for Churn Prediction:
- **Input Layer:** Should have as many neurons as the number of features in the dataset.
- **Hidden Layers:** Depending on the complexity, one or more hidden layers can be added. For churn prediction, starting with one or two layers is common.
- **Activation Functions:** ReLU (Rectified Linear Unit) is a common choice for hidden layers because it is cheap to compute, while the output layer typically uses a sigmoid function for binary classification so that it outputs probabilities.
- **Output Layer:** For binary classification (churn or not), the output layer should have a single neuron.
- **Loss Function:** Binary cross-entropy is typically used for binary classification tasks.
- **Optimizer:** Adam or SGD (Stochastic Gradient Descent) are commonly used to minimize the loss function.

Incorporating a neural network into your churn prediction pipeline could potentially increase your model's performance, particularly if tuned correctly and provided with enough data. However, consider the trade-offs between performance, training time, and interpretability.
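
The setup described above can be sketched without a deep learning framework. The example below is an assumption-laden stand-in: it uses scikit-learn's `MLPClassifier` on synthetic data rather than Keras on the Telco table. `MLPClassifier` has no dropout, so an L2 penalty (`alpha`) and built-in early stopping play the regularization role here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption: stands in for churn features).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

mlp = MLPClassifier(
    hidden_layer_sizes=(64, 32),  # two hidden layers
    activation="relu",            # hidden-layer activation
    solver="adam",                # optimizer
    alpha=1e-3,                   # L2 penalty (not dropout)
    early_stopping=True,          # stop when the validation score stalls
    validation_fraction=0.2,
    n_iter_no_change=10,
    max_iter=200,
    random_state=42,
)

# Neural networks are sensitive to feature scale, so standardize first.
net = make_pipeline(StandardScaler(), mlp)
net.fit(X_train, y_train)

test_accuracy = net.score(X_test, y_test)
print("Output activation:", mlp.out_activation_)  # logistic (sigmoid) for binary targets
print("Epochs run:", mlp.n_iter_)
print("Test accuracy:", test_accuracy)
```

For binary targets `MLPClassifier` minimizes log-loss, which corresponds to the binary cross-entropy mentioned above.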

# 1 Data prep (a recap in case you missed Part 1)


In [None]:
# Do not execute; kept for reference only

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('D03Mainfully_processed_telco_data.csv')
# Drop non-numeric columns explicitly if any, e.g., 'customerID' if it's not encoded
if 'customerID' in df.columns:
    df = df.drop(columns=['customerID'])

# Assuming 'Churn_Yes' is the target and already encoded as numeric
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

# Identify numeric columns only for scaling
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a StandardScaler object
scaler = StandardScaler()

# Apply scaling only to numeric columns
X_train_scaled = X_train.copy()
X_test_scaled = X_test.copy()

X_train_scaled[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test_scaled[numeric_cols] = scaler.transform(X_test[numeric_cols])

# 2 Logistic Regression Model

In [4]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

# Identify numeric columns only for scaling
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

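# Scale the numeric columns.
# Note: only these int64/float64 columns are passed on to the model;
# the bool one-hot columns are not part of X_train_scaled / X_test_scaled.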
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train[numeric_cols])
X_test_scaled = scaler.transform(X_test[numeric_cols])

# Impute any missing values
imputer = SimpleImputer(strategy='median')
X_train_imputed = imputer.fit_transform(X_train_scaled)
X_test_imputed = imputer.transform(X_test_scaled)

# Create and train Logistic Regression model
log_reg = LogisticRegression(random_state=42)
log_reg.fit(X_train_imputed, y_train)

# Predict and evaluate the model
log_reg_predictions = log_reg.predict(X_test_imputed)
log_reg_accuracy = accuracy_score(y_test, log_reg_predictions)

print("Logistic Regression Accuracy:", log_reg_accuracy)
print(classification_report(y_test, log_reg_predictions))


Logistic Regression Accuracy: 0.8069552874378992
              precision    recall  f1-score   support

       False       0.83      0.93      0.88      1036
        True       0.70      0.48      0.57       373

    accuracy                           0.81      1409
   macro avg       0.76      0.70      0.72      1409
weighted avg       0.80      0.81      0.79      1409
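
The report above shows a recall of 0.48 for the churn class (`True`) at the default 0.5 cut-off. Since logistic regression outputs probabilities, one option is to lower the decision threshold and trade precision for recall. A minimal sketch on synthetic data (an assumption, not the Telco file); in practice the threshold would ideally be chosen on a validation set rather than the test set.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for churn data (assumption: ~27% positives).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.73, 0.27], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
churn_proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

recalls = {}
for threshold in (0.5, 0.4, 0.3):
    preds = (churn_proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y_test, preds)
    precision = precision_score(y_test, preds, zero_division=0)
    print(f"threshold={threshold}: precision={precision:.2f}, "
          f"recall={recalls[threshold]:.2f}")
```

Another commonly used option is `LogisticRegression(class_weight='balanced')`, which reweights the classes during training instead of moving the cut-off afterwards.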



# 3 Random Forest

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

# Identify numeric columns for scaling
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the numeric columns
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train[numeric_cols])
X_test_scaled = scaler.transform(X_test[numeric_cols])

# Impute any missing values
imputer = SimpleImputer(strategy='median')
X_train_imputed = imputer.fit_transform(X_train_scaled)
X_test_imputed = imputer.transform(X_test_scaled)

# Create and train the Random Forest model
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_imputed, y_train)

# Predict on the test data
rf_predictions = rf_classifier.predict(X_test_imputed)

# Evaluate the model
rf_accuracy = accuracy_score(y_test, rf_predictions)
print("Random Forest Accuracy:", rf_accuracy)
print(classification_report(y_test, rf_predictions))


Random Forest Accuracy: 0.7665010645848119
              precision    recall  f1-score   support

       False       0.82      0.88      0.85      1036
        True       0.57      0.46      0.51       373

    accuracy                           0.77      1409
   macro avg       0.70      0.67      0.68      1409
weighted avg       0.75      0.77      0.76      1409



In [None]:
# Same model with n_estimators=500

In [6]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

# Identify numeric columns for scaling
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the numeric columns
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train[numeric_cols])
X_test_scaled = scaler.transform(X_test[numeric_cols])

# Impute any missing values
imputer = SimpleImputer(strategy='median')
X_train_imputed = imputer.fit_transform(X_train_scaled)
X_test_imputed = imputer.transform(X_test_scaled)

# Create and train the Random Forest model
rf_classifier = RandomForestClassifier(n_estimators=500, random_state=42)
rf_classifier.fit(X_train_imputed, y_train)

# Predict on the test data
rf_predictions = rf_classifier.predict(X_test_imputed)

# Evaluate the model
rf_accuracy = accuracy_score(y_test, rf_predictions)
print("Random Forest Accuracy:", rf_accuracy)
print(classification_report(y_test, rf_predictions))


Random Forest Accuracy: 0.7693399574166075
              precision    recall  f1-score   support

       False       0.82      0.87      0.85      1036
        True       0.58      0.48      0.52       373

    accuracy                           0.77      1409
   macro avg       0.70      0.68      0.69      1409
weighted avg       0.76      0.77      0.76      1409
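
Random Forest exposes `feature_importances_` after fitting, which is one way to see which predictors the trees rely on. A minimal sketch on synthetic data: the column labels are borrowed from the Telco table purely for readability, so the printed ranking says nothing about the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Labels are illustrative only; the data below is synthetic.
feature_names = ["tenure", "MonthlyCharges", "TotalCharges", "SeniorCitizen"]
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=1, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor high-cardinality or continuous features, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.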



# 4 XGBoost

Using XGBoost for churn prediction is a good choice because it handles a variety of feature types, manages missing values natively, and is effective on binary classification tasks. Here’s how you can set up an XGBoost model, from data preparation to training and evaluation:

### XGBoost Model Setup

XGBoost (eXtreme Gradient Boosting) is an implementation of gradient boosted decision trees designed for speed and performance, and it is widely used in machine learning competitions.

In [4]:
!pip install xgboost



In [7]:
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

# Identify numeric columns for scaling
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the numeric columns
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train[numeric_cols])
X_test_scaled = scaler.transform(X_test[numeric_cols])

# Impute any missing values
imputer = SimpleImputer(strategy='median')
X_train_imputed = imputer.fit_transform(X_train_scaled)
X_test_imputed = imputer.transform(X_test_scaled)

# Convert data to DMatrix object, which is optimized for XGBoost
dtrain = xgb.DMatrix(X_train_imputed, label=y_train)
dtest = xgb.DMatrix(X_test_imputed, label=y_test)

# Define XGBoost model parameters
params = {
    'max_depth': 3,  # the maximum depth of each tree
    'eta': 0.1,      # the training step for each iteration
    'objective': 'binary:logistic',  # binary classification
    'eval_metric': 'logloss',  # evaluation metric
    'seed': 42       # for reproducible results
}

# Train the XGBoost model
bst = xgb.train(params, dtrain, num_boost_round=100)

# Predict on the test set
y_pred_proba = bst.predict(dtest)
y_pred = (y_pred_proba > 0.5).astype(int)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("XGBoost Accuracy:", accuracy)
print(classification_report(y_test, y_pred))


XGBoost Accuracy: 0.8026969481902059
              precision    recall  f1-score   support

       False       0.83      0.92      0.87      1036
        True       0.69      0.47      0.56       373

    accuracy                           0.80      1409
   macro avg       0.76      0.70      0.72      1409
weighted avg       0.79      0.80      0.79      1409



In [None]:
# Increasing max_depth to 5

In [8]:
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes']

# Identify numeric columns for scaling
numeric_cols = X.select_dtypes(include=['float64', 'int64']).columns

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the numeric columns
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train[numeric_cols])
X_test_scaled = scaler.transform(X_test[numeric_cols])

# Impute any missing values
imputer = SimpleImputer(strategy='median')
X_train_imputed = imputer.fit_transform(X_train_scaled)
X_test_imputed = imputer.transform(X_test_scaled)

# Convert data to DMatrix object, which is optimized for XGBoost
dtrain = xgb.DMatrix(X_train_imputed, label=y_train)
dtest = xgb.DMatrix(X_test_imputed, label=y_test)

# Define XGBoost model parameters
params = {
    'max_depth': 5,  # the maximum depth of each tree
    'eta': 0.1,      # the training step for each iteration
    'objective': 'binary:logistic',  # binary classification
    'eval_metric': 'logloss',  # evaluation metric
    'seed': 42       # for reproducible results
}

# Train the XGBoost model
bst = xgb.train(params, dtrain, num_boost_round=100)

# Predict on the test set
y_pred_proba = bst.predict(dtest)
y_pred = (y_pred_proba > 0.5).astype(int)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print("XGBoost Accuracy:", accuracy)
print(classification_report(y_test, y_pred))


XGBoost Accuracy: 0.8041163946061036
              precision    recall  f1-score   support

       False       0.83      0.92      0.87      1036
        True       0.69      0.48      0.56       373

    accuracy                           0.80      1409
   macro avg       0.76      0.70      0.72      1409
weighted avg       0.79      0.80      0.79      1409



### Explanation:
- **Data Handling:** XGBoost can handle missing values internally, so technically you don't need to impute them (but you can if consistency with the other models is necessary).
- **DMatrix:** XGBoost uses a `DMatrix`, an internal data structure optimized for both memory efficiency and training speed.
- **Model Parameters:**
  - `max_depth` controls the depth of the trees. Deeper trees can model more complex patterns but can lead to overfitting.
  - `eta` is the learning rate. Smaller values make the boosting process more conservative.
  - `objective` specifies the learning task and the corresponding learning objective. For binary classification, it is set to `binary:logistic`.
  - `eval_metric` is the metric used to evaluate training performance; `logloss` is typical for classification.
- **Training and Prediction:** Training is done using `xgb.train`, and predicted probabilities greater than 0.5 are assigned to class 1 (churn).
- **Evaluation:** The model's accuracy and other classification metrics are calculated and printed.

### Additional Considerations:
- **Hyperparameter Tuning:** XGBoost performance can improve significantly by tuning hyperparameters such as `max_depth`, `min_child_weight`, `subsample`, and `colsample_bytree`. Consider using `GridSearchCV` or `RandomizedSearchCV` for this, together with the scikit-learn wrapper `xgb.XGBClassifier` (these tools expect a scikit-learn estimator rather than `xgb.train`).
- **Cross-validation:** XGBoost supports k-fold cross-validation via the `xgb.cv` method, which can be useful for more robust model evaluation.

This setup provides a strong starting point for using XGBoost in your churn prediction task, with flexibility to adapt the model as needed based on your specific dataset characteristics and business objectives.
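
To make the tuning advice concrete, here is a minimal sketch of a randomized hyperparameter search with cross-validation. It is written against scikit-learn's `GradientBoostingClassifier` on synthetic data so that it runs without extra packages; this is a stand-in, not XGBoost itself. The scikit-learn wrapper `xgb.XGBClassifier` should slot into the same pattern, where the learning rate is exposed as `learning_rate`.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data (assumption: stands in for the processed churn features).
X, y = make_classification(n_samples=600, n_features=10, random_state=42)

# Search ranges are illustrative assumptions, not recommended values.
param_distributions = {
    "max_depth": randint(2, 6),            # integers 2..5
    "learning_rate": uniform(0.01, 0.2),   # floats in [0.01, 0.21]
    "subsample": uniform(0.6, 0.4),        # floats in [0.6, 1.0]
    "n_estimators": randint(50, 151),      # integers 50..150
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=42),
    param_distributions,
    n_iter=5,            # number of sampled configurations
    cv=3,                # 3-fold cross-validation per configuration
    scoring="roc_auc",
    random_state=42,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV ROC AUC:", round(search.best_score_, 3))
```

A small `n_iter` keeps the sketch fast; a real search would usually sample more configurations.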

# If there were more rows, XGBoost might outperform Logistic Regression

# 5 Neural networks

In [8]:
!pip install tensorflow


In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Check for any non-numeric columns that need to be encoded or removed
print(df.dtypes)  # This will help in identifying non-numeric columns

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes'].values

# Automatically identify numeric and categorical columns
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns

# Define a pipeline to transform data
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Applying transformations
X_processed = preprocessor.fit_transform(X)

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)

# Neural Network Model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.5),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print("Test Accuracy:", test_acc)


customerID                                object
SeniorCitizen                              int64
tenure                                   float64
MonthlyCharges                           float64
TotalCharges                             float64
gender_Male                                 bool
Partner_Yes                                 bool
Dependents_Yes                              bool
PhoneService_Yes                            bool
MultipleLines_No phone service              bool
MultipleLines_Yes                           bool
InternetService_Fiber optic                 bool
InternetService_No                          bool
OnlineSecurity_No internet service          bool
OnlineSecurity_Yes                          bool
OnlineBackup_No internet service            bool
OnlineBackup_Yes                            bool
DeviceProtection_No internet service        bool
DeviceProtection_Yes                        bool
TechSupport_No internet service             bool
TechSupport_Yes     

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/50
141/141 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.6950 - loss: 0.6159 - val_accuracy: 0.7799 - val_loss: 0.4522
Epoch 2/50
141/141 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7800 - loss: 0.4667 - val_accuracy: 0.7728 - val_loss: 0.4405
Epoch 3/50
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 5ms/step - accuracy: 0.8017 - loss: 0.4535 - val_accuracy: 0.7915 - val_loss: 0.4370
Epoch 4/50
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8250 - loss: 0.3948 - val_accuracy: 0.7906 - val_loss: 0.4410
Epoch 5/50
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.8900 - loss: 0.3017 - val_accuracy: 0.7773 - val_loss: 0.4504
Epoch 6/50
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.9629 - loss: 0.1457 - val_accuracy: 0.7578 - val_loss: 0.4684
Epoch 7/50
141/141

# Explanation of the code


The script above prepares the data and trains a neural network to predict customer churn. Here’s a detailed breakdown of each part of the script:

### Data Loading and Initial Processing

```python
# Load data
df = pd.read_csv('/path/to/fully_processed_telco_data.csv')

# Check for any non-numeric columns that need to be encoded or removed
print(df.dtypes)  # This will help in identifying non-numeric columns
```

- **Data Loading:** The data is loaded from a CSV file into a pandas DataFrame.
- **Data Inspection:** The `dtypes` method is used to print the data types of each column in the DataFrame. This helps identify columns that are not numeric and need further preprocessing before they can be used in the neural network.

### Feature Selection and Preprocessing

```python
# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes'].values
```

- **Feature and Target Separation:** `X` contains the features (independent variables), and `y` contains the target variable (dependent variable, 'Churn_Yes').

```python
# Automatically identify numeric and categorical columns
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns
```

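- **Identify Column Types:** Automatically identifies which columns are numeric and which are categorical, because the two kinds need different preprocessing. Note that `bool` columns match neither selector: in the `dtypes` output printed by the notebook the one-hot features are `bool`, so they are routed to neither pipeline (and `ColumnTransformer` drops unlisted columns by default), while `customerID` is `object` and is therefore treated as a categorical feature.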

### Defining the Transformation Pipelines

```python
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
```

- **Numeric Transformer:** A pipeline for numeric features which includes:
  - **Imputation:** Fills missing values with the median of each column.
  - **Scaling:** Standardizes features by removing the mean and scaling to unit variance.
  
- **Categorical Transformer:** A pipeline for categorical features which includes:
  - **Imputation:** Fills missing values with a constant ('missing').
  - **OneHot Encoding:** Converts each category into its own binary indicator column so the model receives numeric input; `handle_unknown='ignore'` encodes categories not seen during fitting as all zeros instead of raising an error.

```python
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])
```

- **Column Transformer:** Combines the two pipelines into a single transformer that applies the appropriate transformations to each column type.

### Applying Transformations and Splitting Data

```python
# Applying transformations
X_processed = preprocessor.fit_transform(X)
```

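- **Apply Preprocessor:** The `fit_transform` method fits the transformations to the data and then applies them, which prepares the data for modeling. Because this happens before the train/test split, the imputation medians and scaling statistics are computed on all rows, including those later used for testing; fitting on the training split only avoids that leakage.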

```python
# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)
```

- **Data Splitting:** Divides the data into training and testing sets to ensure the model can be trained and then independently evaluated.

### Building and Training the Neural Network

```python
# Neural Network Model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.5),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
```

- **Neural Network Configuration:** Constructs a sequential neural network with:
  - **Dense Layers:** Fully connected layers with `relu` activation for hidden layers and `sigmoid` for the output layer (suitable for binary classification).
  - **Dropout:** Used to reduce overfitting by randomly setting a fraction of a layer's inputs (here 50%, from `Dropout(0.5)`) to 0 at each training step.
- **Compilation:** Configures the model for training with the Adam optimizer and binary crossentropy as the loss function.
- **Training:** Fits the model on the training data for 50 epochs; `validation_split=0.2` holds out the last 20% of the training samples for validation, and the model is not trained on them.

### Model Evaluation

```python
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print("Test Accuracy:", test_acc)
```

- **Evaluation:** Measures the model's performance on the unseen test data and prints the accuracy.

This setup integrates data preprocessing with model configuration and evaluation in one script, so the neural network is fed consistently preprocessed data for churn prediction.
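
Two details of the script are worth double-checking against the `dtypes` output printed by the notebook: `customerID` is an `object` column, so it gets one-hot encoded as a feature, and the `bool` one-hot columns match neither `select_dtypes` call, so `ColumnTransformer` drops them by default. The preprocessor is also fitted before the split. Below is a minimal sketch of one possible adjustment, run on a tiny hypothetical DataFrame (the rows are invented for illustration and only mimic the printed dtypes).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical rows that only mimic the column dtypes of the processed file.
df = pd.DataFrame({
    "customerID": [f"id-{i}" for i in range(10)],
    "tenure": [1.0, 34.0, 2.0, 45.0, 2.0, 8.0, 22.0, 10.0, 28.0, 62.0],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70,
                       99.65, 89.10, 29.75, 104.80, 56.15],
    "Partner_Yes": [True, False, False, False, False,
                    False, False, False, True, False],
    "Churn_Yes": [False, False, True, False, True,
                  True, False, False, True, False],
})

# Drop the identifier so it is never encoded as a feature.
X = df.drop(columns=["customerID", "Churn_Yes"])
y = df["Churn_Yes"].astype(int)

# Pick the columns to scale first, then cast bool indicators to 0/1 integers.
numeric_features = list(X.select_dtypes(include=["int64", "float64"]).columns)
bool_features = list(X.select_dtypes(include=["bool"]).columns)
X = X.astype({col: "int64" for col in bool_features})

# Split BEFORE fitting the preprocessor so test rows do not influence the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), numeric_features)],
    remainder="passthrough",  # keep the 0/1 indicator columns instead of dropping them
)
X_train_processed = preprocessor.fit_transform(X_train)
X_test_processed = preprocessor.transform(X_test)

print(X_train_processed.shape, X_test_processed.shape)
```

With `remainder="passthrough"` the indicator columns are appended unscaled after the scaled numeric columns, which is usually fine for 0/1 features.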

# Testing the neural network with 500 epochs

In [13]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Load data
df = pd.read_csv('D02fully_processed_telco_data.csv')

# Check for any non-numeric columns that need to be encoded or removed
print(df.dtypes)  # This will help in identifying non-numeric columns

# Assuming 'Churn_Yes' is the target
X = df.drop('Churn_Yes', axis=1)
y = df['Churn_Yes'].values

# Automatically identify numeric and categorical columns
numeric_features = X.select_dtypes(include=['int64', 'float64']).columns
categorical_features = X.select_dtypes(include=['object']).columns

# Define a pipeline to transform data
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Applying transformations
X_processed = preprocessor.fit_transform(X)

# Splitting data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.2, random_state=42)

# Neural Network Model
model = Sequential([
    Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    Dropout(0.5),
    Dense(32, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(X_train, y_train, epochs=500, batch_size=32, validation_split=0.2)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(X_test, y_test)
print("Test Accuracy:", test_acc)


customerID                                object
SeniorCitizen                              int64
tenure                                   float64
MonthlyCharges                           float64
TotalCharges                             float64
gender_Male                                 bool
Partner_Yes                                 bool
Dependents_Yes                              bool
PhoneService_Yes                            bool
MultipleLines_No phone service              bool
MultipleLines_Yes                           bool
InternetService_Fiber optic                 bool
InternetService_No                          bool
OnlineSecurity_No internet service          bool
OnlineSecurity_Yes                          bool
OnlineBackup_No internet service            bool
OnlineBackup_Yes                            bool
DeviceProtection_No internet service        bool
DeviceProtection_Yes                        bool
TechSupport_No internet service             bool
TechSupport_Yes     

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/500
141/141 ━━━━━━━━━━━━━━━━━━━━ 4s 12ms/step - accuracy: 0.7094 - loss: 0.6046 - val_accuracy: 0.7613 - val_loss: 0.4521
Epoch 2/500
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.7786 - loss: 0.4661 - val_accuracy: 0.7622 - val_loss: 0.4428
Epoch 3/500
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.7964 - loss: 0.4491 - val_accuracy: 0.7666 - val_loss: 0.4384
Epoch 4/500
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.8365 - loss: 0.3931 - val_accuracy: 0.7799 - val_loss: 0.4392
Epoch 5/500
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.8948 - loss: 0.2874 - val_accuracy: 0.7826 - val_loss: 0.4397
Epoch 6/500
141/141 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.9753 - loss: 0.1311 - val_accuracy: 0.7737 - val_loss: 0.4599
Epoch 7/500
141/1


KeyboardInterrupt
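
The 500-epoch run was interrupted by hand, and in the log above `val_loss` stops improving after epoch 3 while training accuracy keeps climbing. Early stopping automates that decision; Keras provides an `EarlyStopping` callback (`tf.keras.callbacks.EarlyStopping`) for this purpose. The sketch below is not that callback: it is a small pure-Python illustration of the patience rule, with a hypothetical helper name, applied to the six complete `val_loss` values from the log.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is None if the patience budget is never exhausted.
    Hypothetical helper for illustration; not part of any library.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# val_loss for epochs 1-6 of the interrupted 500-epoch run, copied from the log.
logged_val_loss = [0.4521, 0.4428, 0.4384, 0.4392, 0.4397, 0.4599]

best_epoch, stop_epoch = early_stopping_epoch(logged_val_loss, patience=2)
print(best_epoch, stop_epoch)  # → 3 5
```

With a patience of 2, training would have stopped at epoch 5 and the weights from epoch 3 would be the ones to keep.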

