# Processing New Data

This notebook demonstrates how to use the trained Logistic Regression model to make predictions on new data. Follow these steps to process your new data and obtain predictions:

1. **Load the Trained Model and Scaler**:
   - The model and scaler are loaded from the `model` directory. The scaler ensures that the new data is preprocessed in the same way as the training data.

2. **Load and Preprocess New Data**:
   - New data is loaded from the `new_data` directory.
   - Categorical variables are converted into dummy variables.
   - The encoded data is reindexed to the training columns: columns missing from the new data are filled with 0 and columns not seen during training are dropped, so the model receives features in the same layout it was trained on.

3. **Scale the Data**:
   - The new data is scaled using the loaded scaler to match the feature scaling applied during training.

4. **Make Predictions**:
   - Predictions are made using the loaded model.
   - Prediction probabilities for the positive class are also calculated.

5. **Save and View Results**:
   - The results are saved to `new_data/predictions.csv`.
   - A preview of the predictions and probabilities is displayed.

Before running this notebook, make sure your new data file is saved as `new_data/new_data.csv` and contains the same raw input columns that were used for training.
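
The column alignment in step 2 can be illustrated on toy data. This is a minimal sketch, not the notebook's actual data: the column names and the `expected_columns` list are made up and stand in for the feature names stored on the fitted scaler. It encodes every category level and lets the reindex decide which columns survive.

```python
import pandas as pd

# Hypothetical training-time feature layout (stand-in for scaler.feature_names_in_)
expected_columns = ["age", "color_green", "color_red"]

# Toy new data that never contains the "green" category but has an unseen "blue"
new_data = pd.DataFrame({"age": [31, 45], "color": ["red", "blue"]})

# Encode every level present in the new data as 0/1 integer columns
encoded = pd.get_dummies(new_data, dtype=int)

# Align to the training layout: missing columns are filled with 0,
# columns that were not seen during training are dropped
aligned = encoded.reindex(columns=expected_columns, fill_value=0)
print(aligned)
```

Here `color_green` is absent from the new data and is added as a column of zeros, while `color_blue` was never seen during training and is dropped.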



In [None]:
# Import necessary libraries
# (scikit-learn must be installed so joblib can unpickle the model and scaler)
import pandas as pd
import joblib

# 1. Load the Trained Model and Scaler

# Define the paths to the model and scaler files
model_path = '../model/logistic_regression_model.pkl'
scaler_path = '../model/scaler.pkl'

# Load the trained model and scaler
model = joblib.load(model_path)
scaler = joblib.load(scaler_path)

# Verify that the loaded model and scaler are as expected
print("Loaded model:", model)
print("Loaded scaler:", scaler)
print("Model parameters:", model.get_params())

# 2. Load and Preprocess New Data

# Define the path to the new data file
new_data_path = 'new_data/new_data.csv'

# Load the new data
new_data = pd.read_csv(new_data_path)

# Display the first few rows of the new data
print("New data preview:")
print(new_data.head())

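# Preprocess the new data
# Convert categorical variables to dummy variables.
# Keep every level here: with drop_first=True, pandas drops the first level
# present in *this* file, which may differ from the level dropped at training time.
# The reindex below removes any baseline column the training data did not have.
new_data_encoded = pd.get_dummies(new_data)

# Ensure the new data has the same features as the training data
# Get the feature names from the scaler
# (feature_names_in_ is only set when the scaler was fitted on a DataFrame)
expected_columns = scaler.feature_names_in_

# Reindex so the new data has the same columns, in the same order, as the training data;
# columns missing from the new data are filled with 0 and unseen columns are dropped
new_data_encoded = new_data_encoded.reindex(columns=expected_columns, fill_value=0)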

# 3. Scale the Data

# Scale the data using the loaded scaler
new_data_scaled = scaler.transform(new_data_encoded)

# 4. Make Predictions

# Use the trained model to make predictions
predictions = model.predict(new_data_scaled)
prediction_probabilities = model.predict_proba(new_data_scaled)[:, 1]  # Probabilities for the positive class

# 5. Save and View Results

# Create a DataFrame with the results
results = pd.DataFrame({
    'Prediction': predictions,
    'Probability': prediction_probabilities
})

# Save the results to a CSV file
results.to_csv('new_data/predictions.csv', index=False)

# Display the first few rows of the results
print("Prediction results preview:")
print(results.head())
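
Because reindexing fills missing columns with 0 and drops unseen ones without any warning, it can help to list the differences before aligning. The following is a minimal sketch under stated assumptions: `report_column_mismatch` is a hypothetical helper, not part of pandas or scikit-learn, and the column names are made up for illustration.

```python
from typing import Iterable, List, Tuple


def report_column_mismatch(
    encoded_columns: Iterable[str], expected_columns: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) column names, preserving input order."""
    encoded = list(encoded_columns)
    expected = list(expected_columns)
    encoded_set, expected_set = set(encoded), set(expected)
    # Expected by the model but absent from the new data (would be filled with 0)
    missing = [c for c in expected if c not in encoded_set]
    # Present in the new data but unknown to the model (would be dropped)
    unexpected = [c for c in encoded if c not in expected_set]
    return missing, unexpected


# Toy column names, made up for illustration
missing, unexpected = report_column_mismatch(
    ["age", "color_blue", "color_red"],
    ["age", "color_green", "color_red"],
)
print("Filled with 0:", missing)
print("Dropped:", unexpected)
```

In this notebook, such a helper could be called with `new_data_encoded.columns` and `expected_columns` just before the reindex step. A long list of missing columns usually suggests the input file does not match the training format.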

