

#### Q1: Exploratory Data Analysis

**1. Import necessary libraries:**

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Ensure plots are displayed in the notebook
%matplotlib inline
```

**2. Read the data from the Titanic.csv file:**

```python
# Load the dataset
df = pd.read_csv('Titanic.csv')
```
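A quick sanity check right after loading is to confirm the shape and the dtypes pandas inferred. The sketch below runs without the file: it uses a tiny inline CSV with hypothetical rows in the Titanic column layout (not real passenger data), read through `io.StringIO`. In practice, pass `'Titanic.csv'` instead.

```python
import io
import pandas as pd

# Hypothetical three-row excerpt in the Titanic column layout (not real data)
csv_text = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 1,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 2,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,STON 3,7.92,,S
"""

# read_csv accepts any file-like object, so StringIO stands in for 'Titanic.csv'
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # Age is read as float64 because the empty cell becomes NaN
```

The quoted `Name` field keeps its embedded comma intact, which is why the real file parses cleanly with default settings.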

**3. View the column names of the Titanic dataset:**

```python
print(df.columns)
```

**4. Print a random selection of 15 records from the dataset:**

```python
print(df.sample(15))
```
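`df.sample(15)` draws a different set of rows on every run. If the printed records should be reproducible (for example, to match a submitted report), a fixed `random_state` makes the draw repeatable. The toy frame and the seed value `0` below are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame standing in for the Titanic data
toy = pd.DataFrame({'PassengerId': np.arange(1, 21), 'Fare': np.linspace(5, 100, 20)})

# Fixing the seed makes the 15-row selection identical across calls and runs
first = toy.sample(15, random_state=0)
second = toy.sample(15, random_state=0)

print(first.equals(second))
```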

**5. Check for NULL/NaN values in the dataset and list the columns with missing values:**

```python
null_counts = df.isnull().sum()
print(null_counts[null_counts > 0])
```
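Raw counts are easier to judge as percentages of the row count, which is what decides whether a column is worth keeping. A minimal sketch on a hypothetical frame (not the real Titanic values):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two columns
toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, np.nan],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

# isnull().mean() is the fraction of missing cells per column
missing_pct = toy.isnull().mean().mul(100).round(1)

# Keep only the columns that have gaps, largest share first
missing_only = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_only)
```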

**6. Drop the column with too much missing data (assumed to be Cabin; in the standard 891-row Kaggle training file, about 77% of its values are missing):**

```python
df.drop(columns=['Cabin'], inplace=True)
```
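Instead of naming the column by hand, the "too much missing data" rule can be made explicit with a threshold. The 50% cut-off below is a hypothetical choice, not part of the original task, and the frame is toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-off: drop any column that is more than half empty (an assumption)
MISSING_THRESHOLD = 0.5

toy = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0],
    'Cabin': [np.nan, 'C85', np.nan, np.nan],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

# Fraction missing per column, then the columns that exceed the cut-off
missing_frac = toy.isnull().mean()
to_drop = missing_frac[missing_frac > MISSING_THRESHOLD].index.tolist()
toy = toy.drop(columns=to_drop)

print(to_drop)
```

`DataFrame.dropna(axis=1, thresh=...)` achieves the same in one call, but note that `thresh` counts the minimum number of *non-missing* values a column needs to be kept.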

**7. Rename the column "Sex" to "Gender":**

```python
df.rename(columns={'Sex': 'Gender'}, inplace=True)
```
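By default `rename` ignores keys that do not exist, so a typo in the old column name fails silently. Passing `errors='raise'` turns that into a `KeyError`. A small sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({'Sex': ['male', 'female'], 'Age': [22.0, 38.0]})

# A misspelled key ('sex' instead of 'Sex') is silently ignored by default
unchanged = toy.rename(columns={'sex': 'Gender'})

# errors='raise' surfaces the same mistake as a KeyError
try:
    toy.rename(columns={'sex': 'Gender'}, errors='raise')
    raised = False
except KeyError:
    raised = True

renamed = toy.rename(columns={'Sex': 'Gender'}, errors='raise')
print(list(unchanged.columns), raised, list(renamed.columns))
```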

**8. Find the list of numerical fields in the dataset:**

```python
numerical_fields = df.select_dtypes(include=[np.number]).columns.tolist()
print(numerical_fields)
```
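A numeric dtype does not always mean a numeric feature: `PassengerId` is an identifier, `Survived` is the target, and `Pclass` is an ordinal category stored as an integer. The sketch below splits columns by dtype and then removes the non-feature columns; which columns count as non-features is a judgment call assumed here, not a rule from the task. The frame is toy data.

```python
import pandas as pd

toy = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 3],
    'Gender': ['male', 'female', 'female'],
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.28, 7.92],
    'Embarked': ['S', 'C', 'S'],
})

# Split purely by storage type
numeric_cols = toy.select_dtypes(include='number').columns.tolist()
categorical_cols = toy.select_dtypes(exclude='number').columns.tolist()

# Drop the identifier and the target from the candidate numeric features (assumed choice)
non_features = ['PassengerId', 'Survived']
numeric_features = [c for c in numeric_cols if c not in non_features]

print(numeric_cols)
print(categorical_cols)
print(numeric_features)
```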

#### Q2: Preprocessing Activities

**1. Find the ratio of survivors to non-survivors by gender and plot it:**

```python
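# Survival rate: proportion of passengers in each gender who survived
gender_survival_rate = df.groupby('Gender')['Survived'].mean()
print(gender_survival_rate)

# Ratio of survivors to non-survivors in each gender: rate / (1 - rate)
gender_survival_ratio = gender_survival_rate / (1 - gender_survival_rate)
print(gender_survival_ratio)

# Plot the survivor-to-non-survivor ratio
gender_survival_ratio.plot(kind='bar')
plt.title('Survivors to Non-Survivors by Gender')
plt.ylabel('Survivors / Non-survivors')
plt.xlabel('Gender')
plt.show()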
```
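Note that `groupby(...)['Survived'].mean()` gives the share of passengers who survived, while the survivor-to-non-survivor ratio divides survivors by non-survivors. A cross-tabulation of counts makes both quantities explicit. The eight passengers below are hypothetical, not real Titanic figures.

```python
import pandas as pd

# Hypothetical 8-passenger sample
toy = pd.DataFrame({
    'Gender':   ['male', 'male', 'male', 'male', 'female', 'female', 'female', 'female'],
    'Survived': [0, 0, 0, 1, 1, 1, 1, 0],
})

# Rows: gender; columns: Survived == 0 and Survived == 1 (passenger counts)
counts = pd.crosstab(toy['Gender'], toy['Survived'])

# Survivors divided by non-survivors within each gender
ratio = counts[1] / counts[0]

# Proportion surviving, i.e. what groupby(...).mean() returns
rate = counts[1] / counts.sum(axis=1)

print(counts)
print(ratio)
print(rate)
```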

**2. Handle missing values in numerical fields and replace them with the mean based on "Pclass":**

```python
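# Fill missing values in every numerical column with that column's mean within the same 'Pclass'
# (in the standard Kaggle training file, only 'Age' has gaps)
num_cols = df.select_dtypes(include=[np.number]).columns
for col in [c for c in num_cols if df[c].isnull().any()]:
    df[col] = df.groupby('Pclass')[col].transform(lambda x: x.fillna(x.mean()))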
```
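To see what the group-wise `transform` does, here it is on six hypothetical passengers. `transform` returns a Series aligned to the original rows, so each missing `Age` receives the mean of its own class. The lambda-free variant at the end, which fills from a precomputed per-row group mean, gives the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical passengers: one missing Age in class 1 and one in class 3
toy = pd.DataFrame({
    'Pclass': [1, 1, 1, 3, 3, 3],
    'Age':    [40.0, 50.0, np.nan, 20.0, np.nan, 30.0],
})

# Lambda form, as in the step above; mean() skips NaN within each group
filled_lambda = toy.groupby('Pclass')['Age'].transform(lambda s: s.fillna(s.mean()))

# Equivalent vectorised form: per-row group means, then fillna aligns on the index
filled_vector = toy['Age'].fillna(toy.groupby('Pclass')['Age'].transform('mean'))

print(filled_lambda.tolist())
```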

**3. Replace categorical data with proper encoding:**

```python
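# Encode 'Gender' column (male -> 0, female -> 1)
df['Gender'] = df['Gender'].map({'male': 0, 'female': 1})

# Fill the few missing 'Embarked' values with the most frequent port first;
# otherwise get_dummies gives those rows all zeros, the same as the dropped level
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])

# One-hot encode 'Embarked'; drop_first=True drops 'Embarked_C' to avoid a redundant column
df = pd.get_dummies(df, columns=['Embarked'], drop_first=True, dtype=int)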
```
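The interaction between missing values and `drop_first=True` is easy to miss. In the toy column below, the NaN row and the `'C'` row encode identically, because `C` is the dropped reference level. Filling with the mode (an assumed strategy; here the mode is `'S'`) before encoding removes the ambiguity.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Embarked': ['S', 'C', 'S', 'Q', np.nan]})

# Encoding with the NaN still present: row 4 gets 0 in every dummy column
raw = pd.get_dummies(toy, columns=['Embarked'], drop_first=True, dtype=int)

# Fill with the most frequent port, then encode
toy['Embarked'] = toy['Embarked'].fillna(toy['Embarked'].mode()[0])
encoded = pd.get_dummies(toy, columns=['Embarked'], drop_first=True, dtype=int)

print(raw)
print(encoded)
```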

#### Q3: Classification Model Using Logistic Regression

**1. Prepare the data for modeling:**

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

# Selecting features and target variable
features = ['Pclass', 'Gender', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked_Q', 'Embarked_S']
X = df[features]
y = df['Survived']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
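The target is imbalanced (roughly 38% of passengers survived in the standard 891-row Kaggle training file), so a plain random split can leave the test set with a noticeably different survivor share. Passing `stratify=y` to `train_test_split` keeps the class proportions equal in both splits. The labels below are synthetic, chosen only to illustrate the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 60 non-survivors and 40 survivors
y = np.array([0] * 60 + [1] * 40)
X = np.arange(100).reshape(-1, 1)

# stratify=y preserves the 40% survivor share in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(y_train.mean(), y_test.mean())
```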

**2. Develop the Logistic Regression model:**

```python
# Create the model; max_iter is raised because unscaled features such as Fare can slow lbfgs convergence
log_reg = LogisticRegression(max_iter=1000)

# Train the model
log_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = log_reg.predict(X_test)
```
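The features above are left unscaled, and `Fare` spans a much wider range than `Pclass` or `SibSp`, which can slow the default lbfgs solver. One common remedy, sketched here as an alternative rather than as the method used above, is a pipeline that standardises features before the classifier. When fit on `X_train`, the scaler learns its means and standard deviations from the training split only. The data are synthetic (not real passengers), with two features on very different scales.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales, with a linear decision rule
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(30, 14, 300), rng.normal(30, 50, 300)])
y = ((X[:, 0] - 30) / 14 + (X[:, 1] - 30) / 50 > 0).astype(int)

# Standardise, then fit logistic regression, as a single estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

proba = model.predict_proba(X)[:, 1]  # estimated P(class 1) for each row
accuracy = (model.predict(X) == y).mean()
print(round(accuracy, 3))
```

In the notebook, the pipeline would be fit on `X_train, y_train` and evaluated on `X_test` exactly like `log_reg`.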

**3. Evaluate the model using a classification report and a confusion matrix:**

```python
# Classification report
print(classification_report(y_test, y_pred))

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(conf_matrix)

# Plot the confusion matrix
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()
```
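The precision and recall in the classification report can be read directly off the confusion matrix. For binary labels `[0, 1]`, `confusion_matrix(...).ravel()` returns the cells in the order tn, fp, fn, tp. The labels and predictions below are hypothetical.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions (1 = survived)
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_hat  = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

# Rows are actual classes, columns are predicted classes; ravel() flattens row by row
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)  # of predicted survivors, the share that actually survived
recall = tp / (tp + fn)     # of actual survivors, the share the model identified

print(tn, fp, fn, tp, precision, recall)
```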

### Summary of Actions and Assumptions:
- Assumed 'Cabin' column had too much missing data and dropped it.
- Renamed 'Sex' to 'Gender'.
- Replaced missing values in numerical columns (in the standard Kaggle training file, only 'Age') with the mean within each 'Pclass'.
- Encoded categorical variables 'Gender' and 'Embarked'.
- Split the dataset for training and testing.
- Built a Logistic Regression model and evaluated it with a classification report and a confusion-matrix heatmap.

### Packaging for Submission:
1. Save the processed dataset, EDA, and model results in an Excel workbook.
2. Document the steps and findings in a Word document.
3. Create a zip file containing both the Excel workbook and the Word document for submission. 
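The zip step can be scripted with Python's standard `zipfile` module. The file names below are hypothetical placeholders, and the sketch writes dummy files into a temporary directory so it runs on its own.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical deliverable names; substitute the real workbook and report
deliverables = ['titanic_analysis.xlsx', 'titanic_report.docx']

workdir = Path(tempfile.mkdtemp())
for name in deliverables:
    (workdir / name).write_bytes(b'placeholder')  # stand-ins for the real files

archive = workdir / 'submission.zip'
with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    for name in deliverables:
        # arcname stores only the file name, not the temporary directory path
        zf.write(workdir / name, arcname=name)

with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
print(names)
```

For the workbook itself, `pandas.ExcelWriter` can write several DataFrames to separate sheets, provided an Excel engine such as openpyxl is installed.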

These steps cover each part of the task: exploratory analysis (Q1), preprocessing (Q2), and a logistic regression classifier with its evaluation (Q3).