# Python Basics for Data Mining
Welcome to the Python refresher notebook for graduate students in Data Mining. It gives you practice with essential Python skills: data handling, visualization, and basic modeling.

📌 **Instructions:**
- Work through each section.
- Complete the code cells where prompted.
- Submit your completed notebook as your Module 0 assignment.
- Alternatively, you may submit proof of completion of a comparable course on DataCamp, Codecademy, or Coursera.


In [None]:
# Let's import common libraries used in data mining
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# set_theme is the current name for sns.set, which remains as an alias
sns.set_theme(style='whitegrid')

## 1. Python Syntax Basics
Let's start with Python basics: variables, lists, loops, and conditionals.
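
As a warm-up, here is a minimal sketch that combines a variable, a list, a loop, and a conditional. The readings and the 20-degree cutoff are made up for illustration and are not part of the assignment.

```python
# Hypothetical readings, chosen only for illustration
temperatures_c = [12.5, 18.0, 23.4, 30.1, 8.9]
threshold_c = 20.0  # assumed cutoff separating "warm" from "cool"

# Loop over the list and branch on each value
labels = []
for temp in temperatures_c:
    if temp >= threshold_c:
        labels.append("warm")
    else:
        labels.append("cool")

# zip pairs each reading with its label
for temp, label in zip(temperatures_c, labels):
    print(f"{temp:.1f} C -> {label}")

# The same logic written as a list comprehension
labels_compact = ["warm" if t >= threshold_c else "cool" for t in temperatures_c]
```

The list comprehension on the last line produces the same result as the explicit loop; either form is acceptable in the exercises that follow.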

In [None]:
# TODO: Create a list of 5 numbers and print the square of each number
numbers = [1, 2, 3, 4, 5]
# Your code here:

## 2. Working with DataFrames
Let's load and explore a simple dataset.
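
Before working with the real data, it may help to see a few common inspection calls on a small hand-built DataFrame. This is only a sketch: the frame `toy` and its column names are hypothetical and unrelated to the dataset loaded in this section.

```python
import pandas as pd

# Toy data with hypothetical column names, not the dataset used in this section
toy = pd.DataFrame({
    "name": ["Ana", "Ben", "Chloe", "Dev"],
    "score": [88.0, 92.5, 79.0, 95.0],
    "passed": [True, True, False, True],
})

print(toy.shape)       # (number of rows, number of columns)
print(toy.dtypes)      # one dtype per column
print(toy.describe())  # summary statistics for the numeric columns

# Boolean indexing keeps only the rows where the condition is True
high_scores = toy[toy["score"] > 85]
print(high_scores["name"].tolist())
```

The same calls work on any DataFrame, including one returned by `pd.read_csv`.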

In [None]:
# We'll use a simple Titanic dataset available online
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
df = pd.read_csv(url)
df.head()

In [None]:
# TODO: Display column names and check for missing values
# Your code here:

## 3. Data Visualization
Visualize relationships in the dataset using matplotlib and seaborn.
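
The sketch below shows the general shape of a matplotlib histogram on simulated data. The exam scores are randomly generated for illustration, and the bin count and labels are arbitrary choices rather than recommendations.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated exam scores (hypothetical data, seeded for reproducibility)
rng = np.random.default_rng(seed=0)
scores = rng.normal(loc=70, scale=10, size=200)

# ax.hist returns the per-bin counts and the bin edges along with the drawn bars
fig, ax = plt.subplots(figsize=(6, 4))
counts, bin_edges, _ = ax.hist(scores, bins=10, edgecolor="black")
ax.set_xlabel("Score")
ax.set_ylabel("Number of students")
ax.set_title("Distribution of simulated exam scores")
```

In a notebook the figure renders when the cell finishes; in a standalone script you would typically call `plt.show()` or `fig.savefig(...)` afterwards.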

In [None]:
# TODO: Plot a histogram of passenger ages
# Your code here:

## 4. Feature Engineering
We'll create a new column and handle missing data.
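
As a rough illustration of both steps on unrelated data, the sketch below fills missing values and derives a 0/1 column from a threshold. The column names `height_cm` and `is_tall` and the 178 cm cutoff are hypothetical, and the mean is used here only as an example fill value; other statistics follow the same pattern.

```python
import numpy as np
import pandas as pd

# Toy data with a hypothetical column name and two missing entries
toy = pd.DataFrame({"height_cm": [160.0, np.nan, 175.0, 182.0, np.nan]})

# Assign the result back instead of calling fillna(inplace=True) on a column selection
toy["height_cm"] = toy["height_cm"].fillna(toy["height_cm"].mean())

# A comparison yields booleans; astype(int) converts them to 0/1
toy["is_tall"] = (toy["height_cm"] > 178).astype(int)
print(toy)
```

Assigning the filled column back to the DataFrame keeps the update explicit and avoids surprises when a column selection turns out to be a copy.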

In [None]:
# TODO: Fill missing 'Age' values with the median and create a new binary column for 'IsMinor' (age < 18)
# Your code here:

## 5. Simple Logistic Regression Model
Let's build a model to predict survival based on a few features.
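
The model in this section is evaluated with `accuracy_score`, which reports the fraction of predictions that match the true labels. The sketch below checks that by hand on made-up arrays; `y_true` and `y_hat` are hypothetical and not derived from any dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels and predictions, chosen only for illustration
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Element-wise comparison gives booleans; their mean is the share of matches
manual_accuracy = np.mean(y_true == y_hat)
print(f"Manual accuracy: {manual_accuracy:.2f}")
print(f"accuracy_score:  {accuracy_score(y_true, y_hat):.2f}")
```

Six of the eight predictions match, so both lines report 0.75. Accuracy alone can be misleading when one class is much more common than the other, so treat it as a first check rather than a full evaluation.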

In [None]:
# We'll use 'Pclass', 'Sex', and 'Age' to predict 'Survived'
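# Encode 'Sex' as 0/1 (run this cell once: re-applying the mapping to 0/1 values yields NaN)
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})
# Fill missing ages with the median; assign back rather than using inplace=True on a column selection
df['Age'] = df['Age'].fillna(df['Age'].median())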
X = df[['Pclass', 'Sex', 'Age']]
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

## ✅ Final Checkpoint
**Submit your completed notebook as a PDF or .ipynb file.**
- Make sure all cells are run and outputs are visible.
- Your notebook should demonstrate basic proficiency in Python, pandas, visualization, and modeling.