# Data Science: Examples

This notebook walks through practical Python examples of core data science tasks on the Iris dataset: loading and cleaning data with pandas, exploratory plots with seaborn and matplotlib, summary statistics, and classification and regression with scikit-learn. Together the examples follow a typical end-to-end workflow, from raw data to evaluated models.

---

## Table of Contents
1. [Loading and Inspecting Data](#loading-and-inspecting-data)
2. [Data Cleaning and Transformation](#data-cleaning-and-transformation)
3. [Exploratory Data Analysis (EDA)](#exploratory-data-analysis-eda)
4. [Statistical Analysis](#statistical-analysis)
5. [Machine Learning: Classification](#machine-learning-classification)
6. [Machine Learning: Regression](#machine-learning-regression)
7. [Model Evaluation](#model-evaluation)

---

In [None]:
# 1. Loading and Inspecting Data
import pandas as pd

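# Load the Iris dataset (150 flowers, four numeric measurements plus a
# species label) from the seaborn-data repository on GitHub
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(url)
# print() is needed: only the last expression of a notebook cell is displayed
print(df.head())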
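
# Beyond head(): a quick structural check. A minimal sketch on a small,
# hypothetical stand-in frame (demo_frame) with a few iris-like rows, so it
# runs without the download above; on the real data you would call the same
# methods on df.
import pandas as pd

demo_frame = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0],
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica'],
})
print(demo_frame.shape)                       # (number of rows, number of columns)
print(demo_frame.dtypes)                      # measurements are stored as float64
print(demo_frame['species'].value_counts())  # rows per class (class balance)
demo_frame.info()                             # dtypes, non-null counts, memory use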

# 2. Data Cleaning and Transformation
# Count missing values per column
print(df.isnull().sum())
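
# isnull().sum() only counts gaps; the iris file has none, so here is a
# minimal sketch of the two usual remedies on a small hypothetical frame
# (gappy) with one NaN per column. Dropping rows versus median imputation is
# a judgment call that depends on how much data you can afford to lose.
import numpy as np
import pandas as pd

gappy = pd.DataFrame({
    'sepal_length': [5.1, np.nan, 7.0, 6.4],
    'petal_length': [1.4, 1.4, np.nan, 4.5],
})
print(gappy.isnull().sum())               # one missing value in each column
dropped = gappy.dropna()                  # keep only complete rows (rows 0 and 3)
filled = gappy.fillna(gappy.median())     # replace NaN with each column's median
print(filled)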

# Rename the label column; the rest of the notebook refers to 'Species'
df = df.rename(columns={'species': 'Species'})
print(df.head())
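
# A typical transformation beyond renaming: derive a new feature and store
# the label as a categorical. A minimal sketch on a hypothetical two-row
# frame (tiny); 'petal_ratio' is an illustrative feature, not one the later
# models use.
import pandas as pd

tiny = pd.DataFrame({
    'petal_length': [1.4, 4.7],
    'petal_width': [0.2, 1.4],
    'Species': ['setosa', 'versicolor'],
})
# assign() returns a new frame rather than modifying the original in place
tiny = tiny.assign(petal_ratio=tiny['petal_length'] / tiny['petal_width'])
# category dtype stores each distinct label once and exposes .cat accessors
tiny['Species'] = tiny['Species'].astype('category')
print(tiny.dtypes)
print(tiny['Species'].cat.categories.tolist())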

# 3. Exploratory Data Analysis (EDA)
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram of sepal length with a kernel density estimate (KDE) overlay
sns.histplot(df['sepal_length'], kde=True)
plt.title('Sepal Length Distribution')
plt.show()

# Pairplot for feature relationships
sns.pairplot(df, hue='Species')
plt.show()
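
# Plots are easier to interpret with numbers next to them: a per-class
# summary via groupby. A minimal sketch on a hypothetical four-row frame
# (eda_demo); on the real data the same call would be
# df.groupby('Species')['petal_length'].agg(['mean', 'std', 'count']).
import pandas as pd

eda_demo = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'petal_length': [1.4, 1.5, 6.0, 5.1],
})
per_class = eda_demo.groupby('Species')['petal_length'].agg(['mean', 'std', 'count'])
print(per_class)  # one row per species: mean, sample std, number of rows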

# 4. Statistical Analysis
# Summary statistics for the numeric columns (count, mean, std, quartiles)
print(df.describe())
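
# describe() reports the sample standard deviation (ddof=1), while NumPy's
# np.std defaults to the population version (ddof=0); the two differ
# noticeably on small samples. A minimal sketch on a hypothetical series.
import numpy as np
import pandas as pd

demo_series = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
summary = demo_series.describe()   # count, mean, std, min, 25%, 50%, 75%, max
print(summary)
sample_std = demo_series.std()                    # divides by n - 1
population_std = np.std(demo_series.to_numpy())   # divides by n
print(sample_std, population_std)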

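# Pearson correlation matrix of the numeric columns. numeric_only=True skips
# the string 'Species' column; since pandas 2.0, corr() raises an error on it
print(df.corr(numeric_only=True))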
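
# What df.corr() computes by default: Pearson's r, the covariance of two
# columns divided by the product of their standard deviations. A minimal
# sketch on hypothetical arrays (xs, ys) checking the formula against pandas.
import numpy as np
import pandas as pd

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
xc, yc = xs - xs.mean(), ys - ys.mean()   # center each variable
r_manual = (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
r_pandas = pd.DataFrame({'x': xs, 'y': ys}).corr().loc['x', 'y']
print(r_manual, r_pandas)   # both equal 6 / sqrt(60), about 0.775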

# 5. Machine Learning: Classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X = df.drop('Species', axis=1)
y = df['Species']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
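
# Passing stratify=y keeps the class proportions identical in the train and
# test sets, which a plain random split does not guarantee. A minimal sketch
# using the iris data bundled with scikit-learn (load_iris), so it runs
# offline; names ending in _demo are illustrative.
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)   # 150 rows, 50 per class
_, _, _, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(Counter(y_test_demo))   # 30 test rows, 10 from each of the 3 classes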

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

print('Classification Accuracy:', accuracy_score(y_test, y_pred))
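
# With a 30-row test set, each misclassified row moves accuracy by about
# 3.3 percentage points, so the score can shift with random_state. k-fold
# cross-validation averages over several splits; a minimal sketch with
# cross_val_score on scikit-learn's bundled iris data (cv=5 is an
# illustrative choice). For classifiers, an integer cv uses stratified folds.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X_cv, y_cv = load_iris(return_X_y=True)
cv_scores = cross_val_score(RandomForestClassifier(random_state=42), X_cv, y_cv, cv=5)
print('CV accuracy: %.3f +/- %.3f' % (cv_scores.mean(), cv_scores.std()))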

# 6. Machine Learning: Regression
from sklearn.linear_model import LinearRegression

# Predict sepal_length from the three other numeric measurements
X_reg = df.drop(['sepal_length', 'Species'], axis=1)
y_reg = df['sepal_length']

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

reg = LinearRegression()
reg.fit(X_train_reg, y_train_reg)
y_pred_reg = reg.predict(X_test_reg)

print('Regression R^2 Score:', reg.score(X_test_reg, y_test_reg))
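
# reg.score returns the coefficient of determination,
# R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and
# SS_tot the sum of squared deviations of y from its mean. A minimal sketch on
# hypothetical values checking the formula against sklearn.metrics.r2_score.
import numpy as np
from sklearn.metrics import r2_score

y_true_demo = np.array([3.0, 5.0, 7.0, 9.0])
y_hat_demo = np.array([2.5, 5.5, 7.0, 8.0])
ss_res = ((y_true_demo - y_hat_demo) ** 2).sum()          # 1.5
ss_tot = ((y_true_demo - y_true_demo.mean()) ** 2).sum()  # 20.0
r2_manual = 1 - ss_res / ss_tot                           # 0.925
print(r2_manual, r2_score(y_true_demo, y_hat_demo))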

# 7. Model Evaluation
from sklearn.metrics import classification_report, mean_squared_error

# Classification report
print(classification_report(y_test, y_pred))
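
# A confusion matrix shows which classes get mixed up, information that the
# per-class scores in classification_report only summarize. A minimal sketch
# on hypothetical labels; rows are true classes and columns are predicted
# classes, in the order given by labels=.
import numpy as np
from sklearn.metrics import confusion_matrix

cm_labels = ['setosa', 'versicolor', 'virginica']
true_demo = ['setosa', 'setosa', 'versicolor', 'virginica', 'virginica']
pred_demo = ['setosa', 'versicolor', 'versicolor', 'virginica', 'setosa']
cm = confusion_matrix(true_demo, pred_demo, labels=cm_labels)
print(cm)
# precision = diagonal / column sums, recall = diagonal / row sums
precision_demo = np.diag(cm) / cm.sum(axis=0)
recall_demo = np.diag(cm) / cm.sum(axis=1)
print(precision_demo, recall_demo)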

# Regression error
print('Regression MSE:', mean_squared_error(y_test_reg, y_pred_reg))
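
# MSE is in squared units of the target (cm^2 for sepal_length), so RMSE and
# MAE are often reported alongside it in the target's own units. A minimal
# sketch on hypothetical values; RMSE is taken as np.sqrt of the MSE to avoid
# relying on version-specific keyword arguments.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_obs = np.array([3.0, 5.0, 7.0, 9.0])
y_fit = np.array([2.5, 5.5, 7.0, 8.0])
mse_demo = mean_squared_error(y_obs, y_fit)    # mean of squared errors: 0.375
rmse_demo = np.sqrt(mse_demo)                  # about 0.612, same units as y
mae_demo = mean_absolute_error(y_obs, y_fit)   # mean of absolute errors: 0.5
print(mse_demo, rmse_demo, mae_demo)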