# Data Science: Solutions

This notebook provides detailed solutions to the Data Science exercises. Review the solutions after attempting the exercises to reinforce your understanding and learn best practices.

---

## Table of Contents
1. [Data Loading and Inspection](#data-loading-and-inspection)
2. [Data Cleaning](#data-cleaning)
3. [Exploratory Data Analysis](#exploratory-data-analysis)
4. [Statistical Analysis](#statistical-analysis)
5. [Machine Learning: Classification](#machine-learning-classification)
6. [Machine Learning: Regression](#machine-learning-regression)
7. [Model Evaluation](#model-evaluation)

---

In [None]:
# 1. Data Loading and Inspection
import pandas as pd
url = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(url)
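# A notebook cell only displays its last bare expression, so print intermediate results
print(df.head())   # first five rows
print(df.dtypes)   # four float measurement columns plus the species label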

# 2. Data Cleaning
# Check for missing values
print(df.isnull().sum())  # per-column count of missing values
# Fill missing values with column mean (if any)
df.fillna(df.mean(numeric_only=True), inplace=True)
# Rename column
df.rename(columns={'species': 'Species'}, inplace=True)
print(df.head())  # confirm the renamed column

# 3. Exploratory Data Analysis
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(df['sepal_length'], kde=True)
plt.title('Sepal Length Distribution')
plt.show()
sns.pairplot(df, hue='Species')
plt.show()

# 4. Statistical Analysis
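print(df.mean(numeric_only=True))
print(df.median(numeric_only=True))
print(df.std(numeric_only=True))
# Species holds strings, so restrict the correlation matrix to numeric columns;
# without numeric_only=True, pandas 2.0+ raises a ValueError here
print(df.corr(numeric_only=True))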

# 5. Machine Learning: Classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
X = df.drop('Species', axis=1)
y = df['Species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Fraction of test samples whose predicted species matches the true one
print('Accuracy:', accuracy_score(y_test, y_pred))

# 6. Machine Learning: Regression
from sklearn.linear_model import LinearRegression
X_reg = df.drop(['sepal_length', 'Species'], axis=1)
y_reg = df['sepal_length']
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
reg = LinearRegression()
reg.fit(X_train_reg, y_train_reg)
y_pred_reg = reg.predict(X_test_reg)
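# LinearRegression.score returns the coefficient of determination (R^2) on the test set
print('R^2:', reg.score(X_test_reg, y_test_reg))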

# 7. Model Evaluation
from sklearn.metrics import classification_report, mean_squared_error
print(classification_report(y_test, y_pred))
# Mean squared error of the regression, in squared units of sepal_length
print('MSE:', mean_squared_error(y_test_reg, y_pred_reg))
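
To make the numbers reported above easier to interpret, here is a minimal pure-Python sketch of what the three metrics compute: accuracy (as returned by `accuracy_score`), mean squared error (as returned by `mean_squared_error`), and R^2 (as returned by `reg.score`). The helper names `accuracy`, `mse`, and `r_squared` and all the toy values are illustrative assumptions, not part of the original exercises. They are not scikit-learn's implementation, only the standard textbook definitions.

```python
# Pure-Python versions of the metrics used above, applied to made-up toy values.

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def mse(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total variation
    return 1 - ss_res / ss_tot


# Hypothetical labels and predictions (not taken from the iris results)
toy_labels = ['setosa', 'versicolor', 'virginica', 'virginica']
toy_label_preds = ['setosa', 'versicolor', 'versicolor', 'virginica']

# Hypothetical continuous targets and predictions
toy_values = [5.0, 6.0, 7.0, 6.0]
toy_value_preds = [5.5, 6.0, 6.5, 6.0]

print('Accuracy:', accuracy(toy_labels, toy_label_preds))  # 3 of 4 correct -> 0.75
print('MSE:', mse(toy_values, toy_value_preds))            # (0.25 + 0 + 0.25 + 0) / 4 -> 0.125
print('R^2:', r_squared(toy_values, toy_value_preds))      # 1 - 0.5 / 2 -> 0.75
```

An R^2 of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets. MSE is in squared target units, so taking its square root gives an error on the same scale as `sepal_length`.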