# Data Science Project – Main Notebook
### Tel Aviv University – Data Science Workshop

This notebook presents the full workflow of our project. The final topic will be decided and approved by the teaching staff during the upcoming week.

All analysis steps are structured according to the course guidelines:
- Business understanding
- Data exploration
- Data preparation
- Feature engineering
- Modeling
- Evaluation
- Interpretation of results

_All placeholder sections will be filled once the dataset and task are finalized._


## 1. Project Overview
### 1.1 Topic (to be finalized)
<Insert description of the chosen topic here>

### 1.2 Motivation
<Insert motivation according to domain (finance / neuroscience / other)>

### 1.3 Research Questions
- RQ1: <Insert>
- RQ2: <Insert>
- RQ3: <Insert>


## 2. Imports and Environment Setup
This section imports all required libraries. Additional libraries may be added later based on project needs.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from src.data_loader import load_csv
from src.preprocessing import basic_cleaning, split_X_y
from src.utils import configure_pandas_display

configure_pandas_display()

## 3. Data Loading

Once the final dataset is approved, place the raw files inside `data/raw/`.

_Example placeholder below:_


In [None]:
# Placeholder: example load once dataset exists
# df_raw = load_csv("<file_name>.csv", folder="raw")
# df_raw.head()
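
A minimal sketch of how a loader like `load_csv` could resolve files under `data/raw/`. The function name `load_csv_sketch`, the `data_dir` parameter, and the `data/<folder>/<file_name>` layout are assumptions for illustration. The project's actual `src.data_loader.load_csv` may be implemented differently.

```python
from pathlib import Path

import pandas as pd

# Assumed (hypothetical) project layout: <project_root>/data/raw/, data/processed/, ...
DATA_DIR = Path("data")


def load_csv_sketch(file_name: str, folder: str = "raw", data_dir: Path = DATA_DIR, **read_kwargs) -> pd.DataFrame:
    """Read data/<folder>/<file_name> into a DataFrame.

    Hypothetical stand-in for src.data_loader.load_csv; the real helper may differ.
    Extra keyword arguments are passed straight to pd.read_csv.
    """
    path = Path(data_dir) / folder / file_name
    if not path.exists():
        # Fail early with a clear message instead of pandas' generic error
        raise FileNotFoundError(f"Expected raw file at {path}; place it under {path.parent}/")
    return pd.read_csv(path, **read_kwargs)
```

Keeping the folder as a parameter lets the same helper read cleaned outputs later (for example `folder="processed"`, if such a folder is created).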

## 4. Initial Data Exploration (EDA)

This phase follows the course guidelines:
- Getting a "feel" for the data
- Understanding distributions, correlations, missing values, and outliers
- Checking whether additional datasets need to be integrated

_All placeholders will be filled after loading the approved dataset._


In [None]:
# Placeholder for EDA summaries
# df_raw.info()
# df_raw.describe()
# df_raw.isna().sum()
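
Once `df_raw` exists, the separate `info()` / `describe()` / `isna()` calls can be condensed into one per-column table. The sketch below is illustrative. The helper name `eda_summary` is hypothetical, and it uses Tukey's 1.5 * IQR rule as one common (not the only) outlier heuristic.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame, iqr_k: float = 1.5) -> pd.DataFrame:
    """One row per column: dtype, missing counts, cardinality, and IQR-based outlier count."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "frac_missing": df.isna().mean(),
        "n_unique": df.nunique(dropna=True),
    })
    # Tukey's rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR] (numeric columns only)
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = ((numeric < q1 - iqr_k * iqr) | (numeric > q3 + iqr_k * iqr)).sum()
    # Aligns on column names; non-numeric columns get NaN
    summary["n_outliers_iqr"] = outliers
    return summary.sort_values("frac_missing", ascending=False)
```

`eda_summary(df_raw)` would list the columns with the most missing values first. `df_raw.corr(numeric_only=True)` covers the correlation part (the `numeric_only` argument requires a recent pandas version).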

### 4.1 Visual Exploration
Visualization guidelines from the course: keep plots simple, use an appropriate scale, and label all axes.


In [None]:
# Example placeholder (to implement once data exists)
# sns.pairplot(df_raw)
# plt.show()
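
A small plotting helper that applies the three visualization guidelines (simple plot, appropriate scale, labeled axes) to a single column. `plot_distribution` and its `log_scale` option are hypothetical names chosen for this sketch.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_distribution(df: pd.DataFrame, column: str, bins: int = 30, log_scale: bool = False):
    """Histogram of one numeric column with labeled axes and an optional log-scaled count axis."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df[column].dropna(), bins=bins, edgecolor="black")
    if log_scale:
        ax.set_yscale("log")  # useful when a few bins dominate the counts
    ax.set_xlabel(column)
    ax.set_ylabel("Count (log scale)" if log_scale else "Count")
    ax.set_title(f"Distribution of {column}")
    fig.tight_layout()
    return fig, ax
```

In the notebook, `fig, ax = plot_distribution(df_raw, "<column>")` followed by `plt.show()` would display the plot. For wide datasets, `sns.pairplot` accepts a `vars` list to restrict the grid to a few columns, since a full pairplot grows quadratically with the number of columns.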

## 5. Data Cleaning and Preparation

Based on the course lectures, this step covers:
- Missing values (MCAR / MAR / MNAR)
- Outliers
- Normalization
- Integration of multiple data sources if necessary

All steps will be expanded once the dataset is chosen.


In [None]:
# Placeholder for cleaning
# df_clean = basic_cleaning(df_raw)
# df_clean.head()
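
One possible way to cover the first three bullets for numeric columns, sketched with a hypothetical `clean_numeric` helper (not the project's `basic_cleaning`). The specific choices (median imputation with a missingness flag, clipping to IQR fences, z-score scaling) are assumptions; the right strategy depends on whether the missingness is MCAR, MAR, or MNAR.

```python
import pandas as pd


def clean_numeric(df: pd.DataFrame, columns: list[str], iqr_k: float = 1.5) -> pd.DataFrame:
    """Impute, clip, and standardize numeric columns (illustrative only).

    - Median imputation plus a `<col>_was_missing` flag, so a model can still use
      missingness as a signal when it is informative (MAR/MNAR rather than MCAR).
    - Outliers clipped to Tukey's IQR fences instead of being dropped.
    - z-score standardization: (x - mean) / std.
    """
    out = df.copy()
    for col in columns:
        out[f"{col}_was_missing"] = out[col].isna().astype(int)
        out[col] = out[col].fillna(out[col].median())
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - iqr_k * iqr, upper=q3 + iqr_k * iqr)
        std = out[col].std()
        # Guard against constant columns (std == 0) to avoid division by zero
        out[col] = (out[col] - out[col].mean()) / (std if std > 0 else 1.0)
    return out
```

In the real pipeline, the medians, IQR fences, and scaling statistics should be computed on the training split only and then applied to the test split, so that test information does not leak into preprocessing.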

## 6. Feature Engineering

Based on the course tutorials, this may include:
- Transformations
- Binning / discretization
- Encoding categorical variables
- PCA or dimensionality reduction (if needed)
- Domain-based features (e.g., daily return, volatility, or temporal descriptors)

_This section will be expanded once the dataset structure is known._


In [None]:
# Placeholder for engineered features
# df_features = df_clean.copy()
# df_features["new_feature"] = <definition>
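
Assuming a time-ordered dataset with a price column (the column name `close` and the helper `add_example_features` are hypothetical), the domain-based features from the list, plus one transformation, one binning step, and one encoding step, could look like this:

```python
import numpy as np
import pandas as pd


def add_example_features(df: pd.DataFrame, price_col: str = "close", window: int = 5) -> pd.DataFrame:
    """Illustrative feature engineering on a time-ordered price series (hypothetical column names)."""
    out = df.copy()
    # Domain feature: simple daily return r_t = p_t / p_{t-1} - 1
    out["daily_return"] = out[price_col].pct_change()
    # Domain feature: rolling volatility = std of returns over the last `window` rows
    out["volatility"] = out["daily_return"].rolling(window).std()
    # Transformation: log of price compresses a right-skewed scale
    out["log_price"] = np.log(out[price_col])
    # Binning: quantile-based bins of the return (the first row's NaN return stays NaN)
    out["return_bin"] = pd.qcut(out["daily_return"], q=3, labels=["low", "mid", "high"])
    # Encoding: one-hot encode the bin labels into ret_low / ret_mid / ret_high
    return pd.get_dummies(out, columns=["return_bin"], prefix="ret")
```

The rolling window only looks at the current and previous rows, but `pd.qcut` computes its bin edges over the whole series, so in a real train/test setting the edges should be fit on the training data. For dimensionality reduction, `sklearn.decomposition.PCA` can be applied to the scaled numeric features.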

## 7. Modeling

Once the task is finalized (classification / regression / ranking / time-series prediction), this section will include:
- Train-test split
- Baseline model
- Alternative models
- Hyperparameter tuning
- Cross-validation


In [None]:
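# Placeholder for modeling workflow
# from sklearn.linear_model import LogisticRegression
# from sklearn.model_selection import train_test_split
# X, y = split_X_y(df_features, label_column="<label>")
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=42, stratify=y  # stratify only for classification
# )
# model = LogisticRegression(max_iter=1000)
# model.fit(X_train, y_train)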
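
The bullets above can be sketched end to end for a binary classification task. Synthetic data from `make_classification` stands in for the real dataset, and the grid of `C` values is an arbitrary illustrative choice. The task type itself remains an assumption until the topic is approved.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data until the real dataset is approved
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: always predict the majority class
baseline = DummyClassifier(strategy="most_frequent")
baseline_cv = cross_val_score(baseline, X_train, y_train, cv=cv, scoring="accuracy")

# Candidate model: scaler inside the pipeline, so each CV fold is scaled using only its training part
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
search = GridSearchCV(
    pipe,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},  # hyperparameter tuning
    cv=cv,
    scoring="accuracy",
)
search.fit(X_train, y_train)

print(f"Baseline accuracy (CV): {baseline_cv.mean():.3f}")
print(f"Best C: {search.best_params_['logisticregression__C']}, accuracy (CV): {search.best_score_:.3f}")
```

Because the scaler lives inside the pipeline, `GridSearchCV` refits it on each training fold, so the cross-validation scores are not inflated by the validation folds. `search.best_estimator_` can then be evaluated once on `X_test`.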

## 8. Evaluation

Based on the course guidelines, candidate metrics include:
- Accuracy, F1, precision/recall (for classification)
- MSE/MAE/R² (for regression)
- ROC curve / AUC (for classification)
- Cross-validated estimates of the chosen metrics

_Exact metrics will follow the approved task._


In [None]:
# Placeholder for evaluation
# from src.evaluation import evaluate_basic
# metrics = evaluate_basic(model, X_test, y_test)
# metrics
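
A hypothetical `evaluate_sketch` helper showing how the listed metrics map onto scikit-learn functions. It is not the project's `evaluate_basic`, and its classification branch assumes binary 0/1 labels.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)


def evaluate_sketch(model, X_test, y_test, task: str = "classification") -> dict:
    """Hypothetical stand-in for src.evaluation.evaluate_basic."""
    y_pred = model.predict(X_test)
    if task == "regression":
        return {
            "mse": mean_squared_error(y_test, y_pred),
            "mae": mean_absolute_error(y_test, y_pred),
            "r2": r2_score(y_test, y_pred),
        }
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),  # binary, positive label = 1
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
    # ROC AUC needs scores, not hard labels; the binary case uses P(class 1)
    if hasattr(model, "predict_proba"):
        metrics["roc_auc"] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return metrics
```

Returning a plain dictionary makes it easy to collect results for several models into one `pd.DataFrame` for comparison.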

## 9. Interpretation and Insights

This section will include:
- Feature importance
- Model explainability (SHAP / LIME)
- Comparison to baseline
- Practical significance

To be completed after the first modeling iteration.
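
SHAP and LIME live in separate packages (`shap`, `lime`). As a substitute that needs only scikit-learn, the sketch below uses permutation importance, a different technique that measures how much the held-out score drops when one feature's values are shuffled. It also compares the model against a majority-class baseline. The data are synthetic, and the random forest is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False the first 3 columns are the informative ones
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=3, n_redundant=0, shuffle=False, random_state=0
)
feature_names = [f"f{i}" for i in range(X.shape[1])]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Feature importance: mean score drop over n_repeats shuffles of each feature, on held-out data
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
order = np.argsort(result.importances_mean)[::-1]
for i in order:
    print(f"{feature_names[i]}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")

# Comparison to baseline: majority-class accuracy on the same test split
baseline_acc = DummyClassifier(strategy="most_frequent").fit(X_train, y_train).score(X_test, y_test)
print(f"Model accuracy {model.score(X_test, y_test):.3f} vs. majority-class baseline {baseline_acc:.3f}")
```

Computing importance on the test split rather than the training split avoids rewarding features the model has merely memorized.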


## 10. Conclusions and Next Steps
This final section will summarize:
- What we discovered
- Outstanding challenges
- Next steps for fine-tuning the model or analysis
