A curated collection of Python‑based data analysis projects built in Jupyter Notebooks, showcasing end‑to‑end workflows from data exploration to predictive modeling.
Together, the notebooks apply pandas, NumPy, Matplotlib, Seaborn, and scikit-learn to practical, real-world analysis tasks.
This portfolio was designed to highlight:
- Hands‑on proficiency in data cleaning, visualization, and modeling
- Ability to translate business questions into data‑driven insights
- Professional, reproducible notebook structure suitable for recruiter and stakeholder review
All notebooks are self‑contained and stored in the repository root for easy access.
| Notebook | Description | Techniques |
|---|---|---|
| ANZ Bank Analytics Challenge – Predicting Loan Status | Predicts loan approval outcomes using feature engineering and classification models. Includes EDA, preprocessing, model training, and evaluation. | Classification, Feature Engineering, Model Evaluation |
| Exploratory Data Analysis (EDA) – Sales Dataset | Performs exploratory analysis on sales data, demonstrating cleaning, aggregation, and visualization using pandas, Matplotlib, and Seaborn. | Data Cleaning, Visualization, Aggregation |
| Exploring and Analyzing Data with Pandas | Introductory notebook showcasing common pandas operations for data exploration and transformation. | Data Wrangling, Descriptive Statistics |
| Predicting House Prices with Multiple Linear Regression | Builds and evaluates a multiple linear regression model to predict house prices. Includes feature selection, model fitting, and performance visualization. | Regression, Feature Selection, Model Evaluation |
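As a rough illustration of the regression workflow in the last row, the following sketch fits a multiple linear regression by ordinary least squares with NumPy and reports R². It is not the notebook's code. The data are synthetic, and the feature names (`area`, `bedrooms`) and the coefficients in `true_coefs` are hypothetical values chosen only for this example.

```python
import numpy as np

# Synthetic, illustrative data only: NOT the house-price dataset used in the notebook.
# Two hypothetical features (floor area in m^2, number of bedrooms) and a price
# generated from a known linear rule plus Gaussian noise.
rng = np.random.default_rng(seed=0)
n = 200
area = rng.uniform(50, 250, size=n)
bedrooms = rng.integers(1, 6, size=n)
true_coefs = np.array([20_000.0, 1_500.0, 10_000.0])  # intercept, area, bedrooms
noise = rng.normal(0, 5_000, size=n)
price = true_coefs[0] + true_coefs[1] * area + true_coefs[2] * bedrooms + noise

# Design matrix with a leading column of ones for the intercept term.
X = np.column_stack([np.ones(n), area, bedrooms])

# Ordinary least squares: find beta minimising ||X @ beta - price||^2.
beta, *_ = np.linalg.lstsq(X, price, rcond=None)

# Goodness of fit: R^2 = 1 - SS_res / SS_tot.
predictions = X @ beta
ss_res = np.sum((price - predictions) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("Estimated coefficients (intercept, area, bedrooms):", np.round(beta, 1))
print(f"R^2 = {r_squared:.3f}")
```

Because the synthetic prices come from a known linear rule, the estimated coefficients should land close to `true_coefs`. This makes a quick sanity check before the same steps are applied to real data.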
The projects use the following tools:
- Language: Python
- Libraries: pandas, NumPy, Matplotlib, Seaborn, scikit-learn
- Environment: Jupyter Notebook (in Fabric) / VS Code
- Version control: Git and GitHub
Key features across the notebooks:
- Clean, modular notebook design with clear Markdown commentary
- Visual storytelling through Matplotlib and Seaborn charts
- Reproducible workflows for data preprocessing and modeling
- Analyses that connect technical results to business questions
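To make the reproducibility point concrete, here is a minimal sketch of a preprocessing-plus-model workflow in scikit-learn, assuming a purely numeric feature matrix. The synthetic dataset from `make_classification`, the injected missing values, and the choice of median imputation, standard scaling, and logistic regression are illustrative assumptions. They are not the setup of any specific notebook in this repository.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

RANDOM_STATE = 42  # one fixed seed so every run produces the same data, split, and model

# Synthetic binary-classification data standing in for a real dataset
# (e.g. approved vs. not approved); NOT the data used in the notebooks.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           random_state=RANDOM_STATE)

# Blank out roughly 5% of values to mimic missing data that preprocessing must handle.
rng = np.random.default_rng(RANDOM_STATE)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=RANDOM_STATE)

# Preprocessing and model are chained, so the imputer and scaler are fitted on the
# training split only and then reapplied unchanged to the test split (no leakage).
model = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("classify", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the steps in a `Pipeline` and fixing `random_state` means every run produces the same split, the same fitted preprocessing, and the same model. The test split never influences the imputation or scaling statistics.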
Future additions will include:
- Interactive dashboards using Plotly and Streamlit
- Advanced predictive models (Random Forest, XGBoost)
- Automated data pipelines and deployment examples