Machine Learning with Python - IBM

This repository contains code and notebooks developed during the Machine Learning with Python course offered by IBM Skills Network through Coursera.

About the Course

This course covers the fundamental concepts of machine learning and their practical application: data preprocessing, regression, classification, clustering, dimensionality reduction, the fundamentals of statistics and linear algebra, model evaluation and improvement, and project best practices.
Since the course lectures and labs are in English, I chose to keep all code in the same language.

Repository Structure

The Jupyter notebooks are organized by module, following the course structure:

📁 Repository structure
├── 📁 Module 1 - Introduction
├── 📁 Module 2 - Linear and logistic regression
├── 📁 Module 3 - Supervised models
├── 📁 Module 4 - Unsupervised models
├── 📁 Module 5 - Model evaluation and metrics
├── 📁 Module 6 - Final project
│   ├── 📝 Practical Project - Titanic Survivors
│   └── 📝 Final Project - Rain Prediction in Australia
└── README.md

Technologies Used

Python
Jupyter Notebook
Libraries: NumPy, Pandas, Scikit-Learn, Matplotlib, Seaborn

Final Project

The course projects (a practical project and the final project) are in the Module 6 folder.
They apply the concepts, techniques, and approaches taught throughout the course.

The final project is a rainfall prediction classifier for Australia. The work involved:

Exploratory Data Analysis (EDA): Identified and addressed data leakage and outliers.
Feature Engineering: Created a 'Season' feature from the 'Date' column and handled categorical and numerical features.
Preprocessing Pipeline: Built a robust pipeline for data scaling, one-hot encoding, and addressing class imbalance with SMOTE.
Model Training & Optimization: Trained and tuned a Random Forest classifier and an XGBoost classifier using GridSearchCV with Stratified K-Fold cross-validation.
Feature Selection: Analyzed feature importances and experimented with threshold-based and iterative feature selection to improve model performance.
Model Evaluation: Assessed model performance using classification reports, confusion matrices, and F1-scores, comparing the effectiveness of different data preprocessing and modeling techniques.
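
To illustrate why the evaluation step relies on confusion matrices and F1-scores rather than accuracy alone, here is a minimal pure-Python sketch. The helper names (`confusion_counts`, `binary_f1`) and the toy labels are made up for this example; the notebooks themselves presumably use Scikit-Learn's metrics instead.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["tp" if actual == positive else "fp"] += 1
        else:
            counts["fn" if actual == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def binary_f1(y_true, y_pred, positive=1):
    """F1 is the harmonic mean of precision and recall (0.0 if no true positives)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy imbalanced labels (illustrative only): 8 dry days (0) and 2 rainy days (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_dry = [0] * 10                      # a model that never predicts rain
one_hit = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # one correct rain call, one false alarm

# Accuracy of the "always dry" model looks good despite it being useless.
accuracy_always_dry = sum(a == p for a, p in zip(y_true, always_dry)) / len(y_true)

print("accuracy (always dry):", accuracy_always_dry)
print("F1 (always dry):", binary_f1(y_true, always_dry))
print("F1 (one hit):", binary_f1(y_true, one_hit))
```

On this toy data the "always dry" model reaches high accuracy while its F1-score for the rainy class is zero, which is the kind of gap that makes F1 the more informative metric on imbalanced data.
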
This project demonstrates skills in data cleaning, feature engineering, building machine learning pipelines, model selection, tuning, and evaluation, particularly for imbalanced datasets.
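
The 'Season' feature from the Feature Engineering step can be sketched as a month-to-season lookup. This is only an assumption about how the notebook derives it: the mapping below uses Southern Hemisphere meteorological seasons (Australia's summer runs December through February), and `MONTH_TO_SEASON` and `date_to_season` are hypothetical names chosen for this example.

```python
from datetime import date

# Southern Hemisphere meteorological seasons, keyed by month number.
# Assumption: the notebook may use different labels or boundaries.
MONTH_TO_SEASON = {
    12: "Summer", 1: "Summer", 2: "Summer",
    3: "Autumn", 4: "Autumn", 5: "Autumn",
    6: "Winter", 7: "Winter", 8: "Winter",
    9: "Spring", 10: "Spring", 11: "Spring",
}


def date_to_season(value):
    """Map an ISO 'YYYY-MM-DD' string or a date object to a season name."""
    if isinstance(value, str):
        value = date.fromisoformat(value)
    return MONTH_TO_SEASON[value.month]


# Illustrative dates only, not rows from the actual dataset.
sample_dates = ["2008-12-01", "2009-07-15", "2010-03-31"]
seasons = [date_to_season(d) for d in sample_dates]
print(seasons)
```

With Pandas, the same lookup could be applied to a parsed 'Date' column by mapping its month values through the dictionary, after which the original column can be dropped.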

How to Use

Clone and access the repository:

```
git clone https://github.com/phaa/ibm-ml-with-python.git
cd ibm-ml-with-python/
```

Activate the virtual environment (conda or venv):
```
conda activate ibmenv
```
Launch JupyterLab and open the notebooks:
```
jupyter lab
```

Each notebook has a cell to install the necessary dependencies.

Contributions

This repository is a record of what I learned in the course, but suggestions and improvements are always welcome!
