
Machine Learning Homework Solution Notebooks (UCF CAP5610)

1. Data Preprocessing and Analysis by Pivoting Features of the Titanic Dataset
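
A minimal sketch of the pivot analysis, using seaborn's bundled Titanic dataset as a stand-in for the Kaggle train.csv used in the notebook (the seaborn copy lowercases column names such as pclass, sex, and survived):

```python
# Pivot each categorical feature against the target to compare survival rates.
import seaborn as sns

train_df = sns.load_dataset("titanic")  # stand-in for the Kaggle training split

# Survival rate by passenger class.
print(
    train_df[["pclass", "survived"]]
    .groupby("pclass", as_index=False)
    .mean()
    .sort_values(by="survived", ascending=False)
)

# The same pivot for sex.
print(
    train_df[["sex", "survived"]]
    .groupby("sex", as_index=False)
    .mean()
    .sort_values(by="survived", ascending=False)
)
```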

2. Analyze the Titanic data and predict whether each individual in the test dataset survived

Feature Engineering

  • Select a set of important features with VarianceThreshold feature selection using sklearn.feature_selection
  • SelectKBest univariate feature selection
  • SelectFromModel with Logistic Regression
  • Recursive feature elimination (RFE); a combined sketch of all four selectors follows this list
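
A combined sketch of the four selectors, assuming a preprocessed numeric feature matrix; synthetic data from make_classification stands in for the Titanic features here:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    RFE, SelectFromModel, SelectKBest, VarianceThreshold,
)
from sklearn.linear_model import LogisticRegression

# Stand-in data so the snippet runs on its own.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 1. Drop zero-variance (constant) features.
X_var = VarianceThreshold(threshold=0.0).fit_transform(X)

# 2. Keep the k best features by a univariate score (f_classif by default).
X_kbest = SelectKBest(k=4).fit_transform(X, y)

# 3. Keep features whose logistic-regression coefficients pass a threshold.
X_model = SelectFromModel(LogisticRegression(max_iter=1000)).fit_transform(X, y)

# 4. Recursively eliminate the weakest features down to a target count.
X_rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4).fit_transform(X, y)

for name, arr in [("VarianceThreshold", X_var), ("SelectKBest", X_kbest),
                  ("SelectFromModel", X_model), ("RFE", X_rfe)]:
    print(name, arr.shape)
```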

Build Random Forest and Decision Tree Models

  • Build Random Forest and Decision Tree models with the Titanic dataset
  • Apply five-fold cross-validation to the Decision Tree and Random Forest learning algorithms on the Titanic data and report their average classification accuracies.
  • Learn a Decision Tree model on the Titanic data using the Gini index and plot the tree (a sketch of these steps follows this list).
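
A minimal sketch of these steps; the placeholder X, y stand in for the preprocessed Titanic features and labels:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Decision Tree split on the Gini index (sklearn's default criterion).
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Five-fold cross-validation accuracy for each learner.
for name, model in [("Decision Tree", tree), ("Random Forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")

# Fit once on the full data and plot the learned tree.
plot_tree(tree.fit(X, y), filled=True)
plt.show()
```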

3. Support Vector Machines

  • Construct a support vector machine that computes the kernel function.
  • Learn SVMs on the Titanic dataset.
  • Report five-fold cross-validation classification accuracies on the Titanic training set for the linear, quadratic, and RBF kernels.
  • Tune the hyperparameters with GridSearchCV for each SVM kernel (a sketch follows this list).
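
A sketch of the kernel comparison and tuning, with the same kind of placeholder data standing in for the Titanic training set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Five-fold accuracy for the linear, quadratic (degree-2 polynomial),
# and RBF kernels.
kernels = {
    "linear": SVC(kernel="linear"),
    "quadratic": SVC(kernel="poly", degree=2),
    "rbf": SVC(kernel="rbf"),
}
for name, clf in kernels.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())

# Grid-search C (and gamma for RBF) separately for each kernel.
param_grids = {
    "linear": {"kernel": ["linear"], "C": [0.1, 1, 10]},
    "quadratic": {"kernel": ["poly"], "degree": [2], "C": [0.1, 1, 10]},
    "rbf": {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1]},
}
for name, grid in param_grids.items():
    search = GridSearchCV(SVC(), grid, cv=5).fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))
```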

4. Unsupervised Learning

Football team clustering

  • Use Euclidean distance as the distance metric. First, perform one iteration of the K-means algorithm and report the coordinates of the resulting centroids. Second, use K-means to find two clusters.
  • Use Manhattan distance as the distance metric. First, perform one iteration of the K-means algorithm and report the coordinates of the resulting centroids. Second, use K-means to find two clusters. (A from-scratch sketch of a single iteration follows this list.)
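
A from-scratch sketch of a single K-means iteration with a pluggable distance; the points and initial centroids below are made up, not the homework's actual coordinates:

```python
import numpy as np

def kmeans_step(points, centroids, dist):
    """Assign each point to its nearest centroid, then recompute centroids."""
    d = np.array([[dist(p, c) for c in centroids] for p in points])
    labels = d.argmin(axis=1)
    new_centroids = np.array([points[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return new_centroids, labels

euclidean = lambda a, b: np.linalg.norm(a - b)
manhattan = lambda a, b: np.abs(a - b).sum()

# Hypothetical 2-D points and initial centroids for illustration.
points = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 7.0]])
init = np.array([[1.0, 1.0], [9.0, 9.0]])

for name, dist in [("Euclidean", euclidean), ("Manhattan", manhattan)]:
    new_centroids, _ = kmeans_step(points, init, dist)
    print(name, "centroids after one iteration:\n", new_centroids)
```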

K-Means Clustering with Real World Dataset

  • Implement the K-means algorithm. K-means computes the distance between each data point and each centroid; make the distance function pluggable so it can be replaced with Euclidean distance, 1 - cosine similarity, and 1 - generalized Jaccard similarity (a sketch follows this list).
  • Apply the different distance metrics.
  • Compare the results.
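
A sketch of the pluggable-distance K-means; the toy non-negative data is a stand-in for the real-world dataset, and the three distances match the list above:

```python
import numpy as np

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine_dist(a, b):
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def jaccard_dist(a, b):
    # Generalized Jaccard similarity: sum of element-wise mins over maxes.
    return 1.0 - np.minimum(a, b).sum() / np.maximum(a, b).sum()

def kmeans(X, k, dist, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.array([[dist(x, c) for c in centroids] for x in X])
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties out.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Non-negative toy data (generalized Jaccard assumes non-negative values).
X = np.abs(np.random.default_rng(1).normal(size=(100, 4)))
for name, d in [("Euclidean", euclidean), ("1 - Cosine", cosine_dist),
                ("1 - Jaccard", jaccard_dist)]:
    _, labels = kmeans(X, k=3, dist=d)
    print(name, "cluster sizes:", np.bincount(labels, minlength=3))
```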

5. Machine Learning for Recommender Systems

  • Compute the average MAE and RMSE of Probabilistic Matrix Factorization (PMF), User-based Collaborative Filtering, and Item-based Collaborative Filtering under 5-fold cross-validation (a Surprise-based sketch follows this list).
  • Compare the average (mean) performances of User-based Collaborative Filtering, Item-based Collaborative Filtering, and PMF with respect to RMSE and MAE. Which model performs best on the movie-rating data?
  • Examine how the cosine, MSD (Mean Squared Difference), and Pearson similarities impact the performance of User-based and Item-based Collaborative Filtering, and plot the results. Is the impact of the three metrics on User-based Collaborative Filtering consistent with their impact on Item-based Collaborative Filtering?
  • Examine how the number of neighbors impacts the performance of User-based and Item-based Collaborative Filtering, and plot the results.
  • Identify the best number of neighbors (denoted by K) for User/Item-based Collaborative Filtering in terms of RMSE. Is the best K for User-based Collaborative Filtering the same as the best K for Item-based Collaborative Filtering?
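
A minimal sketch with the Surprise library, using the built-in MovieLens 100k download as a stand-in for the movie-rating data; in Surprise, SVD with biased=False is the PMF-equivalent matrix factorization:

```python
from surprise import Dataset, KNNBasic, SVD
from surprise.model_selection import cross_validate

data = Dataset.load_builtin("ml-100k")  # downloads MovieLens 100k on first use

algos = {
    "PMF": SVD(biased=False),
    "User-based CF": KNNBasic(sim_options={"name": "cosine", "user_based": True}),
    "Item-based CF": KNNBasic(sim_options={"name": "cosine", "user_based": False}),
}

# Average MAE and RMSE under 5-fold cross-validation.
for name, algo in algos.items():
    out = cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=False)
    print(f"{name}: RMSE={out['test_rmse'].mean():.4f}  MAE={out['test_mae'].mean():.4f}")
```

Swapping "cosine" for "msd" or "pearson" in sim_options, or varying KNNBasic(k=...), covers the similarity-metric and neighborhood-size comparisons in the remaining bullets.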