README-Classification-Machine-Learning

Project Overview

GitHub hosts the programs of millions of users. Repositories often include a README file, which is critical to a user’s understanding of how a program functions. One of a README’s most important tasks is to tell the user how to run the program.

In this project, our group tried to replicate and improve on the results of the paper 'Categorizing the Content of GitHub README Files' for identifying README files that describe how to run their program. We tried to improve on the paper’s results by expanding its dataset and by tuning two machine learning algorithms: the paper’s best performer (Linear Support Vector Machine Classifier) and a new one (Gradient Boosted Tree Classifier).

The Gradient Boosted Tree Classifier performed adequately but could not match the results of the paper, even after we added our new dataset and tuned its hyperparameters. The Linear Support Vector Machine Classifier, however, outperformed the paper’s classifier on both the original dataset and our new dataset. This work demonstrates that tuning and additional data can improve a machine learning algorithm’s predictions of whether a README file explains how a program works.
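
The tuning described above can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF text features, a small grid search, and F1 scoring. None of these choices is confirmed by this README, and the toy texts, labels, parameter grids, and the `tune` helper are hypothetical stand-ins rather than the project's actual data or settings.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: label 1 means the text explains how to run the program.
texts = [
    "Run python main.py to start the server",
    "To run the program execute make and then ./app",
    "Install the dependencies and run npm start",
    "Usage: run the script with the input file as an argument",
    "This project is licensed under the MIT license",
    "Thanks to all contributors for their help",
    "The project was created as part of a university course",
    "See the changelog for a history of releases",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def tune(classifier, param_grid):
    """Grid-search a TF-IDF + classifier pipeline (hypothetical helper)."""
    pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", classifier)])
    # cv=2 only because the toy dataset is tiny; F1 scoring is an assumption.
    search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1")
    search.fit(texts, labels)
    return search


# Illustrative grids only; the values the project actually searched are not stated.
svm_search = tune(LinearSVC(), {"clf__C": [0.1, 1.0, 10.0]})
gbt_search = tune(
    GradientBoostingClassifier(random_state=0),
    {"clf__n_estimators": [10, 50], "clf__max_depth": [2, 3]},
)

print("Linear SVM best parameters:", svm_search.best_params_)
print("Gradient boosted trees best parameters:", gbt_search.best_params_)
```

After fitting, each search object exposes `best_params_` and `best_score_`, and calling `predict` on it uses the best pipeline refit on all of the data.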

Folders and files

input
output
script
.gitignore
README Classification Report.pdf
README-Classifier.ipynb
README.md
RandomRepo.ipynb

hunterkimmett/README-Classification-Machine-Learning
