A project on classification of GitHub readme sections using Machine Learning

meng-ucalgary/ensf-612-project

ENSF-612 Term Project

A project that uses PySpark/Databricks to replicate and extend a given research paper on an application of Machine Learning.

About

In this project, we used PySpark/Databricks to replicate and extend the research paper "Categorizing the Content of GitHub README Files". The original code by the paper's authors is available here.

Project Video

Folder Structure

  • data_dumps - data dumps from the SQLite database
  • manual_work - files used for manual work
  • new_input_readmes - new readme files used for the ML model
  • notebooks - code from the research paper adapted for Databricks, along with additional code
  • other - the research paper, instructions, and report template
  • presentations - contains presentations and the report

Contributors