Identify Repositories for ML model training #11

Open
kiranbaby14 opened this issue Jul 8, 2023 · 0 comments
Labels
documentation Improvements or additions to documentation

Comments

@kiranbaby14
Owner

Datasets are crucial for training any machine learning model, and we need a lot of data. So far we have identified only one repository that contains GAP files; it is listed in the code at https://github.com/kiranbaby14/Analysis-of-GAP-programming-practices-on-GitHub/blob/main/scripts/get_GAP_files.py#L57.

For our model to be effective, we need to train it on varied datasets; otherwise the model will be biased. So identify repositories that contain other programming languages, such as Java, JavaScript, and Python, and include them in the list in the code linked above.

TASK

  • Identify repositories containing other languages
  • Rename the script from 'get_GAP_files' to a more appropriate name

It might also be better to post the candidate repositories as a reply in this issue first.
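
As a starting point for discussion, here is a rough sketch of how the repository list could be organised once it covers more than GAP. This is only an assumption about a possible layout, not what `get_GAP_files.py` currently does: the names `LANGUAGE_SOURCES`, `language_of`, and `group_by_language` are hypothetical, the repository entries are placeholders rather than real suggestions, and the extension sets (including `.g`, `.gi`, `.gd` for GAP) should be checked before use.

```python
from pathlib import PurePosixPath

# Hypothetical layout: the real list in get_GAP_files.py may look different.
# Repository names are placeholders, to be replaced by the ones agreed on
# in this issue.
LANGUAGE_SOURCES = {
    "GAP": {"extensions": {".g", ".gi", ".gd"}, "repos": ["example-owner/gap-repo"]},
    "Python": {"extensions": {".py"}, "repos": ["example-owner/python-repo"]},
    "Java": {"extensions": {".java"}, "repos": ["example-owner/java-repo"]},
    "JavaScript": {"extensions": {".js"}, "repos": ["example-owner/js-repo"]},
}


def language_of(path):
    """Return the language whose extension set matches the path, or None."""
    suffix = PurePosixPath(path).suffix.lower()
    for language, source in LANGUAGE_SOURCES.items():
        if suffix in source["extensions"]:
            return language
    return None


def group_by_language(paths):
    """Group file paths by detected language, skipping unrecognised files."""
    grouped = {}
    for path in paths:
        language = language_of(path)
        if language is not None:
            grouped.setdefault(language, []).append(path)
    return grouped
```

Keeping the language label next to each repository would let the training data be balanced per language later, and it gives the renamed script a natural, language-neutral purpose.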

kiranbaby14 added the documentation label Jul 8, 2023