Dataset Scripts & Docs #1

elatoskinas · 2021-01-28T12:07:36Z

Added scripts from ml-typeinf-competition with an updated readme

elatoskinas · 2021-01-28T12:07:58Z

README.md

+## Dataset preparation
+**Pre-requisites:**
+* Python dependencies from `scripts/requirements.txt` installed (run `pip install -r scripts/requirements.txt`)
+* A repositories folder (dataset), where git projects are stored in format `[dataset path]/author/repo`
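A minimal sketch, assuming only the `[dataset path]/author/repo` layout described above, of how a script could enumerate the repositories it should process. `list_repos` is a hypothetical helper for illustration. It is not part of the PR's scripts.

```python
from pathlib import Path
from typing import List, Tuple


def list_repos(dataset_path: str) -> List[Tuple[str, str]]:
    """Return sorted (author, repo) pairs for git projects stored as
    [dataset path]/author/repo. Hypothetical helper, for illustration only."""
    repos = []
    root = Path(dataset_path)
    # First level: one directory per author (plain files are skipped).
    for author_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Second level: one directory per repository of that author.
        for repo_dir in sorted(p for p in author_dir.iterdir() if p.is_dir()):
            # Only count directories that contain a .git entry.
            if (repo_dir / ".git").exists():
                repos.append((author_dir.name, repo_dir.name))
    return repos
```

For example, `list_repos("/path/to/dataset")` returns pairs such as `("author", "repo")`, which can be joined back into `author/repo` paths for later steps.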


Do we have a script we can add for cloning all repos from a spec file? :)

@elatoskinas
Yes, it's here: https://github.com/saltudelft/many-types-4-py-dataset/blob/master/repo_cloner/__main__.py
The first step is to run the cloner with this JSON file as input: https://github.com/saltudelft/many-types-4-py-dataset/blob/master/mypy-dependents-by-stars.json

The second step is to write a shell script that resets each cloned git repository to a specific commit hash.
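A rough Python stand-in for the shell script described above, not the auxiliary script that was actually added in this PR. It assumes the commit hashes are already available as a mapping from `author/repo` to hash. How they are read from the JSON file is left out because its schema isn't shown here. `reset_command` and `reset_repos` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def reset_command(dataset_path: str, repo_id: str, commit_hash: str) -> List[str]:
    """Build a `git reset --hard <hash>` command for one repository.

    `repo_id` is assumed to be "author/repo", matching the dataset layout.
    """
    repo_path = Path(dataset_path) / repo_id
    # `git -C <dir>` runs git as if it had been started inside <dir>.
    return ["git", "-C", str(repo_path), "reset", "--hard", commit_hash]


def reset_repos(dataset_path: str, commits: Dict[str, str]) -> List[str]:
    """Reset every repository in `commits` (assumed format: {"author/repo": hash})
    and return the ids whose git command exited with a non-zero status."""
    failed = []
    for repo_id, commit_hash in sorted(commits.items()):
        cmd = reset_command(dataset_path, repo_id, commit_hash)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(repo_id)
    return failed
```

`git reset --hard` discards uncommitted changes in each checkout, which is usually acceptable for freshly cloned repositories. `git checkout <hash>` would be an alternative that leaves the branch untouched and produces a detached HEAD instead.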

Pushed the update. I've documented how to run the cloner with the given JSON file and added an auxiliary script to the scripts folder that resets the repositories to their commit hashes.

mir-am · 2021-01-29T20:15:21Z

@elatoskinas
I'll revise the README file a bit and then merge it.

elatoskinas · 2021-01-30T12:45:51Z

@mir-am I've just realized this is missing our JSON representation generation step (and process_dataset.sh doesn't include it either). How do you reckon we add it?

mir-am · 2021-02-01T08:38:36Z

I'll add the JSON representation step to the README. Before that, I need to add a setup.py file for the LibSA4Py package.

mir-am · 2021-02-01T10:14:26Z

@elatoskinas
I'm gonna merge this PR for now. It'd be great if you could update the prepare_dataset.sh script to include all the steps documented in the README.

Add scripts from ml-typeinf-competition and update readmes

d2f68dc

elatoskinas commented Jan 28, 2021

mir-am assigned elatoskinas Jan 28, 2021

elatoskinas and others added 2 commits January 29, 2021 16:55

Add reset commits script, modify readmes

c740c28

Update the README files for preparing the dataset

8e0a13d

mir-am self-assigned this Feb 1, 2021

Update the readme for installing package and including zenodo link

0d65a51

mir-am merged commit 78af655 into master Feb 1, 2021

elatoskinas mentioned this pull request Feb 1, 2021

Update dataset preparation script with JSON generation #2

Closed
