gh-apps
Code to extract package.json from popular JavaScript|TypeScript repositories that are not on npm.
Usage
- Create a .env file containing the variables:
  - GITHUB_TOKEN [1]: your GitHub API token.
  - PROGRAMMING_LANGUAGE: '0' for JavaScript, '1' for TypeScript. Defaults to '0'.
  - ONLY_TOP_LEVEL: 'false' to fully traverse the git tree for package.json files. Defaults to 'true'.
  - STAR_COUNT [2]: '1000' limit used for npm run count-repos. Defaults to '0'.
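As a rough illustration of how these variables and their defaults fit together, here is a minimal JavaScript sketch. The `readConfig` helper and the shape of the object it returns are hypothetical and are not taken from this repository's source; only the variable names and default values come from the list above.

```javascript
// Hypothetical helper, not the repository's actual code.
// Reads the documented variables from an env-like object and applies
// the defaults listed above.
function readConfig(env = process.env) {
  const starCount = Number.parseInt(env.STAR_COUNT ?? "0", 10);
  return {
    githubToken: env.GITHUB_TOKEN,
    // '0' selects JavaScript, '1' selects TypeScript; the default is '0'.
    language: (env.PROGRAMMING_LANGUAGE ?? "0") === "1" ? "TypeScript" : "JavaScript",
    // Only the string 'false' turns on full tree traversal; anything else
    // keeps the default of looking at the top level only.
    onlyTopLevel: (env.ONLY_TOP_LEVEL ?? "true") !== "false",
    // Assumption: fall back to the documented default when the value is not numeric.
    starCount: Number.isNaN(starCount) ? 0 : starCount,
  };
}

// Example: only two variables set, the rest fall back to their defaults.
console.log(readConfig({ GITHUB_TOKEN: "<your token>", ONLY_TOP_LEVEL: "false" }));
```

In a real run the values would come from the .env file via whatever loader the project uses; passing a plain object here just keeps the sketch self-contained.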
[1] If the script reaches GitHub's rate limit, it will pause and resume when the limit resets. You can also use GITHUB_TOKENS to provide an array of tokens, and the script will cycle through them if the limit is reached.
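The pause-and-cycle behaviour described in the note on rate limits could be sketched roughly as follows. `TokenRotator` and its methods are hypothetical names used for illustration and may differ from how the script actually works. The reset time is assumed to come from the `x-ratelimit-reset` header that GitHub's REST API returns (epoch seconds, so multiply by 1000 for milliseconds).

```javascript
// Hypothetical sketch of cycling through several tokens, not the script's own code.
class TokenRotator {
  constructor(tokens) {
    if (!Array.isArray(tokens) || tokens.length === 0) {
      throw new Error("at least one token is required");
    }
    this.tokens = tokens;
    this.index = 0;
    // Per-token time (ms since epoch) at which its rate limit resets.
    this.resetAt = new Array(tokens.length).fill(0);
  }

  // Token to use for the next request.
  current() {
    return this.tokens[this.index];
  }

  // Record that the current token is exhausted until resetAtMs, then move
  // to the next token. Returns how many milliseconds to wait before that
  // next token is usable (0 if it can be used immediately).
  markLimited(resetAtMs, nowMs = Date.now()) {
    this.resetAt[this.index] = resetAtMs;
    this.index = (this.index + 1) % this.tokens.length;
    return Math.max(0, this.resetAt[this.index] - nowMs);
  }
}

// Example: with two tokens, the first limit hit switches tokens without waiting.
const rotator = new TokenRotator(["token-a", "token-b"]);
const waitMs = rotator.markLimited(Date.now() + 60_000);
console.log(rotator.current(), waitMs);
```

With a single token the same logic reduces to the plain pause-and-resume case: `markLimited` returns the time remaining until that token's limit resets.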
- Install dependencies & run the script:
    npm i
    npm start
[2] You can also run npm run count-repos to create a CSV containing the number of repositories for each star count. To set the star limit used by this command, configure STAR_COUNT.
Dataset
- dataset_top_level_only.zip contains 12341 JavaScript and 1543 TypeScript package.json files from repos with stars > 70.
- dataset_tree_traversal.zip contains 37702 JavaScript and 5188 TypeScript package.json files from repos with stars > 70.
Filename format is: <stars>๐<owner>๐<repo>๐[<path>]package.json, where reserved characters (e.g. /) are converted to !.
- dataset_count_repos.zip contains the frequency of repos with stars ∈ [0, 1000].
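To work with the package.json archives, the filename format described above can be split back into its fields. The sketch below is a hypothetical helper, not part of this repository. It takes the field separator as a parameter, since the exact character should be read from the actual filenames, and it assumes that every ! in the path stood for /, which may not hold if other reserved characters were also converted.

```javascript
// Hypothetical parser for dataset filenames of the form
// <stars><sep><owner><sep><repo><sep>[<path>]package.json
function parseDatasetFilename(filename, separator) {
  const suffix = "package.json";
  if (!filename.endsWith(suffix)) {
    throw new Error(`not a package.json entry: ${filename}`);
  }
  const parts = filename.split(separator);
  if (parts.length < 4) {
    throw new Error(`unexpected filename format: ${filename}`);
  }
  const [stars, owner, repo] = parts;
  // Re-join in case the separator also occurs inside the path portion.
  const rest = parts.slice(3).join(separator);
  const encodedPath = rest.slice(0, rest.length - suffix.length);
  return {
    stars: Number.parseInt(stars, 10),
    owner,
    repo,
    // Assumption: '!' only ever replaced '/'. An empty string means top level.
    path: encodedPath.split("!").join("/"),
  };
}

// Example with "~" as a stand-in separator (the real separator may differ).
console.log(parseDatasetFilename("1200~octocat~hello~packages!core!package.json", "~"));
```

Because the conversion to ! is lossy, the recovered path is a best guess rather than a guaranteed match for the original location in the repository.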