Contains registry lists, cross-ecosystem lists, and directory structure of packages from Crates, Maven, PyPI, PHP, Go, NPM, and Ruby.

oraoraoraaa/Package-Dataset

Package-Dataset

This repo contains registry lists, cross-ecosystem lists, and directory structure of packages from Crates, Maven, PyPI, PHP, Go, NPM, and Ruby.

The original scripts used to mine or analyze the lists are also uploaded in this repository. You can navigate to the corresponding script and run it locally for the latest results.

general-map

How to Use

Download the dataset you are interested in directly from the repo.

ATTENTION

Some of the files are very large, so git lfs is used in this repo. Before you clone this repo, make sure git lfs is installed on your machine. See git lfs for more information.

After cloning the repo, you need to set up git lfs in it so that all files can be viewed properly. Run the following commands in the repository folder:

git lfs install
git lfs pull

The files should then show their actual contents instead of git lfs pointer stubs.
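To check whether the pull worked, one option is to scan for files that are still pointer stubs. The sketch below is not part of this repo; it relies only on the fact that a git lfs pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The function names and the `*.csv` default pattern are illustrative assumptions.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line (per the LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file still looks like an un-pulled LFS pointer stub."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_unpulled(root, pattern="*.csv"):
    """List files under `root` matching `pattern` that are still pointer stubs.

    `pattern` is a hypothetical default; adjust it to the files you care about.
    """
    return sorted(str(p) for p in Path(root).rglob(pattern) if is_lfs_pointer(p))
```

If `find_unpulled(".")` returns an empty list, the matching files have been replaced by their real contents.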

Folder

Package-List

Contains the complete list of packages for each ecosystem.

  • File Type: .csv
  • Columns: ID,Platform,Name,Homepage URL,Repository URL
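As a minimal sketch of reading one of these lists, the snippet below uses Python's standard `csv` module. It assumes each file starts with a header row that matches the column names listed above; the function name and the file path in the usage note are hypothetical, not taken from the repo.

```python
import csv

# Column names as documented for the Package-List files.
COLUMNS = ["ID", "Platform", "Name", "Homepage URL", "Repository URL"]


def load_package_list(source):
    """Read a package list from an open text file (or any iterable of lines).

    Assumes a header row with the documented column names; raises ValueError
    if any of them is missing.
    """
    reader = csv.DictReader(source)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)
```

A call might look like `with open("Package-List/Crates.csv", newline="", encoding="utf-8") as f: rows = load_package_list(f)`, where the path is a placeholder for whichever list you downloaded.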

Crates

  • Source: https://static.crates.io/db-dump.tar.gz
  • Description: Crates provides a full database dump file, so the necessary information can be taken directly from it.
  • Mine Date: 2025-12-1
  • Amount: 207,981

Go

Maven

NPM

PHP

PyPI

Ruby

Common-Package

Contains the filtered packages that appear in multiple package lists. With seven ecosystems there are 120 possible combinations of two or more lists, of which 84 contain matched results.

  • File Type: .csv
  • Matching Criteria: Same repo URL
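The matching idea can be sketched as a set intersection over repository URLs. This is an illustration under assumptions, not the repo's actual script: the URL normalization (trimming whitespace, lower-casing, dropping a trailing `/` or `.git`) is hypothetical, and a package is counted for a combination when its URL appears in at least those lists. The combination count is standard arithmetic: all subsets of seven ecosystems with two or more members.

```python
from itertools import combinations

ECOSYSTEMS = ["Crates", "Go", "Maven", "NPM", "PHP", "PyPI", "Ruby"]


def count_combinations(names):
    """Number of ways to pick two or more ecosystems from `names`."""
    return sum(1 for k in range(2, len(names) + 1) for _ in combinations(names, k))


def normalize(url):
    """Hypothetical normalization so trivially different URLs compare equal."""
    url = url.strip().lower().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def common_packages(urls_by_ecosystem):
    """Map each ecosystem combination to the repo URLs shared by all its members.

    `urls_by_ecosystem` maps an ecosystem name to an iterable of repo URLs.
    Combinations with no shared URL are left out of the result.
    """
    sets = {name: {normalize(u) for u in urls if u.strip()}
            for name, urls in urls_by_ecosystem.items()}
    result = {}
    names = sorted(sets)
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            shared = set.intersection(*(sets[n] for n in combo))
            if shared:
                result[combo] = shared
    return result
```

Here `count_combinations(ECOSYSTEMS)` evaluates to 120, matching the figure above.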

Directory-Structure

Contains the directory structures of the packages in each common-package list. The directory structures are mined using the git tree REST API endpoint. Some packages could not be mined because their repo is either deleted or inaccessible.

  • File Type: .txt
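For illustration, the sketch below shows one way such a tree could be fetched and rendered as text. It assumes the repositories are hosted on GitHub and uses GitHub's "Get a tree" endpoint with `recursive=1`; the function names and the indented output format are hypothetical and may differ from the .txt files in this folder. Deleted or inaccessible repos would surface as an `urllib.error.HTTPError` from `fetch_tree`.

```python
import json
import urllib.request


def tree_url(owner, repo, ref):
    """Build the GitHub 'Get a tree' URL; `ref` is a tree SHA or a branch/tag name."""
    return f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def fetch_tree(owner, repo, ref):
    """Fetch the recursive tree as a dict (network call; unauthenticated requests are rate-limited)."""
    req = urllib.request.Request(
        tree_url(owner, repo, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def render_tree(payload):
    """Render the 'tree' entries of a response as indented text, one entry per line."""
    lines = []
    # Sort by path components so children always follow their parent directory.
    for entry in sorted(payload.get("tree", []), key=lambda e: e["path"].split("/")):
        parts = entry["path"].split("/")
        suffix = "/" if entry["type"] == "tree" else ""
        lines.append("  " * (len(parts) - 1) + parts[-1] + suffix)
    return "\n".join(lines)
```

The response also carries a `truncated` flag, which GitHub sets when a repository is too large for a single recursive listing, so a fuller script would need to check it.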

Script

Contains the scripts used to build the dataset. Clone the ones you are interested in and run them locally to get the latest information.

There is detailed documentation in the folder of each script.

The input and output paths have been modified to match the current file layout.
