Add the PGA heads docs

Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
vmarkovtsev committed Sep 9, 2019
1 parent 77d1bb9 commit 1e7b13ee2ccc1d487fed450cbe1c65eb2af1db6d
Showing with 14 additions and 1 deletion.
  1. +6 −0 PublicGitArchive/README.md
  2. +8 −1 PublicGitArchive/list-pga-heads/README.md
@@ -8,12 +8,18 @@ This dataset consists of two parts:
* [Siva](https://github.com/src-d/go-siva) files with Git repositories.
* Index file in CSV format.

In addition, there are several auxiliary datasets:

* [configs.tar.xz](https://drive.google.com/open?id=1_cij4BMrPiKVBVdZyUzg1iOhB3pL6EPR) - raw git config files for each siva.
* [heads.csv.xz](https://drive.google.com/open?id=136vsGWfIwfd0IrAdfphIU6lkMmme4-Pj) - mapping from HEAD UUID to repository name.

## Tools

* [pga](pga) - explore the dataset, or download its contents easily.
* [pga-create](pga-create) - reproduce PGA dataset generation.
* [borges-indexer](borges-indexer) - exports a CSV file with metadata from repositories fetched with Borges.
* [pga2uast](pga2uast) - extracts [Babelfish UASTs](https://docs.sourced.tech/babelfish/uast/uast-specification-v2) from the HEADs of siva files.
* [list_heads](list-pga-heads) - lists the files in each HEAD contained in a siva file.

## Listing and downloading

@@ -35,6 +35,13 @@ For each HEAD inside each siva file, the tool writes the list of file paths. If
each HEAD is a text file and each file path is on a new line. If the format is "parquet",
the table schema is two-column ("HEAD name", "file path").
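
As a rough illustration of consuming the text format described above, the sketch below reads a directory of per-HEAD listings into a dictionary. It assumes a flat output directory with one text file per HEAD; the function name `read_head_listings` and that directory layout are hypothetical and are not taken from the tool itself.

```python
from pathlib import Path
from typing import Dict, List


def read_head_listings(output_dir: str) -> Dict[str, List[str]]:
    """Map each HEAD listing file name to the file paths it contains.

    ASSUMPTION: output_dir is flat and holds one text file per HEAD.
    """
    listings: Dict[str, List[str]] = {}
    for listing in sorted(Path(output_dir).iterdir()):
        if not listing.is_file():
            continue
        # One file path per line; blank lines are skipped.
        with listing.open(encoding="utf-8", errors="replace") as fin:
            listings[listing.name] = [
                line.rstrip("\n") for line in fin if line.strip()
            ]
    return listings
```

Decoding with `errors="replace"` is a defensive choice here, since file paths in Git repositories are not guaranteed to be valid UTF-8.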

## Results

These are the results of running the tool on PGA'19:

- [configs.tar.xz](https://drive.google.com/open?id=1_cij4BMrPiKVBVdZyUzg1iOhB3pL6EPR) - raw git config files for each siva.
- [heads.csv.xz](https://drive.google.com/open?id=136vsGWfIwfd0IrAdfphIU6lkMmme4-Pj) - mapping from HEAD UUID to repository name.
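
A downloaded `heads.csv.xz` can probably be read without unpacking it first. The following is a minimal sketch using only the Python standard library; it assumes the first column holds the HEAD UUID and the second the repository name, with no header row. That column layout is an assumption and should be checked against the actual file. The function name `load_heads` is hypothetical.

```python
import csv
import lzma
from typing import Dict


def load_heads(path: str) -> Dict[str, str]:
    """Load an xz-compressed CSV into a {HEAD UUID: repository name} dict."""
    mapping: Dict[str, str] = {}
    # lzma.open in text mode decompresses on the fly;
    # newline="" is what the csv module expects.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as fin:
        for row in csv.reader(fin):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            # ASSUMPTION: column 0 is the HEAD UUID, column 1 the repository name.
            mapping[row[0]] = row[1]
    return mapping
```

If the file turns out to have a header row or a different column order, the indexing inside the loop is the only part that needs to change.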

## License

[MIT](https://tldrlegal.com/license/mit-license).
