Add the PGA heads docs
Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
vmarkovtsev committed Sep 9, 2019
1 parent 77d1bb9 commit 1e7b13e
Showing 2 changed files with 14 additions and 1 deletion.
6 changes: 6 additions & 0 deletions PublicGitArchive/README.md
@@ -8,12 +8,18 @@ This dataset consists of two parts:
* [Siva](https://github.com/src-d/go-siva) files with Git repositories.
* Index file in CSV format.


Besides, there are a number of auxiliary datasets:

* [configs.tar.xz](https://drive.google.com/open?id=1_cij4BMrPiKVBVdZyUzg1iOhB3pL6EPR) - raw git config files for each siva.
* [heads.csv.xz](https://drive.google.com/open?id=136vsGWfIwfd0IrAdfphIU6lkMmme4-Pj) - mapping from HEAD UUID to repository name.

## Tools


* [pga](pga) - explore the dataset, or download its contents easily.
* [pga-create](pga-create) - reproduce PGA dataset generation.
* [borges-indexer](borges-indexer) - exports a CSV file with metadata from repositories fetched with Borges.
* [pga2uast](pga2uast) - extracts [Babelfish UASTs](https://docs.sourced.tech/babelfish/uast/uast-specification-v2) from the HEADs of siva files.
* [list_heads](list-pga-heads) - lists files in each HEAD contained in siva.


## Listing and downloading


9 changes: 8 additions & 1 deletion PublicGitArchive/list-pga-heads/README.md
@@ -35,6 +35,13 @@ For each HEAD inside each siva file, the tool writes the list of file paths. If
each HEAD is a text file and each file path is on a new line. If the format is "parquet",
the table schema is two-column ("HEAD name", "file path").
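
To illustrate how the two layouts described above relate, here is a minimal Python sketch that groups two-column ("HEAD name", "file path") rows into one path list per HEAD and renders a list as one path per line. The function names and the sample rows are hypothetical and are not part of the tool; reading an actual parquet file would require a separate library and is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_paths_by_head(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group ("HEAD name", "file path") rows into one path list per HEAD."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for head_name, file_path in rows:
        grouped[head_name].append(file_path)
    return dict(grouped)


def render_text_listing(paths: Iterable[str]) -> str:
    """Render one HEAD's paths as text, one file path per line."""
    return "".join(f"{path}\n" for path in paths)


# Hypothetical sample rows; real HEAD names and paths come from the tool's output.
sample_rows = [
    ("head-a", "README.md"),
    ("head-a", "src/main.go"),
    ("head-b", "setup.py"),
]
listings = group_paths_by_head(sample_rows)
```

Each value in `listings` corresponds to what a single per-HEAD text file would contain once passed through `render_text_listing`.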


## Results

These are the results of running the tool on PGA'19:

- [configs.tar.xz](https://drive.google.com/open?id=1_cij4BMrPiKVBVdZyUzg1iOhB3pL6EPR) - raw git config files for each siva.
- [heads.csv.xz](https://drive.google.com/open?id=136vsGWfIwfd0IrAdfphIU6lkMmme4-Pj) - mapping from HEAD UUID to repository name.
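
As a rough illustration of consuming the HEAD UUID to repository name mapping, the sketch below reads an xz-compressed CSV using only the Python standard library. The column order (UUID first, repository name second) and the absence of a header row are assumptions, not something the README states, so check the real file before relying on them. The function name and sample data are hypothetical.

```python
import csv
import io
import lzma
from typing import BinaryIO, Dict


def load_head_mapping(xz_stream: BinaryIO) -> Dict[str, str]:
    """Read an xz-compressed CSV into a {HEAD UUID: repository name} dict.

    Assumption: each row holds (UUID, repository name) and there is no header row.
    """
    mapping: Dict[str, str] = {}
    with lzma.open(xz_stream, mode="rt", encoding="utf-8", newline="") as text:
        for row in csv.reader(text):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            mapping[row[0]] = row[1]
    return mapping


# Self-contained demonstration with made-up data instead of the real download.
sample = lzma.compress(
    b"0000-aaaa,github.com/example/repo\n1111-bbbb,github.com/example/other\n"
)
heads = load_head_mapping(io.BytesIO(sample))
```

For the downloaded file, the same function could be called as `load_head_mapping(open("heads.csv.xz", "rb"))`, assuming the file is saved under that name in the working directory.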

## License


[MIT](https://tldrlegal.com/license/mit-license).
