update README
miku committed Jul 21, 2016
1 parent 71817eb commit 823d948
Showing 1 changed file with 12 additions and 4 deletions.
16 changes: 12 additions & 4 deletions README.md
@@ -17,13 +17,20 @@ $ METHA_DIR=/tmp/harvest metha-sync -dir http://export.arxiv.org/oai2
/tmp/harvest/I29haV9kYyNodHRwOi8vZXhwb3J0LmFyeGl2Lm9yZy9vYWky
```

Harvesting can be CTRL-C'd any time. The data is harvested up to the last full
day, so there is a small latency. The HTTP client is resilient.
To show the harvesting directory, you can use the `-dir` flag:

```
$ metha-sync -dir http://export.arxiv.org/oai2
/home/miku/.metha/I29haV9kYyNodHRwOi8vZXhwb3J0LmFyeGl2Lm9yZy9vYWky
```
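
The directory name in the output above appears to be plain base64. Decoding it gives a string containing what is presumably the metadata format (`oai_dc`) and the endpoint URL. The sketch below only illustrates that observation. `decode_harvest_dir` is a hypothetical helper, not part of metha, and the naming scheme itself is an assumption inferred from this one example.

```sh
# Hypothetical helper, not part of metha: decode the last path component
# of a harvesting directory, assuming it is standard base64.
decode_harvest_dir() {
  printf '%s' "$(basename "$1")" | base64 -d
}

decode_harvest_dir /home/miku/.metha/I29haV9kYyNodHRwOi8vZXhwb3J0LmFyeGl2Lm9yZy9vYWky
# prints: #oai_dc#http://export.arxiv.org/oai2
```

This can help identify which endpoint a given directory belongs to when several have been harvested.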

Harvesting can be interrupted at any time. The data is currently harvested up to
the last full day, so there is a small latency.

Example: If the current date were *Thu Apr 21 14:28:10 CEST 2016*, the harvester
would request all data between the repository's earliest date and *2016-04-20 23:59:59*.
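
The upper bound in this example can be reproduced with a small shell sketch. It is not how metha computes the window internally. `harvest_window_end` is a hypothetical helper that assumes GNU `date` and treats the given timestamp as UTC, ignoring the CEST offset shown above.

```sh
# Hypothetical helper, not part of metha. Requires GNU date.
# Given a "now" timestamp (interpreted as UTC), print the end of the
# last full day, i.e. the previous calendar day at 23:59:59.
harvest_window_end() {
  now_epoch=$(date -u -d "$1" +%s)
  # Step back 24 hours, keep only the date, and pin the time to 23:59:59.
  date -u -d "@$((now_epoch - 86400))" +"%Y-%m-%d 23:59:59"
}

harvest_window_end "2016-04-21 14:28:10"
# prints: 2016-04-20 23:59:59
```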

You can stream records to stdout, too.
The HTTP client is resilient. You can stream records to stdout:

```sh
$ metha-cat http://export.arxiv.org/oai2
@@ -37,7 +44,8 @@ $ metha-cat -from 2016-01-01 http://export.arxiv.org/oai2

This will only stream records with a datestamp equal to or after 2016-01-01.

To just stream all data really fast, use find and zcat on the harvesting dir.
To just stream all data really fast, use `find` and `zcat` over the harvesting
directory.

```sh
$ find $(metha-sync -dir http://export.arxiv.org/oai2) -name "*gz" | xargs unpigz -c