Commit

updates notes
miku committed Apr 25, 2024
1 parent 5bd283e commit 4598bd6
Showing 2 changed files with 52,951 additions and 0 deletions.
14 changes: 14 additions & 0 deletions contrib/README.md
Expand Up @@ -1011,3 +1011,17 @@ $ grep -f <(cat sites.tsv | awk -F / '{print $3}' | grep -v ^$ | sort | uniq -d)
| 778 | ca | True | opus | 1 |
| 779 | kh | True | ojs | 1 |

## Crossref link scouting

Crossref records contain about 195M links in the `link` field (besides the primary record URL).

```
$ zstdcat -T0 begin-2022-01-01-date-2024-04-01.ndj.zst | \
jq -rc '.link[]?.URL?' > links.txt
$ cat links.txt | LC_ALL=C grep '.*/index.php/[^/]*/article' | \
grep -o '.*/index.php/[^/]*' | \
awk '{print $0"/oai"}' | \
LC_ALL=C sort -S30% -u > crossref-possibly-oai-2024-04-25.txt
```
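
To illustrate what the filter does, here is a minimal sketch that runs the same grep/awk steps on a few made-up links. The `example.org` URLs and the function name `to_oai_candidates` are assumptions for illustration only and are not part of the notes; the regular expressions are the ones used above.

```shell
# Hypothetical helper: read links on stdin, emit candidate OAI endpoints.
to_oai_candidates() {
    LC_ALL=C grep '.*/index.php/[^/]*/article' | # keep links that look like OJS article URLs
        grep -o '.*/index.php/[^/]*' |           # cut back to the journal base path
        awk '{print $0"/oai"}' |                 # append the conventional OAI path
        LC_ALL=C sort -u                         # deduplicate
}

# Made-up sample input: two article links from one journal, one unrelated link.
printf '%s\n' \
    'https://journal.example.org/index.php/abc/article/view/1/2' \
    'https://journal.example.org/index.php/abc/article/download/3/4' \
    'https://example.org/files/paper.pdf' | to_oai_candidates
# prints: https://journal.example.org/index.php/abc/oai
```

Both article links collapse into a single candidate endpoint, and the unrelated link is dropped. The resulting list only contains possible endpoints; whether an OAI interface actually responds at each URL is not checked here.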
