Merge pull request #53 from ezpaarse-project/docs
Docs
felixleo22 committed Jun 20, 2024
2 parents 1feae71 + af50917 commit 9d3aca8
Showing 99 changed files with 1,853 additions and 219 deletions.
62 changes: 56 additions & 6 deletions anonymizer/README.md
@@ -2,20 +2,70 @@

Anonymizes a list of fields.

**This middleware is activated by default.**

## Prerequisites

Your EC needs a print_identifier for enrichment.

**You must use anonymizer after the filter, parser and deduplicator middlewares.**

**It is recommended to place it after all other middlewares: depending on its settings, placing it at the beginning of the chain may prevent some enrichments.**

## Headers

+ **Crypted-Fields** : names of the fields to be encrypted *(default: host,login)*
+ **Crypting-Algorithm** : Encryption algorithm *(default: sha1)*
+ **Crypting-Salt** : Encryption salt

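The headers above can be illustrated with a minimal sketch in JavaScript (Node.js), the language of the middleware itself. It assumes that each listed field is replaced by the hex digest that Node's `crypto.createHash` produces with the `Crypting-Algorithm` value, and that the `Crypting-Salt` value is simply prepended to the field value before hashing. The real salting scheme may differ. `parseFields` and `anonymize` are hypothetical names, not the middleware's API.

```javascript
'use strict';

const crypto = require('crypto');

// Hypothetical helper: turns a Crypted-Fields header value such as
// "login, user" into a list of trimmed field names.
function parseFields(header) {
  return header
    .split(',')
    .map((field) => field.trim())
    .filter((field) => field.length > 0);
}

// Hypothetical sketch: replaces each listed field of an EC with a hex digest.
// Assumption: the salt is prepended to the value before hashing.
function anonymize(ec, fields = ['host', 'login'], algorithm = 'sha1', salt = '') {
  for (const field of fields) {
    if (ec[field]) {
      ec[field] = crypto
        .createHash(algorithm)
        .update(salt + ec[field])
        .digest('hex');
    }
  }
  return ec;
}

const ec = anonymize(
  { login: 'jdoe', host: '192.0.2.1', url: 'http://example.org/article' },
  parseFields('login, host'),
  'sha1',
  'some salt'
);
console.log(ec.url); // unchanged: only the listed fields are hashed
```

Hashing is deterministic, so a given value always produces the same digest for a given salt. Anonymized users can therefore still be told apart without revealing who they are, and changing the salt changes every digest.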
## How to use

### ezPAARSE admin interface

You can enable anonymizer by default for all your enrichments. To do this, go to the middleware section of the administration interface.

![image](./docs/admin-interface.png)

### ezPAARSE process interface

You can use anonymizer for a single enrichment process by adding it to the list of middlewares.

![image](./docs/process-interface.png)

### ezp

You can use anonymizer for an enrichment process with [ezp](https://github.com/ezpaarse-project/node-ezpaarse) like this:

```bash
# enrich with one file
ezp process <path of your file> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: anonymizer" \
--header "Crypted-Fields: login, user" \
--header "Crypting-Salt: <some salt>" \
--out ./result.csv

# enrich with multiple files
ezp bulk <path of your directory> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: anonymizer" \
--header "Crypted-Fields: login, user" \
--header "Crypting-Salt: <some salt>"

```

### curl

You can use anonymizer for an enrichment process with curl like this:

```bash
curl -X POST -v http://localhost:59599 \
-H "ezPAARSE-Middlewares: anonymizer" \
-H "Crypted-Fields: login, user" \
-H "Crypting-Salt: <some salt>" \
-H "Log-Format-Ezproxy: <line format>" \
-F "file=@<log file path>"

```
Binary file added anonymizer/docs/admin-interface.png
Binary file added anonymizer/docs/process-interface.png
58 changes: 51 additions & 7 deletions bot-ua-detector/README.md
@@ -1,19 +1,63 @@
# bot-ua-detector

Mark ECs as robots if their user-agent string matches a regular expression from the COUNTER [robot list](https://raw.githubusercontent.com/atmire/COUNTER-Robots/master/generated/COUNTER_Robots_list.txt).

## Enriched fields

| Name | Type | Description |
| --- | --- | --- |
| robot | boolean | Is robot or not. |

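Below is a minimal sketch of the matching step. It assumes that the COUNTER list is plain text with one regular expression per line and that patterns are matched case-insensitively anywhere in the user-agent string. `compileRobotList` and `isRobot` are hypothetical names, and the sketch leaves out fetching and refreshing the list.

```javascript
'use strict';

// Hypothetical helper: compiles a robot list (assumed: one regex per line)
// into case-insensitive RegExp objects, ignoring blank lines.
function compileRobotList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((pattern) => new RegExp(pattern, 'i'));
}

// Returns true if any pattern matches somewhere in the user-agent string.
function isRobot(userAgent, patterns) {
  if (!userAgent) {
    return false;
  }
  return patterns.some((pattern) => pattern.test(userAgent));
}

const patterns = compileRobotList('bot\ncrawler\n\nspider\n');
console.log(isRobot('Googlebot/2.1 (+http://www.google.com/bot.html)', patterns)); // true
```

In a middleware, the boolean would then be stored in the `robot` field of each EC.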
## Prerequisites

**You must use bot-ua-detector after the filter, parser and deduplicator middlewares.**

## Headers

+ **robot-refresh-timeout** : Robot refresh time *(default: 5000ms)*

## How to use

### ezPAARSE admin interface

You can enable or disable bot-ua-detector by default for all your enrichments. To do this, go to the middleware section of the administration interface.

![image](./docs/admin-interface.png)

### ezPAARSE process interface

You can use bot-ua-detector for an enrichment process.

![image](./docs/process-interface.png)

### ezp

You can use bot-ua-detector for an enrichment process with [ezp](https://github.com/ezpaarse-project/node-ezpaarse) like this:

```bash
# enrich with one file
ezp process <path of your file> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: bot-ua-detector" \
--out ./result.csv

# enrich with multiple files
ezp bulk <path of your directory> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: bot-ua-detector"

```

### curl

You can use bot-ua-detector for an enrichment process with curl like this:

```bash
curl -X POST -v http://localhost:59599 \
-H "ezPAARSE-Middlewares: bot-ua-detector" \
-H "Log-Format-Ezproxy: <line format>" \
-F "file=@<log file path>"

```
Binary file added bot-ua-detector/docs/admin-interface.png
Binary file added bot-ua-detector/docs/process-interface.png
78 changes: 72 additions & 6 deletions crossref/README.md
@@ -1,6 +1,34 @@
# crossref

Enriches consultation events (ECs) with [crossref](http://search.crossref.org/) data fetched from their [API](http://search.crossref.org/help/api).

**This middleware is activated by default.**

## Enriched fields

| Name | Type | Description |
| --- | --- | --- |
| publication_title | String | Name of publication. |
| title | String | Title of publication. |
| type | String | Type of document (journal-article, book-chapter, conference-paper, dissertation, report, dataset, etc.). |
| rtype | String | Variation of type |
| publication_date | String | Date of resource. |
| publisher_name | String | Name of publisher. |
| print_identifier | String | ISBN or ISSN. |
| online_identifier | String | EISBN or EISSN. |
| subject | String | Subject (theme) of the publication. |
| doi | String | DOI of publication. |
| license | String | Licence. |

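To make the table concrete, here is a hypothetical mapping from a Crossref work record (the `message` object returned by the works API) to these fields. This is a sketch, not the middleware's code. `mapCrossrefWork` is an invented name, only ISSNs are handled for the identifiers, and `rtype` is left out because its derivation is not documented here.

```javascript
'use strict';

// Returns the first element of an array, or undefined.
function first(values) {
  return Array.isArray(values) && values.length > 0 ? values[0] : undefined;
}

// Hypothetical mapping from a Crossref work record to enriched EC fields.
function mapCrossrefWork(work) {
  // Crossref lists ISSNs together with their type ("print" or "electronic").
  const issnOfType = (type) => {
    const entry = (work['issn-type'] || []).find((issn) => issn.type === type);
    return entry ? entry.value : undefined;
  };
  // "issued" holds date parts such as [[2020, 1, 2]]; month and day are padded.
  const parts = work.issued ? first(work.issued['date-parts']) : undefined;
  const license = first(work.license);

  return {
    doi: work.DOI,
    title: first(work.title),
    publication_title: first(work['container-title']),
    type: work.type,
    publisher_name: work.publisher,
    publication_date: parts && parts[0] != null
      ? parts.map((p, i) => (i === 0 ? String(p) : String(p).padStart(2, '0'))).join('-')
      : undefined,
    print_identifier: issnOfType('print'),
    online_identifier: issnOfType('electronic'),
    subject: work.subject ? work.subject.join(', ') : undefined,
    license: license ? license.URL : undefined,
  };
}
```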
## Prerequisites

Your EC needs a DOI or an alternative ID (any other identifier a publisher may have provided) to be enriched.

**You must use crossref after the filter, parser and deduplicator middlewares.**

## Recommendation

You can use ezunpaywall together with crossref by placing ezunpaywall before crossref in the middleware chain. This saves processing time.

## Headers

@@ -15,12 +43,50 @@ Enriches consultation events with [crossref](http://search.crossref.org/) data f
+ **crossref-on-fail** : Strategy to adopt if an enrichment reaches the maximum number of attempts. Can be one of ``abort``, ``ignore`` or ``retry``. Defaults to ``abort``.
+ **crossref-base-wait-time** : Time to wait before retrying after a query fails, in milliseconds. Defaults to ``1000``ms. This time ``doubles`` after each attempt.
+ **crossref-plus-api-token** : If you signed up for the ``Plus`` service, put your token in this header.
+ **crossref-user-agent** : Specify what to send in the `User-Agent` header when querying Crossref. Defaults to `ezPAARSE (https://readmetrics.org; mailto:ezteam@couperin.org)`.

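The retry headers can be illustrated as follows. With the default ``crossref-base-wait-time`` of 1000 ms doubling after each failed attempt, successive waits are 1000, 2000, 4000 ms and so on. This sketch assumes that the first retry waits exactly the base time. `retryDelay`, `retrySchedule` and `onFailStrategy` are hypothetical names, and `maxTries` stands in for whatever attempt limit the middleware is configured with.

```javascript
'use strict';

// Delay before retry number `attempt` (0-based): the base wait time
// doubles after each failed attempt (assumption: first retry = base time).
function retryDelay(attempt, baseWaitTime = 1000) {
  return baseWaitTime * 2 ** attempt;
}

// Hypothetical helper: every wait that precedes a retry when at most
// `maxTries` attempts are allowed (no wait after the last attempt).
function retrySchedule(maxTries, baseWaitTime = 1000) {
  const delays = [];
  for (let attempt = 0; attempt < maxTries - 1; attempt += 1) {
    delays.push(retryDelay(attempt, baseWaitTime));
  }
  return delays;
}

// Documented values of crossref-on-fail; `abort` is the default.
const ON_FAIL_STRATEGIES = ['abort', 'ignore', 'retry'];

// Hypothetical validation of the crossref-on-fail header value.
function onFailStrategy(headerValue) {
  const strategy = (headerValue || 'abort').trim().toLowerCase();
  if (!ON_FAIL_STRATEGIES.includes(strategy)) {
    throw new Error(`Invalid crossref-on-fail value: ${headerValue}`);
  }
  return strategy;
}

console.log(retrySchedule(4)); // [ 1000, 2000, 4000 ]
```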
## How to use

### ezPAARSE admin interface

You can enable crossref by default for all your enrichments. To do this, go to the middleware section of the administration interface.

![image](./docs/admin-interface.png)

### ezPAARSE process interface

You can use crossref for a single enrichment process by adding it to the list of middlewares.

![image](./docs/process-interface.png)

### ezp

You can use crossref for an enrichment process with [ezp](https://github.com/ezpaarse-project/node-ezpaarse) like this:

```bash
# enrich with one file
ezp process <path of your file> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: crossref" \
--out ./result.csv

# enrich with multiple files
ezp bulk <path of your directory> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: crossref"

```

### curl

You can use crossref for an enrichment process with curl like this:

```bash
curl -X POST -v http://localhost:59599 \
-H "ezPAARSE-Middlewares: crossref" \
-H "Log-Format-Ezproxy: <line format>" \
-F "file=@<log file path>"

```
Binary file added crossref/docs/admin-interface.png
Binary file added crossref/docs/process-interface.png
2 changes: 1 addition & 1 deletion crossref/index.js
@@ -27,7 +27,7 @@ module.exports = function () {
let userAgent = req.header('crossref-user-agent');

if (!userAgent) {
userAgent = 'ezPAARSE (https://readmetrics.org; mailto:ezteam@couperin.org)';
}

const queryHeaders = {
99 changes: 84 additions & 15 deletions cut/README.md
@@ -1,31 +1,100 @@
# cut

Separates any unique field into two or more distinct fields, based on a given separator or regular expression.

**This middleware is activated by default**, but no extraction is configured by default: it leaves ECs unchanged until the `extract` header is provided.

## Enriched fields

| Name | Type | Description |
| --- | --- | --- |
| destinationFields | String | Custom fields named in the `extract` header, filled with the extracted values. |

## Prerequisites

Your EC must contain the field named by `sourceField` in the `extract` header.

**You must use cut after the filter, parser and deduplicator middlewares.**

## Headers

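+ **extract** : Takes three parameters, ``sourceField``, ``expression`` and ``destinationFields``, in the form ``sourceField=>expression=>destinationFields``. ``expression`` is either a regular expression whose capture groups fill the destination fields, or ``split(<separator>)``.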

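The header format can be sketched as follows. `parseExtract` reuses the pattern from cut/index.js (`/^(.+?)=>(.+?)=>(.+?)$/`). Everything else is an assumption: `split(...)` is treated as a literal separator, and any other expression is treated as a regular expression (with or without surrounding slashes) whose capture groups fill the destination fields in order. `parseExtract`, `extractValues` and `applyExtract` are hypothetical names.

```javascript
'use strict';

// Splits "sourceField=>expression=>destinationFields" into its parts,
// using the same pattern as cut/index.js, and trims surrounding spaces.
function parseExtract(header) {
  const params = /^(.+?)=>(.+?)=>(.+?)$/.exec(header);
  if (!params) {
    throw new Error('Invalid extract expression');
  }
  const [, source, expression, destinations] = params.map((part) => part.trim());
  return {
    source,
    expression,
    destinations: destinations.split(',').map((field) => field.trim()),
  };
}

// Computes the values to store, assuming split(sep) uses a literal
// separator and any other expression is a regex with capture groups.
function extractValues(expression, value) {
  const split = /^split\((.+)\)$/.exec(expression);
  if (split) {
    return value.split(split[1]);
  }
  const literal = /^\/(.+)\/([a-z]*)$/.exec(expression);
  const regex = literal ? new RegExp(literal[1], literal[2]) : new RegExp(expression);
  const match = regex.exec(value);
  return match ? match.slice(1) : [];
}

// Hypothetical middleware step: fills the destination fields of one EC.
function applyExtract(ec, header) {
  const { source, expression, destinations } = parseExtract(header);
  if (typeof ec[source] !== 'string') {
    return ec; // source field missing: nothing to cut
  }
  const values = extractValues(expression, ec[source]);
  destinations.forEach((field, index) => {
    if (values[index] !== undefined) {
      ec[field] = values[index];
    }
  });
  return ec;
}

const result = applyExtract({ email: 'jdoe@example.org' }, 'email => split(@) => identifiant,domainName');
console.log(result.identifiant, result.domainName); // jdoe example.org
```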
## How to use

### ezPAARSE admin interface

You can enable or disable cut by default for all your enrichments. To do this, go to the middleware section of the administration interface.

![image](./docs/admin-interface.png)

### ezPAARSE process interface

You can use cut for an enrichment process.

![image](./docs/process-interface.png)

### ezp

You can use cut for an enrichment process with [ezp](https://github.com/ezpaarse-project/node-ezpaarse) like this:

```bash

# Use with split function: split an email address into an identifier and a domain name

# enrich with one file
ezp process <path of your file> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: cut" \
--header "extract: email => split(@) => identifiant,domainName" \
--out ./result.csv

# enrich with multiple files
ezp bulk <path of your directory> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: cut" \
--header "extract: email => split(@) => identifiant,domainName"

# Use with regex: split a login of the form lastName.firstName into two fields

# enrich with one file
ezp process <path of your file> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: cut" \
--header "extract: login => /^([a-z]+)\.([a-z]+)$/ => lastName,firstName" \
--out ./result.csv

# enrich with multiple files
ezp bulk <path of your directory> \
--host <host of your ezPAARSE instance> \
--settings <settings-id> \
--header "ezPAARSE-Middlewares: cut" \
--header "extract: login => /^([a-z]+)\.([a-z]+)$/ => lastName,firstName"

```

### curl

You can use cut for an enrichment process with curl like this:

```bash
# Use with split function: split an email address into an identifier and a domain name
curl -X POST -v http://localhost:59599 \
-H "ezPAARSE-Middlewares: cut" \
-H "extract: email => split(@) => identifiant,domainName" \
-H "Log-Format-Ezproxy: <line format>" \
-F "file=@<log file path>"

# Use with regex: split a login of the form lastName.firstName into two fields
curl -X POST -v http://localhost:59599 \
-H "ezPAARSE-Middlewares: cut" \
-H "extract: login => /^([a-z]+)\.([a-z]+)$/ => lastName,firstName" \
-H "Log-Format-Ezproxy: <line format>" \
-F "file=@<log file path>"

```
Binary file added cut/docs/admin-interface.png
Binary file added cut/docs/process-interface.png
5 changes: 4 additions & 1 deletion cut/index.js
@@ -8,7 +8,10 @@ module.exports = function () {

if (!header) { return (ec, next) => next(); }

// field=>regex=>a,b,c
// host: 127.0.0.1/16=>([0-9]+)\/([0-9]+)=>host,aze
// host: 127.0.0.1/16=>split(\/)=>host,aze
const params = /^(.+?)=>(.+?)=>(.+?)$/.exec(header);

if (!params) {
const err = new Error('Invalid extract expression');