11 changes: 5 additions & 6 deletions 01_Getting_Metafacture.md
@@ -8,17 +8,16 @@ It was initially developed by DNB starting in 2011 and is maintained since 2019
Metafacture can be used as a stand-alone application or as a Java library in other applications.
The name Metafacture is a portmanteau of the words metadata and manufacture.

In this tutorial we are going to teach how to use Metafacture to peform simple and advanced data processing tasks.
In this tutorial we are going to teach how to use Metafacture to perform simple and advanced data processing tasks.

At the beginning we will use the web application [Metafacture Playground](https://metafacture.org/playground/). So no
installation is needed. The Playground is a web interface that helps you get started.
It is useful to test, share and export metafacture workflows.
It is useful to test, share and export Metafacture workflows.

Starting with [Chapter 6](https://github.com/metafacture/metafacture-tutorial/blob/main/06_MetafactureCLI.md)
we can switch from using Playground to running Metafacture on our own Hardware.
But the examples are still provided in the playground.
Starting with [Chapter 6](./06_MetafactureCLI.md) we can switch from using Playground to running Metafacture on our own hardware.
But the examples are still provided in the Playground.

To run Metafacture on your local maschine you need you need a Linux/Unix Bash Shell (part of every Linux, MacOS and Windows >=10) with Metafacture Core installed. In this course we are not teaching you how to use the command line. For that see:
To run Metafacture on your local machine you need a Linux/Unix Bash Shell (part of every Linux, MacOS and Windows >=10) with Metafacture Core installed. In this course we are not teaching you how to use the command line. For that see: [Chapter 6](./06_MetafactureCLI.md)


**Next lesson**: [02 Introduction into Metafacture Flux](./02_Introduction_into_Metafacture-Flux.md)
77 changes: 39 additions & 38 deletions 02_Introduction_into_Metafacture-Flux.md

Large diffs are not rendered by default.

71 changes: 38 additions & 33 deletions 03_Introduction_into_Metafacture-Fix.md
@@ -1,19 +1,19 @@
# Lesson 3: Introduction into Metafacture Fix

In the last session we learned about Flux-Moduls.
Flux-Moduls can do a lot of things. They configure the the "high-level" transformation pipeline.
In the last session we learned about Flux modules.
Flux modules can do a lot of things. They configure the "high-level" transformation pipeline.

But the main transformation of incoming data at record-, elemenet- and value-level is usually done by the transformation moduls: `fix` or `morph` as one step in the pipeline.
But the main transformation of incoming data at record, element and value level is usually done by the transformation modules [Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix) or [Morph](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#morph) as one step in the pipeline.

What do we mean when we talk about transformation, e.g.:
By transformation we mean things like:

* Manipulating element-names and element-values
* Manipulating element names and element values
* Changing hierarchies and structures of records
* Lookup values in concordance list.
* Looking up values in a concordance list

This does not include changing the serialization, which is part of encoding and decoding.

In this tutorial we focus on Fix. If you want to learn about Morph have a look at https://slides.lobid.org/metafacture-2020/#/
In this tutorial we focus on [Fix](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html#fix). If you want to learn about Morph have a look [at this presentation](https://slides.lobid.org/metafacture-2020/#/) and the [great documentation by Swiss Bib](https://sschuepbach.github.io/metamorph-hacks/).


## Metafacture Fix and Fix Functions
@@ -22,7 +22,7 @@ So let's dive into Metafacture Fix and get back to the [Playground](https://meta

Clear it if needed and paste the following Flux in the Flux-File area.

```
```default
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
@@ -45,14 +45,14 @@ The `fix` module in Metafacture is used to manipulate the input data filtering f
HINT: As long as you embed the Fix functions in the Flux workflow, you have to use double quotes to fence the Fix functions,
and single quotes inside the Fix functions, as we did here: `fix ("retain('title')")`

Now let us additionally keep the info that is given in the element `"publish_date"` and in the subfield `"key"` as well as the subfield `"key"` in `'type'` by adding `'publish_date', 'type.key'` to `retain`:
Now let us additionally keep the info that is given in the element `"publish_date"` and the subfield `"key"` in `'type'` by adding `'publish_date', 'type.key'` to `retain`:

```
```default
"https://openlibrary.org/books/OL2838758M.json"
| open-http
| as-lines
| decode-json
| fix ("retain('title', 'publish_date', 'type.key')")
| fix ("retain('title', 'publish_date', 'notes.value', 'type.key')")
| encode-yaml
| print
;
@@ -64,15 +64,15 @@ You should now see something like this:
---
title: "Ordinary vices"
publish_date: "1984"
type:
  key: "/type/edition"
notes:
  value: "Bibliography: p. 251-260.\nIncludes index."

```
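To make the effect of `retain` with dotted paths such as `'notes.value'` more concrete, here is a minimal Python sketch that imitates the behaviour described above on a plain dictionary. It is only an analogy, not Metafacture's implementation: the helper names `get_path` and `retain`, and the field `unused_example_field`, are hypothetical and exist only for this illustration.

```python
_MISSING = object()  # sentinel for "path not present in the record"


def get_path(record, path):
    """Follow a dotted path like 'notes.value' through nested dicts."""
    node = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def retain(record, *paths):
    """Return a new record that keeps only the given dotted paths."""
    kept = {}
    for path in paths:
        value = get_path(record, path)
        if value is _MISSING:
            continue  # nothing to keep for this path
        *parents, leaf = path.split(".")
        target = kept
        for key in parents:
            # Recreate the parent elements so the nesting is preserved.
            target = target.setdefault(key, {})
        target[leaf] = value
    return kept


record = {
    "title": "Ordinary vices",
    "publish_date": "1984",
    "notes": {"value": "Bibliography: p. 251-260.\nIncludes index."},
    "type": {"key": "/type/edition"},
    "unused_example_field": "not retained",  # hypothetical extra element
}

kept = retain(record, "title", "publish_date", "notes.value", "type.key")
print(sorted(kept))
```

Everything that is not named in the call, like the hypothetical `unused_example_field`, is missing from `kept`, while the nested elements keep their parent element.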

When manipulating data you often need to create many fixes to process a data file in the format and structure you need. With a text editor you can write all fix functions in a singe separate fix-file.
When manipulating data you often need to create many fixes to process a data file in the format and structure you need. With a text editor you can write all Fix functions in a single separate Fix file.

The playground has an transformationFile-content area that can be used as if the fix is in a separate file.
In the playground we use the variable `transformationFile` to adress the fix file in the playground.
The Playground has a transformationFile-content area that can be used as if the Fix is in a separate file.
In the Playground we use the variable `transformationFile` to address the Fix file.

Like this.

@@ -81,23 +81,23 @@ Like this.
Fix:

```PERL
retain("title", "publish_date", "type.key")
retain("title", "publish_date", "notes.value", "type.key")
```

With this separate fix-file it will be a bit easier to write many fix-functions and it does not overcrowd the flux-workflow.
Using a separate Fix file is recommended if you need to write many Fix functions. It will keep the Flux workflow clear and legible.

To add more fixes we can again edit the fix file.
To add more fixes we can again edit the Fix file.
Let's add this line in front of the `retain` function:

```
move_field("type.key", "pub_type")
```

Also change the `retain` function, so that you keep the new element `"pub_type"` instead of the not existing nested `"key"` element.
Also change the `retain` function so that you keep the new element `"pub_type"` instead of the nested `"key"` element, which no longer exists.

```
move_field("type.key","pub_type")
retain("title", "publish_date", "pub_type")
retain("title", "publish_date", "notes.value", "pub_type")
```

The output should be something like this:
@@ -107,40 +107,45 @@ The output should be something like this:
title: "Ordinary vices"
publish_date: "1984"
pub_type: "/type/edition"
notes:
  value: "Bibliography: p. 251-260.\nIncludes index."
```

So with `move_field` we moved and renamed an existing element.
With `move_field` we moved and renamed an existing element.
As a next step, add the following function before the `retain` function.

```
replace_all("pub_type","/type/","")
```

If you execute your last workflow with the Process-Button again, you should now see as ouput:
If you execute your last workflow with the "Process" button again, you should now see this output:

```YAML
---
title: "Ordinary vices"
publish_date: "1984"
pub_type: "edition"
notes:
  value: "Bibliography: p. 251-260.\nIncludes index."
```

We cleaned up the `"pub_type"` element, so that we can better read it.
We cleaned up the value of the `"pub_type"` element for better readability.
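The two steps `move_field` and `replace_all` can also be pictured as operations on a nested dictionary. The following Python sketch is an analogy under stated assumptions, not how Metafacture works internally: the Python functions only borrow the names of the Fix functions, the target of the move is assumed to be a top-level element, and treating the search string as a regular expression is an assumption of this sketch.

```python
import re


def move_field(record, source, target):
    """Move the value at dotted path `source` to the top-level key `target`."""
    *parents, leaf = source.split(".")
    node = record
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return  # source path is absent, nothing to move
    if leaf in node:
        record[target] = node.pop(leaf)


def replace_all(record, field, pattern, replacement):
    """Replace every match of `pattern` in the string value of `field`."""
    if isinstance(record.get(field), str):
        record[field] = re.sub(pattern, replacement, record[field])


record = {"title": "Ordinary vices", "type": {"key": "/type/edition"}}

move_field(record, "type.key", "pub_type")      # type.key becomes pub_type
replace_all(record, "pub_type", "/type/", "")   # strip the "/type/" prefix

print(record["pub_type"])  # prints "edition"
```

The order matters here in the same way as in the Fix file: `replace_all` addresses `pub_type`, so it has to run after the element has been moved.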

[See the example in the playground.](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22pub_type%22%29)

Metafacture contains many fix function to manipulate data. Also there are many flux commands/modules that can be used.
Metafacture contains many Fix functions to manipulate data. Also there are many Flux commands/modules that can be used.

Check the documentation to get a complete list of [flux command](https://github.com/metafacture/metafacture-documentation/blob/master/flux-commands.md) and [fix functions](https://github.com/metafacture/metafacture-documentation/blob/master/Fix-function-and-Cookbook.md#functions). This post only presented a short introduction into Metafacture. In the next posts we will go deeper into its capabilities.
Check the documentation to get a complete list of [Flux commands](https://metafacture.github.io/metafacture-documentation/docs/flux/flux-commands.html) and [Fix functions](https://metafacture.github.io/metafacture-documentation/docs/fix/Fix-functions.html). This post only presented a short introduction into Metafacture. In the next posts we will go deeper into its capabilities.

Besides fix functions you can also add as many comments and linebreaks as you want to a fix.
Besides Fix functions you can also add as many comments and linebreaks as you want to a Fix.

Comments are good if you want to add descriptions to you transformation. Like the following.
Comments in Fix start with a hashtag `#`, while in Flux they start with `//`
Adding comments will save you a lot of time and effort when you look at your code in the future.

e.g.:
Comments in Fix start with a hash mark `#`, while in Flux they start with `//`.

```
Example:

```PERL
# Make type.key a top level element.
move_field("type.key","pub_type")

@@ -162,9 +167,9 @@ Have a look at the fix functions: https://metafacture.org/metafacture-documentat

<details>
<summary>Answer</summary>
[See here](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0Aadd_field%28%22mape_date%22%2C%222025-11-11%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%29)
[See here](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0Aadd_field%28%22map_date%22%2C%222025-11-11%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%2C+%22map_date%22%29)

or [use timestamp](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0Atimestamp%28%22mape_date%22%2Cformat%3A%22yyyy-MM-dd%27T%27HH%3Amm%3Ass%22%2C+timezone%3A%22Europe/Berlin%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%29)
or [use timestamp](https://metafacture.org/playground/?flux=%22https%3A//openlibrary.org/books/OL2838758M.json%22%0A%7C+open-http%0A%7C+as-lines%0A%7C+decode-json%0A%7C+fix+%28transformationFile%29%0A%7C+encode-yaml%0A%7C+print%0A%3B&transformation=move_field%28%22type.key%22%2C%22pub_type%22%29%0Areplace_all%28%22pub_type%22%2C%22/type/%22%2C%22%22%29%0Atimestamp%28%22map_date%22%2Cformat%3A%22yyyy-MM-dd%27T%27HH%3Amm%3Ass%22%2C+timezone%3A%22Europe/Berlin%22%29%0Aretain%28%22title%22%2C+%22publish_date%22%2C+%22by_statement%22%2C+%22pub_type%22%2C+%22map_date%22%29)
</details>

Next lesson: [04 Fix Path](./04_FIX-Path.md)