Add FileMap options to put_filemap() Fix function. #266

Merged: 6 commits, Nov 18, 2022
Changes from 5 commits
12 changes: 11 additions & 1 deletion README.md
@@ -177,7 +177,7 @@ nothing()

##### `put_filemap`

-Defines an external map for lookup from a file.
+Defines an external map for lookup from a file. Multi-column maps are supported.

```perl
put_filemap("<sourceFile>", "<mapName>", sep_char: "\t")
```

@@ -190,6 +190,16 @@ The separator (`sep_char`) will vary depending on the source file, e.g.:
| CSV | `,` or `;` |
| TSV | `\t` |

Additional options:

- `key_column` defines the column to be used for keys. The index is zero-based. Default value: `0`.
- `value_column` defines the column to be used for values. The index is zero-based. Default value: `1`.
- `expected_columns` sets the number of expected columns; lines with a different number of columns are ignored. Set to `-1` to disable the check and allow an arbitrary number of columns. Default value: `2`.
- `allow_empty_values` sets whether to allow empty values in the filemap or to ignore these entries. Default value: `false`.
- `compression` sets the compression of the file.
- `decompress_concatenated` flags whether to decompress concatenated compressed files.
- `encoding` sets the encoding used to open the resource.
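
To make the interaction of the column options concrete, the following Java sketch mimics how a single line of the source file could be turned into a map entry. It is an illustration only: `FileMapLineSketch` and `parseLine` are hypothetical names, and the real `FileMap` in Metafacture may differ in details, for example in how out-of-range column indexes are treated (skipping such lines is an assumption made here).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

final class FileMapLineSketch {

    private FileMapLineSketch() {
    }

    // Hypothetical helper, not part of Metafacture: returns the key/value pair
    // selected from one line, or an empty Optional if the line is skipped.
    static Optional<Map.Entry<String, String>> parseLine(final String line, final String separator,
            final int keyColumn, final int valueColumn, final int expectedColumns,
            final boolean allowEmptyValues) {
        // Quote the separator so it is matched literally; limit -1 keeps trailing empty columns.
        final String[] columns = line.split(Pattern.quote(separator), -1);

        // expected_columns: a negative value disables the column-count check.
        if (expectedColumns >= 0 && columns.length != expectedColumns) {
            return Optional.empty();
        }

        // Assumption of this sketch: lines that lack the requested columns are skipped.
        if (keyColumn < 0 || valueColumn < 0 || keyColumn >= columns.length || valueColumn >= columns.length) {
            return Optional.empty();
        }

        // allow_empty_values: empty values are dropped unless explicitly allowed.
        final String value = columns[valueColumn];
        if (value.isEmpty() && !allowEmptyValues) {
            return Optional.empty();
        }

        return Optional.of(new SimpleEntry<>(columns[keyColumn], value));
    }

    public static void main(final String[] args) {
        final String line = "rvk\tRVK (Regensburger Verbundklassifikation)\thttps://d-nb.info/gnd/4449787-8";

        // With the default expected_columns of 2, this three-column line is ignored.
        System.out.println(parseLine(line, "\t", 0, 1, 2, false).isPresent());

        // With expected_columns set to 3, columns 1 and 2 become key and value.
        System.out.println(parseLine(line, "\t", 1, 2, 3, false).isPresent());
    }

}
```

The sketch shows why a multi-column file needs either a matching `expected_columns` value or `-1`: with the default of `2`, every three-column line would be dropped before `key_column` and `value_column` are even consulted.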

##### `put_map`

Defines an internal map for lookup from key/value pairs.
3 changes: 2 additions & 1 deletion build.gradle
@@ -34,13 +34,14 @@ subprojects {
ext {
versions = [
'ace': '1.3.3',
'antlr': '3.2',
'equalsverifier': '3.8.2',
'jackson': '2.13.3',
'jetty': '9.4.14.v20181114',
'jquery': '3.3.1-1',
'junit_jupiter': '5.8.2',
'junit_platform': '1.4.2',
-'metafacture': '5.4.0',
+'metafacture': 'metafacture-core-5.4.1-rc1',
'mockito': '2.27.0',
'requirejs': '2.3.6',
'slf4j': '1.7.21',
4 changes: 4 additions & 0 deletions metafix-runner/build.gradle
@@ -9,6 +9,10 @@ dependencies {
implementation "org.metafacture:metafacture-json:${versions.metafacture}"
implementation "org.metafacture:metafacture-runner:${versions.metafacture}"
implementation "org.metafacture:metafacture-xml:${versions.metafacture}"

implementation('org.antlr:antlr-runtime') {
version { strictly versions.antlr }
}
}

application {
9 changes: 9 additions & 0 deletions metafix/src/main/java/org/metafacture/metafix/FixMethod.java
@@ -75,6 +75,15 @@ public void apply(final Metafix metafix, final Record record, final List<String>
fileMap.setSeparator(options.getOrDefault(FILEMAP_SEPARATOR_OPTION, FILEMAP_DEFAULT_SEPARATOR));
fileMap.setFile(metafix.resolvePath(fileName));

withOption(options, "allow_empty_values", fileMap::setAllowEmptyValues, this::getBoolean);
withOption(options, "compression", fileMap::setCompression);
withOption(options, "decompress_concatenated", fileMap::setDecompressConcatenated, this::getBoolean);
withOption(options, "encoding", fileMap::setEncoding);
withOption(options, "expected_columns", fileMap::setExpectedColumns, this::getInteger);
withOption(options, "ignore_pattern", fileMap::setIgnorePattern);
withOption(options, "key_column", fileMap::setKeyColumn, this::getInteger);
withOption(options, "value_column", fileMap::setValueColumn, this::getInteger);

metafix.putMap(params.size() > 1 ? params.get(1) : fileName, fileMap);
}
},
@@ -24,6 +24,7 @@
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Stream;

@@ -33,15 +34,23 @@ public interface FixFunction {
void apply(Metafix metafix, Record record, List<String> params, Map<String, String> options);

default void withOption(final Map<String, String> options, final String key, final Consumer<String> consumer) {
withOption(options, key, consumer, Map::get);
}

default <T> void withOption(final Map<String, String> options, final String key, final Consumer<T> consumer, final BiFunction<Map<String, String>, String, T> function) {
if (options.containsKey(key)) {
-consumer.accept(options.get(key));
+consumer.accept(function.apply(options, key));
}
}

default boolean getBoolean(final Map<String, String> options, final String key) {
return Boolean.parseBoolean(options.get(key));
}

default int getInteger(final Map<String, String> options, final String key) {
return Integer.parseInt(options.get(key));
}

default int getInteger(final List<String> params, final int index) {
return Integer.parseInt(params.get(index));
}
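
The generic `withOption` overload can be tried in isolation. The following is a minimal sketch, assuming a stand-in `Settings` class in place of the real `FileMap` and static methods in place of the interface defaults; `WithOptionSketch`, `Settings` and `configure` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Consumer;

final class WithOptionSketch {

    private WithOptionSketch() {
    }

    // Hypothetical stand-in for the object being configured; not the real FileMap.
    static final class Settings {
        int keyColumn;
        boolean allowEmptyValues;
        String encoding = "UTF-8";

        void setKeyColumn(final int keyColumn) {
            this.keyColumn = keyColumn;
        }

        void setAllowEmptyValues(final boolean allowEmptyValues) {
            this.allowEmptyValues = allowEmptyValues;
        }

        void setEncoding(final String encoding) {
            this.encoding = encoding;
        }
    }

    // String options are passed through unchanged.
    static void withOption(final Map<String, String> options, final String key, final Consumer<String> consumer) {
        withOption(options, key, consumer, Map::get);
    }

    // Typed options are converted by the given function before reaching the setter.
    static <T> void withOption(final Map<String, String> options, final String key, final Consumer<T> consumer,
            final BiFunction<Map<String, String>, String, T> function) {
        if (options.containsKey(key)) {
            consumer.accept(function.apply(options, key));
        }
    }

    static boolean getBoolean(final Map<String, String> options, final String key) {
        return Boolean.parseBoolean(options.get(key));
    }

    static int getInteger(final Map<String, String> options, final String key) {
        return Integer.parseInt(options.get(key));
    }

    // Applies only the options that are present; absent keys leave the defaults untouched.
    static Settings configure(final Map<String, String> options) {
        final Settings settings = new Settings();
        withOption(options, "allow_empty_values", settings::setAllowEmptyValues, WithOptionSketch::getBoolean);
        withOption(options, "encoding", settings::setEncoding);
        withOption(options, "key_column", settings::setKeyColumn, WithOptionSketch::getInteger);
        return settings;
    }

    public static void main(final String[] args) {
        final Map<String, String> options = new HashMap<>();
        options.put("key_column", "1");
        options.put("allow_empty_values", "true");

        final Settings settings = configure(options);
        System.out.println(settings.keyColumn + " " + settings.allowEmptyValues + " " + settings.encoding);
    }

}
```

Two properties of the standard library calls are worth keeping in mind with this pattern: `Boolean.parseBoolean` yields `false` for any value other than `"true"` (ignoring case), while `Integer.parseInt` throws a `NumberFormatException` for non-numeric input, so a mistyped integer option fails loudly whereas a mistyped boolean option is silently read as `false`.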
@@ -0,0 +1,8 @@
{
"name" : "RVK (Regensburger Verbundklassifikation)",
"id" : "https://d-nb.info/gnd/4449787-8"
}
{
"name" : "ZDB-Systematik",
"id" : "http://bartoc.org/en/node/18915"
}
@@ -0,0 +1,6 @@
{
"name": "rvk"
}
{
"name": "zdbs"
}
@@ -0,0 +1,3 @@
rvk	RVK (Regensburger Verbundklassifikation)	https://d-nb.info/gnd/4449787-8
udc	UDC (Universal Decimal Classification)	https://d-nb.info/gnd/4114037-0
zdbs	ZDB-Systematik	http://bartoc.org/en/node/18915
@@ -0,0 +1,6 @@
put_filemap("./mapfile.tsv", "idLookup", sep_char:"\t",key_column:"1",value_column:"2",expected_columns:"3")
put_filemap("./mapfile.tsv", "nameLookup", sep_char:"\t",expected_columns:"-1")

lookup("name", "nameLookup")
copy_field("name","id")
lookup("id", "idLookup")
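
The Fix above chains two lookups over the same three-column file: `nameLookup` maps column 0 to column 1, and `idLookup` maps column 1 to column 2. The Java sketch below replays that chain on plain hash maps so the expected output can be checked by hand. It is a simplified model, not Metafix itself: `LookupChainSketch`, `load` and `apply` are hypothetical names, and values missing from a map are not handled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class LookupChainSketch {

    // The three lines of the map file, with tab-separated columns.
    private static final String[] LINES = {
        "rvk\tRVK (Regensburger Verbundklassifikation)\thttps://d-nb.info/gnd/4449787-8",
        "udc\tUDC (Universal Decimal Classification)\thttps://d-nb.info/gnd/4114037-0",
        "zdbs\tZDB-Systematik\thttp://bartoc.org/en/node/18915"
    };

    private LookupChainSketch() {
    }

    // Builds a map from the chosen key and value columns (zero-based).
    static Map<String, String> load(final int keyColumn, final int valueColumn) {
        final Map<String, String> map = new HashMap<>();
        for (final String line : LINES) {
            final String[] columns = line.split("\t", -1);
            map.put(columns[keyColumn], columns[valueColumn]);
        }
        return map;
    }

    // Replays the Fix steps for a record whose "name" field holds the given value.
    static Map<String, String> apply(final String name) {
        final Map<String, String> nameLookup = load(0, 1); // defaults: key_column 0, value_column 1
        final Map<String, String> idLookup = load(1, 2);   // key_column:"1", value_column:"2"

        final Map<String, String> record = new LinkedHashMap<>();
        record.put("name", nameLookup.get(name));          // lookup("name", "nameLookup")
        record.put("id", record.get("name"));              // copy_field("name","id")
        record.put("id", idLookup.get(record.get("id")));  // lookup("id", "idLookup")
        return record;
    }

    public static void main(final String[] args) {
        System.out.println(apply("rvk"));
        System.out.println(apply("zdbs"));
    }

}
```

For the two input records `rvk` and `zdbs`, the resulting `name` and `id` values correspond to the expected JSON output included in this change.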
@@ -0,0 +1,8 @@
FLUX_DIR + "input.json"
|open-file
|as-records
|decode-json
|fix(FLUX_DIR + "test.fix")
|encode-json(prettyPrinting="true")
|write(FLUX_DIR + "output-metafix.json")
;