contrib: dojson drop unknown fields #51

greut · 2015-11-25T08:56:08Z

In #50, a test fails because a subfield is not part of the spec.

the faulty record

<record>
    <datafield tag="852" ind1="8" ind2=" ">
        <subfield code="i">M 314</subfield>
        <subfield code="h">339</subfield>
        <subfield code="b">Library Reading Room</subfield>
        <subfield code="9">10</subfield>
    </datafield>
</record>

the test

expected = create_record(xml)
result = to_marc21.do(marc21.do(expected))
assert expected == result

The spec says nothing about a $9 subfield. But should it remove it nonetheless?
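To make the failure concrete, here is a minimal sketch in plain Python. It does not use dojson. The rule table, the JSON key names, and the helpers `to_json` and `to_marc` are all made up for illustration. It only shows how a subfield without a rule can vanish during a round trip, which is what makes the assertion above fail.

```python
# Hypothetical rule table for field 852: MARC subfield code -> JSON key.
# Only subfields named in the spec are listed; "9" is deliberately absent.
RULES_852 = {
    "b": "sublocation_or_collection",
    "h": "classification_part",
    "i": "item_part",
}


def to_json(subfields):
    """Translate known subfields; anything without a rule is dropped."""
    return {
        RULES_852[code]: value
        for code, value in subfields.items()
        if code in RULES_852
    }


def to_marc(data):
    """Reverse translation using the inverted rule table."""
    reverse = {name: code for code, name in RULES_852.items()}
    return {reverse[name]: value for name, value in data.items()}


# The subfields of the faulty record, as a plain dict.
record = {"i": "M 314", "h": "339", "b": "Library Reading Room", "9": "10"}

round_trip = to_marc(to_json(record))
lost = sorted(set(record) - set(round_trip))
print(lost)
```

In this toy version the comparison `record == round_trip` is false because `$9` never reaches the JSON side.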


jirikuncar · 2015-11-25T09:08:42Z

@greut you probably want "very liberal MARC" see #26

egabancho · 2015-12-08T12:34:23Z

@greut I don't think we should remove unknown fields. An unknown field actually means that the translation for your data model is incomplete, and therefore your tests should fail.

I think the question here is whether or not the liberal MARC21 implementation should add to the output JSON all the unknown fields. IMHO it should.

Just a side note, AFAIK the subfield $9 is used as a custom subfield, like 9xx or 69x, so in theory the definition of this subfield should go in your DoJSON package. We have a couple of cases like this now on CDS, i.e. https://github.com/CERNDocumentServer/cds-dojson/blob/master/cds_dojson/marc21/fields/default/bd7xx.py#L85

greut · 2015-12-08T16:31:41Z

Thanks @egabancho, lemme invoke the MARC lord @blixhavn. The silent removal looks like a recipe for trouble.

tiborsimko · 2015-12-17T14:36:53Z

Similarly to @egabancho I think the test should fail in this case (and not drop the field silently), because the input record is not compatible with the desired schema.

There are basically two solutions: either (1) stick to standard MARC, which does not seem to allow $9 in that field(?), or (2) amend the wanted schema to allow the presence of $9, moving towards a MARC schema as "liberal" as the given library instance needs.

@aw-bib, can you advise whether $9 has any specific conventional meaning across multiple fields in the MARC standard?
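The two options can be sketched side by side in plain Python. This is an illustration under stated assumptions, not dojson code: the rule tables, the `translate` helper, and the JSON key `local_subfield_9` are invented for the example.

```python
# Hypothetical rule tables: MARC subfield code -> JSON key for field 852.
STANDARD_852 = {
    "b": "sublocation_or_collection",
    "h": "classification_part",
    "i": "item_part",
}

# Option (2): a site-specific schema that additionally accepts $9.
# The JSON key "local_subfield_9" is invented for this example.
LOCAL_852 = dict(STANDARD_852)
LOCAL_852["9"] = "local_subfield_9"


def translate(subfields, rules):
    """Translate subfields, refusing input the rule table does not cover."""
    unknown = sorted(set(subfields) - set(rules))
    if unknown:
        raise ValueError("no rule for subfield(s): " + ", ".join(unknown))
    return {rules[code]: value for code, value in subfields.items()}


record = {"i": "M 314", "h": "339", "b": "Library Reading Room", "9": "10"}

# Option (1): the standard table rejects the record instead of dropping $9.
try:
    translate(record, STANDARD_852)
    rejected = False
except ValueError:
    rejected = True

# Option (2): the amended table keeps every subfield.
converted = translate(record, LOCAL_852)
```

Either way no data disappears silently: the record is either refused loudly or converted in full.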

greut · 2015-12-17T15:23:56Z

@tiborsimko how easy would it be to amend/edit the schema for a particular purpose? E.g. the 9xx fields that are used internally by invenio.

If you provide a liberal MARC, people will use this one because losing data is a bit of a no-go.

tiborsimko · 2015-12-17T15:50:22Z

The best example for 9xx so far is @egabancho's customisations for CERN-specific 9xx fields. I fully agree with you that losing data is not acceptable, hence we should return an error, not silently ignore those "extra" fields. The error will hopefully prompt the site admins to amend the local schema rules to fit local cataloguing practices... or else switch to a liberal schema like #26 that is very permissive on the input side. (At the price of losing some precise JSON-Schema-generated editing forms.) Many sites have been calling for a more permissive schema, see the long discussion in #23, so perhaps we should get to implementing it sooner rather than later?

* NEW Adds new keyword argument `ignore_missing` to `Overdo.do` method to specify if method should raise `MissingRule` exception when there is no matching rule for a key.
* NEW Adds new CLI option `--strict` to the `do` command that sets the `ignore_missing` argument to `False`. (closes inveniosoftware#51)

Reviewed-by: Tibor Simko <tibor.simko@cern.ch>
Signed-off-by: Jiri Kuncar <jiri.kuncar@cern.ch>
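The behaviour described in this commit message can be sketched with a toy stand-in. This is not the dojson implementation: `ToyOverdo` and the local `MissingRule` class are hypothetical, and the default `ignore_missing=True` is an assumption inferred from the fact that `--strict` is described as setting the argument to `False`.

```python
class MissingRule(Exception):
    """Stand-in for the exception named in the commit message."""


class ToyOverdo:
    """Hypothetical, simplified translator: maps input keys to output keys."""

    def __init__(self, rules):
        self.rules = rules

    def do(self, blob, ignore_missing=True):
        output = {}
        for key, value in blob.items():
            if key not in self.rules:
                if ignore_missing:
                    continue  # lenient mode: skip keys without a rule
                raise MissingRule(key)  # strict mode: refuse the record
            output[self.rules[key]] = value
        return output


model = ToyOverdo({"245": "title"})
blob = {"245": "A title", "999": "local data"}

# Lenient call: the unknown key is skipped.
lenient = model.do(blob)

# Strict call: the unknown key raises instead of being dropped.
try:
    model.do(blob, ignore_missing=False)
    strict_error = None
except MissingRule as exc:
    strict_error = str(exc)
```

In the strict call, the key that has no rule is reported through the exception, which is the signal the thread above asks for in place of a silent drop.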

jirikuncar changed the title ~~dojson drop unknown fields~~ contrib: dojson drop unknown fields Nov 25, 2015

jirikuncar added this to the 0.5.0 milestone Nov 25, 2015

jirikuncar modified the milestones: someday, 0.5.0 Nov 25, 2015

jirikuncar assigned greut Nov 25, 2015

jirikuncar mentioned this issue Dec 17, 2015

global: addition of ignore missing option #58

Merged

tiborsimko closed this as completed in #58 Dec 18, 2015
