## Assignment: MARC21 to Dublin Core conversion for OAI
#### Use BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/

__[The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)](https://www.openarchives.org/pmh/)__ or OAI for short:

> is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP.

At Anet, for instance, we provide full OAI access to our complete database of books. Like so:

(MARC21)

```
https://anet.be/oai/catgeneric/server.phtml?verb=GetRecord&metadataPrefix=marc21&identifier=c:lvd:123456 
```


(__[MODS](https://en.wikipedia.org/wiki/Metadata_Object_Description_Schema)__)

```
https://anet.be/oai/catgeneric/server.phtml?verb=GetRecord&metadataPrefix=mods&identifier=c:lvd:123456
```

In these examples, the trailing `c:lvd:` number is a unique Library Object Identifier (LOI) used by our LMS __[Brocade](https://en.wikipedia.org/wiki/Brocade_Library_Services)__. You can replace it with any LOI you find in our __[OPAC](https://anet.uantwerpen.be/desktop/uantwerpen/opacuantwerpen/E)__.
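
The two request URLs above differ only in their `metadataPrefix`, so they can be built from their parts. The following is a minimal sketch: the function name `oai_getrecord_url` and the constant `OAI_BASE` are made up for this illustration, and only the base URL and the parameter names come from the examples above.

```python
from urllib.parse import urlencode

# Base URL taken from the example requests above
OAI_BASE = "https://anet.be/oai/catgeneric/server.phtml"


def oai_getrecord_url(loi, metadata_prefix="marc21"):
    """Build a GetRecord request URL for a LOI such as 'c:lvd:123456'."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": loi,
    }
    # safe=":" keeps the colons in the LOI as-is instead of encoding them as %3A
    return OAI_BASE + "?" + urlencode(params, safe=":")


print(oai_getrecord_url("c:lvd:123456"))
print(oai_getrecord_url("c:lvd:123456", metadata_prefix="mods"))
```

Building the query string with `urlencode` rather than by string concatenation also takes care of escaping, should an identifier ever contain characters that are not allowed in a URL.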

Typically, libraries use the OAI protocol to import and export metadata in different formats. So when setting up an OAI server, one of the main tasks is writing software that converts data from one standard to another. Library management systems, for instance, need such conversions both to feed an OAI server from their own database repository and, vice versa, to harvest data from external repositories and convert it to the standard(s) they use.

According to the standards specifications, all implementations of OAI-PMH must support representing metadata in Dublin Core, so your assignment will be to write a metadata converter that is able to harvest MARC21 metadata (XML) and convert that to Dublin Core (XML). It should be a Python command line application that asks for a LOI number (e.g. `c:lvd:123456`), uses OAI to harvest the MARC21 metadata and then writes the Dublin Core conversion to a file (e.g. `123456.xml`).

In short: write a Python application that asks for one of these LOI (`c:lvd:`) numbers, harvests the corresponding metadata as XML from the OAI server, and then translates it (with your own code) from MARC21 to Dublin Core. You may limit yourself to the "unqualified" table of the crosswalk and skip the Leader field.
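
One possible way to structure such an application is sketched below. All names are hypothetical: `harvest` and `convert` stand for the two functions you still have to write yourself, and the check that the LOI ends in digits is an assumption based on the example `c:lvd:123456`.

```python
def output_filename(loi):
    """Map a LOI such as 'c:lvd:123456' to a file name such as '123456.xml'."""
    number = loi.strip().rsplit(":", 1)[-1]
    # Assumption: the part after the last colon consists of digits only
    if not number.isdigit():
        raise ValueError(f"unexpected LOI: {loi!r}")
    return number + ".xml"


def main(harvest, convert):
    """Ask for a LOI, harvest MARC21, convert it, and write the result.

    `harvest` and `convert` are placeholders for your own functions:
    harvest(loi) returns MARC21 XML as bytes,
    convert(marc_bytes) returns Dublin Core XML as bytes.
    """
    loi = input("LOI (e.g. c:lvd:123456): ").strip()
    dc_xml = convert(harvest(loi))
    with open(output_filename(loi), "wb") as outfile:
        outfile.write(dc_xml)


print(output_filename("c:lvd:123456"))
```

Keeping the harvesting, the conversion and the file handling in separate functions makes each step easy to test on its own.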

### Tips

- You can use the Library of Congress __[MARC to Dublin Core Crosswalk](https://www.loc.gov/marc/marc2dc.html)__. You may limit yourself to the fields mentioned in the "unqualified" table and skip the "Leader" field. You will find the meaning of the various codes (`a`, `c`, etcetera) in the MARC specification, but you can limit yourself to code `a`, unless the crosswalk explicitly mentions other codes (e.g. `260` = `Publisher`).
- The Python `lxml` library is well-suited to both parse (MARC21) and generate (Dublin Core) XML.
- If you are not yet familiar with XML namespaces, you will need to read up on them. This __[tutorial from w3schools](https://www.w3schools.com/xml/xml_namespaces.asp)__ and the namespace section of the __[lxml tutorial](https://lxml.de/tutorial.html#namespaces)__ are good starting points.
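
To make these tips concrete, here is a rough sketch of a crosswalk-driven conversion with `lxml`. It is an illustration under assumptions, not a reference solution: the `SAMPLE` record is invented, the `CROSSWALK` list covers only a few fields and should be checked against the Library of Congress table, and joining several subfields with a space is a simplification. The `oai_dc` and `dc` namespace URIs are the ones defined by the OAI-PMH and Dublin Core specifications.

```python
from lxml import etree

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Partial, illustrative crosswalk: (MARC tag, subfield codes, DC element)
CROSSWALK = [
    ("100", ("a",), "creator"),
    ("245", ("a",), "title"),
    ("260", ("a", "b"), "publisher"),
    ("260", ("c",), "date"),
]

# Invented sample record, using the same default namespace as the server response
SAMPLE = b"""<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An example title</subfield></datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Antwerp</subfield>
    <subfield code="b">Example Press</subfield>
    <subfield code="c">1932</subfield>
  </datafield>
</record>"""


def marc_to_dc(marc_xml):
    """Convert MARC21 XML (bytes) to an oai_dc document (bytes)."""
    marc = etree.fromstring(marc_xml)
    # nsmap declares the prefixes that appear in the serialized output
    root = etree.Element(
        "{%s}dc" % OAI_DC_NS, nsmap={"oai_dc": OAI_DC_NS, "dc": DC_NS}
    )
    for tag, codes, dc_name in CROSSWALK:
        # "{*}" matches the element name in any namespace
        for field in marc.iter("{*}datafield"):
            if field.get("tag") != tag:
                continue
            values = [
                sub.text
                for sub in field.iter("{*}subfield")
                if sub.get("code") in codes and sub.text
            ]
            if values:
                child = etree.SubElement(root, "{%s}%s" % (DC_NS, dc_name))
                child.text = " ".join(values)
    return etree.tostring(
        root, pretty_print=True, xml_declaration=True, encoding="UTF-8"
    )


print(marc_to_dc(SAMPLE).decode("UTF-8"))
```

Storing the crosswalk as data rather than as a chain of `if` statements means that supporting another MARC field only takes one extra line.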

In [None]:
import urllib.request

def marc_query(loi):
    # marclink is the GetRecord base URL ending in "identifier=c:lvd:"
    marcurl = marclink + loi
    with urllib.request.urlopen(marcurl) as query:
        return query.read()

In [53]:
from lxml import etree
from urllib import request
from bs4 import BeautifulSoup

#the GetRecord url, without the LOI number
loi = "123456"
marclink = "https://anet.be/oai/catgeneric/server.phtml?verb=GetRecord&metadataPrefix=marc21&identifier=c:lvd:"
oai_namespace = "http://www.openarchives.org/OAI/2.0/"

#open a link to the url
with request.urlopen(marclink + loi) as w:
    #read the page
    marc_xml = w.read()

#print the decoded page
#decoding the bytes makes the line breaks show up as real line breaks
print(marc_xml.decode('UTF-8'))

<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
 <responseDate>2020-12-14T20:32:51.424055Z</responseDate>
<request verb="GetRecord" metadataPrefix="marc21" identifier="c:lvd:123456">https://anet.be/oai/catgeneric/server.phtml</request>
<GetRecord><record><header><identifier>c:lvd:123456</identifier><datestamp>2020-10-13T15:53:34.999998Z</datestamp></header><metadata><record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"><leader>00709nam a2200000 c 4500</leader><controlfield tag="001">c:lvd:123456</controlfield><controlfield tag="003">BE-AnANE</controlfield><controlfield tag="005">20201013165335.1</controlfield><controlfield tag="008">861226s1932####xx                u du

In [63]:
import lxml.etree
marctree = lxml.etree.fromstring(marc_xml)
print(marctree)
print(dir(marctree))

<Element {http://www.openarchives.org/OAI/2.0/}OAI-PMH at 0x7fdd12f27a00>
['__bool__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', '_init', 'addnext', 'addprevious', 'append', 'attrib', 'base', 'clear', 'cssselect', 'extend', 'find', 'findall', 'findtext', 'get', 'getchildren', 'getiterator', 'getnext', 'getparent', 'getprevious', 'getroottree', 'index', 'insert', 'items', 'iter', 'iterancestors', 'iterchildren', 'iterdescendants', 'iterfind', 'itersiblings', 'itertext', 'keys', 'makeelement', 'nsmap', 'prefix', 'remove', 'replace', 'set', 'sourceline', 'tag', 'tail', 'text', 'values', 'xpath']
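
The element printed above carries its namespace URI in braces in front of the local name. This matters when searching the tree: in `lxml`, a bare name such as `"datafield"` only matches elements that are in no namespace. The sketch below shows this on a minimal stand-in document (made up for the illustration, not real server data).

```python
from lxml import etree

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Minimal stand-in for the server response (not real data)
doc = etree.fromstring(
    b'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    b"<GetRecord><record><metadata><record>"
    b'<datafield tag="245"/>'
    b"</record></metadata></record></GetRecord></OAI-PMH>"
)

print(doc.tag)                     # namespace URI in braces, then the local name
print(etree.QName(doc).localname)  # the local name only
print(doc.nsmap)                   # prefix -> URI; None is the default namespace

# A bare name only matches elements without a namespace
bare = list(doc.iter("datafield"))
# Qualifying the name with the namespace URI finds the element
qualified = list(doc.iter("{%s}datafield" % OAI_NS))
# lxml also accepts "{*}" as a namespace wildcard
wildcard = list(doc.iter("{*}datafield"))

print(len(bare), len(qualified), len(wildcard))
```

The wildcard form is convenient when you are not sure which namespace the embedded MARC record uses.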


In [66]:
# the elements are namespaced, so a bare "datafield" matches nothing; {*} matches any namespace
for element in marctree.iter("{*}datafield"):
    print(element)

In [67]:
# same here: use the {*} namespace wildcard to find the subfield elements
for element in marctree.iter("{*}subfield"):
    print(element)

In [69]:
for element in marctree.iter("{*}subfield"):
    # .items() returns the XML attributes as (name, value) pairs
    for attribute_name, attribute_value in element.items():
        print(attribute_name, ":", attribute_value)
    # .text is an attribute of the element, not a method
    print(element.text)
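
Instead of looping over every subfield and inspecting its attributes, you can also ask for one specific tag and code with XPath. This is a sketch under assumptions: the helper `subfield_values` and the prefix `oai` are invented for the example, the sample record is made up, and the namespace mapping assumes that the MARC elements live in the OAI-PMH default namespace, as the response printed earlier suggests.

```python
from lxml import etree

# Assumption: the MARC elements inherit the OAI-PMH default namespace
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def subfield_values(root, tag, code="a"):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    # $tag and $code are XPath variables, filled in from the keyword arguments
    return root.xpath(
        "//oai:datafield[@tag=$tag]/oai:subfield[@code=$code]/text()",
        namespaces=NS,
        tag=tag,
        code=code,
    )


# Invented sample record for the demonstration
sample = etree.fromstring(
    b'<record xmlns="http://www.openarchives.org/OAI/2.0/">'
    b'<datafield tag="245"><subfield code="a">An example title</subfield>'
    b'<subfield code="c">Jane Doe</subfield></datafield>'
    b'<datafield tag="260"><subfield code="b">Example Press</subfield></datafield>'
    b"</record>"
)

print(subfield_values(sample, "245"))
print(subfield_values(sample, "260", code="b"))
```

Passing the tag and code as XPath variables avoids building the expression with string formatting.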

In [2]:
## Check prof. Deneire's code on GitHub.

Points from this assignment: 
1. Refactoring code is important to keep things manageable.
2. This is a real world example. OAI-PMH is a standard for exchanging metadata. 
3. 