# Plotting Newspaper Data
An example of how to query the eLuxemburgensia digital collection and display the results visually in a plot.

This project uses a Jupyter Notebook to hold the whole workflow. The notebook asks the user for a date range, uses those dates to select the newspapers published during that period, and then plots the newspapers to show their periodicity.

## Requirements
* Python 3.12
* [requests](https://pypi.org/project/requests/): HTTP library to run HTTP requests
* [pandas](https://pandas.pydata.org/): format the output into tabular layout
* [yarl](https://pypi.org/project/yarl/): format the output URL into a clickable URL link
* [untangle](https://pypi.org/project/untangle/): parse the MARC XML record

In [None]:
%pip install requests
%pip install pandas
%pip install yarl
%pip install untangle

In [None]:
from datetime import datetime
import requests
import pandas as pd
import untangle

In [None]:
# Request the start date from the user
while True:
    input_date = input("Enter the start date (dd/mm/yyyy):")
    try:
        start_date_value = datetime.strptime(input_date, '%d/%m/%Y')
        break
    except ValueError:
        # strptime raises ValueError when the input does not match the format
        print("Please enter a valid date in the format dd/mm/yyyy.")

In [None]:
# Request the end date from the user
while True:
    input_date = input("Enter the end date (dd/mm/yyyy):")
    try:
        end_date_value = datetime.strptime(input_date, '%d/%m/%Y')
        break
    except ValueError:
        # strptime raises ValueError when the input does not match the format
        print("Please enter a valid date in the format dd/mm/yyyy.")
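
The two prompts above accept each date on its own, so nothing stops the end date from preceding the start date, in which case the later filter would simply match nothing. A minimal sketch of a combined check follows; `parse_date` and `validate_range` are hypothetical helper names, not part of the notebook.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"

def parse_date(text):
    """Parse a dd/mm/yyyy string; raises ValueError if it is malformed."""
    return datetime.strptime(text.strip(), DATE_FORMAT)

def validate_range(start_text, end_text):
    """Return (start, end) as datetimes; raises ValueError if end precedes start."""
    start = parse_date(start_text)
    end = parse_date(end_text)
    if end < start:
        raise ValueError("The end date must not be earlier than the start date.")
    return start, end

start, end = validate_range("01/01/1900", "31/12/1910")
print(start.date(), end.date())
```

Because both failure modes raise `ValueError`, the same `except ValueError` branch used in the prompts could handle a reversed range as well.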

In [None]:
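# get the BnL eLuxemburgensia collection
elux_response = requests.get("https://viewer.eluxemburgensia.lu/api/viewer2/cms/v2/digitalcollections", timeout=30)
# stop here with a clear error if the API did not answer successfully
elux_response.raise_for_status()
elux_collection = elux_response.json()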
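
The filter in the next cell keeps a newspaper when its publication span overlaps the requested range: it must have started on or before the end of the range and ended on or after the start of the range. Comparing the dates as strings works there because `YYYY-MM-DD` strings sort in chronological order. The sketch below shows the same test on `date` objects. The function name `overlaps` and the sample dates are illustrative and not taken from the collection.

```python
from datetime import date

# Stands in for a newspaper with no recorded end date (date.max is 9999-12-31)
OPEN_END = date.max

def overlaps(paper_start, paper_end, range_start, range_end):
    """True if the two closed intervals share at least one day."""
    return paper_start <= range_end and paper_end >= range_start

# Illustrative values only
requested = (date(1900, 1, 1), date(1910, 12, 31))
print(overlaps(date(1848, 3, 23), date(1905, 6, 30), *requested))  # True: ends inside the range
print(overlaps(date(1911, 1, 1), OPEN_END, *requested))            # False: starts after the range
```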

In [None]:
# select only those newspapers published between the start date and end date
print("Newspapers published between " + start_date_value.strftime('%d/%m/%Y') + " - " + end_date_value.strftime('%d/%m/%Y') + ":")

# to display all the rows in the table - otherwise, some rows are hidden
pd.set_option('display.max_rows', None)

filtered_newspapers = []
for newspaper in elux_collection["data"]:
    newspaper_dict = {}
    newspaper_start_date = newspaper["startdate"]
    # a missing end date is treated as open-ended, using a far-future placeholder
    newspaper_end_date = newspaper.get("enddate", "9999-12-31")
    if newspaper_start_date <= end_date_value.strftime("%Y-%m-%d") and newspaper_end_date >= start_date_value.strftime("%Y-%m-%d"):
        # Newspaper published between the selected dates so get the link to a-z
        az_link = newspaper["az_url"]
        
        # parse out the docid that starts with docid=alma and ends with the following '&'
        # find the starting point and add the 10 characters to skip the text "docid=alma"
        start_position = az_link.find("docid=alma") + 10
        
        # find the first ampersand after the starting position
        end_position = az_link.find("&",start_position)
        
        # build the corresponding doc_id
        doc_id = "oai:alma.352LUX_BIBNET_NETWORK:" + az_link[start_position:end_position]
        
        # build the url to get the marc data for the given newspaper
        marc_url = "https://oai.bibnet.lu/view/oai/352LUX_BIBNET_NETWORK/request?verb=GetRecord&metadataPrefix=marc21&identifier=" + doc_id
        marc_record_xml = requests.get(marc_url)
        
        # get the newspaper data record by parsing the XML and then navigating to the correct record level
        newspaper_data = untangle.parse(marc_record_xml.text).OAI_PMH.GetRecord.record.metadata.record

        # find the datafield with tag 310 = frequency of the newspaper
        for data_field in newspaper_data.datafield:
            if data_field['tag'] == "310":
                # reset for every field so a value from an earlier newspaper is never reused
                frequency = None
                sub_field = data_field.subfield
                # if the sub_field is a list, then loop through list to find the entry with code = a
                # otherwise get the data directly from the subfield.
                if isinstance(sub_field, list):
                    for field in sub_field:
                        if field['code'] == "a":
                            frequency = field.cdata
                            break
                elif sub_field['code'] == "a":
                    frequency = sub_field.cdata

                # add the newspaper to the list with its frequency, whichever branch found it
                if frequency is not None:
                    newspaper_dict = {'Title': newspaper["title"], 'Start Date': newspaper_start_date, 'End Date': newspaper_end_date, 'Frequency': frequency}
                    filtered_newspapers.append(newspaper_dict)

# temporary display to show results up until this point
df = pd.DataFrame(filtered_newspapers, columns=["Title", "Start Date", "End Date", "Frequency"])
dfStyler = df.style.set_properties(**{'text-align': 'left'})
dfStyler.set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
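
The cell above extracts the document ID with `str.find` and slicing. `find` returns -1 when the text is missing, so a link without `docid=alma`, or one where `docid` is the last parameter with no trailing `&`, would silently yield a wrong identifier. A possible alternative, sketched below, parses the query string with the standard library instead. It assumes `docid` is an ordinary query parameter of `az_url`; `extract_alma_id` is a hypothetical helper name and the sample links are made up.

```python
from urllib.parse import urlparse, parse_qs

def extract_alma_id(az_link):
    """Return the text after 'alma' in the docid query parameter, or None if absent."""
    query = parse_qs(urlparse(az_link).query)
    for value in query.get("docid", []):
        if value.startswith("alma"):
            return value[len("alma"):]
    return None

# Hypothetical links, shaped like the az_url values the notebook expects
sample = "https://example.org/discovery/fulldisplay?docid=alma990001234567&vid=EXAMPLE"
print(extract_alma_id(sample))                                   # 990001234567
print(extract_alma_id("https://example.org/page?vid=EXAMPLE"))   # None
```

Returning `None` lets the caller skip a newspaper explicitly rather than request a MARC record for a malformed identifier.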