# XML Generator for CrossRef

This notebook generates the CrossRef deposit XML (schema 4.8.0) for a conference proceedings volume, using the CSV output of the DL Batch Revise Export.

In [None]:
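import pandas as pd
import xml.etree.ElementTree as ET  # the tree below is built with the stdlib ElementTree
from xml.dom import minidom  # used to pretty-print the serialized XML
from datetime import datetime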

# CONFIGURATION

These are the variables to set manually for each run.

In [None]:
DATA_FILE_NAME = "drs2018"  # not used below; the file that is read is CSV_FILE_NAME


# HELPER FUNCTIONS



In [None]:
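# Helper to print the XML built so far (call it any time after the root element exists)

def view_xml_result(element=None):
    """Pretty-print `element`, defaulting to the global `doi_batch` root."""
    if element is None:
        element = doi_batch
    # Serialize with the stdlib ElementTree, then indent with minidom
    raw_xml = ET.tostring(element, encoding="utf-8")
    print(minidom.parseString(raw_xml).toprettyxml(indent="  "))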

## VARIABLES

These are the variables you have to modify before you run the notebook.

In [None]:
# VARIABLES AND PARAMETERS YOU HAVE TO SET
SUBMISSION_TIMESTAMP = "20241213194100000"


PUBLICATION_DAY = "12"
PUBLICATION_MONTH = "01"
PUBLICATION_YEAR = "2023"
CONFERENCE_DATE = f"{PUBLICATION_DAY}/{PUBLICATION_MONTH}/{PUBLICATION_YEAR}"

CONFERENCE_NAME = "LearnXDesign 2023"
PROCEEDINGS_TITLE = "LearnXDesign 2023"
CONFERENCE_ACRONYM = "LearnXDesign 2023"
ISBN = "9781912294619"

CONFERENCE_VOLUME_DOI = "10.21606/drslxd.2024.001"
CONFERENCE_VOLUME_URL = "https://dl.designresearchsociety.org/conference-volumes/61/"

CSV_FILE_NAME = "lxd2023.csv"
OUT_FILE_NAME = "241213-lxd2023"

MAX_AUTHORS = 20  # Maximum number of authorN_* column groups read for each paper
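`SUBMISSION_TIMESTAMP` is typed by hand. Crossref expects the `timestamp` to increase each time the same DOIs are deposited again, so deriving it from the clock can avoid stale values. The helper below is a hypothetical sketch (`make_submission_timestamp` is not part of the original notebook) and assumes the format is `yyyymmddHHMMSS` followed by three millisecond digits, which is how the example value above appears to be built.

```python
from datetime import datetime

def make_submission_timestamp(now=None):
    """Build a timestamp shaped like SUBMISSION_TIMESTAMP (assumed format):
    yyyymmddHHMMSS followed by three millisecond digits."""
    if now is None:
        now = datetime.now()
    millis = now.microsecond // 1000  # 0-999
    return now.strftime("%Y%m%d%H%M%S") + f"{millis:03d}"

# Reproduces the hard-coded example value
print(make_submission_timestamp(datetime(2024, 12, 13, 19, 41, 0)))  # 20241213194100000
```

Calling `make_submission_timestamp()` with no argument uses the current local time.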

## Generate the Root Element

In [None]:
# Root element. The namespace declarations are set as plain attributes,
# which the stdlib ElementTree writes out unchanged.
doi_batch = ET.Element(
    "doi_batch",
    version="4.8.0",
    xmlns="http://www.crossref.org/schema/4.8.0",
)

# Declare the xsi prefix and point xsi:schemaLocation at the 4.8.0 deposit schema
doi_batch.set("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance")
doi_batch.set("xsi:schemaLocation", "http://www.crossref.org/schema/4.8.0 http://www.crossref.org/schema/deposit/crossref4.8.0.xsd")


head = ET.SubElement(doi_batch, "head")
body = ET.SubElement(doi_batch, "body")
conference = ET.SubElement(body, "conference")
print(ET.tostring(doi_batch, encoding='utf8').decode('utf8'))

<?xml version='1.0' encoding='utf8'?>
<doi_batch version="4.8.0" xmlns="http://www.crossref.org/schema/4.8.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.crossref.org/schema/4.8.0 http://www.crossref.org/schema/deposit/crossref4.8.0.xsd"><head /><body><conference /></body></doi_batch>


## Populate the `head`

In [None]:
# Set up the head of the XML
ET.SubElement(head, "doi_batch_id").text = CONFERENCE_NAME
ET.SubElement(head, "timestamp").text = SUBMISSION_TIMESTAMP
depositor = ET.SubElement(head, "depositor")

# The following values are hard-coded
ET.SubElement(depositor, "depositor_name").text = "desres:desres"
ET.SubElement(depositor, "email_address").text = "dl@designresearchsociety.org"
ET.SubElement(head, "registrant").text = "Digital Library"

## Populate the `body`

The `body` holds a single `conference` element, which must contain, in this order:

- `event_metadata`
- `proceedings_metadata`
- every `conference_paper`
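Elements are appended in the order the cells run, so running cells out of order can put them in the wrong place. A hypothetical check like the one below (`conference_children_in_order` is not part of the notebook) tests the order listed above; it is a sketch based on that list, not a substitute for validating against the Crossref schema.

```python
import xml.etree.ElementTree as ET

def conference_children_in_order(conference):
    """Check that <conference> children are: event_metadata,
    proceedings_metadata, then any number of conference_paper."""
    tags = [child.tag for child in conference]
    if tags[:2] != ["event_metadata", "proceedings_metadata"]:
        return False
    return all(tag == "conference_paper" for tag in tags[2:])

# Example with a tiny hand-built tree
demo = ET.Element("conference")
ET.SubElement(demo, "event_metadata")
ET.SubElement(demo, "proceedings_metadata")
ET.SubElement(demo, "conference_paper")
print(conference_children_in_order(demo))  # True
```

In the notebook it could be run as `conference_children_in_order(conference)` once all cells have executed.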

In [None]:
# EVENT_METADATA
event_metadata = ET.SubElement(conference, "event_metadata")
ET.SubElement(event_metadata, "conference_name").text = CONFERENCE_NAME
ET.SubElement(event_metadata, "conference_acronym").text = CONFERENCE_ACRONYM
ET.SubElement(event_metadata, "conference_date").text = CONFERENCE_DATE


# PROCEEDINGS_METADATA
proceedings_metadata = ET.SubElement(conference, "proceedings_metadata")
ET.SubElement(proceedings_metadata, "proceedings_title").text = PROCEEDINGS_TITLE

publisher = ET.SubElement(proceedings_metadata, "publisher")
ET.SubElement(publisher, "publisher_name").text = "Design Research Society"

publication_date1 = ET.SubElement(proceedings_metadata, "publication_date", media_type="online")
ET.SubElement(publication_date1, "month").text = PUBLICATION_MONTH
ET.SubElement(publication_date1, "day").text = PUBLICATION_DAY
ET.SubElement(publication_date1, "year").text = PUBLICATION_YEAR

ET.SubElement(proceedings_metadata, "isbn").text = ISBN

doi_data1 = ET.SubElement(proceedings_metadata, "doi_data")
ET.SubElement(doi_data1, "doi").text = CONFERENCE_VOLUME_DOI
ET.SubElement(doi_data1, "resource").text = CONFERENCE_VOLUME_URL

## Creating Conference Papers

The cells below process the CSV file and build a `conference_paper` element from each row: one row is one paper.

In [None]:
# Load the CSV produced by the DL Batch Revise Export (comma-separated by default)
df = pd.read_csv(CSV_FILE_NAME)
df.head()


Unnamed: 0,title,shortname,editor_names,orcid,document_type,abstract,keywords,conference_title,doi,doi_link,...,author16_mname,author16_lname,author16_suffix,author16_email,author16_institution,author16_is_corporate,calc_url,context_key,issue,ctmtime
0,Unfixing the studio,,"Derek Jones, Naz Borekci, Violeta Clemente, Ja...",,,,,The 7th International Conference for Design Ed...,10.21606/drslxd.2024.057,https://doi.org/10.21606/drslxd.2024.057,...,,,,,,,https://dl.designresearchsociety.org/learnxdes...,36756083,learnxdesign/learnxdesign2023/visualpapers,1710765013
1,The Work of Untutored Designers & the Future o...,,"Derek Jones, Naz Borekci, Violeta Clemente, Ja...",,,,,The 7th International Conference for Design Ed...,10.21606/drslxd.2024.086,https://doi.org/10.21606/drslxd.2024.086,...,,,,,,,https://dl.designresearchsociety.org/learnxdes...,36756084,learnxdesign/learnxdesign2023/visualpapers,1710765016
2,Inviting Curiosity: A Framework for Creating M...,,"Derek Jones, Naz Borekci, Violeta Clemente, Ja...",,,,,The 7th International Conference for Design Ed...,10.21606/drslxd.2024.105,https://doi.org/10.21606/drslxd.2024.105,...,,,,,,,https://dl.designresearchsociety.org/learnxdes...,36756085,learnxdesign/learnxdesign2023/visualpapers,1710765013
3,User Data: A North Star in Teaching Methods,,"Derek Jones, Naz Borekci, Violeta Clemente, Ja...",,,,,The 7th International Conference for Design Ed...,10.21606/drslxd.2024.010,https://doi.org/10.21606/drslxd.2024.010,...,,,,,,,https://dl.designresearchsociety.org/learnxdes...,36756142,learnxdesign/learnxdesign2023/casestudy,1710765015
4,Competencies and Skills for Designers of Infor...,,"Derek Jones, Naz Borekci, Violeta Clemente, Ja...",,,,,The 7th International Conference for Design Ed...,10.21606/drslxd.2024.013,https://doi.org/10.21606/drslxd.2024.013,...,,,,,,,https://dl.designresearchsociety.org/learnxdes...,36756143,learnxdesign/learnxdesign2023/casestudy,1710765014
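Every paper's `doi` cell ends up in the deposit verbatim, so a quick look for malformed values can catch export mistakes early (for example a full `https://doi.org/...` link in the `doi` column). This is a hypothetical sketch: `find_suspicious_dois` and the loose pattern (`10.`, a 4 to 9 digit prefix, `/`, a suffix) are assumptions chosen for the example, not a complete DOI grammar.

```python
import re

# Assumed heuristic pattern for a bare DOI; it is not a full DOI grammar.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

def find_suspicious_dois(dois):
    """Return the values that are missing or do not look like a bare DOI."""
    return [d for d in dois if not isinstance(d, str) or not DOI_PATTERN.match(d.strip())]

print(find_suspicious_dois(["10.21606/drslxd.2024.057", "https://doi.org/10.21606/drslxd.2024.057", None]))
# ['https://doi.org/10.21606/drslxd.2024.057', None]
```

In the notebook it could be called as `find_suspicious_dois(df["doi"])`; empty cells come through as `NaN` and are flagged, and an empty list means every value looks like a bare DOI.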


### Drop columns

In [None]:

# Drop columns that are not needed at all
df_dropped_first = df.drop(
    columns=[
        "city",
        "shortname",
        "editor_names",
        "orcid",
        "document_type",
        "abstract",
        "keywords",
        "comments",
        "conference_title",
        "doi_link",
        "custom_citation",
        "distribution_license",
        "conference_track",
        "conference_dates",
        "start_date",
        "end_date",
        "country",
        "topics",
        "disciplines",
        "fulltext_url",
        "do_not_feature_this_article",
        "update_reason",
        "context_key",
        "issue",
        "ctmtime",
    ]
)


# Drop author columns that are not needed: email, suffix, middle name and the corporate flag
columns_to_drop = df_dropped_first.filter(
    regex=r"^author\d+_(email|suffix|mname|is_corporate)"
).columns

df_working = df_dropped_first.drop(columns=columns_to_drop)
print(df_working.columns)

Index(['title', 'doi', 'author1_fname', 'author1_lname', 'author1_institution',
       'author2_fname', 'author2_lname', 'author2_institution',
       'author3_fname', 'author3_lname', 'author3_institution',
       'author4_fname', 'author4_lname', 'author4_institution',
       'author5_fname', 'author5_lname', 'author5_institution',
       'author6_fname', 'author6_lname', 'author6_institution',
       'author7_fname', 'author7_lname', 'author7_institution',
       'author8_fname', 'author8_lname', 'author8_institution',
       'author9_fname', 'author9_lname', 'author9_institution',
       'author10_fname', 'author10_lname', 'author10_institution',
       'author11_fname', 'author11_lname', 'author11_institution',
       'author12_fname', 'author12_lname', 'author12_institution',
       'author13_fname', 'author13_lname', 'author13_institution',
       'author14_fname', 'author14_lname', 'author14_institution',
       'author15_fname', 'author15_lname', 'author15_institution',
      
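To see which labels the author regular expression removes, the same pattern can be tried on plain strings. `DataFrame.filter(regex=...)` keeps the labels where `re.search` finds a match, so the leading `^` makes it a prefix test; because there is no trailing `$`, any longer name that starts with, say, `author3_email` would also be dropped. The helper `author_columns_to_drop` and the sample names are illustrative only.

```python
import re

# Same pattern as the drop step above, applied with re.search like DataFrame.filter
AUTHOR_DROP_PATTERN = re.compile(r"^author\d+_(email|suffix|mname|is_corporate)")

def author_columns_to_drop(columns):
    """Return the column names the author-drop step would remove."""
    return [col for col in columns if AUTHOR_DROP_PATTERN.search(col)]

sample = ["title", "author1_fname", "author1_mname", "author12_email",
          "author12_institution", "author3_is_corporate", "calc_url"]
print(author_columns_to_drop(sample))
# ['author1_mname', 'author12_email', 'author3_is_corporate']
```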

### Process the cleaned DataFrame


In [None]:
# Two helper functions for processing authors

def create_person_element(sequence, first_name, last_name, institution=None):
    """Build a <person_name> element for one author."""
    person = ET.Element("person_name", sequence=sequence, contributor_role="author")
    ET.SubElement(person, "given_name").text = str(first_name)
    ET.SubElement(person, "surname").text = str(last_name)
    # affiliation is optional: only add it when the export has an institution
    if institution is not None:
        ET.SubElement(person, "affiliation").text = str(institution)
    return person


def add_authors(contributors, row):
    """Append a <person_name> to `contributors` for each authorN_* group in `row`."""
    for i in range(1, MAX_AUTHORS + 1):
        fname_col = f"author{i}_fname"
        lname_col = f"author{i}_lname"
        inst_col = f"author{i}_institution"

        # Stop when the export has no more author columns or the first name is empty
        if fname_col not in row.index or pd.isna(row[fname_col]):
            break

        # A surname is required; skip this author if it is missing
        if pd.isna(row.get(lname_col)):
            continue

        # Keep authors without an institution instead of silently dropping them
        institution = row.get(inst_col)
        sequence = "first" if len(contributors) == 0 else "additional"
        contributors.append(
            create_person_element(
                sequence,
                row[fname_col],
                row[lname_col],
                None if pd.isna(institution) else institution,
            )
        )

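The author rules are easiest to check on plain data, away from the XML. The sketch below is hypothetical (`collect_authors` is not part of the notebook): it walks the numbered author columns up to a cap, stops at the first missing surname, and keeps an author when the institution is blank. Those stopping and fallback rules are assumptions for the example; adjust them to how the DL export actually fills its columns. It uses `row.get`, which works on both a `dict` and a pandas `Series` row.

```python
import math

def _is_missing(value):
    """Treat None, NaN and blank strings as empty cells."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()

def collect_authors(row, max_authors=20):
    """Turn authorN_fname / authorN_lname / authorN_institution cells into dicts.
    Stops at the first author number whose surname is absent or empty."""
    authors = []
    for i in range(1, max_authors + 1):
        lname = row.get(f"author{i}_lname")
        if _is_missing(lname):
            break
        fname = row.get(f"author{i}_fname")
        inst = row.get(f"author{i}_institution")
        authors.append({
            "sequence": "first" if not authors else "additional",
            "given_name": None if _is_missing(fname) else fname,
            "surname": lname,
            "affiliation": None if _is_missing(inst) else inst,
        })
    return authors

# Hypothetical row: the second author has no institution
example_row = {
    "author1_fname": "Ada", "author1_lname": "Lovelace", "author1_institution": "University A",
    "author2_fname": "Alan", "author2_lname": "Turing", "author2_institution": float("nan"),
}
print([a["surname"] for a in collect_authors(example_row)])  # ['Lovelace', 'Turing']
```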
In [None]:
# Create element for each row in the table
for index, row in df_working.iterrows():
    print(f"Working on row {index}: {row['title']}")

    # Create conference_paper element
    conference_paper = ET.SubElement(
        conference, "conference_paper", publication_type="full_text"
    )

    # add contributors
    contributors = ET.SubElement(conference_paper, "contributors")
    add_authors(contributors, row)

    # Add titles
    titles = ET.SubElement(conference_paper, "titles")
    title = ET.SubElement(titles, "title")
    title.text = row["title"]

    # Add publication date
    publication_date = ET.SubElement(
        conference_paper, "publication_date", media_type="online"
    )
    ET.SubElement(publication_date, "month").text = PUBLICATION_MONTH
    ET.SubElement(publication_date, "day").text = PUBLICATION_DAY
    ET.SubElement(publication_date, "year").text = PUBLICATION_YEAR

    # add doi_data
    doi_data = ET.SubElement(conference_paper, "doi_data")
    ET.SubElement(doi_data, "doi").text = row["doi"]
    ET.SubElement(doi_data, "resource").text = row["calc_url"]

Working on row 0: Unfixing the studio
Working on row 1: The Work of Untutored Designers & the Future of Design Education
Working on row 2: Inviting Curiosity: A Framework for Creating Meaningful Experiences
Working on row 3: User Data: A North Star in Teaching Methods
Working on row 4: Competencies and Skills for Designers of Information Services
Working on row 5: Neurodesign: A Game-Changer in Educational Contexts
Working on row 6: Practice what you preach: co-designing a lecture on co-design
Working on row 7: Integrating Design for Reuse in Industrial Design Education: Exploring Children’s Products and Gaining Insights
Working on row 8: Experience is learning: the Piazza Grace case study.
Working on row 9: The Flipped Design Classroom: Effectiveness of Online Lectures
Working on row 10: Intergenerational reflection in a changing world
Working on row 11: Intergenerational participatory design with the local school community
Working on row 12: Design Thinking Methodology Over Industria
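Each run of the loop cell appends another set of `conference_paper` elements to `conference`, so running it twice would deposit every paper twice. One way to make the cell safe to re-run (a suggestion, not something the original notebook does) is to clear existing papers first. `remove_conference_papers` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def remove_conference_papers(conference):
    """Delete all <conference_paper> children of `conference` and
    return how many were removed."""
    papers = conference.findall("conference_paper")  # a list, so removing while looping is safe
    for paper in papers:
        conference.remove(paper)
    return len(papers)
```

Calling `remove_conference_papers(conference)` as the first line of the loop cell would leave `event_metadata` and `proceedings_metadata` untouched and rebuild the papers from scratch.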

## Assemble the XML and pretty-print it

In [None]:
try:
    # Serialize the tree to bytes, including an XML declaration
    xml_bytes = ET.tostring(doi_batch, encoding="utf-8", xml_declaration=True)

    # Use minidom to indent the output
    pretty_xml = minidom.parseString(xml_bytes).toprettyxml(indent="  ")

    # Write the pretty XML to a file
    with open(f"{OUT_FILE_NAME}.xml", "w", encoding="utf-8") as xml_file:
        xml_file.write(pretty_xml)

    print("XML file created successfully")
except Exception as e:
    print(f"Error writing XML file: {e}")

XML file created successfully
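Re-reading the written file is a cheap way to confirm it is well-formed and that no paper was lost or duplicated. The helper below is a hypothetical sketch (`count_papers_in_file` is not part of the notebook). Because the root carries `xmlns="http://www.crossref.org/schema/4.8.0"`, the parsed tags are namespaced, so the search uses the `{*}` wildcard, which needs Python 3.8 or later. This checks structure only; it is not validation against the Crossref XSD.

```python
import xml.etree.ElementTree as ET

def count_papers_in_file(path):
    """Parse a written deposit file and count its <conference_paper> elements.
    "{*}" matches the tag in any namespace, including the default one."""
    root = ET.parse(path).getroot()
    return len(root.findall(".//{*}conference_paper"))
```

In the notebook, `count_papers_in_file(f"{OUT_FILE_NAME}.xml") == len(df_working)` should be `True` if every row produced exactly one paper.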
