This notebook converts Webnucleo XML data into an Excel spreadsheet.  Begin by installing and importing the necessary libraries.

In [1]:
import sys, io, requests, html

!{sys.executable} -m pip install --quiet pandas
!{sys.executable} -m pip install --quiet xmlcoll
# pandas needs openpyxl to write .xlsx files
!{sys.executable} -m pip install --quiet openpyxl

import pandas as pd
import xmlcoll as xc

Now create an empty XML collection.

In [2]:
my_collection = xc.Collection()

Now choose an input XML file and update the collection with its data.  This example downloads data from [OSF](https://osf.io/wj5rd).  To use a local file instead (e.g., *example.xml*), comment out the download line and uncomment the line below it.  A local file needs to be in the current directory or have an appropriate path specified.  On Colab, upload the file to the directory shown under the folder tab in the left frame.

In [3]:
my_collection.update_from_xml(io.BytesIO(requests.get('https://osf.io/wj5rd/download').content))
#my_collection.update_from_xml('example.xml')

Next, create a data frame.

In [4]:
df = pd.DataFrame()

Now iterate through the data and add to the data frame.

In [5]:
items = my_collection.get()
for item in items:
    data_line = {}
    props = items[item].get_properties()
    for prop in props:
        # A property key is either a string or a tuple of strings;
        # join tuple entries into a single column label.
        # (Avoid naming the variable `str`, which would shadow the built-in.)
        if isinstance(prop, tuple):
            label = "_ ".join(prop)
        else:
            label = prop
        data_line[label] = [props[prop]]

    # One-row data frame for this item, indexed by the item name.
    df_add = pd.DataFrame(data_line, index=[item])
    df_add.index.name = 'name'

    df = pd.concat([df, df_add])
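
The key-flattening step in the loop above can be tried on its own, without any XML input.  The following is a minimal sketch, not part of xmlcoll: the helper name `flatten_key` and the contents of `sample_props` are hypothetical and chosen only for illustration.  It assumes, as the loop does, that a property key is either a string or a tuple of strings.

```python
def flatten_key(prop, sep="_ "):
    """Return a single string label for a property key.

    Tuple keys are joined with `sep`; string keys are returned unchanged.
    """
    if isinstance(prop, tuple):
        return sep.join(prop)
    return prop


# Hypothetical properties for one item (not taken from the OSF data).
sample_props = {
    "color": "red",
    ("width", "outer"): "12.5",
    ("width", "inner", "top"): "3.0",
}

# Build the same kind of label -> value mapping that the loop stores per item.
row = {flatten_key(key): value for key, value in sample_props.items()}
print(row)
```

Passing a different `sep` (for example `"_"`) changes how the column labels appear in the spreadsheet without affecting the values.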

Finally, output the data frame to Excel.  Change the name of the output file by changing the value of *output_excel_file*.  The output file will be written locally.  On Colab, check under the folder tab to the left for the output.

In [6]:
output_excel_file = 'my_output.xlsx'

df.to_excel(output_excel_file)
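
When the input is a local file, it can be convenient to derive the output name from the input name rather than typing it by hand.  The sketch below is one possible way to do that with the standard library; the helper `excel_name_for` and the file name *example.xml* are assumptions made for illustration.

```python
from pathlib import Path


def excel_name_for(xml_path):
    """Return the input path with its extension replaced by .xlsx."""
    return str(Path(xml_path).with_suffix(".xlsx"))


# Hypothetical local input file name.
output_excel_file = excel_name_for("example.xml")
print(output_excel_file)
```

The resulting string can be passed to `df.to_excel` in place of the hard-coded name.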