# Duncan Williamson Audio Recordings

The [*Tobar an Dualchais*](https://www.tobarandualchais.co.uk/) website is an online archive of Scottish oral tradition.

The site hosts several Duncan Williamson stories, but they are not as discoverable as they could be. This notebook therefore describes a recipe for scraping at least the metadata and making it available in a more easily navigable form.

## Available Recordings

Recordings can be listed by artist. By default, the results are paged in groups of 10. The URL for the second page of results for a search on _Duncan Williamson_ is:

In [1]:
url = "https://www.tobarandualchais.co.uk/search?l=en&page=2&page_size=10&term=%22Williamson%2C+Duncan%2C+1928-2007+%284292%29%22&type=archival_object"

The page size and page number are readily visible in the URL. The results report suggests 29 pages of results are available, so just under 300 results in all.

Rather than scrape all the next page links, we can generate them from the page size and the number of results pages.


Upping the page size seems to cause the server on the other end to struggle a bit. Setting a page size of 500 returns 250 items, so given we're going to have to make at least two calls to get all the results, let's make things a bit easier for the server and limit ourselves to batch sizes of 50 results, which means we'll need to make 6 results page calls in all.
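The call counts quoted above come from a simple ceiling division. As a minimal sketch, assuming a total of 290 results (the figure implied by 29 pages of 10) and using a hypothetical helper named `pages_needed`:

```python
import math


def pages_needed(total_results: int, page_size: int) -> int:
    """Number of paged requests needed to cover every result."""
    return math.ceil(total_results / page_size)


# 290 is an assumption based on the 29 pages of 10 reported by the site
for size in (10, 50):
    print(f"page_size={size}: {pages_needed(290, size)} calls")
```

With a batch size of 50 this gives the 6 calls mentioned above, the last of which returns a partial page.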

To support this, we can parameterise the URL:

In [2]:
_url = "https://www.tobarandualchais.co.uk/search?l=en&page={page}&page_size={page_size}&term=%22Williamson%2C+Duncan%2C+1928-2007+%284292%29%22&type=archival_object"

We'll start by looking at a small page, with just five results. We can construct an appropriate URL as follows:

In [3]:
url = _url.format(page=1, page_size=5)
url

'https://www.tobarandualchais.co.uk/search?l=en&page=1&page_size=5&term=%22Williamson%2C+Duncan%2C+1928-2007+%284292%29%22&type=archival_object'
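The percent-encoded `term` value is easier to read when it is built from its parts. As a sketch (this is not necessarily how the site builds its own URLs), the standard library's `urllib.parse.urlencode` reproduces the same query string from the plain-text search term. The names `BASE_SEARCH_URL` and `build_search_url` are illustrative only.

```python
from urllib.parse import urlencode

BASE_SEARCH_URL = "https://www.tobarandualchais.co.uk/search"


def build_search_url(page: int = 1, page_size: int = 10) -> str:
    """Build a paged search URL for the Duncan Williamson artist term."""
    params = {
        "l": "en",
        "page": page,
        "page_size": page_size,
        # The quoted artist label, exactly as it decodes from the URL above
        "term": '"Williamson, Duncan, 1928-2007 (4292)"',
        "type": "archival_object",
    }
    # urlencode applies quote_plus, so spaces become "+" and quotes "%22"
    return f"{BASE_SEARCH_URL}?{urlencode(params)}"


print(build_search_url(page=1, page_size=5))
```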

The `bs4` / `BeautifulSoup` package is a Python package that supports the parsing and processing of HTML and XML documents.

From the raw HTML text, we can create a navigable "soup" that allows us to reference different elements within the HTML structure.

In [4]:
import requests
from bs4 import BeautifulSoup

response = requests.get(url)
soup = BeautifulSoup(response.text)

Using a browser's developer tools, we can explore the HTML structure of the page in relation to the rendered view.

For example, the number of results pages is given in a `p` element with class `search-page`:

![Path to search results page count](images/Tobar_an_Dualchais_results_pages.png)

We can retrieve the text contained in the element by referencing the element:

In [5]:
num_results_pages = soup.find("p", {"class": "search-page"}).text
num_results_pages

'Page 1 of 58'

We can easily extract the number of results pages by splitting that string on whitespace, picking the last item, and casting it to an integer.

In [6]:
num_results_pages = int(num_results_pages.split()[-1])
num_results_pages

58
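Splitting on whitespace works as long as the page count is the final token. If the label ever gained trailing text, a regular expression would be a little more forgiving. The following is a sketch of that alternative, with a hypothetical helper named `parse_page_count`:

```python
import re


def parse_page_count(label: str) -> int:
    """Extract N from a label of the form 'Page M of N'."""
    match = re.search(r"Page\s+\d+\s+of\s+(\d+)", label)
    if match is None:
        # Fail loudly rather than silently guessing a page count
        raise ValueError(f"Unrecognised page label: {label!r}")
    return int(match.group(1))


print(parse_page_count("Page 1 of 58"))
```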

Each results item in the results page includes some metadata and a link to a record results page.

Looking at the page structure, we see that the results links have the class `search-item__link`. We can use this as a crib to extract the links:

In [7]:
example_track_links = soup.find_all("a", {"class": "search-item__link"})
example_track_links

[<a class="search-item__link" href="/track/60155?l=en">View Track</a>,
 <a class="search-item__link" href="/track/60156?l=en">View Track</a>,
 <a class="search-item__link" href="/track/60158?l=en">View Track</a>,
 <a class="search-item__link" href="/track/60162?l=en">View Track</a>,
 <a class="search-item__link" href="/track/60167?l=en">View Track</a>]

In [8]:
example_track_links[0]['href']

'/track/60155?l=en'

The links are relative to the domain, `https://www.tobarandualchais.co.uk`.

In [9]:
domain = "https://www.tobarandualchais.co.uk"

The metadata that appears on the search results page is duplicated in an actual record page, so there is no need to scrape it from the results page. Instead, we'll get what we need from the results record pages.

Let's get an example record page down. First we construct a page URL:

In [10]:
example_page_url = f"{domain}{example_track_links[0]['href']}"
example_page_url

'https://www.tobarandualchais.co.uk/track/60155?l=en'
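String concatenation is fine here because every link starts with `/`. As an alternative sketch, `urllib.parse.urljoin` from the standard library resolves a link against the domain and also copes with links that are already absolute (the `example.org` URL is purely illustrative):

```python
from urllib.parse import urljoin

domain = "https://www.tobarandualchais.co.uk"

# A domain-relative link, as found in the search results
track_url = urljoin(domain, "/track/60155?l=en")
print(track_url)

# An already-absolute link is passed through unchanged
absolute_url = urljoin(domain, "https://example.org/audio.mp4")
print(absolute_url)
```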

Then we grab the page and make soup from it:

In [11]:
example_record_soup = BeautifulSoup(requests.get(example_page_url).text)

The title, which appears to be the first line of summary with a maximum character limit, is in a `span` element with a `contributor__title` class:

In [12]:
# Title
example_record_soup.find("span", {"class": "contributor__title" }).text

'Balmoral Highlanders/Father John MacMillan of Barra/Jean Mauchline'

The rest of the page is not so conveniently structured: the same class names are reused in each part of the result record. However, we can identify the appropriate block from an `h3` element containing the text *Summary* and then just grab the next `p` element that follows it:

In [13]:
# Summary
str(example_record_soup.find("h3", string="Summary").find_next("p"))

'<p class="contributor-bio-item__content">Diddling of three marches. They are \'Balmoral Highlanders\', \'Father John MacMillan of Barra\' and \'Jean Mauchline\'.</p>'

The date is another useful metadata field, which we can locate via the preceding `span` element that carries the label `Date`:

In [14]:
example_date_str = example_record_soup.find("span", string='Date').find_next("span").text
example_date_str

'1977'

We can try to parse this into a datetime object:

In [15]:
from dateparser import parse
import datetime

# Output date format
dt = "%Y-%m-%d"

# If only a year is specified, by default the parsed datetime
# will be set relative to the current datetime
# Or we can force a relative dummy date
try_date = parse(example_date_str.strip(),
                             settings={'RELATIVE_BASE': datetime.datetime(2000, 1, 1)})
example_record_date = try_date.strftime(dt) if try_date else ''

example_record_date


'1977-01-01'
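If `dateparser` is not available, the few date shapes that appear in these records can be handled with the standard library instead. This is a swapped-in alternative rather than what the notebook itself uses. The helper name `to_iso_date` and the list of formats are assumptions, based on raw dates such as `1977`, `September 1977` and `18 January 1987`.

```python
from datetime import datetime

# Most specific format first; month names assume an English locale
DATE_FORMATS = ("%d %B %Y", "%B %Y", "%Y")


def to_iso_date(raw_date: str) -> str:
    """Return YYYY-MM-DD for a recognised date string, else ''."""
    cleaned = raw_date.strip()
    for fmt in DATE_FORMATS:
        try:
            # Missing day or month components default to 1
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return ""


for example in ("1977", "September 1977", "18 January 1987"):
    print(example, "->", to_iso_date(example))
```

As with the forced `RELATIVE_BASE` above, a year-only date is pinned to the first of January, so the `raw_date` value is still worth keeping alongside it.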

If available, the genre is also likely to be of interest to us, so that we can search for songs, or stories, for example:

In [16]:
example_genre = example_record_soup.find("h3", string='Genre').find_next("p").text
example_genre

'Music'

The audio file(s) seem to be loaded via `turbo-frame` elements. These in turn appear to load a page containing the media player in a `source` element. So we can grab all `turbo-frame` elements from a page, iterate through them, extracting the frame path from each one, and then load the corresponding frame page. Each of these frame pages then contains an audio `source` element from which we can grab the audio file URL.

In [17]:
example_sources = []

# Grab and iterate through each turbo-frame element
for turbo_frame in example_record_soup.find_all('turbo-frame'):
    # The frame URL is given by the src attribute
    turbo_frame_url = f'{domain}{turbo_frame["src"]}'
    # Get the frame page text, make soup from it
    # and find the (first and only) source element
    # Append this element to our sources list for the record page
    example_sources.append( BeautifulSoup(requests.get(turbo_frame_url).text).find("source") )

example_sources

[<source src="https://digitalpreservation.is.ed.ac.uk/bitstream/handle/20.500.12734/10602/SOSS_007913_060155.mp4" type="audio/mp4"/>]

If we want, we can embed that audio in our own player. Let's start by downloading a local copy of the audio file.

Start by ensuring we have a directory available to download the audio files into:

In [18]:
from pathlib import Path

download_dir_name = "audio"

# Generate a path
download_dir = Path(download_dir_name)

# Ensure the directory (and its parents for a long path) exist
download_dir.mkdir(parents=True, exist_ok=True)

Now we can download the audio file into that directory. The filename is the last part of the URL:

In [19]:
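# Import the submodule explicitly: a bare `import urllib` does not
# guarantee that `urllib.request` is loaded
import urllib.request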

# The URL is given by the src attribute of a source element
audio_url = example_sources[0]["src"]
# The file name is the last part of the URL
audio_filename = audio_url.split("/")[-1]

# Create a path to the audio file in the download directory
local_audio = download_dir / audio_filename

# Download the audio file from the specified URL to the required location
urllib.request.urlretrieve(audio_url, local_audio)

(PosixPath('audio/SOSS_007913_060155.mp4'),
 <http.client.HTTPMessage at 0x10f0097f0>)
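Taking the last `/`-separated chunk works for the audio URLs seen here. Should a URL ever carry a query string, parsing it first keeps the filename clean. A small sketch, using a hypothetical helper named `local_path_for` (the `?download=1` suffix is invented purely to illustrate the point):

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_path_for(audio_url: str, download_dir: str = "audio") -> Path:
    """Map an audio URL to a file path inside the download directory."""
    # urlparse separates the path from any query string or fragment
    filename = PurePosixPath(urlparse(audio_url).path).name
    return Path(download_dir) / filename


example_url = (
    "https://digitalpreservation.is.ed.ac.uk/bitstream/handle/"
    "20.500.12734/10602/SOSS_007913_060155.mp4"
)
print(local_path_for(example_url))
# Hypothetical query string, to show that it is ignored
print(local_path_for(example_url + "?download=1"))
```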

Now we can play it from the local copy:

In [20]:
from IPython.display import Audio

Audio(local_audio)

## Putting all the pieces together

We can now put all the pieces together to make a scraper for the metadata (and optionally all the audio files) for all the Duncan Williamson tracks on the *Tobar an Dualchais* website (or at least, all those records identified by a search on *Duncan Williamson*).

The recipe will run something like:

- load page with all "next page" links displayed;
- grab results links for first page;
- grab results links for all next pages;
- for each results link, grab result record page, extract required data.

If we use the `requests-cache` package, we can keep a local copy of downloaded pages so if we need to rerun things (for example, to extract more data fields) we will have locally cached copies of the files to work from.

In [21]:
import requests_cache
from datetime import timedelta

# Cache into a sqlite database; set to expire after 500 days
requests_cache.install_cache("tobar_cache", backend="sqlite", 
                             expire_after=timedelta(days=500))

To make the scrape easier, let's set up some functions based on the sketches we've already made:

In [22]:
import time

#Relative links are relative to this domain
DOMAIN = "https://www.tobarandualchais.co.uk"

def get_search_results_page(page_num=1, page_size=10):
    """Get paged search results page with specified number of results per page.
       Return as soup.
    """
    # Use a Python f-string to format the URL based on passed in parameter values
    url = f"https://www.tobarandualchais.co.uk/search?l=en&page={page_num}&page_size={page_size}&term=%22Williamson%2C+Duncan%2C+1928-2007+%284292%29%22&type=archival_object"
    
    response = requests.get(url)
    results_page_soup = BeautifulSoup(response.text)
    
    return results_page_soup


def get_number_of_results_pages(results_page_soup):
    """Return the number of results pages available."""
    num_results_pages_ = results_page_soup.find("p", {"class": "search-page"}).text
    num_results_pages = int(num_results_pages_.split()[-1])

    return num_results_pages


def get_result_links(results_page_soup):
    """Get results links from a results soup page."""
    result_links_ = results_page_soup.find_all("a", {"class": "search-item__link"})
    result_links = [result_link["href"] for result_link in result_links_]
    
    return result_links


def get_audio_files_url(result_record_soup, domain=DOMAIN):
    """Get audio file link from turbo-frame loaded page."""
    audio_sources = []

    # Grab and iterate through each turbo-frame element
    for turbo_frame in result_record_soup.find_all('turbo-frame'):
        # The frame URL is given by the src attribute
        turbo_frame_url = f'{domain}{turbo_frame["src"]}'
        # Get the frame page text, make soup from it
        # and find the (first and only) source element
        # Append this element to our sources list for the record page
        audio_url_ = BeautifulSoup(requests.get(turbo_frame_url).text).find("source")
        if audio_url_:
            audio_sources.append(audio_url_["src"])
    return audio_sources


# Provide a delay between calls to the website
def get_result_record_data(result_path, domain=DOMAIN, audio=False, be_nice=0.1):
    """Get result record data for a result page."""
    # Slight delay before we make a call
    # This is just so we don't hammer the server
    # If we are hitting the cached pages, this can be set to 0
    time.sleep(be_nice)

    result_record_data = {}
    
    result_record_url = f"{domain}{result_path}"
    response = requests.get(result_record_url)
    # Is the response ok?
    if response.status_code!=200:
        print(f"Something wrong with {result_record_url}")
        return {}

    result_record_soup = BeautifulSoup(response.text)
    
    # Title
    result_record_data["title"] = result_record_soup.find("span", {"class": "contributor__title" }).text

    # Summary
    _summary = result_record_soup.find("h3", string="Summary")
    result_record_data["summary"] = str(_summary.find_next("p")) if _summary else ''
    
    # Date
    result_record_data["raw_date"] = result_record_soup.find("span", string='Date').find_next("span").text

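    # Genre (not every record necessarily has one, so guard against a missing heading)
    _genre = result_record_soup.find("h3", string='Genre')
    result_record_data["genre"] = _genre.find_next("p").text if _genre else ''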

    # URL
    result_record_data["url"] = result_record_url

    # Output date format
    dt = "%Y-%m-%d"

    # If only a year is specified, by default the parsed datetime
    # will be set relative to the current datetime
    # Or we can force a relative dummy date
    try_date = parse(result_record_data["raw_date"].strip(),
                                 settings={'RELATIVE_BASE': datetime.datetime(2000, 1, 1)})
    result_record_data["date"] = try_date.strftime(dt) if try_date else ''


    if audio:
        result_record_data["audio_url"] = get_audio_files_url(result_record_soup,
                                                              domain=domain)

    return result_record_data

Let's try for just a single small results page:

In [23]:
results_page_soup = get_search_results_page(page_size=5)

results_links = get_result_links(results_page_soup)
num_results_pages = get_number_of_results_pages(results_page_soup)

results_links, num_results_pages

(['/track/60155?l=en',
  '/track/60156?l=en',
  '/track/60158?l=en',
  '/track/60162?l=en',
  '/track/60167?l=en'],
 58)

And let's see if we can iterate through those to get metadata results, first without the audio link:

In [24]:
results_metadata = []

for results_link in results_links:
    results_metadata.append( get_result_record_data(results_link) )
    
results_metadata


[{'title': 'Balmoral Highlanders/Father John MacMillan of Barra/Jean Mauchline',
  'summary': '<p class="contributor-bio-item__content">Diddling of three marches. They are \'Balmoral Highlanders\', \'Father John MacMillan of Barra\' and \'Jean Mauchline\'.</p>',
  'raw_date': '1977',
  'genre': 'Music',
  'url': 'https://www.tobarandualchais.co.uk/track/60155?l=en',
  'date': '1977-01-01'},
 {'title': 'Diddling',
  'summary': '<p class="contributor-bio-item__content">Duncan Williamson points out that the pronunciation of the \'words\' change in faster diddling. His mother was a great diddler, \'Bundle and Go\' being one of her favourites. He diddles two tunes.</p>',
  'raw_date': '1977',
  'genre': 'Information',
  'url': 'https://www.tobarandualchais.co.uk/track/60156?l=en',
  'date': '1977-01-01'},
 {'title': 'Leaving Lismore',
  'summary': '<p class="contributor-bio-item__content">A piper would choose a tune to play according to the location, occasion and listeners. He might play a 

And then with an audio link:

In [25]:
results_metadata = []

for results_link in results_links:
    results_metadata.append( get_result_record_data(results_link, audio=True) )
    
results_metadata

[{'title': 'Balmoral Highlanders/Father John MacMillan of Barra/Jean Mauchline',
  'summary': '<p class="contributor-bio-item__content">Diddling of three marches. They are \'Balmoral Highlanders\', \'Father John MacMillan of Barra\' and \'Jean Mauchline\'.</p>',
  'raw_date': '1977',
  'genre': 'Music',
  'url': 'https://www.tobarandualchais.co.uk/track/60155?l=en',
  'date': '1977-01-01',
  'audio_url': ['https://digitalpreservation.is.ed.ac.uk/bitstream/handle/20.500.12734/10602/SOSS_007913_060155.mp4']},
 {'title': 'Diddling',
  'summary': '<p class="contributor-bio-item__content">Duncan Williamson points out that the pronunciation of the \'words\' change in faster diddling. His mother was a great diddler, \'Bundle and Go\' being one of her favourites. He diddles two tunes.</p>',
  'raw_date': '1977',
  'genre': 'Information',
  'url': 'https://www.tobarandualchais.co.uk/track/60156?l=en',
  'date': '1977-01-01',
  'audio_url': ['https://digitalpreservation.is.ed.ac.uk/bitstream/han

As that seems to work, we can now try for a complete scrape. To minimise the calls, we get all the results links first, then dedupe:

In [26]:
# tqdm is a simple progress bar indicator
from tqdm.notebook import tqdm

results_batch_size = 50

results_page_num = num_results_pages = 1

# Get all results links
results_links = []

while results_page_num <= num_results_pages:
    results_page_soup = get_search_results_page(page_num=results_page_num,
                                                page_size=results_batch_size)
    
    # Extend the list of results links we have so far
    results_links.extend( get_result_links(results_page_soup) )
    
    # If this is the first page of results, how many pages are there
    if results_page_num==1:
        num_results_pages = get_number_of_results_pages(results_page_soup)
    
    # Increment which page we are on
    results_page_num += 1

print(f"{len(results_links)} results links identified from search results.")

# Dedupe the results list by making a set from them    
results_links = list(set(results_links))

print(f"{len(results_links)} unique results links.")

290 results links identified from search results.
290 unique results links.
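Building a `set` removes duplicates but also discards the order in which the search returned the links. If that order matters, one alternative sketch is to dedupe with `dict.fromkeys`, which keeps the first occurrence of each link in place (the helper name `dedupe_keep_order` is illustrative):

```python
def dedupe_keep_order(links):
    """Remove duplicate links while preserving first-seen order."""
    # dict keys are unique and retain insertion order
    return list(dict.fromkeys(links))


sample_links = ["/track/60156?l=en", "/track/60155?l=en", "/track/60156?l=en"]
print(dedupe_keep_order(sample_links))
```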


Having got all the results links, we can now grab all the individual results pages:

In [27]:
# Now we can iterate through the unique results
results_metadata = []

for results_link in tqdm(results_links):
    metadata_record =  get_result_record_data(results_link, audio=True)
    if metadata_record:
        results_metadata.append( metadata_record )

  0%|          | 0/290 [00:00<?, ?it/s]

Something wrong with https://www.tobarandualchais.co.uk/track/91107?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/66370?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/39404?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/30879?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/29116?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/64461?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/33313?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/28969?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/33323?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/30876?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/91097?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/91102?l=en
Something wrong with https://www.tobarandualchais.co.uk/track/29143?l=en
Something wrong with https://www.tobarandualchais.c
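A number of record pages came back with a non-200 status. One way to handle this, sketched here under assumptions rather than taken from the original run, is to collect the failed paths and retry them after a growing delay. The `fetch` argument is a hypothetical callable that returns a record dict, or an empty dict on failure, mirroring the behaviour of `get_result_record_data`.

```python
import time


def scrape_with_retries(paths, fetch, max_attempts=3, base_delay=1.0):
    """Fetch each path, retrying failures; return (records, still_failing)."""
    records = []
    pending = list(paths)
    for attempt in range(max_attempts):
        failed = []
        for path in pending:
            record = fetch(path)
            if record:
                records.append(record)
            else:
                # An empty result is treated as a failed request
                failed.append(path)
        pending = failed
        if not pending:
            break
        if attempt < max_attempts - 1:
            # Back off a little longer before each new round
            time.sleep(base_delay * 2 ** attempt)
    return records, pending
```

In the notebook this might be called as `scrape_with_retries(results_links, lambda p: get_result_record_data(p, audio=True))`, with any paths left in the second return value worth checking by hand.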

Let's preview the last few results:

In [28]:
results_metadata[-5:]

[{'title': 'The Wee Toon Clerk',
  'summary': '<p class="contributor-bio-item__content">Comic night-visiting song.</p>',
  'raw_date': '18 January 1987',
  'genre': 'Song',
  'url': 'https://www.tobarandualchais.co.uk/track/91091?l=en',
  'date': '1987-01-18',
  'audio_url': ['https://digitalpreservation.is.ed.ac.uk/bitstream/handle/20.500.12734/5508/SOSS_009517_091091.mp4']},
 {'title': 'Black Plague of Mingaley',
  'summary': '<p class="contributor-bio-item__content">The contributor mentions a song about a plague on Mingulay which killed the entire local population, and recites one verse.<br/><br/>Followed by a brief discussion of the contributor\'s song repertoire. If a song doesn\'t appeal to people he\'s travelling with, he won\'t sing it. If they don\'t know the song or it doesn\'t mean much to them he gives them something else.</p>',
  'raw_date': '21 February 1976',
  'genre': 'Song',
  'url': 'https://www.tobarandualchais.co.uk/track/110378?l=en',
  'date': '1976-02-21',
  'au

Let's now serialise the results into a text file.

The following function will generate a single record report:

In [29]:
import markdownify

def display_record(record):
    """text display of record."""
    txt = f"""
{record['title']}
{record["url"]}
{record['genre']}
{record['raw_date']}

{markdownify.markdownify(record['summary']).strip()}

{" :: ".join(record['audio_url'])}
"""
    
    return txt

In [30]:
display_record(results_metadata[0])

"\nThe Princess on the Glass Hill\nhttps://www.tobarandualchais.co.uk/track/28935?l=en\nStory\n06 May 1976\n\nThe story of how lazy Jack got horses and armour and won the princess on the glass hill.  \n  \nJack was the youngest of three sons of a widow. He was very lazy and dirty and untidy. He never helped his mother as his brothers did. However one night he was persuaded to go and keep watch in their little cornfield to stop the deer eating the crop. He sat down leaning his back on a tree and fell asleep. A noise awoke him, and he saw a black horse with black armour tied to the saddle. He took the horse and hid it in the wood, so that his brothers would not get it. The next night he agreed to keep watch again, and this time he was wakened by a white horse, with silver armour. He hid it with the first horse. On the third night when Jack was keeping watch a brown horse with gold armour came, and Jack hid it too. The following day Jack's brothers brought news from the market that the ki

In [31]:
full_text = """# Duncan Williamson Audio on Tobar an Dualchais

Via: https://www.tobarandualchais.co.uk/

---

"""

full_text += "\n---\n".join([display_record(record_result) for record_result in results_metadata])

full_text[:1000]

'# Duncan Williamson Audio on Tobar an Dualchais\n\nVia: https://www.tobarandualchais.co.uk/\n\n---\n\n\nThe Princess on the Glass Hill\nhttps://www.tobarandualchais.co.uk/track/28935?l=en\nStory\n06 May 1976\n\nThe story of how lazy Jack got horses and armour and won the princess on the glass hill.  \n  \nJack was the youngest of three sons of a widow. He was very lazy and dirty and untidy. He never helped his mother as his brothers did. However one night he was persuaded to go and keep watch in their little cornfield to stop the deer eating the crop. He sat down leaning his back on a tree and fell asleep. A noise awoke him, and he saw a black horse with black armour tied to the saddle. He took the horse and hid it in the wood, so that his brothers would not get it. The next night he agreed to keep watch again, and this time he was wakened by a white horse, with silver armour. He hid it with the first horse. On the third night when Jack was keeping watch a brown horse with gold armour

In [32]:
with open("williamson_audio.md", "w") as f:
    f.write(full_text)

We can also generate a simple CSV file, most conveniently via a *pandas* dataframe:

In [33]:
import pandas as pd

df = pd.DataFrame(results_metadata)

df.head()

Unnamed: 0,title,summary,raw_date,genre,url,date,audio_url
0,The Princess on the Glass Hill,"<p class=""contributor-bio-item__content"">The s...",06 May 1976,Story,https://www.tobarandualchais.co.uk/track/28935...,1976-05-06,[https://digitalpreservation.is.ed.ac.uk/bitst...
1,"Jack mistook a thorn tree for an old woman, an...","<p class=""contributor-bio-item__content"">Jack ...",11 July 1976,Story,https://www.tobarandualchais.co.uk/track/30609...,1976-07-11,[https://digitalpreservation.is.ed.ac.uk/bitst...
2,The story of an old ballad about a farmer refu...,"<p class=""contributor-bio-item__content"">The s...",13 November 1976,Story,https://www.tobarandualchais.co.uk/track/33139...,1976-11-13,[https://digitalpreservation.is.ed.ac.uk/bitst...
3,Down in Yonder Bushes,"<p class=""contributor-bio-item__content"">Jilte...",17 July 1976,Song,https://www.tobarandualchais.co.uk/track/30890...,1976-07-17,[https://digitalpreservation.is.ed.ac.uk/bitst...
4,Lord Ullin's Daughter,"<p class=""contributor-bio-item__content"">A son...",September 1977,Song,https://www.tobarandualchais.co.uk/track/78630...,1977-09-01,[https://digitalpreservation.is.ed.ac.uk/bitst...


Cast the summary to markdown:

In [34]:
df['summary'] = df['summary'].apply(lambda x: markdownify.markdownify(x).strip())

df.head()

Unnamed: 0,title,summary,raw_date,genre,url,date,audio_url
0,The Princess on the Glass Hill,The story of how lazy Jack got horses and armo...,06 May 1976,Story,https://www.tobarandualchais.co.uk/track/28935...,1976-05-06,[https://digitalpreservation.is.ed.ac.uk/bitst...
1,"Jack mistook a thorn tree for an old woman, an...","Jack mistook a thorn tree for an old woman, an...",11 July 1976,Story,https://www.tobarandualchais.co.uk/track/30609...,1976-07-11,[https://digitalpreservation.is.ed.ac.uk/bitst...
2,The story of an old ballad about a farmer refu...,The story of an old ballad about a farmer refu...,13 November 1976,Story,https://www.tobarandualchais.co.uk/track/33139...,1976-11-13,[https://digitalpreservation.is.ed.ac.uk/bitst...
3,Down in Yonder Bushes,Jilted lover's song.,17 July 1976,Song,https://www.tobarandualchais.co.uk/track/30890...,1976-07-17,[https://digitalpreservation.is.ed.ac.uk/bitst...
4,Lord Ullin's Daughter,A song derived from Thomas Campbell's poem 'Lo...,September 1977,Song,https://www.tobarandualchais.co.uk/track/78630...,1977-09-01,[https://digitalpreservation.is.ed.ac.uk/bitst...


And stringify the list of audio_urls:

In [35]:
df['audio_url'] = df['audio_url'].apply(lambda x: ' :: '.join(x))

df.head()

Unnamed: 0,title,summary,raw_date,genre,url,date,audio_url
0,The Princess on the Glass Hill,The story of how lazy Jack got horses and armo...,06 May 1976,Story,https://www.tobarandualchais.co.uk/track/28935...,1976-05-06,https://digitalpreservation.is.ed.ac.uk/bitstr...
1,"Jack mistook a thorn tree for an old woman, an...","Jack mistook a thorn tree for an old woman, an...",11 July 1976,Story,https://www.tobarandualchais.co.uk/track/30609...,1976-07-11,https://digitalpreservation.is.ed.ac.uk/bitstr...
2,The story of an old ballad about a farmer refu...,The story of an old ballad about a farmer refu...,13 November 1976,Story,https://www.tobarandualchais.co.uk/track/33139...,1976-11-13,https://digitalpreservation.is.ed.ac.uk/bitstr...
3,Down in Yonder Bushes,Jilted lover's song.,17 July 1976,Song,https://www.tobarandualchais.co.uk/track/30890...,1976-07-17,https://digitalpreservation.is.ed.ac.uk/bitstr...
4,Lord Ullin's Daughter,A song derived from Thomas Campbell's poem 'Lo...,September 1977,Song,https://www.tobarandualchais.co.uk/track/78630...,1977-09-01,https://digitalpreservation.is.ed.ac.uk/bitstr...


We can trivially save the dataframe as a CSV file, adding some structure along the way by first sorting the dataframe by genre and date. 

In [37]:
df.sort_values(["genre", "date"]).to_csv("duncan_williamson_audio.csv", index=False)

The saved files should contain a complete summary of records available.
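To work with the CSV file later without *pandas*, the joined `audio_url` field needs splitting back into a list. A minimal sketch using the standard library's `csv` module follows. The sample rows and `example.org` URLs are made up for illustration, and `load_records` is a hypothetical helper.

```python
import csv
import io

# Made-up rows standing in for the saved CSV file
sample_csv = io.StringIO(
    "title,genre,date,audio_url\n"
    "Example Song,Song,1976-07-17,"
    "https://example.org/a.mp4 :: https://example.org/b.mp4\n"
    "Example Story,Story,1976-05-06,https://example.org/c.mp4\n"
)


def load_records(handle):
    """Read records from CSV, restoring audio_url to a list of URLs."""
    records = []
    for row in csv.DictReader(handle):
        # Undo the ' :: ' join applied before the CSV was written
        row["audio_url"] = [u for u in row["audio_url"].split(" :: ") if u]
        records.append(row)
    return records


records = load_records(sample_csv)
stories = [r["title"] for r in records if r["genre"] == "Story"]
print(stories)
```

For the real file, the same function could be passed a handle opened with `open("duncan_williamson_audio.csv", newline="")`.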