# Exploring CSV files

Created in February 2024 for the University of Strathclyde by Gustavo Candela

### About the Hutton Drawings Dataset

This dataset contains the complete descriptive metadata for the [Hutton Drawings](https://data.nls.uk/data/metadata-collections/hutton-drawings/), a digitised collection of drawings, maps, plans and prints relating mainly to Scottish churches and other ecclesiastical buildings, castles and other dwellings.
The original drawings date from 1781-1792 and 1811-1820 and are arranged by county. Some of the drawings are by George Henry Hutton, a professional soldier and amateur antiquary, who compiled the collection. 

- Data format: metadata available as MARCXML and Dublin Core
- Data source: https://data.nls.uk/data/metadata-collections/hutton-drawings/

### Table of contents

- [Preparation](#Preparation)
- [Loading the CSV data into pandas](#Loading-the-CSV-data-into-pandas)
- [Wikidata enrichment](#Let's-enrich-the-data-with-Wikidata)

### Preparation

Import the libraries required to load and explore the records included in the dataset.

```python
import pandas as pd
```

### Loading the CSV data into pandas

```python
path_csv = "output/Hutton-Drawings.csv"
# load the CSV export of the metadata into a DataFrame (comma is the default separator)
df = pd.read_csv(path_csv)
```

#### Let's see the structure of the dataset

```python
# structure of the data: list the column names
print(df.columns.tolist())
```

```
['title', 'author', 'date', 'subjects', 'geographic_names']
```


#### Let's explore the content

```python
# number of non-missing values in each column
print(df.count())
```

```
title               537
author              537
date                537
subjects            537
geographic_names    532
dtype: int64
```
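Note that `DataFrame.count()` reports the number of non-missing values per column rather than the number of records, which is why `geographic_names` shows 532 instead of 537. A minimal sketch of the difference, using a hypothetical three-row sample shaped like `Hutton-Drawings.csv` (the rows are invented for illustration, not taken from the real file):

```python
import io

import pandas as pd

# Hypothetical sample in the same column layout as Hutton-Drawings.csv.
# The third record deliberately has an empty geographic_names value.
sample_csv = """title,author,date,subjects,geographic_names
Sample drawing 1,"Hutton, George Henry, -1827",1790,Churches,"Europe, United Kingdom, Scotland, Fife (unitary authority)"
Sample drawing 2,"Hutton, George Henry, -1827",1812,Castles,"Europe, United Kingdom, Scotland, Angus (unitary authority)"
Sample drawing 3,"Fernie, John, active 1812",1812,Abbeys,
"""

df_sample = pd.read_csv(io.StringIO(sample_csv))

n_records = len(df_sample)        # total number of rows, missing values included
non_null = df_sample.count()      # non-missing values per column
missing = df_sample.isna().sum()  # missing values per column

print("Records:", n_records)
print(missing)
```

On the real data, `len(df)` gives the record count and `df.isna().sum()` shows where the five missing geographic names are.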


```python
print(df["author"].head(10))
```

```
0    Hutton, George Henry, -1827 -- Hutton, George ...
1    Forbes, John, of Boynlie -- Hutton, George Hen...
2    Hutton, George Henry, -1827 -- Hutton, George ...
3    Hutton, George Henry, -1827 -- Hutton, George ...
4    Buchan, David Stewart Erskine, Earl of, 1742-1...
5    Hutton, George Henry, -1827 -- Hutton, George ...
6    Hutton, George Henry, -1827 -- Hutton, George ...
7    Hutton, George Henry, -1827 -- Hutton, George ...
8    Fernie, John, active 1812 -- Morton, Alexander...
9    Hutton, George Henry, -1827 -- Hutton, George ...
Name: author, dtype: object
```
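The `author` field can hold several names joined by ` -- `. One way to see who appears most often is to split each value, give every name its own row with `Series.explode`, and count. A minimal sketch on hypothetical values built from the names visible above (not the real column):

```python
import pandas as pd

# Hypothetical author values using the " -- " separator seen in the output above.
authors = pd.Series([
    "Hutton, George Henry, -1827",
    "Forbes, John, of Boynlie -- Hutton, George Henry, -1827",
    "Fernie, John, active 1812 -- Hutton, George Henry, -1827",
    None,  # a record without an author
])

# Split each value into a list of names, one row per name, then drop missing values.
names = authors.str.split(" -- ").explode().str.strip().dropna()

# Count every occurrence of each name, most frequent first.
counts = names.value_counts()
print(counts)
```

Applied to the real data, `df["author"]` would take the place of `authors`; a name repeated inside a single record would be counted once per occurrence.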


The `geographic_names` field can contain several places per record, separated by ` -- `. Let's look at its distinct values:

```python
print(df["geographic_names"].unique())
```

```
['Europe, United Kingdom, Scotland, Aberdeen (unitary authority)'
 'Europe, United Kingdom, Scotland, Fife, Inchcolm (island) -- Europe, United Kingdom, Scotland, Fife (unitary authority)'
 'Europe, United Kingdom, Scotland, Scottish Borders, Berwick (general region)'
 'Europe, United Kingdom, Scotland, Scottish Borders, Dryburgh Abbey (ruins) -- Europe, United Kingdom, Scotland, Scottish Borders, Berwick (general region)'
 'Europe, United Kingdom, Scotland, Aberdeen, Aberdeen (inhabited place) -- Europe, United Kingdom, Scotland, Aberdeen (unitary authority)'
 'Europe, United Kingdom, Scotland, Fife (unitary authority)'
 'Europe, United Kingdom, Scotland, Fife, Culross (inhabited place) -- Europe, United Kingdom, Scotland, Fife (unitary authority)'
 'Europe, United Kingdom, Scotland, Fife, Saint Andrews (inhabited place) -- Europe, United Kingdom, Scotland, Fife (unitary authority)'
 'Europe, United Kingdom, Scotland, Fife (unitary authority) -- Europe, United Kingdom, Scotland, Fife,
```
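Each geographic name appears to be a comma-separated hierarchy running from continent down to the place itself, with the place type in parentheses on the last level, for example `Culross (inhabited place)`. The hypothetical helper `parse_place` below is a sketch that assumes exactly that pattern. It splits naively on commas, so names that themselves contain a comma (such as `Bute, Island of`) will come out as extra levels.

```python
def parse_place(name: str) -> dict:
    """Split one hierarchical place string into hierarchy, label and type.

    Hypothetical helper: assumes comma-separated levels with an optional
    "(type)" qualifier on the last level, as in the output above.
    """
    parts = [part.strip() for part in name.split(",")]
    last = parts[-1]
    place_type = None
    if last.endswith(")") and " (" in last:
        # "Culross (inhabited place)" -> ("Culross", "inhabited place")
        label, place_type = last[:-1].rsplit(" (", 1)
    else:
        label = last
    return {"hierarchy": parts[:-1], "label": label, "type": place_type}


# A multi-place record is split on " -- " first, then each place is parsed.
record = ("Europe, United Kingdom, Scotland, Fife, Inchcolm (island) -- "
          "Europe, United Kingdom, Scotland, Fife (unitary authority)")
parsed = [parse_place(name) for name in record.split(" -- ")]
for place in parsed:
    print(place)
```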

```python
# split multi-place values on " -- ", stack the pieces into one column
# (stack drops the empty cells) and keep the distinct place names
geographic_names = pd.unique(df["geographic_names"].str.split(" -- ", expand=True).stack()).tolist()
print("Total unique geographic_names:" + str(len(geographic_names)))
# print the names in case-insensitive alphabetical order
for s in sorted(geographic_names, key=str.lower):
    print(s)
```

```
Total unique geographic_names:101
Europe, United Kingdom, England, Yorkshire (general region)
Europe, United Kingdom, Great Britain, Tweed (river)
Europe, United Kingdom, Scotland, Aberdeen (unitary authority)
Europe, United Kingdom, Scotland, Aberdeen, Aberdeen (inhabited place)
Europe, United Kingdom, Scotland, Aberdeenshire, Banff (general region)
Europe, United Kingdom, Scotland, Aberdeenshire, Banff (inhabited place)
Europe, United Kingdom, Scotland, Aberdeenshire, Cowie Harbour (bay)
Europe, United Kingdom, Scotland, Aberdeenshire, Kincardine (general region)
Europe, United Kingdom, Scotland, Aberdeenshire, Monymusk (inhabited place)
Europe, United Kingdom, Scotland, Angus (unitary authority)
Europe, United Kingdom, Scotland, Angus, Guthrie (inhabited place)
Europe, United Kingdom, Scotland, Argyll and Bute, Argyll (general region)
Europe, United Kingdom, Scotland, Argyll and Bute, Bute (general region)
Europe, United Kingdom, Scotland, Argyll and Bute, Bute, Island of, Rothesay 
```
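An alternative to `str.split(..., expand=True).stack()` is `Series.explode`, which keeps the original record index for every place and makes it easy to count how many records mention each one. A minimal sketch on a hypothetical sample built from values shown above:

```python
import pandas as pd

# Hypothetical sample of the geographic_names column, including a missing value.
geo = pd.Series([
    "Europe, United Kingdom, Scotland, Fife, Inchcolm (island) -- "
    "Europe, United Kingdom, Scotland, Fife (unitary authority)",
    "Europe, United Kingdom, Scotland, Fife (unitary authority)",
    "Europe, United Kingdom, Scotland, Aberdeen (unitary authority)",
    None,
])

# One row per place; the index still points back to the original record.
places = geo.str.split(" -- ").explode().dropna()

# Distinct places in case-insensitive order, as in the cell above.
unique_places = sorted(places.unique(), key=str.lower)

# Number of occurrences of each place, most frequent first.
place_counts = places.value_counts()
print(place_counts)
```

On the real data, replacing `geo` with `df["geographic_names"]` should give the same set of distinct names as the cell above, plus their frequencies.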

### Let's enrich the data with Wikidata

#### Now that we have the geographic names depicted in the drawings, we can create a map visualisation

The Wikidata Query Service can display query results as a map in the browser, so no software needs to be installed. The first step is to identify the corresponding items in [Wikidata](https://www.wikidata.org/). Then we write a SPARQL query that returns a map as its result. Only a random selection of the places in the previous list is used here.

- Yorkshire = https://www.wikidata.org/wiki/Q163 (wd:Q163)
- Aberdeen = https://www.wikidata.org/wiki/Q36405 (wd:Q36405)
- Aberdeenshire = https://www.wikidata.org/wiki/Q189912 (wd:Q189912)
- Aberdeenshire, Monymusk = https://www.wikidata.org/wiki/Q68816212 (wd:Q68816212)
- Kincardine = https://www.wikidata.org/wiki/Q1011221 (wd:Q1011221)
- Angus = https://www.wikidata.org/wiki/Q202177 (wd:Q202177)
- River Tay = https://www.wikidata.org/wiki/Q19719 (wd:Q19719)
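Rather than typing the identifiers into the query by hand, they can be kept in a dictionary and joined into the `VALUES` clause. The sketch below uses the QIDs from the list above; the dictionary and the `build_map_query` function are illustrative names, not part of any library.

```python
# QIDs copied from the list above; the dictionary is just a convenient container.
wikidata_items = {
    "Yorkshire": "Q163",
    "Aberdeen": "Q36405",
    "Aberdeenshire": "Q189912",
    "Aberdeenshire, Monymusk": "Q68816212",
    "Kincardine": "Q1011221",
    "Angus": "Q202177",
    "River Tay": "Q19719",
}


def build_map_query(qids) -> str:
    """Return a map-view SPARQL query restricted to the given Wikidata QIDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    # Braces are doubled because this is an f-string.
    return f"""#defaultView:Map
SELECT ?place ?placeLabel (SAMPLE(?image) AS ?img) (SAMPLE(?coord) AS ?c)
WHERE {{
  VALUES ?place {{ {values} }}
  ?place wdt:P625 ?coord.
  OPTIONAL {{ ?place wdt:P18 ?image. }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?place ?placeLabel
"""


query = build_map_query(wikidata_items.values())
print(query)
```

Adding a place to the map then only requires adding one entry to the dictionary.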

Using the following SPARQL query, we can create a visualisation map:

```sparql
#defaultView:Map
SELECT ?place ?placeLabel (SAMPLE(?image) AS ?img) (SAMPLE(?coord) AS ?c)
WHERE {
  VALUES ?place { wd:Q163 wd:Q36405 wd:Q189912 wd:Q68816212 wd:Q1011221 wd:Q202177 wd:Q19719 }
  ?place wdt:P625 ?coord.
  OPTIONAL { ?place wdt:P18 ?image. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?place ?placeLabel
```

The resulting map can be viewed at this [link](https://w.wiki/9Fde).
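The same query can also be sent from Python to the public SPARQL endpoint at `https://query.wikidata.org/sparql`, which returns JSON when asked for `application/sparql-results+json`. The sketch below uses only the standard library. The `User-Agent` value is a placeholder (Wikimedia asks clients to identify themselves), and the helper names are hypothetical. It also assumes coordinates come back as WKT literals of the form `Point(longitude latitude)`.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def make_request(query: str) -> urllib.request.Request:
    """Build a GET request for the Wikidata Query Service (nothing is sent yet)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        # Placeholder identification string; replace with your own contact details.
        "User-Agent": "HuttonDrawingsNotebook/0.1 (contact@example.org)",
    })


def bindings_to_rows(data: dict) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: binding[var]["value"] for var in binding}
        for binding in data["results"]["bindings"]
    ]


def parse_wkt_point(text: str) -> tuple:
    """Turn a WKT literal such as "Point(-2.1 57.15)" into (longitude, latitude)."""
    inner = text[text.index("Point(") + len("Point("):text.rindex(")")]
    lon, lat = (float(v) for v in inner.split())
    return lon, lat


def run_query(query: str) -> list:
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(make_request(query), timeout=60) as response:
        return bindings_to_rows(json.load(response))


# Usage (requires network access), with the SPARQL query above stored in a string:
# rows = run_query(sparql_text)
# coords = [parse_wkt_point(row["c"]) for row in rows]
```

Keeping the request building and result parsing separate from the network call makes the parsing easy to check without contacting the endpoint.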
