# Exploring Collections using Broccoli

This notebook gives a generic introduction to Broccoli and Brinta. It uses two groups of endpoints:

- `/brinta`: the Brinta API gives access to the indices.
- `/projects`: the projects API gives access to search and detail views.

In [23]:
import requests
import json
from IPython.display import display, HTML

BROCCOLI_BASE = 'https://broccoli.tt.di.huc.knaw.nl'
GOETGEVONDEN_BASE = 'https://api.goetgevonden.nl'

response = requests.get(BROCCOLI_BASE)
display(HTML(response.text))

In [24]:
about = '/about'
api_doc = '/swagger'
# The SWAGGER documentation doesn't display in the notebook. Just click the URL to view:
BROCCOLI_BASE + api_doc


'https://broccoli.tt.di.huc.knaw.nl/swagger'

## Discovering which projects are available

In [25]:
projects_url = BROCCOLI_BASE + '/projects'
response = requests.get(projects_url)
response.json()

['globalise', 'republic', 'suriano']

There are currently three projects available via Broccoli:

- `globalise`: the API for transcriptions of the archives of the Dutch East India Company (VOC), used by e.g. the GLOBALISE Transcription Viewer (https://transcriptions.globalise.huygens.knaw.nl/).
- `suriano`: the API for the correspondence of the diplomat Christoforo Suriano, used by the digital edition Correspondence of Suriano (https://suriano.huygens.knaw.nl).
- `republic`: the API for the resolutions of the States General of the Dutch Republic (1576-1796), used by the Goetgevonden application (https://goetgevonden.nl).

The Swagger documentation shows that the structure of a search URL is `/projects/{projectId}/search`. So for a search in the Globalise transcriptions, we need to use `/projects/globalise/search`:

In [26]:

globalise_api = BROCCOLI_BASE + '/projects/globalise'
globalise_search = globalise_api + '/search'
response = requests.get(globalise_search)
response.json()

{'code': 404, 'message': 'bodyId not found: search'}

We get an HTTP 404 error because the search endpoint expects the search parameters to be sent along in the body of a POST request. Judging by the message, a plain GET on this URL is treated as a lookup of a body with the identifier `search`. Further on, you'll find out more about which parameters you can send to tailor your request to your needs, but for now, we'll do a simple text search to demonstrate basic full-text search. For this, all you need is to set up a query parameter dictionary with a key `text` and a search term as the value:

In [27]:
query_params = {'text': 'peper'}
response = requests.post(globalise_search, json=query_params)
response.json()

{'total': {'value': 160366, 'relation': 'eq'},
 'results': [{'_id': 'urn:globalise:NL-HaNA_1.04.02_1474_0503',
   '_hits': {'text': ['Transport m 36433410 1/4. lb: <em>Peper</em> Teweten 487078. lb a:o 1659 als. genegotieert werden Gesien wat quantiteijt',
     '<em>Peper</em> In Elk.',
     'Monteringh op voorsz: <em>Peper</em>, Waarbij kan Jaar op de mallabaars, en Canarasz: Cust is 1012306¼. lb a„o',
     '. min 21821580 1/8 lb <em>peper</em> Transporteere 254307. lb tot Coilan „ Calicoilan 232171. „ 487078. lb <em>peper</em>',
     '. van 13858675½ lb poper Transporteere: 13858675½ lb <em>Peper</em> ƒ']},
   'textTokenCount': 365,
   'invNr': '1474',
   'document': 'NL-HaNA_1.04.02_1474_0503',
   'langIso': ['nld'],
   'langLabel': ['Dutch']},
  {'_id': 'urn:globalise:NL-HaNA_1.04.02_1444_0926',
   '_hits': {'text': ['Rijp <em>peper</em>, en Zout - - - - - - - - - - - . . . . . . . . . . - - „ 91: 5: 3: Contant. . . . . . . .',
     '. . . . . . „ 60:. . „ 202: 24: 607:10:—: Rijp, 

We get a response telling us there are a total of 160,366 results (pages) containing the string `peper`. Zooming in on an individual result (default 10 results per request) shows the same elements as shown in the Transcription Viewer:

In [28]:
response.json()['results'][2]

{'_id': 'urn:globalise:NL-HaNA_1.04.02_2060_0595',
 '_hits': {'text': ['Souratte 17 Cnooten - - - - - - Lange <em>peper</em>. - caneel. - 22 caneel. - nooten. . . . caneel. - 23 25 noten',
   'Clange <em>Peper</em> - - - - - - 10000. — 9828 — 18 garioffel nagelen - - - - - - - - 11078. — 11069. — nooten',
   '- - - - - - - - - - - - 5045- 5032. - Vertoog wegens de gevallene minderheden op de specerijen en <em>peper</em>',
   'L: - - - - - - 7750 - 7610. - 140. – <em>peper</em> swarte. . . . - - - 425000 -412338. —12662 — 2 3/8 r=m 165',
   '– o 17 1/17. <em>peper</em> swarte. . . . . . . . lb. 500000 486052 —„ 13948 —„ 2 3/5 r=m 721½ <em>Peper</em> swarte.']},
 'textTokenCount': 213,
 'invNr': '2060',
 'document': 'NL-HaNA_1.04.02_2060_0595',
 'langIso': ['nld'],
 'langLabel': ['Dutch']}

The `text` property under `_hits` shows snippets with the context of the search term `peper`, with each match wrapped in `<em>` tags. To access the full document, we need the `document` identifier of a result.

**TO DO**: figure out how we can request the full text of a document.
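
Until that is worked out, the identifiers and snippets can at least be collected from a search response. The following is a minimal sketch, not part of the Broccoli API: `summarise_hits` is a hypothetical helper name, and the sample is an abbreviated copy of the result shown above. It assumes every result has the `_hits`, `text` and `document` keys seen in that output.

```python
import re

# Matches the <em> and </em> tags that mark the search term in a snippet
HIGHLIGHT_TAG = re.compile(r'</?em>')

def summarise_hits(results):
    """Collect the document identifier and tag-free snippets of each result."""
    rows = []
    for result in results:
        snippets = result.get('_hits', {}).get('text', [])
        rows.append({
            'document': result.get('document'),
            'snippets': [HIGHLIGHT_TAG.sub('', snippet) for snippet in snippets],
        })
    return rows

# Abbreviated copy of one result from the search response above
sample_results = [{
    '_id': 'urn:globalise:NL-HaNA_1.04.02_2060_0595',
    '_hits': {'text': ['Souratte 17 Cnooten - - - - - - Lange <em>peper</em>. - caneel.']},
    'document': 'NL-HaNA_1.04.02_2060_0595',
}]

rows = summarise_hits(sample_results)
for row in rows:
    print(row['document'], row['snippets'])
```

With a live response, the same helper could be called as `summarise_hits(response.json()['results'])`.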

## Exploring the Indexes

Documents indexed by each of the projects have multiple properties, including textual elements and metadata. The metadata can be used to refine queries and filter results. To find out which metadata fields are available and how they can be used to refine and filter, you can query the Brinta API per project, using (as shown in the Swagger documentation) `/brinta/{projectId}/indices`:

In [29]:
response = requests.get(BROCCOLI_BASE + '/brinta/globalise/indices')
print(json.dumps(response.json(), indent=4))

{
    "globalise-2024.03.18-test": {
        "invNr": "keyword",
        "document": "keyword",
        "langIso": "keyword",
        "langLabel": "keyword"
    },
    "globalise-2024.03.18-4": {
        "invNr": "keyword",
        "document": "keyword",
        "langIso": "keyword",
        "langLabel": "keyword"
    },
    "globalise-2024.03.18-3": {
        "invNr": "keyword",
        "document": "keyword",
        "lang": "keyword"
    },
    "globalise-2024.03.18-2": {
        "invNr": "keyword",
        "document": "keyword"
    },
    "globalise-2024.03.18": {
        "invNr": "keyword",
        "document": "keyword"
    }
}


At the time of writing this notebook, there are five `globalise` indexes, all based on versions of the data from 18 March 2024. The indexes all share the document properties `invNr` and `document`. Index `globalise-2024.03.18-3` also has a `lang` property that encodes the language of a document, while `globalise-2024.03.18-4` and `globalise-2024.03.18-test` have `langIso` and `langLabel` properties instead. The metadata is currently limited to these fields, but as the project progresses, more properties will become available and queryable.
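
The comparison between indexes can also be done programmatically. The sketch below uses a hypothetical helper, `compare_indexes`, on a copy of the Brinta response shown above. It only assumes that the response maps index names to dictionaries of field names and types.

```python
def compare_indexes(indices):
    """Return the fields shared by all indexes and the extra fields per index."""
    field_sets = {name: set(fields) for name, fields in indices.items()}
    shared = set.intersection(*field_sets.values()) if field_sets else set()
    extras = {name: sorted(fields - shared) for name, fields in field_sets.items()}
    return sorted(shared), extras

# Copy of the /brinta/globalise/indices response shown above
indices = {
    'globalise-2024.03.18-test': {'invNr': 'keyword', 'document': 'keyword',
                                  'langIso': 'keyword', 'langLabel': 'keyword'},
    'globalise-2024.03.18-4': {'invNr': 'keyword', 'document': 'keyword',
                               'langIso': 'keyword', 'langLabel': 'keyword'},
    'globalise-2024.03.18-3': {'invNr': 'keyword', 'document': 'keyword',
                               'lang': 'keyword'},
    'globalise-2024.03.18-2': {'invNr': 'keyword', 'document': 'keyword'},
    'globalise-2024.03.18': {'invNr': 'keyword', 'document': 'keyword'},
}

shared, extras = compare_indexes(indices)
print('shared fields:', shared)
for name, fields in extras.items():
    print(name, 'adds', fields)
```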



In [30]:
response = requests.get(BROCCOLI_BASE + '/brinta/globalise/in/')
print(json.dumps(response.json(), indent=4))

{
    "code": 405,
    "message": "HTTP 405 Method Not Allowed"
}
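
Both error responses seen so far (the 404 earlier and the 405 here) are JSON objects with a `code` and a `message`. A small check like the one below can make such errors explicit before the rest of a response is processed. This is a sketch: `api_error` is a hypothetical helper, and it assumes that all error payloads follow the shape of the two examples in this notebook.

```python
def api_error(payload):
    """Return 'code: message' if the payload looks like an error object, else None."""
    if isinstance(payload, dict) and 'code' in payload and 'message' in payload:
        return f"{payload['code']}: {payload['message']}"
    return None

# Payloads copied from the outputs in this notebook
not_found = {'code': 404, 'message': 'bodyId not found: search'}
not_allowed = {'code': 405, 'message': 'HTTP 405 Method Not Allowed'}
search_result = {'total': {'value': 160366, 'relation': 'eq'}, 'results': []}

for payload in (not_found, not_allowed, search_result):
    print(api_error(payload))
```

In a live session this would be used as `api_error(response.json())` directly after a request.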


For Suriano, we see a single index, with different document properties that are indexed:

In [31]:
response = requests.get(BROCCOLI_BASE + '/brinta/suriano/indices')
print(json.dumps(response.json(), indent=4))

{
    "surind-029": {
        "date": "date",
        "recipient": "keyword",
        "sender": "keyword",
        "summary": "text",
        "entityNames": "keyword"
    }
}


In the Suriano project, document metadata includes the `sender`, `recipient` and `date` of each letter, as well as a `summary` and a list of identified `entityNames`.

For Republic, there is a long list of document properties:

In [9]:
response = requests.get(BROCCOLI_BASE + '/brinta/republic/indices')
print(json.dumps(response.json(), indent=4))

{
    "republic-2025-05-01": {
        "textType": "keyword",
        "resolutionType": "keyword",
        "propositionType": "keyword",
        "delegateName": "keyword",
        "personName": "keyword",
        "roleName": "keyword",
        "roleCategories": "keyword",
        "locationName": "keyword",
        "locationCategories": "keyword",
        "organisationName": "keyword",
        "organisationCategories": "keyword",
        "commissionName": "keyword",
        "commissionCategories": "keyword",
        "sessionWeekday": "keyword",
        "delegateId": "keyword",
        "personId": "keyword",
        "roleId": "keyword",
        "locationId": "keyword",
        "organisationId": "keyword",
        "commissionId": "keyword",
        "bodyType": "keyword",
        "sessionDate": "date",
        "sessionDay": "byte",
        "sessionMonth": "byte",
        "sessionYear": "short"
    }
}


The generic Broccoli server gives access to a Republic index named `republic-2025-05-01`.

If you send the same request to the Goetgevonden API, you get the indices available on that server, which can be based on a different version of the data (here `republic-2024.11.30`):

In [10]:
response = requests.get(GOETGEVONDEN_BASE + '/brinta/republic/indices')
print(json.dumps(response.json(), indent=4))

{
    "republic-2024.11.30": {
        "textType": "keyword",
        "resolutionType": "keyword",
        "propositionType": "keyword",
        "delegateName": "keyword",
        "personName": "keyword",
        "roleName": "keyword",
        "roleCategories": "keyword",
        "locationName": "keyword",
        "locationCategories": "keyword",
        "organisationName": "keyword",
        "organisationCategories": "keyword",
        "commissionName": "keyword",
        "commissionCategories": "keyword",
        "sessionWeekday": "keyword",
        "delegateId": "keyword",
        "personId": "keyword",
        "roleId": "keyword",
        "locationId": "keyword",
        "organisationId": "keyword",
        "commissionId": "keyword",
        "bodyType": "keyword",
        "sessionDate": "date",
        "sessionDay": "byte",
        "sessionMonth": "byte",
        "sessionYear": "short"
    }
}


### Querying Resolutions

You can query the resolutions in the same way as the Globalise VOC transcriptions, using a dictionary with a `text` property:

In [11]:
republic_api = GOETGEVONDEN_BASE + '/projects/republic'
republic_search = republic_api + '/search'


query = {'text': 'peper'}
response = requests.post(republic_search, json=query)
data = response.json()
data.keys()

dict_keys(['total', 'results', 'aggs'])

In [12]:
data['results'][0]

{'_id': 'urn:republic:inv-3276-date-1667-09-15-session-74-resolution-13',
 '_hits': {'text': ['gecomen waren, dat uijt Engelandt binnen dese Landen was gebracht een seer notable partije verbrande <em>Peper</em>',
   ', dewelcke sinden brandt van Londen geweest sijnde, alleen de forme van <em>Peper</em>, doch geensints eenige',
   'edoch dat evenwel eenige baetsoeckende Luijden daerop scheenen uijt te wesen, om de voors verbrande <em>Peper</em>',
   "sal werden, op pane dat die geene, die eenige vande voors verbrande <em>Peper</em>, contrarie 't voors verboth",
   "sal hebben ingevoert, vercocht, ofte oock met andere <em>Peper</em> vermenght 't zij in groote ofte in cleijne"]},
 'textType': 'handwritten',
 'resolutionType': 'ordinaris',
 'propositionType': 'onbekend',
 'sessionWeekday': 'donderdag',
 'bodyType': 'Resolution',
 'sessionDate': '1667-09-15',
 'sessionDay': 15,
 'sessionMonth': 9,
 'sessionYear': 1667}

In [13]:
data['aggs']

{}

There are no aggregations, because the query only contained a `text` property. To get aggregation information, the query has to name properties that are indexed by Brinta and exposed through the API. The properties listed by the Brinta request below can be used:

- **as part of the query** to filter the results, and
- **as aggregations** to retrieve a summary of the possible values of a property.



In [14]:
response = requests.get(GOETGEVONDEN_BASE + '/brinta/republic/indices/')
data = response.json()
data['republic-2024.11.30']

{'textType': 'keyword',
 'resolutionType': 'keyword',
 'propositionType': 'keyword',
 'delegateName': 'keyword',
 'personName': 'keyword',
 'roleName': 'keyword',
 'roleCategories': 'keyword',
 'locationName': 'keyword',
 'locationCategories': 'keyword',
 'organisationName': 'keyword',
 'organisationCategories': 'keyword',
 'commissionName': 'keyword',
 'commissionCategories': 'keyword',
 'sessionWeekday': 'keyword',
 'delegateId': 'keyword',
 'personId': 'keyword',
 'roleId': 'keyword',
 'locationId': 'keyword',
 'organisationId': 'keyword',
 'commissionId': 'keyword',
 'bodyType': 'keyword',
 'sessionDate': 'date',
 'sessionDay': 'byte',
 'sessionMonth': 'byte',
 'sessionYear': 'short'}

The following query parameters include a full-text query and a request for aggregations of the `organisationName` and `organisationCategories`:

In [15]:
query = {
    'text': 'paarden OR paerden',
    'terms': {},
    'aggs': {
        'organisationName': {
            'order': 'countDesc',
            'size': 25
        },
        'organisationCategories': {
            'order': 'countDesc',
            'size': 25
        }
    }
}

result = requests.post(republic_search, json=query)
data = result.json()


In [16]:
data.keys()

dict_keys(['total', 'results', 'aggs'])

This time, the response from Broccoli contains `aggs`, or aggregations. For each value of `organisationName` or `organisationCategories`, these give the number of matching resolutions that contain that value.

In [17]:
data['aggs'].keys()

dict_keys(['organisationName', 'organisationCategories'])

In [18]:
import pandas as pd

data['aggs']['organisationName']

{'Raad van State': 2720,
 'Admiraliteit van Rotterdam': 510,
 'Admiraliteit van Zeeland': 460,
 'Hof van de Koning van Engeland': 313,
 'Generaliteitsrekenkamer': 299,
 'Staten van Zeeland': 294,
 'Staten-Generaal van de Republiek': 282,
 'Gezamenlijke Admiraliteiten': 234,
 'Regimenten - Dragonders': 162,
 'Regimenten - Garderegimenten': 153,
 'Regimenten - Cavallerie': 145,
 'Staten van Holland en Westfriesland': 119,
 'Admiraliteit van Amsterdam': 115,
 'Hof van de Koning van Frankrijk': 115,
 'Staten van Stad en Lande': 109,
 'Admiraliteit van Friesland': 104,
 'Leger van den Staat': 94,
 'Staten van Friesland': 85,
 'Regiment van den Heere Prince van Nassau': 82,
 'Hof van de Duitse Keizer': 80,
 'Staten van Gelderland': 71,
 'Admiraliteit van Westfriesland': 68,
 'Staten van Overijssel': 65,
 'Staten van Utrecht': 64,
 'Plaatselijke overheid van Maastricht': 57}

In [19]:
data['aggs']['organisationCategories']

{'Republiek': 5456,
 'Landelijk': 4132,
 'Generaliteit': 4025,
 'Bestuur': 3964,
 'Oorlog': 2391,
 'Regionaal': 1952,
 'Zeevaart': 1394,
 'Admiraliteit': 1345,
 'Regiment': 1090,
 'Europa': 1062,
 'Diplomatie': 721,
 'Regimenten op naam': 715,
 'Vorstenhof': 704,
 'Regimenten naar wapen': 436,
 'Financiën': 344,
 'Plaatselijk': 263,
 'Rechtspraak': 215,
 'Wereld': 108,
 'Zuidelijke Nederlanden': 105,
 'Handel': 69,
 'Regimenten naar herkomst': 38,
 'Godsdienst': 33,
 'Bataafse tijd': 31,
 'Rooms-Katholiek': 19,
 'Munt': 10}
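
Aggregations are plain dictionaries of value and count, so they are easy to trim or threshold. The sketch below uses a hypothetical helper, `top_values`, on a few entries copied from the output above. A resolution can presumably fall under several categories at once, so the counts are best read per value rather than summed.

```python
def top_values(agg, n=5, min_count=0):
    """Return up to n (value, count) pairs with count >= min_count, largest first."""
    rows = [(value, count) for value, count in agg.items() if count >= min_count]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows[:n]

# A few entries copied from the organisationCategories output above
categories = {'Republiek': 5456, 'Landelijk': 4132, 'Oorlog': 2391,
              'Handel': 69, 'Munt': 10}

for value, count in top_values(categories, n=3):
    print(f'{count:>6}  {value}')
```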

The values of a property can be added as `terms` to the query, so that only resolutions that mention a specific organisation, or organisations of a specific category, are returned.
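
Since the query dictionaries in this notebook all follow the same pattern, they can be assembled with a small helper. The sketch below is an illustration only: `build_query` is a hypothetical function, and wrapping a single term value in a list is an assumption based on the list form used for `organisationName` in this notebook.

```python
def build_query(text=None, terms=None, agg_fields=(), agg_size=25, agg_order='countDesc'):
    """Assemble a search query dictionary in the shape used in this notebook."""
    query = {'terms': {}}
    if text:
        query['text'] = text
    for field, values in (terms or {}).items():
        # Accept a single string or a list of strings; always send a list
        query['terms'][field] = [values] if isinstance(values, str) else list(values)
    if agg_fields:
        query['aggs'] = {field: {'order': agg_order, 'size': agg_size}
                         for field in agg_fields}
    return query

query = build_query(
    text='paarden OR paerden',
    terms={'organisationName': 'Admiraliteit van Rotterdam'},
    agg_fields=['organisationName', 'organisationCategories'],
)
print(query)
```

The resulting dictionary can be sent with `requests.post(republic_search, json=query)`.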

In [20]:
query = {
    'text': 'paarden OR paerden',
    'terms': {'organisationName': ['Admiraliteit van Rotterdam']},
    'aggs': {
        'organisationName': {
            'order': 'countDesc',
            'size': 25
        },
        'organisationCategories': {
            'order': 'countDesc',
            'size': 25
        }
    }
}

result = requests.post(republic_search, json=query)
data = result.json()

data['aggs']

{'organisationName': {'Admiraliteit van Rotterdam': 510,
  'Raad van State': 56,
  'Gezamenlijke Admiraliteiten': 24,
  'Admiraliteit van Zeeland': 17,
  'Staten van Zeeland': 15,
  'Admiraliteit van Westfriesland': 13,
  'Staten-Generaal van de Republiek': 13,
  'Admiraliteit van Amsterdam': 11,
  'Admiraliteit van Friesland': 9,
  'Generaliteitsrekenkamer': 9,
  'Leger van den Staat': 7,
  'Hof van de Koning van Engeland': 6,
  'Hof van de Koning van Frankrijk': 6,
  'Plaatselijke overheid van Maastricht': 6,
  'Regimenten - Garderegimenten': 6,
  'Staten van Gelderland': 6,
  'Raad van Vlaanderen': 4,
  'Regimenten - Cavallerie': 4,
  'Regimenten - Dragonders': 4,
  'Staten van Holland en Westfriesland': 4,
  'Westindische Compagnie': 4,
  'Battaillon van het tweede Regiment Orange Nassau': 3,
  'Comptoir-Generaal van de Unie': 3,
  'Hof van Gelderland': 3,
  'Hof van Holland': 3},
 'organisationCategories': {'Admiraliteit': 510,
  'Generaliteit': 510,
  'Oorlog': 510,
  'Regionaal'

In [21]:
query = {
    'terms': {},
    'date': {
        'name': 'sessionDate',
        'from': '1671-01-01',
        'to': '1720-12-31'
    },
    'aggs': {
        'roleName': {
            'order': 'countDesc',
            'size': 25
        },
        'roleLabels': {
            'order': 'countDesc',
            'size': 25
        }
    }
}

republic_api = BROCCOLI_BASE + '/projects/republic'
republic_search = republic_api + '/search'

result = requests.post(republic_search, json=query)
data = result.json()


In [22]:
query = {
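    # Note: 'roleLabels' does not appear in the republic index listings above
    # (they list 'roleCategories'), and the value is a single string rather than a list.
    'terms': {'roleLabels': 'Status & relaties'},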
    'date': {
        'name': 'sessionDate',
        'from': '1671-01-01',
        'to': '1720-12-31'
    },
    'aggs': {
        'roleName': {
            'order': 'countDesc',
            'size': 25
        },
        'roleLabels': {
            'order': 'countDesc',
            'size': 25
        }
    }
}

republic_api = BROCCOLI_BASE + '/projects/republic'
republic_search = republic_api + '/search'

result = requests.post(republic_search, json=query)
data = result.json()
data['aggs']

{'roleName': {'koning': 38832,
  'extraordinaris envoyé': 29079,
  'griffier': 24223,
  'majesteit': 15923,
  'graaf': 13585,
  'kapitein': 12458,
  'generaal': 10936,
  'keizer': 10778,
  'gedeputeerde': 10657,
  'commissaris': 10543,
  'koopman': 10224,
  'keurvorst': 10201,
  'hertog': 10075,
  'lieutenant': 9887,
  'kolonel': 9850,
  'ambassadeur': 8908,
  'commandant': 7443,
  'minister': 7120,
  'envoyé': 7119,
  'extraordinaris ambassadeur': 6904,
  'officier': 6313,
  'majoor': 6223,
  'baron': 5787,
  'burger': 5409,
  'consul': 5234}}
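
The response above contains only a `roleName` aggregation, although the query also asked for `roleLabels`. The index listings earlier show a `roleCategories` field but no `roleLabels`, which may explain the missing aggregation. That is an assumption and not confirmed behaviour of the API. The sketch below shows how the field names in a query could be checked against an index listing before sending it. `unknown_fields` is a hypothetical helper, and the field names are copied from the Republic index output.

```python
def unknown_fields(query, index_fields):
    """Return the field names used in a query that are not listed in the index."""
    requested = set(query.get('terms', {})) | set(query.get('aggs', {}))
    date_field = query.get('date', {}).get('name')
    if date_field:
        requested.add(date_field)
    return sorted(requested - set(index_fields))

# Field names copied from the republic index listing earlier in this notebook
index_fields = [
    'textType', 'resolutionType', 'propositionType', 'delegateName', 'personName',
    'roleName', 'roleCategories', 'locationName', 'locationCategories',
    'organisationName', 'organisationCategories', 'commissionName',
    'commissionCategories', 'sessionWeekday', 'delegateId', 'personId', 'roleId',
    'locationId', 'organisationId', 'commissionId', 'bodyType', 'sessionDate',
    'sessionDay', 'sessionMonth', 'sessionYear',
]

# Same structure as the query sent above
query = {
    'terms': {'roleLabels': 'Status & relaties'},
    'date': {'name': 'sessionDate', 'from': '1671-01-01', 'to': '1720-12-31'},
    'aggs': {
        'roleName': {'order': 'countDesc', 'size': 25},
        'roleLabels': {'order': 'countDesc', 'size': 25},
    },
}

print(unknown_fields(query, index_fields))
```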

## Proposition types over time