# Overview of available data
This notebook documents the data that we get from the Rijksmuseum. Enjoy!

In [15]:
from os import listdir
from os.path import isfile, join
import json

In [17]:
DATA_DIRECTORY = '../collection/sample/'
files = [f for f in listdir(DATA_DIRECTORY) if isfile(join(DATA_DIRECTORY, f))]

print(f"Scanning {DATA_DIRECTORY}...")
print(f"Found {len(files)} files.")

json_data = []

for file in files:
    with open(join(DATA_DIRECTORY, file), encoding='utf-8') as f:  # read as UTF-8 regardless of platform default
        entry = json.load(f)
        json_data.append(entry)

print(f"Read {len(json_data)} data entries.")

Scanning ../collection/sample/...
Found 1 files.
Read 1 data entries.


## Detailed API Endpoint

Endpoint: `GET /api/[culture]/collection/[object-number]`

Example: `https://www.rijksmuseum.nl/api/nl/collection/SK-C-5?key=[api-key]`
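The endpoint pattern above can be turned into a small URL builder. This is a minimal sketch: `build_object_url` and `BASE_URL` are names made up for illustration, and the only parts taken from this notebook are the path layout and the `key` query parameter. It builds the URL but does not send a request.

```python
from urllib.parse import quote, urlencode

# Base of the API as it appears in the example URL above.
BASE_URL = 'https://www.rijksmuseum.nl/api'


def build_object_url(culture, object_number, api_key):
    """Build the detail URL for one art object (hypothetical helper)."""
    # Percent-encode the path segments in case they contain unsafe characters.
    path = f"{BASE_URL}/{quote(culture)}/collection/{quote(object_number)}"
    # The API key is passed as the `key` query parameter.
    return f"{path}?{urlencode({'key': api_key})}"


print(build_object_url('nl', 'SK-C-5', 'YOUR_API_KEY'))
```

Swapping `'nl'` for `'en'` gives the English version of the same object page.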

Returns an object with **3 top-level keys**:
* **elapsedMilliseconds**: How much time it took to execute the API request
* **artObjectPage**: Metadata about the page which displays the art object
    * **id**: ID of the art object page. Starts with the language of the retrieved page (e.g. `en-*` or `nl-*`)
    * **similarPages**: Array of similar pages (*how are these pages selected?; on what basis?*)
    * **lang**: The language of the page depends on the API call (*e.g. [https://www.rijksmuseum.nl/api/en/collection/SK-C-5](https://www.rijksmuseum.nl/api/en/collection/SK-C-5) vs [https://www.rijksmuseum.nl/api/nl/collection/SK-C-5?](https://www.rijksmuseum.nl/api/nl/collection/SK-C-5?)*)
    * **tags**: A list of tags (*how are these tags created?*)
    * **plaqueDescription**: A plain-text description of the art object
    * **audioFile1**: An audio file associated with the art object (*what does the audiofile contain?*)
    * **audioFileLabel1**: \<unknown\>
    * **audioFileLabel2**: \<unknown\>
    * **createdOn**: The date when the page was created (*e.g.* `"2012-08-09T14:47:53.679885+00:00"`)
    * **updatedOn**: The date when the page was last updated (*e.g.* `"2012-08-09T14:47:53.679885+00:00"`)
    * **adlibOverrides**: \<unknown\>
        * **titel**: \<unknown\>
        * **maker**: \<unknown\>
        * **etiketText**: \<unknown\>
* **artObject**: The actual art object we query
    * **links**: (??) How to access this object
        * **search**: An API URL to search for this object (*e.g. [http://www.rijksmuseum.nl/api/nl/collection](http://www.rijksmuseum.nl/api/nl/collection)*)
    * **id**: Same as `entry['artObjectPage']['id']`
    * **priref**: (??)
    * **objectNumber**: Art object number. Same as `id` but without the language prefix
    * **language**: Same as `entry['artObjectPage']['lang']`
    * **title**: Title of the art object
    * **copyrightHolder**: (??)
    * **webImage**:
        * **guid**: (??) Some kind of ID
        * **offsetPercentageX**: (??) Where the displayed part of the painting starts in X dimension (i.e. everything before that is hidden behind the frame)
        * **offsetPercentageY**: (??) Where the displayed part of the painting starts in Y dimension (i.e. everything before that is hidden behind the frame)
        * **width**: Width in pixels
        * **height**: Height in pixels
        * **url**: URL to the HD art object
    * **colors**: An array of objects containing the most dominant colors in descending order (*how many colors? Is this array always of the same length?*):
        * **percentage**: How much that color is present
        * **hex**: Hex code of the color
    * **colorsWithNormalization**: An array of objects containing the same colors as in `colors` in the same order but with a normalization function (*what is the normalization function? what is the purpose of this?*):
        * **originalHex**: Hex code of the color (same as `colors[i]['hex']`)
        * **normalizedHex**: Hex code of the color
    * **normalizedColors**: An array of objects containing the same colors as in `colors` in the same order but with a normalization function (*what is the normalization function? what is the purpose of this?*):
        * **percentage**: How much that color is present (same as `colors[i]['percentage']`)
        * **hex**: Hex code of the color
    * **normalized32Colors**: An array of objects containing some of the colors in `colors` in the same order but with a normalization function (*what is the normalization function? what is the purpose of this?*):
        * **percentage**: How much that color is present (same as `colors[i]['percentage']`)
        * **hex**: Hex code of the color
    * **titles**: An array of different plain-text titles for the art object
    * **description**: Plain-text description of the art object
    * **labelText**: \<unknown\>
    * **objectTypes**: An array of categories to which this art object belongs (*what are all possible categories?*)
    * **objectCollection**: An array of collections to which this art object belongs (*what are all possible collections?*)
    * **makers**: An array of the art object's makers
    * **principalMakers**: An array of detailed information about the principal makers (*what is the difference between this field and `makers`?*):
        * **name**: Plain-text name of maker (*e.g. Rembrandt van Rijn*)
        * **unFixedName**: Plain-text name of maker in a different format (*e.g. Rijn, Rembrandt van*)
        * **placeOfBirth**: Where the maker was born (*is this always a city?*)
        * **dateOfBirth**: Date of birth of the maker (*e.g.* `1606-07-15`)
        * **dateOfBirthPrecision**: \<unknown\>
        * **dateOfDeath**: Date of death of the maker (*e.g.* `1669-10-08`)
        * **dateOfDeathPrecision**: \<unknown\>
        * **placeOfDeath**: Where the maker died (*is this always a city?*)
        * **occupation**: An array of plain-text professions
        * **roles**: (??) Maybe in which role the maker has made the art object
        * **nationality**: (??) Maker's nationality
        * **biography**: (??) Maybe a short plain-text biography?
        * **productionPlaces**: (??) An array of places (*are these always cities?*) where the maker was active/where the maker made the art object
    * **plaqueDescriptionDutch**: Plain-text description in Dutch
    * **plaqueDescriptionEnglish**: Plain-text description in English
    * **principalMaker**: The name of the principalMaker (*maybe the first one in `principalMakers` if we assume that list is sorted based on importance to the art object's creation*)
    * **artistRole**: \<unknown\>
    * **associations**: \<unknown\>
    * **acquisition**: How was the art object acquired by the Rijksmuseum
        * **method**: For example `loan`
        * **date**: Date of acquisition (*e.g.* `1808-01-01T00:00:00`)
        * **creditLine**: Plain-text description of the acquisition (*e.g.* `On loan from the City of Amsterdam`)
    * **exhibitions**: \<unknown\>
    * **materials**: An array of elements the art object is made of
    * **techniques**: \<unknown\>
    * **productionPlaces**: An array of places (*are these always cities?*)
    * **dating**: An object containing data about the dating of the art object
        * **presentingDate**: (??) Maybe when the art object was first presented
        * **sortingDate**: (??) Maybe what date is used for chronologically sorting the art objects
        * **period**: (??) Maybe century
        * **yearEarly**: (??) Maybe when the art object was started
        * **yearLate**: (??) Maybe when the art object was completed
    * **classification**:
        * **iconClassIdentifier**: An array of [iconclass](http://www.iconclass.org/help/outline) identifiers of the objects recognized in the painting
        * **iconClassDescription**: An array of [iconclass](http://www.iconclass.org/help/outline) descriptions for the [iconclass](http://www.iconclass.org/help/outline) identifiers contained in `iconClassIdentifier`
        * **motifs**: \<unknown\>
        * **events**: \<unknown\>
        * **periods**: \<unknown\>
        * **places**: (??) An array of places depicted in the art object
        * **people**: An array of the names of the people depicted in the art object (*in the form:* `"Banninck Cocq, Frans"`)
        * **objectNumbers**: An array of the identifiers of the art object (*maybe an array if the art object is part of a multi-object art object*)
    * **hasImage**: A boolean showing whether the art object has an image in the dataset
    * **historicalPersons**: An array of the names of historical people depicted in the art object (*is this the same as the array in `people` in `classification`?*)
    * **inscriptions**: (??) Maybe an array of texts inscribed on the art object? (*can this be used as additional information?; How many art objects have inscriptions?*)
    * **documentation**: An array of references to documentation about this art object
    * **catRefRPK**: \<unknown\>
    * **principalOrFirstMaker**: (??) Maybe who is acknowledged as the primary maker of this art object
    * **dimensions**: An array of measurement objects, each with the following fields:
        * **unit**: The unit of measurement (*e.g. `cm`, `kg`, etc.*)
        * **type**: The type of measurement (*e.g. `width`, `weight`, etc.*)
        * **part**: \<unknown\>
        * **value**: The value of the measurement
    * **physicalProperties**: \<unknown\>
    * **physicalMedium**: A string containing (?maybe?) the primary physical material used
    * **longTitle**: A long title of the art object (*does this contain any additional information?*)
    * **subTitle**: A subtitle of the art object (*is this always the dimensions of the art object?*)
    * **scLabelLine**: Additional plain-text label (*does this contain any additional information?*)
    * **label**: An object with the following fields:
        * **title**: The title of the art object (*is this the same as the top-level `title` field?*)
        * **makerLine**: (??) Maybe what label the maker gave to the art object
        * **description**: A plain-text description of the art object (*is this the same as the top-level `description` field?*)
        * **notes**: (??) Maybe the event at which this label was created?
        * **date**: (??) Maybe the date when this label was created?
    * **showImage**: (??) Maybe whether the image is publicly available (*are there non-public images?*)
    * **location**: The hall/wall spot where the art object is physically displayed at the Rijksmuseum (*is this accurate?; does this change?; is this the hall identifier or the specific wall spot identifier?*)
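
Several fields in the list above are described as "same as" another field. Those claims can be checked against the data rather than taken on trust. The following is a rough sketch, not a definitive validator: `check_documented_claims` is a hypothetical helper, the rule for stripping the language prefix is an assumption, and `sample_entry` is hand-made for illustration rather than real API output.

```python
def check_documented_claims(entry):
    """Map each 'same as' claim from the field list to True, False, or None.

    None means at least one of the two fields is missing from the entry.
    """
    page = entry.get('artObjectPage') or {}
    art = entry.get('artObject') or {}
    results = {}

    def compare(label, left, right):
        results[label] = None if left is None or right is None else left == right

    def pluck(items, key):
        # Collect one field from every object in an array (None if the array is absent).
        return None if items is None else [item.get(key) for item in items]

    compare('artObject.id == artObjectPage.id', art.get('id'), page.get('id'))
    compare('artObject.language == artObjectPage.lang', art.get('language'), page.get('lang'))

    # Assumption: the language prefix is everything up to the first '-',
    # e.g. 'nl-SK-C-5' -> 'SK-C-5'.
    art_id = art.get('id')
    without_prefix = art_id.split('-', 1)[1] if art_id and '-' in art_id else None
    compare('artObject.objectNumber == id without prefix', art.get('objectNumber'), without_prefix)

    colors = art.get('colors')
    compare('colorsWithNormalization[i].originalHex == colors[i].hex',
            pluck(art.get('colorsWithNormalization'), 'originalHex'), pluck(colors, 'hex'))
    compare('normalizedColors[i].percentage == colors[i].percentage',
            pluck(art.get('normalizedColors'), 'percentage'), pluck(colors, 'percentage'))
    return results


# Hand-made entry for illustration only; the values are not real API output.
sample_entry = {
    'artObjectPage': {'id': 'nl-SK-C-5', 'lang': 'nl'},
    'artObject': {
        'id': 'nl-SK-C-5',
        'objectNumber': 'SK-C-5',
        'language': 'nl',
        'colors': [{'percentage': 60, 'hex': '#000000'}, {'percentage': 40, 'hex': '#FFFFFF'}],
        'colorsWithNormalization': [
            {'originalHex': '#000000', 'normalizedHex': '#000000'},
            {'originalHex': '#FFFFFF', 'normalizedHex': '#F5F5F5'},
        ],
        'normalizedColors': [{'percentage': 60, 'hex': '#000000'}, {'percentage': 40, 'hex': '#F5F5F5'}],
    },
}

for claim, holds in check_documented_claims(sample_entry).items():
    print(f"{claim}: {holds}")
```

In this notebook the same function could be run over every entry in `json_data`; any claim that comes back `False` marks a description in the list that needs revisiting.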