# Tutorial Entity API

In this tutorial you will learn how to use the [Entity API](https://pro.europeana.eu/page/entity), which offers information about several types of entities: `agent`, `place`, `concept` and `timespan`. These named entities are part of the Europeana Entity Collection, a collection of entities in the context of Europeana harvested from and linked to controlled vocabularies, such as GeoNames, DBpedia and Wikidata.

The Entity API has three methods:

* `apis.entity.suggest`: returns entities of a certain type matching a text query 

* `apis.entity.retrieve`: returns information about an individual entity of a certain type

* `apis.entity.resolve`: returns the entity that matches an external URI


We will use [PyEuropeana](https://github.com/europeana/rd-europeana-python-api), a Python client library for Europeana APIs. Read more about how the package works in the [Documentation](https://rd-europeana-python-api.readthedocs.io/en/stable/).

Install PyEuropeana with pip:

In [1]:
%%capture
!pip install pyeuropeana
#!pip install https://github.com/europeana/rd-europeana-python-api/archive/master.zip

Europeana APIs require a key for authentication. Find more information on how to get your API key [here](https://pro.europeana.eu/pages/get-api). Once you have obtained your key, you can set it as an environment variable using the `os` library:

In [2]:
import os
os.environ['EUROPEANA_API_KEY'] = 'api2demo'

In [None]:

import pandas as pd
pd.set_option('display.max_colwidth', 15)

import pyeuropeana.apis as apis

## Agents

In this section we focus on entities of type `agent`. We would like to find out whether there are agents that match a given query. Using the `apis` module imported from `pyeuropeana` above, we call the `suggest` method, which returns a dictionary:

In [3]:

resp = apis.entity.suggest(                     
   text = 'leonardo',
   TYPE = 'agent',
)

resp.keys()

dict_keys(['@context', 'type', 'total', 'items'])

The response contains several fields. The field `total` represents the number of entities matching our query



In [4]:
resp['total']

10

The field `items` contains the search results as a list, where each object represents one entity:

In [5]:
len(resp['items'])

10

This list can be converted into a pandas DataFrame as follows:

In [6]:
df = pd.json_normalize(resp['items'])
cols = df.columns.tolist()
cols = cols[-2:]+cols[:-2]
df = df[cols]

The resulting DataFrame has several columns. The `id` column contains the identifier of each entity. The columns starting with `isShownBy` contain information about an illustration for a given entity. We can discard this information if we want:

In [7]:
rm_cols = [col for col in df.columns if 'isShownBy' in col]
df = df.drop(columns=rm_cols)
df.head()

Unnamed: 0,prefLabel.en,altLabel.en,id,type,dateOfBirth,dateOfDeath
0,Leonardo da...,[Leonardo d...,http://data...,Agent,1452-04-15,1519-05-02
1,Leonardo Leo,"[Leo, Leona...",http://data...,Agent,1694-08-05,1744-10-31
2,Leonardo Sc...,"[Sciascia, ...",http://data...,Agent,1921-01-08,1989-11-20
3,Leonardo Pa...,[Padura Fue...,http://data...,Agent,1955,
4,Bruno Leona...,"[Gelber, Br...",http://data...,Agent,1941-03-19,


We have some information about several entities matching our query. What other information can we obtain for these entities?

The method `retrieve` can be used to obtain more information about a particular entity using its identifier. The `id` column in the table above contains the URIs of the different entities, where the identifier is the integer at the end of each entity URI.
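As a minimal sketch of that idea, the entity type and the numeric identifier can both be read from the URI path. The helper name `split_entity_uri` is hypothetical, and the assumption that the type is the first path segment and the identifier the last one is based only on the URIs shown in this tutorial:

```python
from urllib.parse import urlparse

def split_entity_uri(uri):
    """Return (entity_type, identifier) for an entity URI.

    Assumes the type is the first path segment and the numeric
    identifier is the last one, as in the URIs used in this tutorial.
    """
    parts = urlparse(uri).path.strip('/').split('/')
    return parts[0], int(parts[-1])

print(split_entity_uri('http://data.europeana.eu/agent/base/146741'))
```

The two values could then be passed as `TYPE` and `IDENTIFIER` to `apis.entity.retrieve`.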

For example, for the entity *Leonardo da Vinci* with uri http://data.europeana.eu/agent/base/146741 we can call `retrieve` as:


In [8]:
resp = apis.entity.retrieve(
   TYPE = 'agent',
   IDENTIFIER = 146741,
)

resp.keys()

dict_keys(['@context', 'id', 'type', 'isShownBy', 'prefLabel', 'altLabel', 'dateOfBirth', 'end', 'dateOfDeath', 'placeOfBirth', 'placeOfDeath', 'biographicalInformation', 'identifier', 'sameAs'])

We observe that the response contains several fields, some of which are not present in the response of the `suggest` method.

The field `prefLabel` is a dictionary that maps language codes to the name of the entity in each language. We can transform it into a DataFrame:

In [9]:
def get_name_df(resp):
  lang_name_df = None
  if 'prefLabel' in resp.keys():
    lang_name_df = pd.DataFrame([{'language':lang,'name':name} for lang,name in resp['prefLabel'].items()])
  return lang_name_df

lang_name_df = get_name_df(resp)
lang_name_df.head()

Unnamed: 0,language,name
0,ar,ليوناردو دا...
1,az,Leonardo da...
2,be,Леанарда да...
3,bg,Леонардо да...
4,bs,Leonardo da...


The field `biographicalInformation` can be useful to know more about the biography of the agent in particular. This information is also multilingual, and can be transformed into a pandas DataFrame

In [10]:
def get_biography_df(resp):
  bio_df = None
  if 'biographicalInformation' in resp.keys():
    bio_df = pd.DataFrame(resp['biographicalInformation'])
  return bio_df

bio_df = get_biography_df(resp)
bio_df.head()

Unnamed: 0,@language,@value
0,de,Leonardo da...
1,no,Leonardo di...
2,hi,लिओनार्दो द...
3,fi,Leonardo di...
4,be,Леана́рда д...


We can access the biography in English for instance in the following way

In [11]:
bio_df['@value'].loc[bio_df['@language'] == 'en'].values[0]

'Leonardo di ser Piero da Vinci (Italian pronunciation: [leoˈnardo da vˈvintʃi] About this sound pronunciation ; April 15, 1452 – May 2, 1519, Old Style) was an Italian Renaissance polymath: painter, sculptor, architect, musician, mathematician, engineer, inventor, anatomist, geologist, cartographer, botanist, and writer. His genius, perhaps more than that of any other figure, epitomized the Renaissance humanist ideal.Leonardo has often been described as the archetype of the Renaissance Man, a man of "unquenchable curiosity" and "feverishly inventive imagination". He is widely considered to be one of the greatest painters of all time and perhaps the most diversely talented person ever to have lived. According to art historian Helen Gardner, the scope and depth of his interests were without precedent and "his mind and personality seem to us superhuman, the man himself mysterious and remote". Marco Rosci states that while there is much speculation about Leonardo, his vision of the world 
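Not every entity has a biography in English, in which case `.values[0]` above raises an `IndexError`. A minimal sketch of a language fallback over a list of `@language`/`@value` entries follows; the helper `pick_value` and the sample data are hypothetical and only mimic the structure of `biographicalInformation`:

```python
def pick_value(entries, languages=('en',)):
    """Return the first '@value' whose '@language' is in `languages`.

    Languages are tried in the given order of preference; None is
    returned when none of them is available.
    """
    by_lang = {e.get('@language'): e.get('@value') for e in entries}
    for lang in languages:
        if lang in by_lang:
            return by_lang[lang]
    return None

# Hypothetical sample with the same shape as resp['biographicalInformation']
sample = [
    {'@language': 'de', '@value': 'Beispieltext'},
    {'@language': 'fr', '@value': "Texte d'exemple"},
]

print(pick_value(sample, languages=('en', 'fr')))
print(pick_value(sample, languages=('it',)))
```

With the real response, the call would be `pick_value(resp['biographicalInformation'], languages=('en', 'fr'))`.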

Now, let's say that we want to find the biography for all the entities returned by `entity.suggest`. We can encapsulate the previous steps into a function that can be applied to the DataFrame resulting from `entity.suggest`:

In [12]:
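def get_bio_uri(uri):
  # The numeric identifier is the last path segment of the entity URI
  identifier = int(uri.split('/')[-1])
  resp = apis.entity.retrieve(
    TYPE = 'agent',
    IDENTIFIER = identifier,
  )

  bio_df = get_biography_df(resp)
  if bio_df is None:
    return None
  # Keep only the English biography, if there is one
  en_bio = bio_df['@value'].loc[bio_df['@language'] == 'en']
  return en_bio.values[0] if len(en_bio) else None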

df['bio'] = df['id'].apply(get_bio_uri)
df.head()

Unnamed: 0,prefLabel.en,altLabel.en,id,type,dateOfBirth,dateOfDeath,bio
0,Leonardo da...,[Leonardo d...,http://data...,Agent,1452-04-15,1519-05-02,Leonardo di...
1,Leonardo Leo,"[Leo, Leona...",http://data...,Agent,1694-08-05,1744-10-31,Leonardo Le...
2,Leonardo Sc...,"[Sciascia, ...",http://data...,Agent,1921-01-08,1989-11-20,Leonardo Sc...
3,Leonardo Pa...,[Padura Fue...,http://data...,Agent,1955,,Leonardo Pa...
4,Bruno Leona...,"[Gelber, Br...",http://data...,Agent,1941-03-19,,Bruno Leona...


The biography in English has been added for each entity. Great!

The places of birth and death of the agents can also be of interest. We can extract them from the response with a function such as:

In [13]:
def get_place_resp(resp, event):
  # Map the event name to the corresponding response field
  field = {'birth': 'placeOfBirth', 'death': 'placeOfDeath'}.get(event)
  if field is None:
    raise ValueError("event must be 'birth' or 'death'")

  place = resp.get(field)
  if not place:
    return None

  # Take the first value of the first entry (a label or a URI)
  place = list(place[0].values())[0]

  # For URIs, use the last path segment as a readable name
  if place.startswith('http'):
    place = place.split('/')[-1].replace('_', ' ')
  return place



resp = apis.entity.retrieve(
   TYPE = 'agent',
   IDENTIFIER = 146741,
)
get_place_resp(resp, 'birth')


'Republic of Florence'

<font color='red'>WARNING</font>

The function above parses the URI to extract the name of the place of birth or death. In practice we should use either the `resolve` method of the Entity API, if the URI belongs to an entity in Europeana's Entity Collection, or try to dereference it using (Linked Data) [content negotiation](https://www.w3.org/DesignIssues/Conneg), if it is not known in the Entity Collection.
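To illustrate what content negotiation looks like on the client side, here is a minimal sketch that only builds the HTTP request, without sending it. The helper `build_conneg_request` is hypothetical, and whether a given server can return the requested media type (JSON-LD here) is an assumption that must be checked for each vocabulary:

```python
from urllib.request import Request

def build_conneg_request(uri, media_type='application/ld+json'):
    # The Accept header tells the server which representation we would like
    return Request(uri, headers={'Accept': media_type})

req = build_conneg_request('http://dbpedia.org/resource/Leonardo_da_Vinci')
print(req.full_url)
print(req.get_header('Accept'))
```

The request could then be sent with `urllib.request.urlopen(req)` and the response body parsed according to the media type the server actually returns.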

Now we can add this information to the original DataFrame:

In [14]:
def get_place(uri,event):
  id = int(uri.split('/')[-1])
  resp = apis.entity.retrieve(
    TYPE = 'agent',
    IDENTIFIER = id,
  )
  return get_place_resp(resp,event)


df['placeOfBirth'] = df['id'].apply(lambda x: get_place(x,'birth'))
df['placeOfDeath'] = df['id'].apply(lambda x: get_place(x,'death'))
df.head()



Unnamed: 0,prefLabel.en,altLabel.en,id,type,dateOfBirth,dateOfDeath,bio,placeOfBirth,placeOfDeath
0,Leonardo da...,[Leonardo d...,http://data...,Agent,1452-04-15,1519-05-02,Leonardo di...,Republic of...,Kingdom of ...
1,Leonardo Leo,"[Leo, Leona...",http://data...,Agent,1694-08-05,1744-10-31,Leonardo Le...,,Naples
2,Leonardo Sc...,"[Sciascia, ...",http://data...,Agent,1921-01-08,1989-11-20,Leonardo Sc...,Racalmuto,Sicily
3,Leonardo Pa...,[Padura Fue...,http://data...,Agent,1955,,Leonardo Pa...,Cuba,
4,Bruno Leona...,"[Gelber, Br...",http://data...,Agent,1941-03-19,,Bruno Leona...,Buenos Aires,
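Note that the pipeline so far calls `retrieve` three times for each row: once for the biography and once for each place. One way to avoid the repeated requests is to cache the responses by identifier. The sketch below uses `functools.lru_cache` with a stand-in function, `fetch_agent`, which is hypothetical and takes the place of `apis.entity.retrieve(TYPE='agent', IDENTIFIER=identifier)` so that the example runs offline:

```python
from functools import lru_cache

calls = []

def fetch_agent(identifier):
    # Stand-in for apis.entity.retrieve(TYPE='agent', IDENTIFIER=identifier)
    calls.append(identifier)
    return {'id': identifier}

@lru_cache(maxsize=None)
def cached_fetch_agent(identifier):
    # The first call per identifier hits fetch_agent; later calls reuse the result
    return fetch_agent(identifier)

# Three lookups of the same entity trigger a single underlying call
for identifier in [146741, 146741, 146741]:
    cached_fetch_agent(identifier)

print(len(calls))
```

The cached dictionary is shared between callers, so it should be treated as read-only.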


The previous pipeline can be applied to any other agent:

In [15]:
resp = apis.entity.suggest(
   text = 'Marguerite Gérard',
   TYPE = 'agent',
)

df = pd.json_normalize(resp['items'])
df = df.drop(columns=[col for col in df.columns if 'isShownBy' in col])
df['bio'] = df['id'].apply(get_bio_uri)
df['placeOfBirth'] = df['id'].apply(lambda x: get_place(x,'birth'))
df['placeOfDeath'] = df['id'].apply(lambda x: get_place(x,'death'))
df.head()



Unnamed: 0,id,type,dateOfBirth,dateOfDeath,prefLabel.en,altLabel.en,bio,placeOfBirth,placeOfDeath
0,http://data...,Agent,1761-01-28,1837-05-18,Marguerite ...,[Marguerite...,Marguerite ...,France,Paris


Finally, we can use the method `resolve` to obtain the entity matching an external URI, provided it is present as an entity in the Europeana Entity Collection. Find more information [in the documentation of the Entity API](https://pro.europeana.eu/page/entity#resolve).

In [16]:
resp = apis.entity.resolve('http://dbpedia.org/resource/Leonardo_da_Vinci')
resp.keys()

dict_keys(['@context', 'id', 'type', 'isShownBy', 'prefLabel', 'altLabel', 'dateOfBirth', 'end', 'dateOfDeath', 'placeOfBirth', 'placeOfDeath', 'biographicalInformation', 'identifier', 'sameAs'])

## Places

Another type of entity we can work with is `place`. Let's get the place of death of the previous agent:





In [17]:
place_of_death = df['placeOfDeath'].values[0]
place_of_death

'Paris'

We can now search for the entity corresponding to this place by calling the `suggest` method with `place` as the `TYPE` argument.

In [18]:
resp = apis.entity.suggest(
   text = place_of_death,
   TYPE = 'place',

)
place_df = pd.json_normalize(resp['items'])
cols = place_df.columns.tolist()
cols = cols[-1:]+cols[:-1]
place_df = place_df[cols]
place_df.head()

Unnamed: 0,prefLabel.en,id,type,isPartOf
0,Paris,http://data...,Place,[{'id': 'ht...
1,La Defense,http://data...,Place,[{'id': 'ht...
2,Jõelähtme P...,http://data...,Place,[{'id': 'ht...
3,Vihula Parish,http://data...,Place,[{'id': 'ht...
4,Põlva Parish,http://data...,Place,[{'id': 'ht...


Let's use the first uri with the `retrieve` method

In [19]:
uri = place_df['id'].values[0]
IDENTIFIER = uri.split('/')[-1]

resp = apis.entity.retrieve(
   IDENTIFIER = IDENTIFIER,
   TYPE = 'place',
)
resp.keys()

dict_keys(['@context', 'id', 'type', 'prefLabel', 'altLabel', 'lat', 'long', 'isPartOf', 'sameAs'])

We can reuse the function `get_name_df` for places as well, since the response has a data structure similar to that of an `agent`:

In [20]:
name_df = get_name_df(resp)
name_df.head()

Unnamed: 0,language,name
0,,Paris
1,de,Paris
2,en,Paris
3,es,Paris
4,fr,Paris


The response includes the field `isPartOf`, which lists the entities that the current entity belongs to, if any:

In [21]:
is_part_uri = resp['isPartOf'][0]
is_part_uri

'http://data.europeana.eu/place/base/42377'

Let's see what this mysterious URI refers to using the `retrieve` method:

In [22]:
is_part_id = is_part_uri.split('/')[-1]
resp = apis.entity.retrieve(
   IDENTIFIER = is_part_id,
   TYPE = 'place',
)

name_df = get_name_df(resp)
name_df.head()

Unnamed: 0,language,name
0,,Île-de-France
1,de,Île-de-France
2,en,Île-de-France
3,es,Isla de Fra...
4,fr,Région pari...


It had to be the emblematic *Île-de-France*, of course! And its coordinates are:

In [23]:
f"lat: {resp['lat']}, long: {resp['long']}"

'lat: 48.7, long: 2.5'
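The same step can be repeated to walk further up the hierarchy. Below is a minimal sketch, assuming each response has an optional `isPartOf` list as seen above. The function `part_of_chain` is hypothetical, and the toy dictionary with the made-up identifiers `A`, `B` and `C` replaces the real API so that the example runs offline:

```python
def part_of_chain(entity_id, lookup, max_depth=10):
    """Follow the first 'isPartOf' link repeatedly and return the parents found.

    `lookup` is any callable that returns the response dictionary for an
    identifier; `max_depth` guards against cycles in the data.
    """
    chain = []
    current = entity_id
    for _ in range(max_depth):
        parents = lookup(current).get('isPartOf') or []
        if not parents:
            break
        current = parents[0]
        chain.append(current)
    return chain

# Toy data with hypothetical identifiers, not taken from the Entity Collection
toy = {
    'A': {'isPartOf': ['B']},
    'B': {'isPartOf': ['C']},
    'C': {},
}

print(part_of_chain('A', toy.__getitem__))
```

With the real API, `lookup` would extract the identifier from each URI and call `apis.entity.retrieve` with `TYPE = 'place'`.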

## Concepts

Let's query for concepts matching the text "war":

In [24]:
resp = apis.entity.suggest(
   text = 'war',
   TYPE = 'concept',
)

resp['total']

3

We build a table from the field `items`, where we can see the name and URI of the different concepts:

In [25]:
df = pd.json_normalize(resp['items'])
df = df.drop(columns=[col for col in df.columns if 'isShownBy' in col])
df.head()

Unnamed: 0,id,type,prefLabel.en
0,http://data...,Concept,World War I
1,http://data...,Concept,War photogr...
2,http://data...,Concept,Raku ware


Want to know more about the first concept in the list? Let's take its URI and retrieve it:

In [26]:
concept_uri = df['id'].values[0]
concept_uri

'http://data.europeana.eu/concept/base/83'

In [27]:
concept_id = concept_uri.split('/')[-1]
resp = apis.entity.retrieve(
   IDENTIFIER = concept_id,
   TYPE = 'concept',
)

name_df = get_name_df(resp)
name_df.loc[name_df['language'] == 'en']

Unnamed: 0,language,name
11,en,World War I


The concept is World War I. We can get some related concepts from DBpedia:

In [28]:
resp['related'][:5]

['http://dbpedia.org/resource/Category:Wars_involving_Nicaragua',
 'http://dbpedia.org/resource/Category:Wars_involving_the_United_Kingdom',
 'http://dbpedia.org/resource/Category:Wars_involving_Greece',
 'http://dbpedia.org/resource/Category:Wars_involving_Sri_Lanka',
 'http://dbpedia.org/resource/Category:Wars_involving_Czechoslovakia']

The field `note` contains a multilingual description of the concept

In [29]:
note_df = pd.json_normalize([{'lang':k,'note':v[0]} for k,v in resp['note'].items()])
note_df.head()

Unnamed: 0,lang,note
0,ar,الحرب العال...
1,az,Birinci dün...
2,be,Першая сусв...
3,bg,Първата све...
4,bs,Prvi svjets...


We can obtain the description for a particular language as

In [30]:
note_df['note'].loc[note_df['lang'] == 'en'].values[0]

"World War I (WWI or WW1), also known as the First World War, was a global war centred in Europe that began on 28 July 1914 and lasted until 11 November 1918. From the time of its occurrence until the approach of World War II, it was called simply the World War or the Great War, and thereafter the First World War or World War I. In America, it was initially called the European War. More than 9 million combatants were killed; a casualty rate exacerbated by the belligerents' technological and industrial sophistication, and tactical stalemate. It was one of the deadliest conflicts in history, paving the way for major political changes, including revolutions in many of the nations involved.The war drew in all the world's economic great powers, which were assembled in two opposing alliances: the Allies (based on the Triple Entente of the United Kingdom, France and the Russian Empire) and the Central Powers of Germany and Austria-Hungary. Although Italy had also been a member of the Triple A

## Tips for using entities with the Search API

Once we know the identifier for a certain entity we can use the Search API to obtain objects containing it. 

For instance we can query objects containing the entity "Painting" using its uri http://data.europeana.eu/concept/base/47

In [31]:
concept_uri = 'http://data.europeana.eu/concept/base/47'
resp = apis.search(
    query = f'"{concept_uri}"'
)

resp['totalResults']

120708

Notice that in order to use a URI as a query we need to wrap it in double quotation marks.

We might want to query for objects related to more than one entity. We can do that by using logical operators in the query. For example, querying for paintings from the 16th century:

In [32]:
resp = apis.search(
    query = '"http://data.europeana.eu/timespan/16" AND "http://data.europeana.eu/concept/base/47"',
    media = True,
    qf = 'TYPE:IMAGE'
)

resp['totalResults']

300
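Writing these query strings by hand is error-prone because each URI needs its own pair of quotation marks. A small helper can build them; this is only a sketch, and the name `entity_query` is hypothetical rather than part of PyEuropeana:

```python
def entity_query(*uris, operator='AND'):
    """Quote each entity URI and join them with a logical operator."""
    quoted = [f'"{uri}"' for uri in uris]
    return f' {operator} '.join(quoted)

query = entity_query(
    'http://data.europeana.eu/timespan/16',
    'http://data.europeana.eu/concept/base/47',
)
print(query)
```

The resulting string is the same as the `query` argument used in the cell above and can be passed to `apis.search` in the same way.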

Querying for paintings with some relation to Paris:

In [33]:
resp = apis.search(
    query = '"http://data.europeana.eu/place/base/41488" AND "http://data.europeana.eu/concept/base/47"',
    media = True,
    qf = 'TYPE:IMAGE'
)

resp['totalResults']

14


When querying for entity URIs, the objects returned are those that have the requested URIs in their metadata.


However, not all objects contain this information and instead many of them contain the name of the entity. It is always a good idea to query for the name of the entities as well, as there might be more objects:


In [34]:
resp = apis.search(
    query = 'Paris AND Painting',
    media = True,
    qf = 'TYPE:IMAGE'
)

resp['totalResults']

3612

## Conclusions

In this tutorial we learned:

* What types of entities are available in the Europeana Entity API

* To use the `suggest` method for obtaining entities of a certain type matching a text query

* To use the `retrieve` method for obtaining information about an individual entity of a certain type

* To use the `resolve` method for obtaining the entity that matches an external URI

* To process some of the fields contained in the responses of the methods above and convert the responses to Pandas dataframes

* To query for objects related to entities using the Europeana Search API



