# Example of sourcing data by combining the Search and Record APIs

This notebook uses the Python wrapper developed by the R&D team (`pyeuropeana`) to build a series of calls to the Search and Record APIs.
The two APIs are combined because the Search API alone does not always return everything we need: some fields are not shown in its results, while the Record API response for an object contains all of its fields.
For example, not all multilingual fields are served by the Search API.
To retrieve data for those fields, one option is to use the Search API to find the matching items and then the Record API to retrieve all the fields of each specific item.

You can find more details about the Search and Record APIs at the following links:

*   https://pro.europeana.eu/page/search
*   https://pro.europeana.eu/page/record

Those APIs serve data using the Europeana Data Model: https://pro.europeana.eu/page/intro#edm


In [1]:
#Importing libraries
import pandas as pd
import pyeuropeana.apis as apis
import pyeuropeana.utils as utils
import os
pd.set_option("display.max_rows", 600)
pd.set_option("display.max_columns", 600)
pd.options.mode.chained_assignment = None

# Combining the Search and Record APIs to retrieve field values

In this example we look for the values of the field proxy_dc_type.it.
proxy_dc_type is indexed in Solr, but its values are not returned by the Search API.
Therefore, to retrieve the value of this field for a particular record, we combine the Search and Record APIs.
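
The two-step pattern described above can be sketched as follows. This is a minimal illustration rather than the notebook's implementation: `fetch_full_records`, `fake_search` and `fake_record` are hypothetical names, and the two stand-ins only mimic the response shapes that appear in this notebook's outputs (a Search response with an `items` list whose entries carry an `id`, and a Record response with an `object` key), so the example runs without network access.

```python
def fetch_full_records(query, search_fn, record_fn, rows=10):
    """Find item ids with a search call, then fetch each full record."""
    response = search_fn(query=query, rows=rows)
    ids = [item["id"] for item in response.get("items", [])]
    return {item_id: record_fn(item_id) for item_id in ids}


# Hypothetical stand-ins for apis.search and apis.record (no network needed).
def fake_search(query, rows):
    items = [{"id": "/1/a", "title": ["A"]}, {"id": "/1/b", "title": ["B"]}]
    return {"totalResults": len(items), "items": items[:rows]}


def fake_record(item_id):
    # The full record carries fields that the search result does not show.
    return {"object": {"about": item_id,
                       "proxies": [{"dcType": {"it": ["fotografia"]}}]}}


records = fetch_full_records("proxy_dc_type.it:*", fake_search, fake_record)
print(sorted(records))
```

With the real wrapper, `apis.search` and `apis.record` would be passed in place of the stand-ins, assuming they accept the arguments used elsewhere in this notebook.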

In [2]:
#setting the environment variable
os.environ['EUROPEANA_API_KEY'] = 'api2demo'
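
Assigning the key directly overwrites any key already configured in the environment. A small variation, assuming you may have your own key exported in the shell, uses the demo key only as a fallback:

```python
import os

# Keep an existing key if one is set; fall back to the demo key otherwise.
os.environ.setdefault('EUROPEANA_API_KEY', 'api2demo')
api_key = os.environ['EUROPEANA_API_KEY']
```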

# Functions

In [3]:

def language_queries(lang1,bilingual,lang2='en'):
    """ This function builds a set of queries to extract information  
    on monolingual or bilingual fields, where the second language is English 
    by default 
    lang1: string first language, use ISO code (ex: fr for French)
    bilingual: string possible values are "AND" or "NOT" to select bilingual or 
    monolingual queries respectively
    lang2: string second language, use ISO code (default: en)
    """
    queries={
        'dc_description': f'(proxy_dc_description.{lang1}:* {bilingual} proxy_dc_description.{lang2}:*)',
        'dc_title': f'(proxy_dc_title.{lang1}:* {bilingual} proxy_dc_title.{lang2}:*)',
        'dc_subject': f'(proxy_dc_subject.{lang1}:* {bilingual} proxy_dc_subject.{lang2}:*)',
        'dc_coverage': f'(proxy_dc_coverage.{lang1}:* {bilingual} proxy_dc_coverage.{lang2}:*)',
        'edm_current_location':f'(proxy_edm_currentLocation.{lang1}:* {bilingual} proxy_edm_currentLocation.{lang2}:*)',
        'dcterms_medium': f'(proxy_dcterms_medium.{lang1}:* {bilingual} proxy_dcterms_medium.{lang2}:*)',
        'dcterms_hasPart':f'(proxy_dcterms_hasPart.{lang1}:* {bilingual} proxy_dcterms_hasPart.{lang2}:*)',
        'dcterms_spatial':f'(proxy_dcterms_spatial.{lang1}:* {bilingual} proxy_dcterms_spatial.{lang2}:*)',
        'dc_format':f'(proxy_dc_format.{lang1}:* {bilingual} proxy_dc_format.{lang2}:*)',
        'dc_source':f'(proxy_dc_source.{lang1}:* {bilingual} proxy_dc_source.{lang2}:*)',
        'dc_rights':f'(proxy_dc_rights.{lang1}:* {bilingual} proxy_dc_rights.{lang2}:*)',
        'dc_terms_alternative':f'(proxy_dcterms_alternative.{lang1}:* {bilingual} proxy_dcterms_alternative.{lang2}:*)',
        'dc_type': f'(proxy_dc_type.{lang1}:* {bilingual} proxy_dc_type.{lang2}:*)',
        'dcterms_isPartOf': f'(proxy_dcterms_isPartOf.{lang1}:* {bilingual} proxy_dcterms_isPartOf.{lang2}:*)',
        'dcterms_provenance': f'(proxy_dcterms_provenance.{lang1}:* {bilingual} proxy_dcterms_provenance.{lang2}:*)',
        'dcterms_temporal': f'(proxy_dcterms_temporal.{lang1}:* {bilingual} proxy_dcterms_temporal.{lang2}:*)',
        'edm_isRelatedTo': f'(proxy_edm_isRelatedTo.{lang1}:* {bilingual} proxy_edm_isRelatedTo.{lang2}:*)',
        'edm_dataProvider': f'(provider_aggregation_edm_dataProvider.{lang1}:* {bilingual} provider_aggregation_edm_dataProvider.{lang2}:*)',
        'edm_intermediateProvider': f'(provider_aggregation_edm_intermediateProvider.{lang1}:* {bilingual} provider_aggregation_edm_intermediateProvider.{lang2}:*)',
        'edm_provider': f'(provider_aggregation_edm_provider.{lang1}:* {bilingual} provider_aggregation_edm_provider.{lang2}:*)',
        'dcterms_isReferencedBy': f'(wr_dcterms_isReferencedBy.{lang1}:* {bilingual} wr_dcterms_isReferencedBy.{lang2}:*)'
            }
    return queries
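
To make the effect of the `bilingual` parameter concrete, the sketch below builds the query string for a single field. `field_query` is a hypothetical helper that mirrors the f-string pattern used in `language_queries`; it is not part of the wrapper.

```python
def field_query(field, lang1, operator, lang2='en'):
    """Build one query following the pattern used in language_queries."""
    return f'({field}.{lang1}:* {operator} {field}.{lang2}:*)'


# "AND": the field has a value in both languages (bilingual).
bilingual_query = field_query('proxy_dc_title', 'fr', 'AND')
# "NOT": the field has a value in lang1 but none in lang2 (monolingual).
monolingual_query = field_query('proxy_dc_title', 'fr', 'NOT')

print(bilingual_query)    # (proxy_dc_title.fr:* AND proxy_dc_title.en:*)
print(monolingual_query)  # (proxy_dc_title.fr:* NOT proxy_dc_title.en:*)
```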

In [4]:
def multiple_language_queries(lang ,bilingual, lang2='en'):
    """This function builds a dictionary where each key is a 
    language and the values are all the queries from function language_queries 
    lang: list of languages, ISO format
    bilingual: string possible values are "AND" or "NOT" to select bilingual or 
    monolingual queries respectively
    lang2: string second language, ISO format (default: en)
    """
    queries={}
    for l in lang:
        # pass lang2 through instead of hard-coding 'en'
        queries_single_l=language_queries(l,bilingual,lang2=lang2)
        queries[l]=queries_single_l
    return queries

In [5]:
def tot_results_queries(lang ,n_rows=1, save=False, biling=True):
    """This function returns two dataframes: the number of hits per field 
    and the share of each field in the total hits of a language.
    The columns are the metadata fields considered plus Tot_results, 
    the values are the number of hits for that specific field,
    the index of the dataframe are the languages in the parameter lang
    lang: list of languages, ISO format
    n_rows: parameter for the number of returned items
    save: boolean, if True the resulting dataframe is saved as csv file
    biling: boolean, if True the bilingual version of the queries is used, 
    if False the monolingual version
    """
    if biling:
        queries_dict=multiple_language_queries(lang ,bilingual='AND', lang2='en')
    else:
        queries_dict=multiple_language_queries(lang ,bilingual='NOT', lang2='en')
    df=pd.DataFrame(index=lang)
    for l in lang:
        for key, value in queries_dict[l].items():  
            CHO_data = apis.search(query = '*:*',qf=f'{value}' ,rows = n_rows)
            tot_results=CHO_data['totalResults']
            df.loc[l,key]=tot_results 
    df.loc[:,'Tot_results'] = df.iloc[:,:].sum(axis=1)
    if 'en' in df.index:
        df.drop('en', axis=0, inplace=True)
    df_percentage=pd.DataFrame(columns= df.columns, index=df.index)
    for col in df.columns:
        df_percentage[col]=df[col]/df.Tot_results
    df_percentage.drop('Tot_results',axis=1, inplace=True)
    df_percentage.loc[:,'Tot_results'] = df_percentage.iloc[:,:].sum(axis=1)
    if 'en' in df_percentage.index:
        df_percentage.drop('en', axis=0, inplace=True)
    tot_lang='_'.join(lang)
    today=pd.Timestamp.today().strftime('%Y-%m-%d') # date prefix for the file names
    if save and biling:
        df.to_csv(f'{today}_{tot_lang}_tot_results_bilingual.csv')
    elif save and not biling:
        df.to_csv(f'{today}_{tot_lang}_tot_results_monolingual.csv')
    return df, df_percentage
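
The percentage step at the end of `tot_results_queries` divides each field count by the row total. A vectorised sketch of the same arithmetic, on made-up counts (the numbers below are illustrative, not API results), is:

```python
import pandas as pd

# Illustrative hit counts per language (rows) and field (columns).
counts = pd.DataFrame(
    {'dc_title': [30, 10], 'dc_description': [10, 30]},
    index=['fr', 'de'],
)
totals = counts.sum(axis=1)            # total hits per language
shares = counts.div(totals, axis=0)    # each field's share of its row total

print(shares)
```

Dividing with `div(..., axis=0)` aligns the totals with the row index, so no explicit loop over the columns is needed.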


In [6]:
def queries_items_uri(lang, n_rows=1, save=False,biling=True):
    """This function builds a dataframe where the first column indicates
    the query executed and the second the URI of an item that satisfies that query
    lang: string language in ISO format, takes one value of lang (not a list)
    n_rows: parameter for the number of returned items in the first, counting call
    save: boolean, if True the resulting dataframe is saved as csv file
    biling: boolean, if True bilingual version of the queries is used, if False the 
    monolingual version"""
    if biling:
        queries_dict=multiple_language_queries([lang] ,bilingual='AND', lang2='en')
    else:
        queries_dict=multiple_language_queries([lang] ,bilingual='NOT', lang2='en')
    # initializing list of dataframes
    df_list=[]
    for _ ,value in queries_dict[lang].items():  
        print(value)
        df=pd.DataFrame(columns=['field','europeana_uri'])
        CHO_data = apis.search(query = '*:*',qf=f'{value}' ,rows = n_rows)
        n_files=CHO_data['totalResults']
        if n_files > 0:
            CHO_data_all = apis.search(query = '*:*',qf=f'{value}' ,rows = n_files)
            print('ok')
            df['europeana_uri']=utils.search2df(CHO_data_all).uri
            print(len(df))
            df['field']=value
            df_list.append(df)     
        else:
            pass 
    df_tot = pd.concat(df_list, ignore_index=True) # concatenate all dataframes from all queries
    df_tot_clear_dup=df_tot.drop_duplicates(subset=None, keep='first', inplace=False)
    today=pd.Timestamp.today().strftime('%Y-%m-%d') # date prefix for the file names
    if save and biling:
        df_tot_clear_dup.to_csv(f'{today}_{lang}_en_bilingual.csv')
    elif save and not biling:
        df_tot_clear_dup.to_csv(f'{today}_{lang}_monolingual.csv')
    return df_tot_clear_dup

In [7]:
def monoling_biling_to_stack(mono_nr, bili_nr, lang_list,save=True):
    """ This function generates a dataframe whose columns are 
    - the number of bilingual tags,
    - the number of monolingual tags
    - the number of total tags
    The index of the df are the languages contained in lang_list
    Three versions of the dataframe are generated
    - df_sorted_biling: rows sorted by descending values of bilingual tags
    - df_sorted_monoling: rows sorted by descending values of monolingual tags
    - df_sorted_tot_lang_tagged: rows sorted by descending values of total tags,
      monolingual and bilingual
    Parameters
    mono_nr: dataframe of monolingual hits per language, with a Tot_results column
    bili_nr: dataframe of bilingual hits per language, with a Tot_results column
    lang_list: list of languages considered
    save: boolean, if True the dataframe sorted by bilingual tags is saved as csv file
    """
    df_tot=pd.DataFrame({'n_biling_tag':bili_nr.Tot_results,'n_monoling_tag':mono_nr.Tot_results}, index=lang_list)
    df_tot.loc[:,'Tot_lang_tag']=df_tot.loc[:,'n_biling_tag']+df_tot.loc[:,'n_monoling_tag']
    if 'en'in df_tot.index:
        df_sorted=df_tot.drop('en',axis=0)  
    else:
        df_sorted=df_tot
    df_sorted_biling=df_sorted.sort_values(by='n_biling_tag', ascending=False)
    df_sorted_monoling=df_sorted.sort_values(by='n_monoling_tag', ascending=False)
    df_sorted_tot_lang_tagged=df_sorted.sort_values(by='Tot_lang_tag', ascending=False)
    if save:
        today=pd.Timestamp.today().strftime('%Y-%m-%d') # date prefix for the file name
        file_name=f'{today}_mono_bilingual_tot_results.csv'
        df_sorted_biling.to_csv(file_name)
    return df_sorted_biling,df_sorted_monoling,df_sorted_tot_lang_tagged

In [8]:
from pathlib import Path

In [53]:
saving_path = Path('/content/')
saving_path

PosixPath('/content')

In [9]:
import os

In [28]:
#Function to extract the europeana_id values that correspond to a certain query
def search_api(query, n_objects, batch_size):
    """Pages through the Search API with a cursor and returns a dataframe
    with a single europeana_id column
    query: string, Search API query
    n_objects: total number of objects to query for
    batch_size: number of objects requested in each call
    """
    saving_path = Path('./') # path to save the files
    batch_size_list = [batch_size for i in range(n_objects // batch_size)]
    rem = n_objects % batch_size
    if rem > 0:
        batch_size_list = batch_size_list + [rem]
    cursor = '*'
    df_temp = pd.DataFrame(columns=['europeana_id']) # returned as is when no batch is requested
    for i,rows in enumerate(batch_size_list):
        response = apis.search(
            query = query,
            rows = rows,
            cursor = cursor)
        df = pd.DataFrame(utils.search2df(response).europeana_id)
        # stack the new batch on top of the previous ones
        df_temp = df if i == 0 else pd.concat([df,df_temp], axis=0)
        # keep only the latest cumulative file on disk
        df_temp.to_csv(saving_path.joinpath(f'df_{i}.csv'))
        if i>=1:
            os.remove(saving_path.joinpath(f'df_{i-1}.csv'))
        if 'nextCursor' not in response.keys():
            break # no more pages to request
        cursor = response['nextCursor'] # cursor from this response, used in the next iteration
    df_temp.reset_index(inplace=True,drop = True)
    return df_temp
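
The cursor loop can be tried without calling the API. In the sketch below `fake_search` is a hypothetical stand-in that pages through a list and returns `nextCursor` only while items remain, which is an assumption about how the real cursor behaves; `collect_ids` is likewise illustrative and keeps the ids in memory instead of writing files.

```python
ALL_IDS = [f'/demo/{n}' for n in range(7)]  # made-up identifiers


def fake_search(query, rows, cursor):
    """Return one page of ids; 'nextCursor' is present only if more remain."""
    start = 0 if cursor == '*' else int(cursor)
    end = start + rows
    response = {'items': [{'id': i} for i in ALL_IDS[start:end]]}
    if end < len(ALL_IDS):
        response['nextCursor'] = str(end)
    return response


def collect_ids(query, n_objects, batch_size, search_fn):
    """Request pages of at most batch_size ids until n_objects are collected."""
    ids, cursor = [], '*'
    while len(ids) < n_objects:
        rows = min(batch_size, n_objects - len(ids))
        response = search_fn(query=query, rows=rows, cursor=cursor)
        ids.extend(item['id'] for item in response['items'])
        if 'nextCursor' not in response:
            break  # last page reached
        cursor = response['nextCursor']
    return ids


print(collect_ids('*:*', 5, 2, fake_search))
```

Stopping when `nextCursor` is missing avoids requesting the same page again when fewer results exist than were asked for.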

In [19]:
#Function to extract data that correspond to the id numbers found in search_api function
def record_api(items_id):
    df_list=[]
    for item in items_id:
        data=apis.record(f'{item}')
        df_0=pd.json_normalize(data,['object','proxies'])
        df_proxy_provider=df_0.iloc[1] #selecting the provider proxy (second proxy), which holds the information we are interested in
        df_proxy_provider=pd.DataFrame(df_proxy_provider)
        df_proxy_provider=df_proxy_provider.transpose()
        df_list.append(df_proxy_provider)
    df_proxy_tot = pd.concat(df_list, ignore_index=True, axis=0)
    return df_proxy_tot

In [29]:
query= '(proxy_dc_type.en:* AND proxy_dc_type.de:*)'

In [30]:
search_api(query, 10,3)

Unnamed: 0,europeana_id
0,/9200360/BibliographicResource_3000100168703
1,/9200360/BibliographicResource_3000100168709
2,/9200360/BibliographicResource_3000100168708
3,/9200360/BibliographicResource_3000100168707
4,/9200360/BibliographicResource_3000100168715
5,/9200360/BibliographicResource_3000100168714
6,/9200360/BibliographicResource_3000100168713
7,/9200360/BibliographicResource_3000100168721
8,/9200360/BibliographicResource_3000100168720
9,/9200360/BibliographicResource_3000100168719


In [31]:
#Redefines record_api to return the whole flattened record instead of only the provider proxy
def record_api(items_id):
    df_list=[]
    for item in items_id:
        data=apis.record(f'{item}')
        data_jnorm=pd.json_normalize(data)
        df_list.append(data_jnorm)
    df_jnorm_tot = pd.concat(df_list, ignore_index=True, axis=0)
    return df_jnorm_tot

In [32]:
def retrieve_norm_data(query, n_rows,batch_size):
    search_results=search_api(query, n_rows,batch_size)
    # iterate over the ids, not over the dataframe (which would yield column names)
    record_data=record_api(search_results.europeana_id)
    return record_data

In [None]:
from pathlib import Path

# the idea is to divide search into batches to avoid the memory issue
# the dataset will be saved in several files

n_objects = 1907 # total number of objects to query for
batch_size = 320 

batch_size_list = [batch_size for i in range(n_objects // batch_size)]
rem = n_objects % batch_size
print(rem)
if rem > 0:
    batch_size_list = batch_size_list + [rem]
    
    

#saving_path = Path('/content/') # path to save the files

query = '(proxy_dc_description.en:* AND proxy_dc_description.fi:*)'

#query = '*'

cursor = '*'
df = pd.DataFrame()
for i,rows in enumerate(batch_size_list):
  response = apis.search(
      query = query,
      rows = rows,
      cursor = cursor
  )
  df = utils.search2df(response) # dataframe with the items of the current batch
  #df.to_csv(saving_path.joinpath(f'df_{i}.csv'),index = False)
  if 'nextCursor' not in response.keys():
    break # no more pages to request
  cursor = response['nextCursor'] # we get the cursor from the last response to use it in the next iteration of the loop


# checking that we get as many unique europeana ids as objects requested
# df = pd.DataFrame()
# for i in range(len(batch_size_list)):
#   print(pd.read_csv(saving_path.joinpath(f'df_{i}.csv')).shape)
#   df = pd.concat([df, pd.read_csv(saving_path.joinpath(f'df_{i}.csv'))])

# df.europeana_id.unique().shape
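
The batch arithmetic in this cell can be checked in isolation. `split_into_batches` is a hypothetical helper name; it reproduces the integer division and remainder used above.

```python
def split_into_batches(n_objects, batch_size):
    """Return the list of page sizes needed to cover n_objects."""
    full, rem = divmod(n_objects, batch_size)
    return [batch_size] * full + ([rem] if rem > 0 else [])


batches = split_into_batches(1907, 320)
print(batches)       # [320, 320, 320, 320, 320, 307]
print(sum(batches))  # 1907
```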

In [22]:
query= '(proxy_dc_type.en:* AND proxy_dc_type.de:*)'

In [33]:
retrieve_norm_data(query, 200,4)

KeyboardInterrupt: 

In [19]:
df_search

NameError: name 'df_search' is not defined

In [20]:
df_proxy_tot=record_api(df_search)
df_proxy_tot.head()

NameError: name 'df_search' is not defined

In [48]:
query= '(proxy_dc_type.it:*)' #dc_type not served in Search API results
df_search=search_api(query, 10, 3) # n_objects and batch_size are required
df_search


provider_aggregation_edm_dataProvider
provider_aggregation_edm_intermediateProvider
provider_aggregation_edm_provider
wr_dcterms_isReferencedBy

In [87]:
queries= ['(wr_dcterms_isReferencedBy:*)'] #one query per field of interest
df_list=[]
for query in queries:
    df_search=search_api(query, 10, 3) # n_objects and batch_size are required
    df_proxy_tot=record_api(df_search.europeana_id) # pass the ids, not the dataframe
    df_list.append(df_proxy_tot) 
df_tot = pd.concat(df_list, ignore_index=True, axis=0)

In [74]:
query= '(provider_aggregation_edm_provider.fr:*)'
#item=search_api(query,10,3)

# data=apis.record(f'{item}')
# data

In [46]:
data=apis.record('/09903/FE4D667B5DE8E51E3D59577425C7CDD17523E23C')


In [47]:
data

{'apikey': 'api2demo',
 'success': True,
 'statsDuration': 224,
 'requestNumber': 999,
 'object': {'about': '/09903/FE4D667B5DE8E51E3D59577425C7CDD17523E23C',
  'aggregations': [{'about': '/aggregation/provider/09903/FE4D667B5DE8E51E3D59577425C7CDD17523E23C',
    'edmDataProvider': {'def': ['http://data.europeana.eu/organization/1482250000004509131']},
    'edmIsShownAt': 'http://xml.memovs.ch/s009CD019a.xml',
    'edmProvider': {'def': ['http://data.europeana.eu/organization/1482250000004509131']},
    'edmRights': {'def': ['http://rightsstatements.org/vocab/InC/1.0/']},
    'edmUgc': 'false',
    'aggregatedCHO': '/item/09903/FE4D667B5DE8E51E3D59577425C7CDD17523E23C',
    'webResources': [{'about': 'http://xml.memovs.ch/s009CD019a.xml',
      'textAttributionSnippet': 'Musique d’été 2 - https://www.europeana.eu/item/09903/FE4D667B5DE8E51E3D59577425C7CDD17523E23C. Mariétan, Pierre. Médiathèque Valais - Martigny - http://xml.memovs.ch/s009CD019a.xml. In Copyright - http://rightsstateme

In [230]:
utils.process_CHO_record(data)

{'europeana_id': '/449/libria_317731',
 'image_url': 'https://www.byterfly.eu/iiif-server/iiif/2/http%3A%2F%2F150.145.48.48%3A8080%2Ffedora%2Fobjects%2Flibria:317746%2Fdatastreams%2FJP2%2Fcontent/full/full/0/default.jpg',
 'uri': 'http://data.europeana.eu/item/449/libria_317731',
 'dataset_name': '449_CulturaItalia_Byterfly_Leconomista',
 'country': 'Italy',
 'language': 'it',
 'type': 'TEXT',
 'title': "Avviso al popolo sul bisogno suo primario o sia Trattato sulla totale e perfetta libertà nel commercio de' grani",
 'title_lang': {'def': "Avviso al popolo sul bisogno suo primario o sia Trattato sulla totale e perfetta libertà nel commercio de' grani"},
 'rights': 'http://rightsstatements.org/vocab/NoC-OKLR/1.0/',
 'provider': 'http://data.europeana.eu/organization/1482250000000338951',
 'provider_lang': {'def': 'http://data.europeana.eu/organization/1482250000000338951'}}

In [80]:
response = apis.search(
    query = query,
    rows = 1, 
    profile='rich'
    )

In [None]:
data

In [76]:
flat=pd.json_normalize(response)
flat

Unnamed: 0,apikey,success,requestNumber,itemsCount,totalResults,nextCursor,items,url,params.wskey,params.query,params.qf,params.reusability,params.media,params.thumbnail,params.landingpage,params.colourpalette,params.theme,params.sort,params.profile,params.rows,params.cursor,params.callback,params.facet
0,api2demo,True,999,1,13022,AoE/EC8wOTkwMy9GRkUxMUJGREZGRDQyMTIxQkVGNTM3Nj...,"[{'completeness': 0, 'country': ['Switzerland'...",https://api.europeana.eu/record/v2/search.json...,api2demo,(provider_aggregation_edm_provider.fr:*),,,,,,,,europeana_id,rich,1,*,,


In [82]:
df=pd.json_normalize(data=response,record_path=['items'])

In [86]:
list(df.columns)

['completeness',
 'country',
 'dataProvider',
 'dcContributor',
 'dcCreator',
 'dcDescription',
 'dcLanguage',
 'dctermsSpatial',
 'edmDatasetName',
 'edmIsShownAt',
 'edmPlace',
 'edmPlaceAltLabel',
 'edmPlaceLabel',
 'edmPlaceLatitude',
 'edmPlaceLongitude',
 'europeanaCollectionName',
 'europeanaCompleteness',
 'guid',
 'id',
 'index',
 'language',
 'link',
 'organizations',
 'previewNoDistribute',
 'provider',
 'rights',
 'score',
 'timestamp',
 'timestamp_created',
 'timestamp_created_epoch',
 'timestamp_update',
 'timestamp_update_epoch',
 'title',
 'type',
 'ugc',
 'dcContributorLangAware.def',
 'dcCreatorLangAware.def',
 'dcDescriptionLangAware.def',
 'dcLanguageLangAware.def',
 'dcSubjectLangAware.def',
 'dcTitleLangAware.def',
 'dcTypeLangAware.def',
 'edmPlaceAltLabelLangAware.def',
 'edmPlaceLabelLangAware.hi',
 'edmPlaceLabelLangAware.ps',
 'edmPlaceLabelLangAware.pt',
 'edmPlaceLabelLangAware.def',
 'edmPlaceLabelLangAware.hr',
 'edmPlaceLabelLangAware.hu',
 'edmPlaceLabe

In [57]:
pd.json_normalize(response['items']) # response.items is the dict method, not the list of items

In [201]:
'object.aggregations' in flat.columns

True

In [69]:
if 'object.aggregations'in flat.columns:
    flat_1=pd.json_normalize(data, record_path=['object','aggregations']) 
else:
    flat_1=pd.DataFrame()
if 'object.organizations'in flat.columns:    
    flat_2=pd.json_normalize(data, record_path=['object','organizations'])
else:
    flat_2=pd.DataFrame()
# if 'object.aggregations'in flat.columns:        
#     flat_3=pd.json_normalize(data, record_path=['object','aggregations','webResources'])
# else:
    #flat_3=pd.DataFrame()
if 'object.places'in flat.columns:  
    flat_3=pd.json_normalize(data, record_path=['object','places'])
else:
    flat_3=pd.DataFrame()
if 'object.providedCHOs'in flat.columns:
    flat_4=pd.json_normalize(data, record_path=['object','providedCHOs'])
else:
    flat_4=pd.DataFrame()
if 'object.proxies'in flat.columns:
    #Here I select the provider proxy
    flat_5=pd.json_normalize(data, record_path=['object','proxies'])
    flat_5_filt=flat_5.loc[[1]].reset_index(drop=True)
else:
    flat_5_filt=pd.DataFrame()
# if 'object.qualityAnnotations'in flat.columns:
#     flat_7=pd.json_normalize(data, record_path=['object','qualityAnnotations'])
# else:
#     flat_7=pd.DataFrame()
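
The pattern used in this cell, flattening one nested list of the record at a time, can be tried on a toy dictionary. The record below is made up and much smaller than a real Record API response, and `flatten_part` is a hypothetical helper; it checks for the key in the dictionary itself rather than in the columns of a previously flattened frame.

```python
import pandas as pd

# Made-up record with the same nesting as the 'object' key shown above.
record = {
    'object': {
        'about': '/demo/1',
        'proxies': [
            {'about': '/proxy/europeana/demo/1'},
            {'about': '/proxy/provider/demo/1', 'dcType': {'it': ['libro']}},
        ],
    }
}


def flatten_part(data, part):
    """Flatten data['object'][part] or return an empty frame if absent."""
    if part in data.get('object', {}):
        return pd.json_normalize(data, record_path=['object', part])
    return pd.DataFrame()


proxies = flatten_part(record, 'proxies')
places = flatten_part(record, 'places')   # not present in the toy record
print(list(proxies.columns))
print(places.empty)
```

Returning an empty frame for a missing part keeps a later `pd.concat` over all parts from failing.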

In [34]:
# #here I select edmDataProvider,edmProvider,edmRights
# flat_1_filter=flat_1.drop(['about','edmIsShownAt','edmUgc','aggregatedCHO','webResources'], axis=1)
# #here I select textAttributionSnippet from webresources
# flat_3_web_res_filt=flat_3.drop(['about','htmlAttributionSnippet','ebucoreHasMimeType','ebucoreFileByteSize','rdfType'], axis=1)
# #Here I select the concepts
# flat_4_filt=flat_4.drop(['about','latitude','longitude','owlSameAs'], axis=1)
# #Here I select the proider proxy
# flat_6_filt=flat_6.iloc[[1]].drop(['about', 'proxyIn', 'proxyFor' ],axis=1).reset_index(drop=True)
# flat_6_filt

In [66]:
flat_2

In [70]:
df=pd.concat([flat_1 ,flat_2,flat_3,flat_4,flat_5_filt],axis=1)
df

In [25]:
flat_3

Unnamed: 0,about,textAttributionSnippet,htmlAttributionSnippet,ebucoreHasMimeType,ebucoreFileByteSize,ebucoreWidth,ebucoreHeight,edmHasColorSpace,edmComponentColor,ebucoreOrientation,rdfType,edmSpatialResolution,webResourceDcRights.def,webResourceEdmRights.def,dcFormat.def,dcSource.def,dcCreator.def
0,https://deriv.nls.uk/dcn30/1043/5077/104350772...,Ladies' Edinburgh Debating Society publication...,<link rel='stylesheet' type='text/css' href='h...,image/jpeg,117819,1000.0,1691.0,sRGB,"[#DCDCDC, #FFE4C4, #F5DEB3, #EEE8AA, #D3D3D3, ...",portrait,,,,,,,
1,https://deriv.nls.uk/dcn23/1039/0013/103900138...,Ladies' Edinburgh Debating Society publication...,<link rel='stylesheet' type='text/css' href='h...,application/pdf,13791933,,,,,,http://www.europeana.eu/schemas/edm/FullTextRe...,0.0,[The work is likely to be in the public domain...,[http://rightsstatements.org/vocab/CNE/1.0/],[pdf],[#Resource:103655663],[National Library of Scotland]
2,https://digital.nls.uk/103655663,Ladies' Edinburgh Debating Society publication...,<link rel='stylesheet' type='text/css' href='h...,text/html,58668,,,,,,http://www.europeana.eu/schemas/edm/FullTextRe...,,,,,,


In [27]:
query= '(wr_dcterms_isReferencedBy:en*)'
df_search=search_api(query, 10, 3).europeana_id[0] # id of the first item returned
data=apis.record(f'{df_search}')
df_0=pd.json_normalize(data,['object','aggregations'])
# df_1=pd.json_normalize(df_0,[)

In [None]:
data=apis.record('/09903/FFF7C96578DEEAA154DC865CFF18ADE808258C8F')
data

In [319]:
response = apis.search(
    query = 'proxy_dc_description.nl:*',
    rows = 5, 
    profile='standard'
    )

In [310]:
df=pd.json_normalize(response)
df                    

Unnamed: 0,apikey,success,requestNumber,itemsCount,totalResults,nextCursor,items,url,params.wskey,params.query,params.qf,params.reusability,params.media,params.thumbnail,params.landingpage,params.colourpalette,params.theme,params.sort,params.profile,params.rows,params.cursor,params.callback,params.facet
0,api2demo,True,999,5,1866111,AoEuLzkwNDAyL1NLX0NfODg=,"[{'completeness': 10, 'country': ['Netherlands...",https://api.europeana.eu/record/v2/search.json...,api2demo,proxy_dc_description.nl:*,,,,,,,,europeana_id,standard,5,*,,


# Search API and Record API

The Search and Record APIs return different fields, so let's check how the multilingual fields are mapped from the Search API to the Record API.

In [322]:
df_items=pd.json_normalize(response, record_path=['items'])
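
One way to start that comparison is to list which languages each multilingual field has in the flattened Search results. The sketch below works on column names only; the names are taken from the `LangAware` columns listed earlier in this notebook, and `languages_per_field` is a hypothetical helper.

```python
from collections import defaultdict


def languages_per_field(columns, marker='LangAware'):
    """Map each '<field>LangAware.<lang>' column to its language codes."""
    mapping = defaultdict(list)
    for col in columns:
        field, _, lang = col.rpartition('.')
        if field.endswith(marker):
            mapping[field[:-len(marker)]].append(lang)
    return dict(mapping)


columns = ['title', 'dcTitleLangAware.def', 'edmPlaceLabelLangAware.hi',
           'edmPlaceLabelLangAware.pt', 'edmPlaceLabelLangAware.def']
print(languages_per_field(columns))
```

With the real data, `df_items.columns` would be passed instead of the hard-coded list.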

# Conclusions

Here we saw how to combine the Search API and the Record API to retrieve field values that are not served by the Search API alone. The general idea is that the Record API serves all the fields of an item, while the Search API serves only a fraction of them to optimize performance.