# Initialize Impresso Client

In this cell, we initialize the Impresso client to connect to the Impresso API. 
The `impresso` variable is an instance of `impresso.client.ImpressoClient`, which allows us to interact with the API 
and perform various operations such as searching for articles, retrieving article details, and fetching facets.

In [None]:
from impresso import connect

impresso = connect()

## Search content items

In this notebook, we will search for content items that contain the term "European Union" in the text. The results are ordered by date.

The result container is rendered below as an overview of its contents.


In [20]:
result = impresso.search.find(
    term="European Union",
    order_by="date",
)
result

uid,copyrightStatus,type,sourceMedium,title,topics,transcriptLength,totalPages,languageCode,isOnFrontPage,publicationDate,...,pageNumbers,collectionUids,entities.locations,entities.persons,entities.organisations,entities.newsAgencies,mentions.locations,mentions.persons,mentions.organisations,mentions.newsAgencies
luxwort-1948-11-25-a-i0033,in_cpy,ar,print,Europäischer Föderalistenkongreß in Rom,"[{'uid': 'tm-de-all-v2.0_tp25_de', 'relevance'...",733,1,de,False,1948-11-25T00:00:00+00:00,...,[6],[],[],"[{'uid': '2-50-Pius_XII.', 'count': 1}]",[],[],[],"[{'surfaceForm': 'Papst Pius XII', 'mentionCon...",[],[]
FZG-1950-06-17-a-i0045,in_cpy,ar,print,Um EJnropa herum Die Furcht vor Krieg un...,"[{'uid': 'tm-de-all-v2.0_tp86_de', 'relevance'...",1200,1,de,True,1950-06-17T00:00:00+00:00,...,[1],[],[],"[{'uid': '2-50-Konrad_Adenauer', 'count': 1}, ...",[],[],[],"[{'surfaceForm': 'Adenauer', 'mentionConfidenc...",[],[]
JDG-1954-11-03-a-i0032,in_cpy,ar,print,Washington et les négociations avec Moscou,"[{'uid': 'tm-fr-all-v2.0_tp29_fr', 'relevance'...",717,1,fr,False,1954-11-03T00:00:00+00:00,...,[3],[],"[{'uid': '2-54-Moscou', 'count': 1}, {'uid': '...","[{'uid': '2-50-Anthony_Eden', 'count': 1}, {'u...","[{'uid': '2-53-États-Unis', 'count': 1}, {'uid...",[],"[{'surfaceForm': 'Moscou', 'mentionConfidence'...","[{'surfaceForm': 'sir Anthony Eden', 'mentionC...",[{'surfaceForm': 'Assemblée des Nations Unies'...,[]


Below, we search for the term "European Union" in the Impresso data again.
Then we use the `result` variable to access and print the titles of the first three articles returned by the search query.

The `pydantic` property holds the response of the Impresso API as a [Pydantic](https://docs.pydantic.dev/latest/) model. Pydantic checks that the data conforms to the declared types and constraints, which makes structured data more reliable and consistent to work with.
We use the `data` property of the response to iterate over the current page of results and print the titles of the articles that contain the search term.

In [23]:
result = impresso.search.find(
    term="European Union",
    order_by="date",
)
for article in result.pydantic.data[:3]:
    print(article.title)

Europäischer Föderalistenkongreß in Rom
Um EJnropa herum Die Furcht vor Krieg un...
Washington et les négociations avec Moscou


The result object has several useful properties that tell us the total number of results found, the size of the current page, and the offset at which it starts.

In [24]:
print("%i results were found for this term. The current result object contains %i items starting from the item number %i" % (result.total, result.size, result.offset))

91 results were found for this term. The current result object contains 91 items starting from the item number 0
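
The three counters printed above are enough to work out whether more results follow the current page. A minimal sketch of that arithmetic in plain Python; `remaining_items` and `next_offset` are hypothetical helpers written for this illustration, not part of the Impresso library, and the second set of numbers is invented to show a multi-page case.

```python
def remaining_items(total, size, offset):
    """Number of matching items that come after the current page."""
    return max(total - (offset + size), 0)


def next_offset(total, size, offset):
    """Offset of the following page, or None when this page is the last one."""
    following = offset + size
    return following if following < total else None


# Values printed by the cell above: everything fits on one page.
print(remaining_items(91, 91, 0))  # 0
print(next_offset(91, 91, 0))      # None

# Hypothetical case: the same 91 results read 20 at a time, third page.
print(remaining_items(91, 20, 40))  # 31
print(next_offset(91, 20, 40))      # 60
```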


### Pydantic
The `pydantic` property returns the full response from the Impresso API as a Pydantic model.

In [25]:
result.pydantic

SearchResponseSchema(data=[ContentItem(uid='luxwort-1948-11-25-a-i0033', copyrightStatus='in_cpy', type='ar', sourceMedium='print', title='Europäischer Föderalistenkongreß in Rom', transcript=None, entities=NamedEntities(locations=[], persons=[NamedEntity(uid='2-50-Pius_XII.', count=1.0)], organisations=[], newsAgencies=[]), mentions=EntityMentions(locations=[], persons=[EntityMention(surfaceForm='Papst Pius XII', mentionConfidence=95.88, startOffset=None, endOffset=None)], organisations=[], newsAgencies=[]), topics=[TopicMention(uid='tm-de-all-v2.0_tp25_de', relevance=0.202), TopicMention(uid='tm-de-all-v2.0_tp52_de', relevance=0.16), TopicMention(uid='tm-de-all-v2.0_tp86_de', relevance=0.157), TopicMention(uid='tm-de-all-v2.0_tp14_de', relevance=0.112), TopicMention(uid='tm-de-all-v2.0_tp77_de', relevance=0.112), TopicMention(uid='tm-de-all-v2.0_tp95_de', relevance=0.096), TopicMention(uid='tm-de-all-v2.0_tp24_de', relevance=0.057)], embeddings=None, transcriptLength=733.0, totalPage

### Pandas
We can also get the search results as a [Pandas](https://pandas.pydata.org/) DataFrame through the `df` property.
This makes it easy to filter, sort, and aggregate the data with the usual pandas tools.

In [26]:
df = result.df
df.head(2)

uid,copyrightStatus,type,sourceMedium,title,topics,transcriptLength,totalPages,languageCode,isOnFrontPage,publicationDate,...,pageNumbers,collectionUids,entities.locations,entities.persons,entities.organisations,entities.newsAgencies,mentions.locations,mentions.persons,mentions.organisations,mentions.newsAgencies
luxwort-1948-11-25-a-i0033,in_cpy,ar,print,Europäischer Föderalistenkongreß in Rom,"[{'uid': 'tm-de-all-v2.0_tp25_de', 'relevance'...",733,1,de,False,1948-11-25T00:00:00+00:00,...,[6],[],[],"[{'uid': '2-50-Pius_XII.', 'count': 1}]",[],[],[],"[{'surfaceForm': 'Papst Pius XII', 'mentionCon...",[],[]
FZG-1950-06-17-a-i0045,in_cpy,ar,print,Um EJnropa herum Die Furcht vor Krieg un...,"[{'uid': 'tm-de-all-v2.0_tp86_de', 'relevance'...",1200,1,de,True,1950-06-17T00:00:00+00:00,...,[1],[],[],"[{'uid': '2-50-Konrad_Adenauer', 'count': 1}, ...",[],[],[],"[{'surfaceForm': 'Adenauer', 'mentionConfidenc...",[],[]
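
Columns such as `entities.persons` hold a list of dictionaries per row, as the output above shows. The following is a minimal sketch, in plain Python rather than pandas, of how such cells could be tallied. The `count_entities` helper is hypothetical and not part of the Impresso library. The sample values are copied from the two rows above; the second cell is truncated in the display, so only its first visible entry is used.

```python
from collections import Counter

# Cells copied from the `entities.persons` column, keyed by content item uid.
# The FZG cell is cut off in the rendered output, so it is incomplete here.
persons_by_uid = {
    "luxwort-1948-11-25-a-i0033": [{"uid": "2-50-Pius_XII.", "count": 1}],
    "FZG-1950-06-17-a-i0045": [{"uid": "2-50-Konrad_Adenauer", "count": 1}],
}


def count_entities(cells):
    """Sum the `count` of each entity uid across a sequence of list-of-dict cells."""
    totals = Counter()
    for entries in cells:
        for entry in entries:
            totals[entry["uid"]] += entry["count"]
    return totals


person_counts = count_entities(persons_by_uid.values())
for uid, count in person_counts.most_common():
    print(uid, count)
```

Assuming the real cells are lists of dictionaries as displayed, the same helper could be applied to `df["entities.persons"]`.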


## Get a content item

Below, we use the `content_items` resource to get a single content item by its ID:

In [27]:
content_item = impresso.content_items.get("NZZ-1794-08-09-a-i0002")
content_item

Unnamed: 0,uid,copyrightStatus,type,sourceMedium,transcript,topics,transcriptLength,totalPages,languageCode,isOnFrontPage,...,pageNumbers,collectionUids,entities.locations,entities.persons,entities.organisations,entities.newsAgencies,mentions.locations,mentions.persons,mentions.organisations,mentions.newsAgencies
0,NZZ-1794-08-09-a-i0002,in_cpy,page,print,^chutch schleunige Vorkehrungen der bewafneten...,"[{'uid': 'tm-de-all-v2.0_tp47_de', 'relevance'...",988,1,de,False,...,[2],[],"[{'uid': '2-54-Paris', 'count': 1}, {'uid': '2...","[{'uid': '2-50-François_Hanriot', 'count': 1}]",[],[],"[{'surfaceForm': 'Couthon', 'mentionConfidence...","[{'surfaceForm': 'Roberepiere', 'mentionConfid...",[],[]


The content item is also available as a Pydantic model (`pydantic`) or as a DataFrame (`df`).

In [28]:
content_item.pydantic.transcript

'^chutch schleunige Vorkehrungen der bewafnetenNazi «» « ulmacht versichert. unV t >;.. 3 >; Stadthaus auf n...!)..,. zlVbereyiere wit sc! i! enAn. b >; wgsri! sich befand, mm \'. ügcn l » ssl ». Während der N « chi cischlcncndie vnser,«\'?.«.» \'NißliNererwaliungen i » P.. ric vor een,.« eum »., um »« lftlven il », cr Anbanglichteitzu veillcl\'ern. Gegen ren Älprgen <;» ml >; t dae Stadthaus erode. <;. leda »! ermordete \'Dh selbK. D >; e beyden Roberepiere versuchten ras glci- H ». D, r\'ältere schoß sich einen Theil de >;>; Gesichte hincks, jlnv rer jüngere stürz « sich aue dem Fenster. und Herbräch b <; yde Beine. Dcr ältere wu » de.. ufeiner Trag » , dnhrevortenKonvcntssaal geblecht, abermal. e. lMdt » erpicht, ihn herumzutragen. Hm ic,. Lhenmder (2z. Iich) Abends wurden beyde Roberepiere, Couthon, der . Migadcgeneral Lavalelce, Hanriot, Hominandont der Marter Nazionalgarde, Dümas, Präsident des Nevolu » Hsd »« ribnnals, St. Iuft, Payän Nazivna!» Aa « nt » n der \'Komüne ven Paris

In [29]:
content_item.df[['uid', 'countryCode', 'languageCode']]


Unnamed: 0,uid,countryCode,languageCode
0,NZZ-1794-08-09-a-i0002,CH,de
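
The identifiers that appear in this notebook seem to follow a common shape: a source code, a date, an edition letter, and an item number. The sketch below splits a uid along that shape. The pattern is an assumption inferred only from the uids shown here, not a documented Impresso format, and `parse_uid` is a hypothetical helper.

```python
import re
from datetime import date

# Assumed shape: <source>-<YYYY>-<MM>-<DD>-<edition letter>-i<item number>
UID_PATTERN = re.compile(
    r"^(?P<source>[A-Za-z0-9_]+)-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"-(?P<edition>[a-z])-i(?P<item>\d+)$"
)


def parse_uid(uid):
    """Split a uid into its apparent parts, or return None if it does not match."""
    match = UID_PATTERN.match(uid)
    if match is None:
        return None
    return {
        "source": match["source"],
        "date": date(int(match["year"]), int(match["month"]), int(match["day"])),
        "edition": match["edition"],
        "item": int(match["item"]),
    }


parsed = parse_uid("NZZ-1794-08-09-a-i0002")
print(parsed["source"], parsed["date"].isoformat(), parsed["item"])
```

Returning `None` for a non-matching uid keeps the helper safe to use on identifiers that do not follow the assumed shape.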


## Search facets

In this cell, we request the `country` facet for the search term "fromage" in the Impresso collection. A facet is a convenient way to see a breakdown of the search results, here the number of results per country.


In [30]:
country_facet = impresso.search.facet("country", term="fromage")
country_facet.df

value,count
CH,197740
FR,33993
LU,7478
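
Raw bucket counts are easier to compare as proportions. A minimal sketch in plain Python, using the three counts copied from the output above; `bucket_shares` is a hypothetical helper, and the shares are relative to the sum of the listed buckets, which is assumed (not confirmed) to equal the total number of results.

```python
# Bucket counts copied from the facet output above.
country_counts = {"CH": 197740, "FR": 33993, "LU": 7478}


def bucket_shares(counts):
    """Return each bucket's fraction of the summed bucket counts."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when every bucket is empty.
        return {value: 0.0 for value in counts}
    return {value: count / total for value, count in counts.items()}


shares = bucket_shares(country_counts)
# Print the buckets from largest to smallest share.
for value, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{value}: {share:.1%}")
```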
