# Initialize Impresso Client

In this cell, we initialize the Impresso client, which connects to the Impresso API.
The `impresso` variable is an instance of `impresso.client.ImpressoClient`. We use it to interact with the API:
searching for articles, retrieving article details, and fetching facets.

In [None]:
from impresso import connect

impresso = connect()

## Search content items

In this cell, we search for content items whose text contains the term "European Union". The results are ordered by date.

Displaying the result object renders an overview of what it contains.


In [2]:
result = impresso.search.find(
    term="European Union",
    order_by="date",
)
result

| uid | copyrightStatus | type | sourceMedium | title | locationEntities | personEntities | organisationEntities | newsAgenciesEntities | topics | transcriptLength | totalPages | languageCode | isOnFrontPage | publicationDate | issueUid | countryCode | providerCode | mediaUid | mediaType |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| luxwort-1948-11-25-a-i0033 | in_cpy | ar | print | [REDACTED] | [] | [{'uid': '2-50-Pius_XII.', 'count': 1}] | [] | [] | [{'uid': 'tm-de-all-v2.0_tp25_de', 'relevance'... | 733 | 1 | de | False | 1948-11-25T00:00:00+00:00 | luxwort-1948-11-25-a | LU | BNL | luxwort | newspaper |
| FZG-1950-06-17-a-i0045 | in_cpy | ar | print | [REDACTED] | [] | [{'uid': '2-50-Konrad_Adenauer', 'count': 1}, ... | [] | [] | [{'uid': 'tm-de-all-v2.0_tp86_de', 'relevance'... | 1200 | 1 | de | True | 1950-06-17T00:00:00+00:00 | FZG-1950-06-17-a | CH | SNL | FZG | newspaper |
| JDG-1954-11-03-a-i0032 | in_cpy | ar | print | [REDACTED] | [{'uid': '2-54-Moscou', 'count': 1}, {'uid': '... | [{'uid': '2-50-Anthony_Eden', 'count': 1}, {'u... | [{'uid': '2-53-États-Unis', 'count': 1}, {'uid... | [] | [{'uid': 'tm-fr-all-v2.0_tp29_fr', 'relevance'... | 717 | 1 | fr | False | 1954-11-03T00:00:00+00:00 | JDG-1954-11-03-a | CH | SNL | JDG | newspaper |


Below, we search the Impresso data for the term "European Union" again.
Then we use the `result` variable to print the transcripts of the first three articles returned by the query.

The `pydantic` property is a [Pydantic](https://docs.pydantic.dev/latest/) model representing the response of the Impresso API. It provides a way to ensure that the data conforms to specified types and constraints, making it easier to work with structured data in a reliable and consistent manner.
We use the `data` property of this model to iterate over the current page of results and print the transcript of each of the first three articles.

In [None]:
result = impresso.search.find(
    term="European Union",
    order_by="date",
)
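# Each item on the current page is a Pydantic model; print the first three transcripts.
# Items without an available transcript print as None.
for article in result.pydantic.data[:3]:
    print(article.transcript)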
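Transcripts can be long, and in-copyright items may come back without one (the Pydantic output below shows `transcript=None` for the first hit). A minimal sketch of a helper that turns the first few items into short excerpts and substitutes a placeholder for missing transcripts could look like this. `Item`, `excerpts`, and the 80-character width are hypothetical illustrations, not part of the `impresso` library.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Item:
    """Hypothetical stand-in for a search hit, with only the fields used here."""

    uid: str
    transcript: Optional[str]


def excerpts(items: Sequence, n: int = 3, width: int = 80) -> List[Tuple[str, str]]:
    """Return up to `n` (uid, excerpt) pairs, truncating transcripts to `width` characters."""
    pairs = []
    for item in items[:n]:
        text = item.transcript
        if text is None:
            # No transcript available (e.g. some in-copyright items).
            excerpt = "[no transcript available]"
        elif len(text) > width:
            # Cut long transcripts and mark the truncation.
            excerpt = text[:width].rstrip() + "..."
        else:
            excerpt = text
        pairs.append((item.uid, excerpt))
    return pairs


# Hypothetical sample data; only the first three items are used.
sample = [Item("a", None), Item("b", "x" * 100), Item("c", "short text"), Item("d", "not shown")]
for uid, text in excerpts(sample):
    print(uid, text)
```

Since each item in `result.pydantic.data` exposes `uid` and `transcript` attributes, a call like `excerpts(result.pydantic.data)` should work on the notebook's own results as well.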

The result object also has several useful properties that report the total number of results found, the number of items in the current page (`size`), and the position of its first item (`offset`).

In [4]:
print("%i results were found for this term. The current result object contains %i items starting from the item number %i" % (result.total, result.size, result.offset))

91 results were found for this term. The current result object contains 91 items starting from the item number 0
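These three numbers are related by simple arithmetic: a page starts at `offset`, holds `size` items, and more results remain while `offset + size < total`. The hypothetical helpers below sketch only that arithmetic. This notebook does not show how to request a page at a given offset, so fetching is left out, and the page size of 20 in the example is an arbitrary assumption.

```python
from typing import List


def has_more(total: int, size: int, offset: int) -> bool:
    """True if results remain after the page that starts at `offset` and holds `size` items."""
    return offset + size < total


def page_offsets(total: int, page_size: int) -> List[int]:
    """Offsets at which consecutive pages of `page_size` items start, covering `total` results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return list(range(0, total, page_size))


# Values printed above: total=91, size=91, offset=0, so this page holds every result.
print(has_more(total=91, size=91, offset=0))  # False
# With a hypothetical page size of 20, the same 91 results would span five pages.
print(page_offsets(91, 20))  # [0, 20, 40, 60, 80]
```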


### Pydantic
The full response from the Impresso API as a Pydantic model.

In [5]:
result.pydantic

SearchResponseSchema(data=[ContentItem(uid='luxwort-1948-11-25-a-i0033', copyrightStatus='in_cpy', type='ar', sourceMedium='print', title='[REDACTED]', transcript=None, locationEntities=[], personEntities=[NamedEntity(uid='2-50-Pius_XII.', count=1.0)], organisationEntities=[], newsAgenciesEntities=[], topics=[TopicMention(uid='tm-de-all-v2.0_tp25_de', relevance=0.202), TopicMention(uid='tm-de-all-v2.0_tp52_de', relevance=0.16), TopicMention(uid='tm-de-all-v2.0_tp86_de', relevance=0.157), TopicMention(uid='tm-de-all-v2.0_tp14_de', relevance=0.112), TopicMention(uid='tm-de-all-v2.0_tp77_de', relevance=0.112), TopicMention(uid='tm-de-all-v2.0_tp95_de', relevance=0.096), TopicMention(uid='tm-de-all-v2.0_tp24_de', relevance=0.057)], transcriptLength=733.0, totalPages=1.0, languageCode='de', isOnFrontPage=False, publicationDate=datetime.datetime(1948, 11, 25, 0, 0, tzinfo=TzInfo(UTC)), issueUid='luxwort-1948-11-25-a', countryCode='LU', providerCode='BNL', mediaUid='luxwort', mediaType='newsp
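What Pydantic contributes here is validation and type coercion: raw JSON values are checked against declared field types and converted where that is safe. The sketch below illustrates this with a hypothetical, trimmed-down model whose field names are copied from the response above. It is not the `impresso` library's own schema, and it assumes Pydantic v2 (for `model_validate`), which the `TzInfo(UTC)` in the output suggests is in use.

```python
from datetime import datetime, timezone
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class TopicMentionSketch(BaseModel):
    uid: str
    relevance: float


class ContentItemSketch(BaseModel):
    """Hypothetical, trimmed-down model; field names copied from the response above."""

    uid: str
    transcript: Optional[str] = None
    transcriptLength: float
    languageCode: str
    publicationDate: datetime
    topics: List[TopicMentionSketch] = []


raw = {
    "uid": "luxwort-1948-11-25-a-i0033",
    "transcriptLength": "733",  # numeric string, coerced to the float 733.0
    "languageCode": "de",
    "publicationDate": "1948-11-25T00:00:00+00:00",  # ISO 8601 string, parsed to an aware datetime
    "topics": [{"uid": "tm-de-all-v2.0_tp25_de", "relevance": 0.202}],
}

item = ContentItemSketch.model_validate(raw)
print(item.transcriptLength)  # 733.0
print(item.publicationDate == datetime(1948, 11, 25, tzinfo=timezone.utc))  # True

try:
    ContentItemSketch.model_validate({**raw, "transcriptLength": "not a number"})
except ValidationError as err:
    # The invalid field is rejected with an explanation instead of being passed through.
    print(err.error_count())  # 1
```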

### Pandas
We can also get the search results as a [Pandas](https://pandas.pydata.org/) DataFrame.
This makes it easy to filter, group, and analyze the results with pandas.

In [6]:
df = result.df
df.head(2)

| uid | copyrightStatus | type | sourceMedium | title | locationEntities | personEntities | organisationEntities | newsAgenciesEntities | topics | transcriptLength | totalPages | languageCode | isOnFrontPage | publicationDate | issueUid | countryCode | providerCode | mediaUid | mediaType |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| luxwort-1948-11-25-a-i0033 | in_cpy | ar | print | [REDACTED] | [] | [{'uid': '2-50-Pius_XII.', 'count': 1}] | [] | [] | [{'uid': 'tm-de-all-v2.0_tp25_de', 'relevance'... | 733 | 1 | de | False | 1948-11-25T00:00:00+00:00 | luxwort-1948-11-25-a | LU | BNL | luxwort | newspaper |
| FZG-1950-06-17-a-i0045 | in_cpy | ar | print | [REDACTED] | [] | [{'uid': '2-50-Konrad_Adenauer', 'count': 1}, ... | [] | [] | [{'uid': 'tm-de-all-v2.0_tp86_de', 'relevance'... | 1200 | 1 | de | True | 1950-06-17T00:00:00+00:00 | FZG-1950-06-17-a | CH | SNL | FZG | newspaper |
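As a sketch of the kind of manipulation this enables, the snippet below builds a small, hypothetical stand-in for `result.df` from the three rows shown in the first search overview, keeping only a few columns. It then counts hits per country, filters front-page items, and averages transcript length per language. With the real DataFrame, the same `groupby` and boolean-indexing calls should apply, assuming the column names shown above. `pd.to_datetime` is applied in case `publicationDate` arrives as strings.

```python
import pandas as pd

# Hypothetical stand-in for `result.df`, copied from the search overview above.
df = pd.DataFrame(
    {
        "uid": ["luxwort-1948-11-25-a-i0033", "FZG-1950-06-17-a-i0045", "JDG-1954-11-03-a-i0032"],
        "countryCode": ["LU", "CH", "CH"],
        "languageCode": ["de", "de", "fr"],
        "isOnFrontPage": [False, True, False],
        "transcriptLength": [733, 1200, 717],
        "publicationDate": [
            "1948-11-25T00:00:00+00:00",
            "1950-06-17T00:00:00+00:00",
            "1954-11-03T00:00:00+00:00",
        ],
    }
).set_index("uid")

# Parse ISO 8601 strings into timezone-aware timestamps.
df["publicationDate"] = pd.to_datetime(df["publicationDate"])

# Number of hits per country.
per_country = df.groupby("countryCode").size()

# Only the items that appeared on a front page.
front_page = df[df["isOnFrontPage"]]

# Mean transcript length per language.
mean_len = df.groupby("languageCode")["transcriptLength"].mean()

print(per_country)
print(front_page.index.tolist())
print(mean_len)
print(df["publicationDate"].dt.year.tolist())
```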


## Get a content item

Below, we use the `content_items` resource to retrieve an article by its ID:

In [7]:
content_item = impresso.content_items.get("NZZ-1794-08-09-a-i0002")
content_item

|  | uid | copyrightStatus | type | sourceMedium | transcript | locationEntities | personEntities | organisationEntities | newsAgenciesEntities | topics | transcriptLength | totalPages | languageCode | isOnFrontPage | publicationDate | issueUid | countryCode | providerCode | mediaUid | mediaType |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NZZ-1794-08-09-a-i0002 | in_cpy | page | print | [REDACTED] | [{'uid': '2-54-Paris', 'count': 1}, {'uid': '2... | [{'uid': '2-50-François_Hanriot', 'count': 1}] | [] | [] | [{'uid': 'tm-de-all-v2.0_tp47_de', 'relevance'... | 988 | 1 | de | False | 1794-08-09T00:00:00+00:00 | NZZ-1794-08-09-a | CH | NZZ | NZZ | newspaper |
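The identifiers follow a recognisable pattern. In the outputs of this notebook, each content item uid is its `issueUid` plus an item suffix (such as `-i0002`), and the issue uid is the `mediaUid`, then the publication date, then a trailing letter (always `a` here). The hypothetical parser below splits a uid on that basis. The pattern is inferred only from the examples in this notebook, not from an official specification, so treat it as an assumption.

```python
import re
from datetime import date
from typing import NamedTuple

# Assumed layout, inferred from uids such as "NZZ-1794-08-09-a-i0002":
# <mediaUid>-<YYYY>-<MM>-<DD>-<letter>-i<item number>. Not an official specification.
UID_PATTERN = re.compile(
    r"^(?P<media>.+)-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})-(?P<letter>[a-z]+)-i(?P<item>\d+)$"
)


class ParsedUid(NamedTuple):
    media_uid: str
    issue_uid: str
    publication_date: date
    item_number: int


def parse_content_item_uid(uid: str) -> ParsedUid:
    """Split a content item uid into its (assumed) components."""
    match = UID_PATTERN.match(uid)
    if match is None:
        raise ValueError(f"unrecognised content item uid: {uid!r}")
    # The issue uid is everything before the "-i<item number>" suffix.
    issue_uid = f"{match['media']}-{match['year']}-{match['month']}-{match['day']}-{match['letter']}"
    return ParsedUid(
        media_uid=match["media"],
        issue_uid=issue_uid,
        publication_date=date(int(match["year"]), int(match["month"]), int(match["day"])),
        item_number=int(match["item"]),
    )


print(parse_content_item_uid("NZZ-1794-08-09-a-i0002"))
```

The parsed `issue_uid` and `media_uid` agree with the `issueUid` and `mediaUid` columns shown above, which is a quick way to sanity-check the assumed pattern against further results.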


We can also get it as a Pydantic model or as a DataFrame.

In [8]:
content_item.pydantic.transcript

'[REDACTED]'

In [9]:
content_item.df[['uid', 'countryCode', 'languageCode']]


|  | uid | countryCode | languageCode |
|---|---|---|---|
| 0 | NZZ-1794-08-09-a-i0002 | CH | de |


## Search facets

In this cell, we request the `country` facet for the term "fromage" across the Impresso collection. Facets give a quick breakdown of the matching search results, here by country.


In [10]:
country_facet = impresso.search.facet("country", term="fromage")
country_facet.df

| value | count |
|---|---|
| CH | 197704 |
| FR | 33993 |
| LU | 7474 |
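Facet counts are often easier to compare as proportions. The sketch below converts the counts shown above into shares. These shares are relative only to the buckets returned, which need not cover every matching item. The counts are copied from the output. In the notebook, `country_facet.df["count"].to_dict()` should produce the same mapping, assuming the `value` index and `count` column shown above. `facet_shares` is a hypothetical helper.

```python
from typing import Dict


def facet_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each bucket relative to the sum of the buckets listed."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}


# Counts copied from the country facet output above.
country_counts = {"CH": 197704, "FR": 33993, "LU": 7474}

# Print buckets from largest to smallest share.
for country, share in sorted(facet_shares(country_counts).items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {share:.1%}")
# CH: 82.7%
# FR: 14.2%
# LU: 3.1%
```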
