# The EP full-text library - Lesson 2
This notebook expands on lesson 1 to dive into more advanced concepts of EPAB, the implementation in TIP of the EP full-text library. We will introduce querying by full text fields, divisionals and parents, and search report fields. As we did in the first notebook, we first create an instance of the EPAB library.

In [2]:
# Importing the EPAB client
from epo.tipdata.epab import EPABClient

# creating an instance of the EPAB client
epab = EPABClient()


## Querying by full text fields
One of the most powerful features of the EPAB library is that, much like the [EP full-text search](https://www.epo.org/en/searching-for-patents/technical/ep-full-text), it gives you access to the description, claims, title and abstract of the publications within the EPAB database.

### Querying by the title
You can search for applications containing one or more terms in the title. When performing a first search for patent publications on a given technological concept, searching in the title is generally a good approach: a publication that contains the search term in its title is likely to be a good match for your query. If you followed lesson 1, you can probably guess the name of the search method: `query_title`.

In [3]:
# querying by the title of the publication with the word 'covid'
q = epab.query_title('covid')
q.get_results("title", limit=5, output_type='list')


[{'title': {'de': 'CANNABIDIOL ZUR ERHÖHUNG DER IMPFSTOFFVERMITTELTEN IMMUNITÄT UND PROPHYLAXE VON COVID-19',
   'en': 'CANNABIDIOL FOR AUGMENTING VACCINE MEDIATED IMMUNITY AND PROPHYLAXIS OF COVID-19',
   'fr': "CANNABIDIOL DESTINÉ À AUGMENTER L'IMMUNITÉ MÉDIÉE PAR UN VACCIN ET LA PROPHYLAXIE DE LA COVID-19"}},
 {'title': {'de': 'NEUE ANWENDUNG EINER IMMUNOGENEN ODER IMPFSTOFFZUSAMMENSETZUNG GEGEN COVID-19',
   'en': 'NEW APPLICATION OF AN IMMUNOGENIC OR VACCINE COMPOSITION AGAINST COVID-19',
   'fr': "NOUVELLE APPLICATION D'UNE COMPOSITION IMMUNOGENE OU VACCINALE CONTRE LA COVID-19"}},
 {'title': {'de': 'IMMUNOGENE ZUSAMMENSETZUNGEN UND IMPFSTOFFE MIT MASERNVEKTORISIERTEM COVID-19',
   'en': 'MEASLES-VECTORED COVID-19 IMMUNOGENIC COMPOSITIONS AND VACCINES',
   'fr': "COMPOSITIONS ET VACCINS IMMUNOGÉNIQUES CONTRE LA COVID-19 À BASE D'UN VECTEUR DE VIRUS DE LA ROUGEOLE"}},
 {'title': {'de': 'MODULABTASTVORRICHTUNG ZUR ÜBERPRÜFUNG DES COVID-19-STATUS',
   'en': 'MODULE SCANNER DEVICE FO

#### Understanding fulltext languages
You can see in the result that the title field contains a dictionary with three titles. It is very important, when working with fulltext, to take into consideration that the EPO publishes the fulltext fields in the three official languages: German, English, and French.
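As a small illustration of that structure, the sketch below picks one language out of each title dictionary. The two records are copied from the output above; `titles_in` is a hypothetical helper written for this example, not part of the EPAB library.

```python
# Two records copied from the output above (the shape returned with output_type='list').
results = [
    {"title": {
        "de": "NEUE ANWENDUNG EINER IMMUNOGENEN ODER IMPFSTOFFZUSAMMENSETZUNG GEGEN COVID-19",
        "en": "NEW APPLICATION OF AN IMMUNOGENIC OR VACCINE COMPOSITION AGAINST COVID-19",
        "fr": "NOUVELLE APPLICATION D'UNE COMPOSITION IMMUNOGENE OU VACCINALE CONTRE LA COVID-19",
    }},
    {"title": {
        "de": "IMMUNOGENE ZUSAMMENSETZUNGEN UND IMPFSTOFFE MIT MASERNVEKTORISIERTEM COVID-19",
        "en": "MEASLES-VECTORED COVID-19 IMMUNOGENIC COMPOSITIONS AND VACCINES",
        "fr": "COMPOSITIONS ET VACCINS IMMUNOGÉNIQUES CONTRE LA COVID-19 À BASE D'UN VECTEUR DE VIRUS DE LA ROUGEOLE",
    }},
]


def titles_in(records, language):
    """Hypothetical helper: return the title in one language for each record."""
    # .get() returns None if a record has no title in the requested language
    return [record["title"].get(language) for record in records]


english_titles = titles_in(results, "en")
for title in english_titles:
    print(title)
```

The lowercase keys `de`, `en` and `fr` are taken from the output shown above.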

When you search for a term in a fulltext field, by default you search in all three languages. This can be problematic, because the same word can mean different things in different languages. A good example is the word "Gift".

In English, "gift" refers to a present or something given willingly to someone without payment. In German, however, "Gift" means "poison". You can restrict the search with the `language` parameter by specifying one or more of the official languages with the strings `EN`, `DE` and `FR`.

In [4]:
# searching for publications with the word GIFT only in the English title
q = epab.query_title('gift', language="EN")
q.get_results("title", limit=5)

Unnamed: 0,title.de,title.en,title.fr
0,Faltbarer Geschenkkorb,Foldable gift basket,Panier pliable pour cadeaux
1,Synthetisches Geschenkpapier,Synthetic gift paper,Papier synthétique pour emballages cadeaux
2,KOMBINATION VON PHOTORAHMEN UND GLÜCKWUNSCHKARTE,PHOTOFRAME AND GIFT CARD COMBINATION,ENSEMBLE CADRE POUR PHOTOGRAPHIE ET CARTE
3,"SYSTEM ZUM VERPACKEN, VERARBEITEN UND AKTIVIER...","SYSTEM FOR PACKAGING, PROCESSING, AND ACTIVATI...","SYSTÈME D'EMBALLAGE, DE TRAITEMENT ET D'ACTIVA..."
4,VERFAHREN UND SYSTEM UM ELEKTRONISCH EIN ONLIN...,METHODS AND SYSTEMS FOR ELECTRONICALLY ACCEPTI...,PROCEDES ET SYSTEMES POUR ACCEPTER ET ECHANGER...


#### Refresher of query combination
We saw in lesson 1 that we can combine queries to create more complex queries. Let's see if there are any publications that contain the word gift in both the German and English titles. 

In [8]:
# a second query: publications with the word "Gift" (poison) in the German title
r = epab.query_title('gift', language="DE")
print('publications with the word Gift in German', r)

# combining the two queries
s = q & r

print('Poisonous gifts found:', s)

publications with the word Gift in German 1520 publications
Poisonous gifts found: 0 publications
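One way to picture what `&` does here is as a set intersection over publications. The sketch below uses plain Python sets with made-up publication identifiers. It is only an analogy under that assumption, not how EPAB evaluates queries.

```python
# Hypothetical publication identifiers, for illustration only.
english_gift = {"EP0000001", "EP0000002", "EP0000003"}  # 'gift' in the English title
german_gift = {"EP0000004", "EP0000005"}                # 'Gift' in the German title

# Intersection: publications present in BOTH sets.
both = english_gift & german_gift

print(len(both), "publications in both sets")
```

Because the two sets share no identifier, the intersection is empty, which mirrors the `0 publications` result above.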


### Case sensitivity
You have seen that we query in lowercase while many titles are displayed in uppercase. It will come as no surprise that the search for full text terms is case insensitive by default. This can be overridden with `ignore_case=False`. Below we perform two queries, with and without this parameter, to see the different results we get.

In [26]:
# searching for publications with the word GIFT only in the English title, ignoring case (the default)
q = epab.query_title('gift', language="EN")
print('Publications with the word gift in any combination of lower and upper case', q)

q.get_results('title', limit=5)




Publications with the word gift in any combination of lower and upper case 171 publications


Unnamed: 0,title.de,title.en,title.fr
0,SYSTEME UND VERFAHREN ZUR AUSWAHL EINER DIGITA...,SYSTEMS AND METHODS FOR DIGITAL GIFT CARD SELE...,SYSTÈMES ET PROCÉDÉS DE SÉLECTION DE CARTE-CAD...
1,GESCHENKPACKUNG MIT SCHALTUNGSBETÄTIGUNGSVERMÖGEN,GIFT PACKAGE HAVING CIRCUIT ACTUATING CAPABILITY,EMBALLAGE CADEAU AYANT UNE CAPACITÉ D'ACTIONNE...
2,GESCHENKKARTONBEHÄLTER,GIFT BOX CONTAINER,PAQUET-CADEAU
3,Dekorative Geschenkverpackung,Decorative gift package,Emballage décoratif pour cadeau
4,"Kombination von Geschenk und Verpackung, insbe...",A combination comprising a gift and its casing...,"Combinaison d'un cadeau et de son emballage, e..."


In [27]:
# searching for publications with the word gift only in the English title, matching the case exactly
r = epab.query_title('gift', language="EN", ignore_case=False)
print('Publications with the word gift in lowercase', r)

r.get_results('title', limit=5)

Publications with the word gift in lowercase 46 publications


Unnamed: 0,title.de,title.en,title.fr
0,Behälter für Geschenke,A container for gifts,Récipient pour cadeaux
1,Geschenkschachtel,Box for gift objects,Boîte à cadeaux
2,Netzsystem und Verfahren zur Bereitstellung vo...,Web system and method of providing personal gifts,Système Web et procédé de fourniture de cadeau...
3,Synthetisches Geschenkpapier,Synthetic gift paper,Papier synthétique pour emballages cadeaux
4,Geschenkschachtel,Box for gift objects,Boîte à cadeaux
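The difference between the two result sets can be reproduced locally on a handful of titles taken from the outputs above. This is a minimal sketch that assumes a plain substring check. The `contains` helper is hypothetical and does not claim to reproduce how EPAB matches terms.

```python
# English titles copied from the outputs above.
titles = [
    "GIFT BOX CONTAINER",
    "Decorative gift package",
    "A container for gifts",
    "PHOTOFRAME AND GIFT CARD COMBINATION",
]


def contains(title, term, ignore_case=True):
    """Hypothetical helper: substring check with optional case folding."""
    if ignore_case:
        return term.lower() in title.lower()
    return term in title


insensitive = [t for t in titles if contains(t, "gift")]
sensitive = [t for t in titles if contains(t, "gift", ignore_case=False)]

print(len(insensitive), "titles match when case is ignored")
print(len(sensitive), "titles match when case must agree")
```

With case ignored all four titles match. With exact case only the two titles containing a lowercase "gift" remain.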


### Multiple search terms
We can enter multiple search terms in the full text queries we run on EPAB. When we enter multiple terms, separated by commas, they are combined with `OR` by default.

In [33]:
# Searching a set of possible terms (e.g. synonyms)
q = epab.query_title(search_terms="covid, corona virus, coronavirus", language="EN")
print (q)
q.get_results("title.en", output_type="datagrid", limit=10)

973 publications


DataGrid(auto_fit_columns=True, auto_fit_params={'area': 'all', 'padding': 30, 'numCols': None}, corner_render…

#### Multiple search terms combined with AND
We can also query with several terms and require that all of them be present by setting the `match_all` parameter to `True`.

In [35]:
# requiring all the terms to appear in the same title
q = epab.query_title(search_terms="coronavirus, vaccine", match_all=True, language="EN")
print(q)
q.get_results("title.en", limit=5)

139 publications


Unnamed: 0,title.en
0,A DNA PLASMID SARS-CORONAVIRUS-2/COVID-19 VACCINE
1,CANINE CORONAVIRUS VACCINE
2,"Coronavirus, nucleic acid, protein and methods..."
3,CANINE CORONAVIRUS VACCINE
4,Canine coronavirus vaccine from feline enteric...
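The `OR` and `AND` behaviours correspond to Python's `any` and `all`. The sketch below applies both to three titles taken from earlier outputs. The `matches` helper and its substring check are assumptions made for illustration, not the EPAB implementation.

```python
# Titles copied from outputs earlier in this notebook.
titles = [
    "CANINE CORONAVIRUS VACCINE",
    "MEASLES-VECTORED COVID-19 IMMUNOGENIC COMPOSITIONS AND VACCINES",
    "Foldable gift basket",
]
terms = ["coronavirus", "vaccine"]


def matches(title, terms, match_all=False):
    """Hypothetical helper: case-insensitive substring test for several terms."""
    lowered = title.lower()
    hits = [term.lower() in lowered for term in terms]
    # all() models AND (every term present), any() models OR (at least one)
    return all(hits) if match_all else any(hits)


match_any = [t for t in titles if matches(t, terms)]
match_every = [t for t in titles if matches(t, terms, match_all=True)]

print("OR :", match_any)
print("AND:", match_every)
```

Only the first title contains both terms, so it is the only one kept when all terms are required.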


#### Multiple search terms with advanced combinations
What if you want to mix `AND` and `OR` when combining terms? This is where combining queries comes in handy: build one query per group of alternative terms, then intersect the queries with `&`.

In [39]:
# searching for synonyms of Covid
q = epab.query_title(search_terms="covid, corona virus, coronavirus", language="EN")

# searching for synonyms of vaccine
r = epab.query_title(search_terms="vaccine%, immun%", language="EN")

s = q & r

s.get_results('title.en', limit = 10)

Unnamed: 0,title.en
0,VACCINE COMPOSITION AGAINST CORONAVIRUS
1,VACCINE AGAINST HUMAN-PATHOGENIC CORONAVIRUSES
2,VACCINE COMPOSITIONS FOR TREATING CORONAVIRUS ...
3,VACCINES AGAINST CORONAVIRUS AND METHODS OF USE
4,VACCINES AGAINST CORONAVIRUS AND METHODS OF USE
5,VACCINES AGAINST SARS-COV-2 AND OTHER CORONAVI...
6,VACCINE COMPOSITIONS FOR THE TREATMENT OF CORO...
7,Vaccine against severe accute respiratory synd...
8,VACCINE WITH IMPROVED IMMUNOGENICITY AGAINST M...
9,VACCINE COMPOSITION FOR PREVENTING SEVERE ACUT...


### Querying abstract, claims and description
You can query other parts of the fulltext, such as the abstract, the claims and the description, in the same way. Only the method name changes to reflect the part of the fulltext being searched, as in `query_abstract` below.

In [44]:
# abstract search
q = epab.query_abstract(search_terms="handover, base station", match_all=True, ignore_case=True)
print(q)
q.get_results("abstract", output_type="list", limit=2)

1410 publications


[{'abstract': {'language': 'EN',
   'text': '<p id="pa01" num="0001">A radio base station according to the present invention comprising : a mobile communication system, a relay node and a radio base station are connected via a radio bearer, a mobile station is configured to conduct a handover process between the state in which a radio bearer is set with the relay node in order to communicate via the relay node and the radio base station, and the state in which a radio bearer is set with the radio base station in order to communicate via the radio base station, and the mobile station is configured such that during the handover process, control signals involved in the handover process are sent and received via a radio bearer between the relay node and the radio base station.<img id="iaf01" file="imgaf001.tif" wi="119" he="83" img-content="drawing" img-format="tif"/></p>'}},
 {'abstract': {'language': 'EN',
   'text': '<p id="pa01" num="0001">A radio base station according to the present 
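As the output shows, the abstract text comes back with markup such as `<p>` and `<img/>` tags. If you only need the plain text, one possible approach is to strip the tags with Python's standard `html.parser` module. The sketch below uses a shortened, made-up sample in the same shape as the output above. `strip_markup` is a hypothetical helper, not part of EPAB.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of a markup fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(fragment):
    """Hypothetical helper: drop tags and normalise whitespace."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Shortened sample shaped like the 'text' value of an abstract record.
sample = (
    '<p id="pa01" num="0001">A mobile station is configured to conduct '
    'a handover process.<img id="iaf01" file="imgaf001.tif" wi="119" '
    'he="83" img-content="drawing" img-format="tif"/></p>'
)

plain = strip_markup(sample)
print(plain)
```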

## Retrieving statistics from a query
Sometimes you will want to get statistics over the results of a query before processing it further. The method `get_stats` returns a dataframe with statistics over one or more selected fields. When you run this method on a query object, you get the following information for the selected field(s):

- the `count` column reports the total number of occurrences of the corresponding field(s) value
- the `unique_publications` column reports the number of unique publications having that value
- the last two lines of the table are used to report the remainder and the total
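To make the difference between the first two columns concrete, here is a minimal local sketch. It assumes that `count` tallies every occurrence of a value, while `unique_publications` tallies each publication only once per value. The rows, identifiers and the `summarize` helper are all hypothetical, and this does not reproduce the actual `get_stats` output.

```python
from collections import defaultdict

# Hypothetical (publication, field value) pairs; a publication may repeat a value.
rows = [
    ("EP0000001", "A"),
    ("EP0000001", "A"),
    ("EP0000002", "A"),
    ("EP0000002", "B"),
    ("EP0000003", "B"),
]


def summarize(rows):
    """Hypothetical helper: occurrences and distinct publications per value."""
    counts = defaultdict(int)
    publications = defaultdict(set)
    for publication, value in rows:
        counts[value] += 1                    # every occurrence is counted
        publications[value].add(publication)  # each publication counted once
    return {
        value: {
            "count": counts[value],
            "unique_publications": len(publications[value]),
        }
        for value in counts
    }


stats = summarize(rows)
for value, summary in stats.items():
    print(value, summary)
```

In this sample, value `A` occurs three times but in only two distinct publications, so the two columns differ.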