# The EP full-text library - Lesson 3
This lesson dives into more advanced concepts of EPAB, the TIP implementation of the EP full-text library. We will introduce iterative result processing and raw SQL queries. As in the first notebook, we start by creating an instance of the EPAB client. Remember that by default the client connects to a test database; for this lesson we will work with the full database.

In [13]:
# Importing the EPAB client
from epo.tipdata.epab import EPABClient

# creating an instance of the EPAB client with the PROD database
epab = EPABClient(env='PROD')


## Iterative result processing
When we work with the production database, some queries are likely to retrieve a very large number of publications. We have seen the `get_results()` method for getting data from the result of a query. This method fetches the data for all the publications matching the query in one pass.

In [16]:
# We query for publications within the wireless communications field
q = epab.query_ipc("H04W%")
# Let's see the size of the results object
print('Our query results contains ', q)

Our query results contains  183817 publications


### Getting all the results in one go
We can decide to get all the results in one go, using the `get_results()` method that we already know. With a result set of this size we can run into memory problems or otherwise overload our workspace.

In [21]:
all_results = q.get_results('title.en')

# Printing how many results were downloaded in one go
print('The amount of results downloaded in one go is', len(all_results))

The amount of results downloaded in one go is 183818
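
To get a feeling for the memory cost before downloading everything, we can measure a small sample and extrapolate. The sketch below is only a rough illustration in plain Python: the helper `estimate_total_bytes` and the sample titles are hypothetical, they are not part of the EPAB library, and the estimate ignores the overhead of whatever container holds the results.

```python
import sys

def estimate_total_bytes(sample_titles, total_rows):
    """Extrapolate the in-memory size of `total_rows` titles from a small sample."""
    if not sample_titles:
        return 0
    # Average size of one Python string in the sample, in bytes
    sample_bytes = sum(sys.getsizeof(title) for title in sample_titles)
    return sample_bytes / len(sample_titles) * total_rows

# Hypothetical sample; in practice it could come from a query fetched with a small limit
sample = [
    "Method for handover in a wireless network",
    "Base station and terminal device",
]
estimate = estimate_total_bytes(sample, 183817)
print(f"Roughly {estimate / 1e6:.1f} MB for the titles alone")
```

Titles are short; the same calculation with full descriptions or claims gives a much larger figure, which is why fetching everything at once becomes risky.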




In [None]:
# Searching for publications with the word "gift" in the English title
q = epab.query_title('gift', language="EN")
q.get_results("title", limit=5)

In [None]:
# A second query: publications with the word "Gift" (German for poison) in the German title
r = epab.query_title('gift', language="DE")
print('Publications with the word Gift in the German title:', r)

# Combining the two queries
s = q & r

print('Poisonous gifts found:', s)

In [None]:
# Searching for publications with the word "gift" in the English title, ignoring case
q = epab.query_title('gift', language="EN")
print('Publications with the word gift in any combination of lower and upper case:', q)

q.get_results('title', limit=5)




In [None]:
# Searching for publications with the word "gift" in the English title, case-sensitive (lowercase only)
r = epab.query_title('gift', language="EN", ignore_case=False)
print('Publications with the word gift in lowercase:', r)

r.get_results('title', limit=5)

In [None]:
# Searching a set of possible terms (e.g. synonyms)
q = epab.query_title(search_terms="covid, corona virus, coronavirus", language="EN")
print(q)
q.get_results("title.en", output_type="datagrid", limit=10)

#### Multiple search terms combined with AND
We can also query with several strings, and specify that they all should be present, with the `match_all` parameter.

In [None]:
# Requiring all the search terms to appear in the same title
q = epab.query_title(search_terms="coronavirus, vaccine", match_all=True, language="EN")
print(q)
q.get_results("title.en", limit=5)

#### Multiple search terms with advanced combinations
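What if you want to mix `AND` and `OR` when combining terms? This is where combining queries comes in handy: each query matches any of its own terms (`OR`), and the `&` operator keeps only the publications found by both queries (`AND`).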

In [None]:
# Searching for synonyms of Covid
q = epab.query_title(search_terms="covid, corona virus, coronavirus", language="EN")

# Searching for synonyms of vaccine
r = epab.query_title(search_terms="vaccine%, immun%", language="EN")

# Keeping only the publications found by both queries
s = q & r

s.get_results('title.en', limit=10)
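
The logic of this combination can be reproduced with plain Python sets, which may help to see why it amounts to "(any disease term) AND (any remedy term)". This is only a sketch: the mini-corpus and the helper `match_any` are hypothetical stand-ins, not EPAB functions, and the matching is a simple case-insensitive substring test rather than the library's own search.

```python
# Hypothetical mini-corpus standing in for the database: publication id -> English title
titles = {
    1: "Coronavirus vaccine composition",
    2: "Covid test kit",
    3: "Vaccine adjuvant for influenza",
    4: "Immunogenic composition against coronavirus",
}

def match_any(corpus, terms):
    """Return the ids whose title contains at least one of the terms (OR)."""
    return {
        pub_id
        for pub_id, title in corpus.items()
        if any(term.lower() in title.lower() for term in terms)
    }

# OR within each group of synonyms
disease = match_any(titles, ["covid", "corona virus", "coronavirus"])
remedy = match_any(titles, ["vaccine", "immun"])

# AND across the two groups: set intersection
both = disease & remedy

print(sorted(both))  # prints: [1, 4]
```

Publication 2 mentions the disease but no remedy, and publication 3 mentions a vaccine but not the disease, so only 1 and 4 survive the intersection.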

In [None]:
# Searching the abstracts for publications that contain all the terms, ignoring case
q = epab.query_abstract(search_terms="handover, base station", match_all=True, ignore_case=True)
print(q)
q.get_results("abstract", output_type="list", limit=2)

### Getting the results in batches
For queries of this size, particularly when you want to get more data than just the title, such as the full text of the description, it is a good idea to use the `iterator()` method. In the example below we will get the results in batches of 5000 documents. 
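
To make the pattern concrete before touching the database, here is a minimal sketch of batch processing in plain Python. The generator `in_batches` and the fake titles are hypothetical stand-ins, not part of the EPAB library; the point is only that each batch is processed and then discarded, so memory use stays bounded by the batch size.

```python
from itertools import islice

def in_batches(items, batch_size):
    """Yield successive lists of at most `batch_size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for a large result set: 12 fake titles produced lazily
fake_titles = (f"Title {i}" for i in range(12))

fetched = 0
batch_sizes = []
for batch in in_batches(fake_titles, batch_size=5):
    # Only running totals are kept; the batch itself is dropped after each iteration
    fetched += len(batch)
    batch_sizes.append(len(batch))

print(fetched, batch_sizes)  # prints: 12 [5, 5, 2]
```

In this sketch the last batch is smaller than the others whenever the total is not a multiple of the batch size, so code that processes batches should not assume a fixed length. The EPAB example that follows has the same shape.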

In [22]:
fetched = 0
# We call the iterator method and ask for batches of 5000 results
for batch in q.iterator("title.en", batch_size=5000):
    #the size of the batch, for didactic purposes
    batch_size = len(batch)

    #we add the fetched batch to the total amount of fetched documents
    fetched += batch_size
    
    #displaying the batch fetching operation
    print(f"In this iteration I have fetched {batch_size} publications. Total fetched: {fetched}")


In this iteration I have fetched 5000 publications. Total fetched: 5000
In this iteration I have fetched 5000 publications. Total fetched: 10000
In this iteration I have fetched 5000 publications. Total fetched: 15000
In this iteration I have fetched 5000 publications. Total fetched: 20000
In this iteration I have fetched 5000 publications. Total fetched: 25000
In this iteration I have fetched 5000 publications. Total fetched: 30000
In this iteration I have fetched 5000 publications. Total fetched: 35000
In this iteration I have fetched 5000 publications. Total fetched: 40000
In this iteration I have fetched 5000 publications. Total fetched: 45000
In this iteration I have fetched 5000 publications. Total fetched: 50000
In this iteration I have fetched 5000 publications. Total fetched: 55000
In this iteration I have fetched 5000 publications. Total fetched: 60000
In this iteration I have fetched 5000 publications. Total fetched: 65000
In this iteration I have fetched 5000 publications. 