# PATSTAT Register - Exercise 1
In this exercise we will gradually build an ORM query that returns all Swiss applicants related to EP publications from 2020 onwards.

## The publications table
First we need to learn about the table `reg102_pat_publn`. This table contains details about the European and international publications that are visible in the European Patent Register.


## Coverage of reg102_pat_publn
We will first run a simple query to see all the publications stored in table 102.

In [3]:
# Importing the patstat client
from epo.tipdata.patstat import PatstatClient

# Initialize the PATSTAT client
patstat = PatstatClient()

# Access ORM
db = patstat.orm()
# Importing tables as models
from epo.tipdata.patstat.database.models import REG102_PAT_PUBLN



In [8]:
q = db.query(
    REG102_PAT_PUBLN.publn_auth,
    REG102_PAT_PUBLN.publn_nr
)
    
# Creating a dataframe with the results
res = patstat.df(q)

res


Unnamed: 0,publn_auth,publn_nr
0,EP,1850675
1,EP,4033803
2,EP,1669758
3,EP,2319404
4,EP,0366576
...,...,...
12099429,WO,2005086064
12099430,WO,2004035241
12099431,WO,2005002335
12099432,WO,2023056298


### Aggregating the results
We will aggregate the results with pandas to see the number of publications per publishing authority.

In [10]:
# Aggregating the publication numbers per publication authority
aggregated_res = res.groupby('publn_auth').count().reset_index()

# Changing the header of the columns
aggregated_res.rename(columns={'publn_auth': 'Authority', 'publn_nr': 'Number of publications'}, inplace=True)

aggregated_res

Unnamed: 0,Authority,Number of publications
0,EP,7354241
1,WO,4745193
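
Downloading roughly 12 million rows only to count them is expensive. If the object returned by `patstat.orm()` behaves like a SQLAlchemy session (an assumption, suggested by the `query`, `filter` and `order_by` calls used in this exercise), the same count could be pushed into the database with `func.count` and `group_by`. The sketch below shows the pattern on a hypothetical `ToyPubln` model in an in-memory SQLite database, not on the real `REG102_PAT_PUBLN` model, and its rows are made up.

```python
from sqlalchemy import Column, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class ToyPubln(Base):
    """Hypothetical stand-in for REG102_PAT_PUBLN (not the real model)."""
    __tablename__ = "toy_pat_publn"
    row_id = Column(Integer, primary_key=True)
    publn_auth = Column(String)
    publn_nr = Column(String)


# In-memory database so the sketch runs without access to PATSTAT
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Made-up sample rows, for illustration only
    session.add_all([
        ToyPubln(publn_auth="EP", publn_nr="1000001"),
        ToyPubln(publn_auth="EP", publn_nr="1000002"),
        ToyPubln(publn_auth="WO", publn_nr="2020000001"),
    ])
    session.commit()

    # Count publication numbers per authority inside the database
    q = session.query(
        ToyPubln.publn_auth,
        func.count(ToyPubln.publn_nr).label("n_publications"),
    ).group_by(
        ToyPubln.publn_auth
    ).order_by(
        ToyPubln.publn_auth
    )
    counts = {auth: n for auth, n in q.all()}

print(counts)
```

Against PATSTAT, the equivalent query would presumably use `REG102_PAT_PUBLN` in place of `ToyPubln` and be passed to `patstat.df(q)` as in the cells above.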


### EP publications from 2020 onwards
We will use the filter functionality to keep only EP publications published since 2020, using these two filters:

- `publn_auth == 'EP'`: Only includes records where the publication authority is 'EP'.
- `publn_date > '2019-12-31'`: Only includes records where the publication date is after December 31, 2019.

The results are ordered by the `publn_date` column in ascending order.


In [12]:
q = db.query(
    REG102_PAT_PUBLN.publn_auth,
    REG102_PAT_PUBLN.publn_nr,
    REG102_PAT_PUBLN.publn_date
).filter(
    REG102_PAT_PUBLN.publn_auth == 'EP',
    REG102_PAT_PUBLN.publn_date > '2019-12-31'
).order_by(
    REG102_PAT_PUBLN.publn_date
)
# Creating a dataframe with the results
res = patstat.df(q)

res


Unnamed: 0,publn_auth,publn_nr,publn_date
0,EP,3586069,2020-01-01
1,EP,3585723,2020-01-01
2,EP,3585639,2020-01-01
3,EP,3588120,2020-01-01
4,EP,3587004,2020-01-01
...,...,...,...
1228177,EP,4246930,9999-12-31
1228178,EP,3625233,9999-12-31
1228179,EP,4011747,9999-12-31
1228180,EP,3828660,9999-12-31
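
The last rows of this output carry the date `9999-12-31`. In PATSTAT this value is generally used as a placeholder for a missing date, so these rows pass the `> '2019-12-31'` filter without necessarily having been published since 2020. The following pandas sketch sets them aside. It uses three rows copied from the output above and assumes that `publn_date` arrives either as `datetime.date` objects or as ISO strings, which is why the comparison goes through `astype(str)`.

```python
from datetime import date

import pandas as pd

# Small sample mirroring rows from the output above
sample = pd.DataFrame({
    "publn_auth": ["EP", "EP", "EP"],
    "publn_nr": ["3586069", "3585723", "4246930"],
    "publn_date": [date(2020, 1, 1), date(2020, 1, 1), date(9999, 12, 31)],
})

# Assumed to mean "date not available"
PLACEHOLDER_DATE = "9999-12-31"

# Comparing the string form works for date objects and ISO strings alike
has_real_date = sample["publn_date"].astype(str) != PLACEHOLDER_DATE
dated = sample[has_real_date].reset_index(drop=True)

print(dated)
```

The same condition could likely be added to the query itself as an extra filter, for example `REG102_PAT_PUBLN.publn_date < '9999-12-31'`, so that placeholder rows are never downloaded.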


## Introducing the reg107_parties table
The goal of this exercise is to find the patent publications that mention an applicant from Switzerland. For this we need the `reg107_parties` table in the PATSTAT Register database, which stores information about the parties involved in patent applications.



### Finding Swiss applicants
We will run a simple query on `reg107_parties` to find all the parties with a place of residence in Switzerland. There can be three types of parties:

* **Applicant ("A")**
* **Inventor ("I")**
* **Agent or representative ("R")**

We will filter so that we get only applicants.

In [10]:
from epo.tipdata.patstat.database.models import REG107_PARTIES

q = db.query(
    REG107_PARTIES.name
).filter(
    REG107_PARTIES.country == 'CH',
    REG107_PARTIES.type == 'A'
).order_by(
    REG107_PARTIES.name
)

# Creating a dataframe with the results
res = patstat.df(q)

res[0:10]


Unnamed: 0,name
0,' PLANET' MATTHIAS JAGGI
1,' PLANET' MATTHIAS JAGGI
2,' PLANET' MATTHIAS JAGGI
3,' PLANET' MATTHIAS JAGGI
4,' PLANET' MATTHIAS JAGGI
5,' VEVEY' Technologies S.A. Villeneuve
6,' VEVEY' Technologies S.A. Villeneuve
7,'BRUGG'-KABEL AG
8,'BRUGG'-KABEL AG
9,'BRUGG'-KABEL AG


### Understanding duplicates in the parties table
Unfortunately, patent data has no unique identifier for applicants, inventors, or proprietors. Each patent application lists its applicants by name and address. The same applicant is often recorded with variations of the same address, or with different addresses, in different patent applications, and the applicant name itself is sometimes spelled differently. As a result, a single legal entity or person typically has multiple records in the parties table.

Please take this into account in all your patent data analysis.
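
As a rough illustration of the problem, the sketch below takes a few name strings, two of them copied from the output above and two invented spelling variants, and compares exact deduplication with a naive normalization. The `normalize_name` helper is hypothetical and deliberately simplistic; real name harmonization needs far more care.

```python
import re

import pandas as pd

names = pd.Series([
    "'BRUGG'-KABEL AG",          # from the output above
    "'BRUGG'-KABEL AG",          # exact duplicate
    "Brugg Kabel AG",            # invented spelling variant
    "BRUGG KABEL AG",            # invented spelling variant
    "' PLANET' MATTHIAS JAGGI",  # from the output above
])


def normalize_name(name: str) -> str:
    """Hypothetical, naive normalization: upper-case, drop punctuation, squeeze spaces."""
    upper = name.upper()
    # Replace anything that is neither a word character nor whitespace
    no_punctuation = re.sub(r"[^\w\s]+", " ", upper)
    return re.sub(r"\s+", " ", no_punctuation).strip()


# Exact matching only removes byte-identical names
exact_unique = names.drop_duplicates()

# Normalizing first also merges the spelling variants
normalized_unique = names.map(normalize_name).drop_duplicates()

print(len(names), len(exact_unique), len(normalized_unique))
```

Exact matching leaves the spelling variants as separate names, while the normalized version merges them. A rule this crude can also merge names that belong to different entities, so any grouping of this kind should be checked by hand.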

## Joining parties to publications via the applications table

If you look at the logical model diagram of PATSTAT in the documentation, you will see that the `reg107_parties` and `reg102_pat_publn` tables are not directly related. In PATSTAT Register the central table is `reg101_appln`, which contains data about the European and international patent applications in the register.

We will therefore join tables 102, 101 and 107 to build the desired query.

Let's first join tables 101 and 102 to get the publications from 2020 and later, and get the application ID for each publication. This application ID will later be needed for joining tables 101 and 107.  

In [8]:
from epo.tipdata.patstat.database.models import REG101_APPLN

q = db.query(
    REG102_PAT_PUBLN.publn_auth,
    REG102_PAT_PUBLN.publn_nr,
    REG102_PAT_PUBLN.publn_date,
    REG101_APPLN.id,
    REG101_APPLN.appln_nr
).join(
    REG101_APPLN, REG102_PAT_PUBLN.id == REG101_APPLN.id
).filter(
    REG102_PAT_PUBLN.publn_auth == 'EP',
    REG102_PAT_PUBLN.publn_date > '2019-12-31'
).order_by(
    REG102_PAT_PUBLN.publn_date
)

# Creating a dataframe with the results
res = patstat.df(q)
res


Unnamed: 0,publn_auth,publn_nr,publn_date,id,appln_nr
0,EP,3369722,2020-01-01,17732015,17732015
1,EP,3587433,2020-01-01,19164666,19164666
2,EP,3354276,2020-01-01,17210162,17210162
3,EP,3266684,2020-01-01,17180080,17180080
4,EP,3587000,2020-01-01,18382480,18382480
...,...,...,...,...,...
1228177,EP,3828660,9999-12-31,20210354,20210354
1228178,EP,3868955,9999-12-31,20158127,20158127
1228179,EP,4101330,9999-12-31,22177008,22177008
1228180,EP,4124439,9999-12-31,21188661,21188661
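
To make the join route concrete, the following toy sketch reproduces it in pandas. Publications and parties never reference each other, but both carry the application `id`, so they can be linked through it. The publication and application rows are copied from the output above; the party rows are invented, and the column layout is an assumption that mirrors only the fields used in this exercise.

```python
import pandas as pd

# Toy stand-ins for tables 102, 101 and 107; party rows are invented
publications = pd.DataFrame({
    "id": [17732015, 19164666],
    "publn_auth": ["EP", "EP"],
    "publn_nr": ["3369722", "3587433"],
})
applications = pd.DataFrame({
    "id": [17732015, 19164666],
    "appln_nr": ["17732015", "19164666"],
})
parties = pd.DataFrame({
    "id": [17732015, 17732015, 19164666],
    "name": ["Example AG", "Jane Doe", "Example GmbH"],
    "country": ["CH", "CH", "DE"],
    "type": ["A", "I", "A"],
})

# 102 -> 101 -> 107, always matching on the application id
linked = publications.merge(applications, on="id").merge(parties, on="id")

# Keep Swiss applicants only
swiss_applicants = linked[(linked["country"] == "CH") & (linked["type"] == "A")]

print(swiss_applicants[["publn_nr", "appln_nr", "name"]])
```

An application with several parties yields one row per party after the merge, which is why the filter on `country` and `type` is applied afterwards.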


### Our final query

We are reaching the end of the exercise. We are now ready to build a query that connects tables 101, 102 and 107 and looks for Swiss applicants related to publications from 2020 onwards.

In [11]:
q = db.query(
    REG102_PAT_PUBLN.publn_auth,
    REG102_PAT_PUBLN.publn_nr,
    REG102_PAT_PUBLN.publn_date,
    REG101_APPLN.id,
    REG101_APPLN.appln_nr,
    REG107_PARTIES.name
).join(
    REG101_APPLN, REG102_PAT_PUBLN.id == REG101_APPLN.id
).join(
    REG107_PARTIES, REG101_APPLN.id == REG107_PARTIES.id
).filter(
    REG102_PAT_PUBLN.publn_auth == 'EP',
    REG102_PAT_PUBLN.publn_date > '2019-12-31',
    REG107_PARTIES.country == 'CH',
    REG107_PARTIES.type == 'A'
).order_by(
    REG107_PARTIES.name
)

# Creating a dataframe with the results
res = patstat.df(q)
res


Unnamed: 0,publn_auth,publn_nr,publn_date,id,appln_nr,name
0,EP,3325722,2021-03-03,16744714,16744714,'Brugg' Drahtseil AG
1,EP,3114461,2022-12-21,15707967,15707967,1 Drop SA
2,EP,3669190,2024-03-06,18765185,18765185,1Lab SA
3,EP,3669190,2024-03-06,18765185,18765185,1Lab SA
4,EP,3669190,2020-06-24,18765185,18765185,1Lab SA
...,...,...,...,...,...,...
63036,EP,2765437,2020-05-06,13154916,13154916,École Polytechnique Fédérale de Lausanne (EPFL)
63037,EP,3488013,2023-09-06,17745667,17745667,École Polytechnique Fédérale de Lausanne (EPFL)
63038,EP,4323306,2024-02-21,22722470,22722470,"Üstün, Orhan"
63039,EP,4112494,2024-01-10,22181300,22181300,éscale cosmétique
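
Some rows in this result repeat: application 18765185 (1Lab SA) appears three times, twice with an identical publication number and date. One plausible cause, stated here as an assumption, is that an application can have several publications and several party records, and the join returns every combination. A minimal pandas sketch for collapsing the result to distinct applicant and application pairs, using the first rows of the output above:

```python
import pandas as pd

# Rows copied from the first lines of the output above
res_sample = pd.DataFrame({
    "publn_nr": ["3325722", "3114461", "3669190", "3669190", "3669190"],
    "publn_date": ["2021-03-03", "2022-12-21", "2024-03-06", "2024-03-06", "2020-06-24"],
    "id": [16744714, 15707967, 18765185, 18765185, 18765185],
    "name": ["'Brugg' Drahtseil AG", "1 Drop SA", "1Lab SA", "1Lab SA", "1Lab SA"],
})

# Drop rows that are identical in every column
distinct_rows = res_sample.drop_duplicates()

# One row per applicant name and application
applicant_applications = (
    res_sample[["name", "id"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

print(len(res_sample), len(distinct_rows), len(applicant_applications))
```

Which of the two views is appropriate depends on whether the analysis counts publications or applications. Neither resolves the name variants described in the section on duplicates.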
