# Lab 1 - Enter the world of Patents

In this lab, we will explore the world of patents, using data from the
European Patent Office (EPO) to extract information about them. You will learn:

- Basic structure of a patent
- Querying patents through the EPO Linked Data SPARQL endpoint (SPARQL is a query language for RDF)
- Extracting information from patents
- Visualizing the extracted information
- Summarizing the extracted information

I am very excited to start this journey with you, as we will discover many interesting
things about patents. Let's get started!

In [1]:
# install the SPARQL package (v1.16) from the local source tarball
install.packages("../_utils/SPARQL_1.16.tar.gz", repos = NULL, type="source")

In [1]:
library(pacman)
# vtree: sudo apt-get install libcairo2-dev librsvg2-dev
# https://yulab-smu.top/treedata-book/index.html
p_load(readxl, stringr, data.table, magrittr, ggplot2, SPARQL,
        eurostat, XML, RCurl, knitr, vtree)

# color palette
gray_scale <- c('#F3F4F8','#D2D4DA', '#B3B5BD', 
                '#9496A1', '#7d7f89', '#777986', 
                '#656673', '#5B5D6B', '#4d505e',
                '#404352', '#2b2d3b', '#282A3A',
                '#1b1c2a', '#191a2b',
                '#141626', '#101223')

ft_palette <- c('#990F3D', '#0D7680', '#0F5499', '#262A33', '#FFF1E5')

ft_contrast <- c('#F83', '#00A0DD', '#C00', '#006F9B', '#F2DFCE', '#FF7FAA',
                 '#00994D', '#593380')

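# print the first n rows as a markdown table
peep_head <- function(dt, n = 5) {
    dt %>%
        head(n) %>%
        kable()
}

# print n random rows (expects a data.table; never samples more rows than exist)
peep_sample <- function(dt, n = 5) {
    dt %>%
        .[sample(.N, min(n, .N))] %>%
        kable()
}

# print the last n rows as a markdown table
peep_tail <- function(dt, n = 5) {
    dt %>%
        tail(n) %>%
        kable()
}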

## Patents that mention specific words in the abstract

In [2]:
# query the SPARQL endpoint
endpoint <- "https://data.epo.org/linked-data/query"

# publications whose abstract contains the word 'battery'
query1 <- "
prefix dcterms: <http://purl.org/dc/terms/>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
prefix text: <http://jena.apache.org/text#>

SELECT DISTINCT ?publication ?title ?abstract 
WHERE {
    ?publication text:query ( dcterms:abstract 'battery' );
                 patent:titleOfInvention ?title;
                 dcterms:abstract        ?abstract.
} LIMIT 10
"
query1_result <- SPARQL(endpoint, query1)$results

In [4]:
str(query1_result)

'data.frame':	10 obs. of  3 variables:
 $ publication: chr  "<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->" "<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->" "<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->" "<http://data.epo.org/linked-data/data/publication/EP/1182716/A2/->" ...
 $ title      : chr  "\"Batteriehalterung\"@de" "\"Support de batterie\"@fr" "\"Battery lock\"@en" "\"Batteriehalterung\"@de" ...
 $ abstract   : chr  "\"A battery lock for a communication unit with holding means for holding an internal battery in operating posit"| __truncated__ "\"A battery lock for a communication unit with holding means for holding an internal battery in operating posit"| __truncated__ "\"A battery lock for a communication unit with holding means for holding an internal battery in operating posit"| __truncated__ "\"A battery lock for a communication unit with holding means for holding an internal battery in operating posit"| __trunc

In [4]:
head(query1_result)

,publication <chr>,title <chr>,abstract <chr>
1,<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->,"""Batteriehalterung""@de","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock. ""@en"
2,<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->,"""Support de batterie""@fr","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock. ""@en"
3,<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->,"""Battery lock""@en","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock. ""@en"
4,<http://data.epo.org/linked-data/data/publication/EP/1182716/A2/->,"""Batteriehalterung""@de","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock.""@en"
5,<http://data.epo.org/linked-data/data/publication/EP/1182716/A2/->,"""Support de batterie""@fr","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock.""@en"
6,<http://data.epo.org/linked-data/data/publication/EP/1182716/A2/->,"""Battery lock""@en","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock.""@en"


In [14]:
# make it more readable
query1_result %>%
    as.data.table() %>%
    # delete '<' and '>' for publication
    .[, publication := gsub("[<>]", "", publication)] %>%
    .[, .(publication, title)] %>%
    peep_head()



|publication                                                      |title                    |
|:----------------------------------------------------------------|:------------------------|
|http://data.epo.org/linked-data/data/publication/EP/1182716/A3/- |"Batteriehalterung"@de   |
|http://data.epo.org/linked-data/data/publication/EP/1182716/A3/- |"Support de batterie"@fr |
|http://data.epo.org/linked-data/data/publication/EP/1182716/A3/- |"Battery lock"@en        |
|http://data.epo.org/linked-data/data/publication/EP/1182716/A2/- |"Batteriehalterung"@de   |
|http://data.epo.org/linked-data/data/publication/EP/1182716/A2/- |"Support de batterie"@fr |

Please **click the links in the table above** to see the full information about each publication. Notice that the same patent number, **1182716**, appears several times. This is because the results list different publications of the same patent (A2 and A3), and each publication has its title in three languages (German, French and English).
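Each publication URI encodes the publication authority, the publication number and the kind code, which is why the same number 1182716 shows up with both A2 and A3. A minimal base-R sketch that splits such URIs into their parts; it assumes the `.../publication/<authority>/<number>/<kind>/-` layout seen in the results above, and `parse_publication_uri` is a hypothetical helper, not an EPO tool:

```r
# Hypothetical helper: split EPO linked-data publication URIs into parts.
# Assumes the ".../publication/<authority>/<number>/<kind>/-" layout seen above;
# URIs that do not follow it get NA in every column.
parse_publication_uri <- function(uri) {
    uri <- gsub("[<>]", "", uri)   # drop the angle brackets SPARQL() keeps
    pattern <- ".*/publication/([^/]+)/([^/]+)/([^/]+)/.*"
    ok <- grepl(pattern, uri)
    data.frame(
        authority = ifelse(ok, sub(pattern, "\\1", uri), NA_character_),
        number    = ifelse(ok, sub(pattern, "\\2", uri), NA_character_),
        kind      = ifelse(ok, sub(pattern, "\\3", uri), NA_character_),
        stringsAsFactors = FALSE
    )
}

parse_publication_uri(c(
    "<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->",
    "<http://data.epo.org/linked-data/data/publication/EP/1182716/A2/->"
))
```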

Here is some basic information about the patents:

- A document: European patent application, published 18 months after filing with the EPO or 18 months after priority date. 

    - A1 document: European patent application published with European search report
    - A2 document: European patent application published without European search report (search report not available at the publication date)
    - A3 document: Separate publication of the European search report
    - A4 document: Supplementary search report

- B document: European patent specification

    - B1 document: European patent specification (granted patent)
    - B2 document: New European patent specification (amended specification)
    - B3 document: European patent specification (after limitation procedure)
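When you handle many publications, it helps to decode the kind code programmatically. A small sketch that covers only the kind codes listed above (any other code returns NA); the helper names are hypothetical. It accepts either a bare code such as `"A2"` or a kind URI of the form `.../def/patent/publicationKind_A1`, which is how the endpoint returns kinds later in this lab:

```r
# Lookup table transcribed from the kind-code list above (EP publications only).
ep_kind_codes <- c(
    A1 = "European patent application published with European search report",
    A2 = "European patent application published without European search report",
    A3 = "Separate publication of the European search report",
    A4 = "Supplementary search report",
    B1 = "European patent specification (granted patent)",
    B2 = "New European patent specification (amended specification)",
    B3 = "European patent specification (after limitation procedure)"
)

# Hypothetical helpers: strip brackets and the "publicationKind_" URI prefix,
# then look the code up; codes missing from the table give NA.
kind_code     <- function(kind) sub(".*publicationKind_", "", gsub("[<>]", "", kind))
describe_kind <- function(kind) unname(ep_kind_codes[kind_code(kind)])
is_b_document <- function(kind) startsWith(kind_code(kind), "B")

describe_kind(c("A2", "A3"))
is_b_document(c("A2", "A3", "B1"))
```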


> Anyone can apply for a patent, but not every patent will be granted!

![patent-application-process](../images/IPRIS_The-Process-Diagram_01_0.png)

In [24]:
# the first query returns each publication several times, once per title
# language. Let's keep only the English titles
query2 <- "
prefix dcterms: <http://purl.org/dc/terms/>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
prefix text: <http://jena.apache.org/text#>

SELECT DISTINCT ?publication ?title ?abstract 
WHERE {
    ?publication text:query ( dcterms:abstract 'battery' );
                 patent:titleOfInvention ?title;
                 dcterms:abstract        ?abstract.
    FILTER (lang(?title) = 'en')
} LIMIT 10
"

query2_result <- SPARQL(endpoint, query2)$results

In [25]:
head(query2_result)

,publication <chr>,title <chr>,abstract <chr>
1,<http://data.epo.org/linked-data/data/publication/EP/1182716/A3/->,"""Battery lock""@en","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock. ""@en"
2,<http://data.epo.org/linked-data/data/publication/EP/1182716/A2/->,"""Battery lock""@en","""A battery lock for a communication unit with holding means for holding an internal battery in operating position in a battery compartment. The holding means includes features to hold, lift and release the battery in the compartment from the battery compartment. The battery compartment includes protruding parts corresponding to slots on the battery, where the protruding parts co-operates with holding means of the battery to hold the battery in the battery compartment. The holding means of the battery lock includes first protruding parts to hold the battery in the battery compartment, grips to lift and release the battery, holes to hinge the battery lock in the battery compartment, second protruding parts to lift the battery actuated by the grips and fastening means to fasten the battery lock to the battery compartment. The fastening means on the battery compartment to fasten the battery lock to the battery compartment include pivots to hinge the battery on, locking flaps to keep the battery lock on the pivots and protruding parts that connects to slots on the grips of the battery lock.""@en"
3,<http://data.epo.org/linked-data/data/publication/EP/4166383/A1/->,"""BATTERY SWAPPING METHOD, SERVER, AND BATTERY INSTALLATION/REMOVAL DEVICE""@en","""The embodiments of the application provide a battery swapping method, a server and a battery installing-and-removing device. The battery swapping method comprising: receiving battery swapping status information of an electric vehicle; sending a battery removing instruction to a battery installing-and-removing device based on the battery swapping status information, the battery removing instruction is configured for instructing the battery installing-and-removing device to remove a first battery from the electric vehicle and transport the first battery to a first position; sending a battery installation instruction to the battery installing-and-removing device when detecting that the first battery is transported to the first position; receiving the battery installation information sent by the battery installing-and-removing device, the battery installation information is configured for indicating that a second battery is to be installed by the battery installing-and-removing device to the electric vehicle; sending a battery swapping completion instruction to the electric vehicle based on the battery installation information; wherein power of the second battery is higher than power of the first battery.""@en"
4,<http://data.epo.org/linked-data/data/publication/EP/3518375/A1/->,"""BATTERY SYSTEM""@en","""Provided is a battery system having high expandability. Each battery pack included in a battery system (1) acquires physical quantities of one or more batteries. Battery state information is calculated based on the acquired physical quantities. Then, each battery pack communicates with another battery pack. One battery pack is set as a master battery pack MP. Another battery pack other than the master battery pack (MP) is set as a slave battery pack (SP). The slave battery pack (SP) transmits the battery state information to the master battery pack (MP). The master battery pack (MP) calculates integrated battery information based on the battery state information of each battery pack.""@en"
5,<http://data.epo.org/linked-data/data/publication/EP/4181343/A1/->,"""CHARGING METHOD AND ELECTRONIC DEVICE""@en","""Embodiments of this application provide a charging method and an electronic device. The method includes: controlling, if a battery level of a battery is less than or equal to a first preset battery level, the electronic device to be in a state in which both a load and the battery are powered by a charging device; and controlling, if the battery level of the battery is greater than or equal to a second preset battery level, the electronic device to be in a state in which neither the load nor the battery is powered by the charging device, and the battery supplies power to the load, where the second preset battery level is greater than the first preset battery level and is less than or equal to a maximum battery level of the battery. In embodiments of this application, when the battery level of the battery is less than or equal to the first preset battery level, the electronic device is controlled to be in the state in which both the load and the battery are powered by the charging device, so that the battery level of the battery increases. When the battery level of the battery is greater than or equal to the second preset battery level, the electronic device is controlled to be in the state in which neither the load nor the battery is powered by the charging device, and the battery supplies power to the load, so that the battery level of the battery decreases. This can avoid that the battery is in a high battery level state for a long time, thereby prolonging a service life of the battery.""@en"
6,<http://data.epo.org/linked-data/data/publication/EP/4235924/A1/->,"""BATTERY MODULE AND BATTERY PACK COMPRISING SAME""@en","""Disclosed herein is a battery module according to the present invention including: a first battery cell assembly in which multiple battery cells are stacked in the thickness direction of the battery cell; a second battery cell assembly in which each battery cell stacked in the first battery cell assembly and each battery cell arranged in a row in the longitudinal direction of the battery cell is stacked in the thickness direction of the battery cell in the same number as the number of battery cells stacked in the first battery cell assembly; and a module case accommodating the first and second battery cell assembly, wherein the battery cells of the first battery cell assembly are electrically connected to each other, and the battery cells of the second battery cell assembly are electrically connected to each other, but the battery cells are not electrically connected to each other between the first and second battery cell assemblies. In addition, the present invention includes a battery pack composed of the battery module.""@en"
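The titles and abstracts still carry RDF literal syntax: surrounding quotes plus a language tag such as `@en`. (In SPARQL itself, `STR(?title)` would return the text without the tag.) A base-R sketch that splits such a literal into its text and language; `split_literal` is a hypothetical helper and assumes the `"text"@lang` form printed above:

```r
# Hypothetical helper: turn an RDF literal such as "\"Battery lock\"@en"
# into its text and language tag. Literals without a tag get NA as language.
split_literal <- function(x) {
    text_pattern <- '^"(.*)"(@[A-Za-z-]+)?$'
    has_lang <- grepl('"@[A-Za-z-]+$', x)
    data.frame(
        text = trimws(sub(text_pattern, "\\1", x)),
        lang = ifelse(has_lang, sub('.*"@([A-Za-z-]+)$', "\\1", x), NA_character_),
        stringsAsFactors = FALSE
    )
}

split_literal(c('"Battery lock"@en', '"Batteriehalterung"@de'))
```

With this, filtering by language can also be done in R, e.g. `subset(split_literal(x), lang == "en")`.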


Please read the following explanation of how to filter for granted patents (kind code B1):

- [Filter B1 for granted patents.](https://chat.openai.com/share/d1de83a5-54ee-4a46-8ef8-edfc14e677de)
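One way to keep only granted patents is to add a `patent:publicationKind` triple to the query. The sketch below rebuilds `query2` for any word and, optionally, one kind code. It assumes kind URIs follow the `.../def/patent/publicationKind_<code>` pattern that the endpoint returns later in this lab, and it has not been checked against the live endpoint; `build_abstract_query` is a hypothetical helper:

```r
# Hypothetical helper, not part of the SPARQL package: rebuild query2 for any
# word, optionally restricted to one publication kind such as "B1".
# Assumes kind URIs follow the ".../def/patent/publicationKind_<code>" pattern.
build_abstract_query <- function(word, lang = "en", kind = NULL, limit = 10) {
    word <- gsub("'", "\\\\'", word)   # escape single quotes for the SPARQL string
    kind_triple <- if (is.null(kind)) "" else sprintf(
        "\n                 patent:publicationKind <http://data.epo.org/linked-data/def/patent/publicationKind_%s>;",
        kind)
    sprintf("
prefix dcterms: <http://purl.org/dc/terms/>
prefix patent: <http://data.epo.org/linked-data/def/patent/>
prefix text: <http://jena.apache.org/text#>

SELECT DISTINCT ?publication ?title ?abstract
WHERE {
    ?publication text:query ( dcterms:abstract '%s' );%s
                 patent:titleOfInvention ?title;
                 dcterms:abstract        ?abstract.
    FILTER (lang(?title) = '%s')
} LIMIT %d
", word, kind_triple, lang, as.integer(limit))
}

query_b1 <- build_abstract_query("battery", kind = "B1")
cat(query_b1)
# query_b1_result <- SPARQL(endpoint, query_b1)$results   # needs network access
```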

Please open the following link:

- https://data.epo.org/linked-data/data/publication/EP/1182716

And answer the following questions:

- When was the patent first published?
- What kind of information were added to the patent after the first publication?
- When was the patent granted?


## An applicant's publications

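Now we will explore the publications of a single applicant, NIO. The goal is to find out how many patent applications an applicant has filed with the EPO and had published. Note that the query also asks for the documents each publication cites, so every row of the result is one (publication, citation) pair, and publications without any citation are not returned.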

In [44]:
query3 <- "
prefix dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#> 
PREFIX patent: <http://data.epo.org/linked-data/def/patent/>

SELECT * 
WHERE {
    ?publn rdf:type patent:Publication;
           patent:applicantVC ?applicant;
           patent:publicationAuthority ?auth;
           patent:publicationNumber ?publnNum;
           patent:publicationKind ?kind;
           patent:publicationDate ?publnDate;
           patent:citesPatentPublication ?citation.
    ?applicant vcard:fn ?name.
    FILTER (?name = 'Nio Technology (Anhui) Co., Ltd' || ?name = 'NIO Nextev Limited')
}
LIMIT 1000
"

query3_result <- SPARQL(endpoint, query3)$results

In [45]:
str(query3_result)

'data.frame':	538 obs. of  8 variables:
 $ publn    : chr  "<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->" "<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->" "<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->" "<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->" ...
 $ applicant: chr  "<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>" "<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>" "<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>" "<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>" ...
 $ auth     : chr  "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" ...
 $ publnNum : chr  "4063170" "4063170" "4063170" "4063170" ...
 $ kind     : chr  "<http://data.epo.org/linked-data/def/pat

In [46]:
head(query3_result)

,publn <chr>,applicant <chr>,auth <chr>,publnNum <chr>,kind <chr>,publnDate <dbl>,citation <chr>,name <chr>
1,<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->,<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>,<http://data.epo.org/linked-data/id/st3/EP>,4063170,<http://data.epo.org/linked-data/def/patent/publicationKind_A1>,1664316000,<http://data.epo.org/linked-data/data/publication/GB/2579607/A/->,"Nio Technology (Anhui) Co., Ltd"
2,<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->,<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>,<http://data.epo.org/linked-data/id/st3/EP>,4063170,<http://data.epo.org/linked-data/def/patent/publicationKind_A1>,1664316000,<http://data.epo.org/linked-data/data/publication/CN/110435476/A/->,"Nio Technology (Anhui) Co., Ltd"
3,<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->,<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>,<http://data.epo.org/linked-data/id/st3/EP>,4063170,<http://data.epo.org/linked-data/def/patent/publicationKind_A1>,1664316000,<http://data.epo.org/linked-data/data/publication/CN/111885135/A/->,"Nio Technology (Anhui) Co., Ltd"
4,<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->,<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>,<http://data.epo.org/linked-data/id/st3/EP>,4063170,<http://data.epo.org/linked-data/def/patent/publicationKind_A1>,1664316000,<http://data.epo.org/linked-data/data/publication/CN/111942211/A/->,"Nio Technology (Anhui) Co., Ltd"
5,<http://data.epo.org/linked-data/data/publication/EP/4012821/A1/->,<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>,<http://data.epo.org/linked-data/id/st3/EP>,4012821,<http://data.epo.org/linked-data/def/patent/publicationKind_A1>,1655244000,<http://data.epo.org/linked-data/data/publication/US/5626982/A/->,"Nio Technology (Anhui) Co., Ltd"
6,<http://data.epo.org/linked-data/data/publication/EP/4012821/A1/->,<http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3>,<http://data.epo.org/linked-data/id/st3/EP>,4012821,<http://data.epo.org/linked-data/def/patent/publicationKind_A1>,1655244000,<http://data.epo.org/linked-data/data/publication/WO/2019224020/A1/->,"Nio Technology (Anhui) Co., Ltd"
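Because `query3` also asks for `?citation`, the 538 rows reported by `str()` are not 538 publications. A minimal sketch that counts distinct publications per kind code, tried here on the six rows printed above (citation values shortened for the toy input). `count_publications` is a hypothetical helper; on the real data you would call `count_publications(query3_result)`. Counting `unique(publnNum)` instead would count applications, since the A and B publications of one application share the same number:

```r
# Hypothetical helper: count distinct publications per kind code in a
# query3-style result (one row per publication/citation pair).
count_publications <- function(df) {
    pubs <- unique(gsub("[<>]", "", df$publn))
    kind <- sub(".*/publication/[^/]+/[^/]+/([^/]+)/.*", "\\1", pubs)
    table(kind)
}

# Toy input: the first six rows shown above (two publications, six citations)
toy <- data.frame(
    publn = c(rep("<http://data.epo.org/linked-data/data/publication/EP/4063170/A1/->", 4),
              rep("<http://data.epo.org/linked-data/data/publication/EP/4012821/A1/->", 2)),
    citation = c("GB/2579607", "CN/110435476", "CN/111885135", "CN/111942211",
                 "US/5626982", "WO/2019224020"),
    stringsAsFactors = FALSE
)

nrow(toy)                 # 6 rows ...
count_publications(toy)   # ... but only 2 A1 publications
```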


When doing patent research, you have to deal with entity harmonization: finding all the publications of an applicant even when its name is written in different ways. For example, the same applicant can appear as "IBM", "International Business Machines Corporation", "IBM Corporation", and so on. NIO Inc. has published patents under names such as the following (the query above filters on only two name variants):

- NIO Inc.
- NIO NextEV Limited
- NIO GmbH
- NIO Technology Co., Ltd
- and many more...

You can read the [financial report](https://www.sec.gov/Archives/edgar/data/1736541/000110465921046834/R8.htm) to find out the names of the subsidiaries of NIO Inc.
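Name normalization is a common first step of harmonization. A crude sketch that upper-cases names and strips punctuation and a few legal-form suffixes; the suffix list is an assumption, and `normalize_applicant` is a hypothetical helper. It merges some variants but not all ("NIO TECHNOLOGY ANHUI" stays distinct from "NIO TECHNOLOGY"), so a manual mapping based on the subsidiary list is still needed:

```r
# Hypothetical helper: crude applicant-name normalization for harmonization.
# Upper-cases, replaces punctuation with spaces, drops a few legal-form
# suffixes (an assumed list), then squashes repeated spaces.
normalize_applicant <- function(x,
                                suffixes = c("CO", "LTD", "LIMITED", "INC", "CORP",
                                             "CORPORATION", "GMBH")) {
    x <- toupper(x)
    x <- gsub("[[:punct:]]+", " ", x)
    pattern <- paste0("\\b(", paste(suffixes, collapse = "|"), ")\\b")
    x <- gsub(pattern, " ", x, perl = TRUE)
    trimws(gsub("\\s+", " ", x))
}

variants <- c("NIO Inc.", "NIO NextEV Limited", "NIO Nextev Limited", "NIO GmbH",
              "NIO Technology Co., Ltd", "Nio Technology (Anhui) Co., Ltd")
data.frame(raw = variants, normalized = normalize_applicant(variants))
```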

In [47]:
# clean the URIs and convert publnDate (seconds since 1970-01-01) with as.POSIXct
query3_result %>%
    as.data.table() %>%
    # delete '<' and '>' for publn, applicant, auth, kind, citation
    .[, c("publn", "applicant", "auth", "kind", "citation") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "citation")] %>%
    .[, publnDate := as.POSIXct(publnDate, origin = "1970-01-01")] %>%
    # order by publication date
    .[order(-publnDate)] %>%
    peep_head(10)



|publn                                                            |applicant                                                                |auth                                      |publnNum |kind                                                          |publnDate  |citation                                                            |name                            |
|:----------------------------------------------------------------|:------------------------------------------------------------------------|:-----------------------------------------|:--------|:-------------------------------------------------------------|:----------|:-------------------------------------------------------------------|:-------------------------------|
|http://data.epo.org/linked-data/data/publication/EP/4335674/A1/- |http://data.epo.org/linked-data/data/vc/F5273AF84CF433300D3D2A2C2A4BB5E3 |http://data.epo.org/linked-data/id/st3/EP |4335674  |http://data.epo.org/linked-data/def/patent/publicationKind_A
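The `publnDate` column arrives as a number of seconds since 1970-01-01 (the `<dbl>` column shown earlier), and converting it is time-zone sensitive. Using the value of EP 4063170 A1 from the output above, the sketch shows that 1664316000 seconds is 22:00 UTC on 27 September 2022 but midnight on 28 September in Europe/Berlin, which matches the date printed in the next table. We assume the offset comes from the time zone of the session in which the SPARQL package converted the dates, so converting back in that same time zone is safest. R versions before 4.3.0 also require `origin` for numeric input:

```r
# publnDate of EP 4063170 A1 in the output above, in seconds since 1970-01-01
secs <- 1664316000

# origin is passed explicitly so this also works on R versions before 4.3.0
in_utc    <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
in_berlin <- as.POSIXct(secs, origin = "1970-01-01", tz = "Europe/Berlin")

format(in_utc, "%Y-%m-%d %H:%M")      # "2022-09-27 22:00"
format(in_berlin, "%Y-%m-%d %H:%M")   # "2022-09-28 00:00"
```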

In [49]:
# where does NIO learn from?
# we can check the publications its patents cite
query3_result %>%
    as.data.table() %>%
    # delete '<' and '>' for publn, applicant, auth, kind, citation
    .[, c("publn", "applicant", "auth", "kind", "citation") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "citation")] %>%
    .[, publnDate := as.POSIXct(publnDate, origin = "1970-01-01")] %>%
    # select the columns we need
    .[, .(publnNum, publnDate, citation, name)] %>%
    # get the country code of citation by extracting string after /publication/ and before /
    .[, citation_country := str_extract(citation, "(?<=/publication/)[^/]+")] %>%
    peep_head()



|publnNum |publnDate  |citation                                                          |name                            |citation_country |
|:--------|:----------|:-----------------------------------------------------------------|:-------------------------------|:----------------|
|4063170  |2022-09-28 |http://data.epo.org/linked-data/data/publication/GB/2579607/A/-   |Nio Technology (Anhui) Co., Ltd |GB               |
|4063170  |2022-09-28 |http://data.epo.org/linked-data/data/publication/CN/110435476/A/- |Nio Technology (Anhui) Co., Ltd |CN               |
|4063170  |2022-09-28 |http://data.epo.org/linked-data/data/publication/CN/111885135/A/- |Nio Technology (Anhui) Co., Ltd |CN               |
|4063170  |2022-09-28 |http://data.epo.org/linked-data/data/publication/CN/111942211/A/- |Nio Technology (Anhui) Co., Ltd |CN               |
|4012821  |2022-06-15 |http://data.epo.org/linked-data/data/publication/US/5626982/A/-   |Nio Technology (Anhui) Co., Ltd |US               |

In [52]:
query3_result %>%
    as.data.table() %>%
    # delete '<' and '>' for publn, applicant, auth, kind, citation
    .[, c("publn", "applicant", "auth", "kind", "citation") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "citation")] %>%
    .[, publnDate := as.POSIXct(publnDate, origin = "1970-01-01")] %>%
    # select the columns we need
    .[, .(publnNum, publnDate, citation, name)] %>%
    # get the country code of citation by extracting string after /publication/ and before /
    .[, citation_country := str_extract(citation, "(?<=/publication/)[^/]+")] %>%
    .[, .N, by = .(citation_country)] %>%
    .[order(-N)] %>%
    peep_head(10)



|citation_country |   N|
|:----------------|---:|
|US               | 225|
|CN               | 117|
|EP               |  51|
|WO               |  47|
|DE               |  44|
|JP               |  36|
|KR               |   9|
|FR               |   4|
|GB               |   3|
|TW               |   1|

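The table above shows that most of the documents cited in NIO's EPO publications were published in the US (225) and China (117), followed by EP (51) and WO (47) publications, Germany (44) and Japan (36). EP and WO are not countries: they denote publications of the EPO and of WIPO under the PCT. Keep in mind that N counts (publication, citation) pairs, so a document cited by several NIO publications is counted several times, and that citations in EPO publications may have been added by the search examiner rather than by the applicant. The table therefore shows which prior art NIO's patents relate to more than where NIO actually learns from.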
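To count each cited document only once, you can deduplicate the citation URIs before tabulating. A base-R sketch; `citation_country_counts` is a hypothetical helper, demonstrated on citation URIs from the output above plus one deliberate repeat:

```r
# Hypothetical helper: count cited documents per publication authority
# (country code, EP or WO) taken from the segment after "/publication/".
# unique_docs = TRUE counts each cited document once, however often it is cited.
citation_country_counts <- function(citations, unique_docs = TRUE) {
    uris <- gsub("[<>]", "", citations)
    if (unique_docs) uris <- unique(uris)
    country <- sub(".*/publication/([^/]+)/.*", "\\1", uris)
    counts <- table(country)
    sort(setNames(as.vector(counts), names(counts)), decreasing = TRUE)
}

base <- "http://data.epo.org/linked-data/data/publication/"
cites <- paste0(base, c("GB/2579607/A/-", "CN/110435476/A/-", "CN/111885135/A/-",
                        "CN/111942211/A/-", "US/5626982/A/-", "WO/2019224020/A1/-",
                        "CN/110435476/A/-"))   # the last one repeats a citation

citation_country_counts(cites)                       # each document once
citation_country_counts(cites, unique_docs = FALSE)  # every citation row
```

On the real data, the input would be `query3_result$citation`.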

## Get to know IPC and CPC

The International Patent Classification (IPC) is a hierarchical patent classification system that is used to classify the content of patents. The Cooperative Patent Classification (CPC) is a patent classification system, which has been jointly developed by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO). The CPC is a more detailed classification system than the IPC.
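CPC and IPC symbols encode the hierarchy in their structure: section (A), class (A44), subclass (A44B), main group (A44B 11/00) and subgroup (A44B 11/2523). A minimal parser sketch, assuming the usual `<subclass> <group>/<subgroup>` format used in the query below; `parse_cpc` is a hypothetical helper, and intermediate levels that the EPO data also contains (such as A4 in the output below) are not produced:

```r
# Hypothetical helper: split a CPC (or IPC) symbol such as "A44B 11/2523"
# into its hierarchy levels. Sections are A-H, plus Y in the CPC.
# Main groups end in "/00".
parse_cpc <- function(symbol) {
    s <- trimws(symbol)
    pattern <- "^([A-HY])([0-9]{2})([A-Z])\\s*([0-9]{1,4})/([0-9]{2,6})$"
    m <- regmatches(s, regexec(pattern, s))[[1]]
    if (length(m) == 0) stop("not a CPC symbol: ", symbol)
    subclass <- paste0(m[2], m[3], m[4])
    list(
        section    = m[2],
        class      = paste0(m[2], m[3]),
        subclass   = subclass,
        main_group = paste0(subclass, " ", m[5], "/00"),
        subgroup   = paste0(subclass, " ", m[5], "/", m[6])
    )
}

str(parse_cpc("A44B 11/2523"))
```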

In [56]:
# walk up the CPC hierarchy from 'A44B 11/2523' and get the concordant IPC code at each level
query4 <- "
prefix cpc: <http://data.epo.org/linked-data/def/cpc/>
prefix dcterms: <http://purl.org/dc/terms/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?broaderCPC ?title ?ipc
WHERE {
    ?CPC rdf:type/rdfs:subClassOf cpc:Classification.
    ?CPC rdfs:label 'A44B 11/2523'.  # Note the single space between A44B and 11/2523
    ?CPC skos:broader* ?broaderCPC.
    ?broaderCPC cpc:concordantIPC ?ipc.
    ?broaderCPC dcterms:title ?title
}
ORDER BY ASC(?broaderCPC)
LIMIT 20
"

query4_result <- SPARQL(endpoint, query4)$results
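The `skos:broader*` property path in `query4` matches the starting concept itself (zero steps) plus every concept reachable by following `skos:broader` links upward, one or more times. The toy base-R sketch below shows that traversal. It uses a hypothetical lookup vector `broader` built from the levels `A`, `A4`, `A44`, and `A44B` in the query output below, and it assumes each level's parent is the one listed just above it. The levels between `A44B` and `A44B 11/2523` are left out.

```r
# Hypothetical child -> broader links, assumed from the order of the query output
broader <- c(A44B = "A44", A44 = "A4", A4 = "A")

# Emulate `?node skos:broader* ?ancestor`: the node itself (zero steps),
# then every concept reached by repeatedly following the broader link
broader_star <- function(node, parents) {
    path <- node
    while (node %in% names(parents)) {
        node <- parents[[node]]
        if (node %in% path) stop("cycle detected at ", node)
        path <- c(path, node)
    }
    path
}

broader_star("A44B", broader)
```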

In [60]:
query4_result %>%
    as.data.table() %>%
    # delete '<' and '>' for broaderCPC, ipc
    .[, c("broaderCPC", "ipc") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("broaderCPC", "ipc")] %>%
    peep_head()



|broaderCPC                                        |title                                                                                         |ipc                                               |
|:-------------------------------------------------|:---------------------------------------------------------------------------------------------|:-------------------------------------------------|
|http://data.epo.org/linked-data/def/cpc/A         |"HUMAN NECESSITIES"@en                                                                        |http://data.epo.org/linked-data/def/ipc/A         |
|http://data.epo.org/linked-data/def/cpc/A4        |"PERSONAL OR DOMESTIC ARTICLES"@en                                                            |http://data.epo.org/linked-data/def/ipc/A4        |
|http://data.epo.org/linked-data/def/cpc/A44       |"HABERDASHERY; JEWELLERY"@en                                                                  |http://data.epo.org/linked-data/def/ipc/A44       |

In [58]:
# make it readable
query4_result %>%
    as.data.table() %>%
    # extract string after 'cpc/' and before '>' for broaderCPC
    .[, broaderCPC := str_extract(broaderCPC, "(?<=cpc/)[^>]+")] %>%
    # extract string after 'ipc/' and before '>' for ipc
    .[, ipc := str_extract(ipc, "(?<=ipc/)[^>]+")] %>%
    kable()



|broaderCPC  |title                                                                                                                                              |ipc       |
|:-----------|:--------------------------------------------------------------------------------------------------------------------------------------------------|:---------|
|A           |"HUMAN NECESSITIES"@en                                                                                                                             |A         |
|A4          |"PERSONAL OR DOMESTIC ARTICLES"@en                                                                                                                 |A4        |
|A44         |"HABERDASHERY; JEWELLERY"@en                                                                                                                       |A44       |
|A44B        |"BUTTONS, PINS, BUCKLES, SLIDE FASTENERS, OR THE LIKE"@en                                                         

## What kind of IPC and CPC classes do electric vehicles cluster around?

Which IPC and CPC classes do electric vehicle (EV) patents cluster around? We start with two leading EV makers, Tesla and BYD, and then compare their classes with those of traditional carmakers such as Volkswagen and BMW.

> Fun fact about patents: "After Zip2, when I realized that receiving a patent really just meant that you bought a lottery ticket to a lawsuit, I avoided them whenever possible." - Elon Musk

Please read this [article](https://www.tesla.com/blog/all-our-patent-are-belong-you) to find out why Elon Musk decided to open up Tesla's patents.

Some articles about Tesla's Open Patent Policy:

- https://www.automotiveworld.com/articles/are-open-source-patent-portfolios-the-key-to-the-ev-revolution/
- https://startupnation.com/manage-your-business/teslas-open-source-patent-strategy/

In [3]:
# get publications of TESLA INC, TESLA MOTORS INC, BYD COMPANY LIMITED
# it will take around 20 seconds to run
query5 <- "
prefix dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#> 
PREFIX patent: <http://data.epo.org/linked-data/def/patent/>

SELECT * 
WHERE {
    ?publn rdf:type patent:Publication;
           patent:applicantVC ?applicant;
           patent:publicationAuthority ?auth;
           patent:publicationNumber ?publnNum;
           patent:publicationKind ?kind;
           patent:publicationDate ?publnDate;
           patent:application ?application.
    ?application patent:classificationCPCInventive ?cpc.
    ?applicant vcard:fn ?name.
    FILTER (?name = 'Tesla Motors, Inc.' || ?name = 'Tesla, Inc.' || ?name = 'BYD Company Limited')
}
LIMIT 15000
"

query5_result <- SPARQL(endpoint, query5)$results
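The FILTER in `query5` spells out each applicant name by hand. If more names are needed later, for example for the comparison with traditional carmakers, the clause could be generated from a character vector instead. This is a minimal sketch. `build_name_filter` is a hypothetical helper, and the names still have to match the `vcard:fn` values exactly. Single quotes are escaped so that each SPARQL string literal stays valid.

```r
# Hypothetical helper: build a FILTER clause that matches any of `names`
build_name_filter <- function(names, var = "?name") {
    # escape single quotes, e.g. O'Brien -> O\'Brien, for SPARQL string literals
    escaped <- gsub("'", "\\\\'", names)
    conditions <- paste0(var, " = '", escaped, "'")
    paste0("FILTER (", paste(conditions, collapse = " || "), ")")
}

ev_names <- c("Tesla Motors, Inc.", "Tesla, Inc.", "BYD Company Limited")
cat(build_name_filter(ev_names), sep = "\n")
```

For these three names, the output is the same FILTER line that `query5` uses.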

In [4]:
str(query5_result)

'data.frame':	10226 obs. of  9 variables:
 $ publn      : chr  "<http://data.epo.org/linked-data/data/publication/EP/2533325/A1/->" "<http://data.epo.org/linked-data/data/publication/EP/2533325/A1/->" "<http://data.epo.org/linked-data/data/publication/EP/2533325/A1/->" "<http://data.epo.org/linked-data/data/publication/EP/2533325/A1/->" ...
 $ applicant  : chr  "<http://data.epo.org/linked-data/data/vc/1CF0D6E9F409338A09FAB869729D600C>" "<http://data.epo.org/linked-data/data/vc/1CF0D6E9F409338A09FAB869729D600C>" "<http://data.epo.org/linked-data/data/vc/1CF0D6E9F409338A09FAB869729D600C>" "<http://data.epo.org/linked-data/data/vc/1CF0D6E9F409338A09FAB869729D600C>" ...
 $ auth       : chr  "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" ...
 $ publnNum   : chr  "2533325" "2533325" "2533325" "2533325" ...
 $ kind       : chr  "<http://data.epo.org/linked-

In [6]:
# now let's clean the data
query5_result %>%
    as.data.table() %>%
    # delete '<' and '>' from the URI columns (publn, applicant, auth, kind, application, cpc)
    .[, c("publn", "applicant", "auth", "kind", "application", "cpc") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "application", "cpc")] %>%
    .[, publnDate := as.POSIXct(publnDate)] %>%
    # select the columns we need
    .[, .(publn, name, publnNum, publnDate, cpc)] %>%
    # extract string after 'cpc/' and before '>' for cpc
    .[, cpc_code := str_extract(cpc, "(?<=cpc/)[^>]+")] %>%
    peep_head()



|publn                                                            |name               |publnNum |publnDate  |cpc                                                |cpc_code   |
|:----------------------------------------------------------------|:------------------|:--------|:----------|:--------------------------------------------------|:----------|
|http://data.epo.org/linked-data/data/publication/EP/2533325/A1/- |Tesla Motors, Inc. |2533325  |2012-12-12 |http://data.epo.org/linked-data/def/cpc/H01M10-52  |H01M10-52  |
|http://data.epo.org/linked-data/data/publication/EP/2533325/A1/- |Tesla Motors, Inc. |2533325  |2012-12-12 |http://data.epo.org/linked-data/def/cpc/H01M50-204 |H01M50-204 |
|http://data.epo.org/linked-data/data/publication/EP/2533325/A1/- |Tesla Motors, Inc. |2533325  |2012-12-12 |http://data.epo.org/linked-data/def/cpc/H01M50-317 |H01M50-317 |
|http://data.epo.org/linked-data/data/publication/EP/2533325/A1/- |Tesla Motors, Inc. |2533325  |2012-12-12 |http://data.epo.org

In [69]:
query5_result %>%
    as.data.table() %>%
    # delete '<' and '>' from the URI columns (publn, applicant, auth, kind, application, cpc)
    .[, c("publn", "applicant", "auth", "kind", "application", "cpc") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "application", "cpc")] %>%
    .[, publnDate := as.POSIXct(publnDate)] %>%
    # select the columns we need
    .[, .(publn, name, publnNum, publnDate, cpc)] %>%
    # extract string after 'cpc/' and before '>' for cpc
    .[, cpc_code := str_extract(cpc, "(?<=cpc/)[^>]+")] %>%
    # keep one row per publication number
    unique(by = "publnNum") %>%
    # count publications per applicant
    .[, .N, by = .(name)] %>%
    .[order(-N)] %>%
    peep_head()



|name                |   N|
|:-------------------|---:|
|BYD Company Limited | 942|
|Tesla, Inc.         | 134|
|Tesla Motors, Inc.  |  69|

Looking at the table above, you may be surprised that BYD has far more patent publications than Tesla: 942 versus 203 for Tesla's two applicant names combined (134 + 69). The following articles help explain why BYD has more patents than Tesla:

- https://asia.nikkei.com/Spotlight/Electric-cars-in-China/BYD-outpaces-Tesla-16-fold-in-patent-filings
- https://www.counterpointresearch.com/insights/global-electric-vehicle-market-share/
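The cells in this section repeat the same cleaning steps: strip the angle brackets, convert the date, and pull out the CPC code. These steps could be factored into one function. The sketch below is a hypothetical base-R helper, `clean_publications`, not part of the original notebook, which uses data.table and stringr. It is tested on a one-row toy data frame assembled from values shown in the outputs above.

```r
# Hypothetical helper: the cleaning steps repeated in this section, in base R
clean_publications <- function(df,
                               uri_cols = c("publn", "applicant", "auth",
                                            "kind", "application", "cpc")) {
    # only touch the URI columns that are actually present
    uri_cols <- intersect(uri_cols, names(df))
    # strip the angle brackets around URIs
    df[uri_cols] <- lapply(df[uri_cols], function(x) gsub("[<>]", "", x))
    # same date conversion as the notebook cells
    df$publnDate <- as.POSIXct(df$publnDate)
    # keep the code after '.../cpc/'; unlike str_extract, a value without
    # '/cpc/' is returned unchanged instead of NA
    df$cpc_code <- sub(".*/cpc/", "", df$cpc)
    df[, c("publn", "name", "publnNum", "publnDate", "cpc", "cpc_code")]
}

# One-row toy input built from values shown in the outputs above
toy <- data.frame(
    publn     = "<http://data.epo.org/linked-data/data/publication/EP/2533325/A1/->",
    auth      = "<http://data.epo.org/linked-data/id/st3/EP>",
    publnNum  = "2533325",
    publnDate = "2012-12-12",
    cpc       = "<http://data.epo.org/linked-data/def/cpc/H01M10-52>",
    name      = "Tesla Motors, Inc.",
    stringsAsFactors = FALSE
)
clean_publications(toy)
```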

In [74]:
# now let's analyze the CPC codes
query5_result %>%
    as.data.table() %>%
    # delete '<' and '>' from the URI columns (publn, applicant, auth, kind, application, cpc)
    .[, c("publn", "applicant", "auth", "kind", "application", "cpc") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "application", "cpc")] %>%
    .[, publnDate := as.POSIXct(publnDate)] %>%
    # select the columns we need
    .[, .(publn, name, publnNum, publnDate, cpc)] %>%
    # extract string after 'cpc/' and before '>' for cpc
    .[, cpc_code := str_extract(cpc, "(?<=cpc/)[^>]+")] %>%
    .[, .N, by = .(cpc_code)] %>%
    .[order(-N)] %>%
    peep_head(10)



|cpc_code    |   N|
|:-----------|---:|
|H01M10-625  | 126|
|H01M10-613  | 104|
|H01M50-209  | 101|
|H01M50-249  |  91|
|H01M10-0525 |  76|
|H01M50-204  |  65|
|B60L58-27   |  59|
|B60L50-64   |  58|
|H01M10-6556 |  58|
|B60L53-11   |  51|
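The codes in this table (for example `H01M10-625`) appear to be the URI-friendly form of symbols such as `A44B 11/2523` that `rdfs:label` uses in the earlier query. Converting them back makes it easier to look a code up by its label. This is a minimal sketch. `cpc_code_to_label` is a hypothetical helper, and the output format (one space, then `/`) is assumed from that single label example. Codes that do not fit the pattern are returned unchanged.

```r
# Hypothetical helper: "H01M10-625" -> "H01M 10/625"
# Assumed pattern: <subclass><main group>-<subgroup>
cpc_code_to_label <- function(code) {
    sub("^([A-HY][0-9]{2}[A-Z])([0-9]+)-([0-9]+)$", "\\1 \\2/\\3", code)
}

cpc_code_to_label(c("H01M10-625", "B60L58-27", "H01M10-0525"))
```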

In [5]:
# visualize the results
# a CPC symbol has 5 levels: section, class, subclass, main group, subgroup
query5_result %>%
    as.data.table() %>%
    # delete '<' and '>' from the URI columns (publn, applicant, auth, kind, application, cpc)
    .[, c("publn", "applicant", "auth", "kind", "application", "cpc") := lapply(.SD, function(x) gsub("[<>]", "", x)),
                        .SDcols = c("publn", "applicant", "auth", "kind", "application", "cpc")] %>%
    .[, publnDate := as.POSIXct(publnDate)] %>%
    # select the columns we need
    .[, .(publn, name, publnNum, publnDate, cpc)] %>%
    # extract string after 'cpc/' and before '>' for cpc
    .[, cpc_code := str_extract(cpc, "(?<=cpc/)[^>]+")] %>%
    .[, .(cpc_code)] %>%
    # extract the characters before '-' as group (subclass + main group, e.g. H01M10)
    .[, group := str_extract(cpc_code, "^[^-]+")] %>%
    # extract the first 4 characters as subclass
    .[, subclass := str_sub(group, 1, 4)] %>%
    # extract the first 3 characters as class
    .[, class := str_sub(group, 1, 3)] %>%
    # extract the first character as section
    .[, section := str_sub(group, 1, 1)] -> cpc_hierarchy

head(cpc_hierarchy)

|cpc_code   |group  |subclass |class |section |
|:----------|:------|:--------|:-----|:-------|
|H01M10-52  |H01M10 |H01M     |H01   |H       |
|H01M50-204 |H01M50 |H01M     |H01   |H       |
|H01M50-317 |H01M50 |H01M     |H01   |H       |
|H01M50-24  |H01M50 |H01M     |H01   |H       |
|H02J7-02   |H02J7  |H02J     |H02   |H       |
|B60J10-30  |B60J10 |B60J     |B60   |B       |


In [12]:
cpc_hierarchy %>%
    .[, .N, by = .(class, subclass)] %>%
    .[order(-N)] %>%
    peep_head()



|class |subclass |    N|
|:-----|:--------|----:|
|H01   |H01M     | 3270|
|B60   |B60L     | 1389|
|B60   |B60K     |  592|
|H02   |H02J     |  337|
|C23   |C23C     |  316|

In [13]:
# visualize the hierarchy
cpc_tree <- vtree(cpc_hierarchy, c("section", "class", "subclass", "group"),
          keep = list(subclass = c("H01M", "B60L", "B60K", "H02J")),
          horiz = FALSE, showcount = FALSE, prunesmaller=90)
grVizToPNG(cpc_tree, width=900, folder="../images")

![cpc-tree](../images/cpc_tree.png)

As the tree above shows, most of the patents are classified under the following CPC sections, classes, and subclasses:

- H: Electricity
    - H01: Basic Electric Elements
        - H01M: Processes or Means, e.g. Batteries, for the Direct Conversion of Chemical into Electrical Energy
    - H02: Generation; Conversion or Distribution of Electric Power
        - H02J: Circuit Arrangements or Systems for Supplying or Distributing Electric Power; Systems for Storing Electric Energy
- B: Performing Operations; Transporting
    - B60: Vehicles in General
        - B60L: Propulsion of Electrically-Propelled Vehicles
        - B60K: Arrangement or mounting of propulsion units in vehicles

A smaller number of patents are classified under the following CPC classes:

- C: Chemistry; Metallurgy
    - C23: Coating Metallic Material; Coating Material with Metallic Material; Chemical Surface Treatment; Diffusion Treatment of Metallic Material; Coating by Vacuum Evaporation, by Sputtering, by Ion Implantation or by Chemical Vapour Deposition, in General; Inhibiting Corrosion of Metallic Material or Incrustation in General
    - C22: Metallurgy; Ferrous or Non-Ferrous Alloys; Treatment of Alloys or Non-Ferrous Metals

In other words, section H covers electricity, including the batteries themselves. Section B covers vehicles in general, and section C covers the chemistry and metallurgy of materials, such as those used in batteries.

This suggests that, to find automotive companies working on electric vehicles, we should look at the following CPC subclasses:

- H01M: Processes or Means, e.g. Batteries, for the Direct Conversion of Chemical into Electrical Energy
- H02J: Circuit Arrangements or Systems for Supplying or Distributing Electric Power; Systems for Storing Electric Energy
- B60L: Propulsion of Electrically-Propelled Vehicles
- B60K: Arrangement or mounting of propulsion units in vehicles
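Checking whether a CPC code belongs to one of these four subclasses only requires its first four characters, the same slice used for the `subclass` column above. This is a minimal base-R sketch. `is_ev_code` is a hypothetical helper, and the sample codes are taken from the outputs above.

```r
# The four subclasses identified above
ev_subclasses <- c("H01M", "H02J", "B60L", "B60K")

# TRUE when the subclass (first four characters of the code) is in `subclasses`
is_ev_code <- function(codes, subclasses = ev_subclasses) {
    substr(codes, 1, 4) %in% subclasses
}

codes <- c("H01M10-52", "H02J7-02", "B60L58-27", "B60J10-30")
is_ev_code(codes)
```

`B60J10-30` is flagged `FALSE` because B60J is not one of the four subclasses.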

## Query Firms that are working on Electric Vehicles

Now that we know `H01M`, `H02J`, `B60L`, and `B60K` are the main CPC subclasses around which electric vehicle patents cluster, we can query the firms working on electric vehicles and count the patents they have filed that were published by the EPO. How could we construct the query? We can start by listing the variables we want:

- `applicant`: name of applicant
- `publication`: published patent, publication number, publication date, etc.
- `cpc`: CPC class
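The query in the next cell restricts `?cpcCodeB3` to `H01M` with a FILTER. One way to cover all four subclasses at once would be to replace that FILTER with a SPARQL 1.1 `VALUES` block, which binds the variable to each listed URI in turn. The sketch below only builds the clause text. `build_values_clause` is a hypothetical helper, and the URIs follow the `http://data.epo.org/linked-data/def/cpc/` pattern used in the query. Whether the EPO endpoint answers the larger query in reasonable time is an open assumption. The fixed three-step `skos:broader` chain in the query also still applies.

```r
# Hypothetical helper: a SPARQL 1.1 VALUES clause binding `var` to CPC subclass URIs
build_values_clause <- function(var, subclasses,
                                base = "http://data.epo.org/linked-data/def/cpc/") {
    uris <- paste0("<", base, subclasses, ">")
    paste0("VALUES ", var, " { ", paste(uris, collapse = " "), " }")
}

cat(build_values_clause("?cpcCodeB3", c("H01M", "H02J", "B60L", "B60K")), sep = "\n")
```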

In [5]:
# this query will take around 30 seconds to run
query6 <- "
PREFIX cpc: <http://data.epo.org/linked-data/def/cpc/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX vcard: <http://www.w3.org/2006/vcard/ns#> 
PREFIX patent: <http://data.epo.org/linked-data/def/patent/>

SELECT *
WHERE {
    ?publn rdf:type patent:Publication;
           patent:applicantVC ?applicant;
           patent:publicationAuthority ?auth;
           patent:publicationNumber ?publnNum;
           patent:publicationKind ?kind;
           patent:publicationDate ?publnDate;
           patent:application ?application.
    ?application patent:classificationCPCInventive ?cpcCode.
    ?applicant vcard:fn ?name.
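    ?applicant vcard:hasAddress ?address.
    ?address patent:countryCode ?country.
    # three fixed hops: only CPC codes exactly three levels below H01M match;
    # the property path skos:broader+ would match any depth but may run slower
    ?cpcCode skos:broader ?cpcCodeB1.
    ?cpcCodeB1 skos:broader ?cpcCodeB2.
    ?cpcCodeB2 skos:broader ?cpcCodeB3.
    FILTER (?cpcCodeB3 = <http://data.epo.org/linked-data/def/cpc/H01M>)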
    FILTER (?publnDate >= xsd:date('2014-01-01'))
    FILTER (?publnDate <= xsd:date('2015-01-01'))
}
LIMIT 70000
"

query6_result <- SPARQL(endpoint, query6)$results

StartTag: invalid element name
xmlSAX2Characters: huge text node
Extra content at the end of the document
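The parser messages above come from the XML package. One plausible reading is that the response for the full one-year window was too large to parse cleanly, in which case some rows may be missing. A common workaround is to split the date range into smaller windows, run one query per window, and combine the results afterwards (for example with `rbind`). The sketch below only builds the windows and the FILTER clauses. `make_date_windows` and `window_filter` are hypothetical helpers, and quarterly windows are an arbitrary choice.

```r
# Hypothetical helper: split [from, to) into consecutive windows of length `by`
# (any unit accepted by seq.Date, e.g. "month" or "quarter")
make_date_windows <- function(from, to, by = "quarter") {
    from <- as.Date(from)
    to <- as.Date(to)
    bounds <- unique(c(seq(from, to, by = by), to))
    data.frame(start = bounds[-length(bounds)], end = bounds[-1])
}

# One FILTER clause per window, in the style of query6; the end date is exclusive
window_filter <- function(start, end) {
    sprintf("FILTER (?publnDate >= xsd:date('%s') && ?publnDate < xsd:date('%s'))",
            format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

# An exclusive end of 2015-01-02 keeps the inclusive '<= 2015-01-01' bound of query6
windows <- make_date_windows("2014-01-01", "2015-01-02")
filters <- window_filter(windows$start, windows$end)
cat(filters, sep = "\n")
```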


In [9]:
str(query6_result)

'data.frame':	4145 obs. of  14 variables:
 $ publn      : chr  "<http://data.epo.org/linked-data/data/publication/EP/2639871/B1/->" "<http://data.epo.org/linked-data/data/publication/EP/2639871/B1/->" "<http://data.epo.org/linked-data/data/publication/EP/1659653/B1/->" "<http://data.epo.org/linked-data/data/publication/EP/2755269/A1/->" ...
 $ applicant  : chr  "<http://data.epo.org/linked-data/data/vc/AEC648C9D6BD5416D6990A8C8B56CD4B>" "<http://data.epo.org/linked-data/data/vc/AD7240A4098F9C68371DD982972D590F>" "<http://data.epo.org/linked-data/data/vc/78BF375C39AA0FF0AB5FD3DAB39B88CC>" "<http://data.epo.org/linked-data/data/vc/A65408431534D01F56986B808B5A8DA8>" ...
 $ auth       : chr  "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" "<http://data.epo.org/linked-data/id/st3/EP>" ...
 $ publnNum   : chr  "2639871" "2639871" "1659653" "2755269" ...
 $ kind       : chr  "<http://data.epo.org/linked-