## Evolution of patenting activity in Europe - An analysis of technological fields

In this notebook we will study how patenting activity in Europe (applications filed with the EPO, authority `EP`) has evolved, focusing on individual technological fields as defined by the World Intellectual Property Organization (WIPO). We will obtain the raw data from PATSTAT Global, process the resulting DataFrame in TIP, and use a third-party library to present the results in an engaging way.

### The WIPO technology fields
WIPO has created a comprehensive technology concordance table that is particularly useful for studying general technology trends, where patent classification schemes may be too granular. This concordance table is based on the International Patent Classification (IPC), a highly detailed technology classification system applied to patents worldwide. The WIPO technology concordance comprises 35 technology fields, grouped into five higher-level technology sectors: Electrical engineering, Instruments, Chemistry, Mechanical engineering, and Other fields.

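PATSTAT Global applies this concordance to each application's IPC symbols and stores the resulting WIPO technology fields in table `TLS230_APPLN_TECHN_FIELD`, which will be the basis of our analysis. Each row links one application to one technology field together with a weight, so an application whose IPC symbols fall into several fields has several rows.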
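To make the shape of `TLS230_APPLN_TECHN_FIELD` concrete, here is a toy sketch with made-up rows (the IDs, field numbers and weights are invented). It assumes, as we understand the table, that the weights of one application are fractional shares that add up to 1 across its fields.

```python
import pandas as pd

# Hypothetical rows shaped like TLS230_APPLN_TECHN_FIELD (appln_id, techn_field_nr, weight)
tls230_toy = pd.DataFrame({
    "appln_id":       [101, 101, 102, 103, 103, 103],
    "techn_field_nr": [6, 7, 8, 13, 15, 16],
    "weight":         [0.5, 0.5, 1.0, 0.5, 0.25, 0.25],
})

# Number of technology fields each application is mapped to
fields_per_appln = tls230_toy.groupby("appln_id")["techn_field_nr"].nunique()

# Assumption: the weights of one application sum to 1 (fractional shares)
weight_totals = tls230_toy.groupby("appln_id")["weight"].sum()

print(fields_per_appln)
print(weight_totals)
```

In this toy data, application 103 has three rows, which is exactly why the real query below returns more rows than applications.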

## Initialization
As with any analysis involving PATSTAT, we first need to initialize the client. This notebook is published with the client initialized in production (`env='PROD'`). If you want to modify it, we recommend initializing the client with `env='TEST'` to reduce query processing time. When you are happy with your code and ready to conduct the real analysis, switch the environment back to production.

We will use object-relational mapping (ORM), the recommended method for retrieving data from PATSTAT. For that, we need to import the three PATSTAT tables we will be working with.

In [1]:
from epo.tipdata.patstat import PatstatClient

# Initialize the PATSTAT client
patstat = PatstatClient(env='PROD')

# Access ORM
db = patstat.orm()

# Importing the necessary tables
from epo.tipdata.patstat.database.models import TLS201_APPLN, TLS901_TECH_FIELD_IPC, TLS230_APPLN_TECHN_FIELD

# Retrieving the data

In [3]:
q = db.query(
    TLS201_APPLN.appln_id,
    TLS201_APPLN.appln_filing_year.label('Year'),
    TLS230_APPLN_TECHN_FIELD.techn_field_nr,
    TLS230_APPLN_TECHN_FIELD.weight
).join(
    TLS230_APPLN_TECHN_FIELD, TLS201_APPLN.appln_id == TLS230_APPLN_TECHN_FIELD.appln_id
).filter(
    TLS201_APPLN.appln_filing_year <= 2022, 
    TLS201_APPLN.appln_auth == 'EP',
)

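res = patstat.df(q)

# Assumed control: the same EP applications (filed up to 2022) without the
# technology-field join, i.e. one row per application. `control` is not
# defined elsewhere in this notebook, so this definition is an assumption.
control_q = db.query(TLS201_APPLN.appln_id).filter(
    TLS201_APPLN.appln_filing_year <= 2022,
    TLS201_APPLN.appln_auth == 'EP',
)
control = patstat.df(control_q)

print(len(control))
# res should be larger than control since some applications have more than one technical field
print(len(res))
res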

4472524
6725372


| | appln_id | Year | techn_field_nr | weight |
|---|---|---|---|---|
| 0 | 57049316 | 2009 | 22 | 0.076923 |
| 1 | 365818931 | 2005 | 22 | 0.200000 |
| 2 | 531427378 | 2019 | 22 | 0.500000 |
| 3 | 323978875 | 2010 | 22 | 0.166667 |
| 4 | 16014792 | 2003 | 22 | 0.200000 |
| ... | ... | ... | ... | ... |
| 6725367 | 472111056 | 2016 | 35 | 1.000000 |
| 6725368 | 542847099 | 2020 | 35 | 0.500000 |
| 6725369 | 16323747 | 2006 | 35 | 1.000000 |
| 6725370 | 16713823 | 1988 | 35 | 1.000000 |
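The comment in the cell above relies on how an inner join behaves: an application linked to several technology fields appears once per field, while an application with no row in `TLS230_APPLN_TECHN_FIELD` disappears. A toy pandas sketch with invented IDs shows both effects. This is also why the number of distinct `appln_id` values found later (`Length: 4294450`) is smaller than the number of rows in `res`.

```python
import pandas as pd

# Hypothetical applications table: one row per application
applications = pd.DataFrame({"appln_id": [1, 2, 3, 4], "Year": [2000, 2001, 2001, 2002]})

# Hypothetical field assignments: app 1 has two fields, app 2 three, app 3 one, app 4 none
fields = pd.DataFrame({
    "appln_id":       [1, 1, 2, 2, 2, 3],
    "techn_field_nr": [6, 7, 6, 8, 9, 13],
    "weight":         [0.5, 0.5, 0.4, 0.4, 0.2, 1.0],
})

# An inner join, like the ORM .join() above, yields one row per (application, field) pair
joined = applications.merge(fields, on="appln_id", how="inner")

print(len(applications), len(joined), joined["appln_id"].nunique())
```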


# Get the top weight per application
To simplify the analysis, we keep only the technology field with the highest weight for each application. We first get the index of the row holding the top weight for each application.

In [5]:
max_weight_indices = res.groupby('appln_id')['weight'].idxmax()

max_weight_indices

appln_id
1             840316
2            3262701
3            4778767
4            1731950
5            1777783
              ...   
603870177     128631
603878831     546834
604061543    2626863
604120620    5538119
604120624     573141
Name: weight, Length: 4294450, dtype: int64
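`idxmax` returns the label of the first maximum it encounters, so when an application splits its weight evenly (for example 0.5 and 0.5), whichever field happens to come first in `res` wins. The sketch below uses invented rows to show this, together with an alternative (our own choice, not part of the original analysis) that makes the tie-break explicit: sort by weight descending, then by field number, and keep the first row per application.

```python
import pandas as pd

# Hypothetical rows: application 10 has a tie between fields 7 and 6
res_toy = pd.DataFrame({
    "appln_id":       [10, 10, 11, 11],
    "Year":           [2005, 2005, 2006, 2006],
    "techn_field_nr": [7, 6, 3, 9],
    "weight":         [0.5, 0.5, 0.25, 0.75],
})

# Approach used in the notebook: first row holding the maximum weight
by_idxmax = res_toy.loc[res_toy.groupby("appln_id")["weight"].idxmax()]

# Explicit tie-break: highest weight first, then lowest field number
by_sort = (
    res_toy.sort_values(["appln_id", "weight", "techn_field_nr"],
                        ascending=[True, False, True])
           .drop_duplicates(subset="appln_id", keep="first")
)

print(by_idxmax[["appln_id", "techn_field_nr"]])
print(by_sort[["appln_id", "techn_field_nr"]])
```

Both give one row per application; they differ only for tied weights, where the sorted version is reproducible regardless of row order.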

# Using the indices to keep the top technology field
With the indices of the top weight per application, we filter the DataFrame to keep only the top technology field of each unique application.

In [6]:
# Step 2: Use these indices to filter the DataFrame
top_weights = res.loc[max_weight_indices]
top_weights


| | appln_id | Year | techn_field_nr | weight |
|---|---|---|---|---|
| 840316 | 1 | 2000 | 3 | 0.333333 |
| 3262701 | 2 | 1992 | 15 | 0.800000 |
| 4778767 | 3 | 2000 | 24 | 1.000000 |
| 1731950 | 4 | 2000 | 8 | 1.000000 |
| 1777783 | 5 | 2000 | 8 | 1.000000 |
| ... | ... | ... | ... | ... |
| 128631 | 603870177 | 2019 | 1 | 1.000000 |
| 546834 | 603878831 | 2009 | 2 | 1.000000 |
| 2626863 | 604061543 | 2021 | 13 | 1.000000 |
| 5538119 | 604120620 | 2016 | 29 | 1.000000 |


## Drop the weight column
We do not need to see the weight column anymore.

In [7]:
top_weights = top_weights.drop(columns=['weight'])
top_weights

| | appln_id | Year | techn_field_nr |
|---|---|---|---|
| 840316 | 1 | 2000 | 3 |
| 3262701 | 2 | 1992 | 15 |
| 4778767 | 3 | 2000 | 24 |
| 1731950 | 4 | 2000 | 8 |
| 1777783 | 5 | 2000 | 8 |
| ... | ... | ... | ... |
| 128631 | 603870177 | 2019 | 1 |
| 546834 | 603878831 | 2009 | 2 |
| 2626863 | 604061543 | 2021 | 13 |
| 5538119 | 604120620 | 2016 | 29 |


In [8]:
# Group by 'Year' and 'techn_field_nr' and count the number of 'appln_id'
aggregated = top_weights.groupby(['Year', 'techn_field_nr'])['appln_id'].count().reset_index()

# Rename the columns for clarity
#aggregated.columns = ['Year', 'field', 'Applications']

# Display the aggregated DataFrame
aggregated



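| | Year | techn_field_nr | appln_id |
|---|---|---|---|
| 0 | 1978 | 1 | 202 |
| 1 | 1978 | 2 | 45 |
| 2 | 1978 | 3 | 35 |
| 3 | 1978 | 4 | 9 |
| 4 | 1978 | 5 | 10 |
| ... | ... | ... | ... |
| 1562 | 2022 | 31 | 1872 |
| 1563 | 2022 | 32 | 4137 |
| 1564 | 2022 | 33 | 1333 |
| 1565 | 2022 | 34 | 1554 |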
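Counting only the top field per application is a simplification. A common alternative, not used in this notebook and sketched here on invented data, is fractional counting: sum the weights per year and field instead of counting applications, so each application contributes 1 in total spread over its fields (assuming, as above, that an application's weights add up to 1).

```python
import pandas as pd

# Hypothetical rows shaped like `res`
res_toy = pd.DataFrame({
    "appln_id":       [1, 1, 2, 3],
    "Year":           [2020, 2020, 2020, 2021],
    "techn_field_nr": [6, 7, 6, 7],
    "weight":         [0.5, 0.5, 1.0, 1.0],
})

# Fractional counts: each application's weight is spread over its fields
fractional = (res_toy.groupby(["Year", "techn_field_nr"])["weight"]
                     .sum()
                     .reset_index(name="Applications"))

# Whole counts on the top field only (the notebook's approach), for comparison
top = res_toy.loc[res_toy.groupby("appln_id")["weight"].idxmax()]
whole = (top.groupby(["Year", "techn_field_nr"])["appln_id"]
            .count()
            .reset_index(name="Applications"))

print(fractional)
print(whole)
```

Both tables add up to the same number of applications; the fractional version simply keeps the secondary fields instead of dropping them.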


# Retrieving the names of the technology fields

In [10]:
q = db.query(
    TLS901_TECH_FIELD_IPC.techn_field_nr,
    TLS901_TECH_FIELD_IPC.techn_field,
)
fields = patstat.df(q)

fields

| | techn_field_nr | techn_field |
|---|---|---|
| 0 | 1 | Electrical machinery, apparatus, energy |
| 1 | 1 | Electrical machinery, apparatus, energy |
| 2 | 1 | Electrical machinery, apparatus, energy |
| 3 | 1 | Electrical machinery, apparatus, energy |
| 4 | 1 | Electrical machinery, apparatus, energy |
| ... | ... | ... |
| 765 | 35 | Civil engineering |
| 766 | 35 | Civil engineering |
| 767 | 35 | Civil engineering |
| 768 | 35 | Civil engineering |


## Drop the duplicates
We just need one instance of each tech field number and its description, so we drop the duplicates.

In [11]:
unique_fields = fields.drop_duplicates()
unique_fields

| | techn_field_nr | techn_field |
|---|---|---|
| 0 | 1 | Electrical machinery, apparatus, energy |
| 30 | 2 | Audio-visual technology |
| 48 | 3 | Telecommunications |
| 58 | 4 | Digital communication |
| 61 | 5 | Basic communication processes |
| 71 | 6 | Computer technology |
| 88 | 7 | IT methods for management |
| 89 | 8 | Semiconductors |
| 91 | 9 | Optics |
| 101 | 10 | Measurement |


# Merging the two dataframes

In [12]:
# Attach the field name to each (Year, techn_field_nr) count
merged = aggregated.merge(unique_fields, on='techn_field_nr', how='inner')
merged

| | Year | techn_field_nr | appln_id | techn_field |
|---|---|---|---|---|
| 0 | 1978 | 1 | 202 | Electrical machinery, apparatus, energy |
| 1 | 1978 | 2 | 45 | Audio-visual technology |
| 2 | 1978 | 3 | 35 | Telecommunications |
| 3 | 1978 | 4 | 9 | Digital communication |
| 4 | 1978 | 5 | 10 | Basic communication processes |
| ... | ... | ... | ... | ... |
| 1562 | 2022 | 31 | 1872 | Mechanical elements |
| 1563 | 2022 | 32 | 4137 | Transport |
| 1564 | 2022 | 33 | 1333 | Furniture, games |
| 1565 | 2022 | 34 | 1554 | Other consumer goods |
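The `drop_duplicates` step matters here: `TLS901_TECH_FIELD_IPC` repeats each field number many times, and merging against the raw table would multiply the rows of `aggregated`. pandas can guard against this with the `validate` argument of `merge`, which raises a `MergeError` when the right-hand keys are not unique. The toy sketch below uses invented counts to show both cases.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical counts shaped like `aggregated`
aggregated_toy = pd.DataFrame({"Year": [2020, 2020], "techn_field_nr": [1, 2], "appln_id": [5, 3]})

# Lookup with repeated keys, like TLS901_TECH_FIELD_IPC before de-duplication
fields_raw = pd.DataFrame({
    "techn_field_nr": [1, 1, 2],
    "techn_field": ["Electrical machinery, apparatus, energy",
                    "Electrical machinery, apparatus, energy",
                    "Audio-visual technology"],
})

# many_to_one requires each key to appear at most once on the right-hand side
try:
    aggregated_toy.merge(fields_raw, on="techn_field_nr", how="inner", validate="many_to_one")
    raised = False
except MergeError:
    raised = True

# After de-duplication the check passes and the row count is preserved
unique_toy = fields_raw.drop_duplicates()
merged_toy = aggregated_toy.merge(unique_toy, on="techn_field_nr", how="inner",
                                  validate="many_to_one")
print(raised, len(merged_toy))
```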


# Drop the column for tech field number and rename

In [14]:
final = merged.drop(columns=['techn_field_nr'])
final = final.rename(columns={'appln_id': 'Applications', 'techn_field': 'Field'})
final


| | Year | Applications | Field |
|---|---|---|---|
| 0 | 1978 | 202 | Electrical machinery, apparatus, energy |
| 1 | 1978 | 45 | Audio-visual technology |
| 2 | 1978 | 35 | Telecommunications |
| 3 | 1978 | 9 | Digital communication |
| 4 | 1978 | 10 | Basic communication processes |
| ... | ... | ... | ... |
| 1562 | 2022 | 1872 | Mechanical elements |
| 1563 | 2022 | 4137 | Transport |
| 1564 | 2022 | 1333 | Furniture, games |
| 1565 | 2022 | 1554 | Other consumer goods |


# Pivot the dataframe

In [15]:
# Pivot the DataFrame with 'Year' as the index and 'Field' as the columns
pivoted = final.pivot(index='Year', columns='Field', values='Applications')

# Replace NaN values with 0
pivoted = pivoted.fillna(0)

# Convert the values to integers
pivoted = pivoted.astype(int)

# Display the pivoted DataFrame
pivoted


| Year | Analysis of biological materials | Audio-visual technology | Basic communication processes | Basic materials chemistry | Biotechnology | Chemical engineering | Civil engineering | Computer technology | Control | Digital communication | ... | Organic fine chemistry | Other consumer goods | Other special machines | Pharmaceuticals | Semiconductors | Surface technology, coating | Telecommunications | Textile and paper machines | Thermal processes and apparatus | Transport |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1978 | 7 | 45 | 10 | 141 | 54 | 94 | 100 | 15 | 51 | 9 | ... | 697 | 15 | 136 | 154 | 106 | 41 | 35 | 70 | 71 | 106 |
| 1979 | 70 | 294 | 123 | 515 | 125 | 346 | 411 | 211 | 156 | 68 | ... | 1531 | 129 | 439 | 349 | 232 | 217 | 194 | 366 | 273 | 384 |
| 1980 | 115 | 586 | 282 | 895 | 292 | 689 | 675 | 455 | 271 | 125 | ... | 1901 | 286 | 851 | 465 | 483 | 376 | 381 | 657 | 465 | 682 |
| 1981 | 144 | 783 | 348 | 1022 | 371 | 852 | 881 | 659 | 372 | 126 | ... | 2259 | 436 | 1056 | 589 | 551 | 566 | 508 | 915 | 699 | 956 |
| 1982 | 145 | 1041 | 421 | 1115 | 466 | 883 | 962 | 732 | 431 | 122 | ... | 2341 | 473 | 1310 | 635 | 665 | 511 | 523 | 927 | 590 | 1033 |
| 1983 | 202 | 1389 | 482 | 1151 | 601 | 1018 | 1090 | 862 | 464 | 195 | ... | 2480 | 553 | 1364 | 800 | 696 | 539 | 658 | 1033 | 587 | 1093 |
| 1984 | 242 | 1601 | 567 | 1285 | 954 | 1136 | 1323 | 972 | 541 | 223 | ... | 2738 | 615 | 1646 | 910 | 801 | 739 | 683 | 1257 | 682 | 1275 |
| 1985 | 272 | 1682 | 571 | 1251 | 1110 | 1275 | 1321 | 1071 | 491 | 271 | ... | 2901 | 667 | 1717 | 1148 | 895 | 750 | 805 | 1184 | 661 | 1349 |
| 1986 | 280 | 1863 | 695 | 1442 | 1274 | 1400 | 1413 | 1250 | 564 | 281 | ... | 2775 | 720 | 1890 | 1363 | 958 | 805 | 915 | 1303 | 676 | 1591 |
| 1987 | 317 | 2033 | 623 | 1638 | 1427 | 1427 | 1331 | 1381 | 617 | 270 | ... | 3063 | 776 | 1985 | 1497 | 977 | 880 | 962 | 1372 | 640 | 1671 |
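`pivot` reshapes the long table into one row per year and one column per field. A (year, field) combination with no applications would become NaN, which is why the cell fills missing values with 0 before casting to `int`; `pivot` also refuses to run if a (year, field) pair appears twice. A toy sketch with invented counts:

```python
import pandas as pd

# Hypothetical long table shaped like `final`
final_toy = pd.DataFrame({
    "Year": [1978, 1978, 1979],
    "Applications": [10, 4, 7],
    "Field": ["Electrical machinery, apparatus, energy",
              "Audio-visual technology",
              "Audio-visual technology"],
})

wide = final_toy.pivot(index="Year", columns="Field", values="Applications")
has_gap = wide.isna().any().any()   # 1979 has no "Electrical machinery" row
wide = wide.fillna(0).astype(int)

# A duplicated (Year, Field) pair makes pivot raise a ValueError
dup = pd.concat([final_toy, final_toy.iloc[[0]]])
try:
    dup.pivot(index="Year", columns="Field", values="Applications")
    pivot_rejects_duplicates = False
except ValueError:
    pivot_rejects_duplicates = True

print(wide)
```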


# Keep only even years to smooth the transitions

In [16]:
# Filter the DataFrame to include only even years
pivoted_even_years = pivoted[pivoted.index % 2 == 0]

# Display the filtered DataFrame
pivoted_even_years


| Year | Analysis of biological materials | Audio-visual technology | Basic communication processes | Basic materials chemistry | Biotechnology | Chemical engineering | Civil engineering | Computer technology | Control | Digital communication | ... | Organic fine chemistry | Other consumer goods | Other special machines | Pharmaceuticals | Semiconductors | Surface technology, coating | Telecommunications | Textile and paper machines | Thermal processes and apparatus | Transport |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1978 | 7 | 45 | 10 | 141 | 54 | 94 | 100 | 15 | 51 | 9 | ... | 697 | 15 | 136 | 154 | 106 | 41 | 35 | 70 | 71 | 106 |
| 1980 | 115 | 586 | 282 | 895 | 292 | 689 | 675 | 455 | 271 | 125 | ... | 1901 | 286 | 851 | 465 | 483 | 376 | 381 | 657 | 465 | 682 |
| 1982 | 145 | 1041 | 421 | 1115 | 466 | 883 | 962 | 732 | 431 | 122 | ... | 2341 | 473 | 1310 | 635 | 665 | 511 | 523 | 927 | 590 | 1033 |
| 1984 | 242 | 1601 | 567 | 1285 | 954 | 1136 | 1323 | 972 | 541 | 223 | ... | 2738 | 615 | 1646 | 910 | 801 | 739 | 683 | 1257 | 682 | 1275 |
| 1986 | 280 | 1863 | 695 | 1442 | 1274 | 1400 | 1413 | 1250 | 564 | 281 | ... | 2775 | 720 | 1890 | 1363 | 958 | 805 | 915 | 1303 | 676 | 1591 |
| 1988 | 410 | 2280 | 812 | 1684 | 1635 | 1524 | 1635 | 1584 | 772 | 371 | ... | 3420 | 812 | 2115 | 1732 | 1248 | 986 | 1068 | 1783 | 691 | 1976 |
| 1990 | 377 | 3111 | 908 | 2118 | 2092 | 1919 | 1947 | 2469 | 837 | 587 | ... | 3713 | 966 | 2498 | 2294 | 1636 | 1109 | 1612 | 2318 | 801 | 2290 |
| 1992 | 337 | 2978 | 825 | 1989 | 2073 | 1814 | 1607 | 2373 | 764 | 660 | ... | 3434 | 867 | 2282 | 2436 | 1509 | 1076 | 1828 | 2130 | 823 | 2138 |
| 1994 | 374 | 2879 | 931 | 2243 | 2084 | 1997 | 1850 | 2532 | 802 | 827 | ... | 3340 | 903 | 2125 | 2815 | 1415 | 994 | 2087 | 2108 | 896 | 2218 |
| 1996 | 441 | 3457 | 972 | 2410 | 2597 | 1960 | 1964 | 3101 | 955 | 1382 | ... | 3552 | 1061 | 2308 | 3429 | 1665 | 1154 | 2729 | 2431 | 903 | 2957 |
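Keeping only even years halves the number of periods but also discards the odd-year filings. If the goal is simply fewer, smoother steps, one alternative (not used in this notebook, shown on invented numbers) is to sum consecutive pairs of years into two-year bins, so every application is still counted. The helper `two_year_bin` is a hypothetical name introduced for this sketch.

```python
import pandas as pd

# Hypothetical wide table shaped like `pivoted`
pivoted_toy = pd.DataFrame(
    {"Optics": [1, 2, 3, 4], "Transport": [10, 20, 30, 40]},
    index=pd.Index([1978, 1979, 1980, 1981], name="Year"),
)

# Notebook approach: keep even years only
even_only = pivoted_toy[pivoted_toy.index % 2 == 0]

def two_year_bin(year):
    """Map a year to the even year that opens its two-year bin (1979 -> 1978)."""
    return year - year % 2

# Alternative: group index labels by their bin and sum within each bin
two_year = pivoted_toy.groupby(two_year_bin).sum()

print(even_only)
print(two_year)
```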


# Create the video

In [17]:
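# bar_chart_race is a third-party package (pip install bar_chart_race);
# saving to .mp4 also requires ffmpeg to be available on the system
import bar_chart_race as bcr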
bcr.bar_chart_race(
    df=pivoted_even_years,
    filename='race.mp4',
    orientation='h',
    sort='desc',
    n_bars=10,
    fixed_order=False,
    fixed_max=False,  
    steps_per_period=720,
    period_length=5000,
    interpolate_period=True,
    period_label={'x': .99, 'y': .25, 'ha': 'right', 'va': 'center'},
    period_fmt='{x:.0f}',  # Use .0f to ensure years are shown as integers
    #perpendicular_bar_func='median',
    cmap='dark12',
    title='Applications per year',
    bar_label_size=7,
    tick_label_size=7,
    scale='linear',
    fig=None,
    writer=None,
    bar_kwargs={'alpha': .7},
    filter_column_colors=True
)

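In the published run, this cell failed with `ModuleNotFoundError: No module named 'bar_chart_race'` because the package was not installed in that environment. Install it first (for example with `pip install bar_chart_race`) and then re-run the cell.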
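Rendering the video can take a while, so it may help to check beforehand which fields should appear in each frame. As we understand `bar_chart_race`, with `sort='desc'` and `n_bars=10` each period shows the ten largest values of that row. The sketch below computes the same kind of ranking directly in pandas on invented numbers (with `n=2` for brevity); `top_fields` is a hypothetical helper, not part of `bar_chart_race`.

```python
import pandas as pd

# Hypothetical wide table shaped like `pivoted_even_years`
pivoted_toy = pd.DataFrame(
    {"Optics": [5, 9], "Transport": [8, 7], "Computer technology": [3, 12]},
    index=pd.Index([1980, 1982], name="Year"),
)

def top_fields(df, n):
    """Return, for each year, the n fields with the most applications, largest first."""
    return {year: row.nlargest(n).index.tolist() for year, row in df.iterrows()}

ranking = top_fields(pivoted_toy, n=2)
print(ranking)
```

Running `top_fields(pivoted_even_years, n=10)` on the real table would list the bars expected in each period of the video.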

In [None]:
from IPython.display import Video

# Ensure the correct path to your MP4 file
Video("race.mp4", embed=True)
