<a href="https://colab.research.google.com/github/herrkrueger/funwithipcxml/blob/main/ipcbrowser.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Fun with IPC XML, Python lxml and ElementTree

###Python Sources
* John Shipman's tutorial on [Python XML processing with lxml](https://www.academia.edu/38587906/Python_XML_processing_with_lxml)
* The ElementTree API on [Python.org](https://docs.python.org/3/library/xml.etree.elementtree.html)

###WIPO Links
* Current edition of the [IPC Master Files from WIPO's Download and IT support area](https://www.wipo.int/classifications/ipc/en/ITsupport/)
 * here is the [direct link to the zip file](https://www.wipo.int/ipc/itos4ipc/ITSupport_and_download_area//20210101/MasterFiles/ipc_scheme_images_20210101.zip)
* Documentation and XSDs are [here](https://www.wipo.int/classifications/ipc/en/ITsupport/Version20210101/documentation/IPCfiles.html)
 * especially the specification of the scheme file [here](https://www.wipo.int/ipc/itos4ipc/ITSupport_and_download_area/Documentation/20210101/IPC_scheme_specs_v3_1.docx)
* [Link](https://www.wipo.int/classifications/ipc/ipcpub/?notion=scheme&version=20210101&symbol=none&menulang=en&lang=en&viewmode=f&fipcpc=no&showdeleted=yes&indexes=no&headings=yes&notes=yes&direction=o2n&initial=A&cwid=none&tree=no&searchmode=smart) to the IPC Browser of WIPO


#First Sample Code

First, we import lxml, load the file (download it manually and put it here, next to the sample data), and print the tag and the attribute dictionary of each top-level element. These elements are the sections of the IPC tree. Their attributes are 'kind', 'symbol' and 'entryType'.

In [None]:
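from lxml import etree as ET

# Manually downloaded scheme file, stored next to this notebook
filename = "./EN_ipc_scheme_20210101.xml"
parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse(filename, parser=parser)
root = tree.getroot()

def ipcparser():
  # Print the tag and attribute dictionary of every top-level element (the IPC sections)
  for child in root:
    print(child.tag, child.attrib)

ipcparser()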

##What do we see? 

The output shows each element's **tag**, which includes the XML namespace (xmlns) the entry belongs to (the file uses only one namespace), and its **attributes**. The attributes are:

* 'kind' with its Values:
 * s = section
 * t = sub-section title
 * c = class
 * i = sub-class index
 * u = sub-class
 * g = guidance heading
 * m = main group
 * 1 to B = 11 levels of group (hexadecimal notation)
 * n = note
* 'symbol' with its Values:
 * The IPC symbol! That's the thing...
* 'entryType' with its Values:
 * K = classification symbol (default, i.e. for classification purpose only)
 * I = Indexing symbol  (i.e. for indexing purpose only)
 * D = Double purpose classification symbol (i.e. for both classification and indexing purpose) – existed only prior to the IPC reform
 * Z = problematic entry (i.e. structure and/or contents have been partially converted from CPC or FI)

Only entries with entryType 'K' are of interest to us.
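
The two observations above (the tag carries the namespace, and only entryType 'K' matters here) can be sketched as follows. This is a minimal illustration, not the notebook's own method: it uses the standard-library ElementTree and a tiny inline sample so that it runs without the downloaded file. The root element name `IPCScheme`, the sample entries, and the helper names `split_tag` and `classification_entries` are assumptions made for the example. Treating a missing entryType as 'K' is also an assumption, based on 'K' being described as the default.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the scheme file. The root element name and the
# entries are made up; only the attribute names come from the text above.
SAMPLE = """<IPCScheme xmlns="http://www.wipo.int/classifications/ipc/masterfiles">
  <ipcEntry kind="s" symbol="A" entryType="K"/>
  <ipcEntry kind="s" symbol="B" entryType="K"/>
  <ipcEntry kind="s" symbol="X" entryType="Z"/>
</IPCScheme>"""


def split_tag(tag):
    """Split a '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return None, tag


def classification_entries(parent):
    """Yield direct children whose entryType is 'K'.

    Assumption: a missing entryType attribute counts as 'K' (the default).
    """
    for child in parent:
        if child.get("entryType", "K") == "K":
            yield child


sample_root = ET.fromstring(SAMPLE)
for entry in classification_entries(sample_root):
    namespace, local = split_tag(entry.tag)
    print(local, entry.get("symbol"), entry.get("entryType"))
```

With lxml, the same loop should work on the real `root`, since element iteration and `.get()` behave the same way there.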

Dictionaries that map each kind to its hierarchy level and to the title of that level:
```
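# Hierarchy level of each kind ('g' and 'm' share level 4)
kind_to_level = {
  's': 1,
  'c': 2,
  'u': 3,
  'g': 4,
  'm': 4,
  '1': 5,
  '2': 6,
  '3': 7,
  '4': 8,
  '5': 9,
  '6': 10,
  '7': 11,
  '8': 12,
  '9': 13,
  'A': 14,
  'B': 15}

# Human-readable title of each kind; the dots mirror the subgroup depth
kind_to_levelTitle = {
  's': 'section',
  't': 'sub-section title',
  'c': 'class',
  'i': 'sub-class index',
  'u': 'sub-class',
  'g': 'guidance heading',
  'm': 'main group',
  '1': '.subgroup',
  '2': '..subgroup',
  '3': '...subgroup',
  '4': '....subgroup',
  '5': '.....subgroup',
  '6': '......subgroup',
  '7': '.......subgroup',
  '8': '........subgroup',
  '9': '.........subgroup',
  'A': '..........subgroup',
  'B': '...........subgroup',
  'n': 'note'}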
```
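
As a rough sketch of how the two dictionaries might be used together, the hypothetical helper `describe` below indents a symbol by its level and appends the level title. The dictionaries are shortened copies so that the block runs on its own; kinds without a level (such as notes) are simply printed without indentation, which is a choice made for this example.

```python
# Shortened copies of the two dictionaries above, so the sketch is self-contained.
kind_to_level = {'s': 1, 'c': 2, 'u': 3, 'g': 4, 'm': 4, '1': 5, '2': 6}
kind_to_levelTitle = {
    's': 'section', 'c': 'class', 'u': 'sub-class', 'm': 'main group',
    '1': '.subgroup', '2': '..subgroup', 'n': 'note'}


def describe(kind, symbol):
    """Return 'symbol (title)', indented by two spaces per level below the section."""
    title = kind_to_levelTitle.get(kind, 'unknown kind')
    level = kind_to_level.get(kind)
    if level is None:
        # Kinds such as 'n' (note) have no level entry: no indentation.
        return f"{symbol} ({title})"
    return "  " * (level - 1) + f"{symbol} ({title})"


for kind, symbol in [('s', 'A'), ('c', 'A01'), ('u', 'A01B')]:
    print(describe(kind, symbol))
```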

#Next Sample Code

We iterate two levels down from the sections and print a list of sections, classes and sub-classes.

In [None]:
from lxml import etree as ET

filename = "./EN_ipc_scheme_20210101.xml"

parser = ET.XMLParser(remove_blank_text=True)
tree = ET.parse(filename, parser=parser)
root = tree.getroot()

# Fully qualified tag name in the form '{namespace}localname'
ipcEntry = '{http://www.wipo.int/classifications/ipc/masterfiles}ipcEntry'

def ipcParser():
  # top level: sections (only ipcEntry elements are visited)
  for sections in root.iterchildren(tag=ipcEntry):
    print('1st level sections: ', sections.attrib['symbol'], " kind:", sections.attrib['kind'])

    # go one level deeper to classes
    for classes in sections.iterchildren(tag=ipcEntry):
      print('2nd level classes: ', classes.attrib['symbol'], " kind:", classes.attrib['kind'])

      # go one level deeper to sub-classes
      for subclasses in classes.iterchildren(tag=ipcEntry):
        print('3rd level sub classes: ', subclasses.attrib['symbol'], " kind:", subclasses.attrib['kind'])

ipcParser()
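
The nested loops above repeat the same step at every level, so one possible generalisation is a small recursive generator. The sketch below is an illustration under assumptions, not part of the original notebook: it uses the standard-library ElementTree and an inline sample (root element name `IPCScheme` and the three entries are made up), and the names `walk` and `max_depth` are hypothetical.

```python
import xml.etree.ElementTree as ET

IPC_ENTRY = '{http://www.wipo.int/classifications/ipc/masterfiles}ipcEntry'

# Hypothetical miniature of the scheme file: section > class > sub-class.
SAMPLE = """<IPCScheme xmlns="http://www.wipo.int/classifications/ipc/masterfiles">
  <ipcEntry kind="s" symbol="A">
    <ipcEntry kind="c" symbol="A01">
      <ipcEntry kind="u" symbol="A01B"/>
    </ipcEntry>
  </ipcEntry>
</IPCScheme>"""


def walk(element, depth=1, max_depth=3):
    """Yield (depth, symbol, kind) for nested ipcEntry elements, depth-first."""
    if depth > max_depth:
        return
    # findall with a plain tag name matches direct children only
    for entry in element.findall(IPC_ENTRY):
        yield depth, entry.get('symbol'), entry.get('kind')
        yield from walk(entry, depth + 1, max_depth)


for depth, symbol, kind in walk(ET.fromstring(SAMPLE)):
    print('  ' * (depth - 1) + f'{symbol} kind: {kind}')
```

Raising `max_depth` would descend further into main groups and subgroups. On the full file this produces a lot of output, so a small limit is a sensible starting point.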