# Overview of the U.S. Patent Classification System (USPC) 

## Background Info
A class generally delineates one technology from another. Subclasses delineate processes, structural features, and functional features of the subject matter encompassed within the scope of a class.

A class/subclass pair of identifiers uniquely identifies a subclass within a class (for
example, the identifier “2/456” represents Class 2, Apparel, subclass 456, Body cover).

A USPC classification uniquely identifies one of the more than 150,000 subclasses in the
USPC. Because subclass identifiers may be repeated among the more than 450 classes, a
USPC classification must include both a class and a subclass. 
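
As a small illustration of why both parts are needed, the sketch below splits an identifier such as "2/456" into its class and subclass. The function name `parse_uspc` is hypothetical, and it assumes the simple `class/subclass` form quoted above. Both parts are kept as strings, on the assumption that not every identifier is a plain integer.

```python
def parse_uspc(identifier):
    """Split a 'class/subclass' identifier into a (class, subclass) tuple."""
    class_id, sep, subclass_id = identifier.strip().partition("/")
    # Reject identifiers that lack either half of the pair
    if not sep or not class_id or not subclass_id:
        raise ValueError("expected 'class/subclass', got %r" % identifier)
    return class_id, subclass_id

print(parse_uspc("2/456"))
```

A bare subclass such as "456" is rejected, since on its own it could belong to any class.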

A class schedule is a listing of all the subclasses in a class in top-to-bottom order of classification precedence, with the most complex and comprehensive subject matter generally at the top of the schedule, and the least complex and comprehensive at the bottom.

The United States Patent Classification Standards and Procedures (USPCLASP) is
the official guide for conducting a reclassification project, including classifying
documents into the USPC, creating new classes and subclasses, and modifying or
abolishing existing classes and subclasses. 

Other Considerations:
- Every U.S. patent document has at least one mandatory classification, and may optionally include one or more discretionary classifications.
- “Invention information” is the technical subject matter disclosed in a document that is new and non-obvious to one having ordinary skill in the technical field. “Other” information is non-trivial, technical subject matter that is not invention information, but which otherwise clearly teaches or illustrates a principle that would be useful for search purposes.

## How Class is Determined

Classification is determined using the claims as a guide.

U.S. patents receive a mandatory classification for all claimed disclosure. The claims are read in conjunction with the specification, since the claims define the invention information.

Each claim is classified separately, and the patent is assigned a classification for each.

Disclosure that is both novel and non-obvious is considered “invention information.” Published patent application documents may additionally be assigned discretionary classifications based on other, non-invention-information disclosure.


.... 


Patent applications classified in particular subclasses are generally examined by the
examining personnel responsible for those subclasses. 


https://www.uspto.gov/sites/default/files/patents/resources/classification/overview.pdf

Potentially helpful package:
https://github.com/daneads/pypatent

In [1]:
import ast
import json
from datetime import datetime

import numpy as np
import pandas as pd
import requests

# Data Dive

In [4]:
pwd

u'/projects/cps2019_funding/shared/Patents_Data/classification'

In [5]:
pat0401 = pd.read_csv("/projects/cps2019_funding/shared/Patents_Data/patentData/patents_200401")

In [6]:
pat0405 = pd.read_csv("/projects/cps2019_funding/shared/Patents_Data/patentData/patents_200405")

In [7]:
pat0409 = pd.read_csv("/projects/cps2019_funding/shared/Patents_Data/patentData/patents_200409")

In [8]:
pat0401.head()

Unnamed: 0.1,Unnamed: 0,IPCs,applications,assignees,cited_patents,cpcs,nbers,patent_abstract,patent_id,patent_type,uspcs,wipos
0,0,"[{u'ipc_section': u'H', u'ipc_subclass': u'R',...","[{u'app_type': u'10', u'app_id': u'10/592576'}]","[{u'assignee_key_id': u'487946', u'assignee_ci...",[{u'cited_patent_category': u'cited by examine...,[{u'cpc_subgroup_title': u'Stereophonic arrang...,"[{u'nber_category_title': None, u'nber_subcate...","A Closed Loop Headphone Apparatus with 1, 2, 3...",10028058,utility,"[{u'uspc_mainclass_title': None, u'uspc_sequen...",[{u'wipo_field_title': u'Audio-visual technolo...
1,1,"[{u'ipc_section': u'C', u'ipc_subclass': u'N',...","[{u'app_type': u'10', u'app_id': u'10/829504'}]","[{u'assignee_key_id': u'129714', u'assignee_ci...",[{u'cited_patent_category': u'cited by applica...,[{u'cpc_subgroup_title': u'Mutation or genetic...,"[{u'nber_category_title': None, u'nber_subcate...",Materials and methods are provided for produci...,10100316,utility,"[{u'uspc_mainclass_title': None, u'uspc_sequen...","[{u'wipo_field_title': u'Biotechnology', u'wip..."
2,2,"[{u'ipc_section': u'A', u'ipc_subclass': u'K',...","[{u'app_type': u'10', u'app_id': u'10/795128'}]","[{u'assignee_key_id': u'158959', u'assignee_ci...",[{u'cited_patent_category': u'cited by applica...,[{u'cpc_subgroup_title': u'Medicinal preparati...,"[{u'nber_category_title': None, u'nber_subcate...",This invention provides methods and kits for m...,10105417,utility,"[{u'uspc_mainclass_title': None, u'uspc_sequen...","[{u'wipo_field_title': u'Pharmaceuticals', u'w..."
3,3,"[{u'ipc_section': u'B', u'ipc_subclass': u'D',...","[{u'app_type': u'10', u'app_id': u'10/821311'}]","[{u'assignee_key_id': u'152087', u'assignee_ci...",[{u'cited_patent_category': u'cited by applica...,[{u'cpc_subgroup_title': u'Processes for apply...,"[{u'nber_category_title': None, u'nber_subcate...",Devices and methods are provided for ejecting ...,10112212,utility,"[{u'uspc_mainclass_title': None, u'uspc_sequen...","[{u'wipo_field_title': u'Surface technology, c..."
4,4,"[{u'ipc_section': u'G', u'ipc_subclass': u'Q',...","[{u'app_type': u'10', u'app_id': u'10/833495'}]","[{u'assignee_key_id': u'24540', u'assignee_cit...",[{u'cited_patent_category': u'cited by examine...,[{u'cpc_subgroup_title': u'Finance; Insurance;...,"[{u'nber_category_title': None, u'nber_subcate...",A method and apparatus to manage accounts audi...,10115152,utility,"[{u'uspc_mainclass_title': None, u'uspc_sequen...",[{u'wipo_field_title': u'IT methods for manage...


In [9]:
pat04 = pd.concat([pat0401,pat0405,pat0409], axis=0)

In [10]:
pat04.shape

(99144, 12)

In [11]:
pat = pat04.sample(frac=0.01, replace=False, random_state=42)

In [12]:
pat.shape

(991, 12)

In [53]:
pat.assignees[0:1].values

array([ "[{u'assignee_key_id': u'50637', u'assignee_city': u'Redwood City', u'assignee_state_fips': u'06'}]"], dtype=object)

In [123]:
pat.applications[0:1].values

array(["[{u'app_type': u'11', u'app_id': u'11/025657'}]"], dtype=object)

In [125]:
pat.nbers[0:1].values

array([ "[{u'nber_category_title': u'Elec', u'nber_subcategory_title': u'Measuring & Testing'}]"], dtype=object)

# First look at all USPC mainclass

Realized I want to see this all as a time series. Need date: 

In [125]:
fields = 'f=["patent_id","app_date","assignee_county",\
"patent_abstract","app_type","cpc_category","cpc_group_title", "cpc_section_id","cpc_sequence",\
"cpc_subgroup_title","cpc_subsection_title","ipc_class", "ipc_main_group","ipc_section","ipc_sequence",\
"ipc_subclass","ipc_subgroup","nber_category_title","nber_subcategory_title","patent_type","uspc_mainclass_title",\
"uspc_sequence","uspc_subclass_title", "wipo_field_title","wipo_sector_title","app_date"]'

In [134]:
# For Python 2, the inputs need to be quoted strings, e.g. '2014' and '01', '05' or '09'
year = input('year: ')
month = input('month: ')

year: '2004'
month: '09'
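
The hand-written `fields` string defined above lists `app_date` twice and is easy to break when edited. One possible alternative, sketched here, builds the same kind of `f=[...]` parameter from a Python list with `json.dumps`. The helper name `build_fields_param` is an assumption for illustration; the field names are a subset copied from that string.

```python
import json

def build_fields_param(field_names):
    """Return an 'f=[...]' query parameter, dropping duplicates but keeping order."""
    seen = set()
    unique = []
    for name in field_names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    # Compact separators avoid stray spaces inside the URL parameter
    return "f=" + json.dumps(unique, separators=(",", ":"))

fields_param = build_fields_param(
    ["patent_id", "app_date", "patent_abstract", "app_date"]
)
print(fields_param)
```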


In [135]:
if month == '01':
    url1 = 'http://www.patentsview.org/api/patents/query?q={"_and":[{"_gte":{"app_date":"' + year + '-01-01' + '"}},{"_lt":{"app_date":"' + year + '-05-01' + '"}},{"assignee_country":"US"}]}&' + fields + '&o={"page":'
elif month == '05':
    url1 = 'http://www.patentsview.org/api/patents/query?q={"_and":[{"_gte":{"app_date":"' + year + '-05-01' + '"}},{"_lt":{"app_date":"' + year + '-09-01' + '"}},{"assignee_country":"US"}]}&' + fields + '&o={"page":'
else:
    url1 = 'http://www.patentsview.org/api/patents/query?q={"_and":[{"_gte":{"app_date":"' + year + '-09-01' + '"}},{"_lt":{"app_date":"' + str(int(year)+1) + '-01-01' + '"}},{"assignee_country":"US"}]}&' + fields + '&o={"page":'
resp = requests.request('GET', url1 + '1,"per_page":10000}')
page1 = json.loads(resp.text)
# Ceiling division: number of 10,000-record pages needed
count = -(-page1['total_patent_count']//10000)
if count > 10:
    print ('Total number of patents for this year and months exceeds 100,000!!')
# Collect each page, then concatenate once (DataFrame.append was removed in pandas 2.0)
frames = [pd.DataFrame(page1['patents'])]
for i in range(2,count+1):
    url = url1 + str(i) + ',"per_page":10000}'
    resp = requests.request('GET', url)
    patent = json.loads(resp.text)
    frames.append(pd.DataFrame(patent['patents']))
df = pd.concat(frames)
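
The request cell above does two things that are easier to check in isolation: it maps `month` to one of three four-month application-date windows, and it computes `count` by ceiling division so that a final, partly filled page is still fetched. A minimal sketch of both follows. `app_date_window` and `page_count` are hypothetical helpers, not part of the notebook, and unlike the original `else` branch the sketch rejects any month other than '01', '05' or '09'.

```python
def app_date_window(year, month):
    """Return (start, end) ISO dates; start is inclusive, end is exclusive."""
    year = int(year)
    if month == "01":
        return "%d-01-01" % year, "%d-05-01" % year
    if month == "05":
        return "%d-05-01" % year, "%d-09-01" % year
    if month == "09":
        # The last window runs up to 1 January of the following year
        return "%d-09-01" % year, "%d-01-01" % (year + 1)
    raise ValueError("month must be '01', '05' or '09'")

def page_count(total, per_page=10000):
    """Number of pages needed to hold `total` records (ceiling division)."""
    return -(-total // per_page)

start, end = app_date_window("2004", "09")
# 32647 is the row count reported for one of the frames in this notebook
print(start, end, page_count(32647))
```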

In [128]:
df01 = df.copy()

In [129]:
df01.shape

(31701, 10)

In [143]:
df.shape

(32647, 9)

In [132]:
df05 = df.copy()

In [133]:
df05.shape

(32647, 10)

In [136]:
df09 = df.copy()

In [137]:
df = pd.concat([df01,df05,df09],axis=0)

In [124]:
%mkdir data

In [138]:
df.to_csv('data/patents', encoding = 'utf-8')
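
The `Unnamed: 0` columns that appear once the file is read back are a side effect of `to_csv` writing the index as an unnamed first column. A sketch of avoiding that with `index=False`, using an in-memory buffer and two sample rows rather than the project files:

```python
import io
import pandas as pd

df_demo = pd.DataFrame({"patent_id": ["7028550", "D576802"],
                        "patent_type": ["utility", "design"]})

with_index = io.StringIO()
df_demo.to_csv(with_index)                   # index written as an unnamed column
with_index.seek(0)
cols_with_index = list(pd.read_csv(with_index).columns)

without_index = io.StringIO()
df_demo.to_csv(without_index, index=False)   # index omitted
without_index.seek(0)
cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)
print(cols_without_index)
```

Reading with `index_col=0` is another option when the index is worth keeping.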

In [151]:
df.shape

(99144, 9)

In [289]:
df_01 = pd.read_csv('data/patents')

In [290]:
df_01 = df_01.sample(frac=0.01, replace=False, random_state=42)

In [291]:
df_01.shape

(991, 11)

In [292]:
df_01.head()

Unnamed: 0.1,Unnamed: 0,IPCs,applications,assignees,cpcs,nbers,patent_abstract,patent_id,patent_type,uspcs,wipos
66161,1813,"[{u'ipc_section': u'G', u'ipc_subclass': u'L',...","[{u'app_date': u'2004-12-28', u'app_type': u'1...","[{u'assignee_key_id': u'50637', u'assignee_cou...","[{u'cpc_subgroup_title': u'Detecting, measurin...","[{u'nber_category_title': u'Elec', u'nber_subc...",Implantable pressure sensors and methods for m...,7028550,utility,[{u'uspc_mainclass_title': u'Measuring and tes...,"[{u'wipo_field_title': u'Medical technology', ..."
98885,4537,"[{u'ipc_section': None, u'ipc_subclass': None,...","[{u'app_date': u'2004-09-21', u'app_type': u'2...","[{u'assignee_key_id': u'307464', u'assignee_co...","[{u'cpc_subgroup_title': None, u'cpc_category'...","[{u'nber_category_title': None, u'nber_subcate...",,D576802,design,"[{u'uspc_mainclass_title': u'Brushware', u'usp...","[{u'wipo_field_title': None, u'wipo_sector_tit..."
26688,6688,"[{u'ipc_section': u'G', u'ipc_subclass': u'F',...","[{u'app_date': u'2004-03-22', u'app_type': u'1...","[{u'assignee_key_id': u'132861', u'assignee_co...",[{u'cpc_subgroup_title': u'Error detection; Er...,"[{u'nber_category_title': u'Cmp&Cmm', u'nber_s...",A computer cluster includes a first computer f...,7890798,utility,[{u'uspc_mainclass_title': u'Error detection/c...,"[{u'wipo_field_title': u'Computer technology',..."
93118,8770,"[{u'ipc_section': u'B', u'ipc_subclass': u'B',...","[{u'app_date': u'2004-09-02', u'app_type': u'1...","[{u'assignee_key_id': u'128907', u'assignee_co...","[{u'cpc_subgroup_title': u'Processes, other th...","[{u'nber_category_title': u'Others', u'nber_su...",A coated wood board flooring having improved m...,7972707,utility,[{u'uspc_mainclass_title': u'Stock material or...,"[{u'wipo_field_title': u'Surface technology, c..."
45074,3373,"[{u'ipc_section': u'B', u'ipc_subclass': u'D',...","[{u'app_date': u'2004-07-01', u'app_type': u'1...","[{u'assignee_key_id': u'358202', u'assignee_co...",[{u'cpc_subgroup_title': u'Cartridge filters o...,"[{u'nber_category_title': u'Chemical', u'nber_...",Single-use long-life faucet mounted water filt...,7252757,utility,[{u'uspc_mainclass_title': u'Liquid purificati...,[{u'wipo_field_title': u'Chemical engineering'...


In [6]:
df_01.uspcs[0:1]

66161    [{'uspc_mainclass_title': 'Measuring and testi...
Name: uspcs, dtype: object

## Goal: Make dataframe that has

ID / Date / Abstract / app_id / USPCS: MainClass/  USPCS: subClass / CPC: MainClass / etc...

In [None]:
# First go obtain desired info from each part.

In [92]:
import ast
s = ast.literal_eval(df_01.applications.tolist()[2])

In [None]:
# make whole column into literal values

In [123]:
df_01.applications[0:1].values

array(["[{'app_date': '2004-12-28', 'app_type': '11', 'app_id': '11/025657'}]"], dtype=object)

In [107]:
df_01['date'] = df_01['applications'].apply(lambda x: ast.literal_eval(x)[0].values()[0])

In [None]:
# Start here

In [293]:
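def columnMaker(df, column, var):
    """Return the value at position `var` of the first dict in each stringified cell."""
    # list(...) keeps the indexing valid on Python 3, where values() is a view;
    # the position still depends on the dict's key order
    return df[column].apply(lambda x: list(ast.literal_eval(x)[0].values())[var])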
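
Because `columnMaker` picks a value by position, its result depends on the order in which the dictionary keys happen to be stored. A sketch of a key-based variant that avoids this dependence is given below. `first_field` is a hypothetical name, and the sample string is the `applications` value printed above.

```python
import ast

def first_field(cell, key):
    """Parse a stringified list of dicts and return `key` from its first entry."""
    records = ast.literal_eval(cell)
    # An empty list or a missing key yields None instead of raising
    return records[0].get(key) if records else None

cell = "[{'app_date': '2004-12-28', 'app_type': '11', 'app_id': '11/025657'}]"
print(first_field(cell, "app_date"))
print(first_field(cell, "app_id"))
```

On a whole column this could be applied as `df_01['applications'].apply(first_field, key='app_date')`.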

In [294]:
df_01['date'] = columnMaker(df_01, "applications", 0)
df_01['app_id'] = columnMaker(df_01, "applications", 2)

In [295]:
df_01.nbers[0:1].values

array([ "[{u'nber_category_title': u'Elec', u'nber_subcategory_title': u'Measuring & Testing'}]"], dtype=object)

In [296]:
df_01['county'] = columnMaker(df_01, "assignees", 1)

In [254]:
ast.literal_eval(df_01.nbers[0:1].values[0])

[{u'nber_category_title': u'Elec',
  u'nber_subcategory_title': u'Measuring & Testing'}]

In [297]:
df_01['nber_cat'] = columnMaker(df_01, "nbers", 0)

In [298]:
df_01['nber_subcat'] = columnMaker(df_01, "nbers", 1)

In [299]:
df_01['wipos_cat'] = columnMaker(df_01, "wipos", 1)

In [300]:
df_01.head()

Unnamed: 0.1,Unnamed: 0,IPCs,applications,assignees,cpcs,nbers,patent_abstract,patent_id,patent_type,uspcs,wipos,date,app_id,county,nber_cat,nber_subcat,wipos_cat
66161,1813,"[{u'ipc_section': u'G', u'ipc_subclass': u'L',...","[{u'app_date': u'2004-12-28', u'app_type': u'1...","[{u'assignee_key_id': u'50637', u'assignee_cou...","[{u'cpc_subgroup_title': u'Detecting, measurin...","[{u'nber_category_title': u'Elec', u'nber_subc...",Implantable pressure sensors and methods for m...,7028550,utility,[{u'uspc_mainclass_title': u'Measuring and tes...,"[{u'wipo_field_title': u'Medical technology', ...",2004-12-28,11/025657,Richland,Elec,Measuring & Testing,Instruments
98885,4537,"[{u'ipc_section': None, u'ipc_subclass': None,...","[{u'app_date': u'2004-09-21', u'app_type': u'2...","[{u'assignee_key_id': u'307464', u'assignee_co...","[{u'cpc_subgroup_title': None, u'cpc_category'...","[{u'nber_category_title': None, u'nber_subcate...",,D576802,design,"[{u'uspc_mainclass_title': u'Brushware', u'usp...","[{u'wipo_field_title': None, u'wipo_sector_tit...",2004-09-21,29/213606,Harris County,,,
26688,6688,"[{u'ipc_section': u'G', u'ipc_subclass': u'F',...","[{u'app_date': u'2004-03-22', u'app_type': u'1...","[{u'assignee_key_id': u'132861', u'assignee_co...",[{u'cpc_subgroup_title': u'Error detection; Er...,"[{u'nber_category_title': u'Cmp&Cmm', u'nber_s...",A computer cluster includes a first computer f...,7890798,utility,[{u'uspc_mainclass_title': u'Error detection/c...,"[{u'wipo_field_title': u'Computer technology',...",2004-03-22,10/806261,Harris County,Cmp&Cmm,Computer Hardware & Software,Electrical engineering
93118,8770,"[{u'ipc_section': u'B', u'ipc_subclass': u'B',...","[{u'app_date': u'2004-09-02', u'app_type': u'1...","[{u'assignee_key_id': u'128907', u'assignee_co...","[{u'cpc_subgroup_title': u'Processes, other th...","[{u'nber_category_title': u'Others', u'nber_su...",A coated wood board flooring having improved m...,7972707,utility,[{u'uspc_mainclass_title': u'Stock material or...,"[{u'wipo_field_title': u'Surface technology, c...",2004-09-02,10/932519,,Others,Miscellaneous,Chemistry
45074,3373,"[{u'ipc_section': u'B', u'ipc_subclass': u'D',...","[{u'app_date': u'2004-07-01', u'app_type': u'1...","[{u'assignee_key_id': u'358202', u'assignee_co...",[{u'cpc_subgroup_title': u'Cartridge filters o...,"[{u'nber_category_title': u'Chemical', u'nber_...",Single-use long-life faucet mounted water filt...,7252757,utility,[{u'uspc_mainclass_title': u'Liquid purificati...,[{u'wipo_field_title': u'Chemical engineering'...,2004-07-01,10/883156,Dover,Chemical,Miscellaneous,Chemistry


In [215]:
df_01.uspcs[0:1].values

array([ "[{u'uspc_mainclass_title': u'Measuring and testing', u'uspc_sequence': u'0', u'uspc_subclass_title': u'Diaphragm'}]"], dtype=object)
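
Each `uspcs` cell is a list, so a patent can carry more than one classification, with `uspc_sequence` giving the order. A sketch of pulling every main-class title in sequence order follows, assuming `uspc_sequence` is a numeric string as in the value above. The `Surgery` entry in the sample is invented purely to show the sorting.

```python
import ast

def uspc_mainclasses(cell):
    """Return all uspc_mainclass_title values, ordered by uspc_sequence."""
    records = ast.literal_eval(cell)
    # A missing sequence is treated as 0 so that sorting does not fail
    records = sorted(records, key=lambda r: int(r.get("uspc_sequence") or 0))
    return [r.get("uspc_mainclass_title") for r in records]

cell = ("[{'uspc_mainclass_title': 'Surgery', 'uspc_sequence': '1', "
        "'uspc_subclass_title': None}, "
        "{'uspc_mainclass_title': 'Measuring and testing', 'uspc_sequence': '0', "
        "'uspc_subclass_title': 'Diaphragm'}]")
titles = uspc_mainclasses(cell)
print(titles)
```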

In [230]:
ast.literal_eval(df_01.applications[0:1].values[0])[0].values()[0]

u'2004-12-28'

In [221]:
ast.literal_eval(df_01.applications[0:1].values[0])[0]

{u'app_date': u'2004-12-28', u'app_id': u'11/025657', u'app_type': u'11'}

In [271]:
ast.literal_eval(df_01.cpcs[0:1].values[0])[0].values()[2]

u'Diagnosis; surgery; identification'

In [None]:
## Not working for cpcs and ipcs

In [304]:
df_01.nber_subcat.unique()

array([u'Measuring & Testing', None, u'Computer Hardware & Software',
       u'Miscellaneous', u'Transportation', u'Communications',
       u'Metal Working', u'Electronic business methods and software',
       u'Coating', u'Organic Compounds', u'Drugs', u'Mat. Proc & Handling',
       u'Receptacles', u'Electrical Lighting', u'Surgery & Med Inst.',
       u'Electrical Devices', u'Nuclear & X-rays',
       u'Computer Peripherials', u'Semiconductor Devices',
       u'Pipes & Joints', u'Genetics', u'Resins', u'Information Storage',
       u'Earth Working & Wells', u'Power Systems', u'Heating',
       u'Agriculture,Husbandry,Food', u'Agriculture,Food,Textiles',
       u'Motors & Engines + Parts', u'Optics', u'Apparel & Textile',
       u'Amusement Devices', u'Furniture,House Fixtures', u'Gas'], dtype=object)

In [None]:
# Group by Month

In [313]:
df_01['date'] =  pd.to_datetime(df_01['date'])

In [314]:
#Reindex by date
df_01.index=df_01['date']

In [320]:
df_01[df_01.index.month == 1].nber_subcat.unique()

array([u'Electrical Devices', u'Drugs', u'Semiconductor Devices',
       u'Miscellaneous', u'Measuring & Testing', None,
       u'Computer Hardware & Software', u'Metal Working',
       u'Communications', u'Electronic business methods and software',
       u'Organic Compounds', u'Computer Peripherials',
       u'Furniture,House Fixtures', u'Power Systems', u'Heating',
       u'Apparel & Textile', u'Agriculture,Husbandry,Food',
       u'Information Storage', u'Electrical Lighting', u'Optics'], dtype=object)

In [321]:
df_01[df_01.index.month == 2].nber_subcat.unique()

array([u'Metal Working', u'Miscellaneous', u'Computer Hardware & Software',
       u'Electrical Devices', None, u'Electrical Lighting',
       u'Communications', u'Resins',
       u'Electronic business methods and software', u'Organic Compounds',
       u'Information Storage', u'Motors & Engines + Parts',
       u'Measuring & Testing', u'Transportation', u'Optics', u'Drugs',
       u'Agriculture,Food,Textiles', u'Power Systems', u'Coating',
       u'Pipes & Joints', u'Surgery & Med Inst.', u'Mat. Proc & Handling',
       u'Semiconductor Devices', u'Earth Working & Wells',
       u'Amusement Devices', u'Apparel & Textile', u'Nuclear & X-rays'], dtype=object)
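
Comparing the `unique()` arrays month by month is hard to do by eye. One way to summarise them, sketched on a few made-up rows that reuse the `nber_subcat` column name, is to group on the month of the date index and count distinct values.

```python
import pandas as pd

demo = pd.DataFrame(
    {"nber_subcat": ["Drugs", "Optics", "Drugs", "Heating", None]},
    index=pd.to_datetime(["2004-01-05", "2004-01-20", "2004-02-03",
                          "2004-02-10", "2004-02-11"]),
)
# nunique() ignores missing values by default
per_month = demo.groupby(demo.index.month)["nber_subcat"].nunique()
print(per_month.to_dict())
```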

In [None]:
# Get shapefile for county

In [None]:
# not working for uspcs

In [206]:
ast.literal_eval(df_01.uspcs.tolist()[1]).values()

AttributeError: 'list' object has no attribute 'values'

In [147]:
df_01['uspcs_mainclass'] = columnMaker(df_01, "uspcs", 0)

SyntaxError: invalid syntax (<unknown>, line 1)
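
The traceback does not show which cell triggered the `SyntaxError`, so the cause is unknown here. A defensive sketch follows that returns `None` for any value `ast.literal_eval` cannot parse, so that the failing rows can be inspected afterwards. `safe_parse` is a hypothetical helper, and the truncated string is an invented example of bad input.

```python
import ast

def safe_parse(cell):
    """Return the parsed Python literal, or None if `cell` cannot be parsed."""
    if not isinstance(cell, str):
        return None          # e.g. NaN from an empty CSV field
    try:
        return ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return None

good = "[{'uspc_mainclass_title': 'Brushware', 'uspc_sequence': '0'}]"
bad = "[{'uspc_mainclass_title': 'Brushware', 'uspc_seq"   # cut off mid-string
print(safe_parse(good))
print(safe_parse(bad))
print(safe_parse(float("nan")))
```

Rows where the parsed column is `None` can then be selected with `isnull()` to see what the raw strings look like.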

In [186]:
df_01['app_id'] = df_01['applications'].apply(lambda x: x[0].values()[2])

### First let's focus on USPCS; it could be interesting to look into how the different classifications compare

In [None]:
# Next make column for USPCS

In [260]:
df_01['uspcs_mainclass'] = df_01['uspcs'].apply(lambda x: x[0].values()[0])

AttributeError: 'str' object has no attribute 'values'

In [209]:
df_01['uspcs_subclass'] = df_01['uspcs'].apply(lambda x: x[0].values()[2])

In [None]:
# CPCS

In [226]:
df_01['cpc_group_title'] = df_01['cpcs'].apply(lambda x: x[0].values()[2])

In [231]:
df_01['cpc_subsection_title'] = df_01['cpcs'].apply(lambda x: x[0].values()[4])

In [247]:
# nber
df_01['nber_category'] = df_01['nbers'].apply(lambda x: x[0].values()[0])

In [253]:
df_01['nber_subcategory'] = df_01['nbers'].apply(lambda x: x[0].values()[1])

In [262]:
# wipos
df_01['wipos_field'] = df_01['wipos'].apply(lambda x: x[0].values()[0])
df_01['wipos_sector'] = df_01['wipos'].apply(lambda x: x[0].values()[1])


In [263]:
df_01.head()

Unnamed: 0,IPCs,applications,cpcs,nbers,patent_abstract,patent_id,patent_type,uspcs,wipos,date,app_id,uspcs_mainclass,uspcs_subclass,cpc_group_title,cpc_subsection_title,nber_category,nber_subcategory,wipos_field,wipos_sector
1813,"[{u'ipc_section': u'G', u'ipc_subclass': u'L',...","[{u'app_date': u'2004-12-28', u'app_type': u'1...","[{u'cpc_subgroup_title': u'Detecting, measurin...","[{u'nber_category_title': u'Elec', u'nber_subc...",Implantable pressure sensors and methods for m...,7028550,utility,[{u'uspc_mainclass_title': u'Measuring and tes...,"[{u'wipo_field_title': u'Medical technology', ...",2004-12-28,11/025657,Measuring and testing,Diaphragm,Diagnosis; surgery; identification,Medical or veterinary science; hygiene,Elec,Measuring & Testing,Medical technology,Instruments
4537,"[{u'ipc_section': None, u'ipc_subclass': None,...","[{u'app_date': u'2004-09-21', u'app_type': u'2...","[{u'cpc_subgroup_title': None, u'cpc_category'...","[{u'nber_category_title': None, u'nber_subcate...",,D576802,design,"[{u'uspc_mainclass_title': u'Brushware', u'usp...","[{u'wipo_field_title': None, u'wipo_sector_tit...",2004-09-21,29/213606,Brushware,Radially arranged bristles,,,,,,
6688,"[{u'ipc_section': u'G', u'ipc_subclass': u'F',...","[{u'app_date': u'2004-03-22', u'app_type': u'1...",[{u'cpc_subgroup_title': u'Error detection; Er...,"[{u'nber_category_title': u'Cmp&Cmm', u'nber_s...",A computer cluster includes a first computer f...,7890798,utility,[{u'uspc_mainclass_title': u'Error detection/c...,"[{u'wipo_field_title': u'Computer technology',...",2004-03-22,10/806261,Error detection/correction and fault detection...,"Concurrent, redundantly operating processors",Electric digital data processing,Computing; calculating; counting,Cmp&Cmm,Computer Hardware & Software,Computer technology,Electrical engineering
8770,"[{u'ipc_section': u'B', u'ipc_subclass': u'B',...","[{u'app_date': u'2004-09-02', u'app_type': u'1...","[{u'cpc_subgroup_title': u'Processes, other th...","[{u'nber_category_title': u'Others', u'nber_su...",A coated wood board flooring having improved m...,7972707,utility,[{u'uspc_mainclass_title': u'Stock material or...,"[{u'wipo_field_title': u'Surface technology, c...",2004-09-02,10/932519,Stock material or miscellaneous articles,Of wood,Processes for applying liquids or other fluent...,Spraying or atomising in general; applying liq...,Others,Miscellaneous,"Surface technology, coating",Chemistry
3373,"[{u'ipc_section': u'B', u'ipc_subclass': u'D',...","[{u'app_date': u'2004-07-01', u'app_type': u'1...",[{u'cpc_subgroup_title': u'Cartridge filters o...,"[{u'nber_category_title': u'Chemical', u'nber_...",Single-use long-life faucet mounted water filt...,7252757,utility,[{u'uspc_mainclass_title': u'Liquid purificati...,[{u'wipo_field_title': u'Chemical engineering'...,2004-07-01,10/883156,Liquid purification or separation,Responsive to fluid flow,Separation,Physical or chemical processes or apparatus in...,Chemical,Miscellaneous,Chemical engineering,Chemistry


In [269]:
df_02 = df_01.drop(['IPCs', 'applications', 'cpcs', 'nbers',
                    'patent_type', 'uspcs', 'wipos'], axis=1)

In [270]:
df_02.head()

Unnamed: 0,patent_abstract,patent_id,date,app_id,uspcs_mainclass,uspcs_subclass,cpc_group_title,cpc_subsection_title,nber_category,nber_subcategory,wipos_field,wipos_sector
1813,Implantable pressure sensors and methods for m...,7028550,2004-12-28,11/025657,Measuring and testing,Diaphragm,Diagnosis; surgery; identification,Medical or veterinary science; hygiene,Elec,Measuring & Testing,Medical technology,Instruments
4537,,D576802,2004-09-21,29/213606,Brushware,Radially arranged bristles,,,,,,
6688,A computer cluster includes a first computer f...,7890798,2004-03-22,10/806261,Error detection/correction and fault detection...,"Concurrent, redundantly operating processors",Electric digital data processing,Computing; calculating; counting,Cmp&Cmm,Computer Hardware & Software,Computer technology,Electrical engineering
8770,A coated wood board flooring having improved m...,7972707,2004-09-02,10/932519,Stock material or miscellaneous articles,Of wood,Processes for applying liquids or other fluent...,Spraying or atomising in general; applying liq...,Others,Miscellaneous,"Surface technology, coating",Chemistry
3373,Single-use long-life faucet mounted water filt...,7252757,2004-07-01,10/883156,Liquid purification or separation,Responsive to fluid flow,Separation,Physical or chemical processes or apparatus in...,Chemical,Miscellaneous,Chemical engineering,Chemistry


In [271]:
df_02.shape

(991, 12)

In [279]:
len(df_02.date.unique())

256

In [275]:
len(df_02.uspcs_mainclass.unique())

227

In [276]:
len(df_02.cpc_group_title.unique())

226

In [277]:
len(df_02.nber_category.unique())

7

In [278]:
df_02.nber_category.unique()

array([u'Elec', None, u'Cmp&Cmm', u'Others', u'Chemical', u'Mech',
       u'Drgs&Med'], dtype=object)
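
Note that `len(series.unique())` counts `None` as a value, so the 7 above includes the missing category. A sketch contrasting it with `nunique()`, on a short made-up series that borrows category labels from the output above:

```python
import pandas as pd

cats = pd.Series(["Elec", None, "Cmp&Cmm", "Elec", "Mech"], dtype=object)

with_missing = len(cats.unique())     # None counted as its own value
without_missing = cats.nunique()      # missing values excluded by default
print(with_missing, without_missing)
```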

# By month, let's see what categories we see

In [298]:
type(df_02.date)

pandas.core.series.Series

In [285]:
df_02['date'] =  pd.to_datetime(df_02['date'])

In [297]:
df_02.loc[1813][['date']]

date    2004-12-28 00:00:00
Name: 1813, dtype: object

In [303]:
type(df_02['date'][1813])

pandas.tslib.Timestamp

In [308]:
#Reindex by date
df_02.index=df_02['date']

In [311]:
df_02.head()

date,patent_abstract,patent_id,date,app_id,uspcs_mainclass,uspcs_subclass,cpc_group_title,cpc_subsection_title,nber_category,nber_subcategory,wipos_field,wipos_sector
2004-12-28,Implantable pressure sensors and methods for m...,7028550,2004-12-28,11/025657,Measuring and testing,Diaphragm,Diagnosis; surgery; identification,Medical or veterinary science; hygiene,Elec,Measuring & Testing,Medical technology,Instruments
2004-09-21,,D576802,2004-09-21,29/213606,Brushware,Radially arranged bristles,,,,,,
2004-03-22,A computer cluster includes a first computer f...,7890798,2004-03-22,10/806261,Error detection/correction and fault detection...,"Concurrent, redundantly operating processors",Electric digital data processing,Computing; calculating; counting,Cmp&Cmm,Computer Hardware & Software,Computer technology,Electrical engineering
2004-09-02,A coated wood board flooring having improved m...,7972707,2004-09-02,10/932519,Stock material or miscellaneous articles,Of wood,Processes for applying liquids or other fluent...,Spraying or atomising in general; applying liq...,Others,Miscellaneous,"Surface technology, coating",Chemistry
2004-07-01,Single-use long-life faucet mounted water filt...,7252757,2004-07-01,10/883156,Liquid purification or separation,Responsive to fluid flow,Separation,Physical or chemical processes or apparatus in...,Chemical,Miscellaneous,Chemical engineering,Chemistry


In [315]:
df_02.loc[df_02['date'].isin(['2010-08', '2011-08'])]

date,patent_abstract,patent_id,date,app_id,uspcs_mainclass,uspcs_subclass,cpc_group_title,cpc_subsection_title,nber_category,nber_subcategory,wipos_field,wipos_sector
2004-01-31,56,59,59,59,59,59,55,55,55,55,55,55
2004-02-29,83,85,85,85,85,85,82,82,82,82,82,82
2004-03-31,72,78,78,78,77,77,72,72,71,71,72,72
2004-04-30,75,79,79,79,79,79,73,73,73,73,73,73
2004-05-31,80,87,87,87,87,87,79,79,79,79,79,79
2004-06-30,105,113,113,113,113,113,104,104,104,104,104,104
2004-07-31,55,61,61,61,61,61,55,55,55,55,55,55
2004-08-31,79,85,85,85,85,85,78,78,79,79,78,78
2004-09-30,81,87,87,87,87,87,80,80,80,80,80,80
2004-10-31,73,81,81,81,81,81,73,73,73,73,73,73
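
The table above looks like per-month counts of non-null values in each column rather than the result of the `isin` filter; the cell that produced it is not shown. One way such counts might be obtained, sketched on made-up rows, is to group the date index by calendar month and call `count()`. The sketch labels each row with a monthly period (for example `2004-01`) instead of a month-end date.

```python
import pandas as pd

demo = pd.DataFrame(
    {"patent_id": ["a", "b", "c", "d"],          # placeholder ids
     "nber_category": ["Elec", None, "Mech", "Elec"]},
    index=pd.to_datetime(["2004-01-05", "2004-01-20",
                          "2004-02-03", "2004-02-10"]),
)
# Group on calendar month; count() reports non-null values per column
monthly = demo.groupby(demo.index.to_period("M")).count()
print(monthly)
```

Columns with missing values show lower counts than `patent_id`, which matches the pattern in the table.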
