This application helps parse the XML files of USPTO trademark public data that are available in bulk form. From the XML files, the package generates Python dictionaries that can be easily analyzed or converted into CSV files for use with other analytical tools. USPTO searchable data is viewable through a search interface on the Open Data site.
https://developer.uspto.gov/product/trademark
System requirements
- Python 3
Python Hard Dependencies
- xml
- zipfile
- gzip
- bz2
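The standard-library modules listed above are enough to read the bulk files whether they arrive as plain XML or compressed. The sketch below is a hypothetical helper: `open_xml` is not part of the uspto package and is not the package's `pto.openUSPTO`, whose internals are not shown here. It assumes the compression can be inferred from the file extension and that a .zip archive holds one XML file.

```python
import bz2
import gzip
import xml.etree.ElementTree as ET
import zipfile


def open_xml(path):
    """Parse an XML file that may be plain, .zip, .gz or .bz2 compressed.

    Hypothetical helper: the compression type is inferred from the extension,
    and a .zip archive is assumed to contain a single XML member.
    """
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as archive:
            # Pick the first member whose name ends in .xml
            name = next(n for n in archive.namelist() if n.endswith(".xml"))
            with archive.open(name) as fh:
                return ET.parse(fh)
    if path.endswith(".gz"):
        with gzip.open(path, "rb") as fh:
            return ET.parse(fh)
    if path.endswith(".bz2"):
        with bz2.open(path, "rb") as fh:
            return ET.parse(fh)
    # Anything else is treated as an uncompressed XML file
    return ET.parse(path)
```

Usage would look like `root = open_xml("data/apc161231-56_sample.xml").getroot()`.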
To install the package, change to the directory containing its source files and run:
python setup.py install
With this notebook and the uspto package you can parse the raw XML trademark data provided by the USPTO.
import pandas as pd
import uspto as pto
# Path to data
path = "data/apc161231-56_sample.xml"
Getting the root might take a couple of minutes, depending on the size of the XML file and the amount of RAM on your machine.
data = pto.openUSPTO(path)
root = data.getroot()
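If building the full tree is too slow or uses too much memory, a common alternative is to stream the file with `xml.etree.ElementTree.iterparse` and clear each `case-file` element once it has been processed. This is not what the uspto package does, as far as this notebook shows. A minimal sketch, assuming each application is a `case-file` element with a `serial-number` child (the sample document is made up):

```python
import io
import xml.etree.ElementTree as ET


def iter_serial_numbers(source):
    """Yield the serial number of every case-file element, one at a time.

    `source` can be a file path or a binary file object. Assumes each
    <case-file> element has a <serial-number> child.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "case-file":
            yield elem.findtext("serial-number")
            # Drop the processed element's children and text to limit memory use
            elem.clear()


sample = b"""<trademark-applications-daily>
  <case-file><serial-number>87252004</serial-number></case-file>
  <case-file><serial-number>87252005</serial-number></case-file>
</trademark-applications-daily>"""

print(list(iter_serial_numbers(io.BytesIO(sample))))  # → ['87252004', '87252005']
```

Passing a path such as `"data/apc161231-56_sample.xml"` instead of the `BytesIO` object works the same way.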
The pto.getDetails(root) function extracts useful information about the XML file, including the number of trademark applications (case files) it contains, reported as case-files-vol.
details = pto.getDetails(root)
pd.DataFrame.from_dict(details,orient='index')
| | 0 |
---|---|
version-no | 2.0 |
creation-datetime | 201702250716 |
version-date | 20041108 |
file-segment | TRMK |
action-key | TX |
case-files-vol | 40382 |
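For comparison, a similar summary can be read directly from the tree with ElementTree. This is a hypothetical sketch, not the implementation of `pto.getDetails`: it assumes the field names shown in the table appear as element tags somewhere under the root and that `case-files-vol` is simply the number of `case-file` elements. The sample document is a made-up, simplified stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Fields from the table above, assumed to be element tags in the file
DETAIL_TAGS = ["version-no", "creation-datetime", "version-date",
               "file-segment", "action-key"]


def get_details_sketch(root):
    """Collect file-level metadata from an ElementTree root element."""
    # Take the text of the first matching descendant for each tag
    details = {tag: root.findtext(".//" + tag) for tag in DETAIL_TAGS}
    # Count the trademark case files in the document
    details["case-files-vol"] = sum(1 for _ in root.iter("case-file"))
    return details


sample = """<trademark-applications-daily>
  <version><version-no>2.0</version-no><version-date>20041108</version-date></version>
  <creation-datetime>201702250716</creation-datetime>
  <application-information>
    <file-segments>
      <file-segment>TRMK</file-segment>
      <action-keys>
        <action-key>TX</action-key>
        <case-file><serial-number>87252004</serial-number></case-file>
        <case-file><serial-number>87252005</serial-number></case-file>
      </action-keys>
    </file-segments>
  </application-information>
</trademark-applications-daily>"""

print(get_details_sketch(ET.fromstring(sample)))
```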
Extract the case file header data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
file_header = pto.getFileHeader(root)
table = pd.DataFrame.from_dict(file_header, orient='index')
table.head()
| | location-date | use-application-currently-in | amended-to-itu-application-in | filing-basis-filed-as-44d-in | collective-trademark-in | section-8-accepted-in | standard-characters-claimed-in | drawing-3d-filed-in | foreign-priority-in | color-drawing-current-in | ... | filing-date | attorney-name | attorney-docket-number | employee-name | law-office-assigned-location-code | published-for-opposition-date | domestic-representative-name | abandonment-date | amend-to-register-date | registration-date |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
87252004 | 20161205 | T | F | F | F | F | F | F | F | T | ... | 20161130 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
87252005 | 20161205 | F | F | F | F | F | F | F | F | F | ... | 20161130 | Julie A. Hopkins | 100859.1.7 | NaN | NaN | NaN | Julie A. Hopkins | NaN | NaN | NaN |
87252006 | 20161205 | F | F | F | F | F | T | F | F | F | ... | 20161130 | Paul R. Fransway | 73285-2 | NaN | NaN | NaN | Paul R. Fransway | NaN | NaN | NaN |
87252007 | 20161205 | T | F | F | F | F | F | F | F | T | ... | 20161130 | Christopher J. Woods | 1010933 | NaN | NaN | NaN | Christopher J. Woods | NaN | NaN | NaN |
87252008 | 20161205 | F | F | F | F | F | F | F | F | F | ... | 20161130 | Julie A. Hopkins | 100859.1.7 | NaN | NaN | NaN | Julie A. Hopkins | NaN | NaN | NaN |
5 rows × 64 columns
table.to_csv("casefileHeader.csv")
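The `NaN` cells in the table most likely come from fields that are absent in some case files, so the per-application dictionaries do not all share the same keys. To write the CSV without pandas, `csv.DictWriter` needs the union of all keys up front. A minimal sketch, assuming `file_header` maps each serial number to a flat dictionary of fields as the table suggests; `dicts_to_csv` is a hypothetical helper and the `records` below are a trimmed, hand-made stand-in.

```python
import csv
import io


def dicts_to_csv(records, fh, index_label="serial-number"):
    """Write {key: {field: value}} records as CSV, one row per key.

    Fields missing from a record are written as empty cells.
    """
    # Union of all field names, in first-seen order
    fieldnames = []
    for fields in records.values():
        for name in fields:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(fh, fieldnames=[index_label] + fieldnames, restval="")
    writer.writeheader()
    for key, fields in records.items():
        writer.writerow({index_label: key, **fields})


records = {
    "87252004": {"filing-date": "20161130"},
    "87252005": {"filing-date": "20161130", "attorney-name": "Julie A. Hopkins"},
}
out = io.StringIO()
dicts_to_csv(records, out)
print(out.getvalue())
```

For a real file, open it with `newline=""` as the `csv` module recommends, e.g. `with open("casefileHeader.csv", "w", newline="") as fh: dicts_to_csv(file_header, fh)`.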
Extract the case file classification data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
classifications = pto.getClassifications(root)
data = []
for k in classifications.keys():
    for d in classifications[k]:
        data.append(classifications[k][d])
table = pd.DataFrame(data)
table.head()
| | first-use-anywhere-date | first-use-in-commerce-date | international-code | international-code-total-no | primary-code | serial-number | status-code | status-date | us-code | us-code-total-no |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 042 | 1 | 042 | 87326720 | 6 | 20170210 | 100,101 | 2 |
1 | 0 | 0 | 025 | 1 | 025 | 87331869 | 6 | 20170216 | 022,039 | 2 |
2 | 0 | 0 | 009 | 1 | 009 | 87326722 | 6 | 20170210 | 021,023,026,036,038 | 5 |
3 | 0 | 0 | 016 | 1 | 016 | 87326722 | 6 | 20170210 | 002,005,022,023,029,037,038,050 | 8 |
4 | 0 | 0 | 036 | 1 | 036 | 87326722 | 6 | 20170210 | 100,101,102 | 3 |
table.to_csv("classifications.csv")
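The nested loop above turns a two-level dictionary into a flat list of rows. Judging from the loop, `getClassifications` returns a dictionary keyed by serial number whose values are dictionaries of per-entry records. Under that assumption the flattening can be written once as a hypothetical helper, `flatten_records`, and reused for the other nested results in this notebook:

```python
def flatten_records(nested):
    """Flatten {outer_key: {inner_key: row_dict}} into a list of row dicts."""
    return [row for inner in nested.values() for row in inner.values()]


# Hand-made stand-in shaped like the classifications result (inner keys assumed)
nested = {
    "87326720": {0: {"serial-number": "87326720", "international-code": "042"}},
    "87326722": {
        0: {"serial-number": "87326722", "international-code": "009"},
        1: {"serial-number": "87326722", "international-code": "016"},
    },
}
rows = flatten_records(nested)
print([r["international-code"] for r in rows])  # → ['042', '009', '016']
```

With pandas, `table = pd.DataFrame(flatten_records(classifications))` would then replace the loop.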
Extract the case file classification codes from the XML file; this table can also be obtained from the classification table. This function creates a dictionary that can be transformed into a table using pandas.
classification_codes = pto.getClassificationCodes(root)
data = []
for k in classification_codes.keys():
    for d in classification_codes[k]:
        data.append(classification_codes[k][d])
table = pd.DataFrame(data)
table.head()
| | international-code | serial-number | us-code |
---|---|---|---|
0 | 042 | 87326720 | 100,101 |
1 | 025 | 87331869 | 022,039 |
2 | 009 | 87326722 | 021,023,026,036,038 |
3 | 016 | 87326722 | 002,005,022,023,029,037,038,050 |
4 | 036 | 87326722 | 100,101,102 |
table.to_csv("classification_codes.csv")
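Since the classification-codes table can be obtained from the classification table, it can also be derived from rows that are already flattened. A sketch that keeps only the three columns shown above; the column subset is read off the table rather than the package's code, and the sample rows are trimmed copies of the classification table. With pandas, selecting `table[["international-code", "serial-number", "us-code"]]` gives the same projection.

```python
CODE_COLUMNS = ("international-code", "serial-number", "us-code")


def classification_codes_from(rows):
    """Project classification rows onto the classification-code columns."""
    return [{col: row.get(col) for col in CODE_COLUMNS} for row in rows]


classification_rows = [
    {"first-use-anywhere-date": "0", "international-code": "042",
     "primary-code": "042", "serial-number": "87326720", "us-code": "100,101"},
    {"first-use-anywhere-date": "0", "international-code": "025",
     "primary-code": "025", "serial-number": "87331869", "us-code": "022,039"},
]
codes = classification_codes_from(classification_rows)
print(codes[0])
# → {'international-code': '042', 'serial-number': '87326720', 'us-code': '100,101'}
```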
Extract the case file Design Search data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
design = pto.getDesignSearch(root)
data = []
for k in design.keys():
    for d in design[k]:
        data.append(design[k][d])
table = pd.DataFrame(data)
table.head()
| | code | serial-number |
---|---|---|
0 | 031519 | 87326722 |
1 | 031524 | 87326722 |
2 | 031525 | 87326722 |
3 | 260121 | 87326722 |
4 | 021108 | 87277572 |
table.to_csv("designSearch.csv")
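Each design search code is a separate row, so an application with several design elements appears several times. A sketch of grouping the codes per serial number with `collections.defaultdict` (`codes_by_serial` is a hypothetical name; rows are copied from the table above). The codes are kept as strings so leading zeros such as `021108` survive; when reading the CSV back with pandas, passing `dtype=str` to `read_csv` does the same.

```python
from collections import defaultdict


def codes_by_serial(rows):
    """Group design search codes by application serial number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["serial-number"]].append(row["code"])
    return dict(grouped)


design_rows = [
    {"code": "031519", "serial-number": "87326722"},
    {"code": "031524", "serial-number": "87326722"},
    {"code": "031525", "serial-number": "87326722"},
    {"code": "260121", "serial-number": "87326722"},
    {"code": "021108", "serial-number": "87277572"},
]
print(codes_by_serial(design_rows))
# → {'87326722': ['031519', '031524', '031525', '260121'], '87277572': ['021108']}
```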
Extract the case file owners data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
owners = pto.getFileOwners(root)
data = []
for k in owners.keys():
    for d in owners[k]:
        data.append(owners[k][d])
table = pd.DataFrame(data)
table.head()
| | address-1 | address-2 | city | composed-of-statement | country | dba-aka-text | entity-statement | entry-number | legal-entity-type-code | nationality | other | party-name | party-type | postcode | serial-number | state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 637 W 58th St | NaN | Kansas City | NaN | NaN | NaN | NaN | 1 | 16 | {'state': 'MO'} | NaN | MSMJ | 10 | 64113 | 87326720 | MO |
1 | 12243 Washington Ave | NaN | Blue Island | NaN | NaN | NaN | NaN | 1 | 01 | {'country': 'US'} | NaN | Greg English | 10 | 60406 | 87331869 | IL |
2 | 5100 South I-35 Service Rd | NaN | Oklahoma City | NaN | NaN | NaN | chartered bank | 1 | 99 | {'state': 'OK'} | NaN | Frontier State Bank | 10 | 73129 | 87326722 | OK |
3 | P.O. Box 943 | 1621 East Electric Avenue | McAlester | NaN | NaN | NaN | NaN | 1 | 03 | {'state': 'OK'} | NaN | Big V Feeds, Inc. | 10 | 74502 | 87326723 | OK |
4 | 6900 Interbay Blvd | NaN | Tampa | NaN | NaN | NaN | NaN | 1 | 16 | {'state': 'FL'} | NaN | LJ Avalon LLC | 10 | 33616 | 87320958 | FL |
table.to_csv("fileOwners.csv")
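Some owner fields hold nested dictionaries, such as `{'state': 'MO'}` in the sample rows above, and `to_csv` writes those out as their Python string representation. A sketch of spreading nested values into prefixed columns before export; `flatten_row` is a hypothetical helper, it handles one level of nesting only, and the dotted naming (`nationality.state`) is an arbitrary choice. With pandas, `pd.json_normalize(data)` produces the same dotted column names.

```python
def flatten_row(row, sep="."):
    """Expand dictionary-valued fields (one level deep) into prefixed fields."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


owner = {"party-name": "MSMJ", "nationality": {"state": "MO"},
         "postcode": "64113", "serial-number": "87326720"}
print(flatten_row(owner))
# → {'party-name': 'MSMJ', 'nationality.state': 'MO', 'postcode': '64113', 'serial-number': '87326720'}
```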
Extract the case file statements data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
statements = pto.getFileStatements(root)
data = []
for k in statements.keys():
    for d in statements[k]:
        data.append(statements[k][d])
table = pd.DataFrame(data)
table.head()
| | serial-number | text | type-code |
---|---|---|---|
0 | 87326720 | Inspecting buildings for the existence of mold | GS0421 |
1 | 87331869 | Athletic apparel, namely, headwear; headwear | GS0251 |
2 | 87331869 | MASTER KICK MAN | PM0001 |
3 | 87326722 | The color(s) blue, white, and grey is/are clai... | CC0000 |
4 | 87326722 | The mark consists of a white soaring eagle wit... | DM0000 |
table.to_csv("fileStatements.csv")
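The `type-code` column distinguishes the kinds of statement. Judging only from the sample rows above, the prefix appears to indicate the kind (`GS` rows hold goods and services text, `PM` a pseudo mark, `CC` a color claim, `DM` a description of the mark); the USPTO documentation is the authoritative source for these codes. A sketch of selecting one kind by prefix (`statements_of_type` is a hypothetical name); in pandas, `table[table["type-code"].str.startswith("GS")]` does the same.

```python
def statements_of_type(rows, prefix):
    """Return the statement rows whose type-code starts with `prefix`."""
    # Treat a missing or None type-code as an empty string
    return [row for row in rows if (row.get("type-code") or "").startswith(prefix)]


statement_rows = [
    {"serial-number": "87326720", "text": "Inspecting buildings for the existence of mold", "type-code": "GS0421"},
    {"serial-number": "87331869", "text": "Athletic apparel, namely, headwear; headwear", "type-code": "GS0251"},
    {"serial-number": "87331869", "text": "MASTER KICK MAN", "type-code": "PM0001"},
]
goods_services = statements_of_type(statement_rows, "GS")
print([row["serial-number"] for row in goods_services])  # → ['87326720', '87331869']
```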
Extract the case file Foreign Applications data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
foreign = pto.getForeignApplications(root)
data = []
for k in foreign.keys():
    for d in foreign[k]:
        data.append(foreign[k][d])
table = pd.DataFrame(data)
table.head()
| | application-number | country | entry-number | filing-date | foreign-priority-claim-in | other | registration-date | registration-expiration-date | registration-number | registration-renewal-date | serial-number |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 569192 | PT | 1 | 20160812 | T | NaN | NaN | NaN | NaN | NaN | 87330826 |
1 | 015719925 | EM | 1 | 20160803 | T | NaN | NaN | NaN | NaN | NaN | 87322637 |
2 | 302016033472 | DE | 1 | 20161124 | T | NaN | NaN | NaN | NaN | NaN | 87322641 |
3 | 016181281 | EU | 1 | 20161219 | T | NaN | NaN | NaN | NaN | NaN | 87273490 |
4 | 1777139 | AU | 1 | 20160616 | T | NaN | NaN | NaN | NaN | NaN | 87262553 |
table.to_csv("foreignApplications.csv")
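Indicator fields, whose names end in `-in` (such as `foreign-priority-claim-in` here or `use-application-currently-in` in the header table), hold `T` or `F`. A sketch of converting them to Python booleans; `parse_flags` is a hypothetical helper, and it assumes `T` and `F` are the only meaningful values, so anything else in such a field (including a missing value) becomes `None`.

```python
def parse_flags(row):
    """Convert T/F values in fields ending with '-in' to True/False.

    Any other value in such a field (e.g. a missing value) becomes None.
    """
    mapping = {"T": True, "F": False}
    return {
        key: mapping.get(value) if key.endswith("-in") else value
        for key, value in row.items()
    }


foreign_row = {"application-number": "569192", "country": "PT",
               "foreign-priority-claim-in": "T", "serial-number": "87330826"}
print(parse_flags(foreign_row)["foreign-priority-claim-in"])  # → True
```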
Extract the case file Prior Applications data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
prior = pto.getPriorApplications(root)
data = []
for k in prior.keys():
    for d in prior[k]:
        data.append(prior[k][d])
table = pd.DataFrame(data)
table.head()
| | number | other-related-in | prior-registration-application | relationship-type | serial-number |
---|---|---|---|---|---|
0 | 3487431 | F | 2 | 0 | 87261195 |
1 | 4739670 | F | 2 | 0 | 87261195 |
2 | 1186117 | F | 3 | 0 | 87273474 |
3 | 3053476 | F | 3 | 0 | 87273474 |
4 | 4447492 | F | 3 | 0 | 87273474 |
table.to_csv("priorApplications.csv")
Extract the case file events data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
events = pto.getFileEvent(root)
data = []
for k in events.keys():
    for d in events[k]:
        data.append(events[k][d])
table = pd.DataFrame(data)
table.head()
| | code | date | description-text | number | serial-number | type |
---|---|---|---|---|---|---|
0 | NWOS | 20170210 | NEW APPLICATION OFFICE SUPPLIED DATA ENTERED I... | 2 | 87326720 | I |
1 | NWAP | 20170210 | NEW APPLICATION ENTERED IN TRAM | 1 | 87326720 | I |
2 | MPMK | 20170217 | NOTICE OF PSEUDO MARK E-MAILED | 3 | 87331869 | E |
3 | NWOS | 20170216 | NEW APPLICATION OFFICE SUPPLIED DATA ENTERED I... | 2 | 87331869 | I |
4 | NWAP | 20170214 | NEW APPLICATION ENTERED IN TRAM | 1 | 87331869 | I |
table.to_csv("fileEvent.csv")
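Dates in these tables are `YYYYMMDD` strings. A sketch that parses them with `datetime.strptime` and orders each application's events chronologically, breaking ties with the event `number`; `event_timelines` is a hypothetical name, and it assumes every event row has `serial-number`, `date`, `number` and `code` fields as in the table above.

```python
from collections import defaultdict
from datetime import datetime


def event_timelines(rows):
    """Group events by serial number, sorted by (date, event number)."""
    timelines = defaultdict(list)
    for row in rows:
        when = datetime.strptime(row["date"], "%Y%m%d").date()
        timelines[row["serial-number"]].append((when, int(row["number"]), row["code"]))
    return {serial: sorted(events) for serial, events in timelines.items()}


event_rows = [
    {"code": "MPMK", "date": "20170217", "number": "3", "serial-number": "87331869"},
    {"code": "NWOS", "date": "20170216", "number": "2", "serial-number": "87331869"},
    {"code": "NWAP", "date": "20170214", "number": "1", "serial-number": "87331869"},
]
for when, number, code in event_timelines(event_rows)["87331869"]:
    print(when.isoformat(), number, code)
```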
Extract the case file correspondent data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
correspondent = pto.getCorrespondent(root)
data = []
for k in correspondent.keys():
    data.append(correspondent[k])
table = pd.DataFrame(data)
table.head()
| | address-1 | address-2 | address-3 | address-4 | address-5 | serial-number |
---|---|---|---|---|---|---|
0 | MSMJ | 637 W 58TH ST | KANSAS CITY, MO 64113 | NaN | NaN | 87326720 |
1 | KELLY A. DONAHUE | VERRILL DANA, LLP | ONE PORTLAND SQUARE | PORTLAND, ME 04112-0586 | NaN | 87325322 |
2 | BARBOSA, JAIME | 15921 SW 61 STREET | DAVIE, FL 33331 | NaN | NaN | 87326721 |
3 | SCOTT NYMAN | NYMAN IP LLC | 20 NORTH WACKER DRIVE, SUITE 1200 | CHICAGO, IL 60606 | NaN | 87331869 |
4 | JASON GOLDSMITH | GOLDSMITH ASSOCIATES, PLLC | P.O. BOX 140091 | P.O. BOX 140091 | DALLAS, TX 75214 | 87326722 |
table.to_csv("correspondent.csv")
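The correspondent address is split across `address-1` to `address-5`, with unused lines empty (`NaN` in the table). A sketch of collapsing them into one mailing-address string; `mailing_address` is a hypothetical helper. It skips missing values and, as an optional choice, drops a line identical to the one before it, as with `P.O. BOX 140091` in the last row above (whether that repetition comes from the source data or from parsing is not clear from the table).

```python
import math

ADDRESS_FIELDS = [f"address-{i}" for i in range(1, 6)]


def mailing_address(row, sep="\n"):
    """Join the non-empty address lines of a correspondent row.

    Missing values (None, empty string or float NaN) are skipped, and a
    line identical to the one before it is dropped.
    """
    lines = []
    for field in ADDRESS_FIELDS:
        value = row.get(field)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        value = str(value).strip()
        if value and (not lines or lines[-1] != value):
            lines.append(value)
    return sep.join(lines)


row = {"address-1": "JASON GOLDSMITH", "address-2": "GOLDSMITH ASSOCIATES, PLLC",
       "address-3": "P.O. BOX 140091", "address-4": "P.O. BOX 140091",
       "address-5": "DALLAS, TX 75214", "serial-number": "87326722"}
print(mailing_address(row, sep=" / "))
# → JASON GOLDSMITH / GOLDSMITH ASSOCIATES, PLLC / P.O. BOX 140091 / DALLAS, TX 75214
```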
Extract the case file Madrid Filing data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
madrid_filing = pto.getMadridFiling(root)
data = []
for k in madrid_filing.keys():
    data.append(madrid_filing[k])
table = pd.DataFrame(data)
table.head()
| | entry-number | international-registration-date | international-registration-number | international-renewal-date | international-status-code | international-status-date | irregularity-reply-by-date | madrid-history-events | original-filing-date-uspto | reference-number | serial-number |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | NaN | NaN | NaN | 403 | 20170213 | NaN | 3 | 20170210 | A0064942 | 87322369 |
1 | 1 | NaN | NaN | NaN | 403 | 20170214 | NaN | 3 | 20170213 | A0064960 | 87328683 |
2 | 1 | NaN | NaN | NaN | 403 | 20170213 | NaN | 3 | 20170210 | A0064942 | 87322372 |
3 | 1 | NaN | NaN | NaN | 403 | 20170216 | NaN | 3 | 20170214 | A0064995 | 87330276 |
4 | 1 | NaN | NaN | NaN | 403 | 20170213 | NaN | 3 | 20170210 | A0064942 | 87322374 |
table.to_csv("madridFiling.csv")
Extract the case file Madrid Events data from the XML file. This function creates a dictionary that can be transformed into a table using pandas.
madrid_events = pto.getMadridEvents(root)
data = []
for k in madrid_events.keys():
    for d in madrid_events[k]:
        data.append(madrid_events[k][d])
table = pd.DataFrame(data)
table.head()
| | code | date | description-text | entry-number | serial-number |
---|---|---|---|---|---|
0 | NEWAP | 20170210 | NEW APPLICATION FOR IR RECEIVED | 1 | 87322369 |
1 | MCERT | 20170213 | MANUALLY CERTIFIED | 2 | 87322369 |
2 | APPST | 20170213 | IR CERTIFIED AND SENT TO IB | 3 | 87322369 |
3 | NEWAP | 20170213 | NEW APPLICATION FOR IR RECEIVED | 1 | 87328683 |
4 | MCERT | 20170214 | MANUALLY CERTIFIED | 2 | 87328683 |
table.to_csv("madridEvents.csv")
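Both Madrid tables carry a `serial-number` column, so each filing's events can be attached to it. A sketch of a simple hash join on that column; `attach_events` and the `event-codes` field are hypothetical names, and the rows are copied from the tables above and trimmed to a few fields. With pandas, `filings.merge(events, on="serial-number", how="left")` gives one row per filing/event pair instead of a list per filing.

```python
from collections import defaultdict


def attach_events(filings, events):
    """Attach the list of Madrid event codes to each filing, keyed by serial-number."""
    by_serial = defaultdict(list)
    for event in events:
        by_serial[event["serial-number"]].append(event["code"])
    # Filings without any events get an empty list
    return [{**filing, "event-codes": by_serial.get(filing["serial-number"], [])}
            for filing in filings]


filings = [
    {"reference-number": "A0064942", "serial-number": "87322369"},
    {"reference-number": "A0064960", "serial-number": "87328683"},
]
events = [
    {"code": "NEWAP", "serial-number": "87322369"},
    {"code": "MCERT", "serial-number": "87322369"},
    {"code": "APPST", "serial-number": "87322369"},
    {"code": "NEWAP", "serial-number": "87328683"},
    {"code": "MCERT", "serial-number": "87328683"},
]
for row in attach_events(filings, events):
    print(row["reference-number"], row["event-codes"])
```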
The following table schema diagram from 2015 is a good example of what you can expect to find in the USPTO trademark data.