Visual Descriptor Extraction

This directory provides two datasets. The first, the Ground Truth Corpus, is a small dataset intended for NLP tasks. The second, the Patent Figure Dataset, is intended for CV tasks.

1. Ground Truth Corpus

This dataset contains figure captions for design patents from the USPTO database. Objects and aspects are annotated in each caption, and the annotation follows a BIO schema.

The dataset is text only. It can be used to train NLP models.
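
As an illustration of how BIO annotations are typically consumed, the sketch below groups tagged tokens back into labeled spans. The tag names `B-ASPECT`, `I-ASPECT`, `B-OBJECT`, `I-OBJECT` and the example caption are assumptions made for this example; check the corpus files for the actual label set and file layout.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    label = None
    buffer: List[str] = []

    def flush() -> None:
        # Store the span collected so far, if there is one.
        if buffer:
            spans.append((label, " ".join(buffer)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new span.
            flush()
            label, buffer = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # An I- tag continues the currently open span of the same label.
            buffer.append(token)
        else:
            # "O" tags (and stray I- tags with no open span) close the span.
            flush()
            label, buffer = None, []
    flush()
    return spans


# Hypothetical caption and tags, not taken from the corpus.
tokens = ["Front", "perspective", "view", "of", "a", "coffee", "maker"]
tags = ["B-ASPECT", "I-ASPECT", "I-ASPECT", "O", "O", "B-OBJECT", "I-OBJECT"]
print(bio_to_spans(tokens, tags))
```

Stray `I-` tags that do not continue an open span are dropped here; a stricter loader might raise an error instead.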

2. Patent Figure Dataset

This dataset contains 66,417 design patent figures along with their corresponding visual descriptors and metadata.
The figures total 3 GB and are available on Google Drive: https://drive.google.com/file/d/1Zc3ApBMtFh-Avk1PcZGFSc44mr-SLuUB/view?usp=sharing
All figures are in PNG format.

The visual descriptors and metadata are stored in a txt file in this directory. The file provides the following information:

patentID: The patent ID in the USPTO database. Each patent has a unique ID.
patentdate: The date the patent was released.
figid: The index of the figure within a patent. A patent may contain many figures.
caption: The figure caption.
object: The object shown in the figure.
aspect: The viewing aspect presented in the figure.
figure_file: The file name of the figure. It can be used to match metadata records to figures in the dataset.
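
The fields above could be loaded and grouped per patent roughly as follows. This is a minimal sketch that assumes the txt file is tab-separated with a header row using the field names listed above; the actual delimiter and layout are not stated here and should be checked against the file. The sample rows and the names `SAMPLE`, `load_metadata`, and `figures_by_patent` are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Made-up sample rows; the real file may use a different delimiter or layout.
SAMPLE = (
    "patentID\tpatentdate\tfigid\tcaption\tobject\taspect\tfigure_file\n"
    "D0000001\t2020-01-07\t1\tFIG. 1 is a perspective view of a chair\t"
    "chair\tperspective view\tD0000001-fig1.png\n"
    "D0000001\t2020-01-07\t2\tFIG. 2 is a front view thereof\t"
    "chair\tfront view\tD0000001-fig2.png\n"
)


def load_metadata(handle) -> List[Dict[str, str]]:
    """Read tab-separated records into a list of dicts keyed by the header row."""
    # QUOTE_NONE keeps quotation marks inside captions as literal characters.
    return list(csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def figures_by_patent(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each patentID to the figure file names that belong to it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        grouped[row["patentID"]].append(row["figure_file"])
    return dict(grouped)


rows = load_metadata(io.StringIO(SAMPLE))
print(figures_by_patent(rows))
```

To read the real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points at the metadata txt file.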
