google-research-datasets/impakt


ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

This dataset contains semantic parsing annotations for 2489 sentences from shopping web pages in the C4 corpus. The annotations cover 3719 expressed implication relationships and 6117 typed and summarized attributes.

The data is released under the CC BY 4.0 license.

More details can be found in an upcoming paper.

The dataset is in JSON Lines format: each line is a JSON object following the schema shown in this example record:

{
  "snippet": "",
  "provenance": {
    "url": "https://pleasanthearthfireplacedoors.com/gas-logs/vented-vs-vent-free-gas-logs-which-one-to-get/",
    "timestamp": "2019-04-22T20:53:43Z",
    "span_start": 360,
    "span_end": 435
  },
  "category": "Home & Garden > Fireplaces",
  "classification": "Yes",
  "attributes": [
    {
      "attribute": "ambiance",
      "summary": "ambiance"
    },
    {
      "attribute": "flickering fire",
      "summary": "flickering fire"
    }
  ],
  "atomic_attributes": {
    "ambiance": [
      {
        "attribute": "ambiance",
        "summary": "ambiance",
        "attribute_type": "use case"
      }
    ],
    "flickering fire": [
      {
        "attribute": "flickering fire",
        "summary": "flickering fire",
        "attribute_type": "features"
      }
    ]
  },
  "implications": [
    {
      "antecedent": "flickering fire",
      "consequent": "ambiance"
    }
  ]
}
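As a minimal sketch of how records in this schema might be consumed, the Python below reads JSON Lines input, extracts the (antecedent, consequent) implication pairs and the attribute types from each record, and merges implications across records into a simple antecedent-to-consequents map. The function names are hypothetical helpers, not part of the dataset release, and the input file name used in the usage note is an assumption.

```python
from __future__ import annotations

import json
from collections import defaultdict
from typing import Iterable, Iterator


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty JSON Lines row (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def implication_edges(record: dict) -> list[tuple[str, str]]:
    """Return the (antecedent, consequent) pairs annotated for one sentence."""
    return [(imp["antecedent"], imp["consequent"])
            for imp in record.get("implications", [])]


def attribute_types(record: dict) -> dict[str, list[str]]:
    """Map each attribute key to the types of its atomic attributes."""
    return {key: [atom["attribute_type"] for atom in atoms]
            for key, atoms in record.get("atomic_attributes", {}).items()}


def aggregate_implications(records: Iterable[dict]) -> dict[str, set[str]]:
    """Merge implications from many records into antecedent -> consequents."""
    graph: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for antecedent, consequent in implication_edges(record):
            graph[antecedent].add(consequent)
    return dict(graph)
```

Usage, assuming the data has been downloaded to a local file named `impakt.jsonl` (hypothetical name): `with open("impakt.jsonl") as f: graph = aggregate_implications(read_jsonl(f))`. Applied to the example record above, this would yield `{"flickering fire": {"ambiance"}}`.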

The provenance information lets users join against the C4 corpus to recover the snippet that was annotated. C4 can be constructed using the scripts at the original dataset listing, or a pre-built version can be used, such as the one created by AI2 and hosted on Hugging Face.
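The join could look roughly like the sketch below. It assumes C4 is available as JSON Lines rows with `url`, `timestamp`, and `text` fields (as in the AI2 release), that the ImPaKT `provenance.url` and `provenance.timestamp` match those fields exactly, and that `span_start`/`span_end` are character offsets into `text` with an exclusive end. These are assumptions, not documented guarantees. Because C4 is far too large to hold in memory, the sketch first collects the keys ImPaKT needs and then streams C4, keeping only matching documents. All function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Iterable, Optional

Key = tuple[str, str]  # (url, timestamp)


def wanted_keys(records: Iterable[dict]) -> set[Key]:
    """Collect the (url, timestamp) pairs referenced by ImPaKT provenance."""
    return {(r["provenance"]["url"], r["provenance"]["timestamp"]) for r in records}


def index_c4(c4_lines: Iterable[str], wanted: set[Key]) -> dict[Key, str]:
    """Stream C4 JSON Lines rows, keeping the text of only the wanted documents."""
    index: dict[Key, str] = {}
    for line in c4_lines:
        if not line.strip():
            continue
        doc = json.loads(line)
        key = (doc["url"], doc["timestamp"])
        if key in wanted:
            index[key] = doc["text"]
    return index


def recover_snippet(record: dict, c4_index: dict[Key, str]) -> Optional[str]:
    """Slice the annotated span out of its C4 document, or None if not found."""
    prov = record["provenance"]
    text = c4_index.get((prov["url"], prov["timestamp"]))
    if text is None:
        return None
    # Assumption: character offsets with an exclusive end index.
    return text[prov["span_start"]:prov["span_end"]]
```

Filtering C4 against a precomputed key set keeps memory proportional to the number of annotated documents rather than to the size of the corpus. If recovered snippets look misaligned, the offset assumption (characters vs. bytes, inclusive vs. exclusive end) is the first thing to check.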
