A JavaScript utility for scraping structured recipe data from the web

noemata83/kitchenhand

Kitchenhand

A simple scraper for extracting structured recipe data from the web

Kitchenhand is a small utility module that attempts to retrieve and parse recipe data from websites that use the structured-data Recipe schema defined at Schema.org. It can extract data both from pages that store recipes in embedded ld+json and from pages that flag DOM elements with itemprop attributes. Kitchenhand should therefore be able to handle the majority of recipes published with this common schema, provided that the markup implements the schema in a reasonably sane way.
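To make the ld+json case concrete, here is a minimal sketch of how a Recipe object might be pulled out of a page's embedded JSON-LD. It is not Kitchenhand's actual implementation: the helper names `extractJsonLdRecipe` and `findRecipe` and the sample HTML are hypothetical, and the regular expression is a simplification that a real scraper would probably replace with an HTML parser.

```javascript
// Hypothetical helper, not part of Kitchenhand's API: return the first
// Schema.org Recipe object found in a page's embedded JSON-LD, or null.
function extractJsonLdRecipe(html) {
    const scriptPattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
    for (const match of html.matchAll(scriptPattern)) {
        let data;
        try {
            data = JSON.parse(match[1]);
        } catch (err) {
            continue; // skip blocks that do not contain valid JSON
        }
        const recipe = findRecipe(data);
        if (recipe) return recipe;
    }
    return null;
}

// Walk arrays and @graph containers looking for a node whose @type includes "Recipe".
function findRecipe(node) {
    if (Array.isArray(node)) {
        for (const item of node) {
            const found = findRecipe(item);
            if (found) return found;
        }
        return null;
    }
    if (node === null || typeof node !== 'object') return null;
    const types = [].concat(node['@type'] || []); // @type may be a string or an array
    if (types.includes('Recipe')) return node;
    return node['@graph'] ? findRecipe(node['@graph']) : null;
}

// Made-up sample page used only for illustration.
const sampleHtml = `
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Chimichurri",
 "recipeIngredient": ["1/2 cup fresh parsley leaves"]}
</script>
</head></html>`;

console.log(extractJsonLdRecipe(sampleHtml));
```

Pages that mark up recipes with itemprop attributes need a different approach, since the data is spread across DOM elements rather than gathered in one JSON block.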

Basic Usage

const kitchenhand = require('kitchenhand');

kitchenhand(<url>).then(recipe => {
    console.log(recipe);
});

Kitchenhand will return either a recipe object or, if recipe data could not be retrieved from the specified URL, an error message:

{ 
    message: "Could not retrieve recipe data from <url>"
}
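Because both outcomes arrive through the same code path, calling code may want to tell them apart. The sketch below is one possible way to do that. It assumes the failure object carries only the `message` field shown above and that the message always begins with the documented text. The helpers `isRecipeError` and `describeResult` are hypothetical and not part of Kitchenhand.

```javascript
// Hypothetical helper: decide whether a result looks like the documented
// error object. Assumes the message always starts with the text shown above.
function isRecipeError(result) {
    return Boolean(result)
        && typeof result.message === 'string'
        && result.message.startsWith('Could not retrieve recipe data');
}

// Hypothetical helper: turn either outcome into a short status line.
// Assumes a recipe object exposes the Schema.org `name` property.
function describeResult(result) {
    return isRecipeError(result)
        ? `Failed: ${result.message}`
        : `Got recipe: ${result.name}`;
}

// Made-up results used only for illustration.
console.log(describeResult({ message: 'Could not retrieve recipe data from https://example.com/' }));
console.log(describeResult({ name: 'Chimichurri', recipeIngredient: [] }));
```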

Options

The kitchenhand() function also accepts an optional options parameter. At present, parseIngredients is the only option that is handled.

parseIngredients

When passing the parseIngredients option, Kitchenhand will attempt to parse the list of ingredients into objects with three properties: amount, unit, and name.

kitchenhand(<url>, { parseIngredients }).then(recipe => console.log(recipe));

If all goes well, this will result in a recipeIngredient array that looks like the following:

Recipe {
    ...,
    recipeIngredient: [
        { amount: '1/2', unit: 'cup ', name: 'fresh parsley leaves' },
        { amount: '1/2', unit: 'cup ', name: 'fresh cilantro leaves' },
        ...
    ],
}

Currently, parseIngredients relies on a regexp-driven algorithm that is not especially robust, given the great variety of formats in which ingredients are listed in recipes today. Expect that your mileage may vary!
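To illustrate what a regexp-driven approach can look like, and why it is fragile, here is a minimal sketch that splits an ingredient line into amount, unit, and name. It is not the pattern Kitchenhand uses: the `UNITS` list, `INGREDIENT_PATTERN`, and `parseIngredient` are hypothetical and cover only a handful of common formats.

```javascript
// Illustrative only: a small, fixed unit list. Real recipes use many more.
const UNITS = ['cups?', 'tablespoons?', 'tbsp', 'teaspoons?', 'tsp', 'ounces?',
    'oz', 'pounds?', 'lbs?', 'grams?', 'g', 'ml', 'cloves?', 'pinch'];

// Amount: integer, decimal, fraction, or mixed number ("1 1/2").
// Unit: optional, taken from UNITS. Name: the rest of the line.
const INGREDIENT_PATTERN = new RegExp(
    '^\\s*(\\d+(?:\\s+\\d+/\\d+|/\\d+|\\.\\d+)?)\\s+(?:(' + UNITS.join('|') + ')\\b\\.?\\s+)?(.+?)\\s*$',
    'i'
);

// Hypothetical parser: lines that do not start with a number come back
// with a null amount and unit, and the whole line as the name.
function parseIngredient(line) {
    const match = INGREDIENT_PATTERN.exec(line);
    if (!match) {
        return { amount: null, unit: null, name: line.trim() };
    }
    const [, amount, unit, name] = match;
    return { amount, unit: unit || null, name };
}

console.log(parseIngredient('1/2 cup fresh parsley leaves'));
console.log(parseIngredient('salt to taste'));
```

Lines such as "salt to taste" or "a handful of basil" have no leading number, so a pattern like this can only pass them through unparsed, which is the kind of variety that makes this approach hit or miss.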
