suzuken/extract

Extract

Extract is an HTML content extractor. Its extraction rules are based on wedata.

Acknowledgement

  • items.json originally comes from http://wedata.net/databases/LDRFullFeed/items.json.
  • Currently, Extract only works for URLs that match an entry in wedata.

How to use

From _example,

package main

import (
	"flag"
	"fmt"
	"log"
	"os"

	"github.com/suzuken/extract"
)

func main() {
	var (
		rawurl = flag.String("url", "http://example.com", "URL to extract content from")
	)
	flag.Parse()
	ex := extract.New()
	// Exit early when no rule matches the given URL.
	if rule := ex.Match(*rawurl); rule == nil {
		log.Printf("%s doesn't match any rule", *rawurl)
		os.Exit(0)
	}
	// Extract the content for the URL.
	c, err := ex.ExtractURL(*rawurl)
	if err != nil {
		log.Fatalf("extract failed: %s", err)
	}
	fmt.Printf("content: %v\n", c)
}

LICENSE

MIT

All data in wedata is in the public domain. See also: http://wedata.net/help/about

Special Thanks

  • Wedata project and members.

Author

Kenta Suzuki (a.k.a. suzuken)
