Skip to content

An Elixir library to extract useful data from HTML documents, suitable for web scraping.

Notifications You must be signed in to change notification settings

mindflayer/select.ex

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Select Build Status

An Elixir library to extract useful data from HTML documents, suitable for web scraping.

Basic Usage

iex(1)> nodes = Select.parse """
...(1)>   <html>
...(1)>     <head>
...(1)>       <title>A Title</title>
...(1)>     </head>
...(1)>     <body>
...(1)>       <h1>An H1</h1>
...(1)>       <!-- A Comment -->
...(1)>       <ul>
...(1)>         <li>A List Item</li>
...(1)>         <!-- Another Comment -->
...(1)>         <li>Another List Item</li>
...(1)>       </ul>
...(1)>       A Text Node
...(1)>     </body>
...(1)>   </html>
...(1)> """
[{"html", %{},
  [{"head", %{}, [{"title", %{}, ["A Title"]}]},
   {"body", %{},
    [{"h1", %{}, ["An H1"]}, {:comment, " A Comment "},
     {"ul", %{},
      [{"li", %{}, ["A List Item"]}, {:comment, " Another Comment "},
       {"li", %{}, ["Another List Item"]}]}, "\n      A Text Node\n    "]}]}]
iex(2)> nodes |> Select.find({:name, "li"})
[{"li", %{}, ["A List Item"]}, {"li", %{}, ["Another List Item"]}]
iex(3)> nodes |> Select.find(:text)
["A Title", "An H1", "A List Item", "Another List Item",
 "\n      A Text Node\n    "]
iex(4)> nodes |> Select.find(:comment)
[comment: " A Comment ", comment: " Another Comment "]
iex(5)> nodes |> Select.find({:or, {:name, "li"}, :text})
["A Title", "An H1", {"li", %{}, ["A List Item"]}, "A List Item",
 {"li", %{}, ["Another List Item"]}, "Another List Item",
 "\n      A Text Node\n    "]

For more available matchers, check out the tests.

License

MIT

About

An Elixir library to extract useful data from HTML documents, suitable for web scraping.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Elixir 100.0%