
[BUG] element extraction methods like $, el, element and elements not found #90

Closed
pedramkeyani opened this issue Apr 29, 2020 · 7 comments

@pedramkeyani

Describe the bug
The documentation for extracting data from a website is out of date and does not compile.

Code Sample
```kotlin
import it.skrape.extract
import it.skrape.selects.`$`  // <-- is not in the selects package and doesn't compile
import it.skrape.selects.el   // <-- is not in the selects package and doesn't compile
import it.skrape.skrape

data class MyScrapedData(
        val userName: String,
        val repositoryNames: List<String>
)

fun main() {
    val githubUserData = skrape {
        url = "https://github.com/skrapeit"

        extract {
            MyScrapedData(
                    userName = el(".h-card .p-nickname").text(),
                    repositoryNames = `$`("span.repo").map { it.text() }
            )
        }
    }
    println("${githubUserData.userName}'s repos are ${githubUserData.repositoryNames}")
}
```

Expected behavior
I've tried all but the most basic examples while learning the different components of scraping. selects.element and selects.elements are also used in the examples, but they don't appear to be in the code. This could very well be a problem with how I have (or haven't) configured IntelliJ.

@pedramkeyani pedramkeyani added the bug Something isn't working label Apr 29, 2020
@pedramkeyani
Author

I should also note that I'm using IntelliJ and have tried this with it.skrape:skrapeit-core:RELEASE as well as the 6-alpha, 4-alpha, and 4.1-alpha versions.

@christian-draeger christian-draeger changed the title [BUG] [BUG] e Apr 29, 2020
@christian-draeger christian-draeger changed the title [BUG] e [BUG] element extraction methods like $, el, element and elements not found Apr 29, 2020
@christian-draeger
Collaborator

christian-draeger commented Apr 29, 2020

Hey,
the methods you described were available in really early versions of skrape{it}. They are no longer supported in the versions you mentioned.
Please let me know if there is some example in the docs that still shows the old syntax, so it can be changed.

New syntax (alpha-6):
https://github.com/skrapeit/skrape.it#parse-html-and-extract-it

Important to note:
Please use this dependency:
https://github.com/skrapeit/skrape.it#add-dependency

Please let me know if this worked for you.

@pedramkeyani
Author

Hi Christian, here is the document that has the old style. Also, it allows you to select between all the versions of the API except for the newer ones (6-alpha): https://docs.skrape.it/docs/dsl/extracting-data-from-websites

I also found examples in the documentation of using selects.element and selects.elements but I can't find it right now. I'll look around for it.

@christian-draeger
Collaborator

Ok, thanks for pointing this out. I will update the examples soon. Let me know if the other (newer) example in the README works for you :)

@pedramkeyani
Author

pedramkeyani commented Apr 29, 2020

Thanks for following up. The example in the README worked (from what I remember yesterday). One of the challenges I'm having is how to use the different features to scrape a little more elegantly. I stumbled around the code and figured out how to use withClass and then rawCssSelector, which was more powerful, but I'm still not able to figure out how to iterate through DOM elements (for all the elements of a list, mapping their hrefs to their text). One example of something I need to do is visit DOM elements, grab multiple pieces of information to put into an object, and store that for later. It may be easier to go through the details on Slack, so I messaged you on the kotlinlang channel.


```kotlin
/*
li {
    rawCssSelector = "div.jokes-nav > ul > li"

    links.addAll(findAll { eachText() })
}
*/

a {
    rawCssSelector = "div.jokes-nav > ul > li > a"

    findAll {
        println(eachHref())
        println(eachText())
    }
}
```

@christian-draeger
Collaborator

Ok cool, I will close this one for now.

@skrapeit
Owner

skrapeit commented Apr 29, 2020

Related to #91 - the solution that was discussed on the Kotlin Slack to extract all links, including their text and href, until #91 has been released:

```kotlin
fun main() {
    val allNavLinks = skrape {
        url = "http://www.laughfactory.com/jokes"
        extract {
            htmlDocument {
                ".jokes-nav a" {
                    withAttributeKey = "href"
                    findAll {
                        associate { it.text to it.attribute("href") }
                    }
                }
            }
        }
    }
    println(allNavLinks)
}
```

prints:
{Popular Jokes=http://www.laughfactory.com/jokes/popular-jokes, Latest Jokes=http://www.laughfactory.com/jokes/latest-jokes, Joke of the Day=http://www.laughfactory.com/jokes/joke-of-the-day, Animal Jokes=http://www.laughfactory.com/jokes/animal-jokes, Blonde Jokes=http://www.laughfactory.com/jokes/blonde-jokes, Boycott These Jokes=http://www.laughfactory.com/jokes/boycott-these-jokes, Clean Jokes=http://www.laughfactory.com/jokes/clean-jokes, Family Jokes=http://www.laughfactory.com/jokes/family-jokes, Food Jokes=http://www.laughfactory.com/jokes/food-jokes, Holiday Jokes=http://www.laughfactory.com/jokes/holiday-jokes, How to be Insulting=http://www.laughfactory.com/jokes/how-to-be-insulting, Insult Jokes=http://www.laughfactory.com/jokes/insult-jokes, Miscellaneous Jokes=http://www.laughfactory.com/jokes/miscellaneous-jokes, National Jokes=http://www.laughfactory.com/jokes/national-jokes, Office Jokes=http://www.laughfactory.com/jokes/office-jokes, Political Jokes=http://www.laughfactory.com/jokes/political-jokes, Pop Culture Jokes=http://www.laughfactory.com/jokes/pop-culture-jokes, Racist Jokes=http://www.laughfactory.com/jokes/racist-jokes, Relationship Jokes=http://www.laughfactory.com/jokes/relationship-jokes, Religious Jokes=http://www.laughfactory.com/jokes/religious-jokes, School Jokes=http://www.laughfactory.com/jokes/school-jokes, Science Jokes=http://www.laughfactory.com/jokes/science-jokes, Sex Jokes=http://www.laughfactory.com/jokes/sex-jokes, Sexist Jokes=http://www.laughfactory.com/jokes/sexist-jokes, Sports Jokes=http://www.laughfactory.com/jokes/sports-jokes, Technology Jokes=http://www.laughfactory.com/jokes/technology-jokes, Word Play Jokes=http://www.laughfactory.com/jokes/word-play-jokes, Yo Momma Jokes=http://www.laughfactory.com/jokes/yo-momma-jokes}

What it's doing:
1. defines the HTTP request (skrape method's scope)
2. performs the call (extract method's scope)
3. deserializes the HTML body received from extract's response
4. defines a selector that matches all elements matching the CSS query selector `.jokes-nav a[href]`
5. gets all elements matching the selector as a list (findAll method)
6. extracts the elements' text and href values into a map (using Kotlin's built-in associate function)
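The last step relies only on Kotlin's stdlib `associate` function, which can be shown in isolation. A minimal sketch, assuming a hypothetical `Link` data class standing in for the scraped elements (not part of skrape{it}):

```kotlin
// Stand-in for the elements findAll returns; the Link type is illustrative.
data class Link(val text: String, val href: String)

fun main() {
    val elements = listOf(
            Link("Popular Jokes", "/jokes/popular-jokes"),
            Link("Latest Jokes", "/jokes/latest-jokes")
    )

    // associate builds a Map from the Pair each lambda call returns;
    // if two elements produced the same key, the later one would win.
    val byText: Map<String, String> = elements.associate { it.text to it.href }

    println(byText) // {Popular Jokes=/jokes/popular-jokes, Latest Jokes=/jokes/latest-jokes}
}
```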

:) hope this helps

As an alternative, this will also work if you don't like the string invocation like `".jokes-nav a" {}`:

```kotlin
fun main() {
    val allNavLinks = skrape {
        url = "http://www.laughfactory.com/jokes"
        extract {
            htmlDocument {
                div {
                    withClass = "jokes-nav"
                    a {
                        withAttributeKey = "href"
                        findAll {
                            associate { it.text to it.attribute("href") }
                        }
                    }
                }
            }
        }
    }
    println(allNavLinks)
}
```

Both solutions are perfectly fine to use, and I think it's just a matter of taste. The first one uses a plain CSS selector and invokes it; the second one builds the selector using the DSL, which will be more readable if you have more complex selectors (under the hood, skrape{it} will make a selector string out of it again).
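To illustrate the "selector string under the hood" idea, here is a simplified, self-contained sketch of how a tag/class/attribute DSL block can be reduced to the same CSS string the first variant writes by hand. This is purely illustrative and is not skrape{it}'s actual implementation or API:

```kotlin
// Illustrative selector builder -- not skrape{it}'s real internals.
class SelectorBuilder(private val tag: String) {
    var withClass: String? = null
    var withAttributeKey: String? = null
    private val children = mutableListOf<SelectorBuilder>()

    // Nested blocks become descendant selectors.
    fun child(tag: String, init: SelectorBuilder.() -> Unit): SelectorBuilder =
            SelectorBuilder(tag).apply(init).also { children.add(it) }

    fun toCssSelector(): String {
        val self = buildString {
            append(tag)
            withClass?.let { append(".$it") }
            withAttributeKey?.let { append("[$it]") }
        }
        return if (children.isEmpty()) self
        else children.joinToString(" ") { "$self ${it.toCssSelector()}" }
    }
}

fun main() {
    val selector = SelectorBuilder("div").apply {
        withClass = "jokes-nav"
        child("a") { withAttributeKey = "href" }
    }.toCssSelector()

    println(selector) // div.jokes-nav a[href]
}
```

The DSL form and the plain string `.jokes-nav a[href]`-style selector end up equivalent; the DSL just spells the structure out step by step.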
