[BUG] BrowserFetcher not working on Android #180
Comments
Sure. Could you provide the URL you want to scrape from? EDIT: found the URL in the screenshot. I will try tomorrow, or on Monday at the latest. |
OK, I just checked. Besides that, I could imagine a solution like this, which will give you a map with the category name as key and a list of entries as value:

    data class Entry(
        val name: String,
        val amount: String,
        val unit: String,
        val percentDv: String,
    )

    fun getNuts() = skrape(BrowserFetcher) {
        request {
            url = "https://nutritiondata.self.com/facts/nut-and-seed-products/3086/2"
            timeout = 20_000
        }
        response {
            htmlDocument {
                "#NutritionInformationSlide .m-t13" {
                    findAll {
                        // associate turns the (category name, entries) pairs into a Map
                        associate {
                            it.div {
                                withAttribute = "align" to "center"
                                findFirst {
                                    text
                                }
                            } to it.div {
                                withClass = "clearer"
                                findAll {
                                    // map (not associate): each row becomes one Entry
                                    map { entry ->
                                        Entry(
                                            name = entry.div {
                                                withClass = "nf1"
                                                0 { text }
                                            },
                                            amount = entry.div {
                                                withClass = "nf2"
                                                0 { text }
                                            },
                                            unit = entry.div {
                                                withClass = "nf3"
                                                0 { text }
                                            },
                                            percentDv = entry.div {
                                                withClass = "nf4"
                                                0 { text }
                                            }
                                        )
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    fun main() {
        getNuts().forEach(::println)
    }

which prints:
|
A slightly refactored version, breaking things down into functions:

    data class Entry(
        val name: String,
        val amount: String,
        val unit: String,
        val percentDv: String,
    )

    private val DocElement.categoryName: String
        get() = div {
            withAttribute = "align" to "center"
            findFirst {
                text
            }
        }

    private fun DocElement.textOf(className: String) = div {
        withClass = className
        0 { text }
    }

    private val DocElement.entries: List<Entry>
        get() = div {
            withClass = "clearer"
            findAll {
                map { entry ->
                    Entry(
                        name = entry.textOf("nf1"),
                        amount = entry.textOf("nf2"),
                        unit = entry.textOf("nf3"),
                        percentDv = entry.textOf("nf4"),
                    )
                }
            }
        }

    private fun getNuts(): Map<String, List<Entry>> = skrape(BrowserFetcher) {
        request {
            url = "https://nutritiondata.self.com/facts/nut-and-seed-products/3086/2"
            timeout = 20_000
        }
        response {
            htmlDocument {
                "#NutritionInformationSlide .m-t13" {
                    findAll {
                        associate {
                            it.categoryName to it.entries
                        }
                    }
                }
            }
        }
    }

    fun main() {
        getNuts().forEach(::println)
    } |
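For readers who want to see the shape of the result without running a scraper, here is a minimal sketch in plain Kotlin (no skrape{it} involved). `RawCategory`, `toNutritionMap`, and the sample values are hypothetical stand-ins, not data from the real page; the sketch only illustrates why the outer step uses `associate` (pairs become a map) and the inner step uses `map` (rows become a list of entries).

```kotlin
data class Entry(
    val name: String,
    val amount: String,
    val unit: String,
    val percentDv: String,
)

// Hypothetical stand-in for one scraped category block:
// a title plus rows of four cells (name, amount, unit, percent daily value).
data class RawCategory(val title: String, val rows: List<List<String>>)

// Outer step: associate builds a Map from (title, entries) pairs.
// Inner step: map converts each four-cell row into one Entry.
// A row with fewer than four cells would throw when destructured.
fun toNutritionMap(categories: List<RawCategory>): Map<String, List<Entry>> =
    categories.associate { category ->
        category.title to category.rows.map { (name, amount, unit, percentDv) ->
            Entry(name, amount, unit, percentDv)
        }
    }

fun main() {
    // Placeholder values, made up purely for illustration.
    val sample = listOf(
        RawCategory("Category A", listOf(listOf("item-1", "1.0", "g", "10%"))),
        RawCategory("Category B", listOf(listOf("item-2", "2.0", "mg", "20%"))),
    )
    toNutritionMap(sample).forEach(::println)
}
```

Using `associate` on the inner step instead would not compile, because its lambda has to return a `Pair` rather than an `Entry`.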
Wow! Thank you so much! I will analyze your code and try it out. I'm still pretty new to programming and new to website scraping, and this personal project is very important to me, so you helped me a lot! Have a nice week 💯 👍 |
I get an error similar to #163
|
This is a problem with the HtmlUnit library (which skrape{it} uses internally to implement the BrowserFetcher) when it runs on Android. |
@christian-draeger this person was able to solve it. You said that skrape{it} uses HtmlUnit, so maybe using the new snapshot he mentioned will fix it: HtmlUnit/htmlunit#444 (comment) |
I will try tomorrow. If all tests succeed, we can ask @rbri to make it an actual release, since for now it's just a snapshot version. That means the implementation of this version could change at any time, so we should wait for an official HtmlUnit release that includes the fix. I already asked about the status of the snapshot and whether we can expect an official release soon here :) But as I said, I will try to verify whether the snapshot version fixes our problems in general :) Looking forward to being able to use the BrowserFetcher on Android soon 🎉 |
Yes, please give me a sign. I will update the readme and make a release if it works for you |
OK, so I added
But then I got an error. After I invalidated caches and restarted, it worked.
I think that was because the website was slow: the error disappeared when I increased the timeout to 40s, though that may have been a coincidence. Even before the change to 40s, I was getting the error below. I've been running the code in several ways and testing different things; the error below now appears to be gone, and the current error is the one above.
I also tried using just "clearer m-t13" as the CSS selector instead of "#NutritionInformationSlide .m-t13", and that didn't work either. |
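A note on the selector: in CSS, `clearer m-t13` without dots is read as a tag named `m-t13` nested inside a tag named `clearer`, so it will not match class names. If the intent was to match elements whose class attribute reads `clearer m-t13`, the selector would be `.clearer.m-t13`. The sketch below assumes that intent and uses made-up markup, since the real page's structure is not shown in this thread; `sampleHtml` and `countBothClasses` are hypothetical names, and it relies on skrape{it}'s `htmlDocument` accepting an HTML string.

```kotlin
import it.skrape.core.htmlDocument

// Made-up markup for illustration only; not copied from the real page.
val sampleHtml = """
    <div id="NutritionInformationSlide">
        <div class="clearer m-t13">first</div>
        <div class="clearer m-t13">second</div>
        <div class="clearer">other</div>
    </div>
""".trimIndent()

// Counts the elements that carry BOTH classes, clearer and m-t13.
fun countBothClasses(html: String): Int = htmlDocument(html) {
    ".clearer.m-t13" {
        findAll { size }
    }
}

fun main() {
    println(countBothClasses(sampleHtml))
}
```

Whether the real page combines the two classes on one element is an assumption here, so it is worth checking the markup in the browser's inspector first.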
HtmlUnit/htmlunit#444 (comment) OK, but that is great news, because it means that fetching and rendering the site works in general. Now it's probably just a matter of finding the correct CSS selectors. I am waiting for HtmlUnit/htmlunit-android#1 to produce an official release that includes the fix, so that BrowserFetcher works on Android |
Hey, I'm new to this. Can you help me get the HTML of a whole page and, if you can, also help me parse it into objects?
I basically want to get all the nutritional information in these tables.
I also need to make sure 100g is selected.
Here is my code. It's not working: I get the error "No static field INSTANCE...". I'm using your code @here