Use whole html to find og tags #2

Open
eugene-kirzhanov opened this issue Jul 1, 2020 · 0 comments

Some web pages place <meta property="og:*"> tags inside the body section rather than in head.
For example, https://www.youtube.com/watch?v=G2icJffwJLY

Currently the library loads only the head of the HTML document in JsoupHtmlFetcher, so it finds no og tags on such pages.

class JsoupHtmlFetcher : HtmlFetcher {
    override fun fetchHead(url: URL): String? {
        val connection = Jsoup.connect(url.toString())
        connection.userAgent("Mozilla")
        connection.timeout(5000)
        return try {
            connection.get()
                    .head()
                    .html()
        } catch (e: HttpStatusException) {
            null
        }
    }
}

I know this is a problem with the websites themselves, but I think it would be good to cover this exceptional case by fetching the whole HTML before looking for og tags. Alternatively, add a boolean parameter to the OgMapper.process() method that lets the parser use the whole HTML.
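
A rough sketch of what I have in mind, not a patch against the actual code. WholeHtmlFetcher, fetchHtml, findOgTags and the wholeDocument flag are names I made up for illustration; I have not checked how they would fit the real HtmlFetcher interface or OgMapper.process(). Only the Jsoup calls are real.

```kotlin
import org.jsoup.HttpStatusException
import org.jsoup.Jsoup
import java.net.URL

// Hypothetical fetcher: same connection settings as JsoupHtmlFetcher,
// but it can return the whole document instead of only <head>.
class WholeHtmlFetcher {
    fun fetchHtml(url: URL, wholeDocument: Boolean = true): String? {
        val connection = Jsoup.connect(url.toString())
        connection.userAgent("Mozilla")
        connection.timeout(5000)
        return try {
            val document = connection.get()
            // outerHtml() serializes the full document, head and body.
            if (wholeDocument) document.outerHtml() else document.head().html()
        } catch (e: HttpStatusException) {
            null
        }
    }
}

// Hypothetical helper: collects og:* meta tags either from the whole
// document or from <head> only, depending on the flag.
fun findOgTags(html: String, wholeDocument: Boolean): Map<String, String> {
    val document = Jsoup.parse(html)
    val scope = if (wholeDocument) document else document.head()
    return scope.select("meta[property^=og:]")
        .associate { it.attr("property") to it.attr("content") }
}
```

With wholeDocument set to false the behaviour should stay as it is today, so the flag could default to the current head-only mode if fetching the full page is considered too expensive.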
