no data for medium.com urls from the iOS app #75

Closed
hamedb89 opened this issue Feb 16, 2019 · 5 comments

@hamedb89

hamedb89 commented Feb 16, 2019

This link doesn't throw an error, but it also doesn't return any Open Graph data.
Is anybody else having this problem?

https://link.medium.com/HozE2bvBmU

edit: the link above redirects to this one: https://rsci.app.link/HozE2bvBmU?_p=f4503a44f32dc261648a177f2460

@d-ivashchuk

Experiencing the same issue! Probably because, as already pointed out, it’s an intermediate redirect link and not the final URL.

@hamedb89 Were you able to solve it?

I would appreciate clarification on this one: I saw that ogs should work with redirects, but Medium links are definitely not working!

Thanks a lot🙌🏼

@jshemas
Owner

jshemas commented May 15, 2019

Hello,

I started digging into this. openGraphScraper uses the request library to make HTTP requests, and one of the options I use is followAllRedirects, which should follow all HTTP redirects.

request docs: https://github.com/request/request#requestoptions-callback

Unfortunately, this is the page I get back when using request:

<html>
        <body>
                <script type="text/javascript">
                function unescapeHtml(escaped_str) {
                    var div = document.createElement('div');
                    div.innerHTML = escaped_str;
                    var child = div.childNodes[0];
                    return child ? child.nodeValue : null;
                }

                function validateProtocol(url) {
                        var parser = document.createElement('a');
                    parser.href = url;
                    var protocol = parser.protocol.toLowerCase();
                        if ([ 'javascript:',  'vbscript:',  'data:', 'ftp:',':' , ' '].indexOf(protocol) < 0) {
                                return url;
                        }
                        return null;
                }

                function validate(url) {
                        var unescaped_value = unescapeHtml(url);
                        if (unescaped_value && validateProtocol(unescaped_value)) {
                                return unescaped_value;
                        }
                        return '/';
                }

                        var hasURI = false;
                        var intervalExecuted = false;
                        window.onload = function() {
                                document.getElementById("l").src = validate("null/p/17dc61071ed5?link_click_id=656998272080551846");
                                window.setTimeout(function() {
                                        if (!hasURI) {
                                                window.top.location = validate("https://medium.com/@racheltho/techs-long-hours-are-discriminatory-counter-productive-17dc61071ed5?source=linkShare-5037b396cb47-1550350532&amp;_branch_match_id=656998272080551846");
                                        }
                                        intervalExecuted = true;
                                }, 300);
                        };

                        window.onblur = function() {
                                hasURI = true;
                        };

                        window.onfocus = function() {
                                if (hasURI) {
                                        window.top.location = validate("https://medium.com/@racheltho/techs-long-hours-are-discriminatory-counter-productive-17dc61071ed5?source=linkShare-5037b396cb47-1550350532&amp;_branch_match_id=656998272080551846");
                                } else if(intervalExecuted) {
                                        window.top.location = validate("https://medium.com/@racheltho/techs-long-hours-are-discriminatory-counter-productive-17dc61071ed5?source=linkShare-5037b396cb47-1550350532&amp;_branch_match_id=656998272080551846");
                                }
                        }
                </script>
                <iframe id="l" width="1" height="1" style="visibility:hidden"></iframe>
        </body>
</html>

request can't run JavaScript, so it has no way of getting to the final page: the redirect happens inside the script above, not in an HTTP response. To get this to work I would have to rewrite openGraphScraper to use headless Chrome or something like that.

@d-ivashchuk

How I solved this issue: I fetched the body of the page with axios and used a regex to find the final URL, which can then be processed by ogs. This is a good use case for a middleware function.
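
The regex step described above might look roughly like the sketch below. It is not the code d-ivashchuk actually used. It assumes the interstitial page keeps the `window.top.location = validate("...")` pattern shown earlier in this thread, and that the body has already been fetched (with axios or anything else). The name `extractRedirectUrl` is hypothetical.

```javascript
// Hypothetical helper: pulls the final URL out of the JavaScript-redirect
// page quoted earlier in this thread. Assumes the page still assigns
// window.top.location = validate("<url>"); returns null otherwise.
const REDIRECT_RE = /window\.top\.location\s*=\s*validate\("([^"]+)"\)/;

function extractRedirectUrl(html) {
  const match = REDIRECT_RE.exec(html);
  if (!match) {
    return null;
  }

  // The page HTML-escapes "&" as "&amp;" inside the string literal.
  const url = match[1].replace(/&amp;/g, '&');

  // Only accept absolute http(s) URLs; anything else is ignored.
  try {
    const protocol = new URL(url).protocol;
    return protocol === 'http:' || protocol === 'https:' ? url : null;
  } catch (err) {
    return null;
  }
}
```

The returned URL can then be passed to ogs as a normal `url`. If the markup of the interstitial page changes, the pattern will stop matching and the helper returns `null`, so callers should fall back to the original link in that case.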

@hamedb89
Author

@d-ivashchuk Great idea, Dimitri, thanks! I will try it out as soon as I can get to it.

@jshemas
Owner

jshemas commented May 30, 2020

Yes, using the html option in ogs is a great way to solve this problem. Closing the issue.
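
For readers landing here, a minimal sketch of the html option follows. It assumes a promise-based version of ogs whose resolved value has a `result` field; the exact return shape differs between releases, so check the README for the version you have installed. The wrapper name `scrapeFromHtml` is hypothetical, and `ogs` is passed in as an argument so the sketch does not depend on a particular install.

```javascript
// Hypothetical wrapper: hands already-fetched HTML to ogs instead of a URL.
// `ogs` is expected to be the function exported by open-graph-scraper,
// e.g. const ogs = require('open-graph-scraper');
async function scrapeFromHtml(html, ogs) {
  if (typeof html !== 'string' || html.length === 0) {
    throw new TypeError('html must be a non-empty string');
  }

  // Pass only `html` here, with no `url` alongside it.
  const { result } = await ogs({ html });
  return result;
}
```

The HTML itself has to come from somewhere that can reach the final page, for example a second request to the URL recovered from the redirect script, or a headless browser.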

jshemas closed this as completed May 30, 2020