Broken Scrapers #123

Closed
bnkai opened this issue Aug 9, 2020 · 223 comments

@bnkai
Collaborator

bnkai commented Aug 9, 2020

Any issues with scrapers not working should be reported here.
The name of the scraper and the XPath or part that is not working would be appreciated.


Known Issues

  • IAFD may need a couple of tries to scrape (CF detection issues)
  • nhentai scraper is broken (blocked/detected by the site or CF?)

updated 2022-09-25

@bnkai bnkai pinned this issue Aug 9, 2020
@Belleyy
Contributor

Belleyy commented Aug 21, 2020

It looks like the JavLibrary scraper can break sometimes.
You get the Cloudflare DDoS protection page, which blocks it (you normally need to wait 5 seconds to be redirected to the site).
I tried with useCDP, but that doesn't fix it.

Idea:
JavLibrary has mirrors/clones, so maybe it would be good to have an option so that, if a request fails, the scraper changes the URL and tries one of these sites.
For example, these are all the same:

https://www.javlibrary.com/en/?v=javlilbj7e
https://www.m45e.com/en/?v=javlilbj7e
https://www.u44r.com/en/?v=javlilbj7e
https://www.g46e.com/en/?v=javlilbj7e

But I don't think it would be useful for other scrapers.
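
The fallback idea above could be sketched roughly like this in Python. It is only an illustration, not how stash or the JavLibrary scraper actually works: `mirror_urls`, `fetch_first`, and the `fetch` callable are hypothetical names, and the host list is simply copied from the examples in this comment.

```python
from urllib.parse import urlsplit, urlunsplit

# Mirror hosts listed in the comment above; the list may be out of date.
MIRROR_HOSTS = ["www.javlibrary.com", "www.m45e.com", "www.u44r.com", "www.g46e.com"]


def mirror_urls(url, hosts=MIRROR_HOSTS):
    """Return the same URL rewritten for each host, original host first."""
    parts = urlsplit(url)
    ordered = [parts.netloc] + [h for h in hosts if h != parts.netloc]
    return [urlunsplit(parts._replace(netloc=h)) for h in ordered]


def fetch_first(url, fetch, hosts=MIRROR_HOSTS):
    """Try each mirror in turn.

    `fetch` is a hypothetical callable that returns the page HTML or raises
    an exception when the request is blocked.
    """
    last_error = None
    for candidate in mirror_urls(url, hosts):
        try:
            return candidate, fetch(candidate)
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
    raise RuntimeError(f"all mirrors failed for {url}") from last_error
```

Only the host changes, so the path and the `?v=` query are carried over unchanged to each mirror.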

@bnkai
Collaborator Author

bnkai commented Sep 2, 2020

@brumouta thanks for the feedback. welivetogether and babes have now been moved to a separate scraper.
Edit: also added momsbang, momslickteens and propertysex.

@budislov
Contributor

budislov commented Sep 6, 2020

RealityKings has some more broken domains:
bellesafilms.com, danejones.com, lesbea.com and sexyhub.com only parse the image. They will work fine if they are moved to RealityKingsOL.

@bnkai
Collaborator Author

bnkai commented Sep 6, 2020

Thanks for the feedback @budislov
The relevant scrapers have been updated

@budislov
Contributor

Looks like RealityKingsOL is broken. I tried to scrape from both babes.com and bellesafilms.com and only the tags came through. It appears that the div classes used in the scraper have changed. Will investigate further.

@bnkai
Collaborator Author

bnkai commented Sep 11, 2020

Pending PR is available for RealityKingsOL and Brazzers
relevant PRs merged

@Ziatexataor

iafd.com performer scraper not working

@Belleyy Belleyy mentioned this issue Oct 6, 2020
@bnkai
Collaborator Author

bnkai commented Oct 6, 2020

IAFD fixed, thanks for the report @Ziatexataor and for the fix @Belleyy

@malibustacynewhat

TransSensual.yml seems to be broken. Tested with new and older scenes and can't pull the data

@bnkai
Collaborator Author

bnkai commented Oct 11, 2020

@malibustacynewhat thanks for the report
The relevant PR by @Belleyy fixes the issue

@mmenanno
Contributor

JAVLibrary is broken https://github.com/stashapp/CommunityScrapers/blob/master/scrapers/javlibrary.yml

Looks to be a Cloudflare error but using the CDP driver didn't resolve it for me when testing:

<!DOCTYPE html><html lang="en-US"><head>
  <meta charset="UTF-8"/>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
  <meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1"/>
  <meta name="robots" content="noindex, nofollow"/>
  <meta name="viewport" content="width=device-width,initial-scale=1"/>
  <title>Just a moment...</title>
  <style type="text/css">
    html, body {width: 100%; height: 100%; margin: 0; padding: 0;}
    body {background-color: #ffffff; color: #000000; font-family:-apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen, Ubuntu, "Helvetica Neue",Arial, sans-serif; font-size: 16px; line-height: 1.7em;-webkit-font-smoothing: antialiased;}
    h1 { text-align: center; font-weight:700; margin: 16px 0; font-size: 32px; color:#000000; line-height: 1.25;}
    p {font-size: 20px; font-weight: 400; margin: 8px 0;}
    p, .attribution, {text-align: center;}
    #spinner {margin: 0 auto 30px auto; display: block;}
    .attribution {margin-top: 32px;}
    @keyframes fader     { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
    @-webkit-keyframes fader { 0% {opacity: 0.2;} 50% {opacity: 1.0;} 100% {opacity: 0.2;} }
    #cf-bubbles > .bubbles { animation: fader 1.6s infinite;}
    #cf-bubbles > .bubbles:nth-child(2) { animation-delay: .2s;}
    #cf-bubbles > .bubbles:nth-child(3) { animation-delay: .4s;}
    .bubbles { background-color: #f58220; width:20px; height: 20px; margin:2px; border-radius:100%; display:inline-block; }
    a { color: #2c7cb0; text-decoration: none; -moz-transition: color 0.15s ease; -o-transition: color 0.15s ease; -webkit-transition: color 0.15s ease; transition: color 0.15s ease; }
    a:hover{color: #f4a15d}
    .attribution{font-size: 16px; line-height: 1.5;}
    .ray_id{display: block; margin-top: 8px;}
    #cf-wrapper #challenge-form { padding-top:25px; padding-bottom:25px; }
    #cf-hcaptcha-container { text-align:center;}
    #cf-hcaptcha-container iframe { display: inline-block;}
  </style>

    <meta http-equiv="refresh" content="12"/>
<script type="text/javascript">
  //<![CDATA[
  (function(){
    
    window._cf_chl_opt={
      cvId: "1",
      cType: "non-interactive",
      cNounce: "90957",
      cRay: "5e0bb321ef7bca98",
      cHash: "da202b537a470c2",
      cFPWv: "g",
      cRq: {
        ru: "aHR0cDovL3d3dy5qYXZsaWJyYXJ5LmNvbS9lbi8/dj1qYXZtZXpiZTNh",
        ra: "TW96aWxsYS81LjAgKE1hY2ludG9zaDsgSW50ZWwgTWFjIE9TIFggMTBfMTVfNSkgQXBwbGVXZWJLaXQvNTM3LjM2IChLSFRNTCwgbGlrZSBHZWNrbykgQ2hyb21lLzgzLjAuNDEwMy4xMDYgU2FmYXJpLzUzNy4zNg==",
        rm: "R0VU",
        d: "q4jiR7WSBtf4fLzLz9igfZOdIxwSKG18lkM8oKJ2oB8n30GM2iyW8aiQ9atzUZsOBOiOCY1F45Ok0xoQE9LhBiZfXlfVJaHdOBUlqNu1cCbboEIdvJX1FuypXHYYwXjfaKTC2p4xeTL5nAkfqvaQqkt1H/1p0rqFLGuv5JXJ3gBxB6Y/uALdxdsFi+lSlCG6Qe3X2Lj+WYyKl3todU7QjK8vUNythAJOrMTlR1fGrfbfXESvY4tSMJo7OEhwZymfB+AKhpzlHeTcuo+T40qfUHcXUDFRZCqSIvBynJ532Jn2bbqiZ1XffuBhRCVhBxK+kkJ9NurfuchvBr0bA3lk+Dnyykdr0hUr5lE34hioN0t6bDwXnGSBMCsX40Hx6TDDQa+utstnZqYk3G1jtYupvATJXzjvxhaNDHgOwHJomiUip/glK6aw52FuNwxXEj7ZJmdJPg4omti3B/1l7wy5+Z1rERc/nHgZE2JBxsOMDFpFXx6oNX/ZCk1//+mIVxGVFfNCBIGI1eyIKCP6LkCcsw1+aeO2YHmOzBkz9Ebx3drg5ouDQU0bmnNNsuh6vtMZ2eydA3b8y1H2mfO+UoUwB7Ej5u0cR1gJGbuSHpK+imsOFpqmwJdDPhqXYl5xcy6nVCnU2xeyqXJP/HMHGjU4h3Op/vlZKIuhtqFPC6Guk0FIUbFTI4JGMG7u3UwcuuYUrnmYXFX1vupeVrqsjsRFJXnqRhnWc+EJ62b3QYIqf/pFpb/eKU8DpE4wKEmd05vkzLCS1DZQ29AxACho6Zf0brScVV2/qvY5qVsNlk9QCSJdmmR7eyfAPju4BoRmFWdRVEymwQHM7raS1XGdZvcFDw==",
        t: "MTYwMjQ1MjAwOS4yNzIwMDA=",
        m: "jAJ8FygcOeJXMeDg2+r+pIbPCZvv7uD3AA/cCQ2MIkQ=",
        i1: "2tfaQpq68/qtCUW9AL9YZA==",
        i2: "DT4KsCiUsfsu8FZXKRmHjg==",
        uh: "TprDV0CpLyfpdzs+8x+WX/Btsv1e+OQLx8NzEGjSfMY=",
        hh: "3htzUBXaqug0moZaVaRPWNYG1rRQQxdDndKhxQafs0M=",
      }
    }
    window._cf_chl_enter = function(){window._cf_chl_opt.p=1};
    
    var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
    b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
    b(function(){
      var cookiesEnabled=(navigator.cookieEnabled)? true : false;
      var cookieSupportInfix=cookiesEnabled?'/nocookie':'/cookie';
      var a = document.getElementById('cf-content');a.style.display = 'block';
      var isIE = /(MSIE|Trident\/|Edge\/)/i.test(window.navigator.userAgent);
      var trkjs = isIE ? new Image() : document.createElement('img');
      trkjs.setAttribute("src", "/cdn-cgi/images/trace/jschal/js"+cookieSupportInfix+"/transparent.gif?ray=5e0bb321ef7bca98");
      trkjs.id = "trk_jschal_js";
      trkjs.setAttribute("alt", "");
      document.body.appendChild(trkjs);
      
      var cpo = document.createElement('script');
      cpo.type = 'text/javascript';
      cpo.src = "/cdn-cgi/challenge-platform/h/g/orchestrate/jsch/v1";
      var done = false;
      cpo.onload = cpo.onreadystatechange = function() {
        if (!done && (!this.readyState || this.readyState === "loaded" || this.readyState === "complete")) {
          done = true;
          cpo.onload = cpo.onreadystatechange = null;
          window._cf_chl_enter()
        }
      };
      document.getElementsByTagName('head')[0].appendChild(cpo);
    
    }, false);
  })();
  //]]>
</script>


</head>
<body>
  <div style="display: none;"><a href="http://bt50.org/nonalignedfrequent.php?pl=0">table</a></div><table width="100%" height="100%" cellpadding="20">
    <tbody><tr>
      <td align="center" valign="middle">
          <div class="cf-browser-verification cf-im-under-attack">
  <noscript>
    <h1 data-translate="turn_on_js" style="color:#bd2426;">Please turn JavaScript on and reload the page.</h1>
  </noscript>
  <div id="cf-content" style="display:none">
    
    <div id="cf-bubbles">
      <div class="bubbles"></div>
      <div class="bubbles"></div>
      <div class="bubbles"></div>
    </div>
    <h1><span data-translate="checking_browser">Checking your browser before accessing</span> javlibrary.com.</h1>
    
    <div id="no-cookie-warning" data-translate="turn_on_cookies" style="display:none">
      <p data-translate="turn_on_cookies" style="color:#bd2426;">Please enable Cookies and reload the page.</p>
    </div>
    <p data-translate="process_is_automatic">This process is automatic. Your browser will redirect to your requested content shortly.</p>
    <p data-translate="allow_5_secs">Please allow up to 5 seconds…</p>
  </div>
   
  <form class="challenge-form" id="challenge-form" action="/en/?v=javmezbe3a&amp;__cf_chl_jschl_tk__=c44b146f044ddd9d0b23bf4928759e99e7ddef0e-1602452009-0-Ab8hTl3noYmOwwAWI1D0d_6zhaYO-4vHBJD8JW4VCFmZKjqal-xVCdpCdbztfKStCEp8QJa2ganoOGB_Jnq-Qwtu6BnG7zySJxaY_Oc54OgSHPG3Mt1wJ-nYfmFjU8ShDtM6t2VT15V5I0rsRAGRc5RZPs1OE8Vi3aozMxTjxatgWYLmnk0ozVyDVudpWURh7xhqtqs9M9vv_jAfqIUgHIwFe1MVURVaxrV4jOsccyGYHvJ8ZLFmpzrqf8LPPa2N3M1SG-T4vUDhsLgjgeIkfOC6_U3zZBVNKUY8HU47JaiTLjHHnOMHfzeA4iz76Sb2MQ" method="POST" enctype="application/x-www-form-urlencoded">
    <input type="hidden" name="r" value="c77fde06d76dccbdf1aa275a6824657ec7878994-1602452009-0-AefHkw7YBHV4yapfdyGgNFofr2bk+ZNLsmu1vxzyTAyFPQickf2DVbsdFnOKYI9Zs5D6PO21kZcj5siVtnYOhmEJ7HOBLBCp4lS+GBW8iyR62pXG9ezmP6Fu4qRomUkK8uCSsqveohhquzDEYroSgMpZT0eIJXFIprAfC6uIux7NSx6mo8wGMKFoW3TJJFmAN4FKgZdHpkLShowC8AaRocTx86yZzOOrEywJ5CGsOzw5vNg4GvS4gK6MB+pR3iKfGRnXamisWHrWYZWDyfiGHOfcD8LmcCWzeIEMfD+nADV4477P2jWOHIDvEqtS7Yi0G3qKvH16LmR28qALhOLv8PAhv2GBzp8EOUcdXkJfFN1Jloqm5JU2eoCn/5uBxE0xl80s8Xfaa9vhkhqRicv3XnmHpJRhXgNvauGiYLcmaJ0189RtB6eEhZ6j1N9o9pfstDcSa00ur7vPLgDCd2AqiVrVz8SG8zb+8L+wlfrTaBCIlAiecjoTFLHTPEZW2V4eaVYzY9ECAb69YOhnGBhUXDiDk8wjSLZv8uZYMIxwW+jEsdzAtJ9TkMq5VXrE/sORd24lamS6K3Lr8g9BasZTjJdR3Omni9UmlQVaVDXUIPQBAb6x1nhf57/47lvWjDgrjuEw47NDosN3IHSDoyKYUMg="/>
    <input type="hidden" value="3715604b2b146b25182bb17d479ebda2" id="jschl-vc" name="jschl_vc"/>
    <!-- <input type="hidden" value="" id="jschl-vc" name="jschl_vc"/> -->
    <input type="hidden" name="pass" value="1602452013.272-qzCPIXiuVG"/>
    <input type="hidden" id="jschl-answer" name="jschl_answer"/>
  </form>
  
  <div id="trk_jschal_nojs" style="background-image:url(&#39;/cdn-cgi/images/trace/jschal/nojs/transparent.gif?ray=5e0bb321ef7bca98&#39;)"> </div>
</div>

          
          <div class="attribution">
            DDoS protection by <a href="https://www.cloudflare.com/5xx-error-landing/" target="_blank">Cloudflare</a>
            <br/>
            <span class="ray_id">Ray ID: <code>5e0bb321ef7bca98</code></span>
          </div>
      </td>
     
    </tr>
  </tbody></table>


</body></html>
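
For anyone debugging a similar failure, a response like the one above can be recognised with a simple check. The following Python is a rough sketch, not part of stash: the function names are made up for illustration, and the markers are just strings that appear in the page pasted above, so Cloudflare may serve different markup at other times.

```python
import re

# Markers visible in the challenge page pasted above.
CHALLENGE_MARKERS = (
    "<title>Just a moment...</title>",
    "cf-browser-verification",
    'id="challenge-form"',
)


def looks_like_cloudflare_challenge(html):
    """Heuristic: True if the HTML contains any marker from the challenge page."""
    return any(marker in html for marker in CHALLENGE_MARKERS)


def ray_id(html):
    """Extract the Cloudflare Ray ID if present, else None."""
    match = re.search(r"Ray ID: <code>([0-9a-f]+)</code>", html)
    return match.group(1) if match else None
```

If the check returns True, the XPath selectors never saw the real page, so empty results point to the site protection rather than to the scraper's selectors.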

@Ziatexataor

teamskeet.com
not working

@bnkai
Collaborator Author

bnkai commented Nov 6, 2020

Teamskeet only works for a single query and then Cloudflare blocks the IP, I think.
Not much can be done.

For JavLibrary, with the last update you can change the URL to one of the mirrors and it should work.

@SpedNSFW
Contributor

SpedNSFW commented Nov 8, 2020

Vixen Network sites now require you to login when opening a scene page, thus the scraper no longer works.

@Belleyy
Contributor

Belleyy commented Nov 8, 2020

Already solved on Discord, but for other people:
If you are on a performer page, the link to the scene will have members. in the URL (https://members.tushy.com/inauguration).
Just remove the members. part to get to the scene. 😃

@Threak

Threak commented Nov 17, 2020

teenfidelity.com doesn't work (part of /scrapers/KellyMadisonMedia.yml)
The comment states that the first scraping attempt should set a cookie and the second attempt should work, but it doesn't.

@bnkai
Collaborator Author

bnkai commented Nov 17, 2020

@Threak are you sure you set up CDP correctly? I just tried and it seems to work. The first request has something to do with their site protection, not necessarily a cookie.
You can append this at the end of the scraper file, then refresh the scrapers:

debug:
  printHTML: true

and have a look at the log so that you can see what the site returns to stash.

@Belleyy
Contributor

Belleyy commented Nov 17, 2020

@bnkai I think there is a difference between headless Chromium and normal Chrome.
@Threak Which CDP setup do you use, headless Chrome or a Chromium executable?

I use a Chromium executable and this scraper doesn't work for me, just like the Teamskeet scraper, so I think there is a difference between headless and classic.

@bnkai
Collaborator Author

bnkai commented Nov 18, 2020

@Belleyy you might be right.
I am using a headless Chrome Docker container, so that might be it.
Teamskeet works only for 1-2 queries max, but teenfidelity works OK after the first query.

@bnkai
Collaborator Author

bnkai commented Nov 18, 2020

@Belleyy upon further investigation it seems that the Docker container method maintains some cookies, which I assume the executable one doesn't.
@Belleyy @Threak can you try the stash version from this PR stashapp/stash#934 (download links below) with this scraper file https://pastebin.com/UBuHFkfm? (Make sure to remove the old scraper file.)

stash-osx uploaded to url: "https://gofile.io/d/lq6J3w"
stash-win.exe uploaded to url: "https://gofile.io/d/DozNwz"
stash-linux uploaded to url: "https://gofile.io/d/5Tl0hi"

First make sure to set the log level to debug. Then do a scrape. After the scrape, get the nats values that are printed in the log and use them to replace the Value: "" entries in the yml file. Do a refresh of the scrapers from stash and the scraper should work for pornfidelity. As a bonus you can set CDP to false, as it no longer seems to be needed (use it first, though, to verify that everything works OK with the plain Chrome executable).

@Belleyy
Contributor

Belleyy commented Nov 18, 2020

@bnkai Just tested with a few scenes and it works 👍 (with and without CDP)

Edit: I just found that the Chromium process was still running in the background. I will test more later to find out whether I was doing something wrong or it's an issue with your PR. Can't reproduce it 🤷‍♂️.

@bnkai
Collaborator Author

bnkai commented Nov 18, 2020

@Belleyy this seems to confirm what I thought. We'll probably have to update the scraper to mention that a CDP remote instance is required (a plain executable is not enough) until the cookies PR is merged.

@JDRanpariya

JDRanpariya commented Dec 28, 2020

I have the following 4 errors regarding the field cookies.
time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\Colette.yml: yaml: unmarshal errors:\n line 63: field cookies not found in type scraper.scraperDriverOptions"

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\KellyMadisonMedia.yml: yaml: unmarshal errors:\n line 42: field cookies not found in type scraper.scraperDriverOptions"

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\javdb.yml: yaml: unmarshal errors:\n line 89: field cookies not found in type scraper.scraperDriverOptions"

time="2020-12-28T20:50:20+05:30" level=error msg="Error loading scraper C:\\Users\\...\\.stash\\scrapers\\mgstage.yml: yaml: unmarshal errors:\n line 29: field cookies not found in type scraper.scraperDriverOptions"

@Belleyy
Contributor

Belleyy commented Dec 28, 2020

@JDRanpariya Are you using the dev build? This scraper needs stash >= v0.4.0-14.

@JDRanpariya

JDRanpariya commented Dec 28, 2020

I'm using following build
https://github.com/stashapp/stash/releases/tag/v0.4.0

The 24 Nov one

@bnkai
Collaborator Author

bnkai commented Dec 29, 2020

@JDRanpariya you need to switch to a recent dev version, as stated in the scrapers list: v0.4.0-14 at least for cookie support. The one you have doesn't support that, as it's v0.4.0 (14 commits older than what you need).
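
To make the version comparison concrete, here is a small Python sketch. It assumes the version format seen in this thread, where `v0.4.0-14` means 14 commits after the `v0.4.0` tag; `parse_version` and `supports_cookies` are hypothetical helper names, not part of stash.

```python
import re


def parse_version(text):
    """Parse 'v0.4.0' or 'v0.4.0-14' into (major, minor, patch, commits_after_tag)."""
    match = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, patch, extra = match.groups()
    # A plain release has no suffix, so it counts as 0 commits after the tag.
    return int(major), int(minor), int(patch), int(extra or 0)


def supports_cookies(version, minimum="v0.4.0-14"):
    """True if `version` is at least the minimum quoted in this thread."""
    return parse_version(version) >= parse_version(minimum)
```

With this ordering the plain `v0.4.0` release sorts before `v0.4.0-14`, which is why the release build fails to load scrapers that use the cookies field.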

@bkbd3177

@Maista6969 Thank you! I'm not sure how to reopen an issue, so I left a comment on the one you referenced.

@LeGrosFromage

DesperateAmateurs:

I have the scraper installed (Stash has been restarted a few times since it was installed) but on a scene, clicking "Scrape with..." DA will not be in the list, pasting in a DA URL and clicking the white download/scrape button returns nothing at all (no dialog box, no message, no data) and clicking "Scrape with URL" from the "Scrape with..." list displays the message "No scenes found" - so either I'm doing something completely wrong or the scraper is 100% borked... could be either. Or both. The scraper is listed as the current/latest version.

@smcallah
Contributor

I just installed the scraper through the community scrapers installer in v0.25.0 and it is able to get data when entering a DA URL and clicking the scrape button.

Make sure you are clicking the reload scraper button, as well as refreshing the browser window where you are attempting to scrape the URL on a scene. I had to refresh my scene page or else the scrape button was greyed out.

@LeGrosFromage

I've:
Reloaded scrapers and restarted Stash.
Quit/restarted browser.
The button is available, but no data is returned. When you click it you get the "busy circle" for 0.5 seconds then it disappears.

@Maista6969
Collaborator

What URL are you scraping? This could be a networking issue, but we can rule out the scraper itself being broken: I have tested it with several scenes now (like this one) and it works as expected. The only problem I can see is that the URL pattern in this scraper is too liberal: it will accept any URL that contains desperateamateurs.com/, but that covers a lot of URLs that aren't scrapable scenes 😅 I've pushed a fix for that in 75d5337
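
To illustrate why a pattern that is "too liberal" matters, here is a Python sketch contrasting a substring match with a stricter one. The `/scenes/<slug>` path in the strict pattern is invented purely for this example; it is not taken from the real scraper, the fix in 75d5337, or the site's actual URL layout.

```python
import re

# The loose rule described above: any URL containing this substring matches.
LOOSE = "desperateamateurs.com/"

# Hypothetical stricter pattern: the "/scenes/<slug>" path is made up for
# this example and is NOT the real site layout or the real fix.
STRICT = re.compile(r"^https?://(?:www\.)?desperateamateurs\.com/scenes/[\w-]+/?$")


def loose_match(url):
    """Substring check: accepts every page on the site."""
    return LOOSE in url


def strict_match(url):
    """Anchored check: accepts only URLs shaped like the assumed scene path."""
    return STRICT.match(url) is not None
```

With the loose rule, a non-scene page is handed to the scraper and simply returns nothing, which looks the same to the user as a broken scraper.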

Can you use other scrapers without any issues or is this the first/only one you've tried?

@Eviepayne

Having an issue with dc-onlyfans scraper

2024-03-22   20:47:59 Info     Retrieved latest version: v0.25.1 (bf7cb78d)
2024-03-22   20:47:50 Info     Retrieved latest version: v0.25.1 (bf7cb78d)
2024-03-22   20:29:15 Error    scrapeSingleScene: input: scrapeSingleScene scraper dc-onlyfans: could not unmarshal json from script output: EOF
2024-03-22   20:29:15 Error    could not unmarshal json from script output: EOF
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans] FileNotFoundError: [Errno 2] No such file or directory: 'data/vaultshare/OF/defiantpanda/Posts/Free'
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans]                 ^^^^^^^^^^^^^^^^
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans]     for name in os.listdir(self):
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans]   File "/usr/lib/python3.11/pathlib.py", line 932, in iterdir
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans]     for child in p.iterdir():
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans]   File "/opt/of/scrapers/community/dc-onlyfans/dc-onlyfans.py", line 159, in <module>
2024-03-22   20:29:15 Error    [Scrape / DC Onlyfans] Traceback (most recent call last):
2024-03-22   20:25:31 Error    scrapeSingleScene: input: scrapeSingleScene scraper dc_onlyfans_fansdb: scraper script error: exit status 1
2024-03-22   20:25:31 Error    [Scrape / DC OnlyFans (FansDB)] Could not find username or network in path: data/vaultshare/OF/defiantpanda/Posts/Free/Videos/0h1c8tuxrptucffl1cmvx_source.mp4
2024-03-22   20:25:20 Info     Version v0.25.1 (bf7cb78d) is already the latest released
2024-03-22   20:25:19 Info     stash is running at http://localhost:999/
2024-03-22   20:25:19 Info     stash is listening on 0.0.0.0:999
2024-03-22   20:25:19 Info     stash version: v0.25.1 (bf7cb78d) - Official Build - 2024-03-13 03:32:11
2024-03-22   20:25:19 Info     [InitHWSupport] Supported HW codecs:
2024-03-22   20:25:19 Info     using config file: /opt/of/config.yml

@Maista6969
Collaborator

This scraper is very particular about file structures: your folder is named OF but the scraper expects a folder named exactly OnlyFans
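
As an illustration of what an exact folder-name requirement looks like, here is a Python sketch. It is not the dc-onlyfans scraper's real code: `username_from_path` is a hypothetical helper, and the assumption that the username is the folder directly after `OnlyFans` is inferred only from the paths in the log.

```python
from pathlib import PurePosixPath


def username_from_path(path, root_name="OnlyFans"):
    """Return the path component right after the folder named exactly
    `root_name`, or None if no such folder is in the path."""
    parts = PurePosixPath(path).parts
    if root_name not in parts:
        return None  # e.g. the folder is called "OF" instead
    index = parts.index(root_name)
    return parts[index + 1] if index + 1 < len(parts) else None
```

The comparison is case-sensitive and whole-component, so `OF` or `onlyfans` would not match under this assumption.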

@Eviepayne

Eviepayne commented Mar 23, 2024

I still can't seem to get it working.
Any ideas? I wish the errors were more verbose and useful

2024-03-22 22:36:05 Error   scrapeSingleScene: input: scrapeSingleScene scraper dc-onlyfans: could not unmarshal json from script output: EOF
2024-03-22 22:36:05 Error   could not unmarshal json from script output: EOF
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans]                 ^^^^^^^^^^^^^^^^
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans]     for name in os.listdir(self):
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans] FileNotFoundError: [Errno 2] No such file or directory: 'data/vaultshare/OnlyFans/defiantpanda/Posts/Free'
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans]     for child in p.iterdir():
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans]   File "/opt/of/scrapers/community/dc-onlyfans/dc-onlyfans.py", line 159, in <module>
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans] Traceback (most recent call last):
2024-03-22 22:36:05 Error   [Scrape / DC Onlyfans]   File "/usr/lib/python3.11/pathlib.py", line 932, in iterdir

@Eviepayne

Got it working.
Thanks to the help of Maista on the Discord, who directed me to Fanscrape.

@LeGrosFromage

I Want Clips:
Trying to scrape either a scene or a performer times out after about 30 seconds with:
Response: Not successful Returned status code:504

@Maista6969
Collaborator

I am unable to reproduce this so the scraper isn't broken. Status code 504 is gateway timeout so it's definitely a networking issue, but not necessarily something you can do something about. It could just be a transient problem that will pass on its own 🙂
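
For script-based scrapers, one way to cope with transient gateway errors is a small retry loop. The Python below is a sketch only: `fetch_with_retry` and the `fetch` callable (assumed to return a `(status, body)` tuple) are hypothetical, and stash's built-in scrapers are not claimed to work this way.

```python
import time

# Status codes that usually indicate a temporary upstream problem.
TRANSIENT_STATUS_CODES = {502, 503, 504}


def fetch_with_retry(fetch, url, attempts=3, delay=2.0, sleep=time.sleep):
    """Call `fetch(url)` and retry only on transient gateway errors.

    `fetch` is a hypothetical callable returning (status, body).
    """
    status, body = fetch(url)
    for attempt in range(1, attempts):
        if status not in TRANSIENT_STATUS_CODES:
            break
        sleep(delay * attempt)  # wait a little longer before each retry
        status, body = fetch(url)
    return status, body
```

Errors such as 404 are returned immediately, since retrying would not change the outcome.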

@LeGrosFromage

That's fair. Thanks for looking at it.

@Tany9696

scraper Brazzers: error running scraper script
Please help! I can't scrape Brazzers scenes.

@Maista6969
Collaborator

Need more info to be able to help with this, but first: please look at the README for some manual steps required to use Python scrapers at this time

@Tany9696

Tany9696 commented Apr 16, 2024 via email

@Maista6969
Collaborator

Thanks for the reply. I installed Python, but the Brazzers scraper still does not work. In a scene I click Edit, then "Scrape with...", then Brazzers, and it shows "error running scraper script". I tried another site but the same thing happened.

Can you check the logs at Debug level to see what's going wrong? I can't see your screen from where I'm sitting

@MyDirtyAccount

The X-Art scraper was recently updated to support galleries by @Ksrx01 in #1698 (thanks!). It returns blank details on some galleries, due to inconsistent HTML structures by the studio.

In Bohemian Rhapsody and First Loves, the description is on the paragraph inside the one with ID desc:

<p id="desc"><p>It's a "Bohemian [...] Colette</p></p>
<p id="desc"><p>Chelsea is [...] so cute!</p></p>

The XPath expression is on line 39:

    gallery:
      Title: //div[@class="small-12 medium-12 large-6 columns info"]/h1[@class="show-for-large-up"]
      Details: //div[@class="small-12 medium-12 large-6 columns info"]/p[@id="desc"]
      Date:
        selector: //div[@class="small-12 medium-12 large-6 columns info"]/h2[1]/text()

@Ksrx01
Contributor

Ksrx01 commented Apr 24, 2024

Noticed that issue too, shortly after updating it.
Unfortunately I didn't have the time to take a proper look. I had a few instances where it wasn't simply a nested P; some had a DIV too.

@Maista6969
Collaborator

The descriptions are actually adjacent to the paragraph with the ID desc! I couldn't find any galleries that had div elements like @Ksrx01 described but I'd love it if I had some examples.

In the meantime I've pushed a fix that ensures that we can scrape the full description for galleries on X-Art 🙂
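
The "adjacent, not nested" point can be shown with a small Python sketch. It is not the actual fix that was pushed: the markup is a simplified stand-in for how an HTML parser is assumed to build the tree (a `<p>` start tag closes an already open `<p>`), and `description` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the parsed tree: HTML parsers do not nest <p> inside <p>,
# so <p id="desc"><p>text</p></p> ends up as an empty p#desc followed by a
# sibling <p> holding the text. The markup below is a simplified stand-in.
PARSED = """
<div class="info">
  <h1>Example Gallery</h1>
  <p id="desc"></p>
  <p>Example description text.</p>
</div>
"""


def description(root):
    """Return the text of p#desc, falling back to the paragraphs after it."""
    children = list(root)
    for index, node in enumerate(children):
        if node.tag == "p" and node.get("id") == "desc":
            own = "".join(node.itertext()).strip()
            if own:
                return own
            following = [
                "".join(sib.itertext()).strip()
                for sib in children[index + 1:]
                if sib.tag == "p"
            ]
            return "\n".join(text for text in following if text)
    return ""


root = ET.fromstring(PARSED.strip())
print(description(root))
```

A selector that only reads `p[@id="desc"]` gets an empty string on such pages, which matches the blank details reported above.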

@Ksrx01
Contributor

Ksrx01 commented Apr 25, 2024

Thanks! Unfortunately I can't remember which galleries I had issues with.

@MyDirtyAccount

Confirmed fixed! Thanks!

@thickconfusion

I'm having an issue with Redgifs

ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] HTTP Error: 404
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] Traceback (most recent call last):
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] File "/root/.stash/scrapers/community/Redgifs/Redgifs.py", line 183, in
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] result = json.dumps([scraper.getParseId(id)])
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] ^^^^^^^^^^^^^^^^^^^^^^
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] File "/root/.stash/scrapers/community/Redgifs/Redgifs.py", line 124, in getParseId
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] gif = req.get("gif")
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] ^^^^^^^
ERRO[2024-04-26 19:54:50] [Scrape / Redgifs] AttributeError: 'NoneType' object has no attribute 'get'
ERRO[2024-04-26 19:54:50] could not unmarshal json from script output: EOF
ERRO[2024-04-26 19:54:50] scrapeSingleScene: input: scrapeSingleScene error while name scraping with scraper Redgifs: could not unmarshal json from script output: EOF
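
The traceback shows the script calling `.get("gif")` on a response that is `None` after the HTTP 404. A guard along these lines would avoid the crash. This is a sketch with hypothetical function names, not the real Redgifs.py code, and whether stash accepts an empty list as a "no result" answer here has not been verified.

```python
import json


def parse_gif(req):
    """Return the 'gif' entry of an API response, or None when the request
    failed and `req` is None (as after the HTTP 404 in the log above)."""
    if req is None:
        return None
    return req.get("gif")


def to_output(req):
    """Serialise the result as a JSON list; empty when nothing was found."""
    gif = parse_gif(req)
    return json.dumps([gif] if gif is not None else [])
```

Printing valid JSON even on failure would at least replace the "could not unmarshal json from script output: EOF" error with an empty result.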

@andbigdata

IAFD updated their URL for performers. It used to be "https://www.iafd.com/person.rme/perfid=" and it is now "https://www.iafd.com/person.rme/id=". I didn't check all parts of the scraper, but all of the data returned from the scraper looks correct.

@Maista6969
Collaborator

Thank you for bringing this up. I've pushed a new version of the scraper YAML that lets it trigger on the new patterns as well as the old ones, since those redirect to the new URLs and still scrape fine 🙂
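
As a rough illustration of accepting both URL forms, here is a Python sketch. It uses a regular expression with a hypothetical helper name; the real scraper YAML lists URL patterns for stash to match and does not necessarily use a regex like this.

```python
import re

# Matches both the old "perfid=" and the new "id=" performer URL prefixes
# quoted in the report above.
IAFD_PERFORMER = re.compile(r"^https?://www\.iafd\.com/person\.rme/(?:perf)?id=")


def is_iafd_performer_url(url):
    """True for performer URLs in either the old or the new form."""
    return IAFD_PERFORMER.match(url) is not None
```

Keeping the old form in the pattern means URLs already saved in a library still trigger the scraper.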

@Maista6969
Collaborator

Closing this in favor of creating individual issues for broken scrapers: if you've come here to report a broken scraper, please open a new issue

@Maista6969 Maista6969 unpinned this issue May 2, 2024