default WAYBACK_BASEURL may be incorrect #31

Closed
detrout opened this issue Jan 19, 2017 · 5 comments


detrout commented Jan 19, 2017

I installed brozzler via pip and launched it with brozzler-easy in a Debian Jessie VM and was able to scrape a site. (Brozzler 1.1b8, pywb 0.33.0, python3.4)

However, the default page links in the dashboard on the job detail page were pointing to http://localhost:8091/brozzler/ and, as far as I can tell, nothing was listening on port 8091 by default.

After some investigation I found there was something listening on port 8880 that looked like a wayback process, so I tried launching brozzler-easy like this:

WAYBACK_BASEURL=http://192.168.122.152:8880/brozzler brozzler-easy -d warc/ --dashboard-address 0.0.0.0

(The IP addresses are there so I could use my regular browser instead of the VM browser to view the site.)

Doing that allowed the wayback links to work, but the thumbnail & screenshot urls are still 404ing.
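
For anyone reproducing the port check described in this comment, here is a minimal sketch in Python using only the standard library. The helper name `is_listening` is hypothetical and not part of brozzler; the ports 8091 and 8880 are the ones mentioned in this report.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def report(host, ports):
    """Print whether each port accepts TCP connections on host."""
    for port in ports:
        state = "open" if is_listening(host, port) else "closed"
        print(f"{host}:{port} is {state}")
```

Calling `report("localhost", (8091, 8880))` while brozzler-easy is running should show which of the two ports actually has a listener. An open port only shows that something accepts connections there, not that it is the wayback service.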

Author

detrout commented Jan 20, 2017

Looks like the thumbnails may need VNC, and I hadn't installed that.

nlevitt added a commit that referenced this issue Jan 21, 2017
…ozzler dashboard (#31); tweak arg parsing related stuff
Contributor

nlevitt commented Jan 21, 2017

Thanks for the report. c3b637d should fix the WAYBACK_BASEURL mismatch.

> Doing that allowed the wayback links to work, but the thumbnail & screenshot urls are still 404ing.

I still need to code up some pywb support for thumbnail and screenshot urls. I'll leave this issue open to track that.

> Looks like the thumbnails may need VNC, and I hadn't installed that.

The VNC thing is for watching the brozzler-controlled browsers in action. You can kinda see how to set that up if you look at ansible/roles/brozzler-worker. It almost works out of the box with the vagrant setup, but not quite (iirc because the vagrant VM's idea of its hostname is not resolvable from outside). In any case, it's not related to the archived thumbnails. (n.b. 8091 vs 8901)
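
To illustrate the kind of mismatch discussed in this thread, here is a rough sketch of resolving a wayback base URL from the environment with a fallback default. It is not the code from c3b637d. The function names, the default value (taken from the port the reporter found a wayback-like process on), and the `<base>/<timestamp>/<url>` link layout are all assumptions made for illustration.

```python
import os

# Assumed default for illustration only: the port where the reporter
# found a wayback-like process listening.
ASSUMED_DEFAULT_BASEURL = "http://localhost:8880/brozzler"


def wayback_baseurl(environ=os.environ):
    """Return the WAYBACK_BASEURL override if set, else the assumed default."""
    return environ.get("WAYBACK_BASEURL", ASSUMED_DEFAULT_BASEURL).rstrip("/")


def replay_link(page_url, timestamp, environ=os.environ):
    """Build a wayback-style link: <base>/<14-digit timestamp>/<original url>."""
    return f"{wayback_baseurl(environ)}/{timestamp}/{page_url}"
```

If the default baked into the dashboard disagrees with the port the wayback process actually listens on, every generated link is dead until the environment variable is set by hand, which is the symptom reported above.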

Author

detrout commented Jan 22, 2017 via email

Contributor

nlevitt commented Jan 23, 2017

> I eventually found code in brozzler where it looks like you need to enable warcprox features to get screenshots, but I was archiving a big site and didn't get a chance to see if flipping that switch made it work.

Oh, yeah, that too. But even if you get the screenshots I don't think replay will work.

nlevitt added a commit that referenced this issue Jan 24, 2017
* master:
  restore ping_timeout argument to WebSocketApp.run_forever to fix problem of leaking websocket receiver threads hanging forever on select()
  missed a spot
  improve brozzler-dashboard logging; fix default wayback baseurl in brozzler dashboard (#31); tweak arg parsing related stuff
  avoid js errors in case site or job is not configured to keep stats
  add travis-ci slack notification to internetarchive/brozzler channel
Contributor

nlevitt commented Jan 31, 2017

Added support for screenshot: and thumbnail: urls.
5c68477
Resolving this issue.

nlevitt closed this as completed Jan 31, 2017