This repository has been archived by the owner on Dec 9, 2018. It is now read-only.

Dryscrape to login via Facebook. #37

Open
noppanit opened this issue Oct 6, 2015 · 13 comments

Comments

@noppanit

noppanit commented Oct 6, 2015

I'm trying to use Dryscrape to log in via Facebook, but I get this error:

Traceback (most recent call last):
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 420, in __init__
    self._port = int(re.search(b"port: (\d+)", output).group(1))
AttributeError: 'NoneType' object has no attribute 'group'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "facebook_scraper.py", line 40, in <module>
    sess = dryscrape.Session(base_url = 'https://www.facebook.com')
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/dryscrape/session.py", line 22, in __init__
    self.driver = driver or DefaultDriver()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/dryscrape/driver/webkit.py", line 30, in __init__
    super(Driver, self).__init__(**kw)
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 230, in __init__
    self.conn = connection or ServerConnection()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 507, in __init__
    self._sock = (server or get_default_server()).connect()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 450, in get_default_server
    _default_server = Server()
  File "/Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server.py", line 427, in __init__
    raise WebkitServerError("webkit-server failed to start. Output:\n" + err)
webkit_server.WebkitServerError: webkit-server failed to start. Output:
dyld: Library not loaded: @rpath/./libQtWebKit.4.dylib
  Referenced from: /Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server
  Reason: image not found

Here's the code I'm using.

import dryscrape

# make sure you have xvfb installed
dryscrape.start_xvfb()

# set up a web scraping session
sess = dryscrape.Session(base_url = 'https://www.facebook.com')

# we don't need images
sess.set_attribute('auto_load_images', False)

# visit the homepage and fill in the login form
sess.visit('/')
q = sess.at_xpath('//*[@id="email"]')
q.set('email')
q = sess.at_xpath('//*[@id="pass"]')
q.set("password")
login_button = sess.at_xpath('//*[@id="u_0_x"]')
login_button.click()

# save a screenshot of the web page
sess.render('facebook.png')
print("Screenshot written to 'facebook.png'")
@trendsetter37
Contributor

The login button has an id of u_0_v, not u_0_x as in your code.

Try this:
login_button = sess.at_xpath('//*[@id="u_0_v"]')

@niklasb
Owner

niklasb commented Oct 7, 2015

It sounds like you have a problem with your Qt installation. I just used Homebrew (brew install qt) and it worked. You may have to reinstall dryscrape after this.

What @trendsetter37 said might also be a problem, especially if the ID is randomized, in which case you need a different XPath expression to select the button.
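
One way to avoid depending on a generated id is to select the button by a more stable attribute. The sketch below is only an illustration: the form markup is a hypothetical, simplified stand-in (the real page's markup is not shown in this thread), and it uses Python's standard-library ElementTree rather than dryscrape so that it runs on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified login form; not Facebook's actual markup.
FORM_TEMPLATE = """
<form id="login_form">
  <input id="email" type="text" name="email" />
  <input id="pass" type="password" name="pass" />
  <input id="{button_id}" type="submit" value="Log In" />
</form>
"""

# Select the submit control by its type instead of its generated id.
SUBMIT_XPATH = ".//input[@type='submit']"


def find_submit(button_id):
    """Return the submit element, whatever id the page assigned to it."""
    form = ET.fromstring(FORM_TEMPLATE.format(button_id=button_id))
    return form.find(SUBMIT_XPATH)


# The same expression matches for every id mentioned in this thread.
for generated_id in ("u_0_x", "u_0_v", "u_0_l"):
    button = find_submit(generated_id)
    print(generated_id, "->", button.get("value"))
```

With dryscrape, the analogous call would presumably be sess.at_xpath('//input[@type="submit"]'), assuming the real button is an input of type submit; that needs to be checked against the live page.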

@trendsetter37
Contributor

@niklasb Yeah, I have run into that before with randomized IDs on login pages. However, I checked Facebook's login button and its ID was static as far as I could tell.

@noppanit
Author

@trendsetter37 Thanks for the reply. I uninstalled dryscrape with pip uninstall dryscrape, ran brew install qt, and reinstalled dryscrape, but I still see the same issue.

@niklasb
Owner

niklasb commented Oct 12, 2015

Hi @noppanit. I haven't seen this error before. Which version of Mac OS X are you using? And which version of Qt was installed by Homebrew?

@noppanit
Author

@niklasb My OS X version is 10.10.5 and Qt is qt-4.8.7_1.

@ghost

ghost commented Dec 15, 2015

Hi @noppanit,
Were you able to solve this? I'm having the same issue.

@noppanit
Author

@KickingHorse No, I wasn't able to solve it. I just gave up. It looks like Facebook doesn't like scraping.

@niklasb
Owner

niklasb commented Dec 19, 2015

@KickingHorse @noppanit Does the libQtWebKit.4.dylib file even exist on your computers?

@KidDisco

It does. My code worked from the command line, but the IDEs I was using (PyCharm and Spyder) kept reporting the error above:

dyld: Library not loaded: @rpath/./libQtWebKit.4.dylib
  Referenced from: /Users/noppanit/.virtualenvs/envpy3/lib/python3.4/site-packages/webkit_server
  Reason: image not found 

It took me forever to figure out, but I eventually had to do something like this:

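import os
# Set this before creating the dryscrape session so the webkit_server child process inherits it.
# os.environ.putenv is not available on newer Python versions; assign through os.environ instead.
os.environ['DYLD_FALLBACK_LIBRARY_PATH'] = '/Users/name/anaconda/lib/'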

This ultimately fixed the problem for me.

That directory is where the file libQtWebKit.4.dylib was actually stored after installing from PyCharm.
I'm not sure where brew install qt put these files, as I was never able to find them.
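
Since the library can end up in different places depending on how Qt was installed, the workaround above could be generalized roughly as follows. This is a minimal sketch, not part of dryscrape: the helper name point_dyld_at is made up for illustration, and the candidate directories are guesses taken from paths mentioned in this thread.

```python
import os


def point_dyld_at(lib_name, candidate_dirs, environ=os.environ):
    """Set DYLD_FALLBACK_LIBRARY_PATH to the first directory containing lib_name.

    Returns the chosen directory, or None if the library was not found.
    """
    for directory in candidate_dirs:
        if os.path.isfile(os.path.join(directory, lib_name)):
            environ["DYLD_FALLBACK_LIBRARY_PATH"] = directory
            return directory
    return None


# Candidate locations are assumptions; adjust them to your own installation.
found = point_dyld_at(
    "libQtWebKit.4.dylib",
    [
        "/Users/name/anaconda/lib",
        "/usr/local/opt/qt/lib",
        "/opt/local/libexec/qt4/lib",
    ],
)
print(found or "libQtWebKit.4.dylib not found in the candidate directories")
```

This has to run before the first dryscrape.Session is created, because the variable only affects processes started afterwards.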

@dcguim

dcguim commented Sep 12, 2016

I just tried to run a simple dryscrape.Session(url) and got the same error.
Like @noppanit, I got:
dyld: Library not loaded: /usr/local/opt/qt/lib/QtWebKit.framework/Versions/4/QtWebKit
I am on a Mac OS X version similar to @noppanit's, but I don't have Homebrew installed; I am trying to use MacPorts only.
MacPorts installs qt4-mac under /opt/local/libexec rather than /usr/local/opt.
So I created a symlink in /usr/local/opt that points to it:
ln -s /opt/local/libexec/qt4 /usr/local/opt/qt
But dryscrape still does not find the library, because the path it expects does not exist.
After creating the symlink, I looked for the missing /usr/local/opt/qt/lib/QtWebKit.framework/Versions/4/QtWebKit,
but it isn't there:

$ ls /usr/local/opt/qt/lib | grep 'QtWebKit'
libQtWebKit.4.9.7.dylib
libQtWebKit.4.9.dylib
libQtWebKit.4.dylib
libQtWebKit.dylib
libQtWebKit.prl

MacPorts installs the file dryscrape is looking for here:

$ port contents qt4-mac | grep 'QtWebKit.framework.*/4/QtWebKit'
  /opt/local/libexec/qt4/Library/Frameworks/QtWebKit.framework/Versions/4/QtWebKit

That's why the symlink won't work. Any ideas on how to resolve this mismatch between Qt and dryscrape?

@dcguim

dcguim commented Oct 14, 2016

Actually, there is a way to symlink it correctly, but each Qt 4 framework has to be symlinked separately.

$ cd /usr/local/opt/qt/lib
$ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtCore.framework/ QtCore.framework
$ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtGui.framework/ QtGui.framework
$ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtNetwork.framework/ QtNetwork.framework
$ sudo ln -s /opt/local/libexec/qt4/Library/Frameworks/QtWebKit.framework/ QtWebKit.framework

This way it's not necessary to install Homebrew, which is nice because some people use Fink or MacPorts and would rather avoid installing several package managers.
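
The per-framework symlinks could also be created in a loop. The following is a rough sketch under the assumptions of this comment (MacPorts frameworks under /opt/local/libexec/qt4/Library/Frameworks, links expected in /usr/local/opt/qt/lib); the function name link_frameworks is made up for illustration, and the script would need sufficient permissions to write to the target directory.

```python
import os

# The four frameworks linked by hand in this comment.
FRAMEWORKS = ["QtCore", "QtGui", "QtNetwork", "QtWebKit"]


def link_frameworks(source_root, target_dir, names=FRAMEWORKS):
    """Symlink each <name>.framework from source_root into target_dir.

    Frameworks missing from source_root and links that already exist
    are skipped. Returns the list of links that were created.
    """
    created = []
    for name in names:
        source = os.path.join(source_root, name + ".framework")
        target = os.path.join(target_dir, name + ".framework")
        if not os.path.isdir(source):
            continue  # this Qt installation does not provide the framework
        if os.path.lexists(target):
            continue  # do not overwrite an existing file or link
        os.symlink(source, target)
        created.append(target)
    return created


if __name__ == "__main__":
    # Paths are the ones used in this thread; adjust as needed.
    links = link_frameworks(
        "/opt/local/libexec/qt4/Library/Frameworks",
        "/usr/local/opt/qt/lib",
    )
    print("created:", links)
```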

@hehez

hehez commented Dec 10, 2016

Currently, the login button is login_button = sess.at_xpath('//*[@id="u_0_l"]'), but I think you could try q.form().submit() instead of finding the login button.
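
Put together with the code from the original post, that suggestion might look like the sketch below. The function name login is hypothetical; it only uses the calls already shown in this thread (visit, at_xpath, set, form().submit()) and takes the session as a parameter, so nothing here depends on the button's id.

```python
def login(sess, email, password):
    """Fill in the login form and submit it without locating the button."""
    sess.visit('/')

    email_field = sess.at_xpath('//*[@id="email"]')
    email_field.set(email)

    password_field = sess.at_xpath('//*[@id="pass"]')
    password_field.set(password)

    # Submit the form that contains the password field instead of
    # clicking a button whose id may change between page loads.
    password_field.form().submit()
```

With the session from the original post, this would be called as login(sess, 'email', 'password') before rendering the screenshot.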
