-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor fbscraper.py to remove facepy dependency #73
Conversation
@ghostwriternr - Scraping part is working so far. 👍 @DefCon-007 - There are warnings like Insecure Platform Warning and SNI Missing Warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ghostwriternr - Just came across this error :
@athityakumar No, These warnings were not there before. And I think they are because of requests module and not due to facebook. |
@athityakumar I will look into the reason for those warnings. Funnily enough, I didn't encounter this when I was testing the code. Could this be because of difference in Python (and pip package) versions? UPDATE: Looks like that is indeed the issue. A quick google search says this is a security issue observed in Python versions < 2.7.9 and is commonly fixed by using Also, I'm not able to replicate the error you have pointed out in the screenshot. The line in question is the exception handler for file opening, and I have not edited any of the code in the function call stack till that point. Can you please tell me when does this occur? I might have missed a test case. |
@ghostwriternr - I'm using python 2.7, and both - the warning & error were due to different python versions. No FileNotFoundError for Python 2.7 : http://stackoverflow.com/questions/26745283/how-do-i-import-filenotfounderror-from-python-3 The UPDATE : |
@icyflame @DefCon-007 : Python |
@athityakumar : It will be better if naarad can become cross compatible with python 2 and python 3. How about we create a new issue for it ? |
@DefCon-007 : Sure. But we don't want the server executing the cron to be confused about this. |
Opened issue #74 . Do test this PR on |
If this is working fine, should I go ahead with fixing issue #49 ? Or do we fix issue #74 first before moving forward? It should be pretty straightforward now since graph API's And we could also decide on a rough metric to decide how far back the posts do we traverse to check for edits. Such a simple traversal to, say 10-15 posts, should be enough for our purpose. A comparison based on hash values using ETags might be an overkill (and also ineffective) for our current scraping style. |
@ghostwriternr - #49 is the thing that's going to matter for a user. So, let's do that first. Range of 10-15 posts seems logical - no sane group would edit a post that is older than that. And we would also need a script that deletes duplicate posts in all existing json files. |
@athityakumar : Noted. Will incorporate both when I'm done with my classes tomorrow. |
There's not much sense in making this cross compatible. A better choice would be to use Python 3. If you are having to add dependencies (like python version wise requirements.txt) or go out of your way to support Python 2, then I would suggest that you do NOT do it, and stick to Python 3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good PR, with a very good premise. The diff right now is too cluttered with these identical blocks appearing in the diff because of changed indentation, and it's hard to look at what has changed, and what has been moved forward by one space. Fixing this should be the way to go forward.
fbscraper.py
Outdated
DateTime[0]['real_date'], | ||
event['place']['name']) | ||
return message | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the exact same code that's appearing in the diff because this file was apparently using 3 spaces (??) as it's previous default indent.
@ghostwriternr: In any case, this has cluttered the diff too much, and I can't really figure what are your changes, and what changes are because of this diff.
3 spaces is not a good idea, but I would suggest that you keep that PR different from this. i.e. merge this PR in and follow it up with a PR that fixes all the indentation and does NOT touch any of the code.
The problem with the current approach is that git blame will falsely accuse @ghostwriternr of writing these lines of code when he actually only shifted them forward by 1 space.
fbscraper.py
Outdated
item['real_date'] = local_date.strftime('%d-%m-%Y') | ||
item['real_time'] = local_date.strftime('%I:%M%p') | ||
return data | ||
for item in data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the indentation has caused these identical blocks to appear in the diff.
facepy's fork that has been in active use in naarad has not been updated in a while, and falls way behind vanilla Graph API in terms of functionality as well as documentation. Directly interfacing with the Graph API makes it easier to work with the API, and helps keeping the code simple enough for future contributors by not adding another abstraction layer in the middle. Notes: - Regenerated requirements.txt using 'pipreqs' (Please do check if the list is exhaustive)
52933b9
to
6591e7b
Compare
@athityakumar @icyflame @DefCon-007 : In case you missed the notification for the updated commit 3 days back. Do let me know if it needs any more changes 😄 |
Just to echo @icyflame's comment: Supporting Python 2 or cross compatibility is not a (big) priority. |
fbscraper.py
Outdated
|
||
graph = GraphAPI(access_token) | ||
s = requests.Session() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel like a good name for a session that's being used throughout the file. Maybe you should use something like session
or req_session
, that way the req_session.get
call would make more sense at first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@icyflame : Aye aye sir! That's a very valid point. Updated accordingly to req_session
.
@ghostwriternr Patch LGTM. One pretty minor aesthetic change, the diff is a lot cleaner now, and it was easier to review. @DefCon-007 you should ensure that this builds and updates the page properly before merging of course. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is working fine.
facepy's fork that has been in active use in naarad has not been updated
in a while, and falls way behind vanilla Graph API in terms of
functionality as well as documentation. Directly interfacing with the
Graph API makes it easier to work with the API, and helps keeping the
code simple enough for future contributors by not adding another
abstraction layer in the middle.
Notes:
list is exhaustive)