This repository has been archived by the owner on Jul 14, 2020. It is now read-only.

TypeError: the JSON object must be str, not 'bytes' #51

Open
paladini opened this issue May 21, 2017 · 5 comments

Comments

@paladini

I hit this error when using the comment scraper for public pages. I filled in all the variables correctly (app_id, app_secret and page id) and ran the post scraper first, which finished successfully.

The full error log follows:

$ python3 get_fb_comments_from_fb.py
Scraping <OMMITED> Comments From Posts: 2017-05-21 15:51:37.768667

Traceback (most recent call last):
  File "get_fb_comments_from_fb.py", line 220, in <module>
    scrapeFacebookPageFeedComments(file_id, access_token)
  File "get_fb_comments_from_fb.py", line 147, in scrapeFacebookPageFeedComments
    comments = json.loads(request_until_succeed(url))
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

The page I'm scraping has posts and comments written in Brazilian Portuguese (PT-BR).
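
For context, a minimal sketch of what the traceback points at. The error is about the type passed to json.loads, not about the language of the comments: on Python 3.5, json.loads accepts only str, while urlopen(...).read() returns bytes (from Python 3.6 on, json.loads also accepts bytes). The payload below is a made-up stand-in for a Graph API response, not real data.

```python
import json

# Hypothetical payload standing in for what urlopen(...).read() returns:
# raw bytes, here UTF-8 encoded Portuguese text.
raw = '{"message": "Olá, tudo bem?"}'.encode("utf-8")

# Decoding first hands json.loads a str, which works on every Python 3 version.
comments = json.loads(raw.decode("utf-8"))
print(comments["message"])
```

Decoding as UTF-8 is an assumption here; it matches the fix proposed later in this thread.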

@paladini
Author

If anyone is having the same issue, I've found a fix! Just change the following function in the comments scraper:

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return response.read()

To this one (I've added .decode('utf-8') before returning the value):

def request_until_succeed(url):
    req = Request(url)
    success = False
    while success is False:
        try:
            response = urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)

            print("Error for URL {}: {}".format(url, datetime.datetime.now()))
            print("Retrying.")

    return response.read().decode('utf-8')

Now it's working fine here, but I don't know if it's reliable for everyone, so I'm not going to submit a pull request with this fix.

@minimaxir
Owner

The script does encoding/decoding shenanigans in order to be compatible with both Python 2 and 3. I will have to check if that solution will work for Python 2.
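
One way the compatibility concern might be handled is to decode only when the value is a byte string. This is a rough sketch, not the script's actual code: to_text is a hypothetical helper name, and UTF-8 is assumed as the response encoding. On Python 2, bytes is an alias of str, so the same check covers both versions and json.loads receives unicode text either way.

```python
import json


def to_text(data, encoding="utf-8"):
    """Return text suitable for json.loads (hypothetical helper).

    Byte strings are decoded; values that are already text pass through.
    """
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


# Made-up payload: the UTF-8 bytes for {"message": "Olá"}.
payload = b'{"message": "Ol\xc3\xa1"}'
parsed = json.loads(to_text(payload))
print(parsed["message"])
```

With a helper like this, the last line of request_until_succeed could stay as `return response.read()` and the call site would become `json.loads(to_text(request_until_succeed(url)))`.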

@paladini
Author

Thanks for the fast reply, @minimaxir !

@Mika15

Mika15 commented May 31, 2017

Guys, I have an issue with paging again and can't figure out why it is happening. Can you help me? Thanks!
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in ()
    176
    177 if __name__ == '__main__':
--> 178     scrapeFacebookPageFeedStatus(group_id, access_token)

in scrapeFacebookPageFeedStatus(group_id, access_token)
    160         if 'paging' in statuses:
    161             next_url = statuses['paging']['next']
--> 162             until = re.search('until=([0-9]*?)(&|$)', next_url).group(1)
    163             if until is None:
    164                 return None

AttributeError: 'NoneType' object has no attribute 'group'
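
The traceback shows re.search returning None, meaning next_url had no until= parameter matching the pattern, so .group(1) is called on None before the `if until is None` check can run. A minimal sketch of checking the match object first follows. extract_until is a hypothetical helper name and the URLs are made up; why the parameter is missing from the real paging URL is not something this thread establishes.

```python
import re


def extract_until(next_url):
    """Return the 'until' value from a paging URL, or None if it is absent.

    Hypothetical helper; the pattern is the one shown in the traceback.
    """
    match = re.search(r'until=([0-9]*?)(&|$)', next_url)
    if match is None:
        return None
    return match.group(1)


# Made-up example URLs for illustration only.
with_until = extract_until("https://example.invalid/feed?until=1495400000&limit=100")
without_until = extract_until("https://example.invalid/feed?after=abc")
print(with_until, without_until)
```

The caller can then stop paging when the helper returns None instead of crashing.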

@nxy

nxy commented Sep 24, 2017

@paladini thanks worked for me
