UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data #281

xalexakis · 2018-05-03T13:51:27Z

Hi,

after long searching for a solution to my problem I have to ask here for help.

I am getting the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
while trying to download an Ad Performance report as a string in memory with the DownloadReport function, or as a stream with the DownloadReportAsStream function.

The problem comes from the AdGroupName field, which contains German characters like ä, ü, ö.

However, I do not get any error when I use the DownloadReportAsStream function to write the report to a .csv file, or when I force the decoding to 'latin-1' with

report_data.write(chunk.decode('latin-1') if sys.version_info[0] == 3
                              and getattr(report_data, 'mode', 'w') == 'w' else chunk)
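A likely reason the 'latin-1' variant never raises: latin-1 maps every possible byte to a character, so decoding cannot fail, but each UTF-8 encoded umlaut then comes out as two wrong characters. A minimal illustration, using a made-up ad group name rather than real report data:

```python
# Hypothetical ad group name containing German umlauts.
name = 'Möbel für Küche'
raw = name.encode('utf-8')          # bytes as a UTF-8 report would carry them

as_utf8 = raw.decode('utf-8')       # round-trips correctly
as_latin1 = raw.decode('latin-1')   # never raises, but garbles the umlauts

print(as_utf8)
print(as_latin1)
```

So the absence of an error with 'latin-1' does not by itself show that the report is latin-1 encoded.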

I am using Python 3, API v201802 and the locale in the container is set to be C.UTF-8.

Below are the functions which are mostly copied from AdWords API documentation.

write in csv (working):

def write_csv(client, report, customer_id):
    client.SetClientCustomerId(customer_id)
    report_downloader = client.GetReportDownloader(version='v201802')

    filename = str(customer_id) + '_' + report + '_' + min_date

    report_data = io.open(filename, 'wb')

    stream_data = report_downloader.DownloadReportAsStream(
        reports[report], skip_report_header=True, skip_column_header=False,
        skip_report_summary=True, include_zero_impressions=True)

    try:
        while True:
            chunk = stream_data.read(CHUNK_SIZE)
            if not chunk:
                break
            report_data.write(chunk.decode() if sys.version_info[0] == 3
                              and getattr(report_data, 'mode', 'w') == 'w' else chunk)
    finally:
        report_data.close()
        stream_data.close()

stream with latin-1 (working):

def stream(client, report, customer_id):
    client.SetClientCustomerId(customer_id)
    report_downloader = client.GetReportDownloader(version='v201802')

    report_data = io.StringIO()
    stream_data = report_downloader.DownloadReportAsStream(
        reports[report], skip_report_header=False, skip_column_header=False,
        skip_report_summary=False, include_zero_impressions=True)

    try:
        while True:
            chunk = stream_data.read(CHUNK_SIZE)
            if not chunk:
                break
            report_data.write(chunk.decode('latin-1') if sys.version_info[0] == 3
                              and getattr(report_data, 'mode', 'w') == 'w' else chunk)
        # Return the buffer still open so that the caller can read from it.
        return report_data
    finally:
        stream_data.close()

write to string (not working):

def tostring(client, report, customer_id):
    client.SetClientCustomerId(customer_id)
    report_downloader = client.GetReportDownloader(version='v201802')
    
    report_data = io.StringIO()
    report_downloader.DownloadReport(
        reports[report], report_data, skip_report_header=True, skip_column_header=False,
        skip_report_summary=True, include_zero_impressions=True)
   
    return report_data

msaniscalchi · 2018-05-03T17:34:45Z

Hello,

Thanks for the report! I'm not able to reproduce this on my end with test data, but it is an interesting case because at a glance it certainly looks like this should work. The fact that it works with the encoding set to latin-1 may be one of the more useful clues. Some time ago, I recall there being a bug where lack of validation in the AdWords UI allowed invalid characters to be used in string fields such as resource names. I suspect this may be related. In other words, what you're reporting seems to strongly suggest that the data you're getting back from the API has latin-1 encoded content, which definitely sounds like a bug if that's accurate.

That said, troubleshooting this issue would probably be better handled in the AdWords API Support Forum, so I suggest you reach out there. I'll leave this issue open for a while, though; in the event that you determine that there is a library-related cause, feel free to follow up.

Regards,
Mark

msaniscalchi · 2018-05-08T14:53:12Z

Closing for now, but feel free to reopen if you find anything library-specific at fault.

xalexakis · 2018-05-08T15:02:40Z

OK thank you. I tried to comment but it was not possible. I have asked in the API forum and hope I will get an answer.

fiboknacky · 2018-05-09T04:05:29Z

Hi Mark,

We can't reproduce this on our end with either Python 2 or Python 3, so it does not seem to be an issue on the API server.
Could you re-open this and follow up, as it's probably related to the Python library?

Best,
Knack

msaniscalchi · 2018-05-09T13:32:39Z

O.K. Will do. I'll need to set up a few different environment configurations and see if anything can reproduce this.

xalexakis · 2018-05-09T13:39:32Z

Here is the whole error message, in case it helps:

```
Traceback (most recent call last):
  File "test.py", line 267, in <module>
    client_list.append(tostring(adwords_client, report_name, id))
  File "test.py", line 243, in tostring
    skip_report_summary=True, include_zero_impressions=True)
  File ".../anaconda3/lib/python3.6/site-packages/googleads/common.py", line 531, in Wrapper
    return utility_method(*args, **kwargs)
  File ".../anaconda3/lib/python3.6/site-packages/googleads/adwords.py", line 1292, in DownloadReport
    output, **kwargs)
  File ".../anaconda3/lib/python3.6/site-packages/googleads/adwords.py", line 1569, in _DownloadReport
    and not isinstance(output, io.BytesIO))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
```

[API Forum issue](https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/adwords-api/AkyJOLLe3aY)

msaniscalchi · 2018-05-11T20:19:04Z

I think I now have a handle on what is going wrong here. The reason you're getting an error indicating an unexpected end of data is because... the decoder is reaching an unexpected end of data. The ReportDownloader.DownloadReport method currently doesn't take any special precautions in the event that a StringIO file-like object is provided--e.g. it is conceivable that any given chunk may truncate a multi-byte utf-8 character, causing the decode to fail.
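The failure mode described above can be reproduced without the API. The sketch below is not the library's code: it splits a UTF-8 byte string so that a two-byte character straddles a chunk boundary, shows that decoding each chunk on its own fails, and then uses the standard library's incremental decoder, which holds back the incomplete byte until the next chunk arrives. The sample text and chunk size are made up for illustration.

```python
import codecs

data = 'abcä'.encode('utf-8')     # b'abc\xc3\xa4': 'ä' takes two bytes
chunk_size = 4                     # boundary falls between 0xc3 and 0xa4
chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# Decoding each chunk independently fails on the truncated character.
try:
    ''.join(chunk.decode('utf-8') for chunk in chunks)
    naive_error = None
except UnicodeDecodeError as error:
    naive_error = error

# An incremental decoder keeps the dangling byte until the next chunk.
decoder = codecs.getincrementaldecoder('utf-8')()
text = ''.join(decoder.decode(chunk) for chunk in chunks)
text += decoder.decode(b'', final=True)  # flush; raises if the data is truncated

print(naive_error)
print(text)
```

The reported position 16383 is the last byte of a 16 KiB chunk, which fits this picture: the error only appears when a multi-byte character happens to start on that byte.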

As a work-around, if you provide a BytesIO instance and get the full report contents, you can decode that to get a string. Alternatively, you could just use the ReportDownloader.DownloadReportAsString method, which similarly gets the full content before decoding--worth noting, I also noticed that DownloadReportAsString doesn't chunk the response, meaning it could be problematic for particularly large reports.
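A rough sketch of the first work-around, with a stand-in for the download call so that it runs without credentials. `fake_download_report` is hypothetical and only imitates a downloader that writes raw byte chunks to the given file-like object; with the real library, the `BytesIO` instance would be passed to `DownloadReport` instead.

```python
import io


def fake_download_report(output, chunk_size=4):
    """Hypothetical stand-in: writes a UTF-8 report to `output` in raw chunks."""
    payload = 'AdGroupName\nKüche\n'.encode('utf-8')
    for start in range(0, len(payload), chunk_size):
        output.write(payload[start:start + chunk_size])


buffer = io.BytesIO()
fake_download_report(buffer)

# Decode once, after all bytes have arrived, so no character is split.
report_text = buffer.getvalue().decode('utf-8')
buffer.close()

print(report_text)
```

Note that this keeps the whole report in memory, first as bytes and then as text, so the same size concern applies to very large reports.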

It looks like there are a few improvements we can make here for a future release.

Regards,
Mark

kybs89 · 2018-07-05T09:14:15Z

I am getting exactly the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data

The suggested workaround of using ReportDownloader.DownloadReportAsString works, but it is not optimal given the size of some of our larger reports.

The issue only started after I upgraded the library from version 10.0.0 to the most recent version, when I migrated from API v201710 to v201806 using Python 3.

Regards,
Kyle

xalexakis · 2018-07-05T09:22:21Z

Hi Kyle,

Using BytesIO instead of StringIO with DownloadReport should work. I am using that with no problems.

Regards,
Christos

kybs89 · 2018-07-05T09:25:57Z

OK, thanks Christos. I'll try that out!

success-m · 2018-07-05T10:33:38Z

The following code worked for me. In my case, the special characters were not important, so I removed them.

import string

response = None
try:
    output = open(path, 'a')
    response = report_downloader.DownloadReportAsStreamWithAwql(report_query, 'csv',
                                                                skip_report_header=False,
                                                                skip_column_header=False,
                                                                skip_report_summary=False)
    while True:
        chunk = response.read(16 * 1024)
        if not chunk:
            break
        # Drop bytes that fail to decode, then keep only printable ASCII characters.
        output.write(
            ''.join(filter(lambda x: x in string.printable, chunk.decode(encoding='utf-8', errors='ignore'))))
    output.close()
except Exception as error:
    raise Exception(error)
finally:
    if response:
        response.close()
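If the special characters do need to be kept, one alternative sketch is to wrap the byte stream in a `codecs` stream reader, which carries an incomplete multi-byte sequence over to the next read instead of dropping it. This assumes only that the response object has a `read()` method returning bytes, as in the snippet above; `io.BytesIO` stands in for it here, and the sample data is invented.

```python
import codecs
import io

# Stand-in for the streamed response; the real one comes from the downloader.
response = io.BytesIO('Küche,für,Möbel\n'.encode('utf-8'))
output = io.StringIO()  # stand-in for a file opened in text mode

reader = codecs.getreader('utf-8')(response)
while True:
    text = reader.read(3)  # small size so some reads end mid-character
    if not text:
        break
    output.write(text)

result = output.getvalue()
print(result)
```

With a real report, a larger read size such as the 16 * 1024 used above would be the natural choice.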

kennethjmyers · 2018-07-26T15:20:07Z

For anyone looking for the BytesIO method, I believe it looks something like this (please correct me if you think I've made a mistake):

from io import BytesIO

# Collect the raw report bytes so that no chunk is decoded on its own.
bytes_buffer = BytesIO()
report_downloader.DownloadReport(report, bytes_buffer, ...)

with open(file_output, 'wb') as file:
    file.write(bytes_buffer.getvalue())

bytes_buffer.close()

minayaserrano · 2018-08-01T08:42:28Z

Same issue here. We upgraded from googleads 8.0.0 to googleads 12.2.0 / 13.0.0 (v201702 to v201806) and this exception was raised:

Traceback (most recent call last):
  File "/home/p/keywords_report.py", line 14, in download_report
    skip_column_header=False, skip_report_summary=True)
  File "/home/p/.virtualenvs/oa-adwords-hIzJaW3w/lib/python3.6/site-packages/googleads/common.py", line 536, in Wrapper
    return utility_method(*args, **kwargs)
  File "/home/p/.virtualenvs/oa-adwords-hIzJaW3w/lib/python3.6/site-packages/googleads/adwords.py", line 1366, in DownloadReport
    output, **kwargs)
  File "/home/p/.virtualenvs/oa-adwords-hIzJaW3w/lib/python3.6/site-packages/googleads/adwords.py", line 1643, in _DownloadReport
    and not isinstance(output, io.BytesIO))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data

Our code (we do not use StringIO):

report_downloader = self.adwords_client.GetReportDownloader(version='v201806')
with open(csv_output, 'w') as report_file:
    try:
        report_downloader.DownloadReport(self.report, report_file, skip_report_header=True,
            include_zero_impressions=include_zero_impressions,
            skip_column_header=False, skip_report_summary=True)
    except Exception as e:
        print(f'Download failed. Exception: {e}')

In our case, the exception is only raised sometimes. For debugging purposes only, we retry in a while loop:

report_downloader = self.adwords_client.GetReportDownloader(version='v201806')
with open(csv_output, 'w') as report_file:
    retry = True
    while retry:
        try:
            report_downloader.DownloadReport(self.report, report_file, skip_report_header=True,
                                             include_zero_impressions=include_zero_impressions,
                                             skip_column_header=False, skip_report_summary=True)
            retry = False
            print('Download success')
        except Exception as e:
            print(f'Download failed. Exception: {e}')

And this is the unexpected output:

Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download success

Something weird is happening in ReportDownloader.DownloadReport.

We can use ReportDownloader.DownloadReportAsString as a workaround (it works), but we think it is important to fix ReportDownloader.DownloadReport so that the client library interface is not broken.

ezhimakov · 2019-10-18T05:53:50Z

still happens

msaniscalchi added the P2 label May 3, 2018

msaniscalchi self-assigned this May 8, 2018

msaniscalchi closed this as completed May 8, 2018

msaniscalchi reopened this May 9, 2018

msaniscalchi added the bug label May 11, 2018

msaniscalchi closed this as completed Sep 21, 2018
