UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data #281

Closed
xalexakis opened this issue May 3, 2018 · 14 comments

@xalexakis

Hi,

After searching for a long time for a solution to my problem, I have to ask here for help.

I am getting the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
while trying to download an Ad Performance report as a string in memory with the DownloadReport function, or as a stream with the DownloadReportAsStream function.

The problem comes from the AdGroupName field, which contains German characters like ä, ü, ö.

However, I do not get any error when I use the DownloadReportAsStream function to write the report to a .csv file, or when I force the decoding to be 'latin-1' with

report_data.write(chunk.decode('latin-1') if sys.version_info[0] == 3
                              and getattr(report_data, 'mode', 'w') == 'w' else chunk)
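
A note on why the 'latin-1' variant never raises: latin-1 maps every possible byte value to a character, so decoding cannot fail, but each UTF-8 encoded umlaut comes out as two wrong characters. A minimal illustration, using a made-up ad group name rather than real report data:

```python
# Hypothetical ad group name containing a German umlaut.
name = "B\u00e4der"                  # "Bäder"
utf8_bytes = name.encode("utf-8")    # 'ä' becomes the two bytes 0xC3 0xA4

# latin-1 accepts any byte, so this never raises, but the text is mangled:
# the single 'ä' turns into two unrelated characters.
as_latin1 = utf8_bytes.decode("latin-1")
print(repr(name), len(name))
print(repr(as_latin1), len(as_latin1))

# The original text is only recovered by undoing the wrong decode.
recovered = as_latin1.encode("latin-1").decode("utf-8")
```

So the latin-1 workaround hides the exception at the cost of corrupting every non-ASCII character in the report.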

I am using Python 3, API v201802 and the locale in the container is set to be C.UTF-8.

Below are the functions, which are mostly copied from the AdWords API documentation.

write in csv (working):

def write_csv(client, report, customer_id):
    client.SetClientCustomerId(customer_id)
    report_downloader = client.GetReportDownloader(version='v201802')

    filename = str(customer_id) + '_' + report + '_' + min_date

    report_data = io.open(filename, 'wb')

    stream_data = report_downloader.DownloadReportAsStream(
        reports[report], skip_report_header=True, skip_column_header=False,
        skip_report_summary=True, include_zero_impressions=True)

    try:
        while True:
            chunk = stream_data.read(CHUNK_SIZE)
            if not chunk:
                break
            report_data.write(chunk.decode() if sys.version_info[0] == 3
                              and getattr(report_data, 'mode', 'w') == 'w' else chunk)
    finally:
        report_data.close()
        stream_data.close()

stream with latin-1 (working):

def stream(client, report, customer_id):
    client.SetClientCustomerId(customer_id)
    report_downloader = client.GetReportDownloader(version='v201802')

    report_data = io.StringIO()
    stream_data = report_downloader.DownloadReportAsStream(
        reports[report], skip_report_header=False, skip_column_header=False,
        skip_report_summary=False, include_zero_impressions=True)

    try:
        while True:
            chunk = stream_data.read(CHUNK_SIZE)
            if not chunk:
                break
            report_data.write(chunk.decode('latin-1') if sys.version_info[0] == 3
                              and getattr(report_data, 'mode', 'w') == 'w' else chunk)
        return report_data
    finally:
        report_data.close()
        stream_data.close()

write to string (not working):

def tostring(client, report, customer_id):
    client.SetClientCustomerId(customer_id)
    report_downloader = client.GetReportDownloader(version='v201802')
    
    report_data = io.StringIO()
    report_downloader.DownloadReport(
        reports[report], report_data, skip_report_header=True, skip_column_header=False,
        skip_report_summary=True, include_zero_impressions=True)
   
    return report_data

@msaniscalchi
Contributor

Hello,

Thanks for the report! I'm not able to reproduce on my end with test data, but this is an interesting case because at a glance it certainly looks like this should work. The fact that it works with encoding set to latin-1 may be one of the more useful clues. Some time ago, I recall there being a bug where lack of validation in the AdWords UI allowed invalid characters to be used in string fields such as resource names. I suspect this may be related. In other words, what you're reporting seems to strongly suggest that the data you're getting back from the API has latin-1 encoded content, which definitely sounds like a bug if that's accurate.

That said, troubleshooting this issue would probably be better handled in the AdWords API Support Forum; I suggest you reach out there. I'll leave this issue open for a while though; in the event that you determine that there is a library-related cause, feel free to follow up.

Regards,
Mark

@msaniscalchi msaniscalchi added the P2 label May 3, 2018
@msaniscalchi msaniscalchi self-assigned this May 8, 2018
@msaniscalchi
Contributor

Closing for now, but feel free to reopen if you find anything library-specific at fault.

@xalexakis
Author

xalexakis commented May 8, 2018 via email

@fiboknacky
Member

Hi Mark,

We can't reproduce this on our end with either Python 2 or Python 3, so it doesn't seem to be an issue on the API server.
Could you reopen this and follow up, as it's probably related to Python?

Best,
Knack

@msaniscalchi msaniscalchi reopened this May 9, 2018
@msaniscalchi
Contributor

O.K. Will do. I'll need to set up a few different environment configurations and see if anything can reproduce this.

@xalexakis
Author

xalexakis commented May 9, 2018 via email

@msaniscalchi
Contributor

I think I now have a handle on what is going wrong here. The reason you're getting an error indicating an unexpected end of data is because... the decoder is reaching an unexpected end of data. The ReportDownloader.DownloadReport method currently doesn't take any special precautions in the event that a StringIO file-like object is provided--e.g. it is conceivable that any given chunk may truncate a multi-byte utf-8 character, causing the decode to fail.
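
The truncation described above can be reproduced without the library. The sketch below assumes a chunk size of 16 KiB, which is consistent with position 16383 in the reported error but is not confirmed from the library source; the payload is made up.

```python
CHUNK_SIZE = 16 * 1024  # assumed chunk size, consistent with position 16383

# 16383 ASCII bytes followed by 'ä' (0xC3 0xA4): the chunk boundary falls
# between the two bytes of the umlaut.
payload = b"a" * (CHUNK_SIZE - 1) + "\u00e4".encode("utf-8")
first_chunk = payload[:CHUNK_SIZE]

# Decoding the first chunk on its own fails, because it ends with the
# lead byte 0xC3 and the continuation byte is in the next chunk.
try:
    first_chunk.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)

# Decoding the complete payload in one go works.
text = payload.decode("utf-8")
```

This also fits the reports that the failure is intermittent: it only occurs when a multi-byte character happens to straddle a chunk boundary.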

As a work-around, if you provide a BytesIO instance and get the full report contents, you can decode that to get a string. Alternatively, you could just use the ReportDownloader.DownloadReportAsString method, which similarly gets the full content before decoding--worth noting, I also noticed that DownloadReportAsString doesn't chunk the response, meaning it could be problematic for particularly large reports.
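
For reports too large to hold in memory twice, one possible alternative is to decode the stream incrementally, so that a character split across chunks is buffered until its remaining bytes arrive. This is only a sketch: `copy_decoded` is a hypothetical helper, not part of the library, and it assumes the object returned by DownloadReportAsStream exposes `read(n)` returning bytes, as in the snippets in this thread. A `BytesIO` stands in for the report stream here.

```python
import codecs
import io


def copy_decoded(binary_stream, text_output, chunk_size=16 * 1024,
                 encoding="utf-8"):
    """Copy bytes from binary_stream into text_output, decoding incrementally.

    Hypothetical helper: the incremental decoder keeps an incomplete
    multi-byte sequence in its internal buffer until the next chunk arrives.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        chunk = binary_stream.read(chunk_size)
        if not chunk:
            break
        text_output.write(decoder.decode(chunk))
    # final=True raises if the stream really did end mid-character.
    text_output.write(decoder.decode(b"", final=True))


# Stand-in for the report stream: an umlaut straddles the 16 KiB boundary.
source = io.BytesIO(b"a" * 16383 + "\u00e4,\u00f6,\u00fc\n".encode("utf-8"))
report_data = io.StringIO()
copy_decoded(source, report_data)
```

Memory use stays bounded by the chunk size on the input side; only the output object grows.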

It looks like there are a few improvements we can make here for a future release.

Regards,
Mark

@kybs89

kybs89 commented Jul 5, 2018

I am getting exactly the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data

The suggested workaround of using ReportDownloader.DownloadReportAsString works, but is not optimal due to the size of some larger reports.

The issue only started after I upgraded the library from version 10.0.0 to the most recent version, as I migrated from API v201710 to v201806 using Python 3.

Regards,
Kyle

@xalexakis
Author

xalexakis commented Jul 5, 2018 via email

@kybs89

kybs89 commented Jul 5, 2018

OK, thanks Christos. I'll try that out!

@success-m

success-m commented Jul 5, 2018

The following code worked for me. In my case, special characters were not important, so I removed them.

import string

response = None
try:
    output = open(path, 'a')
    response = report_downloader.DownloadReportAsStreamWithAwql(
        report_query, 'csv',
        skip_report_header=False,
        skip_column_header=False,
        skip_report_summary=False)
    while True:
        chunk = response.read(16 * 1024)
        if not chunk:
            break
        # errors='ignore' drops undecodable bytes, including a multi-byte
        # character split across two chunks; the filter then strips
        # everything outside string.printable (all non-ASCII text).
        output.write(''.join(
            filter(lambda x: x in string.printable,
                   chunk.decode(encoding='utf-8', errors='ignore'))))
    output.close()
except Exception as error:
    raise Exception(error)
finally:
    if response:
        response.close()

@kennethjmyers

For anyone looking for the BytesIO method, I believe it looks something like this (please correct me if you think I've made a mistake):

from io import BytesIO

bytes_buffer = BytesIO()
report_downloader.DownloadReport(report, bytes_buffer, ...)

with open(file_output, 'wb') as file:
    file.write(bytes_buffer.getvalue())
    
bytes_buffer.close()
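
If the goal is a string in memory rather than a file, as in the original question, the buffer can be decoded once after the download has finished. The sketch below uses a hypothetical stand-in, `fake_download_report`, in place of the real DownloadReport call, and made-up report content; it deliberately writes the bytes in two pieces split inside an umlaut.

```python
import io


def fake_download_report(output):
    """Hypothetical stand-in for DownloadReport: writes raw byte chunks."""
    data = "AdGroupName\nB\u00e4der\n".encode("utf-8")
    # Split in the middle of 'ä' to mimic an unlucky chunk boundary.
    split = data.index(b"\xc3") + 1
    output.write(data[:split])
    output.write(data[split:])


bytes_buffer = io.BytesIO()
fake_download_report(bytes_buffer)

# Decode only once, after all bytes are present, so no character can be
# cut in half at decode time.
report_text = bytes_buffer.getvalue().decode("utf-8")
bytes_buffer.close()
```

As noted earlier in the thread, this holds the whole report in memory, so it may not suit very large reports.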

@minayaserrano

minayaserrano commented Aug 1, 2018

Same issue here. We upgraded from googleads 8.0.0 to googleads 12.2.0 / 13.0.0 (v201702 to v201806) and this exception was raised:

Traceback (most recent call last):
  File "/home/p/keywords_report.py", line 14, in download_report
    skip_column_header=False, skip_report_summary=True)
  File "/home/p/.virtualenvs/oa-adwords-hIzJaW3w/lib/python3.6/site-packages/googleads/common.py", line 536, in Wrapper
    return utility_method(*args, **kwargs)
  File "/home/p/.virtualenvs/oa-adwords-hIzJaW3w/lib/python3.6/site-packages/googleads/adwords.py", line 1366, in DownloadReport
    output, **kwargs)
  File "/home/p/.virtualenvs/oa-adwords-hIzJaW3w/lib/python3.6/site-packages/googleads/adwords.py", line 1643, in _DownloadReport
    and not isinstance(output, io.BytesIO))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data

Our code (we do not use StringIO):

report_downloader = self.adwords_client.GetReportDownloader(version='v201806')
with open(csv_output, 'w') as report_file:
    try:
        report_downloader.DownloadReport(self.report, report_file, skip_report_header=True,
            include_zero_impressions=include_zero_impressions,
            skip_column_header=False, skip_report_summary=True)
    except Exception as e:
        print(f'Download failed. Exception: {e}')

In our case, the exception is only raised sometimes. For debugging purposes only, we retry in a while loop:

report_downloader = self.adwords_client.GetReportDownloader(version='v201806')
with open(csv_output, 'w') as report_file:
    retry = True
    while retry:
        try:
            report_downloader.DownloadReport(self.report, report_file, skip_report_header=True,
                                             include_zero_impressions=include_zero_impressions,
                                             skip_column_header=False, skip_report_summary=True)
            retry = False
            print('Download success')
        except Exception as e:
            print(f'Download failed. Exception: {e}')

And this is the unexpected output:

Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download fail. Exception: 'utf-8' codec can't decode byte 0xc3 in position 16383: unexpected end of data
Download success

Something weird is happening in ReportDownloader.DownloadReport.

We can use ReportDownloader.DownloadReportAsString as a workaround (it works), but we think it is important to fix ReportDownloader.DownloadReport so that the client library interface is not broken.

@ezhimakov

still happens
