Improve memory efficiency of base64-encoded byte array decoding operations #967

Open
clementdenis opened this issue Sep 3, 2015 · 2 comments

@clementdenis
commented Sep 3, 2015

Currently, decoding a base64-encoded byte array with the Java API client is very inefficient in terms of memory usage.

Because of that, reading big attachments from the Gmail API in memory-constrained environments like App Engine is quite a challenge.
https://developers.google.com/gmail/api/v1/reference/users/messages/attachments/get

On a 64-bit Java 7 VM, it required at least 115 MB of heap (-Xmx115M) to read a 12 MB attachment from the Gmail API (which is a 17 MB base64 string in the response from the API).

The code to test it is dead simple (MESSAGE_ID / ATTACHMENT_ID reference a big attachment):

gmail.users().messages().attachments().get("me", MESSAGE_ID, ATTACHMENT_ID).execute()

Here is the stack trace when trying to load the attachment with only 110 MB of heap:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:360)
at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:277)
at com.google.api.client.json.jackson2.JacksonParser.getText(JacksonParser.java:76)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:850)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:471)
at com.google.api.client.json.JsonParser.parseValue(JsonParser.java:780)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:381)
at com.google.api.client.json.JsonParser.parse(JsonParser.java:354)
at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:87)
at com.google.api.client.json.JsonObjectParser.parseAndClose(JsonObjectParser.java:81)
at com.google.api.client.http.HttpResponse.parseAs(HttpResponse.java:459)
at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
at LoadEmail.main(LoadEmail.java:31)

Obtaining a stream from the data field in the response would help keep the memory usage lower.
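
To illustrate what a streamed decode could look like, here is a minimal sketch that decodes base64 from an InputStream through a small fixed buffer, so the encoded text never has to exist as one big String. It assumes Java 8+ (java.util.Base64 is not available on the Java 7 VM mentioned above) and the URL-safe alphabet, which is what the Gmail API documents for the data field. The names StreamingBase64Demo and decodeTo are made up for this example and are not part of the API client.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of google-api-java-client.
final class StreamingBase64Demo {

    /**
     * Decodes URL-safe base64 text read from {@code encoded} and writes the
     * raw bytes to {@code out}. Only the 8 KB buffer below is held by this
     * method at any time. Returns the number of decoded bytes.
     */
    static long decodeTo(InputStream encoded, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        // wrap() returns a stream that decodes lazily as it is read.
        try (InputStream decoded = Base64.getUrlDecoder().wrap(encoded)) {
            int n;
            while ((n = decoded.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an attachment; a real caller would write to a file
        // or another stream instead of collecting the bytes in memory.
        byte[] original = new byte[30_000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i * 31);
        }
        byte[] encoded = Base64.getUrlEncoder().encode(original);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long count = decodeTo(new ByteArrayInputStream(encoded), sink);
        System.out.println(count + " bytes decoded, round trip ok: "
                + Arrays.equals(original, sink.toByteArray()));
    }
}
```

This only covers the decoding half. The base64 text still has to be pulled out of the JSON response without materializing it as a String first.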

@ndupont
commented Jun 2, 2017

Hi,
Is there any update or workaround for this important concern?
Ideally, an API similar to the one for downloading a file as a stream from Drive would be a good solution.
Thanks

@qtxo
commented Jun 5, 2019

Use manual parsing code:

Gmail.Users.Messages.Attachments.Get get = api.getGmail().users().messages()
                .attachments().get(ME, gid, attachId);

// Very important!! The wrapper expects compact JSON with "data" as the only field.
get.setPrettyPrint(false).setFields("data");

InputStream is = get.executeAsInputStream();

is = InputStreamJsonField.getBase64DataStream(is, "data");

InputStreamJsonField is just a custom InputStream wrapper that skips the leading JSON characters; its output is decoded by org.apache.commons.codec.binary.Base64InputStream (via the Base64InputStream(InputStream) constructor):

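import java.io.IOException;
import java.io.InputStream;

import org.apache.commons.codec.binary.Base64InputStream;

/**
 * Very simple utility to stream the value of a single-field JSON object.
 * Currently only used for the Gmail API, where the response can be reduced to:
 *   {"FIELD":"LARGE_DATA"}
 * on a single line without spaces, which makes the value easy to extract.
 */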
public class InputStreamJsonField extends InputStream {
    private String field;
    private String expectedPrefix;
    private InputStream is;

    public static InputStream getBase64DataStream(InputStream is, String field) {
        return new Base64InputStream(new InputStreamJsonField(is, field));
    }

    public InputStreamJsonField(InputStream is, String field) {
        this.field = field;
        this.is = is;
    }


    @Override
    public int read() throws IOException {

        if (expectedPrefix == null) {
            // i.e: {"FIELD":"DATA_RETURNED"}
            expectedPrefix = "{\"" + field + "\":\"";

            byte[] buff = new byte[expectedPrefix.length()];

            String prefix;
            int pos = 0;
            do {
                int c = is.read();

                if (c == -1) return -1;

                if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                    // swallow all blanks
                } else {
                    buff[pos++] = (byte) c;
                }

                prefix = new String(buff, 0, pos);
                if (!expectedPrefix.startsWith(prefix)) {
                    // error
                    break;
                }

            } while (!expectedPrefix.equals(prefix));

            if (!prefix.equals(expectedPrefix)) {
                throw new IllegalStateException(prefix + " != " + expectedPrefix);
            }

            return is.read();
        }

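        int c = is.read();
        if ('"' == c) {
            // Closing quote: base64 data never contains '"', so the value
            // has ended. Drain the rest ("}" and any trailing whitespace)
            // and report end of stream.
            while (is.read() != -1) {
                // discard
            }
            return -1;
        }
        return c;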
    }

    @Override
    public void close() throws IOException {
        is.close();
    }
}
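
For anyone who wants to try the idea without commons-codec, here is a self-contained sketch of the same approach using only the JDK (Java 8+). It is a swapped-in variant, not the class above: JsonFieldStream and open are names invented for this example, it uses java.util.Base64's URL-safe decoder instead of Base64InputStream, and it assumes the response is exactly {"FIELD":"..."} with no whitespace (unlike the class above, it does not swallow blanks in the prefix).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical stdlib-only variant; exposes the text between the quotes
// of a single-field JSON object and ends at the closing quote.
final class JsonFieldStream extends FilterInputStream {
    private boolean finished;

    private JsonFieldStream(InputStream in) {
        super(in);
    }

    /** Consumes {"field":" from {@code in}, then returns the decoded value as a stream. */
    static InputStream open(InputStream in, String field) throws IOException {
        String expected = "{\"" + field + "\":\"";
        for (int i = 0; i < expected.length(); i++) {
            if (in.read() != expected.charAt(i)) {
                throw new IOException("Unexpected JSON prefix at offset " + i);
            }
        }
        return Base64.getUrlDecoder().wrap(new JsonFieldStream(in));
    }

    @Override
    public int read() throws IOException {
        if (finished) {
            return -1;
        }
        int c = in.read();
        if (c == -1 || c == '"') {
            finished = true; // closing quote or EOF ends the value
            return -1;
        }
        return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (finished) {
            return -1;
        }
        if (len == 0) {
            return 0;
        }
        int n = in.read(b, off, len);
        if (n == -1) {
            finished = true;
            return -1;
        }
        // Truncate the chunk at the closing quote, if it is in this chunk.
        for (int i = 0; i < n; i++) {
            if (b[off + i] == '"') {
                finished = true;
                return i == 0 ? -1 : i;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "attachment bytes go here".getBytes(StandardCharsets.UTF_8);
        String json = "{\"data\":\"" + Base64.getUrlEncoder().encodeToString(payload) + "\"}";

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream decoded = open(
                new ByteArrayInputStream(json.getBytes(StandardCharsets.US_ASCII)), "data")) {
            byte[] buf = new byte[16];
            int n;
            while ((n = decoded.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        }
        System.out.println("round trip ok: " + Arrays.equals(payload, sink.toByteArray()));
    }
}
```

With the Gmail client this would presumably be called as JsonFieldStream.open(get.executeAsInputStream(), "data") after setPrettyPrint(false).setFields("data"), as in the snippet above. It is a sketch: skip() and available() are not overridden, so callers should stick to read().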