
utf-8/encoding bug #171

Closed
moander opened this issue Jun 7, 2018 · 14 comments
Labels
api: bigtable Issues related to the googleapis/nodejs-bigtable API. 🚨 This issue needs some love. triage me I really want to be triaged. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@moander

moander commented Jun 7, 2018

Please see my comments here #129 (review)

Problem 1:
Binary values in keys get unexpected values when inserted from binary-encoded buffers.

Problem 2:
Binary value filters will not return any matches

Problem 3:
TransformError from getRows after inserting unique keys 0xE6 and 0xF8

Environment details

  • OS: macOS 10.13.4
  • Node.js version: v8.11.2
  • npm version: 5.6.0
  • @google-cloud/bigtable version: 0.14.0

Steps to reproduce

async function test() {
    await bt.createTable('test', {
        families: [
            'c'
        ]
    }).catch(swallowCode(6));

    const table = bt.table('test');

    await table.mutate([
        {
            key: 'a',
            method: 'insert',
            data: {
                c: {
                    test: Buffer.from('x', 'binary')
                }
            }
        },
        {
            key: 'b',
            method: 'insert',
            data: {
                c: {
                    test: Buffer.from('æ', 'binary')
                }
            }
        },
        {
            key: Buffer.from('æ'),
            method: 'insert',
            data: {
                c: {
                    test: Buffer.from('æ', 'binary')
                }
            }
        },
        {
            key: Buffer.from('æ', 'binary'),
            method: 'insert',
            data: {
                c: {
                    test: Buffer.from('æ', 'binary')
                }
            }
        },

    ]);

    function printRow(row) {
        console.log(row.id, Buffer.from(row.id), row.data.c.test[0].value);
    }

    let rows;

    console.log('## no filter');
    [rows] = await table.getRows({
        decode: false,
        filter: [
            {
                column: {
                    cellLimit: 1
                }
            },
        ]
    });
    rows.forEach(printRow);

    console.log('## æ utf-8 filter');
    [rows] = await table.getRows({
        decode: false,
        filter: [
            {
                column: {
                    cellLimit: 1
                }
            },
            {
                value: 'æ'
            }
        ]
    });
    rows.forEach(printRow);


    console.log('## æ binary filter');
    [rows] = await table.getRows({
        decode: false,
        filter: [
            {
                column: {
                    cellLimit: 1
                }
            },
            {
                value: Buffer.from('æ', 'binary')
            }
        ]
    });
    rows.forEach(printRow);

    console.log('## x binary filter');
    [rows] = await table.getRows({
        decode: false,
        filter: [
            {
                column: {
                    cellLimit: 1
                }
            },
            {
                value: Buffer.from('x', 'binary')
            }
        ]
    });
    rows.forEach(printRow);
}

Output:

## no filter
a <Buffer 61> <Buffer 78>
b <Buffer 62> <Buffer e6>
æ <Buffer c3 a6> <Buffer e6>
� <Buffer ef bf bd> <Buffer e6>
## æ utf-8 filter
## æ binary filter
## x binary filter
a <Buffer 61> <Buffer 78>
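A plausible reading of the output above (an editor's interpretation, not confirmed in the thread): the cell written from `Buffer.from('æ', 'binary')` holds the single byte `0xE6`, while a string filter `'æ'` is serialized as UTF-8 and therefore looks for the two bytes `c3 a6`, so it can never match; the buffer filter carrying the right byte also fails, which is the filter bug itself. The byte difference is plain Node behavior:

```javascript
// What the cell contains vs. what a UTF-8 string filter would look for:
console.log(Buffer.from('æ', 'binary')); // <Buffer e6>    — latin1, the bytes stored in the cell
console.log(Buffer.from('æ'));           // <Buffer c3 a6> — UTF-8, the bytes a string filter 'æ' serializes to
```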
@moander
Author

moander commented Jun 7, 2018

I expected the following code to insert the same key twice, because æ and ø both become 0xEFBFBD:

    await table.mutate([
        {
            key: Buffer.from('æ', 'binary'),
            method: 'insert',
            data: {
                c: {
                    test: Buffer.from('æ', 'binary')
                }
            }
        },
        {
            key: Buffer.from('ø', 'binary'),
            method: 'insert',
            data: {
                c: {
                    test: Buffer.from('x', 'binary')
                }
            }
        },
    ]);

But instead I stumbled upon another problem. Running the repro code from the original post now outputs the following error even after uncommenting the mutation above.

## no filter
{ TransformError: A commit happened but the same key followed: {"labels":[],"rowKey":{"type":"Buffer","data":[248]},"familyName":{"value":"c"},"qualifier":{"value":{"type":"Buffer","data":[116,101,115,116]}},"timestampMicros":"1528400832560000","value":{"type":"Buffer","data":[120]},"valueSize":0,"commitRow":true,"rowStatus":"commitRow"}
    at ChunkTransformer.validateNewRow (/Users/moander/dev/uzi/api/node_modules/@google-cloud/bigtable/src/chunktransformer.js:215:18)
    at ChunkTransformer.processNewRow (/Users/moander/dev/uzi/api/node_modules/@google-cloud/bigtable/src/chunktransformer.js:291:8)
    at ChunkTransformer._transform (/Users/moander/dev/uzi/api/node_modules/@google-cloud/bigtable/src/chunktransformer.js:77:14)

@moander
Author

moander commented Jun 7, 2018

The expected result is that both queries work, but the second one still fails after that one conflicting mutation.

    await bt.table('test2').getRows().catch(err => console.log('test2', err.message));
    console.log('OK!');
    await bt.table('test').getRows().catch(err => console.log('test', err.message));
OK!
test A commit happened but the same key followed: {"labels":[],"rowKey":{"type":"Buffer","data":[248]},"familyName":{"value":"c"},"qualifier":{"value":{"type":"Buffer","data":[116,101,115,116]}},"timestampMicros":"1528400832560000","value":{"type":"Buffer","data":[120]},"valueSize":0}

@sduskis sduskis added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p0 Highest priority. Critical issue. P0 implies highest priority. labels Jun 7, 2018
@sduskis
Contributor

sduskis commented Jun 7, 2018

I think that this is a client-side problem related to the encoding issue.

@moander
Author

moander commented Jun 7, 2018

Deleting the row solved the getRows problem:

await table.mutate([
    {
        key: Buffer.from('ø', 'binary'),
        method: 'delete',
    },
]);

@JustinBeckwith JustinBeckwith added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Jun 8, 2018
@sduskis
Contributor

sduskis commented Jun 11, 2018

@kolodny, should we keep this as a P0?

@JustinBeckwith JustinBeckwith added the 🚨 This issue needs some love. label Jun 12, 2018
@kolodny
Contributor

kolodny commented Jun 13, 2018

Thanks @moander for this bug report.

After digging into it, I found that Node.js treats the binary encoding as latin1; the Buffer docs say:

The character encodings currently supported by Node.js include:
...

  • 'binary' - Alias for 'latin1'

We can see what's happening here in the following:

console.log(`Buffer.from('æ', 'binary') is`, Buffer.from('æ', 'binary'));
console.log(`Buffer.from('æ') is`, Buffer.from('æ'));
console.log(`Buffer.from('æ', 'binary').toString() is`, Buffer.from('æ', 'binary').toString());
console.log(`Buffer.from('æ').toString() is`, Buffer.from('æ').toString());


/* outputs
Buffer.from('æ', 'binary') is <Buffer e6>
Buffer.from('æ') is <Buffer c3 a6>
Buffer.from('æ', 'binary').toString() is �
Buffer.from('æ').toString() is æ
*/

Combine this with the fact that row keys are auto-stringified from a buffer even with decode set to false, and the keys look all weird when in fact they are the same buffers that were passed to the insert call. I'm going to open an issue to stop converting row keys to strings when decode is set to false. Let me know if that addresses your concerns. Thanks!

@moander
Author

moander commented Jun 13, 2018

Thanks @kolodny. My issue is more related to problem 2, binary filters. The problems with the keys were something I guessed would also be there because of the filtering problems ;)

kolodny added a commit that referenced this issue Jun 13, 2018
Addresses #171

- [x] Tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
@sduskis
Contributor

sduskis commented Jun 13, 2018

@kolodny, are we done with this issue? If so, can you please close it?

@kolodny
Contributor

kolodny commented Jun 13, 2018

I'm gonna close this for now. @moander feel free to reopen if your issue isn't being addressed.

@kolodny kolodny closed this as completed Jun 13, 2018
@ghost ghost removed the priority: p0 Highest priority. Critical issue. P0 implies highest priority. label Jun 13, 2018
@kolodny kolodny removed the 🚨 This issue needs some love. label Jun 13, 2018
@moander moander changed the title Possibly a very serious utf-8/encoding bug utf-8/encoding bug Jun 14, 2018
@moander
Author

moander commented Jun 14, 2018

After updating package to master branch I get this error:

const bigtable = require('@google-cloud/bigtable')();
                                                  ^
TypeError: Class constructor Bigtable cannot be invoked without 'new'

My previous version was 0.14.0. Downgrading package solves the problem.

@kolodny
Contributor

kolodny commented Jun 14, 2018 via email

@moander
Author

moander commented Jun 14, 2018

@kolodny same error using new require(..).

I had to do this:

const Bigtable = require('@google-cloud/bigtable');
const bigtable = new Bigtable();

The others work with new require(..) for some reason:

//const bigtable = new require('@google-cloud/bigtable')();
const datastore = new require('@google-cloud/datastore')();
const storage = new require('@google-cloud/storage')();
const pubsub = new require('@google-cloud/pubsub')();
const bigquery = new require('@google-cloud/bigquery')();

@moander
Author

moander commented Jun 14, 2018

@kolodny the filter repro returns the same result on the master branch:

## no filter
a <Buffer 61> <Buffer 78>
b <Buffer 62> <Buffer e6>
æ <Buffer c3 a6> <Buffer e6>
� <Buffer ef bf bd> <Buffer e6>
## æ utf-8 filter
## æ binary filter
## x binary filter
a <Buffer 61> <Buffer 78>

You can also see that key: Buffer.from('æ', 'binary') resulted in � (0xEFBFBD).

@moander
Author

moander commented Jun 14, 2018

The problem where table.getRows stops working on a table is still there.

The following code makes getRows (and presumably createReadStream) against the same table unusable:

        await table.mutate([
            {
                key: Buffer.from('æ', 'binary'),
                method: 'insert',
                data: {
                    c: {
                        test: Buffer.from('æ', 'binary')
                    }
                }
            },
            {
                key: Buffer.from('ø', 'binary'),
                method: 'insert',
                data: {
                    c: {
                        test: Buffer.from('x', 'binary')
                    }
                }
            },
        ]);

@sduskis sduskis reopened this Jun 14, 2018
@JustinBeckwith JustinBeckwith added triage me I really want to be triaged. 🚨 This issue needs some love. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed triage me I really want to be triaged. 🚨 This issue needs some love. labels Jun 14, 2018
@kolodny kolodny mentioned this issue Jun 15, 2018
@ghost ghost removed the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Jun 15, 2018
@google-cloud-label-sync google-cloud-label-sync bot added the api: bigtable Issues related to the googleapis/nodejs-bigtable API. label Jan 31, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Apr 7, 2020