Address issue: https://github.com/rocketlaunchr/dataframe-go/issues… #63

Open: wants to merge 1 commit into base: master

pjebs (Collaborator) commented Apr 2, 2022

…/62 (Chinese characters and Python BOM prefix)
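The "Python BOM prefix" refers to the UTF-8 byte order mark (bytes EF BB BF, code point U+FEFF) that Python writes at the start of a file when the `utf-8-sig` encoding is used. Go's `encoding/csv` does not strip it, so the BOM stays attached to the first column name. The sketch below shows one way to remove it from the header row; `readHeader` is a hypothetical helper for illustration, not the code in this PR.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// utf8BOM is the UTF-8 encoding of U+FEFF (bytes EF BB BF).
const utf8BOM = "\uFEFF"

// readHeader (hypothetical helper) parses the first CSV record and strips a
// leading UTF-8 BOM from the first column name, if one is present.
func readHeader(data string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(data))
	header, err := r.Read()
	if err != nil {
		return nil, err
	}
	if len(header) > 0 {
		header[0] = strings.TrimPrefix(header[0], utf8BOM)
	}
	return header, nil
}

func main() {
	// A CSV as Python might write it with the utf-8-sig encoding.
	data := utf8BOM + "编号,年龄\n1,20\n"
	header, err := readHeader(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", header)
}
```

Without the `TrimPrefix` call, `header[0]` would be `"\uFEFF编号"`, which looks identical when printed plainly but does not compare equal to `"编号"`.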

tanyaofei commented Apr 3, 2022

Reading the file encoded as UTF-8 with BOM works without problems.
Exporting to Parquet also produces no errors, but the exported Parquet file cannot be read back.

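package aa

import (
	"context"
	"fmt"
	"os"
	"testing"

	"github.com/rocketlaunchr/dataframe-go/exports"
	"github.com/rocketlaunchr/dataframe-go/imports"
	"github.com/xitongsys/parquet-go-source/local"
)

// TestUTF8CSV loads a UTF-8-BOM CSV with Chinese column names, exports it
// to Parquet, then tries to read the Parquet file back.
func TestUTF8CSV(t *testing.T) {
	ctx := context.Background()

	fr, err := os.Open("export.csv")
	if err != nil {
		panic(err)
	}
	defer fr.Close()

	df, err := imports.LoadFromCSV(ctx, fr)
	if err != nil {
		panic(err)
	}

	out, err := os.Create("export.parquet")
	if err != nil {
		panic(err)
	}
	if err := exports.ExportToParquet(ctx, out, df); err != nil {
		panic(err)
	}
	// Check Close so a failed final write is not silently ignored.
	if err := out.Close(); err != nil {
		panic(err)
	}

	// Read the Parquet file back: this is the step that panics.
	source, err := local.NewLocalFileReader("export.parquet")
	if err != nil {
		panic(err)
	}
	defer source.Close()

	df, err = imports.LoadFromParquet(ctx, source)
	if err != nil {
		panic(err)
	}
	fmt.Println(df)
}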
=== RUN   TestUTF8CSV
--- FAIL: TestUTF8CSV (0.02s)
panic: [NextRowGroup] Column not found: Parquet_go_root.P_231188150229143183 [recovered]
	panic: [NextRowGroup] Column not found: Parquet_go_root.P_231188150229143183

goroutine 14 [running]:
testing.tRunner.func1.2({0x13b8d60, 0xc0006af1c0})
	/usr/local/opt/go/libexec/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
	/usr/local/opt/go/libexec/src/testing/testing.go:1392 +0x39f
panic({0x13b8d60, 0xc0006af1c0})
	/usr/local/opt/go/libexec/src/runtime/panic.go:838 +0x207
github.com/rocketlaunchr/dataframe-go/aa.TestUTF8CSV(0x0?)
	.../dataframe-go/aa/utf8_csv_test.go:43 +0x1d7
testing.tRunner(0xc0005c9d40, 0x1444b28)
	/usr/local/opt/go/libexec/src/testing/testing.go:1439 +0x102
created by testing.(*T).Run
	/usr/local/opt/go/libexec/src/testing/testing.go:1486 +0x35f
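The column name in the panic appears to encode the first column, 编号: the digits 231 188 150 229 143 183 are the decimal values of its UTF-8 bytes (E7 BC 96 E5 8F B7). The sketch below reproduces that name under an assumption inferred only from this log: the writer replaces each byte that is not an ASCII letter or digit with its decimal value, and adds a `P_` prefix when the result does not start with a letter. `mangleName` is hypothetical and is not parquet-go's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// mangleName is a HYPOTHETICAL reconstruction of the naming seen in the
// panic message, inferred from the log only: every byte that is not an
// ASCII letter or digit becomes its decimal value, and "P_" is prepended
// when the result does not start with an ASCII letter.
func mangleName(name string) string {
	var b strings.Builder
	for i := 0; i < len(name); i++ {
		c := name[i]
		if isASCIILetter(c) || (c >= '0' && c <= '9') {
			b.WriteByte(c)
		} else {
			b.WriteString(strconv.Itoa(int(c)))
		}
	}
	out := b.String()
	if out == "" || !isASCIILetter(out[0]) {
		out = "P_" + out
	}
	return out
}

func isASCIILetter(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

func main() {
	fmt.Println(mangleName("编号")) // prints P_231188150229143183
}
```

Under this assumption, a name that starts with an ASCII letter (for example `"X编号"`, which maps to `X231188150229143183`) would not receive the `P_` prefix.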

tanyaofei commented:

export.parquet.zip

pjebs (Collaborator, Author) commented Apr 3, 2022

Can you read it back in Python to check whether the output file is valid?

tanyaofei commented:

Re "Can you read it back in Python?": I don't think so. The IntelliJ IDEA plugin Big Data Tools shows "Nothing to show".

Here is the output of my Python script:

       编号    年龄    性别    地区  身高cm  体重kg  ... 吃零食情况  跑步情况 玩电脑游戏情况  逛街情况  散步情况  夜宵情况
0    None  None  None  None  None  None  ...  None  None    None  None  None  None
1    None  None  None  None  None  None  ...  None  None    None  None  None  None
2    None  None  None  None  None  None  ...  None  None    None  None  None  None
3    None  None  None  None  None  None  ...  None  None    None  None  None  None
4    None  None  None  None  None  None  ...  None  None    None  None  None  None
..    ...   ...   ...   ...   ...   ...  ...   ...   ...     ...   ...   ...   ...
446  None  None  None  None  None  None  ...  None  None    None  None  None  None
447  None  None  None  None  None  None  ...  None  None    None  None  None  None
448  None  None  None  None  None  None  ...  None  None    None  None  None  None
449  None  None  None  None  None  None  ...  None  None    None  None  None  None
450  None  None  None  None  None  None  ...  None  None    None  None  None  None

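[451 rows x 21 columns]

(Column names in English: ID, age, sex, region, height cm, weight kg, ..., snacking, running, computer gaming, shopping, walking, late-night snacks.)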

pjebs (Collaborator, Author) commented Apr 3, 2022

When you used the pull-request branch, I wonder whether it pulled in the latest (incompatible) version of the Parquet parsing package?

tanyaofei commented:

Re the (incompatible) Parquet package version: I am sure I am using github.com/xitongsys/parquet-go v1.5.2 and github.com/xitongsys/parquet-go-source v0.0.0-20200509081216-8db33acb0acf.

pjebs (Collaborator, Author) commented Apr 3, 2022

When you tried s.Rename("X" + strings.Trim(s.Name(), "\xEF\xBB\xBF")), could you read the exported Parquet file back in Python?
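The renaming expression in the comment can be checked in isolation. `strings.Trim` treats its cutset as a set of Unicode code points, and `"\xEF\xBB\xBF"` decodes to the single code point U+FEFF, so only BOM characters at either end of a name are removed; a Chinese character whose UTF-8 encoding merely ends in byte 0xBF is left intact. `renameColumn` is a hypothetical wrapper around that expression; in the test it would be applied via `s.Rename` to each series.

```go
package main

import (
	"fmt"
	"strings"
)

// renameColumn (hypothetical) applies the expression from the comment:
// "X" + strings.Trim(name, "\xEF\xBB\xBF").
// The cutset decodes to the single rune U+FEFF, so only BOM characters at
// the start or end of the name are trimmed.
func renameColumn(name string) string {
	return "X" + strings.Trim(name, "\xEF\xBB\xBF")
}

func main() {
	// "丿" (U+4E3F) is encoded as E4 B8 BF: its last byte is 0xBF, yet it
	// is not trimmed because it is not the rune U+FEFF.
	for _, n := range []string{"\uFEFF编号", "年龄", "丿"} {
		fmt.Printf("%q -> %q\n", n, renameColumn(n))
	}
}
```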
