
Same .tsv file gives a different DataFrame structure with engine 'python' and 'c' #26545

Closed
Jane-Eyre opened this issue May 28, 2019 · 6 comments


@Jane-Eyre

commented May 28, 2019

On my Mac I have a tab-separated values file encoded in UTF-8, and my pandas version is 0.24.2:
74872_zh_CN_UI.txt

When I call read_csv with engine="python" like this:
import pandas as pd
b = pd.read_csv("/Users/GHIBLI/Documents/vmware-L10n/bert/zh_CN/74872_zh_CN_UI.tsv", engine="python", delimiter="\t")
print(b.shape)
I get (8, 1).

With the default engine ('c'):
b = pd.read_csv("/Users/GHIBLI/Documents/vmware-L10n/bert/zh_CN/74872_zh_CN_UI.tsv", delimiter="\t")
print(b.shape)
I get (8, 22).

Compared with the 'c' engine, the 'python' engine here is not simply the more feature-complete one: it parses the same file into a different structure.
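A minimal way to check whether the two engines agree, without relying on the attached file, is to parse the same text with each engine and compare the resulting shapes. The sample data and the helper name `shapes_by_engine` are hypothetical; on a pandas version that parses this input correctly, both engines should report the same shape.

```python
import io

import pandas as pd

# Hypothetical stand-in for the attached file: a header plus three rows,
# three tab-separated columns, with Chinese text in the last column.
TSV_TEXT = "id\tsource\ttarget\n1\tOK\t确定\n2\tCancel\t取消\n3\tHelp\t帮助\n"


def shapes_by_engine(text, sep="\t"):
    """Parse the same text with each parser engine and return {engine: shape}."""
    return {
        engine: pd.read_csv(io.StringIO(text), sep=sep, engine=engine).shape
        for engine in ("c", "python")
    }


shapes = shapes_by_engine(TSV_TEXT)
print(shapes)
# Flag the symptom from this report: the engines disagree on the shape.
if len(set(shapes.values())) > 1:
    print("Engines disagree:", shapes)
```

To try it on a real file, pass the file's decoded contents instead of `TSV_TEXT`.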

@Liam3851

Contributor

commented May 28, 2019

I cannot reproduce using pandas 0.24.2 on either Ubuntu or Windows -- both show the shape as (8, 22) using both engines. It could be a Mac-specific issue, though it would be odd that the Python-engine implementation would behave differently based on the OS.
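Since the reports in this thread differ by OS and Python version, it may help to record those details next to the pandas version when comparing results. A minimal sketch follows; `environment_summary` is a hypothetical helper, while `platform.system`, `platform.python_version` and `pd.__version__` are standard.

```python
import platform

import pandas as pd


def environment_summary():
    """Collect the details that varied between the reports in this thread."""
    return {
        "os": platform.system(),  # e.g. 'Darwin', 'Linux' or 'Windows'
        "python": platform.python_version(),
        "pandas": pd.__version__,
    }


print(environment_summary())
# For a much fuller report (including optional dependencies), pandas also
# provides pd.show_versions().
```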

@WillAyd

Member

commented May 28, 2019

I can reproduce this on master. If you would like to take a deeper look and see what's going on, we would certainly appreciate it!

@WillAyd WillAyd added this to the Contributions Welcome milestone May 28, 2019

@Liam3851

Contributor

commented May 28, 2019

@WillAyd Are you using a Mac? I can't reproduce on master using Windows and Python 3.7.3 (I suppose this could also be a Python version difference rather than an OS difference):

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: '0.25.0.dev0+615.g998a0deea'

In [3]: pd.read_csv('74872_zh_CN_UI.txt', delimiter='\t', engine='python').shape
Out[3]: (8, 22)
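The thread does not say what made the engines disagree. When one engine returns a single column, though, apparently the lines were not split on the tab, so a cheap first check is the raw first line of the file: whether it starts with a UTF-8 byte-order mark and how many tab characters it contains. The sketch below assumes a UTF-8 file; `describe_header` and the sample bytes are hypothetical.

```python
import codecs


def describe_header(raw: bytes, sep: str = "\t") -> dict:
    """Summarize the first line of a delimited file given (a prefix of) its raw bytes."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    # 'utf-8-sig' drops a leading BOM if present; errors="replace" avoids a
    # decode error if the prefix cuts a multi-byte character in half.
    text = raw.decode("utf-8-sig", errors="replace")
    first_line = text.splitlines()[0] if text else ""
    return {
        "has_utf8_bom": has_bom,
        "separator_count": first_line.count(sep),
        "first_line": first_line,
    }


# Hypothetical sample: a BOM followed by a three-column tab-separated header.
sample = codecs.BOM_UTF8 + "id\tsource\ttarget\n1\tOK\t确定\n".encode("utf-8")
print(describe_header(sample))
```

With a real file, pass a prefix of its raw bytes, e.g. `open(path, "rb").read(4096)`.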
@WillAyd

Member

commented May 28, 2019

@LuckyDenis

Contributor

commented May 29, 2019

I was able to reproduce this on Linux 19.04, Python 3.7, pandas 0.24.2.

@LuckyDenis

Contributor

commented May 31, 2019

Good afternoon. I found where the bug is, but have not yet figured out how best to fix it. Do you mind if I take this task?

LuckyDenis pushed a commit to LuckyDenis/pandas that referenced this issue Jun 3, 2019

LuckyDenis pushed a commit to LuckyDenis/pandas that referenced this issue Jun 3, 2019

@jreback jreback modified the milestones: Contributions Welcome, 0.25.0 Jun 6, 2019

jreback added a commit that referenced this issue Jun 12, 2019

5 participants