aliccp_dataset_processing.py line 170: .tmp file missing #3

Open
cywuuuu opened this issue Nov 25, 2023 · 1 comment

cywuuuu commented Nov 25, 2023

 def norm_df(path, out_path):
     # Every column is read as int32, so any non-numeric value makes read_csv fail
     df = pd.read_csv(path, dtype=np.int32)
     print(df.shape)
     # min_v is not defined in this excerpt
     df -= (min_v - 1)
     df[df < 0] = 0
     df = df.astype(np.int32)
     print(df.head(10))
     df.to_csv(out_path, index=False)
     return df

 train_df = norm_df(data_path.format('train') + '.tmp',
                    norm_data_path.format('train'))
 test_df = norm_df(data_path.format('test') + '.tmp',
                   norm_data_path.format('test'))

Which file does .tmp refer to here? I copied the .csv to .csv.tmp, but it still fails with the following error:

    df = pd.read_csv(path,dtype=np.int32)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1036, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1137, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'bacff91692951881'

What could be the specific cause of this?
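For reference, the error itself can be reproduced without the dataset. This is only a minimal sketch: the two-column CSV and its column names (`click`, `user_id`) are made up for illustration and are not taken from the repository. It suggests that the copied .csv still contains raw hashed IDs, which `dtype=np.int32` cannot parse.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: the second column holds a hashed ID like the one in the error.
raw_csv = "click,user_id\n1,bacff91692951881\n0,12345\n"

try:
    # Same call shape as in norm_df: force every column to int32.
    pd.read_csv(io.StringIO(raw_csv), dtype=np.int32)
    error_message = None
except (ValueError, TypeError) as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed the cause, the .tmp file is probably meant to be an intermediate file in which such IDs have already been mapped to integers, rather than a plain copy of the raw .csv. That is an assumption; nothing in the excerpt above confirms it.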

cywuuuu commented Nov 25, 2023

  File "pandas/_libs/parsers.pyx", line 1131, in pandas._libs.parsers.TextReader._convert_tokens
TypeError: Cannot cast array data from dtype('O') to dtype('int32') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/SATrans-main/aliccp_dataset_processing.py", line 308, in <module>
    normalize_train_and_test()
  File "/root/SATrans-main/aliccp_dataset_processing.py", line 170, in normalize_train_and_test
    train_df = norm_df(data_path.format('train') + '.tmp',
  File "/root/SATrans-main/aliccp_dataset_processing.py", line 162, in norm_df
    df = pd.read_csv(path,dtype=np.int32)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/root/miniconda3/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "pandas/_libs/parsers.pyx", line 814, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1036, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1137, in pandas._libs.parsers.TextReader._convert_tokens
ValueError: invalid literal for int() with base 10: 'bacff91692951881'
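
To see which columns trigger the `ValueError`, one option is to read a sample of the file as strings and flag every column that contains something other than a plain integer. The helper below is a hedged sketch: `find_non_integer_columns` and the sample data are hypothetical and not part of aliccp_dataset_processing.py.

```python
import io

import pandas as pd


def find_non_integer_columns(csv_source, nrows=1000):
    """Return {column: first offending value} for columns that are not plain integers."""
    # Read everything as text so that no conversion error can occur while loading.
    sample = pd.read_csv(csv_source, dtype=str, nrows=nrows, keep_default_na=False)
    offenders = {}
    for column in sample.columns:
        # True where the whole cell is an optionally signed run of digits.
        is_int = sample[column].str.fullmatch(r"-?\d+")
        if not is_int.all():
            offenders[column] = sample.loc[~is_int, column].iloc[0]
    return offenders


# Hypothetical data standing in for the real .tmp file.
sample_csv = io.StringIO("click,user_id\n1,bacff91692951881\n0,12345\n")
bad_columns = find_non_integer_columns(sample_csv)
print(bad_columns)
```

On the real data, the function would be called with the path passed to `norm_df`, for example `data_path.format('train') + '.tmp'`. Any column it reports needs to be encoded as integers before `pd.read_csv(path, dtype=np.int32)` can succeed.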

cywuuuu closed this as completed Nov 28, 2023
cywuuuu reopened this Nov 30, 2023