[Bug]: date format changed from input to output #71
Comments
I printed the data just before the to_json call and the time value was still correct at that point, so the datetime value is changed during to_json:

    @staticmethod
    def to_jsonl(dataset, export_path, num_proc=1, **kwargs):
        """
        Export method for json/jsonl target files.
        :param dataset: the dataset to export.
        :param export_path: the path to store the exported dataset.
        :param num_proc: the number of processes used to export the dataset.
        :param kwargs: extra arguments.
        :return:
        """
        # Debug print: the time values shown here are still correct.
        print(dataset['time'])
        dataset.to_json(export_path, force_ascii=False, num_proc=num_proc)
I installed these packages at the following versions:
pandas 2.0.0
datasets 2.11.0
pyarrow 14.0.1
This may be a bug in pyarrow, starting from v13.0.
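Until the root cause is confirmed, one possible workaround is to turn datetime values into plain strings before exporting, so that no date conversion happens during serialization. The sketch below uses only the standard library to show the idea; the helper names `stringify_datetimes` and `to_jsonl_lines` are hypothetical and are not part of Data-Juicer or datasets. It does not fix the underlying bug.

```python
import json
from datetime import datetime


def stringify_datetimes(record, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of `record` with datetime values rendered as strings.

    Hypothetical helper: strings pass through JSON export untouched,
    so the exporter never has to choose a date format or unit.
    """
    return {
        key: value.strftime(fmt) if isinstance(value, datetime) else value
        for key, value in record.items()
    }


def to_jsonl_lines(records):
    """Serialize records to JSON Lines, one JSON object per line."""
    return [
        json.dumps(stringify_datetimes(record), ensure_ascii=False)
        for record in records
    ]


records = [{"text": "hello", "time": datetime(2023, 10, 13, 16, 6, 31)}]
lines = to_jsonl_lines(records)
print(lines[0])  # {"text": "hello", "time": "2023-10-13 16:06:31"}
```

With a datasets.Dataset, the same conversion could presumably be applied per record with dataset.map before calling to_json; that variant is an assumption and has not been tested against the versions listed above.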
This issue is marked as stale because there has been no activity for 21 days. Remove the stale label or add a new comment, or this issue will be closed in 3 days.
Close this stale issue.
Before Reporting
I have pulled the latest code of the main branch and run it again, and the bug still exists.
I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend that you ask a question using the Question template.)
Search before reporting
OS
Ubuntu
Installation Method
from source
Data-Juicer Version
latest
Python Version
3.8
Describe the bug
I have a JSONL dataset to process. Each record has a time key whose value originally looks like '2023-10-13 16:06:31'. I ran the
python tools/process_data.py --config configs/demo/process.yaml
command to process the data, and in the output JSONL the time value had changed to 1678, an integer. I found that this may be caused by datasets.to_json. It has a parameter called date_format; when I set it to 'iso', the output becomes '1970-01-01T00:00:01.698', so this is not only a formatting bug: the value itself is changed.
To Reproduce
python tools/process_data.py --config configs/demo/process.yaml
Configs
Logs
No response
Screenshots
No response
Additional
No response