Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle carriage return '\r' in CSV new lines #686

Closed
stephenberry opened this issue Jan 4, 2024 Discussed in #684 · 2 comments · Fixed by #980
Closed

Handle carriage return '\r' in CSV new lines #686

stephenberry opened this issue Jan 4, 2024 Discussed in #684 · 2 comments · Fixed by #980
Labels
bug Something isn't working

Comments

@stephenberry
Copy link
Owner

Discussed in #684

Originally posted by Nukoooo January 4, 2024
I kinda got annoyed by this error so I decided to figure out what happened with it, because the files can be read by Libreoffice, I saw the code checks for '\r' at here

case '\r':

so I wondered if it was because of that so I made a minimum example with the following code (data is from issue #666)

    std::ifstream file("./fish_record.csv", std::ios::binary);

    std::vector<std::uint8_t> file_content{ std::istreambuf_iterator(file), {} };
    auto has_char = [&](char c)
    {
        return std::ranges::find(file_content, c) != file_content.end();
    };

    std::cout << has_char('\\') << std::endl;
    std::cout << has_char('\b') << std::endl;
    std::cout << has_char('\f') << std::endl;
    std::cout << has_char('\r') << std::endl;
    std::cout << has_char('\t') << std::endl;

    std::string buffer_with_r{ file_content.begin(), file_content.end() };

    FishRecord rec{};
    auto error = glz::read_csv<glz::colwise>(rec, buffer_with_r);
    std::cout << "Read error code: " << glz::format_error(error, std::string {}) << std::endl;

    auto it = std::ranges::find(file_content, '\r');
    std::string remaining { std::ranges::find(it, file_content.end(), ','), file_content.end() };
    std::cout << remaining << std::endl;

    for (const auto& caught_fish_name : rec.CaughtFishName)
    {
        std::cout << caught_fish_name << std::endl;
    }

    std::erase_if(file_content,
                  [&](const auto& i)
                  {
                      return i == '\r';
                  });

    std::string buffer_without_r { file_content.begin(), file_content.end() };
    FishRecord rec2 {};
    error = glz::read_csv<glz::colwise>(rec2, buffer_without_r);
    
    std::cout << "Read error code: " << glz::format_error(error, std::string {}) << std::endl;

    for (const auto& caught_fish_name : rec2.CaughtFishName)
    {
        std::cout << caught_fish_name << std::endl;
    }

and gave the result:

0
0
0
1
0
Read error code: unknown_key
,8.365,12.5,1,万能拟饵,无,无,撒饵,利姆萨·罗敏萨下层甲板,重杆,提钩,否,否
海港鲱,7.586,11.1,1,万能拟饵,无,无,撒饵,利姆萨·罗敏萨下层甲板,重杆,提钩,否,否
猛犸章鱼,26.044,55.6,1,万能拟饵,无,海港鲱,撒饵,利姆萨·罗敏萨下层甲板,鱼王杆,提钩,否,否

珊瑚蝶
Read error code :none
珊瑚蝶
海港鲱
猛犸章鱼

After removing \r from the byte array it can parse the content properly into the struct without any weird behavior. So I'm curious about if checking \r intended or not, if it is, what are the reasons behind this 👀

My code in the project:
image

@stephenberry stephenberry added the bug Something isn't working label Jan 6, 2024
@stephenberry
Copy link
Owner Author

Pardon me for not commenting on this bug. The only reason the carriage return isn't handled was because all the CSV files I was dealing with were company generated without the carriage return. I would like to support this, because windows adds \r\n for most CSV generation.

I'll add support at some point, but if you'd like to make the fix yourself and submit a pull request I would gladly merge it in.

@stephenberry stephenberry linked a pull request May 6, 2024 that will close this issue
@stephenberry
Copy link
Owner Author

Thanks for reporting this, finally got carriage returns supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant