Cannot read csv files that contain '<a href=' and do not have the extension ‘.csv’ or ‘.tsv’. #4036

Closed
1 task done
sinbino opened this issue May 22, 2024 · 0 comments · Fixed by #4040


sinbino commented May 22, 2024

This is:

- [x] a bug report

I think the problem is similar to #564, but it still happens under certain conditions, even in recent versions.
This is a problem when loading an uploaded file via $_FILES['upfile']['tmp_name'] in PHP, because the temporary file name has no extension.

What is the expected behavior?

The Csv reader can read a csv file that satisfies both of the following conditions:

  • The file extension is neither .csv nor .tsv.
  • The file contains “<a href=”.

What is the current behavior?

An error occurs when trying to read a csv file that satisfies all of the above.

Fatal error: Uncaught PhpOffice\PhpSpreadsheet\Reader\Exception: test is an Invalid Spreadsheet file. in /project/vendor/phpoffice/phpspreadsheet/src/PhpSpreadsheet/Reader/Csv.php:288

What are the steps to reproduce?

index.php

<?php
require __DIR__ . '/vendor/autoload.php';

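// Create the Csv reader explicitly and load the extensionless file.
$reader = new \PhpOffice\PhpSpreadsheet\Reader\Csv();
$spreadsheet = $reader->load('test'); // throws the Reader\Exception shown above

// The file 'test' contains:
// aaa,bbb,"<a href="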
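
A possible workaround on affected versions, sketched under assumptions: the commit message further down states that the earlier fix ignores `mime_content_type` when the extension is 'csv' or 'tsv', so copying the upload to a path ending in `.csv` before loading should sidestep the check. The helper name `copyWithCsvExtension` is hypothetical and is not part of PhpSpreadsheet.

```php
<?php
// Hypothetical helper (not part of PhpSpreadsheet): copy a file to a
// temporary path that ends in ".csv" and return that path.
function copyWithCsvExtension(string $sourcePath): string
{
    // tempnam() creates a unique empty placeholder file; reuse its name as a base.
    $base = tempnam(sys_get_temp_dir(), 'csv');
    if ($base === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    $target = $base . '.csv';
    $copied = copy($sourcePath, $target);
    unlink($base); // the placeholder is no longer needed

    if (!$copied) {
        throw new RuntimeException("Could not copy {$sourcePath}.");
    }

    return $target;
}

// Usage sketch (assumes PhpSpreadsheet is installed via Composer):
// $csvPath = copyWithCsvExtension($_FILES['upfile']['tmp_name']);
// $reader = new \PhpOffice\PhpSpreadsheet\Reader\Csv();
// $spreadsheet = $reader->load($csvPath);
// unlink($csvPath);
```

The caller is responsible for deleting the copied file once the spreadsheet has been loaded.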

What features do you think are causing the issue?

  • Reader

Does an issue affect all spreadsheet file formats? If not, which formats are affected?

csv

Which versions of PhpSpreadsheet and PHP are affected?

PhpSpreadsheet 1.29.0
PHP 8.1.2

oleibman added a commit to oleibman/PhpSpreadsheet that referenced this issue May 22, 2024
Fix PHPOffice#4036. The issue was originally reported as PHPOffice#564 (and PHPOffice#811) and was fixed for the most part, but this is a variation that the original fix did not cover. Cells with html fragments can cause `mime_content_type` to identify the file as `text/html`. The original fix was to ignore `mime_content_type` when the file extension is 'csv' or 'tsv'. However, if the file does not have one of those extensions, Csv Reader still rejects it as having an invalid mimetype. This PR adds `text/html` to the list of valid mimetypes.

I imagine that this type of problem might occur for other mimetypes. If any of those are reported in future, it might be better to just add a "suppress mimetype" check option, rather than extending the list forever. Html is unusual in that its rules are so lax, which is why it seems appropriate to add it here.

Note that IOFactory may still identify a file as Html even when intended as Csv. The sample associated with this issue does not fall into this category, but one of the unit tests on this ticket does. The file will still be read correctly by Csv Reader, but IOFactory load may cause it to use Html Reader instead.
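
The decision described in the commit message above can be sketched roughly as follows. This is an illustration only, not the actual PhpSpreadsheet source: the function name `looksReadableAsCsv` and the constant are hypothetical, and the list entries other than `text/html` are assumed for the example. The detected mimetype is passed in as a parameter so the sketch does not depend on the local fileinfo database; in practice it would come from `mime_content_type($filename)`.

```php
<?php
// Illustrative sketch only; NOT the actual PhpSpreadsheet source.
// Entries other than 'text/html' are assumed for this example.
const ACCEPTED_MIME_TYPES_SKETCH = ['text/csv', 'text/plain', 'text/html'];

function looksReadableAsCsv(string $filename, string $detectedMimeType): bool
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    // Original fix: trust a csv/tsv extension and ignore the detected mimetype.
    if (in_array($extension, ['csv', 'tsv'], true)) {
        return true;
    }

    // Otherwise fall back to the mimetype list, which now includes text/html.
    return in_array($detectedMimeType, ACCEPTED_MIME_TYPES_SKETCH, true);
}

var_dump(looksReadableAsCsv('test', 'text/html'));       // bool(true)
var_dump(looksReadableAsCsv('data.csv', 'text/html'));   // bool(true)
var_dump(looksReadableAsCsv('test', 'application/zip')); // bool(false)
```

Before the fix, the first call corresponds to the failing case in this issue: no extension, and a mimetype detected as `text/html`.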