OverWrite(): does not handle CRLF documents correctly #626

Open
Magnanimity opened this issue Jan 31, 2018 · 23 comments

Comments

@Magnanimity

I found this bug:

Undefined offset: 1 - mPDF.php line 29198

29196:        $xref = [];
29197:        preg_match("/xref\n0 (\d+)\n(.*?)\ntrailer/s", $pdf, $m);
29198:        $xref_objid = $m[1];
29199:        preg_match_all('/(\d{10}) (\d{5}) (f|n)/', $m[2], $x);
29200:        for ($i = 0; $i < count($x[0]); $i++) {
29201:            $xref[] = [intval($x[1][$i]), $x[2][$i], $x[3][$i]];
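Here is a minimal reproduction. It uses a synthetic two-entry xref section, not the attached test.pdf. The LF-only pattern quoted above matches the LF version of the section. For the CRLF version, `preg_match()` returns 0 and leaves `$m` empty, so reading `$m[1]` raises the notice.

```php
<?php
// Synthetic xref section (assumption: not taken from the attached file).
// $pattern is the LF-only regex quoted from mPDF 7.0.0's OverWrite().
$pattern = "/xref\n0 (\d+)\n(.*?)\ntrailer/s";

// The same two-entry section, once with LF and once with CRLF line endings.
$lf   = "xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \ntrailer\n";
$crlf = str_replace("\n", "\r\n", $lf);

$okLf   = preg_match($pattern, $lf, $mLf);     // 1, and $mLf[1] is "2"
$okCrlf = preg_match($pattern, $crlf, $mCrlf); // 0, and $mCrlf is an empty array

// Reading $mCrlf[1] at this point is what produces "Undefined offset: 1"
// (PHP 5/7) or "Undefined array key 1" (PHP 8).
var_dump($okLf, $okCrlf, $mCrlf);
```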

These are the mPDF and PHP versions and the environment (fpm/cli etc.) I am using:

mPDF 7.0.0
PHP 5.6.31 (also tried PHP 7.0.23 & PHP 7.1.9)
Apache 2.4.27
Symfony 3.2.13
(WAMPSERVER 3.1.0 - 64-bit)

This is a PHP code snippet I use

    /**
     * @Route("/test/", name="test")
     */
    public function testAction(Request $request)
    {
        $mpdf = new \Mpdf\Mpdf();
        $mpdf->SetImportUse();

        $mpdf->percentSubset = 0;

        $search = array(
            '{{test1}}',
            '{{test2}}'
        );

        $replacement = array(
            "THIS IS A TEST",
            "THIS IS ANOTHER TEST"
        );

        if (!file_exists('../test.pdf')) die("FILE DOES NOT EXIST");

        $mpdf->OverWrite('../test.pdf', $search, $replacement, 'D', 'report.pdf');
    }



@Magnanimity
Author

Example attached.

The PDF was generated with Windows "Print to PDF" and then the {{test}} parameters were inserted using NitroPDF.

test.pdf


@finwe
Member

finwe commented Apr 30, 2018

It seems OverWrite has a problem with PDF files using CRLF line endings. I tried to edit failing regexes but to no avail (resulting files were empty). Leaving open.

@finwe finwe changed the title mpdf->OverWrite(): Undefined offset: 1 OverWrite(): does not handle CRLF documents correctly Apr 30, 2018
@h2ooooooo

I couldn't figure out how to save a PDF from Word 2016 (Office 365) with LF line endings instead of CRLF. I assume the startxref references would break if you simply replaced every CRLF with LF. I tried two PDF printers (Chrome's and Microsoft's) and also had Google Drive convert the file, but nothing helped.

Finally I converted it with unoconv, which replaced the CRLFs with LF characters. After that, mPDF was able to do the replacement.
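The startxref concern can be checked with a small synthetic example (not a real Word export). The xref table and `startxref` store absolute byte offsets. A blind CRLF-to-LF replacement therefore moves each object earlier by the number of CR bytes removed before it, and the recorded offsets go stale.

```php
<?php
// Synthetic CRLF "PDF" body (assumption: hand-written, not a real file).
$crlfPdf = "%PDF-1.4\r\n"
         . "1 0 obj\r\n<< /Type /Catalog /Pages 2 0 R >>\r\nendobj\r\n"
         . "2 0 obj\r\n<< /Type /Pages /Kids [] /Count 0 >>\r\nendobj\r\n";

$before = strpos($crlfPdf, "2 0 obj");        // the offset an xref table would record
$lfPdf  = str_replace("\r\n", "\n", $crlfPdf); // the naive conversion
$after  = strpos($lfPdf, "2 0 obj");           // where the object actually is now

// The shift equals the number of CRLF pairs that preceded the object.
$removed = substr_count(substr($crlfPdf, 0, $before), "\r\n");
printf("before=%d after=%d removed=%d\n", $before, $after, $removed);
```

Every in-use entry in the xref table, and the `startxref` value itself, would need the same correction.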

@danieljausovec

Is there already a solution for this?

@finwe
Member

finwe commented Oct 12, 2018

If it were, the issue would be closed.

@studioramix

Is this going to be resolved?

@finwe
Member

finwe commented Feb 19, 2019


Jokes aside: only if someone puts time into it, which doesn't seem likely at the moment.

@studioramix

Is there any temporary workaround, or a tool that can fix the original PDF so that it works with this function?

For now, my "dirty" workaround is placing the text manually by position. No need to explain why this is really dirty ;)

$mpdf->SetTextColor(222,0,0);
$mpdf->SetXY(55,0);
$mpdf->WriteCell(22,44,"Bananna");

@h2ooooooo

@studioramix As I mentioned in my post I successfully used unoconv to convert it to a working format (LF instead of CRLF). Obviously it requires you to run through this either manually or automatically through your code.

@Sivustonikkari

I tried creating a PDF with Adobe Acrobat using the Create > PDF from Clipboard option. The clipboard held two lines of text copied from Notepad++, with EOL characters converted to LF only. Even that did not work. Is there a working example of a PDF file that can be used for testing?
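One way to get a test file is to generate it directly. `buildLfOnlyPdf()` below is a hypothetical helper, not an mPDF function. It writes a one-page, uncompressed PDF that uses LF line endings only, and it computes the xref offsets from the actual bytes. The resulting xref section matches the LF-only pattern quoted in the report. Whether `OverWrite()` then succeeds end to end is untested; in particular, it has not been checked how `OverWrite()` treats an uncompressed content stream.

```php
<?php
// Hypothetical fixture generator (assumption: not part of mPDF).
// $text must not contain parentheses or backslashes (no PDF string escaping here).
function buildLfOnlyPdf(string $text): string
{
    $content = "BT /F1 12 Tf 72 720 Td ($text) Tj ET";
    $objects = [
        "<< /Type /Catalog /Pages 2 0 R >>",
        "<< /Type /Pages /Kids [ 3 0 R ] /Count 1 >>",
        "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
            . "/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        "<< /Length " . strlen($content) . " >>\nstream\n$content\nendstream",
    ];

    $pdf = "%PDF-1.4\n";
    $offsets = [];
    foreach ($objects as $i => $body) {
        $offsets[] = strlen($pdf);              // byte offset of "N 0 obj"
        $pdf .= ($i + 1) . " 0 obj\n$body\nendobj\n";
    }

    // Per the PDF specification, each xref entry is 20 bytes:
    // 10-digit offset, space, 5-digit generation, space, f/n, 2-byte EOL (" \n").
    $xrefPos = strlen($pdf);
    $pdf .= "xref\n0 " . (count($objects) + 1) . "\n";
    $pdf .= "0000000000 65535 f \n";
    foreach ($offsets as $off) {
        $pdf .= sprintf("%010d 00000 n \n", $off);
    }
    $pdf .= "trailer\n<< /Size " . (count($objects) + 1) . " /Root 1 0 R >>\n";
    $pdf .= "startxref\n$xrefPos\n%%EOF\n";
    return $pdf;
}

$pdf = buildLfOnlyPdf('{{test1}}');
```

To save the result, call `file_put_contents('fixture.pdf', $pdf);` with whatever path suits the test setup.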


@romaantoniuk

Solved?

@finwe
Member

finwe commented Jul 3, 2019

Closed?

@Misiu

Misiu commented Sep 2, 2019

@h2ooooooo what command did you use?

I got 4 errors:

Notice: Undefined offset: 1 in C:\xampp\htdocs\word2pdf2\Mpdf\Mpdf.php on line 26825

Notice: Undefined offset: 2 in C:\xampp\htdocs\word2pdf2\Mpdf\Mpdf.php on line 26826

Notice: Undefined offset: 1 in C:\xampp\htdocs\word2pdf2\Mpdf\Mpdf.php on line 26833

Notice: Undefined offset: 1 in C:\xampp\htdocs\word2pdf2\Mpdf\Mpdf.php on line 26899

but after replacing
preg_match("/xref\n0 (\d+)\n(.*?)\ntrailer/s", $pdf, $m);
with
preg_match("/xref\r\n0 (\d+)\r\n(.*?)\r\ntrailer/s", $pdf, $m);
the first two disappear.

When I print_r the $m variable from line 26824 (preg_match("/xref\r\n0 (\d+)\r\n(.*?)\r\ntrailer/s", $pdf, $m);), I get this:

Array
(
    [0] => xref
0 21
0000000010 65535 f
0000000017 00000 n
0000000125 00000 n
0000000181 00000 n
0000000450 00000 n
0000000753 00000 n
0000000921 00000 n
0000001160 00000 n
0000001213 00000 n
0000001266 00000 n
0000000011 65535 f
0000000012 65535 f
0000000013 65535 f
0000000014 65535 f
0000000015 65535 f
0000000016 65535 f
0000000017 65535 f
0000000000 65535 f
0000001890 00000 n
0000002125 00000 n
0000184257 00000 n
trailer
    [1] => 21
    [2] => 0000000010 65535 f
0000000017 00000 n
0000000125 00000 n
0000000181 00000 n
0000000450 00000 n
0000000753 00000 n
0000000921 00000 n
0000001160 00000 n
0000001213 00000 n
0000001266 00000 n
0000000011 65535 f
0000000012 65535 f
0000000013 65535 f
0000000014 65535 f
0000000015 65535 f
0000000016 65535 f
0000000017 65535 f
0000000000 65535 f
0000001890 00000 n
0000002125 00000 n
0000184257 00000 n
)

I don't know if this is the right approach, but I'd like to have this fixed.

P.S. Can \r be made optional in the preg_match pattern?
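Making `\r` optional is possible. In PCRE, `(?:\r\n|\r|\n)` matches any of the three line-ending forms. The dump above shows that this file's xref entries end in CRLF with no trailing space, and the group handles that case too. Below is a sketch of an EOL-tolerant xref parser. The function name `parseXref()` is hypothetical. It mirrors the loop quoted in the original report, but it is not a tested patch for mPDF and does nothing about the other patterns `OverWrite()` uses.

```php
<?php
// Hypothetical EOL-tolerant variant of the xref regex (assumption: not mPDF code).
// Every "\n" from the original pattern becomes (?:\r\n|\r|\n).
function parseXref(string $pdf): ?array
{
    $eol = '(?:\r\n|\r|\n)';
    if (!preg_match("/xref{$eol}0 (\d+){$eol}(.*?){$eol}trailer/s", $pdf, $m)) {
        return null; // no classic xref table (e.g. a PDF 1.5 cross-reference stream)
    }
    preg_match_all('/(\d{10}) (\d{5}) (f|n)/', $m[2], $x);
    $entries = [];
    for ($i = 0, $n = count($x[0]); $i < $n; $i++) {
        // [byte offset, generation, "f" (free) or "n" (in use)]
        $entries[] = [intval($x[1][$i]), $x[2][$i], $x[3][$i]];
    }
    return ['count' => (int) $m[1], 'entries' => $entries];
}
```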

Maybe at the start we could check whether the line endings are CR LF and replace them with LF?
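Checking the line-ending style up front is simple. `detectXrefEol()` below is a hypothetical helper. It looks at the bytes right after the `xref` keyword, which is exactly where the original regex fails. The negative lookbehind keeps it from matching the `xref` inside `startxref`. Converting the file afterwards still requires keeping the xref byte offsets valid.

```php
<?php
// Hypothetical helper (assumption: not part of mPDF).
// Returns 'CRLF', 'LF' or 'CR' for the first xref keyword found, or null.
function detectXrefEol(string $pdf): ?string
{
    // \r\n is tried first so a CRLF pair is not reported as a bare CR.
    if (!preg_match('/(?<![A-Za-z])xref(\r\n|\r|\n)/', $pdf, $m)) {
        return null;
    }
    $names = ["\r\n" => 'CRLF', "\n" => 'LF', "\r" => 'CR'];
    return $names[$m[1]];
}
```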

@finwe I see that you already tried fixing the regexes, without any luck. What do you think about replacing CRLF with LF in the PDF?
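If CRLF is replaced with LF, the xref offsets have to be recomputed afterwards. `rebuildXrefOffsets()` below is a hypothetical sketch that relies on strong assumptions:

- there is one classic xref table whose subsection starts at object 0;
- there are no incremental updates;
- there are no PDF 1.5 cross-reference or object streams;
- every object header begins a line.

A blind `str_replace("\r\n", "\n", ...)` also rewrites bytes inside compressed (binary) streams, which can corrupt them. This sketch only illustrates the offset bookkeeping.

```php
<?php
// Hypothetical helper, not part of mPDF. Assumes line endings were already
// converted to LF and the file has the simple single-xref layout described above.
function rebuildXrefOffsets(string $pdf): string
{
    // Record where each "N G obj" header that starts a line really is now.
    preg_match_all('/(?<![^\r\n])(\d+) (\d+) obj\b/', $pdf, $hdr, PREG_OFFSET_CAPTURE);
    $actual = [];
    foreach ($hdr[1] as [$num, $offset]) {
        $actual[(int) $num] = $offset; // a later definition of the same object wins
    }

    // Locate the xref table; the lookbehind skips the "xref" in "startxref".
    if (!preg_match('/(?<![A-Za-z])xref\n0 (\d+)\n(.*?)\ntrailer/s', $pdf, $m, PREG_OFFSET_CAPTURE)) {
        return $pdf; // not a layout this sketch handles
    }
    $xrefPos = $m[0][1];
    preg_match_all('/(\d{10}) (\d{5}) (f|n)/', $m[2][0], $x);

    // Rewrite every in-use entry; with subsection "0 N", entry $i is object $i.
    $table = '';
    for ($i = 0, $n = count($x[0]); $i < $n; $i++) {
        $offset = (int) $x[1][$i];
        if ($x[3][$i] === 'n' && isset($actual[$i])) {
            $offset = $actual[$i];
        }
        // 20-byte entry with a two-character " \n" end of line.
        $table .= sprintf("%010d %s %s \n", $offset, $x[2][$i], $x[3][$i]);
    }
    // The pattern already consumes the "\n" before "trailer", so drop the last one.
    $pdf = substr_replace($pdf, rtrim($table, "\n"), $m[2][1], strlen($m[2][0]));

    // The table sits after all objects, so only the startxref value needs updating.
    return preg_replace('/startxref\s+\d+/', "startxref\n" . $xrefPos, $pdf, 1);
}
```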

@Misiu

Misiu commented Sep 3, 2019

I also noticed that some PDFs need a different regex.
For example, take a look at this one:
preg_match("/<<\s*\/Type\s*\/Pages\s*\/Kids\s*\[(.*?)\]\s*\/Count/s", $pdf, $m);

In the first PDF (version 1.3) I have this:

<< 
/Type /Pages 
/Kids [ 32 0 R 1 0 R 4 0 R 7 0 R ] 
/Count 4 
>> 

but in the second (version 1.5) I have this:
<</Type/Pages/Count 1/Kids[ 3 0 R] >>

so the above regex won't match.
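Here is a more tolerant lookup that handles both dictionary forms above. `findPageKids()` is a hypothetical sketch. It matches the innermost `<< ... >>` dictionaries, checks for `/Type /Pages` with any amount of whitespace, and reads `/Kids` wherever it appears among the keys. It assumes the Pages dictionary contains no nested dictionary. It also returns the first Pages node found, which is not necessarily the root of the page tree.

```php
<?php
// Hypothetical helper (assumption: not mPDF code). Returns the object numbers in /Kids.
function findPageKids(string $pdf): array
{
    // Innermost dictionaries only: the body may not contain "<<" or ">>".
    preg_match_all('/<<((?:(?!<<|>>).)*)>>/s', $pdf, $dicts);
    foreach ($dicts[1] as $body) {
        // "/Type /Pages" or "/Type/Pages", but not e.g. "/Type /PagesX".
        if (preg_match('/\/Type\s*\/Pages(?![A-Za-z])/', $body)
            && preg_match('/\/Kids\s*\[(.*?)\]/s', $body, $kids)) {
            // Each kid is an indirect reference "N G R"; keep N.
            preg_match_all('/(\d+)\s+\d+\s+R/', $kids[1], $refs);
            return array_map('intval', $refs[1]);
        }
    }
    return [];
}
```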

@Misiu

Misiu commented Sep 3, 2019

I have a very simple PDF saved using Word.
It contains only 2 lines:

Witaj %imie%
Masz %wiek% lat!

but when I print $s after calling $s = gzuncompress($s); I get this:

/P <> BDC BT
/F1 11.04 Tf
1 0 0 1 70.824 760.54 Tm
0 g
0 G
[(Wi)-9(t)9(aj )9(%i)-11(m)16(i)-8(e%)] TJ
ET
BT
1 0 0 1 132.53 760.54 Tm
[( )] TJ
ET
EMC /P <> BDC BT
/F2 11.04 Tf
1 0 0 1 70.824 737.98 Tm
0.18 0.459 0.714 rg
0.18 0.459 0.714 RG
[(M)4(a)6(sz)8( )9(%)] TJ
ET
BT
1 0 0 1 105.41 737.98 Tm
[(w)6(i)7(e)-8(k)] TJ
ET
BT
1 0 0 1 127.01 737.98 Tm
[(%)] TJ
ET
BT
1 0 0 1 135.17 737.98 Tm
[( )] TJ
ET
BT
1 0 0 1 137.57 737.98 Tm
[(l)5(a)4(t!)] TJ
ET
BT
1 0 0 1 153.43 737.98 Tm
[( )] TJ
ET
0.18 0.459 0.71 rg
70.8 736.08 82.56 0.72 re
f*
EMC

As you can see, there is no "Witaj" string; the line containing it looks like this:

[(Wi)-9(t)9(aj )9(%i)-11(m)16(i)-8(e%)] TJ

so str_replace won't work 😥
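The text shown on the page is the concatenation of the string operands in the TJ array, with kerning adjustments (the numbers) in between. That is why the placeholder never appears as one contiguous string. `tjArrayText()` below is a hypothetical sketch that joins the operands. It ignores escaped parentheses and hex `<...>` strings.

```php
<?php
// Hypothetical helper (assumption: literal strings only, no "\(" escapes, no hex strings).
function tjArrayText(string $tj): string
{
    // Collect the contents of every "( ... )" operand and drop the kerning numbers.
    preg_match_all('/\(([^)]*)\)/', $tj, $parts);
    return implode('', $parts[1]);
}

$line = '[(Wi)-9(t)9(aj )9(%i)-11(m)16(i)-8(e%)] TJ';
var_dump(strpos($line, 'Witaj'));  // the raw content stream does not contain it
echo tjArrayText($line), "\n";     // the joined operands do
```

Even joining does not help with the second line. There, Word spread "Masz %wiek% lat!" across several BT/ET blocks, so a placeholder can be split across separate text objects, not just across operands of one TJ array.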

@fahadstaffasia

fahadstaffasia commented Jun 11, 2021

Is the issue solved? I have been facing the same issue with the following code:

    // Must set codepage (e.g. UTF-8 or Core fonts) the same as for original document
    // The rest of the parameters do nothing
    $mpdf = new Mpdf();
    // in a subset font
    $mpdf->percentSubset = 0;

    $search = array(
        'whatever'
    );

    $replacement = array(
        'personalised'
    );

    return $mpdf->OverWrite(__DIR__.'/test.pdf', $search, $replacement, 'F', __DIR__.'/test_output.pdf');

@Sivustonikkari

Sivustonikkari commented Jun 11, 2021 via email

@finwe
Member

finwe commented Jun 11, 2021

Is the issue solved?

See #626 (comment)
