No spaces inserted when cursor is moved horizontally #76

Closed
archonia-chris opened this issue Aug 7, 2015 · 3 comments

Comments

@archonia-chris

I have a PDF that shows text aligned in columns (probably a PDF export of an Excel file):

First Header    Second Header
Cell 1          Cell 2
Cell 3          Cell 4

There are no space characters between Cell 1 and Cell 2, so the text is extracted as:

First HeaderSecond Header
Cell 1Cell 2
Cell 3Cell 4

The spacing in the PDF is apparently produced with the text matrix operator "Tm". I adjusted the "Tm" handling so that a change in the x coordinate also inserts a space between the cells.

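In Object.php, getText():

case 'Tm':
    // Tm takes six operands "a b c d e f"; e and f are the x/y translation.
    // Assumes $current_position_tm starts as array('x' => false, 'y' => false).
    $args = preg_split('/\s+/', trim($command[self::COMMAND]));
    $y = array_pop($args);
    $x = array_pop($args);

    $new_line = false;
    if ($current_position_tm['y'] !== false) {
        $delta = abs(floatval($y) - floatval($current_position_tm['y']));
        if ($delta > 10) {
            $text .= "\n";
            $new_line = true;
        }
    }

    // ADDED START
    // A large horizontal jump on the same line separates two cells.
    // Skip it right after a newline, since x also resets at each new row.
    if (!$new_line && $current_position_tm['x'] !== false) {
        $delta = abs(floatval($x) - floatval($current_position_tm['x']));
        if ($delta > 10) {
            $text .= ' ';
        }
    }
    // ADDED STOP

    $current_position_tm = array('x' => $x, 'y' => $y);
    break;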

Would that be a correct way to fix the problem?

@Stichoza

@archonia You saved my day! Thanks! 👍 Open a pull request with this change; I think @smalot will merge it. Or, if you're no longer interested in this package, I have already forked this repo with the changes committed, so I can send a PR if you don't mind. I mean, you're the author of this fix, so I don't want to "steal" your PR without asking 😄

Stichoza added a commit to stichoza-forks/pdfparser that referenced this issue Nov 12, 2015
@archonia-chris
Author

@Stichoza Feel free to submit a pull request. I'm still using this package but am not very fluent with GitHub. I'm also not sure what the delta threshold should be to justify inserting a space (10 worked for the file I was trying to parse). Inserting a tab is another option.
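To make the two open questions concrete (which threshold to use, and whether to emit a space or a tab), here is a minimal sketch that pulls the decision into a standalone function. `separatorForTm()` is hypothetical and not part of pdfparser; it assumes each "Tm" operand string is six whitespace-separated numbers, and the default threshold of 10 is simply the value that worked for the file above, not a recommended constant.

```php
<?php

/**
 * Hypothetical helper (not part of pdfparser): decide what to emit between
 * two text runs, given the operand strings of the previous and current Tm.
 *
 * @param string|null $previous  Operands of the previous Tm, e.g. "1 0 0 1 72 700", or null if none yet
 * @param string      $current   Operands of the current Tm
 * @param float       $threshold Minimum jump, in the same units as the operands, treated as a gap
 * @param string      $separator Text inserted for a horizontal gap, e.g. ' ' or "\t"
 */
function separatorForTm(?string $previous, string $current, float $threshold = 10.0, string $separator = ' '): string
{
    if ($previous === null) {
        return '';
    }

    // Tm takes six operands "a b c d e f"; e and f are the x/y translation.
    [, , , , $prevX, $prevY] = array_map('floatval', preg_split('/\s+/', trim($previous)));
    [, , , , $x, $y] = array_map('floatval', preg_split('/\s+/', trim($current)));

    if (abs($y - $prevY) > $threshold) {
        return "\n";        // moved to another row
    }
    if (abs($x - $prevX) > $threshold) {
        return $separator;  // same row, jumped to another column
    }
    return '';              // small move: treat as continuous text
}
```

Passing `"\t"` as `$separator` keeps column boundaries unambiguous when a cell itself contains spaces, while `' '` matches the behavior of the patch above.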

@rubenvanerk
Contributor

Fixed by #97

j0k3r closed this as completed Jun 7, 2020
4 participants