How to find x and y coordinates of a text in PDF #418
Comments
Yes. Here is an example.
Thanks for the reply. I have installed it on my server and tested it. I am using the following code to extract the X and Y coordinates of all words in a PDF file:
<?php
// Include Composer autoloader if not already done.
include 'vendor/autoload.php';
// Parse pdf file and build necessary objects.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('result_sheet_format_llb_1.pdf');
$pages = $pdf->getPages();
$page = $pages[0];
$dataTm = $page->getDataTm();
echo("<pre>");
var_dump($dataTm);
echo("</pre>");
?>
It is working, and you can see its output at the URL https://glug4muz.org/php/pdfparser/parse.php
But there are two small problems that I would like to get rid of.
At the moment I am using the package https://github.com/measuresforjustice/textricator to extract the X and Y coordinates of each word of a PDF file. I use this tool to export all words and their X/Y coordinates as a CSV file, import that CSV file into a MySQL database, and then read the X/Y values with a PHP script.
With pdfparser, the X coordinate of each word is the same as the one I get from textricator, but the Y coordinate differs. For example, the Y coordinate of the word $roll_no is 115.216255 according to textricator but 721.786 according to pdfparser, while the X coordinate of the same word is 48.503 with both tools. Can you please tell me why this is so?
Further, pdfparser does not return the X/Y coordinates of every distinct word. Sometimes it combines two or more words. For example, in the output at https://glug4muz.org/php/pdfparser/parse.php it returns a single X/Y pair for the two words $held_month $held_year. Why does it not return separate X/Y coordinates for $held_month and $held_year?
Can you please explain how these two issues can be resolved? Then I will integrate this tool into my project.
Best Regards ...
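One likely explanation for the Y difference, offered as an assumption rather than a confirmed answer: PDF user space places its origin at the bottom-left corner of the page by default, so a tool that measures Y from the top edge reports roughly "page height minus Y". The sketch below flips a bottom-origin Y into a top-origin Y. The helper names `toTopLeftY` and `entryToPoint` are hypothetical, the entry shape `[[a, b, c, d, x, y], text]` is an assumption about what `getDataTm()` returns, and the page height of 841.89 points (A4 portrait) is also an assumption; the real height should come from the page itself.

```php
<?php
// Hypothetical helper: convert a y measured from the bottom edge of the page
// (the PDF default) to a y measured from the top edge.
function toTopLeftY(float $y, float $pageHeight): float
{
    return $pageHeight - $y;
}

// Hypothetical helper: pull x, y and text out of one entry, assuming the
// shape [[a, b, c, d, x, y], text]. Values are cast because they may be strings.
function entryToPoint(array $entry, float $pageHeight): array
{
    [$tm, $text] = $entry;
    return [
        'text'     => $text,
        'x'        => (float) $tm[4],
        'y_bottom' => (float) $tm[5],
        'y_top'    => toTopLeftY((float) $tm[5], $pageHeight),
    ];
}

// Assumed page height in points (A4 portrait); not taken from the actual file.
$pageHeight = 841.89;

// Sample entry built from the values quoted in the comment above.
$dataTm = [
    [[1, 0, 0, 1, 48.503, 721.786], '$roll_no'],
];

foreach ($dataTm as $entry) {
    $p = entryToPoint($entry, $pageHeight);
    printf("%s x=%.3f y_bottom=%.3f y_top=%.3f\n", $p['text'], $p['x'], $p['y_bottom'], $p['y_top']);
}
```

With the assumed A4 height, the flipped value lands near, but not exactly on, the 115.216255 reported by textricator. The remaining gap may come from a different actual page height or from the two tools measuring different reference points of the text, so the sketch should be checked against the real page dimensions.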
I have the same problem. I believe that using tables (I convert HTML to PDF) messes with
This is what I found out: you need to convert the coordinates from points to mm before you use SetXY. For the point-to-mm conversion, refer to the link below:
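The conversion itself is simple arithmetic: one point is 1/72 of an inch and one inch is 25.4 mm, so one point is 25.4 / 72 mm (about 0.3528 mm). A minimal sketch follows; the function name `pointsToMm` and the constant `MM_PER_POINT` are hypothetical, and it assumes the target PDF library's SetXY expects millimetres.

```php
<?php
// 1 pt = 1/72 inch and 1 inch = 25.4 mm.
const MM_PER_POINT = 25.4 / 72;

// Hypothetical helper: convert a length in PDF points to millimetres.
function pointsToMm(float $points): float
{
    return $points * MM_PER_POINT;
}

// Example with the coordinates quoted earlier in this thread.
$xMm = pointsToMm(48.503);
$yMm = pointsToMm(721.786);
printf("x=%.3f mm, y=%.3f mm\n", $xMm, $yMm);
```

The converted values can then be passed to SetXY, provided the library is configured to work in millimetres.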
This solution works for me.
Hi,
Is it possible to get the X and Y coordinates of a text in a PDF file using this library? If yes, please send me some sample code.
Best Regards ...
Pankaj Kumar