utl_covert_pdf_tables_to_sas_tables.sas
SAS/R: Converting PDF tables to SAS datasets (simple example)
WORKING CODE
WPS/PROC R (SAS/IML with R could be used instead)
file <- "d:/pdf/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
classtext <- as.data.frame(content(content(corpus)[[1]])); # first table
see
SAS Forum: PDF to SAS Dataset
https://communities.sas.com/t5/Base-SAS-Programming/PDF-to-SAS-Dataset/m-p/401636
WPS/SAS/R: Converting PDF tables to SAS datasets (simple example)
The tm (text mining) package offers more options than the ones used here.
HAVE (PDF file with the table below)
======================================
NAME      SEX  AGE  HEIGHT  WEIGHT
Alfred    M    14   69      112.5
Alice     F    13   56.5    84
Barbara   F    13   65.3    98
Carol     F    14   62.8    102.5
Henry     M    14   63.5    102.5
James     M    12   57.3    83
Jane      F    12   59.8    84.5
Janet     F    15   62.5    112.5
Jeffrey   M    13   62.5    84
John      M    12   59      99.5
Joyce     F    11   51.3    50.5
Judy      F    14   64.3    90
Louise    F    12   56.3    77
Mary      F    15   66.5    112
Philip    M    16   72      150
Robert    M    12   64.8    128
Ronald    M    15   67      133
Thomas    M    11   57.5    85
William   M    15   66.5    112
WANT (SAS dataset)
===================
Up to 40 obs from sashelp.class total obs=19
Obs  NAME      SEX  AGE  HEIGHT  WEIGHT
1    Alfred    M    14   69      112.5
2    Alice     F    13   56.5    84
3    Barbara   F    13   65.3    98
4    Carol     F    14   62.8    102.5
5    Henry     M    14   63.5    102.5
6    James     M    12   57.3    83
7    Jane      F    12   59.8    84.5
8    Janet     F    15   62.5    112.5
9    Jeffrey   M    13   62.5    84
10   John      M    12   59      99.5
11   Joyce     F    11   51.3    50.5
12   Judy      F    14   64.3    90
13   Louise    F    12   56.3    77
14   Mary      F    15   66.5    112
15   Philip    M    16   72      150
16   Robert    M    12   64.8    128
17   Ronald    M    15   67      133
18   Thomas    M    11   57.5    85
19   William   M    15   66.5    112
WORKING CODE
============
file <- "d:/pdf/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
classtext <- as.data.frame(content(content(corpus)[[1]]));
FULL SOLUTION
=============
* create a pdf;
title;footnote;
ods pdf file="d:/pdf/class.pdf";
proc print data=sashelp.class noobs;
run;quit;
ods pdf close;
* the xpdf executables (pdftotext and pdfinfo) have to be on the PATH;
%utl_submit_wps64('
options set=R_HOME "C:/Program Files/R/R-3.3.2";
libname wrk "%sysfunc(pathname(work))";
proc r;
submit;
source("C:/Program Files/R/R-3.3.2/etc/Rprofile.site", echo=T);
library("tm");
library("slam");
file <- "d:/pdf/class.pdf";
Rpdf <- readPDF(control = list(text = "-layout"));
corpus <- VCorpus(URISource(file),
readerControl = list(reader = Rpdf));
array <- as.data.frame(content(content(corpus)[[1]]));
colnames(array)<-"lines";
endsubmit;
import r=array data=wrk.array;
run;quit;
');
proc print data=array(where=(lines ne ' ')) width=min;
run;quit;
Obs LINES
1 NAME SEX AGE HEIGHT WEIGHT
3 Alfred M 14 69.0 112.5
5 Alice F 13 56.5 84.0
7 Barbara F 13 65.3 98.0
9 Carol F 14 62.8 102.5
11 Henry M 14 63.5 102.5
13 James M 12 57.3 83.0
15 Jane F 12 59.8 84.5
17 Janet F 15 62.5 112.5
19 Jeffrey M 13 62.5 84.0
21 John M 12 59.0 99.5
23 Joyce F 11 51.3 50.5
25 Judy F 14 64.3 90.0
27 Louise F 12 56.3 77.0
29 Mary F 15 66.5 112.0
31 Philip M 16 72.0 150.0
33 Robert M 12 64.8 128.0
35 Ronald M 15 67.0 133.0
37 Thomas M 11 57.5 85.0
39 William M 15 66.5 112.0
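
The listing above still holds each table row as a single LINES string, so one more step is needed to reach the WANT layout. What follows is a minimal sketch of that split in R, not part of the original solution. It assumes one header row, no embedded blanks in NAME, and no page-break characters left in the text. The names `lines`, `keep` and `want` are hypothetical, and the three sample rows are copied from the listing.

```r
# Sketch only: `lines` stands in for array$lines from the FULL SOLUTION.
# Blank elements mimic the empty lines pdftotext leaves between rows.
lines <- c("NAME SEX AGE HEIGHT WEIGHT",
           "",
           "Alfred M 14 69.0 112.5",
           "",
           "Alice F 13 56.5 84.0",
           "",
           "Barbara F 13 65.3 98.0")

# Drop blank lines, then let read.table split on runs of whitespace.
keep <- nzchar(trimws(lines))

# colClasses forces NAME and SEX to character; without it a SEX column
# holding only "F" values would be converted to logical FALSE.
want <- read.table(text = lines[keep], header = TRUE,
                   colClasses = c("character", "character",
                                  "numeric", "numeric", "numeric"))

str(want)
```

If this ran inside the proc r block, the parsed frame could presumably be handed to SAS the same way the solution hands over `array`, for example with `import r=want data=wrk.want;`.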