This uses the following sasv9.cfg setting, with SAS installed locally on a Windows PC along with additional languages:

-config "C:\Program Files\SASHome\SASFoundation\9.4\nls\zh\sasv9.cfg"

In [1]:
from IPython.display import HTML 
import saspy
In [2]:
sasz = saspy.SASsession(cfgname='winlocal', encoding='gb2312')
sasz
SAS Connection established. Subprocess id is 16080

Out[2]:
Access Method         = IOM
SAS Config name       = winlocal
WORK Path             = /
SAS Version           = 9.04.01M5P09132017
SASPy Version         = 2.2.7
Teach me SAS          = False
Batch                 = False
Results               = Pandas
SAS Session Encoding  = EUC-CN
Python Encoding value = gb2312
In [3]:
cars = sasz.sasdata('cars', 'sashelp', results='html')
In [4]:
sasz.batch=True
x = cars.head()['LST']
sasz.batch=False

This ODS output comes back to me as UTF-8 when running SAS in Chinese via local IOM.

In [5]:
len(x.encode())
Out[5]:
37168
In [6]:
print(x[:256])
<!DOCTYPE html>
<html lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8"/>
<meta content="SAS 9.4" name="generator"/>
<title>SAS Output</title>
<style>
/*<![CDATA[*/
.body.c > table, .body.c > pre, .body.c di

This is the same cars ODS output, but from a DMS session of the same SAS in Chinese.

In [7]:
fd = open(r'C:\Users\sastpw\AppData\Local\Temp\SAS Temporary Files\_TD4288_d10a626_\sashtml.htm', 'rb')
In [8]:
bin1 = fd.read()
In [9]:
fd.close()
In [10]:
len(bin1)
Out[10]:
45150
In [11]:
print(bin1[:256].decode())
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta name="Generator" content="SAS Software Version 9.4, see www.sas.com">
<meta http-equiv="Content-type" content="text/html; charset=GBK">
<title>SAS Output</title>
<sty

The declared charset is GBK, a Chinese encoding, not utf-8! Decoding as UTF-8 gives the same error, due to the Chinese text in the translated ODS template.
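
As a minimal sketch of how the declared charset could be read from the meta tag before decoding, rather than assuming UTF-8: the helper name `sniff_charset` and the two sample byte strings are hypothetical stand-ins for the headers shown above, and only the standard library `re` module is used.

```python
import re

def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset declared in an HTML <meta> tag, or `default`."""
    # Search the raw bytes directly, so nothing has to be decoded first.
    head = raw[:1024]
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', head, re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else default

# Hypothetical stand-ins for the two kinds of header shown above.
iom_head = b'<head>\n<meta charset="utf-8"/>\n'
dms_head = b'<meta http-equiv="Content-type" content="text/html; charset=GBK">'

print(sniff_charset(iom_head))
print(sniff_charset(dms_head))
```

The returned name could then be passed to `bytes.decode()` in place of the default.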

In [12]:
bin1.decode()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-12-728ee91f2a64> in <module>()
----> 1 bin1.decode()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 41581: invalid continuation byte
In [13]:
bin1[41560:41600]
Out[13]:
b'ry="Procedure Print: \xca\xfd\xbe\xdd\xbc\xaf SASHELP.CARS'
In [14]:
bin1[41560:41600].decode()
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-14-5c198b354fab> in <module>()
----> 1 bin1[41560:41600].decode()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xca in position 21: invalid continuation byte
In [15]:
bin1[41560:41600].decode(errors='replace')
Out[15]:
'ry="Procedure Print: ���ݼ� SASHELP.CARS'
In [16]:
bin1[41560:41600].decode('gb2312')
Out[16]:
'ry="Procedure Print: 数据集 SASHELP.CARS'
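
The meta tag declares GBK while the cell above decodes with gb2312. GBK is a superset of GB2312, so either codec handles these bytes; `gbk` would be the safer choice if the output contained characters outside GB2312. A small sketch comparing the codecs on the six bytes from Out[13]:

```python
# The six bytes from Out[13] that follow "Procedure Print: ".
raw = b"\xca\xfd\xbe\xdd\xbc\xaf"

for codec in ("utf-8", "gb2312", "gbk"):
    try:
        print(codec, "->", raw.decode(codec))
    except UnicodeDecodeError as err:
        # UTF-8 rejects these bytes, as in the tracebacks above.
        print(codec, "-> failed:", err.reason)
```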

Displaying this HTML without the tweaks below messes up the whole notebook page because of its font settings. The tweaks change nothing else and have no bearing on the encoding or content.

In [17]:
#HTML(bin1.decode('gb2312'))
In [18]:
char1 = bin1.decode('gb2312')
In [19]:
char1 = (char1.replace(chr(12), chr(10))
              .replace('<body class="c body">', '<body class="l body">')
              .replace("font-size: x-small;", "font-size:  normal;"))
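
If the same tweak is needed more than once, the three substitutions could be wrapped in a small helper. This is only a sketch: the function name `tweak_ods_html` and the sample string are hypothetical, and the replacements are copied from the cell above.

```python
def tweak_ods_html(html: str) -> str:
    """Apply the same three display-only substitutions as the cell above."""
    return (
        html.replace(chr(12), chr(10))  # form feed -> newline
        .replace('<body class="c body">', '<body class="l body">')
        .replace("font-size: x-small;", "font-size:  normal;")
    )

# Hypothetical sample, not taken from the real ODS output.
sample = '<body class="c body">\x0c<p style="font-size: x-small;">SAS</p></body>'
print(tweak_ods_html(sample))
```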

Now we can display the HTML without messing up the whole page.

In [20]:
HTML(char1)
Out[20]:
SAS Output
SAS 系统

Obs Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
1 Acura MDX SUV Asia All $36,945 $33,337 3.5 6 265 17 23 4451 106 189
2 Acura RSX Type S 2dr Sedan Asia Front $23,820 $21,761 2.0 4 200 24 31 2778 101 172
3 Acura TSX 4dr Sedan Asia Front $26,990 $24,647 2.4 4 200 22 29 3230 105 183
4 Acura TL 4dr Sedan Asia Front $33,195 $30,299 3.2 6 270 20 28 3575 108 186
5 Acura 3.5 RL 4dr Sedan Asia Front $43,755 $39,014 3.5 6 225 18 24 3880 115 197

In [21]:
cars.head()
SAS Output

SAS 系统

Obs Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length
1 Acura MDX SUV Asia All $36,945 $33,337 3.5 6 265 17 23 4451 106 189
2 Acura RSX Type S 2dr Sedan Asia Front $23,820 $21,761 2.0 4 200 24 31 2778 101 172
3 Acura TSX 4dr Sedan Asia Front $26,990 $24,647 2.4 4 200 22 29 3230 105 183
4 Acura TL 4dr Sedan Asia Front $33,195 $30,299 3.2 6 270 20 28 3575 108 186
5 Acura 3.5 RL 4dr Sedan Asia Front $43,755 $39,014 3.5 6 225 18 24 3880 115 197