<a id='HOME'></a>
# CHAPTER 8 Data Has to Go Somewhere
## Where Data Ends Up

* [8.1 File Input and Output](#IO)
* [8.2 Structured Text Files](#StructuredText)
* [8.3 Structured Binary Files](#StructuredBinary)
* [8.4 Relational Databases](#RelationalDatabases)
* [8.5 NoSQL Data Stores](#NoSQL)
* [8.6 Full-Text Databases](#Full-TextDatabases)



---
<a id='IO'></a>
## 8.1 File Input and Output
[Back to contents](#HOME)

```python
fileobj = open(filename, mode)
```

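|First letter of mode|Meaning|
|:---:|----|
|r |Read.|
|w |Write. Creates the file if it does not exist; overwrites its contents if it does.|
|x |Write, but only if the file does not already exist.|
|a |Append. Writes at the end of the file if it exists; creates the file if it does not.|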


|Second letter of mode|Meaning|
|:---:|----|
|t|Text (the default if omitted).|
|b|Binary.|
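
The mode letters above can be tried out safely in a throwaway directory. The following is only a minimal sketch: the temporary directory, the file name `demo.txt`, and the variable names are assumptions made for illustration and do not appear elsewhere in these notes.

```python
import os
import tempfile

# Work in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')  # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists.
    with open(path, 'wt') as fout:
        fout.write('first')

    # 'a' keeps the existing content and writes at the end.
    with open(path, 'at') as fout:
        fout.write('+second')

    # 'x' refuses to open a file that already exists.
    try:
        open(path, 'xt')
        x_mode_failed = False
    except FileExistsError:
        x_mode_failed = True

    # 'r' reads; 't' returns str, 'b' returns bytes.
    with open(path, 'rt') as fin:
        text = fin.read()
    with open(path, 'rb') as fin:
        raw = fin.read()

print(text)           # first+second
print(raw)            # b'first+second'
print(x_mode_failed)  # True
```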


### 8.1.1 Writing a File with write() and print()


When writing with print(), the sep and end arguments set the separator and the terminator:
* sep defaults to ' '
* end defaults to '\n'
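
To see exactly which characters sep and end produce, the output can be captured in memory instead of in a file. This is a small sketch that assumes `io.StringIO` as a stand-in for a file opened in text mode; the names `buf` and `result` are illustrative.

```python
import io

# io.StringIO behaves like a text file but keeps everything in memory.
buf = io.StringIO()

# Defaults: sep=' ' between arguments, end='\n' after the last one.
print('abc', 'defg', file=buf)

# Custom separator and terminator.
print('abc', 'defg', sep='-', end='.', file=buf)

result = buf.getvalue()
print(repr(result))  # 'abc defg\nabc-defg.'
```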

In [1]:
abc = 'abc'
defg = 'defg'

# Write with write()
fout = open('Data/relativity', 'wt')
fout.write('{0}-{1}。'.format(abc, 'XD'))
fout.write(defg)
fout.close()

# Write with print() (default sep and end)
fout = open('Data/relativity', 'wt')
print(abc, defg, file=fout)
fout.close()

# Write with print() (custom separator and terminator)
fout = open('Data/relativity', 'wt')
print(abc, defg, sep='-', end='。', file=fout)
fout.close()

In [2]:
# Write in chunks
poem = '''There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.'''
size = len(poem)

fout = open('Data/relativity', 'wt')

offset = 0
chunk = 100
while True:
    if offset > size:
        fout.close()
        break
    fout.write(poem[offset:offset+chunk])
    offset += chunk
    

# Opening in mode x prevents overwriting an existing file
try:
    fout = open('Data/relativity', 'xt')
    fout.write('stomp stomp stomp')
    fout.close()
except FileExistsError:
    print('File already exists.')

File already exists.


### 8.1.2 Reading a File with read(), readline(), or readlines()

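* __fin.read()__ reads the whole file at once, or at most the given number of characters (bytes in binary mode); watch memory use with large files
* __fin.readline()__ reads one line at a time
* __fin.readlines()__ reads all remaining lines and returns them as a list; it is not lazy, so iterating over the file object itself (for line in fin) is the iterator form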
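
The difference between these calls is easiest to see on a tiny file. The sketch below is illustrative only: the sample text, the temporary directory, and the file name `sample.txt` are assumptions, not files used elsewhere in this chapter.

```python
import os
import tempfile

sample = 'line one\nline two\nline three\n'  # hypothetical content

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.txt')  # hypothetical file name
    with open(path, 'wt', encoding='utf-8') as fout:
        fout.write(sample)

    with open(path, 'rt', encoding='utf-8') as fin:
        first_chars = fin.read(4)      # the next 4 characters
        rest_of_line = fin.readline()  # up to and including the newline
        remaining = fin.readlines()    # every remaining line, as a list

    # Iterating over the file object yields one line at a time
    # without loading the whole file into memory.
    with open(path, 'rt', encoding='utf-8') as fin:
        stripped = [line.rstrip('\n') for line in fin]

print(repr(first_chars))   # 'line'
print(repr(rest_of_line))  # ' one\n'
print(remaining)           # ['line two\n', 'line three\n']
print(stripped)            # ['line one', 'line two', 'line three']
```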

In [3]:
# Read everything at once
fin = open('Data/relativity', 'rt' )
poem = fin.read()
fin.close()
print(poem)


# Read 100 characters at a time
print('\n================')
poem = ''
fin = open('Data/relativity', 'rt' )
chunk = 100
while True:
    fragment = fin.read(chunk)
    if not fragment:
        fin.close()
        break
    poem += fragment

print(poem)

# Read one line at a time with readline()
print('\n================')
poem = ''
fin = open('Data/relativity', 'rt' )
while True:
    line = fin.readline()
    if not line:
        fin.close()
        break
    poem += line

print(poem)

# Read all lines into a list with readlines()
print('\n================')
fin = open('Data/relativity', 'rt' )
lines = fin.readlines()
fin.close()
print('Total:', len(lines), 'lines')

for line in lines:
    print(line, end='。')

There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.

There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.

There was a young lady named Bright,
Whose speed was far faster than light;
She started one day
In a relative way,
And returned on the previous night.

Total: 5 lines
There was a young lady named Bright,
。Whose speed was far faster than light;
。She started one day
。In a relative way,
。And returned on the previous night.。

### 8.1.3 Writing a Binary File

In [4]:
bdata = bytes(range(0, 256))
#print(bdata)

# Write everything at once
fout = open('Data/bfile', 'wb')
fout.write(bdata)
fout.close()

# Write in chunks
fout = open('Data/bfile', 'wb')
size = len(bdata)
offset = 0
chunk = 100
while True:
    if offset > size:
        fout.close()
        break
    fout.write(bdata[offset:offset+chunk])
    offset += chunk


### 8.1.4 Reading a Binary File

In [5]:
fin = open('Data/bfile', 'rb')
bdata = fin.read()
fin.close()

print(bdata)

b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'


### 8.1.5 Closing Files Automatically with with

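Python closes a file automatically once nothing refers to it any more, so a file opened inside a function is closed after the function returns even if close() was never called (CPython does this through reference counting; the language itself does not promise when).  
A file opened at the top level of a program stays referenced, so it is not closed automatically until the program exits. Opening it with with ... as closes it as soon as the block ends, even if an exception is raised inside the block.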

In [1]:
# poem is the string defined in the chunked-write cell (In [2]) above;
# run that cell first, otherwise this raises NameError
with open('Data/relativity', 'wt') as fout:
    fout.write(poem)
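
The claim that with closes the file can be checked through the file object's closed attribute. This is a minimal sketch under stated assumptions: the temporary directory, the file name `note.txt`, and the simulated ValueError are illustrative and not part of the original notes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'note.txt')  # hypothetical file name

    # Normal exit from the block closes the file.
    with open(path, 'wt') as fout:
        fout.write('hello')
    closed_after_block = fout.closed

    # Leaving the block through an exception closes the file as well.
    try:
        with open(path, 'rt') as fin:
            raise ValueError('simulated failure')
    except ValueError:
        pass
    closed_after_error = fin.closed

print(closed_after_block)  # True
print(closed_after_error)  # True
```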

### 8.1.6 Changing Position with seek()

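__file.tell()__ returns the current offset from the start of the file, in bytes.  
__file.seek(offset, origin)__ moves to another position; origin selects what offset is measured from:

origin = 0 (the default): offset from the start of the file.  
origin = 1: offset from the current position.  
origin = 2: offset from the end of the file (usually a negative offset).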

In [7]:
fin = open('Data/bfile', 'rb')
fin.seek(255)

bdata = fin.read()
print(len(bdata))
print(bdata[0])
fin.close()

# Another way to read the last byte
fin = open('Data/bfile', 'rb')

fin.seek(-1, 2)
bdata = fin.read()
print(len(bdata))
print(bdata[0])
fin.close()

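# Combine an absolute seek with a relative seek
fin = open('Data/bfile', 'rb')
fin.seek(254, 0)   # one byte before the last, counted from the start
fin.tell()         # 254
fin.seek(1, 1)     # one byte forward from the current position
fin.tell()         # 255

bdata = fin.read()
print(bdata[0])
fin.close()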

1
255
1
255
255
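
The numeric origin values also have named constants in the os module, which can make seek calls easier to read. The sketch below assumes an in-memory `io.BytesIO` buffer holding the same 256 bytes as Data/bfile, so it runs without the file; the variable names are illustrative.

```python
import io
import os

# io.BytesIO stands in for a binary file holding the bytes 0..255.
f = io.BytesIO(bytes(range(256)))

f.seek(10, os.SEEK_SET)       # origin 0: 10 bytes from the start
from_start = f.read(1)[0]     # reads byte 10, position is now 11

f.seek(5, os.SEEK_CUR)        # origin 1: 11 + 5 = 16
from_current = f.read(1)[0]   # reads byte 16

f.seek(-2, os.SEEK_END)       # origin 2: 256 - 2 = 254
from_end = f.read(1)[0]       # reads byte 254
position = f.tell()           # 255 after reading one byte

print(from_start, from_current, from_end, position)  # 10 16 254 255
```

Relative seeks like these are meant for binary mode; for files opened in text mode Python only supports seeking relative to the start, apart from zero offsets.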


---
<a id='StructuredText'></a>
## 8.2 Structured Text Files
[Back to contents](#HOME)

* CSV
* XML
* HTML
* JSON

In [2]:
import csv
villains = [
    ['Doctor', 'No'],
    ['Rosa', 'Klebb'],
    ['Mister', 'Big'],
    ['Auric', 'Goldfinger'],
    ['Ernst', 'Blofeld']]

# Write (newline='' leaves line endings to the csv module; without it, Windows output contains blank rows)
with open('Data/villains', 'wt', newline='') as fout:
    csvout = csv.writer(fout)
    csvout.writerows(villains)

# Read
with open('Data/villains', 'rt', newline='') as fin:
    cin = csv.reader(fin)
    villains = [row for row in cin]
print(villains)

# Read into dictionaries
with open('Data/villains', 'rt', newline='') as fin:
    cin = csv.DictReader(fin, fieldnames=['first', 'last'])
    villains = [row for row in cin]
print(villains)

# Write from dictionaries
villains = [{'first': 'Doctor', 'last': 'No', 'last33': 'Blofeld'},
    {'first': 'Rosa', 'last': 'Klebb', 'last33': 'Blofeld'},
    {'first': 'Mister', 'last': 'Big', 'last33': 'Blofeld'},
    {'first': 'Auric', 'last': 'Goldfinger', 'last33': 'Blofeld'},
    {'first': 'Ernst', 'last': 'Blofeld', 'last33': 'Blofeld'}]

with open('Data/villains', 'wt', newline='') as fout:
#     cout = csv.DictWriter(fout, ['first', 'last'])
    cout = csv.DictWriter(fout, ['first', 'last','last33'])
    cout.writeheader()  # write the header row
    cout.writerows(villains)
    
with open('Data/villains', 'rt', newline='') as fin:
    cin = csv.DictReader(fin)
    villains = [row for row in cin]
print(villains)

[['Doctor', 'No'], ['Rosa', 'Klebb'], ['Mister', 'Big'], ['Auric', 'Goldfinger'], ['Ernst', 'Blofeld']]
[OrderedDict([('first', 'Doctor'), ('last', 'No')]), OrderedDict([('first', 'Rosa'), ('last', 'Klebb')]), OrderedDict([('first', 'Mister'), ('last', 'Big')]), OrderedDict([('first', 'Auric'), ('last', 'Goldfinger')]), OrderedDict([('first', 'Ernst'), ('last', 'Blofeld')])]
[OrderedDict([('first', 'Doctor'), ('last', 'No'), ('last33', 'Blofeld')]), OrderedDict([('first', 'Rosa'), ('last', 'Klebb'), ('last33', 'Blofeld')]), OrderedDict([('first', 'Mister'), ('last', 'Big'), ('last33', 'Blofeld')]), OrderedDict([('first', 'Auric'), ('last', 'Goldfinger'), ('last33', 'Blofeld')]), OrderedDict([('first', 'Ernst'), ('last', 'Blofeld'), ('last33', 'Blofeld')])]


In [3]:
import xml.etree.ElementTree as et
tree = et.ElementTree(file='Data/menu.xml')
root = tree.getroot()
print(root.tag)

for child in root:
    #print('tag:', child.tag, 'attributes:', child.attrib)
    for grandchild in child:
        print('\ttag:', grandchild.tag, 'text:', grandchild.text)
        print('\ttag:', grandchild.tag, 'attributes:', grandchild.attrib)


print(len(root))      # number of children of the root element
print(len(root[0]))   # number of items under the first child

menu
	tag: item text: breakfast burritos
	tag: item attributes: {'price': '$6.00'}
	tag: item text: pancakes
	tag: item attributes: {'price': '$4.00', 'XDD': 'QQ'}
	tag: item text: hamburger
	tag: item attributes: {'price': '$5.00'}
	tag: item text: spaghetti
	tag: item attributes: {'price': '8.00'}
3
2


In [10]:
menu = \
{
    "breakfast": {
        "hours": "7-11",
        "items": {
            "breakfast burritos": "$6.00",
            "pancakes": "$4.00"
        }
    },
    "lunch" : {
        "hours": "11-3",
        "items": {
            "hamburger": "$5.00"
        }
    },
    "dinner": {
        "hours": "3-10",
        "items": {
            "spaghetti": "$8.00"
        }
    }
}

import json
menu_json = json.dumps(menu)
print(type(menu_json))
# print(menu_json['breakfast'])

menu2 = json.loads(menu_json)
print(menu2)
print(menu2['breakfast'])

<class 'str'>
{'lunch': {'items': {'hamburger': '$5.00'}, 'hours': '11-3'}, 'breakfast': {'items': {'breakfast burritos': '$6.00', 'pancakes': '$4.00'}, 'hours': '7-11'}, 'dinner': {'items': {'spaghetti': '$8.00'}, 'hours': '3-10'}}
{'items': {'breakfast burritos': '$6.00', 'pancakes': '$4.00'}, 'hours': '7-11'}


---
<a id='StructuredBinary'></a>
## 8.3 Structured Binary Files
[Back to contents](#HOME)

---
<a id='RelationalDatabases'></a>
## 8.4 Relational Databases (SQL)
[Back to contents](#HOME)


---
<a id='NoSQL'></a>
## 8.5 NoSQL Data Stores
[Back to contents](#HOME)




---
<a id='Full-TextDatabases'></a>
## 8.6 Full-Text Databases
[Back to contents](#HOME)

