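## Assignment

On Day 10 we introduced the XML file format and learned how to work with XML files in Python.

Today's assignment practices using the `xml` package.

Fill in the blanks in the code below to produce the specified results:

# First upload `day10-example_data.xml` to the working directory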
![](https://i.imgur.com/5K6baaX.png)

In [1]:
import xml.etree.ElementTree as ET
import pandas as pd

### Reading the file
- Use `open()`

In [2]:
filename = "day10-example_data.xml"
with open(filename, 'r') as f:
    root = ET.fromstring(f.read())
root

<Element 'book' at 0x7fe58837bbf0>
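As an alternative to `open()` followed by `ET.fromstring()`, `ET.parse()` can read an XML file directly. The sketch below is only illustrative: it writes a small hypothetical file (`sample_books.xml`, with made-up content, not the assignment's data file) to a temporary directory so that it can run on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample content; the real day10-example_data.xml is not reproduced here.
sample_xml = '<book id="demo"><title>Demo Title</title></book>'

# Write the sample to a temporary directory so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_books.xml")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write(sample_xml)

# ET.parse() returns an ElementTree; getroot() gives the top-level Element.
tree = ET.parse(sample_path)
parsed_root = tree.getroot()
print(parsed_root.tag, parsed_root.get("id"))
```

Because `ET.parse()` opens the file itself, there is no need to pass a text encoding to `open()` when reading.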

### Format conversion
- Convert from `str` to an XML `Element`
  - Use `.fromstring()`




In [3]:
xml_string = """
<store name="bookstore">
  <book lang="en" id="gio" xmlns:xi="http://www.w3.org/2003/XInclude">
    <bookinfo>
      <title>GIO Reference Manual</title>
      <releaseinfo>
        The latest version of this documentation can be found on-line at
        <ulink role="online-location" url="https://developer.gnome.org/gio/unstable/">https://developer.gnome.org/gio/unstable/</ulink>.
      </releaseinfo>
    </bookinfo>
  </book>
</store>
"""

root = ET.fromstring(xml_string)
root

<Element 'store' at 0x7fe515e487c0>
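The repr of an `Element` shows only its tag and memory address. To see what was actually parsed, it can help to look at `.tag`, `.attrib`, and the child elements. A minimal sketch, using a shortened stand-in document (`demo_root` and its contents are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the bookstore document used in this exercise.
demo_root = ET.fromstring(
    '<store name="bookstore">'
    '<book id="gio" lang="en"/>'
    '<book id="hio" lang="en"/>'
    '</store>'
)

print(demo_root.tag)     # tag name of the root element
print(demo_root.attrib)  # attributes as a plain dict

# Iterating over an Element yields its direct children in document order.
child_ids = [child.get("id") for child in demo_root]
print(child_ids)
```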

### Getting the content of a specific node
- Get the text inside the `<title>` tag

In [4]:
root.find("book/bookinfo/title").text

'GIO Reference Manual'

- Get the `role` attribute of the `<ulink>` inside `<releaseinfo>`

In [5]:
root.find("book/bookinfo/releaseinfo/ulink").get('role')

'online-location'
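The lookups above assume the path exists. When a path matches nothing, `find()` returns `None`, so chaining `.text` or `.get()` onto it raises `AttributeError`. A small sketch of the defensive variants, using a cut-down document (`demo_root` and the `subtitle` element are hypothetical):

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring(
    "<store><book><bookinfo>"
    "<title>GIO Reference Manual</title>"
    "</bookinfo></book></store>"
)

# find() returns None when nothing matches the path.
missing = demo_root.find("book/bookinfo/subtitle")

# findtext() returns the matched element's text, or the default if there is no match.
title = demo_root.findtext("book/bookinfo/title", default="(no title)")
subtitle = demo_root.findtext("book/bookinfo/subtitle", default="(no subtitle)")

# get() accepts a fallback value for a missing attribute.
lang = demo_root.find("book").get("lang", "unknown")

print(missing, title, subtitle, lang)
```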

### Adding a node attribute
- Add a `publish_date` attribute to `<releaseinfo>`

In [6]:
release_info = root.find('book/bookinfo/releaseinfo')
release_info.set('publish_date', '1999-01-01')
print(root.find('book/bookinfo/releaseinfo').get('publish_date'))

1999-01-01


### Modifying a node attribute
- Change the `role` attribute of `<ulink>` to "offline-location"

In [7]:
ulink = root.find('book/bookinfo/releaseinfo/ulink')
ulink.set('role', 'offline-location')

In [8]:
root.find("book/bookinfo/releaseinfo/ulink").get('role')

'offline-location'
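`set()` both creates and overwrites an attribute. The attributes are also exposed as a regular dict through `.attrib`, which can be handy for bulk updates or for removing an attribute. A minimal sketch on a standalone element (`demo_link` and the `checked` attribute are made up for illustration):

```python
import xml.etree.ElementTree as ET

demo_link = ET.fromstring(
    '<ulink role="online-location" url="https://developer.gnome.org/gio/unstable/"/>'
)

# set() overwrites the existing value of "role".
demo_link.set("role", "offline-location")

# .attrib is a plain dict, so the usual dict methods apply.
demo_link.attrib.update({"checked": "yes"})  # hypothetical extra attribute
removed = demo_link.attrib.pop("checked")    # remove it again

print(demo_link.attrib)
```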

### Adding nodes

In [9]:
book2 = ET.SubElement(root, 'book')
book2.set('id', 'hio')
book2.set('lang', 'en')

ET.dump(root)

<store name="bookstore">
  <book lang="en" id="gio">
    <bookinfo>
      <title>GIO Reference Manual</title>
      <releaseinfo publish_date="1999-01-01">
        The latest version of this documentation can be found on-line at
        <ulink role="offline-location" url="https://developer.gnome.org/gio/unstable/">https://developer.gnome.org/gio/unstable/</ulink>.
      </releaseinfo>
    </bookinfo>
  </book>
<book id="hio" lang="en" /></store>


In [10]:
bookinfo2 = ET.SubElement(book2, 'bookinfo')
# title2 and releaseinfo2 both belong inside the bookinfo2 node
title2 = ET.SubElement(bookinfo2, 'title')
title2.text = 'HIO Reference Manual'
releaseinfo2 = ET.SubElement(bookinfo2, 'releaseinfo')
releaseinfo2.set('publish_date', '1999-02-12')
releaseinfo2.text = "The latest version of this documentation can't be found online"

ET.dump(root)

<store name="bookstore">
  <book lang="en" id="gio">
    <bookinfo>
      <title>GIO Reference Manual</title>
      <releaseinfo publish_date="1999-01-01">
        The latest version of this documentation can be found on-line at
        <ulink role="offline-location" url="https://developer.gnome.org/gio/unstable/">https://developer.gnome.org/gio/unstable/</ulink>.
      </releaseinfo>
    </bookinfo>
  </book>
<book id="hio" lang="en"><bookinfo><title>HIO Reference Manual</title><releaseinfo publish_date="1999-02-12">The latest version of this documentation can't be found online</releaseinfo></bookinfo></book></store>


In [11]:
[item.get('publish_date') for item in root.findall('book/bookinfo/releaseinfo')]

['1999-01-01', '1999-02-12']
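When several books need to be added, the `SubElement` calls above can be wrapped in a small helper. The following is one possible sketch, not part of the assignment: `add_book` and the second book (`jio`) are hypothetical, and attributes are passed directly to `SubElement` instead of through separate `set()` calls.

```python
import xml.etree.ElementTree as ET

def add_book(store, book_id, title, publish_date, lang="en"):
    """Append a <book> with <bookinfo>/<title>/<releaseinfo> children to store."""
    # Attributes can be given as a dict or as keyword arguments.
    book = ET.SubElement(store, "book", {"id": book_id, "lang": lang})
    bookinfo = ET.SubElement(book, "bookinfo")
    ET.SubElement(bookinfo, "title").text = title
    ET.SubElement(bookinfo, "releaseinfo", publish_date=publish_date)
    return book

demo_store = ET.Element("store", name="bookstore")
add_book(demo_store, "hio", "HIO Reference Manual", "1999-02-12")
add_book(demo_store, "jio", "JIO Reference Manual", "2001-05-30")  # made-up entry

dates = [
    item.get("publish_date")
    for item in demo_store.findall("book/bookinfo/releaseinfo")
]
print(dates)
```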

### Removing specific nodes
- Use `root.remove(node)`

In [12]:
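# findall() returns a list, so removing from root inside the loop is safe
for book in root.findall('book'):
    pub_date = book.find('bookinfo/releaseinfo').get('publish_date')
    # Zero-padded ISO dates (YYYY-MM-DD) compare correctly as strings
    if pub_date < "1999-02-01":
        root.remove(book)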

In [13]:
# Print the tree to confirm how it looks after removing the node
ET.indent(root)
ET.dump(root)

<store name="bookstore">
  <book id="hio" lang="en">
    <bookinfo>
      <title>HIO Reference Manual</title>
      <releaseinfo publish_date="1999-02-12">The latest version of this documentation can't be found online</releaseinfo>
    </bookinfo>
  </book>
</store>
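The removal loop compares dates as strings, which works here because both values are zero-padded `YYYY-MM-DD`. If the data might contain a missing `publish_date`, one option is to parse the values into `date` objects and skip entries without a date. A sketch under those assumptions (the three sample books and their ids are hypothetical):

```python
from datetime import date
import xml.etree.ElementTree as ET

demo_store = ET.fromstring(
    '<store>'
    '<book id="a"><bookinfo><releaseinfo publish_date="1999-01-01"/></bookinfo></book>'
    '<book id="b"><bookinfo><releaseinfo publish_date="1999-02-12"/></bookinfo></book>'
    '<book id="c"><bookinfo><releaseinfo/></bookinfo></book>'
    '</store>'
)

cutoff = date(1999, 2, 1)

# findall() returns a list, so the tree can be modified while looping over it.
for book in demo_store.findall("book"):
    raw = book.find("bookinfo/releaseinfo").get("publish_date")
    if raw is None:
        continue  # keep books that have no publish_date
    if date.fromisoformat(raw) < cutoff:
        demo_store.remove(book)

remaining = [book.get("id") for book in demo_store.findall("book")]
print(remaining)
```

Note that `remove()` only works on direct children of the element it is called on, which is why the loop removes `book` from the store element and not from a deeper node.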


### Saving as XML
- Use `tree.write()`

In [14]:
output_filename = 'day10-output_data.xml'
tree = ET.ElementTree(root)
tree.write(output_filename, xml_declaration=True, encoding='UTF-8')

In [15]:
# Read the file back to confirm that the written output is valid
with open(output_filename, encoding='utf-8') as f:
    test = ET.XML(f.read())
    ET.dump(test)

<store name="bookstore">
  <book id="hio" lang="en">
    <bookinfo>
      <title>HIO Reference Manual</title>
      <releaseinfo publish_date="1999-02-12">The latest version of this documentation can't be found online</releaseinfo>
    </bookinfo>
  </book>
</store>
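If the XML is needed in memory (for example to send over a network) instead of on disk, `ET.tostring()` serializes an element without going through a file. A minimal sketch with a shortened stand-in tree (`demo_root` is an assumption; the `xml_declaration` argument of `tostring()` requires Python 3.8 or later):

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring(
    '<store name="bookstore"><book id="hio" lang="en"/></store>'
)

# With an explicit encoding, tostring() returns bytes.
xml_bytes = ET.tostring(demo_root, encoding="UTF-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))

# Round trip: fromstring() accepts bytes as well as str.
round_trip = ET.fromstring(xml_bytes)
print(round_trip.get("name"))
```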


![](https://i.imgur.com/ogtU8zJ.png)