# Session 4


## Example of writing a plain text file: Diary logging


In [None]:
import datetime

content = input("What do you want to say to Mr. Diary? ")
if len(content) > 0:
    with open('diary.txt', "a") as file_obj:
        today = datetime.date.today().isoformat()
        file_obj.write(today + ": " + content + "\n")

with open('diary.txt', "r") as file_obj:
    lines = file_obj.readlines()
    for line in lines[-3:]:
        print(line.rstrip())


What do you want to say to Mr. Diary? Hello python.
2020-06-11: Hello
2020-06-11: Hello
2020-08-10: Hello python.


## Writing DOCX

We can use the `python-docx` module to write content to a DOCX file.

First, install the module by running `pip install python-docx` once, either in a terminal or in Jupyter.

In [None]:
pip install python-docx

Note: you may need to restart the kernel to use updated packages.


👆🏻🤔 If you’re wondering how the line above works: it is a command that normally runs at the command-line prompt, but Jupyter is smart enough to recognize the `pip install` command and execute it right inside the notebook.
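
One detail that is easy to miss: the package is installed under the name `python-docx` but imported as `docx`. The following is a minimal sketch, using only the standard library, for checking whether a module can be imported in the current kernel. The helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# The package installs as `python-docx`, but the import name is `docx`.
print(is_installed("docx"))
```

If this prints `False` right after installing, restarting the kernel as the note above suggests is usually the first thing to try.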

In [None]:
import datetime
import docx
import os

content = input("What do you want to say to Mr. Diary? ")
if len(content) > 0:
    with open('diary.txt', "a") as file_obj:
        today = str(datetime.date.today())
        file_obj.write(today + ": " + content + "\n")

if os.path.isfile("diary.docx"):
    doc = docx.Document("diary.docx")
    print('Existed')
else:
    doc = docx.Document()
    print('New')

input('Pause')

doc.add_paragraph(content)
doc.save("diary.docx")

print(f"{content} is written to diary.docx")

What do you want to say to Mr. Diary? Testing 100
Existed
Pause
Testing 100 is written to diary.docx


## Reading DOCX file

Suppose we have a DOCX file named `Sample Document.docx`. We can read all the paragraphs in it.

In [None]:
import docx

doc = docx.Document("Sample Document.docx")
print(doc.paragraphs)

[<docx.text.paragraph.Paragraph object at 0x000001A768293970>, <docx.text.paragraph.Paragraph object at 0x000001A768293880>, <docx.text.paragraph.Paragraph object at 0x000001A768293EE0>, <docx.text.paragraph.Paragraph object at 0x000001A768283B80>, <docx.text.paragraph.Paragraph object at 0x000001A768283AC0>, <docx.text.paragraph.Paragraph object at 0x000001A76960BD00>, <docx.text.paragraph.Paragraph object at 0x000001A76960B460>, <docx.text.paragraph.Paragraph object at 0x000001A76960BDC0>, <docx.text.paragraph.Paragraph object at 0x000001A76960BEB0>, <docx.text.paragraph.Paragraph object at 0x000001A76960BD90>, <docx.text.paragraph.Paragraph object at 0x000001A76960BB80>]


In [None]:
for p in doc.paragraphs:
    print(p.text)

Sample Document

This is a sample paragraph.

This is the second paragraph.

Here is the result

Summary

This is the summary of the sample report document.


## Reading tables in DOCX file

We can also read the tables in the document and the text in their cells.

In [None]:
doc.tables[0].columns[0].cells[1].text

'2020-06-01'

The following code reads the data row by row into three lists: `dates`, `morning_visitors`, `evening_visitors`.

In [None]:
table = doc.tables[0]

dates = []
morning_visitors = []
evening_visitors = []

for row in table.rows[1:]:
    dates.append(row.cells[0].text)
    morning_visitors.append(int(row.cells[1].text))
    evening_visitors.append(int(row.cells[2].text))

evening_visitors

[17, 16, 16, 15, 16, 17, 18]

In [None]:
type(table.columns[0])

docx.table._Column

In [None]:
table = doc.tables[0]

dates = []
morning_visitors = []
evening_visitors = []

for c in table.columns[0].cells[1:]:
    dates.append(c.text)

# Convert the visitor counts to integers so they can be summed later.
for c in table.columns[1].cells[1:]:
    morning_visitors.append(int(c.text))

for c in table.columns[2].cells[1:]:
    evening_visitors.append(int(c.text))

In [None]:
morning_visitors

[23, 25, 24, 26, 25, 24, 23]

In [None]:
sum(morning_visitors)

170

In [None]:
evening_visitors

[17, 16, 16, 15, 16, 17, 18]

In [None]:
sum(morning_visitors) + sum(evening_visitors)

285
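
As a small follow-up, the two lists can also be combined day by day with `zip`. This sketch copies the values printed above into literals so that it runs without the DOCX file. The name `daily_totals` is introduced here for illustration only.

```python
# Values copied from the outputs above.
morning_visitors = [23, 25, 24, 26, 25, 24, 23]
evening_visitors = [17, 16, 16, 15, 16, 17, 18]

# Pair up the morning and evening counts of each day and add them.
daily_totals = [m + e for m, e in zip(morning_visitors, evening_visitors)]

print(daily_totals)
print(sum(daily_totals))
```

The grand total matches `sum(morning_visitors) + sum(evening_visitors)` from the previous cell.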

### Exercise: Splitting a story  

You will find a `story.txt` file in your folder. It contains 12 chapters. Try to split it so that each chapter is saved in its own text file.  

For example, `Chapter 1 The Mysterious Key.txt` contains the content of chapter 1.

In [None]:
# Hints:
# 1. Use `in` to check whether a string contains a substring
str1 = 'Chapter 1: Hello World!'
print('Plate' in str1)

# 2. Slicing works on both lists and strings
print(str1[0:10])

# 3. Use replace() to substitute characters
str1 = 'Chapter 1: Hello World!'
str1 = str1.replace(':', '')
print(str1)

False
Chapter 1:
Chapter 1 Hello World!
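
The hints above can be combined into a rough sketch of one possible approach. It assumes that every chapter begins with a line starting with `Chapter `, which may not match the real layout of `story.txt`. The function name `split_chapters` and the sample text are made up for illustration, and the sketch works on a string in memory instead of writing files.

```python
def split_chapters(text):
    """Group lines into chapters keyed by a file name derived from the heading."""
    chapters = {}
    current_name = None
    for line in text.splitlines():
        if line.startswith("Chapter "):
            # Drop the colon so the heading can be used as a file name.
            current_name = line.strip().replace(":", "") + ".txt"
            chapters[current_name] = []
        if current_name is not None:
            chapters[current_name].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in chapters.items()}


# Made-up sample text standing in for story.txt.
sample = (
    "Chapter 1: The Mysterious Key\n"
    "It was dark.\n"
    "Chapter 2: The Door\n"
    "It opened.\n"
)

for name, content in split_chapters(sample).items():
    # To save for real, write `content` to a file called `name`.
    print(name, len(content))
```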


# Exercise:  

https://www.dsat.gov.mo/dsat/news.aspx  

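Fetch the latest news list from the news page of the Transport Bureau (DSAT) and save it as a Word DOCX file according to the following requirements.  
Include all news items on the first page, with dates converted from DD-MM-YYYY to YYYY-MM-DD.  
Replace spaces and slashes in the titles (and any other special characters) with underscores.  

The output docx is judged only on whether its content is complete: titles and body text. Colors, styles, and other formatting are not considered.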
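
The two text transformations in this exercise can be sketched with the standard library alone. The helper names `to_iso_date` and `safe_title` and the sample strings are made up for illustration. Fetching the page and writing the DOCX file are left out.

```python
import datetime
import re


def to_iso_date(date_text):
    """Convert a DD-MM-YYYY string to YYYY-MM-DD."""
    parsed = datetime.datetime.strptime(date_text, "%d-%m-%Y")
    return parsed.date().isoformat()


def safe_title(title):
    """Replace every character that is not a letter, digit, or underscore with an underscore."""
    return re.sub(r"\W", "_", title.strip())


print(to_iso_date("11-06-2020"))
print(safe_title("Road works / Notice 1"))
```

Because `\W` is Unicode-aware for Python strings, Chinese characters in a title are kept as they are and only spaces, slashes, and punctuation become underscores.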

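# Markdown to DOCX

Build a simple converter that generates a DOCX file from text. In this version, we do not aim to implement every Markdown feature. We only require converting top-level headings, text paragraphs, and page breaks.   


Markdown is a lightweight plain-text format. For example, a line that starts with # is a heading. A line that is --- or ---- is a page break. All other lines of text form paragraphs.


Paragraphs follow a slightly special rule: a single line break does not start a new paragraph and has no effect, that is, it does not produce a line break in the output Word document. Only two consecutive line breaks (in other words, a blank line) separate paragraphs.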
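
The rules above can be sketched as a small parser that turns the text into a list of blocks before anything is written to Word. This is only one possible reading of the rules. The function name `parse_blocks`, the tuple layout, and the choice to join wrapped lines with a space are assumptions made for illustration.

```python
def parse_blocks(text):
    """Turn the lightweight markup into (kind, text) tuples."""
    blocks = []
    pending = []  # lines of the paragraph currently being collected

    def flush():
        # A run of consecutive text lines becomes one paragraph.
        if pending:
            blocks.append(("paragraph", " ".join(pending)))
            pending.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line == "":
            flush()  # a blank line ends the current paragraph
        elif line.startswith("#"):
            flush()
            blocks.append(("heading", line.lstrip("#").strip()))
        elif line in ("---", "----"):
            flush()
            blocks.append(("page_break", ""))
        else:
            pending.append(line)
    flush()
    return blocks


# Made-up sample input.
sample = "# Title\n\nfirst line\nsame paragraph\n\nnext paragraph\n---\n# Page 2\n"
for block in parse_blocks(sample):
    print(block)
```

Each tuple could then be passed to `doc.add_heading`, `doc.add_paragraph`, or `doc.add_page_break` from `python-docx`. For Chinese text, joining the wrapped lines with an empty string instead of a space may read better.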


In [None]:
import docx

# Create an instance of a word document
doc = docx.Document()

# Add a Title to the document
doc.add_heading('GeeksForGeeks', 0)

# Adding a level-3 heading and a paragraph
doc.add_heading('Page 1:', 3)
doc.add_paragraph('GeeksforGeeks is a Computer Science portal for geeks.')

# Adding a page break
doc.add_page_break()

# Adding a level-3 heading and a paragraph on the second page
doc.add_heading('Page 2:', 3)
doc.add_paragraph('GeeksforGeeks is a Computer Science portal for geeks.')

# Now save the document to a location
doc.save('gfg.docx')