In [1]:
import data
import extension

In [4]:
tabulation = data.tabulation(path='resource/csv/data.csv')
tabulation.read()
print(tabulation.data.shape)
print(tabulation.data['keyword'].value_counts())

(498, 6)
melanoma        108
chest           107
diabetes         98
hypertension     97
covid-19         88
Name: keyword, dtype: int64


In [5]:
vocabulary = data.vocabulary()
vocabulary.build(sentence=tabulation.data['abstract'], title=tabulation.data['title'])

100%|██████████| 498/498 [00:01<00:00, 336.72it/s]


In [7]:
machine = extension.machine(vocabulary.word, vocabulary=vocabulary)
machine.build(what='model', by='SG', window=10, dimension=150, epoch=20)
machine.collect(title=["covid-19", "diabetes", "melanoma", "hypertension", 'chest'], top=30)
machine.reduce(method="MDS", dimension=2)

loss after epoch 0: 1439338.625
loss after epoch 1: 2525426.75
loss after epoch 2: 3534746.25
loss after epoch 3: 4455019.5
loss after epoch 4: 5252533.0
loss after epoch 5: 6033547.5
loss after epoch 6: 6801783.5
loss after epoch 7: 7553366.5
loss after epoch 8: 8290129.0
loss after epoch 9: 8921958.0
loss after epoch 10: 9531420.0
loss after epoch 11: 10133417.0
loss after epoch 12: 10730749.0
loss after epoch 13: 11321132.0
loss after epoch 14: 11902631.0
loss after epoch 15: 12477099.0
loss after epoch 16: 13047792.0
loss after epoch 17: 13618377.0
loss after epoch 18: 14187149.0
loss after epoch 19: 14758481.0
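
The logged loss rises every epoch, which is what a cumulative running total looks like. gensim's `Word2Vec.get_latest_training_loss()`, for example, reports a running total; whether `extension.machine` wraps gensim is an assumption, since its source is not shown here. Under that assumption, the loss contributed by each epoch is the difference between consecutive log values. A minimal sketch, using the first five values copied from the log above (`per_epoch_loss` is a hypothetical helper name):

```python
# Cumulative loss values copied from the training log (epochs 0 to 4).
cumulative = [1439338.625, 2525426.75, 3534746.25, 4455019.5, 5252533.0]


def per_epoch_loss(running_totals):
    """Turn a running total into per-epoch increments (assumes the log is cumulative)."""
    increments = []
    previous = 0.0
    for total in running_totals:
        increments.append(total - previous)
        previous = total
    return increments


epoch_loss = per_epoch_loss(cumulative)
for epoch, loss in enumerate(epoch_loss):
    print(f"epoch {epoch}: {loss}")
```

If the cumulative reading is right, the increments shrink from epoch to epoch, so training is making progress even though the logged number keeps growing.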


In [8]:
picture = machine.plot(skip='')
picture.write_html("SG-plot.html")
picture.show()

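## Adjacent words in the plot are presumed to be more closely related. What is the evidence? The examples below attempt to illustrate this.
- Relatively close to chest: tube
- Relatively close to borderlin: obes
- Relatively close to attach: three-bottl
- Relatively close to ocular: uveal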
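
Closeness in the 2-D plot is only indirect evidence, because projecting 150-dimensional vectors down to two dimensions with MDS can distort distances. A more direct check is the cosine similarity between the original word vectors. The sketch below uses made-up 3-D toy vectors, not vectors from the trained model; how `machine` exposes its vectors is not shown in this notebook, so no accessor is assumed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors for illustration only (NOT taken from the model).
toy = {
    "chest": [0.9, 0.1, 0.0],
    "tube": [0.8, 0.2, 0.1],
    "uveal": [0.0, 0.1, 0.9],
}

near = cosine_similarity(toy["chest"], toy["tube"])
far = cosine_similarity(toy["chest"], toy["uveal"])
print(near > far)
```

With real vectors, a pair that sits close in the plot but has low cosine similarity would suggest the proximity is an artifact of the projection.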

In [13]:
# Build the term-document matrix and apply TF-IDF weighting.
weight = extension.weight(matrix=vocabulary.matrix)
matrix = weight.transform(what='tfidf')
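
The exact weighting that `extension.weight` applies is not shown in this notebook. For orientation, here is a sketch of one common TF-IDF variant (raw count multiplied by `log(N / df)`) on a toy term-document count table. The function name `tfidf` and the toy data are hypothetical, and the real implementation may use a different formula.

```python
import math


def tfidf(counts):
    """counts: {term: {document: raw count}} -> {term: {document: weight}}.

    Uses weight = count * log(N / df), one common TF-IDF variant.
    """
    documents = {doc for row in counts.values() for doc in row}
    total = len(documents)
    weights = {}
    for term, row in counts.items():
        # Document frequency: number of documents that contain the term.
        df = sum(1 for count in row.values() if count > 0)
        idf = math.log(total / df) if df else 0.0
        weights[term] = {doc: row.get(doc, 0) * idf for doc in documents}
    return weights


# Hypothetical toy counts: rows are terms, columns are documents.
toy_counts = {
    "chest": {"doc A": 3, "doc B": 0, "doc C": 1},
    "tube": {"doc A": 2, "doc B": 0, "doc C": 0},
    "patient": {"doc A": 1, "doc B": 1, "doc C": 1},
}
toy_weights = tfidf(toy_counts)
print(toy_weights["tube"]["doc A"])
```

A term that occurs in every document, such as `patient` in the toy data, gets weight zero, which is why TF-IDF favors terms that single out a few documents.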

In [25]:
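# Sum the TF-IDF weights of 'chest' and 'tube' per document (one column per title).
comparison = matrix.loc[matrix.index.isin(['chest', 'tube'])].sum(axis=0)
# Fetch the abstract of the title with the highest combined weight.
text = tabulation.data.loc[tabulation.data['title']==comparison.idxmax()]['abstract'].item()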
print('-'*10, 'abstract', '-'*10, '\n', text, '\n')
print('-'*10, 'abstract (tokenize + stemming + remove stopword)', '-'*10, '\n', vocabulary.tokenize(text))

---------- abstract ---------- 
 Since chest tubes have been routinely used to drain the pleural space, particularly after lung surgery, the management of chest tubes is considered to be essential for the thoracic surgeon. The pleural drainage system requires effective drainage, suction, and water-sealing. Another key point of chest tube management is that a water seal is considered to be superior to suction for most air leaks. Nowadays, the most common pleural drainage device attached to the chest tube is the three-bottle system. An electronic chest drainage system has been developed that is effective in standardizing the postoperative management of chest tubes. More liberal use of digital drainage devices in the postoperative management of the pleural space is warranted. The removal of chest tubes is a common procedure occurring almost daily in hospitals throughout the world. Extraction of the tube is usually done at the end of full inspiration or at the end of full expiration. The t
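
The lookup in the cell above is repeated for each word pair, so it could be wrapped in a small helper. The sketch below assumes, as the cell implies, that `matrix` is a pandas DataFrame with terms as the index and document titles as columns. `top_document` and the toy matrix are hypothetical names and data for illustration.

```python
import pandas as pd


def top_document(matrix, terms):
    """Return the column label with the largest summed weight over the given terms."""
    rows = matrix.loc[matrix.index.isin(terms)]
    if rows.empty:
        raise KeyError(f"none of {terms} found in the matrix index")
    return rows.sum(axis=0).idxmax()


# Hypothetical toy matrix: rows are (stemmed) terms, columns are document titles.
toy_matrix = pd.DataFrame(
    {"doc A": [0.5, 0.9, 0.0], "doc B": [0.1, 0.0, 0.7]},
    index=["chest", "tube", "uveal"],
)

best = top_document(toy_matrix, ["chest", "tube"])
print(best)
```

Raising an error when none of the terms is present avoids silently returning an arbitrary title, which can otherwise happen when a stemmed form is misspelled.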

In [21]:
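# Sum the TF-IDF weights of 'borderlin' and 'obes' per document (one column per title).
comparison = matrix.loc[matrix.index.isin(['borderlin', 'obes'])].sum(axis=0)
# Fetch the abstract of the title with the highest combined weight.
text = tabulation.data.loc[tabulation.data['title']==comparison.idxmax()]['abstract'].item()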
print('-'*10, 'abstract', '-'*10, '\n', text, '\n')
print('-'*10, 'abstract (tokenize + stemming + remove stopword)', '-'*10, '\n', vocabulary.tokenize(text))

---------- abstract ---------- 
 Cardiovascular risk in a patient with obesity hypertension increases with the extent of risk factor clustering. It is therefore important to determine the global risk of a patient with hypertension rather than to focus solely on blood pressure. Every hypertensive should be screened for other than blood pressure risk factors, target organ damage and concomitant diseases or accompanying clinical conditions. Assessment of blood pressure and target organ damage might be more difficult in obese hypertensives than in normal-weight patients. Intensive lifestyle interventions can reduce weight, and decrease blood pressure and cardiovascular risk in obese hypertensive patients. Current guidelines do not provide specific recommendation for pharmacological management of the hypertensive patients with obesity. Recent trials have consistently shown that therapy involving beta-blockers and diuretics may induce more new-onset diabetes compared with other combination therapies. Several li

In [22]:
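# Sum the TF-IDF weights of 'attach' and 'three-bottl' per document (one column per title).
comparison = matrix.loc[matrix.index.isin(['attach', 'three-bottl'])].sum(axis=0)
# Fetch the abstract of the title with the highest combined weight.
text = tabulation.data.loc[tabulation.data['title']==comparison.idxmax()]['abstract'].item()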
print('-'*10, 'abstract', '-'*10, '\n', text, '\n')
print('-'*10, 'abstract (tokenize + stemming + remove stopword)', '-'*10, '\n', vocabulary.tokenize(text))

---------- abstract ---------- 
 Since chest tubes have been routinely used to drain the pleural space, particularly after lung surgery, the management of chest tubes is considered to be essential for the thoracic surgeon. The pleural drainage system requires effective drainage, suction, and water-sealing. Another key point of chest tube management is that a water seal is considered to be superior to suction for most air leaks. Nowadays, the most common pleural drainage device attached to the chest tube is the three-bottle system. An electronic chest drainage system has been developed that is effective in standardizing the postoperative management of chest tubes. More liberal use of digital drainage devices in the postoperative management of the pleural space is warranted. The removal of chest tubes is a common procedure occurring almost daily in hospitals throughout the world. Extraction of the tube is usually done at the end of full inspiration or at the end of full expiration. The tube removal techniqu

In [23]:
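# Sum the TF-IDF weights of 'ocular' and 'uveal' per document (one column per title).
comparison = matrix.loc[matrix.index.isin(['ocular', 'uveal'])].sum(axis=0)
# Fetch the abstract of the title with the highest combined weight.
text = tabulation.data.loc[tabulation.data['title']==comparison.idxmax()]['abstract'].item()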
print('-'*10, 'abstract', '-'*10, '\n', text, '\n')
print('-'*10, 'abstract (tokenize + stemming + remove stopword)', '-'*10, '\n', vocabulary.tokenize(text))

---------- abstract ---------- 
 Ocular melanomas correspond to 5% of all melanomas and 85% of them have its origin in the uveal tract. Uveal melanoma is the most common primary intraocular malignant tumor in the adult. In this article, a case of uveal melanoma in a 31 year-old female patient, with photopsia, hyperemia and low visual acuity in the left eye with evolution of 4 months is presented. In the ophthalmologic examination, visual acuity was lower than 20/400, a large tumoral mass was noted at the nasal region behind the iris with anterior lens displacement, anterior chamber narrowing and serous retinal detachment. The ocular echography suggested a large tumoral mass as a choroidal melanoma extending to the ciliary body. The confirmation diagnosis was possible through the histopathologic examination.
---------- abstract (tokenize + stemming + remove stopword) ---------- 
 ['ocular', 'melanoma', 'correspond', '5', 'melanoma', '85', 'origin', 'uveal', 'tract', 'uveal', 'melanoma', 'common', 'primari', 'intraocular', 'malign', 'tumor', 'adult', 'thi', 'articl'