-
Notifications
You must be signed in to change notification settings - Fork 1
/
wordcloud.Rmd
63 lines (58 loc) · 3.01 KB
/
wordcloud.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
---
title: "wordcloud"
output: html_document
---
# Class Assignment 2.2 - Word Cloud
### _Date: 11/7/17_
#####今天我們要繪製一個文字云,來萃取一些文章的精髓。我選了中國近代傳奇企業家,馬雲的退休演講詞。我們就來一起看看他大概都是在講什麼吧!
#####Today we are going to draw a word cloud to capture the essence of some articles. I chose the modern legendary entrepreneur, Jack Mah's resignation speech. Let us take a look at what he said together!
####開啟R包裹 (Open R packages)
```{r cars}
library(jiebaR)
library(wordcloud2)
library(magrittr)
Sys.setlocale("LC_ALL", locale = "Chinese")
```
####匯入文字切割器函數 (Insert the phrase identifier function)
```{r}
wk <- worker()
```
####匯入文字 (import text)
```{r}
text <- readLines("Mayunspeech1.txt", encoding = "UTF-8")
```
####找出無意義的單字 (Identify meaningless words into a vector)
```{r}
stopwords <- c("我","的","在","和","都","也","不","就","上","为","我们","你们","他们","这","从","是",
"会","个","有","像","把","其实","了", "谁","所","但", "更", "很","什么", "一","因", "让",
"好", "每", "多", "得", "当", "将","座", "对", "没", "你","到")
```
####建立清楚單字的二步驟 (Create the two step procedure to rid the meanless words)
先是將stopwords轉成一個長的OR string,這樣如果找到其中一個詞就會被刪掉。接下來就把他套到刪除函數。
First is to turn stopwords into a long OR string, next is to import the string into the substr() function.
```{r}
stopwords.pattern <- paste0(stopwords, sep = "|", collapse = "") %>%
substr(1, nchar(.) -1)
```
####消滅計劃開始 (Pull the trigger)
```{r}
text <- gsub(stopwords.pattern, " ", text)
```
####切割機啟動(Activate the slicer)
```{r}
speech <- wk[text]
```
####將詞彙都存到一個資料框 (Save the phrases/words into a dataframe)
```{r}
speech <- as.data.frame(table(speech))
```
####整理一下資料框將出現率最高的排第一 (Order the dataframe such that the most frequent word comes first.)
```{r}
freq <- speech[order(speech$Freq, decreasing = T),]
```
####繪製文字云 (Draw the wordcloud)
```{r}
wordcloud2(freq, size = 1.5, color = "random-light", backgroundColor = "grey")
```
#### 我們從這裡可以看到最大的依舊是人。即使科技再發達,他的初衷依然是造福人群,解決世界的問題,帶來變化。他也提到了要他相信年輕人,覺得他們一定會做得比他好。我們這些年輕人就好好努力學好電腦這一技之長,把握未來的種種機會!
#### We can see that the largest mention is the word human. The origins of technology is to help people, resolve the problems of the world, and bring about change. He also mentioned that he believes that the next generation will do it better than he can. Truly inspiring, There is a future for us, we just need to work hard and develope a strong skillset. Fling ourselves into the opportunities we can get!