## Topic Modeling

Topic modeling is a popular NLP technique used to determine which topics are present in a given corpus.

I will be using <b>Latent Dirichlet Allocation</b> abbreviated as <b>LDA</b> to perform topic modeling.

In topic modeling, word order does not matter (a bag-of-words assumption), so a <b>Document Term Matrix</b> (DTM) of word counts is used instead of the raw corpus.

Process:  DTM --> Gensim(LDA) --> Topics
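The claim that word order does not matter can be illustrated with a minimal sketch that uses only the standard library. The two documents and the helper name `bag_of_words` are made up for illustration; the notebook itself loads a DTM that was built elsewhere.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each whitespace-separated token occurs in `text`."""
    return Counter(text.lower().split())

# Two hypothetical documents with the same words in a different order.
doc_a = "great news about the border"
doc_b = "the border news about great"

counts_a = bag_of_words(doc_a)
counts_b = bag_of_words(doc_b)

# The sorted vocabulary fixes the column order of the matrix.
vocabulary = sorted(set(counts_a) | set(counts_b))
dtm = [[counts[term] for term in vocabulary] for counts in (counts_a, counts_b)]

print(vocabulary)
print(dtm)
```

Both rows of `dtm` come out identical, which is why a count matrix is enough as input for LDA.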

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import time
from datetime import datetime
import calendar
import nltk
import pickle

from gensim import matutils, models
import scipy.sparse

from nltk import word_tokenize, pos_tag
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\srija\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\srija\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

##### Loading Data

In [2]:
data_clean = pd.read_pickle('data/data_clean.pkl')
data_clean.head()

Unnamed: 0,content,date,retweets,favorites
0,be sure to tune in and watch trump on late nig...,2009-05-04 13:54:25,510,917
1,trump will be on the view tomorrow morning to ...,2009-05-04 20:00:10,34,267
2,trump top ten financial on late show with very...,2009-05-08 08:38:08,13,19
3,new post celebrity apprentice finale and learn...,2009-05-08 15:40:15,11,26
4,my persona will never be that of a wallflower ...,2009-05-12 09:07:28,1375,1945


##### Loading Yearly DTM

In [3]:
dtm_yearly = pd.read_pickle('data/dtm_yearly.pkl')
dtm_yearly.head()

year,abandon,abandoned,abbas,abhor,abide,abiding,ability,abject,able,abnormally,...,zac,zeal,zee,zero,zimbabwe,zip,zone,zoning,zoo,zoom
2009,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2010,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2011,0,2,0,0,0,0,1,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2012,1,3,0,0,0,0,3,1,13,0,...,0,0,0,14,0,0,0,0,0,0
2013,2,8,0,1,0,0,8,0,17,1,...,0,1,1,21,0,0,4,0,0,0


##### Building term document matrix

In [4]:
# Gensim's Sparse2Corpus expects documents as columns by default, so transpose the document-term matrix into a term-document matrix

tdm_yearly = dtm_yearly.transpose()
tdm_yearly.head()

year,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
abandon,0,0,0,1,2,0,1,2,0,3,0,2
abandoned,0,0,2,3,8,0,1,1,1,5,2,1
abbas,0,0,0,0,0,0,0,0,1,0,0,0
abhor,0,0,0,0,1,0,0,0,0,0,0,0
abide,0,0,0,0,0,1,0,0,0,0,0,0


### Attempt 1 - With all the words

For Latent Dirichlet Allocation, the term-document matrix has to be converted into a sparse matrix and then into a gensim corpus.

In [5]:
# Creating sparse matrix using term document matrix
sparse_counts = scipy.sparse.csr_matrix(tdm_yearly)
sparse_counts.todense()

matrix([[0, 0, 0, ..., 3, 0, 2],
        [0, 0, 2, ..., 5, 2, 1],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 1, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 2, 0]], dtype=int64)

In [6]:
# Creating gensim corpus using sparse matrix
corpus = matutils.Sparse2Corpus(sparse_counts)
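To make the target format concrete, here is a minimal pure-Python sketch of a bag-of-words corpus: one list of `(term_id, count)` pairs per document, with zero counts left out. It only mimics the shape of the data and is not gensim's implementation. The helper name `columns_to_bow` is hypothetical, and the counts are the first three rows of `tdm_yearly` for 2009 to 2012.

```python
# Rows are terms, columns are the years 2009-2012 (taken from tdm_yearly above).
term_document = [
    [0, 0, 0, 1],  # abandon
    [0, 0, 2, 3],  # abandoned
    [0, 0, 0, 0],  # abbas
]

def columns_to_bow(matrix):
    """Turn each column (document) into a list of (term_id, count) pairs, skipping zeros."""
    n_docs = len(matrix[0])
    return [
        [(term_id, row[doc]) for term_id, row in enumerate(matrix) if row[doc] != 0]
        for doc in range(n_docs)
    ]

bow_corpus = columns_to_bow(term_document)
print(bow_corpus)
```

The years 2009 and 2010 end up as empty documents here because none of the three sample terms occur in them.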

In [9]:
# Gensim also needs a dictionary mapping each term id (its row in the term-document matrix) to the term itself

with open("data/cv.pkl", "rb") as f:
    cv = pickle.load(f)
id2word = dict((v,k) for k,v in cv.vocabulary_.items())
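The inversion is needed because `cv.vocabulary_` maps term to index, while gensim looks terms up by integer id. A small sketch with a made-up three-word vocabulary (the names `vocabulary_` and `id2word_demo` here are stand-ins, not the pickled objects):

```python
# Hypothetical stand-in for cv.vocabulary_, which maps term -> column index.
vocabulary_ = {"abandon": 0, "abandoned": 1, "abbas": 2}

# Invert it so each integer id points back to its term.
id2word_demo = {index: term for term, index in vocabulary_.items()}

# A bag-of-words document becomes readable once ids are mapped to terms.
document = [(0, 1), (1, 3)]
readable = [(id2word_demo[term_id], count) for term_id, count in document]
print(readable)
```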

In [10]:
# Applying LDA with 2 topics
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=2, passes=10)
lda.print_topics()

[(0,
  '0.024*"great" + 0.012*"thank" + 0.012*"people" + 0.011*"trump" + 0.010*"just" + 0.010*"president" + 0.008*"country" + 0.007*"big" + 0.007*"news" + 0.007*"new"'),
 (1,
  '0.028*"trump" + 0.022*"great" + 0.016*"thanks" + 0.011*"president" + 0.009*"just" + 0.009*"thank" + 0.008*"like" + 0.007*"good" + 0.007*"people" + 0.007*"new"')]

In [11]:
# Applying LDA with 3 topics
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=3, passes=10)
lda.print_topics()

[(0,
  '0.031*"trump" + 0.024*"great" + 0.016*"thanks" + 0.012*"president" + 0.010*"thank" + 0.010*"just" + 0.008*"like" + 0.007*"run" + 0.007*"people" + 0.007*"good"'),
 (1,
  '0.022*"great" + 0.011*"people" + 0.009*"president" + 0.008*"just" + 0.008*"country" + 0.008*"news" + 0.008*"thank" + 0.007*"big" + 0.007*"fake" + 0.007*"trump"'),
 (2,
  '0.025*"great" + 0.020*"thank" + 0.013*"trump" + 0.011*"people" + 0.011*"just" + 0.008*"new" + 0.008*"make" + 0.007*"big" + 0.007*"today" + 0.006*"crooked"')]

In [12]:
# Applying LDA with 4 topics
lda = models.LdaModel(corpus=corpus, id2word=id2word, num_topics=4, passes=10)
lda.print_topics()

[(0,
  '0.015*"great" + 0.010*"tax" + 0.007*"people" + 0.007*"trump" + 0.006*"news" + 0.006*"just" + 0.006*"president" + 0.006*"fake" + 0.005*"big" + 0.005*"today"'),
 (1,
  '0.034*"trump" + 0.023*"great" + 0.014*"president" + 0.010*"thank" + 0.010*"just" + 0.008*"run" + 0.008*"make" + 0.007*"new" + 0.007*"like" + 0.007*"people"'),
 (2,
  '0.024*"great" + 0.012*"people" + 0.012*"thank" + 0.009*"just" + 0.009*"president" + 0.008*"news" + 0.008*"country" + 0.008*"big" + 0.007*"trump" + 0.007*"new"'),
 (3,
  '0.029*"thanks" + 0.024*"great" + 0.020*"trump" + 0.010*"thank" + 0.010*"just" + 0.009*"good" + 0.008*"like" + 0.008*"think" + 0.008*"people" + 0.007*"president"')]
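The runs above share most of their top words across topics. One way to check this, sketched here under the assumption that the strings follow the `weight*"word"` pattern shown in the output, is to parse them and intersect the word sets. The helper name `top_words` is hypothetical, and the two strings are copied from the 2-topic run.

```python
import re

# Top-word strings copied from the 2-topic run above.
topic_strings = [
    '0.024*"great" + 0.012*"thank" + 0.012*"people" + 0.011*"trump" + 0.010*"just" + 0.010*"president" + 0.008*"country" + 0.007*"big" + 0.007*"news" + 0.007*"new"',
    '0.028*"trump" + 0.022*"great" + 0.016*"thanks" + 0.011*"president" + 0.009*"just" + 0.009*"thank" + 0.008*"like" + 0.007*"good" + 0.007*"people" + 0.007*"new"',
]

def top_words(topic_string):
    """Extract the quoted words from a print_topics-style string."""
    return re.findall(r'\*"([^"]+)"', topic_string)

# Keep only the words that occur in every topic.
shared = set(top_words(topic_strings[0]))
for topic_string in topic_strings[1:]:
    shared &= set(top_words(topic_string))

print(sorted(shared))
```

Seven of the ten top words are common to both topics, so words like these carry little information for telling topics apart.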

### Attempt 2 - Nouns Only

In [13]:
corpus_yearly = pd.read_pickle('data/corpus_yearly.pkl')

In [14]:
# Function to tokenize a text and pull out nouns
def nouns(text):
    # Penn Treebank noun tags (NN, NNS, NNP, NNPS) all start with 'NN'
    is_noun = lambda pos: pos[:2] == 'NN'
    tokenized = word_tokenize(text)
    all_nouns = [word for (word, pos) in pos_tag(tokenized) if is_noun(pos)]
    return ' '.join(all_nouns)

In [15]:
corpus_yearly_n = pd.DataFrame(corpus_yearly.transcript.apply(nouns))
corpus_yearly_n

Unnamed: 0,transcript
2009,trump night ten list tonight trump view tomorr...
2010,celebrity apprentice list season tycoon touch ...
2011,night jimmy tomorrow night ill announcement af...
2012,interview make filing caucus interview i tomor...
2013,deal nothing i history deal deal hope deal cou...
2014,today day rest life warming planet record ice ...
2015,president club tonight everybody palm beach cl...
2016,year thank family – club fog war explanation f...
2017,year murder rate mayor cant help book race vic...
2018,aid nothing deceit thinking haven help level d...


In [16]:
from sklearn.feature_extraction import text
from sklearn.feature_extraction.text import CountVectorizer

add_stop_words = ['best', 'good', 'true', 'man', 'today', 'bad', 'trump', 'great', 'thanks', 'thank', 'just', 'president', 'people', 'make', 'new', 'time', 'like', 'think', 'country', 'big']

# CountVectorizer expects stop_words as a list in recent scikit-learn versions, so convert the frozenset
stop_words = list(text.ENGLISH_STOP_WORDS.union(add_stop_words))

# re-create a document term matrix with only nouns

cv_n = CountVectorizer(stop_words = stop_words)
data_cv_n = cv_n.fit_transform(corpus_yearly_n.transcript)
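# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0
dtm_yearly_n = pd.DataFrame(data_cv_n.toarray(), columns = cv_n.get_feature_names_out())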
dtm_yearly_n.index = corpus_yearly_n.index
dtm_yearly_n

Unnamed: 0,abbas,ability,aboard,abortion,absence,absentee,absolute,absorb,abu,abundance,...,yucca,zac,zeal,zee,zero,zimbabwe,zip,zone,zoning,zoo
2009,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2010,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2011,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2012,0,3,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2013,0,8,0,0,0,0,3,0,0,0,...,0,0,1,1,6,0,0,4,0,0
2014,0,8,1,0,1,0,0,1,0,0,...,0,0,0,0,2,0,0,3,0,0
2015,0,8,0,1,0,1,1,0,2,0,...,0,1,0,0,2,1,1,1,0,0
2016,0,3,0,0,0,1,1,0,0,0,...,0,0,0,0,6,0,0,0,0,0
2017,1,1,1,0,0,0,1,0,0,0,...,0,0,0,0,2,0,0,1,0,1
2018,0,2,0,0,0,0,1,0,0,1,...,0,0,0,0,3,0,0,1,0,0


In [17]:
# create gensim corpus
corpus_n = matutils.Sparse2Corpus(scipy.sparse.csr_matrix(dtm_yearly_n.transpose()))

# creating vocab dict
id2word_n = dict((v,k) for k, v in cv_n.vocabulary_.items())

In [18]:
# LDA on nouns only, 2 topics

lda_n = models.LdaModel(corpus=corpus_n, id2word=id2word_n ,num_topics=2, passes=10)
lda_n.print_topics()

[(0,
  '0.018*"news" + 0.011*"media" + 0.011*"border" + 0.010*"job" + 0.009*"house" + 0.008*"state" + 0.008*"way" + 0.007*"election" + 0.007*"vote" + 0.007*"crime"'),
 (1,
  '0.009*"interview" + 0.008*"job" + 0.008*"world" + 0.008*"golf" + 0.008*"night" + 0.007*"deal" + 0.007*"way" + 0.007*"day" + 0.007*"apprentice" + 0.007*"business"')]

In [19]:
# LDA on nouns only, 3 topics

lda_n = models.LdaModel(corpus=corpus_n, id2word=id2word_n ,num_topics=3, passes=10)
lda_n.print_topics()

[(0,
  '0.009*"interview" + 0.008*"job" + 0.008*"world" + 0.008*"night" + 0.007*"golf" + 0.007*"way" + 0.007*"vote" + 0.007*"tonight" + 0.007*"day" + 0.007*"deal"'),
 (1,
  '0.020*"news" + 0.012*"border" + 0.011*"media" + 0.010*"job" + 0.009*"house" + 0.008*"state" + 0.008*"way" + 0.008*"crime" + 0.007*"election" + 0.007*"china"'),
 (2,
  '0.002*"way" + 0.002*"news" + 0.001*"media" + 0.001*"border" + 0.001*"deal" + 0.001*"job" + 0.001*"day" + 0.001*"world" + 0.001*"vote" + 0.001*"election"')]

In [20]:
# LDA on nouns only, 4 topics

lda_n = models.LdaModel(corpus=corpus_n, id2word=id2word_n ,num_topics=4, passes=10)
lda_n.print_topics()

[(0,
  '0.009*"job" + 0.008*"world" + 0.008*"day" + 0.008*"work" + 0.007*"business" + 0.007*"way" + 0.007*"apprentice" + 0.007*"money" + 0.007*"golf" + 0.007*"course"'),
 (1,
  '0.013*"interview" + 0.011*"night" + 0.009*"debate" + 0.009*"china" + 0.008*"tonight" + 0.008*"tomorrow" + 0.008*"job" + 0.007*"deal" + 0.007*"vote" + 0.007*"election"'),
 (2,
  '0.020*"news" + 0.013*"border" + 0.011*"media" + 0.010*"job" + 0.010*"house" + 0.009*"state" + 0.008*"way" + 0.008*"crime" + 0.008*"election" + 0.008*"china"'),
 (3,
  '0.010*"vote" + 0.009*"golf" + 0.008*"way" + 0.008*"course" + 0.008*"job" + 0.008*"world" + 0.008*"business" + 0.007*"poll" + 0.007*"day" + 0.007*"apprentice"')]

### Attempt 3 - Nouns and Adjectives

In [21]:
# Function to tokenize a text and pull out nouns and adjectives
def nouns_adjective(text):
    # tags starting with NN are nouns, tags starting with JJ are adjectives
    is_noun_adj = lambda pos: pos[:2] == 'NN' or pos[:2] == 'JJ'
    tokenized = word_tokenize(text)
    nouns_adj = [word for (word, pos) in pos_tag(tokenized) if is_noun_adj(pos)]
    return ' '.join(nouns_adj)
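Both `nouns` and `nouns_adjective` filter on the first two characters of the part-of-speech tag. The sketch below shows that filtering step in isolation, without NLTK: the `(word, tag)` pairs are hand-written stand-ins for what `pos_tag` would return, and `keep_by_tag_prefix` is a hypothetical helper, not part of the notebook.

```python
def keep_by_tag_prefix(tagged_tokens, prefixes):
    """Join the words whose POS tag starts with any of the given prefixes."""
    # str.startswith accepts a tuple, so several prefixes can be tested at once.
    wanted = tuple(prefixes)
    return ' '.join(word for word, tag in tagged_tokens if tag.startswith(wanted))

# Hand-written (word, tag) pairs standing in for the output of pos_tag.
tagged = [
    ("fake", "JJ"), ("news", "NN"), ("is", "VBZ"), ("hurting", "VBG"),
    ("great", "JJ"), ("jobs", "NNS"),
]

print(keep_by_tag_prefix(tagged, ["NN"]))        # nouns only
print(keep_by_tag_prefix(tagged, ["NN", "JJ"]))  # nouns and adjectives
```

Matching on the prefix rather than the full tag means plural and proper nouns (`NNS`, `NNP`) and comparative adjectives (`JJR`, `JJS`) are kept as well.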

In [22]:
corpus_yearly_na = pd.DataFrame(corpus_yearly.transcript.apply(nouns_adjective))
corpus_yearly_na

Unnamed: 0,transcript
2009,sure trump late night top ten list tonight tru...
2010,celebrity apprentice outstanding list season b...
2011,late night jimmy tomorrow night ill big announ...
2012,interview make great filing caucus interview p...
2013,deal nothing i worst history big deal deal hop...
2014,today first day rest life most expensive globa...
2015,president club tonight everybody biggest palm ...
2016,happy new year thank great family – club fog w...
2017,new year great murder rate record mayor cant f...
2018,united more aid last nothing deceit thinking s...


In [23]:
cv_na = CountVectorizer(stop_words = stop_words)
data_cv_na = cv_na.fit_transform(corpus_yearly_na.transcript)
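# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0
dtm_yearly_na = pd.DataFrame(data_cv_na.toarray(), columns = cv_na.get_feature_names_out())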
dtm_yearly_na.index = corpus_yearly_na.index
dtm_yearly_na

Unnamed: 0,abbas,ability,abject,able,aboard,abortion,abrupt,absence,absentee,absolute,...,yucca,zac,zeal,zee,zero,zimbabwe,zip,zone,zoning,zoo
2009,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2010,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2011,0,1,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2012,0,3,1,13,0,0,0,0,0,4,...,0,0,0,0,1,0,0,0,0,0
2013,0,8,0,17,0,0,0,0,0,6,...,0,0,1,1,6,0,0,4,0,0
2014,0,8,0,16,1,0,0,1,0,5,...,0,0,0,0,2,0,0,3,0,0
2015,0,8,0,22,1,1,0,0,1,4,...,0,1,0,0,2,1,1,1,0,0
2016,0,3,0,12,0,0,1,0,2,3,...,0,0,0,0,6,0,0,0,0,0
2017,1,1,0,10,1,0,0,0,0,2,...,0,0,0,0,2,0,0,1,0,1
2018,0,2,0,19,0,0,0,0,0,7,...,0,0,0,0,3,0,0,1,0,0


In [24]:
# create gensim corpus
corpus_na = matutils.Sparse2Corpus(scipy.sparse.csr_matrix(dtm_yearly_na.transpose()))

# creating vocab dict
id2word_na = dict((v,k) for k, v in cv_na.vocabulary_.items())

In [25]:
# LDA on nouns and adjectives, 2 topics

lda_na = models.LdaModel(corpus=corpus_na, id2word=id2word_na ,num_topics=2, passes=10)
lda_na.print_topics()

[(0,
  '0.013*"news" + 0.010*"fake" + 0.008*"border" + 0.008*"media" + 0.007*"job" + 0.007*"united" + 0.006*"house" + 0.006*"state" + 0.006*"military" + 0.006*"way"'),
 (1,
  '0.007*"interview" + 0.006*"job" + 0.006*"world" + 0.006*"golf" + 0.006*"tonight" + 0.006*"night" + 0.005*"dont" + 0.005*"apprentice" + 0.005*"deal" + 0.005*"way"')]

In [26]:
# LDA on nouns and adjectives, 3 topics

lda_na = models.LdaModel(corpus=corpus_na, id2word=id2word_na ,num_topics=3, passes=10)
lda_na.print_topics()

[(0,
  '0.007*"interview" + 0.006*"world" + 0.006*"golf" + 0.006*"job" + 0.006*"apprentice" + 0.005*"deal" + 0.005*"night" + 0.005*"dont" + 0.005*"course" + 0.005*"business"'),
 (1,
  '0.014*"news" + 0.011*"fake" + 0.009*"border" + 0.008*"media" + 0.007*"united" + 0.007*"job" + 0.007*"house" + 0.006*"military" + 0.006*"state" + 0.006*"china"'),
 (2,
  '0.010*"poll" + 0.009*"vote" + 0.009*"tonight" + 0.008*"tomorrow" + 0.008*"media" + 0.008*"debate" + 0.007*"night" + 0.006*"campaign" + 0.006*"crowd" + 0.006*"speech"')]

In [27]:
# LDA on nouns and adjectives, 4 topics

lda_na = models.LdaModel(corpus=corpus_na, id2word=id2word_na ,num_topics=4, passes=10)
lda_na.print_topics()

[(0,
  '0.006*"job" + 0.006*"world" + 0.006*"day" + 0.006*"work" + 0.005*"apprentice" + 0.005*"luck" + 0.005*"golf" + 0.005*"nice" + 0.005*"business" + 0.005*"course"'),
 (1,
  '0.009*"golf" + 0.008*"world" + 0.008*"course" + 0.007*"apprentice" + 0.007*"business" + 0.007*"vote" + 0.006*"day" + 0.006*"way" + 0.006*"hotel" + 0.005*"job"'),
 (2,
  '0.008*"tonight" + 0.008*"night" + 0.008*"interview" + 0.007*"poll" + 0.007*"debate" + 0.007*"vote" + 0.006*"job" + 0.006*"tomorrow" + 0.005*"dont" + 0.005*"china"'),
 (3,
  '0.014*"news" + 0.011*"fake" + 0.009*"border" + 0.008*"media" + 0.007*"united" + 0.007*"job" + 0.007*"house" + 0.006*"military" + 0.006*"state" + 0.006*"china"')]

### Topics per year

In [28]:
lda_na = models.LdaModel(corpus=corpus_na, id2word=id2word_na ,num_topics=10, passes=80)
lda_na.print_topics()

[(0,
  '0.018*"news" + 0.014*"fake" + 0.014*"tax" + 0.009*"media" + 0.008*"honor" + 0.008*"day" + 0.007*"election" + 0.007*"house" + 0.006*"national" + 0.006*"military"'),
 (1,
  '0.000*"news" + 0.000*"way" + 0.000*"job" + 0.000*"vote" + 0.000*"day" + 0.000*"media" + 0.000*"tonight" + 0.000*"deal" + 0.000*"dont" + 0.000*"real"'),
 (2,
  '0.012*"poll" + 0.011*"vote" + 0.011*"tonight" + 0.008*"tomorrow" + 0.008*"debate" + 0.008*"media" + 0.008*"night" + 0.007*"campaign" + 0.007*"job" + 0.007*"way"'),
 (3,
  '0.000*"job" + 0.000*"news" + 0.000*"vote" + 0.000*"world" + 0.000*"right" + 0.000*"day" + 0.000*"house" + 0.000*"media" + 0.000*"fake" + 0.000*"night"'),
 (4,
  '0.014*"news" + 0.011*"fake" + 0.010*"border" + 0.008*"job" + 0.008*"united" + 0.008*"media" + 0.007*"house" + 0.007*"military" + 0.007*"china" + 0.006*"state"'),
 (5,
  '0.002*"direct" + 0.001*"chronicle" + 0.001*"lieutenant" + 0.001*"inexplicable" + 0.001*"conner" + 0.001*"hysterical" + 0.001*"showcase" + 0.001*"thoughtful"

In [33]:
lda_na.print_topics()[0]

(0,
 '0.018*"news" + 0.014*"fake" + 0.014*"tax" + 0.009*"media" + 0.008*"honor" + 0.008*"day" + 0.007*"election" + 0.007*"house" + 0.006*"national" + 0.006*"military"')

In [54]:
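corpus_transformed = lda_na[corpus_na]
topics = lda_na.print_topics()  # fetch the topic strings once instead of on every iteration
# documents are in year order, starting with 2009
for year, doc_topics in enumerate(corpus_transformed, start=2009):
    print(year)
    for topic_id, _ in doc_topics:
        print(topics[topic_id])
    print('')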

2009
(5, '0.002*"direct" + 0.001*"chronicle" + 0.001*"lieutenant" + 0.001*"inexplicable" + 0.001*"conner" + 0.001*"hysterical" + 0.001*"showcase" + 0.001*"thoughtful" + 0.001*"habitat" + 0.001*"tara"')
(7, '0.011*"apprentice" + 0.010*"luck" + 0.008*"tonight" + 0.008*"happy" + 0.007*"celebrity" + 0.007*"day" + 0.006*"job" + 0.006*"work" + 0.006*"champion" + 0.005*"money"')
(9, '0.009*"interview" + 0.007*"golf" + 0.007*"world" + 0.006*"deal" + 0.006*"real" + 0.006*"course" + 0.006*"job" + 0.006*"china" + 0.006*"business" + 0.006*"dont"')

2010
(7, '0.011*"apprentice" + 0.010*"luck" + 0.008*"tonight" + 0.008*"happy" + 0.007*"celebrity" + 0.007*"day" + 0.006*"job" + 0.006*"work" + 0.006*"champion" + 0.005*"money"')

2011
(4, '0.014*"news" + 0.011*"fake" + 0.010*"border" + 0.008*"job" + 0.008*"united" + 0.008*"media" + 0.007*"house" + 0.007*"military" + 0.007*"china" + 0.006*"state"')
(9, '0.009*"interview" + 0.007*"golf" + 0.007*"world" + 0.006*"deal" + 0.006*"real" + 0.006*"course" + 0.00
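The loop above prints every topic assigned to a year. If only the strongest topic per year is wanted, the `(topic_id, probability)` pairs can be reduced with `max`. This is a minimal sketch: the topic ids match the output above, but the probabilities are made up for illustration, and `dominant_topic` is a hypothetical helper.

```python
def dominant_topic(doc_topics):
    """Return the (topic_id, probability) pair with the highest probability."""
    return max(doc_topics, key=lambda pair: pair[1])

# Hypothetical per-year topic distributions in (topic_id, probability) form.
# The topic ids follow the printed output; the probabilities are invented.
topics_by_year = {
    2009: [(5, 0.02), (7, 0.61), (9, 0.37)],
    2010: [(7, 0.99)],
    2011: [(4, 0.12), (9, 0.88)],
}

for year, doc_topics in topics_by_year.items():
    topic_id, probability = dominant_topic(doc_topics)
    print(year, topic_id, probability)
```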

### Results

<b>Attempt 1 - With all the words</b>
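* Using all the words does not give results that are good enough: generic words such as great, trump, people and just appear among the top words of every topic, so the topics cannot be told apart.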

<b>Attempt 2 - Only Nouns</b>
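* Using just nouns does not give results that are good enough: "job" appears in every topic, and in the 3-topic run one topic has near-zero weights (0.001 to 0.002) on all of its top words.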

<b>Attempt 3 - Only Nouns and Adjectives </b>
* <b>2009 Topics: </b>
    * <b>7:</b> celebrity, money
    * <b>9:</b> china, business
    
* <b>2010 Topics: </b>
    * <b>7:</b> luck, celebrity, money
    
* <b>2011 Topics: </b>
    * <b>4:</b> Fake news, China, border, military
    * <b>9:</b> china, business
    
* <b>2012 Topics: </b>
    * <b>4:</b> Fake news, China, border, military
    * <b>9:</b> china, business
    
* <b>2013 Topics: </b>
    * <b>7:</b> luck, celebrity, money
    * <b>9:</b> china, business
    
* <b>2014 Topics: </b>
    * <b>9:</b> china, business
    
* <b>2015 Topics: </b>
    * <b>2:</b> vote, media, campaign
    * <b>9:</b> china, business
    
* <b>2016 Topics: </b>
    * <b>2:</b> vote, media, campaign
    * <b>4:</b> Fake news, China, border, military
    
* <b>2017 Topics: </b>
    * <b>0:</b> Fake News, tax, media, election, military
    * <b>4:</b> Fake news, China, border, military
    
* <b>2018 Topics: </b>
    * <b>0:</b> Fake News, tax, media, election, military
    * <b>2:</b> vote, media, campaign
    * <b>4:</b> Fake news, China, border, military
    
* <b>2019 Topics: </b> 
    * <b>4:</b> Fake news, China, border, military
    
* <b>2020 Topics: </b>
    * <b>4:</b> Fake news, China, border, military
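
The per-year list above can be summarised by counting in how many years each topic appears. A short sketch, with the topic ids copied by hand from the Attempt 3 list (the variable names are only illustrative):

```python
from collections import Counter

# Topic ids per year, copied from the Attempt 3 list above.
topics_per_year = {
    2009: [7, 9], 2010: [7], 2011: [4, 9], 2012: [4, 9],
    2013: [7, 9], 2014: [9], 2015: [2, 9], 2016: [2, 4],
    2017: [0, 4], 2018: [0, 2, 4], 2019: [4], 2020: [4],
}

# Count the number of years in which each topic id occurs.
years_per_topic = Counter(
    topic_id for topics in topics_per_year.values() for topic_id in topics
)

for topic_id, n_years in years_per_topic.most_common():
    print(topic_id, n_years)
```

Topic 4 (fake news, China, border, military) covers the most years and is present in every year from 2016 onward, while topic 9 (china, business) last appears in 2015.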