# Text mining BGG comments - part 1 

In [1]:
import comments as cm

User comments on games are probably the most valuable resource on BGG. You can quickly get an overview of the thoughts of people who loved, hated or were indifferent to a game, and you can create your own easy-access panel of trusted commenters.

Recently I've been doing some text mining at work, and I thought it might be fun to try some things out on BGG comments. Using a Python wrapper round the BGG API that somebody had helpfully created, I spent several days grabbing all the comments for the top 2000 games in the BGG rankings and saved a local copy.

As I was thinking about techniques to try, I happened on an excellent geeklist from a few years ago, in which Alison devised a simple but very effective way of looking at the text in comments. For a given word ('filler', 'dice' etc.) she found the percentage of comments which contained the word, giving a quick idea of the characteristics of a game.
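The calculation is simple enough to sketch in a few lines of plain Python. This is only an illustration of the idea, not the code behind `cm.analyse_comments`: the function name `term_percentages`, the `(title, comment)` pair layout and the sample comments are all made up for the example, and a case-insensitive substring match is an assumption about how a "contains" test might reasonably work.

```python
from collections import defaultdict


def term_percentages(comments, term):
    """Return (title, matches, total, percent) rows, highest percent first.

    `comments` is an iterable of (title, comment_text) pairs. That layout is
    an assumption for this sketch, not how the saved comments are stored.
    """
    needle = term.lower()
    matches = defaultdict(int)
    totals = defaultdict(int)
    for title, text in comments:
        totals[title] += 1
        # Case-insensitive substring match, so 'egypt' also hits 'Egyptian'.
        if needle in text.lower():
            matches[title] += 1
    rows = [
        (title, matches[title], total, 100 * matches[title] / total)
        for title, total in totals.items()
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Invented toy comments, purely to exercise the function.
sample = [
    ("Can't Stop", "The classic push your luck game"),
    ("Can't Stop", "Push Your Luck at its purest"),
    ("Can't Stop", "Great filler"),
    ("Can't Stop", "Dice and tension"),
    ("Ra", "A tense push your luck auction"),
    ("Ra", "Auctions with an Egyptian theme"),
    ("Ra", "Love the art"),
    ("Ra", "One of my favourites"),
]

rows = term_percentages(sample, "push your luck")
for title, matched, total, percent in rows:
    print(title, matched, total, percent)
```

A substring test is crude (it can't tell 'dice' from 'dicey'), but it keeps the percentages easy to interpret.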

I replicated her approach and started looking at some terms. Obvious mechanical and thematic terms produce unsurprising results:

In [2]:
cm.analyse_comments('push your luck').head(10)

id,title,matches,total,term %
175117,Celestia,136,512,26.5625
169654,Deep Sea Adventure,163,637,25.588697
156009,Port Royal,226,918,24.618736
149155,Dead Man's Draw,104,489,21.267894
37759,Diamant,374,1997,18.728092
632,Cloud 9,129,709,18.19464
41,Can't Stop,412,2641,15.600151
15512,Diamant,274,1784,15.358744
150312,Welcome to the Dungeon,136,947,14.36114
98315,The Adventurers: The Pyramid of Horus,57,422,13.507109


In [3]:
cm.analyse_comments('egypt').head(10)

id,title,matches,total,term %
35435,Nefertiti,22,398,5.527638
150999,Valley of the Kings,27,523,5.162524
5404,Amun-Re,106,2112,5.018939
3931,Mare Nostrum,39,873,4.467354
67185,Sobek,11,274,4.014599
127023,Kemet,57,1463,3.896104
58421,Egizia,30,812,3.694581
175223,Valley of the Kings: Afterlife,6,167,3.592814
12,Ra,129,4057,3.179689
23418,Pursuit of Glory,4,128,3.125


but I found it interesting to look at some more nebulous terms and see how they are being used:

In [4]:
cm.analyse_comments('opaque').head(10)

id,title,matches,total,term %
35285,German Railways,9,242,3.719008
198953,Pax Renaissance,5,168,2.97619
165401,Wir sind das Volk!,6,239,2.51046
204,Stephenson's Rocket,14,651,2.150538
132018,Churchill,7,336,2.083333
75212,Grand Cru,4,207,1.932367
31730,Chicago Express,26,1671,1.555955
9215,Revolution: The Dutch Revolt 1568-1648,4,274,1.459854
29937,König von Siam,9,623,1.444623
75358,Paris Connection,7,489,1.431493


generated a great list of games I love or am interested in trying, and:

In [5]:
cm.analyse_comments('unique').head(10)

id,title,matches,total,term %
80006,Mord im Arosa,44,306,14.379085
150293,The Ravens of Thri Sahashri,19,136,13.970588
139952,Clockwork Wars,12,87,13.793103
192135,Too Many Bones,27,210,12.857143
142830,Chaosmos,18,141,12.765957
84889,Cave Evil,15,119,12.605042
164265,Witness,34,280,12.142857
380,Polarity,64,620,10.322581
148319,Tragedy Looper,48,498,9.638554
168433,The World of Smog: On Her Majesty's Service,13,136,9.558824


is fun too.

Next I thought about looking for links between games by searching for the title of one game in the comments of another. My favourite game is Tigris & Euphrates, so would searching for it turn up some interesting games to look at?

In [6]:
cm.analyse_comments('tigris').head(10)

id,title,matches,total,term %
127997,Qin,35,327,10.703364
25674,Khronos,46,458,10.043668
12962,Reef Encounter,93,1512,6.150794
111,Rheinländer,21,451,4.656319
145588,Citrus,5,145,3.448276
23730,Gheos,18,562,3.202847
42,Tigris & Euphrates,145,4913,2.951354
9674,Ingenious,91,3131,2.90642
3,Samurai,81,2971,2.726355
204,Stephenson's Rocket,13,651,1.996928


Not bad! Plenty of other Knizia tile-laying games, but also Khronos and Reef Encounter, the two names that often come up when anyone asks for 'similar to Tigris' games.

I then extended this to designers and looked at the games with comments most often mentioning Reiner Knizia. The top 60 (!) were all Knizia's own designs, but having excluded those I was left with:

In [7]:
cm.like_designer('knizia', 2).head(25)

id,title,term %
107190,Flash Duel: Second Edition,9.74359
118418,Divinare,3.383459
3800,Himalaya,3.162055
78954,Mousquetaires du Roy,2.547771
35435,Nefertiti,2.512563
21380,Conquest of the Fallen Lands,2.150538
42910,Peloponnes,2.145215
145588,Citrus,2.068966
154443,Madame Ching,1.796407
181796,The Prodigals Club,1.714286


A pretty great list of games that either have an obvious Knizia link (Flash Duel having controversially reimplemented En Garde) or a stylistic similarity.
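For anyone wanting to try the same thing, the "exclude the designer's own games" step might look something like the sketch below. It makes no attempt to reproduce `cm.like_designer` or its second argument, whose meaning isn't shown here. The function name `mentions_excluding_designer`, the `designers` lookup and the sample comments are hypothetical, and the data layout is assumed.

```python
def mentions_excluding_designer(comments, designers, name):
    """Rank games by the share of comments mentioning `name`.

    Games credited to a matching designer are skipped. `comments` is an
    iterable of (title, comment_text) pairs and `designers` maps a title to
    a list of designer names; both layouts are assumptions for this sketch.
    """
    needle = name.lower()
    counts = {}
    for title, text in comments:
        # Skip the designer's own games so they don't swamp the ranking.
        if any(needle in d.lower() for d in designers.get(title, [])):
            continue
        hits, total = counts.get(title, (0, 0))
        counts[title] = (hits + (needle in text.lower()), total + 1)
    rows = [(title, 100 * hits / total) for title, (hits, total) in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


designers = {
    "Samurai": ["Reiner Knizia"],
    "Flash Duel: Second Edition": ["David Sirlin"],
    "Citrus": ["Jeffrey D. Allers"],
}

# Invented toy comments, purely to exercise the function.
sample = [
    ("Samurai", "Knizia at his best"),
    ("Flash Duel: Second Edition", "A reworking of Knizia's En Garde"),
    ("Flash Duel: Second Edition", "Quick and tense"),
    ("Citrus", "Feels like a Knizia tile-layer"),
    ("Citrus", "Pretty board"),
    ("Citrus", "Plays well with two"),
    ("Citrus", "Solid"),
]

ranked = mentions_excluding_designer(sample, designers, "knizia")
for title, percent in ranked:
    print(title, percent)
```

Filtering on the designer credit rather than on the ranking position means co-designed games are dropped too, which may or may not be what you want.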

This was a fun start, but I had lots of ideas for other things I could try using this great data set. More posts will follow soon...