<html>
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta property="og:image" content="http://www.fastforwardlabs.com/cinephile_tsne/images/Cover.png" />
<meta property="og:url" content="http://www.fastforwardlabs.com/cinephile_tsne/" />
<meta property="og:type" content="article" />
<meta property="og:title" content="Visualizing The Taste of a Community of Cinephiles using t-SNE" />
<meta property="og:description" content="Exploring a community of cinephiles with an interactive visualization that clusters movies based on user ratings" />
<meta name="twitter:image" content="http://www.fastforwardlabs.com/cinephile_tsne/images/Cover.png" />
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="@FastForwardLabs" />
<meta name="twitter:title" content="Visualizing The Taste of a Community of Cinephiles using t-SNE" />
<meta name="twitter:description" content="Exploring a community of cinephiles with an interactive visualization that clusters movies based on user ratings" />
<link rel="stylesheet" type="text/css" href="app.css" />
<link rel="shortcut icon" type="image/png" href="favicon.png"/>
<script src="d3-min.js"></script>
<script src="d3-color.js"></script>
<script src="d3-interpolate.js"></script>
<script src="d3-quadtree.js"></script>
<script src="noframework.waypoints.min.js"></script>
<title>Visualizing the Taste of a Community of Cinephiles Using t-SNE</title>
</head>
<body>
<div class="mobileWarning">This visualization requires a larger screen. Please view it on a desktop or a laptop for the best experience!</div>
<div class="loadingBar">Loading the visualization; your screen may lock up</div>
<div class="everythingWrapper">
<div class="coverImageWrapper"><img src="images/Cover.png" class="coverImage"><span id="coverTitle">Visualizing the Taste of a Community of Cinephiles Using t-SNE</span></img></div>
<div class="textContainer">
<div class="para">
<div class="headline">Introduction</div>
<p class="paragraphText">
Social networks like Facebook and Twitter are created with communities in mind.
But there are also passive, implicit communities that are defined by their
members’ interactions with a service such as Netflix or Spotify.
</p>
<p class="paragraphText">
Defining such relationships and analysing the communities the connections
imply allows us to learn about both the community and the subject. For
instance, users of a music streaming service can be grouped together based on
the kind of music they like. These groups could be used to investigate shifting
trends, and the emergence and decline of subcultures.
</p>
<p class="paragraphText">
These insights are not only culturally significant, but also valuable from a
business perspective. Streaming services spend millions of dollars each year to
acquire content. Can we build tools that help them to study their userbase and
optimize their catalogue?
</p>
<p class="paragraphText">
In this post we are going to look at an interactive visualization that clusters
movies together based on user ratings. This visualization will give us a
glimpse into a community of cinephiles.
</p>
</div>
<div class="para">
<div class="headline">The Dataset</div>
<div class="imageContainer" id="mubi"><img src="images/MUBI.png" class="images"></img><div class="imageCaption"><a href="http://www.mubi.com" target="_blank">(Source)</a></div></div>
<p class="paragraphText" id="mubiHeadline">
MUBI is an online service that integrates a subscription video-on-demand service with a massive database. The service has a truly diverse selection of content from underground cult classics to Tarantino blockbusters that attracts cinephiles from all over the world. Its 8 million users have collectively rated and reviewed thousands of movies present in its database.
</p>
<p class="paragraphText">
Given this dataset, how can we visualize the relationship between these users and
the movies that they have rated? Let's take a look at a technique called t-SNE
that can help us solve this problem.
</p>
</div>
<div class="para">
<div class="headline">Introduction to t-SNE</div>
<div class="imageContainer" id="dreduction"><img src="images/dreduction.png" class="images"></img><div class="imageCaption">An example of a dimensionality reduction operation <br><a href="http://axon.cs.byu.edu/papers/gashler2011smc.pdf" target="_blank">(Source)</a></div></div>
<div class="imageContainer" id="graph"><img src="images/graph.png" class="images specialImage"></img><div class="imageCaption">A graph that visualizes four dimensions. Sugar per cup is encoded as the size of an icon<br><a href="http://www.flaticon.com/authors/madebyoliver" target="_blank">(Cute SVG illustrations source)</a></div></div>
<p class="paragraphText" id="graphpara">
Data visualization designers use a number of techniques and tricks to show the
relationship between items in a dataset. Position in a 2D visualization is the
obvious option, but scalar values such as rating or count can be shown by the
size of a point, and categorical variables can be shown using colored labels.
</p>
<table>
<tr>
<th>Type of Fruit</th>
<th>Weight</th>
<th>Cost</th>
<th>Sugar per cup</th>
</tr>
<tr>
<td>Watermelon</td>
<td>280</td>
<td>13.50</td>
<td>18</td>
</tr>
<tr>
<td>Strawberry</td>
<td>7</td>
<td>2</td>
<td>7</td>
</tr>
<tr>
<td>Strawberry</td>
<td>5</td>
<td>1</td>
<td>7</td>
</tr>
<tr>
<td>Orange</td>
<td>131</td>
<td>2</td>
<td>23</td>
</tr>
</table>
<p class="paragraphText">This visualization shows a dataset with four dimensions. Each of these
dimensions is visualized in a different way, using size, symbols and location.
</p>
<p class="paragraphText">
What if we have a dataset with thousands of dimensions? The designer would
soon run out of ways to visualize dimensions, and the resulting
visualization would probably be incomprehensible to the user.</p>
<p class="paragraphText" id="helloThere">
This is where t-SNE can help us. t-SNE (t-distributed stochastic neighbor
embedding) is a machine learning algorithm <a href="http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf">developed by Geoffrey Hinton and
Laurens van der Maaten</a> that reduces the dimensionality of a dataset. It is
good at reducing very high-dimensional data to two or three dimensions, which
makes the data much easier to visualize using simple techniques such as scatterplots.</p>
<p class="paragraphText">
Dimensionality reduction is an old technique. Principal component analysis, for
example, has been used to attack this problem <a href="http://stat.smmu.edu.cn/history/pearson1901.pdf">since 1901</a>. But t-SNE does
something that many other schemes do not: it maintains as much global and local
structure as it can. It does this by explicitly trying to keep points that are
close together before the reduction close together after it. Its
reduction is therefore well suited to both clustering and visualization.
</p>
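<p class="paragraphText">
To make the idea of "closeness" in the reduced space concrete, here is a minimal
sketch of the similarity measure the t-SNE paper defines for low-dimensional
points: each pair is weighted by 1 / (1 + squared distance), and the weights are
normalized so that they sum to one. The function name and the sample points are
our own illustration and are not taken from the code behind this visualization.
</p>

```javascript
// Student-t similarity between points in the reduced (2D) space.
// `lowDimSimilarities` and `samplePoints` are hypothetical names for this sketch.
function lowDimSimilarities(points) {
  const n = points.length;
  const q = points.map(() => new Array(n).fill(0));
  let total = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) continue; // a point has no similarity to itself
      const dx = points[i][0] - points[j][0];
      const dy = points[i][1] - points[j][1];
      const weight = 1 / (1 + dx * dx + dy * dy);
      q[i][j] = weight;
      total += weight;
    }
  }
  if (total === 0) return q; // fewer than two points: nothing to normalize
  // Normalize so that all pairwise similarities sum to 1.
  return q.map((row) => row.map((weight) => weight / total));
}

// Two nearby points and one far away point.
const samplePoints = [[0, 0], [1, 0], [4, 0]];
const similarities = lowDimSimilarities(samplePoints);
```

<p class="paragraphText">
In this toy example the pair of nearby points receives a larger similarity than
either pair involving the distant point. t-SNE adjusts the 2D positions until
these similarities match, as closely as it can, the similarities measured in the
original high-dimensional data.
</p>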
</div>
<div class="para">
<div class="headline">Process</div>
<div class="imageContainer" id="admatrix"><img src="images/adjmatrix.png" class="images specialImage"></img><div class="imageCaption">A matrix with m columns and n rows. <br>Notice that not all users rate every movie</div></div>
<p class="paragraphText" id="process">
We selected a subset of MUBI users who had rated at least 20 movies. An
adjacency matrix can be constructed for this subset of users, where each column is
a user, each row is a movie, and each entry is that user’s rating for that movie.
</p>
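<p class="paragraphText">
The construction described above can be sketched in a few lines of JavaScript.
This is an illustration under our own assumptions, not the code used for this
post: the record format, the function name <code>buildRatingMatrix</code>, and
the sample ratings are all hypothetical. Movies that a user has not rated are
left as <code>null</code>.
</p>

```javascript
// Hypothetical sample data: each record is one user's rating of one movie.
const ratings = [
  { user: "u1", movie: "Movie A", rating: 5 },
  { user: "u1", movie: "Movie B", rating: 3 },
  { user: "u2", movie: "Movie A", rating: 4 },
  { user: "u2", movie: "Movie C", rating: 2 },
  { user: "u3", movie: "Movie B", rating: 1 },
];

// Build a movies-by-users matrix (rows are movies, columns are users),
// keeping only users who have rated at least `minRatings` movies.
function buildRatingMatrix(records, minRatings) {
  const countByUser = new Map();
  for (const r of records) {
    countByUser.set(r.user, (countByUser.get(r.user) || 0) + 1);
  }
  const kept = records.filter((r) => countByUser.get(r.user) >= minRatings);

  // Index users and movies in order of first appearance.
  const users = [...new Set(kept.map((r) => r.user))];
  const movies = [...new Set(kept.map((r) => r.movie))];
  const userIndex = new Map(users.map((u, i) => [u, i]));
  const movieIndex = new Map(movies.map((m, i) => [m, i]));

  // Start with every cell empty, then fill in the ratings we have.
  const matrix = movies.map(() => new Array(users.length).fill(null));
  for (const r of kept) {
    matrix[movieIndex.get(r.movie)][userIndex.get(r.user)] = r.rating;
  }
  return { users, movies, matrix };
}

// The post uses a threshold of 20; 2 keeps the toy example small.
const result = buildRatingMatrix(ratings, 2);
// result.matrix is [[5, 4], [3, null], [null, 2]]
```

<p class="paragraphText">
In this toy example the third user is dropped for having too few ratings, and
the two remaining users each leave one movie unrated.
</p>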
<p class="paragraphText">
This multidimensional matrix will be t-SNE’s input. t-SNE will reduce the
number of dimensions to just two, which we can use as coordinates for an
interactive scatterplot.</p>
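<p class="paragraphText">
As a rough sketch of that last step, the two output dimensions can be rescaled
linearly to fit a drawing area. The coordinates, the function name
<code>toCanvasCoords</code>, and the canvas size below are assumptions made for
illustration; the visualization on this page may do this differently.
</p>

```javascript
// Hypothetical t-SNE output: one [x, y] pair per movie, in arbitrary units.
const embedding = [[-12.5, 3.0], [7.5, -1.0], [0.0, 9.0]];

// Linearly map embedding coordinates into a width x height pixel area,
// leaving `padding` pixels free on every side.
function toCanvasCoords(points, width, height, padding) {
  const xs = points.map((p) => p[0]);
  const ys = points.map((p) => p[1]);
  const minX = Math.min(...xs);
  const maxX = Math.max(...xs);
  const minY = Math.min(...ys);
  const maxY = Math.max(...ys);
  // Guard against a zero range (all points sharing one coordinate).
  const spanX = maxX - minX || 1;
  const spanY = maxY - minY || 1;
  return points.map(([x, y]) => [
    padding + ((x - minX) / spanX) * (width - 2 * padding),
    // Canvas y grows downward, so flip the vertical axis.
    padding + ((maxY - y) / spanY) * (height - 2 * padding),
  ]);
}

const pixels = toCanvasCoords(embedding, 800, 600, 20);
```

<p class="paragraphText">
Only the relative positions of the points carry meaning, so the rescaling changes
how the scatterplot fits the screen, not which movies appear close to each other.
</p>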
</div>
<div class="para">
<div class="headline">The Visualization</div>
<p class="paragraphText" id="guidedHandle">
Each movie in the visualization is represented as a square. The color of the
square represents the movie's genre. The position of each square is determined
by the t-SNE algorithm which only takes movie ratings by users as an input; the
algorithm is agnostic to the metadata of the movie itself such as genre,
director and year of release.</p>
</div>
<div class="para">
<div class="headline">Guided Tour</div>
<p class="guidedText">
Let's take a tour through some interesting features in our visualization.
Scroll along to trigger zoom interactions that will lead you to interesting
parts of the visualization!
</p>
<p class="guidedText" id="tour1">
Some genres in the visualization show strong clustering.
Clusters of movies of the same genre or the same country of origin indicate
that a subset of users has an affinity for that particular type of movie.
</p>
<p class="guidedText" id="tour2">
Since this is a dataset of cinephiles from all over the world, it's not
surprising that we see clusters of movies from different countries. You can see
many such clusters from countries with strong cinematic traditions such as
Italy, Iran, India, Turkey and Japan.
</p>
<p class="guidedText" id="tour3">
Sometimes clusters form around cinematic time periods, such as a cluster of
short black-and-white movies from the early 20th century.
</p>
<p class="guidedText" id="tour4">
As we go from left to right, the number of reviews of a movie increases.
As a result, popular movies stand out by themselves, far away from the crowd.
Similarly, filmographies of popular directors such as Tarantino or Kubrick,
denoted by a web of lines in the visualization, tend to be found on the right
side of the visualization.
</p>
</div>
<div class="para">
<div class="headline" id="sandbox">Sandbox</div>
<p class="paragraphText">
Time to get your hands dirty in the sandbox! You can zoom into the visualization
to get a more granular view of the clusters. You can also play around with different
settings to turn different genres on/off or search for your favorite movie. You can now
also click on a movie to view more information about it and see more movies by the same
director (depicted by lines).
</p>
</div>
</div>
</div>
<div id="canvasContainer">
<canvas id="mainCanvas"></canvas>
<div class="tooltip">
<div class="tooltipContent">
<div class="tooltipName"></div>
<div class="tooltipYear"></div>
<div class="tooltipDirector"></div>
<div class="tooltipGenre"></div>
<div class="tooltipAlert"></div>
</div>
</div>
<div id="canvasLabel"></div>
</div>
<div id="demo">
<div id="outerHUD">
<input type="text" placeholder="Search for movie..." id="searchBar">
<div id="HUD">
<div id="movieinfo">
<div id="closebutton"><div class="closeButtonX">✖</div></div>
<img id="movieImage"></img>
<div id="HUDcontainer">
<div class="HUDtextline">
<span id="movieTitle"></span>
<span id="movieYear"></span>
<span id="movieDirector"></span>
</div>
<div class="HUDtextline">
<span id="movieSynopsis"></span>
</div>
<div class="HUDtextline">
<div class="secondInfoLine">
<span id="movieTime"></span><span class="labelDesc"> min</span>
<span class="movieInfoLabel">Running time</span>
</div>
<div class="secondInfoLine">
<span id="movieGenre"></span>
<span class="movieInfoLabel">Genre</span>
</div>
<div class="secondInfoLine">
<img id="HUDflag"></img>
<span class="movieInfoLabel">Country of origin</span>
</div>
</div>
<div class="HUDtextline">
<div class="thirdInfoLine">
<span id="movieAverage"></span><span class="labelDesc"> /5</span>
<span class="movieInfoLabel">Average rating</span>
</div>
<div class="thirdInfoLine">
<span id="movieTotal"></span>
<span class="movieInfoLabel">Total ratings</span>
</div>
</div>
</div>
</div>
<div id="HUDcontent">
</div>
</div>
</div>
<div id="labelButtonContainer">
<span id="warning">You can only select 12 genres at a time</span>
<div class="labelButton" selected="true">Action</div>
<div class="labelButton" selected="true">Adventure</div>
<div class="labelButton" selected="true">Animation</div>
<div class="labelButton" selected="false">Anthology</div>
<div class="labelButton" selected="true">Avant-Garde</div>
<div class="labelButton" selected="false">Biography</div>
<div class="labelButton" selected="false">Comedy</div>
<div class="labelButton" selected="false">Commercial</div>
<div class="labelButton" selected="true">Crime</div>
<div class="labelButton" selected="false">Cult</div>
<div class="labelButton" selected="true">Documentary</div>
<div class="labelButton" selected="false">Drama</div>
<div class="labelButton" selected="false">Erotica</div>
<div class="labelButton" selected="false">Fantasy</div>
<div class="labelButton" selected="false">Family</div>
<div class="labelButton" selected="false">Film noir</div>
<div class="labelButton" selected="false">Gay & Lesbian</div>
<div class="labelButton" selected="false">History</div>
<div class="labelButton" selected="false">Horror</div>
<div class="labelButton" selected="false">Music</div>
<div class="labelButton" selected="false">Musical</div>
<div class="labelButton" selected="true">Music Video</div>
<div class="labelButton" selected="false">Mystery</div>
<div class="labelButton" selected="true">Romance</div>
<div class="labelButton" selected="false">Sci-Fi</div>
<div class="labelButton" selected="true">Short</div>
<div class="labelButton" selected="false">Silent</div>
<div class="labelButton" selected="false">Sport</div>
<div class="labelButton" selected="true">Thriller</div>
<div class="labelButton" selected="false">TV Mini-series</div>
<div class="labelButton" selected="false">TV Movie</div>
<div class="labelButton" selected="false">War</div>
<div class="labelButton" selected="false">Western</div>
</div>
<form>
<div class="checkboxContainer">
<input type="checkbox" id="hideUnlabeled" checked>Only show selected genres</input>
</div>
<div class="checkboxContainer">
<input type="checkbox" id="showCountries">Show country flags</input>
</div>
</form>
</div>
<div class="everythingWrapper">
<div class="textContainer">
<div class="para">
<div class="headline">Afterword</div>
<p class="paragraphText">
t-SNE is not a deterministic algorithm, and it has hyperparameters that affect the result. This visualization is an interpretation of the underlying data but is by no means the only interpretation. If you would like to learn more about the limitations of t-SNE, we highly recommend <a href="http://distill.pub/2016/misread-tsne/">this paper by Martin Wattenberg et al.</a>
</p>
</div>
</div>
</div>
<script type="text/javascript" src="ListNode.js"></script>
<script type="text/javascript" src="movieObject.js"></script>
<script type="text/javascript" src="movieList.js"></script>
<script type="text/javascript" src="cincity.js"></script>
<script type="text/javascript" src="interface.js"></script>
<footer>
<div id="footWrapper">
<div class="footBox" id="authors">
<span class="footerHeadline">Authors</span>
<div><span>Aditya Jain</span><a href="https://twitter.com/whaleandpetunia"><div class="logoContainer"><img class="twitlogo" src="images/twitter.png"></img></div></a></div>
<div><span>Micha Gorelick</span><a href="https://twitter.com/mynameisfiber"><div class="logoContainer"><img class="twitlogo" src="images/twitter.png"></img></div></a></div>
</div>
<div class="footBox" id="thanks">
<span class="footerHeadline">Thanks to</span>
<div><span>Grant Custer</span><a href="https://twitter.com/GrantCuster"><div class="logoContainer"><img class="twitlogo" src="images/twitter.png"></img></div></a></div>
<div><span>Mike Lee-Williams</span><a href="https://twitter.com/mikepqr"><div class="logoContainer"><img class="twitlogo" src="images/twitter.png"></img></div></a></div>
<span>The MUBI Team</span>
</div>
<div id="aboutff">
<span class="footerHeadline">About us</span><br>
<p>Fast Forward Labs is a machine intelligence research company</p>
<p>You can read more posts like this one on <a href="http://blog.fastforwardlabs.com/">our blog</a></p>
</div>
</div>
</footer>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-53030428-11', 'auto');
ga('send', 'pageview');
</script>
</body>
</html>