### Question tagging with NLP: the case of Stack Overflow

In [181]:
# Importing libraries
import pandas as pd
import nltk
import numpy as np
from bs4 import BeautifulSoup
import stop_words
import re
import spacy
from gensim.models import Phrases, phrases
from gensim import models, corpora
import pyLDAvis.gensim
from gensim.models.coherencemodel import CoherenceModel

### Loading the data

In [86]:
# Loading CSV File
data = pd.read_csv('QueryResults.csv')
data.head(3)

Unnamed: 0,Id,PostTypeId,AcceptedAnswerId,ParentId,CreationDate,DeletionDate,Score,ViewCount,Body,OwnerUserId,...,LastEditorDisplayName,LastEditDate,LastActivityDate,Title,Tags,AnswerCount,CommentCount,FavoriteCount,ClosedDate,CommunityOwnedDate
0,4,1,7.0,,2008-07-31 21:42:52,,564,36447.0,"<p>I want to use a track-bar to change a form's opacity.</p>\n\n<p>This is my code:</p>\n\n<pre><code>decimal trans = trackBar1.Value / 5000;\nthis.Opacity = trans;\n</code></pre>\n\n<p>When I build the application, it gives the following error:</p>\n\n<blockquote>\n <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code>.</p>\n</blockquote>\n\n<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>\n",8.0,...,Rich B,2018-07-02 17:55:27,2018-07-02 17:55:27,Convert Decimal to Double?,<c#><floating-point><type-conversion><double><decimal>,13.0,2,41.0,,2012-10-31 16:42:47
1,6,1,31.0,,2008-07-31 22:08:08,,253,16153.0,"<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <strong>percentage-based width</strong> on the child <code>div</code>, it collapses to '0' width on <a href=""http://en.wikipedia.org/wiki/Internet_Explorer_7"" rel=""noreferrer"">Internet&nbsp;Explorer&nbsp;7</a>, but not on Firefox or Safari.</p>\n\n<p>If I use <strong>pixel width</strong>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>\n\n<ol>\n<li>Is there something I'm missing here?</li>\n<li>Is there an easy fix for this besides the <em>pixel-based width</em> on the\nchild?</li>\n<li>Is there an area of the CSS specification that covers this?</li>\n</ol>\n",9.0,...,Rich B,2016-03-19 06:05:48,2016-03-19 06:10:52,Percentage width child element in absolutely positioned parent on Internet Explorer 7,<html><css><css3><internet-explorer-7>,5.0,0,10.0,,
2,7,2,,4.0,2008-07-31 22:17:57,,398,,<p>An explicit cast to double like this isn't necessary:</p>\n\n<pre><code>double trans = (double) trackBar1.Value / 5000.0;\n</code></pre>\n\n<p>Identifying the constant as <code>5000.0</code> (or as <code>5000d</code>) is sufficient:</p>\n\n<pre><code>double trans = trackBar1.Value / 5000.0;\ndouble trans = trackBar1.Value / 5000d;\n</code></pre>\n,9.0,...,,2017-12-16 05:06:57,2017-12-16 05:06:57,,,,0,,,


In [87]:
data.shape

(27412, 22)

In [88]:
# Percentage of missing data by variable
data.isnull().sum().sort_values(ascending=False)*100/data.shape[0]

DeletionDate             100.000000
ClosedDate               97.336933 
CommunityOwnedDate       94.786955 
FavoriteCount            85.568364 
AcceptedAnswerId         83.025682 
ViewCount                79.855538 
AnswerCount              79.855538 
Tags                     79.855538 
Title                    79.855538 
LastEditorDisplayName    76.444623 
LastEditorUserId         59.787684 
LastEditDate             59.357216 
ParentId                 20.144462 
OwnerDisplayName         15.828834 
OwnerUserId              1.926164  
Body                     0.000000  
Score                    0.000000  
CreationDate             0.000000  
LastActivityDate         0.000000  
CommentCount             0.000000  
PostTypeId               0.000000  
Id                       0.000000  
dtype: float64

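About 80% of the Tags and Title values are missing, while Body has no missing values. The missing share of Tags and Title (79.86%) matches the share of rows that have a ParentId (100 − 20.14%), which suggests that the untagged rows are the answers (PostTypeId = 2): answers carry neither a title nor tags. Since only the questions (about 20% of the posts) are tagged, this already points toward a semi-supervised or unsupervised approach.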
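A quick way to test whether the missing tags are exactly the answers is to cross-tabulate the post type against the presence of tags. The sketch below runs on a toy frame copying the first three rows of `data.head(3)`; `toy`, `check` and `questions` are illustrative names, not variables from the notebook.

```python
import pandas as pd

# Toy frame copying the first rows of data.head(3)
toy = pd.DataFrame({
    "Id": [4, 6, 7],
    "PostTypeId": [1, 1, 2],          # 1 = question, 2 = answer
    "ParentId": [None, None, 4.0],    # an answer points to its question
    "Tags": ["<c#><floating-point><type-conversion><double><decimal>",
             "<html><css><css3><internet-explorer-7>", None],
})

# Rows: post type; columns: whether the post has tags
check = pd.crosstab(toy["PostTypeId"], toy["Tags"].notna())
print(check)

# If tags exist only on questions, keeping questions == dropping rows without tags
questions = toy[toy["PostTypeId"] == 1]
```

On the full data, the same check would be `pd.crosstab(data['PostTypeId'], data['Tags'].notna())`.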

In [89]:
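# Keep only the text columns; .copy() makes data_imp an independent DataFrame, so adding columns later does not trigger SettingWithCopyWarning
data_imp = data[['Body','Title','Tags']].copy()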
data_imp.head()

Unnamed: 0,Body,Title,Tags
0,"<p>I want to use a track-bar to change a form's opacity.</p>\n\n<p>This is my code:</p>\n\n<pre><code>decimal trans = trackBar1.Value / 5000;\nthis.Opacity = trans;\n</code></pre>\n\n<p>When I build the application, it gives the following error:</p>\n\n<blockquote>\n <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code>.</p>\n</blockquote>\n\n<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>\n",Convert Decimal to Double?,<c#><floating-point><type-conversion><double><decimal>
1,"<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <strong>percentage-based width</strong> on the child <code>div</code>, it collapses to '0' width on <a href=""http://en.wikipedia.org/wiki/Internet_Explorer_7"" rel=""noreferrer"">Internet&nbsp;Explorer&nbsp;7</a>, but not on Firefox or Safari.</p>\n\n<p>If I use <strong>pixel width</strong>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>\n\n<ol>\n<li>Is there something I'm missing here?</li>\n<li>Is there an easy fix for this besides the <em>pixel-based width</em> on the\nchild?</li>\n<li>Is there an area of the CSS specification that covers this?</li>\n</ol>\n",Percentage width child element in absolutely positioned parent on Internet Explorer 7,<html><css><css3><internet-explorer-7>
2,<p>An explicit cast to double like this isn't necessary:</p>\n\n<pre><code>double trans = (double) trackBar1.Value / 5000.0;\n</code></pre>\n\n<p>Identifying the constant as <code>5000.0</code> (or as <code>5000d</code>) is sufficient:</p>\n\n<pre><code>double trans = trackBar1.Value / 5000.0;\ndouble trans = trackBar1.Value / 5000d;\n</code></pre>\n,,
3,"<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>\n",How do I calculate someone's age in C#?,<c#><.net><datetime>
4,"<p>Given a specific <code>DateTime</code> value, how do I display relative time, like:</p>\n\n<ul>\n<li>2 hours ago</li>\n<li>3 days ago</li>\n<li>a month ago</li>\n</ul>\n",Calculate relative time in C#,<c#><datetime><time><datediff><relative-time-span>


In [90]:
# #delete all missing tags values
# data_imp1 = data_imp.dropna(axis=0, subset="Tags")

### Tokenization 

In [91]:
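# Tags are stored as '<tag1><tag2>...'; extract them as a lowercase list
tag_tokenizer = nltk.RegexpTokenizer(r'<.*?>')

def reg_function(x):
    # Missing tags (NaN, i.e. answers) are returned unchanged
    if pd.isnull(x):
        return x
    return [tag.strip('<>') for tag in tag_tokenizer.tokenize(x.lower())]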
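Because a tag string is just a run of `<tag>` groups, the same parsing can be sketched with the standard-library `re` module alone. `parse_tags` and `TAG_PATTERN` are hypothetical names; like `reg_function`, the sketch passes missing values (None or NaN) through unchanged.

```python
import math
import re

# Capture the text between each '<' and '>'
TAG_PATTERN = re.compile(r"<([^<>]+)>")

def parse_tags(tags):
    """Split a Stack Overflow tag string like '<c#><.net>' into ['c#', '.net']."""
    # Missing values (None, or float NaN as read by pandas) are returned unchanged
    if tags is None or (isinstance(tags, float) and math.isnan(tags)):
        return tags
    return TAG_PATTERN.findall(tags.lower())

print(parse_tags("<c#><floating-point><type-conversion><double><decimal>"))
```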

In [92]:
data_imp['token_tag'] = data_imp['Tags'].apply(reg_function)
data_imp['token_tag'].head()

0    [c#, floating-point, type-conversion, double, decimal]
1    [html, css, css3, internet-explorer-7]                
2    NaN                                                   
3    [c#, .net, datetime]                                  
4    [c#, datetime, time, datediff, relative-time-span]    
Name: token_tag, dtype: object

In [93]:
data_imp.head()

Unnamed: 0,Body,Title,Tags,token_tag
0,"<p>I want to use a track-bar to change a form's opacity.</p>\n\n<p>This is my code:</p>\n\n<pre><code>decimal trans = trackBar1.Value / 5000;\nthis.Opacity = trans;\n</code></pre>\n\n<p>When I build the application, it gives the following error:</p>\n\n<blockquote>\n <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code>.</p>\n</blockquote>\n\n<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>\n",Convert Decimal to Double?,<c#><floating-point><type-conversion><double><decimal>,"[c#, floating-point, type-conversion, double, decimal]"
1,"<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <strong>percentage-based width</strong> on the child <code>div</code>, it collapses to '0' width on <a href=""http://en.wikipedia.org/wiki/Internet_Explorer_7"" rel=""noreferrer"">Internet&nbsp;Explorer&nbsp;7</a>, but not on Firefox or Safari.</p>\n\n<p>If I use <strong>pixel width</strong>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>\n\n<ol>\n<li>Is there something I'm missing here?</li>\n<li>Is there an easy fix for this besides the <em>pixel-based width</em> on the\nchild?</li>\n<li>Is there an area of the CSS specification that covers this?</li>\n</ol>\n",Percentage width child element in absolutely positioned parent on Internet Explorer 7,<html><css><css3><internet-explorer-7>,"[html, css, css3, internet-explorer-7]"
2,<p>An explicit cast to double like this isn't necessary:</p>\n\n<pre><code>double trans = (double) trackBar1.Value / 5000.0;\n</code></pre>\n\n<p>Identifying the constant as <code>5000.0</code> (or as <code>5000d</code>) is sufficient:</p>\n\n<pre><code>double trans = trackBar1.Value / 5000.0;\ndouble trans = trackBar1.Value / 5000d;\n</code></pre>\n,,,
3,"<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>\n",How do I calculate someone's age in C#?,<c#><.net><datetime>,"[c#, .net, datetime]"
4,"<p>Given a specific <code>DateTime</code> value, how do I display relative time, like:</p>\n\n<ul>\n<li>2 hours ago</li>\n<li>3 days ago</li>\n<li>a month ago</li>\n</ul>\n",Calculate relative time in C#,<c#><datetime><time><datediff><relative-time-span>,"[c#, datetime, time, datediff, relative-time-span]"


In [94]:
# EDA of tags: keep only the posts that have tags
data_imp1 = data_imp.dropna(subset=['Tags'])
data_imp1.head()

Unnamed: 0,Body,Title,Tags,token_tag
0,"<p>I want to use a track-bar to change a form's opacity.</p>\n\n<p>This is my code:</p>\n\n<pre><code>decimal trans = trackBar1.Value / 5000;\nthis.Opacity = trans;\n</code></pre>\n\n<p>When I build the application, it gives the following error:</p>\n\n<blockquote>\n <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code>.</p>\n</blockquote>\n\n<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>\n",Convert Decimal to Double?,<c#><floating-point><type-conversion><double><decimal>,"[c#, floating-point, type-conversion, double, decimal]"
1,"<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <strong>percentage-based width</strong> on the child <code>div</code>, it collapses to '0' width on <a href=""http://en.wikipedia.org/wiki/Internet_Explorer_7"" rel=""noreferrer"">Internet&nbsp;Explorer&nbsp;7</a>, but not on Firefox or Safari.</p>\n\n<p>If I use <strong>pixel width</strong>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>\n\n<ol>\n<li>Is there something I'm missing here?</li>\n<li>Is there an easy fix for this besides the <em>pixel-based width</em> on the\nchild?</li>\n<li>Is there an area of the CSS specification that covers this?</li>\n</ol>\n",Percentage width child element in absolutely positioned parent on Internet Explorer 7,<html><css><css3><internet-explorer-7>,"[html, css, css3, internet-explorer-7]"
3,"<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>\n",How do I calculate someone's age in C#?,<c#><.net><datetime>,"[c#, .net, datetime]"
4,"<p>Given a specific <code>DateTime</code> value, how do I display relative time, like:</p>\n\n<ul>\n<li>2 hours ago</li>\n<li>3 days ago</li>\n<li>a month ago</li>\n</ul>\n",Calculate relative time in C#,<c#><datetime><time><datediff><relative-time-span>,"[c#, datetime, time, datediff, relative-time-span]"
6,<p>Is there any standard way for a Web Server to be able to determine a user's timezone within a web page? </p>\n\n<p>Perhaps from an HTTP header or part of the user-agent string?</p>\n,Determine a User's Timezone,<javascript><html><browser><timezone><timezoneoffset>,"[javascript, html, browser, timezone, timezoneoffset]"


In [95]:
# Occurrence of Tags
# 1. Define a list of token values from dataframe
list_tok = data_imp1.token_tag.values
list_tok


array([list(['c#', 'floating-point', 'type-conversion', 'double', 'decimal']),
       list(['html', 'css', 'css3', 'internet-explorer-7']),
       list(['c#', '.net', 'datetime']), ..., list(['html', 'css']),
       list(['.net', 'dll']), list(['windows'])], dtype=object)

In [96]:
# 2. Flatten all tag lists into a single list
list_tag = [item for sublist in list_tok for item in sublist]

In [97]:
sorted(list_tag)

['.htaccess',
 '.htaccess',
 '.htaccess',
 '.htaccess',
 '.htpasswd',
 '.net',
 '.net',
 '.net',
 ...]

In [98]:
# Build a dict mapping each unique tag to its number of occurrences (groupby needs sorted input)
from itertools import groupby
freq = {key:len(list(group)) for key, group in groupby(np.sort(list_tag))} 
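`collections.Counter` builds the same tag-to-count mapping in one step, without sorting first, and `most_common` returns the top tags directly. A minimal sketch on a few tag lists copied from the outputs above:

```python
from collections import Counter
from itertools import chain

# A few token lists as produced by reg_function (copied from the outputs above)
list_tok = [
    ["c#", "floating-point", "type-conversion", "double", "decimal"],
    ["html", "css", "css3", "internet-explorer-7"],
    ["c#", ".net", "datetime"],
    ["c#", "datetime", "time", "datediff", "relative-time-span"],
]

# Flatten the lists and count each tag
freq = Counter(chain.from_iterable(list_tok))

print(freq.most_common(2))  # [('c#', 3), ('datetime', 2)]
```

Since `Counter` is a `dict` subclass, it can replace the `groupby` comprehension without changing later code that indexes `freq` by tag.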

In [99]:
freq

{'.htaccess': 4,
 '.htpasswd': 1,
 '.net': 628,
 '.net-1.0': 1,
 '.net-1.1': 9,
 '.net-2.0': 29,
 '.net-3.5': 54,
 '.net-assembly': 1,
 '.net-attributes': 2,
 '.net-client-profile': 3,
 '2d': 1,
 '2d-games': 1,
 '32-bit': 3,
 '32bit-64bit': 1,
 '3d': 2,
 '3d-engine': 1,
 '3des': 1,
 '64bit': 15,
 'abap': 1,
 'abstract-class': 1,
 'abstraction': 2,
 'access-specifier': 1,
 'accessibility': 2,
 'account': 2,
 'accurev': 1,
 'acl': 1,
 'acrobat': 1,
 'action': 1,
 'actionlistener': 1,
 'actionscript': 8,
 'actionscript-2': 1,
 'actionscript-3': 36,
 'activation': 2,
 'active-directory': 13,
 'active-directory-group': 1,
 'activemq': 3,
 'activerecord': 7,
 'activereports': 1,
 'activex': 5,
 'adam': 2,
 'adc': 1,
 'add-in': 5,
 'add-on': 3,
 'addclass': 1,
 'address-bar': 1,
 'administration': 6,
 'ado': 1,
 'ado.net': 16,
 'adobe': 10,
 'adobe-reader': 1,
 'adsl': 1,
 'advanced-queuing': 1,
 'agent-based-modeling': 1,
 'aggregation': 1,
 'agile': 12,
 'air': 12,
 'aix': 1,
 'ajax': 45,
 

In [100]:
# Tag counts as a pandas Series, most frequent tags first
X_tag = pd.Series(freq).sort_values(ascending=False)
X_tag.head(10)

In [101]:
import matplotlib.pyplot as plt

In [102]:
X_tag


### Construct plot
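One way to visualise the tag frequencies, assuming `freq` is the tag-to-count dict built above, is a horizontal bar chart of the most common tags. `plot_top_tags` and `top_n` are hypothetical names, and the sample counts are a few values copied from the `freq` output.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line inside Jupyter
import matplotlib.pyplot as plt

def plot_top_tags(freq, top_n=10):
    """Draw a horizontal bar chart of the top_n most frequent tags and return them."""
    top = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    labels = [tag for tag, _ in top]
    counts = [count for _, count in top]
    fig, ax = plt.subplots(figsize=(8, 0.4 * len(top) + 1))
    ax.barh(labels[::-1], counts[::-1])  # reversed so the most frequent tag is on top
    ax.set_xlabel("Number of questions")
    ax.set_title(f"Top {len(top)} tags")
    fig.tight_layout()
    return top, fig

# A few counts copied from the freq output above
sample = {".net": 628, ".net-3.5": 54, "ajax": 45, "actionscript-3": 36, ".net-2.0": 29}
top, fig = plot_top_tags(sample, top_n=3)
print(top)  # [('.net', 628), ('.net-3.5', 54), ('ajax', 45)]
```

A horizontal layout keeps long tag names such as `internet-explorer-7` readable on the y-axis.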

### Dealing with Body

In [71]:
pd.set_option('display.max_colwidth', None)  # show full cell contents without truncation

In [72]:
from tqdm.auto import tqdm  # progress bars that render in Jupyter
tqdm.pandas()  # registers Series.progress_map / progress_apply

In [73]:
data_imp[['Body']].head()

Unnamed: 0,Body
0,"<p>I want to use a track-bar to change a form's opacity.</p>\n\n<p>This is my code:</p>\n\n<pre><code>decimal trans = trackBar1.Value / 5000;\nthis.Opacity = trans;\n</code></pre>\n\n<p>When I build the application, it gives the following error:</p>\n\n<blockquote>\n <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code>.</p>\n</blockquote>\n\n<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>\n"
1,"<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <strong>percentage-based width</strong> on the child <code>div</code>, it collapses to '0' width on <a href=""http://en.wikipedia.org/wiki/Internet_Explorer_7"" rel=""noreferrer"">Internet&nbsp;Explorer&nbsp;7</a>, but not on Firefox or Safari.</p>\n\n<p>If I use <strong>pixel width</strong>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>\n\n<ol>\n<li>Is there something I'm missing here?</li>\n<li>Is there an easy fix for this besides the <em>pixel-based width</em> on the\nchild?</li>\n<li>Is there an area of the CSS specification that covers this?</li>\n</ol>\n"
2,<p>An explicit cast to double like this isn't necessary:</p>\n\n<pre><code>double trans = (double) trackBar1.Value / 5000.0;\n</code></pre>\n\n<p>Identifying the constant as <code>5000.0</code> (or as <code>5000d</code>) is sufficient:</p>\n\n<pre><code>double trans = trackBar1.Value / 5000.0;\ndouble trans = trackBar1.Value / 5000d;\n</code></pre>\n
3,"<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>\n"
4,"<p>Given a specific <code>DateTime</code> value, how do I display relative time, like:</p>\n\n<ul>\n<li>2 hours ago</li>\n<li>3 days ago</li>\n<li>a month ago</li>\n</ul>\n"


In [130]:
# Cleaning text
def get_text(x):
    x = BeautifulSoup(x, 'lxml').get_text()  # strip the HTML markup, keep the text content
    tokenizer = nltk.RegexpTokenizer(r'\w+')  # keep runs of word characters; drops punctuation and newlines
    x = tokenizer.tokenize(x.lower())  # lowercase, then tokenize
    return x
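For readers without BeautifulSoup or lxml, the same two steps (drop the HTML markup, then keep lowercase `\w+` tokens) can be sketched with the standard library's `html.parser`. `TextExtractor` and `get_text_stdlib` are hypothetical names, and results may differ from BeautifulSoup on malformed markup.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, ignoring the tags."""
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &nbsp;
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def get_text_stdlib(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts)
    # Same rule as nltk.RegexpTokenizer(r'\w+'): keep runs of word characters
    return re.findall(r"\w+", text.lower())

body = "<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>\n"
print(get_text_stdlib(body))
```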

In [131]:
data_imp['token_body'] = data_imp['Body'].apply(get_text)
data_imp[['token_body']].head()

Unnamed: 0,token_body
0,"[i, want, to, use, a, track, bar, to, change, a, form, s, opacity, this, is, my, code, decimal, trans, trackbar1, value, 5000, this, opacity, trans, when, i, build, the, application, it, gives, the, following, error, cannot, implicitly, convert, type, decimal, to, double, i, tried, using, trans, and, double, but, then, the, control, doesn, t, work, this, code, worked, fine, in, a, past, vb, net, project]"
1,"[i, have, an, absolutely, positioned, div, containing, several, children, one, of, which, is, a, relatively, positioned, div, when, i, use, a, percentage, based, width, on, the, child, div, it, collapses, to, 0, width, on, internet, explorer, 7, but, not, on, firefox, or, safari, if, i, use, pixel, width, it, works, if, the, parent, is, relatively, positioned, the, percentage, width, on, the, child, works, is, there, something, i, m, missing, here, is, there, an, easy, fix, for, this, besides, the, pixel, based, width, on, the, child, is, there, an, area, of, the, css, specification, that, covers, this]"
2,"[an, explicit, cast, to, double, like, this, isn, t, necessary, double, trans, double, trackbar1, value, 5000, 0, identifying, the, constant, as, 5000, 0, or, as, 5000d, is, sufficient, double, trans, trackbar1, value, 5000, 0, double, trans, trackbar1, value, 5000d]"
3,"[given, a, datetime, representing, a, person, s, birthday, how, do, i, calculate, their, age, in, years]"
4,"[given, a, specific, datetime, value, how, do, i, display, relative, time, like, 2, hours, ago, 3, days, ago, a, month, ago]"


In [134]:
# Define stopwords
sw = stop_words.get_stop_words(language='en')

In [139]:
# Remove stop words, digits, underscores and one-character tokens (tokens are already lowercase)
data_imp['tokens_clean'] = data_imp.token_body.map(
    lambda tok: [t for t in re.sub(r"(\W+|_|\d+)", " ", " ".join(tok)).split()
                 if t not in sw and len(t) > 1]
)


In [140]:
data_imp[['tokens_clean']].head()

Unnamed: 0,tokens_clean
0,"[want, use, track, bar, change, form, opacity, code, decimal, trans, trackbar, value, opacity, trans, build, application, gives, following, error, implicitly, convert, type, decimal, double, tried, using, trans, double, control, doesn, work, code, worked, fine, past, vb, net, project]"
1,"[absolutely, positioned, div, containing, several, children, one, relatively, positioned, div, use, percentage, based, width, child, div, collapses, width, internet, explorer, firefox, safari, use, pixel, width, works, parent, relatively, positioned, percentage, width, child, works, something, missing, easy, fix, besides, pixel, based, width, child, area, css, specification, covers]"
2,"[explicit, cast, double, like, isn, necessary, double, trans, double, trackbar, value, identifying, constant, sufficient, double, trans, trackbar, value, double, trans, trackbar, value]"
3,"[given, datetime, representing, person, birthday, calculate, age, years]"
4,"[given, specific, datetime, value, display, relative, time, like, hours, ago, days, ago, month, ago]"
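The filter joins the tokens, turns non-word characters, underscores and digits into spaces, re-splits, and keeps the tokens that are not stop words and are longer than one character. A standalone sketch of that rule, with a tiny hypothetical stop-word set standing in for `stop_words.get_stop_words('en')`:

```python
import re

def clean_tokens(tokens, stopwords):
    """Drop digits, underscores, stop words and one-character tokens."""
    text = re.sub(r"(\W+|_|\d+)", " ", " ".join(tokens))  # digits/underscores become separators
    return [t for t in text.split() if t not in stopwords and len(t) > 1]

# Hypothetical stop-word subset; the notebook uses the full English list
sw = {"to", "the", "is", "this"}
tokens = ["i", "want", "to", "use", "a", "track", "bar", "trackbar1", "value", "5000"]
print(clean_tokens(tokens, sw))  # ['want', 'use', 'track', 'bar', 'trackbar', 'value']
```

This is why `trackbar1` becomes `trackbar` and `5000` disappears in the `tokens_clean` output.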


In [154]:
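# Lemmatization with spaCy (the parser and NER are not needed for lemmas, so they are disabled to save time)
nlp = spacy.load('en_core_web_md', disable=['parser', 'ner'])
data_imp['tokens_clean_lemma'] = data_imp.tokens_clean.progress_map(
    lambda x: [tok.lemma_ for tok in nlp(' '.join(x))])
data_imp.head()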

 14%|███████████▏                                                                 | 3965/27412 [01:40<08:58, 43.52it/s]








 14%|███████████▏                                                                 | 3970/27412 [01:40<08:57, 43.63it/s]








 15%|███████████▏                                                                 | 3977/27412 [01:41<07:59, 48.83it/s]








 15%|███████████▏                                                                 | 3983/27412 [01:41<08:06, 48.17it/s]








 15%|███████████▏                                                                 | 3989/27412 [01:41<07:58, 48.92it/s]








 15%|███████████▏                                                                 | 3995/27412 [01:41<08:48, 44.30it/s]








 15%|███████████▏                                                                 | 4001/27412 [01:41<08:09, 47.79it/s]








 15%|███████████▎                                                                 | 4006/27412 [01:41<09

 16%|████████████▏                                                                | 4320/27412 [01:48<08:40, 44.36it/s]








 16%|████████████▏                                                                | 4326/27412 [01:48<08:05, 47.52it/s]








 16%|████████████▏                                                                | 4331/27412 [01:48<08:04, 47.60it/s]








 16%|████████████▏                                                                | 4336/27412 [01:48<08:19, 46.19it/s]








 16%|████████████▏                                                                | 4341/27412 [01:48<08:52, 43.31it/s]








 16%|████████████▏                                                                | 4348/27412 [01:49<08:02, 47.76it/s]








 16%|████████████▏                                                                | 4353/27412 [01:49<08:01, 47.87it/s]








 16%|████████████▏                                                                | 4358/27412 [01:49<08

 17%|█████████████                                                                | 4668/27412 [01:56<11:00, 34.45it/s]








 17%|█████████████                                                                | 4672/27412 [01:56<11:15, 33.65it/s]








 17%|█████████████▏                                                               | 4676/27412 [01:56<11:38, 32.54it/s]








 17%|█████████████▏                                                               | 4680/27412 [01:56<11:03, 34.27it/s]








 17%|█████████████▏                                                               | 4684/27412 [01:56<11:23, 33.27it/s]








 17%|█████████████▏                                                               | 4689/27412 [01:56<10:36, 35.69it/s]








 17%|█████████████▏                                                               | 4693/27412 [01:57<12:05, 31.30it/s]








 17%|█████████████▏                                                               | 4698/27412 [01:57<10

 18%|██████████████▏                                                              | 5063/27412 [02:04<09:10, 40.61it/s]








 18%|██████████████▏                                                              | 5068/27412 [02:04<09:04, 41.04it/s]








 19%|██████████████▎                                                              | 5074/27412 [02:04<08:38, 43.10it/s]








 19%|██████████████▎                                                              | 5081/27412 [02:04<07:39, 48.59it/s]








 19%|██████████████▎                                                              | 5088/27412 [02:04<07:06, 52.38it/s]








 19%|██████████████▎                                                              | 5094/27412 [02:04<07:01, 52.97it/s]








 19%|██████████████▎                                                              | 5100/27412 [02:04<07:43, 48.17it/s]








 19%|██████████████▎                                                              | 5106/27412 [02:05<07

 20%|███████████████▎                                                             | 5465/27412 [02:11<06:54, 52.94it/s]








 20%|███████████████▎                                                             | 5471/27412 [02:12<06:43, 54.38it/s]








 20%|███████████████▍                                                             | 5477/27412 [02:12<06:54, 52.94it/s]








 20%|███████████████▍                                                             | 5483/27412 [02:12<08:21, 43.69it/s]








 20%|███████████████▍                                                             | 5488/27412 [02:12<09:19, 39.17it/s]








 20%|███████████████▍                                                             | 5493/27412 [02:12<09:12, 39.67it/s]








 20%|███████████████▍                                                             | 5498/27412 [02:12<08:59, 40.61it/s]








 20%|███████████████▍                                                             | 5503/27412 [02:12<09

 21%|████████████████▍                                                            | 5855/27412 [02:20<08:35, 41.81it/s]








 21%|████████████████▍                                                            | 5862/27412 [02:20<07:33, 47.50it/s]








 21%|████████████████▍                                                            | 5868/27412 [02:20<07:07, 50.44it/s]








 21%|████████████████▌                                                            | 5875/27412 [02:20<06:37, 54.19it/s]








 21%|████████████████▌                                                            | 5881/27412 [02:20<06:51, 52.36it/s]








 21%|████████████████▌                                                            | 5887/27412 [02:20<07:12, 49.83it/s]








 21%|████████████████▌                                                            | 5893/27412 [02:20<07:21, 48.74it/s]








 22%|████████████████▌                                                            | 5899/27412 [02:20<07

 23%|█████████████████▍                                                           | 6218/27412 [02:28<06:49, 51.71it/s]








 23%|█████████████████▍                                                           | 6224/27412 [02:28<07:09, 49.37it/s]








 23%|█████████████████▌                                                           | 6230/27412 [02:28<06:51, 51.43it/s]








 23%|█████████████████▌                                                           | 6236/27412 [02:29<07:24, 47.64it/s]








 23%|█████████████████▌                                                           | 6241/27412 [02:29<08:44, 40.37it/s]








 23%|█████████████████▌                                                           | 6248/27412 [02:29<07:39, 46.03it/s]








 23%|█████████████████▌                                                           | 6254/27412 [02:29<07:14, 48.73it/s]








 23%|█████████████████▌                                                           | 6262/27412 [02:29<06

 24%|██████████████████▍                                                          | 6565/27412 [02:37<05:47, 59.96it/s]








 24%|██████████████████▍                                                          | 6573/27412 [02:38<05:49, 59.70it/s]








 24%|██████████████████▍                                                          | 6580/27412 [02:38<05:43, 60.62it/s]








 24%|██████████████████▌                                                          | 6589/27412 [02:38<05:20, 64.97it/s]








 24%|██████████████████▌                                                          | 6596/27412 [02:38<05:49, 59.64it/s]








 24%|██████████████████▌                                                          | 6603/27412 [02:38<07:35, 45.66it/s]








 24%|██████████████████▌                                                          | 6609/27412 [02:38<07:34, 45.77it/s]








 24%|██████████████████▌                                                          | 6615/27412 [02:38<07

 25%|███████████████████▌                                                         | 6985/27412 [02:46<09:28, 35.95it/s]








 26%|███████████████████▋                                                         | 6992/27412 [02:46<08:12, 41.48it/s]








 26%|███████████████████▋                                                         | 6998/27412 [02:46<07:32, 45.16it/s]








 26%|███████████████████▋                                                         | 7005/27412 [02:46<06:44, 50.40it/s]








 26%|███████████████████▋                                                         | 7011/27412 [02:46<06:35, 51.54it/s]








 26%|███████████████████▋                                                         | 7017/27412 [02:46<07:15, 46.83it/s]








 26%|███████████████████▋                                                         | 7023/27412 [02:47<07:40, 44.31it/s]








 26%|███████████████████▋                                                         | 7029/27412 [02:47<07

 27%|████████████████████▌                                                        | 7340/27412 [02:54<07:08, 46.82it/s]








 27%|████████████████████▋                                                        | 7345/27412 [02:54<07:22, 45.35it/s]








 27%|████████████████████▋                                                        | 7350/27412 [02:54<07:45, 43.11it/s]








 27%|████████████████████▋                                                        | 7355/27412 [02:55<07:36, 43.93it/s]








 27%|████████████████████▋                                                        | 7361/27412 [02:55<07:03, 47.39it/s]








 27%|████████████████████▋                                                        | 7367/27412 [02:55<06:40, 50.04it/s]








 27%|████████████████████▋                                                        | 7373/27412 [02:55<06:30, 51.34it/s]








 27%|████████████████████▋                                                        | 7379/27412 [02:55<07

 28%|█████████████████████▋                                                       | 7731/27412 [03:02<10:01, 32.73it/s]








 28%|█████████████████████▋                                                       | 7737/27412 [03:02<08:46, 37.35it/s]








 28%|█████████████████████▊                                                       | 7744/27412 [03:02<08:32, 38.36it/s]








 28%|█████████████████████▊                                                       | 7752/27412 [03:03<07:17, 44.97it/s]








 28%|█████████████████████▊                                                       | 7758/27412 [03:03<07:15, 45.18it/s]








 28%|█████████████████████▊                                                       | 7764/27412 [03:03<07:05, 46.17it/s]








 28%|█████████████████████▊                                                       | 7772/27412 [03:03<06:19, 51.71it/s]








 28%|█████████████████████▊                                                       | 7779/27412 [03:03<05

 30%|██████████████████████▉                                                      | 8147/27412 [03:10<06:23, 50.28it/s]








 30%|██████████████████████▉                                                      | 8153/27412 [03:10<06:38, 48.33it/s]








 30%|██████████████████████▉                                                      | 8159/27412 [03:10<06:18, 50.90it/s]








 30%|██████████████████████▉                                                      | 8165/27412 [03:11<06:26, 49.83it/s]








 30%|██████████████████████▉                                                      | 8171/27412 [03:11<06:07, 52.32it/s]








 30%|██████████████████████▉                                                      | 8177/27412 [03:11<07:58, 40.18it/s]








 30%|██████████████████████▉                                                      | 8186/27412 [03:11<06:49, 46.92it/s]








 30%|███████████████████████                                                      | 8192/27412 [03:11<06

 31%|███████████████████████▉                                                     | 8525/27412 [03:18<07:26, 42.26it/s]








 31%|███████████████████████▉                                                     | 8530/27412 [03:19<07:37, 41.31it/s]








 31%|███████████████████████▉                                                     | 8535/27412 [03:19<07:23, 42.60it/s]








 31%|███████████████████████▉                                                     | 8541/27412 [03:19<06:48, 46.25it/s]








 31%|████████████████████████                                                     | 8547/27412 [03:19<06:34, 47.85it/s]








 31%|████████████████████████                                                     | 8553/27412 [03:19<06:31, 48.20it/s]








 31%|████████████████████████                                                     | 8558/27412 [03:19<06:36, 47.50it/s]








 31%|████████████████████████                                                     | 8564/27412 [03:19<07

 32%|████████████████████████▉                                                    | 8861/27412 [03:27<09:25, 32.81it/s]








 32%|████████████████████████▉                                                    | 8865/27412 [03:27<08:56, 34.56it/s]








 32%|████████████████████████▉                                                    | 8869/27412 [03:27<09:01, 34.24it/s]








 32%|████████████████████████▉                                                    | 8874/27412 [03:27<08:28, 36.47it/s]








 32%|████████████████████████▉                                                    | 8878/27412 [03:27<09:17, 33.22it/s]








 32%|████████████████████████▉                                                    | 8882/27412 [03:27<09:51, 31.30it/s]








 32%|████████████████████████▉                                                    | 8888/27412 [03:28<08:32, 36.16it/s]








 32%|████████████████████████▉                                                    | 8895/27412 [03:28<07

 34%|█████████████████████████▊                                                   | 9188/27412 [03:35<07:50, 38.75it/s]








 34%|█████████████████████████▊                                                   | 9193/27412 [03:35<07:58, 38.10it/s]








 34%|█████████████████████████▊                                                   | 9197/27412 [03:35<08:31, 35.62it/s]








 34%|█████████████████████████▊                                                   | 9201/27412 [03:35<08:36, 35.24it/s]








 34%|█████████████████████████▊                                                   | 9206/27412 [03:35<09:25, 32.20it/s]








 34%|█████████████████████████▊                                                   | 9211/27412 [03:35<08:33, 35.45it/s]








 34%|█████████████████████████▉                                                   | 9217/27412 [03:35<07:46, 39.01it/s]








 34%|█████████████████████████▉                                                   | 9222/27412 [03:36<07

 35%|██████████████████████████▋                                                  | 9504/27412 [03:43<07:54, 37.71it/s]








 35%|██████████████████████████▋                                                  | 9509/27412 [03:43<07:36, 39.24it/s]








 35%|██████████████████████████▋                                                  | 9514/27412 [03:43<07:35, 39.33it/s]








 35%|██████████████████████████▋                                                  | 9520/27412 [03:43<06:48, 43.77it/s]








 35%|██████████████████████████▊                                                  | 9525/27412 [03:43<06:51, 43.42it/s]








 35%|██████████████████████████▊                                                  | 9530/27412 [03:44<06:47, 43.86it/s]








 35%|██████████████████████████▊                                                  | 9537/27412 [03:44<06:10, 48.23it/s]








 35%|██████████████████████████▊                                                  | 9543/27412 [03:44<06

 36%|███████████████████████████▋                                                 | 9846/27412 [03:51<06:04, 48.13it/s]








 36%|███████████████████████████▋                                                 | 9851/27412 [03:51<06:16, 46.59it/s]








 36%|███████████████████████████▋                                                 | 9856/27412 [03:51<06:17, 46.46it/s]








 36%|███████████████████████████▋                                                 | 9862/27412 [03:51<06:02, 48.47it/s]








 36%|███████████████████████████▋                                                 | 9870/27412 [03:51<05:19, 54.94it/s]








 36%|███████████████████████████▋                                                 | 9876/27412 [03:51<05:52, 49.72it/s]








 36%|███████████████████████████▊                                                 | 9882/27412 [03:52<06:37, 44.05it/s]








 36%|███████████████████████████▊                                                 | 9887/27412 [03:52<06

 37%|████████████████████████████▎                                               | 10191/27412 [03:59<06:04, 47.24it/s]








 37%|████████████████████████████▎                                               | 10197/27412 [03:59<05:43, 50.05it/s]








 37%|████████████████████████████▎                                               | 10203/27412 [03:59<06:12, 46.17it/s]








 37%|████████████████████████████▎                                               | 10209/27412 [03:59<06:18, 45.39it/s]








 37%|████████████████████████████▎                                               | 10214/27412 [03:59<06:25, 44.64it/s]








 37%|████████████████████████████▎                                               | 10222/27412 [04:00<05:51, 48.92it/s]








 37%|████████████████████████████▎                                               | 10228/27412 [04:00<05:51, 48.84it/s]








 37%|████████████████████████████▎                                               | 10234/27412 [04:00<06

 38%|█████████████████████████████▏                                              | 10534/27412 [04:07<06:35, 42.70it/s]








 38%|█████████████████████████████▏                                              | 10539/27412 [04:07<06:41, 42.03it/s]








 38%|█████████████████████████████▏                                              | 10545/27412 [04:07<06:18, 44.55it/s]








 38%|█████████████████████████████▏                                              | 10550/27412 [04:07<06:30, 43.22it/s]








 39%|█████████████████████████████▎                                              | 10555/27412 [04:07<06:18, 44.54it/s]








 39%|█████████████████████████████▎                                              | 10560/27412 [04:08<06:34, 42.75it/s]








 39%|█████████████████████████████▎                                              | 10565/27412 [04:08<06:30, 43.09it/s]








 39%|█████████████████████████████▎                                              | 10570/27412 [04:08<06

 40%|██████████████████████████████▏                                             | 10868/27412 [04:15<06:58, 39.52it/s]








 40%|██████████████████████████████▏                                             | 10873/27412 [04:15<06:58, 39.48it/s]








 40%|██████████████████████████████▏                                             | 10878/27412 [04:15<06:58, 39.50it/s]








 40%|██████████████████████████████▏                                             | 10883/27412 [04:15<06:47, 40.57it/s]








 40%|██████████████████████████████▏                                             | 10888/27412 [04:15<06:28, 42.54it/s]








 40%|██████████████████████████████▏                                             | 10896/27412 [04:15<05:34, 49.43it/s]








 40%|██████████████████████████████▏                                             | 10903/27412 [04:15<05:08, 53.49it/s]








 40%|██████████████████████████████▏                                             | 10909/27412 [04:15<05

 41%|███████████████████████████████                                             | 11217/27412 [04:23<06:09, 43.83it/s]








 41%|███████████████████████████████                                             | 11223/27412 [04:23<05:45, 46.92it/s]








 41%|███████████████████████████████▏                                            | 11229/27412 [04:23<05:27, 49.42it/s]








 41%|███████████████████████████████▏                                            | 11235/27412 [04:23<05:32, 48.71it/s]








 41%|███████████████████████████████▏                                            | 11241/27412 [04:23<05:21, 50.29it/s]








 41%|███████████████████████████████▏                                            | 11247/27412 [04:23<05:59, 44.96it/s]








 41%|███████████████████████████████▏                                            | 11253/27412 [04:23<05:35, 48.16it/s]








 41%|███████████████████████████████▏                                            | 11258/27412 [04:23<05

 42%|████████████████████████████████                                            | 11563/27412 [04:31<06:37, 39.88it/s]








 42%|████████████████████████████████                                            | 11568/27412 [04:31<07:59, 33.06it/s]








 42%|████████████████████████████████                                            | 11572/27412 [04:31<08:50, 29.88it/s]








 42%|████████████████████████████████                                            | 11576/27412 [04:31<10:12, 25.86it/s]








 42%|████████████████████████████████                                            | 11579/27412 [04:31<10:40, 24.71it/s]








 42%|████████████████████████████████                                            | 11582/27412 [04:31<13:32, 19.48it/s]








 42%|████████████████████████████████                                            | 11585/27412 [04:32<13:53, 18.98it/s]








 42%|████████████████████████████████▏                                           | 11588/27412 [04:32<12

 43%|████████████████████████████████▋                                           | 11810/27412 [04:40<07:31, 34.57it/s]








 43%|████████████████████████████████▊                                           | 11815/27412 [04:40<07:35, 34.27it/s]








 43%|████████████████████████████████▊                                           | 11819/27412 [04:40<08:57, 29.03it/s]








 43%|████████████████████████████████▊                                           | 11824/27412 [04:40<08:31, 30.47it/s]








 43%|████████████████████████████████▊                                           | 11828/27412 [04:40<08:12, 31.64it/s]








 43%|████████████████████████████████▊                                           | 11832/27412 [04:40<08:10, 31.74it/s]








 43%|████████████████████████████████▊                                           | 11836/27412 [04:40<07:41, 33.72it/s]








 43%|████████████████████████████████▊                                           | 11842/27412 [04:41<06

 44%|█████████████████████████████████▋                                          | 12131/27412 [04:48<06:42, 37.99it/s]








 44%|█████████████████████████████████▋                                          | 12136/27412 [04:48<06:13, 40.92it/s]








 44%|█████████████████████████████████▋                                          | 12141/27412 [04:48<05:54, 43.13it/s]








 44%|█████████████████████████████████▋                                          | 12147/27412 [04:48<05:37, 45.29it/s]








 44%|█████████████████████████████████▋                                          | 12152/27412 [04:48<05:41, 44.67it/s]








 44%|█████████████████████████████████▋                                          | 12157/27412 [04:48<05:43, 44.37it/s]








 44%|█████████████████████████████████▋                                          | 12162/27412 [04:48<06:10, 41.15it/s]








 44%|█████████████████████████████████▋                                          | 12167/27412 [04:49<06

 45%|██████████████████████████████████▌                                         | 12447/27412 [04:56<06:07, 40.71it/s]








 45%|██████████████████████████████████▌                                         | 12452/27412 [04:56<06:27, 38.58it/s]








 45%|██████████████████████████████████▌                                         | 12457/27412 [04:56<06:42, 37.17it/s]








 45%|██████████████████████████████████▌                                         | 12462/27412 [04:56<06:18, 39.46it/s]








 45%|██████████████████████████████████▌                                         | 12467/27412 [04:56<06:25, 38.80it/s]








 45%|██████████████████████████████████▌                                         | 12472/27412 [04:57<06:08, 40.55it/s]








 46%|██████████████████████████████████▌                                         | 12477/27412 [04:57<06:04, 40.95it/s]








 46%|██████████████████████████████████▌                                         | 12482/27412 [04:57<06

 47%|███████████████████████████████████▍                                        | 12796/27412 [05:04<06:12, 39.21it/s]








 47%|███████████████████████████████████▍                                        | 12801/27412 [05:04<06:27, 37.67it/s]








 47%|███████████████████████████████████▌                                        | 12806/27412 [05:04<06:13, 39.13it/s]








 47%|███████████████████████████████████▌                                        | 12811/27412 [05:04<06:07, 39.73it/s]








 47%|███████████████████████████████████▌                                        | 12818/27412 [05:04<05:30, 44.18it/s]








 47%|███████████████████████████████████▌                                        | 12824/27412 [05:05<05:28, 44.36it/s]








 47%|███████████████████████████████████▌                                        | 12829/27412 [05:05<05:25, 44.79it/s]








 47%|███████████████████████████████████▌                                        | 12834/27412 [05:05<06

 48%|████████████████████████████████████▍                                       | 13139/27412 [05:12<05:49, 40.79it/s]








 48%|████████████████████████████████████▍                                       | 13144/27412 [05:12<05:48, 40.91it/s]








 48%|████████████████████████████████████▍                                       | 13149/27412 [05:12<06:17, 37.78it/s]








 48%|████████████████████████████████████▍                                       | 13154/27412 [05:12<06:06, 38.93it/s]








 48%|████████████████████████████████████▍                                       | 13158/27412 [05:12<06:09, 38.60it/s]








 48%|████████████████████████████████████▍                                       | 13163/27412 [05:13<06:01, 39.39it/s]








 48%|████████████████████████████████████▌                                       | 13170/27412 [05:13<05:21, 44.34it/s]








 48%|████████████████████████████████████▌                                       | 13176/27412 [05:13<05

 49%|█████████████████████████████████████▎                                      | 13445/27412 [05:20<06:34, 35.39it/s]








 49%|█████████████████████████████████████▎                                      | 13450/27412 [05:20<06:04, 38.32it/s]








 49%|█████████████████████████████████████▎                                      | 13455/27412 [05:20<05:43, 40.58it/s]








 49%|█████████████████████████████████████▎                                      | 13460/27412 [05:20<05:52, 39.57it/s]








 49%|█████████████████████████████████████▎                                      | 13465/27412 [05:20<05:34, 41.69it/s]








 49%|█████████████████████████████████████▎                                      | 13470/27412 [05:20<05:45, 40.36it/s]








 49%|█████████████████████████████████████▎                                      | 13476/27412 [05:20<05:18, 43.72it/s]








 49%|█████████████████████████████████████▍                                      | 13482/27412 [05:21<05


Unnamed: 0,Body,Title,Tags,token_tag,token_body,tokens_clean,tokens_clean_lemma
0,"<p>I want to use a track-bar to change a form's opacity.</p>\n\n<p>This is my code:</p>\n\n<pre><code>decimal trans = trackBar1.Value / 5000;\nthis.Opacity = trans;\n</code></pre>\n\n<p>When I build the application, it gives the following error:</p>\n\n<blockquote>\n <p>Cannot implicitly convert type <code>'decimal'</code> to <code>'double'</code>.</p>\n</blockquote>\n\n<p>I tried using <code>trans</code> and <code>double</code> but then the control doesn't work. This code worked fine in a past VB.NET project.</p>\n",Convert Decimal to Double?,<c#><floating-point><type-conversion><double><decimal>,"[c#, floating-point, type-conversion, double, decimal]","[i, want, to, use, a, track, bar, to, change, a, form, s, opacity, this, is, my, code, decimal, trans, trackbar1, value, 5000, this, opacity, trans, when, i, build, the, application, it, gives, the, following, error, cannot, implicitly, convert, type, decimal, to, double, i, tried, using, trans, and, double, but, then, the, control, doesn, t, work, this, code, worked, fine, in, a, past, vb, net, project]","[want, use, track, bar, change, form, opacity, code, decimal, trans, trackbar, value, opacity, trans, build, application, gives, following, error, implicitly, convert, type, decimal, double, tried, using, trans, double, control, doesn, work, code, worked, fine, past, vb, net, project]","[want, use, track, bar, change, form, opacity, code, decimal, tran, trackbar, value, opacity, tran, build, application, give, follow, error, implicitly, convert, type, decimal, double, try, use, tran, double, control, doesn, work, code, work, fine, past, vb, net, project]"
1,"<p>I have an absolutely positioned <code>div</code> containing several children, one of which is a relatively positioned <code>div</code>. When I use a <strong>percentage-based width</strong> on the child <code>div</code>, it collapses to '0' width on <a href=""http://en.wikipedia.org/wiki/Internet_Explorer_7"" rel=""noreferrer"">Internet&nbsp;Explorer&nbsp;7</a>, but not on Firefox or Safari.</p>\n\n<p>If I use <strong>pixel width</strong>, it works. If the parent is relatively positioned, the percentage width on the child works.</p>\n\n<ol>\n<li>Is there something I'm missing here?</li>\n<li>Is there an easy fix for this besides the <em>pixel-based width</em> on the\nchild?</li>\n<li>Is there an area of the CSS specification that covers this?</li>\n</ol>\n",Percentage width child element in absolutely positioned parent on Internet Explorer 7,<html><css><css3><internet-explorer-7>,"[html, css, css3, internet-explorer-7]","[i, have, an, absolutely, positioned, div, containing, several, children, one, of, which, is, a, relatively, positioned, div, when, i, use, a, percentage, based, width, on, the, child, div, it, collapses, to, 0, width, on, internet, explorer, 7, but, not, on, firefox, or, safari, if, i, use, pixel, width, it, works, if, the, parent, is, relatively, positioned, the, percentage, width, on, the, child, works, is, there, something, i, m, missing, here, is, there, an, easy, fix, for, this, besides, the, pixel, based, width, on, the, child, is, there, an, area, of, the, css, specification, that, covers, this]","[absolutely, positioned, div, containing, several, children, one, relatively, positioned, div, use, percentage, based, width, child, div, collapses, width, internet, explorer, firefox, safari, use, pixel, width, works, parent, relatively, positioned, percentage, width, child, works, something, missing, easy, fix, besides, pixel, based, width, child, area, css, specification, covers]","[absolutely, position, div, contain, several, child, one, relatively, position, div, use, percentage, base, width, child, div, collapse, width, internet, explorer, firefox, safari, use, pixel, width, work, parent, relatively, position, percentage, width, child, work, something, miss, easy, fix, besides, pixel, base, width, child, area, css, specification, cover]"
2,<p>An explicit cast to double like this isn't necessary:</p>\n\n<pre><code>double trans = (double) trackBar1.Value / 5000.0;\n</code></pre>\n\n<p>Identifying the constant as <code>5000.0</code> (or as <code>5000d</code>) is sufficient:</p>\n\n<pre><code>double trans = trackBar1.Value / 5000.0;\ndouble trans = trackBar1.Value / 5000d;\n</code></pre>\n,,,,"[an, explicit, cast, to, double, like, this, isn, t, necessary, double, trans, double, trackbar1, value, 5000, 0, identifying, the, constant, as, 5000, 0, or, as, 5000d, is, sufficient, double, trans, trackbar1, value, 5000, 0, double, trans, trackbar1, value, 5000d]","[explicit, cast, double, like, isn, necessary, double, trans, double, trackbar, value, identifying, constant, sufficient, double, trans, trackbar, value, double, trans, trackbar, value]","[explicit, cast, double, like, isn, necessary, double, tran, double, trackbar, value, identify, constant, sufficient, double, tran, trackbar, value, double, tran, trackbar, value]"
3,"<p>Given a <code>DateTime</code> representing a person's birthday, how do I calculate their age in years? </p>\n",How do I calculate someone's age in C#?,<c#><.net><datetime>,"[c#, .net, datetime]","[given, a, datetime, representing, a, person, s, birthday, how, do, i, calculate, their, age, in, years]","[given, datetime, representing, person, birthday, calculate, age, years]","[give, datetime, represent, person, birthday, calculate, age, year]"
4,"<p>Given a specific <code>DateTime</code> value, how do I display relative time, like:</p>\n\n<ul>\n<li>2 hours ago</li>\n<li>3 days ago</li>\n<li>a month ago</li>\n</ul>\n",Calculate relative time in C#,<c#><datetime><time><datediff><relative-time-span>,"[c#, datetime, time, datediff, relative-time-span]","[given, a, specific, datetime, value, how, do, i, display, relative, time, like, 2, hours, ago, 3, days, ago, a, month, ago]","[given, specific, datetime, value, display, relative, time, like, hours, ago, days, ago, month, ago]","[give, specific, datetime, value, display, relative, time, like, hour, ago, day, ago, month, ago]"
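The `token_body` column above looks like the post body with its HTML removed, lower-cased, and split on every character that is not a letter or a digit (`form's` → `form`, `s`; `VB.NET` → `vb`, `net`; `Internet&nbsp;Explorer&nbsp;7` → `internet`, `explorer`, `7`). The cell that produced it is not part of this section, so the following is only a sketch under that assumption. It uses the standard library's `html.parser` instead of BeautifulSoup to stay self-contained, and the names `_TextExtractor` and `tokenize_body` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, ignoring the tags."""

    def __init__(self):
        # convert_charrefs=True (the default) decodes entities such as &nbsp;
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def tokenize_body(html):
    """Strip HTML, lower-case, and split on anything that is not a-z or 0-9."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush text still buffered after the last tag
    # Join chunks with a space so text from adjacent tags is not glued together
    text = " ".join(parser.chunks).lower()
    return re.findall(r"[a-z0-9]+", text)


print(tokenize_body("<p>I want to use a track-bar to change a form's opacity.</p>"))
# → ['i', 'want', 'to', 'use', 'a', 'track', 'bar', 'to', 'change', 'a', 'form', 's', 'opacity']
```

The `tokens_clean` and `tokens_clean_lemma` columns then come from further steps (stop-word and number removal, lemmatization) that this sketch does not reproduce.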


### Build corpus with Bag of Words

In [160]:
# Use bigrams and trigrams to catch combinations of 2/3 words that have a specific meaning together
tokens_lemma = data_imp['tokens_clean_lemma'].tolist()
bigram_lemma = Phrases(tokens_lemma, min_count=5)
trigram_lemma = Phrases(bigram_lemma[tokens_lemma])
# Freeze the phrase models into lighter Phraser objects for faster application
bigram_mod_lemma = phrases.Phraser(bigram_lemma)
trigram_mod_lemma = phrases.Phraser(trigram_lemma)

# Apply the phrase models; the result must be assigned, otherwise it is discarded
tokens_lemma = list(trigram_mod_lemma[bigram_mod_lemma[tokens_lemma]])
# Lower-case and drop stop words (`sw` is assumed to be the stop-word list defined earlier in the notebook)
tokens_lemma = [[token.lower() for token in t if token.lower() not in sw] for t in tokens_lemma]
tokens_lemma



[['want',
  'use',
  'track',
  'bar',
  'change',
  'form',
  'opacity',
  'code',
  'decimal',
  'tran',
  'trackbar_value',
  'opacity',
  'tran',
  'build',
  'application',
  'give',
  'follow_error',
  'implicitly',
  'convert',
  'type',
  'decimal',
  'double',
  'try',
  'use',
  'tran',
  'double',
  'control',
  'doesn_work',
  'code',
  'work_fine',
  'past',
  'vb_net',
  'project'],
 ['absolutely_position_div',
  'contain',
  'several',
  'child',
  'one',
  'relatively',
  'position_div',
  'use',
  'percentage',
  'base',
  'width',
  'child',
  'div',
  'collapse',
  'width',
  'internet_explorer',
  'firefox_safari',
  'use',
  'pixel',
  'width',
  'work',
  'parent',
  'relatively',
  'position',
  'percentage',
  'width',
  'child',
  'work',
  'something',
  'miss',
  'easy',
  'fix',
  'besides',
  'pixel',
  'base',
  'width',
  'child',
  'area',
  'css',
  'specification',
  'cover'],
 ['explicit',
  'cast',
  'double',
  'like',
  'isn',
  'necessary',
  'dou
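`Phrases` decides which adjacent pairs to merge (such as `trackbar_value` or `vb_net` in the output above) by scoring each pair. As a minimal pure-Python sketch of gensim's default scorer, which uses the formula from Mikolov et al. (2013), score(a, b) = (count(a, b) - min_count) / (count(a) * count(b)) * vocab_size, and a pair is merged when its score exceeds `threshold` (10.0 by default in gensim). Simplifications, to be read as assumptions: `vocab_size` here counts distinct unigrams only (gensim's vocabulary also contains the pairs), merging is a plain greedy left-to-right pass, and `bigram_scores` / `merge_bigrams` are hypothetical helpers, not gensim functions.

```python
from collections import Counter


def bigram_scores(docs, min_count=5):
    """Score each adjacent pair: (count(a,b) - min_count) / (count(a) * count(b)) * vocab_size."""
    unigrams = Counter(tok for doc in docs for tok in doc)
    pairs = Counter((a, b) for doc in docs for a, b in zip(doc, doc[1:]))
    vocab_size = len(unigrams)  # simplification: distinct unigrams only
    return {
        (a, b): (n - min_count) / (unigrams[a] * unigrams[b]) * vocab_size
        for (a, b), n in pairs.items()
    }


def merge_bigrams(doc, scores, threshold=10.0, delimiter="_"):
    """Greedily join left-to-right pairs whose score exceeds the threshold."""
    merged, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and scores.get((doc[i], doc[i + 1]), float("-inf")) > threshold:
            merged.append(doc[i] + delimiter + doc[i + 1])
            i += 2
        else:
            merged.append(doc[i])
            i += 1
    return merged


# Toy corpus: far too small to reach gensim's default threshold of 10, so a tiny one is used
docs = [["vb", "net", "project"]] * 6 + [["net", "profit"], ["project", "plan"]]
print(merge_bigrams(["vb", "net", "project"], bigram_scores(docs), threshold=0.11))
# → ['vb_net', 'project']
```

Because the counts of both words sit in the denominator, a pair only scores high when the two words occur together much more often than their individual frequencies would suggest.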

### Training an unsupervised model (LDA)

In [163]:
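# Map each token to an id, drop tokens found in fewer than 5 documents or in more than 90% of them,
# then turn every document into a bag of words: a list of (token_id, count) pairs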
dictionary_LDA_lemma = corpora.Dictionary(tokens_lemma)
dictionary_LDA_lemma.filter_extremes(no_below=5, no_above=0.9)
corpus_lemma = [dictionary_LDA_lemma.doc2bow(tok) for tok in tokens_lemma]
print(dictionary_LDA_lemma)

Dictionary(13249 unique tokens: ['application', 'bar', 'build', 'change', 'code']...)
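To make the filtering step concrete: according to gensim's documentation, `filter_extremes(no_below=5, no_above=0.9)` keeps a token only if it appears in at least 5 documents and in at most 90% of the documents (it also keeps only the 100,000 most frequent tokens by default, via `keep_n`), and `doc2bow` turns a document into sorted `(token_id, count)` pairs, silently dropping unknown tokens. The sketch below mirrors that logic in plain Python; the function names are hypothetical re-implementations, not gensim's code, and the `keep_n` cap is ignored.

```python
from collections import Counter


def filter_extremes(docs, no_below=5, no_above=0.9):
    """Return the tokens kept by a filter_extremes-style rule (keep_n cap ignored)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears at least once
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    no_above_abs = int(no_above * n_docs)  # fraction turned into an absolute document count
    return {tok for tok, n in doc_freq.items() if no_below <= n <= no_above_abs}


def doc2bow(doc, token2id):
    """Bag of words: sorted (token_id, count) pairs, tokens outside the dictionary dropped."""
    counts = Counter(tok for tok in doc if tok in token2id)
    return sorted((token2id[tok], n) for tok, n in counts.items())


docs = [["code", "div"]] * 5 + [["code"]] * 4 + [["code", "rare"]]
kept = filter_extremes(docs)  # "code" is in every document, "rare" in only one
token2id = {tok: i for i, tok in enumerate(sorted(kept))}
print(kept, doc2bow(["div", "code", "div"], token2id))
# → {'div'} [(0, 2)]
```

Removing tokens present in almost every post (like `code` here) matters for LDA: such words would otherwise dominate every topic without helping to tell topics apart.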


In [183]:
num_topics = 30
%time lda_model_lemma = models.LdaModel(corpus_lemma, num_topics=num_topics, \
                                  id2word=dictionary_LDA_lemma, \
                                  passes=10, alpha=[0.01]*num_topics, \
                                  eta=[0.01]*len(dictionary_LDA_lemma.keys()))



Wall time: 4min 13s
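As a reminder of what `alpha` and `eta` control, here is the standard LDA generative model that `models.LdaModel` fits, written in the usual textbook notation (the symbols are not names from the notebook): $K$ topics (here `num_topics = 30`), $D$ documents, and a vocabulary of $V$ tokens.

$$
\begin{aligned}
\phi_k &\sim \operatorname{Dirichlet}(\eta), && k = 1,\dots,K && \text{(word distribution of topic } k\text{)}\\
\theta_d &\sim \operatorname{Dirichlet}(\alpha), && d = 1,\dots,D && \text{(topic proportions of document } d\text{)}\\
z_{d,n} &\sim \operatorname{Categorical}(\theta_d) &&&& \text{(topic of the } n\text{-th word of document } d\text{)}\\
w_{d,n} &\sim \operatorname{Categorical}(\phi_{z_{d,n}}) &&&& \text{(observed word)}
\end{aligned}
$$

With `alpha=[0.01]*num_topics` and `eta=[0.01]*len(dictionary_LDA_lemma.keys())`, both priors are symmetric with a concentration of 0.01, well below 1. Dirichlet priors in that regime favour sparse vectors, so each post is pushed towards a few topics and each topic towards a few words, which suits the goal of assigning a handful of tags per question.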


In [184]:
for i,topic in lda_model_lemma.show_topics(formatted=True, num_topics=3, num_words=5):
    print(str(i)+": "+ topic)
    print()

19: 0.052*"library" + 0.047*"use" + 0.031*"support" + 0.022*"include" + 0.019*"api"

20: 0.065*"com" + 0.065*"http" + 0.041*"http_www" + 0.034*"url" + 0.025*"html"

6: 0.077*"event" + 0.072*"thread" + 0.036*"service" + 0.025*"document" + 0.025*"date"
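With `formatted=True`, each topic comes back as a string of `weight*"word"` terms joined by ` + `. When the numbers themselves are needed (for instance to keep only the highest-weighted keywords as candidate tags), `show_topics(formatted=False)` returns `(word, weight)` pairs directly. If only printed output like the one above is at hand, a small regex sketch recovers the pairs; `parse_topic` is a hypothetical helper.

```python
import re


def parse_topic(topic_str):
    """Turn '0.052*"library" + 0.047*"use"' into [('library', 0.052), ('use', 0.047)]."""
    return [
        (word, float(weight))
        for weight, word in re.findall(r'(\d+(?:\.\d+)?)\*"([^"]*)"', topic_str)
    ]


# Topic 19 as printed above
topic_19 = '0.052*"library" + 0.047*"use" + 0.031*"support" + 0.022*"include" + 0.019*"api"'
print(parse_topic(topic_19))
# → [('library', 0.052), ('use', 0.047), ('support', 0.031), ('include', 0.022), ('api', 0.019)]
```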



### Visualization

In [185]:
pyLDAvis.enable_notebook()
vis_lemma = pyLDAvis.gensim.prepare(lda_model_lemma, corpus_lemma, dictionary_LDA_lemma)
vis_lemma

### Model performance: LDA using Lemmatization

In [186]:
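# Compute Perplexity
# log_perplexity returns gensim's per-word likelihood bound: higher (less negative) is better.
# gensim's own log turns it into a perplexity estimate as 2 ** (-bound), for which lower is better.
print('\nPerplexity: ', lda_model_lemma.log_perplexity(corpus_lemma))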

# Compute Coherence Score
coherence_model_lda_lemma = CoherenceModel(model=lda_model_lemma, texts=tokens_lemma, dictionary=dictionary_LDA_lemma, coherence='c_v')
coherence_lda_lemma = coherence_model_lda_lemma.get_coherence()
print('\nCoherence Score: ', coherence_lda_lemma)


Perplexity:  -13.673513147098323

Coherence Score:  0.43564814768786
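The value printed as "Perplexity" is therefore a log-scale bound, not a perplexity. A minimal sketch of the conversion, assuming the same convention gensim uses in its log message (`np.exp2(-perwordbound)`); `perplexity_from_bound` is a hypothetical helper:

```python
def perplexity_from_bound(per_word_bound):
    """Perplexity estimate as gensim logs it: 2 ** (-per-word bound)."""
    return 2 ** (-per_word_bound)


# Value returned by lda_model_lemma.log_perplexity(corpus_lemma) above
print(round(perplexity_from_bound(-13.673513147098323)))
```

This gives a perplexity estimate of roughly 13,000. Note that it is computed on the same corpus the model was trained on, so it is optimistic; evaluating on held-out posts gives a fairer figure, and the coherence score is usually the more interpretable of the two for judging topic quality.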


## Fine-tuning the model

### Finding optimal number of topics

In [None]:
import sys
from tqdm import tqdm
from gensim.models.coherencemodel import CoherenceModel


def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):
    """
    Compute log perplexity and c_v coherence for various numbers of topics

    Parameters:
    ----------
    dictionary : Gensim dictionary
    corpus : Gensim corpus
    texts : List of input texts (tokenized documents)
    limit : Max num of topics (exclusive)
    start : Min num of topics
    step : Increment between two tested numbers of topics

    Returns:
    -------
    perplexity_values : Per-word likelihood bound (log_perplexity) of each LDA model
    model_list : List of LDA topic models
    coherence_values : Coherence values corresponding to the LDA model with respective number of topics
    """
    perplexity_values = []
    coherence_values = []
    model_list = []
    with tqdm(total=len(range(start, limit, step)), file=sys.stdout) as pbar:
        for num_topics in range(start, limit, step):
            #model = models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=dictionary)
            model = models.LdaModel(corpus=corpus, num_topics=num_topics, id2word=dictionary, passes=10, alpha=0.001, eta=0.001, random_state=42)
            model_list.append(model)
            perplexity_values.append(model.log_perplexity(corpus))
            # CoherenceModel is the class imported above (there is no `coherencemodel` name in scope)
            model_coherence = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
            coherence_values.append(model_coherence.get_coherence())
            pbar.update(1)

    return perplexity_values, model_list, coherence_values

In [None]:
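# Trains one LDA model per tested number of topics (k = 2, 4, ..., 48, i.e. 24 models);
# the single 30-topic model above took about 4 minutes, so this can take a long time to run.
k_min = 2
k_max = 50
pas = 2
# texts must be the same tokens the dictionary was built from
perplexity_values, model_list, coherence_values = compute_coherence_values(dictionary=dictionary_LDA_lemma, corpus=corpus_lemma,
                                                                           texts=tokens_lemma, start=k_min, limit=k_max, step=pas)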
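Once the loop has run, the list indices have to be mapped back to numbers of topics. One common heuristic is to keep the k with the highest c_v coherence (another is to plot coherence against k and pick the point where it levels off). A minimal sketch of the first option, with a hypothetical helper `best_num_topics`, assuming `coherence_values[i]` belongs to k = `start + i * step` as in `range(start, limit, step)`:

```python
def best_num_topics(coherence_values, start=2, step=2):
    """Return (k, score) for the number of topics with the highest coherence.

    Assumes coherence_values[i] was computed for k = start + i * step.
    """
    if not coherence_values:
        raise ValueError("coherence_values is empty")
    best_i = max(range(len(coherence_values)), key=lambda i: coherence_values[i])
    return start + best_i * step, coherence_values[best_i]
```

In the notebook this would be called as `best_k, best_score = best_num_topics(coherence_values, start=k_min, step=pas)`, and the matching model is `model_list[(best_k - k_min) // pas]`.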

### Finding the dominant topic in each document

In [187]:
def format_topics_sentences(ldamodel, corpus, texts):
    """Return one row per document: dominant topic, its proportion in the document,
    the topic's top keywords, and the original tokens."""
    rows = []

    # Get main topic in each document
    for doc_topics in ldamodel[corpus]:
        # Dominant topic = the (topic_num, proportion) pair with the highest proportion
        topic_num, prop_topic = max(doc_topics, key=lambda x: x[1])
        topic_keywords = ", ".join([word for word, prop in ldamodel.show_topic(topic_num)])
        rows.append([int(topic_num), round(prop_topic, 4), topic_keywords])

    # Build the frame in one go: DataFrame.append was removed in pandas 2.0 and is slow in a loop
    sent_topics_df = pd.DataFrame(rows, columns=['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])

    # Add original text to the end of the output
    contents = pd.Series(texts)
    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)
    return sent_topics_df

In [188]:
df_topic_sents_tag= format_topics_sentences(ldamodel=lda_model_lemma, corpus=corpus_lemma, texts=tokens_lemma)

# Format
df_dominant_topic = df_topic_sents_tag.reset_index()
df_dominant_topic.columns = ['Comment_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'TAGS', 'Text']

# Show
df_dominant_topic.head(10)

Unnamed: 0,Comment_No,Dominant_Topic,Topic_Perc_Contrib,TAGS,Text
0,0,18.0,0.277,"code, can, use, write, one, make, will, way, like, need","[want, use, track, bar, change, form, opacity, code, decimal, tran, trackbar_value, opacity, tran, build, application, give, follow_error, implicitly, convert, type, decimal, double, try, use, tran, double, control, doesn_work, code, work_fine, past, vb_net, project]"
1,1,24.0,0.2951,"text, image, display, line, size, label, button, div, position, space","[absolutely_position_div, contain, several, child, one, relatively, position_div, use, percentage, base, width, child, div, collapse, width, internet_explorer, firefox_safari, use, pixel, width, work, parent, relatively, position, percentage, width, child, work, something, miss, easy, fix, besides, pixel, base, width, child, area, css, specification, cover]"
2,2,14.0,0.4198,"type, bit, use, compiler, int, can, will, system, value, integer","[explicit, cast, double, like, isn, necessary, double, tran, double, trackbar_value, identify, constant, sufficient, double, tran, trackbar_value, double, tran, trackbar_value]"
3,3,29.0,0.4895,"table, query, -pron-_would, select, column, name, row, field, index, update","[give, datetime, represent, person, birthday, calculate_age, year]"
4,4,24.0,0.29,"text, image, display, line, size, label, button, div, position, space","[give, specific, datetime, value, display, relative, time, like, hour_ago, day_ago, month_ago]"
5,5,6.0,0.6696,"event, thread, service, document, date, day, tfs, assembly, web_service, start","[var, ts, new_timespan, datetime_utcnow, tick, dt, tick, double, delta, math, ab, ts, totalsecond, delta_return, ts, second, one, second_ago, ts, second, second_ago, delta_return, minute_ago, delta_return, ts, minute, minute_ago, delta_return, hour_ago, delta_return, ts, hour_hour, ago_delta, return, yesterday, delta_return, ts_day, day_ago, delta, int, month, convert_toint, math_floor, double, ts_day, return, month, one, month_ago, month_month, ago, int, year, convert_toint, math_floor, double, ts_day, return, year, one, year_ago, year, year_ago, suggestion, comment, way, improve, algorithm]"
6,6,9.0,0.3767,"window, run, application, server, can, use, machine, app, will, set","[standard, way, web_server, able, determine, user, timezone, within, web_page, perhaps, http_header, part, user_agent_string]"
7,7,3.0,0.3792,"address, color, socket, background, mark, sample, total, reader, sequence, bind","[difference, math_floor, math, truncate, net]"
8,8,22.0,0.3565,"view, model, map, load, product, controller, filter, create, viewstate, layer","[expose, linq_query, asmx_web_service, usually, business_tier, can, return, type, dataset_datatable, can, serialize, transport, asmx, can, linq_query, way, populate, type, dataset_datatable, via, linq_query, public_static, mydatatable, callmysproc, str, conn, mydatabasedatacontext, db, new, mydatabasedatacontext, conn, mydatatable, dt, new, mydatatable, execute, sproc, via, linq, var, query, dr, db, mysproc, asenumerable, select, dr, copy, linq_query, resultset, datatable, work, dt, query, copytodatatable, return, dt, can, get, result, set, linq_query, dataset_datatable, alternatively, linq_query, serializeable, can, expose, asmx_web_service]"
9,9,4.0,0.6091,"datum, database, use, access, can, db, store, application, need, sql","[store, binary_data, mysql]"


In [189]:
df_dominant_topic.tail(10)

Unnamed: 0,Comment_No,Dominant_Topic,Topic_Perc_Contrib,TAGS,Text
27402,27402,12.0,0.4391,"name, template, dll, message, path, import, root, target, status, system","[img_border, old_fashioned, img_border, src]"
27403,27403,24.0,0.664,"text, image, display, line, size, label, button, div, position, space","[just, add, border, img_border, remove, border, image, link, trick]"
27404,27404,24.0,0.6705,"text, image, display, line, size, label, button, div, position, space","[code, use, border, example, img, href, mypic, gif, border, within, css, border, whatev, class, image]"
27405,27405,7.0,0.2246,"function, return, call, array, use, can, variable, end, value, will","[excel, way, gathering, attribute, build, function, re_willing, use, vb, color, relate_question, answer, http_www, cpearson, com, excel, color, aspx, example, form, site, sumcolor, function, color, base, analog, sum, sumif, function, allow, specify, separate, range_range, whose, color, index, examine, range, cell, whose, value, sum, two, range, function, sum, cell, whose, color, match, specify, value, example, follow, formula, sum, value, whose, fill, color_red, sumcolor, false]"
27406,27406,18.0,0.2618,"code, can, use, write, one, make, will, way, like, need","[simply, put, static_analysis, collect, information, base, source_code, dynamic_analysis, base, system, execution, often, use, instrumentation, advantage, dynamic_analysis, able, detect, dependency, possible, detect, static_analysis, ex, dynamic, dependency, use_reflection, dependency_injection, polymorphism, can, collect, temporal, information, deal, real, input, datum, static_analysis, difficult_impossible, know, file, will, pass, input, web, request, will, come, user, will, click, etc, disadvantage, dynamic_analysis, may, negatively, impact_performance, application, guarantee, full, coverage, source_code, run, base, user_interaction, automatic, test, resource, many, dynamic_analysis, tool, market, debugger, notorious, one, hand, still, academic, research, field, many, researcher, study, use, dynamic_analysis, better, understand, software, system, annual, workshop, dedicated, dependency, analysis]"
27407,27407,2.0,0.3723,"use, will, get, good, work, one, just, can, go, think","[occasionally, program, window, machine, go, crazy, just, hang, will, call, task_manager, hit, end, process, button, however, doesn, always, work, try, enough, time, will, usually, die, eventually, really, like, able, just, kill, immediately, linux, just, kill, guarantee, process, will, die, also, use, writing, batch_script, write, batch_script, programming, program, command, come, window, will, always, kill_process, free, third_party, app, fine, although, prefer, able, machine, sit, first, time]"
27408,27408,18.0,0.4027,"code, can, use, write, one, make, will, way, like, need","[taskkill_-pron-_be, myprocess, exe, force, know, pid, can, specify, taskkill, pid, lot, option, possible, just, type, taskkill, option, kill_process, child_process, may, useful]"
27409,27409,8.0,0.4246,"project, build, change, version, file, repository, svn, use, branch, add","[problem, isn, build, script, visual_studio, support, ansi, control, code, change, color]"
27410,27410,18.0,0.2432,"code, can, use, write, one, make, will, way, like, need","[use, switch, place, ssh_client, master, mode, connection, shar, e, multiple, option, place, ssh, master, mode, confirmation, require, slave, connection, accept, refer, description, controlmaster, ssh, config, detail, don, quite, see, answer, op, question, can, expand, bit, david]"
27411,27411,11.0,0.8746,"use, tool, can, net, visual_studio, also, application, project, run, build","[get, process_explorer, sysinternal, now, microsoft, process_explorer, window, sysinternal, microsoft, doc]"
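To see how the posts spread over the 30 topics, the dominant topics can be counted. With pandas this is `df_dominant_topic['Dominant_Topic'].value_counts(normalize=True)`; the standard-library sketch below, with a hypothetical helper `topic_share`, shows the same computation.

```python
from collections import Counter


def topic_share(dominant_topics):
    """Return (topic, n_documents, share) tuples, most frequent topic first."""
    counts = Counter(int(t) for t in dominant_topics)
    total = len(dominant_topics)
    return [(topic, n, n / total) for topic, n in counts.most_common()]


# Dominant_Topic values of the ten rows displayed by head(10) above
print(topic_share([18.0, 24.0, 14.0, 29.0, 24.0, 6.0, 9.0, 3.0, 22.0, 4.0])[0])
# → (24, 2, 0.2)
```

On those ten rows, topic 24 is the only one that occurs twice. On the full set of 27,412 posts, a very uneven distribution would suggest that a few generic topics (such as topic 18, whose keywords are `code, can, use, write, ...`) absorb many posts and are of little use as tags.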
