In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import re
from collections import Counter

## First look

In [None]:
df = pd.read_csv("../input/60k-stack-overflow-questions-with-quality-rate/data.csv", index_col="Id")
df.head()

In [None]:
df.shape

### Post quality distribution

In [None]:
df.groupby(["Y"]).Y.count().plot(kind='bar')

It looks like there are exactly 20k records for each post quality value in our data.

### Most popular tags

Function for counting tags in *Tags* column:

In [None]:
# Returns a dict mapping each tag to the number of times it appears in the Tags column.
# Note: r'\w+' matches runs of letters, digits and underscores only,
# so a tag such as "c++" is counted under the key "c".
def tags_counter(tags):
    tag_dict = {}
    for value in tags:  # iterate over the values of the Tags column
        tag_list = re.findall(r'\w+', value)  # extract tags into a list
        for tag in tag_list:
            tag_dict[tag] = tag_dict.get(tag, 0) + 1

    return tag_dict

Function for plotting the top n tags from the tag dictionary:

In [None]:
# Creates a bar plot of the n most popular tags; title is appended to the plot title
def print_top_tags(tag_dict, n, title):
    top_tags = Counter(tag_dict).most_common(n)  # (tag, count) pairs, most frequent first
    labels = [tag for tag, _ in top_tags]
    counts = [count for _, count in top_tags]
    positions = range(len(top_tags))  # can be shorter than n if there are fewer distinct tags
    plt.figure(figsize=(12, 6))
    plt.bar(positions, counts, align='center')
    plt.xticks(positions, labels)
    plt.title("Top " + str(n) + " most popular tags " + title)
    plt.show()

Let's plot the top 10 tags:

In [None]:
n = 10 # how many top tags we want to see
    
print_top_tags(tags_counter(df.Tags), n, "overall") 

Pretty surprised to see the C language in first place, considering it seems rather unpopular compared to Java or Python. Let me know in the comments if you have any idea why, because I don't have a clue.
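One possible explanation, offered as a guess rather than a confirmed finding: `tags_counter` matches `\w+`, which drops characters like `+`, `#` and `-`, so "c", "c++" and "c#" would all be counted as "c". Below is a minimal sketch of a counter that keeps each tag whole. It assumes every Tags cell is a string of angle-bracketed tags such as `<c++><arrays>`; that format, the helper name `whole_tags_counter` and the sample strings are assumptions for illustration, not taken from the notebook.

```python
import re
from collections import Counter

# Assumption: each Tags cell looks like "<c++><arrays>".
# whole_tags_counter is a hypothetical helper, not part of the original notebook.
def whole_tags_counter(tags):
    counts = Counter()
    for value in tags:
        # capture everything between "<" and ">" so "c++" or "android-studio" stay intact
        counts.update(re.findall(r'<([^<>]+)>', value))
    return dict(counts)

# Made-up sample values, for illustration only
sample = ["<c><pointers>", "<c++><arrays>", "<c#><android-studio>"]

# The \w+ pattern collapses "c", "c++" and "c#" into the single key "c"
word_counts = Counter(tag for value in sample for tag in re.findall(r'\w+', value))
print(word_counts["c"])

# The bracket-based pattern keeps the three tags separate
whole_counts = whole_tags_counter(sample)
print(whole_counts["c"], whole_counts["c++"], whole_counts["c#"])
```

If the assumption about the column format holds, `print_top_tags(whole_tags_counter(df.Tags), n, "overall")` would show whether C still leads once the three tags are separated.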

### Most popular tags by post quality

In [None]:
q_list = list(df.Y.unique()) # list of quality categories

for q in q_list:
    title = "for " + q + " posts"
    print_top_tags(tags_counter(df[df.Y == q].Tags), n, title) 

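The "arrays" tag makes the top 10 only in LQ_CLOSE (low quality, closed) posts (9th place).

The "sql" tag makes the top 10 only in LQ_EDIT (low quality, multiple edits) posts (5th place).

In HQ (high quality) posts the most popular topic seems to be Android Studio: the "android" tag is 1st and the "studio" tag is 9th. Keep in mind that the counter splits tags on non-word characters, so "studio" could also come from other hyphenated tags.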
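Comparisons like the ones above can also be made programmatically instead of by reading the plots. The following is a minimal sketch: `exclusive_top_tags` is a hypothetical helper, and the counts in `toy_counts` are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Hypothetical helper: for each quality label, return the tags that are in that
# label's top n but in no other label's top n.
def exclusive_top_tags(counts_by_quality, n):
    top_sets = {
        label: {tag for tag, _ in Counter(counts).most_common(n)}
        for label, counts in counts_by_quality.items()
    }
    exclusive = {}
    for label, tags in top_sets.items():
        # union of the top-n sets of every other label
        others = set().union(*(s for other, s in top_sets.items() if other != label))
        exclusive[label] = tags - others
    return exclusive

# Made-up counts, for illustration only
toy_counts = {
    "HQ": {"android": 5, "java": 4, "python": 3},
    "LQ_EDIT": {"java": 6, "sql": 5, "python": 2},
    "LQ_CLOSE": {"java": 7, "arrays": 3, "python": 2},
}
result = exclusive_top_tags(toy_counts, 2)
print(result)
```

In this notebook the input could be built as `{q: tags_counter(df[df.Y == q].Tags) for q in q_list}` and passed in together with `n`.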