In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

import warnings
warnings.filterwarnings("ignore")

import pandas as pd 
import time
import datetime

import seaborn as sns 
import os

import matplotlib.pyplot as plt

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

### Introduction
This kernel is based on a [dataset of daily user activity on Kaggle](https://www.kaggle.com/tomtillo/top-ranked-kaggle-user-activity-1-1000-ranks).

#### A user activity is defined as 
- Making a competition submission 
- Running a script 
- Commenting on a topic 
- Creating a new dataset / updating one.

The user activity can be found on your Kaggle user home page, where it is displayed as a grid like the one in the image below.

![Sample Activity](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F285393%2F76ddd60b7a0afd22fadf3ed21510d52b%2Factivity_map.png?generation=1595260268658485&alt=media)

For now, the dataset contains only the daily activity of top-ranked Kagglers ( Rank 1 to Rank 1000 ) in each category ( competitions / discussions / scripts ).


We have 3 lists of users, together with their daily activity on Kaggle:
- Top 1000 ranked by Competitions
- Top 1000 ranked by Discussions
- Top 1000 ranked by Kernels

In [None]:
# A function to convert day of week (number) to String 
def get_weekday(dow): 
    dow_dict = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'}
    return dow_dict[dow]

**What are the contents of the USER_ACTIVITY file ?**

In [None]:
df_activity = pd.read_csv('/kaggle/input/top-ranked-kaggle-user-activity-1-1000-ranks/USER_ACTIVITY.csv')
df_activity.sample(7)

In [None]:
df_activity.dtypes  # check the column types
df_activity['date'] = pd.to_datetime(df_activity['date'])  # convert to datetime format


Keep only the activity from the past year ( today and the 12 months before it )

In [None]:
# Midnight today, and the same calendar date one year earlier
today_ = pd.Timestamp.today().normalize()
yearback_ = today_ - pd.DateOffset(years=1)  # also works on Feb 29, where date.replace(year=...) raises
# .copy() so that adding columns later does not write into a slice of df_activity
activity_year = df_activity[df_activity['date'] >= yearback_].copy()

Add Month and Day of week to the dataset

In [None]:
activity_year['month'] = activity_year['date'].dt.month
activity_year['wk_day'] = activity_year['date'].dt.weekday.map(get_weekday)

This is what the dataset looks like now

In [None]:
activity_year.sample(5)

Comments made by top kagglers grouped by Month

In [None]:
# numeric_only=True sums the count columns and skips date / username / wk_day
df_month_grouping = activity_year.groupby('month').sum(numeric_only=True)
df_month_grouping = df_month_grouping.reset_index().sort_values(by='month')
sns.barplot(data=df_month_grouping, x='month', y='comments')
plt.show();

Looks like <font color='red'>May and June</font> have been good months to get deep into discussions

In [None]:
sns.barplot(data = df_month_grouping, x='month',y='submissions')
plt.show();

<font color='red'>**August**</font> is the most active month for competition submissions in the past 12 months.
* Did more competitions open up in July-August ? 
* Did August have more competition end dates ? 
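One caveat for the monthly plots: the one-year window spans two calendar years, so grouping by month number alone can merge days from two different years into the same bar. The sketch below groups by year-month period instead. It assumes the `date`, `comments`, `submissions` and `scripts` columns used elsewhere in this kernel; the function name `activity_by_year_month` and the toy data are made up for illustration.

```python
import pandas as pd

def activity_by_year_month(df):
    """Sum the activity counts per calendar month, keeping the year (hypothetical helper)."""
    out = df.copy()
    # to_period('M') maps 2019-07-25 and 2020-07-05 to different periods
    out['year_month'] = out['date'].dt.to_period('M')
    return out.groupby('year_month')[['comments', 'submissions', 'scripts']].sum()

# Toy data standing in for activity_year (values are invented)
toy = pd.DataFrame({
    'date': pd.to_datetime(['2019-07-25', '2020-07-05', '2020-07-06', '2019-08-01']),
    'comments': [1, 2, 3, 4],
    'submissions': [10, 20, 30, 40],
    'scripts': [0, 0, 1, 1],
})
monthly = activity_by_year_month(toy)
```

To plot the result with seaborn, the period index probably needs converting to strings first, for example `monthly.reset_index().astype({'year_month': str})`.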

### Scope for other related ideas
* Group by day of week ( do Saturdays and Sundays have more activity ? )
* Do the top 100 ranked users make more submissions on average than the average user ?
* Do the best-ranked discussion users comment more than the average user ?
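The day-of-week idea can be sketched roughly as follows. This is a minimal example, assuming the `date`, `comments`, `submissions` and `scripts` columns used in this kernel; the helper name `activity_by_weekday` and the toy rows are hypothetical.

```python
import pandas as pd

WEEKDAY_ORDER = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

def activity_by_weekday(df):
    """Sum the activity counts per day of week, ordered Mon..Sun (hypothetical helper)."""
    out = df.copy()
    # Series.dt.weekday uses 0 = Monday ... 6 = Sunday
    out['wk_day'] = out['date'].dt.weekday.map(dict(enumerate(WEEKDAY_ORDER)))
    totals = out.groupby('wk_day')[['comments', 'submissions', 'scripts']].sum()
    # Reindex so the rows follow the calendar order and missing days show as 0
    return totals.reindex(WEEKDAY_ORDER, fill_value=0)

# Toy data standing in for activity_year (values are invented)
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-07-18', '2020-07-19', '2020-07-20', '2020-07-20']),
    'username': ['a', 'a', 'b', 'a'],
    'comments': [1, 2, 3, 4],
    'submissions': [0, 1, 0, 2],
    'scripts': [5, 0, 1, 0],
})
weekday_totals = activity_by_weekday(toy)
```

On the real data, `sns.barplot(data=weekday_totals.reset_index(), x='wk_day', y='submissions')` would give a weekday counterpart to the monthly plots above.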

### Let's see some User Activity 

In [None]:
# numeric_only=True sums the count columns and skips the date / wk_day columns
df_user_grouping = activity_year.groupby('username').sum(numeric_only=True).reset_index()
df_user_grouping.head()

### Top 10 - Most Competitive kagglers ( Most submissions in this year so far )

In [None]:
sns.barplot(data=df_user_grouping.sort_values(by='submissions', ascending=False).head(10),
            x='submissions', y='username')
plt.show();

### Top 10 - Most Idea-Sharing kagglers ( Most discussions/comments in this year so far )

In [None]:
sns.barplot(data = df_user_grouping.sort_values(by = 'comments', ascending  = False).head(10), x='comments',y='username')
plt.show();

### Top 10 - Script Kiddos ! ( Most script runs in this year so far )

In [None]:
sns.barplot(data = df_user_grouping.sort_values(by = 'scripts', ascending  = False).head(10), x='scripts',y='username')
plt.show();

**Ahhh, the attack of the bot - the 'kerneler' - the Kaggle kernel bot !! ( Let's remove the bots ! )**

In [None]:
#Removing the user - 'kerneler'
df_user_grouping = df_user_grouping[df_user_grouping['username'] != 'kerneler']

Rerun the plot after removing 'kerneler'

In [None]:
sns.barplot(data = df_user_grouping.sort_values(by = 'scripts', ascending  = False).head(10), x='scripts',y='username')
plt.show();

### Further ideas 
* Were some users more active on weekends than on weekdays ?? What kind of users ?
* Who are the addicts ( who log in every day and at least comment ) ?
* Who are the users who only comment ? 
* What proportion of the top users are here only for the competitions ? 
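A possible starting point for these questions is a per-user profile table. The sketch below is only one way to frame them: it assumes the `date`, `username`, `comments`, `submissions` and `scripts` columns used in this kernel, treats "activity" as the sum of those three counts (an assumption, since the dataset may contain other columns), and uses a made-up helper name `user_profile` with invented toy rows.

```python
import pandas as pd

def user_profile(df):
    """Per-user weekend share, comment-only flag and active-day count (hypothetical helper)."""
    cols = ['comments', 'submissions', 'scripts']
    out = df.copy()
    out['is_weekend'] = out['date'].dt.weekday >= 5      # 5 = Saturday, 6 = Sunday
    out['total'] = out[cols].sum(axis=1)                 # assumed definition of "activity"

    per_user = out.groupby('username')[cols + ['total']].sum()
    weekend = out[out['is_weekend']].groupby('username')['total'].sum()

    # Share of a user's activity that falls on weekends (NaN if the user has no activity)
    per_user['weekend_share'] = weekend.reindex(per_user.index, fill_value=0) / per_user['total']
    # Users who comment but never submit or run a script
    per_user['only_comments'] = (
        (per_user['comments'] > 0)
        & (per_user['submissions'] == 0)
        & (per_user['scripts'] == 0)
    )
    # Number of distinct days with at least one recorded activity
    active = out[out['total'] > 0].groupby('username')['date'].nunique()
    per_user['active_days'] = active.reindex(per_user.index, fill_value=0)
    return per_user

# Toy data standing in for activity_year (values are invented)
toy = pd.DataFrame({
    'date': pd.to_datetime(['2020-07-18', '2020-07-20', '2020-07-20', '2020-07-19']),
    'username': ['a', 'a', 'b', 'b'],
    'comments': [2, 2, 0, 0],
    'submissions': [0, 0, 3, 1],
    'scripts': [0, 0, 1, 0],
})
profile = user_profile(toy)
```

Under these assumptions, "addicts" would be the users whose `active_days` equals the number of days in the window, and `profile['only_comments'].mean()` gives the proportion of comment-only users. The same pattern with `submissions` in place of `comments` would approximate the competitions-only question.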

All these and more will be answered soon ! <br>
Ideas for new kernels welcome - Please create a new 'task' for other users to explore