### A long time ago, in a Kaggle galaxy far, far away...


![starwars](https://img.elo7.com.br/product/zoom/10B3DF6/painel-star-wars-13-2-00-x-1-00-cenario-de-chao.jpg)

## Data Science Wars - Episode MMXIX - The Youth Menace

There were a lot of planets in the galaxy competing to see who was the best at Data Science...
Then, in the year 2017, an **Interplanetary Organization** appeared to try to bring order to the Data Science world. It is called **KAGGLE** and its followers are called **Kagglers**.

The **KAGGLE** organization started collecting valuable information about its followers living on each planet, to rank the planets by their skills on the battlefield of Data Science across the galaxy.

All planets were measured by **KAGGLE** according to three skills:

**Skills:**
* Age of their population of Kagglers (mean age): Measure of the **level of experience** of the planet
* Proportion of their population of Kagglers that possess a high degree (Master's Degree or Doctoral Degree): Measure of the **level of study** of the planet
* Salary of their population of Kagglers (median salary): Measure of the **level of wellness** of the planet
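
As a rough illustration of how the three skills above can be computed, the sketch below uses pandas on a tiny made-up table. The planet names, the column names (`Country`, `Age`, `Education`, `Salary`) and every value are hypothetical, invented only for this example; they are not the survey data.

```python
import pandas as pd

# Hypothetical toy data: one row per Kaggler (all values invented for illustration).
kagglers = pd.DataFrame({
    'Country':   ['Tatooine', 'Tatooine', 'Tatooine', 'Naboo', 'Naboo'],
    'Age':       [22, 30, 38, 40, 50],
    'Education': ['Bachelor', 'Master', 'Doctoral', 'Bachelor', 'Master'],
    'Salary':    [10000, 30000, 80000, 50000, 70000],
})

# Flag Kagglers holding a high degree (labels are assumptions for this toy table).
kagglers['Has_high_degree'] = kagglers['Education'].isin(['Master', 'Doctoral'])

# One row per planet: mean age, share of high degrees, median salary.
skills = kagglers.groupby('Country').agg(
    Mean_age=('Age', 'mean'),
    Proportion_high_degree=('Has_high_degree', 'mean'),
    Median_salary=('Salary', 'median'),
).reset_index()

# Express the share as a percentage, as done elsewhere in this notebook.
skills['Proportion_high_degree'] = skills['Proportion_high_degree'] * 100
print(skills)
```

The mean is used for age and the median for salary, so a single very large salary moves a planet's wellness far less than it would with a mean.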

### The Kaggle organization selected one prominent Kaggler apprentice (called Kaggler Padawan) to analyze the scenario of the galaxy in each year and then see the evolution of all planets over the years...

## 1. Packing all tools necessary to the journey

To begin the journey through the galaxy, **Kaggler Padawan** first needs to gather all the necessary tools onboard his spaceship "**Kagglenium Falcon**"... Without these tools, he can't do anything, so let's pack everything and start the engines...

In [None]:
# Importing necessary libraries:
import os
import numpy as np # Linear Algebra
import pandas as pd # Data Processing, CSV file I/O (e.g. pd.read_csv)

# Data Visualization Libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px # Bubble Chart

## 2.  Rescuing information across the galaxy

The historical records of the planets were not very well organized, and the information for each year was stored in a different corner of the galaxy, so **Kaggler Padawan** had some trouble putting them in order.

Here are the steps he took to collect, prepare, clean and combine all of the information...
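
One recurring cleaning step is turning answers given as ranges into a single number. The sketch below shows the idea in plain Python. `range_to_midpoint` is a hypothetical helper written for this illustration, and it assumes the answers look like `a-b`, `a+` or `> a`; anything without digits is mapped to 0.

```python
def range_to_midpoint(range_string):
    """Return the midpoint of a range like '25-29'; open-ended ranges keep their single bound."""
    # Answers without any digit (e.g. a refusal to answer) carry no numeric information.
    if not any(ch.isdigit() for ch in range_string):
        return 0.0

    # Strip thousands separators, currency signs and other decorations.
    cleaned = range_string
    for token in (',', '+', '$', '>', '<', ' ', 'years'):
        cleaned = cleaned.replace(token, '')

    # Average the one or two bounds that remain.
    bounds = [float(part) for part in cleaned.split('-') if part]
    return sum(bounds) / len(bounds)


print(range_to_midpoint('25-29'))            # 27.0
print(range_to_midpoint('100,000-124,999'))  # 112499.5
print(range_to_midpoint('> $500,000'))       # 500000.0
```

Open-ended brackets such as `> $500,000` have no upper bound, so the sketch falls back to the lower bound, which understates the true value for those rows.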

In [None]:
# Importing competition datasets (from 2017 until 2019):
multiple_choice_file_2017 = '../input/kaggle-survey-2017/multipleChoiceResponses.csv'
multiple_choice_file_2018 = '../input/kaggle-survey-2018/multipleChoiceResponses.csv'
multiple_choice_file_2019  = '../input/kaggle-survey-2019/multiple_choice_responses.csv'

dataset2017  = pd.read_csv(multiple_choice_file_2017, encoding='ISO-8859-1')
dataset2018  = pd.read_csv(multiple_choice_file_2018)
dataset2019  = pd.read_csv(multiple_choice_file_2019)

# Importing conversion rates dataset:
conversion_rates_file_2017 = '../input/kaggle-survey-2017/conversionRates.csv'
dataset_conversionRates_2017 = pd.read_csv(conversion_rates_file_2017, usecols=['originCountry', 'exchangeRate'])

In [None]:
# Functions to convert range strings to numerical values:
# Functions inspired by this notebook: https://www.kaggle.com/iamleonie/japan-country-of-the-rising-women

def convert_string_to_numerical(range_string):
    mean = 0
    
    # Check whether the string contains any digit:
    if any(i.isdigit() for i in range_string):
        range_string = range_string.replace(',', '').replace('+', '').replace(' ', '').replace('>', '').replace('<', '').replace('$', '').replace('years', '')
        
        # Split the string to the numbers of the range:
        range_array = np.array(range_string.split('-'))
        range_array = range_array.astype(float)  # np.float was removed in NumPy 1.24
        
        # Calculate the mean value of the range:
        mean = np.mean(range_array)
    
    return mean

def dictionary_from_unique_values(dataframe, column_name):
    unique_values = dataframe[column_name].unique()
    dictionary = {}
    for i in range(len(unique_values)):
        if type(unique_values[i]) is str:
            dictionary[unique_values[i]] = convert_string_to_numerical(unique_values[i])
        else:
            dictionary[unique_values[i]] = 0
    
    return dictionary

In [None]:
# Data Preparation for 2017 dataset:

# Selecting just a few columns:
column_names = ['Age', 'GenderSelect', 'Country', 'FormalEducation', 'CurrentJobTitleSelect', 'CompensationAmount', 'CompensationCurrency']
dataset2017 = dataset2017[column_names]

# Changing the name of the columns:
dataset2017.columns = ['Age', 'Gender', 'Country', 'Education', 'Role', 'Salary', 'CompensationCurrency']

# Removing rows with invalid salary placeholders, then converting column 'Salary' from str to float:
dataset2017 = dataset2017[~dataset2017['Salary'].isin(['-99', '-1', '0', '-'])].copy()

dataset2017['NumericSalaryFloat'] = dataset2017['Salary'].replace('140000,00', '140000')
dataset2017['NumericSalaryFloat'] = dataset2017['NumericSalaryFloat'].str.replace(',', '')
dataset2017['NumericSalaryFloat'] = dataset2017['NumericSalaryFloat'].astype('float32')

# Converting column Salary to USD currency:
dataset2017 = dataset2017.merge(dataset_conversionRates_2017, how='inner',
                                left_on='CompensationCurrency', 
                                right_on='originCountry' )

# Calculating the Salary in USD (the original value * the exchange rate):
dataset2017['NumericSalary'] = dataset2017['NumericSalaryFloat'] * dataset2017['exchangeRate']

# Deleting outliers manually (index labels refer to the merged DataFrame above):
dataset2017.drop([4394, 3925, 279, 1388, 4451, 1411], inplace=True)

# Calculating the number of participants per country:
number_of_people = dataset2017.groupby(['Country']).size().to_frame(name='Number_of_people').reset_index()

# Calculating the number of people with Master or Doctoral degree per country:
high_degree = ['Master’s degree','Doctoral degree']
number_of_high_degree = dataset2017[(dataset2017.Education.isin(high_degree))].groupby(['Country']).size().to_frame(name='Number_of_high_degree').reset_index()

# Calculating the mean age per country:
mean_age = dataset2017.groupby(['Country'])['Age'].mean().to_frame(name='Mean_age').reset_index()

# Calculating the median salary per country:
median_salary = dataset2017.groupby(['Country'])['NumericSalary'].median().to_frame(name='Median_salary').reset_index()

# Merging all previous DataFrames in only one:
merge = pd.merge(number_of_people, number_of_high_degree, on='Country')
merge = pd.merge(merge, mean_age, on='Country')
data2017 = pd.merge(merge, median_salary, on='Country')

# Calculating Proportion of people with a high degree:
data2017['Proportion_high_degree'] = data2017['Number_of_high_degree'] * 100 / data2017['Number_of_people']
data2017['Year'] = 2017

In [None]:
# Data Preparation for 2018 dataset:

# Removing first row (same text as the questions):
dataset2018.drop(0, inplace=True)

# Selecting just a few columns:
column_names = ['Q2', 'Q1', 'Q3', 'Q4', 'Q6', 'Q9']
dataset2018 = dataset2018[column_names]

# Changing the name of the columns:
dataset2018.columns = ['Age', 'Gender', 'Country', 'Education', 'Role', 'Salary']

# Converting column Salary from string to float:
dataset2018['NumericSalary'] =  np.where(dataset2018['Salary']=='0-10,000', 5000, 
                            np.where(dataset2018['Salary']=='10-20,000', 15000, 
                            np.where(dataset2018['Salary']=='20-30,000', 25000,
                            np.where(dataset2018['Salary']=='30-40,000', 35000,
                            np.where(dataset2018['Salary']=='40-50,000', 45000,
                            np.where(dataset2018['Salary']=='50-60,000', 55000,
                            np.where(dataset2018['Salary']=='60-70,000', 65000,
                            np.where(dataset2018['Salary']=='70-80,000', 75000,
                            np.where(dataset2018['Salary']=='80-90,000', 85000,
                            np.where(dataset2018['Salary']=='90-100,000', 95000,
                            np.where(dataset2018['Salary']=='100-125,000', 112500,       
                            np.where(dataset2018['Salary']=='125-150,000', 137500, 
                            np.where(dataset2018['Salary']=='150-200,000', 175000, 
                            np.where(dataset2018['Salary']=='200-250,000', 225000,
                            np.where(dataset2018['Salary']=='250-300,000', 275000,
                            np.where(dataset2018['Salary']=='300-400,000', 350000,   
                            np.where(dataset2018['Salary']=='400-500,000', 450000,     
                            np.where(dataset2018['Salary']=='500,000+', 500000,
                            np.where(dataset2018['Salary']== 'NaN', 0, 0                                                                      
                                )))))
                                )))))
                                )))))
                                ))))
dataset2018['NumericSalary'] = dataset2018['NumericSalary'].astype('float32')

# Converting column Age from string to float:
label_encoding_2018 = {}
label_encoding_2018['Age'] = dictionary_from_unique_values(dataset2018, 'Age')
dataset2018.replace(label_encoding_2018, inplace=True)

# Calculating the number of participants per country:
number_of_people = dataset2018.groupby(['Country']).size().to_frame(name='Number_of_people').reset_index()

# Calculating the number of people with Master or Doctoral degree per country:
high_degree = ['Master’s degree','Doctoral degree']
number_of_high_degree = dataset2018[(dataset2018.Education.isin(high_degree))].groupby(['Country']).size().to_frame(name='Number_of_high_degree').reset_index()

# Calculating the mean age per country:
mean_age = dataset2018.groupby(['Country'])['Age'].mean().to_frame(name='Mean_age').reset_index()

# Calculating the median salary per country:
median_salary = dataset2018.groupby(['Country'])['NumericSalary'].median().to_frame(name='Median_salary').reset_index()

# Merging all previous DataFrames in only one:
merge = pd.merge(number_of_people, number_of_high_degree, on='Country')
merge = pd.merge(merge, mean_age, on='Country')
data2018 = pd.merge(merge, median_salary, on='Country')

# Calculating Proportion of people with a high degree:
data2018['Proportion_high_degree'] = data2018['Number_of_high_degree'] * 100 / data2018['Number_of_people']
data2018['Year'] = 2018

In [None]:
# Data Preparation for 2019 dataset:

# Removing first row (same text as the questions):
dataset2019.drop(0, inplace=True)

# Selecting just a few columns:
column_names = ['Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q10']
dataset2019 = dataset2019[column_names]

# Changing the name of the columns:
dataset2019.columns = ['Age', 'Gender', 'Country', 'Education', 'Role', 'CompanySize', 'Salary']

# Converting column Salary from string to float:
dataset2019['NumericSalary'] =  np.where(dataset2019['Salary']=='$0-999', 500, 
                            np.where(dataset2019['Salary']=='1,000-1,999', 1500, 
                            np.where(dataset2019['Salary']=='2,000-2,999', 2500,
                            np.where(dataset2019['Salary']=='3,000-3,999', 3500,
                            np.where(dataset2019['Salary']=='4,000-4,999', 4500,
                                     
                            np.where(dataset2019['Salary']=='5,000-7,499', 6250, 
                            np.where(dataset2019['Salary']=='7,500-9,999', 8750, 
                            np.where(dataset2019['Salary']=='10,000-14,999', 12500,
                            np.where(dataset2019['Salary']=='15,000-19,999', 17500,
                            np.where(dataset2019['Salary']=='20,000-24,999', 22500,
                                     
                            np.where(dataset2019['Salary']=='25,000-29,999', 27500,
                            np.where(dataset2019['Salary']=='30,000-39,999', 35000,
                            np.where(dataset2019['Salary']=='40,000-49,999', 45000,
                            np.where(dataset2019['Salary']=='50,000-59,999', 55000,
                            np.where(dataset2019['Salary']=='60,000-69,999', 65000,
                                     
                            np.where(dataset2019['Salary']=='70,000-79,999', 75000,
                            np.where(dataset2019['Salary']=='80,000-89,999', 85000,
                            np.where(dataset2019['Salary']=='90,000-99,999', 95000,
                            np.where(dataset2019['Salary']=='100,000-124,999', 112500,
                            np.where(dataset2019['Salary']=='125,000-149,999', 137500,
                                     
                            np.where(dataset2019['Salary']=='150,000-199,999', 175000,
                            np.where(dataset2019['Salary']=='200,000-249,999', 225000,
                            np.where(dataset2019['Salary']=='250,000-299,999', 275000,
                            np.where(dataset2019['Salary']=='300,000-500,000', 400000,
                            np.where(dataset2019['Salary']=='> $500,000', 500000, 
                            np.where(dataset2019['Salary']== 'NaN', 0, 0                                        
                                )))))
                                )))))
                                )))))
                                )))))
                                ))))) )
dataset2019['NumericSalary'] = dataset2019['NumericSalary'].astype('float32')

# Converting column Age from string to float:
label_encoding_2019 = {}
label_encoding_2019['Age'] = dictionary_from_unique_values(dataset2019, 'Age')
dataset2019.replace(label_encoding_2019, inplace=True)

# Calculating the number of participants per country:
number_of_people = dataset2019.groupby(['Country']).size().to_frame(name='Number_of_people').reset_index()

# Calculating the number of people with Master or Doctoral degree per country:
high_degree = ['Master’s degree','Doctoral degree']
number_of_high_degree = dataset2019[(dataset2019.Education.isin(high_degree))].groupby(['Country']).size().to_frame(name='Number_of_high_degree').reset_index()

# Calculating the mean age per country:
mean_age = dataset2019.groupby(['Country'])['Age'].mean().to_frame(name='Mean_age').reset_index()

# Calculating the median salary per country:
median_salary = dataset2019.groupby(['Country'])['NumericSalary'].median().to_frame(name='Median_salary').reset_index()

# Merging all previous DataFrames in only one:
merge = pd.merge(number_of_people, number_of_high_degree, on='Country')
merge = pd.merge(merge, mean_age, on='Country')
data2019 = pd.merge(merge, median_salary, on='Country')

# Calculating Proportion of people with a high degree:
data2019['Proportion_high_degree'] = data2019['Number_of_high_degree'] * 100 / data2019['Number_of_people']
data2019['Year'] = 2019

In [None]:
# Concatenating all datasets together (2017, 2018 and 2019):
data_per_country = pd.concat([data2017, data2018, data2019], axis=0)

In [None]:
# Correcting the name of some countries:
data_per_country['Country'] = np.where(data_per_country['Country']=='Hong Kong (S.A.R.)', 'Hong Kong', 
                       np.where(data_per_country['Country']== 'Iran, Islamic Republic of...', 'Iran', 
                       np.where(data_per_country['Country']=='People \'s Republic of China', 'China',
                       np.where(data_per_country['Country']=='Republic of China', 'China',
                       np.where(data_per_country['Country']=='United Kingdom of Great Britain and Northern Ireland', 'United Kingdom', 
                       np.where(data_per_country['Country']=='United States of America', 'United States', data_per_country['Country']
                               ))))))

In [None]:
# Creating the column Continent:
data_per_country['Continent'] = np.where(data_per_country['Country'].isin(['Argentina', 'Brazil', 'Chile', 'Colombia', 'Peru']), 'Latin America',
                         np.where(data_per_country['Country'].isin(['Canada', 'United States', 'Mexico']), 'North America',
                         np.where(data_per_country['Country'].isin(['Australia', 'New Zealand']), 'Oceania',
                         np.where(data_per_country['Country'].isin(['Japan', 'Singapore', 'India', 'Pakistan', 'Bangladesh'
                                                             , 'Iran', 'Israel', 'Indonesia', 'China', 'South Korea', 'Hong Kong'
                                                             , 'Malaysia', 'Philippines', 'Republic of Korea', 'Saudi Arabia'
                                                            ,'Taiwan', 'Thailand', 'Turkey', 'Viet Nam']), 'Asia',
                         np.where(data_per_country['Country'].isin(['Egypt', 'Nigeria', 'South Africa', 'Algeria', 'Kenya', 'Morocco', 'Tunisia']), 'Africa',
                         np.where(data_per_country['Country'].isin(['Belgium', 'Czech Republic', 'Denmark', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 
                                                             'Ireland', 'Italy', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania', 
                                                             'Spain', 'Sweden', 'Switzerland', 'United Kingdom', 'Austria', 'Belarus', 
                                                             'Ukraine', 'Russia']), 'Europe'
                                  , 'Unknown'
                                 ))))))


In [None]:
# CALCULATING STATISTICS AGGREGATED BY CONTINENT:

# Calculating the number of participants per continent:
number_of_people = data_per_country.groupby(['Year','Continent'])['Number_of_people'].sum().reset_index()

# Calculating the number of people with Master or Doctoral degree per continent:
number_of_high_degree = data_per_country.groupby(['Year','Continent'])['Number_of_high_degree'].sum().reset_index()

# Calculating the mean age per continent:
data_per_country['Number_times_Mean'] = data_per_country['Number_of_people'] * data_per_country['Mean_age']
mean_age = data_per_country.groupby(['Year','Continent'])['Number_times_Mean'].sum().reset_index()
mean_age['Mean_age'] = mean_age['Number_times_Mean'] / number_of_people['Number_of_people']

# Calculating the mean salary per continent (mean of the countries' median salaries, weighted by number of people):
data_per_country['Number_times_Salary'] = data_per_country['Number_of_people'] * data_per_country['Median_salary']
mean_salary = data_per_country.groupby(['Year','Continent'])['Number_times_Salary'].sum().reset_index()
mean_salary['Mean_salary'] = mean_salary['Number_times_Salary'] / number_of_people['Number_of_people']

# Merging all previous DataFrames in only one:
merge = pd.merge(number_of_people, number_of_high_degree, on=['Year', 'Continent'])
merge = pd.merge(merge, mean_age, on=['Year', 'Continent'])
data_per_continent = pd.merge(merge, mean_salary, on=['Year', 'Continent'])

data_per_continent['Proportion_high_degree'] = data_per_continent['Number_of_high_degree'] * 100 / data_per_continent['Number_of_people']

## 3.  Preparing the maps of the galaxy

**Kaggler Padawan** created the maps below of the entire galaxy to see where each planet stands in the Data Science Wars:
* Map of 2017
* Map of 2018
* Map of 2019
* Hologram of the history of the galaxy, from 2017 to 2019
* Hologram of the galaxy, grouped by the Star Cluster in which each planet is located

### Back in 2017, planets were gathered into two groups across the galaxy...

* The planets were united into two groups:
   * **The Rebels**: planets with low experience (younger) and a low level of study (small proportion of high degrees)
   * **The Empire**: planets with more experience (older) and some level of study (moderate proportion of high degrees)


* The United States planet has more Kagglers than any other planet (1243 Kaggle followers). It has the highest wellness among the Kagglers (median of USD 108,000 per Kaggler), but its level of study was not very good (~26% with a high degree), and its population was far from the least experienced (~36 years, on average)


* Some planets do not belong to either group; they are isolated in the galaxy, as seen on the map below:
   * The China planet is one of the least experienced (~28 years), but with a good level of study (~33% with a high degree)
   * The New Zealand planet is one of the most experienced (~38 years), but despite this experience, it has a low level of study (~17% with a high degree)
   * The Switzerland planet, although not the most experienced (36 years), is the one with the highest level of study (~45%)

In [None]:
# Bubble chart (Per country in 2017):
data_per_country2017 = data_per_country[data_per_country['Year']==2017]

px.scatter(data_per_country2017, x='Proportion_high_degree', y='Mean_age', opacity=0.5,
           size='Median_salary', color='Country', hover_name='Country'
           ,title='In 2017, planets were separated into two clusters: the younger ones with few high degrees and the more experienced ones with some high degrees'
           ,log_x = False , color_discrete_sequence=px.colors.qualitative.Dark24
           ,size_max=60 , range_x=[0, 50], range_y=[25,41]
          ,labels=dict(Proportion_high_degree='Level of study (%)', Country='Planet', Number_of_people='# of Kagglers',
                       Mean_age='Level of experience (years)', Median_salary='Level of wellness (USD)')
          )

### After a long year of battle, in 2018, almost every planet increased the level of study:

* The planets are much more similar now: **The Rebels** and **The Empire** are fighting on a very even battlefield, where the planets have approximately the same level of experience and the same level of study, with some exceptions:

    * Some planets located in the African Star Cluster (planets Nigeria and Egypt) have the lowest levels of study compared with the other planets (~32% with a high degree)
    * The United States planet greatly increased its level of study (~66%), and new volunteers joined the Kagglers' force, so it became younger, on average (~32 years). Because of this, the level of wellness went down (~USD 65,000)
    * The France planet appears to have invested a lot in its Kagglers: its level of study skyrocketed from approximately 33% to 91%
    


In [None]:
# Bubble chart (Per country in 2018):
data_per_country2018 = data_per_country[data_per_country['Year']==2018]

px.scatter(data_per_country2018, x='Proportion_high_degree', y='Mean_age', opacity=0.5,
           size='Median_salary', color='Country', hover_name='Country'
           ,title='In 2018, after only one year, almost every planet increased the level of study'
           ,log_x = False , color_discrete_sequence=px.colors.qualitative.Dark24
           ,size_max=60 , range_x=[30, 95], range_y=[24,36]
          ,labels=dict(Proportion_high_degree='Level of study (%)', Country='Planet', Number_of_people='# of Kagglers',
                       Mean_age='Level of experience (years)', Median_salary='Level of wellness (USD)')
          )

### The struggle never ends since the battle continues year after year. In 2019, the level of study of the planets didn't change significantly, but the level of experience increased a little bit...

* On average, the planets were fairly stable in their level of study, but the level of experience went up by less than 10%
* The United States planet increased its level of study again (~69%), and its average level of experience also increased (~36 years). The level of wellness increased as well (~USD 85,000)
* The Egypt planet still has a low level of study (~29%)
* Many planets have a low level of wellness, even though some of them have a high level of study (for example, Morocco and Iran)

In [None]:
# Bubble chart (Per country in 2019):
data_per_country2019 = data_per_country[data_per_country['Year']==2019]

px.scatter(data_per_country2019, x='Proportion_high_degree', y='Mean_age', opacity=0.5,
           size='Median_salary', color='Country', hover_name='Country'
           ,title='In 2019, almost every planet maintained the level of study'
           ,log_x = False , color_discrete_sequence=px.colors.qualitative.Dark24
           ,size_max=60 , range_x=[28, 91], range_y=[24,41]
          ,labels=dict(Proportion_high_degree='Level of study (%)', Country='Planet', Number_of_people='# of Kagglers',
                       Mean_age='Level of experience (years)', Median_salary='Level of wellness (USD)')
          )

### Now, Kaggler Padawan sent us a hologram with an animation of all planets along the years...

In the hologram below, each color represents one Star Cluster that exists across the galaxy.

Some planets are not fully discovered yet, so they are grouped together in the Unknown Star Cluster.
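
The fallback to an Unknown Star Cluster can also be expressed with a lookup table rather than nested conditions. The sketch below is only an illustration of that alternative: the dictionary `star_cluster_of` is hypothetical and deliberately short, and `'Atlantis'` is an invented planet that stands in for any country missing from the table.

```python
import pandas as pd

# Hypothetical, deliberately short lookup table (not the full list used in this notebook).
star_cluster_of = {
    'Brazil': 'Latin America',
    'Canada': 'North America',
    'Japan': 'Asia',
    'France': 'Europe',
}

# Toy list of planets; 'Atlantis' is invented and absent from the lookup table.
planets = pd.DataFrame({'Country': ['Brazil', 'Japan', 'Atlantis', 'France']})

# map() returns NaN for countries missing from the lookup; fillna() labels them 'Unknown'.
planets['Continent'] = planets['Country'].map(star_cluster_of).fillna('Unknown')
print(planets)
```

With this layout, discovering a new planet means adding one entry to the dictionary, and anything not yet listed still lands in `'Unknown'`.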

In [None]:
# Bubble chart (Per country):
px.scatter(data_per_country, x='Proportion_high_degree', y='Mean_age', animation_frame='Year', animation_group='Country', opacity=0.5,
           size='Median_salary', color='Continent', hover_name='Country', title='Planets were separated in two groups, but eventually became very similar'
           ,log_x = False , color_discrete_sequence=px.colors.qualitative.Dark24
           ,category_orders={'Continent': ['North America', 'Latin America', 'Asia', 'Europe', 'Africa', 'Oceania', 'Unknown']}
           ,size_max=45 , range_x=[0, 95], range_y=[25,40]
          ,labels=dict(Proportion_high_degree='Level of study (%)', Country='Planet', Number_of_people='# of Kagglers', Continent='Star Cluster',
                       Mean_age='Level of experience (years)', Median_salary='Level of wellness (USD)')
          )


### In the hologram below, Kaggler Padawan grouped the information into the Star Clusters...
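
Rolling planet-level averages up to a Star Cluster needs a weighted mean, so that a planet with many Kagglers counts for more than a small one. The sketch below shows that arithmetic on made-up numbers. The column names follow this notebook, but the values and the helper column `Age_times_people` are assumptions made for the illustration.

```python
import pandas as pd

# Hypothetical country-level rows (values invented for illustration).
planets = pd.DataFrame({
    'Year': [2019, 2019, 2019],
    'Continent': ['Europe', 'Europe', 'Asia'],
    'Number_of_people': [100, 300, 200],
    'Mean_age': [30.0, 34.0, 28.0],
})

# Weight each planet's mean age by its number of Kagglers.
planets['Age_times_people'] = planets['Mean_age'] * planets['Number_of_people']

# Sum the weighted ages and the head counts per Star Cluster, then divide.
totals = (planets
          .groupby(['Year', 'Continent'])[['Age_times_people', 'Number_of_people']]
          .sum()
          .reset_index())
totals['Mean_age'] = totals['Age_times_people'] / totals['Number_of_people']
print(totals[['Year', 'Continent', 'Mean_age']])
```

For the two European planets the weighted mean is (30 × 100 + 34 × 300) / 400 = 33, whereas a plain average of the two planet means would give 32. The same weighting applied to median salaries gives only an approximation, since a weighted mean of medians is not the median of the pooled population.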

In [None]:
# Bubble chart (Per continent):
px.scatter(data_per_continent, x='Proportion_high_degree', y='Mean_age', animation_frame='Year', animation_group='Continent', opacity=0.5,
           size='Mean_salary', color='Continent', hover_name='Continent', title='The Asia Star Cluster (green circle) more than tripled its level of study (from 13% to 46%)'
           ,log_x=False
           ,color_discrete_sequence=px.colors.qualitative.Dark24
           ,category_orders={'Continent': ['North America', 'Latin America', 'Asia', 'Europe', 'Africa', 'Oceania', 'Unknown']}
           ,size_max=45, range_x=[10, 80], range_y=[25, 40]
          ,labels=dict(Proportion_high_degree='Level of study (%)', Country='Planet', Number_of_people='# of Kagglers', Continent='Star Cluster',
                       Mean_age='Level of experience (years)', Mean_salary='Level of wellness (USD)')
          )

## Conclusion

Although the planets were separated into two groups, nowadays they are beginning to come to an agreement among themselves.

This is an endless war, with society as the biggest winner. The knowledge produced by The Rebels and The Empire is priceless. The evolution triggered by these groups is enormous. We can only hope that, in the coming years, the Kaggle followers on all planets stay engaged in making the galaxy smarter and better...