**Introduction**

According to Wikipedia, middle age is the period beyond young adulthood but before the onset of old age, typically between 45 and 65 years in the United States. Britannica.com defines middle age as the period of human adulthood that immediately precedes the onset of old age and sets the age bracket between 40 and 60 years. For this study, however, middle age shall cover the age groups from 45 to 59 years, because the retirement age in many countries is 60 years.

**Assumptions**


For this study, the term data analysts/scientists is used broadly to cover all respondents within the scope defined by question 5. The assumption is that all respondents work with data in their official capacities. The terms data analyst and data scientist shall be used interchangeably in this study.

**Motivation**

For the youth, the future is tomorrow, but for middle-aged people the future is now. What they have achieved becomes the present reality of their families and countries. I strongly believe that the youth define the future while middle-aged people define today. Therefore, you can trace the current success or failure of a family or a country to the current status of its middle-aged population.

My late grandfather would always tell us: “the best story to tell is your own story, look for the patterns in someone else’s story that matches yours, that way, you will tell a truthful and compelling story that flows from beginning to the end.”

I recently joined Kaggle, and I am less than a year old as a data scientist/analyst. I am interested in discovering the journey of people in my age group in the data science world. Of particular interest to me is this question: do we have people like me who became data scientists at this period regarded as the twilight of our careers? Other questions of interest shall be stated at the beginning of every section.

**Selling Point**

The selling point of this study is the establishment of a link between experienced middle-aged data scientists and the economic growth and advancement of nations, and hence why some continents are regarded as developed, developing or less developed. This provides some insight for further research into making data science an enduring career that positively affects the growth of nations.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
dataset_19 = pd.read_csv("/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv",low_memory=False)

**Middle-Aged Data Analysts: Demography and General Information**

This covers personal information such as age, gender, resident country, formal education, current or last official title at work, yearly compensation, ML cost in the last five years and years of coding experience. The scope covers survey questions 1, 2, 3, 4, 5, 10, 11 and 15, plus question 23 (years of ML experience), which the code below also selects.

In [None]:
# Renaming the columns for easy understanding of their designations.
# rename(..., inplace=True) changes dataset_19 directly and returns None, so its result is not assigned.
dataset_19.rename(columns={'Q1':'AgeYears', 'Q2':'Gender', 'Q3':'ResidentCountry', 'Q4':'HighestFormalEdu',
                           'Q5': 'OfficialTitle', 'Q10':'YearlyCompensation', 'Q11':'MLCost5Years',
                           'Q15':'CodingYears', 'Q23':'MLExperienceYears'}, inplace=True)

# Creating the Demography and General Information subset
demograpy_19 = dataset_19[['AgeYears', 'Gender','ResidentCountry','HighestFormalEdu','OfficialTitle','YearlyCompensation', 
                     'MLCost5Years','CodingYears', 'MLExperienceYears']]

# Dropping the questions row to make for easy data viewing 
demograpy_19 = demograpy_19.drop([0], axis =0)
demograpy_19.head(3)

In [None]:
demograpy_19.info()

In [None]:
#Creating the Middle Age Dataset for Demography and General Information
# MADS_19 = Middle Aged Data Scientists 2019 
a45_49 = demograpy_19['AgeYears'] == '45-49'
a50_54 = demograpy_19['AgeYears'] == '50-54'
a55_59 = demograpy_19['AgeYears'] == '55-59'

MADS_19 = pd.concat([(demograpy_19[a45_49]), (demograpy_19[a50_54]), (demograpy_19[a55_59])])
MADS_19.head(10)

In [None]:
MADS_19.info()

In [None]:
#Get the Unique Age Frequencies of the Middle Age Group

MADS_19["AgeYears"].value_counts() 

In [None]:
# Question: What is the percentage spread of each age group in the middle age dataset?

MADS_19["AgeYears"].value_counts().plot.pie(title= "Middle Age Spread",figsize=(5,5), autopct= '%1.1f%%')


In most professions or career spaces, the population of professionals shrinks as people grow older. Data Science is no exception. The share falls sharply from 46% at age 45-49 to 20.5% at age 55-59.
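
The percentages shown on the pie chart can also be obtained as plain numbers. The following is a minimal sketch on a small made-up series, not the survey data; the names `ages` and `age_share` are illustrative only.

```python
import pandas as pd

# Toy age brackets with made-up counts (NOT the survey data)
ages = pd.Series(["45-49"] * 5 + ["50-54"] * 3 + ["55-59"] * 2, name="AgeYears")

# normalize=True returns fractions that sum to 1; multiply by 100 for percentages
age_share = (ages.value_counts(normalize=True) * 100).round(1)
print(age_share)
```

The same call on `MADS_19["AgeYears"]` should give the figures that the pie chart labels display.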

Compare these figures with the general dataset below, where there was initially an upward swing:

* Age 18-21 = 12.7%
* Age 22-24 = 18.3%
* Age 25-29 = 22.6%

Why do we have a fall at such a young age? That question is beyond the scope of this study; however, you can read more on the subject from the link below. The writer is entitled to his opinions.

https://towardsdatascience.com/why-so-many-data-scientists-are-leaving-their-jobs-a1f0329d7ea4

In [None]:
demograpy_19["AgeYears"].value_counts().plot.pie(title= "All Ages Spread",figsize=(9.5,9.5), autopct= '%1.1f%%')

In [None]:
# Middle Age Gender Distribution
# The middle age gender distribution is highly skewed toward the male gender at the ratio of 7.08 men to 1 woman

MADS_19["Gender"].value_counts() 

In [None]:
# Education and Qualification Distribution for the middle aged people in Data Science 
# Highest formal education for the middle aged people

MADS_19["HighestFormalEdu"].value_counts().plot.bar(title= 'Highest Formal Education',figsize=(7,7))

As expected, the middle-aged data scientists are highly qualified with respect to education in whatever field, with Master’s and Doctoral degrees ranking first and second respectively.

In [None]:
# The distribution of the current official titles of the middle age group
# We have more Data Scientists; however, the distribution is not one-sided, as every title has its fair share
MADS_19["OfficialTitle"].value_counts()

In [None]:
# Yearly Compensation of middle aged data scientists
# This data is as expected: compensation and years of experience are positively correlated.
# However, there is an exception in line 3, where 118 middle aged people earn below $1,000.
# This needs more investigation for insight.
# The questions pertaining to this are raised below.
MADS_19["YearlyCompensation"].value_counts() 

* Why do we have middle-aged people earning below 1,000 USD yearly?
* How many years of experience do they have in data science?
* Which countries are they from?
* How much have they spent on ML in the last 5 years?

In [None]:
# below_1k = middle aged people earning between $0-999 yearly
below_1k = MADS_19[(MADS_19.YearlyCompensation == '$0-999')]
below_1k.head()

In [None]:
below_1k.info()

In [None]:
below_1k.groupby(["ResidentCountry"])[["YearlyCompensation"]].size()

My initial suspicion that most of those earning below 1,000 USD would be from third-world countries was proved wrong by the data: the advanced economies of the USA and Japan account for most of them (29 and 12 respectively). India and Russia have 8 and 7 middle-aged people respectively earning below 1,000 USD yearly.
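
To read the top countries directly instead of scanning an alphabetical list, the group sizes can be sorted. This is a small sketch on invented rows; `toy` and `by_country` are hypothetical names and the country labels are shortened for the example.

```python
import pandas as pd

# Invented rows standing in for the below_1k subset (NOT the survey data)
toy = pd.DataFrame({
    "ResidentCountry": ["USA", "USA", "Japan", "India", "USA", "Japan"],
    "YearlyCompensation": ["$0-999"] * 6,
})

# Count rows per country, largest group first
by_country = toy.groupby("ResidentCountry").size().sort_values(ascending=False)
print(by_country.head(3))
```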

In [None]:
below_1k.groupby(["CodingYears"])[["YearlyCompensation"]].size()

CodingYears is regarded as years of experience. Machine Learning as it is today is relatively new. For middle-aged people who have spent over 20 years in the profession, years of coding in whatever form is a more appropriate representation of their years of experience in the Data Science space.

My other initial suspicion is also not supported by the data. I thought inexperience accounted for the low compensation. However, the data shows otherwise: 22 of these middle-aged data scientists have 10 or more years of coding experience.

In [None]:
below_1k.groupby(["MLCost5Years"])[["YearlyCompensation"]].size()

Your spending on ML should be commensurate with your earnings. Although most of these people (47) have spent nothing on ML in the last 5 years, there are also people in this group who have spent quite some money: 13 people have spent between 1 USD and 10,000 USD, 6 people have spent between 10,000 USD and 100,000 USD, and 7 people have spent over 100,000 USD.

How much has been expended on Machine Learning in the last 5 years by middle-aged data scientists?

In [None]:
MADS_19["MLCost5Years"].value_counts().plot.pie(title= "ML COSTS OVER 5 YEARS",figsize=(7,7), autopct= '%1.1f%%')


The largest share (27.5 percent) have spent 0 USD on Machine Learning in the last 5 years. That would be understandable for the younger generation, who may have just graduated from school and lack enough income to spend on data science. For middle-aged people, important questions to ask include:

* What do personal and career priorities (the scale of preference) have to do with it?
* Do they have other professional training on which they would rather spend part of their income than on Data Science training?
* Are companies reluctant to spend on middle-aged data scientists?
* Do middle-aged data scientists see the free and open-source ML materials online as enough?
* Do more experienced data scientists see no need to incur additional ML costs in the last 5 years, having spent so much on data science before then?

On the personal front, I chase after all the free open-source packages and materials online because I have dependents and other responsibilities that take a larger percentage of my disposable income.

**Years of Experience of Middle-Aged Data Scientists**


CodingYears is regarded as years of experience. Machine Learning as it is today is relatively new. For middle-aged people who have spent over 20 years in the profession, years of coding in whatever form is a more appropriate representation of their years of experience in the Data Science space.

In [None]:
MADS_19["CodingYears"].value_counts().plot.pie(title= "Years of Experience",figsize=(9,9), autopct= '%1.1f%%')

I am particularly interested in those with 0 to 2 years of experience. They constitute 29.7 percent (15.5% + 14.2%) of all middle-aged people in data science according to the survey. If you combine this with the 17.3 percent who have between 3 and 5 years of experience, you have a relatively large number of middle-aged people (47 percent of those currently in the profession) just coming into the data science space. At an age between 45 and 59, you are expected to be at the twilight of your career. Why are so many middle-aged people coming into the data science space?
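
The arithmetic behind the 29.7 and 47 percent figures can be checked in a few lines. The three shares are the ones quoted above; the variable names are made up for this sketch.

```python
# Shares (in percent) quoted in the text for the middle-aged group
under_two = [15.5, 14.2]   # the two brackets covering 0 to 2 years of coding
three_to_five = 17.3       # the 3 to 5 years bracket

# Newcomers with at most 2 years, then everyone with at most 5 years
newcomers = round(sum(under_two), 1)
up_to_five = round(newcomers + three_to_five, 1)
print(newcomers, up_to_five)
```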

The main dataset for all age groups shows that there are more young people than middle-aged people in the data science career space right now. Some questions come to mind:

Is data science an enduring career? Will the youths entering the space right now retire as data scientists? Is there a Data Science lure for young people? How sustainable is this lure: do young data scientists get disenchanted along the way before middle age? Is there any transitional or fail-safe mechanism in place to help youths navigate the Data Science space during their middle age?

Data science is still largely evolving, and the need to learn new topics is constantly increasing. Two important questions come to mind based on the foregoing. Is there any truth in the myth that the drive to learn new things slows down as you grow older? Can we do away with the experience of middle-aged people just because they might be slowing down?

Is the middle-aged period an endangered period for data scientists? Are middle-aged data scientists endangered? Will they have a fair chance at job openings like their youthful counterparts?

I fall within this group, as I am less than a year old in data science, though I had done data analysis outside machine learning and artificial intelligence. I have a master's degree in Economics, and I am familiar with regression in Econometrics.

For me, mid-life crisis and self-rediscovery played a huge role in my venturing into data science as we know it today. Also, middle-aged people always want to have a backup or contingency plan for income generation, especially in old age. If you have a different experience, kindly comment below.

**Distribution of Middle-Aged Data Scientists by Continent**

Grouping data scientists by continent may provide more insight into why a group of nations is termed developed, developing or less developed. The following questions arise:

1.	How are the more experienced middle-aged data scientists distributed across continents?
2.	What is the current growth/development terminology given to nations with more data scientists and to those with fewer data scientists?
3.	Is it possible to determine growth and development based on the presence of experienced data scientists in nations?


In [None]:
# Creating lists of continents according to the countries in the dataset and using conditional elif statement to match
# each country to its continent
asia = ['India', 'Japan', 'Taiwan', 'South Korea', 'Singapore', 'Indonesia', 'Hong Kong(S.A.R.)',
        'Pakistan', 'China', 'Thailand', 'Saudi Arabia', 'Bangladesh', 'Iran, Islamic Republic of...',
        'Philippines', 'Malaysia', 'Viet Nam', 'Republic of Korea']
northAmerica = ['United States of America', 'Canada', 'Mexico']
southAmerica = ['Brazil', 'Argentina', 'Colombia', 'Chile', 'Peru']
europe = ['Spain', 'United Kingdom of Great Britain and Northern Ireland', 'Italy', 'Germany', 'Russia', 'Portugal',
          'Turkey', 'Poland', 'France', 'Netherlands', 'Israel', 'Sweden', 'Ireland', 'Belgium', 'Ukraine', 'Greece',
          'Austria', 'Norway', 'Hungary', 'Switzerland', 'Romania', 'Czech Republic', 'Belarus', 'Denmark']
africa = ['Morocco', 'Tunisia', 'South Africa',
          'Nigeria', 'Kenya', 'Egypt', 'Algeria']
other = ['Other']
australia = ['Australia', 'New Zealand']


def GetContinents(Country):
    if Country in asia:
        return "Asia"
    elif Country in europe:
        return "Europe"
    elif Country in africa:
        return "Africa"
    elif Country in australia:
        return "Australia"
    elif Country in northAmerica:
        return "North America"
    elif Country in southAmerica:
        return "South America"
    else:
        return "Other"
MADS_19["ResidentContinent"] = MADS_19["ResidentCountry"].apply(lambda x:GetContinents(x))
demograpy_19["ResidentContinent"] =  demograpy_19["ResidentCountry"].apply(lambda x:GetContinents(x))

MADS_19.head()
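
The continent lookup above can also be expressed as a dictionary combined with `Series.map`, which avoids the chain of `elif` branches. This is a sketch under assumptions: the lists are shortened, and `continent_lists`, `country_to_continent` and `toy` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Shortened, illustrative lists; the full lists from the cell above would be used the same way
continent_lists = {
    "Asia": ["India", "Japan", "China"],
    "Europe": ["Germany", "France"],
    "Africa": ["Nigeria", "Kenya"],
}

# Invert the lists into a country -> continent dictionary
country_to_continent = {
    country: continent
    for continent, countries in continent_lists.items()
    for country in countries
}

toy = pd.DataFrame({"ResidentCountry": ["India", "France", "Nigeria", "Atlantis"]})

# Countries missing from the dictionary map to NaN and are then labelled "Other"
toy["ResidentContinent"] = toy["ResidentCountry"].map(country_to_continent).fillna("Other")
print(toy)
```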

In [None]:
MADS_19.info()

In [None]:
# Resident Continents of middle aged data scientists
MADS_19["ResidentContinent"].value_counts() 

In [None]:
MADS_19["ResidentContinent"].value_counts().plot.bar(title= 'Middle-Aged Data Scientists by Continents',figsize=(7,7))

According to Peter Drucker: “If you can’t measure it, you can’t improve it.” Therefore, if you don’t have data scientists, or if data science is not on the front burner, how do you measure growth indices and consequently improve on them?

Evaluating the relationship between economic growth and development and active data science practice is beyond the scope of this study; only a passing reference is made here.

In my introduction, I said: "For the youth, the future is tomorrow, but for middle-aged people the future is now. What they have achieved becomes the present reality of their families and countries." The above bears a semblance to the way we currently define developed, developing and less developed nations.

The most advanced nations of North America and Europe, regarded as developed economies, rank first and second, while the nations of Asia, which have witnessed giant strides in economic and tech development in recent years and are generally regarded as developing economies, rank third. Africa, with over 50 countries regarded as less developed, ranks last on the log.

* Why is this data/ranking important?
It is important because the ranking by continent in the general dataset below, which considers all age groups, is different from that of this middle-aged dataset. Asia, with 39.3%, ranks highest in the general dataset, and the age distribution coming from Asia shows that the youth are the driving force behind its numbers. However, there are more middle-aged data scientists in Europe and North America, both of which are regarded as advanced economies.
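
One way to make this comparison explicit is to put the two sets of continent shares side by side. The sketch below uses invented series, so the numbers are not the survey's; `all_ages`, `middle_aged` and `comparison` are illustrative names.

```python
import pandas as pd

# Invented continent labels for two groups (NOT the survey data)
all_ages = pd.Series(["Asia"] * 4 + ["Europe"] * 3 + ["North America"] * 3)
middle_aged = pd.Series(["Asia"] * 1 + ["Europe"] * 2 + ["North America"] * 2)

# Percentage share per continent for each group, aligned on the continent name
comparison = pd.DataFrame({
    "all_ages_pct": all_ages.value_counts(normalize=True) * 100,
    "middle_aged_pct": middle_aged.value_counts(normalize=True) * 100,
}).round(1)
print(comparison.sort_values("middle_aged_pct", ascending=False))
```

With the notebook's data, the two inputs would be `demograpy_19["ResidentContinent"]` and `MADS_19["ResidentContinent"]`.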

In [None]:
# Resident Continents of data scientists of all age groups
demograpy_19["ResidentContinent"].value_counts() 

In [None]:
demograpy_19["ResidentContinent"].value_counts().plot.pie(title= "All Ages Data Scientists by Continents",figsize=(10,10), autopct= '%1.1f%%')

In [None]:
# The presence of experienced data scientists corresponds to economic and tech advancement  
MADS_19.groupby(["ResidentContinent"])[["CodingYears"]].describe()

Look at the coding experience of middle-aged respondents from the advanced economies of North America, Europe and Australia: the most frequent answer is 20+ years. For Africa, Asia and South America, the most frequent answers are below 1 year and below 2 years.

However, the downside to this analysis is the result for machine learning years of experience below, where the advanced economies did not do too well with respect to middle-aged Machine Learning years of experience. As I said earlier:

CodingYears is regarded as years of experience. Machine Learning as it is today is relatively new. For middle-aged people who have spent over 20 years in the profession, years of coding in whatever form is a more appropriate representation of their years of experience in the Data Science space.
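
Because `CodingYears` holds text brackets rather than numbers, `describe()` reports `count`, `unique`, `top` and `freq`, where `top` is the most frequent bracket in each group. A small sketch with invented rows illustrates how to read it; `toy` and `summary` are hypothetical names, and a single column is selected so the result has flat column labels.

```python
import pandas as pd

# Invented rows (NOT the survey data)
toy = pd.DataFrame({
    "ResidentContinent": ["Europe", "Europe", "Europe", "Africa", "Africa", "Africa"],
    "CodingYears": ["20+ years", "20+ years", "3-5 years",
                    "< 1 years", "< 1 years", "1-2 years"],
})

# For text columns, 'top' is the modal bracket and 'freq' is how often it occurs
summary = toy.groupby("ResidentContinent")["CodingYears"].describe()
print(summary)
```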

In [None]:
MADS_19.groupby(["ResidentContinent"])[["MLExperienceYears"]].describe()

In [None]:
# Looking at the ML years of experience shows that most middle aged people have below 5 years of experience, 
# in fact many of them have less than 2 years of experience
MADS_19["MLExperienceYears"].value_counts() 

However, before we go, can we look for the middle-aged people who have 20+ years of ML experience and the continents where they are located?

In [None]:
# Middle aged data scientists who have 20+ years in ML
ML20years = MADS_19[(MADS_19.MLExperienceYears == '20+ years')]

ML20years.head()

In [None]:
ML20years.info()

In [None]:
# Middle-aged with 20+ years in ML grouped by continents
ML20years.groupby(["ResidentContinent"])[["MLExperienceYears"]].describe()

Europe and America have 43 and 50 middle-aged people respectively with over 20 years’ experience in Machine Learning. Asia, the biggest continent by size and population, has 8, while Australia has 3. Africa has none. This is consistent with our earlier conclusion that there is a positive link between the concentration of experienced middle-aged people in data science and/or Machine Learning and the economic growth and advancement of nations.
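
A cross-tabulation shows every continent against every experience bracket in one table, including explicit zeros for continents with nobody in a bracket. This is only a sketch on invented rows; `toy` and `table` are hypothetical names.

```python
import pandas as pd

# Invented rows (NOT the survey data)
toy = pd.DataFrame({
    "ResidentContinent": ["Europe", "Europe", "Asia", "North America", "Africa"],
    "MLExperienceYears": ["20+ years", "1-2 years", "20+ years", "20+ years", "1-2 years"],
})

# Rows are continents, columns are experience brackets, cells are head counts
table = pd.crosstab(toy["ResidentContinent"], toy["MLExperienceYears"])
print(table)
```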

Again, can we check those 8 middle-aged data scientists from Asia who have spent over 20 years in Machine Learning? In which countries are they located? Probably most of them will be from China and Japan, both of which are advanced economies.

Checking the 8 respondents from Asia with 20+ years of ML experience.
My 3 guesses are: 
* China
* Japan
* India

In [None]:

ML20yearsAsia = ML20years[(ML20years.ResidentContinent == 'Asia')]

ML20yearsAsia.head()

In [None]:
#Oops!!!!!!!!, I was dead wrong with China however, India and Japan are in the list
ML20yearsAsia.groupby(["ResidentCountry"])[["MLExperienceYears"]].describe()

Why am I not feeling a sense of satisfaction? China is said to be industrializing very fast; maybe it would be included among those with 10-15 years of experience.

In [None]:
ML10years = MADS_19[(MADS_19.MLExperienceYears == '10-15 years')]
China = ML10years[(ML10years.ResidentCountry == 'China')]
China.info()

#Oops!!!!!!!!, just 1 entry from China. Where are my middle-aged Chinese data scientists? 
#Remember that there are 87 middle-aged people in Asia who have 10-15 years’ experience in ML


**Group of Most Industrialized Nations G8**

Can we look at the G8 countries, a group of eight of the most industrialized nations in the world?
In particular, we evaluate the years of coding and the years of experience with ML, to further show that the concentration of middle-aged data scientists has a strong connection with economic growth.

In [None]:
# Keep only respondents whose resident country is one of the G8 members
g8_countries = ['Russia', 'Japan', 'France', 'Italy', 'Germany',
                'United Kingdom of Great Britain and Northern Ireland',
                'United States of America', 'Canada']
G8MiddleAged = MADS_19[MADS_19.ResidentCountry.isin(g8_countries)]
G8MiddleAged.head(2)

In [None]:
G8MiddleAged.info()

In [None]:
# Checking the percentage of middle-aged data scientists in G8 countries with respect to years of coding

G8MiddleAged.groupby(["CodingYears"]).size().plot.pie(title= "G8 Middle-Aged Data Scientists by Frequency of Years of Coding",figsize=(10,10), autopct= '%1.1f%%')

The percentage of those who have spent 20+ years coding is higher for the G8 countries (pie chart above) than when all countries are put together (pie chart below). However, it is worth noting that in both cases the highest percentage belongs to those who have spent 20+ years coding.
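
The two percentages being compared can also be computed directly rather than read off the charts. This is a minimal sketch on invented rows; `toy`, `g8`, `g8_share` and `overall_share` are hypothetical names.

```python
import pandas as pd

# Invented rows (NOT the survey data)
toy = pd.DataFrame({
    "ResidentCountry": ["Japan", "Canada", "India", "Brazil", "France", "Nigeria"],
    "CodingYears": ["20+ years", "20+ years", "1-2 years",
                    "20+ years", "3-5 years", "< 1 years"],
})
g8 = ["Russia", "Japan", "France", "Italy", "Germany",
      "United Kingdom of Great Britain and Northern Ireland",
      "United States of America", "Canada"]

in_g8 = toy["ResidentCountry"].isin(g8)        # True for G8 residents
veteran = toy["CodingYears"].eq("20+ years")   # True for 20+ years of coding

# The mean of a boolean Series is the fraction of True values
g8_share = veteran[in_g8].mean() * 100
overall_share = veteran.mean() * 100
print(round(g8_share, 1), round(overall_share, 1))
```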

In [None]:
# Checking the percentage of middle-aged data scientists in all countries with respect to years of coding

MADS_19.groupby(["CodingYears"]).size().plot.pie(title= "Middle-Aged Data Scientists by Frequency of Years of Coding",figsize=(10,10), autopct= '%1.1f%%')


**Part Two**

My job in this section is very simple:

Evaluate the data science/Machine Learning choices made by middle-aged people with respect to media platforms, products, IDEs, notebooks, programming languages, data tools, etc.

Are their choices different from those of the general age groups?
Maybe I will be lucky enough to show that:

What a youthful data scientist can do, a middle-aged data scientist can do better with his or her experience.

If our choices are not different, it means we can compete within the same space, and there should not be age specifications when hiring data scientists.


                                                Part 2: Section 1

                                Data Platforms, Programming Languages and IDEs


When I became interested in Data Science, I was coming from an MS Excel background; I actually started with VBA. However, the online platform that I joined changed my entire perception of data forever. I regularly asked questions like: What IDE is best for data analysis? What programming language is best for data analysis? The media platform where you find yourself as a data analyst goes a long way to determine certain choices you make along the way.

                                            Data Science Courses Platforms

In [None]:
# Data Science Courses Platform --> DSCP
# Rename the columns for easy understanding
dataset_19.rename(columns={'Q1':'AgeYears','Q13_Part_1':'Udacity', 'Q13_Part_2':'Coursera', 'Q13_Part_3':'edX', 
                                    'Q13_Part_4':'DataCamp', 'Q13_Part_5': 'DataQuest', 
                                    'Q13_Part_6':'KaggleCourses','Q13_Part_7':'Fast_ai', 
                                    'Q13_Part_8':'Udemy', 'Q13_Part_9':'LinkedIn', 
                                    'Q13_Part_10':'UniversityCourses', 'Q13_Part_11':'None', 
                                    'Q13_Part_12':'Other'}, inplace=True)

#Create the Data Science Courses Platform for all age groups
DSCP1 = dataset_19[['AgeYears','Udacity', 'Coursera', 'edX','DataCamp', 'DataQuest', 'KaggleCourses','Fast_ai', 'Udemy', 
                     'LinkedIn','UniversityCourses','None', 'Other']]

#Drop the row containing survey questions
DSCP1=DSCP1.drop([0], axis =0)
DSCP1.head(3)

In [None]:
DSCP1.info()

In [None]:
#Drop the Age column to stack the course platforms together
DSCP=DSCP1.drop(['AgeYears'], axis =1)

# Stack all the platforms into 1 column == CoursesPlatforms
DSCP = pd.concat([DSCP, DSCP.T.stack().reset_index(name='CoursesPlatforms')['CoursesPlatforms']], axis=1)
DSCP.info()

In [None]:
#Creating the Middle Age Data Science Courses Platforms Dataset --> MADS_DSCP
#getDSCP1
MADS_DSCP = DSCP1[(DSCP1.AgeYears == '45-49') | (DSCP1.AgeYears == '50-54') | (DSCP1.AgeYears == '55-59')]
MADS_DSCP.head()

In [None]:
MADS_DSCP.info()

In [None]:
#Stack the Middle-aged Course Platform into one column
MADS_DSCP=MADS_DSCP.drop(['AgeYears'], axis =1)
MADS_DSCP = pd.concat([MADS_DSCP, MADS_DSCP.T.stack().reset_index(name='CoursesPlatforms')['CoursesPlatforms']], axis=1)
MADS_DSCP.info()

In [None]:
# All Age groups Data Science Courses Platforms choices 

DSCP["CoursesPlatforms"].value_counts().plot.barh(title="Data Science Courses Platform",figsize=(8,8))

In [None]:
# Middle age groups Data Science Courses Platforms choices 

MADS_DSCP["CoursesPlatforms"].value_counts().plot.barh(title="Data Science Courses Platform",figsize=(8,8))

The choices of Data Science course platforms are broadly the same, except that middle-aged data scientists prefer professional courses to university courses. University courses ranked 4th among all age groups but 6th in the middle-aged group.
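
The rank comparison can be computed rather than read off the two bar charts. The sketch below is an alternative to the stacking cells above, on invented answers; `toy`, `platforms` and `ranks` are hypothetical names, and each platform column is assumed to hold the platform name when ticked and a missing value otherwise.

```python
import pandas as pd

# Invented multi-select answers (NOT the survey data)
toy = pd.DataFrame({
    "AgeYears": ["22-24", "25-29", "30-34", "45-49", "50-54", "55-59"],
    "Coursera": ["Coursera"] * 6,
    "UniversityCourses": ["UniversityCourses"] * 4 + [None, None],
    "Udemy": [None, None, None, None, "Udemy", "Udemy"],
})
platforms = ["Coursera", "UniversityCourses", "Udemy"]
middle = toy["AgeYears"].isin(["45-49", "50-54", "55-59"])

# stack() gathers all platform columns into one long Series;
# value_counts() skips missing entries, so only ticked boxes are counted
all_counts = toy[platforms].stack().value_counts()
middle_counts = toy.loc[middle, platforms].stack().value_counts()

# Rank 1 is the most popular platform in each group
ranks = pd.DataFrame({
    "all_ages_rank": all_counts.rank(ascending=False, method="min"),
    "middle_aged_rank": middle_counts.rank(ascending=False, method="min"),
})
print(ranks)
```

A platform that nobody in a group ticked would show up as a missing rank, which is worth checking before comparing positions.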


Despite these figures, we should be concerned with the completion rate. The average completion rate for MOOCs hovers between 5 and 15 percent. The question that interests me is:

Which age group is more likely to complete an online data science course? I got some insights from the article below on how we should view MOOCs.

https://www.edsurge.com/news/2018-11-28-stop-asking-about-completion-rates-better-questions-to-ask-about-moocs-in-2019

                                        Data Science Media Sources

In [None]:
# Data Science Media Sources --> DSMS
# Rename the columns for easy understanding
dataset_19 = pd.read_csv("/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv",low_memory=False)
# Kaggle is option 4 of question 12, so the key must be 'Q12_Part_4' (not 'Q13_Part_4')
dataset_19.rename(columns={'Q1':'AgeYears','Q12_Part_1':'Twitter', 'Q12_Part_2':'Hacker News',
                                    'Q12_Part_3':'Reddit', 'Q12_Part_4':'Kaggle', 'Q12_Part_5': 'Course Forums', 
                                    'Q12_Part_6':'YouTube','Q12_Part_7':'Podcasts', 
                                    'Q12_Part_8':'Blogs', 'Q12_Part_9':'Journal Publications', 
                                    'Q12_Part_10':'Slack Communities', 'Q12_Part_11':'None', 
                                    'Q12_Part_12':'Other'}, inplace=True)

#Create the Data Science Media Sources for all age groups
DSMS1 = dataset_19[['AgeYears','Twitter','Hacker News','Reddit','Kaggle','Course Forums', 'YouTube','Podcasts', 'Blogs', 
                     'Journal Publications','Slack Communities','None', 'Other']]

#Drop the row containing survey questions
DSMS1=DSMS1.drop([0], axis =0)
DSMS1.head(3)

In [None]:
DSMS1.info()

In [None]:
#Drop the Age column to stack the Data Science Media Sources together
DSMS=DSMS1.drop(['AgeYears'], axis =1)

# Stack all the different media into 1 column for all age groups == MediaSources
DSMS = pd.concat([DSMS, DSMS.T.stack().reset_index(name='MediaSources')['MediaSources']], axis=1)
DSMS.info()

In [None]:
#Creating the Middle Age Data Science Media Sources Dataset --> MADS_DSMS

MADS_DSMS = DSMS1[(DSMS1.AgeYears == '45-49') | (DSMS1.AgeYears == '50-54') | (DSMS1.AgeYears == '55-59')]
MADS_DSMS.head(3)

In [None]:
MADS_DSMS.info()

In [None]:
MADS_DSMS=MADS_DSMS.drop(['AgeYears'], axis =1)

# Stack all the different media into 1 column for the middle age group == MediaSources
MADS_DSMS = pd.concat([MADS_DSMS, MADS_DSMS.T.stack().reset_index(name='MediaSources')['MediaSources']], axis=1)
MADS_DSMS.info()

In [None]:
# All Age Groups Data Science Media Sources

DSMS["MediaSources"].value_counts().plot.barh(title="Data Science Media Sources",figsize=(8,8))

In [None]:
# Middle Age Group Data Science Media Sources

MADS_DSMS["MediaSources"].value_counts().plot.barh(title="Data Science Media Sources",figsize=(8,8))

                                            Regular Programming Languages

In [None]:
# Regular Programming Languages--> PROG
dataset_19 = pd.read_csv("/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv",low_memory=False)
dataset_19.rename(columns={'Q1':'AgeYears','Q18_Part_1':'Python', 'Q18_Part_2':'R', 'Q18_Part_3':'SQL', 'Q18_Part_4':'C',
                                    'Q18_Part_5': 'C++', 'Q18_Part_6':'Java','Q18_Part_7':'JavaScript', 
                                    'Q18_Part_8':'Typescript', 'Q18_Part_9':'Bash','Q18_Part_10':'MATLAB', 
                                    'Q18_Part_11':'None', 'Q18_Part_12':'Other'}, inplace=True)

#Create the Regular Programming Languages for all age groups
PROG1 = dataset_19[['AgeYears','Python', 'R', 'SQL', 'C',
                                    'C++', 'Java','JavaScript', 
                                    'Typescript', 'Bash','MATLAB', 
                                    'None', 'Other']]

#Drop the row containing survey questions
PROG1=PROG1.drop([0], axis =0)
PROG1.head(3)

In [None]:
PROG1.info()

In [None]:
PROG=PROG1.drop(['AgeYears'], axis =1)
PROG = pd.concat([PROG, PROG.T.stack().reset_index(name='RegProgLang')['RegProgLang']], axis=1)
PROG.info()

In [None]:
#Creating the Middle Age Regular Programming Languages Dataset --> MADS_PROG

MADS_PROG = PROG1[(PROG1.AgeYears == '45-49') | (PROG1.AgeYears == '50-54') | (PROG1.AgeYears == '55-59')]
MADS_PROG.info()

In [None]:
MADS_PROG=MADS_PROG.drop(['AgeYears'], axis =1)
MADS_PROG = pd.concat([MADS_PROG, MADS_PROG.T.stack().reset_index(name='RegProgLang')['RegProgLang']], axis=1)
MADS_PROG.info()

In [None]:
# All Age Groups Regular Programming Languages

PROG["RegProgLang"].value_counts().plot.bar(title="Regular Programming Languages",figsize=(8,8))

In [None]:
# Middle Age Group Regular Programming Languages

MADS_PROG["RegProgLang"].value_counts().plot.bar(title="Regular Programming Languages",figsize=(8,8))

                                Regular Integrated Development Environments (IDE's)

In [None]:
# Regular IDEs --> IDE
# Rename the columns for easy understanding
dataset_19 = pd.read_csv("/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv",low_memory=False)
dataset_19.rename(columns={'Q1':'AgeYears','Q16_Part_1':'Jupyter', 'Q16_Part_2':'RStudio', 'Q16_Part_3':'PyCharm', 'Q16_Part_4':'Atom',
                                    'Q16_Part_5': 'MATLAB', 'Q16_Part_6':'Visual Studio Code','Q16_Part_7':'Spyder', 
                                    'Q16_Part_8':'Vim/Emacs', 'Q16_Part_9':'Notepad++','Q16_Part_10':'Sublime Text', 
                                    'Q16_Part_11':'None', 'Q16_Part_12':'Other'}, inplace=True)


#Create the Regular Integrated Development Environments (IDE's) for all age groups
IDE1 = dataset_19[['AgeYears','Jupyter', 'RStudio', 'PyCharm', 'Atom',
                                    'MATLAB', 'Visual Studio Code','Spyder', 
                                    'Vim/Emacs', 'Notepad++','Sublime Text', 
                                    'None', 'Other']]

#Drop the row containing survey questions
IDE1=IDE1.drop([0], axis =0)
IDE1.info()

In [None]:
IDE=IDE1.drop(['AgeYears'], axis =1)
IDE = pd.concat([IDE, IDE.T.stack().reset_index(name='RegularIDEs')['RegularIDEs']], axis=1)
IDE.info()

In [None]:
# Create the middle-age regular IDEs dataset --> MADS_IDE

MADS_IDE = IDE1[IDE1['AgeYears'].isin(['45-49', '50-54', '55-59'])]
MADS_IDE.head(2)

In [None]:
MADS_IDE.info()

In [None]:
MADS_IDE=MADS_IDE.drop(['AgeYears'], axis =1)
MADS_IDE = pd.concat([MADS_IDE, MADS_IDE.T.stack().reset_index(name='RegularIDEs')['RegularIDEs']], axis=1)
MADS_IDE.info()

In [None]:
IDE['RegularIDEs'].value_counts() 

In [None]:
MADS_IDE['RegularIDEs'].value_counts() 

In [None]:
# All Age Groups Regular Integrated Development Environments (IDEs)

IDE['RegularIDEs'].value_counts().plot.bar(title="Regular Integrated Development Environments (IDEs), All Age Groups", figsize=(8, 8))

In [None]:
# Middle Age Group Regular Integrated Development Environments (IDEs)

MADS_IDE['RegularIDEs'].value_counts().plot.bar(title="Regular Integrated Development Environments (IDEs), Middle Age Group", figsize=(8, 8))

**Regular Data Visualization Libraries or Tools**

In [None]:
# Regular Data Visualization Libraries or Tools --> dataV
# Reload the survey and rename the Q20 columns for easier reading
dataset_19 = pd.read_csv("/kaggle/input/kaggle-survey-2019/multiple_choice_responses.csv", low_memory=False)
# rename(..., inplace=True) returns None, so its result is not assigned
dataset_19.rename(columns={'Q1':'AgeYears','Q20_Part_1':'Ggplot/ggplot2', 'Q20_Part_2':'Matplotlib',
                           'Q20_Part_3':'Altair', 'Q20_Part_4':'Shiny', 'Q20_Part_5': 'D3.js',
                           'Q20_Part_6':'Plotly/Plotly Express','Q20_Part_7':'Bokeh',
                           'Q20_Part_8':'Seaborn', 'Q20_Part_9':'Geoplotlib',
                           'Q20_Part_10':'Leaflet/Folium', 'Q20_Part_11':'None',
                           'Q20_Part_12':'Other'}, inplace=True)


# Select the age column and the data visualization columns for all age groups
dataV1 = dataset_19[['AgeYears','Ggplot/ggplot2','Matplotlib',
                     'Altair', 'Shiny', 'D3.js',
                     'Plotly/Plotly Express','Bokeh',
                     'Seaborn', 'Geoplotlib',
                     'Leaflet/Folium', 'None',
                     'Other']]

#Drop the row containing survey questions
dataV1=dataV1.drop([0], axis =0)
dataV1.head(3)

In [None]:
dataV1.info()

In [None]:
dataV=dataV1.drop(['AgeYears'], axis =1)

In [None]:
dataV = pd.concat([dataV, dataV.T.stack().reset_index(name='DataVTools')['DataVTools']], axis=1)
dataV.info()

In [None]:
# Create the middle-age regular data visualization libraries or tools dataset --> MADS_dataV

MADS_dataV = dataV1[dataV1['AgeYears'].isin(['45-49', '50-54', '55-59'])]
MADS_dataV.info()

In [None]:
MADS_dataV=MADS_dataV.drop(['AgeYears'], axis =1)
MADS_dataV = pd.concat([MADS_dataV, MADS_dataV.T.stack().reset_index(name='DataVTools')['DataVTools']], axis=1)
MADS_dataV.info()

In [None]:
dataV["DataVTools"].value_counts() 

In [None]:
MADS_dataV['DataVTools'].value_counts() 

In [None]:
# All Age Groups Regular Data Visualization Libraries or Tools

dataV['DataVTools'].value_counts().plot.bar(title="Regular Data Visualization Libraries or Tools (All Age Groups)", figsize=(10, 10))


In [None]:
# Middle Age Group Regular Data Visualization Libraries or Tools

MADS_dataV['DataVTools'].value_counts().plot.bar(title="Regular Data Visualization Libraries or Tools (Middle Age Group)", figsize=(10, 10))


I know you are getting bored. Why? Because the learning curves, specializations, and tool choices of the middle-aged group are almost the same as those of all age groups combined.

I looked at all the choices on my local notebook, and the similarity cuts across all age groups. Therefore, I don’t think it is a good idea to judge data science specializations by age. The questions should be: What can you do? Where can we see your current and past work? Job adverts that set age limits on data science positions may be cutting the organization off from the wide spectrum of expertise out there.
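
One way to put a number on "almost the same" is to compare the share of each choice in the two groups rather than the raw counts, since the middle-aged group is much smaller. The sketch below does that on toy data. The helper name `share_comparison` and the sample values are assumptions for illustration only, not results from the survey.

```python
import pandas as pd

def share_comparison(all_choices, middle_choices):
    """Put each option's share in both groups side by side."""
    # normalize=True converts counts into proportions that sum to 1
    all_share = all_choices.value_counts(normalize=True)
    mid_share = middle_choices.value_counts(normalize=True)
    table = pd.concat([all_share, mid_share], axis=1,
                      keys=["all_ages", "middle_age"]).fillna(0.0)
    # A small absolute difference means the two groups choose alike
    table["abs_diff"] = (table["all_ages"] - table["middle_age"]).abs()
    return table.sort_values("all_ages", ascending=False)

# Toy selections; the values are invented for illustration
all_choices = pd.Series(["Python", "Python", "R", "SQL"])
middle_choices = pd.Series(["Python", "R"])

table = share_comparison(all_choices, middle_choices)
print(table)
```

In this notebook the same function could be called with, for example, `PROG["RegProgLang"]` and `MADS_PROG["RegProgLang"]`, because `value_counts` skips the missing entries in those columns.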

You don’t ask a painter how long he has been painting; you check his work. The beauty of his work overrides the years. The beauty of a singer’s voice matters more than other factors. Data science is like a work of art: it requires creativity, willingness, and a never-give-up attitude.

Thank God!!! I am not alone in this line of thinking:

https://www.elsevier.com/connect/why-data-science-is-an-art-and-how-to-support-the-people-who-do-it

But I have this voice inside me saying that I am not 100% correct. Why?

Data scientists require domain knowledge, which only comes with experience. You may tell us that a village is free from an epidemic because the data attest to that, but what if just one person out of, say, 2 million inhabitants has the Ebola virus in a congested and unhygienic environment where medical help is far away and expensive?

There is always the need for a balance.

Thank you

**Conclusions**

My submission is laden with numerous insights; however, two stand out for me. Using the patterns present in the dataset, I think I have been able to show the following, amongst others:

1. There is a strong link between the concentration of experienced middle-aged data scientists and economic growth and technological advancement, affirming Peter Drucker’s popular management statement: “If you can't measure it, you can't improve it.” For me, the insight into the G8 countries and the evaluation of each continent’s middle-aged data scientists vis-à-vis its present growth and technological advancement was memorable.
2. The data science learning curve and choices are similar for all age groups, according to the data we have. Therefore, we should not evaluate data science expertise with respect to age.

Finally, to my fellow middle-aged data scientists, I say: It is not over yet; we are in this for the long run.
