## Into the forest of Kaggle: a datadventure

My motivation for writing this notebook is to see what goes on in the large Kaggle community. First we will look at the data and a few samples, then form questions that we will try to answer in the rest of the notebook. 
Here are some resources I have used for the project:<br/>
(1) [50 different matplotlib plots](https://www.machinelearningplus.com/plots/top-50-matplotlib-visualizations-the-master-plots-python/)<br/>
(2) [Bar plot with matplotlib](https://www.tutorialspoint.com/matplotlib/matplotlib_bar_plot.htm)<br/>
(3) [Hexcode color palette for designs](https://digitalsynopsis.com/design/color-schemes-palettes/)<br/>

So far, this notebook answers three questions and explores a fourth, which is still in progress. <br/>
#### (1) What is the gender-wise distribution, and is it representative of the wider world?<br/>
Ans: Although Kaggle is heavily male dominated (79%), the representation of LGBTQ+ respondents is roughly in line with commonly cited statistics (0.4%). The relevant plots and calculations can be found inside the notebook [here](#section0).<br/>
#### (2) What is the country-wise distribution of Kagglers, and is each country proportionately represented on Kaggle?
Ans: While India, China and the United States contribute the largest numbers of Kagglers, both India and the United States are in the bottom 5 when the number of Kagglers is taken relative to each country's population. We plotted and analyzed this relative representation and noted the top 5 and bottom 5, using 2020 world population data. Details can be found inside the [notebook here](#section1).<br/>
#### (3) What are the educational backgrounds of Kagglers? Which programming languages does the Kaggle community prefer, and how does that relate to academic background?<br/>
Ans: This was a rather long analysis with multiple plots, so I will only mention the highlights here. <br/>
(1) 76% of Kagglers hold a master's or bachelor's degree, and another 12% hold a doctoral degree. This shows a clear skew of data science towards higher education, and the Kaggle statistics are similar to other survey data. <br/>
(2) There is a strong association between programming language choice and academic background. R, Julia, Bash and MATLAB are the preferred languages of doctorate holders.<br/>
(3) Python is mostly used by people with master's and bachelor's degrees. <br/>
(4) SQL is highly popular only among people with professional degrees.<br/>
(5) People with less formal education tend to use general software languages such as Swift, C++ and JavaScript rather than data-science-oriented languages such as R, Python and Julia.<br/>
(6) R and MATLAB are the two strongest single indicators of higher education, as usage of both rises sharply with the level of degree.<br/>
Check the details:<br/>
(a)[Educational qualification](#section2)<br/>
(b)[Coding language and educational background](#section3)<br/>
(c)[Education and coding final insights](#section4)<br/>

#### (4) What business roles do Kagglers hold? How are salaries spread across roles, and how do they vary across countries?<br/>
Ans: This question is still a work in progress. Kagglers hold various roles such as data scientist, research scientist, software engineer and others. Salaries generally increase as positions become more data oriented or managerial, but geographical location has a huge influence.<br/>
We checked salaries across India, the US and China, and noticed that India and China both pay these roles very little ($0-999 being the dominant band), while the US pays well across all roles in the data domain.<br/>
We will explore the data further to check whether the difference holds on a continent basis.<br/>
See the [business roles and compensation comparison](#section5) part for the detailed analysis.<br/>
We explored the roles and their pay in different regions: America, APAC, Africa and Europe. The key finding is that $0-999 is the dominant pay band in all regions other than America and Europe.<br/>
In America, people in data-intensive roles are paid around 100-150k dollars per annum, while in Europe the pay is around 30-60k dollars per annum.<br/>
In APAC and Africa, the dominant pay band other than $0-999 is 10-15k dollars per annum, which shows the underpayment trend much more clearly. Some may attribute this to the less developed economies of some APAC and African countries, but confirming that would need more analysis, which is outside the scope of this ML and DS survey.<br/>
Another interesting observation is that project managers are the highest paid role in both APAC and Africa, while in Europe and America data scientists are paid similarly or more. This raises the question of whether these two roles mean the same thing across different regions of the world. We will try to answer it by analyzing the two roles by coding, years of experience and a few other features.<br/>
To see how I arrived at these results, do check out the notebook! I hope to answer more questions as I dig deeper into the survey.<br/>

##### This is my first visualization project on Kaggle, so feel free to recommend changes and suggestions. If you liked something, let me know in the comments, and upvote if you appreciate the effort :).<br/>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
data = pd.read_csv('/kaggle/input/kaggle-survey-2020/kaggle_survey_2020_responses.csv')
print(data.shape)
data.head()

## <a id='section0'> Gender distribution</a>:<br/>
First of all, it is 2020. Let's see whether Kaggle has a balanced man/woman ratio as well as a fair LGBTQ+ share. Question Q2 asks: what is your gender?

In [None]:
from matplotlib import pyplot as plt
data_sort = data['Q2'].iloc[1:].value_counts()


In [None]:
data_sort

In [None]:
wp = { 'linewidth' : 1, 'edgecolor' : "green" } 
genders = ['Man','Woman','Undisclosed','self-describing','Non-binary']
colors = ['blue','pink','white','green','violet']
# Creating autopct arguments 
def func(pct, allvalues): 
    # Convert the wedge percentage back into a respondent count
    absolute = int(pct / 100.*np.sum(allvalues)) 
    return "{:.1f}%\n({:d})".format(pct, absolute) 

# Creating plot 
explode = (0.1, 0.0,0.5, 0.75,1) 
fig, ax = plt.subplots(figsize =(10, 10)) 
wedges, texts, autotexts = ax.pie(data_sort,  
                                  autopct = lambda pct: func(pct, data_sort), 
                                  explode = explode,  
                                  labels = genders, 
                                  shadow = True, 
                                  colors = colors, 
                                  startangle = 90, 
                                  wedgeprops = wp, 
                                  textprops = dict(color ="black")) 
  
# Adding legend 
ax.legend(wedges, genders, 
          title ="Gender diversity", 
          loc ="center left", 
          bbox_to_anchor =(1, 0, 0.5, 1)) 
  
plt.setp(autotexts, size = 8, weight ="bold") 
#ax.set_title("Gender diversity in kaggle community") 
  
# show plot 
plt.show() 

According to official statistics, the proportion of the UK population who define as non-binary when given a choice between male, female and another option is 0.4%, which is 1 in 250 people (Titman, 2014)[[1]](https://www.allabouttrans.org.uk/wp-content/uploads/2014/05/non-binary-gender-factsheet.pdf). With 0.3% of Kagglers being non-binary and another 0.3% self-describing, roughly 0.6% of Kagglers are from the LGBTQ+ community, which I take as a sound representation of that community on Kaggle.
That said, a male share of about 79% still shows how male-dominated a field like data science remains.
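
The shares quoted above can also be read directly off the data rather than from the pie labels. A minimal sketch, assuming a small made-up `responses` Series standing in for `data['Q2'].iloc[1:]` (the values are illustrative, not survey results):

```python
import pandas as pd

# Hypothetical stand-in for data['Q2'].iloc[1:]; illustrative values only.
responses = pd.Series(["Man"] * 8 + ["Woman"] * 2)

# normalize=True returns shares instead of raw counts.
gender_share_pct = (responses.value_counts(normalize=True) * 100).round(1)
print(gender_share_pct)
```

On the real column, the same call should give the percentage for every answer option in one step.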

## <a id = 'section1'>Country-wise diversity</a>:
Let's look at country-wise presence on Kaggle. We will compare each country's relative presence in the Kaggle community against its actual population.

In [None]:
data_country = data['Q3'][1:].value_counts()
print(data_country)

In [None]:
isinstance(data_country,pd.DataFrame)

In [None]:
countries = list(data_country.index)
print(countries)
kaggle_population = list(data_country.values)
print(kaggle_population)
len(countries)

Let's add the population by country.

In [None]:
country_pop = pd.read_csv('../input/population-by-country-2020/population_by_country_2020.csv')

In [None]:
print(country_pop.columns)
print(country_pop.shape)

We only need each country's population as of 2020 to compare with the Kaggle population and comment on representation. We also keep the urban population %, as that may relate more closely to the Kaggle population.

In [None]:
country_pop = country_pop[['Country (or dependency)', 'Population (2020)','Urban Pop %']]
country_pop = country_pop.rename(columns = {'Country (or dependency)':'country', 
                                            'Population (2020)':'population',
                                            'Urban Pop %': 'urban_pop_pct'})

Because these are two different datasets, country names may be written slightly differently in each. So we first check whether all the countries in the Kaggle data are present in the country_pop data; if not, we correct the names so that they match.

In [None]:
set_country = set(country_pop['country'])
kag_country = set(countries)
print(kag_country.difference(set_country))

In [None]:
set_country 
#Iran,
#republic of Korea is South Korea 
#United States of America: United States
#Viet Nam: Vietnam
#United kingdom of ....: United Kingdom

Now that we have noted the name differences, let's change the names in the kaggle list.

In [None]:
dict_country = {'Iran, Islamic Republic of...':'Iran',
                'Republic of Korea':'South Korea',
                'United States of America':'United States',
                'Viet Nam':'Vietnam',
                'United Kingdom of Great Britain and Northern Ireland':'United Kingdom'}
for key in dict_country.keys():
    countries[countries.index(key)] = dict_country[key]

Let's look at the changed countries list.

In [None]:
len(countries)

Let's drop the 'Other' countries, as we don't have a good population estimate for them.

In [None]:
kaggle_population.pop(countries.index('Other'))
countries.remove('Other')


In [None]:
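# Map counts by country name so each Kaggle count lands on the matching row;
# assigning the raw list would follow the population file's row order instead.
kaggle_counts = dict(zip(countries, kaggle_population))
country_pop = country_pop[country_pop.country.isin(countries)].copy()
country_pop['kaggle_population'] = country_pop['country'].map(kaggle_counts)
country_pop['kaggle_representation_ratio'] = country_pop['kaggle_population']/country_pop['population']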

In [None]:
kaggle_min = country_pop['kaggle_representation_ratio'].min()

In [None]:
country_pop['kaggle_representation_relative'] = country_pop['kaggle_representation_ratio']/kaggle_min

In [None]:
country_pop = country_pop.sort_values(by = 'kaggle_representation_relative',ascending = False)
country_pop
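
The ratio step depends on every Kaggle count sitting on the row of the matching country. A minimal sketch of making that alignment explicit with a join on the country name, using made-up toy inputs (the `kaggle_counts` and `population` names and all numbers here are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy inputs; the numbers are illustrative, not real data.
kaggle_counts = pd.Series({"A": 50, "B": 10, "C": 30}, name="kaggle_population")
population = pd.DataFrame({"country": ["C", "A", "B"],
                           "population": [3000, 1000, 10000]})

# Join on the country name so each count lands on the matching row,
# regardless of the row order in either table.
merged = population.merge(kaggle_counts.rename_axis("country").reset_index(),
                          on="country", how="inner")
merged["ratio"] = merged["kaggle_population"] / merged["population"]
# Scale so the least represented country has a relative value of 1.
merged["relative"] = merged["ratio"] / merged["ratio"].min()
merged = merged.sort_values("relative", ascending=False).reset_index(drop=True)
print(merged)
```

An inner join also drops any country missing from either table, which makes name mismatches easy to spot by comparing row counts.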

In [None]:
fig = plt.figure(figsize = (10,10))
ax = fig.add_axes([0,0,1,1])
countries = country_pop['country'].tolist()
relative_ratio = country_pop['kaggle_representation_relative'].tolist()
ax.bar(countries,relative_ratio)
plt.xticks(rotation = 'vertical')
plt.show()

So as we can see, the top 5 countries by relative presence are:<br/>
(1) Ireland<br/>
(2) Singapore<br/>
(3) Greece<br/>
(4) Switzerland<br/>
(5) Israel<br/>
And the bottom 5 relative presence countries are:<br/>
(1) India<br/>
(2) United States<br/>
(3) Brazil<br/>
(4) Nigeria<br/>
(5) Indonesia<br/>
Note that while India, China and the United States lead in absolute numbers, two of them are still in the bottom 5 by relative representation. So in the coming days we can expect many more Kagglers from these countries, given their populations. 

## <a id = 'section2'>Educational qualification</a>:
Let's now see how educational qualifications are distributed among Kagglers. Here's an [awesome kaggle notebook](https://www.kaggle.com/michau96/education-level-affects-data-analysis) that details educational qualification and its relation to different variables. In our notebook we will try to answer a few questions from our own point of view. <br/>
[This article by big-data-made-simple](https://bigdata-madesimple.com/data-scientist-profile-in-2019-education-and-skills-sets-of-1001-data-scientists/) is also a good primer on the educational qualifications of data scientists.<br/>

Let's first look at a pie chart of the different educational backgrounds.

In [None]:
ed_data = data['Q4'].iloc[1:].value_counts()
ed_data

In [None]:
wp = { 'linewidth' : 1, 'edgecolor' : "black" } 
Educations = ['Master’s degree',
              'Bachelor’s degree',
              'Doctoral degree',
              'Some college/university study without earning a bachelor’s degree',
              'Professional degree',
              'I prefer not to answer',
              'No formal education past high school']
colors = ['#f67e7d','#ffb997','#843b62','#86A3C3','#305f72','#c4c4c4','#bac600']
# Creating autopct arguments 
def func(pct, allvalues): 
    # Convert the wedge percentage back into a respondent count
    absolute = int(pct / 100.*np.sum(allvalues)) 
    return "{:.1f}%\n({:d})".format(pct, absolute) 

# Creating plot 
explode = (0,0,0.2,0.4,0.6,0.8,1) 
fig, ax = plt.subplots(figsize =(10, 10)) 
wedges, texts, autotexts = ax.pie(ed_data,  
                                  autopct = lambda pct: func(pct, ed_data), 
                                  explode = explode,  
                                  labels = Educations, 
                                  shadow = True, 
                                  colors = colors, 
                                  startangle = 90, 
                                  wedgeprops = wp, 
                                  textprops = dict(color ="black")) 
  
# Adding legend 
ax.legend(wedges, Educations, 
          title ="Educational distribution", 
          loc ="center left", 
          bbox_to_anchor =(1, 0, 0.5, 1)) 
  
plt.setp(autotexts, size = 8, weight ="bold")   
# show plot 
plt.show() 

So 76% of the whole population has either a bachelor's or a master's degree, and 12% has a doctorate. It will be more interesting to see how age is distributed across the degrees, as well as what proportion are students vs professionals.

In [None]:
data_shift = data.iloc[1:]
# Prepare data
x_var = 'Q4'
groupby_var = 'Q1'
df_agg = data_shift.loc[:, [x_var, groupby_var]].groupby(groupby_var)
vals = [df[x_var].values.tolist() for i, df in df_agg]

# Draw
plt.figure(figsize=(16,9), dpi= 80)
colors = [plt.cm.Spectral(i/float(len(vals)-1)) for i in range(len(vals))]
n, bins, patches = plt.hist(vals, len(data_shift[x_var].unique()), 
                            stacked=True, 
                            density=False, 
                            color=colors[:len(vals)])
plt.legend({group:col for group, col in zip(np.unique(data_shift[groupby_var]).tolist(), colors[:len(vals)])})
plt.title("Stacked Histogram of education colored by age", fontsize=22)
plt.xlabel(x_var)
plt.xticks(ticks=bins, labels=Educations+['other'], rotation=90, horizontalalignment='left')
plt.show()
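
The age-by-degree breakdown can also be tabulated, which makes exact counts easier to read than a stacked chart. A minimal sketch using `pd.crosstab` on a made-up `toy` frame standing in for the Q1 (age) and Q4 (degree) columns; the shortened degree labels are placeholders, not the survey's wording:

```python
import pandas as pd

# Hypothetical toy answers standing in for the Q1 (age) and Q4 (degree) columns.
toy = pd.DataFrame({
    "Q1": ["18-21", "18-21", "25-29", "25-29", "25-29"],
    "Q4": ["Bachelor", "Bachelor", "Master", "Master", "Doctoral"],
})

# Rows are degrees, columns are age groups, cells are respondent counts.
age_by_degree = pd.crosstab(toy["Q4"], toy["Q1"])
# normalize="index" turns each row into shares that sum to 1.
age_share = pd.crosstab(toy["Q4"], toy["Q1"], normalize="index")
print(age_by_degree)
```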

## <a id='section3'>Coding language and educational background</a>:
Let's also examine how coding language relates to educational background. My hypothesis is that self-taught people will lean towards Python or other community-backed languages, while those with years of academic training will choose R or other scientific, academically preferred languages. Let's see if I am correct.

In [None]:
q7_cols = []
for col in data.columns:
    if 'Q7_Part' in col:
        q7_cols.append(col)
print(q7_cols)

In [None]:
print(data.iloc[1,][q7_cols[0]])

first choice: python --> part 1<br/>
second choice: R --> part 2<br/>
third choice: SQL --> part 3<br/>
fourth choice: C --> part 4<br/>
fifth choice: C++ --> part 5<br/>
sixth choice: java --> part 6<br/>
seventh choice: javascript --> part 7<br/>
eighth choice: julia --> part 8<br/>
ninth choice: Swift --> part 9<br/>
tenth choice: bash --> part 10<br/>
eleventh choice: MATLAB --> part 11<br/>
twelfth choice: None --> part 12<br/>

In [None]:
data_program = data[q7_cols+['Q4']].iloc[1:,]
dict_change = {'Q7_Part_1':'python',
               'Q7_Part_2':'R',
               'Q7_Part_3':'SQL',
               'Q7_Part_4':'C',
               'Q7_Part_5':'C++',
               'Q7_Part_6':'java',
               'Q7_Part_7':'javascript',
               'Q7_Part_8':'julia',
               'Q7_Part_9':'swift',
               'Q7_Part_10':'bash',
               'Q7_Part_11':'MATLAB',
               'Q7_Part_12':'None'}  # part 12 is the 'None' answer option
data_program = data_program.rename(columns = dict_change)

In [None]:
data_program

In [None]:
data_program = data_program.fillna(0)

In [None]:
# Turn each language column into a 0/1 flag: 1 if the respondent selected it
def func(x):
    if x!=0: return 1
    return 0
for col in ['python','R','SQL','C','C++','java','javascript',
            'julia','swift','bash','MATLAB','None']:
    data_program[col] = data_program[col].apply(func)

In [None]:
data_program

In [None]:
educated_cols = {}
educations = ['Master’s degree',
              'Bachelor’s degree',
              'Doctoral degree',
              'Some college/university study without earning a bachelor’s degree',
              'Professional degree',
              'I prefer not to answer',
              'No formal education past high school']
languages = ['python','R','SQL','C','C++','java','javascript',
            'julia','swift','bash','MATLAB','None']
for education in educations:
    curr_dict = {}
    for language in languages:
        curr_dict[language] = data_program[(data_program[language]==1)
                                        & (data_program['Q4'] == education)].shape[0]
    educated_cols[education] = curr_dict
data_new = pd.DataFrame(educated_cols)

So after a bit of coding, we end up with the frequency table of programming language vs degree, stored in data_new.
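
The same kind of frequency table can be built without explicit loops. A minimal sketch, assuming a made-up `toy_program` frame shaped like `data_program` (0/1 language flags plus the degree column); the two languages and the short degree labels are placeholders:

```python
import pandas as pd

# Hypothetical toy version of data_program: 0/1 language flags plus the degree.
toy_program = pd.DataFrame({
    "python": [1, 1, 0, 1],
    "R":      [0, 1, 1, 0],
    "Q4":     ["Master", "Doctoral", "Doctoral", "Master"],
})

# Summing the 0/1 flags per degree counts users of each language;
# transposing puts languages on the rows and degrees on the columns.
freq_table = toy_program.groupby("Q4")[["python", "R"]].sum().T
print(freq_table)
```

Because the flags are 0 or 1, a grouped sum is equivalent to counting the rows where a flag equals 1 for that degree.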

In [None]:
data_new

In [None]:
import seaborn as sns
%matplotlib inline

sns.heatmap(data_new,annot = False)

related resource: (1) [show dataframe as heatmap](https://stackoverflow.com/questions/12286607/making-heatmap-from-pandas-dataframe)

In [None]:
data_new.style.background_gradient(cmap = 'Blues')

Let's check the fractional distribution of each degree across different languages

In [None]:
data_program_fraction = data_new/data_new.sum()

In [None]:
data_program_fraction

But a more important visualization is the breakdown of languages across degrees, normalized by the relative frequency of each degree.

In [None]:
data_degree_fraction = data_new.T/data_new.T.sum()

In [None]:
data_degree_fraction = (data_degree_fraction.T/[0.402,0.357,0.118,0.056,0.036,0.02,0.012]).T
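
The divisors in the cell above are the overall share of each degree, typed in by hand. A hedged sketch of deriving such shares from counts and applying the same normalization, on made-up numbers (`toy_ed`, `toy_counts` and the `lift` name are assumptions for illustration, and here languages sit on the rows):

```python
import pandas as pd

# Hypothetical degree counts standing in for ed_data; illustrative values only.
toy_ed = pd.Series({"Master": 40, "Bachelor": 35, "Doctoral": 25})
degree_share = toy_ed / toy_ed.sum()

# Toy language-by-degree counts (rows: languages, columns: degrees).
toy_counts = pd.DataFrame({"Master": [30, 10], "Bachelor": [30, 5],
                           "Doctoral": [15, 10]}, index=["python", "R"])

# Share of each language's users holding each degree (rows sum to 1) ...
by_language = toy_counts.div(toy_counts.sum(axis=1), axis=0)
# ... divided by the overall share of that degree. Values above 1 mean the
# degree is over-represented among that language's users.
lift = by_language.div(degree_share, axis=1)
print(lift.round(2))
```

Dividing by a labelled Series aligns on the degree names, so the result does not depend on the column order matching a hand-written list.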

In [None]:
data_degree_fraction

In [None]:
sns.heatmap(data_degree_fraction,annot = False)

In [None]:
data_degree_fraction.style.background_gradient(cmap = 'Reds')

### <a id='section4'> Final observations from language vs degree plot</a>:
Now that we have normalized the degree-wise frequency of languages by the relative frequency of the degrees, we can see properly which degrees relate more to which languages.<br/>
Important observations here are:<br/>
(1) Python is more popular among people with master's and bachelor's degrees.<br/>
(2) MATLAB, Bash, Julia and R are the top choices among doctoral degree holders. <br/>
(3) R is less popular with everyone other than people with doctoral and master's degrees.<br/>
(4) People with less formal education, i.e. people with some college study but no bachelor's degree and people with no formal education past high school, tend to work with non-scientific languages such as JavaScript and Swift, which are more web and app development languages than data science ones.<br/> 
(5) People with professional and master's degrees are among the heavy users of SQL. SQL is surprisingly not that popular among the other categories. <br/>
(6) R and MATLAB are a clear mark of higher education. Their use increases sharply as we move from high school through college, bachelor's and master's to doctoral degrees. So we can say that these are strong signals for predicting educational background.<br/>

## <a id='section5'> Business role and Compensation comparison</a>:

In [None]:
roles = list(data['Q5'].unique())
compensations = list(data['Q24'].unique())
print(roles)
print(compensations)

In [None]:
data_el = data.iloc[1:,]
role_cols = {}
roles = list(data_el['Q5'].unique())
compensations = list(data_el['Q24'].unique())
print(roles)
print(compensations)
for role in roles:
    curr_dict = {}
    for compensation in compensations:
        curr_dict[compensation] = data_el[(data_el['Q5']== role)
                                           & (data_el['Q24'] == compensation)].shape[0]
    role_cols[role] = curr_dict
data_new = pd.DataFrame(role_cols)

In [None]:
cols = list(data_new.columns)
cols.remove(np.nan)
cols.remove('Student')
cols.remove('Currently not employed')

In [None]:
data_new = data_new[cols]

In [None]:
data_new = data_new.drop(np.nan,axis = 0)

In [None]:
data_new.style.background_gradient(cmap = 'Reds')

Clearly the pay bands differ widely, which skews the current observation. My hypothesis is that pay differs starkly between countries, so the large population in the $0-999 band is actually coming from particular countries.<br/>


In [None]:
def country_wise_breaking(country):
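    # Drop the question-text row first, then filter with a mask built on the same rows
    data_el = data.iloc[1:,]
    data_el = data_el[data_el['Q3'] == country]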
    role_cols = {}
    roles = list(data_el['Q5'].unique())
    compensations = list(data_el['Q24'].unique())
    #print(roles)
    #print(compensations)
    for role in roles:
        curr_dict = {}
        for compensation in compensations:
            curr_dict[compensation] = data_el[(data_el['Q5']== role)
                                           & (data_el['Q24'] == compensation)].shape[0]
        role_cols[role] = curr_dict
    data_new = pd.DataFrame(role_cols)
    cols = list(data_new.columns)
    cols.remove(np.nan)
    cols.remove('Student')
    cols.remove('Currently not employed')
    data_new = data_new[cols]
    data_new = data_new.drop(np.nan,axis = 0)
    return data_new
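
A role-by-compensation count table of this kind can also be sketched with `pd.crosstab`. The example below runs on made-up rows, and `role_by_pay` and its `skip_roles` parameter are hypothetical names introduced for illustration:

```python
import pandas as pd

# Hypothetical toy rows standing in for the Q3 (country), Q5 (role) and
# Q24 (compensation) columns; values are illustrative only.
toy = pd.DataFrame({
    "Q3":  ["India", "India", "India", "China"],
    "Q5":  ["Data Scientist", "Student", "Data Scientist", "Data Scientist"],
    "Q24": ["$0-999", None, "1,000-1,999", "$0-999"],
})

def role_by_pay(df, country, skip_roles=("Student", "Currently not employed")):
    """Count respondents per (compensation, role) pair for one country."""
    subset = df[(df["Q3"] == country) & ~df["Q5"].isin(skip_roles)]
    # crosstab drops rows where either value is missing.
    return pd.crosstab(subset["Q24"], subset["Q5"])

india = role_by_pay(toy, "India")
print(india)
```

Missing roles and missing salaries are dropped by `crosstab` itself, so no separate NaN row or column needs removing afterwards.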

In [None]:
data_new = country_wise_breaking('India')
print("these many people have told salary:",data_new.sum().sum(),
      "out of",data['Q3'].tolist().count('India'),"people")
data_new.style.background_gradient(cmap = 'Reds')

In [None]:
data_new = country_wise_breaking('United States of America')
print("these many people have told salary:",data_new.sum().sum(),
      "out of",data['Q3'].tolist().count('United States of America'),"people")
data_new.style.background_gradient(cmap = 'Reds')

In [None]:
data_new = country_wise_breaking('China')
print("these many people have told salary:",data_new.sum().sum(),
      "out of",data['Q3'].tolist().count('China'),"people")
data_new.style.background_gradient(cmap = 'Reds')

So among the top 3 countries by number of Kagglers, India and China both pay their employees far less, with a significant share of them in the 0-999 dollar band. There is also a stark salary gap between the Indian data science community and project managers, with managers being highly paid.<br/>
China follows the same trend, and Chinese respondents are notably reluctant to share salaries, with only 166 (35 percent) people reporting theirs. The higher pay of project managers is observed here too.<br/>
In the United States, however, pay is much higher, with most Kagglers falling in the 100,000-124,999 and 150,000-199,999 dollar bands. There is no stark difference between the pay of project managers, data scientists and a few other posts. But there is a difference between roles with less and more data science orientation, with pay rising from the former to the latter.<br/>
Let's now break the countries into different continental regions and check the differences.<br/>

In [None]:
data.iloc[1:,]['Q3'].unique()

In [None]:
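# Rough regional buckets. Note that the 'Africa' list also holds Middle Eastern
# countries (Saudi Arabia, United Arab Emirates, Iran, Israel), and Tunisia
# is grouped under 'europe' here.
europe = ['Germany','Switzerland','Russia','Netherlands','Belarus','Tunisia',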
          'Ukraine','Belgium','Italy','Spain','France','Ireland',
          'United Kingdom of Great Britain and Northern Ireland',
          'Romania','Greece','Portugal','Sweden','Poland','Turkey']
APAC = ['China','India','Indonesia','Australia','Malaysia','Bangladesh',
        'Philippines','Pakistan','Nepal','South Korea','Taiwan','Japan',
        'Singapore','Sri Lanka','Thailand','Republic of Korea']
Africa = ['South Africa','Saudi Arabia','Nigeria','United Arab Emirates','Morocco',
          'Iran, Islamic Republic of...','Kenya','Israel',]
America = ['Colombia','United States of America','Argentina','Canada','Peru','Chile']

In [None]:
def continent_wise_breaking(country_list):
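    # Drop the question-text row first, then filter with a mask built on the same rows
    data_el = data.iloc[1:,]
    data_el = data_el[data_el['Q3'].isin(country_list)]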
    role_cols = {}
    roles = list(data_el['Q5'].unique())
    compensations = list(data_el['Q24'].unique())
    #print(roles)
    #print(compensations)
    for role in roles:
        curr_dict = {}
        for compensation in compensations:
            curr_dict[compensation] = data_el[(data_el['Q5']== role)
                                           & (data_el['Q24'] == compensation)].shape[0]
        role_cols[role] = curr_dict
    data_new = pd.DataFrame(role_cols)
    cols = list(data_new.columns)
    cols.remove(np.nan)
    cols.remove('Student')
    cols.remove('Currently not employed')
    data_new = data_new[cols]
    data_new = data_new.drop(np.nan,axis = 0)
    return data_new

In [None]:
datamerica = continent_wise_breaking(America)
datamerica.style.background_gradient(cmap = 'Reds')

In [None]:
daturope = continent_wise_breaking(europe)
daturope.style.background_gradient(cmap = 'Reds')

In [None]:
datAPAC = continent_wise_breaking(APAC)
datAPAC.style.background_gradient(cmap = 'Reds')

In [None]:
datfrica = continent_wise_breaking(Africa)
datfrica.style.background_gradient(cmap = 'Reds')
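
The tables above list the pay bands in order of first appearance, which makes trends harder to read. A minimal sketch of sorting band labels by their lower bound; `bucket_lower_bound` is a hypothetical helper and the labels are written in the style of the Q24 answers:

```python
import re

def bucket_lower_bound(label):
    """Return the first number in a salary label such as '10,000-14,999'."""
    match = re.search(r"\d[\d,]*", label)
    return int(match.group().replace(",", ""))

# Hypothetical labels in the style of the Q24 answers.
labels = ["10,000-14,999", "$0-999", "> $500,000", "1,000-1,999"]
ordered = sorted(labels, key=bucket_lower_bound)
print(ordered)
```

Applied to a table, something like `datAPAC.reindex(sorted(datAPAC.index, key=bucket_lower_bound))` should put the rows in ascending pay order, assuming every index label contains a number.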

## Continent-wise result analysis:
As we can see, the $0-999 salary band is quite frequent in all continents other than the American countries.<br/>
In Europe, data scientists are the better paid, while product or project managers are paid quite low, in stark contrast to the APAC and African regions. This suggests that these two posts may mean different things in different regions; i.e. in Europe, data scientist may be the more senior role altogether, while in other regions product manager is the more senior role.<br/>
There is also a central difference between the regions in where the high pay frequencies sit: around 30-50k in Europe, 100-150k in America, and 10-15k in APAC and Africa, excluding the 0-999 dollar portion.<br/>
With the detailed breakdown and differences now clear, we will next inspect a further intriguing question: what do the data scientist and project manager roles mean in these different regions?<br/>