Each year, the Kaggle Machine Learning and Data Science Survey brings to the surface important information collected from thousands of Kagglers around the world. This data helps Kaggle identify important trends in the data science and machine learning world so that it can strategize and adapt accordingly. The data is just as valuable to Kagglers themselves, as it lets them understand key questions such as what it takes to earn more or which technologies are trending. In other words, the Kaggle survey tells the story of thousands of individuals contributing to the machine learning and data science landscape.

Humans have always taken lessons from the stories of their forefathers and peers. Here we take a leaf out of this age-old practice and develop a case study in which an entrepreneur studies the patterns in thousands of individual Kaggle stories to build a new story of her own or, in other words, a business strategy for her own business.



# The Story Begins..

Sarah is an aspiring entrepreneur in the tech world. Her modest business provides data science solutions to organizations across continents. Sarah has just come across a study published by International Data Corporation (IDC), a premier global provider of market intelligence for the technology markets. From this study, Sarah learned that worldwide spending on big data and business analytics (BDA) solutions is forecast to reach $215.7 billion in 2021, an increase of 10.1% over 2020.

Further, the compound annual growth rate (CAGR) for global BDA spending over the 2021-2025 forecast period is expected to be 12.8%. (https://www.idc.com/getdoc.jsp?containerId=prUS48165721) 

Based on the available market studies, she has been looking to scale up her business to capture a fair share of this market. However, her data-based pitch has been turned down by investors. Venture capitalists and financiers have asked Sarah to bring a business strategy for her SME (small and medium-sized enterprise) that is backed by market insights. The investors are of the view that this market growth will mainly benefit large firms such as Salesforce and Amazon, which can hire rare talent and deploy expensive infrastructure for heavy computation. There was also little comprehensive market intelligence available about the performance and strategies of SMEs in the big data and analytics arena.

As an SME entrepreneur, Sarah couldn’t turn to the large market intelligence firms for such data because of the high cost. She found plenty of information about the impact of Covid-19 and the growing adoption of data science among SMEs, but it was too broad and generic for her: it covered every sort of SME business, while she was looking specifically for those that work as service providers for big data and analytics.

Sarah has been an avid contributor at Kaggle since the early days of her data science career. She turned to the Kaggle community for help once again and found the Kaggle data science surveys. With a bird’s-eye view, she sensed the treasure hidden in just one question that could make her whole case: “Size of the company that you work in”.

It was a light-bulb moment: all she had to do was bring her data engineering and analytics skills to bear and extract insights from all over the world. Thousands of people from different backgrounds and geographies have filled in these surveys over the years. Properly utilized, this data can not only show investors how SMEs are trending but also help her build her future strategy by interpreting the raw data.

Kaggle being the largest community of data scientists and professionals, she treated this data set as the single source of truth on which to build her case.

In [None]:
%pip install plotly

In [None]:
# importing libraries
import pandas as pd
import numpy as np
import seaborn as sns
#import chart_studio.plotly as py
import plotly.express as px
import plotly.graph_objects as go
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
#Loading data 
filepath_2021 = "../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv"
data_2021 = pd.read_csv(filepath_2021)
filepath_2020 = "../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv"
data_2020 = pd.read_csv(filepath_2020)
filepath_2019 = "../input/kaggle-survey-2019/multiple_choice_responses.csv"
data_2019 = pd.read_csv(filepath_2019)


In [None]:
# Row 0 of each file holds the question text, not a response, so drop it
df_2021 = data_2021.drop(0)
df_2020 = data_2020.drop(0)
df_2019 = data_2019.drop(0)


In [None]:
# Keep only respondents who answered the company-size question for their year
df_2021.dropna(axis=0 , subset=['Q21'], inplace=True)
df_2020.dropna(axis=0 , subset=['Q20'], inplace=True)
df_2019.dropna(axis=0 , subset=['Q6'], inplace=True)


# The SME Case in the World of Data Science..

Sarah first needs to find out what share of the global data science community works in SMEs. To her delight, the data shows that SMEs make up a significant part of the global data science landscape: 46.9% of respondents in 2021, 51.6% in 2020 and 45.3% in 2019.

The significant increase in the SME share in 2020 illustrates the resilience of the SME sector: 2020 was brutally hit by the Covid-19 pandemic, and the share of large firms shrank by about 6 percentage points compared with 2019.

With Kaggle as the source of truth, the data paints a healthy picture of the SME sector’s significant contribution as well as of the growth in the number of SMEs.
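
The percentages above are simple ratios: respondents in the two smallest company-size buckets divided by everyone who answered the size question. The following is a minimal sketch of that arithmetic, not the notebook's own cell. The helper `sme_share` and the toy answers are made up for illustration; only the three yearly percentages come from the text.

```python
# Company-size answers that this notebook treats as SMEs
SME_SIZES = {"0-49 employees", "50-249 employees"}

def sme_share(answers):
    """Percentage of non-missing answers that fall in the SME buckets."""
    answered = [a for a in answers if a is not None]
    sme = [a for a in answered if a in SME_SIZES]
    return 100 * len(sme) / len(answered)

# Toy data (made up): 2 SME answers out of 4 non-missing answers
toy = ["0-49 employees", "1000-9,999 employees", None,
       "50-249 employees", "10,000 or more employees"]
print(sme_share(toy))  # 50.0

# SME shares quoted in the text, in percent
sme_pct = {2019: 45.3, 2020: 51.6, 2021: 46.9}

# Everyone who is not in an SME counts as a large firm
large_pct = {year: round(100 - pct, 1) for year, pct in sme_pct.items()}
change_2019_2020 = round(large_pct[2020] - large_pct[2019], 1)
print(large_pct)         # {2019: 54.7, 2020: 48.4, 2021: 53.1}
print(change_2019_2020)  # -6.3
```

The last line reproduces the drop of roughly 6 percentage points in the large-firm share between 2019 and 2020.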


In [None]:
# Respondents who answered the company-size question, per survey year
# (dropna() returns a new frame, which avoids pandas' SettingWithCopyWarning)
CS21 = df_2021[['Q21']].dropna()
CS20 = df_2020[['Q20']].dropna()
CS19 = df_2019[['Q6']].dropna()

# SME respondents: the two smallest company-size buckets
sme_sizes = ['0-49 employees','50-249 employees']
SME_21 = df_2021.loc[df_2021['Q21'].isin(sme_sizes), ['Q21']]
SME_20 = df_2020.loc[df_2020['Q20'].isin(sme_sizes), ['Q20']]
SME_19 = df_2019.loc[df_2019['Q6'].isin(sme_sizes), ['Q6']]

# SME share of respondents, in percent
SME_21_pp = (len(SME_21)/len(CS21))*100
SME_20_pp = (len(SME_20)/len(CS20))*100
SME_19_pp = (len(SME_19)/len(CS19))*100

In [None]:
x = pd.DataFrame([['2019', SME_19_pp],['2020',SME_20_pp],['2021',SME_21_pp]], columns=["Year", "SME_proportion"])

In [None]:
fig = go.Figure(data=[
        go.Bar(x=x['Year'], y=x['SME_proportion'] , marker_color='rgb(120,150,170)', marker_line_color='rgb(4,40,98)'),
       
    ])
fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',
                  marker_line_width=1, opacity=0.9,name='',showlegend= False)
fig.update_layout(
    title={
        'text': 'SME proportion in Survey data over time',
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},xaxis_tickfont_size=10,yaxis_tickfont_size=10)
fig.update_yaxes(title='Proportion', visible=True, showticklabels=True)
fig.update_xaxes(title='Year', visible=True, showticklabels=True)
fig.show()


# The Quest for Target Customers..

Now that the performance and resilience of the SME sector are backed by data, Sarah needs to find a niche, since targeting the whole market cannot be a good strategy.
To do so, she categorizes the data into two segments:
1.	Large Enterprises (250 or more employees):
The Kaggle survey doesn’t say which of these firms are data science service providers and which are end users. Since Sarah intends to find a niche, she treats all large firms in that niche as target prospects: her company can offer solutions to large end users or provide services to large service firms as a sub-contractor.
2.	Small and Medium Enterprises (fewer than 250 employees):
Sarah treats these enterprises as competition working in the same arena and striving to secure their share of the market. She needs to devise her strategy well to make her mark in this competitive landscape.
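
The two segments above boil down to one rule on the company-size answer. The following is a minimal sketch of that rule on toy data. The helper `add_segment`, the column name `Segment` and the sample rows are made up for illustration and are not part of the notebook.

```python
import pandas as pd

# Company-size answers treated as SMEs (fewer than 250 employees)
SME_SIZES = ["0-49 employees", "50-249 employees"]

def add_segment(df, size_col):
    """Return a copy of df with a 'Segment' column derived from size_col.

    Assumes rows with a missing size answer were dropped beforehand;
    otherwise they would be labelled as large enterprises.
    """
    out = df.copy()
    is_sme = out[size_col].isin(SME_SIZES)
    out["Segment"] = is_sme.map({True: "SME", False: "Large enterprise"})
    return out

# Toy responses (made up)
toy = pd.DataFrame(
    {"Q21": ["0-49 employees", "250-999 employees", "50-249 employees"]}
)
segmented = add_segment(toy, "Q21")
print(segmented["Segment"].tolist())  # ['SME', 'Large enterprise', 'SME']
```

Working on a copy keeps the original frame untouched, which makes the rule easy to test on small samples before applying it to the full survey.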


In [None]:
# Label each respondent by company size: SMEs have fewer than 250 employees
df_2021['Category'] = 'Big Firms category'
df_2021.loc[df_2021['Q21'].isin(['0-49 employees','50-249 employees']), 'Category'] = 'SMEs category'

# value_counts() sorts by frequency, so take the labels from its index
# instead of hard-coding them; this keeps every slice correctly named
category_counts = df_2021['Category'].value_counts()
fig = go.Figure(data=[go.Pie(labels=category_counts.index, values=category_counts.values)])
colors = ['rgb(120,150,170)', 'rgb(158, 202, 225)']
fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=12, 
                  marker=dict(colors=colors))
fig.update_layout(
    title={
        'text': 'Distribution of Large Firms and SMEs in 2021',
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()

In [None]:
df_SME_2021 = df_2021[df_2021['Category'] =='SMEs category']
#df_SME_2021.shape

In [None]:
df_BFirm_2021 = df_2021[df_2021['Category'] =='Big Firms category']
#df_BFirm_2021.shape

In [None]:
#Function for getting dataframe based on required feature for single response questions
def distribution(df,colname,feat):
    dist={}
    for feature in df[feat].value_counts().index:
        dist[feature] = df[df[feat]==feature][colname].value_counts()
    return pd.DataFrame(dist).T

In order to find her target niche, she analyzed the large firms by industry sector. This gave her a picture of the sectors in which large enterprises are most active, i.e. Computer Technology, Education, Finance, Manufacturing, Public Services, Pharmaceuticals, Energy, Insurance and others.
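
To make the shape of the sector table concrete, here is a small sketch that runs the same logic as the `distribution` helper on toy data. The five sample rows are made up; the function body is copied only so that the example runs on its own.

```python
import pandas as pd

def distribution(df, colname, feat):
    # One row per value of `feat`, one column per value of `colname`;
    # each cell holds the number of matching respondents.
    dist = {}
    for feature in df[feat].value_counts().index:
        dist[feature] = df[df[feat] == feature][colname].value_counts()
    return pd.DataFrame(dist).T

# Toy responses (made up): industry answer and firm-size category
toy = pd.DataFrame({
    "Q20": ["Finance", "Finance", "Education", "Finance", "Education"],
    "Category": ["Big Firms category"] * 5,
})

table = distribution(toy, "Category", "Q20")
print(table)
```

Rows come out ordered from the most to the least frequent value of `feat`, which is why taking `.head(10)` of such a table yields a top-ten list.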

In [None]:
Sector_BFirm = distribution(df_BFirm_2021,'Category','Q20')
fig = px.bar(Sector_BFirm)
fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',
                  marker_line_width=1, opacity=0.9,name='',showlegend= False)
fig.update_layout(
    title={
        'text': 'Large Firms operating in Industry Sector',
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},xaxis_tickfont_size=10,yaxis_tickfont_size=10)
fig.update_yaxes(title='', visible=True, showticklabels=True)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()
#Sector_BFirm.plot(kind="bar")

Sarah adopted three sectors as her target niches, i.e. Finance, Education and Manufacturing, which match the expertise of her team as well as its willingness to learn.

In [None]:
BFirm_sector = df_BFirm_2021[df_BFirm_2021['Q20'].isin(['Academics/Education','Accounting/Finance','Manufacturing/Fabrication'])]
#BFirm_sector.shape

# Winning the territories..

Sarah understands that her marketing and client acquisition efforts must be focused, and to achieve this she needs to prioritize territories within her niches.

For this purpose, she analyzed in which countries she could find a large clientele and market for her niches.


In [None]:
Country_BF = distribution(BFirm_sector, 'Category','Q3')
Country_BF.drop(index=['Other'],inplace=True)
Country_BF = Country_BF.head(10)
Country_BF.rename(index={'United States of America':'USA','United Kingdom of Great Britain and Northern Ireland':'UK'},inplace=True)


In [None]:
fig = px.bar(Country_BF)
fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',
                  marker_line_width=1, opacity=0.9,name='',showlegend= False)
fig.update_layout(
    title={
        'text': 'Large Firms in Top Countries for selected sectors',
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},xaxis_tickfont_size=10,yaxis_tickfont_size=10)
fig.update_yaxes(title='', visible=True, showticklabels=True)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()

India, USA, Japan, Brazil and European countries make up most of the market. Sarah decides to target India, USA and Brazil specifically.

India, being the largest slice of the pie, is a natural choice. However, she expects that many of the large firms operating in India are themselves from the service sector, so her firm could provide specialized sub-contractor services to them.
Moreover, she decides to adopt USA and Brazil from the American time zones. The USA is the land of opportunity, and Brazil, a growing economy, qualifies because its time zones are similar to those of the USA. This way her marketing team can concentrate its efforts on one group of time zones.

Japan, Russia and the European countries also look like attractive markets. However, Japan and Russia sit in different time zones and so don’t qualify for the first focus, while the European countries can be the next focus after conquering the American markets.


# The Talent Hunt..

Any organization is only as good as its team. Sarah knows this very well, so she is determined to build a global team that brings in cultural diversity. To look for talent, she again dived into the Kaggle data for her niche, but this time in the SME category, as she understands that her potential recruits will be found there.

In [None]:
SME_sector = df_SME_2021[df_SME_2021['Q20'].isin(['Academics/Education','Accounting/Finance','Manufacturing/Fabrication'])]

In [None]:
Country_SME = distribution(SME_sector, 'Category','Q3')
Country_SME.drop(index=['Other'],inplace=True)
Country_SME = Country_SME.head(10)
Country_SME.rename(index={'United States of America':'USA','United Kingdom of Great Britain and Northern Ireland':'UK'},inplace=True)

In [None]:
fig = px.bar(Country_SME)
fig.update_traces(marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)',
                  marker_line_width=1, opacity=0.9,name='',showlegend= False)
fig.update_layout(
    title={
        'text': 'SMEs in Top Countries for selected sectors',
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},xaxis_tickfont_size=10,yaxis_tickfont_size=10)
fig.update_yaxes(title='', visible=True, showticklabels=True)
fig.update_xaxes(title='', visible=True, showticklabels=True)
fig.show()

In this first analysis, India again comes up as the first choice. India seems an attractive option, as she also intends to target Indian firms, as described in the earlier section.

Moreover, she finds other emerging markets in the list that can offer her capable resources within a competitive budget, such as Pakistan and Bangladesh. As English is widely used in these countries owing to their colonial past, it should be easy to integrate them into the larger team.


In [None]:
#Function for making comparison graphs between big firms and SMEs 
def figure(x1,x2,text):  
    fig = go.Figure(data=[
        go.Bar(name='Big Firms', x=x1.index, y=x1['Big Firms'] , marker_color='rgb(120,150,170)', marker_line_color='rgb(4,40,98)'),
        go.Bar(name='SMEs', x=x2.index, y=x2['SMEs'], marker_color='rgb(158,202,225)', marker_line_color='rgb(8,48,107)')    
    ])
    fig.update_layout(
       title={
          'text': text,
         'y':0.95,
          'x':0.5,
          'xanchor': 'center',
         'yanchor': 'top'},xaxis_tickfont_size=10,yaxis_tickfont_size=10)
    fig.show()

# Human Resource..

Interestingly, the trend shows that SMEs are the breeding ground for young and energetic data scientists and engineers. The younger age group, i.e. 18-29 years, is mostly employed by SMEs, while the more experienced professionals are employed by large firms. This suggests that the vibrant and agile nature of work in SMEs is better suited to the younger and more energetic lot. Fortunately, this younger lot will also be more cost-effective than the experienced one. Sarah therefore understands that her team will consist mostly of young engineers, with experienced ones in the lead positions.
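
Because the two groups differ in size, raw counts can hide differences in composition. One way to check a claim like the one above is to convert each group's counts to within-group percentages before comparing them. The following is a minimal sketch with invented age bands and counts, not survey results.

```python
import pandas as pd

# Made-up respondent counts per age band for each group
counts = pd.DataFrame(
    {"Big Firms": [100, 300, 600], "SMEs": [200, 200, 100]},
    index=["18-29", "30-44", "45+"],
)

# Divide each column by its own total so that every column sums to 100
shares = counts.div(counts.sum(axis=0), axis=1) * 100
print(shares.round(1))
```

In this toy table the SME column has fewer respondents overall, yet its 18-29 band accounts for 40% of the group against 10% for big firms, a difference that only becomes visible once the counts are normalized.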


In [None]:
#What is your age (# years)?
Age_SME = distribution(SME_sector, 'Category','Q1')
Age_SME.rename(columns={"SMEs category":"SMEs"},inplace=True)
Age_BFirm = distribution(BFirm_sector, 'Category','Q1')
Age_BFirm.rename(columns={"Big Firms category":"Big Firms"},inplace=True)
figure(Age_BFirm,Age_SME,'Age Distribution across Large Firms and SMEs ')


The data shows that large firms are more interested in hiring research analysts with PhD qualifications, while SMEs tend to hire data scientists, data analysts and ML engineers with Master’s and Bachelor’s degrees.

In [None]:
#What is the highest level of formal education that you have attained or plan to attain within the next 2 years?
Edu_SME = distribution(SME_sector, 'Category','Q4')
Edu_SME.rename(columns={"SMEs category":"SMEs"},inplace=True)
Edu_BFirm = distribution(BFirm_sector, 'Category','Q4')
Edu_BFirm.rename(columns={"Big Firms category":"Big Firms"},inplace=True)
figure(Edu_BFirm,Edu_SME,'Education Level Distribution across Large Firms and SMEs')


In [None]:
#Select the title most similar to your current role (or most recent title if retired)
Title_SME = distribution(SME_sector, 'Category','Q5')
Title_SME.rename(columns={"SMEs category":"SMEs"},inplace=True)
Title_BFirm = distribution(BFirm_sector, 'Category','Q5')
Title_BFirm.rename(columns={"Big Firms category":"Big Firms"},inplace=True)
figure(Title_BFirm,Title_SME,'Title Distribution across Large Firms and SMEs')


# The Social Dilemma.. 

In [None]:
#What is your gender?
Gender_SME = distribution(SME_sector, 'Category','Q2')
Gender_SME.rename(columns={"SMEs category":"SMEs"},inplace=True)
Gender_BFirm = distribution(BFirm_sector, 'Category','Q2')
Gender_BFirm.rename(columns={"Big Firms category":"Big Firms"},inplace=True)
figure(Gender_BFirm,Gender_SME,'Gender Distribution across Large Firms and SMEs')

Sarah believes in doing business with a purpose. For this reason, gender inclusivity is an area of prime concern for her. Contrary to her wish, the Kaggle survey data shows a grim picture of women’s participation in the Big Data and Analytics market. A share of less than 19% is far from the ideals of gender inclusivity, and hence Sarah is now determined to ensure an inclusive environment in her business and to use her platform to break social barriers, particularly for women.
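
A share like the one quoted above is one category's count divided by the total of the categories being compared. The following is a minimal sketch of that calculation. The helper `share_percent` and the counts are made up for illustration and are not the survey's actual numbers.

```python
def share_percent(counts, key):
    """Percentage that counts[key] represents of all counts combined."""
    total = sum(counts.values())
    return 100 * counts[key] / total

# Made-up counts, chosen only to illustrate the arithmetic
toy_counts = {"Man": 810, "Woman": 190}
print(share_percent(toy_counts, "Woman"))  # 19.0
```

Note that the result depends on which categories enter the total: restricting the comparison to two answers, as the pie charts in this section do, gives a slightly different share than dividing by all respondents.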

In [None]:
# Select the two rows by label so they always match the pie chart's labels
Gender_BFirm = Gender_BFirm.loc[['Man','Woman']]
#Gender_BFirm

In [None]:
fig = go.Figure(data=[go.Pie(labels=['Man','Woman'], values = Gender_BFirm['Big Firms'])])
colors = ['rgb(120,150,170)', 'rgb(158, 202, 225)']
fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=12, 
                  marker=dict(colors=colors))
fig.update_layout(
    title={
        'text': 'Gender Distribution in Large Firms',
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()


In [None]:
# Select the two rows by label so they always match the pie chart's labels
Gender_SME = Gender_SME.loc[['Man','Woman']]
#Gender_SME

In [None]:
fig = go.Figure(data=[go.Pie(labels=['Man','Woman'], values = Gender_SME['SMEs'])])
colors = ['rgb(120,150,170)', 'rgb(158, 202, 225)']
fig.update_traces(hoverinfo='label+value', textinfo='percent', textfont_size=12, 
                  marker=dict(colors=colors))
fig.update_layout(
    title={
        'text': 'Gender Distribution in SMEs',
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'})
fig.show()


In [None]:
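#Function for getting dataframes of multiple choice questions
def MultiQues(df,Q):
    # Each column in Q is one answer option: it holds the option text where
    # the respondent selected it and NaN otherwise.
    counts = {}
    for col in Q:
        vc = df[col].value_counts()
        if len(vc) > 0:  # skip options that nobody selected
            counts[vc.index[0]] = vc.iloc[0]
    # Return one row per option and a single column labelled 1, so the
    # rename(columns={1: ...}) calls in the cells below keep working
    return pd.DataFrame({1: pd.Series(counts, dtype='int64')})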

# Technology Battleground..

The final showdown in the data science services business revolves around the technology stack and environment. Sarah understands that she needs to know the technological preferences and trends followed within her target market, i.e. large firms. It would also be beneficial to know how her competition, the SMEs, are adopting the technology stack.

The survey results on popular languages and IDEs, charted below, boosted her confidence, as both large firms and SMEs favour Python as a language and Jupyter as an IDE.
This implies that SMEs have the right set of technologies, the same ones that large firms are adopting. On the other hand, SMEs seem to pose serious competition, as they work in similar environments.
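
Questions such as "which languages do you use" allow several answers, and the survey file stores them as one column per option: the cell holds the option text if the respondent ticked it and is empty otherwise. Counting an option therefore means counting the non-empty cells in its column. The following is a minimal sketch of that idea on toy data. The helper `multi_choice_counts` and the sample rows are made up for illustration and are not the notebook's `MultiQues` function.

```python
import pandas as pd

def multi_choice_counts(df, option_cols):
    """Count how many respondents ticked each option of one question."""
    counts = {}
    for col in option_cols:
        selected = df[col].dropna()
        if selected.empty:
            continue  # nobody picked this option
        label = str(selected.iloc[0]).strip()
        counts[label] = len(selected)
    return pd.Series(counts, name="count")

# Toy responses (made up): three respondents, three option columns
toy = pd.DataFrame({
    "Q7_Part_1": ["Python", None, "Python"],
    "Q7_Part_2": [None, "R", "R"],
    "Q7_Part_3": [None, None, None],
})

result = multi_choice_counts(toy, ["Q7_Part_1", "Q7_Part_2", "Q7_Part_3"])
print(result.to_dict())  # {'Python': 2, 'R': 2}
```

Since one respondent can tick several options, these counts can add up to more than the number of respondents.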


In [None]:
#What programming languages do you use on a regular basis? 
Q7=['Q7_Part_1','Q7_Part_2','Q7_Part_3','Q7_Part_4','Q7_Part_5','Q7_Part_6','Q7_Part_7','Q7_Part_8','Q7_Part_9','Q7_Part_10','Q7_Part_11','Q7_Part_12','Q7_OTHER']
ProgLang_SME = MultiQues(SME_sector ,Q7)
ProgLang_SME.rename(columns={1:"SMEs"},inplace=True)
ProgLang_BFirm = MultiQues(BFirm_sector ,Q7)
ProgLang_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(ProgLang_BFirm,ProgLang_SME,'Programming Language used in Large Firms and SMEs')

In [None]:
#Which of the following integrated development environments (IDE's) do you use on a regular basis?  
Q9=['Q9_Part_1','Q9_Part_2','Q9_Part_3','Q9_Part_4','Q9_Part_5','Q9_Part_6','Q9_Part_7','Q9_Part_8','Q9_Part_9','Q9_Part_10','Q9_Part_11','Q9_Part_12','Q9_OTHER']
IDE_SME = MultiQues(SME_sector , Q9)
IDE_SME.rename(columns={1:"SMEs"},inplace=True)
IDE_SME.rename(index={' Visual Studio Code (VSCode) ':'Visual Studio',' Jupyter Notebook':'Jupyter (JupyterLab, Jupyter Notebooks, etc) ',' Visual Studio ':'Visual Studio'},inplace=True)
IDE_SME['index'] = IDE_SME.index
IDE_SME = IDE_SME.groupby(['index']).agg('sum')
IDE_BFirm = MultiQues(BFirm_sector ,Q9)
IDE_BFirm.rename(columns={1:"Big Firms"},inplace=True)
IDE_BFirm.rename(index={' Visual Studio Code (VSCode) ':'Visual Studio',' Jupyter Notebook':'Jupyter (JupyterLab, Jupyter Notebooks, etc) ',' Visual Studio ':'Visual Studio'},inplace=True)
IDE_BFirm['index'] = IDE_BFirm.index
IDE_BFirm = IDE_BFirm.groupby(['index']).agg('sum')
figure(IDE_BFirm,IDE_SME,'IDEs used in Large Firms and SMEs')


Moving forward, Sarah wants to learn about infrastructure, as purchasing and maintaining it can become a seriously large cost. If large firms and SMEs are diverging in their infrastructure, that can be a challenge, because as an SME she might not be able to invest in infrastructure at the scale of large firms.

In [None]:
#Which types of specialized hardware do you use on a regular basis?  
Q12=['Q12_Part_1','Q12_Part_2','Q12_Part_3','Q12_Part_4','Q12_Part_5','Q12_OTHER']
HW_SME = MultiQues(SME_sector , Q12)
HW_SME.rename(columns={1:"SMEs"},inplace=True)
HW_BFirm = MultiQues(BFirm_sector , Q12)
HW_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(HW_BFirm,HW_SME,'Specialized hardware used in Large Firms and SMEs')


In [None]:
#What type of computing platform do you use most often for your data science projects?
CompPlatform_SME = distribution(SME_sector, 'Category','Q11')
CompPlatform_SME.rename(columns={"SMEs category":"SMEs"},inplace=True)
CompPlatform_BFirm = distribution(BFirm_sector, 'Category','Q11')
CompPlatform_BFirm.rename(columns={"Big Firms category":"Big Firms"},inplace=True)
figure(CompPlatform_BFirm,CompPlatform_SME,'Computing Platform used in Large Firms and SMEs')

In [None]:
#Which of the following big data products (relational databases, data warehouses, data lakes, or similar) do you use on a regular basis?
Q32A=['Q32_A_Part_1','Q32_A_Part_2','Q32_A_Part_3','Q32_A_Part_4','Q32_A_Part_5','Q32_A_Part_6','Q32_A_Part_7','Q32_A_Part_8','Q32_A_Part_9','Q32_A_Part_10','Q32_A_Part_11','Q32_A_Part_12','Q32_A_Part_13','Q32_A_Part_14','Q32_A_Part_15','Q32_A_Part_16','Q32_A_Part_17','Q32_A_Part_18','Q32_A_Part_19','Q32_A_Part_20','Q32_A_OTHER']
BigDataProduct_SME = MultiQues(SME_sector , Q32A)
BigDataProduct_SME.rename(columns={1:"SMEs"},inplace=True)
BigDataProduct_BFirm = MultiQues(BFirm_sector , Q32A)
BigDataProduct_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(BigDataProduct_BFirm,BigDataProduct_SME,'Big Data Products used in Large Firms and SMEs')

However, Google Cloud is more popular among SMEs, while Amazon Web Services and Microsoft Azure are more popular among large firms. Sarah needs to delve further into this divergence in cloud platform adoption; however, as cloud services do not require upfront investment, she expects the difference to be manageable in her context.

In [None]:
#Which of the following cloud computing platforms do you use on a regular basis?
Q27A=['Q27_A_Part_1','Q27_A_Part_2','Q27_A_Part_3','Q27_A_Part_4','Q27_A_Part_5','Q27_A_Part_6','Q27_A_Part_7','Q27_A_Part_8','Q27_A_Part_9','Q27_A_Part_10','Q27_A_Part_11','Q27_A_OTHER']
CCPlatform_SME = MultiQues(SME_sector , Q27A)
CCPlatform_SME.rename(columns={1:"SMEs"},inplace=True)
CCPlatform_BFirm = MultiQues(BFirm_sector , Q27A)
CCPlatform_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(CCPlatform_BFirm,CCPlatform_SME,'Cloud Computing Platforms used in Large Firms and SMEs')

Delving into the techniques, similar patterns emerge: the following techniques are equally popular and have been adopted by large firms and SMEs alike:

In [None]:
#What data visualization libraries or tools do you use on a regular basis?
Q14=['Q14_Part_1','Q14_Part_2','Q14_Part_3','Q14_Part_4','Q14_Part_5','Q14_Part_6','Q14_Part_7','Q14_Part_8','Q14_Part_9','Q14_Part_10','Q14_Part_11','Q14_OTHER']
VisualLib_SME = MultiQues(SME_sector , Q14)
VisualLib_SME.rename(columns={1:"SMEs"},inplace=True)
VisualLib_BFirm = MultiQues(BFirm_sector , Q14)
VisualLib_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(VisualLib_BFirm,VisualLib_SME,'Visualization Libraries used in Large Firms and SMEs')

In [None]:
#Which of the following machine learning frameworks do you use on a regular basis?
Q16=['Q16_Part_1','Q16_Part_2','Q16_Part_3','Q16_Part_4','Q16_Part_5','Q16_Part_6','Q16_Part_7','Q16_Part_8','Q16_Part_9','Q16_Part_10','Q16_Part_11','Q16_Part_12','Q16_Part_13','Q16_Part_14','Q16_Part_15','Q16_Part_16','Q16_OTHER']
MLFrameWork_SME = MultiQues(SME_sector , Q16)
MLFrameWork_SME.rename(columns={1:"SMEs"},inplace=True)
MLFrameWork_BFirm = MultiQues(BFirm_sector , Q16)
MLFrameWork_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(MLFrameWork_BFirm,MLFrameWork_SME,'ML FrameWorks used in Large Firms and SMEs')

In [None]:
#Which of the following ML algorithms do you use on a regular basis?
Q17=['Q17_Part_1','Q17_Part_2','Q17_Part_3','Q17_Part_4','Q17_Part_5','Q17_Part_6','Q17_Part_7','Q17_Part_8','Q17_Part_9','Q17_Part_10','Q17_Part_11','Q17_OTHER']
MLAlgo_SME = MultiQues(SME_sector , Q17)
MLAlgo_SME.rename(columns={1:"SMEs"},inplace=True)
MLAlgo_BFirm = MultiQues(BFirm_sector , Q17)
MLAlgo_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(MLAlgo_BFirm,MLAlgo_SME,'ML Algorithms used in Large Firms and SMEs')

In [None]:
#Which categories of computer vision methods do you use on a regular basis?
Q18=['Q18_Part_1','Q18_Part_2','Q18_Part_3','Q18_Part_4','Q18_Part_5','Q18_Part_6','Q18_OTHER']
CompVisMethod_SME = MultiQues(SME_sector , Q18)
CompVisMethod_SME.rename(columns={1:"SMEs"},inplace=True)
CompVisMethod_BFirm = MultiQues(BFirm_sector , Q18)
CompVisMethod_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(CompVisMethod_BFirm,CompVisMethod_SME,'Computer Vision Methods used in Large Firms and SMEs')

In [None]:
#Which of the following natural language processing (NLP) methods do you use on a regular basis?  
Q19=['Q19_Part_1','Q19_Part_2','Q19_Part_3','Q19_Part_4','Q19_Part_5','Q19_OTHER']
NLPMethod_SME = MultiQues(SME_sector , Q19)
NLPMethod_SME.rename(columns={1:"SMEs"},inplace=True)
NLPMethod_BFirm = MultiQues(BFirm_sector , Q19)
NLPMethod_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(NLPMethod_BFirm,NLPMethod_SME,'NLP Methods used in Large Firms and SMEs')

However, in the Business Intelligence category, Microsoft Power BI seems more popular in large firms while Tableau is more popular in SMEs. The trend seems consistent with large firms purchasing Microsoft 365 for office management, since Power BI is bundled with some Microsoft 365 plans.

In [None]:
#Which of the following business intelligence tools do you use on a regular basis?
Q34A=['Q34_A_Part_1','Q34_A_Part_2','Q34_A_Part_3','Q34_A_Part_4','Q34_A_Part_5','Q34_A_Part_6','Q34_A_Part_7','Q34_A_Part_8','Q34_A_Part_9','Q34_A_Part_10','Q34_A_Part_11','Q34_A_Part_12','Q34_A_Part_13','Q34_A_Part_14','Q34_A_Part_15','Q34_A_Part_16','Q34_A_OTHER']
BITools_SME = MultiQues(SME_sector , Q34A)
BITools_SME.rename(columns={1:"SMEs"},inplace=True)
BITools_BFirm = MultiQues(BFirm_sector , Q34A)
BITools_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(BITools_BFirm,BITools_SME,'Business Intelligence Tools used in Large Firms and SMEs')

In [None]:
#Do you use any tools to help manage machine learning experiments?
Q38A=['Q38_A_Part_1','Q38_A_Part_2','Q38_A_Part_3','Q38_A_Part_4','Q38_A_Part_5','Q38_A_Part_6','Q38_A_Part_7','Q38_A_Part_8','Q38_A_Part_9','Q38_A_Part_10','Q38_A_Part_11','Q38_A_OTHER']
ExpTools_SME = MultiQues(SME_sector , Q38A)
ExpTools_SME.rename(columns={1:"SMEs"},inplace=True)
ExpTools_BFirm = MultiQues(BFirm_sector , Q38A)
ExpTools_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(ExpTools_BFirm,ExpTools_SME,'Tools used to manage ML experiments in Large Firms and SMEs')

In [None]:
#Where do you publicly share your data analysis or machine learning applications? 
Q39=['Q39_Part_1','Q39_Part_2','Q39_Part_3','Q39_Part_4','Q39_Part_5','Q39_Part_6','Q39_Part_7','Q39_Part_8','Q39_Part_9','Q39_OTHER']
SharePlat_SME = MultiQues(SME_sector , Q39)
SharePlat_SME.rename(columns={1:"SMEs"},inplace=True)
SharePlat_BFirm = MultiQues(BFirm_sector , Q39)
SharePlat_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(SharePlat_BFirm,SharePlat_SME,'Shared Platforms used in Large Firms and SMEs')

# Learning and Development..

A data scientist is a confidant of the client. As the keeper of secrets in the form of databases, a data scientist must have both integrity and capability. While integrity is a subjective matter that evolves over time, capability can be developed with access to an adequate environment and training. In the digital world of data, online learning platforms are becoming increasingly popular. These platforms allow data scientists to keep learning and stay up to date with the latest trends.

Fortunately, the Kaggle survey dataset again provided Sarah with much-needed insight into which platforms are trending among data scientists of her selected niche in large firms and SMEs.

To her surprise, Coursera, Kaggle and Udemy are equally popular in large firms and SMEs. This learning pattern indicates that SMEs can work alongside large firms with a similar set of technical knowledge gained via similar learning platforms.

In [None]:
#On which platforms have you begun or completed data science courses?
Q40=['Q40_Part_1','Q40_Part_2','Q40_Part_3','Q40_Part_4','Q40_Part_5','Q40_Part_6','Q40_Part_7','Q40_Part_8','Q40_Part_9','Q40_Part_10','Q40_Part_11','Q40_OTHER']
LearningPlat_SME = MultiQues(SME_sector , Q40)
LearningPlat_SME.rename(columns={1:"SMEs"},inplace=True)
LearningPlat_BFirm = MultiQues(BFirm_sector , Q40)
LearningPlat_BFirm.rename(columns={1:"Big Firms"},inplace=True)
figure(LearningPlat_BFirm,LearningPlat_SME,'Learning Platforms used in Large Firms and SMEs')

In [None]:
#What is your current yearly compensation (approximate $USD)?
Compensation_SME = distribution(SME_sector, 'Category','Q25')
Compensation_SME.rename(columns={"SMEs category":"SMEs"},inplace=True)
Compensation_BFirm = distribution(BFirm_sector, 'Category','Q25')
Compensation_BFirm.rename(columns={"Big Firms category":"Big Firms"},inplace=True)
figure(Compensation_BFirm,Compensation_SME,'Yearly Compensation in Large Firms and SMEs')

# The Business Plan..

Summarizing all the information she extracted from the Kaggle survey dataset, she has drafted a guiding template that will serve as her business plan:

![image.png](attachment:7986678e-8a23-419a-94c5-4d89599e0d50.png)

# Road to Success..

Sarah has managed to excavate a goldmine from the Kaggle dataset. The above findings are key to making focused efforts that ensure sustainable business growth and keep her ahead of the competition. The business plan will remain a living document, and Sarah will adapt it to market changes as new requirements, trends and technological breakthroughs arrive. Meanwhile, the focus and findings extracted from the Kaggle dataset will give Sarah’s investors much-needed confidence and allow Sarah to enter the market confidently to **COMPETE AND WIN..!!**