<img src="https://alessiamusio.com/other/Kaggle/Cover_KaggleSurvey.jpg" alt="cover">

In [None]:
%%HTML
    <link rel="preconnect" href="https://fonts.googleapis.com" /><link rel="preconnect" href="https://fonts.gstatic.com" crossorigin /><link href="https://fonts.googleapis.com/css2?family=Work+Sans:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;0,800;0,900;1,100;1,200;1,300;1,400;1,500;1,600;1,700;1,800;1,900&display=swap"
        rel="stylesheet" /><style>@import url("https://fonts.googleapis.com/css2?family=Work+Sans:ital,wght@0,100;0,200;0,300;0,400;0,500;0,600;0,700;0,800;0,900;1,100;1,200;1,300;1,400;1,500;1,600;1,700;1,800;1,900&display=swap");</style>
<h1 style="color:#316fa5; font-family:Work Sans; font-weight:600" id="TOC">Table of Contents</h1>
<ol class="toc_list">
  <li><a style="color:#316fa5" href="#Introduction">Introduction</a></li>
  <li><a style="color:#316fa5" href="#Methodology">Methodology</a></li>
  <li><a style="color:#316fa5" href="#Automated-Machine-Learning:-A-short-overview">Automated Machine Learning: A short overview</a></li>
  <li><a style="color:#316fa5" href="#Personas-definition">Personas definition</a></li>
  <li><a style="color:#316fa5" href="#Exploratory-Data-Analysis">Exploratory Data Analysis</a></li>
    <ol>
        <li><a style="color:#316fa5" href="#Miscellaneous">Miscellaneous</a></li>
        <li><a style="color:#316fa5" href="#Personas-analysis">Personas analysis</a></li>
    </ol>
  <li><a style="color:#316fa5" href="#Conclusions">Conclusions</a></li>
   <li><a style="color:#316fa5" href="#Breakdown">Breakdown</a></li>
     <ol>
        <li><a style="color:#316fa5" href="#Topic">Topic</a></li>
        <li><a style="color:#316fa5" href="#Data-Wrangling">Data Wrangling</a></li>
        <li><a style="color:#316fa5" href="#Data-Visualization">Data Visualization</a></li>
        <li><a style="color:#316fa5" href="#Exploratory-Matrix">Exploratory Matrix</a></li>
    </ol></ol>

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Introduction">Introduction</h1>

At the [Data Science Salon 2020 in Austin](https://roundtable.datascience.salon/what-indeeds-job-market-data-can-tell-us-about-trends-in-data-science), Indeed.com revealed that **job postings** on its site **for Data Scientists had more than tripled** since December 2013.
<br>
While this is great for data science professionals, the same study revealed that supply is still far lower than demand.
<br>
In a world increasingly dominated by data, companies are trying to **bridge this gap** to keep up with the competition. When it comes to hiring data scientists, top salaries may not be enough, so some companies are adapting by **exploiting new methods** that could gradually solve this issue.
<br>
**Automated machine learning (ML) tools, or AutoML**, are designed to automate many steps in the data science process; these methods have been proliferating over the past few years, making it easier to create machine learning models by **removing repetitive tasks** without requiring the expertise of many data scientists.
<br><br>
The dilemma seems to have already found a solution in the birth of a new professional figure: the **Citizen Data Scientist**, a concept first introduced by [Gartner](https://www.gartner.com/en/newsroom/press-releases/2017-01-16-gartner-says-more-than-40-percent-of-data-science-tasks-will-be-automated-by-2020) years ago.
<br>
Suddenly, thanks to AutoML methods, **other technical members** of the organization with deep domain knowledge like BI analysts, data analysts, business analysts can also become **valuable contributors** to an organization's development of ML and AI models.
<br><br>
Should data scientists be worried about these methods? How will our role evolve? But most importantly, are we aware of this trend?
<br><br>
In this survey analysis, we will try to study people's traits and behavior towards these tools to understand if, as a community, **we are ready to embrace what could become a new way of doing data science in the future**.
<br>
Before starting, how did we think about setting up this analysis?

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Methodology">Methodology</h1>

Since we are questioning the role of a future Data Scientist, why not also question **the way** in which the results of a survey **can be reported** to the data science community and how they can and should **interact**?
<br><br>
This notebook, in fact, **is not meant to be the usual data visualization**: you will not be a passive spectator whose only purpose is to digest graphs and simply read the associated evidence.
<br>
Instead, we believe that you must be an **active listener**, becoming an explorer of the story that will be presented to you: this is why **all the charts have been condensed into a single Tableau dashboard**.
<br><br>
For all the details, the section [Breakdown](#Breakdown) is what you are looking for!
<br>
Having said that, we can start.
But first of all, what is AutoML?

<h5 style="text-align:right; font-style:normal;"><a style="color:#316fa5; text-decoration:none" href="#TOC">Back to the Table of Contents ↑</a></h5>

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Automated-Machine-Learning:-A-short-overview">Automated Machine Learning: A short overview</h1>


Automated machine learning, also referred to as automated ML or AutoML, is the process of **automating** the time-consuming, iterative tasks of machine learning model development.<a href="#f1" id="a1"><sup>[1]</sup></a>
<br>
Several major AutoML libraries have become popular **since 2013**, starting with [Auto-Weka](https://www.cs.ubc.ca/labs/beta/Projects/autoweka/). The aim is always the same: to **automate one or more phases** of the classic **machine learning pipeline**, making it easier for non-experts to create machine learning models or allowing expert users to build models more quickly and efficiently.
<br>
In general, the **main components** of the pipeline that can be automated are: initial data preparation and feature engineering, hyperparameter optimization, model evaluation, and neural architecture search.<a href="#f2" id="a2"><sup>[2]</sup></a>

Below, an image showing the **areas heavily affected by AutoML**, adapted from <a href="#f3" id="a3"><sup>[3]</sup></a> and <a href="#f4" id="a4"><sup>[4]</sup></a>.

In [None]:
# `display` lives in IPython.display; importing it from IPython.core.display is deprecated
from IPython.display import IFrame, display

display(IFrame('https://alessiamusio.com/other/Kaggle/AutoML.html', '800px', '410px'))

One of the main advantages of the AutoML platforms, therefore, is the **true Data Science democratization**<a href="#f5" id="a5"><sup>[5]</sup></a>, in other words enabling a **more diverse** and larger group of users to **contribute to the data science process**.
<br>
With the economic uncertainty of these times, creating a **new class of AI/ML developers** with minimal investment allows maintaining or increasing **competitive advantage**.
<br><br>
Having said that, we are ready to dive into the data and analyze the results of the survey.

<h5 style="text-align:right; font-style:normal;"><a style="color:#316fa5; text-decoration:none" href="#TOC">Back to the Table of Contents ↑</a></h5>

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Personas-definition">Personas definition</h1>


Before presenting you the dashboard you will explore with us, we need to **introduce** you to the **characters** that personify our story.
<br>
The analysis is based on the answers that users gave to two questions, **Q36 A and B**, on **current use of and future interest in AutoML**, respectively:

> Q36-A: Do you use any automated machine learning tools (or partial AutoML tools) on a regular basis?

> Q36-B: Which categories of automated machine learning tools (or partial AutoML tools) do you hope to become more familiar with in the next 2 years?

To simplify the narrative, let's consider the **following legend** to classify the answers to the questions by distinguishing what is PartialAutoML → <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span>, FullAutoML → <span style="background-color:#afdba5">&nbsp;Full&nbsp;</span>, No/None → <span style="background-color:#f99d8f">&nbsp;None&nbsp;</span>, or skipped question → <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span>:

- <span style="background-color:#fcd17a">&nbsp;Automated data augmentation (e.g. imgaug, albumentations)&nbsp;</span>
- <span style="background-color:#fcd17a">&nbsp;Automated feature engineering/selection (e.g. tpot, boruta_py)&nbsp;</span>
- <span style="background-color:#fcd17a">&nbsp;Automated model selection (e.g. auto-sklearn, xcessiv)&nbsp;</span>
- <span style="background-color:#fcd17a">&nbsp;Automated model architecture searches (e.g. darts, enas)&nbsp;</span>
- <span style="background-color:#fcd17a">&nbsp;Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)&nbsp;</span>
- <span style="background-color:#afdba5">&nbsp;Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)&nbsp;</span>
- <span style="background-color:#f99d8f">&nbsp;No / None&nbsp;</span>
- <span style="background-color:#fcd17a">&nbsp;Other&nbsp;</span>

Having said that, here are the protagonists of our story:

| Personas    |                      AutoML Present                            |                         AutoML Future                          |
|-------------|----------------------------------------------------------------|----------------------------------------------------------------|
| Evangelist  | <span style="background-color:#afdba5">&nbsp;Full&nbsp;</span> *           | <span style="background-color:#afdba5">&nbsp;Full&nbsp;</span> *          |
| Supporter   | <span style="background-color:#afdba5">&nbsp;Full&nbsp;</span> *           | <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span> <span style="background-color:#f99d8f">None</span> <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span> |
| Believer    | <span style="background-color:#f99d8f">&nbsp;None&nbsp;</span> <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span> <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span> | <span style="background-color:#afdba5">&nbsp;Full&nbsp;</span> *          |
| Resolved    | <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span>          | <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span>          |
| Guarded     | <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span>          | <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="background-color:#f99d8f">&nbsp;None&nbsp;</span>         |
| Sympathizer | <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="background-color:#f99d8f">&nbsp;None&nbsp;</span>         | <span style="background-color:#fcd17a">&nbsp;Partial&nbsp;</span>          |
| Unconcerned | <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="background-color:#f99d8f">&nbsp;None&nbsp;</span>         | <span style="background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="background-color:#f99d8f">&nbsp;None&nbsp;</span>         |

<span>*</span> It represents hybrid users who, in addition to FullAutoML, have also chosen PartialAutoML
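
The mapping in the table above can be sketched in a few lines of Python. This is only an illustration of the rule, not the code used to build the dashboard: the function names `automl_level` and `persona` are hypothetical, and we assume each respondent's answers to Q36-A and Q36-B are available as a list of the selected option strings (an empty list meaning the question was skipped).

```python
FULL_PREFIX = "Automation of full ML pipelines"  # the only FullAutoML option
NONE_OPTION = "No / None"


def automl_level(selected):
    """Collapse one respondent's selected options into Full / Partial / None / NaN."""
    if not selected:
        return "NaN"  # question skipped
    if any(opt.startswith(FULL_PREFIX) for opt in selected):
        return "Full"  # includes hybrid users who also picked Partial options
    if all(opt == NONE_OPTION for opt in selected):
        return "None"
    return "Partial"  # every other option, "Other" included (assumption from the legend)


def persona(present, future):
    """Map the (present, future) levels to one of the seven Personas."""
    if present == "Full":
        return "Evangelist" if future == "Full" else "Supporter"
    if future == "Full":
        return "Believer"
    if present == "Partial":
        return "Resolved" if future == "Partial" else "Guarded"
    # present is None or NaN from here on
    return "Sympathizer" if future == "Partial" else "Unconcerned"


present = automl_level([
    "Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI)",
    "Automated hyperparameter tuning (e.g. hyperopt, ray.tune, Vizier)",
])
future = automl_level([])  # Q36-B skipped
print(persona(present, future))  # Supporter
```

Checking for Full first is what makes hybrid users fall into the Full column, as the footnote describes.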

What are their characteristics? Are there any differences? What insights have emerged? Do any of them represent the definition of Citizen Data Scientist?

<h5 style="text-align:right; font-style:normal;"><a style="color:#316fa5; text-decoration:none" href="#TOC">Back to the Table of Contents ↑</a></h5>

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Exploratory-Data-Analysis">Exploratory Data Analysis</h1>

Imagine having <b>7 different files on your desktop</b>, each representing a <b>different CV</b> for each Persona.
<br>
Let's explore some data together, starting with Miscellaneous!

<div class='tableauPlaceholder' id='viz1638056477365' style='position: relative'><noscript><a href='#'><img alt='CV ' src='https://public.tableau.com/static/images/CV/CVsKaggleSurvey2021/CV/1_rss.png' style='border: none' /></a></noscript><object class='tableauViz' style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='CVsKaggleSurvey2021/CV' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https://public.tableau.com/static/images/CV/CVsKaggleSurvey2021/CV/1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='it-IT' /><param name='filter' value='publish=yes' /></object></div> <script type='text/javascript'> var divElement = document.getElementById('viz1638056477365'); var vizElement = divElement.getElementsByTagName('object')[0]; vizElement.style.width='800px';vizElement.style.height='1227px'; var scriptElement = document.createElement('script'); scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js'; vizElement.parentNode.insertBefore(scriptElement, vizElement); </script>

<h2  style="color:#316fa5; font-family:Work Sans; font-weight:500" id="Miscellaneous">Miscellaneous</h2>

Our starting point is to explore the <b>overall behavior</b> of the survey respondents, about <b>25k people</b>, but also to familiarize ourselves with the tool by exploring some macro results.
<br>
In the dashboard, this is called **Miscellaneous**.
<br>
To look at all the respondents, you need to view all files at once. You can do that in **two ways**:
- you can **unselect the current file** you are looking at (by clicking again on the file name). This is the fastest way.
- you can **select all the files** by clicking them while holding **CTRL/COMMAND**

Miscellaneous is also shown when you select just two Personas at a time: it still counts as Miscellaneous because it is a mixture of people.
<br><br>
After selecting all files, let's examine the information together.
<br>
In <b>General Information</b>, it is immediately obvious that the <b>average age is relatively low</b>, with about <b>56%</b> of respondents being under <b>30 years old</b>.
<br>
Geographically, the countries with the most respondents are <b>India and the USA</b>, which represent <b>28.7% and 10.2% of the total</b>, respectively.
<br><br>
Our hypothetical CV then moves from <b>Education&Work</b> on to <b>Skills</b>: in the former, the <b>most frequent answers</b> are reported, while in the latter, there is a <b>heatmap comparing ML Experience & Coding Experience</b>.
<br>
Generally speaking, we see that most of the respondents are <b>students</b> (26%) or <b>data scientists</b> (14%) and, if workers, employed in the Computers/Technology (25%) or Education (20%) sectors.
<br>
Almost <b>36% of people</b>, on a daily basis, <b>analyze and explore data</b> to influence business decisions. This is followed by <b>20%</b> who apply <b>ML to explore new areas</b>. The young age coincides with little experience both in coding and in Machine Learning, with more than <b>50% of respondents having less than 3 years</b>.
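
For readers who want to reproduce a matrix like the Skills heatmap outside Tableau, here is a minimal sketch, assuming the survey answers have already been reduced to one `(coding experience, ML experience)` pair per respondent. The function name `experience_matrix` and the sample labels are hypothetical and are not the exact survey categories.

```python
from collections import Counter


def experience_matrix(pairs):
    """Return {(coding, ml): share of respondents in percent} for each observed cell."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {cell: 100 * n / total for cell, n in counts.items()}


# Made-up sample, for illustration only
sample = [
    ("under 1 year", "under 1 year"),
    ("under 1 year", "under 1 year"),
    ("1-3 years", "1-2 years"),
    ("5-10 years", "3-4 years"),
]
matrix = experience_matrix(sample)
print(matrix[("under 1 year", "under 1 year")])  # 50.0
```

Each value is the share of all respondents that falls in one cell, which is the quantity a heatmap of this kind colors.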
<br><br>
<b>A key aspect</b> on which we would like to reflect, however, is this: in the AutoML tools section on the right, it is interesting to note that <b>AutoML is either not used</b> (5%) or <b>very few are familiar with it</b>. In fact, those who are familiar with AutoML mostly selected Google, Microsoft and Amazon, which together account for only 6%.
<br><br>
What do these results mean? Are we facing a generation of young people who have not grasped the potential of these tools? How do the niches of people who have bought into it behave? Are there any Citizen Data Scientists among them?
<br>
It's time to explore our Personas. Expand the Personas sections to read about each of them in the next chapter!

In [None]:
%%HTML
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
.accordion {
  background-color: #8bb3d5;
  color: #fff;
  cursor: pointer;
  padding: 18px;
  width: 764px;
  border: none;
  text-align: left;
  outline: none;
  transition: 0.4s;
  border-radius: 0px;
}

.active, .accordion:hover {
  background-color: #316fa5;
}

.accordion:after {
  content: '\002B';
  color: #fff;
  font-weight: bold;
  float: right;
  margin-left: 5px;
}

.active:after {
  content: "\2212";
}

.panel {
  padding: 0px 18px;
  background-color: white;
  max-height: 0;
  width: 764px;
  overflow: hidden;
  transition: max-height 0.2s ease-out;
}

.list {
  padding-left: 14px;
  margin-before: 0;
  margin-after: 0;
  margin-start: 0;
  margin-end: 0;
  padding-start: 0;
}

.text {
  padding-left: 24px;
}

</style>
</head>
<body>

<h2  style="color:#316fa5; font-family:Work Sans; font-weight:500" id="Personas-analysis">Personas analysis</h2><br>

<button class="accordion">Evangelist</button>
<div class="panel">
  <p><br>
    The first Persona that we are going to analyze is the Evangelist, which represents those who have <b>at least selected FullAutoML</b> both for their present use of AutoML and as a future interest.
    <br>
    We remind you that those who have selected Full and then Partial also fall into this category as hybrid users.
    <br><br>
    Only <b>20 people show</b> this particular trend, which is a niche that in our opinion is very interesting to analyze.
    <br><br>
    First of all, the age distribution is bimodal, centered on <b>22-24 and 30-34 years old</b>, 30% and 25% respectively.
    <br>
    Evangelists are also mainly Bachelor's graduates (45%), currently employed as Software Engineers (30%) in Computers/Technology (35%).
    <br><br>
    A typical day of an Evangelist differs from the average of the survey respondents.
    Most of them, in fact, replied that <b>none of the proposed activities are part of their professional role (35%)</b>. This element is partly consistent with the average type of work which is probably more <b>focused on data infrastructure</b> than data analysis per se.
    <br><br>
    Another particular feature of this category of users is that <b>all of them</b> have specified that they have <b>no experience in Machine Learning</b>. This can be seen from the matrix which presents only the <code>None</code> line as an answer to the question <code>Q15: For how many years have you used machine learning methods?</code>.
    <br><br>
    Doesn't this element, combined with the interest in AutoML, represent the <b>main characteristic of a Citizen Data Scientist</b>?
    <br>
    In the absence of specific experience in Machine Learning, AutoML plays a fundamental role in allowing them to <b>actively contribute as valuable figures to the ML field</b>.
    <br><br>
    As for the tools, <b>Google Cloud AutoML</b> is currently used by 40% of them, and 80% have specified that they will deepen their knowledge in the future.
    <br><br>
    Within the Evangelists, however, there are <b>some strange answers</b>. For instance, despite having replied that they use AutoML, in the tools section <b>40%</b> of them answered <code>No/None</code> instead of <code>Other</code>.
    <br>
    Another singularity emerges looking at three factors: (1) 15% are ML Engineers, (2) no Evangelist has experience in the ML field and (3) 25% of the people have specified that they carry out research in the ML field on a daily basis.
    <br>
    What's happening here?
    <br><br>
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">Software engineers with no experience in ML, compensated by current use of AutoML and a willingness to study it further in the future (aka Citizen Data Scientist).</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask</b>
    <div class="list">
    <ul>
      <li>What contribution have you brought to your job thanks to AutoML libraries?</li>
      <li>If it is true that you carry out research in the ML field, but it is equally true that you have no experience with ML, how do you manage to contribute? Do AutoML libraries play a relevant role in this?</li>
      <li>How is this interest of yours perceived within the company where you work?</li>
    </ul></div>
    </div>
    
<br></div>

<button class="accordion">Supporter</button>
<div class="panel">
  <p><br>
    Supporters, like the Evangelists, have <b>bought into FullAutoML</b> but have <b>not</b> shown equal interest <b>in the future</b>.
    <br><br>
    This cluster has 865 people, about 3.5% of the respondents, and they represent the <b>oldest group on average</b>, with more than 80% aged over 25.  
    <br>
    Mostly graduates with a Master's degree (48%), they hold a role as Data Scientists (34%) and perform some typical activities of the position, such as analyzing and understanding data (68%), exploring ML in new areas (60%) and improving existing ML models (55%).
    <br>
    This last point, which tends to require a certain technical expertise, coincides with a <b>Coding/ML experience matrix</b> that, compared to other Personas, has <b>higher percentages in the bottom right</b>.
    <br>
    As a matter of fact, <b>20% of them have more than 5 years of Coding and ML experience</b>, a factor evidently favored by the higher average age.
    <br><br>
    Another peculiarity of this type of user, in our opinion, is that their interest in AutoML in the future is almost <b>nonexistent</b>.
    <br>
    There can be multiple reasons for this. The most reasonable one could be that those who are currently familiar with AutoML (45% of them use Google Cloud AutoML) do not want to further explore this area, perhaps because they <b>prefer to focus on other technical skills in the future</b>.
    <br><br>
    
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">Data Scientists already employed, some with many years of coding and ML experience, who currently use AutoML but do not have as much interest in the future, probably because they are already experts or because they are interested in something else.</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask</b>
    <div class="list">
    <ul>
      <li>What benefits are you getting from its current use?</li>
      <li>Why don't you want to deepen your knowledge in AutoML in the future? Do you think it will become irrelevant?</li>
    </ul></div>
    </div>
    

  <br></div>

<button class="accordion">Believer</button>
<div class="panel">
  <p><br>
    
    The Believer is the <b>last</b> type of Persona who specified <b>FullAutoML</b>, in this case only as a <b>future interest</b> and not as a current skill set.
    <br><br>
    If we imagine Supporters as the Yin, Believers would be the Yang.
    <br>
    They are <b>young</b> (44% are under the age of 24) and they represent 18% of all respondents.
    <br>
    They are mostly students (40%) or unemployed (15%), with <b>little experience in Coding and Machine Learning</b>.
    <br><br>
    While currently using at most PartialAutoML techniques, their future interest in FullAutoML methods <b>denotes awareness of the subject</b>: the Believers are the <b>likely Citizen Data Scientists of tomorrow</b>, aware of the competitive advantages of AutoML and determined to fully exploit its features.
    <br>
    It is therefore reasonable to think that the current percentage of people who analyze and understand data (24%) is destined to <b>grow</b> in the future when they start using FullAutoML.
    <br><br>
    Among the future tools they would like to become familiar with, <b>Google Cloud AutoML stands out with 72%</b>, followed by Microsoft and Amazon with more than 40% of the choices.
    <br><br>
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">Young students, with little experience in coding and ML, likely future Citizen Data Scientists given the interest shown in AutoML, particularly in FullAutoML.</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask</b>
    <div class="list">
    <ul>
      <li>What contribution have you brought to your job thanks to AutoML libraries?</li>
      <li>Seeing that you currently use PartialAutoML, why don't you use FullAutoML?</li>
      <li>Have you ever used or talked about AutoML during your school career?</li>
    </ul></div>
    </div>
    
    
  <br></div>

<button class="accordion">Resolved</button>
<div class="panel">
  <p><br>
    <code>Supporter : Believer = Evangelist : ?</code>
    <br>
    If we solved this equation, the result would probably be "Resolved".
    <br><br>
    The Resolved are a <b>niche</b> of people (32) who neither use nor want to use FullAutoML and have <b>only specified PartialAutoML</b>.
    <br>
    The Resolved are mainly Bachelor's graduates (47%) currently employed as Software Engineers (22%) or Data Analysts (16%).
    <br><br>
    Like the Evangelists, they too share the same particularity: the whole group has <b>no experience in Machine Learning</b>.
    <br>
    This would lead us to think that we are once again dealing with <b>Citizen Data Scientists</b>. However, this time the Resolved do <b>not</b> know or want to deepen their knowledge in <b>FullAutoML</b> tools.
    <br>
    The interpretation we tried to give is that the Resolved <b>probably</b> manage daily tasks that require a particular <b>control of the pipeline</b> they are building. That might be the reason why they didn't choose FullAutoML. Evangelists, on the other hand, would have no problem relying on FullAutoML techniques.
    <br><br>
    This interpretation explains the <b>rationale of the name</b> for this Persona: currently they benefit from PartialAutoML and they <b>don't need to scale up to Full tools</b>.
    <br><br>
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">Software Engineers or Data Analysts with no experience in ML, who unlike their Evangelist counterpart currently use and will continue to utilize only PartialAutoML techniques. Behavior probably dictated by a need to control the ML pipeline.</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask</b>
    <div class="list">
    <ul>
      <li>What is holding you back from adopting FullAutoML tools?</li>
    </ul></div>
    </div>
    
<br></div>

<button class="accordion">Guarded</button>
<div class="panel">
  <p><br>
    About 2000 respondents (8%) currently use PartialAutoML techniques but they are not willing to use anything in the future. Here are the Guarded.
    <br><br>
    Like the Supporters, we are again faced with a group of <b>older people</b> on average, with 78% over 25 years old.
    <br>
    The Guarded are currently employed as Data Scientists (32%) holding a Master's degree (43%).
    <br><br>
    Their <b>name</b> describes what we believe is their <b>position towards AutoML</b>: they currently know and use PartialAutoML techniques but have <b>no intention of studying</b> the topic in the future, as if there were generally <b>little interest, or some skepticism</b>.
    <br>
    Someone might reasonably argue that the same logic <b>could also apply</b> to the <b>Supporters</b>, whose current knowledge of FullAutoML is not matched by an equal interest in the future.
    <br>
    Our assumption that the Guarded are skeptical of, or uninterested in, AutoML stems from the belief that <b>switching from Partial to None is a stronger signal</b> than switching from Full to Partial. Moving from Full to Partial can mean wanting to understand the process better, whereas choosing not to learn anything further probably signals a lack of interest.
    <br><br>
    In a future increasingly favorable towards AutoML and the Guarded being mainly Data Scientists, it's interesting to wonder if <b>their behavior with respect to AutoML will change</b>. If so, how?
    <br><br>
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">Data Scientists who are currently familiar with PartialAutoML but have not specified any interest in AutoML in general in the future. This may indicate some skepticism towards the technology.</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask</b>
    <div class="list">
    <ul>
      <li>Why are you not planning to delve into AutoML in the future?</li>
    </ul></div>
    </div>
    

  <br></div>

<button class="accordion">Sympathizer</button>
<div class="panel">
  <p><br>
    Young, inexperienced and with a Bachelor's degree, the Sympathizers are about 13% of the respondents. They currently do not know AutoML but <b>want to start exploring it</b> in the future, though only <b>PartialAutoML</b>.
    <br><br>
    To describe them we could compare them with the Believers. The Believers are those who are willing to move from Partial to FullAutoML. In the same way, the Sympathizers showed the <b>same tendency of interest in upgrading their knowledge</b> from None to Partial.
    <br><br>
    Like the Believers, we are faced with people who are <b>still learning</b>. In fact, the Sympathizers are very young (43.4% are under 24 years old) and at the moment either students (37%) or unemployed (12%).
    <br>
    Their desire to deepen their knowledge on the topic of AutoML is only in their favor since they are <b>intercepting a trend</b> that will most likely become dominant in the future.
    <br><br>
    The reason why they did not choose FullAutoML is also understandable in our opinion.
    <br>
    This is our theory: why go directly to FullAutoML without <b>first learning and having some fun coding</b> and building a ML pipeline from A to Z? In this scenario, PartialAutoML allows them to be more flexible and have more control.
    <br><br>
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">Young students with little experience in coding and ML who have shown future interest in AutoML, but only in Partial tools.</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask</b>
    <div class="list">
    <ul>
      <li>Do you see yourself as someone who wants to learn ML first and then, eventually, automate it?</li>
      <li>Why didn't you choose FullAutoML tools?</li>
    </ul></div>
    </div>
    
  <br></div>

<button class="accordion">Unconcerned</button>
<div class="panel">
  <p><br>
    About <b>56% of the people</b> who participated in the survey (more than 14k people) belong to the Unconcerned, i.e. those who <b>do not use AutoML</b> and are not interested in the future.
    <br><br>
    The Unconcerned are a fairly varied pool of people, equally composed of respondents with a Bachelor's degree (39%) and a Master's  (38%), represented by students (24%) and Data Scientists (14%).
    <br><br>
    <b>Choosing the name</b> of this group was not easy, because <b>various options</b> were possible. For example "Not Interested" or "Unaware" could also have been reasonable choices. We ended up on Unconcerned because <b>(1)</b> they don't use AutoML on a daily basis and <b>(2)</b> they don't see a future in which it will be relevant. 
    <br><br>
    1 out of 2 respondents is represented by this Persona, indicating a <b>general tendency of the community to "ignore" the topic of AutoML</b>: is it skepticism? Is it a desire to learn and have total control of the ML pipeline? Or is it a simple lack of information on current trends in Data Science? 
    <br><br>
    <div style="padding: 30px 25px; background-color:#efefef;">
    <b>💡&nbsp;&nbsp;Key findings</b><br>
    <div class="text">56% of survey respondents do not use AutoML tools and are not interested in further study in the future.</div>
    </div>
    
    <br>
    
    <div style="padding: 30px 25px 15px 25px; background-color:#efefef;">
    <b>📌&nbsp;&nbsp;Questions we would like to ask them</b>
    <div class="list">
    <ul>
      <li>What do you think about AutoML tools?</li>
      <li>Why are you not interested in AutoML tools?</li>
    </ul></div>
    </div>
    
  <br></div>

<script>
var acc = document.getElementsByClassName("accordion");
var i;

for (i = 0; i < acc.length; i++) {
  acc[i].addEventListener("click", function() {
    this.classList.toggle("active");
    var panel = this.nextElementSibling;
    if (panel.style.maxHeight) {
      panel.style.maxHeight = null;
    } else {
      panel.style.maxHeight = panel.scrollHeight + "px";
    } 
  });
}
</script>

<h5 style="text-align:right; font-style:normal;"><a style="color:#316fa5; text-decoration:none" href="#TOC">Back to the Table of Contents ↑</a></h5>

</body>
</html>

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Conclusions">Conclusions</h1>

Each Persona analyzed tells its **own story and its own approach** to data analysis and to the tools currently available.
<br>
Here is a brief **summary** of the evidence that emerged:

| Personas    |  # of respondents |AutoML Present    |   AutoML Future  | Key Insights |
|-------------|-------------------|-------------------|------------------|--------------|
| Evangelist  |20| <span style="line-height:1.5; background-color:#afdba5">&nbsp;Full&nbsp;</span> * | <span style="line-height:1.5; background-color:#afdba5">&nbsp;Full&nbsp;</span> * | Software engineers with no experience in ML, compensated by current use of AutoML and a willingness to study it further in the future (aka Citizen Data Scientist)|
| Supporter   |865| <span style="line-height:1.5; background-color:#afdba5">&nbsp;Full&nbsp;</span> * | <span style="line-height:1.5; background-color:#fcd17a">&nbsp;Partial&nbsp;</span> <span style="line-height:1.5; background-color:#f99d8f">&nbsp;None&nbsp;</span> <span style="line-height:1.5; background-color:#cecece">&nbsp;NaN&nbsp;</span> | Data Scientists already employed, some with many years of coding and ML experience, who currently use AutoML but do not have as much interest in the future, probably because they are already experts or because they are interested in something else.|
| Believer    |4658| <span style="line-height:1.5; background-color:#f99d8f">&nbsp;None&nbsp;</span> <span style="line-height:1.5; background-color:#fcd17a">&nbsp;Partial&nbsp;</span> <span style="line-height:1.5; background-color:#cecece">&nbsp;NaN&nbsp;</span> | <span style="line-height:1.5; background-color:#afdba5">&nbsp;Full&nbsp;</span> *  | Young students, with little experience in coding and ML, likely future Citizen Data Scientists given the interest shown in AutoML, particularly in FullAutoML.|
| Resolved    |32| <span style="line-height:1.5; background-color:#fcd17a">&nbsp;Partial&nbsp;</span>  | <span style="line-height:1.5; background-color:#fcd17a">&nbsp;Partial&nbsp;</span>  | Software Engineers or Data Analysts with no experience in ML, who unlike their Evangelist counterpart currently use and will continue to utilize only PartialAutoML techniques.<br>Behavior probably dictated by a need to control the ML pipeline. |
| Guarded     |2006| <span style="line-height:1.5; background-color:#fcd17a">&nbsp;Partial&nbsp;</span>          | <span style="line-height:1.5; background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="line-height:1.5; background-color:#f99d8f">&nbsp;None&nbsp;</span>         | Data Scientists who are currently familiar with PartialAutoML but have not specified any interest in AutoML in general in the future.<br>This may indicate some skepticism towards the technology.| 
| Sympathizer |3406| <span style="line-height:1.5; background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="line-height:1.5; background-color:#f99d8f">&nbsp;None&nbsp;</span>         | <span style="line-height:1.5; background-color:#fcd17a">&nbsp;Partial&nbsp;</span>          | Young students with little coding and ML experience who have shown interest in AutoML for the future, but only in PartialAutoML.|
| Unconcerned |14339| <span style="line-height:1.5; background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="line-height:1.5; background-color:#f99d8f">&nbsp;None&nbsp;</span>         | <span style="line-height:1.5; background-color:#cecece">&nbsp;NaN&nbsp;</span> <span style="line-height:1.5; background-color:#f99d8f">&nbsp;None&nbsp;</span>         | 56% of survey respondents do not use AutoML tools and are not interested in further study in the future. |
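
The present/future combinations in the table can be read as a small lookup. The sketch below is only an illustration of that reading: the function name `assign_persona` and the four category strings are hypothetical labels chosen to mirror the colored tags in the table, not names taken from the survey data.

```python
def assign_persona(present: str, future: str) -> str:
    """Map current AutoML use and future interest to a Persona name.

    Both arguments are expected to be one of 'Full', 'Partial', 'None', 'NaN'
    (hypothetical labels mirroring the tags in the summary table).
    """
    if present == 'Full':
        # Full today: Evangelist if also Full tomorrow, Supporter otherwise
        return 'Evangelist' if future == 'Full' else 'Supporter'
    if future == 'Full':
        # Not Full today, but Full tomorrow
        return 'Believer'
    if present == 'Partial':
        return 'Resolved' if future == 'Partial' else 'Guarded'
    # From here on, present is 'None' or 'NaN'
    return 'Sympathizer' if future == 'Partial' else 'Unconcerned'


print(assign_persona('Partial', 'NaN'))
```

Checking the branches top to bottom reproduces the table row by row, which is a quick way to confirm that the seven Personas cover all sixteen combinations without overlap.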

Of course, **not all** data science challenges **can be solved using AutoML tools**. At the moment, the most **suitable use cases** are those in which the use of **black-box models** is allowed. In this case it is possible to take advantage of the simplifications that the tools provide, allowing you to **focus more on other aspects of the pipeline**.<a href="#f3" id="a3"><sup>[3]</sup></a>
<br>
Models that require more **in-depth skills** or where data modeling is particularly difficult, **still require** the experience of **qualified data scientists**. In this case, it is very likely that one **relies on PartialAutoML** and not Full techniques to have **greater control of the decision-making steps** along the design workflow.
<br><br>
In any case, we believe that it is **time to adapt** and give value to the knowledge of AutoML tools, thus favoring the **proliferation of Citizen Data Scientists**.
<br>
The advantage, as mentioned, is **twofold** and affects both **less experienced professionals and experts**. On the one hand, it becomes **more efficient and economically beneficial** to carry out many of the standard Data Science activities, a trend that will be even more prevalent in the future as these tools improve. At the same time, **experienced** data scientists will be **free to take on more technically demanding tasks**, allowing them to use their skills more efficiently and innovate faster, while **increasing their job satisfaction**. This benefits both the worker and companies seeking to maximize their production and **employee retention**, as correctly pointed out here<a href="#f3" id="a3"><sup>[3]</sup></a>.
<br><br>
To conclude, it is our hope that AutoML will lead to a true **Data Science democratization**, allowing more diverse and numerous user groups to actively contribute.
<br>
Doesn't this resemble the concept of ensemble learning so common in our competition challenges here on Kaggle? 🙂

We have come to the end of this journey. The topic is clearly very interesting, so if you want to share your opinion, do not hesitate to leave us a comment: we would love to know what you think, or whether you see yourself in one of these Personas!
<br><br>
For the more curious cats, we have created a chapter specifically to explain the **background of this analysis plus a little more**. We suggest you take a look at the Breakdown!

**ALESSIA AND JACOPO**

---

<h1  style="color:#316fa5; font-family:Work Sans; font-weight:600" id="Breakdown">Breakdown</h1>

<h2  style="color:#316fa5; font-family:Work Sans; font-weight:500" id="Topic">Topic</h2>

The choice of the topic was almost **immediate** and without second thoughts.
<br>
The reason is that AutoML is often approached in terms of "Data Scientists vs AutoML", in the same way that Artificial Intelligence is portrayed as "Robots will replace us".
<br>
In our opinion, the key is **coexistence and not opposition**: Data Scientists **AND** AutoML, not **VERSUS**. We want to be masters of the changes that occur around us, professionals who evolve **WITH** change and **not DUE** to change.
<br>
We then chose AutoML to analyze the current situation in the community and from this starting point **provoke a discussion about the topic**.

<h2  style="color:#316fa5; font-family:Work Sans; font-weight:500" id="Data-Wrangling">Data Wrangling</h2>

The process of transforming the dataset was relatively **simple**: it was simply a matter of creating the Personas based on the answers to the questions and using the [`melt`](https://pandas.pydata.org/docs/reference/api/pandas.melt.html) function of pandas to unpivot the dataframe.
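
As a toy illustration of the unpivot step, here is a minimal sketch of `melt` on a three-column frame. The column names and answers are made up for the example; only the `label` column and the `Question`/`Answer` naming follow the notebook.

```python
import pandas as pd

# Toy stand-in for the survey: one row per respondent, one column per question part.
wide = pd.DataFrame({
    'label': ['believer', 'unconcerned'],
    'Q6': ['< 1 years', '3-5 years'],
    'Q7_Part_1': ['Python', None],
})

# Unpivot: every (respondent, question) pair becomes its own row.
long = wide.melt(id_vars=['label'], var_name='Question', value_name='Answer')

# Drop question parts that were left unanswered.
long = long.dropna(subset=['Answer']).reset_index(drop=True)
print(long)
```

The long format has one answer per row, which is generally easier to feed into a dashboard tool than hundreds of wide columns.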
<br>
We considered it convenient to apply a **filter** to exclude some respondents to the survey: those who answered in **less than 2.5 minutes** (429) and those who took **more than a day** to finish the survey (218). 
<br>
The motivation is quite intuitive: whoever takes too little time is probably **not thinking much**, and vice versa, whoever takes too much time is probably doing something else and therefore, again, is not very focused.
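
The duration filter can be sketched in isolation on a handful of made-up completion times. The values and the constant names `MIN_SECONDS` and `MAX_SECONDS` are assumptions for illustration; the thresholds are the ones stated above.

```python
import pandas as pd

# Hypothetical completion times in seconds.
durations = pd.Series([45, 150, 600, 86_400, 200_000], dtype=float)

MIN_SECONDS = 2.5 * 60        # 2.5 minutes
MAX_SECONDS = 24 * 60 * 60    # one day

# between() is inclusive on both ends by default, so exactly 2.5 minutes
# and exactly one day are kept.
focused = durations.between(MIN_SECONDS, MAX_SECONDS)
kept = durations[focused]
print(kept.tolist())
```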

Below, the code.

In [None]:
import numpy as np
import pandas as pd
import re
import warnings

warnings.filterwarnings('ignore')

survey = pd.read_csv('../input/kaggle-survey-2021/kaggle_survey_2021_responses.csv')
responses_in_scope = survey.copy()

###################### PREPROCESSING ######################
# Drop first row and fix time of response
responses_in_scope.drop(0, inplace=True)
responses_in_scope.reset_index(drop=True, inplace=True)
responses_in_scope.iloc[:, 0] = responses_in_scope.iloc[:, 0].astype(float)

# Flag respondents who left every part of Q36 A / Q36 B unanswered, then mark them with a 'NaN' string in Part_1
filter_nan_q36a = responses_in_scope.loc[:, responses_in_scope.columns.str.contains('Q36_A')].isna().all(axis=1)
filter_nan_q36b = responses_in_scope.loc[:, responses_in_scope.columns.str.contains('Q36_B')].isna().all(axis=1)
responses_in_scope.loc[filter_nan_q36a, responses_in_scope.columns.str.contains('Q36_A_Part_1')] = 'NaN'
responses_in_scope.loc[filter_nan_q36b, responses_in_scope.columns.str.contains('Q36_B_Part_1')] = 'NaN'

# Filter out who wasn't focused enough (less than 2.5 mins and more than 24 hours to complete the survey)
filter_upper = responses_in_scope.iloc[:, 0]<=(1*24*60*60)
filter_lower = responses_in_scope.iloc[:, 0]>=(2.5*60)
responses_in_scope = responses_in_scope.loc[(filter_lower & filter_upper)].reset_index(drop=True).iloc[:, 1:]

###################### PERSONAS CREATION ######################
# Conditions
# Per-question building blocks. Rows that skipped Q36 A / Q36 B entirely carry the
# 'NaN' string written into Part_1 during preprocessing, so they are detected through
# that sentinel: re-running .isna().all(axis=1) here would no longer flag them.
a_full = responses_in_scope['Q36_A_Part_6'].notna()
a_none = responses_in_scope['Q36_A_Part_7'].notna()
a_nan = responses_in_scope['Q36_A_Part_1'].eq('NaN')
b_full = responses_in_scope['Q36_B_Part_6'].notna()
b_none = responses_in_scope['Q36_B_Part_7'].notna()
b_nan = responses_in_scope['Q36_B_Part_1'].eq('NaN')

# "Partial" = answered something, but neither the full-pipeline nor the "None" option
partial_left = ~a_full & ~a_none & ~a_nan
partial_right = ~b_full & ~b_none & ~b_nan

# All 16 present (A) x future (B) combinations
full_full = a_full & b_full
full_partial = a_full & partial_right
full_none = a_full & b_none
full_nan = a_full & b_nan
partial_full = partial_left & b_full
partial_partial = partial_left & partial_right
partial_none = partial_left & b_none
partial_nan = partial_left & b_nan
none_full = a_none & b_full
none_partial = a_none & partial_right
none_none = a_none & b_none
none_nan = a_none & b_nan
nan_full = a_nan & b_full
nan_partial = a_nan & partial_right
nan_none = a_nan & b_none
nan_nan = a_nan & b_nan

# Personas creation
evangelist = full_full
supporter = full_partial | full_none | full_nan
believer = none_full | partial_full | nan_full
resolved = partial_partial
guarded = partial_nan | partial_none
sympathizer = nan_partial | none_partial
unconcerned = nan_nan | nan_none | none_nan | none_none

# Label assignment
responses_in_scope['label'] = None
responses_in_scope.loc[evangelist, 'label'] = 'evangelist'
responses_in_scope.loc[supporter, 'label'] = 'supporter'
responses_in_scope.loc[believer, 'label'] = 'believer'
responses_in_scope.loc[resolved, 'label'] = 'resolved'
responses_in_scope.loc[guarded, 'label'] = 'guarded'
responses_in_scope.loc[sympathizer, 'label'] = 'sympathizer'
responses_in_scope.loc[unconcerned, 'label'] = 'unconcerned'

###################### OUTPUT CREATION ######################
# Matrix Coding/ML experience for Tableau CV section
responses_matrix = responses_in_scope.loc[:, ['label', 'Q6', 'Q15']]

# Melt all responses for Tableau dashboards
responses_exploded = responses_in_scope.melt(
    id_vars=['label'],
    value_vars=responses_in_scope.columns[~responses_in_scope.columns.str.contains('label')])

# Columns renaming
responses_exploded.columns = ['label', 'Question', 'Answer']
responses_matrix.columns = ['label', 'Q6', 'Q15']

# Format Question column, removing redundant text such as "_part_#"
responses_exploded['Question'] = responses_exploded['Question'].apply(lambda x: re.sub(r'_part\w+|_other', '', x.lower()))

# Saving output for CVs and Heatmap in Tableau.
responses_exploded.loc[~(responses_exploded['Answer'].isna())].reset_index(drop=True).to_csv('personas_all_exploded.csv')

# Saving output for Coding/ML matrix
responses_matrix.to_csv('personas_coding_ml_matrix.csv')

# Little table helper for Interest in AutoML section in CV dashboard
interest_cv = pd.DataFrame({'label': ['evangelist', 'supporter', 'believer', 'resolved', 'guarded', 'sympathizer', 'unconcerned'],
                            'present': ['Full', 'Full', 'Partial/None', 'Partial', 'Partial', 'None/NaN', 'None/NaN'],
                            'future': ['Full', 'Partial/None/NaN', 'Full', 'Partial', 'None/NaN', 'Partial', 'None/NaN']
              })
interest_cv.to_csv('interest_cv.csv')

<h2  style="color:#316fa5; font-family:Work Sans; font-weight:500" id="Data-Visualization">Data Visualization</h2>

<h3  style="color:#316fa5; font-family:Work Sans; font-weight:500">Dataviz and Storytelling</h3>

What is the most effective way of communicating information? 
<br>The right answer to this question is *it depends*. It depends on the purpose of the project and on the target audience. It depends on the level of detail one wants to reach, on the visualization channel and on the tool to be used. It depends on many other variables.
<br>
For us, **the most important thing was delivering a message by telling a story**.

<h3  style="color:#316fa5; font-family:Work Sans; font-weight:500">CV as a visual metaphor</h3>

Based on the message we wanted to convey, creating a **traditional notebook was not the best choice**. Rather, we wanted our visual project to be **interactive** as well as eye-catching and graphically distinctive. For this reason, we decided to display the CVs of our Personas, **recalling the *desktop-file* interaction** that everyone is familiar with. The **metaphor** used is relevant both to the **topic** of our analysis (we are in fact talking about how the working landscape in data science will transform) and to the **type** of data displayed (education, work experience, tools, skills, etc.).
<br>
The next challenge was to understand **how** to do it (with all the limitations of the case).

<h3  style="color:#316fa5; font-family:Work Sans; font-weight:500">The design process and the choice of the Tool</h3>

Why Tableau?
<br>
Out of all the data visualization tools that we master, it was the one that came **closest** to providing us with the tools to be able to achieve what we had in mind from a technical point of view. But clearly, **Tableau is not a graphic layout program**; for this reason, we encountered many difficulties. During the process we experimented with [Flourish](https://flourish.studio/), then discarded it as it was not technically suitable for what we wanted to achieve.
<br>
The final result came after **multiple tests**: the added value was certainly brought by the **continuous exchange of feedback**, demonstrating that working as a team is always enriching.

<br>
<img src="https://alessiamusio.com/other/Kaggle/Process.jpg" alt="process">
<br>

Once we identified the tool, we worked on the **graphic mask**, essential to compose the layout we had imagined.
<br>
Another aspect we worked on was to **always associate color with a measure rather than a categorical dimension**. In this way, those who interact with our views will learn to **associate** the measurement of a **count** variable with this visual aid.

<h2  style="color:#316fa5; font-family:Work Sans; font-weight:500" id="Exploratory-Matrix">Exploratory Matrix</h2>

The creation of the story and the design of the dashboard derive from a **careful exploratory analysis of the dataset**.
<br>
This took place thanks to a **simple heatmap built in Tableau** that maps the Personas and the answers to the different questions.
<br>
Since we had to make choices and simplifications, **we leave it to you so that you can explore further insights too**!
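
For readers who prefer to explore in pandas rather than Tableau, a Persona-by-answer percentage matrix can be sketched with `pd.crosstab`. The data below is invented, and normalizing by row (share of each Persona's answers) is our assumption about a sensible default, not necessarily the exact normalization used in the Tableau view.

```python
import pandas as pd

# Toy long-format data: one row per answer given, labeled by Persona.
answers = pd.DataFrame({
    'label': ['believer', 'believer', 'believer', 'unconcerned', 'unconcerned'],
    'Answer': ['Python', 'Python', 'R', 'Python', 'SQL'],
})

# Rows = Personas, columns = answers, cells = percentage of that Persona's answers.
heat = pd.crosstab(answers['label'], answers['Answer'], normalize='index') * 100
print(heat.round(1))
```

Filtering the long table to a single question before calling `crosstab` gives one heatmap slice per question.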

<div class='tableauPlaceholder' id='viz1638056622506' style='position: relative'><noscript><a href='#'><img alt='Percentage ' src='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ex&#47;ExploratoryDataAnalysisKaggleSurvey2021&#47;Percentage&#47;1_rss.png' style='border: none' /></a></noscript><object class='tableauViz'  style='display:none;'><param name='host_url' value='https%3A%2F%2Fpublic.tableau.com%2F' /> <param name='embed_code_version' value='3' /> <param name='site_root' value='' /><param name='name' value='ExploratoryDataAnalysisKaggleSurvey2021&#47;Percentage' /><param name='tabs' value='no' /><param name='toolbar' value='yes' /><param name='static_image' value='https:&#47;&#47;public.tableau.com&#47;static&#47;images&#47;Ex&#47;ExploratoryDataAnalysisKaggleSurvey2021&#47;Percentage&#47;1.png' /> <param name='animate_transition' value='yes' /><param name='display_static_image' value='yes' /><param name='display_spinner' value='yes' /><param name='display_overlay' value='yes' /><param name='display_count' value='yes' /><param name='language' value='it-IT' /><param name='filter' value='publish=yes' /></object></div>                <script type='text/javascript'>                    var divElement = document.getElementById('viz1638056622506');                    var vizElement = divElement.getElementsByTagName('object')[0];                    vizElement.style.width='800px';vizElement.style.height='957px';                    var scriptElement = document.createElement('script');                    scriptElement.src = 'https://public.tableau.com/javascripts/api/viz_v1.js';                    vizElement.parentNode.insertBefore(scriptElement, vizElement);                </script>

<h5 style="text-align:right; font-style:normal;"><a style="color:#316fa5; text-decoration:none" href="#TOC">Back to the Table of Contents ↑</a></h5>

# <span class="title-section w3-xxlarge">References</span>

<span id="f1">1.</span> [What is automated machine learning (AutoML)?](https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/machine-learning/concept-automated-ml.md)<br>
<span id="f2">2.</span> [Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence](https://arxiv.org/pdf/2002.04803.pdf)<br>
<span id="f3">3.</span> [Rethinking AI talent strategy as automated machine learning comes of age](https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/rethinking-ai-talent-strategy-as-automated-machine-learning-comes-of-age)<br>
<span id="f4">4.</span> [Taking the Human out of Learning Applications: A Survey on Automated Machine Learning](https://arxiv.org/pdf/1810.13306.pdf)<br>
<span id="f5">5.</span> [AutoML 2.0: Is The Data Scientist Obsolete?](https://www.forbes.com/sites/cognitiveworld/2020/04/07/automl-20-is-the-data-scientist-obsolete/?sh=2e5661ef53c9)<br>