# Profiling the TPU Users

In [None]:
%%HTML
<style type="text/css">

div.h2 {
    background-color: steelblue; 
    color: white; 
    padding: 8px; 
    padding-right: 300px; 
    font-size: 24px; 
    max-width: 1500px; 
    margin-top: 50px;
    margin-bottom:4px;
}

</style>

In yet another year of the comprehensive Kaggle data science and machine learning survey, there were 20,036 responses from 171 countries. While the survey provides a peek into the world of data scientists, this notebook focuses specifically on TPU users: those who selected TPUs as an answer to the question **Which types of specialized hardware do you use on a regular basis?** The purpose of the notebook is to gain insight into TPU users and answer a few questions such as:

* Who are the TPU users? Are they spread all over the world, or do they belong to a typical age group, come from a specific background, etc.?
* Do TPU users also use GPUs? If so, what percentage?
* Which country does the majority of TPU usage come from?
* What are the common problems for which TPUs are used? How does this differ from a regular user?

After reading this notebook, I hope you will be inspired to try out interesting problems using TPUs. Let's start.

# Executive Summary
<a id='exec'></a>

According to the TPU [launch page](https://www.kaggle.com/product-feedback/129828), TPUs (tensor processing units) are hardware accelerators specialized in deep learning tasks that provide significantly more computational power for mixed precision and matrix multiplications. They were first developed and used by Google to process large image databases, for example to extract all the text from Street View. This custom-designed machine learning ASIC also powers Google products like Translate, Photos, Search, Assistant, and Gmail. [Source](https://cloud.google.com/tpu/?utm_source=google&utm_medium=cpc&utm_campaign=japac-IN-all-en-dr-bkws-all-pkws-trial-e-dr-1009137&utm_content=text-ad-none-none-DEV_c-CRE_396375661581-ADGP_Hybrid%20%7C%20AW%20SEM%20%7C%20BKWS%20~%20T2%20%7C%20EXA%20%7C%20AI%20Platform%20%7C%20M%3A1%20%7C%20IN%20%7C%20en%20%7C%20cloud%20tpu%20%7C%20pricing%20-%20PKWS-KWID_43700049545260967-kwd-842288534372&userloc_1007809-network_g&utm_term=KW_gcp%20cloud%20tpu%20price&gclid=EAIaIQobChMIr8XCtMOh7QIV6INLBR1XPgs6EAAYASAAEgJtbfD_BwE)


While the percentage of TPU users remains at around 4 %, it is worth noting that more than half of these users have also used GPU computing for their data science projects, which suggests that the choice of hardware accelerator depends upon the problem at hand.

Adoption is higher among [young professionals](#1) with 1-3 years of experience and among students, most of whom hold either a Masters or a Bachelors [degree](#4) and are keen to learn new technology in the field and stay relevant. This percentage may grow in the coming years as more TPU-specific competitions and datasets are added to Kaggle. A quarter of these users are from [India](#3), and a typical user is likely to have used a TPU [2-5 times](#6). More than [80 %](#5) of them have also used GPUs for hardware acceleration alongside TPUs, and many have taken [courses](#11) on Coursera and Kaggle Learn to pick up deep learning skills and get hands-on with TPUs or GPUs. Both can speed up problems involving large matrices, so either could be used depending upon the size and type of the problem.

Being a Google product, TPUs are widely [available](#7) on Google platforms like Colab and Kaggle, which most data scientists use and where usage time is limited (30 hrs weekly on Kaggle, while a free Colab account should not exceed 12 hrs of continuous usage). This has enabled beginners and students alike to access these resources with [zero investment](#8). Time and passion are the only investment needed for learning. Also, to troubleshoot problems and help the community, Kaggle has created a separate forum for [TPU](https://www.kaggle.com/tags/tpu) to encourage its growth and build better optimized networks. It already has some amazing topics to explore, so do not forget to check it out. While scikit-learn might be the most used [framework](#10) for machine learning problems, TensorFlow, Keras and PyTorch are the go-to frameworks for deep learning problems, and close to 90 % of data scientists prefer [Python](#9) as their coding language.

Though newer and better models appear day by day, data scientists still prefer to start with either [regression](#12) or tree-based models before trying their hand at neural networks. This does not hold, however, for computer vision or NLP [problems](#13), where techniques like GANs and BERT are increasingly used.

### Importing required packages

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import textwrap
import plotly.express as px

### Reading the data

In [None]:
survey_df=pd.read_csv('../input/kaggle-survey-2020/kaggle_survey_2020_responses.csv',low_memory=False)
#survey_df=pd.read_csv('../data/kaggle_survey_2020_responses.csv',low_memory=False)

In [None]:
survey_df.head()

In [None]:
##removing the first row from the df,
responses_df=survey_df[1:]
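
The first row of the CSV holds the question text rather than an answer, which is why it is dropped above. Below is a minimal sketch of keeping that row as a column-to-question lookup before dropping it. It uses a made-up two-column toy frame rather than the real survey file, and the names `toy_survey_df`, `toy_questions` and `toy_responses_df` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the survey file: row 0 holds the question text,
# the remaining rows hold answers. The answer values are invented.
toy_survey_df = pd.DataFrame({
    'Q1': ['What is your age (# years)?', '25-29', '30-34'],
    'Q12_Part_2': ['Which types of specialized hardware do you use on a regular basis?  '
                   '(Select all that apply) - Selected Choice - TPUs', 'TPUs', None],
})

# Keep the question text as a column -> question lookup ...
toy_questions = toy_survey_df.iloc[0].to_dict()
# ... and keep only the answer rows, with a fresh 0-based index.
toy_responses_df = toy_survey_df.iloc[1:].reset_index(drop=True)

print(toy_questions['Q1'])
print(toy_responses_df.shape[0])
```

Such a lookup can stand in for the question text that is otherwise pasted as a comment above each analysis cell.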

In [None]:
responses_df.head()

In [None]:
print(f'There are {responses_df.shape[0]} responses from 2020 Kaggle DS and ML survey')

In [None]:
hardware_list={'GPUs':sum(~responses_df['Q12_Part_1'].isna())*100/responses_df.shape[0],
              'TPUs':sum(~responses_df['Q12_Part_2'].isna())*100/responses_df.shape[0],
              'None':sum(~responses_df['Q12_Part_3'].isna())*100/responses_df.shape[0],
              'Other':sum(~responses_df['Q12_OTHER'].isna())*100/responses_df.shape[0]}
df_hardware=pd.DataFrame.from_dict(hardware_list,orient='index',columns=['hardware_usage_percentage']).reset_index().rename(columns={'index':'hardware'})
df_hardware.sort_values('hardware_usage_percentage',ascending=False,inplace=True)

In [None]:
def create_dict(df,key_list,qno,n):
    """
    Create a dictionary having count of multiple choice responses for a question.
    Params:
    df:dataframe
    key_list:list of multiple choice responses
    qno:question number for the key_list
    n:total multiple choice responses
    Returns:
    A dictionary with count of instances of multiple choice responses.
    """
    result={}
    for i,k in enumerate(key_list):
        if i+1==len(key_list):
            result[k]=sum(~df[f'{qno}_OTHER'].isna())
        else:
            result[k]=sum(~df[f'{qno}_Part_{i+1}'].isna())
        
    return result
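
The same counts can probably be obtained without an explicit loop. The sketch below assumes the `<qno>_Part_<i>` / `<qno>_OTHER` column naming used by `create_dict`; the helper name `count_selected` and the toy frame are hypothetical and not part of the original notebook.

```python
import pandas as pd

def count_selected(df, key_list, qno):
    """Count non-null answers per choice; the last key maps to `<qno>_OTHER`."""
    cols = [f'{qno}_Part_{i + 1}' for i in range(len(key_list) - 1)] + [f'{qno}_OTHER']
    # notna().sum() counts, per column, how many respondents ticked that choice
    counts = df[cols].notna().sum()
    return dict(zip(key_list, counts.astype(int).tolist()))

# Invented multi-select answers: a cell is filled only when the choice was ticked.
toy_df = pd.DataFrame({
    'Q7_Part_1': ['Python', 'Python', None],
    'Q7_Part_2': [None, 'R', None],
    'Q7_OTHER': [None, None, 'Other'],
})
toy_counts = count_selected(toy_df, ['Python', 'R', 'Other'], 'Q7')
print(toy_counts)  # {'Python': 2, 'R': 1, 'Other': 1}
```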

In [None]:
def tpu_plot_bar_graph(df,x,y,title=None,xlabel=None,ylabel=None,rotate=False,overall=False):
    """
    Function to create a bar graph with count and % value for a chosen column.
    :Params
    df:dataframe
    x:xaxis column
    y:yaxis column
    title:title for the plot
    xlabel:x axis title for the plot
    ylabel:y axis title for the plot
    rotate:Whether to rotate the x axis title or not
    overall:Whether to plot for overall dataframe responses or only for TPU specific responses
    """
    n=df.shape[0]
    df=df.sort_values(y,ascending=False).reset_index(drop=True)
    plt.figure(figsize=(13,10))
    if overall:
        df_shape=responses_df.shape[0]
    else:
        df_shape=tpu_df.shape[0]
    #print(x,y)
    # x and y are passed as keywords: recent seaborn releases reject positional x/y
    p=sns.barplot(x=x,y=y,data=df,palette=sns.color_palette('colorblind'))
    for t in p.patches:
        #print(t)
        p.annotate("{:.2f}%".format((t.get_height()*100/df_shape)), (t.get_x() + t.get_width() / 2., t.get_height()),
             ha='center', va='center', fontsize=15, color='black', xytext=(0, 10),
             textcoords='offset points')
    if rotate:
        # Wrap long labels and rotate them in a single call
        p.set_xticklabels([textwrap.fill(str(label),width=18) for label in df[x]],rotation=90)
    plt.title(title,fontsize=15,fontweight='bold')
    plt.xlabel(xlabel,fontsize=10,fontweight='bold')
    plt.ylabel(ylabel,fontsize=10,fontweight='bold')
    plt.show()

In [None]:
plt.figure(figsize=(10,10))
p=sns.barplot(x='hardware',y='hardware_usage_percentage',data=df_hardware,palette=sns.color_palette('colorblind'))
plt.title('Hardware usage %',fontsize=15,fontweight='bold')
plt.xlabel('Hardware Type',fontsize=10,fontweight='bold')
plt.ylabel('% of users',fontsize=10,fontweight='bold')
for t in p.patches:
    #print(t)
    p.annotate("{:.2f}%".format(t.get_height()), (t.get_x() + t.get_width() / 2., t.get_height()),
         ha='center', va='center', fontsize=15, color='black', xytext=(0, 10),
         textcoords='offset points')
    

* TPU usage is still nascent, at 4.8 % of respondents, whereas ~42 % of users have used GPUs. There is only a marginal difference between the share of users who have used GPUs and the share who have used neither GPUs nor TPUs.
* The availability of GPUs in Colab and Kaggle, and a surge in ML competitions related to image classification and NLP, make them a popular tool of choice for Kagglers.

In [None]:
# Q12_Part_2 - Which types of specialized hardware do you use on a regular basis?  (Select all that apply) - Selected Choice - TPUs
tpu_df=responses_df.loc[responses_df['Q12_Part_2']=='TPUs',]

<div class=h2>1.Demographic Analysis</div>

<div class=h3>1.1 Young Professionals & Students are the main users of TPU</div>
<a id="1"></a>


<a href='#exec'>Go back to Executive summary</a>

In [None]:
#For how many years have you used machine learning methods?
# rename_axis/reset_index(name=...) gives the same column names on old and new pandas versions
exp_df=tpu_df['Q15'].value_counts().sort_index().rename_axis('Exp.Yrs').reset_index(name='Count')

In [None]:
tpu_plot_bar_graph(exp_df,'Exp.Yrs','Count','Years of Experience in ML','Experience','# of TPU Users',rotate=True,overall=False)

TPU hardware is relatively new and came into widespread use approximately a year ago. Among TPU users, experience is usually around 1-2 years, which means professionals and young graduates starting their careers as data scientists are adopting the hardware accelerator for their ML requirements. Data analytics is a rapidly changing field, with many concepts and techniques discovered every day, so it is important to keep pace and to adopt and practise new technology. In line with this, it is no surprise that the highest percentage of responses comes from people with 1-2 years of experience.

In [None]:
#Select the title most similar to your current role (or most recent title if retired): - Selected Choice
role_df=tpu_df['Q5'].value_counts().sort_index().rename_axis('Role').reset_index(name='Count')

In [None]:
tpu_plot_bar_graph(role_df,'Role','Count','Current Title of TPU Users','Role','# of TPU Users',rotate=True,overall=False)

The graph seems to support our earlier hypothesis that young professionals with 1-2 years of experience are adopting TPUs for their ML projects. ~40 % of TPU users are either students or data scientists.

<div class=h3>1.2 TPU Usage is spread across all Age Groups</div>
<a id="2"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
age_dist=tpu_df['Q1'].value_counts().sort_index().rename_axis('age').reset_index(name='Count')

In [None]:
tpu_plot_bar_graph(age_dist,'age','Count','Age Distribution of TPU Users','Age Bucket','# of TPU Users',rotate=False,overall=False)

TPU users are spread across all age groups, and ~5 % of them are in the age group above 55 years. This indicates that Kagglers are willing to adopt new technology and are keeping up with the latest advancements in the field.

<div class=h3>1.3 India is world #1 in TPU adoption</div>
<a id="3"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
country_dist=tpu_df['Q3'].value_counts().sort_index().rename_axis('Country').reset_index(name='Count')

In [None]:
country_dist['Perc']=round(country_dist['Count']*100/tpu_df.shape[0],2)

In [None]:
fig=px.choropleth(country_dist,locations='Country',color='Count',
                  locationmode='country names',color_continuous_scale=px.colors.sequential.Agsunset,
                  title='TPU Usage by Region',range_color=[0,500],labels={'Count':'# of TPU Users'},hover_data={'Country':True,'Count':True,'Perc':True})
fig.update(layout=dict(title=dict(x=0.5)))
fig.show()

According to the survey, India has seen rapid adoption of TPUs, with over 33 % of the responses, followed by the US. These two countries contribute a significant number of active Kaggle users, so it is natural to expect them in the top 5.
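
The per-country share can be cross-checked with `value_counts(normalize=True)`. This is only a sketch on an invented toy series (`toy_countries` is hypothetical). One caveat: `normalize=True` divides by the number of non-missing answers, whereas the `Perc` column above divides by `tpu_df.shape[0]`, so the two agree only when no country answers are missing.

```python
import pandas as pd

# Invented answers standing in for the country column (Q3) of the TPU users
toy_countries = pd.Series(['India', 'India', 'USA', 'Japan'], name='Q3')

# Share of each country among the non-missing answers, in percent
toy_share = (toy_countries.value_counts(normalize=True) * 100).round(2)
top_country = toy_share.idxmax()
print(top_country, toy_share[top_country])
```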

<div class=h3>1.4 More than half of TPU users have either a Bachelors or a Masters degree</div>
<a id="4"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
edu_dist=tpu_df['Q4'].value_counts().sort_index().rename_axis('Degree').reset_index(name='Count')

In [None]:
tpu_plot_bar_graph(edu_dist,"Degree","Count",'Education Status of TPU Users','Education','# of TPU Users',rotate=True,overall=False)

* TPUs are used for problem solving irrespective of educational qualification. Those with a Masters degree or above (~50 %) show higher adoption than other degree holders. This might be due to the inherent skew of the survey towards Bachelors and Masters degree holders.
* This also shows that, irrespective of profession and education, data scientists are not restricted to a specific technology and show interest in expanding beyond their formal education.

<div class=h2>2.Insights</div>

<div class=h3>2.1 TPU Users have also worked on problems with GPU compute</div>
<a id="5"></a> 

<a href='#exec'>Go back to Executive summary</a>

Let's filter only the TPU users and analyse them.

In [None]:
tpu_gpu_list={'GPUs':sum(~tpu_df['Q12_Part_1'].isna()),
              'None':sum(~tpu_df['Q12_Part_3'].isna()),
              'Other':sum(~tpu_df['Q12_OTHER'].isna())}
tpu_hardware=pd.DataFrame.from_dict(tpu_gpu_list,orient='index',columns=['Count']).reset_index().rename(columns={'index':'hardware'})
#tpu_hardware.sort_values('hardware_usage_percentage',ascending=False,inplace=True)

In [None]:
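# Keyword arguments avoid the labels being shifted into the title/xlabel/ylabel slots
tpu_plot_bar_graph(tpu_hardware,'hardware','Count',title='Other Hardware Used by TPU Users',xlabel='Hardware Type',ylabel='# of TPU Users',rotate=False,overall=False)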

* GPUs and TPUs are both hardware accelerators used to speed up complex math computations in deep learning problems. Since both address the same problem, it is natural to treat the GPU as an alternative. This is why ~82 % of TPU users have also used GPU compute.
* One advantage of TPUs is that they are offered as a cloud service and are designed to scale operations across many chips and machines. A single GPU can run out of memory on large computations, although GPU workloads can also be distributed across machines.
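
The overlap figure quoted above can be reproduced with a boolean mean. The following is a minimal sketch on an invented toy frame (`toy_hw` is hypothetical), assuming the same `Q12_Part_1` (GPUs) and `Q12_Part_2` (TPUs) columns as the survey.

```python
import pandas as pd

# Invented multi-select answers: a cell is filled only when the choice was ticked.
toy_hw = pd.DataFrame({
    'Q12_Part_1': ['GPUs', 'GPUs', None, 'GPUs', None],   # GPU selected?
    'Q12_Part_2': ['TPUs', None, 'TPUs', 'TPUs', None],   # TPU selected?
})

# Keep the TPU users, then take the share of them who also ticked GPUs
toy_tpu_users = toy_hw[toy_hw['Q12_Part_2'].notna()]
gpu_share = toy_tpu_users['Q12_Part_1'].notna().mean() * 100
print(f'{gpu_share:.1f}% of toy TPU users also selected GPUs')
```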

<div class=h3>2.2 Minimum TPU Usage is between 2-5 times</div>
<a id="6"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
times_tpu=tpu_df['Q13'].value_counts().sort_index().rename_axis('tpu_times').reset_index(name='Count')
tpu_plot_bar_graph(times_tpu,'tpu_times','Count','Frequency of TPU Usage','Usage Frequency','# of TPU Users',rotate=False,overall=False)

* It is not very clear from the question definition what time interval this frequency refers to: is it monthly, weekly or daily? To approximate the results, I have treated the response to this question as the minimum frequency of using a TPU.
* Using TPUs on Kaggle comes with a time restriction of 30 hrs per week, and going by this, the minimum number of times a data scientist has used a TPU is between 2 and 5.
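
Note that `sort_index()` orders the usage buckets alphabetically, not logically. The sketch below shows one way to impose a logical order with an ordered categorical. The bucket labels in `usage_order` are an assumption about the answer options for this question, and the toy answers are invented.

```python
import pandas as pd

# Assumed answer buckets for the TPU-usage question, in logical order.
usage_order = ['Never', 'Once', '2-5 times', '6-25 times', 'More than 25 times']

toy_usage = pd.Series(['Once', '2-5 times', '2-5 times', 'Never', '6-25 times'])
toy_usage = pd.Series(pd.Categorical(toy_usage, categories=usage_order, ordered=True))

# sort_index on a categorical index follows the declared category order;
# buckets nobody picked are kept with a count of 0.
ordered_counts = toy_usage.value_counts().sort_index()
print(ordered_counts)
```

Since `tpu_plot_bar_graph` re-sorts by count before plotting, this ordering would only show up in the chart if that sort were skipped.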

<div class=h3>2.3 Colab and Kaggle are the choice for TPU Compute</div>
<a id="7"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
platforms=['Kaggle Notebooks',
           'Colab Notebooks',
           'Azure Notebooks',
           'Paperspace / Gradient',
           'Binder / JupyterHub',
           'Code Ocean',
           'IBM Watson Studio',
          'Amazon Sagemaker Studio',
          'Amazon EMR Notebooks',
          'Google Cloud AI Platform Notebooks',
          'Google Cloud Datalab Notebooks',
          'Databricks Collaborative Notebooks',
          'None','Other']

In [None]:
hosted_platform=create_dict(tpu_df,platforms,'Q10',len(platforms))
hosted_platform_df=pd.DataFrame.from_dict(hosted_platform,orient='index',columns=['Count']).reset_index().rename(columns={'index':'hosted_platform'})

In [None]:
tpu_plot_bar_graph(hosted_platform_df,'hosted_platform','Count','Preferred platform of Choice','Platform','# of TPU Users',True,False)

TPUs are Google-designed hardware accelerators and are available on its cloud platforms such as Colab and AI Platform Notebooks. It is therefore no surprise that the top 4 platforms are from Google. In terms of percentage, Colab has a slight edge over Kaggle, followed by Google Cloud AI Platform Notebooks. Both Kaggle and Google Colab offer eight TPU cores. An easy-to-use interface, free compute resources and low downtime might be a few reasons for this preference.

<div class=h3>2.4 Data Scientists prefer free cloud resources over setting up their own DL machine</div>
<a id="8"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
#Approximately how much money have you (or your team) spent on machine learning and/or cloud computing services at home (or at work) in the past 5 years (approximate $USD)?
money_spent_df=tpu_df['Q25'].value_counts().sort_index().rename_axis('Money').reset_index(name='Count')
money_spent_overall_df=responses_df['Q25'].value_counts().sort_index().rename_axis('Money').reset_index(name='Count')

In [None]:
tpu_plot_bar_graph(money_spent_df,"Money","Count",'Money Spent on Cloud/ML Services','$(USD)','# of TPU Users',rotate=True,overall=False)

In [None]:
tpu_plot_bar_graph(money_spent_overall_df,"Money","Count",'Money Spent on Cloud/ML Services(Overall)','$(USD)','# of Responses',rotate=True,overall=True)

[TPU Prototype Coral Accelerator](https://coral.ai/products/accelerator/) costs around $60, but because TPUs are available on cloud platforms like Kaggle and Colab, most data scientists who are just starting their journey (remember that our target TPU users are mostly students or data scientists starting their careers) use these platforms rather than buying and setting up hardware. This is consistent with the overall view as well.

<div class=h3>2.5 Python, C++ and C are the go-to languages for TPU Compute</div>
<a id="9"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
##What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python
prog_lang=['Python',
               'R',
               'SQL',
               'C',
               'C++',
               'Java',
                'Javascript',
               'Julia',
               'Swift',
               'Bash',
               'MATLAB',
               'None',
               'Other']
prog_lang_dict=create_dict(tpu_df,prog_lang,'Q7',len(prog_lang))
overall_prog_lang=create_dict(responses_df,prog_lang,'Q7',len(prog_lang))

In [None]:
prog_lang_df=pd.DataFrame.from_dict(prog_lang_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'prog_lang'})
prog_lang_overall_df=pd.DataFrame.from_dict(overall_prog_lang,orient='index',columns=['Count']).reset_index().rename(columns={'index':'prog_lang'})
tpu_plot_bar_graph(prog_lang_df,'prog_lang','Count','Preferred Programming Language(For TPU Users)','Language','# of TPU Users',rotate=False,overall=False)   

In [None]:
tpu_plot_bar_graph(prog_lang_overall_df,'prog_lang','Count','Preferred Programming Language(Overall)','Language','# of Response',rotate=False,overall=True) 

* When it comes to the programming language of choice, there is a clear winner for TPU-specific problems: 93 % of these data scientists code in Python, while SQL, C++ and C also make it into the top 5. Python frameworks like TensorFlow and PyTorch support XLA and enable mixed precision, which makes it easier to adapt code for TPUs.
* The result is consistent with the overall trend: Python and SQL are the languages of choice for data scientists.
* The data used for deep learning tasks is typically large, running to a few GBs, and may be stored in enterprise or cloud databases, so one might need SQL to query and fetch it. That might be the reason SQL is used by 50 % of TPU users.
* Python bindings let one run C++ or C code from Python when no built-in Python module is available or when extra speed is needed. This could be why data scientists using TPUs prefer C++ or C over R, whereas among data scientists overall R is preferred over C++ and C.

<div class=h3>2.6 Tensorflow and Keras have an edge over Pytorch for TPU Compute</div>
<a id="10"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
#Which of the following machine learning frameworks do you use on a regular basis? (Select all that apply)
ml_framework=['Scikit-learn',
             'TensorFlow',
             'Keras',
             'PyTorch',
             'Fast.ai',
             'MXNet',
             'Xgboost',
             'LightGBM',
             'CatBoost',
             'Prophet',
             'H2O 3',
             'Caret',
             'Tidymodels',
             'JAX',
             'None',
             'Other']
ml_framework_dict=create_dict(tpu_df,ml_framework,'Q16',len(ml_framework))
ml_framework_ovall_dict=create_dict(responses_df,ml_framework,'Q16',len(ml_framework))
ml_frame_df=pd.DataFrame.from_dict(ml_framework_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'ml_framework'})
ml_frame_ovall_df=pd.DataFrame.from_dict(ml_framework_ovall_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'ml_framework'})

In [None]:
tpu_plot_bar_graph(ml_frame_df,'ml_framework','Count','ML Framework Share for TPU Users','Framework','# of TPU Users',rotate=True,overall=False)  

In [None]:
tpu_plot_bar_graph(ml_frame_ovall_df,'ml_framework','Count','ML Framework Share(Overall)','Framework','# of Response',rotate=True,overall=True)   

* TensorFlow and Keras have a slight edge over PyTorch as the framework of choice for TPU users. It is interesting to note that an almost equal percentage of TPU users use scikit-learn for their modelling. While the gap between scikit-learn and TensorFlow is narrow for TPU users, a margin of ~20 % is seen when the overall responses are considered.
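
Comparing the two charts by eye is easier with the shares side by side. The sketch below builds such a table from two count dictionaries of the kind `create_dict` returns. All counts and group sizes here are hypothetical values invented for illustration, not survey results.

```python
import pandas as pd

# Hypothetical counts, invented purely for illustration.
toy_tpu_counts = {'Scikit-learn': 60, 'TensorFlow': 70, 'PyTorch': 40}
toy_overall_counts = {'Scikit-learn': 800, 'TensorFlow': 600, 'PyTorch': 350}
n_tpu, n_overall = 100, 1000  # hypothetical group sizes

# Convert counts to percentage shares within each group
share_df = pd.DataFrame({
    'tpu_pct': pd.Series(toy_tpu_counts) * 100 / n_tpu,
    'overall_pct': pd.Series(toy_overall_counts) * 100 / n_overall,
})
# Positive values mean the framework is more common among TPU users
share_df['diff_pct_points'] = share_df['tpu_pct'] - share_df['overall_pct']
print(share_df.sort_values('diff_pct_points', ascending=False))
```

In the notebook, the two dictionaries would be `ml_framework_dict` and `ml_framework_ovall_dict`, with `tpu_df.shape[0]` and `responses_df.shape[0]` as the group sizes.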

<div class=h3>2.7 Coursera and Kaggle Learn are the preferred MOOC</div>
<a id="11"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
#On which platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Coursera
learn_list=['Coursera',
          'edX',
          'Kaggle Learn Courses',
          'DataCamp',
          'Fast.ai',
          'Udacity',
          'Udemy',
          'LinkedIn Learning',
          'Cloud-certification programs (direct from AWS, Azure, GCP, or similar)',
          'University Courses (resulting in a university degree)',
          'None',
          'Other']
mooc_dict=create_dict(tpu_df,learn_list,'Q37',len(learn_list))

In [None]:
mooc_df=pd.DataFrame.from_dict(mooc_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'mooc'})

In [None]:
tpu_plot_bar_graph(mooc_df,'mooc','Count','Learning Platforms for TPU Users','Platform','# of TPU Users',rotate=True,overall=False)  

In line with the global results, Coursera is the preferred MOOC among TPU users. The change comes in 2nd position, where Kaggle Learn courses are preferred over the other MOOC platforms. Kaggle Learn courses are small micro-courses on a specific skill or topic, with notebooks available for reading and practice. Currently the learning stack ranges from a basic introduction to Python to reinforcement learning. Interactive notebooks, rich datasets for trying out concepts, and the availability of resources like TPUs and GPUs might be the reasons why Kaggle Learn courses are very popular among data scientists and students.

<div class=h3>2.8 Regression and Tree based models remain the top choice of algorithms</div>
<a id="12"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
algos=['Linear or Logistic Regression',
      'Decision Trees or Random Forests',
      'Gradient Boosting Machines (xgboost, lightgbm, etc)',
       'Bayesian Approaches',
      'Evolutionary Approaches',
      'Dense Neural Networks (MLPs, etc)',
      'Convolutional Neural Networks',
      'Generative Adversarial Networks',
      'Recurrent Neural Networks',
      'Transformer Networks (BERT, gpt-3, etc)',
      'None',
      'Other']
algo_dict=create_dict(tpu_df,algos,'Q17',len(algos))

In [None]:
algo_df=pd.DataFrame.from_dict(algo_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'algos'})

In [None]:
tpu_plot_bar_graph(algo_df,'algos','Count','ML Algorithms','Algorithm','# of TPU Users',rotate=True,overall=False)  

TPUs are used for training deep neural networks, so we would expect RNN- or CNN-based networks to be preferred over regression or tree-based models. Instead we see the opposite: regression and tree-based models are widely adopted, and models are built with them before moving on to neural networks. It is also possible that respondents answered with their overall use of algorithms in mind rather than TPU-specific use. Therefore let's look specifically at computer vision and NLP methods to get a clearer picture.

<div class=h3>2.9 GANs for CV ,Embeddings for NLP !!! </div>
<a id="13"></a> 

<a href='#exec'>Go back to Executive summary</a>

In [None]:
#Which categories of computer vision methods do you use on a regular basis?  (Select all that apply) - Selected Choice - General purpose image/video tools (PIL, cv2, skimage, etc)
cv_algos=['General purpose image/video tools (PIL, cv2, skimage, etc)',
      'Image segmentation methods (U-Net, Mask R-CNN, etc)',
      'Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc)',
       'Generative Networks (GAN, VAE, etc)',
      'None',
      'Other']
cv_dict=create_dict(tpu_df,cv_algos,'Q18',len(cv_algos))

In [None]:
cv_df=pd.DataFrame.from_dict(cv_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'cv_algos'})

In [None]:
tpu_plot_bar_graph(cv_df,'cv_algos','Count','Computer Vision Methods','CV Method','# of TPU Users',rotate=True,overall=False)  

42 % of these data scientists frequently use GANs for computer vision, followed by roughly equal numbers using image segmentation and classification methods. GANs are a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models: the generator model that we train to generate new examples, and the discriminator model that tries to classify examples as either real (from the domain) or fake (generated) [6]. They have wide application in image-to-image and text-to-image translation, and wherever training data is scarce an adversarial model can be developed to synthesize new images for training.

In [None]:
#Which of the following natural language processing (NLP) methods do you use on a regular basis?  (Select all that apply) - Selected Choice - Word embeddings/vectors (GLoVe, fastText, word2vec)
nlp_algos=['Word embeddings/vectors (GLoVe, fastText, word2vec)',
          'Encoder-decorder models (seq2seq, vanilla transformers)',
          'Contextualized embeddings (ELMo, CoVe)',
           'Transformer language models (GPT-3, BERT, XLnet, etc)','None','Other']
nlp_dict=create_dict(tpu_df,nlp_algos,'Q19',len(nlp_algos))

In [None]:
nlp_df=pd.DataFrame.from_dict(nlp_dict,orient='index',columns=['Count']).reset_index().rename(columns={'index':'nlp_algos'})

In [None]:
tpu_plot_bar_graph(nlp_df,'nlp_algos','Count','NLP Methods','NLP Method','# of TPU Users',rotate=True,overall=False)  

On text-based problems, the pre-BERT technique of word embeddings is widely used, while a quarter of data scientists prefer BERT or GPT-3 as their first choice.
In our algorithmic analysis one thing is clear: data scientists prefer to start with good old established approaches like tree-based models, regression and embeddings before adopting neural-network-based models like CNNs and BERT.

## Conclusion

Relatively new to the deep learning world, TPUs appear to be giving deep learning researchers, ML engineers and students the computational efficiency required for problems that would have taken days to train on a CPU or even a GPU. This has also accelerated developments in various fields, such as training on huge text corpora and computer vision problems in medicine and biology. The percentage of TPU users might be very low (~4 %) according to this year's survey, but it is likely to keep growing in the coming years and could transform the field.

## References

1. https://www.kaggle.com/jpmiller/some-best-practices-for-analytics-reporting
2. https://www.kaggle.com/kabure/extensive-usa-youtube-eda
3. https://www.kaggle.com/rohanrao/a-deep-learning-of-deep-learning
4. https://www.kaggle.com/paultimothymooney/2020-kaggle-data-science-machine-learning-survey
5. https://plotly.com/python/choropleth-maps/
6. https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/