# 6 Types of Data Scientist

<br>

Often you cannot tell what a position entails just from the job description. You will usually learn more by talking to the hiring manager or the data scientist(s) who work at the company.

Broadly speaking, there are 6 types of data science roles:

1. **Analyst**
2. **Experimenter**
3. **Machine Learning Engineer**
4. **Data Pipeline Master**
5. **Sales Engineer**
6. **Specialist (Computer Vision, Deep Learning, NLP, GeoSpatial)**

<br>

## Analyst

- **Where is this found ?**

  Companies with:
  - A robust data infrastructure, i.e. good data engineers
  - An established product (mostly **not** a small start-up with fewer than 20 people)
  - An expressed goal to be more data-driven beyond just using simple analysis
  - An existing team of at least 2 or 3 data scientists

    <br>

- **What do you do ?**
  - Solve open-ended business problems
    - Why is user growth slow?
    - How much should be spent acquiring these users (Customer Lifetime Value) ?
  - EDA to find trends, patterns and anomalies
  - Make reasonable assumptions about the data at hand to solve the problem
  - Consider engineering constraints when you suggest features or changes
  - Use simple heuristics / statistics to get a baseline model rolling in a short time
  - Use machine learning models for feature selection / importance

    <br>

- **What do you need ?**
  - Product knowledge
  - Use product knowledge to define business metrics
  - Ability to explain your decisions to non-technical people
  - Strong statistical background (Frequentist Statistics, Hypothesis Testing, Probability)
  - Basic machine learning (Logistic / Linear Regression, Random Forest, possibly Kernelized SVM)
  - SQL 
  - Basic MapReduce or Spark (Depending on what the company is using, mostly conceptual instead of code)
  - Basic data structures and algorithms (much less than a machine learning engineer)
  - Python / R

<br>
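
As a rough illustration of the "simple heuristics" an analyst might use for the Customer Lifetime Value question above, here is a minimal sketch. The formula (yearly margin per customer times expected lifetime, with lifetime taken as 1 / churn rate) is one common simplification, and the function name, parameters and numbers are all made up for the example rather than taken from any company's method.

```python
def customer_lifetime_value(avg_order_value, orders_per_year,
                            gross_margin, annual_churn_rate):
    """Heuristic CLV: yearly margin per customer times expected lifetime in years.

    Assumes a constant churn rate, so expected lifetime is 1 / annual_churn_rate.
    """
    if not 0 < annual_churn_rate <= 1:
        raise ValueError("annual_churn_rate must be in (0, 1]")
    yearly_margin = avg_order_value * orders_per_year * gross_margin
    expected_lifetime_years = 1 / annual_churn_rate
    return yearly_margin * expected_lifetime_years


# Hypothetical inputs: $40 orders, 6 orders a year, 25% margin, 50% yearly churn.
clv = customer_lifetime_value(
    avg_order_value=40.0,
    orders_per_year=6,
    gross_margin=0.25,
    annual_churn_rate=0.5,
)
print(f"Estimated CLV: {clv:.2f}")
```

A baseline like this gives a quick upper bound on what it is worth spending to acquire a user; a real analysis would segment users and check the constant-churn assumption against the data.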

## Experimenter

- **Where is this found ?**

  - Only at bigger companies where a small change in design makes a big difference
    - Twitter
    - Google
    - Quora
    - Facebook ...
  - E-commerce / Customer facing services / software
  - Otherwise, this role would be lumped with the analyst role at smaller companies and requirements would be a bit lower

    <br>

- **What do you do ?**
  - Experimental Design 
  - A/B or Multiple Testing
  - Establish causality with observational data
  - Basically various means to establish if A is better than B statistically speaking

    <br>

- **What do you need ?**
  - Extensive knowledge about experimental design (Have to study beyond curriculum)
  - Knowledge about causal inference
  - Strong frequentist statistics background and possibly Bayesian statistics
  - SQL
  - Python / R

<br>
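
To make "establish if A is better than B statistically speaking" concrete, here is a minimal sketch of one standard frequentist approach, a two-sided two-proportion z-test on conversion counts. It uses only the Python standard library; the function name and the sample counts are invented for the example, and a real experiment would also need a sample size fixed in advance and care around multiple testing.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail area of the standard normal, 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1000 users per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```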

## Machine Learning Engineer

- **Where is this found ?**

  - Machine learning start-ups (H2O, wise.io, IBM Spark Technology Center, any machine learning as SaaS)
  - Companies big enough to have an in-house team of machine learning engineers (Apple, Twitter, Google, Airbnb, Uber)

    <br>

- **What do you do ?**

  - Implement and maintain machine learning algorithms at scale
  - Usually in a MapReduce or Spark framework
  - Contribute to production code base
  - Exercise the whole software development cycle (Source Control, Test...)

    <br>

- **What do you need ?**

  - Algorithmic and implementation details of machine learning algorithms
    - Random Forest, Logistic Regression, SVM, Gradient Boosting, Clustering ...
  - Knowledge in optimization / tuning machine learning algorithms to maximize prediction performance
    - How hyperparameters of machine learning algorithms affect performance
    - Second order gradient methods for optimization and other optimization methods
  - Fluency in operating in MapReduce or Spark
  - Almost CS level data structure and algorithms (Achievable with previous experience plus lots of studying beyond curriculum)
  - To enter as a junior: a strong background in either CS or machine learning, with a lighter background in the other
  - Java / C++ / Scala / Python

<br>
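
The points above about hyperparameters and second-order methods can be illustrated with a toy problem. The sketch below minimizes the one-dimensional quadratic 5 * (w - 3)^2, chosen only because its behavior is easy to check by hand, and compares plain gradient descent at different learning rates with Newton's method, which uses the second derivative. All names here are made up for the example.

```python
def gradient(w):
    # First derivative of the toy loss 5 * (w - 3)^2.
    return 10.0 * (w - 3.0)


def hessian(w):
    # Second derivative of the toy loss; constant for a quadratic.
    return 10.0


def gradient_descent(w, learning_rate, steps):
    """First-order update: step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


def newton(w, steps):
    """Second-order update: scale the gradient by the inverse second derivative."""
    for _ in range(steps):
        w -= gradient(w) / hessian(w)
    return w


for lr in (0.01, 0.1, 0.25):
    print(f"gradient descent, lr={lr}: w = {gradient_descent(0.0, lr, 10):.3f}")
print(f"newton, 1 step: w = {newton(0.0, 1):.3f}")
```

On this loss, a learning rate that is too small converges slowly, one that is too large diverges, and Newton's method lands on the minimum at w = 3 in a single step because the loss is exactly quadratic. Real models are rarely this well behaved, which is why tuning matters.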
   
## Data Pipeline Master
   
- **Where is this found ?**
  - Companies where you are the first data scientist(s)
  - Companies with young products that are constantly changing (a lot of the start-ups)
  - Successful companies that are not traditionally data-driven (Some sales / retail companies)
  - In some sense, every role mentioned requires you to build data pipelines at some point, but this role involves more of it than the others

    <br>

- **What do you do ?**
  - Extract, Transform and Load (ETL)
  - Moving data from one form of storage to another (cloud based)
  - Quality assurance on data quality and validity
  - Basic statistical analysis on data / EDA (correlation, median, mean, quantile, frequentist statistics)
  - Manage data storage and updates (you will learn that on the job). Possibly web scraping
  - Just about anything described in the curriculum and things mentioned in other roles
  - Depending on the company's needs, you will learn a lot about a lot

    <br>

- **What do you need ?**
  - Solid coding and data munging skills
  - Product Knowledge
  - Some CS fundamentals (Data Structure and Algorithms)
  - Basic frequentist statistics (on average not as much as an analyst)
  - Maybe logistic / linear regression (Little machine learning)
  - SQL
  - Python / R 

<br>
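
As a small illustration of the ETL and data quality work described above, here is a minimal sketch using only the Python standard library. It reads CSV text, rejects rows with invalid dates, normalizes a column, and loads the result into an in-memory SQLite table. The sample data, the `users` table and its column names are all hypothetical; a real pipeline would read from and write to whatever storage the company uses.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw export with one missing country and one bad date.
RAW_CSV = """user_id,signup_date,country
1,2016-01-05,us
2,2016-01-06,
3,not-a-date,CA
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate dates, normalize country codes, split clean vs rejected."""
    clean, rejected = [], []
    for row in rows:
        try:
            datetime.strptime(row["signup_date"], "%Y-%m-%d")
        except ValueError:
            rejected.append(row)  # keep bad rows for later inspection
            continue
        country = row["country"].strip().upper() or None  # empty string becomes NULL
        clean.append((int(row["user_id"]), row["signup_date"], country))
    return clean, rejected


def load(conn, rows):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(user_id INTEGER PRIMARY KEY, signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(RAW_CSV))
load(conn, clean)
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(f"loaded {row_count} rows, rejected {len(rejected)}")
```

Keeping rejected rows instead of silently dropping them makes it easier to report data quality problems back to whoever owns the source.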
   
## Sales Engineer

- **Where is this found ?**

  - Companies that sell software as a service
    - **Database:** IBM, Oracle, MemSQL
    - **Machine learning / Consulting:** H2O, wise.io, Palantir
  - You will also find this under the job title **Forward Deployed Engineer**

    <br>
  
- **What do you do ?**
 
  - There is a range, from more sales-like to more technical, depending on the company
  - **Sales:**
    - Explain to clients the value proposition of the software or the service
    - Communicate specific needs of the client to the engineering team
    - Technical support 
  - **Technical:**
    - Design customized features on a high level based on clients' needs
    - Build prototypes using the company's product for clients' needs
    - Embed in another company for long-term projects / support (more engineering there, also rare)
  - This role is usually less data science and more using data to design clever solutions to solve client problems
   
    <br>

- **What do you need ?**

  - Excellent communication of technical concepts to clients
  - Ability to tie business to the technical (Business background is a plus)
  - Skills to handle external clients and represent the company
  - Basic data structure and algorithms (on average less than a data analyst)
  - Depending on what the company does, technical skills in that area 
  - How much technical skill is needed will depend on the role and the company
  - Much more emphasis on the ability to translate tech to business
  - Python / Java / R ... (Depends, usually not that specific)

    <br> 

## Specialist   

- **Where is this found ?**

  - Big companies (Apple, Google, Twitter, Facebook, Pinterest ...)
  - Specialized startups (SmartNews, Shazam, Uber, Instacart) 
    
    <br>

- **What do you do ?**

  - Work on a very specific problem
    - Image recognition or matching using deep learning
    - Audio classification that requires heavy signal processing
    - Path finding for ride-sharing that requires geospatial and AI-like algorithms
    - Fraud detection that requires specialized algorithms
    
    <br>

- **What do you need ?**

  - Usually a PhD in the relevant field, otherwise a Master's with extensive expertise
  - These roles usually do not hire juniors
  - Typically a second or third job after this program (if at all)

    <br>
    