# So you are looking for a job ...

<br>

There are a few things you need to understand to maximize your chances of landing a data science job. 

**This introduction covers:**

- What kinds of data science jobs are out there in the wild 
- A general overview of job interviews, with tips and tricks on how to do well in them

<br>

**More specific interview questions in other notebooks:**

- `Machine_Learning.ipynb`
- `Statistics.ipynb`
- `Product_Knowledge.ipynb`
- `Behavioral_Assessment.ipynb`

<br>


# 6 Types of Data Scientist

<br>

Often you cannot tell what the position entails just from the job description. You will usually find out more by talking to the hiring manager or the data scientist(s) who work at the company. 

Broadly speaking, there are 6 types of data science roles:

1. **Analyst**
2. **Experimenter**
3. **Machine Learning Engineer**
4. **Data Pipeline Master**
5. **Sales Engineer**
6. **Specialist (Computer Vision, Deep Learning, NLP, GeoSpatial)**

<br>

## Analyst

- **Where is this found ?**

  Companies with:
  - A robust data infrastructure, i.e. good data engineers
  - An established product (mostly **not** a small start-up with fewer than 20 people)
  - An expressed goal to be more data-driven beyond just using simple analysis
  - An existing team of at least 2 or 3 data scientists

    <br>

- **What do you do ?**
  - Solve open-ended business problems
    - Why is user growth slow?
    - How much should be spent acquiring these users (Customer Lifetime Value) ?
  - EDA to find trends, patterns and anomalies
  - Make reasonable assumptions about the data at hand to solve the problem
  - Consider engineering constraints when you suggest features or changes
  - Use simple heuristics / statistics to get a base model rolling in a short time
  - Use machine learning models for feature selection / importance

    <br>

- **What do you need ?**
  - Product knowledge
  - Use product knowledge to define business metrics
  - Ability to explain your decisions to non-technical people
  - Strong statistical background (Frequentist statistics, Hypothesis testing, Probability)
  - Basic machine learning (Logistic / Linear Regression, Random Forest, possibly kernelized SVM)
  - SQL 
  - Basic MapReduce or Spark (Depending on what the company is using, mostly conceptual instead of code)
  - Basic data structure and algorithms (Much less than machine learning engineer)
  - Python / R

<br>

## Experimenter

- **Where is this found ?**

  - Only at bigger companies where a small change in design makes a big difference
    - Twitter
    - Google
    - Quora
    - Facebook ...
  - E-commerce / Customer facing services / software
  - Otherwise, this role would be lumped with the analyst role at smaller companies and requirements would be a bit lower

    <br>

- **What do you do ?**
  - Experimental Design 
  - A/B or multiple testing
  - Establish causality with observational data
  - Basically, various means of establishing whether A is better than B, statistically speaking

    <br>

- **What do you need ?**
  - Extensive knowledge about experimental design (Have to study beyond curriculum)
  - Knowledge of causal inference
  - Strong frequentist statistics background and possibly Bayesian statistics
  - SQL
  - Python / R

<br>
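
As a rough illustration of "is A better than B, statistically speaking", here is a minimal sketch of a two-sided two-proportion z-test using only the Python standard library. The function name and the conversion counts are hypothetical, and a real experiment would also need a pre-planned sample size and checks on the test's assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (z statistic, p-value). Uses the pooled-proportion standard
    error, the usual choice under the null hypothesis of equal rates.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: 200 of 2000 users convert on A, 260 of 2000 on B.
z, p = two_proportion_z_test(200, 2000, 260, 2000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value suggests the difference in rates is unlikely under the null of no difference; it does not by itself say the change is worth shipping.
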

## Machine Learning Engineer

- **Where is this found ?**

  - Machine learning start-ups (H2O, wise.io, IBM Spark Technology Center, any machine learning as SaaS)
  - Companies big enough to have an in-house team of machine learning engineers (Apple, Twitter, Google, Airbnb, Uber)

    <br>

- **What do you do ?**

  - Implement and maintain machine learning algorithms at scale
  - Usually in a MapReduce or Spark framework
  - Contribute to production code base
  - Exercise the whole software development cycle (Source Control, Test...)

    <br>

- **What do you need ?**

  - Algorithmic and implementation details of machine learning algorithms
    - Random Forest, Logistic Regression, SVM, Gradient Boosting, Clustering ...
  - Knowledge in optimization / tuning machine learning algorithms to maximize prediction performance
    - How hyperparameters of machine learning algorithms affect performance
    - Second order gradient methods for optimization and other optimization methods
  - Fluency in operating in MapReduce or Spark
  - Almost CS level data structure and algorithms (Achievable with previous experience plus lots of studying beyond curriculum)
  - To enter as a junior: a strong background in either CS or machine learning, and a lighter one in the other
  - Java / C++ / Scala / Python

<br>
   
## Data Pipeline Master
   
- **Where is this found ?**
  - Companies where you are the first data scientist(s)
  - Companies with young products that are constantly changing (a lot of the start-ups)
  - Successful companies that are not traditionally data-driven (Some sales / retail companies)
  - In some sense, every role mentioned requires you to build some data pipelines at some point, but this role involves it more than the others 

    <br>

- **What do you do ?**
  - Extract, Transform and Load (ETL)
  - Moving data from one form of storage to another (cloud based)
  - Quality assurance on data quality and validity
  - Basic statistical analysis on data / EDA (correlation, median, mean, quantile, frequentist statistics)
  - Manage data storage and updates (you will learn that on the job). Possibly web scraping
  - Just about anything described in the curriculum and things mentioned in other roles
  - Depending on the company's needs, you will learn a lot about a lot

    <br>

- **What do you need ?**
  - Solid coding and data munging skills
  - Product Knowledge
  - Some CS fundamentals (Data Structure and Algorithms)
  - Basic frequentist statistics (on average not as much as an analyst)
  - Maybe logistic / linear regression (Little machine learning)
  - SQL
  - Python / R 

<br>
   
## Sales Engineer

- **Where is this found ?**

  - Companies that sell software as a service
    - **Database:** IBM, Oracle, MemSQL
    - **Machine learning / Consulting:** H2O, wise.io, Palantir
  - You will also find this under the job title Forward Deployed Engineer

    <br>
  
- **What do you do ?**
 
  - There is a range, from more sales-like to more technical, depending on the company
  - **Sales:**
    - Explain to clients the value proposition of the software or the service
    - Communicate specific needs of the client to the engineering team
    - Technical support 
  - **Technical:**
    - Design customized features on a high level based on clients' needs
    - Build prototypes using the company's product for clients' needs
    - Embedded in another company for long term projects / support (More engineering there, also rare) 
  - This role is usually less data science and more using data to design clever solutions to solve client problems
   
    <br>

- **What do you need ?**

  - Excellent communication of technical concepts to clients
  - Ability to tie business to the technical (Business background is a plus)
  - Skills to handle external clients and represent the company
  - Basic data structure and algorithms (on average less than a data analyst)
  - Depending on what the company does, technical skills in that area 
  - How much technical skill you need depends on the role and the company
  - Much more emphasis on the ability to translate tech to business
  - Python / Java / R ... (Depends, usually not that specific)

    <br> 

## Specialist   

- **Where is this found ?**

  - Big companies (Apple, Google, Twitter, Facebook, Pinterest ...)
  - Specialized startups (SmartNews, Shazam, Uber, Instacart) 
    
    <br>

- **What do you do ?**

  - Work on a very specific problem
    - Image recognition or matching using deep learning
    - Audio classification that requires heavy signal processing
    - Path finding for ride-sharing that requires geospatial and AI-like algorithms
    - Fraud detection that requires specialized algorithms
    
    <br>

- **What do you need ?**

  - Usually a PhD in the relevant field, otherwise a Master's with extensive expertise
  - Usually not hiring juniors
  - Typically a second / third job after this program (if at all)

    <br>
    

# Data Science Interviews

In general, there are 3 rounds of interviews before a company gives you an offer:

1. **Recruiter Phone Interview**
2. **Technical Phone Interview**
3. **Onsite Interview**

Variants include a technical interview as the first round, multiple rounds of technical phone interviews, or, in rare cases, multiple onsite interviews. You might also get a take-home assignment at some point. 

Typically the gap between stages ranges from a couple of days to 3 weeks. The expected waiting time between stages is closer to 1 week.

The whole process usually takes 3 to 6 weeks. 

<br>

## Take-home

It is paramount that you ace the take-home if you decide to do it. Acing it makes the technical interviews down the road easier, since they become more about explaining what you have done for the take-home. 

<br>

To ace the take-home:

- Offer extra insight / knowledge that is not being asked
- Figure out the main theme behind the assignment and add your own interpretation to the assignment
- Push your code to a private GitHub repo (\$7 a month) and make it object oriented and PEP8 compliant 
- Write a `readme.md` explaining what you have done, the reasoning behind it, and what you could have done
- Alternatively write a presentation (especially if asked to) and have it up on the repo as well

<br>

## Recruiter Phone Interview

Most of you will pass if you spend more than an hour preparing. Always ask whether the phone interview is technical or not. If it is not, it is most likely a recruiter interview; otherwise see the technical phone interview below. 

<br>

To pass the interview, you should:

- Read the job description and make sure you understand it
- **Be able to talk through your resume chronologically, from college to present**
- **Focus on the relevant data science experience in broad strokes**
- Do not focus on technical details 
- **Important: Get an idea of what the team does. If the recruiter does not know, then at least try to get the name of someone on the team, ideally the person who will be doing your technical interview**
- Also always ask when you are going to hear back
- If you are asked whether a salary range of X to Y is ok, do not say yes / no, unless it is very significantly less than what you expected. Say it depends on other things and it is still too early to tell whether this is a good range

<br>



## Technical Phone Interview

This round varies from company to company, but **almost all companies will do a coding screen**. It is usually conducted by a data scientist or a software engineer.

Apart from the coding screen, the rest of the interview could draw from one or more elements from the list below:

<br>

- **Coding screen**
  - Basic algorithms in sorting and searching, similar to the anagrams question
  - Data structure questions about dictionaries and lists
  - Trade-offs between memory and speed
  
  <br>
  
- **Probability question**
  - Bayes' theorem questions
  - Counting problems with permutations and combinations, e.g. given 5 coin flips, the probability of observing at least 1 head
  - Distribution questions, usually Poisson, Binomial or Exponential
  
  <br>

- **Product question**
  - How would you make suggestions about what we should change in the company
    - Your answer should be relevant to the role and easily implemented
    - This is an opportunity for you to show what analysis you would do as an analyst
    - Or what experiments you would run as an experimenter
    - Basically you should visit the website / use the product of the company and be prepared to answer the question
  - Questions usually revolve around acquisition, retention, referral and measuring the value of users
  
  <br>

- **Machine learning question**
  - Mostly logistic / linear regression at this stage unless you are going for specialist or machine learning engineer role (More specific questions in another notebook)
 
<br>
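
To give a feel for the level of the coding and probability questions above, here is a minimal sketch of two of them: an anagram problem solved with a dictionary, and the "at least 1 head in 5 coin flips" question. The function names and the word list are made up for illustration; an interviewer may well phrase the problems differently.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def are_anagrams(a, b):
    """True if a and b use exactly the same letters (case-insensitive)."""
    return Counter(a.lower()) == Counter(b.lower())


def group_anagrams(words):
    """Group words that are anagrams of each other.

    The sorted letters act as a dictionary key. This trades extra memory
    (one key per group) for speed: a single pass instead of comparing
    every pair of words.
    """
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word.lower()))].append(word)
    return list(groups.values())


def prob_at_least_one_head(n_flips, p_head=Fraction(1, 2)):
    """P(at least one head) = 1 - P(no heads), for independent flips."""
    return 1 - (1 - p_head) ** n_flips


print(group_anagrams(["listen", "silent", "enlist", "google", "elgoog"]))
print(prob_at_least_one_head(5))  # 31/32
```

The probability answer uses the complement trick: counting the single all-tails outcome is easier than counting every outcome with one or more heads.
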

## Onsite Interview

This is likely to be the last round. In terms of dress code, a button-down shirt and long jeans or a dress would be good. Most companies will have you meet with one or two data scientists, a product manager / VP, and / or software engineer. 

<br>

**Content-wise, it will be the same as the technical phone screen, except:**
- **More in depth questions**
  - **Coding:** Use of stacks and queues / Recursion problems / Dynamic programming (Very rare)
  - **ML:** Details of logistic / linear, RF, SVM, Naive Bayes 
  - **Stats:** Statistical Power, Multiple testing corrections, Bootstrapping
  - **Behavioral:** Team work, your weakness and strength, what can you improve on
  
  <br>
  
- **More questions about details of your previous data science experience**
  - What model did you use and why ?
  - How does the data science work you do matter in a business context ?
  - What assumptions are you making ?

<br>
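
For the stats topics listed above, it helps to be able to write the idea down in a few lines. The following is a minimal sketch, assuming a percentile bootstrap for the confidence interval and a Bonferroni correction for multiple testing (other corrections exist); the function names and the sample data are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def bonferroni(p_values, alpha=0.05):
    """Reject each null hypothesis only if p < alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical sample: session lengths in minutes.
session_minutes = [12, 15, 9, 20, 14, 11, 18, 13, 16, 10]
print(bootstrap_ci(session_minutes))
print(bonferroni([0.001, 0.02, 0.04]))
```

With three tests at `alpha=0.05`, the Bonferroni threshold is about 0.0167, so only the first p-value in the example is rejected.
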

**Statistically speaking, unreasonable brain teasers do not come up very often.**

**If they do,**

- Your thinking process is being assessed, not how fast you are
- Stay calm
- Start with the first answer that pops into your head (it will probably not be the best solution)
- Vocalize almost every thought you have
- Scale down the problem to something you can count and simulate (e.g. from 100 to 10)
- Get feedback before you try to go down one path too far
  - Silence from the interviewer is also a form of feedback (probably wrong...)
- **Try thinking about:**
  - **Sorting**
  - **Dividing in the middle and search (Binary search)**
  - **Break down into repeated tasks and do recursion**
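
As a reminder of what "dividing in the middle and search" looks like in code, here is a minimal sketch of binary search on a sorted list, tried on a scaled-down input of 10 items. The function name and the data are illustrative only.

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # split the remaining range in the middle
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1


# Scale the problem down to 10 items so each step is easy to trace aloud.
small = [3, 8, 15, 16, 23, 42, 57, 61, 70, 99]
print(binary_search(small, 23))  # 4
print(binary_search(small, 5))   # -1
```

Each step halves the search range, so a list of 100 items needs at most 7 comparisons instead of up to 100 with a linear scan.
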
