### Computer Science Fundamentals and Programming


**Essential Skills Needed by Machine Learning Engineers**
1. Computer Science Fundamentals
2. Probability and Statistics
3. Data Modeling and Evaluation
4. Applying Machine Learning Algorithms and Libraries
5. Software Engineering and System Design

### 1. Computer Science Fundamentals and Programming

**Topics:**
- Data structures: Lists, stacks, queues, strings, hash maps, vectors, matrices, classes & objects, trees, graphs, etc.
- Algorithms: Recursion, searching, sorting, optimization, dynamic programming, etc.
- Computability and complexity: P vs. NP, NP-complete problems, big-O notation, approximation algorithms, etc.
- Computer architecture: Memory, cache, bandwidth, threads & processes, deadlocks, etc.

**Questions:**
- How would you check if a linked list has cycles?
- Given two elements in a binary search tree, find their lowest common ancestor.
- Write a function to sort a given stack.
- What is the time complexity of any comparison-based sorting algorithm? Can you prove it?
- How will you find the shortest path from one node to another in a weighted graph? What if some weights are negative?
- Find all palindromic substrings in a given string.
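
As an illustration of the first question, here is a minimal sketch of one common approach, Floyd's "tortoise and hare" technique. The `Node` class and the `has_cycle` name are hypothetical helpers introduced only for this example; an interviewer may expect a different interface.

```python
from typing import Optional


class Node:
    """Minimal singly linked list node (hypothetical helper for this sketch)."""

    def __init__(self, value: int, next: "Optional[Node]" = None) -> None:
        self.value = value
        self.next = next


def has_cycle(head: Optional[Node]) -> bool:
    """Floyd's tortoise-and-hare check: O(n) time, O(1) extra space."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next        # advances one node per step
        fast = fast.next.next   # advances two nodes per step
        if slow is fast:        # the two pointers can only meet inside a cycle
            return True
    return False                # fast reached the end, so the list terminates


# Usage: build 1 -> 2 -> 3, then link 3 back to 2 to create a cycle.
a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c
acyclic_result = has_cycle(a)   # no cycle yet
c.next = b
cyclic_result = has_cycle(a)    # cycle 2 -> 3 -> 2
```

A hash set of visited nodes would also work, at the cost of O(n) extra space.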

For all such questions, you should be able to:
1. Reason about the time and space complexity of your approach (usually in big-O notation).
2. Try to aim for the lowest complexity possible.
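
To show what this kind of complexity reasoning can look like, here is a sketch for the "sort a given stack" question. It assumes the stack is a Python list whose end is the top, and that only push and pop are allowed. The function name `sort_stack` is made up for this example.

```python
from typing import List


def sort_stack(stack: List[int]) -> List[int]:
    """Return a sorted copy of `stack` with the largest element on top.

    Only push (append) and pop are used, on two stacks.
    Time: O(n^2) in the worst case, because each item may force up to n moves.
    Extra space: O(n) for the auxiliary stack.
    """
    source = list(stack)      # copy so the caller's stack is left untouched
    ordered: List[int] = []   # invariant: ascending from bottom to top
    while source:
        item = source.pop()
        # Move larger elements back to source until item fits on top of ordered.
        while ordered and ordered[-1] > item:
            source.append(ordered.pop())
        ordered.append(item)
    return ordered


result = sort_stack([3, 1, 4, 1, 5, 9, 2, 6])
```

If arbitrary data structures were allowed, copying the items out and using an O(n log n) sort would be faster, which is the kind of tradeoff worth stating out loud.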

**Extensive practice** is the only way to familiarize yourself with the different classes of problems so that you can quickly converge on an efficient solution.

Coding/interview prep platforms like InterviewBit, LeetCode and Pramp are highly beneficial for this purpose.

### 2. Probability and Statistics

**Topics:**
- Basic probability: Conditional probability, Bayes' rule, likelihood, independence, etc.
- Probabilistic models: Bayes Nets, Markov Decision Processes, Hidden Markov Models, etc.
- Statistical measures: Mean, median, mode, variance, population parameters vs. sample statistics, etc.
- Proximity and error metrics: Cosine similarity, mean-squared error, Manhattan and Euclidean distance, log-loss, etc.
- Distributions and random sampling: Uniform, normal, binomial, Poisson, etc.
- Analysis methods: ANOVA, hypothesis testing, factor analysis, etc.

**Questions:**
- The mean heights of men and women in a population were calculated to be M and W. What is the mean height of the total population?
- A recent poll revealed that a third of the cars in Italy are Ferraris, and that half of those are red. If you spot a red car approaching from a distance, what is the likelihood that it is a Ferrari?
- **You’re trying to find the best place to put in an advertisement banner on your website. You can make the size (thickness) small, medium or large, and choose vertical position top, middle or bottom. At least how many total page visits (n) and ad clicks (m) do you need to say with 95% confidence that one of the designs performs better than all the other possibilities?**
- The time period between consecutive eruptions of the Old Faithful geyser in Yellowstone National Park is found to have the following distribution. How would you describe/characterize it? What can you infer from it?

![4.4%20Probability_and_statistics.JPG](attachment:4.4%20Probability_and_statistics.JPG)

Remember: many machine learning algorithms have a basis in probability and statistics. Conceptual clarity of these fundamentals is extremely important, but at the same time, you must be able to relate abstract formulae with real-world quantities.
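
As one way of tying formulae to real quantities, the sketch below works through the first two questions above. Neither can be answered from the stated numbers alone: the mean-height question needs the group sizes, and the Ferrari question needs the share of red cars among non-Ferraris. Those extra inputs (`n_men`, `n_women`, `p_red_given_other`) and the numbers passed in are assumptions made purely for illustration.

```python
def pooled_mean(mean_men: float, n_men: int,
                mean_women: float, n_women: int) -> float:
    """Mean of the combined population: weight each group mean by its size."""
    return (n_men * mean_men + n_women * mean_women) / (n_men + n_women)


def posterior_ferrari(p_ferrari: float, p_red_given_ferrari: float,
                      p_red_given_other: float) -> float:
    """P(Ferrari | red) via Bayes' rule."""
    # Total probability of seeing a red car, Ferrari or not.
    p_red = (p_red_given_ferrari * p_ferrari
             + p_red_given_other * (1.0 - p_ferrari))
    return p_red_given_ferrari * p_ferrari / p_red


# Hypothetical inputs: 25 men averaging 180 cm and 75 women averaging 160 cm.
combined_height = pooled_mean(180.0, 25, 160.0, 75)

# From the question: P(Ferrari) = 1/3 and P(red | Ferrari) = 1/2.
# Assumed for this sketch: a quarter of all other cars are red.
p_ferrari_given_red = posterior_ferrari(1 / 3, 1 / 2, 0.25)
```

With equal group sizes the pooled mean reduces to (M + W) / 2, which is the special case many candidates assume without saying so.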

### 3. Data Modeling and Evaluation

**Topics:**
- Data preprocessing: Munging/wrangling, transforming, aggregating, etc.
- Pattern recognition: Correlations, clusters, trends, outliers & anomalies, etc.
- Dimensionality reduction: Eigenvectors, Principal Component Analysis, etc.
- Prediction: Classification, regression, sequence prediction, etc.; suitable error/accuracy metrics.
- Evaluation: Training-testing split, sequential vs. randomized cross-validation, etc.

**Questions:**
- A dairy farmer is trying to understand the factors that affect milk production of her cattle. She has been keeping logs of the daily temperature (usually 30-40°C), humidity (60-90%), feed consumption (2000-2500 kg), and milk produced (500-1000 liters).

    - How would you begin processing the data in order to model it, with the goal of predicting liters of milk produced in a day?
    - What kind of machine learning problem is this?

- Your company is building a facial expression coding system, which needs to take input images from a standard HD 1920x1080 pixel webcam, and continuously tell whether the user is in one of the following states: neutral, happy, sad, angry or afraid. When the user’s face is not visible in the camera frame, it should indicate a special state: none.

    - What class of machine learning problems does this belong to?
    - If each pixel is made up of 3 values (for red, green, blue channels), what is the raw input data complexity (no. of dimensions) for processing each image? Is there a way to reduce the no. of dimensions?
    - How would you encode the output of the system? Explain why.

- Climate data collected over the past century reveals a cyclic pattern of rising and falling temperatures. How would you model this data (a sequence of average annual temperature values) to predict the average temperature over the next 5 years?
- Your job at an online news service is to collect text reports from around the world, and present each story as a single article with content aggregated from different sources. How would you go about designing such a system? What ML techniques would you apply?
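
For the facial expression question, the input size and one possible output encoding can be sketched as follows. The one-hot encoding, the `STATES` ordering, and the 96x96 grayscale crop are illustrative assumptions, not requirements stated in the question.

```python
from typing import List

# The five expressions from the question plus the special "none" state.
STATES = ["neutral", "happy", "sad", "angry", "afraid", "none"]


def raw_dimensions(width: int, height: int, channels: int) -> int:
    """Number of raw input values in one image."""
    return width * height * channels


def one_hot(state: str) -> List[int]:
    """Encode a state as a vector with a single 1 at the state's index."""
    index = STATES.index(state)  # raises ValueError for an unknown state
    return [1 if i == index else 0 for i in range(len(STATES))]


full_rgb = raw_dimensions(1920, 1080, 3)   # 6,220,800 values per frame
small_gray = raw_dimensions(96, 96, 1)     # assumed grayscale face crop: 9,216 values
encoded_sad = one_hot("sad")               # [0, 0, 1, 0, 0, 0]
```

One-hot encoding avoids implying an order among the states, which a single integer label (0 to 5) would do. Cropping to the face, downsampling, and converting to grayscale are simple ways to reduce dimensionality before applying something like PCA.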

### 4. Applying Machine Learning Algorithms and Libraries

**Topics:**
- Models: Parametric vs. nonparametric, decision tree, nearest neighbor, neural net, support vector machine, ensemble of multiple models, etc.
- Learning procedure: Linear regression, gradient descent, genetic algorithms, bagging, boosting, and other model-specific methods; regularization, hyperparameter tuning, etc.
- Tradeoffs and gotchas: Relative advantages and disadvantages, bias and variance, overfitting and underfitting, vanishing/exploding gradients, missing data, data leakage, etc.

**Questions:**
- You’re trying to classify images of cats and dogs. Plotting the images in some transformed 2-dimensional feature space reveals the following pattern (on the left). In some other space, images of dogs and wolves show a different pattern (on the right).
    - What model would you use to classify cats vs. dogs, and what would you use for dogs vs. wolves? Why?
![4.4.2%20ML_Algo.JPG](attachment:4.4.2%20ML_Algo.JPG)

- I’m trying to fit a single hidden layer neural network to a given dataset, and I find that the weights are oscillating a lot over training iterations (varying wildly, often swinging between positive and negative values). What parameter do I need to tune to address this issue? **Learning rate** (it is likely too high; try lowering it)
- When training a support vector machine, what value are you optimizing for?
- Lasso regression uses the L1-norm of coefficients as a penalty term, while ridge regression uses the L2-norm. Which of these regularization methods is more likely to result in sparse solutions, where one or more coefficients are exactly zero? **Lasso vs. Ridge Regression**
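- When training a 10-layer neural net using backpropagation, I find that the weights for the top 3 layers are not changing at all! The next few layers (4-6) are changing, but very slowly. What’s going on and how do I fix this? **Vanishing gradients**: the gradient shrinks as it is propagated back through many layers, especially with saturating activations such as sigmoid or tanh. Try **ReLU** activations; if ReLU units then get stuck at zero (**dead neurons**), use **Leaky ReLU** instead.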
- I’ve found some data about wheat-growing regions in Europe that includes annual rainfall (R, in inches), mean altitude (A, in meters) and wheat output (O, in kg/km²). A rough analysis and some plots make me believe that output is related to the square of rainfall and the log of altitude: $O = \beta_0 + \beta_1 R^2 + \beta_2 \log_e(A)$
    - Can I fit the coefficients (β) in my model to the data using linear regression?
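
For the wheat question, the model is nonlinear in R and A but linear in the coefficients, so ordinary least squares applies once the features are transformed. The sketch below uses only the standard library and solves the normal equations directly. The data is synthetic, generated from made-up coefficients (`TRUE_BETAS`) purely to check that the fit recovers them. In practice a library routine such as NumPy's `numpy.linalg.lstsq` would be the usual choice.

```python
import math
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_wheat_model(rain: Sequence[float], altitude: Sequence[float],
                    output: Sequence[float]) -> List[float]:
    """Least-squares fit of O = b0 + b1 * R^2 + b2 * ln(A)."""
    # Transformed features: the model is linear in these columns.
    features = [[1.0, r ** 2, math.log(a)] for r, a in zip(rain, altitude)]
    k = 3
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(features, output))
           for i in range(k)]
    return solve_linear_system(xtx, xty)  # normal equations: (X^T X) b = X^T y


# Synthetic, noise-free data generated from made-up coefficients.
TRUE_BETAS = (2.0, 0.5, 3.0)
rain = [10.0, 20.0, 30.0, 15.0, 25.0]
altitude = [100.0, 200.0, 400.0, 800.0, 150.0]
output = [TRUE_BETAS[0] + TRUE_BETAS[1] * r ** 2 + TRUE_BETAS[2] * math.log(a)
          for r, a in zip(rain, altitude)]
betas = fit_wheat_model(rain, altitude, output)
```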

Machine Learning challenges such as those on Kaggle are a great way to get exposed to different kinds of problems and their nuances. Try to participate in as many as you can, and apply different machine learning models.

### 5. Software Engineering and System Design

**Topics:**
- Software interface: Library calls, REST APIs, data collection endpoints, database queries, etc.
- User interface: Capturing user inputs & application events, displaying results & visualization, etc.
- Scalability: Map-reduce, distributed processing, etc.
- Deployment: Cloud hosting, containers & instances, microservices, etc.

**Questions:**
- You run an ecommerce website. When a user clicks on an item to open its details page, you would like to suggest 5 more items that the user may be interested in, based on item features as well as the user’s purchase history, and display them at the bottom of the page. What services and database tables would you need to support this behavior? Assuming they’re available, write a query or procedure to fetch the 5 items to suggest.
- What data would you like to collect from an online video player (like YouTube) to measure user engagement and video popularity?
- A very simple spam detection system works as follows: It processes one email at a time and counts the number of occurrences of each unique word in it (term frequency), and then it compares those counts with those of previously seen emails which have been marked as spam or not. In order to scale up this system to handle a large volume of email traffic, can you design a map-reduce scheme that can run on a cluster of computers?
- You want to generate a live visualization of what portion of a webpage users are currently viewing and clicking, sort of like a heat map. What components/services/APIs do you need in place, on the client and server end, to enable this?
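
For the spam detection question, one possible map-reduce decomposition is sketched below as a single-process simulation. The `(label, body)` email format, the `"ham"` label for non-spam, and all function names are assumptions for this example. On a real cluster the map and reduce steps would run on separate workers and the framework would do the shuffle.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Email = Tuple[str, str]   # (label, body); label is "spam" or "ham" (assumed format)
Key = Tuple[str, str]     # (label, word)


def map_email(email: Email) -> Iterator[Tuple[Key, int]]:
    """Map step: emit ((label, word), term frequency) for one email."""
    label, body = email
    for word, count in Counter(body.lower().split()).items():
        yield (label, word), count


def shuffle(pairs: Iterable[Tuple[Key, int]]) -> Dict[Key, List[int]]:
    """Group values by key, as the framework would between map and reduce."""
    grouped: Dict[Key, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: Key, values: List[int]) -> Tuple[Key, int]:
    """Reduce step: total occurrences of a word within one label."""
    return key, sum(values)


def run_job(emails: Iterable[Email]) -> Dict[Key, int]:
    mapped = (pair for email in emails for pair in map_email(email))
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


# Made-up emails, for illustration only.
emails = [
    ("spam", "win money win"),
    ("ham", "lunch money"),
    ("spam", "win now"),
]
totals = run_job(emails)
```

Because each email is mapped independently and the reduce step is a plain sum, both steps can be spread across machines. A new email's word counts can then be compared against the per-label totals.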

## Additional Resources

**General Interview Advice** <br>
[Inside the Mind of a Recruiter](https://www.udacity.com/blog/2017/07/inside-the-mind-of-a-recruiter.html): Check out Udacity's interview with Jason Wong, a head recruiter, to get the inside scoop on what he looks for in job candidates.

[Acing Your Interview](https://career-resource-center.udacity.com/interviews/acing-your-interview): This blog post outlines guidelines for success, including preparation, strategic responses, and appropriate follow-up. It also includes a bonus webinar recording from Udacity Careers VP Kathleen Mullaney, Udacity engineer Art Gillespie, and data scientist Katie Malone.

**Phone Screening**<br>
Phone interviews are often the first stage of the hiring process – doing well will increase your odds of being called back for an on-site interview! Check out these tips for success when you get the call.

How to Ace a Developer Phone Interview: Learn from Palantir how to rock a phone interview and make it to the next phase of interviewing.

**Onsite Interview**<br>
The final step before receiving a job offer is an interview with the team you would be working with in your new job. This final interview is usually on-site and comprises a behavioral and technical portion.

These interviews can be intimidating, and it’s okay to feel nervous; everyone does! To be well prepared on interview day, start practicing well before you begin your job search, so that you have time to refine your interviewing skills and work on your weak spots.

Perfecting Body Language: Feeling nervous about your interview? This article details how to have body language that communicates confidence and calmness while you are interviewing.

**Technical Questions**<br>
[Coding Interview Tips](https://www.interviewcake.com/coding-interview-tips): Interview Cake describes easy-to-adopt behaviors that will help you succeed in the coding interview.

The Coding Interview: Palantir's guide on preparing for the coding portion of your technical interview.

[21 Machine Learning Interview Questions and Answers](https://elitedatascience.com/machine-learning-interview-questions-answers): Here are some sample questions from EliteDataScience to help keep your mind in shape.

**More Practice!** <br>
[LeetCode](https://leetcode.com/): LeetCode has over 950 practice questions organized by difficulty, topic, and company.

[Interviewing.io](https://interviewing.io/): Practice interviewing with engineers from top companies, anonymously.

[InterviewBit](https://www.interviewbit.com/): Practice with coding interview questions asked historically and get job referrals.

**Books** <br>
Cracking the Coding Interview: This best-selling book from Gayle Laakmann McDowell offers 189 programming questions and solutions to help you practice coding and answer technical interview questions with confidence.

Programming Interviews Exposed: Coding Your Way Through the Interview. This popular guide to programming interviews includes code examples, information on the latest languages, chapters on sorting and design patterns, tips on using LinkedIn, and a downloadable app to help prepare applicants for the interview.

Elements of Programming Interviews: The Insiders’ Guide. This book from Adnan Aziz, Tsung-Hsien Lee, and Amit Prakash features a great compilation of programming-related problems for interview prep and general refreshers.

