# Contents:

This notebook contains the questions asked in various technical interviews.

1. <font color = blue> Say you had to plan for a company's server capacity in Belgium (the country is irrelevant) in 2020, and you had access to any data you wanted about anything in the world, how would you go about doing this? </font>

From a modeling point of view, treat this as a time-series problem. That is, try to predict the capacity at any point in time as a function of past capacities (capacities at previous points in time) as well as past values of other features.

#### If access to any data is possible:
Look at factors like mobile usage of Google over time, number of users over time, population of Belgium over time, etc., and of course the server capacity itself (the target variable) over time.


A reasonable baseline for the time-series prediction is an ARIMA model.


2. <font color = blue> It is told that it is okay to overestimate the capacity, but disastrous to underestimate it. How do you deal with this fact in your estimation? </font>
    
Install capacity that is greater than the model estimate by some arbitrary number, like 5%.
To do this in a systematic way, calculate the error bars in the estimate, and choose the upper end of the error bar.
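One way to make "the upper end of the error bar" concrete is to add a one-sided safety margin derived from past forecast errors. The sketch below assumes those errors are roughly normal with zero mean; `provisioned_capacity`, `past_errors`, the 99% coverage level, and the example numbers are all illustrative and not part of the original question.

```python
from statistics import NormalDist, stdev

def provisioned_capacity(forecast, past_errors, coverage=0.99):
    """Return the forecast plus a one-sided safety margin.

    past_errors are (actual - forecast) values from earlier periods. They are
    assumed to be roughly normal with zero mean, which is an assumption of
    this sketch rather than something the question guarantees.
    """
    z = NormalDist().inv_cdf(coverage)  # one-sided z-score for the coverage level
    return forecast + z * stdev(past_errors)

# Hypothetical numbers: a forecast of 1000 units and five past forecast errors
capacity = provisioned_capacity(1000.0, [-40.0, 25.0, 10.0, -15.0, 30.0])
```

Raising `coverage` makes underestimation rarer at the cost of more idle capacity, which matches the asymmetric cost described in the question.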

3. <font color = blue> How to explain a confidence interval to a layperson? </font>
    
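If I perform an experiment and obtain a mean of 0.5 with a 95% confidence interval, it means that if I repeated the same experiment 100 times and computed the interval each time, about 95 of those intervals would contain the true mean. It does not mean that I would get a mean of exactly 0.5 in 95 of the repetitions.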

4. <font color = blue> How to estimate the 95% confidence interval of the mean of data where the distribution function is unknown, and that is seen to be very non-symmetric and non-Gaussian? </font>
    
Use the Central Limit Theorem (CLT) to your advantage: repeatedly resample the dataset with replacement (bootstrapping) and compute the mean of each resample. By the CLT, these sample means are approximately Gaussian distributed, provided the data have finite variance, so the standard deviation of the resampled means can be used to build the confidence interval on the dataset mean.
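The resampling idea can be sketched with the standard library alone. This is a minimal illustration, assuming a normal approximation for the distribution of resampled means; `bootstrap_mean_ci`, the number of resamples, and the exponential example data are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean, stdev

def bootstrap_mean_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Normal-approximation confidence interval for the mean via bootstrapping."""
    rng = random.Random(seed)
    n = len(data)
    # Mean of each resample drawn with replacement from the data
    means = [fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided z-score for the level
    half_width = z * stdev(means)              # stdev(means) estimates the standard error
    centre = fmean(data)
    return centre - half_width, centre + half_width

# Skewed example data (exponential), generated only for illustration
rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(200)]
low, high = bootstrap_mean_ci(sample)
```

For strongly skewed data and small samples, taking percentiles of `means` directly is a common alternative to the normal approximation used here.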

5. <font color = blue> Two people bid on a product and the bid values are unknown to the players </font>.

Assume the bid amounts are uniformly distributed within \[0, 1\].

The expectation value is given by

$E(x) = \int_0^1 x p(x) dx$,

where $p(x)$ is the probability density function (pdf), which is the derivative of the cumulative distribution function (cdf), given by

$F(x) = \int_0^x p(t) dt$.

For the uniform distribution on \[0, 1\], $p(x) = 1$, so $F(x) = x$ and $E(x) = 1/2$.

The CDF gives the probability of obtaining a value less than or equal to the observed/target value.

6. <font color = blue> The following questions are part of the same question.
    - You have a historical record of who wins a game, A or B. How will you decide who is the stronger player?
    - If the total number of games played is 99, and A wins 50, B 49,  is it too close to call who is better?
    - If there’s just one game played, and A wins, what is the standard deviation? </font>
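For the 50 to 49 record, one hedged way to judge "too close to call" is an exact binomial test against the hypothesis that both players are equally strong, together with the standard error of the estimated win rate. The helper names `two_sided_binomial_p` and `win_rate_se` are illustrative, and the sketch assumes independent games with a fixed win probability.

```python
from math import comb, sqrt

def two_sided_binomial_p(wins, games, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than the observed one."""
    pmf = [comb(games, k) * p**k * (1 - p)**(games - k) for k in range(games + 1)]
    observed = pmf[wins]
    # Small tolerance so that ties with the observed probability are included
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))

def win_rate_se(wins, games):
    """Estimated standard error of the win rate, sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = wins / games
    return sqrt(p_hat * (1 - p_hat) / games)

p_value = two_sided_binomial_p(50, 99)  # A wins 50 of 99 games
se_99 = win_rate_se(50, 99)
se_1 = win_rate_se(1, 1)                # a single game won by A
```

A p-value near 1 means the record is entirely consistent with equally strong players. With a single game the plug-in estimate of the spread collapses to zero, which says more about the lack of data than about the players.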

7. <font color = blue> What is the probability that a 9 set game will go into the 9th game? </font>
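The wording of this question is ambiguous. One reading is a best-of-nine series between two players that stops as soon as one side has five wins; it then reaches the ninth game only if the first eight games split 4 to 4. The sketch below works under that assumption, with independent games and a hypothetical per-game win probability `p`.

```python
from math import comb

def prob_reaches_last_game(total_games=9, p=0.5):
    """Probability that a best-of-`total_games` series needs its final game.

    Assumes independent games, each won by the first player with probability p,
    and an odd total number of games.
    """
    half = (total_games - 1) // 2  # wins each side must hold before the last game
    return comb(2 * half, half) * p**half * (1 - p)**half

prob_even_players = prob_reaches_last_game(9, 0.5)
```

For evenly matched players this is `comb(8, 4) / 2**8`, and the probability shrinks as `p` moves away from 0.5.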


## Central Limit Theorem
It establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.

The median is robust to outliers, whereas the mean is sensitive to them. Depending on the problem, either one may be the more useful summary.

Do you know the following:
1. Do you know SVD?
2. What are the assumptions for ANOVA?
3. What is the hardest job of translating a paper's algorithm to actual coding?
4. What is the most mathematical project you have done so far?
5. Have you played the game?
6. What kind of projects do you want to work on?
7. What is the interpretation of feature importance of a gbm algorithm?

What happens when there is multicollinearity, and which regularization should be used in that case?

The others were pretty standard, by which I mean a subset of the following list:

- Supervised vs. unsupervised learning?
- Generative vs. discriminative models.
- Bias-variance trade-off. Explain what it is. Why is it a trade-off?
- What is regularization?
- What is the trade-off of using a regularization parameter? Regularization increases bias and decreases variance.
- Does a larger training set increase or decrease variance? Decrease.
- When is it a good idea to get more data? When is it not worth it? Check the learning curve.
- How will increasing the depth of a random forest affect bias and variance?
- kNN with low k has high variance and low bias.
- Draw a picture of what low/high bias and low/high variance look like.
- What is the curse of dimensionality? How can you combat it?
- Can you describe the linear regression model?
- What does the cost function look like? Why squared error?
- What do the coefficients give you?
- What metric(s) do you use to evaluate the result of a linear regression problem?
- What are some assumptions of linear regression? https://www.statisticssolutions.com/assumptions-of-linear-regression/
- Why is multicollinearity a problem?
- How do you combat multicollinearity?
- What is the difference between L2 and L1 when talking about regularization?
- How does a neural network with one layer and one input compare to a logistic regression?


1. What's the difference between generative and discriminative models?
2. You have two models, Naive Bayes and Logistic with balanced data containing 1000 samples, in the black box. How will you check which model is better?
3. What are the axes for a ROC curve?
4. You have a baseline model with 85% accuracy and a new model with 86% accuracy? How will you decide which one is better, i.e., should you use the new model?
5. How would you tackle overfitting? What are L1 and L2 regularization?

How many 45-foot beams are required to make 100 14-foot, 50 12-foot, and 27 7-foot beams? You cannot melt the 45-foot beams.

Given a bar plot, imagine you are pouring water from the top. How would you quantify how much water the bar chart can hold?
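This reads like the classic "trapping rain water" problem. A minimal two-pointer sketch, assuming bars of unit width with non-negative integer heights (the function name `trapped_water` is illustrative):

```python
def trapped_water(heights):
    """Total water held between bars, using two pointers in O(n) time and O(1) space."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        # The lower side bounds the water level, so advance that side
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water

example_total = trapped_water([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1])
```

The water above each bar is the smaller of the tallest bars on either side minus the bar's own height; the two pointers track those maxima without extra arrays.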

How will you use random forest for fraud detection?

# <font color = blue> Algorithm/Coding Questions </font>

### Write a function to calculate the number of anagrams in a sentence.
Input: A string containing a sentence with one or more words including punctuation. 

Output: A count of the number of anagrams (a word formed by rearranging the letters of another).

Example:

Input: “The angel pat, glean tap”

Expected output: 2

In [37]:
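import string

def find_anagram(sentence):
    # Lowercase and strip all punctuation, then split on whitespace
    cleaned = sentence.lower().translate(str.maketrans('', '', string.punctuation))
    seen = set()  # sorted-letter signatures of the words seen so far
    count = 0
    for word in cleaned.split():
        signature = ''.join(sorted(word))
        if signature in seen:
            # Same letters as an earlier word: count it as an anagram
            count += 1
        else:
            seen.add(signature)
    return count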

In [38]:
s = 'The angel pat nale, glean tap lean'
find_anagram(s)

3

### Questions 
1. Recursively reverse a string.
2. Print a number vertically with each digit on its own line, without converting to a string.
3. Remove all prime numbers from a linked list. (no mention of how large numbers could be)
4. Given a continuous stream of characters, find the first non-repeating character at any given point.
5. Given a tree with left, right and next pointers, update each next pointer to point to the node to its right on the same level.
6. Given an array of numbers, find for each number the difference between its closest greater element from the left and from the right (0 if there is none).

3 2 1 7 5

0 3 2 0 7 => from left (here for number 1 since 2 and 3 are both greater, we pick the closest viz. 2)

7 7 7 0 0 => from right

7 4 5 0 7 => difference
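The worked example above can be reproduced with a monotonic stack. This is a sketch under the reading that a missing greater element counts as 0 and that the difference is taken as an absolute value; `closest_greater` and `closest_greater_difference` are illustrative names.

```python
def closest_greater(values, from_left=True):
    """For each element, the nearest strictly greater element on one side (0 if none)."""
    n = len(values)
    indices = range(n) if from_left else range(n - 1, -1, -1)
    result = [0] * n
    stack = []  # candidate values, strictly decreasing from bottom to top
    for i in indices:
        # Values that are not greater can never be the answer for later elements
        while stack and stack[-1] <= values[i]:
            stack.pop()
        result[i] = stack[-1] if stack else 0
        stack.append(values[i])
    return result

def closest_greater_difference(values):
    left = closest_greater(values, from_left=True)
    right = closest_greater(values, from_left=False)
    return [abs(l - r) for l, r in zip(left, right)]

differences = closest_greater_difference([3, 2, 1, 7, 5])
```

Each value is pushed and popped at most once per pass, so the whole computation is linear in the array length.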

## Candy-crush: Remove all continuous same numbers
Given a 1-d array of candies, return the shortest array that can be obtained by repeatedly removing runs of three or more identical consecutive numbers.

Input: 1-d array \[1, 3, 3, 3, 2, 2, 2, 3, 1\], return: \[1, 1\].
Time complexity should be better than O(n^2).

In [21]:
from itertools import groupby

def prepCandyPacks(arr):
    '''
    helper func to transform inputs: candies to (candy, count) packs
    '''
    # groupby yields one group per run of equal consecutive values, so this
    # also handles candies with value 0 and an empty input correctly
    return [(candy, len(list(run))) for candy, run in groupby(arr)]

In [22]:
def candyCrush(packs, k, total, memo):
    if memo.get(tuple([]), None) is not None:
        return # all can be crushed, stop exploring other crushing choices
    
    key = tuple(packs)
    if memo.get(key, None) is not None:
        return
    
    # print(packs, total)
    memo[key] = total
    if total < memo['min']:
        memo['min'] = total
        memo['remainingPacks'] = packs

    for i in range(len(packs)):
        candy, count = packs[i]
        if count >= k:
            if i > 0 and i < len(packs) - 1 and packs[i - 1][0] == packs[i + 1][0]:
                candyCrush(
                    packs[:i - 1] + 
                    [(packs[i - 1][0], packs[i - 1][1] + packs[i + 1][1])] + 
                    packs[i + 2:], 
                    k, total - count, memo)
            else:
                candyCrush(
                    packs[:i] + 
                    packs[i + 1:], 
                    k, total - count, memo)

In [35]:
def candyresult(arr):
    print('array:', arr)
    p = prepCandyPacks(arr)
    mem = dict({'min': len(arr), 'remainingPacks': p})
    candyCrush(p, 3, len(arr), mem)

    result = []
    for candy, count in mem['remainingPacks']:
        result += [candy] * count
    print('result:', result)

In [36]:
A = [1, 3, 3, 3, 2, 2, 2, 3, 1]
candyresult(A)

array: [1, 3, 3, 3, 2, 2, 2, 3, 1]
result: [1, 1]


## Count minimum number of moves to sort an array
Given an array of size $n$ such that array elements are in range from 1 to $n$. The task is to count number of move-to-front operations to arrange items as {1, 2, 3, ... $n$}. The move-to-front operation is to pick any item and place it at first position.

Examples :

Input: arr = {3, 2, 1, 4}, Output: 2, Why: First, we pull out 2 and place it at the front, so the array becomes (2, 3, 1, 4). After that, we pull out 1 and the array becomes (1, 2, 3, 4).

Input: arr = {5, 7, 4, 3, 2, 6, 1}, Output: 6, Why: 7 can stay where it is; we pull the remaining elements in the order 6, 5, 4, 3, 2 and 1.

Input: arr = {4, 3, 2, 1}, Output: 3

In [37]:
def minMoves(arr):
    expected = len(arr)
    
    for i in range(len(arr) - 1, -1, -1):
        if arr[i] == expected:
            expected -= 1
    return expected

In [40]:
minMoves([4,3,2,1])

3

In [51]:
def reverse_string(string):
    # Base case: an empty or single-character string is its own reverse
    if len(string) <= 1:
        return string
    # return string[::-1] would do this without recursion;
    # recursively: reverse the tail, then append the first character
    return reverse_string(string[1:]) + string[0]

reverse_string('amanda')

'adnama'

In [50]:
def vert(num):
    # Print the leading digits first (recursively), then the last digit
    if num >= 10:
        vert(num // 10)
    print(num % 10)
vert(11967)

1
1
9
6
7


Given a continuous stream of characters, find the first non-repeating character at any given point.

Variations: Find the index of the first non-repeating character at any given point, find all unique characters.

In [60]:
from collections import Counter

def uniqueChar(string):
    # Count every character, then return the first one seen exactly once
    counts = Counter(string)
    for x in string:
        if counts[x] == 1:
            return x
    return None  # no non-repeating character

    # to return the index as well, use the following instead

    #for index, x in enumerate(string):
    #    if counts[x] == 1:
    #        return x, index
    #return -1

    # for returning all non-repeating characters
    #return [k for k, v in counts.items() if v == 1]

In [61]:
uniqueChar('aaaaannnbbdhssss')

'd'
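The function above needs the whole string up front. For a true stream, where the answer is wanted after every character, one possible sketch keeps the characters seen exactly once in arrival order. The class name `FirstUniqueStream` and its `push` method are hypothetical, not part of the original notebook.

```python
from collections import OrderedDict

class FirstUniqueStream:
    """Tracks the first non-repeating character of a stream, one character at a time."""

    def __init__(self):
        self._unique = OrderedDict()  # characters seen exactly once, in arrival order
        self._repeated = set()        # characters seen more than once

    def push(self, ch):
        """Consume one character and return the current first non-repeating one (or None)."""
        if ch in self._unique:
            # Second occurrence: it is no longer a candidate
            del self._unique[ch]
            self._repeated.add(ch)
        elif ch not in self._repeated:
            self._unique[ch] = None
        return next(iter(self._unique), None)

stream = FirstUniqueStream()
answers = [stream.push(ch) for ch in 'aabc']
```

Each `push` does a constant amount of dictionary and set work, so the answer is available at any point without rescanning earlier characters.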

In [3]:
class Node:
    def __init__(self, dataval=None):
        self.dataval = dataval
        self.nextval = None

class SLinkedList:
    def __init__(self):
        self.headval = None
        

In [4]:
import math

def is_prime(num):
    if num < 2:
        return False
    # math.isqrt avoids floating-point error for large integers
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            return False
    return True


def remove_node(llist):
    # Drop primes at the head until the head is non-prime (or the list is empty)
    while llist.headval is not None and is_prime(llist.headval.dataval):
        llist.headval = llist.headval.nextval

    # Walk the rest of the list, unlinking every prime node
    prev = llist.headval
    while prev is not None and prev.nextval is not None:
        if is_prime(prev.nextval.dataval):
            prev.nextval = prev.nextval.nextval
        else:
            prev = prev.nextval

In [9]:
llist = SLinkedList()
llist.headval = Node(1)
e2, e3 = Node(2), Node(3)
llist.headval.nextval = e2
e2.nextval = e3
e3.nextval = Node(8)

In [15]:
remove_node(llist)

In [17]:
llist.headval.dataval

1

### Write a function to find all pairs with equal sums, from an unsorted array with unique numbers.

Input: \[9, 4, 3, 1, 7, 12\]

Output: \[1, 12\] & \[4, 9\], \[3, 7\] & \[1, 9\], \[4, 12\] & \[7, 9\]

In [11]:
from collections import defaultdict
from itertools import combinations

def pairWithEqualSum(A):

    # Map each sum to the list of pairs that produce it.
    # combinations() yields every unordered pair exactly once,
    # which avoids duplicate pairs of elements
    pairs_by_sum = defaultdict(list)
    for a, b in combinations(A, 2):
        pairs_by_sum[a + b].append((a, b))

    # Traverse the map and, for every sum with
    # more than one pair, print all pairs and the sum
    for total, pairs in pairs_by_sum.items():
        if len(pairs) > 1:
            print('Pairs:', ' '.join(f'({a}, {b})' for a, b in pairs), 'have sum:', total)

# Driver code, using the example input from the problem statement
pairWithEqualSum([9, 4, 3, 1, 7, 12])

Pairs: (9, 4) (1, 12) have sum: 13
Pairs: (9, 1) (3, 7) have sum: 10
Pairs: (9, 7) (4, 12) have sum: 16


## Google - Decreasing subsequences
Given an int array nums of length $n$. Split it into strictly decreasing subsequences. Output the min number of subsequences you can get by splitting.

Example 1:

Input: \[5, 2, 4, 3, 1, 6\]
Output: 3

Explanation:
You can split this array into: \[5, 2, 1\], \[4, 3\], \[6\]. And there are 3 subsequences you get.
Or you can split it into \[5, 4, 3\], \[2, 1\], \[6\]. Also 3 subsequences.
But \[5, 4, 3, 2, 1\], \[6\] is not legal because \[5, 4, 3, 2, 1\] is not a subsequence of the original array.

Example 2:

Input: \[2, 9, 12, 13, 4, 7, 6, 5, 10\]
Output: 4

Explanation: \[2\], \[9, 4\], \[12, 10\], \[13, 7, 6, 5\]

In [None]:
d = {}
for i in range(len(arr)):
    if 
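The cell above is left unfinished in the original. One possible approach, which is not necessarily the one the author had in mind, is a greedy pass with binary search: keep the last element of every subsequence built so far, and append each new number to the subsequence whose last element is the smallest one still greater than it. The name `min_decreasing_subsequences` is illustrative.

```python
import bisect

def min_decreasing_subsequences(nums):
    """Minimum number of strictly decreasing subsequences that cover nums."""
    tails = []  # last element of each subsequence, kept in ascending order
    for x in nums:
        # First position whose tail is strictly greater than x
        i = bisect.bisect_right(tails, x)
        if i == len(tails):
            tails.append(x)   # no tail is greater: start a new subsequence
        else:
            tails[i] = x      # extend that subsequence; the list stays sorted
    return len(tails)

example_1 = min_decreasing_subsequences([5, 2, 4, 3, 1, 6])
example_2 = min_decreasing_subsequences([2, 9, 12, 13, 4, 7, 6, 5, 10])
```

The result equals the length of the longest non-decreasing subsequence, and the pass runs in O(n log n).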

___
### Part A

John likes strings of numbers but he is very particular about the strings he likes. However, his rules are very complicated. He would like to be able to provide you with a rule and a string, and you should help by letting him know if the string satisfies the rule.

John's rules have a few symbols.

`N`: any number

`H`: higher than previous

`L`: lower than previous

For instance, for the rule `NH` '13' is acceptable but '33' and '31' are not. '020' works for `NHL`.

Additionally, combinations of these three characters can be in parentheses. This does not change their meaning. Parentheses won't be empty and will only contain these three characters.

After any character or parenthesized group, John can put a `1` or a `9`. `1` means apply 0 or 1 times. `9` means apply 0 or any number of times.

For instance, `NNN1` means the string can match `NN` or `NNN`. The rule `N(NN)1` means the string must match `N` or `NNN`. And the rule `N(NH)9` means the string must match `N`, `NNH`, `NNHNH`, `NNHNHNH`, `...`.


Write a function `f` to test strings. The function should take two strings, the first is the rule, and the second is the string to test. The rule will be at most 12 characters long. Also, each rule will contain at most 2 numbers (`1` or `9`). Output a boolean indicating whether the string fulfills the rule.

For instance, `f('NH9', '1234')` should return `True` and `f('NH9', '1233')` should return `False`.

How many strings will fulfill the rule `N(NL)1(NN)1`?

Create a rule using as few characters as possible that captures all alternating strings. These are strings where the second digit increases, the third digit decreases, the fourth digit increases, the fifth decreases, and so on. 8 and 14263 are examples of alternating strings.

Can there be an 11-character rule that will match only one string? Yes or no, and **justify the answer** with an example or explanation.

In [None]:
def f(rule, string):
    # Parse the rule into (unit, quantifier) tokens,
    # e.g. 'N(NH)9' -> [('N', None), ('NH', '9')]
    tokens = []
    i = 0
    while i < len(rule):
        if rule[i] == '(':
            close = rule.index(')', i)
            unit = rule[i + 1:close]
            i = close + 1
        else:
            unit = rule[i]
            i += 1
        quantifier = None
        if i < len(rule) and rule[i] in '19':
            quantifier = rule[i]
            i += 1
        tokens.append((unit, quantifier))

    def match_unit(unit, pos):
        # Match every symbol of unit starting at pos; return the new position or None
        for symbol in unit:
            if pos >= len(string):
                return None
            # H and L need a previous digit; with none available they fail (an assumption)
            if symbol == 'H' and not (pos > 0 and string[pos] > string[pos - 1]):
                return None
            if symbol == 'L' and not (pos > 0 and string[pos] < string[pos - 1]):
                return None
            pos += 1
        return pos

    def match(t, pos):
        # All tokens consumed: succeed only if the whole string was consumed too
        if t == len(tokens):
            return pos == len(string)
        unit, quantifier = tokens[t]
        # '1' and '9' both allow skipping the unit entirely
        if quantifier is not None and match(t + 1, pos):
            return True
        nxt = match_unit(unit, pos)
        if nxt is None:
            return False
        # '9' may repeat the same token; otherwise move on to the next one
        return match(t, nxt) if quantifier == '9' else match(t + 1, nxt)

    return all(ch.isdigit() for ch in string) and match(0, 0)
    

### Part B

We are creating a game, Sweets Crush. The game is played on a 4x4 board where each of the 16 slots contains one specific candy. There are three types of candy, taffies, lollipops, and jelly beans.

At the start the board is populated with the candy types T, L, and J. Imagine there is a bank of candies above the board that just repeats this pattern over and over again. Then during each iteration:

1. All groups of candies are removed at the same time.
    a. Groups are multiple adjacent candies of the same type.
    b. A candy is adjacent to the candies above, below, to the right, and to the left.
2. Add to the score the product of the sizes of the groups.
3. The remaining candies drop down, and the board is filled using candies from the bank.

The start board will be indicated using a tuple of 4 strings that represent each row.

Example input: `('TTLJ', 'LTJL', 'TTLJ', 'LJTJ')`

This represents the following board state with copies in the bank:
```
TTLJ
LTJL
TTLJ
LJTJ
```

Iteration 1, groups are removed. There is a group of 5 Ts and a group of 2 Js. So the score goes from 0 to 10.
```
  LJ
L JL
  L 
LJT 
```

The remaining candies drop down...
```
  L 
  J 
L LJ
LJTL
```

...and the board is filled with candies from the bank.
```
TTLJ
LTJJ
LJLJ
LJTL
```

Iteration 2, there is a group of 3 Ts, a group of 3 Ls, a group of 2 Js, and a group of 4 Js. The product of these sizes is 72 so the score becomes 82. The groups are removed...
```
  L 
    
  L
  TL
```

After the groups are removed, candies drop down and the spaces are filled using the state of the bank after Iteration 1. 
```
TTTJ
LTLJ
TJLL
LTTL
```

Write a function `g` to play Sweets Crush. It should take as input a tuple of four 4-character strings and a number of iterations `n`. It should return the total score after applying the number of iterations to the starting board.

If at any iteration there are no groups to remove, the final score will just be the current score.

`n` will be at least 1 and at most 100.

Write a starting board that would score 105 points in the first iteration.

In a single iteration, what is the highest possible score?

Given an empty board and the location and shape of all groups on the board, would you always be able to add candies to the board to match the provided groups? **Justify the answer.**
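A full `g` depends on exactly how the bank refills the board, which is not attempted here. The scoring step on its own can be sketched with a flood fill, assuming a group is two or more orthogonally adjacent candies of the same type and that a board with no groups scores 0. The name `score_groups` is illustrative.

```python
import math

def score_groups(board):
    """Product of the sizes of all groups on the board, or 0 when there are no groups."""
    rows, cols = len(board), len(board[0])
    seen = set()
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            # Flood fill from (r, c) over same-type orthogonal neighbours
            seen.add((r, c))
            stack = [(r, c)]
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and board[ny][nx] == board[r][c]):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if size >= 2:  # a single candy is not a group
                sizes.append(size)
    return math.prod(sizes) if sizes else 0

first_score = score_groups(('TTLJ', 'LTJL', 'TTLJ', 'LJTJ'))
second_score = score_groups(('TTLJ', 'LTJJ', 'LJLJ', 'LJTL'))
```

The two boards are the ones shown for Iteration 1 and Iteration 2 in the walkthrough, so the results can be compared against the scores stated there.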

### Part C

Let's say a `tenny` integer is one where the sum of the ith digit from the left and right is 10 for all i. For instance, 24568 is a `tenny` number because 2+8, 4+6, and 5+5 are all 10. The 1st `tenny` number is 5 and the 2nd is 19.

Write a function `h` that takes an integer n and returns the nth `tenny` number. For instance, `h(1)` should return 5 and `h(10)` should return 91.

`n` will be at least 1 and at most 1 billion.

Of all the 9 digit numbers that use each digit 1-9 once, how many are `tenny`?

The 1e18th `tenny` number has how many digits?


Let A be the set of `tenny` numbers that contain 1000 digits and one or more 5s. Let B be the set of `tenny` numbers with 1001 digits and one or more 5s. Are there more numbers in A, more numbers in B, or the same in both? **Justify the answer.**
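The definition can be checked directly, and the question about 9-digit numbers that use each digit 1-9 once is small enough to brute-force. A minimal sketch, with `is_tenny` and `count_pandigital_tenny` as illustrative names:

```python
from itertools import permutations

def is_tenny(number):
    """True if the i-th digits from the left and from the right always sum to 10."""
    digits = [int(ch) for ch in str(number)]
    return all(a + b == 10 for a, b in zip(digits, reversed(digits)))

def count_pandigital_tenny():
    """Count arrangements of the digits 1-9 (each used once) that are tenny."""
    return sum(
        1
        for perm in permutations('123456789')
        if all(int(a) + int(b) == 10 for a, b in zip(perm, reversed(perm)))
    )

pandigital_count = count_pandigital_tenny()
```

The brute-force count should agree with the counting argument: the middle digit must be 5, and the first four positions take one digit from each of the pairs (1, 9), (2, 8), (3, 7), (4, 6), giving 8 · 6 · 4 · 2 = 384 arrangements.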

In [None]:
def h(n):
    # Tenny numbers only use digits 1-9, and the first half of the digits
    # fixes the rest, so there are 9 ** (length // 2) of them per length.
    # Find the length that contains the nth tenny number.
    length, remaining = 1, n
    while remaining > 9 ** (length // 2):
        remaining -= 9 ** (length // 2)
        length += 1
    half = length // 2

    # Write the 0-based rank within this length in base 9 and
    # shift every digit by one to get the first half (digits 1-9)
    k = remaining - 1
    first = []
    for _ in range(half):
        k, d = divmod(k, 9)
        first.append(d + 1)
    first.reverse()

    # The middle digit of an odd-length tenny number must be 5;
    # the second half mirrors the first half with digits 10 - d
    middle = [5] if length % 2 else []
    second = [10 - d for d in reversed(first)]
    return int(''.join(map(str, first + middle + second)))