# Data Science Interview Questions

My approach to solving some interview questions.

## Find the Missing Number

You have an array of integers of length n spanning 0 to n with one missing. Write a function missing_number that returns the missing number in the array.

**Note:** A time complexity of O(N) is required.

**Example:**

**Input:**
```
nums = [0,1,2,4,5] 
missing_number(nums) -> 3
```

In [5]:
def missing_number(nums):
    """ Return the missing number in the array
    INPUT: list, list of numbers of length n spanning 0 to n with one missing
    OUTPUT: int, missing number
    """
    nums = set(nums)
    n = len(nums) + 1
    for number in range(n):
        if number not in nums:
            return number

In [6]:
nums = [0,1,2,4,5] 
missing_number(nums)

3
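As an alternative to the set lookup, here is a minimal sketch that uses the arithmetic-series sum instead. It assumes the input really does contain every value from 0 to n except one, with no duplicates. The name `missing_number_sum` is mine and not part of the original question.

```python
def missing_number_sum(nums):
    """Return the missing value from 0..n using the arithmetic-series sum.

    Assumes nums holds every integer from 0 to n exactly once, except one.
    """
    n = len(nums)  # the full range 0..n has n + 1 values, one is absent
    expected_total = n * (n + 1) // 2  # sum of 0 + 1 + ... + n
    return expected_total - sum(nums)


print(missing_number_sum([0, 1, 2, 4, 5]))  # 3
```

This still runs in O(N) time but needs only O(1) extra space, since no set is built.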

## Find Bigrams

Write a function called `find_bigrams` that takes a sentence or paragraph of strings and returns a list of all bigrams.

**Input:**
```
sentence = """
Have free hours and love children? 
Drive kids to school, soccer practice 
and other activities.
"""
```

**Output:**
```
def find_bigrams(sentence) ->

 [('have', 'free'),
 ('free', 'hours'),
 ('hours', 'and'),
 ('and', 'love'),
 ('love', 'children?'),
 ('children?', 'drive'),
 ('drive', 'kids'),
 ('kids', 'to'),
 ('to', 'school,'),
 ('school,', 'soccer'),
 ('soccer', 'practice'),
 ('practice', 'and'),
 ('and', 'other'),
 ('other', 'activities.')]
```

In [6]:
def find_bigrams(sentence):
    """Returns a list of all bigrams
    INPUT: str, sentence or paragraph of strings
    OUTPUT: list, list of all bigrams
    """
    
    word_list = sentence.split()
    bigram_list = []
    
    for i in range(len(word_list) - 1):
        bigram = (word_list[i].strip().lower(), word_list[i+1].strip().lower())
        bigram_list.append(bigram)
    return bigram_list

In [7]:
sentence = """
Have free hours and love children? 
Drive kids to school, soccer practice 
and other activities.
"""
find_bigrams(sentence)

[('have', 'free'),
 ('free', 'hours'),
 ('hours', 'and'),
 ('and', 'love'),
 ('love', 'children?'),
 ('children?', 'drive'),
 ('drive', 'kids'),
 ('kids', 'to'),
 ('to', 'school,'),
 ('school,', 'soccer'),
 ('soccer', 'practice'),
 ('practice', 'and'),
 ('and', 'other'),
 ('other', 'activities.')]
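The same pairing can be sketched more compactly by zipping the word list with itself shifted by one position. The name `find_bigrams_zip` is my own; the behavior should match the loop version for whitespace-separated text.

```python
def find_bigrams_zip(sentence):
    """Return a list of bigrams by pairing each word with its successor."""
    # split() with no arguments already discards surrounding whitespace,
    # so no per-word strip() is needed.
    words = sentence.lower().split()
    # zip stops at the shorter input, so the last word has no partner.
    return list(zip(words, words[1:]))


print(find_bigrams_zip("Have free hours"))
# [('have', 'free'), ('free', 'hours')]
```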

## Recurring Character

Given a string, write a function recurring_char to find its first recurring character. Return None if there is no recurring character.

Treat upper and lower case letters as distinct characters.

Assume the input string includes no spaces.

**Example 1:**

**Input:**
```
input = "interviewquery"
```
**Output:**
```
output = "i"
```
**Example 2:**

**Input:**
```
input = "interv"
```
**Output:**
```
output = None
```

In [13]:
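def recurring_char(input):
    # Return the first character that also appears later in the string.
    for i in range(len(input)):
        if input[i] in input[i+1:]:
            return input[i]
    # Only give up after every character has been checked.
    return None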

In [12]:
input = "interviewquery"
recurring_char(input)

'i'
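The solution above scans the rest of the string for every character, which is O(n²) in the worst case. A single pass with a set of already-seen characters is one possible O(n) sketch. Note that it reads "first recurring" as "the first character encountered a second time", which is an assumption about the question: for `"abba"` this version returns `'b'`, while a scan-ahead approach returns `'a'`. The name `first_repeat` is hypothetical.

```python
def first_repeat(text):
    """Return the first character seen a second time, or None."""
    seen = set()
    for char in text:
        if char in seen:
            return char  # this character already appeared earlier
        seen.add(char)
    return None  # every character was unique


print(first_repeat("interviewquery"))  # i
print(first_repeat("interv"))          # None
```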

## Merge Sorted Lists

Given two sorted lists, write a function to merge them into one sorted list.

Bonus: What’s the time complexity?

**Example:**

**Input:**
```
list1 = [1,2,5]
list2 = [2,4,6]
```
**Output:**
```
def merge_list(list1,list2) -> [1,2,2,4,5,6]
```

In [50]:
def merge_list(list1, list2):
    """Merge two sorted lists into one sorted list.
    """
    l = list1+list2
    l.sort()
    return l

In [51]:
list1 = [1,2,5]
list2 = [2,4,6]
print(merge_list(list1,list2))

[1, 2, 2, 4, 5, 6]
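For the bonus question: concatenating and sorting is O((n + m) log(n + m)) for a general comparison sort, where n and m are the list lengths. A two-pointer merge that exploits the fact that both inputs are already sorted runs in O(n + m). Below is a minimal sketch of that approach; the name `merge_sorted` is my own.

```python
def merge_sorted(list1, list2):
    """Merge two sorted lists in O(n + m) using two pointers."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element of the two lists.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # At most one of these still has elements left; append the remainder.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged


print(merge_sorted([1, 2, 5], [2, 4, 6]))  # [1, 2, 2, 4, 5, 6]
```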


## Permutation Palindrome

Given a string str, write a function perm_palindrome to determine whether there exists a permutation of str that is a palindrome.

**Example:**

**Input:**
```
str = 'carerac'
def perm_palindrome(str) -> True
```
“carerac” returns True since it can be rearranged to form “racecar” which is a palindrome.

In [53]:
from collections import Counter

def perm_palindrome(str):
    """Determine whether there exists a permutation of str that is a palindrome
    """
    c = Counter(str)
    num_odds = 0 

    for char, count in c.items():
        if count % 2 != 0:
            num_odds += 1

    return num_odds <= 1

In [54]:
str = 'carerac'
perm_palindrome(str)

True
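The counting rule (at most one character with an odd count) can be sanity-checked against a brute-force search over all permutations for a few short strings. This is only a sketch for verification, since brute force is factorial in the string length. The function names are my own.

```python
from collections import Counter
from itertools import permutations


def perm_palindrome_count(text):
    """True if at most one character occurs an odd number of times."""
    odd_counts = sum(count % 2 for count in Counter(text).values())
    return odd_counts <= 1


def perm_palindrome_brute(text):
    """True if any permutation of text reads the same in both directions."""
    return any(p == p[::-1] for p in permutations(text))


# Both approaches should agree on every sample.
for sample in ["carerac", "aab", "abc", "aabb", "ab"]:
    assert perm_palindrome_count(sample) == perm_palindrome_brute(sample)
```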

## Equivalent Index

Given a list of integers, find the index at which the sum of the left part of the list (up to and including that index) equals the sum of the right part.

If there is no index where this condition is satisfied return -1.

**Example 1:**

**Input:**
```
nums = [1, 7, 3, 5, 6]
```
**Output:**
```
equivalent_index(nums) -> 2
```
**Example 2:**

**Input:**
```
nums = [1,3,5]
```
**Output:**
```
equivalent_index(nums) -> -1
```

In [18]:
def equivalent_index(nums):
    """Find the index at which the sum of the left half of the list is equal to the right half.
    """
    
    for i in range(len(nums)):
        left = nums[:i+1]
        right = nums[i+1:] 
        if sum(left) == sum(right):
            return i
    return -1

In [19]:
nums = [1, 7, 3, 5, 6]
equivalent_index(nums)

2

In [20]:
nums = [1,3,5]
equivalent_index(nums)

-1
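The solution above recomputes both sums at every index, which is O(n²) overall. A running total is one way to get this down to O(n). The sketch below keeps the same convention as the examples, where the left part includes the element at the returned index. The name `equivalent_index_linear` is my own.

```python
def equivalent_index_linear(nums):
    """Return the index where the left sum (inclusive) equals the right sum."""
    total = sum(nums)
    left_sum = 0
    for i, value in enumerate(nums):
        left_sum += value
        # Whatever is not in the left part must be in the right part.
        if left_sum == total - left_sum:
            return i
    return -1


print(equivalent_index_linear([1, 7, 3, 5, 6]))  # 2
print(equivalent_index_linear([1, 3, 5]))        # -1
```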

## Over 100 Dollars

You’re given two dataframes: transactions and products.

The transactions dataframe contains transaction ids, product ids, and the total amount of each product sold.

The products dataframe contains product ids and prices.

Write a function to return a dataframe containing every transaction with a total value of over $100. Include the total value of the transaction as a new column in the dataframe.

In [21]:
import pandas as pd

transactions = {"transaction_id" : [1, 2, 3, 4, 5], "product_id" : [101, 102, 103, 104, 105], "amount" : [3, 5, 8, 3, 2]}

products = {"product_id" : [101, 102, 103, 104, 105], "price" : [20.00, 21.00, 15.00, 16.00, 52.00]}

df_transactions = pd.DataFrame(transactions)

df_products = pd.DataFrame(products)

In [31]:
def over100(df_transactions, df_products):
    """Return a dataframe containing every transaction with a total value of over $100
    """
    df = df_transactions.merge(df_products, on="product_id")
    df["total_value"] = df.amount * df.price
    df = df[df.total_value > 100]
    df = df.drop("price", axis=1)
    return df

In [30]:
over100(df_transactions, df_products)

| | transaction_id | product_id | amount | total_value |
|---|---|---|---|---|
| 1 | 2 | 102 | 5 | 105.0 |
| 2 | 3 | 103 | 8 | 120.0 |
| 4 | 5 | 105 | 2 | 104.0 |
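In the sample data every transaction has exactly one product row. If a transaction could span several products (an assumption, since the question does not say), the line values would need to be summed per `transaction_id` before filtering. Here is a rough sketch of that variant; the function name, the `line_value` column, and the sample data are all hypothetical.

```python
import pandas as pd


def over100_by_transaction(df_transactions, df_products):
    """Sum line values per transaction and keep totals over $100."""
    df = df_transactions.merge(df_products, on="product_id")
    df["line_value"] = df["amount"] * df["price"]
    # One row per transaction, holding the sum of its line values.
    totals = df.groupby("transaction_id", as_index=False)["line_value"].sum()
    totals = totals.rename(columns={"line_value": "total_value"})
    return totals[totals["total_value"] > 100]


# Hypothetical data: transaction 1 spans two products.
transactions = pd.DataFrame(
    {"transaction_id": [1, 1, 2], "product_id": [101, 102, 101], "amount": [3, 2, 4]}
)
products = pd.DataFrame({"product_id": [101, 102], "price": [20.0, 21.0]})

result = over100_by_transaction(transactions, products)
print(result)
```

Transaction 1 totals 3 × 20 + 2 × 21 = 102 and is kept, while transaction 2 totals 80 and is dropped.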


## Good Grades and Favorite Colors

You’re given a dataframe of students named `students_df`.

Write a function named grades_colors to select only the rows where the student’s favorite color is green or red and their grade is above 90.

In [41]:
import pandas as pd

students = {"name" : ["Tim Voss", "Nicole Johnson", "Elsa Williams", "John James", "Catherine Jones"], "age" : [19, 20, 21, 20, 23], "favorite_color" : ["red", "yellow", "green", "blue", "green"], "grade" : [91, 95, 82, 75, 93]}

students_df = pd.DataFrame(students)

In [51]:
def grades_colors(students_df):
    """Select students whose favorite color is green or red and whose grade is above 90
    """
    # isin() checks membership in the list of accepted colors.
    color_filter = students_df.favorite_color.isin(["red", "green"])
    grade_filter = students_df.grade > 90
    return students_df[color_filter & grade_filter]

In [52]:
grades_colors(students_df)

Unnamed: 0,name,age,favorite_color,grade
0,Tim Voss,19,red,91
4,Catherine Jones,23,green,93


## Generate Normal Distribution

Write a function to generate N samples from a normal distribution and plot the histogram. You may omit the plot to test your code.

In [55]:
import scipy.stats
import matplotlib.pyplot as plt

def norm_dist(N):
    dist = scipy.stats.norm(0, 1)
    samples = dist.rvs(N)
    plt.hist(samples)
    plt.xlabel(str(N)+' samples')
    plt.ylabel('Count')
    plt.show()


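Since the question allows omitting the plot for testing, one option is to separate sampling from plotting so that the samples can be inspected directly. The sketch below uses NumPy's `default_rng` generator rather than `scipy.stats`, which is my substitution. The `mean`, `std`, and `seed` parameters and the name `normal_samples` are assumptions added for reproducibility.

```python
import numpy as np


def normal_samples(n, mean=0.0, std=1.0, seed=None):
    """Return n samples drawn from a normal distribution."""
    rng = np.random.default_rng(seed)  # a seeded generator gives repeatable draws
    return rng.normal(loc=mean, scale=std, size=n)


samples = normal_samples(10_000, seed=0)
print(samples.shape)                  # (10000,)
print(samples.mean(), samples.std())  # should be close to 0 and 1
```

The returned array can then be passed to `plt.hist` when a plot is wanted.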