https://app.datacamp.com/learn/courses/practicing-coding-interview-questions-in-python

#### List methods
Let's practice list methods!

Let's imagine a situation: you went to the market and filled your baskets (basket1 and basket2) with fruits. You wanted to have one of each kind but realized that some fruits were put in both baskets.

Task 1. Your first task is to remove everything from basket2 that is already present in basket1.

Task 2. After the removal, one of the baskets may weigh more than the other (all fruit kinds weigh the same). Therefore, the second task is to transfer some fruits from the heavier basket to the lighter one so that both hold approximately the same weight/amount of fruits.

In [None]:
# Remove fruits from basket2 that are present in basket1
for item in basket1:
    if item in basket2:
        basket2.remove(item)

print('Basket 1: ' + str(basket1))
print('Basket 2: ' + str(basket2))

# Transfer fruits from basket1 to basket2
while len(basket1) > len(basket2):
    item_to_transfer = basket1.pop()
    basket2.append(item_to_transfer)

print('Basket 1: ' + str(basket1))
print('Basket 2: ' + str(basket2))
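
As a minimal sketch of both tasks, assuming two small hypothetical baskets (the fruit names are illustrative, not the exercise's data), the steps can be traced end to end. Here a list comprehension drops every duplicate occurrence, and the balancing loop stops once the sizes differ by at most one, so it works whichever basket is heavier.

```python
# Hypothetical sample data; the real baskets are provided by the exercise.
basket1 = ['apple', 'banana', 'kiwi', 'orange', 'pear']
basket2 = ['apple', 'grape', 'kiwi']

# Task 1: keep only the fruits of basket2 that are absent from basket1.
basket2 = [fruit for fruit in basket2 if fruit not in basket1]

# Task 2: move fruits from the longer list to the shorter one
# until their lengths differ by at most one.
while abs(len(basket1) - len(basket2)) > 1:
    if len(basket1) > len(basket2):
        basket2.append(basket1.pop())
    else:
        basket1.append(basket2.pop())

print(basket1)
print(basket2)
```

Because .pop() takes items from the end of the list, the fruits arrive in the other basket in reverse order.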


#### Storing data in a dictionary
The surface you see below is called a circular paraboloid:

(Figure: circular paraboloid)

It can be described by the following equation:

$$\frac{x^2}{a^2} + \frac{y^2}{a^2} = z$$

Let's set the coefficient $a$ to 1. Therefore, the radius at each cut will be equal to $\sqrt{z}$.

Your task is to create a dictionary that stores the mapping from the pair of coordinates $(x, y)$ to the $z$ coordinate (the lists storing considered ranges for $x$ and $y$ are given: range_x and range_y, respectively).

In [None]:
circ_parab = dict()

for x in range_x:
    for y in range_y:       
        # Calculate the value for z
        z = x**2 + y**2
        # Create a new key for the dictionary
        key = (x, y)
        # Create a new key-value pair      
        circ_parab[key] = z
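
The same mapping can be sketched as a dictionary comprehension. The ranges below are assumptions made only for illustration; in the exercise, range_x and range_y are predefined.

```python
# Assumed ranges; in the exercise range_x and range_y are predefined.
range_x = range(-2, 3)
range_y = range(-2, 3)

# Tuples are hashable, so (x, y) pairs can serve as dictionary keys.
circ_parab = {(x, y): x**2 + y**2 for x in range_x for y in range_y}

print(circ_parab[(1, 2)])
print(len(circ_parab))
```

Looking up a point is then a single key access with a tuple, e.g. circ_parab[(1, 2)].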


#### String indexing and concatenation
You are presented with one of the earliest known encryption techniques, the Caesar cipher. It is based on a simple shift of each letter in a message by a certain number of positions down the given alphabet. For example, given the English alphabet, a shift of 1 turns 'xyz' into 'yza', and decryption reverses the shift. Notice that 'z' becomes 'a' in this case.

Thus, encryption/decryption requires two arguments: text and an integer key denoting the shift (key = 1 for the example above).

Your task is to create an encryption function given the English alphabet stored in the alphabet string.

In [None]:
def encrypt(text, key):
  
    encrypted_text = ''

    # Fill in the blanks to create an encrypted text
    for char in text.lower():
        idx = (alphabet.index(char) + key) % len(alphabet)
        encrypted_text = encrypted_text + alphabet[idx]

    return encrypted_text

# Check the encryption function with a shift equal to 10
print(encrypt("datacamp", 10))
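
Decryption is not part of the task, but it can be sketched as encryption with the opposite shift. The block assumes that alphabet holds the 26 lowercase English letters; decrypt() is a hypothetical helper added for illustration.

```python
import string

# Assumption: the alphabet is the 26 lowercase English letters.
alphabet = string.ascii_lowercase

def encrypt(text, key):
    """Shift every letter of text by key positions, wrapping around."""
    shifted = ''
    for char in text.lower():
        idx = (alphabet.index(char) + key) % len(alphabet)
        shifted += alphabet[idx]
    return shifted

def decrypt(text, key):
    """Undo encrypt() by shifting in the opposite direction."""
    return encrypt(text, -key)

print(encrypt('xyz', 1))
print(decrypt('yza', 1))
```

Note that alphabet.index() raises a ValueError for any character outside the alphabet, such as a space or a digit, so this sketch only handles plain letters.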


#### Operations on strings
You are given the variable text storing the following string 'StRing ObJeCts haVe mANy inTEResting pROPerTies'.

Your task is to modify this string so that it reads 'string OBJECTS have MANY interesting PROPERTIES' (the words alternate between lowercase and uppercase, starting with lowercase). You will obtain this result in three steps.

In [None]:
# Create a word list from the string stored in 'text'
word_list = text.split()

# Make every other word lowercased; otherwise - uppercased
for i in range(len(word_list)):
    if (i + 1) % 2 == 0:
        word_list[i] = word_list[i].upper()
    else:
        word_list[i] = word_list[i].lower()

print(word_list)

# Join the words back and form a new string
new_text = " ".join(word_list)
print(new_text)

#### Fixing string errors in a DataFrame
You are given the heroes dataset containing the information on different comic book heroes. However, you'll need to make some refinements in order to use this dataset further.

Comparing the Eye color, Hair color, and Skin color columns, you can see that the strings in the Hair color column are capitalized, whereas those in the other two columns are lowercased.

Moreover, some rows in the Gender column contain a spelling error (Fmale instead of Female).

Your task is to make the strings in the Hair color column lowercased and to fix the spelling error in the Gender column.

In [None]:
# Make all the values in the 'Hair color' column lowercased
heroes['Hair color'] = heroes['Hair color'].str.lower()
  
# Check the values in the 'Hair color' column
print(heroes['Hair color'].value_counts())

# Substitute 'Fmale' with 'Female' in the 'Gender' column
heroes['Gender'] = heroes['Gender'].str.replace("Fmale", "Female")

# Check that there are no occurrences of 'Fmale'
print(heroes['Gender'].value_counts())

#### Write a regular expression
Let's write some regular expressions!

Your task is to create a regular expression matching a valid temperature represented either in Celsius or Fahrenheit scale (e.g. '+23.5 C', '-4 F', '0.0 C', '73.45 F') and to extract all the appearances from the given string text. Positive temperatures can be with or without the + prefix (e.g. '5 F', '+5 F'). Negative temperatures must be prefixed with -. Zero temperature can be used with a prefix or without.

The re module is already imported.

Tips:

The + symbol within the square brackets [] matches the + symbol itself (e.g. the regular expression [1a+] matches '1', 'a', or '+').
You can also apply ? to the characters within the square brackets [] to make the set optional (e.g. [ab]? matches 'a', 'b', or '').

In [None]:
# Define the pattern to search for valid temperatures
pattern = re.compile(r'[+-]?\d+\.?\d* [CF]')

# Print the temperatures out
print(re.findall(pattern, text))

# Create an object storing the matches using 'finditer()'
matches_storage = re.finditer(pattern, text)

# Loop over matches_storage and print out item properties
for match in matches_storage:
    print('matching sequence = ' + match.group())
    print('start index = ' + str(match.start()))
    print('end index = ' + str(match.end()))
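
To see the pattern at work, here is a small sketch on a hypothetical input string (the exercise supplies its own text variable).

```python
import re

# Hypothetical input; the exercise supplies its own text variable.
text = ('Yesterday it was +23.5 C in the sun and -4 F in the freezer, '
        'while the lab logged 0.0 C and 73.45 F.')

# Optional sign, digits, optional fractional part, a space, then the scale.
pattern = re.compile(r'[+-]?\d+\.?\d* [CF]')
temperatures = pattern.findall(text)
print(temperatures)
```

The pattern does not check what follows the scale letter, so a phrase such as '5 Cats' would also yield '5 C'. Appending \b to the pattern is one possible way to reject such cases, although the exercise does not ask for it.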


#### Splitting by a pattern
You are given the movies list where each element stores a movie name, its release date, and the director (e.g. "The Godfather, 1972, Francis Ford Coppola").

Let's practice some splitting with the help of regular expressions. Your task is to retrieve from each element of the list its name and the director. For example, if the element is "The Godfather, 1972, Francis Ford Coppola", the result would be:

['The Godfather', 'Francis Ford Coppola']
Eventually, this result should be modified to represent a single string, e.g.

"The Godfather, Francis Ford Coppola"

In [None]:
# Compile a regular expression
pattern = re.compile(r', \d+, ')

movies_without_year = []
for movie in movies:
    # Retrieve a movie name and its director
    split_result = re.split(pattern, movie)
    # Create a new string with a movie name and its director
    movie_and_director = ', '.join(split_result)
    # Append the resulting string to movies_without_year
    movies_without_year.append(movie_and_director)
    
for movie in movies_without_year:
    print(movie)
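
The split-and-join step can also be sketched as a single list comprehension. The movies list below is a hypothetical stand-in that follows the same "name, year, director" layout.

```python
import re

# Hypothetical list in the same "name, year, director" layout.
movies = [
    'The Godfather, 1972, Francis Ford Coppola',
    'Contact, 1997, Robert Zemeckis',
]

# Splitting on ", <digits>, " removes the year together with its separators.
pattern = re.compile(r', \d+, ')
movies_without_year = [', '.join(pattern.split(movie)) for movie in movies]
print(movies_without_year)
```

Calling pattern.split(movie) on the compiled pattern is equivalent to re.split(pattern, movie).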


#### enumerate()
Let's enumerate! Your task is, given a string, to define the function retrieve_character_indices() that creates a dictionary character_indices, where each key represents a unique character from the string and the corresponding value is a list containing the indices/positions of this letter in the string.

For example, passing the string 'ukulele' to the retrieve_character_indices() function should result in the following output: {'e': [4, 6], 'k': [1], 'l': [3, 5], 'u': [0, 2]}.

For this task, you are not allowed to use any string methods!

In [None]:
def retrieve_character_indices(string):
    character_indices = dict()
    # Define the 'for' loop
    for index, character in enumerate(string):
        # Update the dictionary if the key already exists
        if character in character_indices:
            character_indices[character].append(index)
        # Update the dictionary if the key is absent
        else:
            character_indices[character] = [index]
            
    return character_indices
  
# print(retrieve_character_indices('ukulele'))
print(retrieve_character_indices('enumerate an Iterable'))
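
As an alternative sketch, the if/else branch can be folded into one line with dict.setdefault(), which is a dictionary method and therefore still respects the "no string methods" restriction.

```python
def retrieve_character_indices(string):
    character_indices = {}
    for index, character in enumerate(string):
        # setdefault() returns the existing list or stores a new empty one.
        character_indices.setdefault(character, []).append(index)
    return character_indices

print(retrieve_character_indices('ukulele'))
```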


#### Traversing a DataFrame
Let's iterate through a DataFrame! You are given the heroes DataFrame you're already familiar with. This time, it contains only categorical data and no missing values. You have to create the following dictionary from this dataset:

- Each key is a column name.
- Each value is another dictionary:
  - Each key is a unique category from the column.
  - Each value is the number of heroes falling into this category.

Tip: a Series object is also an Iterable. It traverses through the values it stores when you put it in a for loop or pass it to list(), tuple(), or set() initializers.

In [None]:
column_counts = dict()

# Traverse through the columns in the heroes DataFrame
for column_name, series in heroes.items():
    # Retrieve the values stored in series in a list form
    values = list(series)
    category_counts = dict()  
    # Traverse through unique categories in values
    for category in set(values):
        # Count the appearance of category in values
        category_counts[category] = values.count(category)
    
    column_counts[column_name] = category_counts
    
print(column_counts)


#### Basic list comprehensions
For this task, you will have to create a bag-of-words representation of the spam email stored in the spam variable (you can explore the content using the shell). Recall that bag-of-words is simply a counter of unique words in a given text. This representation can be further used for text classification, e.g. for spam detection (given enough training examples).

We created a small auxiliary function create_word_list() to help you split a string into words, e.g. applying it to 'To infinity... and beyond!' will return ['To', 'infinity', 'and', 'beyond']

In [None]:
# Convert the text to lower case and create a word list
words = create_word_list(spam.lower())

# Create a set storing only unique words
word_set = set(words)

# Create a dictionary that counts each word in the list
tuples = [(word, words.count(word)) for word in word_set]
word_counter = dict(tuples)

# Printing words that appear more than once
for (key, value) in word_counter.items():
    if value > 1:
        print("{}: {}".format(key, value))


#### Prime number sequence
A prime number is a natural number greater than 1 that is divisible only by 1 and itself (e.g. 3, 7, 11 etc.). In particular, 1 is not a prime number.

Your task is, given a list of candidate numbers cands, to filter only prime numbers in a new list primes.

But first, you need to create a function is_prime() that returns True if the input number $n$ is prime and False otherwise. To do so, it's sufficient to test that the number is not divisible by any integer from 2 to $\sqrt{n}$.

Tip: you might need to use the % operator that calculates a remainder from a division (e.g. 8 % 3 is 2).

The math module is already imported.

In [None]:
def is_prime(n):
    # Define the initial check
    if n < 2:
        return False
    # Define the loop checking if a number is not prime
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True
    
# Filter prime numbers into the new list
primes = [num for num in cands if is_prime(num)]
print("primes = " + str(primes))
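
A compact variant of the same test can be sketched with all() and math.isqrt(), which is available from Python 3.8 and returns the integer square root without going through floating point. The candidate list is hypothetical.

```python
import math

def is_prime(n):
    if n < 2:
        return False
    # math.isqrt (Python 3.8+) returns the integer square root exactly.
    return all(n % i != 0 for i in range(2, math.isqrt(n) + 1))

# Hypothetical candidates; the exercise provides its own cands list.
cands = [1, 2, 9, 15, 17, 25, 29, 49, 97]
primes = [num for num in cands if is_prime(num)]
print(primes)
```

Perfect squares such as 9, 25 and 49 show why the upper bound must include the square root itself.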

#### Coprime number sequence
Two numbers $a$ and $b$ are coprime if their Greatest Common Divisor (GCD) is 1. GCD is the largest positive number that divides two given numbers $a$ and $b$. For example, the numbers 7 and 9 are coprime because their GCD is 1.

Given two lists list1 and list2, your task is to create a new list coprimes that contains all the coprime pairs from list1 and list2.

But first, you need to write a function for the GCD using the following algorithm:

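1. Check if $b = 0$:
   - if true, return $a$ as the GCD between $a$ and $b$;
   - if false, go to step 2.
2. Make the substitution $a \leftarrow b$ and $b \leftarrow a \bmod b$ (using the old value of $a$).
3. Go back to step 1.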

In [None]:
def gcd(a, b):
    # Define the while loop as described
    while b != 0:
        temp_a = a
        a = b
        b = temp_a % b    
    # Complete the return statement
    return a
    
# Create a list of tuples defining pairs of coprime numbers
coprimes = [(i, j) for i in list1 
                   for j in list2 if gcd(i, j) == 1]
print(coprimes)
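
The substitution step can be written without a temporary variable by using tuple assignment. The sketch below uses hypothetical input lists and compares the result with the standard library's math.gcd() as a sanity check.

```python
import math

def gcd(a, b):
    # Tuple assignment evaluates the right-hand side first,
    # so a % b still uses the old value of a.
    while b != 0:
        a, b = b, a % b
    return a

# Hypothetical inputs; list1 and list2 come from the exercise.
list1 = [7, 12, 15]
list2 = [9, 14, 25]
coprimes = [(i, j) for i in list1 for j in list2 if gcd(i, j) == 1]
print(coprimes)
```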


#### Combining iterable objects
You are given the list wlist that contains lists of different words. Your task is to create a new iterable object, where each element represents a tuple. Each tuple should contain a list from the wlist, the length of this list, and the longest word within this list. If there is ambiguity in choosing the longest word, the word with the lowest index in the considered list should be taken into account. For example, given the list

[
    ['dog', 'pigeon'],
    ['cat', 'wolf', 'seal']
]
the resulting tuples will be:
(['dog', 'pigeon'], 2, 'pigeon')
and
(['cat', 'wolf', 'seal'], 3, 'wolf')

In [None]:
# Define a function searching for the longest word
def get_longest_word(words):
    longest_word = ''
    for word in words:
        if len(word) > len(longest_word):
            longest_word = word
    return longest_word

# Create a list of the lengths of each list in wlist
lengths = [len(item) for item in wlist]

# Create a list of the longest words in each list in wlist
words = [get_longest_word(item) for item in wlist]

# Combine the resulting data into one iterable object
for item in zip(wlist, lengths, words):
    print(item)

#### Extracting tuples
In the previous exercise, you used two list comprehensions to create lists lengths and words that, respectively, refer to the lengths of the constituent lists in wlist and the longest words stored in those lists. In this exercise, you'll create them in a slightly different way. First, you'll need to put the same calculations into one list comprehension, which should result in a list of tuples. Second, apply the unzip operation to generate two distinct tuples, resembling lengths and words from the previous exercise.

The list wlist and the function get_longest_word() are already available in your workspace.

In [None]:
# Create a list of tuples with lengths and longest words
result = [
    (len(item), get_longest_word(item)) for item in wlist
]

# Unzip the result    
lengths, words = zip(*result)

for item in zip(wlist, lengths, words):
    print(item)
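
To make the unzip step concrete, here is a sketch on the two-list example from the previous exercise. The max()-based helper is an assumption used for brevity; it stands in for the loop-based get_longest_word().

```python
wlist = [['dog', 'pigeon'], ['cat', 'wolf', 'seal']]

def get_longest_word(words):
    # max() returns the first maximal item, so ties go to the lowest index.
    return max(words, key=len)

result = [(len(item), get_longest_word(item)) for item in wlist]

# zip(*result) regroups the first items and the second items of the pairs.
lengths, words = zip(*result)
print(lengths)
print(words)
print(list(zip(wlist, lengths, words)))
```

The unzip operation returns tuples rather than lists, which is why lengths and words print with parentheses.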

#### Creating a DataFrame
Your last task in this lesson is to create a DataFrame from a dictionary supplied by a zip object. You have to take each single word stored in the list wlist and calculate its length. This data should be stored in two separate tuples that are supplied to the zip() initializer. The resulting zip object should be used to construct a DataFrame where the first column will store words and the second column will store their lengths.

The module pandas is already imported for you as pd.

In [None]:
# Create a list of tuples with words and their lengths
word_lengths = [
    (item, len(item)) for items in wlist for item in items
]

# Unwrap the word_lengths
words, lengths = zip(*word_lengths)

# Create a zip object
col_names = ['word', 'length']
result = zip(col_names, [words, lengths])

# Convert the result to a dictionary and build a DataFrame
data_frame = pd.DataFrame(dict(result))
print(data_frame)


#### Shift a string
You're going to create a generator that, given a string, produces a sequence of constituent characters shifted by a specified number of positions shift. For example, the string 'sushi' produces the sequence 'h', 'i', 's', 'u', 's' when we shift by 2 positions to the right (shift = 2). When we shift by 2 positions to the left (shift = -2), the resulting sequence will be 's', 'h', 'i', 's', 'u'.

Tip: use the % operator to cycle through the valid indices. Applying it to a positive or negative number gives a non-negative remainder, which can be helpful when shifting your index.

For example, consider the following variable string = 'python', holding a string of 6 characters:

2 % 6 = 2 (thus, string[2 % 6] is t)
0 % 6 = 0 (thus, string[0 % 6] is p)
-2 % 6 = 4 (thus, string[-2 % 6] is o)

In [None]:
def shift_string(string, shift):
    len_string = len(string)
    # Loop over the indices of a string
    for idx in range(0, len_string):
        # Find which character will correspond to the index.
        yield string[(idx - shift) % len_string]
       
# Create a generator
gen = shift_string('DataCamp', 3)

# Create a new string using the generator and print it out
string_shifted = ''.join(gen)
print(string_shifted)
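
The two 'sushi' examples from the task description can be checked directly with a short sketch of the generator.

```python
def shift_string(string, shift):
    for idx in range(len(string)):
        # A positive shift moves characters to the right, so the character
        # at position idx comes from idx - shift (wrapped around by %).
        yield string[(idx - shift) % len(string)]

print(list(shift_string('sushi', 2)))
print(list(shift_string('sushi', -2)))
print(''.join(shift_string('DataCamp', 3)))
```

An empty string yields nothing, because the loop body, and with it the modulo by zero, is never reached.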


#### Throw a dice
Let's create an infinite generator! Your task is to define the simulate_dice_throws() generator. It generates the outcomes of 6-sided die tosses in the form of a dictionary out. Each key is a possible outcome (1, 2, 3, 4, 5, 6). Each value is a list: the first item is the number of realizations of an outcome, and the second is the ratio of realizations to the total number of tosses total.

Tip: use the randint() function from the random module (already imported). It generates a random integer in the specified interval (e.g. randint(1, 2) can be 1 or 2).

In [None]:
def simulate_dice_throws():
    total, out = 0, dict([(i, [0, 0]) for i in range(1, 7)])
    while True:
        # Simulate a single toss to get a new number
        num = random.randint(1, 6)
        total += 1
        # Update the number and the ratio of realizations
        out[num][0] = out[num][0] + 1
        for j in range(1, 7):
            out[j][1] = round(out[j][0]/total, 2)
        # Yield the updated dictionary
        yield out

# Create the generator and simulate 1000 tosses
dice_simulator = simulate_dice_throws()
for i in range(1, 1001):
    print(str(i) + ': ' + str(next(dice_simulator)))
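
One detail worth illustrating: the generator yields the same dictionary object on every step, so storing the yielded values directly would leave a list of references to a single, final state. The sketch below takes a finite slice with itertools.islice() and deep-copies each state; the seed value is arbitrary.

```python
import copy
import itertools
import random

def simulate_dice_throws():
    total, out = 0, {face: [0, 0] for face in range(1, 7)}
    while True:
        out[random.randint(1, 6)][0] += 1
        total += 1
        for face in out:
            out[face][1] = round(out[face][0] / total, 2)
        yield out

random.seed(0)  # arbitrary seed so the run is repeatable

# islice() takes a finite number of items from the infinite generator.
# Each state is deep-copied because the generator reuses one dictionary.
snapshots = [copy.deepcopy(state)
             for state in itertools.islice(simulate_dice_throws(), 600)]

print(snapshots[0])
print(snapshots[-1])
```

With enough tosses, each ratio in the last snapshot should drift towards 1/6.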


#### Generator comprehensions
You are given the following generator functions (you can test them in the console):

def func1(n):
    for i in range(0, n):
        yield i**2

def func2(n):
    for i in range(0, n):
        if i % 2 == 0:
            yield 2*i

def func3(n, m):
    for i in func1(n):
        for j in func2(m):
            yield ((i, j), i + j)

Note that func3() internally uses func1() and func2().

In [None]:
# Rewrite func1() as a generator comprehension
gen = (i**2 for i in range(0,10))

for item in zip(gen, func1(10)):
    print(item)


In [None]:
# Rewrite func2() as a generator comprehension
gen = (2*i for i in range(0, 20) if i % 2 == 0)

for item in zip(gen, func2(20)):
    print(item)


In [None]:
# Rewrite func3() as a generator comprehension
gen = (((i, j), i + j) for i in func1(8) for j in func2(10))

for item in zip(gen, func3(8, 10)):
    print(item)




#### Positional arguments of variable size
Let's practice positional arguments of variable size. Your task is to define the function sort_types(). It takes a variable number of positional arguments and checks if each argument is a number or a string. The checked item is inserted afterwards either in the nums or strings list. Eventually, the function returns a tuple containing these lists.

Use Python's built-in isinstance() function to check if an object is of a certain type (e.g. isinstance(1, int) returns True) or one of several types (e.g. isinstance(5.65, (int, str)) returns False).

Types to use in this task are int, float, and str.

In [None]:
# Define the function with an arbitrary number of arguments
def sort_types(*args):
    nums, strings = [], []   
    for arg in args:
        # Check if 'arg' is a number and add it to 'nums'
        if isinstance(arg, (int, float)):
            nums.append(arg)
        # Check if 'arg' is a string and add it to 'strings'
        elif isinstance(arg, str):
            strings.append(arg)
    
    return (nums, strings)
            
print(sort_types(1.57, 'car', 'hat', 4, 5, 'tree', 0.89))


#### Keyword arguments of variable size
Now let's move to keyword arguments of variable size! Your task is to define the function key_types(). It should take a variable number of keyword arguments and return a new dictionary: the keys are unique object types of arguments passed to the key_types() function and the associated values represent lists. Each list should contain argument names that follow the type defined as a key (e.g. calling the key_types(kwarg1='a', kwarg2='b', kwarg3=1) results in {<class 'int'>: ['kwarg3'], <class 'str'>: ['kwarg1', 'kwarg2']}).

To retrieve the type of an object, you need to use the type() function (e.g. type(1) is int).

In [None]:
# Define the function with an arbitrary number of arguments
def key_types(**kwargs):
    dict_type = dict()
    # Iterate over key value pairs
    for key, value in kwargs.items():
        # Update a list associated with a key
        if type(value) in dict_type:
            dict_type[type(value)].append(key)
        else:
            dict_type[type(value)] = [key]
            
    return dict_type
  
res = key_types(a=1, b=2, c=(1, 2), d=3.1, e=4.2)
print(res)


#### Combining argument types
Now you'll try to combine different argument types. Your task is to define the sort_all_types() function. It takes positional and keyword arguments of variable size, finds all the numbers and strings contained within them, and concatenates type-wise the results. Use the sort_types() function you defined before (available in the workspace). It takes a positional argument of variable size and returns a tuple containing a list of numbers and a list of strings (type sort_types? to get additional help).

Keep in mind that keyword arguments of variable size essentially represent a dictionary and the sort_types() function requires that you pass only its values.

Tip: To call the sort_types() function correctly, you'd have to recall another usage of the * symbol.

In [None]:
# Define the arguments passed to the function
def sort_all_types(*args, **kwargs):

    # Find all the numbers and strings in the 1st argument
    nums1, strings1 = sort_types(*args)
    
    # Find all the numbers and strings in the 2nd argument
    nums2, strings2 = sort_types(*kwargs.values())
    
    return (nums1 + nums2, strings1 + strings2)
  
res = sort_all_types(
    1, 2.0, 'dog', 5.1, num1 = 0.0, num2 = 5, str1 = 'cat'
)
print(res)
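
The two uses of the * and ** symbols, collecting arguments in a definition and unpacking them at a call site, can be sketched together. The positional and named containers below are hypothetical, and sort_types() is rewritten compactly so that the block is self-contained.

```python
def sort_types(*args):
    nums = [arg for arg in args if isinstance(arg, (int, float))]
    strings = [arg for arg in args if isinstance(arg, str)]
    return nums, strings

def sort_all_types(*args, **kwargs):
    # Inside the function, args is a tuple and kwargs is a dictionary.
    nums1, strings1 = sort_types(*args)
    nums2, strings2 = sort_types(*kwargs.values())
    return nums1 + nums2, strings1 + strings2

# Hypothetical inputs, unpacked at the call site with * and **.
positional = [1, 2.0, 'dog', 5.1]
named = {'num1': 0.0, 'num2': 5, 'str1': 'cat'}
print(sort_all_types(*positional, **named))
```

Passing kwargs.values() without the leading * would hand sort_types() a single view object, which is neither a number nor a string, so both result lists would come back empty.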


#### Define lambda expressions
Let's write some lambda expressions! You will be given three tasks: each will require you to define a lambda expression taking some values as arguments and using them to calculate a specific result.

In [None]:
# Take x and return x squared if x > 0 and 0, otherwise
squared_no_negatives = lambda x: x**2 if x>0 else 0 
print(squared_no_negatives(2.0))
print(squared_no_negatives(-1))


In [None]:
# Take a list of integers nums and leave only even numbers
get_even = lambda nums: [n for n in nums if n % 2 == 0]
print(get_even([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))


In [None]:
# Take strings s1, s2 and list their common characters
common_chars = lambda s1, s2: set(s1).intersection(set(s2))
print(common_chars('pasta', 'pizza'))


#### Converting functions to lambda expressions
Convert these three normally defined functions into lambda expressions:

##### Returns the bigger of two numbers
def func1(x, y):
    if x >= y:
        return x

    return y
##### Returns a dictionary counting characters in a string
def func2(s):
    d = dict()
    for c in set(s):
        d[c] = s.count(c)

    return d
##### Returns the square root of a sum of squared numbers
def func3(*nums):
    squared_nums = [n**2 for n in nums]
    sum_squared_nums = sum(squared_nums)

    return math.sqrt(sum_squared_nums)

In [None]:
# Convert func1() to a lambda expression
lambda1 = lambda x,y: x if x>y else y
print(str(func1(5, 4)) + ', ' + str(lambda1(5, 4)))
print(str(func1(4, 5)) + ', ' + str(lambda1(4, 5)))


In [None]:
# Convert func2() to a lambda expression
lambda2 = lambda s: dict([(c, s.count(c)) for c in set(s)])
print(func2('DataCamp'))
print(lambda2('DataCamp'))


In [None]:
# Convert func3() to a lambda expression
lambda3 = lambda *nums: math.sqrt(sum(n**2 for n in nums))
print(str(func3(3, 4)) + ', ' + str(lambda3(3, 4)))
print(str(func3(3, 4, 5)) + ', ' + str(lambda3(3, 4, 5)))


#### Using a lambda expression as an argument
Let's pass lambda expressions as arguments to functions. You will deal with the list .sort() method. By default, it sorts numbers in increasing order. Characters and strings are sorted alphabetically. The method can be called as .sort(key=function). Here, key maps each item in the list to a sortable object (e.g. a number or a character), and the items are then ordered the way those sortable objects are.

Your task is to define different ways to sort the list words using the key argument with a lambda expression.

In [None]:
# Sort words by the string length
words.sort(key=lambda s: len(s))
print(words)


In [None]:
# Sort words by the last character in a string
words.sort(key=lambda s: s[-1])
print(words)


In [None]:
# Sort words by the total amount of certain characters
words.sort(key=lambda s: s.count('a') + s.count('b') + s.count('c'))
print(words)
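
The three orderings can be compared side by side with sorted(), which accepts the same key argument but returns a new list instead of modifying words in place. The word list is hypothetical.

```python
# Hypothetical word list; the exercise provides its own words.
words = ['cabbage', 'fig', 'banana', 'kiwi']

# key=len is equivalent to key=lambda s: len(s).
by_length = sorted(words, key=len)
by_last_char = sorted(words, key=lambda s: s[-1])
by_abc_count = sorted(words, key=lambda s: sum(s.count(c) for c in 'abc'))

print(by_length)
print(by_last_char)
print(by_abc_count)
```

Python's sort is stable, so words with equal keys (here 'fig' and 'kiwi', which contain none of the counted characters) keep their original relative order.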


#### The map() function
Let's do some mapping!

Do you remember how zip() works? It merges given Iterables so that items with the same index fall into the same tuple. Moreover, the output is restricted by the shortest Iterable.

Your task is to define your own my_zip() function with *args depicting a variable number of Iterables, e.g. lists, strings, tuples etc. Rather than a zip object, my_zip() should already return a list of tuples.

Comment: the items in args should first be checked to be Iterables, but we omit this check for simplicity.

In [None]:
def my_zip(*args):
    
    # Retrieve Iterable lengths and find the minimal length
    lengths = list(map(len, args))
    min_length = min(lengths)

    tuple_list = []
    for i in range(0, min_length):
        # Map the elements in args with the same index i
        mapping = map(lambda x: x[i], args)
        # Convert the mapping and append it to tuple_list
        tuple_list.append(tuple(mapping))
        
    return tuple_list

result = my_zip([1, 2, 3], ['a', 'b', 'c', 'd'], 'DataCamp')
print(result)
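
A condensed sketch of the same idea can be checked against the built-in zip(). It relies on len() and indexing, so it assumes that every argument is a sequence (list, tuple, string) rather than an arbitrary Iterable such as a generator.

```python
def my_zip(*args):
    min_length = min(map(len, args))
    # One tuple per index, built by mapping each sequence to its i-th item.
    return [tuple(map(lambda seq: seq[i], args)) for i in range(min_length)]

inputs = ([1, 2, 3], ['a', 'b', 'c', 'd'], 'DataCamp')
print(my_zip(*inputs))
print(my_zip(*inputs) == list(zip(*inputs)))
```

Unlike zip(), this version raises a ValueError when called with no arguments, because min() cannot handle an empty sequence.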


#### The filter() function
Let's do some filtering! You will be given three corresponding tasks you have to complete. Use lambda expressions! And remember: the filter() function keeps all the elements that are mapped to the True value.

The variables nums, string and spells are available in your workspace.

In [None]:
# Exclude all the numbers from nums divisible by 3 or 5
print(nums)
fnums = filter(lambda x: x % 3 != 0 and x % 5 != 0, nums)
print(list(fnums))


In [None]:
# Return the string without its vowels
print(string)
vowels = ['a','e','i','o','u']
fstring = filter(lambda x: x.lower() not in vowels, string)
print(''.join(fstring))


In [None]:
# Filter all the spells in spells with more than two 'a's
print(spells)
fspells = filter(lambda x: x.count('a') > 2, spells)
print(list(fspells))

#### The reduce() function
Now, it is time for some reduction! As before you'll be given three tasks to complete. Use lambda expressions!

The necessary functions from the functools module are already imported for you.

In [None]:
# Reverse a string using reduce()
string = 'DataCamp'
inv_string = reduce(lambda x, y: y + x, string)
print('Inverted string = ' + inv_string) 

In [None]:
# Find common items shared among all the sets in sets
sets = [{1, 4, 8, 9}, {2, 4, 6, 9, 10, 8}, {9, 0, 1, 2, 4}]
common_items = reduce(lambda x,y: x.intersection(y), sets)
print('common items = ' + str(common_items))


In [None]:
# Convert a number sequence into a single number
nums = [5, 6, 0, 1]
num = int(reduce(lambda x, y: str(x) + str(y), nums))
print(str(nums) + ' is converted to ' + str(num))
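
As an alternative sketch that avoids string conversion, the digits can be folded arithmetically. This assumes that every item in nums is a single non-negative digit.

```python
from functools import reduce

nums = [5, 6, 0, 1]

# Each step shifts the accumulated value one decimal place to the left
# and adds the next digit: ((5 * 10 + 6) * 10 + 0) * 10 + 1.
num = reduce(lambda acc, digit: 10 * acc + digit, nums)
print(num)
```

The result is an integer, so it can be used in further arithmetic without conversion.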


#### Calculate an average value
We all know how to calculate an average value iteratively:

def average(nums):

    result = 0

    for num in nums:
        result += num

    return result/len(nums)
Could you provide a recursive solution? A formula for updating an average value given a new input might be handy:

$$\bar{x} \leftarrow \frac{x_i + (n - 1)\,\bar{x}}{n}$$

Here, $\bar{x}$ stands for an average value, $x_i$ is a new supplied value which is used to update the average, and $n$ corresponds to the recursive call number (excluding the initial call to the function).

In [None]:
# Calculate an average value of the sequence of numbers
def average(nums):
  
    # Base case
    if len(nums) == 1:  
        return nums[0]
    
    # Recursive call
    n = len(nums)
    return (nums[0] + (n - 1) * average(nums[1:])) / n  

# Testing the function
print(average([1, 2, 3, 4, 5]))
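
To see how the update rule unwinds, the sketch below repeats the recursive function and compares it with statistics.mean() from the standard library. For [1, 2, 3, 4, 5] the intermediate averages of the tails are 5, 4.5, 4.0 and 3.5 before the final result.

```python
from statistics import mean

def average(nums):
    # Base case: the average of a single number is the number itself.
    if len(nums) == 1:
        return nums[0]
    n = len(nums)
    # The average of the remaining items is weighted by their count, n - 1.
    return (nums[0] + (n - 1) * average(nums[1:])) / n

print(average([1, 2, 3, 4, 5]))
print(mean([1, 2, 3, 4, 5]))
```

This sketch assumes a non-empty list: with an empty list the base case is never reached, and the recursion ends in a RecursionError.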

#### Approximate Pi with recursion
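The number $\pi$ can be computed by the following formula:

$$\pi = 4 \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} = 4 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots \right)$$

Your task is to write a recursive function to approximate $\pi$ using the formula defined above (the approximation means that instead of infinity $\infty$, the sequence considers only a certain number of elements $n$).

Here are examples of $\pi$ for some of the values of $n$: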


In [None]:
# Write an expression to get the k-th element of the series 
get_elmnt = lambda k: ((-1)**k)/(2*k+1)

def calc_pi(n):
    curr_elmnt = get_elmnt(n)
    
    # Define the base case 
    if n == 0:
        return 4
      
    # Make the recursive call
    return 4 * curr_elmnt + calc_pi(n-1)
  
# Compare the approximated Pi value to the theoretical one
print("approx = {}, theor = {}".format(calc_pi(500), math.pi))
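
The series converges slowly, which a short sketch can make visible by printing the approximation and its absolute error for a few values of n (chosen arbitrarily here).

```python
import math

def calc_pi(n):
    # Base case: the k = 0 term of the series contributes 4 * 1.
    if n == 0:
        return 4.0
    # Add 4 times the n-th term to the sum of all previous terms.
    return 4 * (-1) ** n / (2 * n + 1) + calc_pi(n - 1)

for n in (0, 1, 2, 10, 100):
    approx = calc_pi(n)
    print(n, approx, abs(approx - math.pi))
```

Because each call adds one stack frame, n is also bounded by Python's recursion limit (1000 by default), so an iterative sum would be the safer choice for large n.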

#### Accessing subarrays
Let's access elements in NumPy arrays! Your task is to convert a square two-dimensional array square of size size to a list created by following a spiral pattern:

(Figure: traversing the matrix in a spiral pattern)

Rather than simply accessing certain slices, you will define a more general solution using a for loop (the solution should work for all the square two-dimensional arrays of odd size).

The module numpy is already imported as np.

You will need the reversed() function, which reverses an Iterable.

In [None]:
spiral = []

for i in range(0, size):
    # Convert each part marked by a red arrow to a list
    spiral += list(square[i, i:size-i])
    # Convert each part marked by a green arrow to a list
    spiral += list(square[i+1:size-i, size-i-1])
    # Convert each part marked by a blue arrow to a list
    spiral += list(reversed(square[size-i-1, i:size-i-1]))
    # Convert each part marked by a magenta arrow to a list
    spiral += list(reversed(square[i+1:size-i-1, i]))
        
print(spiral)

#### Operations with NumPy arrays
The following blocks of code create new lists given input lists input_list1, input_list2, input_list3 (you can check their values in the console). If you had analogous NumPy arrays with the same values input_array1, input_array2, input_array3 (you can check their values in the console), how would you create similar output as NumPy arrays using your knowledge of broadcasting, accessing elements in NumPy arrays, and performing element-wise operations?

Block 1

list(map(lambda x: [5*i for i in x], input_list1))
Block 2

list(filter(lambda x: x % 2 == 0, input_list2))
Block 3

[[i*i for i in j] for j in input_list3]

In [None]:
# Substitute the code in the block 1 given the input_array1
output_array1 = 5 * input_array1
print(list(map(lambda x: [5*i for i in x], input_list1)))
print(output_array1)

In [None]:
# Substitute the code in the block 2 given the input_array2
output_array2 = input_array2[input_array2 % 2 == 0]
print(list(filter(lambda x: x % 2 == 0, input_list2)))
print(output_array2)

In [None]:
# Substitute the code in the block 3 given the input_array3
output_array3 = input_array3 * input_array3
print([[i*i for i in j] for j in input_list3])
print(output_array3)

#### Simple use of .apply()
Let's get some hands-on experience with .apply()!

You are given the full scores dataset containing students' performance as well as their background information.

Your task is to define the prevalence() function and apply it to the groups_to_consider columns of the scores DataFrame. This function should retrieve the most prevalent group/category for a given column (e.g. if the most prevalent category in the lunch column is standard, then prevalence() should return standard).

The reduce() function from the functools module is already imported.

Tip: a pd.Series is an iterable object. Therefore, you can use standard operations such as list() and set() on it.

In [None]:
def prevalence(series):
    vals = list(series)
    # Create a tuple list with unique items and their counts
    itms = [(x, vals.count(x)) for x in set(series)]
    # Extract a tuple with the highest counts using reduce()
    res = reduce(lambda x, y: x if x[1] > y[1] else y, itms)
    # Return the item with the highest counts
    return res[0]

# Apply the prevalence function on the scores DataFrame
result = scores[groups_to_consider].apply(prevalence)
print(result)
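
To see what .apply() does here, the function can be tried on a tiny DataFrame. The data below is a hypothetical stand-in for scores, and the comparison with value_counts().idxmax() is an added cross-check rather than part of the exercise.

```python
from functools import reduce

import pandas as pd

def prevalence(series):
    vals = list(series)
    # Pair every unique item with its number of occurrences
    itms = [(x, vals.count(x)) for x in set(series)]
    # Keep the pair with the highest count
    res = reduce(lambda x, y: x if x[1] > y[1] else y, itms)
    return res[0]

# Hypothetical sample (assumed values, not the real scores dataset)
sample = pd.DataFrame({
    'lunch': ['standard', 'standard', 'free/reduced', 'standard'],
    'gender': ['female', 'male', 'female', 'female'],
})

# .apply() calls prevalence() once per column and collects the results
result = sample.apply(prevalence)
# Cross-check with a built-in pandas approach
builtin = sample.apply(lambda s: s.value_counts().idxmax())

print(result)
print(list(result) == list(builtin))
```

The sample has no ties. When two categories share the highest count, the two approaches are not guaranteed to pick the same one.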

#### Additional arguments
Let's use additional arguments in the .apply() method!

Your task is to create two new columns in scores:

mean is the row-wise mean value of the math score, reading score and writing score
rank defines how high the mean score is:
'high' if the mean value > 90
'medium' if the mean value > 60 but <= 90
'low' if the mean value <= 60
To accomplish this task, you'll need to define the function rank that, given a series, returns a list with two values: the mean of the series and a string defined by the aforementioned rule.

The module numpy is already imported for you as np.

In [None]:
def rank(series):
    # Calculate the mean of the input series
    mean = np.mean(series)
    # Return the mean and its rank as a list
    if mean > 90:
        return [mean, 'high']
    if mean > 60:
        return [mean, 'medium']
    return [mean, 'low']

# Insert the output of rank() into new columns of scores
cols = ['math score', 'reading score', 'writing score']
scores[['mean', 'rank']] = scores[cols].apply(rank, axis=1,
                                              result_type='expand')
print(scores[['mean', 'rank']].head())
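
The effect of result_type='expand' is easier to see on a few rows. The sketch below uses a hypothetical three-row DataFrame in place of scores and compares the output with and without the argument.

```python
import numpy as np
import pandas as pd

def rank(series):
    mean = np.mean(series)
    if mean > 90:
        return [mean, 'high']
    if mean > 60:
        return [mean, 'medium']
    return [mean, 'low']

# Hypothetical sample (assumed values, not the real scores dataset)
sample = pd.DataFrame({
    'math score': [95, 70, 40],
    'reading score': [92, 65, 55],
    'writing score': [98, 60, 50],
})
cols = ['math score', 'reading score', 'writing score']

# Without result_type, each row produces a single list object
as_lists = sample[cols].apply(rank, axis=1)
# With result_type='expand', each list item becomes its own column
expanded = sample[cols].apply(rank, axis=1, result_type='expand')
expanded.columns = ['mean', 'rank']

print(as_lists)
print(expanded)
```

The expanded form is what makes the two-column assignment scores[['mean', 'rank']] = ... possible.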

#### Functions with additional arguments
Let's add some arguments to the function definition!

Numeric data in scores represent students' performance scaled between 0 and 100. Your task is to rescale this data to an arbitrary range between low and high. Rescaling should be done in a linear fashion, i.e. for any data point $x$ in a column:

$$x_{new} = x \cdot \frac{high - low}{100} + low$$

To do rescaling, you'll have to define the function rescale(). Remember, the operation written above can be applied to Series directly. After defining the function, you'll have to apply it to the specified columns of scores.

In [None]:
def rescale(series, low, high):
    # Define the expression to rescale input series
    return series * (high - low)/100 + low

# Rescale the data in cols to lie between 1 and 10
cols = ['math score', 'reading score', 'writing score'] 
scores[cols] = scores[cols].apply(rescale, args=[1, 10])
print(scores[cols].head())

In [None]:
# Redefine the function to accept keyword arguments
def rescale(series, low=0, high=100):
    return series * (high - low)/100 + low

# Rescale the data in cols to lie between 1 and 10
cols = ['math score', 'reading score', 'writing score']
scores[cols] = scores[cols].apply(rescale, low=1, high=10)
print(scores[cols].head())
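
Both ways of passing the extra arguments should give the same result. The following sketch checks this on hypothetical data; the column values are assumptions chosen so the arithmetic is easy to follow.

```python
import pandas as pd

def rescale(series, low=0, high=100):
    # Linear map from [0, 100] to [low, high]
    return series * (high - low) / 100 + low

# Hypothetical sample (assumed values, not the real scores dataset)
sample = pd.DataFrame({
    'math score': [0, 50, 100],
    'reading score': [20, 60, 80],
})

# Extra positional arguments go through args ...
by_position = sample.apply(rescale, args=(1, 10))
# ... while keyword arguments are forwarded to the function as-is
by_keyword = sample.apply(rescale, low=1, high=10)

print(by_position)
print(by_position.equals(by_keyword))
```

A score of 0 maps to low, 100 maps to high, and 50 lands halfway between them at 5.5.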

#### Standard DataFrame methods
You are given the diabetes dataset, which stores information on female patients tested for diabetes. You will focus on blood glucose levels and the test results. Subjects who test positive usually have higher blood glucose levels after the so-called glucose tolerance test. Your task is to investigate whether this holds for this specific dataset.

The plasma glucose column corresponds to the glucose levels. The test result column corresponds to the diabetes test results.

You must use standard DataFrame methods (the numpy module is not imported for you).

In [None]:
# Load the data from the diabetes.csv file
diabetes = pd.read_csv('diabetes.csv')
# .info() prints its summary itself and returns None, so no print() is needed
diabetes.info()

# Calculate the mean glucose level in the entire dataset
print(diabetes['plasma glucose'].mean())

# Group the data according to the diabetes test results
diabetes_grouped = diabetes.groupby('test result')

# Calculate the mean glucose levels per group
print(diabetes_grouped['plasma glucose'].mean())
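
The comparison between the overall mean and the per-group means can be sketched on a handful of rows. The values and the 'negative'/'positive' labels below are hypothetical, since the actual encoding of the test result column is not shown here.

```python
import pandas as pd

# Hypothetical sample (assumed values and labels, not the real diabetes data)
diabetes_sample = pd.DataFrame({
    'plasma glucose': [100, 120, 160, 180],
    'test result': ['negative', 'negative', 'positive', 'positive'],
})

# Mean over all rows
overall_mean = diabetes_sample['plasma glucose'].mean()

# Mean within each test-result group
group_means = diabetes_sample.groupby('test result')['plasma glucose'].mean()

print(overall_mean)
print(group_means)
```

If the group mean for positive results lies above the overall mean and the negative group lies below it, as in this sample, the data is consistent with the stated expectation.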

#### BMI of villains
Let's return to the heroes dataset containing the information on different comic book heroes. We added a bmi column to the dataset calculated as Weight divided by (Height/100)**2. This index helps define whether an individual has weight problems.

Your task is to find the mean and standard deviation of the BMI index for each combination of the character's 'Alignment' and the 'Publisher' the character belongs to. However, you'll need to consider only those groups that have more than 10 valid observations of the BMI index.

Tip: use .count() to calculate the number of valid observations.

In [None]:
import numpy as np

# Group the data by two factors specified in the context
groups = heroes.groupby(['Publisher', 'Alignment'])

# Filter groups having more than 10 valid bmi observations
fheroes = groups.filter(lambda x: x['bmi'].count() > 10)

# Group the filtered data again by the same factors
fgroups = fheroes.groupby(['Publisher', 'Alignment'])

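# Calculate the mean and standard deviation of the BMI index
# (string names use the pandas implementations directly; recent pandas
# versions emit a FutureWarning when np.mean/np.std are passed instead)
result = fgroups['bmi'].agg(['mean', 'std'])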
print(result)
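
The interplay of .count(), .filter() and .agg() can be traced on a small example. The data is hypothetical, and the threshold is lowered from 10 to 1 so that the tiny sample still contains one group that passes and one that fails.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (assumed values, not the real heroes dataset)
heroes_sample = pd.DataFrame({
    'Publisher': ['A', 'A', 'A', 'B', 'B'],
    'Alignment': ['good', 'good', 'good', 'bad', 'bad'],
    'bmi': [20.0, 22.0, np.nan, 30.0, np.nan],
})

groups = heroes_sample.groupby(['Publisher', 'Alignment'])

# .count() ignores NaNs, so it returns the number of valid observations
valid_counts = groups['bmi'].count()

# .filter() keeps all rows of the groups for which the condition is True
# (threshold of 1 instead of 10 because the sample is tiny)
kept = groups.filter(lambda g: g['bmi'].count() > 1)

# Aggregate the surviving groups
result = kept.groupby(['Publisher', 'Alignment'])['bmi'].agg(['mean', 'std'])

print(valid_counts)
print(kept)
print(result)
```

Note that .filter() returns the rows of the retained groups unchanged, including the rows whose bmi is NaN. That is why the data has to be grouped again before aggregating.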

#### NaN value imputation
Let's try to impute some values using the .transform() method. In the previous task you created a DataFrame fheroes from which all the groups with an insufficient number of bmi observations were removed. The bmi column still has a lot of missing values (NaNs), though. Given two copies of the fheroes DataFrame (imp_globmean and imp_grpmean), your task is to impute the NaNs in the bmi column with the overall mean value and with the mean value per group defined by the Publisher and Alignment factors, respectively.

Tip: pandas Series and DataFrames have a .fillna() method which substitutes all the encountered NaNs with a value specified as an argument. (NumPy arrays have no such method.)

In [None]:
# Define a lambda function that imputes NaN values in series
impute = lambda series: series.fillna(np.mean(series))

# Impute NaNs in the bmi column of imp_globmean
imp_globmean['bmi'] = imp_globmean['bmi'].transform(impute)
print("Global mean = " + str(fheroes['bmi'].mean()) + "\n")

groups = imp_grpmean.groupby(['Publisher', 'Alignment'])

# Impute NaNs in the bmi column of imp_grpmean
imp_grpmean['bmi'] = groups['bmi'].transform(impute)
print(groups['bmi'].mean())
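
The difference between the two imputation strategies shows up clearly on a small example. The sketch below uses hypothetical data with a single grouping column, and it calls the function directly on the ungrouped column instead of going through .transform().

```python
import numpy as np
import pandas as pd

# Hypothetical sample (assumed values, not the real heroes dataset)
sample = pd.DataFrame({
    'Publisher': ['A', 'A', 'A', 'B', 'B', 'B'],
    'bmi': [20.0, 24.0, np.nan, 30.0, np.nan, 40.0],
})

# Series.mean() skips NaNs, so the mean is taken over valid values only
impute = lambda series: series.fillna(series.mean())

# Global imputation: every NaN receives the overall mean
global_imputed = impute(sample['bmi'])

# Group-wise imputation: .transform() calls impute() once per group,
# so every NaN receives the mean of its own group
group_imputed = sample.groupby('Publisher')['bmi'].transform(impute)

print(global_imputed.tolist())
print(group_imputed.tolist())
```

The overall mean of the valid values is 28.5, while the group means are 22.0 for A and 35.0 for B, so the filled-in values differ between the two strategies.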

#### Plot a histogram
Let's further investigate the retinol dataset. Your task now is to create a histogram of the plasma retinol feature.

In [None]:
# Plot a simple histogram of the plasma retinol feature
plt.hist(retinol['plasma retinol'])
plt.show()

In [None]:
# Redefine the histogram to have 20 bins
plt.hist(retinol['plasma retinol'], bins=20)
plt.show()

In [None]:
plt.hist(retinol['plasma retinol'], bins=20)

# Add a title to the plot
plt.title('Distribution of Plasma Retinol')
plt.show()

In [None]:
plt.hist(retinol['plasma retinol'], bins=20)
plt.title('Histogram of Plasma Retinol')

# Add other missing parts to the plot
plt.xlabel('Plasma Retinol')
plt.ylabel('Counts')
plt.show()
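
To inspect what the bins argument changes without looking at the figure, the bin counts can be computed directly with np.histogram(). This is only a sketch: the values are randomly generated stand-ins for retinol['plasma retinol'], and the distribution parameters are arbitrary assumptions.

```python
import numpy as np

# Hypothetical values standing in for the plasma retinol column
rng = np.random.default_rng(0)
values = rng.normal(loc=600, scale=200, size=300)

# Split the value range into 20 equal-width bins and count the points in each
counts, edges = np.histogram(values, bins=20)

# 20 bins are delimited by 21 edges
print(len(counts), len(edges))
# Every observation falls into exactly one bin
print(counts.sum())
# All bins have the same width
print(np.allclose(np.diff(edges), (values.max() - values.min()) / 20))
```

More bins give a finer picture of the distribution, but each bin then holds fewer observations, which makes the bars noisier.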

#### Creating boxplots
Let's get back to our heroes dataset. As we previously discovered, the BMI index is on average much higher for villains than for good characters (taking into account only the Marvel and DC publishers). Your task is to plot the corresponding distributions of BMI indices using boxplots.

Tip: to select the rows of a DataFrame for which a specific column satisfies a certain condition, use the expression dataframe[condition on column_name] (e.g. heroes[heroes['Alignment'] == 'good'] selects the rows that have a 'good' Alignment in the heroes dataset).

In [None]:
import seaborn as sns

# Create a boxplot of BMI indices for 'good' and 'bad' sides
sns.boxplot(x='Alignment', y='bmi', data=heroes)
plt.show()

In [None]:
import seaborn as sns

# Select rows from heroes for which the BMI index < 1000
heroes_filtered = heroes[heroes['bmi'] < 1000]

In [None]:
import seaborn as sns

# Select rows from 'heroes' for which the BMI index < 1000
heroes_filtered = heroes[heroes['bmi'] < 1000]

# Create a new boxplot of BMI indices
sns.boxplot(x='Alignment', y='bmi', data=heroes_filtered)
plt.show()
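
The row selection used above relies on a boolean mask. The following sketch shows the mask on hypothetical data with one extreme value and one missing value. The per-group median is added as a numeric counterpart to the center line of each box.

```python
import numpy as np
import pandas as pd

# Hypothetical sample (assumed values, not the real heroes dataset)
heroes_sample = pd.DataFrame({
    'Alignment': ['good', 'good', 'bad', 'bad', 'bad'],
    'bmi': [22.0, 25.0, 30.0, 5000.0, np.nan],
})

# The comparison yields one True/False value per row;
# comparisons with NaN are always False
mask = heroes_sample['bmi'] < 1000

# Indexing with the mask keeps only the rows marked True
heroes_filtered = heroes_sample[mask]

# The median is the value drawn as the center line of each box
medians = heroes_filtered.groupby('Alignment')['bmi'].median()

print(mask.tolist())
print(heroes_filtered)
print(medians)
```

The filter removes the extreme value, and as a side effect it also removes the row with a missing bmi.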