# Problem 1
Below is a daily table of store accounts at Shopify (an online e-commerce and retail platform).

The table is called `store_account` and has the following columns:

| Column Name | Data Type | Description |
|---|---|---|
| store_id | integer | A unique Shopify store ID |
| date | string | Date |
| status | string | Possible values: 'open', 'closed', 'fraud' |
| revenue | double | Amount of spend in USD |

Here's some more information about the table:

- The granularity of the table is one row per `store_id` per day.
- Assume "closed" and "fraud" are permanent labels.
- Active = daily revenue > 0.
- Once Shopify labels an account as fraudulent, it can no longer sell products.
- Every day in the table includes every `store_id` that has ever existed on Shopify.
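The assumptions above can be checked mechanically before answering. The sketch below is a hypothetical helper (`check_store_account` is not part of the original problem) that assumes the table is loaded as a pandas DataFrame with the columns listed above, and that `date` strings are ISO formatted (YYYY-MM-DD) so that sorting them as strings is chronological. It raises `ValueError` if any stated assumption is violated.

```python
import pandas as pd


def check_store_account(df: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if store_account breaks a stated assumption."""
    # granularity: at most one row per (store_id, date)
    if df.duplicated(subset=["store_id", "date"]).any():
        raise ValueError("more than one row for some store_id and date")

    # status must be one of the documented values
    if not df["status"].isin(["open", "closed", "fraud"]).all():
        raise ValueError("unexpected status value")

    # every day lists every store_id that appears anywhere in the table
    n_stores = df["store_id"].nunique()
    if not (df.groupby("date")["store_id"].nunique() == n_stores).all():
        raise ValueError("some day is missing a store_id")

    # 'closed' and 'fraud' are permanent: once a store has one of these labels,
    # every later row for that store must keep the same label
    ordered = df.sort_values(["store_id", "date"])  # assumes ISO date strings
    for store_id, rows in ordered.groupby("store_id"):
        statuses = rows["status"].tolist()
        for earlier, later in zip(statuses, statuses[1:]):
            if earlier in ("closed", "fraud") and later != earlier:
                raise ValueError(f"store {store_id} left status {earlier!r}")
```

Calling it on the loaded table before computing the answer makes a silent violation (for example, a store reverting from 'fraud' to 'open') fail loudly instead of skewing the percentages.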

Given the above, what percent of active stores were fraudulent by day? 

## Thoughts

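Filter the table to active stores (revenue > 0), group them by date, and for each day compute the fraction of active stores whose status is 'fraud'. Multiplying that fraction by 100 gives the percentage.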

## Answer

```python
# assume that the table is a pandas DataFrame called accounts

# keep only the active accounts (daily revenue > 0); .copy() avoids
# modifying a slice of the original DataFrame
active_accounts = accounts.loc[accounts['revenue'] > 0].copy()

# create a boolean column that is True when the store is labeled fraud
active_accounts['fraud'] = active_accounts['status'] == 'fraud'

# group by the date; the mean of a boolean column is the share of True values
percentage = active_accounts.groupby('date')['fraud'].mean() * 100
```
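To see the calculation end to end, here is a self-contained run of the filter/groupby approach from the answer on a tiny `accounts` DataFrame. The data are made up purely for illustration: three stores over two days, where store 3 is labeled fraud on the first day (with some revenue that day) and has no revenue afterwards.

```python
import pandas as pd

# made-up sample data: one row per store per day
accounts = pd.DataFrame({
    "store_id": [1, 2, 3, 1, 2, 3],
    "date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3,
    "status": ["open", "open", "fraud", "open", "closed", "fraud"],
    "revenue": [10.0, 20.0, 5.0, 12.0, 0.0, 0.0],
})

# active = revenue > 0
active_accounts = accounts.loc[accounts["revenue"] > 0].copy()

# True where the active store is labeled fraud
active_accounts["fraud"] = active_accounts["status"] == "fraud"

# percent of active stores that are fraudulent, per day
percentage = active_accounts.groupby("date")["fraud"].mean() * 100
print(percentage)
```

On the first day one of three active stores is fraudulent (about 33.3%); on the second day only store 1 is active, so the share is 0%. A day with no active stores at all would simply not appear in the result.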

# Problem 2

You are given a list of numbers J and a single number p.
Write a function to return the minimum and maximum averages of the contiguous sequences of p consecutive numbers in J.

Example:

J = [4, 4, 4, 9, 10, 11, 12]
p = 3

The sequences will be:

- (4, 4, 4)
- (4, 4, 9)
- (4, 9, 10)
- (9, 10, 11)
- (10, 11, 12)

Here the minimum average is 4 and the maximum average is 11, which correspond to the first and last sequences.
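The window averages for the example can be checked directly. This short sketch lists every length-p window of J with its average; the intermediate averages (about 5.67, 7.67 and 10) confirm that 4 and 11 are the extremes.

```python
J = [4, 4, 4, 9, 10, 11, 12]
p = 3

# every contiguous window of length p, paired with its average
windows = [J[i:i + p] for i in range(len(J) - p + 1)]
averages = [sum(w) / p for w in windows]

for w, avg in zip(windows, averages):
    print(w, round(avg, 2))

print(min(averages), max(averages))  # 4.0 11.0
```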

## Thoughts

The example list is sorted, but in general we should not assume that the first and last sequences have the minimum and maximum averages.

We can assume that p is less than or equal to the length of J.
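Recomputing `sum(J[x:x + p])` for every window costs O(p) per window. A common alternative, sketched here under the same assumption that p is at most len(J) (and additionally that p is at least 1), keeps a running window sum: add the element entering the window and subtract the one leaving it, so the whole scan is O(len(J)). The name `min_max_window_averages` is hypothetical, and unlike the notebook's `get_sequences_and_averages` it returns the two averages instead of printing them.

```python
from typing import Sequence, Tuple


def min_max_window_averages(J: Sequence[float], p: int) -> Tuple[float, float]:
    """Return (minimum, maximum) average over all contiguous windows of length p."""
    if not 1 <= p <= len(J):
        raise ValueError("p must satisfy 1 <= p <= len(J)")

    window_sum = sum(J[:p])  # sum of the first window
    min_sum = max_sum = window_sum

    # slide the window one step at a time: add the new element, drop the old one
    for i in range(p, len(J)):
        window_sum += J[i] - J[i - p]
        min_sum = min(min_sum, window_sum)
        max_sum = max(max_sum, window_sum)

    # all windows have length p, so comparing sums is the same as comparing averages
    return min_sum / p, max_sum / p


print(min_max_window_averages([4, 4, 4, 9, 10, 11, 12], 3))  # (4.0, 11.0)
```

Tracking sums rather than averages works because every window has the same length, so dividing once at the end preserves the ordering.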

## Answer

```python
J = [4, 4, 4, 9, 10, 11, 12]
p = 3
```

```python
def get_sequences_and_averages(J, p):
    # start with the first window as both the minimum and the maximum
    first = J[0:p]
    min_average = max_average = sum(first) / p
    min_sequence = max_sequence = first

    # slide a window of length p across J, one position at a time
    for x in range(len(J) - p + 1):
        sequence = J[x:x + p]
        print(sequence)  # show each window that is checked
        average = sum(sequence) / p
        if average < min_average:
            min_average = average
            min_sequence = sequence
        elif average > max_average:
            max_average = average
            max_sequence = sequence

    print(f"Minimum average is {min_average} for sequence {min_sequence}")
    print(f"Maximum average is {max_average} for sequence {max_sequence}")
```

```python
# test the example
get_sequences_and_averages(J, p)
```

```
[4, 4, 4]
[4, 4, 9]
[4, 9, 10]
[9, 10, 11]
[10, 11, 12]
Minimum average is 4.0 for sequence [4, 4, 4]
Maximum average is 11.0 for sequence [10, 11, 12]
```


```python
# since it worked the way I expected, let us try another test
K = [4, 8, 1, 6, 2, 9, 0, -1, 5, -100]
r = 4

get_sequences_and_averages(K, r)
```

```
[4, 8, 1, 6]
[8, 1, 6, 2]
[1, 6, 2, 9]
[6, 2, 9, 0]
[2, 9, 0, -1]
[9, 0, -1, 5]
[0, -1, 5, -100]
Minimum average is -24.0 for sequence [0, -1, 5, -100]
Maximum average is 4.75 for sequence [4, 8, 1, 6]
```


```python
L = [0, 0, 8, 9, -9, 0, 0]
s = 2

get_sequences_and_averages(L, s)
```

```
[0, 0]
[0, 8]
[8, 9]
[9, -9]
[-9, 0]
[0, 0]
Minimum average is -4.5 for sequence [-9, 0]
Maximum average is 8.5 for sequence [8, 9]
```
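For large inputs, the same window averages can also be computed without an explicit Python loop. This sketch swaps in a different technique from the notebook's loop: it uses NumPy's `np.convolve` with a box of p ones in `'valid'` mode, which sums exactly the len(J) - p + 1 full windows, then divides by p. The helper name `window_average_extremes` is hypothetical.

```python
import numpy as np


def window_average_extremes(J, p):
    """Hypothetical helper: (min, max) of the length-p moving averages of J."""
    values = np.asarray(J, dtype=float)
    if not 1 <= p <= values.size:
        raise ValueError("p must satisfy 1 <= p <= len(J)")
    # 'valid' keeps only positions where the window fits entirely inside J;
    # summing with ones first and dividing afterwards keeps integer sums exact
    averages = np.convolve(values, np.ones(p), mode="valid") / p
    return float(averages.min()), float(averages.max())
```

For example, `window_average_extremes([0, 0, 8, 9, -9, 0, 0], 2)` should agree with the last test above. The trade-off is memory: this builds the full array of averages, whereas the loop keeps only the current window.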
