# Interview test assignments

## Task 1
Given the `orders` table, write a query for the marketing team that finds all customers who bought an iPhone but did not buy AirPods, so that they can be sent a marketing offer. Also determine how many such customers there are in each month.

```sql
CREATE TABLE orders (
  customer_name VARCHAR(50),
  order_date DATE FORMAT 'YY-MM-DD',
  order_id VARCHAR(10),
  product_name VARCHAR(10),
  qty INTEGER,
  price INTEGER
)
```
**Sample data**

    
|customer_name|order_date|order_id|product_name|qty|price|
|--|--|--|--|--|--|
|Mahesh|2022-01-01|1|iPhone|1|1000|
|Mahesh|2022-01-01|1|iPad|1|500|
|Mahesh|2022-01-01|1|Airpods|1|100|
|Wayne|2022-01-01|2|iPhone|1|1000|
|Wayne|2022-01-01|2|Shirt|1|100|
|Wayne|2022-01-01|2|Cable|1|20|
|Ben|2022-01-01|3|iPhone|1|1000|
|Ben|2022-01-01|3|Software|2|30|

Tools: SQL, Python


```python
import pandas as pd
import numpy as np
from pathlib import Path
from sqlalchemy import create_engine
```

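```python
# ~/.pgpass has the format hostname:port:database:username:password;
# this assumes the file contains a single entry
with open(Path.home() / '.pgpass') as auth:
    host, port, database, user, _ = auth.read().rstrip().split(':')

# No password in the URL: libpq, which psycopg2 uses, looks it up in ~/.pgpass
engine = create_engine(f'postgresql+psycopg2://{user}:@{host}:{port}/{database}')
```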

The data for this task was partly taken from the Sample Superstore dataset (https://datalens.yandex.ru/marketplace/f2e0000r63qkp2ywqpco) and adjusted to fit the task conditions.

```python
orders = pd.read_csv('~/store_data.csv', index_col=0, parse_dates=['order_date'])
orders.head()
```

| |customer_name|order_id|order_date|quantity|product_name|price|
|--|--|--|--|--|--|--|
|0|Art Ferguson|CA-2017-125269|2021-04-24|1|iPhone|999|
|1|Raymond Buch|CA-2016-166674|2021-04-01|6|MacBook Air|1000|
|2|Darren Budd|CA-2016-110975|2021-12-25|2|MacBook Pro|1300|
|3|Theresa Swint|CA-2016-120005|2021-03-03|3|iPad|900|
|4|Russell D'Ascenzo|CA-2017-125115|2021-04-10|3|Accecories|50|


```python
orders.to_sql('orders', engine, if_exists='replace')
```

The first solution is universal and works in most DBMSs, including SQLite. The query finds the orders that contain both target products, iPhone and AirPods, and subtracts them (`EXCEPT`) from the set of orders that contain an iPhone. Note that this works at the order level: an order qualifies if it contains an iPhone and no AirPods.

```python
query = '''
SELECT DISTINCT order_id, customer_name
FROM orders
WHERE product_name = 'iPhone'
EXCEPT
SELECT
  o1.order_id
  , o1.customer_name
FROM orders AS o1, orders AS o2
WHERE o1.product_name = 'iPhone'
  AND o2.product_name = 'Airpods'
  AND o1.customer_name = o2.customer_name
  AND o1.order_id = o2.order_id;
'''

pd.read_sql(query, engine)
```

| |order_id|customer_name|
|--|--|--|
|0|US-2017-165344|Sean Braxton|
|1|US-2014-120740|Paul Stevenson|
|2|CA-2016-161543|Roger Demir|
|3|CA-2015-135853|Cynthia Arntzen|
|4|US-2015-163433|Michael Paige|
|...|...|...|
|255|CA-2016-125080|Victoria Wilson|
|256|CA-2017-121027|Helen Wasserman|
|257|CA-2017-151981|Gary Mitchum|
|258|US-2014-103905|Arthur Wiediger|
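The claim that the first query is portable can be checked with Python's built-in `sqlite3` module. A minimal sketch, assuming a simplified three-column table filled with the sample rows from the task statement; `SAMPLE`, `QUERY` and `iphone_without_airpods` are illustrative names:

```python
import sqlite3

# Sample rows from the task statement: (customer_name, order_id, product_name)
SAMPLE = [
    ('Mahesh', '1', 'iPhone'), ('Mahesh', '1', 'iPad'), ('Mahesh', '1', 'Airpods'),
    ('Wayne', '2', 'iPhone'), ('Wayne', '2', 'Shirt'), ('Wayne', '2', 'Cable'),
    ('Ben', '3', 'iPhone'), ('Ben', '3', 'Software'),
]

# Same set-difference query as above, without the trailing semicolon
QUERY = '''
SELECT DISTINCT order_id, customer_name
FROM orders
WHERE product_name = 'iPhone'
EXCEPT
SELECT o1.order_id, o1.customer_name
FROM orders AS o1, orders AS o2
WHERE o1.product_name = 'iPhone'
  AND o2.product_name = 'Airpods'
  AND o1.customer_name = o2.customer_name
  AND o1.order_id = o2.order_id
'''

def iphone_without_airpods(rows):
    """Run the query against an in-memory SQLite database filled with `rows`."""
    con = sqlite3.connect(':memory:')
    try:
        con.execute('CREATE TABLE orders (customer_name TEXT, order_id TEXT, product_name TEXT)')
        con.executemany('INSERT INTO orders VALUES (?, ?, ?)', rows)
        # EXCEPT does not guarantee row order, so sort for a stable result
        return sorted(con.execute(QUERY).fetchall())
    finally:
        con.close()

print(iphone_without_airpods(SAMPLE))  # [('2', 'Wayne'), ('3', 'Ben')]
```

Mahesh's order drops out because it contains both products, while Wayne's and Ben's orders remain.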


The second solution is specific to PostgreSQL and compares arrays: for each order we collect its products into an array with `ARRAY_AGG` and use the containment operator `@>` to check whether that array includes both target products.

```python
query = '''
SELECT DISTINCT customer_name, order_id
FROM orders
WHERE product_name = 'iPhone'
EXCEPT
SELECT customer_name, order_id
FROM orders
GROUP BY customer_name, order_id
HAVING ARRAY_AGG(product_name) @> ARRAY['iPhone', 'Airpods'];
'''

pd.read_sql(query, engine)
```

| |customer_name|order_id|
|--|--|--|
|0|Lena Cacioppo|US-2017-125647|
|1|Steven Cartwright|CA-2015-119627|
|2|Adam Shillingsburg|CA-2017-136448|
|3|Pete Armstrong|US-2016-158309|
|4|Eudokia Martin|US-2015-159499|
|...|...|...|
|255|Frank Olsen|CA-2016-112578|
|256|Luke Schmidt|CA-2017-131618|
|257|Deborah Brumfield|CA-2014-107181|
|258|Rob Lucas|CA-2014-133830|
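PostgreSQL's `@>` ("contains") ignores element order and duplicates, so the `HAVING` condition is effectively a subset test. A minimal pandas sketch of the same logic on the sample rows from the task statement; the names `sample`, `TARGET`, `products` and `result` are illustrative:

```python
import pandas as pd

# Sample rows from the task statement (only the columns the check needs)
sample = pd.DataFrame({
    'customer_name': ['Mahesh'] * 3 + ['Wayne'] * 3 + ['Ben'] * 2,
    'order_id': ['1'] * 3 + ['2'] * 3 + ['3'] * 2,
    'product_name': ['iPhone', 'iPad', 'Airpods',
                     'iPhone', 'Shirt', 'Cable',
                     'iPhone', 'Software'],
})

TARGET = {'iPhone', 'Airpods'}

# Like ARRAY_AGG(product_name): the products of each order as a list
products = sample.groupby(['customer_name', 'order_id'])['product_name'].apply(list)

# ARRAY_AGG(...) @> ARRAY['iPhone', 'Airpods'] as a subset test:
# element order and duplicates do not matter
has_both = products.apply(TARGET.issubset)
has_iphone = products.apply(lambda p: 'iPhone' in p)

# EXCEPT: orders with an iPhone minus orders with both products
result = products[has_iphone & ~has_both].index.tolist()
print(result)  # [('Ben', '3'), ('Wayne', '2')]
```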


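To count such customers in each month, we add the order date to the query and group by month. `DATE_TRUNC('month', ...)` truncates a date to the first day of its month; applied to a `DATE` it returns `timestamp with time zone`, which pandas displays in UTC. That is why the month labels in the output below look shifted: `2020-12-31 21:00:00+00:00` is midnight on 2021-01-01 in the session time zone (UTC+3, as the offset suggests). Casting the result back with `DATE_TRUNC('month', order_date::DATE)::DATE` avoids this. Also note that `COUNT(*)` counts qualifying orders, so a customer with two such orders in one month is counted twice; `COUNT(DISTINCT customer_name)` counts customers.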

```python
query = '''
SELECT
  DATE_TRUNC('month', order_date::DATE) AS month
  , COUNT(*)
FROM (
  SELECT DISTINCT order_date, customer_name, order_id FROM orders
  WHERE product_name = 'iPhone'
  EXCEPT
  SELECT order_date, customer_name, order_id
  FROM orders
  GROUP BY 1,2,3
  HAVING ARRAY_AGG(product_name) @> ARRAY['iPhone', 'Airpods']
) AS t
GROUP BY 1
ORDER BY 1;
'''

pd.read_sql(query, engine)
```

| |month|count|
|--|--|--|
|0|2020-12-31 21:00:00+00:00|7|
|1|2021-01-31 21:00:00+00:00|5|
|2|2021-02-28 21:00:00+00:00|17|
|3|2021-03-31 21:00:00+00:00|14|
|4|2021-04-30 21:00:00+00:00|20|
|5|2021-05-31 21:00:00+00:00|21|
|6|2021-06-30 21:00:00+00:00|20|
|7|2021-07-31 21:00:00+00:00|18|
|8|2021-08-31 21:00:00+00:00|38|
|9|2021-09-30 21:00:00+00:00|28|
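The difference between counting orders and counting customers per month can be illustrated with a short pandas sketch. The rows below are hypothetical, and `qualifying` stands for orders that already passed the iPhone-without-AirPods check:

```python
import pandas as pd

# Hypothetical qualifying orders (iPhone without AirPods), one row per order
qualifying = pd.DataFrame({
    'customer_name': ['Ann', 'Ann', 'Bob', 'Bob'],
    'order_id': ['A-1', 'A-2', 'B-1', 'B-2'],
    'order_date': pd.to_datetime(['2021-01-05', '2021-01-20', '2021-01-11', '2021-02-03']),
})

# Group by calendar month; a Period label such as 2021-01 carries no time zone
month = qualifying['order_date'].dt.to_period('M')
per_month = qualifying.groupby(month).agg(
    orders=('order_id', 'count'),            # what COUNT(*) returns
    customers=('customer_name', 'nunique'),  # COUNT(DISTINCT customer_name)
)
print(per_month)
```

In January Ann has two qualifying orders, so the month has 3 orders but only 2 customers.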


Let's verify the result by solving the task in Python.

```python
# For every order, check its list of products: iPhone present, AirPods absent
(orders.groupby(['order_id', 'customer_name'])['product_name']
       .apply(list)
       .apply(lambda x: 'iPhone' in x and 'Airpods' not in x)
       .reset_index()
       .query('product_name == True')[['order_id', 'customer_name']])
```

| |order_id|customer_name|
|--|--|--|
|4|CA-2014-101560|Chris Selesnick|
|6|CA-2014-102274|Dave Hallsten|
|9|CA-2014-102988|Greg Maxwell|
|29|CA-2014-105872|James Galang|
|34|CA-2014-107181|Deborah Brumfield|
|...|...|...|
|1565|US-2017-148054|Nick Zandusky|
|1572|US-2017-155999|Jay Kimmel|
|1577|US-2017-165344|Sean Braxton|
|1579|US-2017-165869|Luke Schmidt|


```python
(orders.groupby(['order_date', 'order_id', 'customer_name'])['product_name']
       .apply(list)
       .apply(lambda x: 'iPhone' in x and 'Airpods' not in x)
       .reset_index()
       .query('product_name == True')
       # freq='M' labels each group with the month end; pandas >= 2.2 spells it 'ME'
       .groupby(pd.Grouper(key='order_date', freq='M'))['order_id'].count()
       .reset_index())
```

| |order_date|order_id|
|--|--|--|
|0|2021-01-31|7|
|1|2021-02-28|5|
|2|2021-03-31|17|
|3|2021-04-30|14|
|4|2021-05-31|20|
|5|2021-06-30|21|
|6|2021-07-31|20|
|7|2021-08-31|18|
|8|2021-09-30|38|
|9|2021-10-31|28|
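All the solutions above treat each order separately, so a customer who bought an iPhone in one order and AirPods in another still qualifies. If the marketing team means customers who have never bought AirPods at all, the check has to run over each customer's whole purchase history. A minimal pandas sketch of both readings on hypothetical rows; `history` and `qualifies` are illustrative names:

```python
import pandas as pd

# Hypothetical purchase history: Kim bought AirPods later, in a separate order
history = pd.DataFrame({
    'customer_name': ['Kim', 'Kim', 'Lee', 'Lee', 'Max'],
    'order_id': ['1', '2', '3', '3', '4'],
    'product_name': ['iPhone', 'Airpods', 'iPhone', 'Cable', 'Airpods'],
})

def qualifies(products):
    """iPhone bought, AirPods not bought."""
    return 'iPhone' in products and 'Airpods' not in products

# Order level, as in the solutions above: Kim's order 1 qualifies
per_order = history.groupby(['customer_name', 'order_id'])['product_name'].apply(list)
order_level = sorted(per_order[per_order.apply(qualifies)]
                     .index.get_level_values('customer_name').unique())

# Customer level: products are collected over all orders of the customer
per_customer = history.groupby('customer_name')['product_name'].apply(list)
customer_level = per_customer[per_customer.apply(qualifies)].index.tolist()

print(order_level)     # ['Kim', 'Lee']
print(customer_level)  # ['Lee']
```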


## Task 2
Given a table with subscription history: for each membership period it stores the start date, the end date and the status.

|customer_id|membership_start_date|membership_end_date|membership_status|
|--|--|--|--|
|114|2015-04-01|2015-10-01|Paid|
|114|2015-02-15|2015-03-15|Paid|
|114|2015-01-01|2015-02-15|Free|
|114|2015-03-15|2015-04-01|Non-Member|
|114|2015-10-01|2016-01-01|Paid|

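The result should look like the example below. Before the first period and after the last one the customer is considered a Non-Member, hence the leading WarmStart and the trailing Cancel.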

|customer_id|change_date|event|
|--|--|--|
|114|2015-01-01|WarmStart|
|114|2015-02-15|Convert|
|114|2015-03-15|Cancel|
|114|2015-04-01|ColdStart|
|114|2015-10-01|Renewal|
|114|2016-01-01|Cancel|

**Event rules for status transitions**

- Free -> Paid: Convert
- Paid -> Free: ReverseConvert
- Paid -> Non-Member: Cancel
- Free -> Non-Member: Cancel
- Non-Member -> Paid: ColdStart
- Non-Member -> Free: WarmStart
- Free -> Free: Renewal
- Paid -> Paid: Renewal
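The rules describe transitions between consecutive periods. Instead of joining periods on dates, the same result can be obtained by sorting each customer's periods and pairing every status with the previous one, similar in spirit to the SQL `LAG()` window function. A minimal Python sketch, assuming ISO date strings and contiguous periods (each one starts when the previous one ends); `RULES`, `membership_events` and `subs_rows` are illustrative names:

```python
from itertools import groupby
from operator import itemgetter

# (previous status, new status) -> event
RULES = {
    ('Free', 'Paid'): 'Convert',
    ('Paid', 'Free'): 'ReverseConvert',
    ('Paid', 'Non-Member'): 'Cancel',
    ('Free', 'Non-Member'): 'Cancel',
    ('Non-Member', 'Paid'): 'ColdStart',
    ('Non-Member', 'Free'): 'WarmStart',
    ('Free', 'Free'): 'Renewal',
    ('Paid', 'Paid'): 'Renewal',
}

def membership_events(rows):
    """Turn (customer_id, start, end, status) rows into (customer_id, change_date, event)."""
    events = []
    # Sorting tuples puts each customer's periods in chronological order
    for customer_id, periods in groupby(sorted(rows), key=itemgetter(0)):
        periods = list(periods)
        prev_status = 'Non-Member'  # not a member before the first period
        for _, start, _, status in periods:
            if (prev_status, status) in RULES:
                events.append((customer_id, start, RULES[(prev_status, status)]))
            prev_status = status
        # After the last period ends the customer is a Non-Member again
        end = periods[-1][2]
        if (prev_status, 'Non-Member') in RULES:
            events.append((customer_id, end, RULES[(prev_status, 'Non-Member')]))
    return events

subs_rows = [
    (114, '2015-04-01', '2015-10-01', 'Paid'),
    (114, '2015-02-15', '2015-03-15', 'Paid'),
    (114, '2015-01-01', '2015-02-15', 'Free'),
    (114, '2015-03-15', '2015-04-01', 'Non-Member'),
    (114, '2015-10-01', '2016-01-01', 'Paid'),
]
for event in membership_events(subs_rows):
    print(event)
```

Because the grouping is done per customer, the sketch also handles several customers in one table; gaps between periods would need an explicit Non-Member period.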

```python
query = '''
WITH subs(
  customer_id
  , membership_start_date
  , membership_end_date
  , membership_status
) AS (
  SELECT 114, '2015-01-01'::DATE, '2015-02-15'::DATE, 'Free'
  UNION SELECT 114, '2015-02-15', '2015-03-15', 'Paid'
  UNION SELECT 114, '2015-03-15', '2015-04-01', 'Non-Member'
  UNION SELECT 114, '2015-04-01', '2015-10-01', 'Paid'
  UNION SELECT 114, '2015-10-01', '2016-01-01', 'Paid'
),
-- A status changes on the date one period (s2) ends and the next (s1) starts.
-- FULL OUTER JOIN keeps the first start and the last end; the missing side is Non-Member.
events AS (
  SELECT
    COALESCE(s1.customer_id, s2.customer_id) AS customer_id
    , COALESCE(s1.membership_start_date, s2.membership_end_date) AS change_date
    , COALESCE(s2.membership_status, 'Non-Member') AS prev_status
    , COALESCE(s1.membership_status, 'Non-Member') AS new_status
  FROM subs AS s1
  FULL OUTER JOIN subs AS s2
    ON s1.customer_id = s2.customer_id
   AND s1.membership_start_date = s2.membership_end_date
)

SELECT
  customer_id
  , change_date
  , CASE
      WHEN prev_status = 'Free' AND new_status = 'Paid' THEN 'Convert'
      WHEN prev_status = 'Paid' AND new_status = 'Free' THEN 'ReverseConvert'
      WHEN prev_status = 'Paid' AND new_status = 'Non-Member' THEN 'Cancel'
      WHEN prev_status = 'Free' AND new_status = 'Non-Member' THEN 'Cancel'
      WHEN prev_status = 'Non-Member' AND new_status = 'Paid' THEN 'ColdStart'
      WHEN prev_status = 'Non-Member' AND new_status = 'Free' THEN 'WarmStart'
      WHEN prev_status = 'Free' AND new_status = 'Free' THEN 'Renewal'
      WHEN prev_status = 'Paid' AND new_status = 'Paid' THEN 'Renewal'
      END AS event
FROM events
ORDER BY customer_id, change_date;
'''

pd.read_sql(query, engine)
```

| |customer_id|change_date|event|
|--|--|--|--|
|0|114|2015-01-01|WarmStart|
|1|114|2015-02-15|Convert|
|2|114|2015-03-15|Cancel|
|3|114|2015-04-01|ColdStart|
|4|114|2015-10-01|Renewal|
|5|114|2016-01-01|Cancel|


```python
subs = pd.DataFrame({
    'customer_id': [114] * 5,
    'membership_start_date': [
        '2015-01-01',
        '2015-02-15',
        '2015-03-15',
        '2015-04-01',
        '2015-10-01'
    ],
    'membership_end_date': [
        '2015-02-15',
        '2015-03-15',
        '2015-04-01',
        '2015-10-01',
        '2016-01-01'
    ],
    'membership_status': [
        'Free',
        'Paid',
        'Non-Member',
        'Paid',
        'Paid'
    ]
})
```

```python
# Keys: previous status and new status concatenated into one string
change_status_rules = {
    'Free'+'Paid': 'Convert',
    'Paid'+'Free': 'ReverseConvert',
    'Paid'+'Non-Member': 'Cancel',
    'Free'+'Non-Member': 'Cancel',
    'Non-Member'+'Paid': 'ColdStart',
    'Non-Member'+'Free': 'WarmStart',
    'Free'+'Free': 'Renewal',
    'Paid'+'Paid': 'Renewal'
}
```

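```python
# Outer-join the periods on "one period starts when another ends":
# '_to' columns describe the period starting on that date (new status),
# '_from' columns the period ending on it (previous status).
# The join ignores customer_id, so it is only correct for a single customer.
events = subs[['customer_id', 'membership_start_date', 'membership_status']] \
  .merge(
      subs[['membership_end_date', 'membership_status']],
      how='outer',
      left_on='membership_start_date',
      right_on='membership_end_date',
      suffixes=('_to', '_from')
  )
# An unmatched side means the customer was not a member before/after that date;
# customer_id is hard-coded because the sample has one customer
events = events.fillna(
    {
        'customer_id': 114,
        'membership_start_date': events['membership_end_date'],
        'membership_status_to': 'Non-Member',
        'membership_status_from': 'Non-Member'
    }
)
```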

```python
(events
    .assign(
        # previous status + new status -> event name
        event=events['membership_status_from'].add(events['membership_status_to']).map(change_status_rules),
        # the outer merge turned customer_id into float
        customer_id=events['customer_id'].astype(int),
    )
    .rename(columns={'membership_start_date': 'change_date'})
    .drop(['membership_status_to', 'membership_end_date', 'membership_status_from'], axis=1))
```

| |customer_id|change_date|event|
|--|--|--|--|
|0|114|2015-01-01|WarmStart|
|1|114|2015-02-15|Convert|
|2|114|2015-03-15|Cancel|
|3|114|2015-04-01|ColdStart|
|4|114|2015-10-01|Renewal|
|5|114|2016-01-01|Cancel|


## Task 3

Given a list of characters, write a function that returns it in reverse order.

For example:
```python
reverse(['A', 'B', 'C', 'D', 'E'])  # -> ['E', 'D', 'C', 'B', 'A']
```

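### Solution 1

```python
a = ['A', 'B', 'C', 'D', 'E']

def reverse(a):
    res = []
    while a:
        res.append(a.pop())  # take elements from the end one at a time
    return res

assert reverse(a) == ['E', 'D', 'C', 'B', 'A']; print('ok')
```

```
ok
```

```python
%timeit reverse(a)
```

```
223 ns ± 68.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```

Note that `a.pop()` empties the list passed to the function: after the `assert`, `a` is already `[]`, so `%timeit` above measures calls on an empty list. For a fair measurement the function has to work on a copy, e.g. by starting with `a = list(a)`.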


### Solution 2

```python
a = ['A', 'B', 'C', 'D', 'E']

def reverse(a):
    return a[::-1]  # a slice with step -1 builds a new, reversed list

assert reverse(a) == ['E', 'D', 'C', 'B', 'A']; print('ok')
```

```
ok
```

```python
%timeit reverse(a)
```

```
294 ns ± 32 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```


### Solution 3

```python
def reverse(a):
    return list(reversed(a))  # reversed() returns an iterator; list() materializes it

assert reverse(a) == ['E', 'D', 'C', 'B', 'A']; print('ok')
```

```
ok
```

```python
%timeit reverse(a)
```

```
552 ns ± 37.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```


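### Solution 4

```python
def reverse(a):
    a.reverse()  # reverses the list in place
    return a

assert reverse(a) == ['E', 'D', 'C', 'B', 'A']; print('ok')
```

```
ok
```

```python
%timeit reverse(a)
```

```
202 ns ± 29.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
```

This function changes its argument: after the `assert` the global `a` is already reversed, and every `%timeit` iteration flips it back and forth. The timing is still valid, since the list keeps its length.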


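In terms of performance, the result of Solution 1 should be set aside: its function empties `a` with `pop()`, so `%timeit` measured calls on an empty list. Of the other three, the in-place `list.reverse()` (Solution 4) was the fastest and does not create an additional list in memory, but it modifies the caller's list. If the original list must stay unchanged, slicing `a[::-1]` (Solution 2) was faster than `list(reversed(a))` in these measurements. All timings were taken on a five-element list, so the gaps amount to a few hundred nanoseconds.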
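A minimal sketch of a fairer comparison with the standard `timeit` module: every call receives its own copy of the input, so the mutating solutions (1 and 4) cannot affect later calls, and the copying cost is the same for all four. The function names and the list sizes are illustrative; absolute numbers depend on the machine, so it is worth trying lists longer than five elements.

```python
import timeit

def reverse_pop(a):
    res = []
    while a:
        res.append(a.pop())  # empties the argument
    return res

def reverse_slice(a):
    return a[::-1]

def reverse_builtin(a):
    return list(reversed(a))

def reverse_inplace(a):
    a.reverse()  # reverses the argument in place
    return a

SOLUTIONS = (reverse_pop, reverse_slice, reverse_builtin, reverse_inplace)

def benchmark(data, number=20_000, repeat=5):
    """Best time per call in nanoseconds for each solution.

    Each call gets its own copy of `data` via list(data), so no solution
    can change the input seen by the next call.
    """
    expected = data[::-1]
    results = {}
    for f in SOLUTIONS:
        assert f(list(data)) == expected, f.__name__
        times = timeit.repeat(lambda: f(list(data)), number=number, repeat=repeat)
        results[f.__name__] = min(times) / number * 1e9
    return results

if __name__ == '__main__':
    # Fewer calls for the long list to keep the run short
    for size, number in ((5, 20_000), (10_000, 50)):
        data = [chr(ord('A') + i % 26) for i in range(size)]
        timings = benchmark(data, number=number)
        print(size, {name: round(ns) for name, ns in timings.items()})
```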