# [Module "Python for Data Analysts"](https://ru.hexlet.io/programs/python-for-data-analysts)

## [Project: Conversion Dashboard](https://ru.hexlet.io/programs/python-for-data-analysts/projects/100)


In [1]:
import pandas as pd
import re
import requests

#### Data (sample):

[Visits](https://drive.google.com/file/d/1QosQQ4RRNR9rkL4t7sB707h2Uy0XfYJe/view?usp=drive_link) - one thousand visit records

[Registrations](https://drive.google.com/file/d/1AeQz0kaSgz0lxYSDtuNm36muhy5fRCzZ/view?usp=drive_link) - one thousand records of first registrations


## Downloading the data

In [21]:
API_URL = 'https://data-charts-api.hexlet.app'
DATE_BEGIN = '2023-03-01'
DATE_END = '2023-09-01'

In [37]:
def download_from_api(url, endpoint, path_to_save, start_date=None, end_date=None):
    """Download data from an API endpoint and save the response body to a file."""
    download_link = f'{url}/{endpoint}'
    # Append the optional begin/end query parameters
    if start_date or end_date:
        download_link += "?"
        if start_date:
            download_link += f'begin={start_date}'
        if end_date:
            download_link += f'&end={end_date}' if start_date else f'end={end_date}'

    print(f'Download from the link {download_link}...')
    response = requests.get(download_link)
    if response.ok:
        # Write the raw response bytes to disk
        with open(path_to_save, 'wb') as file:
            file.write(response.content)
        print(f"File successfully saved as '{path_to_save}'")
    else:
        print(f"Error downloading file: status code {response.status_code}")
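
The query string that `download_from_api` assembles by hand could also be built with the standard library. Below is a minimal sketch, assuming the same `begin` and `end` parameter names; `build_download_url` is a hypothetical helper introduced only for illustration and is not part of the notebook.

```python
from urllib.parse import urlencode


def build_download_url(url, endpoint, start_date=None, end_date=None):
    """Hypothetical helper: return the download link with optional date filters."""
    params = {}
    if start_date:
        params['begin'] = start_date
    if end_date:
        params['end'] = end_date
    link = f'{url}/{endpoint}'
    # urlencode joins the pairs with '&' and escapes unsafe characters
    return f'{link}?{urlencode(params)}' if params else link


print(build_download_url('https://data-charts-api.hexlet.app', 'visits',
                         '2023-03-01', '2023-09-01'))
```

Building the parameters as a dictionary avoids the separate branch for "only `end` is given" and keeps the link correct if a value ever contains characters that need escaping.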

In [24]:
visits_path = './data/visits.json'
registrations_path = './data/registrations.json'

In [41]:
download_from_api(url=API_URL,
                  endpoint='visits',
                  path_to_save=visits_path,
                  start_date=DATE_BEGIN,
                  end_date=DATE_END)

Download from the link https://data-charts-api.hexlet.app/visits?begin=2023-03-01&end=2023-09-01...
File successfully saved as './data/visits.json'


In [43]:
download_from_api(url=API_URL,
                  endpoint='registrations',
                  path_to_save=registrations_path,
                  start_date=DATE_BEGIN,
                  end_date=DATE_END)

Download from the link https://data-charts-api.hexlet.app/registrations?begin=2023-03-01&end=2023-09-01...
File successfully saved as './data/registrations.json'


In [18]:
registrations_sample_path = './data/registrations_sample.csv'

In [23]:
download_from_api(url=API_URL,
                  endpoint='registrations',
                  path_to_save=registrations_sample_path,
                  start_date=DATE_BEGIN,
                  end_date='2023-03-03')

Download from the link https://data-charts-api.hexlet.app/registrations?begin=2023-03-01&end=2023-03-03...
File successfully saved as './data/registrations_sample.csv'


## Preparing the dataframes

### visits

In [145]:
visits = pd.read_json(visits_path)

In [147]:
# Convert datetime to date, which suits our tasks better
visits['date'] = pd.to_datetime(visits.datetime).dt.date
visits.drop(['datetime'], axis=1, inplace=True)
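
The conversion above can be illustrated on a few rows. This is a sketch with made-up timestamps (the real format of the `datetime` field is not shown in the notebook). It shows that `.dt.date` yields plain `datetime.date` objects, which is why `info()` reports the `date` column as `object`; `.dt.normalize()` is shown only as an alternative that keeps a datetime dtype.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample values; the real API data may use a different format
sample = pd.DataFrame({
    "datetime": [
        "2023-03-01T10:36:22",
        "2023-03-01T23:59:59",
        "2023-03-02T00:00:01",
    ]
})

parsed = pd.to_datetime(sample["datetime"])

# .dt.date drops the time part and stores datetime.date objects (object dtype)
as_date = parsed.dt.date

# .dt.normalize() resets the time to midnight but keeps a datetime64 dtype
as_midnight = parsed.dt.normalize()

print(as_date.dtype, as_midnight.dtype)
print(as_date.nunique())  # number of distinct calendar days
```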

In [149]:
visits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 263459 entries, 0 to 263458
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   platform    263459 non-null  object
 1   user_agent  263459 non-null  object
 2   visit_id    263459 non-null  object
 3   date        263459 non-null  object
dtypes: object(4)
memory usage: 8.0+ MB


In [151]:
visits.describe()

| | platform | user_agent | visit_id | date |
|---|---|---|---|---|
| count | 263459 | 263459 | 263459 | 263459 |
| unique | 4 | 32 | 146085 | 184 |
| top | web | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109... | 64167edd-323a-4ab0-be9b-acd237a1ac30 | 2023-03-08 |
| freq | 236301 | 13623 | 4 | 2624 |


In [153]:
visits.head()

| | platform | user_agent | visit_id | date |
|---|---|---|---|---|
| 0 | web | Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... | 1de9ea66-70d3-4a1f-8735-df5ef7697fb9 | 2023-03-01 |
| 1 | web | Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7... | f149f542-e935-4870-9734-6b4501eaf614 | 2023-03-01 |
| 2 | web | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2... | 08f0ebd4-950c-4dd9-8e97-b5bdf073eed1 | 2023-03-01 |
| 3 | web | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7... | 19322fed-157c-49c6-b16e-2d5cabeb9592 | 2023-03-01 |
| 4 | web | Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... | 04762a22-3c9f-40c9-9ac9-6628c4381836 | 2023-03-01 |


### registrations

In [155]:
registrations = pd.read_json(registrations_path)

In [157]:
# Convert datetime to date, which suits our tasks better
registrations['date'] = pd.to_datetime(registrations.datetime).dt.date
registrations.drop(['datetime'], axis=1, inplace=True)

In [159]:
registrations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21836 entries, 0 to 21835
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   email              21836 non-null  object
 1   platform           21836 non-null  object
 2   registration_type  21836 non-null  object
 3   user_id            21836 non-null  object
 4   date               21836 non-null  object
dtypes: object(5)
memory usage: 853.1+ KB


In [161]:
registrations.describe()

| | email | platform | registration_type | user_id | date |
|---|---|---|---|---|---|
| count | 21836 | 21836 | 21836 | 21836 | 21836 |
| unique | 20868 | 3 | 4 | 21836 | 184 |
| top | ujones@example.com | android | email | 2e0f6bb8-b029-4f45-a786-2b53990d37f1 | 2023-03-06 |
| freq | 6 | 10582 | 8996 | 1 | 230 |


In [131]:
registrations.head()

| | email | platform | registration_type | user_id | date |
|---|---|---|---|---|---|
| 0 | ebyrd@example.org | web | google | 2e0f6bb8-b029-4f45-a786-2b53990d37f1 | 2023-03-01 |
| 1 | knightgerald@example.org | web | email | f007f97c-9d8b-48b5-af08-119bb8f6d9b6 | 2023-03-01 |
| 2 | cherylthompson@example.com | web | apple | 24ff46ae-32b3-4a74-8f27-7cf0b8f32f15 | 2023-03-01 |
| 3 | halldavid@example.org | web | email | 3e9914e1-5d73-4c23-b25d-b59a3aeb2b60 | 2023-03-01 |
| 4 | denise86@example.net | web | google | 27f875fc-f8ce-4aeb-8722-0ecb283d0760 | 2023-03-01 |


## Calculating the metrics

    Group the visit data by date and platform
    Group the registration data by date and platform as well
    Merge the dataframes into a final dataframe with the conversion calculated
    Save the dataframe in JSON format as conversion.json
    Dataframe fields:
        date_group - aggregated date column
        platform - platform (android, web, ios)
        visits - visits for the date_group period
        registrations - registrations for the date_group period
        conversion - conversion for the platform
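
The steps listed above can be sketched on toy data. The rows below are invented for illustration. The conversion formula (registrations divided by visits, as a percentage) and the left merge are assumptions, since the task text does not spell them out.

```python
import pandas as pd

# Hypothetical miniature versions of the visits and registrations dataframes
visits = pd.DataFrame({
    "date": ["2023-03-01"] * 4 + ["2023-03-02"] * 2,
    "platform": ["web", "web", "web", "android", "web", "ios"],
    "visit_id": ["v1", "v2", "v3", "v4", "v5", "v6"],
})
registrations = pd.DataFrame({
    "date": ["2023-03-01", "2023-03-01", "2023-03-02"],
    "platform": ["web", "android", "web"],
    "user_id": ["u1", "u2", "u3"],
})

# Count rows per (date, platform) pair in each dataframe
visits_agg = (visits.groupby(["date", "platform"])
              .size()
              .reset_index(name="visits"))
regs_agg = (registrations.groupby(["date", "platform"])
            .size()
            .reset_index(name="registrations"))

# A left merge keeps days and platforms that had visits but no registrations
conversion = visits_agg.merge(regs_agg, on=["date", "platform"], how="left")
conversion["registrations"] = conversion["registrations"].fillna(0).astype(int)

# Assumption: conversion is expressed as a percentage of visits
conversion["conversion"] = conversion["registrations"] / conversion["visits"] * 100
conversion = conversion.rename(columns={"date": "date_group"})

# Passing a path such as 'conversion.json' to to_json would write a file instead
conversion_json = conversion.to_json(orient="records")
print(conversion)
```

Counting with `size()` treats every row as one visit or one registration; if repeated `visit_id` values should count once, `nunique()` on that column would be the alternative.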

In [170]:
visits_grouped = visits.groupby(['date','platform'])