# [Module "Python for Data Analysts"](https://ru.hexlet.io/programs/python-for-data-analysts)

## [Project: Conversion Dashboard](https://ru.hexlet.io/programs/python-for-data-analysts/projects/100)


In [1]:
import pandas as pd
import re
import requests

#### Data:

[Visits](https://drive.google.com/file/d/1QosQQ4RRNR9rkL4t7sB707h2Uy0XfYJe/view?usp=drive_link) - one thousand visit records

[Registrations](https://drive.google.com/file/d/1AeQz0kaSgz0lxYSDtuNm36muhy5fRCzZ/view?usp=drive_link) - one thousand first-registration records


In [29]:
visits_file_link = 'https://drive.google.com/file/d/1QosQQ4RRNR9rkL4t7sB707h2Uy0XfYJe/view?usp=drive_link'
registrations_file_link = 'https://drive.google.com/file/d/1AeQz0kaSgz0lxYSDtuNm36muhy5fRCzZ/view?usp=drive_link'

### Helper functions for downloading files from Google Drive by link

In [18]:
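def get_file_id(google_file_link):
    # Drive file IDs may contain letters, digits, '_' and '-'
    search_result = re.search(r'/d/([\w-]+)/', google_file_link)
    if search_result is None:
        raise ValueError(f'No file ID found in link: {google_file_link}')
    return search_result.group(1)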


def get_download_link(google_file_link):
    file_id = get_file_id(google_file_link)
    return f'https://drive.google.com/uc?export=download&id={file_id}'


def get_file_from_google(google_file_link, path_to_save):
    download_link = get_download_link(google_file_link)
    print(f'Download from the link {download_link}...')
    response = requests.get(download_link)
    if response.ok:
        with open(path_to_save, 'wb') as file:
            file.write(response.content)
        print(f"File successfully saved as '{path_to_save}'")
    else:
        print(f"Error downloading file: status code {response.status_code}")
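Drive links are not always shaped like the two used in this notebook. As a minimal sketch, the ID lookup could also accept links that carry the ID in an `id=` query parameter. The name `extract_drive_id` is hypothetical, and the second link shape is an assumption about what other share links may look like, not something this notebook relies on.

```python
import re
from urllib.parse import urlparse, parse_qs


def extract_drive_id(link):
    """Return the file ID from a Google Drive link (hypothetical helper)."""
    # Shape 1 (used in this notebook): .../file/d/<id>/view?...
    match = re.search(r'/d/([\w-]+)', link)
    if match:
        return match.group(1)
    # Shape 2 (assumed): ...?id=<id> or ...&id=<id>
    ids = parse_qs(urlparse(link).query).get('id')
    if ids:
        return ids[0]
    raise ValueError(f'No file ID found in link: {link}')


visits_link = 'https://drive.google.com/file/d/1QosQQ4RRNR9rkL4t7sB707h2Uy0XfYJe/view?usp=drive_link'
print(extract_drive_id(visits_link))
```

Raising `ValueError` on an unrecognized link keeps a bad link from silently turning into a download URL that ends in `id=None`.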

### Download the files

In [31]:
visits_path = './data/visits_1k.csv'
registrations_path = './data/regs_1k.csv'
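Both paths point into a `./data` directory, and `open(path, 'wb')` raises `FileNotFoundError` when that directory is missing. A minimal sketch of creating it up front follows; `ensure_parent_dir` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path


def ensure_parent_dir(path_to_save):
    """Create the parent directory of path_to_save if it is missing (hypothetical helper)."""
    parent = Path(path_to_save).parent
    # parents=True builds nested directories; exist_ok=True makes repeated calls harmless
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

Calling `ensure_parent_dir(visits_path)` before the download would be enough here, since both files share the same directory.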

In [33]:
get_file_from_google(visits_file_link, visits_path)
get_file_from_google(registrations_file_link, registrations_path)

Download from the link https://drive.google.com/uc?export=download&id=1QosQQ4RRNR9rkL4t7sB707h2Uy0XfYJe...
File successfully saved as './data/visits_1k.csv'
Download from the link https://drive.google.com/uc?export=download&id=1AeQz0kaSgz0lxYSDtuNm36muhy5fRCzZ...
File successfully saved as './data/regs_1k.csv'


### Preliminary preparation of the dataframes

In [61]:
start_date = '2023-03-01'
final_date = '2023-09-01'

In [65]:
start_date = pd.to_datetime(start_date)
final_date = pd.to_datetime(final_date)

#### visits

In [94]:
visits = pd.read_csv(visits_path)

In [81]:
visits.describe()

| | uuid | platform | user_agent | date |
|---|---|---|---|---|
| count | 1000 | 1000 | 1000 | 1000 |
| unique | 519 | 3 | 28 | 996 |
| top | 251a0926-ece3-4d77-aa42-ab569fdf9fe2 | web | Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... | 2023-03-01T08:01:45 |
| freq | 4 | 954 | 71 | 2 |


In [41]:
visits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   uuid        1000 non-null   object
 1   platform    1000 non-null   object
 2   user_agent  1000 non-null   object
 3   date        1000 non-null   object
dtypes: object(4)
memory usage: 31.4+ KB


In [39]:
visits.head()

| | uuid | platform | user_agent | date |
|---|---|---|---|---|
| 0 | 1de9ea66-70d3-4a1f-8735-df5ef7697fb9 | web | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2... | 2023-03-01T13:29:22 |
| 1 | f149f542-e935-4870-9734-6b4501eaf614 | web | Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) Apple... | 2023-03-01T16:44:28 |
| 2 | f149f542-e935-4870-9734-6b4501eaf614 | web | Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) Apple... | 2023-03-06T06:12:36 |
| 3 | 08f0ebd4-950c-4dd9-8e97-b5bdf073eed1 | web | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109... | 2023-03-01T20:16:37 |
| 4 | 08f0ebd4-950c-4dd9-8e97-b5bdf073eed1 | web | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109... | 2023-03-05T17:42:47 |


In [96]:
visits.date = pd.to_datetime(visits.date)

In [98]:
visits.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   uuid        1000 non-null   object        
 1   platform    1000 non-null   object        
 2   user_agent  1000 non-null   object        
 3   date        1000 non-null   datetime64[ns]
dtypes: datetime64[ns](1), object(3)
memory usage: 31.4+ KB
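The conversion above can also happen while the file is read, through the `parse_dates` argument of `pd.read_csv`. The sketch below uses two rows copied from the `visits.head()` output as a stand-in for the real file; the truncated `user_agent` column is left out rather than guessed.

```python
import io

import pandas as pd

# Two rows copied from the visits.head() output above, standing in for the CSV file
sample_csv = io.StringIO(
    "uuid,platform,date\n"
    "1de9ea66-70d3-4a1f-8735-df5ef7697fb9,web,2023-03-01T13:29:22\n"
    "f149f542-e935-4870-9734-6b4501eaf614,web,2023-03-01T16:44:28\n"
)

# parse_dates converts the listed columns to datetime during loading
sample = pd.read_csv(sample_csv, parse_dates=['date'])
print(sample.dtypes)
```

With the real data this would read `pd.read_csv(visits_path, parse_dates=['date'])`, making the separate `pd.to_datetime` cell unnecessary.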


In [100]:
visits = visits[(start_date < visits.date) & (visits.date < final_date)].sort_values('date')
visits

| | uuid | platform | user_agent | date |
|---|---|---|---|---|
| 264 | d72d0452-c34d-4a29-9e7d-ad1cee0256b8 | web | Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) Apple... | 2023-03-01 00:05:35 |
| 699 | 0e63cfdc-84dc-4bf9-a646-aef87212761d | web | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:1... | 2023-03-01 00:08:44 |
| 790 | a1772ff8-4cfa-47fb-b192-e9b8347fa7c0 | web | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.... | 2023-03-01 00:10:50 |
| 346 | 7827eafe-c8cf-4472-9b29-07132085dc12 | web | Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... | 2023-03-01 00:17:01 |
| 285 | bcd5f9f0-dd3f-43e7-87a1-1010cb626433 | web | Mozilla/5.0 (Windows NT 10.0; Win64; x64) Appl... | 2023-03-01 00:20:31 |
| ... | ... | ... | ... | ... |
| 55 | 458c3cdb-2d66-4a7b-8a4a-db41ce779a93 | web | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2... | 2023-03-06 23:14:41 |
| 889 | 66108cbb-1892-4836-9715-1cba35701533 | web | Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.... | 2023-03-07 00:01:06 |
| 998 | 3f78ac76-6f81-43ec-85e8-f3cf74fc8fdc | web | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7... | 2023-03-07 00:07:34 |
| 946 | d408aafb-662b-4cac-9a55-83ffef4269fe | web | Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKi... | 2023-03-07 20:23:16 |
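The filter above uses strict inequalities on both ends, so a row stamped exactly at midnight on the start date would be dropped. The sketch below compares that behaviour with a half-open interval that keeps the lower bound. The timestamps are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

start = pd.Timestamp('2023-03-01')
end = pd.Timestamp('2023-09-01')

# Toy timestamps invented for illustration, not taken from the dataset
toy = pd.DataFrame({'date': pd.to_datetime([
    '2023-03-01 00:00:00',  # exactly on the lower bound
    '2023-03-01 00:05:35',
    '2023-09-01 00:00:00',  # exactly on the upper bound
])})

# Strict on both ends, as in the notebook
strict = toy[(start < toy['date']) & (toy['date'] < end)]
# Half-open: lower bound included, upper bound excluded
half_open = toy[(start <= toy['date']) & (toy['date'] < end)]

print(len(strict), len(half_open))
```

Which variant is right depends on how the reporting period is defined; the point is only that the choice is visible at the boundaries.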


#### registrations

In [102]:
registrations = pd.read_csv(registrations_path)

In [104]:
registrations.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   date               1000 non-null   object
 1   user_id            1000 non-null   int64 
 2   email              1000 non-null   object
 3   platform           1000 non-null   object
 4   registration_type  1000 non-null   object
dtypes: int64(1), object(4)
memory usage: 39.2+ KB


In [106]:
registrations.head()

| | date | user_id | email | platform | registration_type |
|---|---|---|---|---|---|
| 0 | 2023-03-01T00:25:39 | 8838849 | joseph95@example.org | web | google |
| 1 | 2023-03-01T14:53:01 | 8741065 | janetsuarez@example.net | web | yandex |
| 2 | 2023-03-01T14:27:36 | 1866654 | robert67@example.org | web | google |
| 3 | 2023-03-01T02:42:34 | 1577584 | elam@example.net | web | apple |
| 4 | 2023-03-01T10:27:14 | 4765395 | stephanie68@example.net | web | yandex |


In [108]:
registrations.date = pd.to_datetime(registrations.date)

In [110]:
registrations = registrations[(start_date < registrations.date) & (registrations.date < final_date)].sort_values('date')
registrations

| | date | user_id | email | platform | registration_type |
|---|---|---|---|---|---|
| 37 | 2023-03-01 00:12:22 | 100719 | aanderson@example.org | android | google |
| 0 | 2023-03-01 00:25:39 | 8838849 | joseph95@example.org | web | google |
| 14 | 2023-03-01 00:50:15 | 368030 | charlesstevens@example.net | web | yandex |
| 48 | 2023-03-01 01:32:12 | 2938785 | debra86@example.com | ios | email |
| 16 | 2023-03-01 01:56:21 | 968533 | fmcgrath@example.org | web | google |
| ... | ... | ... | ... | ... | ... |
| 947 | 2023-03-05 21:11:40 | 8509874 | zsmith@example.org | web | apple |
| 969 | 2023-03-05 21:50:21 | 1557199 | watsonsarah@example.org | web | google |
| 970 | 2023-03-05 21:52:46 | 8833853 | robinsonkatie@example.net | web | yandex |
| 965 | 2023-03-05 22:00:42 | 6650724 | andrewparks@example.com | web | google |


In [114]:
API_URL = 'https://data-charts-api.hexlet.app'
DATE_BEGIN = '2023-03-01'
DATE_END = '2023-09-01'

In [116]:
visit_json = requests.get(f"{API_URL}/visits?begin={DATE_BEGIN}&end={DATE_END}")

In [118]:
visit_json

<Response [200]>
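The request URL above is assembled with an f-string. As a minimal sketch, the query string could instead be built with `urllib.parse.urlencode`, which escapes the values. `build_api_url` is a hypothetical helper; the sketch stops at the URL because the notebook does not show the shape of the response body.

```python
from urllib.parse import urlencode

API_URL = 'https://data-charts-api.hexlet.app'


def build_api_url(endpoint, begin, end):
    """Build a request URL for the given endpoint and date range (hypothetical helper)."""
    # urlencode escapes the values and joins them with '&'
    query = urlencode({'begin': begin, 'end': end})
    return f'{API_URL}/{endpoint}?{query}'


print(build_api_url('visits', '2023-03-01', '2023-09-01'))
```

`requests.get` also accepts a `params=` dictionary, which applies the same encoding without a helper.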