# Сбор данных с хедхантера. 

Тестирование и проверка сбора данных. Итоговая версия - скриптом, по расписанию.

## Ссылки на доки по API
* [hh_research](https://github.com/hukenovs/hh_research)
* [Хабр](https://habr.com/ru/post/666062/)
* [HeadHunter API](https://github.com/hhru/api)

## TODO:
* Add automation down to the date to date periods
* Refactor back into the scripts
* Dynamic fields download (maybe download&save a raw JSON object and process it later?)

In [1]:
# modules import
import hashlib
import os
import pickle
import re

from datetime import timedelta as td, datetime, date

from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional
from urllib.parse import urlencode

import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry

from tqdm import tqdm

import pandas as pd

## Возвращаемые столбцы и соответствие в API **vacancies**:

| Возвращаемый столбец       | Путь в API                                              |
| -------------------------- | ------------------------------------------------------- |
| vacancy_id | id |
| vacancy_name | name |
| area_name | area\name |
| employer_id | employer\id |
| employer_name | employer\name |
| employer_alternate_url | employer\alternate_url |
| salary_from | salary\from |
| salary_to | salary\to |
| salary_currency | salary\currency |
| experience_name | experience\name |
| schedule_name | schedule\name |
| employment_name | employment\name |
| key_skills | key_skills - список key_skills\id\name |
| specializations | specializations\id\name |
| professional_roles | professional_roles\id\name |
| published_at | published_at |
| created_at | created_at |
| initial_created_at | initial_created_at |
| alternate_url | alternate_url |
| description | description |
| responses | counters\responses |

In [2]:
# class constants
# API base url address
__API_BASE_URL = "https://api.hh.ru/vacancies/"

# cache folder 
CACHE_DIR = "../cache/"

# column names for the returned table
__DICT_KEYS = (
    "vacancy_id",
    "vacancy_name",
    "area_name",
    "employer_id",
    "employer_name",
    "employer_alternate_url",
    "salary_from",
    "salary_to",
    "salary_currency",
    "experience_name",
    "schedule_name",
    "employment_name",
    "key_skills",
    "specializations",
    "professional_roles",
    "published_at",
    "created_at",
    "initial_created_at",
    "alternate_url",
    "description",
    "responses",
    "retrieve_date",
)


In [3]:
def clean_tags(html_text: str) -> str:
    """Remove HTML tags from the string
    Parameters
    ----------
    html_text: str
        Input string with tags
    Returns
    -------
    result: string
        Clean text without HTML tags
    """
    pattern = re.compile("<.*?>")
    return re.sub(pattern, "", html_text)

In [4]:
def get_vacancy(vacancy_id: str, resps: int):
    # Getting vacancy data from URL
    # TODO - dynamic field processing?
    # Return fields (in order):
    # vacancy_id
    # ["name"]
    # ["area"]["name"]
    # ["employer"]["id"]
    # ["employer"]["name"]
    # ["employer"]["alternate_url"]
    # ["salary"]["from"]
    # ["salary"]["to"]
    # ["salary"]["currency"]
    # ["experience"]["name"]
    # ["schedule"]["name"]
    # ["employment"]["name"]
    # list of ["key_skills"]["id"]["name"]
    # list of ["specializations"]["id"]["name"]
    # list of ["professional_roles"]["id"]["name"]
    # ["published_at"]
    # ["created_at"]
    # ["initial_created_at"]
    # ["alternate_url"]
    # ["description"]
    # "responses" - taken as an argument
    # "retrieve_date" - current date
    url = f"{__API_BASE_URL}{vacancy_id}"
    try:
        vacancy = requests.api.get(url).json()
    except:
        return (
            vacancy_id,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
            None,
        )
    # Checking salary as it can be null
    salary = vacancy.get("salary")
    salary_data = {"from": None, "to": None, "currency": None}
    if salary is not None:
        salary_data["from"] = salary["from"]
        salary_data["to"] = salary["to"]
        salary_data["currency"] = salary["currency"]
    # Checking employer for None
    employer = vacancy.get("employer")
    employer_data = {"id": None, "name": None, "alternate_url": None}
    if employer is not None:
        employer_data["id"] = employer.get("id")
        employer_data["name"] = employer.get("name")
        employer_data["alternate_url"] = employer.get("alternate_url")
    key_skills = vacancy.get("key_skills")
    if key_skills is not None:
        key_skills_data = [item["name"] for item in key_skills]
    else:
        key_skills_data = []
    specializations = vacancy.get("specializations")
    if specializations is not None:
        specializations_data = [item["name"] for item in specializations]
    else:
        specializations_data = []
    professional_roles = vacancy.get("professional_roles")
    if professional_roles is not None:
        professional_roles_data = [item["name"] for item in professional_roles]
    else:
        professional_roles_data = []
    # Create pages tuple
    return (
        vacancy_id,
        vacancy.get("name"),
        vacancy.get('area', {}).get('name'),
        employer_data["id"],
        employer_data["name"],
        employer_data["alternate_url"],
        salary_data["from"],
        salary_data["to"],
        salary_data["currency"],
        vacancy.get('experience', {}).get('name'),
        vacancy.get('schedule', {}).get('name'),
        vacancy.get('employment', {}).get('name'),
        key_skills_data,
        specializations_data,
        professional_roles_data,
        vacancy.get("published_at"),
        vacancy.get("created_at"),
        vacancy.get("initial_created_at"),
        vacancy.get("alternate_url"),
        clean_tags(str(vacancy.get("description") or "")),
        resps,
        date.today().isoformat()
    )

In [25]:
def collect_vacancies(query: Optional[Dict],
                      existing_ids: Optional[List],
                      refresh: bool = False,
                      responses: bool = False,
                      progress_info: bool = True,
                      max_workers: int = 1) -> (Dict, int):
    """Parse vacancy JSON: get vacancy name, salary, experience etc.
    Parameters
    ----------
    query : dict
        Search query params for GET requests.
    existing_ids : list
        List with existing vacancy ids (taken either for the same date beforehand or the whole dataset)
    refresh :  bool
        Refresh cached data
    responses : bool
        Whether to collect the number of vacancy responses or not
    max_workers :  int
        Number of workers for threading.
    Returns
    -------
    dict
        Dict of useful data from vacancies
    int
        API request counter
    """

    # Get cached data if exists...
    cache_name: str = urlencode(query)
    cache_hash = hashlib.md5(cache_name.encode()).hexdigest()
    cache_file = os.path.join(CACHE_DIR, cache_hash)
    result = {}
    api_counter = 0
    
    try:
        if not refresh:
            if progress_info:
                print(f"[INFO]: Geting results from cache! Enable refresh option to update results.")
            return pickle.load(open(cache_file, "rb"))
    except (FileNotFoundError, pickle.UnpicklingError):
        pass
    
    if existing_ids is None:
        existing_ids = []   
   
    if responses:
        query['responses_count_enabled'] = True

    # Customize HTTPAdapter and Retry Strategy
    retry_strategy = Retry(
        total=10,
        status_forcelist=[413, 429, 503],
        allowed_methods=["HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"],
        backoff_factor=1
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    http = requests.Session()
    http.mount("https://", adapter)
    http.mount("http://", adapter)
        
    target_url = __API_BASE_URL + "?" + urlencode(query)
    num_pages = http.get(target_url).json().get("pages")
    if num_pages is None:
        return result, 1
    
    # Collect vacancy IDs...
    ids = []
    resps = []
    
    for idx in range(num_pages + 1):
        response = requests.get(target_url, {"page": idx})
        api_counter +=1
        data = response.json()
        if "items" not in data:
            break
        ids.extend(x["id"] for x in data["items"])
        resps.extend(x.get('counters', {}).get('responses') for x in data["items"])

    ids = list(set(ids) - set(existing_ids))
    
    # Collect vacancies...
    jobs_list = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        if progress_info:
            for vacancy in tqdm(
                executor.map(get_vacancy, ids, resps),
                desc="Get data via HH API",
                ncols=100,
                total=len(ids),
            ):
                jobs_list.append(vacancy)
                api_counter+=1
        else:
            for vacancy in executor.map(get_vacancy, ids, resps):
                jobs_list.append(vacancy)
                api_counter+=1
                
    unzipped_list = list(zip(*jobs_list))

    if len(unzipped_list) > 0:
        for idx, key in enumerate(__DICT_KEYS):
            result[key] = unzipped_list[idx]
        pickle.dump(result, open(cache_file, "wb"))
        
    return result, api_counter

In [41]:
def get_responses(
                     query: Optional[Dict],
                     max_workers: int = 1) -> (pd.DataFrame, int):
    """Add response numbers to the separate dataframe.
    Dataframe structure: vacancy_id, retrieve_date, responses
    Parameters
    ----------
    query : dict
        Search query params for GET requests.
    max_workers :  int
        Number of workers for threading.
    Returns
    -------
    dataframe
        dataframe with the responses
    int
        API request counter
    """
    api_counter = 0
    
    # Customize HTTPAdapter and Retry Strategy
    retry_strategy = Retry(
        total=10,
        status_forcelist=[413, 429, 503],
        allowed_methods=["HEAD", "GET", "PUT", "DELETE", "OPTIONS", "TRACE"],
        backoff_factor=1
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    http = requests.Session()
    http.mount("https://", adapter)
    http.mount("http://", adapter)
        
    target_url = __API_BASE_URL + "?" + urlencode(query)
    num_pages = http.get(target_url).json().get("pages")
    if num_pages is None:
        return result, 1
    
    # Collect vacancy IDs...
    ids = []
    retrieve_dates = []
    resps = []
    
    for idx in range(num_pages + 1):
        response = requests.get(target_url, {"page": idx})
        api_counter +=1
        data = response.json()
        if "items" not in data:
            break
        for item in data["items"]:
            ids.append(item["id"])
            resps.append(round(item.get('counters', {}).get('responses')))
            retrieve_dates.append(date.today().isoformat())
    result = pd.DataFrame(data={'vacancy_id': ids,'retrieve_date': retrieve_dates, 'responses': resps})
    return result, api_counter

In [6]:
# Старые ключевые слова
query_texts_old = [
    "Аналитик",
    "Инженер данных",
    "Data Scientist",
]
# Текущие ключевые слова
query_texts = [
    "Аналитик данных",
    "Системный аналитик",
    "Бизнес аналитик",
    "Продуктовый аналитик",
    "Веб-аналитик",
    "Инженер данных",
    "Data Engineer",
    "Data Scientist",
]

In [7]:
# Текущие ключевые слова
query_texts2 = [
    "Аналитик данных",
    "Системный аналитик",
    "Бизнес аналитик",
    "Продуктовый аналитик",
    "Marketing Analyst",
    "Веб-аналитик",
    "Аналитик BI",
    "Младший аналитик",
    "Руководитель отдела аналитики",
    "Аналитик баз данных",
    
    "Инженер баз данных",
    
    "Data Engineer",
    "Data Scientist",
]

In [8]:
# Ключевые слова из дашборда revealthedata
texts_rsd = [
    "Бизнес аналитик",
    "Аналитик данных",
    "Marketing Analyst",
    "Продуктовый аналитик",
    "Аналитик BI",
    "Младший аналитик",
    "Руководитель отдела аналитики",
    "Аналитик баз данных",
    
    "Разработчик BI",
    
    "Инженер баз данных",
    
    "Data scientist",
    "Руководитель направления предикативной аналитики",
    "Руководитель data science",
    "Data Science Director"
]

In [48]:
def update_responses(query_texts, query_date):
    """
    Update responses for the vacancies from the certain date
    Data is saved to f"../data/download/responses/responses.csv" as CSV files.
    Parameters
    ----------
    query_texts : list
        List of strings with the search query request
    query_date : string
        Date string in ISO 8601 format (YYYY-MM-DD)
    Returns
    -------
    bool
        True if all the data retrieved, False if API limit reached
    """
    # a counter for 3k/hour API limit
    api_hourly_limit = 3000
    api_counter = 0
    # distribute vacancies by hour to avoid the 2k API limit
    timelist = pd.to_datetime(pd.date_range(start = query_date,
                             periods=25,
                             freq = "H")).strftime('%Y-%m-%dT%H:%M:%S').to_list()
    try:
        responses_df = pd.read_csv(f"../data/download/responses/responses.csv", dtype={
            'vacancy_id': str,
        })
    except:
        responses_df = None
    responses_data = []
    # iterate through texts and every hour in the day 
    for query_id in range (0, len(query_texts)):
        print(f"Downloading for query '{query_texts[query_id]}' for {query_date}")          
        for i in range(1, 25):
            temp_data = get_responses(
                    query={"text": query_texts[query_id],
                           "per_page": 50,
                           "date_from": timelist[i-1],
                           "date_to": timelist[i],
                           'responses_count_enabled': True,
                          },
                )
            responses_data.append(pd.DataFrame(temp_data[0]))
            api_counter += temp_data[1]
            if api_counter >= api_hourly_limit:
                break
        if api_counter >= api_hourly_limit:
            print(f"API download limit reached, downloaded {api_counter} vacancies for {query_date}")
            responses_df_new = pd.concat(responses_data)
            responses_df = pd.concat([responses_df, responses_df_new])
            responses_df = responses_df.drop_duplicates(subset=['vacancy_id', 'retrieve_date'], keep='last') 
            responses_df.to_csv(f"../data/download/responses/responses.csv",index=False)
            return False
    print(f"Downloaded all {api_counter} responses for {query_date}")
    
    # write the dataframe
    responses_df_new = pd.concat(responses_data)
    responses_df = pd.concat([responses_df, responses_df_new])
    responses_df = responses_df.drop_duplicates(subset=['vacancy_id', 'retrieve_date'], keep='last') 
    responses_df.to_csv(f"../data/download/responses/responses.csv",index=False)      
    return True

In [49]:
def retrieve_queries(query_texts, query_date, refresh = True, progress_info = True):
    """
    Retrieve data for the list of queries for the certain date.    
    Data is saved to f"../data/download/{query_date}_{query_texts[query_id]}.csv" as CSV files.
    Parameters
    ----------
    query_texts : list
        List of strings with the search query request
    query_date : string
        Date string in ISO 8601 format (YYYY-MM-DD)
    Returns
    -------
    bool
        True if all the data retrieved, False if API limit reached
    """
    # a counter for 3k/hour API limit
    api_hourly_limit = 3000
    api_counter = 0
    dropped_counter = 0
    # distribute vacancies by hour to avoid the 2k API limit
    timelist = pd.to_datetime(pd.date_range(start = query_date,
                             periods=25,
                             freq = "H")).strftime('%Y-%m-%dT%H:%M:%S').to_list()
    # iterate through texts and every hour of the day 
    for query_id in range (0, len(query_texts)):
        print(f"Downloading for query '{query_texts[query_id]}' for {query_date}")
        try:
            vacancies_df = pd.read_csv(f"../data/download/{query_date}_{query_texts[query_id]}.csv", dtype={
                'vacancy_id': str,
                'employer_id': str,
            })
        except:
            vacancies_df = None
        if (vacancies_df is not None):
            if not refresh:
                continue
            existing_ids = vacancies_df['vacancy_id'].to_list()
        else:
            existing_ids = []
        vacancies_data = []
        for i in range(1, 25):
            temp_data = collect_vacancies(
                    query={"text": query_texts[query_id],
                           "per_page": 50,
                           "date_from": timelist[i-1],
                           "date_to": timelist[i],
                          },
                    existing_ids=existing_ids,
                    responses=True,
                    refresh=refresh,
                    progress_info=progress_info,
                )
            vacancies_data.append(pd.DataFrame(temp_data[0]))
            api_counter += temp_data[1]
            if api_counter >= api_hourly_limit:
                break
        # combine daily data into the dataframe, remove duplicates and ave it
        vacancies_df_new = pd.concat(vacancies_data)
        vacancies_df = pd.concat([vacancies_df, vacancies_df_new])
        if vacancies_df.shape[0] == 0:
            continue
        dropped_counter += vacancies_df['vacancy_name'].isnull().sum()
        vacancies_df = vacancies_df[vacancies_df['vacancy_name'].notnull()]
        vacancies_df = vacancies_df.drop_duplicates(subset='vacancy_id')
        vacancies_df['query'] = query_texts[query_id]
        vacancies_df.to_csv(f"../data/download/{query_date}_{query_texts[query_id]}.csv",index=False)
        if api_counter >= api_hourly_limit:
            print(f"API download limit reached, downloaded {api_counter} vacancies for {query_date}")
            return False
    print(f"Downloaded all {api_counter} vacancies for {query_date}")
    if dropped_counter > 0:
        print(f"Removed nulls: {dropped_counter}")
    return True

In [9]:
# retrieve the data for the last week (not including today)
# stop if the api download limit reached
# text query is taken from query_texts
api_status = True
today = date.today()
#for i in range(7, 0, -1):
#    day = today - td(days = i)
#    query_date = day.isoformat()
#    api_status = retrieve_queries(query_texts, query_date, False, False)
#    if not api_status:
#        print("API limit reached, wait for an hour")
#        break
#if api_status:
#    print("Data for the last week downloaded")

In [67]:
update_responses(query_texts2, (date.today() - td(days = 8)).isoformat())
update_responses(query_texts2, (date.today() - td(days = 15)).isoformat())

Downloading for query 'Аналитик данных' for 2022-09-22
Downloading for query 'Системный аналитик' for 2022-09-22
Downloading for query 'Бизнес аналитик' for 2022-09-22
Downloading for query 'Продуктовый аналитик' for 2022-09-22
Downloading for query 'Marketing Analyst' for 2022-09-22
Downloading for query 'Веб-аналитик' for 2022-09-22
Downloading for query 'Аналитик BI' for 2022-09-22
Downloading for query 'Младший аналитик' for 2022-09-22
Downloading for query 'Руководитель отдела аналитики' for 2022-09-22
Downloading for query 'Аналитик баз данных' for 2022-09-22
Downloading for query 'Инженер баз данных' for 2022-09-22
Downloading for query 'Data Engineer' for 2022-09-22
Downloading for query 'Data Scientist' for 2022-09-22
Downloaded all 624 responses for 2022-09-22
Downloading for query 'Аналитик данных' for 2022-09-15
Downloading for query 'Системный аналитик' for 2022-09-15
Downloading for query 'Бизнес аналитик' for 2022-09-15
Downloading for query 'Продуктовый аналитик' for 20

True

In [66]:
# '2022-09-22'
retrieve_queries(query_texts2, (date.today() - td(days = 1)).isoformat(), refresh=True)

Downloading for query 'Аналитик данных' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]


Downloading for query 'Системный аналитик' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]


Downloading for query 'Бизнес аналитик' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]


Downloading for query 'Продуктовый аналитик' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:

Downloading for query 'Marketing Analyst' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00,  7.82it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00,  9.52it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 8/8 [00:00<00:00,  8.42it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00,  8.16it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 5/5 [00:00<00:00,  7.05it/s]
Get data via HH API: 100%|█████████████████████████████

Downloading for query 'Веб-аналитик' for 2022-09-29


Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  7.24it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  9.61it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  9.43it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  8.84it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 5/5 [00:00<00:00,  8.05it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 10/10 [00:01<00:00,  7.85it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 8/8 [00:00<00:00,  9.24it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 25/25 [00:03<00:00,  7.32it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 14/14 [00:01<00:00,  8.19it/s]
Get data vi

Downloading for query 'Аналитик BI' for 2022-09-29


Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  8.73it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00,  7.81it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  7.63it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00,  8.45it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 9/9 [00:01<00:00,  7.01it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 17/17 [00:02<00:00,  7.12it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 23/23 [00:02<00:00,  7.98it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 25/25 [00:03<00:00,  8.28it/s]
Get data via HH API: 100%|██████

Downloading for query 'Младший аналитик' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  8.81it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 11.36it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00,  6.49it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00,  7.42it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  8.36it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 10.67it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 13/13 [00:01<00:00,  8.75it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 14/14 [00:01<00:00,  8.16it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.10it/s]
Get data vi

Downloading for query 'Руководитель отдела аналитики' for 2022-09-29


Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 11.16it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 10.69it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 10.41it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  8.64it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 10.48it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00,  7.95it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  8.13it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 6/6 [00:00<00:00,  8.76it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.79it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 49/49 [00:06<00:00,  

Downloading for query 'Аналитик баз данных' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  7.87it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 10.86it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  8.77it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  9.95it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 4/4 [00:00<00:00,  8.28it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 9/9 [00:00<00:00,  9.12it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 11/11 [00:01<00:00,  8.17it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 7/7 [00:00<00:00,  8.04it/s]
Get data via HH API: 100%|██████

Downloading for query 'Инженер баз данных' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 10.31it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  8.19it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  3.18it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 3/3 [00:00<00:00,  9.37it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 7/7 [00:00<00:00,  8.31it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 7/7 [00:00<00:00,  8.37it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 10/10 [00:01<00:00,  6.87it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 8/8 [00:00<00:00,  9.

Downloading for query 'Data Engineer' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  8.12it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  5.58it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  9.90it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 10/10 [00:01<00:00,  8.13it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 11/11 [00:01<00:00,  6.92it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 15/15 [00:02<00:00,  7.26it/s]
Get data via HH API: 100%|██████████████████████████████████████████| 12/12 [00:01<00:00,  7.31it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 9/9 [00:01<00:00,  7.

Downloading for query 'Data Scientist' for 2022-09-29


Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  9.47it/s]
Get data via HH API: 0it [00:00, ?it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 8/8 [00:00<00:00,  8.58it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 5/5 [00:00<00:00,  6.43it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 5/5 [00:00<00:00,  8.59it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00,  6.49it/s]
Get data via HH API: 100%|████████████████████████████████████████████| 1/1 [00:00<00:00,  9.70it/s]
Get data via HH API: 100%|████████

Downloaded all 2171 vacancies for 2022-09-29





True