# Рынок вакансий аналитиков данных на LinkedIn

<div style="border: solid seagreen 1px; padding: 10px">
    
**Задача** — визуализировать информацию о рынке вакансий для аналитиков данных начального уровня (data analyst и BI analyst) в Европе. 

**Источник информации**: база в формате csv, содержащая код HTML-страниц с описанием вакансий на LinkedIn по состоянию на 23 мая 2023 года.
    
В рамках задачи будут выполнены следующие шаги:
- [загрузка и обзор данных](#overview),
- [предобработка данных](#data_processing),
- [выгрузка данных](#extraction),
- [построение дашборда](#dash).

In [1]:
# импортируем библиотеки
import pandas as pd
import requests
from urllib.parse import urlencode
import json
from IPython.display import HTML
from bs4 import BeautifulSoup
import re
import numpy as np
import datetime as dt

# настраиваем отображение колонок датафрейма при выводе данных
pd.options.display.max_columns = None
pd.options.display.max_colwidth = None

<a id='overview'></a>
## Загрузка и обзор данных

In [2]:
# загружаем данные

try:
    df = pd.read_csv('/Users/mrmrzpn/Desktop/Yandex Praktikum/\
7. Мастерская/LinkedIn/masterskaya_parsing_LinkedIn_2023_05_23.csv',
                    usecols=['html'])
except:
    # используем api 
    base_url = 'https://cloud-api.yandex.net/v1/disk/public/resources/download?' 
    # ссылка на файл в облаке
    public_key = 'https://disk.yandex.ru/d/Rlo2KdJRve6fkw'
    
    # получаем url для запроса на скачивание файла
    url = base_url + urlencode(dict(public_key=public_key))
    # отправляем запрос на скачивание файла
    response = requests.get(url).text
    # вытаскиваем из ответа прямую ссылку для скачивания файла
    download_url = json.loads(response)['href']
    
    # загружаем файл в датасет
    df = pd.read_csv(download_url, index_col=0)

In [3]:
# смотрим информацию о датасете

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 998 entries, 0 to 997
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   html    998 non-null    object
dtypes: object(1)
memory usage: 7.9+ KB


In [4]:
# смотрим для примера, как выглядит страница вакансии

HTML(df['html'][0])

**Наблюдения:** В датасете 998 вакансий, пропусков нет.

<a id='data_processing'></a>
## Предобработка данных

### Проверка на дубликаты - 1

In [5]:
# проверим, есть ли полные дубликаты

sum(df.duplicated())

98

В датасете 98 полных дубликатов, удалим их:

In [6]:
df = df.drop_duplicates(ignore_index=True)

### Парсинг

Нужно собрать следующую информацию:
- наименование вакансии
- наименование компании 
- город
- страна
- уровень кандидата (junior, middle, senior)
- тип занятости (удаленно/офис/гибридный график)
- дата публикации вакансии
- количество кандидатов на вакансию
- требуемые хард скилы
- сфера деятельности компании
- размер компании (количество сотрудников)

Требуемые хард скиллы возьмем из описания вакансии в следующем шаге.

In [7]:
# объявим функцию для парсинга данных

def get_data(html):
    soup = BeautifulSoup(html, 'lxml')
    
    # парсим данные, и если находим нужный признак, убираем лишние пробелы и перенос строки
    def parsing(tag_str, class_str):
        result = soup.find(tag_str, attrs={'class': class_str})
        if result:
            result = result.text.strip()
        return result
    
    # название вакансии
    job_title = parsing('h2', 't-24 t-bold jobs-unified-top-card__job-title')
    # название компании
    company = parsing('span', 'jobs-unified-top-card__company-name')
    # город и страна
    location = parsing('span', 'jobs-unified-top-card__bullet')
    # разделим город и страну
    if location:
        location = location.split(', ')
        city = location[0]
        country = location[-1]
    else:
        city = np.nan
        country = np.nan
    # ожидаемый уровень кандидатов
    level = parsing('li', 'jobs-unified-top-card__job-insight')
    if level:
        level = level.split('·')
        level = level[-1].strip()
    # тип занятости
    workplace = parsing('span', 'jobs-unified-top-card__workplace-type')
    # дата публикации
    posted_date = parsing('span', 'jobs-unified-top-card__posted-date')
    # число откликов
    applicants = parsing('span', 'jobs-unified-top-card__applicant-count')
    # описание вакансии
    description = parsing('article', 'jobs-description__container m4')
    # если в описании вакансии только заголовок, возвращаем nan
    if description == 'About the job':
        description = np.nan
    # индустрия и размер компании
    industry_size = parsing('div', 't-14 mt5')
    # разделим индустрию и размер компании
    if industry_size:
        industry_size = re.split('\n +', industry_size)
        industry = industry_size[0]
        try:
            size = industry_size[2]
        except:
            size = np.nan
    else:
        industry = np.nan
        size = np.nan
    
    return [job_title, company, city, country, level, workplace, posted_date, applicants, description,
            industry, size]
    

In [8]:
# создадим датафрейм по итогам парсинга

data = []
columns = ['job_title', 'company', 'city', 'country', 'level', 'workplace', 'posted_date',
           'applicants', 'description', 'industry', 'size']

for i in range(len(df)):
    data.append(get_data(df.loc[i,'html']))

parsed_df = pd.DataFrame(data, columns=columns)

In [9]:
# смотрим информацию о датафрейме

parsed_df.info()
parsed_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 900 entries, 0 to 899
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   job_title    900 non-null    object
 1   company      900 non-null    object
 2   city         900 non-null    object
 3   country      900 non-null    object
 4   level        883 non-null    object
 5   workplace    832 non-null    object
 6   posted_date  900 non-null    object
 7   applicants   740 non-null    object
 8   description  899 non-null    object
 9   industry     866 non-null    object
 10  size         866 non-null    object
dtypes: object(11)
memory usage: 77.5+ KB


Unnamed: 0,job_title,company,city,country,level,workplace,posted_date,applicants,description,industry,size
0,Data Analyst,PharmiWeb.Jobs: Global Life Science Jobs,Basel,Switzerland,Entry level,On-site,1 week ago,47 applicants,"About the job\n \n\n \n What You Will Achieve\nThis position will apply advanced manufacturing, science, and technology to support business and process improvements for the manufacture of small and / or large volume parenteral products. You will be a member of the Transformation and Strategy team. As a Data Analyst you will be responsible for mining, retrieving, organizing, and analyzing data to support the operations of a large manufacturing facility. Using the data, you will help to develop key performance indicators to demonstrate the effectiveness of processes and systems against business strategies.\nYour knowledge of manufacturing operations and computer systems/tools will make you a critical member of the team. Your strong business processes and workflow skills will help facilitate required gatherings for building and enhancing business process maps and strategies. Your innovative use of communication tools and techniques will facilitate in explaining difficult issues, establishing consensus between teams, and will create a collaborative teaming environment for your colleagues.\nMain Responsibilities\nInterpret data, analyze results using statistical techniques and provide ongoing reportsDevelop and implement databases, data collection systems, data analytics and other strategies that optimize statistical efficiency and qualityAcquire data from primary or secondary data sources and maintain databases/data systemsIdentify, analyze, and interpret trends or patterns in complex data sets Filter and “clean” data by reviewing reports, printouts, and performance indicatorsWork with management to prioritize business and information needs Identify and define new process improvement opportunities\nMust-Haves\nA Bachelor’s degree with at least three years of experience; OR a Master’s degree with more than one year of experience.Prior pharmaceutical and/or manufacturing experience requiredTechnical expertise regarding data models, database design development, data mining and segmentation techniques Knowledge of and experience with reporting packages (Business Objects etc) and databases (SQL etc)Knowledge of statistics and experience using statistical packages for analyzing datasets (Excel, SPSS, SAS etc)Knowledge of SAP (ERP materials planning systems).\nYou will be joining an organisation with determined to bring about considerable change to the global industry, in an environment that promotes self-development and personal success while driving for company growth.\nJob Title: Data Analyst\nLocation: Basel, Switzerland\nJob Type: Contract\nAerotek, an Allegis Group company. Allegis Group AG, Aeschengraben 20, CH-4051 Basel, Switzerland. Registration No. CHE-101.865.121. Aerotek and Actalent Services are companies within the Allegis Group network of companies (collectively referred to as ""Allegis Group""). Aerotek, Actalent Services, Aston Carter, EASi, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands. If you apply, your personal data will be processed as described in the Allegis Group Online Privacy Notice available at https://www.allegisgroup.com/en-gb/privacy-notices.\nTo access our Online Privacy Notice, which explains what information we may collect, use, share, and store about you, and describes your rights and choices about this, please go to https://www.allegisgroup.com/en-gb/privacy-notices.\nWe are part of a global network of companies and as a result, the personal data you provide will be shared within Allegis Group and transferred and processed outside the UK, Switzerland and European Economic Area subject to the protections described in the Allegis Group Online Privacy Notice. We store personal data in the UK, EEA, Switzerland and the USA. If you would like to exercise your privacy rights, please visit the ""Contacting Us"" section of our Online Privacy Notice at https://www.allegisgroup.com/en-gb/privacy-notices for details on how to contact us. To protect your privacy and security, we may take steps to verify your identity, such as a password and user ID if there is an account associated with your request, or identifying information such as your address or date of birth, before proceeding with your request. If you are resident in the UK, EEA or Switzerland, we will process any access request you make in accordance with our commitments under the UK Data Protection Act, EU-U.S. Privacy Shield or the Swiss-U.S. Privacy Shield",Staffing & Recruiting,11-50 employees
1,Data Analyst - Logistics,Resolute Recruitment,Coventry,United Kingdom,,On-site,1 week ago,,,,
2,Data Analyst - Logistics,Resolute Recruitment,Coventry,United Kingdom,,On-site,1 week ago,,"About the job\n \n\n \nData Analyst - Logistics ~ Permanent Role ~ Mon-Fri ~ Immediate Start ~Locations: Coventry, CV3Salary: 35,000 - 42,000 Per Year Shift Pattern: Monday to Friday 08:00 - 16:00*The correct candidate will also be able to work night shifts when required to observe this side of the business. Once the initial training period is complete you will be able to work from home as and when required. Job role: Analyse business data and provide detailed reports and reccomendations. Provide the business with regular reports to Quality Performance Supply key information to senior reviewers to assist decision making Ensuring actions are taken to speed up the issue resolution process Analyse general business processes to target cost saving & time saving efficiencies.. Report to business management and reccomend any changes. Monday to Friday Days, however the correct candidate must be prepared to work night shifts to look at this area of the business on occasion. What we are looking for: A proven background in data analytics & supplying detailed reports and reccomendations. Strong communication skills Flexibility to work nights on the rare occasion. Resolute Recruitment is acting as an Employment Agency in relation to this vacancy.Data Analysisdata, analysis, analytics, logistics, logistic, transport, business, profit, loss, performance, quality, management, coventry, warwickshire, nuneaton, rugby, leamington spa, warwick, stratford, wfh, work from home, analytical, Analyse, training, perm, permanent, jobs, job",,
3,Data Analyst (Space & Planning),Mole Valley Farmers,South Molton,United Kingdom,,On-site,1 week ago,,"About the job\n \n\n \nSalary: To be discussed on applicationLocation: Hybrid Working, Home working & Head Office, EX36 3LHContract Type: PermanentContracted Hours: Full time, 37.5 hours per week, Monday to FridayHere at Mole Valley Farmers we have an exciting opportunity for a Space & Planogram Planning Assistant to join our Merchandise Planning team.You will maintain the space and Merchandising requirements with the Procurement & Retail Operations teams for the Business, ensuring full optimisation of space & delivery of planogram implementation.What you will be doing as a Space & Planogram Planning Assistant… Analysing historic Space & Sales performance, presenting key facts and trendsWorking with Product Manager & Merchandise Planner to develop the optimum layouts within the Mock shop as required for their CategoriesUsing Financial analysis to feed into the Range planning process, working with the Product Manager & Merchandise Planner.Producing planograms for categories and creating Head Office Bulletins to ensure implementation in stores, working within agreed critical path and process.\nWe would love to hear from you if you have…. Good Excel knowledgeStrong analytic natureYou are highly organised with strong attention to detailYou are self-motivated but also enjoy working as part of a team\nPlease note, this vacancy may close prior to the expiry date if we have received a suitable number of applications.",,
4,Data Analyst,FORFIRM,Lugano,Switzerland,,On-site,2 weeks ago,,"About the job\n \n\n \nFORFIRM is providing solutions to real business challenges for our clients through innovation and deep industry understanding. We pride ourselves on being a knowledge-based company, with no barriers or pre-built solutions – we listen to our clients and solve their unique problems.At FORFIRM, we are creating a culture where each person can define their own role parameters and speak their mind without any hesitation. We are a true meritocracy, where individual results define each person’s career path.We are looking for experienced and motivated Data Analyst to join our lively international team and work on projects for Europe's leading brands! Manage business requests in the Data warehouse and Reporting area, from addressing the user request to the release of the solution (in collaboration with various work groups) Design and participate in the implementation of processing strategies, data transformation (ETL processes) and Business Intelligence / Self BI solutions collaborate, with the whole team, in the design and management of data storage and use solutions in the Cloud (GCP or AWS) support the team in monitoring pipelines (cloud and on-prem) and performance support the team in the evolution of the data model analyze, define and implement data quality procedures within the QA framework managed by the team\nWhat we are looking for ? experience of 5+ years of work on Datawarehouse and Front-END projects (managed in agile mode both on traditional technologies and on cloud platforms) knowledge in Data Modeling (techniques and methodologies) ability to analyze data model using relational DBs Experience in SQL language, ETL tools (preferably Oracle Data Integrator) and Data Visualization (Power BI) Experience in Python Previous experience in banking sector will represent a plus\n Opportunity to work on international multi-disciplinary projects for leading brands from day oneHighly meritocratic environment – points-based bonus and promotion systemA focus on knowledge building with regular internal and external training opportunitiesOpportunities to travel and work abroadSupportive and dynamic team \nForFirm is an equal opportunities employer that values diversity within the company. Qualified applicants will receive consideration for employment without discrimination about race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.",,


**Наблюдения:** Не по всем вакансиям предоставлены полные данные: в некоторых случаях отсутствует информация о типе занятости, уровне кандидатов, количестве откликов на вакансию, индустрии и размере компании. В одном случае нет подробного описания вакансии с требованиями к кандидатам.  
Пропуски удалять не будем, чтобы аналитика по числу вакансий в разрезе компаний, стран, индустрий была полной.

### Расчет даты публикации

Дата публикации представлена в текстовом формате, в виде отрезка времени с момента публикации до 23 мая 2023 года. Изменим формат представления даты.

In [10]:
# посмотрим, какие отрезки времени есть в датасете

parsed_df['posted_date'].unique()

array(['1 week ago', '2 weeks ago', '6 days ago', '3 weeks ago',
       '2 days ago', '1 day ago', '4 days ago', '4 weeks ago',
       '3 days ago', '5 days ago', '12 minutes ago', '29 minutes ago',
       '5 hours ago', '8 hours ago', '6 hours ago', '9 hours ago',
       '11 hours ago', '12 hours ago', '7 hours ago', '10 hours ago'],
      dtype=object)

In [11]:
# для перевода отрезка времени в дни составим словарь замен

days_number = {'week':7, 'weeks':7, 'day':1, 'days':1, 'hours':0, 'minutes':0}

In [12]:
# сохраним расчетную дату публикации в датасете

parsed_df['date'] = parsed_df['posted_date'].apply(lambda x: (dt.date(2023, 5, 23) - 
                                                              dt.timedelta(days = int(x.split(' ')[0]) * 
                                                                           days_number[x.split(' ')[1]])
                                                             )).astype('datetime64[D]')

### Проверка на релевантность

Проверим вакансии на соответствие требованиям: ищем аналитиков данных и BI аналитиков начального уровня в Европе.

#### Название вакансии

In [13]:
# посмотрим на названия вакансий

sorted(parsed_df['job_title'].unique())

['(ESG) Data Analyst (w/m/d)',
 '(Junior) Application Specialist eCommerce (m/f/d)',
 '(Junior) Business Analyst (f/m/d)',
 '(Junior) Business Analyst (w/m/d)',
 '(Junior) Consumer Business Intelligence Analyst (m/w/d)',
 '(Junior) Data Scientist',
 '(Junior) Project Analyst (m/f/d)',
 '(Junior) Project Coordinator',
 '(Junior) Systems Business Analyst (f/m/d)',
 '(Senior) Consultant Transaction Analytics (f/m/d)',
 '360 Data Product Owner',
 'AFC Data Analyst',
 'ALTERNANCE - Assistance Data Analyst et Reporting RH F/H',
 'ANALISTA (H/M) FINANCIERO / FP&A REPORTING',
 'ANALISTA DE PROYECTO DIGITALIZACIÓN',
 'ANALYSTE DES DONNEES (H/F)',
 'APM Software Development & Operations',
 'Accountant',
 'Aftersales Reporting Specialist',
 'Alternance - Data Analyst - Data ETL & Visualisation (h/f/d)',
 'Analista Dati di Geodesia - CATEGORIA PROTETTA',
 'Analista Ecommerce, Web y App',
 'Analista Funzionale IT (Junior)',
 'Analista de Software',
 'Analista de datos',
 'Analista de datos BI',
 'A

Актуальные вакансии должны включать слово "аналитик" и "Business Intelligence (BI)":

In [14]:
# составим список слов, которые должны обязательно встречаться в подходящих вакансиях

analyst = ['anal', 'dba', 'data research', 'intelligence', '\bbi\b', '\bbi-', 'visualization',
           'visualisation', 'data consultant', 'processing']

In [15]:
# оставим в датасете только аналитиков

parsed_df = (parsed_df.loc[
    parsed_df['job_title']
    .apply(lambda x: (sum([i in x.lower() for i in analyst]) != 0))]
            ).reset_index(drop=True)

In [16]:
# посмотрим на оставшиеся названия вакансий, чтобы исключить неподходящие

sorted(parsed_df['job_title'].unique())

['(ESG) Data Analyst (w/m/d)',
 '(Junior) Business Analyst (f/m/d)',
 '(Junior) Business Analyst (w/m/d)',
 '(Junior) Consumer Business Intelligence Analyst (m/w/d)',
 '(Junior) Project Analyst (m/f/d)',
 '(Junior) Systems Business Analyst (f/m/d)',
 '(Senior) Consultant Transaction Analytics (f/m/d)',
 'AFC Data Analyst',
 'ALTERNANCE - Assistance Data Analyst et Reporting RH F/H',
 'ANALISTA (H/M) FINANCIERO / FP&A REPORTING',
 'ANALISTA DE PROYECTO DIGITALIZACIÓN',
 'ANALYSTE DES DONNEES (H/F)',
 'Alternance - Data Analyst - Data ETL & Visualisation (h/f/d)',
 'Analista Dati di Geodesia - CATEGORIA PROTETTA',
 'Analista Ecommerce, Web y App',
 'Analista Funzionale IT (Junior)',
 'Analista de Software',
 'Analista de datos',
 'Analista de datos BI',
 'Analista de datos dept de elaborados (Guissona)',
 'Analista funzionale',
 'Analista superior de datos',
 'Analityk',
 'Analityk danych',
 'Analityk danych internetowych (wszystko.pl)',
 'Analyst',
 'Analyst (Strategy, Policy and Innova

Исключим из вакансий аналитиков-разработчиков, аналитиков бизнес-процессов, лабораторных аналитиков, дата-саентистов, инженеров данных, финансовых аналитиков, вакансии в IT службах:

In [17]:
# составим список слов, которые должны отсутствовать в названии вакансий

job_to_exclude = ['developer', 'development', 'programmatore', 'software', 'business process', 
                  'business-anal', 'business anal', 'funzionale', 'functional', 'governance', 'engineer',
                  'ingen', 'operational', 'operations', 'process', 'lab analyst', 'infrastructure', 
                  'science', 'scientist', 'financ', 'it solution', 'it strategy', 'support', 'it anal', 'esg']

In [18]:
# обновим датасет

parsed_df = (parsed_df.loc[
    parsed_df['job_title']
    .apply(lambda x: (sum([i in x.lower() for i in job_to_exclude]) == 0))]
            ).reset_index(drop=True)

#### Уровень вакансии

In [19]:
# посмотрим, какие уровни вакансии представлены в датасете

parsed_df['level'].unique()

array(['Entry level', None, 'Associate', 'Full-time', 'Internship',
       'Part-time'], dtype=object)

Уровни вакансии в целом соответствуют критериям поиска: представлены начальные позиции, включая стажировки.  
Уровень "Associate" может быть как начальным, так и middle в зависимости от грейдов в компании, далее проверим, не указан ли в этих случаях уровень в названии вакансии.  
Там, где уровнь вакансии получился "Full-time" и "Part-time", данных об требуемом уровне кандидата нет, и поэтому заменим такие значения на NaN:

In [20]:
parsed_df.loc[(parsed_df['level'] == 'Full-time') | (parsed_df['level'] == 'Part-time'), 'level'] = np.nan

In [21]:
# проверим, сколько в датасете вакансий без уровня кандидата

print('Число вакансий без указания уровня кандидата -', len(parsed_df.query('level.isna()')))

Число вакансий без указания уровня кандидата - 25


In [22]:
# посмотрим на названия вакансий, где не указан уровень кандидата

sorted(parsed_df.query('level.isna()')['job_title'].unique())

['Analista de datos BI',
 'Asset Data Analyst',
 'DATA ANALYST IT',
 'Data Analist - startersfunctie (Dutch speaking)',
 'Data Analyst',
 'Data Analyst (9 Months FTC)',
 'Data Analyst (Space & Planning)',
 'Data Analyst (m/w/d)',
 'Data Analyst - Hybrid',
 'Data Analyst - Hybrid Working',
 'Data Analyst - Logistics',
 'Data Analyst H/F',
 'Data analyst',
 'Datový analytik/vývojář']

Из вакансий выше только одна, связанная с анализом данных и BI, упоминает ожидаемый уровень кандидата - *'Data Analist - startersfunctie (Dutch speaking)'* (startersfunctie - начальная позиция). Проставим для неё начальный уровень:

In [23]:
parsed_df.loc[parsed_df['job_title'] == 'Data Analist - startersfunctie (Dutch speaking)',
              'level'] = 'Entry level'

In [24]:
# посмотрим на названия вакансий, где уровень - "Associate"

sorted(parsed_df.query('level == "Associate"')['job_title'].unique())

['(Junior) Consumer Business Intelligence Analyst (m/w/d)',
 '(Junior) Project Analyst (m/f/d)',
 'ANALISTA DE PROYECTO DIGITALIZACIÓN',
 'Analista Dati di Geodesia - CATEGORIA PROTETTA',
 'Analista Ecommerce, Web y App',
 'Analista de datos',
 'Analista de datos dept de elaborados (Guissona)',
 'Analista superior de datos',
 'Analityk',
 'Analityk danych',
 'Analyst',
 'Analyst - Marketing Effectiveness',
 'Analyst Sourcing & Planning',
 'Analytics & Reporting - Associate – Warsaw',
 'Analytics Consultant',
 'Analytics Consultant - Marketing Effectiveness',
 'Associate Database Analyst---Clinical Database Management',
 'BI Analyst',
 'BI Analyst (Pricing)',
 'BI Analyst (m/w/d)',
 'BI Analyst (m/w/d) Marketing',
 'BI Analyst, Power BI Champion, CEE based in Warsaw',
 'BI-Analyst (m/w/d)',
 'BUSINESS DATA ANALYSTE Senior H/F',
 'Business & Data Analyst',
 'Business Data Analyst (m/w/d)',
 'Business Data Analyst Lead\xa0H/F',
 'Business Intelligence Analyst',
 'Business Intelligence Ana

В названии некоторых вакансий указаны middle и senior уровни. Исключим такие вакансии:

In [25]:
# составим список уровней, которые не должны встречаться в названии вакансии

level_to_exclude = ['senior', 'superior', 'middle', 'mid', 'medior', 'lead', 'chef', 'principal',
                    'owner', 'head']

In [26]:
# исключим неподходящие вакансии из датасета

parsed_df = (parsed_df.loc[parsed_df['job_title']
                           .apply(lambda x: sum([i in x.lower() for i in level_to_exclude]) == 0)]
            ).reset_index(drop=True)

#### Регион

In [27]:
# проверим, что в датасет попали только страны Европы

sorted(parsed_df['country'].unique())

['Amsterdam Area',
 'Austria',
 'Belgium',
 'Berlin Metropolitan Area',
 'Brussels Metropolitan Area',
 'Bulgaria',
 'Cologne Bonn Region',
 'Czechia',
 'Denmark',
 'Eindhoven Area',
 'Estonia',
 'Finland',
 'France',
 'Germany',
 'Greater Banska Bystrica Area',
 'Greater Milan Metropolitan Area',
 'Greater Munster Area',
 'Greater Nuremberg Metropolitan Area',
 'Greater Palma de Mallorca Metropolitan Area',
 'Greater Paris Metropolitan Region',
 'Greater Pau Area',
 'Greece',
 'Hungary',
 'Ireland',
 'Italy',
 'Krakow Metropolitan Area',
 'Latvia',
 'Lisbon Metropolitan Area',
 'Lithuania',
 'Luxembourg',
 'Malta',
 'Monaco',
 'Netherlands',
 'Norway',
 'Poland',
 'Portugal',
 'Prague Metropolitan Area',
 'Romania',
 'Rotterdam and The Hague',
 'Slovakia',
 'Spain',
 'Sweden',
 'Switzerland',
 'United Kingdom',
 'Warsaw Metropolitan Area',
 'Wroclaw Metropolitan Area']

Вакансии представлены только в Европе, однако в некоторых случаях указан лишь город/округ, и нет страны. Укажем для таких вакансий страну:

In [28]:
# составим словарь с названиями городов и соответствующих им стран

country_dict = {'Amsterdam Area': 'Netherlands', 'Athens Metropolitan Area':'Greece',
                'Berlin Metropolitan Area':'Germany', 'Brussels Metropolitan Area':'Belgium',
                'Cologne Bonn Region':'Germany', 'Copenhagen Metropolitan Area':'Denmark',
                'Eindhoven Area':'Netherlands', 'Greater Banska Bystrica Area':'Slovakia',
                'Greater Barcelona Metropolitan Area':'Spain',
                'Greater Dijon Area':'France', 'Greater Lyon Area':'France',
                'Greater Madrid Metropolitan Area':'Spain',
                'Greater Milan Metropolitan Area':'Italy',
                'Greater Munich Metropolitan Area':'Germany',
                'Greater Munster Area':'Ireland', 'Greater Nuremberg Metropolitan Area':'Germany',
                'Greater Oslo Region':'Norway', 'Greater Palma de Mallorca Metropolitan Area':'Spain',
                'Greater Paris Metropolitan Region':'France', 'Greater Pau Area':'France',
                'Greater Verona Metropolitan Area':'Italy',
                'Iasi Metropolitan Area':'Romania', 'Krakow Metropolitan Area':'Poland',
                'Lisbon Metropolitan Area':'Portugal', 'Prague Metropolitan Area':'Czechia',
                'Rotterdam and The Hague':'Netherlands', 'Stuttgart Region':'Germany',
                'Warsaw Metropolitan Area':'Poland', 'Wroclaw Metropolitan Area':'Poland'
               }

In [29]:
# добавим столбец с обновленными странами

parsed_df['country_new'] = parsed_df['country'].map(country_dict)
parsed_df.loc[parsed_df['country_new'].isna(), 'country_new'] = parsed_df['country']

---

In [30]:
# посмотрим на обновленный датасет

parsed_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   job_title    526 non-null    object        
 1   company      526 non-null    object        
 2   city         526 non-null    object        
 3   country      526 non-null    object        
 4   level        502 non-null    object        
 5   workplace    487 non-null    object        
 6   posted_date  526 non-null    object        
 7   applicants   412 non-null    object        
 8   description  525 non-null    object        
 9   industry     495 non-null    object        
 10  size         495 non-null    object        
 11  date         526 non-null    datetime64[ns]
 12  country_new  526 non-null    object        
dtypes: datetime64[ns](1), object(12)
memory usage: 53.5+ KB


**Наблюдения:** Из 900 вакансий в датасете осталось 526 релевантных.

### Проверка на дубликаты - 2

Снова проверим датасет на наличие полных дубликатов: возможно, одна вакансия публиковалась несколько раз в один день.

In [31]:
# число полных дубликатов по итогам парсинга

sum(parsed_df.duplicated())

5

In [32]:
# удаляем полные дубликаты

parsed_df = parsed_df.drop_duplicates(ignore_index=True)

Проверим случаи, когда одна вакансия публиковалась в разные дни:

In [33]:
# число дублированных вакансий, опубликованных в разные дни

sum(parsed_df.duplicated(subset=['job_title','company','city','country','description']))

1

In [34]:
# удалим обнаруженные дубликаты

parsed_df = parsed_df.drop_duplicates(subset=['job_title','company','city','country','description'],
                                      ignore_index=True)

Проверим для вакансии без описания (индекс в датасете - 1), есть ли у нее дубликат:

In [35]:
# ищем дубликаты для вакансии без описания

parsed_df.query('(job_title == "Data Analyst - Logistics") and\
(company == "Resolute Recruitment") and (city == "Coventry")')

Unnamed: 0,job_title,company,city,country,level,workplace,posted_date,applicants,description,industry,size,date,country_new
1,Data Analyst - Logistics,Resolute Recruitment,Coventry,United Kingdom,,On-site,1 week ago,,,,,2023-05-16,United Kingdom
2,Data Analyst - Logistics,Resolute Recruitment,Coventry,United Kingdom,,On-site,1 week ago,,"About the job\n \n\n \nData Analyst - Logistics ~ Permanent Role ~ Mon-Fri ~ Immediate Start ~Locations: Coventry, CV3Salary: 35,000 - 42,000 Per Year Shift Pattern: Monday to Friday 08:00 - 16:00*The correct candidate will also be able to work night shifts when required to observe this side of the business. Once the initial training period is complete you will be able to work from home as and when required. Job role: Analyse business data and provide detailed reports and reccomendations. Provide the business with regular reports to Quality Performance Supply key information to senior reviewers to assist decision making Ensuring actions are taken to speed up the issue resolution process Analyse general business processes to target cost saving & time saving efficiencies.. Report to business management and reccomend any changes. Monday to Friday Days, however the correct candidate must be prepared to work night shifts to look at this area of the business on occasion. What we are looking for: A proven background in data analytics & supplying detailed reports and reccomendations. Strong communication skills Flexibility to work nights on the rare occasion. Resolute Recruitment is acting as an Employment Agency in relation to this vacancy.Data Analysisdata, analysis, analytics, logistics, logistic, transport, business, profit, loss, performance, quality, management, coventry, warwickshire, nuneaton, rugby, leamington spa, warwick, stratford, wfh, work from home, analytical, Analyse, training, perm, permanent, jobs, job",,,2023-05-16,United Kingdom


Дубликат есть, удаляем строку без описания вакансии:

In [36]:
# удаляем вакансию без описания

parsed_df = parsed_df.dropna(subset=['description']).reset_index(drop=True)

Проверим, есть ли вакансии, у которых совпадает описание, компания и город:

In [37]:
# число вакансий с совпадающими описанием, компанией и городом

sum(parsed_df.duplicated(subset=['company','city','description']))

0

Посмотрим на вакансии, у которых совпадает название компании и город. Возможно, среди них есть неявные дубликаты: незначительно отличаются названия и детали в описании, но по сути это одна и та же вакансия. При поиске дубликатов мы исходим из предположения, что на одну должность может набираться несколько человек, и одной должности должна соответствовать одна вакансия.

In [38]:
# число вакансий с совпадающими компанией и городом

sum(parsed_df.duplicated(subset=['company','city']))

22

In [39]:
# составим список задублированных компаний и городов

duplicated_list = (parsed_df
                   .loc[parsed_df.duplicated(subset=['company','city']), ['company','city']]
                  ).reset_index()

In [40]:
# посмотрим на потенциально задублированные вакансии

(parsed_df
 .query('(company in @duplicated_list.company) and (city in @duplicated_list.city)')
 .sort_values(by=['company','city'])
)

Unnamed: 0,job_title,company,city,country,level,workplace,posted_date,applicants,description,industry,size,date,country_new
99,Data-analist,ABN AMRO Bank N.V.,Amersfoort,Netherlands,Associate,Hybrid,1 week ago,33 applicants,"About the job\n \n\n \nIN HET KORTKom werken in de dynamische wereld van Hypotheken! Met meer dan 750.000 klanten en 150 miljard euro aan verstrekte hypotheken is Hypotheken één van de belangrijkste productgroepen binnen ABN AMRO. In Nederland hebben we een marktaandeel van rond de 20%. Binnen ABN AMRO Hypotheken nemen we veel beslissingen op basis van data. Als data-analist gebruik jij jouw nieuwsgierige houding om met doortastende onderzoeken inzicht te geven in de kwaliteit van extern aangeleverde data, en help je door pro-actief analyses uit te voeren mee aan het verbeteren van onze data-kwaliteit en het bereiken van onze doelstellingen.\nJE WERKAls data-analist werk je binnen het projectteam Dossierkwaliteit, dat zich bezighoudt met de classificatie van historische hypotheekdossiers en de extractie van data-elementen uit aangeleverde documenten. Op basis van AI-modellen wordt door een externe partij gezocht naar documenten die verplicht aanwezig moeten zijn in een dossier, en worden er gegevens uit deze documenten geëxtraheerd. Als data-analist ben jij verantwoordelijk voor de analyse en controle van de resultaten van deze modellen, om zo de externe partij te voorzien van feedback om hun modellen te verbeteren. Daarnaast bepaal je als inhoudelijk expert samen met interne en externe stakeholders welke analyses benodigd zijn om het project naar het volgende niveau te tillen.Je hebt het talent om inzichten uit grote data-bestanden te halen en je haalt daar energie uit. Je bent nieuwsgierig en onderzoekend, redeneert logisch en hebt oog voor detail. Ook signaleer je snel mogelijke roadblocks en kun je die omzetten in acties en analyses. De vertaling die je vanuit data maakt, kun je helder weergeven en jouw advies is goed onderbouwd en concreet. Als analist vind je het prettig om zelfstandig binnen jouw expertisegebied te werken, en je kunt je vastbijten in vraagstukken om de onderste steen boven te krijgen.\nWERKOMGEVINGJe komt te werken in het projectteam Dossierkwaliteit, dat valt onder de verantwoordelijkheid van de afdeling Data & Analytics. D&A is een ondersteunende afdeling voor de gehele Hypotheken-organisatie en wordt aangevoerd door de Chief Data Officer. D&A is verantwoordelijk voor de regie en coördinatie van data en analytics-initiatieven binnen Hypotheken, en het stimuleren van innovatie op de data en analytics-agenda in de business/grids. De afdeling D&A bestaat uit Consultants, een Hypotheken DataLab, data product owners en data-projectleiders.\nIn de dagelijkse praktijk heb je voornamelijk contact met de projectmanagers van het project Dossierkwaliteit, en werk je samen met data scientists uit het Hypotheken DataLab en leden van het projectteam uit andere onderdelen van het bedrijf.\nJE PROFIELJe bent pro-actief, kritisch, creatief, communicatief en een echte doorzetter. Je voert niet enkel uit, je denkt ook graag actief mee vanuit de doelen van het projectteam. Ook adviseer je je stakeholders op basis van je bevindingen en neem je je collega’s mee in een datagedreven werkwijze:Minimaal 3 jaar aantoonbare ervaring in een analytische functie:Je hebt hiervoor analyses uitgevoerd op grote datasets, waarin je gebruik hebt gemaakt van programmeertalen zoals SQL en PythonJe kunt als geen ander inzichten visualiseren met bijv. PowerBIJe bent in staat om zelfstandig analyses uit te voeren en op basis hiervan je bevindingen te delenJe haalt actief en zelfstandig requirements voor benodigde analyses op, en brengt data-gedreven advies uit op te nemen beslissingenErvaring met hypotheken of in de financiële sector is een preJe hebt een afgeronde HBO- of WO-opleiding in een relevant vakgebiedVloeiende beheersing van zowel de Nederlandse als de Engelse taal is een vereiste\nWIJ BIEDENWe bieden je een leuke en uitdagende werkomgeving, waar je impact hebt op onze datastrategie, datakwaliteit en kunt bijdragen aan belangrijke projecten binnen Hypotheken.We bieden je de vrijheid om het beste uit jezelf te halen, flexibel te werken en veel ruimte om te groeien. Je hebt de mogelijkheid je verder te verdiepen in je expertise en/of je te ontwikkelen in de breedte van de functie. Dit alles is afhankelijk van jouw ambities, interesses en ervaring.Bij ABN AMRO zetten we onze kennis, kunde en ons netwerk in voor klanten. Zodat zij op basis van verantwoorde beslissingen hun doelen kunnen bereiken. Hierbij staat het klantbelang altijd voorop. We willen dat klanten onze producten begrijpen en verkopen soms ‘nee’ als het risico van een product voor een klant te groot is. Klantbelang is ook het helder communiceren en het bedenken van slimme oplossingen die echt het verschil maken. Dat is ons doel!\n INTERESSE?Ben je geïnteresseerd? Reageer dan nu online op deze vacature. Voor meer informatie kan je altijd contact opnemen met Maartje van der Veen (Chief Data Officer) via Maartje.van.der.Veen@nl.abnamro.com. We maken graag kennis met je!\nGELIJKE KANSEN VOOR IEDEREENHet succes van onze organisatie staat of valt met de kwaliteit van onze mensen en de ideeen die zij hebben. Echt verrassende inzichten en innovatieve oplossingen voor onze klanten ontstaan door een samenspel van culturen, kennis en ervaring. Daarom is diversiteit voor onze organisatie ontzettend belangrijk. Om ervoor te zorgen dat alle collega’s binnen ABN AMRO hun kwaliteiten kunnen ontplooien, stimuleren we een inclusieve cultuur waarin iedereen zich betrokken en gewaardeerd voelt.\nDISCLAIMER EXTERNE RECRUITMENTBUREAUSExterne recruitmentbureaus dienen een overeenkomst met ABN AMRO BANK N.V. te hebben getekend, uitgegeven door een Talent Acquisition Specialist, om CV’s te mogen indienen. Daarbij mag alleen een CV worden ingediend wanneer het bureau is uitgenodigd door een Talent Acquisition Specialist om mee te zoeken naar geschikte kandidaten. Alle ongevraagde CV’s die buiten deze voorwaarden worden aangeboden zullen als eigendom van ABN AMRO BANK N.V. worden beschouwd. ABN AMRO BANK N.V. is hierbij geen plaatsingskosten verschuldigd.",Banking,"10,001+ employees",2023-05-16,Netherlands
122,Data-analist,ABN AMRO Bank N.V.,Amersfoort,Netherlands,Associate,Hybrid,1 week ago,26 applicants,"About the job\n \n\n \nIN HET KORTDe Data & Analytics organisatie is een ondersteunende afdeling voor heel HypothekenDe D&A organisatie is verantwoordelijk voor de regie en coördinatie van data en analytics initiatieven binnen hypotheken en voor het stimuleren van innovatie op de data en analytics agenda in de business /grids. Ook zijn we bij D&A verantwoordelijk voor overkoepelende zaken als het faciliteren van data governance, het vergroten van onze datakwaliteit en een aantal dataprojecten (zoals een futureproof datawarehouse voor Hypotheken).\nWij zijn op zoek naar een Data & Analytics Consultant die als een spin in het web de datakwaliteit en het datagedreven werken naar een hoger niveau brengt. Je doet dit samen met je collega’s binnen D&A, de data-eigenaren (Data Owners) en data experts van de vele databronnen. Het betreft naast een faciliterende en regie rol ook een hands-on rol waar jij o.a. verantwoordelijk bent voor het opzetten en beheren van datamodellen voor PowerBI dashboards, het faciliteren van het Data Quality Management Proces en het delen van best practices rondom het werken met data. Daarnaast werk je mee in projecten waarin data een belangrijke rol speelt.\nJE WERKJe faciliteert het opbouwen en delen van kennis en ervaring in de Hypotheken-organisatie rondom het werken met data (data awareness).Je bewaakt het Data Quality Issue Management proces.Je stimuleert self service oplossingen via PowerBI.Je ondersteunt de (delegated) data owners en data stewards bij het inrichten en beheren van de data governance binnen Hypotheken en help je data betrouwbaarheid en beschikbaarheid beter inzichtelijk te maken.Je onderhoudt contact met de Hypotheken-afdelingen en relevante partners binnen en buiten de organisatie. Je zoekt daarbij vooral naar mogelijkheden om de grids succesvoller te maken dankzij de inzet van data en/of analytics: slimmer, beter, sneller of misschien gewoon eenvoudiger.Daarnaast draai je mee in projecten van afdelingen waarin data een belangrijke rol speelt.\nWERKOMGEVINGDe afdeling D&A is een ondersteunende afdeling voor de gehele Hypotheken organisatie en wordt aangevoerd door de Chief Data Officer. D&A is verantwoordelijk voor de regie en coördinatie van data en analytics initiatieven binnen hypotheken, en het stimuleren van innovatie op de data en analytics agenda in de business/grids. De afdeling D&A bestaat naast de consultants uit een Hypotheken DataLab, data product owners en data projectleiders.De afdeling D&A onderscheidt zich vooral van bestaande organisatieonderdelen door het vervullen van de regiefunctie over de business heen, het innovatieve maar doelgerichte karakter, de “lean & mean” opzet van de afdeling, en het overzicht en inzicht wat geboden wordt op kennis, kunde en initiatieven. De complexiteit van de activiteiten van D&A wordt vooral gekenmerkt door het vinden van de juiste balans tussen gemeenschappelijke prioriteiten en prioriteiten in business/grids.\nJE PROFIELJe bent een verbinder, hebt ervaring en affiniteit met data & analytics en brede kennis van hypotheken. En je beschikt over:HBO/WO opleiding (betà, technisch of business gericht)3 tot 5 jaar ervaring, hierbij is ervaring in consultancy een preKennis van en ervaring met PowerBI en SQLStakeholdermanagementJe hebt een flexibele en proactieve instellingJe ziet verbanden en weet partijen te verbinden op het gebied van Data & AnalyticsKennis op het gebied van informatie modellering, architectuurSterke communicatieve vaardigheden en een vloeiende beheersing van de Nederlandse en Engelse taalKennis van Data Governance en Data Management is een pre\nWIJ BIEDENEen uitdagende baan waar je impact hebt op onze data strategie, onze data kwaliteit & governance en ons toekomstige model waar we steeds meer obv data kredietbeslissingen zullen nemen.\nWij bieden je de vrijheid om het beste uit jezelf te halen, flexibel te werken en veel ruimte om te groeien. Daarnaast vinden we het als D&A team ook belangrijk om buiten het werk om leuke activiteiten met elkaar te doen.\nEen greep uit onze uitstekende arbeidsvoorwaarden:Wij bieden een goed, marktconform salaris.Een aanvullend benefit budget van 11%, waarmee je flexibel arbeidsvoorwaarden kunt kopen zoals vier weken extra vakantie. Uitbetalen kan ook.Vijf weken vakantie per jaarElk jaar 5 ‘Banking for better’-dagen, die je naar eigen inzicht mag besteden aan de ontwikkeling van jezelf, iemand anders of de maatschappij.Wij verzorgen een goede, volledige thuiswerkplek bestaande uit een bureau, stoel, scherm en accessoiresEen OV-jaarabonnement waarmee je het hele jaar gratis in Nederland kan reizen, zowel zakelijk als privé\n INTERESSE?Ben je geïnteresseerd? Reageer dan nu online op deze vacature. Voor meer informatie kan je altijd contact opnemen met Maartje van der Veen (Chief Data Officer) op +31 (0)6 505 11 857 of Maartje.van.der.Veen@nl.abnamro.com. We maken graag kennis met je.\nGELIJKE KANSEN VOOR IEDEREENHet succes van onze organisatie staat of valt met de kwaliteit van onze mensen en de ideeen die zij hebben. Echt verrassende inzichten en innovatieve oplossingen voor onze klanten ontstaan door een samenspel van culturen, kennis en ervaring. Daarom is diversiteit voor onze organisatie ontzettend belangrijk. Om ervoor te zorgen dat alle collega’s binnen ABN AMRO hun kwaliteiten kunnen ontplooien, stimuleren we een inclusieve cultuur waarin iedereen zich betrokken en gewaardeerd voelt.\nDISCLAIMER EXTERNE RECRUITMENTBUREAUSExterne recruitmentbureaus dienen een overeenkomst met ABN AMRO BANK N.V. te hebben getekend, uitgegeven door een Talent Acquisition Specialist, om CV’s te mogen indienen. Daarbij mag alleen een CV worden ingediend wanneer het bureau is uitgenodigd door een Talent Acquisition Specialist om mee te zoeken naar geschikte kandidaten. Alle ongevraagde CV’s die buiten deze voorwaarden worden aangeboden zullen als eigendom van ABN AMRO BANK N.V. worden beschouwd. ABN AMRO BANK N.V. is hierbij geen plaatsingskosten verschuldigd.",Banking,"10,001+ employees",2023-05-16,Netherlands
265,"Customer Insights Analyst (Bangkok Based, Relocation Provided)",Agoda,Essen,Germany,Associate,,4 days ago,,"About the job\n \n\n \nAbout Agoda\nAgoda is an online travel booking platform for accommodations, flights, and more. We build and deploy cutting-edge technology that connects travelers with more than 2.5 million accommodations globally. Based in Asia and part of Booking Holdings, our 6,000+ employees representing 90+ nationalities foster a work environment rich in diversity, creativity, and collaboration. We innovate through a culture of experimentation and ownership, enhancing the ability for our customers to experience the world.\n Get to Know our Team: \nThe Performance Marketing Team of Agoda is a world leader in online marketing. This department is highly data-driven and focused on developing at-scale marketing programs that improve the lifetime value of Agoda customers through measurable marketing programs and channels. The team is a blend of the best analysts, marketing strategists, and data scientists in the world. The marketing leadership at Agoda has deep experience in data science, product, strategy, and other marketing fields and has built an organization that thrives on data, creative ideas, and technology. The Performance Marketing Team also fosters a great learning environment. You will be able to learn and grow by working closely with experts from a variety of backgrounds from all over the world.\nDue to continued expansion, there are multiple analyst roles within the marketing team across different marketing channels. While the scope will vary depending on these channels, here are some examples:\n Experimentation and optimizing campaign performance: Experiment with ads/campaign structures and bidding& pricing strategies on partners such as Google, Bing, TripAdvisor, Trivago, and other search engines. Adapt to new product features and roll out changes from successful tests. Modeling: Analyze vast amounts of data generated by experiments, develop predictive models using data science techniques (e.g. understanding the impact on bookings from large-scale TV campaigns or demand elasticity from pricing optimization), and liaise with product teams on an implementation roadmap Reporting, analysis, and insights: Building dashboards to track performance, derive insights, understand growth levers, and communicate recommendations via presentations to stakeholders \n What you’ll Need to Succeed: \n Bachelor’s degree or higher from top university in a quantitative subject e.g. computer science, mathematics, engineering business, science, or relevant field of study 0 to 4 years experience in data crunching from top-tier consulting, investment banking, private equity or strategy/business role for a fast-growing global tech company Experience in one or more data analysis packages or databases, e.g. SQL, SAS, R, SPSS, Python, VBA and visualization tools, e.g. Tableau, Power BI, etc. Excellent verbal and written communication skills in English Ability to move fast and be efficient, making decisions on objective data evidence Innate desire to take ownership, make an impact and influence outcomes Excellent organizational skills, attention to detail and ability to work independently \n It’s Great if you Have: \n Experience in digital marketing or e-commerce Experience with A/B testing and other testing metrics Strong presentation and negotiation skills #entrylevel #STRA#ANLS#MRKT#3 #sanfrancisco #sanjose #losangeles #sandiego #oakland #denver #miami #orlando #atlanta #chicago #boston #detroit #newyork #portland #philadelphia #dallas #houston #austin #seattle #washdc #tirana #yerevan #sydney #melbourne #perth #vienna #graz #linz #baku #minsk #brussels #antwerp #ghent #charleroi #liege #saopaolo #sofia #toronto #vancouver #montreal #shanghai #beijing #shenzhen #zagreb #cyprus #prague #Brno #Ostrava #copenhagen #cairo #alexandria #giza #estonia #helsinki #paris #nice #marseille #rouen #lyon #toulouse #tbilisi #berlin #munich #hamburg #stuttgart #cologne #frankfurt #dusseldorf #dortmund #essen #Bremen #leipzig #dresden #hanover #nuremberg #athens #hongkong #budapest #bangalore #newdelhi #jakarta #bali #bandung #dublin #telaviv #milan #rome #naples #turin #palermo #venice #bologna #florence #tokyo #osaka #yokohama #nagoya #okinawa #fukuoka #sapporo #amman #irbid #riga #beirut #tripoli #vilnius #luxembourg #kualalumpur #malta #chisinau #amsterdam #oslo #jerusalem #manila #warsaw #krakow #sintra #lisbon #porto #braga #cascais #loures #amadora #almada #doha #alrayyan #bucharest #moscow #saintpetersburg #riyadh #jeddah #mecca #medina #belgrade #singapore #bratislava #capetown #johannesburg #seoul #barcelona #madrid #valencia #seville #bilbao #malaga #oviedo #alicante #laspalmas #zaragozbanga #stockholm #zurich #geneva #basel #taipei #tainan #taichung #kaohsiung #Phuket #bangkok #istanbul #ankara #izmir #dubai #abudhabi #sharjah #london #manchester #liverpool #edinburgh #kiev #hcmc #hanoi #sanaa #taiz #aden #gibraltar #marrakech #lodz #wroclaw #poznan #Gdansk #szczecin #bydgoszcz #lublin #katowice #rio #salvador #fortaleza #brasilia #belo #belem #manaus #curitiba #portoalegre #saoluis data representation data analysis SQL data analytics analytics python (programming language) data mining data science r (programming language) tableau analytical skills data visualization databases business analysis business intelligence (bi) microsoft sql server machine learning statistics power bi \nEqual Opportunity Employer \nAt Agoda, we pride ourselves on being a company represented by people of all different backgrounds and orientations. We prioritize attracting diverse talent and cultivating an inclusive environment that encourages collaboration and innovation. Employment at Agoda is based solely on a person’s merit and qualifications. We are committed to providing equal employment opportunity regardless of sex, age, race, color, national origin, religion, marital status, pregnancy, sexual orientation, gender identity, disability, citizenship, veteran or military status, and other legally protected characteristics.\nWe will keep your application on file so that we can consider you for future vacancies and you can always ask to have your details removed from the file. For more details please read our privacy policy .\nTo all recruitment agencies: Agoda does not accept third party resumes. Please do not send resumes to our jobs alias, Agoda employees or any other organization location. Agoda is not responsible for any fees related to unsolicited resumes.",Computer Software,"5,001-10,000 employees",2023-05-19,Germany
316,"Statistical Analyst (Bangkok Based, Relocation Provided)",Agoda,Essen,Germany,Associate,,4 days ago,2 applicants,"About the job\n \n\n \nAbout Agoda\nAgoda is an online travel booking platform for accommodations, flights, and more. We build and deploy cutting-edge technology that connects travelers with more than 2.5 million accommodations globally. Based in Asia and part of Booking Holdings, our 6,000+ employees representing 90+ nationalities foster a work environment rich in diversity, creativity, and collaboration. We innovate through a culture of experimentation and ownership, enhancing the ability for our customers to experience the world.\nGet To Know The Team: \nThe Performance Marketing Team of Agoda is a world leader in online marketing. This department is highly data-driven and focused on developing at-scale marketing programs that improve the lifetime value of Agoda customers through measurable marketing programs and channels. The team is a blend of the best analysts, marketing strategists, and data scientists in the world. The marketing leadership at Agoda has deep experience in data science, product, strategy, and other marketing fields and has built an organization that thrives on data, creative ideas, and technology. The Performance Marketing Team also fosters a great learning environment. You will be able to learn and grow by working closely with experts from a variety of backgrounds from all over the world.\nDue to continued expansion, there are multiple analyst roles within the marketing team across different marketing channels. While the scope will vary depending on these channels, here are some examples:\n Experimentation and optimizing campaign performance: Experiment with ads/campaign structures and bidding& pricing strategies on partners such as Google, Bing, TripAdvisor, Trivago, and other search engines. Adapt to new product features and roll out changes from successful tests. Modeling: Analyze vast amounts of data generated by experiments, develop predictive models using data science techniques (e.g. understanding the impact on bookings from large-scale TV campaigns or demand elasticity from pricing optimization), and liaise with product teams on an implementation roadmap Reporting, analysis, and insights: Building dashboards to track performance, derive insights, understand growth levers, and communicate recommendations via presentations to stakeholders \n What you’ll Need to Succeed: \n Bachelor’s degree or higher from top university in a quantitative subject e.g. computer science, mathematics, engineering business, science, or relevant field of study 0 to 4 years experience in data crunching from top-tier consulting, investment banking, private equity or strategy/business role for a fast-growing global tech company Experience in one or more data analysis packages or databases, e.g. SQL, SAS, R, SPSS, Python, VBA and visualization tools, e.g. Tableau, Power BI, etc. Excellent verbal and written communication skills in English Ability to move fast and be efficient, making decisions on objective data evidence Innate desire to take ownership, make an impact and influence outcomes Excellent organizational skills, attention to detail and ability to work independently \n It’s Great if you Have: \n Experience in digital marketing or e-commerce Experience with A/B testing and other testing metrics Strong presentation and negotiation skills #STRA#ANLS#MRKT#3 #sanfrancisco #sanjose #losangeles #sandiego #oakland #denver #miami #orlando #atlanta #chicago #boston #detroit #newyork #portland #philadelphia #dallas #houston #austin #seattle #washdc #tirana #yerevan #sydney #melbourne #perth #vienna #graz #linz #baku #minsk #brussels #antwerp #ghent #charleroi #liege #saopaolo #sofia #toronto #vancouver #montreal #shanghai #beijing #shenzhen #zagreb #cyprus #prague #Brno #Ostrava #copenhagen #cairo #alexandria #giza #estonia #helsinki #paris #nice #marseille #rouen #lyon #toulouse #tbilisi #berlin #munich #hamburg #stuttgart #cologne #frankfurt #dusseldorf #dortmund #essen #Bremen #leipzig #dresden #hanover #nuremberg #athens #hongkong #budapest #bangalore #newdelhi #jakarta #bali #bandung #dublin #telaviv #milan #rome #naples #turin #palermo #venice #bologna #florence #tokyo #osaka #yokohama #nagoya #okinawa #fukuoka #sapporo #amman #irbid #riga #beirut #tripoli #vilnius #luxembourg #kualalumpur #malta #chisinau #amsterdam #oslo #jerusalem #manila #warsaw #krakow #sintra #lisbon #porto #braga #cascais #loures #amadora #almada #doha #alrayyan #bucharest #moscow #saintpetersburg #riyadh #jeddah #mecca #medina #belgrade #singapore #bratislava #capetown #johannesburg #seoul #barcelona #madrid #valencia #seville #bilbao #malaga #oviedo #alicante #laspalmas #zaragozbanga #stockholm #zurich #geneva #basel #taipei #tainan #taichung #kaohsiung #Phuket #bangkok #istanbul #ankara #izmir #dubai #abudhabi #sharjah #london #manchester #liverpool #edinburgh #kiev #hcmc #hanoi #sanaa #taiz #aden #gibraltar #marrakech #lodz #wroclaw #poznan #Gdansk #szczecin #bydgoszcz #lublin #katowice #rio #salvador #fortaleza #brasilia #belo #belem #manaus #curitiba #portoalegre #saoluis data representation data analysis SQL data analytics analytics python (programming language) data mining data science r (programming language) tableau analytical skills data visualization databases business analysis business intelligence (bi) microsoft sql server machine learning statistics power bi \nEqual Opportunity Employer \nAt Agoda, we pride ourselves on being a company represented by people of all different backgrounds and orientations. We prioritize attracting diverse talent and cultivating an inclusive environment that encourages collaboration and innovation. Employment at Agoda is based solely on a person’s merit and qualifications. We are committed to providing equal employment opportunity regardless of sex, age, race, color, national origin, religion, marital status, pregnancy, sexual orientation, gender identity, disability, citizenship, veteran or military status, and other legally protected characteristics.\nWe will keep your application on file so that we can consider you for future vacancies and you can always ask to have your details removed from the file. For more details please read our privacy policy .\nTo all recruitment agencies: Agoda does not accept third party resumes. Please do not send resumes to our jobs alias, Agoda employees or any other organization location. Agoda is not responsible for any fees related to unsolicited resumes.",Computer Software,"5,001-10,000 employees",2023-05-19,Germany
101,Analityk danych,Billennium,Poland,Poland,Associate,Remote,3 weeks ago,75 applicants,"About the job\n \n\n \nBillennium jest polską firmą IT, która od 20 lat rozwija innowacyjne usługi i produkty oraz dostarcza najlepszych specjalistów w ramach outsourcingu.Realizujemy projekty i usługi IT w Polsce, na terenie Unii Europejskiej, a także w Kanadzie.Obecnie zatrudniamy ponad 1800 specjalistów z zakresu najnowocześniejszych rozwiązań wspierających biznes.\nAktualnie poszukujemy specjalisty na stanowisko Analityka Danych dla naszego klienta z branży bankowej.\nwynagrodzenie do 900 zł netto dziennie B2Bangielski średniozaawansowany\nAnalityk Danych / Business Intelligence w obszarze Open BankingZakres odpowiedzialności:Praca w Tribe Open&Beyond Banking w obszarze Transformacji;Eksploracja, analiza i wizualizacja danych;Współuczestniczenie z biznesem w kreowaniu i ulepszaniu produktów oraz usług bankowych w oparciu o dane;Współwłaścicielstwo danych, w tym współodpowiedzialność za ich jakość i spójność. Wymagania:3+ lata doświadczenia w analizie danych w Python (pandas, agregacje, wizualizacje);SQL w zakresie ekstrakcji i manipulacji danymi;Proaktywność w generowaniu i prezentowaniu wglądów analitycznych (analytical insights);Umiejętność komunikowania się zarówno z audiencją biznesową jak i techniczną;Chęć zrozumienia zarówno danych jak i kontekstu biznesowego, w którym są używane;Kreatywność, myślenie analityczne, chęć podnoszenia umiejętności i kwalifikacji;Dbałość o jakość i o detale. Mile widziane:Podstawowe doświadczenie w pracy z tekstem (wyrażenia regularne, regex);Znajomość Tableau;Podstawowa znajomość MongoDB i doświadczenie z danymi pół-ustrukturyzowanymi (json);Znajomość Git, JIRA, Confluence;Doświadczenie w pracy w SCRUM / Kanban / Agile.",Information Technology & Services,"1,001-5,000 employees",2023-05-02,Poland
499,Analityk,Billennium,Poland,Poland,Associate,Remote,1 week ago,47 applicants,"About the job\n \n\n \nBillennium jest polską firmą IT, która od 20 lat rozwija innowacyjne usługi i produkty oraz dostarcza najlepszych specjalistów w ramach outsourcingu.Realizujemy projekty i usługi IT w Polsce, na terenie Unii Europejskiej, a także w Kanadzie.Obecnie zatrudniamy ponad 1800 specjalistów z zakresu najnowocześniejszych rozwiązań wspierających biznes.\nAktualnie poszukujemy specjalisty na stanowisko Analityka dla naszego klienta z branży bankowej.\nwynagrodzenie do 120 zł/h netto b2b w zależności od doświadczeniaangielski średniozaawansowanywymagane doświadczenie w branży bankowości\nOpis: Rozwój bankowości mobilnej - zespół rozwija frontend i backend. Aplikacja jest jedną z najlepszych mobilnych aplikacji bankowych na polskim rynku. Do aplikacji obecnej w sklepach Google Play oraz App Store każdego dnia loguje się ponad 700 tysięcy klientów. Aplikacja umożliwia Klientom szybkie sprawdzanie stanu konta, realizację przelewów i płatności BLIK, płatności Google Pay i Apple Pay, założenie konta oraz wnioskowanie o nowe produkty. Działanie aplikacji wspierane jest zestawem mikroserwisów projektowanych i rozwijanych w ramach zespołu. Jako zespół pracujemy w metodyce zwinnej, dbając o ciągły rozwój i poprawę jakości naszej aplikacji.Wierzymy w siłę podejścia „inspect & adapt” i korzystamy z niej na co dzień. Wymagane doświadczenie w rozwoju bankowości mobilnej albo internetowej.Praca w roli analityka systemowego, czyli opisywanie integracji pomiędzy systemami - API, kolejki, itd. Zadania:• Przeprowadzanie analizy systemowej pod kątem realizacji wymagań biznesowych,• Przygotowywanie dokumentacji projektowej/analitycznej jako wkładu dla prac developerskich,• Współpraca z Product Owner-ami (PO) przy doprecyzowywaniu wymagań biznesowych,• Współpraca z developerami,• Udział w spotkaniach projektowych oraz zespołu developerskiego,• Wsparcie merytoryczne PO w odbiorze prac developerskich. Profil kandydata:• Minimum 3 letnie doświadczenie na podobnym stanowisku,• Znajomość notacji BPMN i UML,• Znajomość języka angielskiego w stopniu przynajmniej komunikatywnym,• Umiejętność samoorganizacji,• Duża komunikatywność,• Znajomość procesów wytwarzania oprogramowania. Mile widziane:• Doświadczenie w obszarze aplikacji mobilnych,• Doświadczenia w pracy w metodyce AGILE,• Znajomość Confluence i Jira. Narzędzia:· Confluence,· Jira,· BPMN,· UML.",Information Technology & Services,"1,001-5,000 employees",2023-05-16,Poland
8,Data Analyst - Hybrid Working,Blue Arrow,Cambridge,United Kingdom,,On-site,2 weeks ago,,"About the job\n \n\n \nData AnalystHybrid - Cambridge Pay Rate: 22.03 to 24.66 p/hr Start Date: Mon, 15 May 2023End Date: Fri, 17 Nov 2023JOB ROLE AND RESPONSIBILITIES: Analysing and interpreting data on demographic, socioeconomic and environmental issues and identifying key trends and policy and service implications to inform decision-making. Providing data analysis to support colleagues to identify opportunities and providing data products to support funding bids and business cases. Maintaining an awareness and understanding of publicly available secondary data on demographic, socioeconomic and environmental issues affecting Cambridge Providing clear, accessible data visualisations and written reports Providing clear, engaging oral presentations, briefings, and contributions to meetings Helping to build community of data practitioners, to support and to understand and use data better, including publicly available data and data that they hold. Working collaboratively with key partner organisations and national organisations on data initiatives as required.\nIf you are interested and feel that you meet the above criteria, then please apply online today or contact Andre on for more details.Blue Arrow is proud to be a Disability Confident Employer and is committed to helping find great work opportunities for great people.",,,2023-05-09,United Kingdom
12,Data Analyst - Hybrid,Blue Arrow,Cambridge,United Kingdom,,On-site,2 weeks ago,,"About the job\n \n\n \nData AnalystHybrid - Cambridge Pay Rate: 22.03 to 24.66 p/hr Start Date: Mon, 15 May 2023End Date: Fri, 17 Nov 2023JOB ROLE AND RESPONSIBILITIES: Analysing and interpreting data on demographic, socioeconomic and environmental issues and identifying key trends and policy and service implications to inform decision-making. Providing data analysis to support colleagues to identify opportunities and providing data products to support funding bids and business cases. Maintaining an awareness and understanding of publicly available secondary data on demographic, socioeconomic and environmental issues affecting Cambridge Providing clear, accessible data visualisations and written reports Providing clear, engaging oral presentations, briefings, and contributions to meetings Helping to build community of data practitioners, to support and to understand and use data better, including publicly available data and data that they hold. Working collaboratively with key partner organisations and national organisations on data initiatives as required.\nIf you are interested and feel that you meet the above criteria, then please apply online today or contact Andre on 020 3096 4493 for more details.Blue Arrow is proud to be a Disability Confident Employer and is committed to helping find great work opportunities for great people.",,,2023-05-09,United Kingdom
236,Data Analyst - Alternance - Boursorama-(H/F),Boursorama,Boulogne-Billancourt,France,Entry level,Hybrid,4 days ago,,"About the job\n \n\n \n 23000E4V\n Vos missions au quotidien\nAu sein de la Direction des Systèmes d’Information, vous travaillerez en collaboration avec les chefs de projets et les équipes IT composées de développeurs. Vous qualifierez les données du Système d’Information afin de constituer des métriques de gestion et de la qualité nécessaires à la gouvernance de l’activité.\nVos missions consistent à :\nInterpréter, analyser et identifier des anomalies applicatives à partir des différentes sources de données, en s’appuyant sur les normes et l’architecture technique en placeConsolider les données sous forme de Dashboard et d’alertes KPI Suivre l’avancer des actions correctives, préventives et des développementsRédiger les documentations nécessaires de l’activité (expressions de besoins, spécifications, rapports de bilans…)Participer et mettre en œuvre le Plan d’assurance Qualité de l’Activité avec sa base de connaissances\n Et si c’était vous ?\nVous recherchez un contrat d’alternance d’1 ou 2 ans à partir d’aout, septembre ou octobre 2023 et vous préparez un BAC+4/5 en Analyse de données.\nVous maitrisez sur le bout des doigts l’analyse et le traitement d’informations quantitatives, qualitatives, structurées ou non structurées. Python et le Monitoring (Grafana, Splunk, Kibana ...) n’ont pas de secret pour vous !\nVous êtes reconnu pour votre sens de l’analyse, votre autonomie et votre esprit de synthèse.\nAlors rejoignez-nous ! Si votre candidature est sélectionnée, vous serez invité(e) à réaliser une vidéo de présentation. Nous vous contacterons ensuite pour un entretien.\nDurée de l’alternance : 1 ou 2 ans\nRythme souhaité : Indifférent\nLieu de l’alternance : Boulogne-Billancourt (92)\n Pourquoi nous choisir ?\nAvec son double positionnement unique de banque-média, Boursorama est leader, en France, sur trois métiers : la banque en ligne, le courtage en ligne, et l’information économique et financière sur Internet avec le portail Boursorama.com.\nLes objectifs de croissance sont ambitieux : Nous comptons 4,5 millions de clients et ce avec 18 mois d'avance sur l'objectif de conquête initialement fixé à fin 2023.\nL’entreprise, filiale à 100 % de Société Générale, compte plus de 800 collaborateurs. En devenant l’un d’eux, vous prendrez part à une aventure professionnelle enrichissante, placée sous le signe de la croissance !\nBoursorama assure également une politique de travail en remote dynamique, le télétravail fait ainsi partie intégrante de son organisation du temps de travail !\nNous sommes un employeur garantissant l'égalité des chances et nous sommes fiers de faire de la diversité une force pour notre entreprise. Le groupe s’engage à reconnaître et à promouvoir tous les talents, quels que soient leurs croyances, âge, handicap, parentalité, origine ethnique, nationalité, identité de genre, orientation sexuelle, appartenance à une organisation politique, religieuse, syndicale ou à une minorité, ou toute autre caractéristique qui pourrait faire l’objet d’une discrimination.",Financial Services,"501-1,000 employees",2023-05-19,France
239,Data Analyst Marketing Stratégique - Boursorama-(H/F),Boursorama,Boulogne-Billancourt,France,Entry level,Hybrid,5 days ago,65 applicants,"About the job\n \n\n \n 22000GEK\n Vos missions au quotidien\nAu sein de la Direction Marketing Stratégique et rattaché à l’équipe Data Anticipation, le Data Analyst apporte son expertise sur des projets transverses. Il travaille en collaboration avec les Data Analysts des verticales métier et participe activement aux instances de la Communauté Data Marketing.\nExpertise en Dashboarding\nConcevoir, produire et faire évoluer les tableaux de bord de l’activité commerciale et marketing en Datavisualisation (Tableau) ou sous Excel (traitement des bases de données via SAS)Produire des insights pertinents pour tirer des enseignements, lever des alertes dans le cadre de certains reportings d’activités à forte valeur ajoutée Interpréter les résultats en regard des actualités Business\nConnaissance clients \nRéaliser des études statistiques adhoc sur des problématiques marketing clientsComprendre les enjeux business et l’impact de son travail sur la prise de décisionDéfinir des indicateurs clés, et assurer leur suivi dans le tempsContribuer à la construction et l’évolution des outils de fidélisation (segmentation client, valeur client …)Présenter ses analyses à l’aide d’une restitution écrite (powerpoint) et orale\nProjets Data\nMener des projets data de façon autonome et proactive : compréhension du besoin métier, définition des étapes de mise en œuvre, suivi des échéances, respect des délais ….Travailler de façon transverse et collaborative avec les autres collaborateurs de la communauté Data Marketing, ainsi qu’avec les différentes parties prenantes du projet, et plus particulièrement l’IT (mode Agile)\n Et si c’était vous ?\nDe formation Bac +4/5 en statistiques, vous justifiez d’une expérience réussie de 3-5 ans dans ce domaine.\nVous maitrisez SAS (langage macro), Excel et Powerpoint. Vous maîtrisez un outil de DataVisualisation (idéalement Tableau).\nVous faites preuve d’une bonne compréhension des problématiques marketing et de l‘ecosystème data dans le digital (CDP, web analytics,...).\nLa connaissance du langage Python est OBLIGATOIRE sur ce poste.\nVotre capacité à gérer des projets Data en travaillant de façon transverse avec d’autres Directions sera essentielle pour ce poste.\nVous êtes rigoureux, fiable, autonome et impliqué. Votre bon sens du relationnel et votre enthousiasme ne sont plus à démontrer.\n Pourquoi nous choisir ?\nAvec son double positionnement unique de banque-média, Boursorama est leader, en France, sur trois métiers : la banque en ligne, le courtage en ligne, et l’information économique et financière sur Internet avec le portail Boursorama.com.\nLes objectifs de croissance sont ambitieux : compter 4,5 millions de clients à la fin de l'année 2022 et ce avec 18 mois d'avance sur l'objectif de conquête initialement fixé à fin 2023.\nL’entreprise, filiale à 100 % de Société Générale, compte plus de 800 collaborateurs. En devenant l’un d’eux, vous prendrez part à une aventure professionnelle enrichissante, placée sous le signe de la croissance !\nBoursorama assure également une politique de travail en remote dynamique, le télétravail fait ainsi partie intégrante de son organisation du temps de travail !\nVous rejoignez notre Direction Marketing et Communication et, plus particulièrement, le périmètre Data Anticipation, composée d’experts du marketing digital et des offres bancaires.\nSa principale mission ? Conforter chaque jour la position de leader de Boursorama.\nSon organisation ? Basé à Boulogne-Billancourt (92), vous travaillez de chez vous 2 jours par semaine.\nEn bref, le Marketing, chez Boursorama, c’est de l’acquisition, de la fidélisation et de l’innovation tout en maintenant un très haut niveau de satisfaction des clients au sein d’une organisation flexible.\nNous sommes un employeur garantissant l'égalité des chances et nous sommes fiers de faire de la diversité une force pour notre entreprise. Le groupe s’engage à reconnaître et à promouvoir tous les talents, quels que soient leurs croyances, âge, handicap, parentalité, origine ethnique, nationalité, identité de genre, orientation sexuelle, appartenance à une organisation politique, religieuse, syndicale ou à une minorité, ou toute autre caractéristique qui pourrait faire l’objet d’une discrimination.",Financial Services,"501-1,000 employees",2023-05-18,France


Неявные дубликаты есть: некоторые описания вакансий практически не отличаются, а названия совпадают либо переформулированы. В большинстве случаев совпадают также уровень и тип занятости. Иногда такие вакансии опубликованы в один день. Также есть задублированные вакансии, опубликованные с указанием разных городов, при этом в их описании упоминается, что место работы - офис в Милане.      

Проанализировав данные вакансии, мы решили исключить дубликаты со следующими индексами: 122, 265, 12, 275, 468, 320, 477, 57, 361, 181, 50, 14, 440.

In [41]:
# удалим неявные дубликаты

parsed_df = (parsed_df.drop(index=[122, 265, 12, 275, 468, 320, 477, 57, 361, 181, 50, 14, 440])
             .reset_index(drop=True))

In [42]:
# смотрим информацию о датасете без дубликатов

parsed_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 13 columns):
 #   Column       Non-Null Count  Dtype         
---  ------       --------------  -----         
 0   job_title    506 non-null    object        
 1   company      506 non-null    object        
 2   city         506 non-null    object        
 3   country      506 non-null    object        
 4   level        485 non-null    object        
 5   workplace    469 non-null    object        
 6   posted_date  506 non-null    object        
 7   applicants   399 non-null    object        
 8   description  506 non-null    object        
 9   industry     478 non-null    object        
 10  size         478 non-null    object        
 11  date         506 non-null    datetime64[ns]
 12  country_new  506 non-null    object        
dtypes: datetime64[ns](1), object(12)
memory usage: 51.5+ KB


**Наблюдения:** После удаления дубликатов осталось 506 вакансий.

### Выделение hard skills

Проверим, какие из типичных хард скиллов указаны в требованиях к вакансии. Мы взяли список стандартных требований к хард-скиллам в дата-аналитике и дополнили его, просмотрев описание вакансий.  

Названия скиллов без упоминания конкретного иструмента (например, 'bi tools', 'bi', 'business intelligence', 'dashboard', 'dashboards', 'programming') укажем в конце списка, чтобы искать их в последнюю очередь, если не будут найдены конкретные инструменты.

In [43]:
# список hard skills

skills = ['datahub', 'api', 'github', 'google analytics', 'adobe analytics', 'ibm coremetrics', 'omniture',
          'gitlab', 'erwin', 'hadoop', 'spark', 'hive', 'databricks', 'aws', 'gcp', 'azure', 'excel',
          'redshift', 'bigquery', 'snowflake',  'hana', 'grafana', 'kantar', 'spss', 'a/b testing',
          'ab testing', 'asana', 'basecamp', 'jira', 'dbeaver','trello', 'miro', 'salesforce', 
          'rapidminer', 'thoughtspot',  'power point', 'docker', 'jenkins', 'integrate.io', 
          'talend', 'apache nifi', 'aws glue', 'pentaho', 'google data flow', 'azure data factory',
          'xplenty', 'skyvia', 'xtract.io', 'dataddo', 'ssis', 'hevo data',
          'informatica', 'oracle data integrator', 'k2view', 'cdata sync', 'querysurge', 'rivery', 
          'dbconvert', 'alooma', 'stitch', 'fivetran', 'matillion','streamsets', 'blendo', 
          'iri voracity', 'logstash', 'etleap', 'singer', 'apache camel', 'actian', 'airflow',
          'luidgi', 'datastage', 'python', 'vba', 'java script', 'julia', 'sql', 
          'matlab', 'java', 'html', 'c++', 'sas', 'tableau', 'looker', 'powerbi', 'cognos', 
          'microstrategy', 'spotfire', 'sap business objects', 'microsoft sql server',
          'oracle business intelligence', 'yellowfin', 'webfocus', 'sas visual analytics',
          'targit', 'izenda', 'sisense', 'statsbot', 'panorama', 'inetsoft', 'birst', 
          'domo', 'metabase', 'redash', 'power bi', 'alteryx', 'dataiku', 
          'microsoft office', 'sqlms', 'data models', 'data model', 'data modeling', 'data modelleren',
          'modèles statistiques', 'statistical analysis', 'google data studio', 'azure data studio',
          'relational db', 'power automate', 'nosql', 'sap crystal reports', 'postgresql', 'mysql',
          'microsoft powerapps', 'tibco spotfire', 'machine learning', 'mssql', 'oracle', 'ssrs', 'ssas',
          'power pivot', 'powerpivot', 'power query', 'powerquery', 'power automation',
          'dax', 'data factory', 'data lake', 'power fx', 'json', 'xml', 'terraform', 'notion',
          'ms office', 'office it', 'sharepoint', 'power-bi', 'js', 'r', 'mapr','sap',
          'powerpoint', 'mongodb', 'git', 'scrum', 'kanban', 'eviews', 'sequel', 'google cloud',
          'qliksense', 'qlik sense', 'qlikview',  'qlik view', 'qlik',
          'bi tools', 'bi', 'business intelligence', 'dashboard', 'dashboards', 'programming', 'coding',
          'english', 'inglese'
          ]
sorted(skills)

['a/b testing',
 'ab testing',
 'actian',
 'adobe analytics',
 'airflow',
 'alooma',
 'alteryx',
 'apache camel',
 'apache nifi',
 'api',
 'asana',
 'aws',
 'aws glue',
 'azure',
 'azure data factory',
 'azure data studio',
 'basecamp',
 'bi',
 'bi tools',
 'bigquery',
 'birst',
 'blendo',
 'business intelligence',
 'c++',
 'cdata sync',
 'coding',
 'cognos',
 'dashboard',
 'dashboards',
 'data factory',
 'data lake',
 'data model',
 'data modeling',
 'data modelleren',
 'data models',
 'databricks',
 'dataddo',
 'datahub',
 'dataiku',
 'datastage',
 'dax',
 'dbconvert',
 'dbeaver',
 'docker',
 'domo',
 'english',
 'erwin',
 'etleap',
 'eviews',
 'excel',
 'fivetran',
 'gcp',
 'git',
 'github',
 'gitlab',
 'google analytics',
 'google cloud',
 'google data flow',
 'google data studio',
 'grafana',
 'hadoop',
 'hana',
 'hevo data',
 'hive',
 'html',
 'ibm coremetrics',
 'inetsoft',
 'informatica',
 'inglese',
 'integrate.io',
 'iri voracity',
 'izenda',
 'java',
 'java script',
 'jenkin

In [44]:
# составим словарь унифицированных названий скиллов, чтобы избавиться от неявных дубликатов

skills_unified = {'azure':['azure','azure data factory','data factory','data lake','azure data studio'],
                  'power bi':['power bi','power-bi','powerbi'],
                  'data modeling':['data modeling','data model','data modelleren','data models',
                                   'modèles statistiques'],
                  'git':['github','gitlab'],
                  'ms office':['ms office','microsoft office','office it'],
                  'ms sql':['microsoft sql server','mssql','sqlms'],
                  'oracle':['oracle','oracle data integrator'],
                  'power automate':['power automate','power automation'],
                  'power point':['power point','powerpoint'],
                  'power pivot':['power pivot','powerpivot'],
                  'power query':['power query','powerquery'],
                  'qlik':['qlik','qlik sense','qliksense','qlik view','qlikview'],
                  'sap bi tools':['sap business objects','sap crystal reports'],
                  'ab testing':['ab testing','a/b testing'],
                  'aws':['aws','aws glue'],
                  'java script':['java script','js'],
                  'sas':['sas','sas visual analytics'],
                  'spotfire':['spotfire','tibco spotfire'],
                  'bi tools (unspecified)':['bi','bi tools','business intelligence','dashboard','dashboards'],
                  'programming language (unspecified)':['programming','coding'],
                  'english':['english','inglese']
                 }

# транспонируем словарь
skills_map = {}
for key in skills_unified.keys():
    for value in skills_unified[key]:
        skills_map.update({value:key})

In [45]:
# объявим функцию для выделения hard skills

def skills_list(description):
    # объявляем список, в который будем собирать скиллы для вакансии
    skills_list = []
    # идем по списку типичных хард скиллов и проверям наличие каждого элемента в описании вакансии
    for i in skills:      
        
        # задаем паттерн поиска
        # когда ищем в описании "sap", убеждаемся, что это не "sap business objects" и "sap crystal reports"
        if i == 'sap':
            skill = r'(\b|\W)' + re.escape(i) + (r'(?! c)' + '&' + r'(?! b)')
        # когда ищем в описании "sql", убеждаемся, что это не "microsoft sql server"
        elif i == 'sql':
            skill = r'(?!t)' + r'(\b|\W)' + re.escape(i) + r'(\b|\W)' + r'(?!s)'
        # в остальных случаях скилл должен быть отдельным словом
        # или, если в описании удалены пробелы (из-за чего не найдется скилл),
        # не требуем, чтобы скилл был отдельным словом, кроме некоторых случаев
        elif i not in ['api','aws','bi','git','hive','r','sas','ssis','ssas']:
            skill = r'(\b|\W)' + re.escape(i) + r'(\b|\W)' + '|' + re.escape(i.replace(' ',''))
        else:
            skill = r'(\b|\W)' + re.escape(i) + r'(\b|\W)'
                
        # находим скилл в описании
        if re.search(skill, description.lower()):
            
            # меняем название скилла на унифицированное 
            if i in skills_map:
                i = skills_map[i]
           
            # перед тем как добавить скилл 'bi tools (unspecified)' в список 'skills_list' проверяем,
            # что в этом списке нет конкретных BI инструментов
            if i == 'bi tools (unspecified)':
                check = 0
                for j in ['tableau','power bi','oracle business intelligence','birst','data studio',
                          'sap bi tools','inetsoft','izenda','looker','metabase','microstrategy',
                          'panorama','pentaho','redash','sas','sisense','spotfire','ssrs','targit',
                          'thoughtspot','yellowfin','qlick']:
                    if j in skills_list:
                        check += 1
                if (check == 0) and (i not in skills_list):
                    skills_list.append(i)
            
            # перед тем как добавить в список скилл 'programming language (unspecified)' проверяем,
            # что в списке нет конкретных языков программирования
            elif i == 'programming language (unspecified)':
                check = 0
                for j in ['python','r','eviews','java','java script','julia','matlab','vba','c++']:
                    if j in skills_list:
                        check += 1
                if (check == 0) and (i not in skills_list):
                    skills_list.append(i)
            
            # для других случаев просто добавляем в список скилл, если его еще там нет
            else:
                if i not in skills_list:
                    skills_list.append(i)
                
    # если вообще не нашли скиллы для вакансии, возвращаем nan, а непустой список сортируем по алфавиту
    if skills_list == []:
        skills_list = np.nan
    else:
        skills_list = sorted(skills_list)
    return skills_list

In [46]:
# создадим столбец с hard skills

parsed_df['skills'] = parsed_df['description'].apply(skills_list)

In [47]:
# убедимся, что скиллы распределены корректно

parsed_df[['description','skills']].head(50)

Unnamed: 0,description,skills
0,"About the job\n \n\n \n What You Will Achieve\nThis position will apply advanced manufacturing, science, and technology to support business and process improvements for the manufacture of small and / or large volume parenteral products. You will be a member of the Transformation and Strategy team. As a Data Analyst you will be responsible for mining, retrieving, organizing, and analyzing data to support the operations of a large manufacturing facility. Using the data, you will help to develop key performance indicators to demonstrate the effectiveness of processes and systems against business strategies.\nYour knowledge of manufacturing operations and computer systems/tools will make you a critical member of the team. Your strong business processes and workflow skills will help facilitate required gatherings for building and enhancing business process maps and strategies. Your innovative use of communication tools and techniques will facilitate in explaining difficult issues, establishing consensus between teams, and will create a collaborative teaming environment for your colleagues.\nMain Responsibilities\nInterpret data, analyze results using statistical techniques and provide ongoing reportsDevelop and implement databases, data collection systems, data analytics and other strategies that optimize statistical efficiency and qualityAcquire data from primary or secondary data sources and maintain databases/data systemsIdentify, analyze, and interpret trends or patterns in complex data sets Filter and “clean” data by reviewing reports, printouts, and performance indicatorsWork with management to prioritize business and information needs Identify and define new process improvement opportunities\nMust-Haves\nA Bachelor’s degree with at least three years of experience; OR a Master’s degree with more than one year of experience.Prior pharmaceutical and/or manufacturing experience requiredTechnical expertise regarding data models, database design development, data mining and segmentation techniques Knowledge of and experience with reporting packages (Business Objects etc) and databases (SQL etc)Knowledge of statistics and experience using statistical packages for analyzing datasets (Excel, SPSS, SAS etc)Knowledge of SAP (ERP materials planning systems).\nYou will be joining an organisation with determined to bring about considerable change to the global industry, in an environment that promotes self-development and personal success while driving for company growth.\nJob Title: Data Analyst\nLocation: Basel, Switzerland\nJob Type: Contract\nAerotek, an Allegis Group company. Allegis Group AG, Aeschengraben 20, CH-4051 Basel, Switzerland. Registration No. CHE-101.865.121. Aerotek and Actalent Services are companies within the Allegis Group network of companies (collectively referred to as ""Allegis Group""). Aerotek, Actalent Services, Aston Carter, EASi, TEKsystems, Stamford Consultants and The Stamford Group are Allegis Group brands. If you apply, your personal data will be processed as described in the Allegis Group Online Privacy Notice available at https://www.allegisgroup.com/en-gb/privacy-notices.\nTo access our Online Privacy Notice, which explains what information we may collect, use, share, and store about you, and describes your rights and choices about this, please go to https://www.allegisgroup.com/en-gb/privacy-notices.\nWe are part of a global network of companies and as a result, the personal data you provide will be shared within Allegis Group and transferred and processed outside the UK, Switzerland and European Economic Area subject to the protections described in the Allegis Group Online Privacy Notice. We store personal data in the UK, EEA, Switzerland and the USA. If you would like to exercise your privacy rights, please visit the ""Contacting Us"" section of our Online Privacy Notice at https://www.allegisgroup.com/en-gb/privacy-notices for details on how to contact us. To protect your privacy and security, we may take steps to verify your identity, such as a password and user ID if there is an account associated with your request, or identifying information such as your address or date of birth, before proceeding with your request. If you are resident in the UK, EEA or Switzerland, we will process any access request you make in accordance with our commitments under the UK Data Protection Act, EU-U.S. Privacy Shield or the Swiss-U.S. Privacy Shield","[data modeling, excel, sas, spss, sql]"
1,"About the job\n \n\n \nData Analyst - Logistics ~ Permanent Role ~ Mon-Fri ~ Immediate Start ~Locations: Coventry, CV3Salary: 35,000 - 42,000 Per Year Shift Pattern: Monday to Friday 08:00 - 16:00*The correct candidate will also be able to work night shifts when required to observe this side of the business. Once the initial training period is complete you will be able to work from home as and when required. Job role: Analyse business data and provide detailed reports and reccomendations. Provide the business with regular reports to Quality Performance Supply key information to senior reviewers to assist decision making Ensuring actions are taken to speed up the issue resolution process Analyse general business processes to target cost saving & time saving efficiencies.. Report to business management and reccomend any changes. Monday to Friday Days, however the correct candidate must be prepared to work night shifts to look at this area of the business on occasion. What we are looking for: A proven background in data analytics & supplying detailed reports and reccomendations. Strong communication skills Flexibility to work nights on the rare occasion. Resolute Recruitment is acting as an Employment Agency in relation to this vacancy.Data Analysisdata, analysis, analytics, logistics, logistic, transport, business, profit, loss, performance, quality, management, coventry, warwickshire, nuneaton, rugby, leamington spa, warwick, stratford, wfh, work from home, analytical, Analyse, training, perm, permanent, jobs, job",
2,"About the job\n \n\n \nSalary: To be discussed on applicationLocation: Hybrid Working, Home working & Head Office, EX36 3LHContract Type: PermanentContracted Hours: Full time, 37.5 hours per week, Monday to FridayHere at Mole Valley Farmers we have an exciting opportunity for a Space & Planogram Planning Assistant to join our Merchandise Planning team.You will maintain the space and Merchandising requirements with the Procurement & Retail Operations teams for the Business, ensuring full optimisation of space & delivery of planogram implementation.What you will be doing as a Space & Planogram Planning Assistant… Analysing historic Space & Sales performance, presenting key facts and trendsWorking with Product Manager & Merchandise Planner to develop the optimum layouts within the Mock shop as required for their CategoriesUsing Financial analysis to feed into the Range planning process, working with the Product Manager & Merchandise Planner.Producing planograms for categories and creating Head Office Bulletins to ensure implementation in stores, working within agreed critical path and process.\nWe would love to hear from you if you have…. Good Excel knowledgeStrong analytic natureYou are highly organised with strong attention to detailYou are self-motivated but also enjoy working as part of a team\nPlease note, this vacancy may close prior to the expiry date if we have received a suitable number of applications.",[excel]
3,"About the job\n \n\n \nFORFIRM is providing solutions to real business challenges for our clients through innovation and deep industry understanding. We pride ourselves on being a knowledge-based company, with no barriers or pre-built solutions – we listen to our clients and solve their unique problems.At FORFIRM, we are creating a culture where each person can define their own role parameters and speak their mind without any hesitation. We are a true meritocracy, where individual results define each person’s career path.We are looking for experienced and motivated Data Analyst to join our lively international team and work on projects for Europe's leading brands! Manage business requests in the Data warehouse and Reporting area, from addressing the user request to the release of the solution (in collaboration with various work groups) Design and participate in the implementation of processing strategies, data transformation (ETL processes) and Business Intelligence / Self BI solutions collaborate, with the whole team, in the design and management of data storage and use solutions in the Cloud (GCP or AWS) support the team in monitoring pipelines (cloud and on-prem) and performance support the team in the evolution of the data model analyze, define and implement data quality procedures within the QA framework managed by the team\nWhat we are looking for ? experience of 5+ years of work on Datawarehouse and Front-END projects (managed in agile mode both on traditional technologies and on cloud platforms) knowledge in Data Modeling (techniques and methodologies) ability to analyze data model using relational DBs Experience in SQL language, ETL tools (preferably Oracle Data Integrator) and Data Visualization (Power BI) Experience in Python Previous experience in banking sector will represent a plus\n Opportunity to work on international multi-disciplinary projects for leading brands from day oneHighly meritocratic environment – points-based bonus and promotion systemA focus on knowledge building with regular internal and external training opportunitiesOpportunities to travel and work abroadSupportive and dynamic team \nForFirm is an equal opportunities employer that values diversity within the company. Qualified applicants will receive consideration for employment without discrimination about race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.","[aws, data modeling, gcp, oracle, power bi, python, sql]"
4,"About the job\n \n\n \nLocation: Southampton, HampshireJob title: Planning and Logistics AnalystLocation: SouthamptonWork Structure: 6 Months (Hybrid)A top global organization is looking for a Planning and Logistics Analyst to join their multinational team. Main Responsibilities:- Providing data sets and analysis supporting projects defining and optimising the logistic network.- Identifying optimisation / loss elimination opportunities within the network.- Responsible for preparation and timely execution of lead time reviews, duties updates, GI/GR updates, freight rates maintenance. - Facilitating Logistics governance forums (inputs preparation, admin).- Sharing expertise and supporting issue resolution in respect of Logistics execution and escalations with LLP, Carriers, Procurement and Planning Team members.- Delivering the Logistic QPR inputs on time and in full (freights), deviations are identified and explained.- Accountable for OTIF metrics reporting in liaison with LLP' and owns / drives initiatives identified for performance improvement.- Ensuring key logistic issues in terms of service failures or commercial non-compliance are addressed and/or escalated to right stakeholders.- Key Logistics contact for finance team in respect of stock takes and adjustments.Your experience:- Very good practical knowledge of Excel and PowerBi- Ability to conduct detailed manipulation of complex data sets from multiple sources, develop reports and analyse data- Knowledge and experience of SAP- Good understanding of logistic processes and ideally sea freight logistic- Good communication skills, ability to effectively solve problemsIf you have the skills above for our Planning and Logistics Analyst role, please do not hesitate and click apply.The JM Longbridge Group is operating and advertising as an Employment Agency for permanent positions and as an Employment Business for interim / contract / temporary positions. The JM Longbridge Group is an Equal Opportunities employer and we encourage applicants from all backgrounds.","[eviews, excel, power bi]"
5,"About the job\n \n\n \nWe’re Maria Mallaband Care Group, you may not know us, so let us tell you more...At MMCG, it’s our people who make our company great, and we pride ourselves on how well we work together as a team. The fact you’re here, makes us think you’re exactly the type of person who is passionate about care and is ready to put the happiness and well-being of residents at the heart of everything they do.We’re among the UK’s largest independent care providers, employing thousands of staff in over 80 homes across the country.At MMCG, we celebrate individuality.We believe that there’s nothing more rewarding or important than providing the best care to all. For us, diversity and inclusion simply means better care. We understand that people are unique, and we celebrate the differences of our residents and of our people.We're on the lookout for a Data Analyst to maintain the current Power BI reporting environment and develop new reports as required.Let's talk about your role, some of the key details: Maintain existing reporting environments and develop new data models as required. Maintenance and optimisation of existing reports / data-sets. Developing new reports. Ensuring GDPR compliance. Managing Power BI user access. Training end users.Experience and capability requirements: Accomplished user of Advanced Microsoft Excel, Power BI and Power Automate. Ability to quantify raw data into meaningful dashboards, graphs and charts. Creative flair and ability to visualise data. Analytical logical thinker. Strong problem-solving skills. Good communication skills.How we say thank you.A rewarding career is much more than just a salary. We've put together a range of benefits to help you get the best out of your role with us. These include: Simply Health – company funded, providing cashback for prescriptions, optical and dental costs 24/7 virtual GP access plus more for you and up to 4 children* Early Pay – Access to earned pay prior to payday Benefits platform – discounts across multiple retailers, leisure providers, hospitality etc Pension Scheme with Nest Personal car leases via salary sacrifice** 25 days holiday plus bank holidays Holiday Flex – purchase additional holiday** Flexible working patterns Cycle to work scheme** Recommend a friend Service recognition Training support and development opportunities Employee Assistance Programme Wellbeing support Discounted gym membership*Benefits require completion of a 12-week probationary period before they can be accessed.**Benefit subject to deduction not taking colleague below National Living Wage","[data modeling, excel, power automate, power bi]"
6,"About the job\n \n\n \nKelly Group are seeking a highly analytical and detail-oriented individual to join our team as a Business Intelligence Coordinator. As a Business Intelligence Coordinator, you will be responsible for collecting, analysing, and interpreting data to support strategic decision-making and drive business growth. Your role will involve collaborating with various departments and stakeholders to identify key performance indicators (KPIs), develop data-driven insights, and implement effective reporting systems. The ideal candidate will possess a strong understanding of data analysis, excellent communication skills, and the ability to work in a fast-paced and dynamic environment. Gather and consolidate data from multiple sources to build comprehensive datasets for analysis.Apply statistical methods and data mining techniques to identify patterns, trends, and insights.Conduct in-depth analysis to understand business performance, customer behaviour, and market trends.Develop and maintain data models, dashboards, and reports to facilitate data-driven decision-making.Create visually appealing and user-friendly reports and dashboards to present insights and KPIs.Generate regular reports and performance metrics to track business performance and highlight areas of improvement.Provide training and support to end-users in utilizing reporting tools and interpreting data effectively.Work closely with cross-functional teams to identify key business questions and data needs.Provide actionable insights and recommendations based on data analysis to support strategic decision-making.Monitor industry trends and competitor activities to identify potential areas for growth or improvement.\nYou will have the following qualifications and be able to prove experience and competence: For this role you will need administrative experience. It will require excellent attention to detail and an ability to prioritise a number of different tasks.Microsoft Office essential (especially Excel)Organisation-Well organised, plans ahead and highlights problems in advance should be methodical and meticulous to keep records An ability to work under pressureInterpersonal-Excellent interpersonal skills and ability to deal with conflictTeamwork-Able to collaborate with others and contribute as part of a high performance team in order to achieve a common goal, as opposed to working independently or competitivelyCommunication-Excellent communication skills, both written and verbalInitiative-Able to see the ‘bigger picture’ and to realise the implications of one’s actions possessing a proactive approach and attitude to one’s duties; sees tasks through to completion\nWhat’s on offer for successful candidates? Competitive PAYE Salary20 days’ holiday plus statutory bank holidaysCompany Pension schemeCycle to work schemeExcellent career progression opportunitiesFull training will be given to the right candidate.Employee discount scheme\nEstablished in 1985 to support the emerging UK cable television market, Kelly Group has diversified to meet the needs of our clients. With over 35 years’ experience and customer service at the heart of our business, Kelly Group is renowned for building networks in collaboration with several major leading telecommunication service providers, delivering innovative multi-functional solutions - connecting people globally.Working across several industries including telecommunications, rail, metro, highways and fleet, Kelly Group operate nationally with 40+ operational centres and deliver a full suite of services. From the initial planning, design, notification through to installation, final commissioning and maintenance of networks, Kelly Group deliver a ‘one-stop solution’ to meet the needs of our clients and delight their customers.As a service provider, our workforce is our extended family, which is why we invest in their safety, training, careers, welfare, vehicles and tools they need to deliver excellence. With a workforce of circa 3500, 40+ national operational centres, 4 training hubs and a fleet of over 2500 vehicles, Kelly Group are committed to developing an excellent team, a fair culture and a safe working environment.The Kelly Group is committed to equality of opportunity for all staff and applications from individuals are encouraged regardless of age, disability, sex, gender reassignment, sexual orientation, pregnancy and maternity, race, religion or belief, marriage and civil partnerships.If you feel you have the required experience and want to further your career with a long standing communications contractor, please apply.","[bi tools (unspecified), data modeling, excel, ms office]"
7,"About the job\n \n\n \nQui sont-ils ?\nCabinet de conseil en Transformation Digitale, eXalt est avant tout une formidable aventure humaine, et une communauté de plus de 700 collaborateurs, basés à Paris (siège social), mais également à Lyon, Lille, Nantes, Bordeaux, Aix-en-Provence, Bogota et bientôt New-York\nFondée en juillet 2018 autour des valeurs d’intrapreneuriat, de co-apprentissage et de co-construction, eXalt inscrit son développement dans un engagement fort auprès de ses clients et de ses équipes. Multi-spécialiste, le groupe décline son modèle dans différents domaines à travers ses filiales dédiées :\n Le Product Management & la Gestion de Projet au sein d’eXalt P&P La Finance de Marché au sein d’eXalt Fi, La Tech au sein d’eXalt IT, La Cybersécurité au sein d’eXalt ShieldLa Data au sein d’eXalt Value \nDécouvrez eXalt\nCulture d'entreprise\nÉquipes\nOffres d'emplois\n28\nJ’y vais !\nRencontrez Mathéa, Co-fondatrice d'eXalt Bordeaux\nRencontrez Abdeljawad, Consultant en Finance de Marché eXalt Fi\nRencontrez Jenith, Consultant Data Engineer eXalt IT\nDescriptif du poste\neXalt Value, filiale du groupe spécialisée sur les métiers de la Data, et recherche son/ sa nouveau/elle Data Analyst pour aller à la conquête de nouveaux projets ! Vous évoluerez dans un contexte multi sociétés et challengeant.\nVous serez rattaché(e) à notre bureau parisien.\nVos principales missions seront de :\n Comprendre les problématiques métiers et les traduire de manières analytiques Extraire les données nécessaires à l’analyse. Définir et réaliser le nettoyage de la base de données. S’assurer la qualité des données tout au long de leur traitement Analyser et exploiter les données Créer des dashboards via des outils de visualisations Effectuer une veille sur les nouvelles technologies et solutions logicielles d’analyse des données.\nProfil recherché\nNous recherchons avant tout une personne animée par l’esprit et l’ADN d’eXalt ☀️ ayant l’envie de prendre part à un superbe challenge et à notre aventure !\n Vous êtes diplômé(e) d’un Bac+5 Vous bénéficiez d’une expérience d’au moins 4 ans en tant que Data Analyst Vous avez une expertise en base de données et gestion de base de données (SQL/ NoSQL) Vous maitrisez des outils de data visualisation (Tableau, Qlikview, PowerBI) et/ou des outils de fouille et analyse de données (Dataiku) Vous avez une aisance rédactionnelle & relationnelle Vous avez une passion pour les chiffres et le goût pour l’innovation Vous êtes reconnu(e) pour votre rigueur, votre organisation et votre adaptabilité. L’anglais professionnel est requis","[dataiku, nosql, power bi, qlik, sql, tableau]"
8,"About the job\n \n\n \nData AnalystHybrid - Cambridge Pay Rate: 22.03 to 24.66 p/hr Start Date: Mon, 15 May 2023End Date: Fri, 17 Nov 2023JOB ROLE AND RESPONSIBILITIES: Analysing and interpreting data on demographic, socioeconomic and environmental issues and identifying key trends and policy and service implications to inform decision-making. Providing data analysis to support colleagues to identify opportunities and providing data products to support funding bids and business cases. Maintaining an awareness and understanding of publicly available secondary data on demographic, socioeconomic and environmental issues affecting Cambridge Providing clear, accessible data visualisations and written reports Providing clear, engaging oral presentations, briefings, and contributions to meetings Helping to build community of data practitioners, to support and to understand and use data better, including publicly available data and data that they hold. Working collaboratively with key partner organisations and national organisations on data initiatives as required.\nIf you are interested and feel that you meet the above criteria, then please apply online today or contact Andre on for more details.Blue Arrow is proud to be a Disability Confident Employer and is committed to helping find great work opportunities for great people.",
9,"About the job\n \n\n \nHybrid - West Midlands PermanentUp to £50,000 + BonusYour opportunity to work for a well-known, highly successful Global organization.Are you looking to progress your Data Analyst career, working for a business that works on modern, new technologies?Long-term exciting projects, working within specialist teams, take your fancy?You will be joining a close-knit team of experts, focusing on analysing complex data sets.This business is rapidly growing and are crammed with new opportunities.Your career advancement working for this company is exceptional.The benefits package includes a bonus scheme, life assurance and private healthcare, as well as personal development plans where you'll be given the chance to complete any courses or qualifications to ensure your keeping up to date with modern technologies.Your new challengeYou'll be working within a growing team, analysing complex data sets using a variety of data analysis tools, and working alongside stakeholders, creating, developing, and maintaining reports and dashboards using data visualization platforms, such as Power BI and Tableau.You'll be working in an environment that encourages self-development.Managers want to see you succeed and are all very approachable.To begin with, you'll have a solid Data Analyst background, with exposure to the below technologies: SQLMS ExcelTableauPower BI\nYou'll have a passion for Data, and willingness to learn and work alongside other teams on projects. You'll be building new relationships as well as supporting innovations within the organization and with clients.You probably want to know more about the business, their plans, and their history.For an informal chat, please call Katie Winstanley on 07547 672 062 or email katie. winstanley @ mexasolutions . com. Don't worry if your CV isn't up to date, we can deal with that later.Alternatively, if you do have an up-to-date CV, please click Apply and I look forward to reviewing your application.","[excel, ms sql, power bi, tableau]"


**Наблюдения:** Функция выделения скиллов работает, однако есть вакансии, где скиллы не были обнаружены. Посмотрим на них подробнее.

In [48]:
# посмотрим на количество вакансий без выделенных скиллов

len(parsed_df.query('skills.isna()'))

23

In [49]:
# посмотрим на описание вакансий без выделенных скиллов

parsed_df.query('skills.isna()')['description']

1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       

**Наблюдения:** В описании данных вакансий действительно отсутствуют четкие технические требования. Таких вакансий немного относительно общего числа.

### Обработка признаков и категорий

Для построения дашборда выполним следующую предобработку:
- объединим компании, входящие в одну группу,
- унифицируем названия городов,
- выделим число откликов на вакансии в отдельную колонку,
- выделим хард скиллы в отдельный датасет.

Сферы деятельности и размер унифицированы и указываются компанией при создании аккаунта из списка, предоставляемого LinkedIn. Также унифицированы типы занятости, которые указываются при публикации вакансий. Следовательно, данные признаки не должны дублироваться и не требуют группировки/замены.

#### Тип занятости, сфера, размер

Убедимся, что тип занятости, сфера деятельности и размер компаний не дублируются:

In [50]:
# посмотрим на названия типов занятости

parsed_df['workplace'].unique()

array(['On-site', 'Hybrid', None, 'Remote'], dtype=object)

In [51]:
# посмотрим на названия сфер деятельности

sorted(parsed_df['industry'].astype(str).unique())

['Airlines/Aviation',
 'Apparel & Fashion',
 'Automotive',
 'Aviation & Aerospace',
 'Banking',
 'Biotechnology',
 'Building Materials',
 'Chemicals',
 'Computer Games',
 'Computer Software',
 'Consumer Electronics',
 'Consumer Goods',
 'Cosmetics',
 'Defense & Space',
 'Education Management',
 'Electrical & Electronic Manufacturing',
 'Entertainment',
 'Environmental Services',
 'Farming',
 'Financial Services',
 'Food & Beverages',
 'Food Production',
 'Gambling & Casinos',
 'Government Administration',
 'Higher Education',
 'Hospital & Health Care',
 'Hospitality',
 'Human Resources',
 'Individual & Family Services',
 'Information Services',
 'Information Technology & Services',
 'Insurance',
 'International Affairs',
 'International Trade & Development',
 'Internet',
 'Leisure, Travel & Tourism',
 'Luxury Goods & Jewelry',
 'Management Consulting',
 'Market Research',
 'Marketing & Advertising',
 'Mechanical Or Industrial Engineering',
 'Medical Device',
 'Mental Health Care',
 'Mi

In [52]:
# посмотрим на размер компаний

parsed_df['size'].unique()

array(['11-50 employees', nan, '10,001+ employees', '51-200 employees',
       '1,001-5,000 employees', '501-1,000 employees', '0-1 employees',
       '201-500 employees', '5,001-10,000 employees', '2-10 employees'],
      dtype=object)

**Наблюдения:** Типы занятости и размер компании не дублируются. В сферах деятельности возможно неявное дублирование (например, индустрии 'Airlines/Aviation' и 'Aviation & Aerospace', 'Consumer Goods' и 'Apparel & Fashion'/'Cosmetics'/'Food & Beverages'/'Food Production'). Чтобы понимать, насколько корректным было бы объединение сфер деятельности, нужно знать, чем занимается каждая компания в датасете, поэтому сферы деятельности оставим как есть.

#### Название компаний

In [53]:
# посмотрим на названия компаний

sorted(parsed_df['company'].unique())

['',
 '24S',
 'ABN AMRO Bank N.V.',
 'ACS Recruitment Solutions',
 'AGS S.\u200bp.\u200bA.\u200b',
 'AKQUINET',
 'ARI',
 'AUTODOC',
 'AXA Investment Managers',
 'Abbott',
 'Accenture Italia',
 'Action',
 'Adecco',
 'Agoda',
 'Airbus',
 'Albert Cliff',
 'Albéa Group',
 'Alteca',
 'Alvarez & Marsal',
 'Amadeus FiRe',
 'Antella Travel Recruitment',
 'Aon',
 'Applus+ Laboratories',
 'ArcelorMittal',
 'Artemys Agil-IT (groupe Artemys)',
 'AstraZeneca',
 'Ausy Belgium',
 'Autoriteit Financiële Markten',
 "Autostrade per l'Italia",
 'Avon',
 'Awin Global',
 'Axon Moore',
 'BCEE',
 'BINTER CANARIAS',
 'BNP Paribas',
 'Babcock',
 'Babcock International Group',
 'Bandai Namco Europe',
 'Banque Populaire Rives de Paris',
 'Behaviour – UK North',
 'Belfius',
 'Belgie Vacature Groep',
 'Benchmark International',
 'Biedronka',
 'BillEase',
 'Billennium',
 'Black Fox Solutions',
 'Blue Arrow',
 'Boursorama',
 'Bouygues Telecom',
 'Bridgestone EMIA',
 'BrightStone Group',
 "Burton's Biscuit Company",


In [54]:
# для одной вакансии не указано название компании: посмотрим, есть ли оно есть в описании

parsed_df.query('company == ""')

Unnamed: 0,job_title,company,city,country,level,workplace,posted_date,applicants,description,industry,size,date,country_new,skills
424,Data Analyst (m/w/d),,Osnabrück,Germany,Entry level,Hybrid,1 week ago,2 applicants,"About the job\n \n \n\n\n\n This job is sourced from a job board.\n Learn more\n\n\n \n Data Analyst (m/w/d) bei metacrew service GmbH | softgarden\n\nView job here\n\nData Analyst (m/w/d)\nVollzeitOsnabrück, DeutschlandRemote (hybrid)Mit Berufserfahrung21.02.23\nAls Full Service Agentur zeichnet sich die metacrew® service GmbH ::: part of metacrew group GmbH mit ihrer langjährigen Erfahrung als Marktführer in der Generierung von realen Nutzungserlebnissen sowie als Experte für die Definition und Realisierung von E-Commerce- & Marketing-Services-Lösungen aus.\n\nMit rund 50 Mitarbeiter:innen an den Standorten Osnabrück und Hamburg bietet sie als Vermarkter der Plattformen „aboutfood“ (u.a. PAM Box mit Influencerin Pamela Reif, Vegan Box, Fine Food Box, My Cake Box oder den Lindt Chocoladen Club) und „BeautyLove“ (u.a. mit BRIGITTE Box, InStyle Box, Pink Box) Herstellern Sampling- und Produkttestmöglichkeiten sowohl in (Abo-)Boxen und Adventskalendern als auch auf der WOM-Plattform „Oh of the day“ an.\n\nFokus liegt dabei auf der Durchführung von Kampagnen in digitalen Kanälen, sowie die Einführung oder Repositionierung neuer und etablierter Marken im Verbrauchermarkt. metacrew® service GmbH erstellt für jedes Unternehmen individuell zugeschnittene Konzepte und ist mittels vielzähliger Möglichkeiten in der Lage, auf Basis von umfangreichen Datensätzen, komplexen Analysen zu Verbraucherpräferenzen, Insights, Trends sowie Produktnutzungs-Reviews, passende Lösungsansätze für die Herausforderungen von Herstellern zu liefern.\n\nUnsere schnell wachsende Unternehmensgruppe ist immer auf der Suche nach kreativen Köpfen mit Ideen, die Lust darauf haben, die Themen von morgen zu denken, zu gestalten und zu erschaffen. Als Teil unserer #Crew bist Du ein wichtiger Baustein für die Zukunft. Wir denken nicht nur Deine Ideen gemeinsam, sondern unterstützen Dich auch bei allen Herausforderungen mit echter Wertschätzung und konstruktivem Feedback. Welches Ziel Du auch verfolgst, Du kannst es mit uns zusammen gehen, wir nennen das #crewlove. Bereit, zu neuen Zielen aufzubrechen und Verantwortung zu übernehmen? Dann werde jetzt Teil der #Crew!\n\nDeine Aufgaben\n Du bist für die Durchführung und Präsentation datengetriebener Analysen auf Basis unserer Kunden- und Profildatenbanken, z.B. im Rahmen von Attributions-, Segmentierungs- und Prognosemodellen zuständig Du unterstützt bei der datengetriebenen Entwicklung von neuen Produkten und Angebotsmodellen in der Food- und Beauty-Industrie i.V.m. mit internen Käufer-, Produktbewertungs-, Verhaltens- und Profildaten sowie ergänzenden externen Datenquellen Die Erstellung von Handlungsanweisungen zur Optimierung der bestehenden E-Commerce-, Abo-Commerce- und Community-Plattformen durch Data Analytics liegt in Deinem Aufgabenbereich Unterstützung bei der Neu- und Weiterentwicklung von Business-Intelligence-Geschäftsmodellen Du betreust und berätst interne und externe Stakeholder in der Daten-Analyse Erstellung von Reports mit BI-Tools (u.a. AWS Quicksight) nach Anforderungen der Fachbereiche und die AdHoc-Auswertungen für aktuelle Anforderungen. Konzeptionierung von Datenstrukturen für die Analyse der Bestell- und Umsatzdaten.\nDein Profil\n Du überzeugst durch sehr gute analytische Fähigkeiten und interdisziplinäres Denken Du bringst Dein fundiertes Wissen in statistischen Analysemethoden ein Du hast eine Leidenschaft für das Verständnis von neuen, datengetriebenen Geschäftsmodellen Du konntest bereits Erfahrungen mit gängigen BI- und (Web-)Analyse-Tools, Data-Warehouse-Anwendungen sammeln Aktuelle Kenntnisse und sichere Anwendung von Datenschutzvorgaben des Gesetzgebers sind für Dich selbstverständlich Du bringst gute Deutschkenntnisse in Wort und Schrift mit\nDu erfüllst nicht alle Anforderungen? Kein Problem! Überzeuge uns von anderen interessanten Fähigkeiten und Erfahrungen, die Du mitbringst und die Du bei uns einbringen möchtest.\n\nGemeinsam leben wir #crewlove\nEine steile Lernkurve: Bei uns erlebst Du in einer schnell wachsenden Unternehmensgruppe interessante und abwechslungsreiche Projekte sowie anspruchsvollen und direkten Kontakt in den verschiedensten Branchen(Eigen-) Verantwortung: Bei uns kannst Du selbstständig arbeiten und Deine kreativen Ideen einbringen, weiterentwickeln und umsetzenGanz viel Leidenschaft: Du wirst Teil eines talentierten und motivierten Teams, welches gemeinsam mit Dir das Unternehmen nach vorne bringen möchteFlexibilität: Wir bieten Dir Arbeitszeit auf Vertrauensbasis und nach Absprache die Möglichkeit zum mobilen ArbeitenWertschätzung und Feedbackkultur: Bei uns arbeitest Du in einem sehr wertschätzenden Arbeitsumfeld mit flachen Hierarchien und einer direkten Feedbackkultur, die Dich wirklich weiterbringtWeiterentwicklung: Bei uns erhältst Du die Möglichkeit, Dich sowohl fachlich als auch persönlich weiterzuentwickelnRegelmäßige Firmenevents: Unser jährliches Sommerfest sowie die Weihnachtsfeier sind fester Bestandteil unseres EventkalendersStandorte: Unsere zentralen Standorte haben eine gute Verkehrsanbindung\nDu willst Teil unserer #crewlove werden?\n\nDann bewirb Dich direkt über unser Online-Bewerbungsformular! Lade hier einfach Deinen aussagekräftigen Lebenslauf, ein kurzes Anschreiben und relevante Zeugnisse hoch.\n\nSich zu duzen gehört übrigens zu unserer Unternehmenskultur. Wir hoffen das ist auch für Dich in Ordnung! Da Du ja vielleicht auch schon bald Teil unserer Crew bist, fang doch in Deiner Bewerbung direkt damit an!\n\nSei während des Bewerbungsprozesses ganz Du selbst, dann passen wir am besten zusammen. Denn: Wir leben Vielfalt und freuen uns über alle Bewerber:innen – unabhängig davon, woher Du kommst, wie alt Du bist, mit welchem Geschlecht Du Dich identifizierst, wen Du liebst oder woran Du glaubst.\n\nWenn Du Fragen zu dieser Stelle oder einer Karriere bei der metacrew service hast, dann melde Dich gerne unter karriere@metacrew.de.\n\nAnsprechpartner\n\nKatharina Wibbeke\n\nOnline bewerben\n\nWeitere Jobs ansehen\n\nImpressum | Datenschutzerklärung\n\nPowered by softgarden",,,2023-05-16,Germany,"[aws, bi tools (unspecified), eviews]"


Название данной компании есть в описании - Metacrew service GmbH:

In [55]:
# укажем название в датасете

parsed_df.loc[parsed_df['company'] == '', 'company'] = 'Metacrew service GmbH'

Объединим следующих работодателей в группы компаний:

In [56]:
# составим словарь для группировки компаний
company_dict = {'AXA':['AXA','AXA Investment Managers'],
                'Babcock':['Babcock','Babcock International Group'],
                'Credit Agricole':['Credit Agricole Consumer Finance','Crédit Agricole Italia'],
                'Experis':['Experis','Experis Italia'],
                'NTT DATA':['NTT DATA Europe & Latam','NTT DATA Italia'],
                'PwC':['PwC España','PwC France','PwC Polska'],
                'Randstad':['Randstad Switzerland','Randstad Technologies Italia'],
                'SDG Group':['SDG Group España','SDG Group Italy'],
                'Santander':['Santander','Santander Bank Polska'],
                'Societe Generale':['Societe Generale',
                                    'Societe Generale Corporate and Investment Banking - SGCIB'],
                'TELUS International':['TELUS International','TELUS International AI Data Solutions']
               }

# транспонируем словарь
company_map = {}
for key in company_dict.keys():
    for value in company_dict[key]:
        company_map.update({value:key})

In [57]:
# объединяем работодателей в группы компаний

parsed_df['company_new'] = parsed_df['company'].map(company_map)
parsed_df.loc[parsed_df['company_new'].isna(), 'company_new'] = (parsed_df.loc[parsed_df['company_new'].isna(),
                                                                               'company'])

#### Названия городов

In [58]:
# проверим, что города в датасете называются одинаково

sorted(parsed_df['city'].unique())

['Ahlen',
 'Aix-en-Provence',
 'Alsónémedi',
 'Amersfoort',
 'Amstelveen',
 'Amsterdam',
 'Amsterdam Area',
 'Arconate',
 'Arluno',
 'Arnhem',
 'Athens',
 'Baierbrunn',
 'Barcelona',
 'Bari',
 'Basel',
 'Basingstoke',
 'Bath',
 'Belfast',
 'Bellinzago Lombardo',
 'Bergamo',
 'Bergen op Zoom',
 'Bergisch Gladbach',
 'Berlin',
 'Berlin Metropolitan Area',
 'Binasco',
 'Birkirkara',
 'Blackpool',
 'Boadilla del Monte',
 'Bodelshausen',
 'Bois-Colombes',
 'Bollate',
 'Bologna',
 'Bordeaux',
 'Boulogne-Billancourt',
 'Bracknell',
 'Bremen',
 'Brindisi',
 'Bristol',
 'Brussels',
 'Brussels Metropolitan Area',
 'Brussels Region',
 'Bucharest',
 'Buckinghamshire',
 'Budapest',
 'Bulgaria',
 'Calvignasco',
 'Cambridge',
 'Canegrate',
 'Carpiano',
 "Cassano d'Adda",
 "Cassina de' Pecchi",
 'Cerdanyola del Vallès',
 'Cernusco sul Naviglio',
 'Cesate',
 'Chappes',
 'Chaucer',
 'Chester',
 'Coimbra',
 'Cologne',
 'Cologne Bonn Region',
 'Colturano',
 'Copenhagen',
 'Cornaredo',
 'Corsico',
 'County

У некоторых городов есть приписки "Area", "Metropolitan Area", "Greater", "County" - уберем их. Иногда вместо города указана страна. В этом случае информации о городе нет, поэтому поставим пустые значения. Какие-то названия дублируются с немного разным написанием, исправим это.

In [59]:
# очистим названия от приписок типа "Area"

parsed_df['city_new'] = parsed_df['city']

for i in ['Metropolitan','Area','Greater','County','and The Hague']:
    parsed_df['city_new'] = parsed_df['city_new'].str.replace(i, '').str.strip()

In [60]:
# посмотрим на названия, содержащие слово "Region": возможно, какие-то из них - это города

sorted(parsed_df.query('city_new.str.contains("Region")')['city_new'].unique())

['Brussels Region', 'Cologne Bonn Region', 'Flemish Region', 'Paris  Region']

Регионы Кёльн - Бонн (Cologne Bonn Region) и Фламандский (Flemish Region) точными названиями городов заменить нельзя, поэтому поставим пустые значения. "Brussels Region" и "Paris  Region" можно заменить названиями соответствующих городов:

In [61]:
# ставим пропуски в названии города для регионов Кёльн - Бонн и Фламандского
parsed_df.loc[(parsed_df['city_new'] == 'Cologne Bonn Region') |\
              (parsed_df['city_new'] == 'Flemish Region'),
              'city_new'] = ''

# убираем "Region" из названий других городов
parsed_df['city_new'] = parsed_df['city_new'].str.replace('Region', '').str.strip()

In [62]:
# поставим пустые значения там, где вместо названия города - страна

parsed_df.loc[parsed_df['city_new'] == parsed_df['country_new'], 'city_new'] = ''

In [63]:
# составим словарь замен для названий городов

city_unified = {'Cracow':'Krakow',
                'Frankfurt am Main':'Frankfurt',
                'Košice':'Kosice',
                'Lisboa':'Lisbon',
                'Münster':'Munster',
                'Palma de Mallorca':'Palma',
                'Wrocław':'Wroclaw'
               }

In [64]:
# заменим дубликаты в названиях городов

for key, value in city_unified.items():
    parsed_df['city_new'] = parsed_df['city_new'].str.replace(key, value)

#### Число откликов

Создадим столбец с числом откликов на вакансию, без слова "applicants":

In [65]:
parsed_df['applicants_new'] = (parsed_df['applicants']
                               .apply(lambda x: x.split()[0] if x != None else x)
                              ).astype('Int64')

#### Hard skills

Соберем хард скиллы в отдельный датасет для последующей визуализации:

In [66]:
# составим датасет со скиллами для каждой вакансии

skills_df = parsed_df[['job_title','company_new','skills']].explode('skills').dropna().reset_index()
skills_df.columns = ['job_id','job_title','company','skills']
skills_df['job_id'] += 1

In [67]:
# посмотрим на первые 20 строк

skills_df.head(20)

Unnamed: 0,job_id,job_title,company,skills
0,1,Data Analyst,PharmiWeb.Jobs: Global Life Science Jobs,data modeling
1,1,Data Analyst,PharmiWeb.Jobs: Global Life Science Jobs,excel
2,1,Data Analyst,PharmiWeb.Jobs: Global Life Science Jobs,sas
3,1,Data Analyst,PharmiWeb.Jobs: Global Life Science Jobs,spss
4,1,Data Analyst,PharmiWeb.Jobs: Global Life Science Jobs,sql
5,3,Data Analyst (Space & Planning),Mole Valley Farmers,excel
6,4,Data Analyst,FORFIRM,aws
7,4,Data Analyst,FORFIRM,data modeling
8,4,Data Analyst,FORFIRM,gcp
9,4,Data Analyst,FORFIRM,oracle


In [68]:
# посмотрим, по скольким вакансиям есть информация

skills_df['job_id'].nunique()

483

<a id='extraction'></a>
## Выгрузка данных

In [69]:
# сохраним в отдельный датасет информацию по вакансиям, ограничившись только необходимыми колонками

jobs_data = parsed_df[['job_title','company_new','industry','size','city_new','country_new','workplace',
                       'date','applicants_new']
                     ].reset_index()
jobs_data.columns = ['job_id','job_title','company','industry','size','city','country','workplace',
                     'date','applicants']
jobs_data['job_id'] += 1

In [70]:
# выгрузим датасет с вакансиями в формате csv

jobs_data.to_csv('linkedin_jobs.csv')

In [71]:
# выгрузим датасет со скиллами в формате csv

skills_df.to_csv('linkedin_skills.csv')

<a id='dash'></a>
## Дашборд

При составлении дашборда ориентировались на фильтры поиска вакансий в порядке их приоритета - страна и город, тип занятости, индустрия, размер компании. Данные признаки расположили слева (тк читаем слева направо) и в центре дашборда.

Ссылка на дашборд:

https://public.tableau.com/app/profile/mrmrzpn/viz/DALinkedInJobs/DataAnalystLinkedInJobs