## <div style="text-align:center"><span style="color:red">PROJECT-2. Loading new data. Refinement of the analysis.</span></div>

### <div style="text-align:center"><span style="color:red">0. Setting up the environment for data processing.</span></div>

In [1]:
# loading the ipython-sql magic extension:
%load_ext sql
# setting the initial rows display limitation:
%config SqlMagic.displaylimit=5
# setting the base connection variable:
conn = 'postgresql+psycopg2://****:****@****:****/****'

_<span style="color:green">The SQL queries in this notebook require the ipython-sql magic. To install it, run  
```$conda install -c conda-forge ipython-sql``` within a conda environment or  
```$pip install ipython-sql``` if you use pip.  
More information is available in the [ipython-sql project repository](https://github.com/catherinedevlin/ipython-sql).  
Access to the database is also required.</span>_
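
The `conn` variable above follows the SQLAlchemy URL shape `dialect+driver://user:password@host:port/database`. As a minimal sketch, the string could be assembled from separate parts so that special characters in the password are escaped. The helper name `build_conn_url` and all sample values are illustrative assumptions and are not part of this project.

```python
from urllib.parse import quote_plus


def build_conn_url(user, password, host, port, database):
    """Assemble a SQLAlchemy-style PostgreSQL URL (hypothetical helper)."""
    # quote_plus escapes characters such as '@' or '/' that would break the URL
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values only; real credentials should come from a safe source.
example_url = build_conn_url("reader", "p@ss/word", "localhost", 5432, "demo_db")
print(example_url)
```

The resulting string can be assigned to `conn` and passed to `%sql` in the same way as the masked value above.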

### <div style="text-align:center"><span style="color:red">1. Displaying dataset tables to evaluate their size, structure, and content.</span></div>

<center> <img src = https://github.com/ssergeegress/SF-DS-Projects/blob/main/Project_01A/data/dst3-u2-pr2_1_1.png?raw=true alt='dst3-u2-pr2_1_1.png' style='width:800px'> </center>

<div style="text-align:center"><span style="color:red">1.1 The 'Candidate' table from 'hh' database.</span></div>

In [2]:
%sql $conn SELECT * FROM hh.candidate ORDER BY id

44744 rows affected.


gender,age,desirable_occupation,city_id,employment_type,current_occupation,updated_at,id,salary
M,39,Системный администратор,390,"частичная занятость, проектная работа, полная занятость",Системный администратор,2019-04-16,1,29000
M,60,Технический писатель,838,"частичная занятость, проектная работа, полная занятость","Менеджер проекта, Аналитик, Технический писатель",2019-04-12,2,40000
F,36,Оператор,733,полная занятость,Кассир-операционист,2019-04-16,3,20000
M,38,"Веб-разработчик (HTML / CSS / JS / PHP / базы данных; фреймворки, дизайн, интерфейсы, CMS)",530,"частичная занятость, проектная работа, полная занятость",Инженер-программист,2019-04-08,4,100000
F,26,Региональный менеджер по продажам,802,полная занятость,Менеджер по продажам,2019-04-22,5,140000


<span style="color:red">1.2 The 'City' table from 'hh' database.</span>

In [3]:
%sql $conn SELECT * FROM hh.city

984 rows affected.


title,id
Абакан,1
Абдулино,2
Абинск,3
Агой,4
Агрыз,5


<span style="color:red">1.3 The 'Candidate Timetable Type' table from 'hh' database.</span>

In [4]:
%sql $conn SELECT * FROM hh.candidate_timetable_type ORDER BY candidate_id DESC

31155 rows affected.


id,candidate_id,timetable_id
17732,44744,2
12798,44739,2
37394,44738,2
37393,44736,2
26767,44735,1


<span style="color:red">1.4 The 'Timetable Type' table from 'hh' database.</span>

In [5]:
%sql $conn SELECT * FROM hh.timetable_type

5 rows affected.


id,title
1,гибкий график
2,полный день
3,сменный график
4,вахтовый метод
5,удаленная работа


<span style="color:red">1.5 Output of a summary table with the row count of each table, for clarity.</span>

In [6]:
%%sql $conn
/* just union the outputs of COUNT function */
SELECT
    'Candidate' table_name,
    COUNT(c.id) row_cnt
FROM
    hh.candidate c
UNION
SELECT
    'City',
    COUNT(ct.id)
FROM
    hh.city ct
UNION
SELECT
    'Candidate Timetable Type',
    COUNT(ctt.id)
FROM
    hh.candidate_timetable_type ctt
UNION
SELECT
    'Timetable Type',
    COUNT(tt.id)
FROM
    hh.timetable_type tt
ORDER BY 2 DESC



4 rows affected.


table_name,row_cnt
Candidate,44744
Candidate Timetable Type,31155
City,984
Timetable Type,5


- <span style="color:blue">Hmmm... Looks neat. Let's dig into it a little, guys!</span>
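
The summary query in 1.5 repeats the same `SELECT ... COUNT` pattern for each table. As a rough sketch, such a query could be generated from a list of tables. The function name `build_count_query` is hypothetical, and the sketch uses `COUNT(*)` and `UNION ALL` instead of the notebook's `COUNT(id)` and `UNION`, assuming `id` is never NULL. Table names are interpolated directly, so this is only suitable for trusted, hard-coded names.

```python
def build_count_query(tables):
    """Build one UNION ALL query that counts rows in each (label, table) pair."""
    parts = [
        f"SELECT '{label}' AS table_name, COUNT(*) AS row_cnt FROM {table}"
        for label, table in tables
    ]
    # ORDER BY 2 DESC sorts the combined result by the count column
    return "\nUNION ALL\n".join(parts) + "\nORDER BY 2 DESC"


tables = [
    ("Candidate", "hh.candidate"),
    ("City", "hh.city"),
    ("Candidate Timetable Type", "hh.candidate_timetable_type"),
    ("Timetable Type", "hh.timetable_type"),
]
query = build_count_query(tables)
print(query)
```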

### <div style="text-align:center"><span style="color:red">2. Preliminary data analysis.</span></div>

<span style="color:red">2.1 Calculation of the maximum age (max_age) of the candidate in the table.</span>

In [7]:
%%sql $conn
/* using MAX function to get the maximum val */ 
SELECT
    MAX(c.age) max_age
FROM
    hh.candidate c

1 rows affected.


max_age
100


- <span style="color:blue">100 years is a rather mysterious age for a job seeker. The age range of the sample needs a closer look.</span>

<span style="color:red">2.2 Calculation of the minimum age (min_age) of the candidate in the table.</span>

In [8]:
%%sql $conn
/* using MIN function to get the minimum val */ 
SELECT
    MIN(c.age) min_age
FROM
    hh.candidate c

1 rows affected.


min_age
14


- <span style="color:blue">Age 14 may seem too young here, but it is not really an anomaly: the law allows you to start working a little at this age. Nothing unusual.</span>

<span style="color:red">2.3 Minor data cleanup. Let's write a query that calculates, for each age (age), how many (cnt) candidates of that age we have. The result is sorted by age in descending order.</span>

In [9]:
%%sql $conn
/* grouping by age to count the candidates of each age
and ordering output to see the largest ages first */
SELECT
    c.age,
    COUNT(c.id) cnt
FROM
    hh.candidate c
GROUP BY 1
ORDER BY 1 DESC

63 rows affected.


age,cnt
100,1
77,1
76,1
73,4
72,3


- <span style="color:blue">Only one value (age 100) looks like an outlier, and it makes sense to exclude it from the sample.</span>
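
For readers who think in Python rather than SQL, the grouping in 2.3 can be illustrated as follows. `GROUP BY` already yields one row per distinct age, so no extra deduplication step is needed. The list of ages is a small sample shaped to reproduce the five rows displayed above; it is not the real `hh.candidate` data.

```python
from collections import Counter

# Illustrative ages only, chosen to reproduce the five rows shown above.
ages = [100, 73, 73, 72, 77, 73, 72, 76, 73, 72]

# Counter plays the role of GROUP BY age + COUNT: one entry per distinct age.
counts = Counter(ages)

# Sort by age in descending order, like ORDER BY 1 DESC.
rows = sorted(counts.items(), reverse=True)
print(rows)
```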

<span style="color:red">2.4 According to Rosstat, the average age of those employed in the Russian economy is 39.7 years. Rounding this value up to 40, let's find the number of candidates who are older than this age. It is important not to forget to filter out the "erroneous" age of 100.</span>

In [10]:
%%sql $conn
/* filtering output of COUNT function by proper range to get the val */
SELECT
    COUNT(c.id) cnt
FROM
    hh.candidate c
WHERE c.age BETWEEN 41 AND 99

1 rows affected.


cnt
6263


- <span style="color:blue">Well, let's do some math here, guys:</span>

In [11]:
prc = round(6263 / 44744 * 100)
display(f'The percentage of older job seekers is only {prc}% of the total amount')

'The percentage of older job seekers is only 14% of the total amount'

- <span style="color:blue">A simple calculation shows that older applicants make up only 14% of the total (6263 out of almost 45000). Not so many! The vast majority of applicants are younger. One can raise the question of the advisability of the recent pension reform.</span>

### <div style="text-align:center"><span style="color:red">3. Global indicator analysis.</span></div>

<span style="color:red">3.1 To get started, let's write a query that shows how many (cnt) candidates we have from each city. Sample format: city, cnt. Let's group the result by the title column and sort it by count in descending order.</span>

In [12]:
%%sql $conn
/* joining the tables, grouping by city to get candidate quantity 
for each one and ordering output to see the highest vals */
SELECT
    ct.title city,
    COUNT(c.id) cnt
FROM
    hh.city ct
JOIN hh.candidate c ON ct.id = c.city_id
GROUP BY 1
ORDER BY 2 DESC

984 rows affected.


city,cnt
Москва,16622
Санкт-Петербург,4937
Краснодар,1066
Новосибирск,958
Казань,872


- <span style="color:blue">Moscow's huge lead over other cities is not due to anomalies in the mathematical distribution, but to the anomalous distribution of real resources in the economy and mediocre domestic politics. A fat capital and dying, depressed regions are among the grim signs of the current crazy times.</span>

In [13]:
# changing the rows display limitation:
%config SqlMagic.displaylimit=10

<span style="color:red">3.2 Anyway, Moscow stands out as perhaps the most active labor market. Let's write a query that shows which candidates from Moscow would be satisfied with "project work". Sample format: gender, age, desirable_occupation, city, employment_type; sorting: by candidate id.</span>

In [14]:
%%sql $conn
/* joining the tables and filtering/ordering output */
SELECT
    c.gender,
    c.age,
    c.desirable_occupation,
    ct.title city,
    c.employment_type
FROM
    hh.candidate c
JOIN hh.city ct ON ct.id = c.city_id
WHERE ct.title = 'Москва'
AND c.employment_type ~ 'проектная работа'
ORDER BY c.id

2950 rows affected.


gender,age,desirable_occupation,city,employment_type
M,38,"Веб-разработчик (HTML / CSS / JS / PHP / базы данных; фреймворки, дизайн, интерфейсы, CMS)",Москва,"частичная занятость, проектная работа, полная занятость"
M,31,Специалист,Москва,"частичная занятость, проектная работа, полная занятость"
F,42,"pre-sale инженер, pre-sale менеджер",Москва,"частичная занятость, проектная работа, полная занятость"
M,49,Дежурный администратор,Москва,"частичная занятость, проектная работа, полная занятость"
M,29,Главный инженер проекта,Москва,"частичная занятость, проектная работа, полная занятость"
M,22,Программист С++,Москва,"проектная работа, частичная занятость"
F,29,Технический специалист,Москва,"частичная занятость, проектная работа, полная занятость"
M,32,IT Operations Coordinator,Москва,"частичная занятость, проектная работа, полная занятость"
M,23,"Инженер-связист,системный администратор",Москва,"частичная занятость, проектная работа, полная занятость"
M,31,Менеджер,Москва,"частичная занятость, проектная работа, полная занятость"


<span style="color:red">3.3 That is too much data. Let's keep only the most popular IT professions: developer, analyst, programmer. These names can be written in either upper or lower case. Sorting: by candidate id.</span>

In [15]:
%%sql $conn
/* just adding another filter to the previous request */
SELECT
    c.gender,
    c.age,
    c.desirable_occupation,
    ct.title city,
    c.employment_type
FROM
    hh.candidate c
JOIN hh.city ct ON ct.id = c.city_id
WHERE ct.title = 'Москва'
AND c.employment_type ~ 'проектная работа'
AND c.desirable_occupation ~* 'разработчик|программист|аналитик' 
ORDER BY c.id

778 rows affected.


gender,age,desirable_occupation,city,employment_type
M,38,"Веб-разработчик (HTML / CSS / JS / PHP / базы данных; фреймворки, дизайн, интерфейсы, CMS)",Москва,"частичная занятость, проектная работа, полная занятость"
M,22,Программист С++,Москва,"проектная работа, частичная занятость"
M,25,Frontend-разработчик,Москва,"стажировка, волонтерство, частичная занятость, проектная работа, полная занятость"
M,30,Программист,Москва,"частичная занятость, проектная работа"
M,35,Ruby / Rails разработчик,Москва,"частичная занятость, проектная работа, полная занятость"
M,28,Программист микроконтроллеров,Москва,"стажировка, частичная занятость, проектная работа, полная занятость"
M,36,Программист-разработчик,Москва,"частичная занятость, проектная работа, полная занятость"
M,25,Аналитик,Москва,"проектная работа, стажировка, частичная занятость, полная занятость"
M,38,"Инженер, программист C/C++, разработчик ПО",Москва,"частичная занятость, проектная работа, полная занятость"
F,54,Ведущий инженер программист,Москва,"частичная занятость, проектная работа, полная занятость"


- <span style="color:blue">As many as 778 qualified IT candidates right now, so there is plenty to choose from. Looks nice! One can only hope that at least one of them wrote the truth about themselves.</span>
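
The `~*` operator in query 3.3 is PostgreSQL's case-insensitive regular expression match. A minimal Python sketch of the same filter, using `re.IGNORECASE` and a few titles copied from the outputs above, looks like this:

```python
import re

# Same alternation as in the query; re.IGNORECASE mirrors the ~* operator.
pattern = re.compile(r"разработчик|программист|аналитик", re.IGNORECASE)

# Titles copied from the query outputs above.
titles = [
    "Программист С++",
    "Frontend-разработчик",
    "Аналитик",
    "Дежурный администратор",
    "Менеджер",
]

# Keep only the titles that contain one of the three profession names.
matched = [t for t in titles if pattern.search(t)]
print(matched)
```

Without the case-insensitive flag, "Программист С++" and "Аналитик" would be missed, because the pattern is written in lower case.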

In [16]:
# changing the rows display limitation:
%config SqlMagic.displaylimit=5

<span style="color:red">3.4 For general information, let's select the ids and cities of candidates whose current position matches the desired one. Sample format: id, city; the result is sorted by city and candidate id.</span>

In [17]:
%%sql $conn
/* joining the tables and filtering/ordering output */
SELECT
    c.id,
    ct.title city
FROM
    hh.candidate c
JOIN hh.city ct ON ct.id = c.city_id
WHERE c.desirable_occupation = c.current_occupation
ORDER BY 2, 1

5104 rows affected.


id,city
2009,Абакан
10340,Абакан
14449,Абакан
20261,Абакан
13705,Агрыз


- <span style="color:blue">Approximately one in ten applicants clearly indicates a wish to change jobs within the same field of competence. With the rest, things may be different, so this group looks fairly reliable. This criterion can be a real godsend for those hiring specialists who are not very inclined to delve into their work and come to the office mainly to shuffle papers.</span>
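
Query 3.4 compares the two columns as whole strings. Candidate 2 in section 1.1, for example, wants to be a "Технический писатель" and lists "Менеджер проекта, Аналитик, Технический писатель" as the current occupation, so strict equality does not count that row. The sketch below shows a looser check that splits the current occupation on commas. This is an alternative for illustration, not the method used in this notebook, and the function name `occupation_matches` is hypothetical.

```python
def occupation_matches(desirable, current):
    """Return True if the desired title equals any comma-separated current title."""
    wanted = desirable.strip().lower()
    current_titles = [part.strip().lower() for part in current.split(",")]
    return wanted in current_titles


# Rows 1 and 2 of the candidate table shown in section 1.1.
same_title = occupation_matches("Системный администратор", "Системный администратор")
listed_title = occupation_matches(
    "Технический писатель",
    "Менеджер проекта, Аналитик, Технический писатель",
)
# Row 3: the titles differ, so there is no match.
different_title = occupation_matches("Оператор", "Кассир-операционист")
print(same_title, listed_title, different_title)
```

Note that a title which itself contains a comma would be split into pieces, so this check trades one kind of miss for another.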

<span style="color:red">3.5 Let's determine the number of candidates of retirement age. The retirement age is 65 for men and 60 for women.</span>

In [18]:
%%sql $conn
/* filtering output of COUNT() function by proper ranges to get the val */
SELECT
    COUNT(c.id)
FROM
    hh.candidate c
WHERE (c.gender = 'M' AND c.age BETWEEN 65 AND 99)
OR (c.gender = 'F' AND c.age BETWEEN 60 AND 99)

1 rows affected.


count
75


- <span style="color:blue">Let's go back one more time to our mathematics from section 2.4. There we calculated the percentage of elderly applicants, and we got 14%. Let's do the same for the category of pensioners! And...</span>

In [19]:
prc = round(75 / 44744 * 100, 2)
display(f'The percentage of retirement age job seekers is only {prc}% of the total amount')

'The percentage of retirement age job seekers is only 0.17% of the total amount'

- <span style="color:blue">And it's 0.17%, guys! Only 0.17%! That's it. No more words about that pension reform...</span>
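
The filter from query 3.5 can also be written as a small Python predicate, which makes the boundary cases easy to check. The thresholds and the upper bound of 99 come from the query above; the names `RETIREMENT_AGE`, `MAX_VALID_AGE` and `is_retirement_age` are illustrative assumptions.

```python
# Thresholds taken from section 3.5; 99 excludes the erroneous age of 100.
RETIREMENT_AGE = {"M": 65, "F": 60}
MAX_VALID_AGE = 99


def is_retirement_age(gender, age):
    """Mirror the WHERE clause of query 3.5 for a single candidate."""
    threshold = RETIREMENT_AGE.get(gender)
    if threshold is None:
        # Unknown gender codes never match, as in the SQL filter.
        return False
    return threshold <= age <= MAX_VALID_AGE


checks = [
    is_retirement_age("M", 65),
    is_retirement_age("M", 64),
    is_retirement_age("F", 60),
    is_retirement_age("F", 59),
    is_retirement_age("M", 100),
]
print(checks)
```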

### <div style="text-align:center"><span style="color:red">4. Analysis of candidates for customers.</span></div>

In [20]:
# changing the rows display limitation:
%config SqlMagic.displaylimit=15

<span style="color:red">4.1 For a mining company, we need to select candidates from Novosibirsk, Omsk, Tomsk and Tyumen who are ready to work on a rotational basis. Sample format: gender, age, desirable_occupation, city, employment_type, timetable_type. Sort the result by city and candidate number.</span>

In [21]:
%%sql $conn
/* joining all the tables and filtering/ordering output */
SELECT
    c.gender,
    c.age,
    c.desirable_occupation,
    ct.title city,
    c.employment_type,
    t.title timetable_type
FROM
    hh.candidate c
JOIN hh.city ct ON ct.id = c.city_id
JOIN hh.candidate_timetable_type ctt ON ctt.candidate_id = c.id
JOIN hh.timetable_type t ON ctt.timetable_id = t.id
WHERE
    ct.title ~ 'Новосибирск|Омск|Томск|Тюмень'
AND t.title ~ 'вахтовый метод'
ORDER BY ct.title, c.id

11 rows affected.


gender,age,desirable_occupation,city,employment_type,timetable_type
M,29,ИТ Инженер,Новосибирск,полная занятость,вахтовый метод
M,25,Заместитель начальника лаборатории,Новосибирск,"проектная работа, стажировка, частичная занятость, полная занятость",вахтовый метод
M,30,"Ведущий инженер, Специалист по защите информации,",Новосибирск,"частичная занятость, полная занятость",вахтовый метод
M,23,Программист,Новосибирск,полная занятость,вахтовый метод
M,35,"Инженер АСУТП, инженер-электроник",Омск,полная занятость,вахтовый метод
M,25,Тестировщик ПО,Омск,"стажировка, полная занятость",вахтовый метод
M,26,Специалист технической поддержки,Томск,"частичная занятость, полная занятость",вахтовый метод
M,30,Менеджер проектов,Томск,"проектная работа, частичная занятость, полная занятость",вахтовый метод
M,42,Инженер,Томск,"проектная работа, частичная занятость, полная занятость",вахтовый метод
M,31,Инженер связи,Тюмень,полная занятость,вахтовый метод


- <span style="color:blue">Not so many! Well, these days it is not so easy to find real romantic heroes who would dig up the whole earth in search of genuine treasures...</span>
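
One detail of query 4.1 is worth illustrating: the `~` operator performs an unanchored regular expression match, so the pattern matches any title that merely contains one of the four names. The sketch below compares this with exact membership (the equivalent of SQL `IN`). The title "Омская слобода" is invented purely to show the difference; whether `hh.city` contains any such title is unknown, so the 11 rows above may well be unaffected.

```python
import re

cities = {"Новосибирск", "Омск", "Томск", "Тюмень"}

# Unanchored alternation, like ct.title ~ 'Новосибирск|Омск|Томск|Тюмень'.
pattern = re.compile("|".join(sorted(cities)))

# "Омская слобода" is an invented title used only to show the difference.
titles = ["Омск", "Томск", "Омская слобода", "Москва"]

# Substring-style match: anything containing one of the names passes.
regex_hits = [t for t in titles if pattern.search(t)]

# Exact match, like ct.title IN ('Новосибирск', 'Омск', 'Томск', 'Тюмень').
exact_hits = [t for t in titles if t in cities]

print(regex_hits)
print(exact_hits)
```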

<span style="color:red">4.2 For customers from St. Petersburg, we need to collect a list of 10 desired professions of candidates from the same city aged 16 to 21 (both 16 and 21 are included; sorted by age), indicating their age, and also add a Total line with the total number of such candidates. Let's write a query that returns this selection.</span>

In [22]:
%%sql $conn
/* union (all) of two joined/filtered/ordered tables output with
additional COUNT() function output joined/filtered the same way */
(SELECT
    c.desirable_occupation,
    c.age
FROM
    hh.candidate c
JOIN hh.city ct ON ct.id = c.city_id
WHERE
    c.age BETWEEN 16 AND 21
AND ct.title = 'Санкт-Петербург'
ORDER BY 2
LIMIT 10)
UNION ALL
SELECT
    'Total',
    COUNT(c.id)
FROM
    hh.candidate c
JOIN hh.city ct ON ct.id = c.city_id
WHERE
    c.age BETWEEN 16 AND 21
AND ct.title = 'Санкт-Петербург'

11 rows affected.


desirable_occupation,age
Системный администратор,16
Junior Разработчик C++/C#,18
Программист,18
Junior Data Scientist,18
Руководитель web-разработки,18
Специалист по IT,18
Unity3D developer Junior/middle,18
HTML-верстальщик,18
3D-дизайнер,18
Java-разработчик,18


- <span style="color:blue">... But there are more than enough people who would love to sit in offices and at home all their lives! The abundance of vacancies in St. Petersburg is easily explained by the fact that St. Petersburg is beautiful and will become even more beautiful, unlike some other "capitals"!</span>
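
The structure of query 4.2 deserves a short illustration: the parentheses make `ORDER BY` and `LIMIT` apply to the first `SELECT` only, while the Total row counts every candidate that passes the filter, not just the ten listed ones. A rough Python equivalent on invented data (the pairs below are not taken from `hh.candidate`, and the limit is 3 instead of 10 for brevity) could look like this:

```python
# Invented (occupation, age) pairs; not taken from hh.candidate.
candidates = [
    ("Программист", 18),
    ("Тестировщик", 21),
    ("Системный администратор", 16),
    ("Аналитик", 30),
    ("HTML-верстальщик", 18),
    ("Java-разработчик", 20),
]

# WHERE age BETWEEN 16 AND 21 (both bounds inclusive).
selected = [row for row in candidates if 16 <= row[1] <= 21]

# ORDER BY age LIMIT 3 applies to the first part only.
top = sorted(selected, key=lambda row: row[1])[:3]

# The Total row is computed over the full filtered set and added at the end.
result = top + [("Total", len(selected))]
print(result)
```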

### <div style="text-align:center"><span style="color:red">5. The Final Conclusion:</span></div>

### <div style="text-align:center"><span style="color:blue">Oh, those Russians...</span></div>