# Requirements (tasks)

This document shows how the developed algorithms are applied to each of the given requirements, along with the results they produce.

## Initial setup

An instance of the `Analyst` class reads the data from disk only once, keeps it in memory, and computes the requirements:

```python
from Analyst import Analyst

analyst = Analyst()
```
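
This document does not show how `Analyst` is implemented. The sketch below illustrates the "read once, keep in memory" idea. The hypothetical `InMemoryStore` calls its loader only on first access and serves every later access from memory. The class name, the `loader` callable and the record shape are assumptions, not the author's code.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


class InMemoryStore:
    """Hypothetical store: runs `loader` at most once and keeps the result."""

    def __init__(self, loader: Callable[[], List[Record]]):
        self._loader = loader
        self._records: Optional[List[Record]] = None
        self.load_count = 0  # number of times the loader actually ran

    @property
    def records(self) -> List[Record]:
        # Only the first access pays the cost of reading; later ones reuse memory.
        if self._records is None:
            self._records = self._loader()
            self.load_count += 1
        return self._records


# Stand-in loader; a real one would parse the files in the data directory.
store = InMemoryStore(lambda: [{"title": "Scrum Master", "city": "Wroclaw"}])
first = store.records
second = store.records  # served from memory, the loader is not called again
```

With this pattern, every requirement method filters the same in-memory list. This is why a single `Analyst` instance is created once and reused for every requirement below.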

Miscellaneous helpers used throughout this document:

```python
import os
from pprint import PrettyPrinter
from typing import Dict, List

import pandas as pd


pp = PrettyPrinter(indent=2)


def print_report(report: Dict):
    """Pretty-print the summary dictionary returned by a requirement."""
    pp.pprint(report)


def records_to_dataframe(records: List[Dict]) -> pd.DataFrame:
    """Wrap a list of records in a DataFrame so they display as a table."""
    return pd.DataFrame(records)
```


## Reading the data

Data directory:

```python
# set the directory manually
# data_dir = "/home/user/kaggle-justjoinit/data"

# or, alternatively, read it from an environment variable
data_dir = os.environ.get("DATA_DIR")

# data_dir
```
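
`os.environ.get` returns `None` when `DATA_DIR` is not set, so a missing variable would only surface later, inside `read_data_dir`. A small helper can fail early with a clearer message. The hypothetical `resolve_data_dir` below is one option; its name and its fallback behaviour are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_data_dir(default: Optional[str] = None, var: str = "DATA_DIR") -> Path:
    """Hypothetical helper: prefer the environment variable, then `default`."""
    value = os.environ.get(var, default)
    if not value:
        raise RuntimeError(f"Set the {var} environment variable or pass a default path.")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Data directory not found: {path}")
    return path
```

Usage would be `data_dir = str(resolve_data_dir())`. The conversion to `str` is only a precaution, because the argument type `read_data_dir` expects is not shown.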

Read the data from disk and keep it in memory:

```python
report = analyst.read_data_dir(data_dir, prefix="small")
```

```python
first_jobs = report.pop("records_first_jobs")
latest_jobs = report.pop("records_latest_jobs")
```

```python
print_report(report)
```

```text
{'count_jobs': 114709}
```

```python
records_to_dataframe(first_jobs)
```

| | published_at | title | company_name | experience_level | country_code | city |
|---|---|---|---|---|---|---|
| 0 | 2022-04-09 13:00:13.440000+00:00 | Flutter Developer | DO OK • Life-changing software services | mid | PL | Wroclaw |
| 1 | 2022-04-09 13:00:13.440000+00:00 | Performance Marketing Analyst | GetResponse | mid | PL | Gdansk |
| 2 | 2022-04-09 13:00:13.440000+00:00 | Senior Software Development Engineer | Amazon Development Centre | senior | PL | Warszawa |

```python
records_to_dataframe(latest_jobs)
```

| | published_at | title | company_name | experience_level | country_code | city |
|---|---|---|---|---|---|---|
| 0 | 2023-09-01 18:20:00+00:00 | Scrum Master | Sunrise System sp. z o.o. sp. k. | mid | PL | Wroclaw |
| 1 | 2023-09-01 17:37:00+00:00 | IT Business Intelligence Manager | Elis Textile Service | mid | PL | Gdansk |
| 2 | 2023-09-01 16:04:48.008000+00:00 | Senior Java Developer | Trimetis Services | senior | PL | Krakow |
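
The report returns the earliest and the most recent postings. The tables above are consistent with picking them by sorting on `published_at`, but that rule is an assumption. Under it, and assuming `published_at` is stored as an ISO 8601 string, the hypothetical helper `first_and_latest` sketches the selection:

```python
from datetime import datetime
from typing import Dict, List, Tuple

Record = Dict[str, str]


def first_and_latest(records: List[Record], n: int = 3) -> Tuple[List[Record], List[Record]]:
    """Hypothetical helper: the n oldest and the n newest records by published_at."""
    # Parse timestamps such as "2023-09-01 18:20:00+00:00" so they compare as instants.
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r["published_at"]))
    first = ordered[:n]
    latest = ordered[::-1][:n]  # newest first, like the second table above
    return first, latest
```

Reversing the whole list before slicing also keeps `n=0` correct. Slicing with `ordered[-0:]` would instead return every record.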


## Requirement 1

```python
report = analyst.requirement_1(2, "PL", "junior")
records = report.pop("records")
print_report(report)
```

```text
{'count_jobs_junior': 8728, 'count_jobs_junior_PL': 8629}
```


```python
records_to_dataframe(records)
```

| | published_at | title | company_name | experience_level | country_code | city | company_size | open_to_hire_ukrainians |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-09-01 16:00:10.800000+00:00 | Junior Frontend Developer | Softax | junior | PL | Warszawa | Undefined | False |
| 1 | 2023-09-01 15:55:00.798000+00:00 | Project Management Officer (PMO) | Miquido | junior | PL | Krakow | 200 | False |
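
The call and its output suggest a reading of requirement 1, though it is only an inference. The three arguments appear to be the number of postings to list, a country code and an experience level. The method then returns two counts and the most recent matching postings. A sketch under that reading follows; `requirement_1_sketch` is hypothetical and not part of `Analyst`.

```python
from datetime import datetime
from typing import Dict, List

Record = Dict[str, str]


def requirement_1_sketch(records: List[Record], n: int, country: str, level: str) -> Dict:
    """Hypothetical re-implementation; the meaning of each argument is assumed."""
    by_level = [r for r in records if r["experience_level"] == level]
    by_level_country = [r for r in by_level if r["country_code"] == country]
    # Newest postings first, matching the order of the table above.
    newest = sorted(
        by_level_country,
        key=lambda r: datetime.fromisoformat(r["published_at"]),
        reverse=True,
    )
    return {
        f"count_jobs_{level}": len(by_level),
        f"count_jobs_{level}_{country}": len(by_level_country),
        "records": newest[:n],
    }
```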


## Requirement 2

```python
report = analyst.requirement_2("Gazelle Global IT Recruitment", "2023-08-31", "2023-09-02")
records = report.pop("records")
print_report(report)
```

```text
{ 'count_jobs': 39,
  'count_jobs_junior': 0,
  'count_jobs_mid': 28,
  'count_jobs_senior': 11}
```


```python
records_to_dataframe(records)
```

| | published_at | title | experience_level | city | country_code | company_size | workplace_type | open_to_hire_ukrainians |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-09-01 16:00:10.800000+00:00 | Linux Cloud DevOps Engineer | mid | Rzeszow | PL | 30 | remote | False |
| 1 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Wroclaw | PL | 30 | remote | False |
| 2 | 2023-09-01 16:00:10.800000+00:00 | Linux Cloud DevOps Engineer | mid | Wroclaw | PL | 30 | remote | False |
| 3 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Szczecin | PL | 30 | remote | False |
| 4 | 2023-09-01 16:00:10.800000+00:00 | Linux Cloud DevOps Engineer | mid | Poznan | PL | 30 | remote | False |
| 5 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Krakow | PL | 30 | remote | False |
| 6 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Katowice | PL | 30 | remote | False |
| 7 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Poznan | PL | 30 | remote | False |
| 8 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Gdansk | PL | 30 | remote | False |
| 9 | 2023-09-01 16:00:10.800000+00:00 | SailPoint IIQ Developer | mid | Bialystok | PL | 30 | remote | False |
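
Requirement 2 counts one company's postings per experience level within a date range. The minimal sketch below works over a list of dictionaries. It assumes both dates are inclusive and are compared against the date part of `published_at`. `requirement_2_sketch` is a hypothetical name, not part of `Analyst`.

```python
from collections import Counter
from datetime import date
from typing import Dict, List

Record = Dict[str, str]
LEVELS = ("junior", "mid", "senior")


def requirement_2_sketch(records: List[Record], company: str, start: str, end: str) -> Dict[str, int]:
    """Hypothetical: count one company's postings per level within [start, end]."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    selected = [
        r
        for r in records
        if r["company_name"] == company
        and lo <= date.fromisoformat(r["published_at"][:10]) <= hi
    ]
    per_level = Counter(r["experience_level"] for r in selected)
    report = {"count_jobs": len(selected)}
    # Counter returns 0 for missing keys, so empty levels still get reported.
    for level in LEVELS:
        report[f"count_jobs_{level}"] = per_level[level]
    return report
```

In the output above, the per-level counts add up to the total (0 + 28 + 11 = 39). In this sketch, that holds whenever every posting has one of the three levels.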


## Requirement 3

```python
report = analyst.requirement_3("PL", "2023-08-31", "2023-09-02")
records = report.pop("records")
print_report(report)
```

```text
{ 'city_least_jobs': 'Познань',
  'city_most_jobs': 'Warszawa',
  'count_cities': 152,
  'count_companies': 527,
  'count_jobs': 3940}
```
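
The summary above can be sketched with a `Counter` over city names. The function name, the inclusive date range and the tie-breaking rule are assumptions. The sketch compares cities as exact strings, so a variant spelling such as `Познань` (Poznań in Cyrillic) counts as a city of its own. That may be why `Познань` appears as `city_least_jobs` above.

```python
from collections import Counter
from datetime import date
from typing import Dict, List

Record = Dict[str, str]


def requirement_3_sketch(records: List[Record], country: str, start: str, end: str) -> Dict:
    """Hypothetical per-country summary of postings published in [start, end]."""
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    selected = [
        r
        for r in records
        if r["country_code"] == country
        and lo <= date.fromisoformat(r["published_at"][:10]) <= hi
    ]
    # Exact string keys: spelling variants of one city are counted separately.
    per_city = Counter(r["city"] for r in selected)
    ranked = per_city.most_common()  # highest count first; ties keep first-seen order
    return {
        "count_jobs": len(selected),
        "count_companies": len({r["company_name"] for r in selected}),
        "count_cities": len(per_city),
        "city_most_jobs": ranked[0][0] if ranked else None,
        # Among cities tied for the fewest postings, this picks the one seen last.
        "city_least_jobs": ranked[-1][0] if ranked else None,
        "records": selected,
    }
```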


The `remote` field is derived from `workplace_type`. It is `True` when `workplace_type` equals `"remote"` and `False` otherwise, so `partly_remote` postings get `False`:

```python
records_to_dataframe(records)
```

| | published_at | title | experience_level | company_name | city | workplace_type | remote | open_to_hire_ukrainians |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-09-01 18:20:00+00:00 | Scrum Master | mid | Sunrise System sp. z o.o. sp. k. | Wroclaw | remote | True | False |
| 1 | 2023-09-01 17:37:00+00:00 | IT Business Intelligence Manager | mid | Elis Textile Service | Gdansk | partly_remote | False | False |
| 2 | 2023-09-01 16:04:48.008000+00:00 | Senior Java Developer | senior | Trimetis Services | Krakow | remote | True | False |
| 3 | 2023-09-01 16:04:44.051000+00:00 | Senior Java Developer | senior | Trimetis Services | Poznan | remote | True | False |
| 4 | 2023-09-01 16:04:41.344000+00:00 | Senior Java Developer | senior | Trimetis Services | Warszawa | remote | True | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3935 | 2023-08-31 06:22:00+00:00 | Angular Developer | mid | nexocode | Krakow | remote | True | False |
| 3936 | 2023-08-31 06:00:11.023000+00:00 | Senior Software Engineer | senior | DevsData LLC | Warszawa | partly_remote | False | False |
| 3937 | 2023-08-31 06:00:11.023000+00:00 | Software Engineer (.NET) - Allegro Pay | mid | Allegro | Warszawa | partly_remote | False | False |
| 3938 | 2023-08-31 06:00:11.023000+00:00 | Analityk Biznesowy | mid | Adamed | Krakow | remote | True | True |
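
In pandas, the `remote` rule described above reduces to a single vectorised comparison. The two rows below are taken from the table; this document does not confirm that `Analyst` builds the column this way.

```python
import pandas as pd

jobs = pd.DataFrame(
    {
        "title": ["Scrum Master", "IT Business Intelligence Manager"],
        "workplace_type": ["remote", "partly_remote"],
    }
)
# True only for an exact "remote" match; "partly_remote" stays False.
jobs["remote"] = jobs["workplace_type"] == "remote"
```

The exact equality matters. A substring test such as `str.contains("remote")` would also match `partly_remote` and mark those postings as remote.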


## Requirement 4

```python
report = analyst.requirement_4(10, "2022-04-09", "2023-09-02", "junior", "US")
records = report.pop("records")
print_report(report)
```

```text
{ 'count_cities': 6,
  'count_companies': 6,
  'count_jobs': 10,
  'least_jobs_city_count': 1.0,
  'least_jobs_city_name': 'Torun',
  'most_jobs_city_count': 5.0,
  'most_jobs_city_name': 'Krakow',
  'salary_mean': 7131.25}
```


```python
records_to_dataframe(records)
```

| | count_jobs | count_companies | most_jobs_company_count | most_jobs_company_name | salary_best | salary_mean | salary_worst |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | GMS | 7500.0 | 6750.0 | 6000.0 |
| 1 | 3 | 2 | 2 | Miquido | 8400.0 | 6600.0 | 4900.0 |
| 2 | 1 | 1 | 1 | GMS | 7500.0 | 6750.0 | 6000.0 |
| 3 | 1 | 1 | 1 | Posti | | | |
| 4 | 3 | 3 | 1 | Softax | 12000.0 | 8500.0 | 6000.0 |
| 5 | 1 | 1 | 1 | GMS | 7500.0 | 6750.0 | 6000.0 |
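
Per-group statistics like the ones above can be sketched with a pandas named aggregation, under several assumptions:

- Each row aggregates the postings of one city. The output has six rows and `count_cities` is 6, but the grouping column is not shown.
- Postings carry hypothetical `salary_from` and `salary_to` columns.
- `salary_mean` is the mean of each posting's salary midpoint. This is at least consistent with the single-posting GMS rows, where (6000 + 7500) / 2 = 6750.

The data below is made up for illustration.

```python
import pandas as pd

# Made-up postings; salary_from / salary_to are assumed column names.
jobs = pd.DataFrame(
    {
        "city": ["city_a", "city_a", "city_a", "city_b"],
        "company_name": ["Company X", "Company X", "Company Y", "Company Z"],
        "salary_from": [4000.0, 5000.0, 5000.0, None],
        "salary_to": [6000.0, 7000.0, 9000.0, None],
    }
)

# Midpoint of each posting's range; NaN when the salary is missing.
jobs["salary_mid"] = (jobs["salary_from"] + jobs["salary_to"]) / 2


def top_company_count(s: pd.Series) -> int:
    return int(s.value_counts().max())


def top_company_name(s: pd.Series) -> str:
    return s.value_counts().idxmax()


per_city = jobs.groupby("city").agg(
    count_jobs=("company_name", "count"),
    count_companies=("company_name", "nunique"),
    most_jobs_company_count=("company_name", top_company_count),
    most_jobs_company_name=("company_name", top_company_name),
    salary_best=("salary_to", "max"),  # max/mean/min skip NaN values
    salary_mean=("salary_mid", "mean"),
    salary_worst=("salary_from", "min"),
)
```

Because `max`, `mean` and `min` skip missing values, a group with no salary data at all ends up with NaN in all three salary columns. That matches the empty Posti row.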


## Requirement 5

```python
report = analyst.requirement_5(4, "2022-04-09", "2023-09-02")
stats = report.pop("stats_per_level")
print_report(report)
```

```text
{ 'count_cities': 888,
  'count_jobs': 112553,
  'most_jobs_country_count': 111209,
  'most_jobs_country_name': 'PL'}
```


```python
records_to_dataframe([{"experience_level": key, **values} for (key, values) in stats.items()])
```

| | experience_level | count_companies_with_location | count_companies_without_location | count_unique_skills | most_required_skill_count | most_required_skill_name | least_required_skill_count | least_required_skill_name | minimum_skills_level | count_companies | most_jobs_company_count | most_jobs_company_name | least_jobs_company_count | least_jobs_company_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | junior | 2029 | 0 | 1646 | 1261 | ENGLISH | 1 | DOCUMENTATION | 2.219012 | 2029 | 265 | Dataedo | 1 | Macrobond Financial |
| 1 | mid | 4472 | 0 | 5604 | 6011 | JAVA | 1 | GOOGLE ANALYTICS 360 | 3.249139 | 4472 | 1328 | Nokia | 1 | DealStack |
| 2 | senior | 3005 | 0 | 4367 | 5276 | JAVA | 1 | QMS | 3.87212 | 3005 | 1620 | WIPRO | 1 | TP Servglobal Ltd |
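
Requirement 5 reports skill statistics per experience level. The record layout is not shown, so the sketch below rests on two guesses. First, each posting carries a hypothetical `skills` list of `(name, level)` pairs. Second, `minimum_skills_level` is the average, over postings, of the lowest level each posting requires. `skill_stats_per_level` is likewise a hypothetical name.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical posting shape: {"experience_level": str, "skills": [(name, level), ...]}
Posting = Dict[str, object]


def skill_stats_per_level(postings: List[Posting]) -> Dict[str, Dict]:
    """Sketch of per-level skill statistics; field names are assumptions."""
    grouped: Dict[str, List[Posting]] = defaultdict(list)
    for p in postings:
        if p["skills"]:  # postings without skills cannot contribute a minimum level
            grouped[p["experience_level"]].append(p)

    stats = {}
    for level, items in grouped.items():
        counts = Counter(name for p in items for name, _ in p["skills"])
        ranked = counts.most_common()  # most required first
        stats[level] = {
            "count_unique_skills": len(counts),
            "most_required_skill_name": ranked[0][0],
            "most_required_skill_count": ranked[0][1],
            "least_required_skill_name": ranked[-1][0],
            "least_required_skill_count": ranked[-1][1],
            # Average, over postings, of the lowest level each posting asks for.
            "minimum_skills_level": mean(min(lvl for _, lvl in p["skills"]) for p in items),
        }
    return stats
```

The returned dictionary has the same shape as `stats`, so the list comprehension above can display it. `pd.DataFrame.from_dict(stats, orient="index")` is an alternative that keeps the level as the row index instead of a column.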
