# Recommendation Generator
---
This notebook uses the information produced by the previous notebook to generate role recommendations for the 85,000+ users.

Given the data volume, the program has to be parallelized. As mentioned earlier, CPU-bound processing in Python can only be parallelized with multiple processes. Each process holds its own copy of the data it needs, so this program tries to minimize memory consumption; we did not have time to explore how memory sharing between processes could be used.


In [1]:
import os
import os.path
import numpy as np
import pandas as pd
import pynndescent
import duckdb
import pickle
import joblib
import scipy.sparse
import importlib
from datetime import datetime
from knn_indexer import KnnIndex

The environment variables below must be set before running this notebook. See the file **setenv.ps1.example**.

In [2]:
DATASET                 = os.environ['DATASET']
USER_ROLES              = os.environ['USER_ROLES']
ORGUNIT_ROLES           = os.environ['ORGUNIT_ROLES']
FUNCTION_ROLES          = os.environ['FUNCTION_ROLES']
KNN_INDEX               = os.environ['KNN_INDEX']
HASHED_FEATURE_COUNT    = int(os.environ['HASHED_FEATURE_COUNT'])
HASHED_FEATURES         = os.environ['HASHED_FEATURES']
HASHED_FEATURES_IDX     = os.environ['HASHED_FEATURES_IDX']
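
A missing variable only surfaces as a `KeyError` at the first `os.environ[...]` lookup, so it can help to check all of them up front. The sketch below is one possible way to do that; the helper name `missing_env_vars` is hypothetical and not part of the original notebook.

```python
import os

# Variable names taken from the cell above.
REQUIRED_VARS = [
    "DATASET", "USER_ROLES", "ORGUNIT_ROLES", "FUNCTION_ROLES",
    "KNN_INDEX", "HASHED_FEATURE_COUNT", "HASHED_FEATURES", "HASHED_FEATURES_IDX",
]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

# Report anything that still needs to be configured (hypothetical usage).
missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("missing environment variables:", ", ".join(missing))
```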


Recommender Parameters

In [3]:
CUTOFF_DISTANCE = 0.5
N_JOBS          = 20 # number of parallel processes; choose it based on the number of CPUs and the available memory
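
As a rough starting point for `N_JOBS`, the worker count can be bounded by both the CPU count and a memory budget. The sketch below is only a heuristic under stated assumptions: `suggest_n_jobs`, `available_mem_gb`, `mem_per_worker_gb` and `reserve_cpus` are hypothetical names, and the memory figures must be supplied by whoever runs it, since the per-process footprint of the recommender was not measured here.

```python
import os

def suggest_n_jobs(available_mem_gb, mem_per_worker_gb, cpu_count=None, reserve_cpus=1):
    """Suggest a worker count limited by CPUs and by the memory budget."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    by_cpu = max(1, cpus - reserve_cpus)                        # leave some CPU for the parent process
    by_mem = max(1, int(available_mem_gb // mem_per_worker_gb)) # workers that fit in memory
    return min(by_cpu, by_mem)

# Hypothetical machine: 24 CPUs, 16 GB free, about 2 GB per worker.
print(suggest_n_jobs(available_mem_gb=16, mem_per_worker_gb=2, cpu_count=24))
```

In this example memory is the limiting factor, so the suggestion comes from the memory budget rather than from the CPU count.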

Output Files

In [4]:
NEIGHBORS_CSV           = './DATA/vizinhos.csv' # temporary
RECOMMENDATIONS_CSV     = './DATA/recomendacoes.csv' # temporary
NEIGHBORS_PARQUET       = './DATA/vizinhos.parquet'
RECOMMENDATIONS_PARQUET = './DATA/recomendacoes.parquet'

Reading the Data Files

In [5]:
dataset_df = pd.read_parquet(DATASET)
dataset_df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86332 entries, 0 to 86331
Data columns (total 20 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   chave_usuario            86300 non-null  object
 1   tipo_usuario             86332 non-null  object
 2   centro_custo             39577 non-null  object
 3   lotacao_topo             86332 non-null  object
 4   sigla_lotacao            86332 non-null  object
 5   nome_lotacao             86332 non-null  object
 6   cargo                    86332 non-null  object
 7   enfase                   86332 non-null  object
 8   funcao                   86332 non-null  object
 9   sindicato                39070 non-null  object
 10  area_rh                  85902 non-null  object
 11  imovel                   85902 non-null  object
 12  local_negocio            85900 non-null  object
 13  grupo_prestacao_servico  46192 non-null  object
 14  regime_trabalho          73068 non-nul

In [6]:
user_roles_df = pd.read_parquet(USER_ROLES)
user_roles_df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 411381 entries, 0 to 411380
Data columns (total 2 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   chave_usuario  411381 non-null  object
 1   role           411381 non-null  object
dtypes: object(2)
memory usage: 6.3+ MB


In [7]:
orgunit_roles_df = pd.read_parquet(ORGUNIT_ROLES)
orgunit_roles_df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28171 entries, 0 to 28170
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   lotacao_topo   28171 non-null  object
 1   sigla_lotacao  28171 non-null  object
 2   role           28171 non-null  object
 3   atribuicoes    28171 non-null  int64 
dtypes: int64(1), object(3)
memory usage: 880.5+ KB


In [8]:
function_roles_df = pd.read_parquet(FUNCTION_ROLES)
function_roles_df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13865 entries, 0 to 13864
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   tipo_usuario  13865 non-null  object
 1   cargo         13865 non-null  object
 2   enfase        13865 non-null  object
 3   funcao        13865 non-null  object
 4   role          13865 non-null  object
 5   atribuicoes   13865 non-null  int64 
dtypes: int64(1), object(5)
memory usage: 650.0+ KB


In [9]:
knn_index = KnnIndex.load(KNN_INDEX)
knn_index.get_neighbors(0)

(array([11080,     0, 59152, 51250, 24581, 24742, 58253,  5381, 29334,
        19220, 81875,  5129, 64794, 59437, 28229,   967, 80180, 58105,
        27382, 26544, 58198, 38320, 27953, 56639, 10148, 14332, 25573,
          908, 26130, 64954, 25287, 58041, 26495, 15523, 28842, 26753,
         5354, 63230,  5418,  9597, 32426, 57225, 28978, 32496, 13568,
         8499, 19391, 25197, 71572, 14190, 33207,  3496, 33537, 52526,
        33525, 33464, 58827,  8329, 53906, 30310, 26543,  1146, 31718,
         5602, 25909, 32453, 27874,  1514, 80161, 63641, 33206, 18867,
        27564, 27567, 54165, 23454, 12672, 27360,  1602,  7567, 18159,
        38197, 68702, 26431, 25670, 65608, 58165, 29158, 24517,  2119,
         2101,  8208, 54040, 25559, 33876, 38327, 80129, 38239, 11533,
         2113, 26032, 17764, 31191,  2108, 81978, 26091, 71450, 37677,
        15539, 85250, 73627, 58949, 54850,  7533, 32389,  2780, 45425,
        45592, 25760,  2102, 16067,  2791,  7597, 36230, 45589, 27496,
      

In [10]:
hashed_features_idx = pd.read_parquet(HASHED_FEATURES_IDX)
hashed_features_idx.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86332 entries, 0 to 86331
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   index          86332 non-null  int64 
 1   chave_usuario  86300 non-null  object
 2   lotacao_topo   86332 non-null  object
 3   sigla_lotacao  86332 non-null  object
 4   tipo_usuario   86332 non-null  object
 5   cargo          86332 non-null  object
 6   enfase         86332 non-null  object
 7   funcao         86332 non-null  object
dtypes: int64(1), object(7)
memory usage: 5.3+ MB


In [11]:
conn = duckdb.connect(":memory:")

The data needed to compute the categorical assignments is pre-processed and written as parquet files to reduce the run time of the batch recommender.

The data has to be queried by each of the N_JOBS processes, and it is faster for each process to read the data from disk once than to serialize the data frames for every recommendation.

Data that is no longer needed is deleted afterwards.

In [12]:
# to remove correlated subqueries and reduce total processing time
atribuido_tipo_usuario_df = conn.execute("SELECT DISTINCT tipo_usuario, role  FROM function_roles_df ORDER BY 1,2").fetchdf()
atribuido_topo_df         = conn.execute("SELECT DISTINCT lotacao_topo, role  FROM orgunit_roles_df  ORDER BY 1,2").fetchdf()
atribuido_lotacao_df      = conn.execute("SELECT DISTINCT sigla_lotacao, role FROM orgunit_roles_df  ORDER BY 1,2").fetchdf()
atribuido_cargo_df        = conn.execute("SELECT DISTINCT cargo, role         FROM function_roles_df ORDER BY 1,2").fetchdf()
atribuido_enfase_df       = conn.execute("SELECT DISTINCT enfase, role        FROM function_roles_df ORDER BY 1,2").fetchdf()
atribuido_funcao_df       = conn.execute("SELECT DISTINCT funcao, role        FROM function_roles_df ORDER BY 1,2").fetchdf()

# write the data frames to the work dir
dataset_df.to_parquet("./WORK/dataset.parquet")
user_roles_df.to_parquet("./WORK/user_roles.parquet")
orgunit_roles_df.to_parquet("./WORK/orgunit_roles.parquet")
function_roles_df.to_parquet("./WORK/function_roles.parquet")
hashed_features_idx.to_parquet("./WORK/hashed_features_idx.parquet")
atribuido_tipo_usuario_df.to_parquet("./WORK/atribuido_tipo_usuario.parquet")
atribuido_topo_df.to_parquet("./WORK/atribuido_topo.parquet")
atribuido_lotacao_df.to_parquet("./WORK/atribuido_lotacao.parquet")
atribuido_cargo_df.to_parquet("./WORK/atribuido_cargo.parquet")
atribuido_enfase_df.to_parquet("./WORK/atribuido_enfase.parquet")
atribuido_funcao_df.to_parquet("./WORK/atribuido_funcao.parquet")

# save memory
del user_roles_df
del orgunit_roles_df
del function_roles_df
del hashed_features_idx
del atribuido_tipo_usuario_df
del atribuido_topo_df
del atribuido_lotacao_df
del atribuido_cargo_df
del atribuido_enfase_df
del atribuido_funcao_df

Because the recommender will use multiple processes, it must be defined in a module other than \_\_main\_\_ so that serialization works as expected.
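
The underlying reason is that `pickle`, which the process pool relies on to send work to its workers, stores a function as a reference (module name plus qualified name) rather than as code. A worker then has to import that module to find the function, which is not reliable for functions defined interactively in a notebook, particularly with the spawn start method used on Windows. A minimal illustration using only the standard library:

```python
import math
import pickle

# A function that lives in an importable module is pickled as a reference;
# unpickling looks the same object up again by module and name.
payload = pickle.dumps(math.sqrt)
restored = pickle.loads(payload)
same_object = restored is math.sqrt

# A lambda has no importable name, so pickle cannot build a reference to it.
try:
    pickle.dumps(lambda x: x * 2)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(same_object, lambda_pickled)
```

Writing the recommender to its own file gives its functions a stable module path that every worker process can import.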

The file below contains the role recommendation logic from the previous notebook, adapted to run in a scenario with multiple files.

We reiterate here how frustrating it can be to turn a Python prototype into a product that meets business needs. Python is a great language for prototyping, but at larger data volumes the GIL (Global Interpreter Lock) hurts the DX (developer experience).

In [13]:
%%writefile recommender.py
import os
import pandas as pd
import duckdb
import pickle
import scipy
import numpy as np
import threading
from knn_indexer import KnnIndex

HASHED_FEATURE_COUNT    = int(os.environ['HASHED_FEATURE_COUNT'])
KNN_INDEX               = os.environ['KNN_INDEX']
CUTOFF_DISTANCE         = 0.5

class RoleRecommenderState:
    
    # Class used to hold the recommender's data state.
    # The DuckDB connection is not serializable, so it is stored in a thread-local variable.
    
    def __init__(self):
        self.tld                 = threading.local()
        self.knn_index           = KnnIndex.load(KNN_INDEX)
        self.hashed_features_df  = pd.read_parquet("./WORK/hashed_features_idx.parquet")
        self.on_start_thread()
        
    def on_start_thread(self):
        self.tld.conn                = duckdb.connect(":memory:")
        self.conn.execute("SET enable_progress_bar=false")
        
    @property
    def conn(self):
        return self.tld.conn
                    
class RoleRecommender:
    
    STATE = None
    
    SQL_NEIGHBORS = """
        SELECT  DISTINCT 
                z.chave_usuario as usuario_busca
        ,       a.distance
        ,       b.chave_usuario
        ,       b.sigla_lotacao
        ,       b.cargo
        ,       b.enfase
        ,       b.funcao
        ,       c.role
        ,       CASE WHEN e.role IS NULL THEN 0 ELSE 1 END                   AS atribuido_tipo_usuario
        ,       CASE WHEN f.role IS NULL THEN 0 ELSE 1 END                   AS atribuido_topo        
        ,       CASE WHEN g.role IS NULL THEN 0 ELSE 1 END                   AS atribuido_lotacao
        ,       CASE WHEN h.role IS NULL THEN 0 ELSE 1 END                   AS atribuido_cargo        
        ,       CASE WHEN i.role IS NULL THEN 0 ELSE 1 END                   AS atribuido_enfase
        ,       CASE WHEN j.role IS NULL THEN 0 ELSE 1 END                   AS atribuido_funcao
        FROM    search_df z
                --
                CROSS JOIN _neighbors_df a
                --
                INNER JOIN "./WORK/hashed_features_idx.parquet" b
                ON a.index              = b.index
                --
                INNER JOIN "./WORK/user_roles.parquet" c
                ON b.chave_usuario      = c.chave_usuario
                --
                LEFT OUTER JOIN "./WORK/user_roles.parquet" d
                ON  z.chave_usuario     = d.chave_usuario
                AND c.role              = d.role
                --
                LEFT OUTER JOIN "./WORK/atribuido_tipo_usuario.parquet" e
                ON  z.tipo_usuario      = e.tipo_usuario 
                AND c.role              = e.role
                --
                LEFT OUTER JOIN "./WORK/atribuido_topo.parquet" f
                ON  z.lotacao_topo      = f.lotacao_topo 
                AND c.role              = f.role
                --
                LEFT OUTER JOIN "./WORK/atribuido_lotacao.parquet" g
                ON  z.sigla_lotacao     = g.sigla_lotacao 
                AND c.role              = g.role
                --
                LEFT OUTER JOIN "./WORK/atribuido_cargo.parquet" h
                ON  z.cargo             = h.cargo 
                AND c.role              = h.role
                --
                LEFT OUTER JOIN "./WORK/atribuido_enfase.parquet" i
                ON  z.enfase            = i.enfase 
                AND c.role              = i.role
                --
                LEFT OUTER JOIN "./WORK/atribuido_funcao.parquet" j
                ON  z.funcao            = j.funcao 
                AND c.role              = j.role
                --
        WHERE   b.chave_usuario         <> z.chave_usuario
        AND     z.chave_usuario         IS NOT NULL
        AND     d.role                  IS NULL
        AND     c.role                  NOT IN ('Z:BC_USO_GERAL')
        ORDER   BY a.distance    
    """
    
    SQL_RECOMMENDATIONS = """
        WITH cte_roles AS (
            SELECT  a.usuario_busca
            ,       a.role
            ,       MAX(atribuido_tipo_usuario)     AS atribuido_tipo_usuario
            ,       MAX(atribuido_topo)             AS atribuido_topo
            ,       MAX(atribuido_lotacao)          AS atribuido_lotacao
            ,       MAX(atribuido_cargo)            AS atribuido_cargo
            ,       MAX(atribuido_enfase)           AS atribuido_enfase
            ,       MAX(atribuido_funcao)           AS atribuido_funcao
            ,       MAX(atribuido_tipo_usuario)
            +       MAX(atribuido_topo)
            +       MAX(atribuido_lotacao)
            +       MAX(atribuido_cargo)
            +       MAX(atribuido_enfase)
            +       MAX(atribuido_funcao)           AS atribuicoes_categoricas
            ,       MIN(distance)                   AS min_distance
            ,       AVG(distance)                   AS avg_distance
            ,       ( (1 - MIN(distance)) + AVG(distance)) * 0.5 AS distance_factor     
            ,       COUNT()                         AS qtd_atribuicoes
            FROM    neighbors_df a
            WHERE   a.distance                      <= ?
            GROUP   BY a.usuario_busca
            ,       a.role
        )
        , cte_role_scores AS (
            SELECT  a.role
            ,       ( a.atribuicoes_categoricas * a.qtd_atribuicoes * a.distance_factor )
            /       ( (SELECT MAX(atribuicoes_categoricas) FROM cte_roles) 
            *         (SELECT MAX(qtd_atribuicoes)         FROM cte_roles)
            *         (SELECT MAX(distance_factor)         FROM cte_roles)
                    ) AS score   
            FROM    cte_roles a
        )
        SELECT  a.usuario_busca                AS chave_usuario
        ,       a.role
        ,       a.atribuicoes_categoricas
        ,       a.min_distance
        ,       a.avg_distance
        ,       a.distance_factor
        ,       a.qtd_atribuicoes
        ,       b.score
        ,       a.atribuido_tipo_usuario
        ,       a.atribuido_topo
        ,       a.atribuido_lotacao
        ,       a.atribuido_cargo
        ,       a.atribuido_enfase
        ,       a.atribuido_funcao
        FROM    cte_roles a
                --
                INNER JOIN cte_role_scores b
                ON  a.role                      = b.role
                --
        WHERE   b.score                         > 0.0
        ORDER   BY a.usuario_busca
        ,       b.score DESC
"""
    
    def __init__(self):
        pass
    
    @property
    def state(self):
        return RoleRecommender.STATE
    
    def _get_neighbours(self, search_idx):
        indices, distances = self.state.knn_index.get_neighbors(search_idx)
        _neighbors_df = pd.DataFrame({"index": indices, "distance": distances})
        self.state.conn.register("_neighbors_df", _neighbors_df)
        neighbors_df = self.state.conn.execute(self.SQL_NEIGHBORS).fetchdf()
        self.state.conn.unregister("_neighbors_df")
        return neighbors_df
    
    @classmethod
    def on_process_start(klass):
        klass.STATE = RoleRecommenderState()
        
    def recommend_roles(self, search_user, cutoff_distance=CUTOFF_DISTANCE):        
        search_df          = self.state.hashed_features_df[self.state.hashed_features_df["chave_usuario"] == search_user].copy()
        self.state.conn.register('search_df', search_df)
        
        search_index  = search_df.iloc[0]["index"]
        neighbors_df  = self._get_neighbours(search_index)
        self.state.conn.register("neighbors_df", neighbors_df)
        
        params = (cutoff_distance,)
        recommendations_df = self.state.conn.execute(self.SQL_RECOMMENDATIONS, params).fetchdf()
        self.state.conn.unregister("search_df")
        self.state.conn.unregister("neighbors_df")
        return search_df, neighbors_df, recommendations_df
        
        
def recommend_roles(search_user):
    if RoleRecommender.STATE is None:
        RoleRecommender.on_process_start()
    return RoleRecommender().recommend_roles(search_user)


Overwriting recommender.py


The code below reloads the module in which the **recommend_roles** function is defined.

By default, a module is not re-imported when an import statement is repeated.

In [14]:
import importlib
import recommender
importlib.reload(recommender)
from recommender import RoleRecommender, recommend_roles

Below is a call to the role recommendation function to illustrate its outputs.

In [15]:
search_df, neighbors_df, recommendations_df = recommend_roles("U4UL")
display(search_df)
display(neighbors_df)
display(recommendations_df)
del search_df
del neighbors_df
del recommendations_df

Unnamed: 0,index,chave_usuario,lotacao_topo,sigla_lotacao,tipo_usuario,cargo,enfase,funcao
75984,75984,U4UL,TIC,TIC/CORP/DSCESI/DS-PDDS,EMPREGADO,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,


Unnamed: 0,usuario_busca,distance,chave_usuario,sigla_lotacao,cargo,enfase,funcao,role,atribuido_tipo_usuario,atribuido_topo,atribuido_lotacao,atribuido_cargo,atribuido_enfase,atribuido_funcao
0,U4UL,0.132842,UPOT,TIC/CORP/DSCESI/DS-PDDS,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:MM_PB001_CONSULTANTE_INF_MES,1,1,1,1,1,1
1,U4UL,0.132842,UPOT,TIC/CORP/DSCESI/DS-PDDS,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:MM_PB001_CONS_COCKPIT,1,1,1,1,0,1
2,U4UL,0.163098,X0BY,TIC/CORP/RJTI/PN-STSMS,PROF. PETROBRAS DE NIVEL SUPERIOR MASTER,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:PS_PBD06_APONTADOR_HORAS,1,0,0,1,1,1
3,U4UL,0.163873,UR5J,TIC/CORP/FRIE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:MM_PBAUT_FISCAL_CONTRATO,1,1,0,1,1,1
4,U4UL,0.163873,UR5J,TIC/CORP/FRIE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:FI_FT_PB001_SUP_PC_TIMP,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
74,U4UL,0.271500,UPM1,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:BC_CUTOVER_S4,1,1,0,1,1,1
75,U4UL,0.271500,UPM1,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:BC_SUPORTE_FUNCIONAL_MDG,1,1,0,0,1,1
76,U4UL,0.271500,XF79,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR MASTER,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:BC_SUBCOMISSAO_PERFIS_DISP,0,0,0,0,0,1
77,U4UL,0.271500,XF79,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR MASTER,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:GR_AC_PB001_GESTOR_MACRO,0,0,0,0,0,0


Unnamed: 0,chave_usuario,role,atribuicoes_categoricas,min_distance,avg_distance,distance_factor,qtd_atribuicoes,score,atribuido_tipo_usuario,atribuido_topo,atribuido_lotacao,atribuido_cargo,atribuido_enfase,atribuido_funcao
0,U4UL,Z:MM_PBAUT_FISCAL_CONTRATO,5,0.163873,0.23744,0.536784,10,0.833333,1,1,0,1,1,1
1,U4UL,Z:BC_CUTOVER_FIORI,5,0.202262,0.242436,0.520087,7,0.565189,1,1,0,1,1,1
2,U4UL,Z:BC_CUTOVER_S4,5,0.202262,0.242436,0.520087,7,0.565189,1,1,0,1,1,1
3,U4UL,Z:PS_PB003_APONTADOR_HORAS,5,0.188604,0.229549,0.520472,6,0.484806,1,1,0,1,1,1
4,U4UL,Z:BC_CHARM_GMUD,5,0.203686,0.239383,0.517848,6,0.482362,1,1,0,1,1,1
5,U4UL,Z:MM_ARIBA_GERAL,5,0.202262,0.229623,0.513681,4,0.318987,1,1,0,1,1,1
6,U4UL,Z:PS_PBD06_APONTADOR_HORAS,4,0.163098,0.211524,0.524213,4,0.260422,1,0,0,1,1,1
7,U4UL,Z:MM_PB001_CONSULTANTE_INF_MES,6,0.132842,0.180144,0.523651,2,0.195107,1,1,1,1,1,1
8,U4UL,Z:MM_PB001_LIBERADOR_NL_PGTO,5,0.266413,0.267499,0.500543,2,0.155414,1,1,0,1,1,1
9,U4UL,Z:BC_DESENVOLVEDOR_PERFORMANCE,4,0.174393,0.220403,0.523005,2,0.129911,1,1,0,0,1,1
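
The `score` column can be reproduced by hand from the expressions in `SQL_RECOMMENDATIONS`. The sketch below does so for three rows of the table above. It assumes that the maxima used for normalization are the ones visible in the table (6 categorical matches, 10 assignments, and the distance factor of the first row), which is consistent with the reported scores; small differences in the last decimal place come from the rounded inputs.

```python
# (role, atribuicoes_categoricas, min_distance, avg_distance, qtd_atribuicoes, score)
rows = [
    ("Z:MM_PBAUT_FISCAL_CONTRATO",     5, 0.163873, 0.237440, 10, 0.833333),
    ("Z:BC_CUTOVER_FIORI",             5, 0.202262, 0.242436,  7, 0.565189),
    ("Z:MM_PB001_CONSULTANTE_INF_MES", 6, 0.132842, 0.180144,  2, 0.195107),
]

def distance_factor(min_distance, avg_distance):
    # Same expression as in SQL_RECOMMENDATIONS.
    return ((1 - min_distance) + avg_distance) * 0.5

# Normalization constants (assumed to be the maxima visible in the table).
max_cat, max_qtd = 6, 10
max_factor = distance_factor(0.163873, 0.237440)

def score(cat, min_distance, avg_distance, qtd):
    raw = cat * qtd * distance_factor(min_distance, avg_distance)
    return raw / (max_cat * max_qtd * max_factor)

# Print the recomputed score next to the value reported in the table.
for role, cat, dmin, davg, qtd, reported in rows:
    print(role, round(score(cat, dmin, davg, qtd), 6), reported)
```

The first row illustrates the normalization: it has the largest assignment count and distance factor, so its score reduces to 5/6, the ratio of its categorical matches to the maximum.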


The code snippet below estimates the processing time for all recommendations in a single thread or process, not counting the time needed to write the data.

In [16]:
begin = datetime.now()
for i in range(50):
    search_df, neighbors_df, recommendations_df = recommend_roles("U4UL")
end = datetime.now()

delta                  = end - begin
time_per_user          = delta.total_seconds() / 50
total_users            = len(dataset_df["chave_usuario"])
expected_total_time_s  = time_per_user * total_users
expected_total_time_m  = expected_total_time_s / 60
expected_total_time_h  = expected_total_time_m/ 60
print(f"""
delta                  = {delta}
time/user              = {time_per_user}
total users            = {total_users}
expected total time(s) = {expected_total_time_s} seconds
expected total_time(m) = {expected_total_time_m} minutes
expected total_time(h) = {expected_total_time_h} hours

time to write csv is not accounted
""")


delta                  = 0:00:10.878444
time/user              = 0.21756888
total users            = 86332
expected total time(s) = 18783.15654816 seconds
expected total_time(m) = 313.052609136 minutes
expected total_time(h) = 5.2175434856 hours

time to write csv is not accounted



The recommender's intermediate results are written in append mode to two CSV files. These files must always be deleted before each run. At the end, they are converted and saved in parquet format.
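
The append-in-batches idea can be sketched with the standard `csv` module. This is an illustration only, not the notebook's implementation (which uses `DataFrame.to_csv` with a header flag); the helper names `append_batch` and `demo` and the demo file name are hypothetical. Here the header is written only when the file is new or empty, which is also why stale files from a previous run have to be removed first.

```python
import csv
import os
import tempfile

def append_batch(path, fieldnames, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

def demo():
    """Write two batches to a temporary file and return the resulting lines."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "recomendacoes_demo.csv")  # hypothetical file name
        fields = ["chave_usuario", "role"]
        append_batch(path, fields, [{"chave_usuario": "A1", "role": "R1"}])
        append_batch(path, fields, [{"chave_usuario": "B2", "role": "R2"}])
        with open(path, newline="") as fh:
            return fh.read().splitlines()

print(demo())
```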


The recommender builds a list of user keys and processes them with a ProcessPoolExecutor to parallelize the recommendations. The recommendations are computed by the **recommend_roles** function created above.

Since the recommendation results do not need to be ordered, each result object is processed as soon as it finishes, via the **as_completed** function of the concurrent.futures module.
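
The completion-order behavior of `as_completed` can be seen in isolation with a toy task. Note that this sketch swaps in `ThreadPoolExecutor` for `ProcessPoolExecutor` so that it runs anywhere, including inside a notebook cell; `as_completed` is called the same way with both executors. The function names `slow_echo` and `run` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_echo(delay):
    """Stand-in for a unit of work that takes `delay` seconds."""
    time.sleep(delay)
    return delay

def run(delays):
    """Submit all tasks and collect their results in completion order."""
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        futures = [executor.submit(slow_echo, d) for d in delays]
        return [future.result() for future in as_completed(futures)]

results = run([0.3, 0.1, 0.2])
print(results)  # usually the shortest delay comes first, not the submission order
```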

Although small, the code cell below had several memory leaks, and care and attention were needed for the recommendations to run in a satisfactory time.

In [17]:
from concurrent.futures import ProcessPoolExecutor, as_completed

if os.path.exists(NEIGHBORS_CSV):
    print(f"removing existing neighbors file")
    os.unlink(NEIGHBORS_CSV)

if os.path.exists(RECOMMENDATIONS_CSV):
    print(f"removing existing recommendations file")
    os.unlink(RECOMMENDATIONS_CSV)

search_users           = [u for u in dataset_df["chave_usuario"].tolist() if u]
neighbors_acc_df       = None
recommendations_acc_df = None
write_headers          = True

# create the futures
with ProcessPoolExecutor(
    max_workers         = N_JOBS
,   initializer         = RoleRecommender.on_process_start
) as executor:
    futures = (executor.submit(recommend_roles, search_user) for search_user in search_users)
    for i, future in enumerate(as_completed(futures)):
        if i % 50 == 0:
            print(F" -> {datetime.now()} :: {i}/{len(search_users)}")        
            if i > 0:
                neighbors_acc_df.to_csv(NEIGHBORS_CSV,             mode='a', header=write_headers, index=False)
                recommendations_acc_df.to_csv(RECOMMENDATIONS_CSV, mode='a', header=write_headers, index=False)
                write_headers = False
                # reset the accumulators now that the batch has been written to disk
                neighbors_acc_df.drop(neighbors_acc_df.index, inplace=True)
                recommendations_acc_df.drop(recommendations_acc_df.index, inplace=True)

        _, neighbors_df, recommendations_df = future.result()

        if i > 0:
            neighbors_acc_df       = pd.concat([neighbors_acc_df, neighbors_df])
            recommendations_acc_df = pd.concat([recommendations_acc_df, recommendations_df])        
        else:
            neighbors_acc_df       = neighbors_df
            recommendations_acc_df = recommendations_df
    
    del future
    del _
    del neighbors_df
    del recommendations_df
    
print(F" -> {datetime.now()} :: {len(search_users)}/{len(search_users)}")        
neighbors_acc_df.to_csv(NEIGHBORS_CSV,             mode='a', header=write_headers, index=False)
recommendations_acc_df.to_csv(RECOMMENDATIONS_CSV, mode='a', header=write_headers, index=False)


removing existing neighbors file
removing existing recommendations file
 -> 2023-07-16 17:31:15.387840 :: 0/86300
 -> 2023-07-16 17:31:18.172629 :: 50/86300
 -> 2023-07-16 17:31:20.258345 :: 100/86300
 -> 2023-07-16 17:31:22.280146 :: 150/86300
 -> 2023-07-16 17:31:24.309343 :: 200/86300
 -> 2023-07-16 17:31:26.301352 :: 250/86300
 -> 2023-07-16 17:31:28.401475 :: 300/86300
 -> 2023-07-16 17:31:46.292806 :: 350/86300
 -> 2023-07-16 17:31:46.888034 :: 400/86300
 -> 2023-07-16 17:31:47.425063 :: 450/86300
 -> 2023-07-16 17:31:48.234659 :: 500/86300
 -> 2023-07-16 17:31:48.711081 :: 550/86300
 -> 2023-07-16 17:31:49.390193 :: 600/86300
 -> 2023-07-16 17:31:50.083578 :: 650/86300
 -> 2023-07-16 17:31:50.928348 :: 700/86300
 -> 2023-07-16 17:31:51.659347 :: 750/86300
 -> 2023-07-16 17:31:58.397831 :: 800/86300
 -> 2023-07-16 17:31:59.052626 :: 850/86300
 -> 2023-07-16 17:31:59.694906 :: 900/86300
 -> 2023-07-16 17:32:00.604588 :: 950/86300
 -> 2023-07-16 17:32:01.535287 :: 1000/86300
 -> 20

 -> 2023-07-16 17:39:23.649168 :: 9050/86300
 -> 2023-07-16 17:39:25.195825 :: 9100/86300
 -> 2023-07-16 17:39:26.473648 :: 9150/86300
 -> 2023-07-16 17:39:28.413168 :: 9200/86300
 -> 2023-07-16 17:39:29.557506 :: 9250/86300
 -> 2023-07-16 17:39:30.536735 :: 9300/86300
 -> 2023-07-16 17:39:32.326893 :: 9350/86300
 -> 2023-07-16 17:39:33.523901 :: 9400/86300
 -> 2023-07-16 17:39:35.356143 :: 9450/86300
 -> 2023-07-16 17:39:37.405169 :: 9500/86300
 -> 2023-07-16 17:39:39.266352 :: 9550/86300
 -> 2023-07-16 17:39:40.927275 :: 9600/86300
 -> 2023-07-16 17:39:41.760671 :: 9650/86300
 -> 2023-07-16 17:39:43.251520 :: 9700/86300
 -> 2023-07-16 17:39:45.234038 :: 9750/86300
 -> 2023-07-16 17:39:47.897551 :: 9800/86300
 -> 2023-07-16 17:39:48.893203 :: 9850/86300
 -> 2023-07-16 17:39:50.402533 :: 9900/86300
 -> 2023-07-16 17:39:51.726818 :: 9950/86300
 -> 2023-07-16 17:39:53.002432 :: 10000/86300
 -> 2023-07-16 17:39:54.004554 :: 10050/86300
 -> 2023-07-16 17:39:55.712177 :: 10100/86300
 -> 202

 -> 2023-07-16 17:44:19.992964 :: 17950/86300
 -> 2023-07-16 17:44:21.305770 :: 18000/86300
 -> 2023-07-16 17:44:22.656124 :: 18050/86300
 -> 2023-07-16 17:44:23.684271 :: 18100/86300
 -> 2023-07-16 17:44:25.656123 :: 18150/86300
 -> 2023-07-16 17:44:26.996985 :: 18200/86300
 -> 2023-07-16 17:44:28.907682 :: 18250/86300
 -> 2023-07-16 17:44:32.079593 :: 18300/86300
 -> 2023-07-16 17:44:33.444919 :: 18350/86300
 -> 2023-07-16 17:44:35.422652 :: 18400/86300
 -> 2023-07-16 17:44:37.042661 :: 18450/86300
 -> 2023-07-16 17:44:38.665023 :: 18500/86300
 -> 2023-07-16 17:44:39.593704 :: 18550/86300
 -> 2023-07-16 17:44:41.336457 :: 18600/86300
 -> 2023-07-16 17:44:43.759381 :: 18650/86300
 -> 2023-07-16 17:44:44.965047 :: 18700/86300
 -> 2023-07-16 17:44:46.936689 :: 18750/86300
 -> 2023-07-16 17:44:48.589563 :: 18800/86300
 -> 2023-07-16 17:44:50.400668 :: 18850/86300
 -> 2023-07-16 17:44:52.196190 :: 18900/86300
 -> 2023-07-16 17:44:53.793907 :: 18950/86300
 -> 2023-07-16 17:44:55.266829 :: 

 -> 2023-07-16 17:48:51.893041 :: 26900/86300
 -> 2023-07-16 17:48:52.749787 :: 26950/86300
 -> 2023-07-16 17:48:53.557652 :: 27000/86300
 -> 2023-07-16 17:48:54.901542 :: 27050/86300
 -> 2023-07-16 17:48:55.913683 :: 27100/86300
 -> 2023-07-16 17:48:57.015608 :: 27150/86300
 -> 2023-07-16 17:48:58.318240 :: 27200/86300
 -> 2023-07-16 17:48:59.592212 :: 27250/86300
 -> 2023-07-16 17:49:00.865764 :: 27300/86300
 -> 2023-07-16 17:49:02.291551 :: 27350/86300
 -> 2023-07-16 17:49:03.652301 :: 27400/86300
 -> 2023-07-16 17:49:05.139706 :: 27450/86300
 -> 2023-07-16 17:49:06.400033 :: 27500/86300
 -> 2023-07-16 17:49:07.716924 :: 27550/86300
 -> 2023-07-16 17:49:09.153145 :: 27600/86300
 -> 2023-07-16 17:49:10.515334 :: 27650/86300
 -> 2023-07-16 17:49:11.812219 :: 27700/86300
 -> 2023-07-16 17:49:13.104798 :: 27750/86300
 -> 2023-07-16 17:49:14.427857 :: 27800/86300
 -> 2023-07-16 17:49:15.718981 :: 27850/86300
 -> 2023-07-16 17:49:17.150682 :: 27900/86300
 -> 2023-07-16 17:49:18.401688 :: 

 -> 2023-07-16 17:54:05.612468 :: 35850/86300
 -> 2023-07-16 17:54:07.624870 :: 35900/86300
 -> 2023-07-16 17:54:09.549213 :: 35950/86300
 -> 2023-07-16 17:54:11.464795 :: 36000/86300
 -> 2023-07-16 17:54:13.462239 :: 36050/86300
 -> 2023-07-16 17:54:15.476609 :: 36100/86300
 -> 2023-07-16 17:54:17.495222 :: 36150/86300
 -> 2023-07-16 17:54:19.604761 :: 36200/86300
 -> 2023-07-16 17:54:21.524788 :: 36250/86300
 -> 2023-07-16 17:54:23.545317 :: 36300/86300
 -> 2023-07-16 17:54:25.768246 :: 36350/86300
 -> 2023-07-16 17:54:27.737307 :: 36400/86300
 -> 2023-07-16 17:54:29.762118 :: 36450/86300
 -> 2023-07-16 17:54:31.964620 :: 36500/86300
 -> 2023-07-16 17:54:33.708470 :: 36550/86300
 -> 2023-07-16 17:54:35.601535 :: 36600/86300
 -> 2023-07-16 17:54:37.623774 :: 36650/86300
 -> 2023-07-16 17:54:39.657597 :: 36700/86300
 -> 2023-07-16 17:54:41.500584 :: 36750/86300
 -> 2023-07-16 17:54:43.525060 :: 36800/86300
 -> 2023-07-16 17:54:45.417066 :: 36850/86300
 -> 2023-07-16 17:54:47.344987 :: 

 -> 2023-07-16 18:00:06.670387 :: 44800/86300
 -> 2023-07-16 18:00:08.528086 :: 44850/86300
 -> 2023-07-16 18:00:10.509493 :: 44900/86300
 -> 2023-07-16 18:00:12.502855 :: 44950/86300
 -> 2023-07-16 18:00:14.366053 :: 45000/86300
 -> 2023-07-16 18:00:16.133295 :: 45050/86300
 -> 2023-07-16 18:00:18.248220 :: 45100/86300
 -> 2023-07-16 18:00:20.143899 :: 45150/86300
 -> 2023-07-16 18:00:21.936511 :: 45200/86300
 -> 2023-07-16 18:00:23.958116 :: 45250/86300
 -> 2023-07-16 18:00:25.955049 :: 45300/86300
 -> 2023-07-16 18:00:27.813323 :: 45350/86300
 -> 2023-07-16 18:00:29.891437 :: 45400/86300
 -> 2023-07-16 18:00:31.966147 :: 45450/86300
 -> 2023-07-16 18:00:33.937478 :: 45500/86300
 -> 2023-07-16 18:00:35.748513 :: 45550/86300
 -> 2023-07-16 18:00:37.851369 :: 45600/86300
 -> 2023-07-16 18:00:39.682280 :: 45650/86300
 -> 2023-07-16 18:00:41.719186 :: 45700/86300
 -> 2023-07-16 18:00:43.768918 :: 45750/86300
 -> 2023-07-16 18:00:45.653494 :: 45800/86300
 -> 2023-07-16 18:00:47.971842 :: 

 -> 2023-07-16 18:06:01.049860 :: 53750/86300
 -> 2023-07-16 18:06:02.875257 :: 53800/86300
 -> 2023-07-16 18:06:04.909689 :: 53850/86300
 -> 2023-07-16 18:06:06.885341 :: 53900/86300
 -> 2023-07-16 18:06:08.725674 :: 53950/86300
 -> 2023-07-16 18:06:10.822595 :: 54000/86300
 -> 2023-07-16 18:06:12.657452 :: 54050/86300
 -> 2023-07-16 18:06:14.666631 :: 54100/86300
 -> 2023-07-16 18:06:16.688355 :: 54150/86300
 -> 2023-07-16 18:06:19.779324 :: 54200/86300
 -> 2023-07-16 18:06:21.734901 :: 54250/86300
 -> 2023-07-16 18:06:23.909225 :: 54300/86300
 -> 2023-07-16 18:06:26.087504 :: 54350/86300
 -> 2023-07-16 18:06:27.393297 :: 54400/86300
 -> 2023-07-16 18:06:29.355396 :: 54450/86300
 -> 2023-07-16 18:06:30.999729 :: 54500/86300
 -> 2023-07-16 18:06:32.261119 :: 54550/86300
 -> 2023-07-16 18:06:33.728594 :: 54600/86300
 -> 2023-07-16 18:06:35.542049 :: 54650/86300
 -> 2023-07-16 18:06:37.535805 :: 54700/86300
 -> 2023-07-16 18:06:39.608720 :: 54750/86300
 -> 2023-07-16 18:06:41.549567 :: 

 -> 2023-07-16 18:11:59.112402 :: 62700/86300
 -> 2023-07-16 18:12:01.613279 :: 62750/86300
 -> 2023-07-16 18:12:03.652513 :: 62800/86300
 -> 2023-07-16 18:12:05.700302 :: 62850/86300
 -> 2023-07-16 18:12:08.155810 :: 62900/86300
 -> 2023-07-16 18:12:10.442056 :: 62950/86300
 -> 2023-07-16 18:12:13.185212 :: 63000/86300
 -> 2023-07-16 18:12:15.470787 :: 63050/86300
 -> 2023-07-16 18:12:17.612033 :: 63100/86300
 -> 2023-07-16 18:12:19.656994 :: 63150/86300
 -> 2023-07-16 18:12:23.465471 :: 63200/86300
 -> 2023-07-16 18:12:25.758260 :: 63250/86300
 -> 2023-07-16 18:12:27.808963 :: 63300/86300
 -> 2023-07-16 18:12:29.861583 :: 63350/86300
 -> 2023-07-16 18:12:32.129874 :: 63400/86300
 -> 2023-07-16 18:12:34.457752 :: 63450/86300
 -> 2023-07-16 18:12:36.528107 :: 63500/86300
 -> 2023-07-16 18:12:38.794523 :: 63550/86300
 -> 2023-07-16 18:12:41.008696 :: 63600/86300
 -> 2023-07-16 18:12:42.364908 :: 63650/86300
 -> 2023-07-16 18:12:43.739928 :: 63700/86300
 -> 2023-07-16 18:12:44.969845 :: 

 -> 2023-07-16 18:16:33.595230 :: 71650/86300
 -> 2023-07-16 18:16:34.545790 :: 71700/86300
 -> 2023-07-16 18:16:35.695056 :: 71750/86300
 -> 2023-07-16 18:16:36.936488 :: 71800/86300
 -> 2023-07-16 18:16:38.394804 :: 71850/86300
 -> 2023-07-16 18:16:39.672839 :: 71900/86300
 -> 2023-07-16 18:16:41.429653 :: 71950/86300
 -> 2023-07-16 18:16:43.530106 :: 72000/86300
 -> 2023-07-16 18:16:45.420120 :: 72050/86300
 -> 2023-07-16 18:16:46.864880 :: 72100/86300
 -> 2023-07-16 18:16:48.244318 :: 72150/86300
 -> 2023-07-16 18:16:49.555549 :: 72200/86300
 -> 2023-07-16 18:16:50.879260 :: 72250/86300
 -> 2023-07-16 18:16:52.246654 :: 72300/86300
 -> 2023-07-16 18:16:53.461808 :: 72350/86300
 -> 2023-07-16 18:16:54.402089 :: 72400/86300
 -> 2023-07-16 18:16:55.556004 :: 72450/86300
 -> 2023-07-16 18:16:56.522974 :: 72500/86300
 -> 2023-07-16 18:16:57.668027 :: 72550/86300
 -> 2023-07-16 18:16:58.935886 :: 72600/86300
 -> 2023-07-16 18:17:00.050335 :: 72650/86300
 -> 2023-07-16 18:17:01.761345 :: 

 -> 2023-07-16 18:20:36.650052 :: 80600/86300
 -> 2023-07-16 18:20:39.078286 :: 80650/86300
 -> 2023-07-16 18:20:41.212222 :: 80700/86300
 -> 2023-07-16 18:20:43.660431 :: 80750/86300
 -> 2023-07-16 18:20:45.372428 :: 80800/86300
 -> 2023-07-16 18:20:47.214937 :: 80850/86300
 -> 2023-07-16 18:20:49.409384 :: 80900/86300
 -> 2023-07-16 18:20:51.353648 :: 80950/86300
 -> 2023-07-16 18:20:53.010620 :: 81000/86300
 -> 2023-07-16 18:20:55.254815 :: 81050/86300
 -> 2023-07-16 18:20:57.862673 :: 81100/86300
 -> 2023-07-16 18:20:59.445596 :: 81150/86300
 -> 2023-07-16 18:21:01.343125 :: 81200/86300
 -> 2023-07-16 18:21:03.276480 :: 81250/86300
 -> 2023-07-16 18:21:05.500847 :: 81300/86300
 -> 2023-07-16 18:21:07.522777 :: 81350/86300
 -> 2023-07-16 18:21:09.892698 :: 81400/86300
 -> 2023-07-16 18:21:11.927049 :: 81450/86300
 -> 2023-07-16 18:21:14.351409 :: 81500/86300
 -> 2023-07-16 18:21:16.057702 :: 81550/86300
 -> 2023-07-16 18:21:18.505634 :: 81600/86300
 -> 2023-07-16 18:21:20.513325 :: 

The CSV files containing the results are then read and converted to Parquet. These Parquet files are the final artifacts delivered to the access-profile team of the identity and access management division within Information Security.

In [18]:
recommendations_acc_df.info(verbose=True, show_counts=True)

<class 'pandas.core.frame.DataFrame'>
Index: 2858 entries, 0 to 118
Data columns (total 14 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   chave_usuario            2858 non-null   object 
 1   role                     2858 non-null   object 
 2   atribuicoes_categoricas  2858 non-null   int32  
 3   min_distance             2858 non-null   float32
 4   avg_distance             2858 non-null   float64
 5   distance_factor          2858 non-null   float64
 6   qtd_atribuicoes          2858 non-null   int64  
 7   score                    2858 non-null   float64
 8   atribuido_tipo_usuario   2858 non-null   int32  
 9   atribuido_topo           2858 non-null   int32  
 10  atribuido_lotacao        2858 non-null   int32  
 11  atribuido_cargo          2858 non-null   int32  
 12  atribuido_enfase         2858 non-null   int32  
 13  atribuido_funcao         2858 non-null   int32  
dtypes: float32(1), float64(3), int

In [19]:
# Load the temporary neighbors CSV, inspect its schema, and persist it as Parquet
neighbors_df = pd.read_csv(NEIGHBORS_CSV)
neighbors_df.info(verbose=True, show_counts=True)
neighbors_df.to_parquet(NEIGHBORS_PARQUET)


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19232436 entries, 0 to 19232435
Data columns (total 14 columns):
 #   Column                  Non-Null Count     Dtype  
---  ------                  --------------     -----  
 0   usuario_busca           19232436 non-null  object 
 1   distance                19232436 non-null  float64
 2   chave_usuario           19232436 non-null  object 
 3   sigla_lotacao           19232435 non-null  object 
 4   cargo                   8129361 non-null   object 
 5   enfase                  7922264 non-null   object 
 6   funcao                  1781024 non-null   object 
 7   role                    19232436 non-null  object 
 8   atribuido_tipo_usuario  19232436 non-null  int64  
 9   atribuido_topo          19232436 non-null  int64  
 10  atribuido_lotacao       19232436 non-null  int64  
 11  atribuido_cargo         19232436 non-null  int64  
 12  atribuido_enfase        19232436 non-null  int64  
 13  atribuido_funcao        19232436 non-nul

In [20]:
# Load the temporary recommendations CSV, inspect its schema, and persist it as Parquet
recommendations_df = pd.read_csv(RECOMMENDATIONS_CSV)
recommendations_df.info(verbose=True, show_counts=True)
recommendations_df.to_parquet(RECOMMENDATIONS_PARQUET)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3222793 entries, 0 to 3222792
Data columns (total 14 columns):
 #   Column                   Non-Null Count    Dtype  
---  ------                   --------------    -----  
 0   chave_usuario            3222793 non-null  object 
 1   role                     3222793 non-null  object 
 2   atribuicoes_categoricas  3222793 non-null  int64  
 3   min_distance             3222793 non-null  float64
 4   avg_distance             3222793 non-null  float64
 5   distance_factor          3222793 non-null  float64
 6   qtd_atribuicoes          3222793 non-null  int64  
 7   score                    3222793 non-null  float64
 8   atribuido_tipo_usuario   3222793 non-null  int64  
 9   atribuido_topo           3222793 non-null  int64  
 10  atribuido_lotacao        3222793 non-null  int64  
 11  atribuido_cargo          3222793 non-null  int64  
 12  atribuido_enfase         3222793 non-null  int64  
 13  atribuido_funcao         3222793 non-null 
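
The `info()` outputs above show that the in-memory accumulator (In [18]) stores the flag columns as `int32` and `min_distance` as `float32`, whereas the same columns come back as `int64` and `float64` after the CSV round trip. One possible way to keep narrower types, and to avoid type inference on sparse text columns such as `funcao`, is to pass an explicit `dtype` mapping to `pd.read_csv`. The sketch below uses a small hypothetical CSV with illustrative values and only a subset of the columns; the chosen dtypes are assumptions, not the notebook's actual configuration.

```python
import io
import pandas as pd

# Hypothetical miniature of the results CSV (illustrative values only).
csv_text = (
    "chave_usuario,role,atribuicoes_categoricas,min_distance,score,atribuido_topo,funcao\n"
    "AAAA,Z:ROLE_A,5,0.16,0.83,1,\n"
    "BBBB,Z:ROLE_B,4,0.20,0.56,0,GERENTE\n"
)

# Assumed dtype mapping: narrow numeric types and a nullable string type
# for text columns, so pandas does not have to infer types while parsing.
dtypes = {
    "chave_usuario": "string",
    "role": "string",
    "atribuicoes_categoricas": "int32",
    "min_distance": "float32",
    "score": "float64",
    "atribuido_topo": "int8",
    "funcao": "string",
}

df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df.dtypes)
```

With the real files, the same mapping would be passed to `pd.read_csv(NEIGHBORS_CSV, dtype=...)` before calling `to_parquet`, so that the Parquet schema keeps the narrower types.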

In [21]:
neighbors_df[neighbors_df["usuario_busca"] == "U4UL"]

Unnamed: 0,usuario_busca,distance,chave_usuario,sigla_lotacao,cargo,enfase,funcao,role,atribuido_tipo_usuario,atribuido_topo,atribuido_lotacao,atribuido_cargo,atribuido_enfase,atribuido_funcao
16927975,U4UL,0.132842,UPOT,TIC/CORP/DSCESI/DS-PDDS,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:MM_PB001_CONSULTANTE_INF_MES,1,1,1,1,1,1
16927976,U4UL,0.132842,UPOT,TIC/CORP/DSCESI/DS-PDDS,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:MM_PB001_CONS_COCKPIT,1,1,1,1,0,1
16927977,U4UL,0.163098,X0BY,TIC/CORP/RJTI/PN-STSMS,PROF. PETROBRAS DE NIVEL SUPERIOR MASTER,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:PS_PBD06_APONTADOR_HORAS,1,0,0,1,1,1
16927978,U4UL,0.163873,UR5J,TIC/CORP/FRIE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:MM_PBAUT_FISCAL_CONTRATO,1,1,0,1,1,1
16927979,U4UL,0.163873,UR5J,TIC/CORP/FRIE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:FI_FT_PB001_SUP_PC_TIMP,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
16928049,U4UL,0.271500,UPM1,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:BC_CUTOVER_S4,1,1,0,1,1,1
16928050,U4UL,0.271500,UPM1,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR SENIOR,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:BC_SUPORTE_FUNCIONAL_MDG,1,1,0,0,1,1
16928051,U4UL,0.271500,XF79,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR MASTER,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:BC_SUBCOMISSAO_PERFIS_DISP,0,0,0,0,0,1
16928052,U4UL,0.271500,XF79,TIC/CORP/FRIE/PN-SDE,PROF. PETROBRAS DE NIVEL SUPERIOR MASTER,PCR NS ANALISE DE SISTEMAS ENG SOFTWARE,,Z:GR_AC_PB001_GESTOR_MACRO,0,0,0,0,0,0


In [22]:
recommendations_df[recommendations_df["chave_usuario"] == "U4UL"]

Unnamed: 0,chave_usuario,role,atribuicoes_categoricas,min_distance,avg_distance,distance_factor,qtd_atribuicoes,score,atribuido_tipo_usuario,atribuido_topo,atribuido_lotacao,atribuido_cargo,atribuido_enfase,atribuido_funcao
2802223,U4UL,Z:MM_PBAUT_FISCAL_CONTRATO,5,0.163873,0.23744,0.536784,10,0.833333,1,1,0,1,1,1
2802224,U4UL,Z:BC_CUTOVER_FIORI,5,0.202262,0.242436,0.520087,7,0.565189,1,1,0,1,1,1
2802225,U4UL,Z:BC_CUTOVER_S4,5,0.202262,0.242436,0.520087,7,0.565189,1,1,0,1,1,1
2802226,U4UL,Z:PS_PB003_APONTADOR_HORAS,5,0.188604,0.229549,0.520472,6,0.484806,1,1,0,1,1,1
2802227,U4UL,Z:BC_CHARM_GMUD,5,0.203686,0.239383,0.517848,6,0.482362,1,1,0,1,1,1
2802228,U4UL,Z:MM_ARIBA_GERAL,5,0.202262,0.229623,0.513681,4,0.318987,1,1,0,1,1,1
2802229,U4UL,Z:PS_PBD06_APONTADOR_HORAS,4,0.163098,0.211524,0.524213,4,0.260422,1,0,0,1,1,1
2802230,U4UL,Z:MM_PB001_CONSULTANTE_INF_MES,6,0.132842,0.180144,0.523651,2,0.195107,1,1,1,1,1,1
2802231,U4UL,Z:MM_PB001_LIBERADOR_NL_PGTO,5,0.266413,0.267499,0.500543,2,0.155414,1,1,0,1,1,1
2802232,U4UL,Z:BC_DESENVOLVEDOR_PERFORMANCE,4,0.174393,0.220403,0.523005,2,0.129911,1,1,0,0,1,1
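
The two samples for user `U4UL` can be cross-checked against each other. For instance, the `min_distance` of 0.132842 reported for `Z:MM_PB001_CONSULTANTE_INF_MES` equals the distance of the neighbor `UPOT` that holds that role. The sketch below assumes that `min_distance`, `avg_distance`, and `qtd_atribuicoes` are per-(user, role) aggregations over the neighbors table. The actual aggregation and the `score` formula are defined elsewhere in the notebook and are not reproduced here, and the data is a hypothetical miniature, not real output.

```python
import pandas as pd

# Hypothetical miniature of neighbors_df: one row per (searched user, neighbor, role).
neighbors = pd.DataFrame({
    "usuario_busca": ["U1", "U1", "U1", "U1"],
    "distance": [0.10, 0.20, 0.30, 0.20],
    "chave_usuario": ["A", "B", "C", "B"],
    "role": ["Z:ROLE_X", "Z:ROLE_X", "Z:ROLE_X", "Z:ROLE_Y"],
})

# Assumed aggregation: for each searched user and candidate role, take the
# closest neighbor, the mean neighbor distance, and the number of distinct
# neighbors holding the role.
summary = (
    neighbors
    .groupby(["usuario_busca", "role"], as_index=False)
    .agg(
        min_distance=("distance", "min"),
        avg_distance=("distance", "mean"),
        qtd_atribuicoes=("chave_usuario", "nunique"),
    )
)
print(summary)
```

Running the same aggregation on `neighbors_df[neighbors_df["usuario_busca"] == "U4UL"]` and comparing it with the matching rows of `recommendations_df` would be one way to check whether this assumption holds for the real data.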
