### Let's go
#### Let's keep working with DataFrames, but this notebook goes further
##### Let's start by organizing and standardizing our table

In [1]:
import pandas as pd

In [2]:
# Let's start by loading the employees dataset (funcionarios)

funcionarios = pd.read_csv('funcionarios.csv', sep=';')
funcionarios.head()

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,08/06/1993,12:42 PM,97308,6.945,True,Marketing
1,Thomas,Male,3/31/1996,6:53 AM,61933,4.17,True,
2,Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
3,Jerry,Male,03/04/2005,1:00 PM,138705,9.34,True,Finance
4,Larry,Male,1/24/1998,4:47 PM,101004,1.389,True,Client Services


In [4]:
# Let's get an overview of the table

funcionarios.info()

# Several values are missing in First Name, Gender, Team and Senior Management.
# The remaining columns are fully populated.
# Some columns are stored as text (object) but shouldn't be, such as Last Login Time and Start Date.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   First Name         933 non-null    object 
 1   Gender             855 non-null    object 
 2   Start Date         1000 non-null   object 
 3   Last Login Time    1000 non-null   object 
 4   Salary             1000 non-null   int64  
 5   Bonus %            1000 non-null   float64
 6   Senior Management  933 non-null    object 
 7   Team               957 non-null    object 
dtypes: float64(1), int64(1), object(6)
memory usage: 62.6+ KB
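The non-null counts above can also be turned into missing-value counts per column. A minimal sketch on a small hypothetical frame (the column names mirror the dataset; the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of the employees table, with some gaps
df = pd.DataFrame({
    'First Name': ['Douglas', None, 'Maria'],
    'Gender': ['Male', 'Male', np.nan],
    'Salary': [97308, 61933, 130590],
})

# isna() marks each missing cell as True; sum() counts the Trues per column
missing = df.isna().sum()
print(missing)
```

Applied to `funcionarios`, the same call should report 1000 minus each Non-Null Count, for example 67 for First Name.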


In [5]:
# Our first step is to standardize/organize the DataFrame
# Let's start by fixing the date and time formats of the columns mentioned above

# The function that does this is pd.to_datetime()
funcionarios['Start Date'] = pd.to_datetime(funcionarios['Start Date'])

In [6]:
# Let's do the same with the login column

funcionarios['Last Login Time'] = pd.to_datetime(funcionarios['Last Login Time'])

In [7]:
# Let's see the result
funcionarios.head()

# Last Login Time only holds a time, so pandas filled in the current date (the day the notebook was run) for every row. We won't dwell on that here.

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.34,True,Finance
4,Larry,Male,1998-01-24,2021-12-26 16:47:00,101004,1.389,True,Client Services
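`pd.to_datetime` can also be given the exact layout of the text through its `format` parameter, which makes the parsing explicit instead of guessed. A sketch using a few values copied from the raw table; with a time-only format pandas fills in a fixed placeholder date rather than today's date, and `.dt.time` can drop it:

```python
import pandas as pd

start = pd.Series(['08/06/1993', '3/31/1996', '4/23/1993'])
login = pd.Series(['12:42 PM', '6:53 AM', '11:17 AM'])

# %m/%d/%Y = month/day/year, the layout used by the Start Date column
start_dates = pd.to_datetime(start, format='%m/%d/%Y')

# %I = hour on a 12-hour clock, %p = AM/PM marker
login_times = pd.to_datetime(login, format='%I:%M %p')

# .dt.time keeps only the time of day and drops the placeholder date
print(start_dates)
print(login_times.dt.time)
```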


In [8]:
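# The Senior Management column is stored as object (text), but it should be boolean.
# Caution: astype('bool') turns missing values (NaN) into True, so the 67 empty entries
# (1000 - 933) end up counted as senior management.

funcionarios['Senior Management'] = funcionarios['Senior Management'].astype('bool')

# The column is now boolean (with the caveat above)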
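A small sketch of how the boolean conversion treats a missing value, plus two alternatives. It assumes the column arrives from `read_csv` as Python booleans mixed with NaN (object dtype), which is what the `info()` output suggests; the three-value Series is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical column as read_csv may return it: booleans mixed with NaN (object dtype)
senior = pd.Series([True, False, np.nan], dtype=object)

# bool(np.nan) is True, so the missing value silently becomes True
naive = senior.astype('bool')

# Only values that are exactly True become True; NaN becomes False
strict = senior.eq(True)

# The nullable 'boolean' dtype keeps the gap as <NA>
nullable = senior.astype('boolean')

print(naive.tolist())     # [True, False, True]
print(strict.tolist())    # [True, False, False]
print(nullable.tolist())  # [True, False, <NA>]
```

Which option fits depends on whether a missing flag should mean "not senior" (`eq(True)`) or "unknown" (`'boolean'`).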

In [9]:
# The Gender column only contains Male and Female, so it can be stored as a category

funcionarios['Gender'] = funcionarios['Gender'].astype('category')

# This conversion reduces memory usage and can make some later operations faster

# Team is also a strong candidate for the category type

funcionarios['Team'] = funcionarios['Team'].astype('category')
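The same type conversions can also be requested while the file is being read, through the `dtype` and `parse_dates` parameters of `pd.read_csv`. A sketch, assuming a few rows copied from the table above and held in memory with `io.StringIO` instead of the real `funcionarios.csv`:

```python
import io
import pandas as pd

# A few rows in the same ';'-separated layout (subset of columns)
csv_text = """First Name;Gender;Start Date;Team
Douglas;Male;08/06/1993;Marketing
Thomas;Male;3/31/1996;
Maria;Female;4/23/1993;Finance
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=';',
    dtype={'Gender': 'category', 'Team': 'category'},  # categories on load
    parse_dates=['Start Date'],                        # dates on load
)
print(df.dtypes)
```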

In [10]:
funcionarios.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   First Name         933 non-null    object        
 1   Gender             855 non-null    category      
 2   Start Date         1000 non-null   datetime64[ns]
 3   Last Login Time    1000 non-null   datetime64[ns]
 4   Salary             1000 non-null   int64         
 5   Bonus %            1000 non-null   float64       
 6   Senior Management  1000 non-null   bool          
 7   Team               957 non-null    category      
dtypes: bool(1), category(2), datetime64[ns](2), float64(1), int64(1), object(1)
memory usage: 42.6+ KB


#### These conversions reduced memory usage.
##### The reported usage dropped from 62.6 KB to 42.6 KB, about 32% less (the "+" means the real figure may be higher, since the contents of text columns are not counted)
##### For datasets of tens of gigabytes, this kind of optimization becomes very relevant
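To include the contents of text columns in the estimate, pandas offers `memory_usage(deep=True)` on a Series and `info(memory_usage='deep')` on a DataFrame. A sketch on a hypothetical column of 1,000 repeated labels, similar to Gender:

```python
import pandas as pd

# Hypothetical column: 1,000 repeated labels, like Gender in the dataset
gender = pd.Series(['Male', 'Female'] * 500)

# deep=True also counts the Python string objects, not just the pointers to them
as_object = gender.memory_usage(deep=True)
as_category = gender.astype('category').memory_usage(deep=True)

print(as_object, as_category)
```

A category column stores each distinct label once plus a small integer code per row, which is why the saving grows with the number of repeated values.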

### Now let's filter values

In [11]:
# Let's show only the rows where Gender is Female

display(funcionarios[funcionarios['Gender'] == 'Female'])

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
7,,Female,2015-07-20,2021-12-26 10:43:00,45906,11.598,True,Finance
8,Angela,Female,2005-11-22,2021-12-26 06:29:00,95570,18.523,True,Engineering
9,Frances,Female,2002-08-08,2021-12-26 06:51:00,139852,7.524,True,Business Development
...,...,...,...,...,...,...,...,...
987,Gloria,Female,2014-12-08,2021-12-26 05:08:00,136709,10.331,True,Finance
988,Alice,Female,2004-10-05,2021-12-26 09:34:00,47638,11.209,False,Human Resources
990,Robin,Female,1987-07-24,2021-12-26 13:35:00,100765,10.982,True,Client Services
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing


In [12]:
# Now let's filter by team

funcionarios[funcionarios['Team'] == 'Finance']

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
7,,Female,2015-07-20,2021-12-26 10:43:00,45906,11.598,True,Finance
14,Kimberly,Female,1999-01-14,2021-12-26 07:13:00,41426,14.543,True,Finance
46,Bruce,Male,2009-11-28,2021-12-26 22:47:00,114796,6.796,False,Finance
...,...,...,...,...,...,...,...,...
907,Elizabeth,Female,1998-07-27,2021-12-26 11:12:00,137144,10.081,False,Finance
954,Joe,Male,1980-01-19,2021-12-26 16:06:00,119667,1.148,True,Finance
987,Gloria,Female,2014-12-08,2021-12-26 05:08:00,136709,10.331,True,Finance
992,Anthony,Male,2011-10-16,2021-12-26 08:35:00,112769,11.625,True,Finance


In [13]:
# Another way to do this is to store the boolean condition in a variable

filtro = funcionarios['Team'] == 'Marketing'
funcionarios[filtro]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
21,Matthew,Male,1995-09-05,2021-12-26 02:12:00,100612,13.645,False,Marketing
26,Craig,Male,2000-02-27,2021-12-26 07:45:00,37598,7.757,True,Marketing
43,Marilyn,Female,1980-12-07,2021-12-26 03:16:00,73524,5.207,True,Marketing
62,,Female,2007-06-12,2021-12-26 17:25:00,58112,19.414,True,Marketing
...,...,...,...,...,...,...,...,...
942,Lori,Female,2015-11-20,2021-12-26 13:15:00,75498,6.537,True,Marketing
947,,Male,2012-07-30,2021-12-26 15:07:00,107351,5.329,True,Marketing
986,Donna,Female,1982-11-26,2021-12-26 07:04:00,82871,17.999,False,Marketing
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
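A filter such as `filtro` is simply a boolean Series with one value per row, aligned on the index; summing it counts the matching rows. A sketch on a hypothetical four-row frame:

```python
import pandas as pd

df = pd.DataFrame({'Team': ['Marketing', 'Finance', 'Marketing', None]})

filtro = df['Team'] == 'Marketing'   # boolean Series, one value per row
print(filtro.tolist())               # [True, False, True, False]
print(filtro.sum())                  # 2 (each True counts as 1)
print(df[filtro].index.tolist())     # [0, 2]
```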


In [14]:
# Senior Management already holds booleans, so the column itself can be used as the filter

funcionarios[funcionarios['Senior Management']]

# Returns only the rows where Senior Management == True

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,2021-12-26 16:47:00,101004,1.389,True,Client Services
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
...,...,...,...,...,...,...,...,...
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
992,Anthony,Male,2011-10-16,2021-12-26 08:35:00,112769,11.625,True,Finance
993,Tina,Female,1997-05-15,2021-12-26 15:53:00,56450,19.040,True,Engineering
994,George,Male,2013-06-21,2021-12-26 17:47:00,98874,4.479,True,Marketing
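The opposite selection uses `~`, which flips every value of a boolean Series. This assumes the column really has bool dtype; on an object column mixed with NaN, `~` would not behave as a simple negation. A sketch with hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Maria', 'Jerry'],
    'Senior Management': [True, False, True],
})

seniors = df[df['Senior Management']]        # rows where the flag is True
non_seniors = df[~df['Senior Management']]   # ~ flips each boolean
print(non_seniors['First Name'].tolist())    # ['Maria']
```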


In [18]:
# What if we want every row except a given team?
# We use !=

display(funcionarios[funcionarios['Team'] != 'Marketing'])

# This shows only employees whose Team is different from Marketing (rows with a missing Team are included too)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,2021-12-26 16:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,2021-12-26 01:35:00,115163,10.125,False,Legal
...,...,...,...,...,...,...,...,...
995,Henry,,2014-11-23,2021-12-26 06:09:00,132483,16.655,False,Distribution
996,Phillip,Male,1984-01-31,2021-12-26 06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,2021-12-26 12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,2021-12-26 16:45:00,60500,11.985,False,Business Development
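A missing value is "different from Marketing", so `!=` keeps it, as the row for Thomas shows. Excluding missing teams as well takes an extra condition. A sketch on a hypothetical three-value Series:

```python
import pandas as pd

team = pd.Series(['Marketing', None, 'Finance'])

print((team != 'Marketing').tolist())   # [False, True, True]: the missing team is kept

# Adding notnull() also drops the rows with no team
print(((team != 'Marketing') & team.notnull()).tolist())   # [False, False, True]
```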


In [19]:
# Now let's filter on numeric values

# Let's keep only people with a salary greater than 110000

funcionarios[funcionarios['Salary'] > 110000]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
5,Dennis,Male,1987-04-18,2021-12-26 01:35:00,115163,10.125,False,Legal
9,Frances,Female,2002-08-08,2021-12-26 06:51:00,139852,7.524,True,Business Development
12,Brandon,Male,1980-12-01,2021-12-26 01:08:00,112807,17.492,True,Human Resources
...,...,...,...,...,...,...,...,...
987,Gloria,Female,2014-12-08,2021-12-26 05:08:00,136709,10.331,True,Finance
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
992,Anthony,Male,2011-10-16,2021-12-26 08:35:00,112769,11.625,True,Finance
995,Henry,,2014-11-23,2021-12-26 06:09:00,132483,16.655,False,Distribution


In [20]:
# Filtering also works with dates, using the same syntax.
# For dates, '<' means 'before' and '>' means 'after'.
# The date to compare against can be written as a string; pandas converts it to a date

funcionarios[funcionarios['Start Date'] < '1985-01-01']

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
10,Louise,Female,1980-08-12,2021-12-26 09:01:00,63241,15.132,True,
12,Brandon,Male,1980-12-01,2021-12-26 01:08:00,112807,17.492,True,Human Resources
18,Diana,Female,1981-10-23,2021-12-26 10:27:00,132940,19.082,False,Client Services
28,Terry,Male,1981-11-27,2021-12-26 18:30:00,124008,13.464,True,Client Services
37,Linda,Female,1981-10-19,2021-12-26 20:49:00,57427,9.557,True,Client Services
...,...,...,...,...,...,...,...,...
982,Rose,Female,1982-04-06,2021-12-26 10:43:00,91411,8.639,True,Human Resources
983,John,Male,1982-12-23,2021-12-26 22:35:00,146907,11.738,False,Engineering
985,Stephen,,1983-07-10,2021-12-26 20:10:00,85668,1.909,False,Legal
986,Donna,Female,1982-11-26,2021-12-26 07:04:00,82871,17.999,False,Marketing
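A sketch showing that the string in the comparison behaves like an explicit `pd.Timestamp`, and that the same filter can be written with the `.dt.year` accessor. The three dates are hypothetical:

```python
import pandas as pd

start = pd.to_datetime(pd.Series(['1980-08-12', '1993-04-23', '1984-12-31']))

by_string = start < '1985-01-01'                  # the string is converted to a Timestamp
by_timestamp = start < pd.Timestamp('1985-01-01')
by_year = start.dt.year < 1985                    # same idea, using only the year

print(by_string.tolist())   # [True, False, True]
```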


### Now let's filter on more than one condition

### AND operator
###### This operator returns True only when all of the listed conditions are True

In [51]:
# Let's query all men in the Marketing team

homens = funcionarios['Gender'] == 'Male'
marketeiros = funcionarios['Team'] == 'Marketing'

# We have both conditions; now we just combine them into one

# Storing each condition in a variable makes the code easier to write and read

funcionarios[homens & marketeiros]

# Both variables are boolean Series, so & combines them row by row and the query shows only the rows where both are True

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
21,Matthew,Male,1995-09-05,2021-12-26 02:12:00,100612,13.645,False,Marketing
26,Craig,Male,2000-02-27,2021-12-26 07:45:00,37598,7.757,True,Marketing
74,Thomas,Male,1995-06-04,2021-12-26 14:24:00,62096,17.029,False,Marketing
77,Charles,Male,2004-09-14,2021-12-26 20:13:00,107391,1.26,True,Marketing
101,Aaron,Male,2012-02-17,2021-12-26 10:20:00,61602,11.849,True,Marketing
104,John,Male,1989-12-23,2021-12-26 07:01:00,80740,19.305,False,Marketing
112,Willie,Male,2003-11-27,2021-12-26 06:21:00,64363,4.023,False,Marketing
119,Paul,Male,2008-06-03,2021-12-26 15:05:00,41054,12.299,False,Marketing
150,Sean,Male,1996-05-04,2021-12-26 20:59:00,135490,19.934,False,Marketing
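The combination has to use `&`, not Python's `and` keyword: `and` asks for a single True/False for the whole Series, which pandas refuses to guess. A sketch with hypothetical three-row masks reusing the notebook's variable names:

```python
import pandas as pd

homens = pd.Series([True, True, False])
marketeiros = pd.Series([True, False, True])

both = homens & marketeiros          # element-wise AND
print(both.tolist())                 # [True, False, False]

try:
    homens and marketeiros           # 'and' needs one truth value for the whole Series
except ValueError as err:
    print('ValueError:', err)        # the truth value of a Series is ambiguous
```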


In [36]:
# Another example.
# Now we want all women in the Finance team

mulheres = funcionarios['Gender'] == 'Female'
financeiro = funcionarios['Team'] == 'Finance'

funcionarios[mulheres & financeiro]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
7,,Female,2015-07-20,2021-12-26 10:43:00,45906,11.598,True,Finance
14,Kimberly,Female,1999-01-14,2021-12-26 07:13:00,41426,14.543,True,Finance
67,Rachel,Female,1999-08-16,2021-12-26 06:53:00,51178,9.735,True,Finance
84,Doris,Female,2004-08-20,2021-12-26 05:51:00,83072,7.511,False,Finance
96,Cynthia,Female,1994-03-21,2021-12-26 08:34:00,142321,1.737,False,Finance
100,Melissa,Female,2005-06-21,2021-12-26 06:33:00,48109,14.995,False,Finance
103,Phyllis,Female,1996-10-11,2021-12-26 21:30:00,136984,8.932,True,Finance
105,Kathy,Female,1996-03-09,2021-12-26 04:33:00,91712,8.567,False,Finance
142,Elizabeth,Female,2003-10-09,2021-12-26 17:53:00,146129,5.687,False,Finance


In [48]:
# Finally, a query with 4 conditions
# We want only women in senior management in the Marketing team with salaries above 110000
# The boolean Series for women and Marketing already exist, so we reuse them

salario = funcionarios['Salary'] > 110000
gerencia = funcionarios['Senior Management']

funcionarios[mulheres & marketeiros & salario & gerencia]

# We can chain as many conditions as needed to narrow the search

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
158,Norma,Female,1999-02-28,2021-12-26 20:45:00,114412,8.756,True,Marketing
379,,Female,2002-09-18,2021-12-26 12:39:00,118906,4.537,True,Marketing
468,Janice,Female,1997-06-28,2021-12-26 13:48:00,136032,10.696,True,Marketing
531,Virginia,Female,2010-05-02,2021-12-26 21:10:00,123649,10.154,True,Marketing
656,Lisa,Female,1982-02-09,2021-12-26 18:44:00,113592,17.108,True,Marketing
676,Annie,Female,1992-06-06,2021-12-26 10:04:00,138925,9.801,True,Marketing
811,Judith,Female,1989-09-03,2021-12-26 11:16:00,134048,6.818,True,Marketing
813,Evelyn,Female,2002-02-10,2021-12-26 04:44:00,123621,19.767,True,Marketing
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing


### OR operator
#### This operator returns True as long as at least one of the conditions is met

In [53]:
# Let's run the same queries, now with the OR operator

funcionarios[homens | marketeiros]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,2021-12-26 16:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,2021-12-26 01:35:00,115163,10.125,False,Legal
...,...,...,...,...,...,...,...,...
994,George,Male,2013-06-21,2021-12-26 17:47:00,98874,4.479,True,Marketing
996,Phillip,Male,1984-01-31,2021-12-26 06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,2021-12-26 12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,2021-12-26 16:45:00,60500,11.985,False,Business Development


In [54]:
funcionarios[mulheres | financeiro]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
7,,Female,2015-07-20,2021-12-26 10:43:00,45906,11.598,True,Finance
8,Angela,Female,2005-11-22,2021-12-26 06:29:00,95570,18.523,True,Engineering
...,...,...,...,...,...,...,...,...
990,Robin,Female,1987-07-24,2021-12-26 13:35:00,100765,10.982,True,Client Services
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
992,Anthony,Male,2011-10-16,2021-12-26 08:35:00,112769,11.625,True,Finance
993,Tina,Female,1997-05-15,2021-12-26 15:53:00,56450,19.040,True,Engineering


In [55]:
funcionarios[mulheres | marketeiros | salario | gerencia]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,2021-12-26 16:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
992,Anthony,Male,2011-10-16,2021-12-26 08:35:00,112769,11.625,True,Finance
993,Tina,Female,1997-05-15,2021-12-26 15:53:00,56450,19.040,True,Engineering
994,George,Male,2013-06-21,2021-12-26 17:47:00,98874,4.479,True,Marketing
995,Henry,,2014-11-23,2021-12-26 06:09:00,132483,16.655,False,Distribution


#### In every query, the OR operator returned more employees than AND
#### That is because a row only needs to satisfy one condition to be shown
#### The image below shows how the AND and OR operators differ

<img src="operador and or.png"/>
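The same comparison can be written as a truth table covering every combination of two conditions. A sketch with two hypothetical masks `a` and `b`:

```python
import pandas as pd

a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

# One row per combination of A and B
table = pd.DataFrame({'A': a, 'B': b, 'A & B': a & b, 'A | B': a | b})
print(table)
```

Every row that is True under `&` is also True under `|`, which is why the OR queries can only return as many rows as the AND queries or more.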

In [72]:
# We can mix AND and OR conditions in the same query
# Python evaluates & before |, so this reads as: Marketing, OR (women AND senior management)

funcionarios[marketeiros | mulheres & gerencia]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
7,,Female,2015-07-20,2021-12-26 10:43:00,45906,11.598,True,Finance
8,Angela,Female,2005-11-22,2021-12-26 06:29:00,95570,18.523,True,Engineering
9,Frances,Female,2002-08-08,2021-12-26 06:51:00,139852,7.524,True,Business Development
...,...,...,...,...,...,...,...,...
987,Gloria,Female,2014-12-08,2021-12-26 05:08:00,136709,10.331,True,Finance
990,Robin,Female,1987-07-24,2021-12-26 13:35:00,100765,10.982,True,Client Services
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
993,Tina,Female,1997-05-15,2021-12-26 15:53:00,56450,19.040,True,Engineering
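A sketch of that precedence rule on hypothetical four-row masks: without parentheses the query matches the `&`-first grouping, while grouping the `|` first asks a different question:

```python
import pandas as pd

marketeiros = pd.Series([True, False, False, True])
mulheres = pd.Series([False, True, True, True])
gerencia = pd.Series([False, True, False, False])

implicit = marketeiros | mulheres & gerencia        # & is evaluated before |
explicit = marketeiros | (mulheres & gerencia)      # same grouping, written out
other = (marketeiros | mulheres) & gerencia         # a different question

print(implicit.equals(explicit))   # True
print(implicit.tolist())           # [True, True, False, True]
print(other.tolist())              # [False, True, False, False]
```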


In [82]:
# However, it quickly becomes hard to tell which condition is evaluated first
# That's why it's best to wrap the AND conditions in parentheses
# Python evaluates what is inside the parentheses first and then applies the remaining operators

funcionarios[salario | (gerencia & marketeiros)]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
5,Dennis,Male,1987-04-18,2021-12-26 01:35:00,115163,10.125,False,Legal
9,Frances,Female,2002-08-08,2021-12-26 06:51:00,139852,7.524,True,Business Development
...,...,...,...,...,...,...,...,...
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
992,Anthony,Male,2011-10-16,2021-12-26 08:35:00,112769,11.625,True,Finance
994,George,Male,2013-06-21,2021-12-26 17:47:00,98874,4.479,True,Marketing
995,Henry,,2014-11-23,2021-12-26 06:09:00,132483,16.655,False,Distribution
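Parentheses become mandatory when the comparisons are written inline instead of stored in variables, because `&` and `|` bind more tightly than `>` and `==`. Without them, Python would first try to evaluate `110000 & df['Team']`, which typically raises an error. A sketch on hypothetical rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Salary': [97308, 130590, 138705],
    'Team': ['Marketing', 'Marketing', 'Finance'],
})

# Each comparison gets its own parentheses before being combined with &
rich_marketing = df[(df['Salary'] > 110000) & (df['Team'] == 'Marketing')]
print(rich_marketing.index.tolist())   # [1]
```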


#### The .isin() method

In [97]:
# Suppose we only want employees from Legal, Sales and Product.
# Creating 3 variables and combining them works, but it is not very efficient

legal = funcionarios['Team'] == 'Legal'
sales = funcionarios['Team'] == 'Sales'
product = funcionarios['Team'] == 'Product'

funcionarios[legal | sales | product]

# This approach is long and repetitive. Let's use the .isin() method instead

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
5,Dennis,Male,1987-04-18,2021-12-26 01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
11,Julie,Female,1997-10-26,2021-12-26 15:19:00,102508,12.637,True,Legal
13,Gary,Male,2008-01-27,2021-12-26 23:40:00,109831,5.831,False,Sales
15,Lillian,Female,2016-06-05,2021-12-26 06:09:00,59414,1.256,False,Product
...,...,...,...,...,...,...,...,...
981,James,Male,1993-01-15,2021-12-26 17:19:00,148985,19.280,False,Legal
985,Stephen,,1983-07-10,2021-12-26 20:10:00,85668,1.909,False,Legal
989,Justin,,1991-02-10,2021-12-26 16:58:00,38344,3.794,False,Legal
997,Russell,Male,2013-05-20,2021-12-26 12:39:00,96914,1.421,False,Product


In [106]:
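# The whole selection fits in a single line of code
# Note: this list uses Marketing instead of Sales; ['Legal', 'Sales', 'Product'] would reproduce the previous result

funcionarios[funcionarios['Team'].isin(['Legal', 'Product', 'Marketing'])]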

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
5,Dennis,Male,1987-04-18,2021-12-26 01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
11,Julie,Female,1997-10-26,2021-12-26 15:19:00,102508,12.637,True,Legal
15,Lillian,Female,2016-06-05,2021-12-26 06:09:00,59414,1.256,False,Product
...,...,...,...,...,...,...,...,...
986,Donna,Female,1982-11-26,2021-12-26 07:04:00,82871,17.999,False,Marketing
989,Justin,,1991-02-10,2021-12-26 16:58:00,38344,3.794,False,Legal
991,Rose,Female,2002-08-25,2021-12-26 05:12:00,134505,11.051,True,Marketing
994,George,Male,2013-06-21,2021-12-26 17:47:00,98874,4.479,True,Marketing
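A sketch checking that `.isin()` gives the same mask as the chain of `==` comparisons, and that `~` turns it into an exclusion list (missing values land on the excluded side). The Series is hypothetical:

```python
import pandas as pd

team = pd.Series(['Legal', 'Sales', 'Finance', 'Product', None])

chained = (team == 'Legal') | (team == 'Sales') | (team == 'Product')
with_isin = team.isin(['Legal', 'Sales', 'Product'])
excluded = ~team.isin(['Legal', 'Sales', 'Product'])   # everything else, including missing

print(chained.equals(with_isin))   # True
print(excluded.tolist())           # [False, False, True, False, True]
```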


#### The isnull and notnull methods
#### Let's look at rows with empty and non-empty fields

In [110]:
# This returns a boolean Series saying whether each value is missing

funcionarios['Team'].isnull()

0      False
1       True
2      False
3      False
4      False
       ...  
995    False
996    False
997    False
998    False
999    False
Name: Team, Length: 1000, dtype: bool

In [117]:
# This returns only the rows where Gender is filled in

funcionarios[funcionarios['Gender'].notnull()]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2021-12-26 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2021-12-26 13:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,2021-12-26 16:47:00,101004,1.389,True,Client Services
...,...,...,...,...,...,...,...,...
994,George,Male,2013-06-21,2021-12-26 17:47:00,98874,4.479,True,Marketing
996,Phillip,Male,1984-01-31,2021-12-26 06:30:00,42392,19.675,False,Finance
997,Russell,Male,2013-05-20,2021-12-26 12:39:00,96914,1.421,False,Product
998,Larry,Male,2013-04-20,2021-12-26 16:45:00,60500,11.985,False,Business Development
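The two masks are complementary, and `dropna(subset=[...])` is a built-in shortcut for the `notnull()` filter. A sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Henry', 'Maria'],
    'Gender': ['Male', np.nan, 'Female'],
})

with_gender = df[df['Gender'].notnull()]     # rows where Gender is filled in
without_gender = df[df['Gender'].isnull()]   # rows where Gender is missing
same_result = df.dropna(subset=['Gender'])   # shortcut for the first filter

print(without_gender['First Name'].tolist())   # ['Henry']
print(with_gender.equals(same_result))         # True
```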


#### The .between() method

In [120]:
# This method makes conditional selection easier: it selects rows whose values fall between a minimum and a maximum (both ends included by default)

# Let's start with salaries

# Let's find employees with salaries between 60k and 70k

funcionarios['Salary'].between(60000, 70000)

# This returns a boolean Series (not displayed, since it is not the last line of the cell)

# We can store it in a variable, or use it directly to show only the rows marked True

funcionarios[funcionarios['Salary'].between(60000, 70000)]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
6,Ruby,Female,1987-08-17,2021-12-26 16:20:00,65476,10.012,True,Product
10,Louise,Female,1980-08-12,2021-12-26 09:01:00,63241,15.132,True,
20,Lois,,1995-04-22,2021-12-26 19:18:00,64714,4.934,True,Legal
41,Christine,,2015-06-28,2021-12-26 01:08:00,66582,11.308,True,Business Development
...,...,...,...,...,...,...,...,...
965,Catherine,Female,1989-09-25,2021-12-26 01:31:00,68164,18.393,False,Client Services
970,Alice,Female,1988-09-03,2021-12-26 20:54:00,63571,15.397,True,Product
974,Harry,Male,2011-08-30,2021-12-26 18:31:00,67656,16.455,True,Client Services
978,Sean,Male,1983-01-17,2021-12-26 14:23:00,66146,11.178,False,Human Resources
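The `inclusive` parameter controls whether the limits themselves count. This sketch assumes pandas 1.3 or later, where it accepts `'both'` (the default), `'neither'`, `'left'` and `'right'`. The salaries are hypothetical:

```python
import pandas as pd

salary = pd.Series([60000, 65000, 70000])

print(salary.between(60000, 70000).tolist())                       # [True, True, True]
print(salary.between(60000, 70000, inclusive='neither').tolist())  # [False, True, False]
print(salary.between(60000, 70000, inclusive='left').tolist())     # [True, True, False]
```

`inclusive='left'` is handy for date ranges: between '1991-01-01' and '1992-01-01' with `'left'` keeps only start dates within 1991.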


In [124]:
# Let's look only at employees with a bonus between 2% and 5%

funcionarios[funcionarios['Bonus %'].between(2, 5)]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,2021-12-26 06:53:00,61933,4.170,True,
20,Lois,,1995-04-22,2021-12-26 19:18:00,64714,4.934,True,Legal
40,Michael,Male,2008-10-10,2021-12-26 11:25:00,99283,2.665,True,Distribution
49,Chris,,1980-01-24,2021-12-26 12:13:00,113590,3.055,False,Sales
60,Paula,,2005-11-23,2021-12-26 14:01:00,48866,4.271,False,Distribution
...,...,...,...,...,...,...,...,...
943,Wayne,Male,2006-09-08,2021-12-26 11:09:00,67471,2.728,False,Engineering
961,Antonio,,1989-06-18,2021-12-26 21:37:00,103050,3.050,False,Legal
976,Denise,Female,1992-10-19,2021-12-26 05:42:00,137954,4.195,True,Legal
989,Justin,,1991-02-10,2021-12-26 16:58:00,38344,3.794,False,Legal


In [126]:
# To use between with dates, we pass the dates as strings (both dates are included, so a start date of 1992-01-01 would also match)

funcionarios[funcionarios['Start Date'].between('1991-01-01', '1992-01-01')]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
27,Scott,,1991-07-11,2021-12-26 18:58:00,122367,5.218,False,Legal
75,Bonnie,Female,1991-07-02,2021-12-26 01:27:00,104897,5.118,True,Human Resources
88,Donna,Female,1991-11-27,2021-12-26 13:59:00,64088,6.155,True,Legal
116,,Male,1991-06-22,2021-12-26 20:58:00,76189,18.988,True,Legal
148,Patrick,,1991-07-14,2021-12-26 02:24:00,124488,14.837,True,Sales
166,,Female,1991-07-09,2021-12-26 18:52:00,42341,7.014,True,Sales
172,Sara,Female,1991-09-23,2021-12-26 18:17:00,97058,9.402,False,Finance
220,,Female,1991-06-17,2021-12-26 12:49:00,71945,5.56,True,Marketing
245,Victor,Male,1991-04-11,2021-12-26 07:44:00,70817,17.138,False,Engineering
277,Brenda,,1991-05-29,2021-12-26 06:32:00,82439,19.062,False,Sales


In [136]:
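# Now let's apply the method to the login time
# The strings '08:30AM' and '12:00PM' are converted using the current date, so this only works because the column was also filled with the date the notebook ran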

funcionarios[funcionarios['Last Login Time'].between('08:30AM', '12:00PM')]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2021-12-26 11:17:00,130590,11.858,False,Finance
7,,Female,2015-07-20,2021-12-26 10:43:00,45906,11.598,True,Finance
10,Louise,Female,1980-08-12,2021-12-26 09:01:00,63241,15.132,True,
18,Diana,Female,1981-10-23,2021-12-26 10:27:00,132940,19.082,False,Client Services
33,Jean,Female,1993-12-18,2021-12-26 09:07:00,119082,16.180,False,Business Development
...,...,...,...,...,...,...,...,...
963,Ann,Female,1994-09-23,2021-12-26 11:15:00,89443,17.940,True,Sales
977,Sarah,Female,1995-12-04,2021-12-26 09:16:00,124566,5.949,False,Product
982,Rose,Female,1982-04-06,2021-12-26 10:43:00,91411,8.639,True,Human Resources
988,Alice,Female,2004-10-05,2021-12-26 09:34:00,47638,11.209,False,Human Resources
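A filter that ignores the date part can compare only the time of day, using `.dt.time` together with `datetime.time` limits. A sketch on hypothetical timestamps, one of them on a different day:

```python
import datetime as dt
import pandas as pd

login = pd.to_datetime(pd.Series(['2021-12-26 11:17', '2021-12-26 06:53', '2021-12-27 09:01']))

# .dt.time drops the date, so rows from any day are compared by time alone
time_of_day = login.dt.time
mask = time_of_day.between(dt.time(8, 30), dt.time(12, 0))

print(mask.tolist())   # [True, False, True]
```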


### That was part three.
#### We'll keep exploring Python/pandas in the next notebook