# Operations

pandas offers many useful operations that don't fall into any distinct category. This lecture walks through them:

In [1]:
import pandas as pd

In [4]:
# create the DataFrame
df_one = pd.DataFrame({'k1':['A','A','B','B','C','C'],
                      'col1':[100,200,300,300,400,500],
                      'col2':['NY','CA','WA','WA','AK','NV']})

In [5]:
df_one

|   | k1 | col1 | col2 |
|---|----|------|------|
| 0 | A | 100 | NY |
| 1 | A | 200 | CA |
| 2 | B | 300 | WA |
| 3 | B | 300 | WA |
| 4 | C | 400 | AK |
| 5 | C | 500 | NV |


### Information on Unique Values

In [6]:
# unique values
df_one['col2'].unique()

array(['NY', 'CA', 'WA', 'AK', 'NV'], dtype=object)

In [7]:
df_one['k1'].unique()

array(['A', 'B', 'C'], dtype=object)

In [8]:
# number of unique values
df_one['col2'].nunique()

5

In [9]:
# count the occurrences of each value
df_one['col2'].value_counts()

WA    2
NY    1
CA    1
NV    1
AK    1
Name: col2, dtype: int64
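
As a small sketch of two common variations on the count above: `normalize=True` returns relative frequencies, and `sort_index()` orders the result by label instead of by frequency. The Series `states` is a stand-in built here for illustration; it mirrors `col2` but is not part of the original notebook.

```python
import pandas as pd

# Hypothetical stand-in for df_one['col2']
states = pd.Series(['NY', 'CA', 'WA', 'WA', 'AK', 'NV'], name='col2')

# Relative frequencies instead of raw counts (each value divided by the total)
shares = states.value_counts(normalize=True)

# Raw counts, ordered alphabetically by label rather than by frequency
counts_by_label = states.value_counts().sort_index()

print(shares)
print(counts_by_label)
```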

In [10]:
# drop rows that are exact duplicates of an earlier row
df_one.drop_duplicates()

|   | k1 | col1 | col2 |
|---|----|------|------|
| 0 | A | 100 | NY |
| 1 | A | 200 | CA |
| 2 | B | 300 | WA |
| 4 | C | 400 | AK |
| 5 | C | 500 | NV |
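
By default `drop_duplicates()` compares entire rows. The sketch below shows how the `subset` and `keep` parameters narrow the comparison to chosen columns, and how `duplicated()` exposes the underlying boolean mask. The frame `df` is rebuilt here so the example runs on its own; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'k1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col1': [100, 200, 300, 300, 400, 500],
                   'col2': ['NY', 'CA', 'WA', 'WA', 'AK', 'NV']})

# Compare only the k1 column and keep the first row seen for each key
first_per_key = df.drop_duplicates(subset='k1')

# Same comparison, but keep the last row seen for each key
last_per_key = df.drop_duplicates(subset=['k1'], keep='last')

# Boolean mask: True where a row repeats an earlier row in full
is_repeat = df.duplicated()

print(first_per_key)
print(last_per_key)
print(is_repeat)
```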


### Creating New Columns with Operations and Functions

We already know we can easily create new columns through basic arithmetic operations:

In [11]:
df_one

|   | k1 | col1 | col2 |
|---|----|------|------|
| 0 | A | 100 | NY |
| 1 | A | 200 | CA |
| 2 | B | 300 | WA |
| 3 | B | 300 | WA |
| 4 | C | 400 | AK |
| 5 | C | 500 | NV |


In [14]:
# new column built from an arithmetic operation
df_one['NEW'] = df_one['col1'] * 10

In [15]:
df_one

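|   | k1 | col1 | col2 | NEW |
|---|----|------|------|-----|
| 0 | A | 100 | NY | 1000 |
| 1 | A | 200 | CA | 2000 |
| 2 | B | 300 | WA | 3000 |
| 3 | B | 300 | WA | 3000 |
| 4 | C | 400 | AK | 4000 |
| 5 | C | 500 | NV | 5000 |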


But we can also create new columns by applying any custom function we want. The function can be as complex as we need, which gives us great flexibility.

Step 1: Define a function that operates on a single entry of the column

In [16]:
# take the first letter of a value; used below to build a new column
def grab_first_letter(state):
  return state[0]

In [17]:
# quick test
grab_first_letter('NY')

'N'

In [18]:
# apply the function we just defined to the column
df_one['col2'].apply(grab_first_letter) # pass the function itself, without (); pandas calls it on every value in the column

0    N
1    C
2    W
3    W
4    A
5    N
Name: col2, dtype: object

In [19]:
df_one['first_letter'] = df_one['col2'].apply(grab_first_letter)

In [20]:
df_one

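|   | k1 | col1 | col2 | NEW | first_letter |
|---|----|------|------|-----|--------------|
| 0 | A | 100 | NY | 1000 | N |
| 1 | A | 200 | CA | 2000 | C |
| 2 | B | 300 | WA | 3000 | W |
| 3 | B | 300 | WA | 3000 | W |
| 4 | C | 400 | AK | 4000 | A |
| 5 | C | 500 | NV | 5000 | N |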
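
For a function this small, a named `def` is not strictly needed. The sketch below shows two alternatives that should give the same result: an inline `lambda` passed to `apply`, and the vectorized `.str` accessor. The Series `states` is a stand-in for `df_one['col2']`, rebuilt here so the example is self-contained.

```python
import pandas as pd

# Hypothetical stand-in for df_one['col2']
states = pd.Series(['NY', 'CA', 'WA', 'WA', 'AK', 'NV'], name='col2')

# Inline function: same logic as grab_first_letter, without a separate def
with_lambda = states.apply(lambda state: state[0])

# Vectorized string accessor: indexes into every string in the column
with_str = states.str[0]

print(with_lambda)
print(with_str)
```

The `.str` accessor is often the more idiomatic choice for simple string work, while `apply` remains the general tool for arbitrary Python logic.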


These functions can be as complex as you want, as long as they can accept the items in each row. Watch out for data type issues!

In [21]:
# the function can be as complex as you like
def complex_letter(state):
  if state[0] == 'W':
    return 'Washington'
  else:
    return 'Error'

In [22]:
# apply the function to the column
df_one['col2'].apply(complex_letter)

0         Error
1         Error
2    Washington
3    Washington
4         Error
5         Error
Name: col2, dtype: object

In [23]:
# new column
df_one['State_Check'] = df_one['col2'].apply(complex_letter)

In [24]:
df_one

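|   | k1 | col1 | col2 | NEW | first_letter | State_Check |
|---|----|------|------|-----|--------------|-------------|
| 0 | A | 100 | NY | 1000 | N | Error |
| 1 | A | 200 | CA | 2000 | C | Error |
| 2 | B | 300 | WA | 3000 | W | Washington |
| 3 | B | 300 | WA | 3000 | W | Washington |
| 4 | C | 400 | AK | 4000 | A | Error |
| 5 | C | 500 | NV | 5000 | N | Error |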


In [25]:
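# WATCH OUT FOR DATA TYPE ERRORS!
# Numbers cannot be indexed the way strings can, so state[0] fails on a numeric column.
# The call below is left commented out because running it raises a TypeError:

# df_one['col1'].apply(complex_letter)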

TypeError: ignored
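
One way to avoid this kind of failure is to check the type inside the function before indexing. The following is a minimal sketch under that assumption; `safe_first_letter` and the `mixed` Series are hypothetical names introduced here, not part of the original notebook.

```python
import pandas as pd

def safe_first_letter(value):
    """Return the first character of a non-empty string, or None otherwise."""
    if isinstance(value, str) and value:
        return value[0]
    return None

# Hypothetical column mixing strings, a number, and a missing value
mixed = pd.Series(['NY', 100, 'WA', None])
result = mixed.apply(safe_first_letter)

# Alternative: convert numbers to strings first, if their digits are what you want
digits = pd.Series([100, 200]).astype(str).apply(lambda s: s[0])

print(result)
print(digits)
```

Which option fits depends on what a non-string entry should mean in your data: a missing result, or a value to be converted.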

### Mapping

In [26]:
df_one['k1']

0    A
1    A
2    B
3    B
4    C
5    C
Name: k1, dtype: object

In [27]:
# create a dict from old values to new values
my_map = {'A': 1, 'B': 2, 'C': 3}

In [29]:
# map the values of column 'k1' through the dict
df_one['k1'].map(my_map)

0    1
1    1
2    2
3    2
4    3
5    3
Name: k1, dtype: int64

In [30]:
# store the mapped values in a new column
df_one['num'] = df_one['k1'].map(my_map)
df_one

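|   | k1 | col1 | col2 | NEW | first_letter | State_Check | num |
|---|----|------|------|-----|--------------|-------------|-----|
| 0 | A | 100 | NY | 1000 | N | Error | 1 |
| 1 | A | 200 | CA | 2000 | C | Error | 1 |
| 2 | B | 300 | WA | 3000 | W | Washington | 2 |
| 3 | B | 300 | WA | 3000 | W | Washington | 2 |
| 4 | C | 400 | AK | 4000 | A | Error | 3 |
| 5 | C | 500 | NV | 5000 | N | Error | 3 |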
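
A detail worth knowing about `map` with a dict: values that have no entry in the dict become `NaN` rather than raising an error, which also turns an integer result into floats. The sketch below illustrates this with a hypothetical Series `keys` containing an unmapped value `'D'`, and one possible way to fill the gap.

```python
import pandas as pd

# Hypothetical keys; 'D' deliberately has no entry in the mapping
keys = pd.Series(['A', 'B', 'C', 'D'], name='k1')
partial_map = {'A': 1, 'B': 2, 'C': 3}

# Unmapped values become NaN, so the result is float64 rather than int64
mapped = keys.map(partial_map)

# One option: replace the gaps with a sentinel and restore an integer dtype
mapped_filled = keys.map(partial_map).fillna(-1).astype(int)

print(mapped)
print(mapped_filled)
```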


### Locating Index Positions of Max and Min Values

In [31]:
df_one['col1'].max()

500

In [32]:
df_one['col1'].min()

100

In [34]:
# index label of the maximum (equal to its position here, since the index is the default 0..5)
df_one['col1'].idxmax()

5

In [35]:
# index label of the minimum
df_one['col1'].idxmin()

0
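
The value returned by `idxmax()` or `idxmin()` is an index label, so it can be passed to `.loc` to pull out the whole row. The sketch below shows that, and contrasts `idxmax()` with `argmax()`, which returns an integer position instead. The frame `df` and the Series `labelled` are rebuilt or invented here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'k1': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'col1': [100, 200, 300, 300, 400, 500],
                   'col2': ['NY', 'CA', 'WA', 'WA', 'AK', 'NV']})

# Use the label returned by idxmax() to fetch the full row
max_label = df['col1'].idxmax()
max_row = df.loc[max_label]

# With a non-default index, label and position differ
labelled = pd.Series([10, 30, 20], index=['x', 'y', 'z'])
label_of_max = labelled.idxmax()     # index label
position_of_max = labelled.argmax()  # integer position

print(max_row)
print(label_of_max, position_of_max)
```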

In [36]:
# column labels
df_one.columns

Index(['k1', 'col1', 'col2', 'NEW', 'first_letter', 'State_Check', 'num'], dtype='object')

In [37]:
# row index
df_one.index

RangeIndex(start=0, stop=6, step=1)

In [39]:
# rename all columns at once (the list must have one name per column)
df_one.columns = ['C1','C2','C3','C4','C5','C6', 'C7']

In [40]:
df_one

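|   | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|----|----|----|----|----|----|----|
| 0 | A | 100 | NY | 1000 | N | Error | 1 |
| 1 | A | 200 | CA | 2000 | C | Error | 1 |
| 2 | B | 300 | WA | 3000 | W | Washington | 2 |
| 3 | B | 300 | WA | 3000 | W | Washington | 2 |
| 4 | C | 400 | AK | 4000 | A | Error | 3 |
| 5 | C | 500 | NV | 5000 | N | Error | 3 |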
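
Assigning to `df.columns` replaces every name and depends on getting the order right. When only some columns need new names, `DataFrame.rename` with a dict is a possible alternative. The small frame below is a hypothetical example, not the notebook's `df_one`.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
df = pd.DataFrame({'k1': ['A', 'B'],
                   'col1': [100, 300],
                   'col2': ['NY', 'WA']})

# Rename selected columns by name; unlisted columns keep their names
renamed = df.rename(columns={'k1': 'C1', 'col2': 'C3'})

print(renamed.columns)
print(df.columns)  # the original frame is unchanged
```

Because `rename` returns a new DataFrame by default, the result has to be assigned to a variable to be kept.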


### Sorting and Ordering a DataFrame

In [41]:
# sort by the chosen column; string columns work too
df_one.sort_values('C3', ascending=False)

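|   | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|----|----|----|----|----|----|----|
| 2 | B | 300 | WA | 3000 | W | Washington | 2 |
| 3 | B | 300 | WA | 3000 | W | Washington | 2 |
| 0 | A | 100 | NY | 1000 | N | Error | 1 |
| 5 | C | 500 | NV | 5000 | N | Error | 3 |
| 1 | A | 200 | CA | 2000 | C | Error | 1 |
| 4 | C | 400 | AK | 4000 | A | Error | 3 |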
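
`sort_values` also accepts a list of columns, with a matching list for `ascending` to set the direction of each one. The sketch below uses a small hypothetical frame with column names in the style of `C1` and `C2`; note that the original index labels travel with the rows unless the index is reset.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
df = pd.DataFrame({'C1': ['A', 'A', 'B', 'C', 'C'],
                   'C2': [100, 200, 300, 400, 500]})

# Sort by C1 ascending, then by C2 descending within each C1 group
sorted_df = df.sort_values(['C1', 'C2'], ascending=[True, False])

# Optionally discard the old index labels and renumber from 0
renumbered = sorted_df.reset_index(drop=True)

print(sorted_df)
print(renumbered)
```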


### Concatenating DataFrames

In [42]:
features = pd.DataFrame({'A':[100,200,300,400,500],
                        'B':[12,13,14,15,16]})
predictions = pd.DataFrame({'pred':[0,1,1,0,1]})

In [43]:
features

|   | A | B |
|---|---|---|
| 0 | 100 | 12 |
| 1 | 200 | 13 |
| 2 | 300 | 14 |
| 3 | 400 | 15 |
| 4 | 500 | 16 |


In [44]:
predictions

|   | pred |
|---|------|
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
| 4 | 1 |


In [46]:
# concatenate the DataFrames row-wise; columns missing from one frame are filled with NaN
pd.concat([features, predictions]) # axis=0 is the default

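|   | A | B | pred |
|---|---|---|------|
| 0 | 100.0 | 12.0 | NaN |
| 1 | 200.0 | 13.0 | NaN |
| 2 | 300.0 | 14.0 | NaN |
| 3 | 400.0 | 15.0 | NaN |
| 4 | 500.0 | 16.0 | NaN |
| 0 | NaN | NaN | 0.0 |
| 1 | NaN | NaN | 1.0 |
| 2 | NaN | NaN | 1.0 |
| 3 | NaN | NaN | 0.0 |
| 4 | NaN | NaN | 1.0 |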


In [47]:
# change the direction of the concatenation: axis=1 places the frames side by side, aligned on the index
pd.concat([features, predictions], axis = 1)

|   | A | B | pred |
|---|---|---|------|
| 0 | 100 | 12 | 0 |
| 1 | 200 | 13 | 1 |
| 2 | 300 | 14 | 1 |
| 3 | 400 | 15 | 0 |
| 4 | 500 | 16 | 1 |
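
Row-wise concatenation is most natural when the frames share the same columns. In that case the original index labels are kept, so they repeat; passing `ignore_index=True` renumbers the result. The sketch below uses two hypothetical frames, `batch_one` and `batch_two`, to show the difference.

```python
import pandas as pd

# Hypothetical frames with identical columns
batch_one = pd.DataFrame({'A': [100, 200], 'B': [12, 13]})
batch_two = pd.DataFrame({'A': [300, 400], 'B': [14, 15]})

# Default: each frame keeps its own index labels, so 0 and 1 appear twice
stacked = pd.concat([batch_one, batch_two])

# ignore_index=True discards the old labels and renumbers from 0
stacked_clean = pd.concat([batch_one, batch_two], ignore_index=True)

print(stacked)
print(stacked_clean)
```

Since the columns match, no `NaN` values are introduced and the integer dtype is preserved.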


### Creating Dummy Variables

In [48]:
df_one

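|   | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|----|----|----|----|----|----|----|
| 0 | A | 100 | NY | 1000 | N | Error | 1 |
| 1 | A | 200 | CA | 2000 | C | Error | 1 |
| 2 | B | 300 | WA | 3000 | W | Washington | 2 |
| 3 | B | 300 | WA | 3000 | W | Washington | 2 |
| 4 | C | 400 | AK | 4000 | A | Error | 3 |
| 5 | C | 500 | NV | 5000 | N | Error | 3 |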


In [49]:
# build a DataFrame with one column per category of the given column, set to 1 or 0 on each row
pd.get_dummies(df_one['C1'])

|   | A | B | C |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 |
| 2 | 0 | 1 | 0 |
| 3 | 0 | 1 | 0 |
| 4 | 0 | 0 | 1 |
| 5 | 0 | 0 | 1 |
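
A few `get_dummies` parameters are worth a quick sketch. Recent pandas versions may return boolean columns by default, so `dtype=int` is passed here to get 1/0 output like the table above. `prefix` labels the new columns, and `drop_first=True` omits the first category, which is a common choice when the dummies feed a linear model. The Series `keys` is a stand-in for `df_one['C1']`.

```python
import pandas as pd

# Hypothetical stand-in for df_one['C1']
keys = pd.Series(['A', 'A', 'B', 'B', 'C', 'C'], name='C1')

# Explicit integer dtype gives 1/0 regardless of the pandas default
dummies = pd.get_dummies(keys, dtype=int)

# Prefixed column names, with the first category ('A') dropped
reduced = pd.get_dummies(keys, prefix='C1', drop_first=True, dtype=int)

print(dummies)
print(reduced)
```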
