# Answering pending questions:
* **Q:** Is it possible to permanently set inplace to True?
    * **A:** In principle, no. See [this excerpt](https://github.com/pandas-dev/pandas/blob/v1.3.0/pandas/core/series.py#L1308-L1441) of the Pandas source code. Take the reset_index function as an example: the code decides which object to modify starting at line 1427, and that decision depends on the optional inplace parameter supplied in the call, as we can see on line 1417.
* **Q:** Is it possible to find out which variables exist?
    * **A:** Yes! The functions dir(), locals() and globals() return the names in the current scope, the local variables, and the global variables, respectively.
* **Q:** Does VS Code detect variables that exist in memory but no longer in the script?
    * **A:** The functions dir(), locals() and globals() are part of plain Python, so they work anywhere: in Jupyter, in the terminal, in a .py file edited in a plain text editor, or in an IDE.
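
As a minimal sketch of the introspection functions mentioned above (the names `x`, `y` and `show_locals` are illustrative and do not come from the notebook):

```python
x = 10  # a variable defined at the top level


def show_locals():
    y = 20  # a local variable, visible only inside this function
    return sorted(locals())  # names of the local variables


local_names = show_locals()
has_x = "x" in dir()  # dir() without arguments lists the names in the current scope
print(local_names, has_x)
```

Because these functions inspect the running interpreter, they report whatever is currently in memory, even if the line that created a variable has since been deleted from the script.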

# Creating DataFrames

## Way 1: Providing a dictionary
* Each key is the name of a column
* Each value is a list that forms a column

In [427]:
import pandas as pd

df = pd.DataFrame({'Nome':['Adalberto','Bernardo','Carlos','Daniel','Ernesto'],
                   'Idade':[15,24,86,53,56],
                   'Peso':[50.5,80.3,75.3,64.2,68.9],
                   'Altura':[1.7,1.8,1.7,1.9,2.0]})

df

Unnamed: 0,Nome,Idade,Peso,Altura
0,Adalberto,15,50.5,1.7
1,Bernardo,24,80.3,1.8
2,Carlos,86,75.3,1.7
3,Daniel,53,64.2,1.9
4,Ernesto,56,68.9,2.0


In [428]:
df.shape

(5, 4)

In [429]:
len(df)

5

In [431]:
df.values

array([['Adalberto', 15, 50.5, 1.7],
       ['Bernardo', 24, 80.3, 1.8],
       ['Carlos', 86, 75.3, 1.7],
       ['Daniel', 53, 64.2, 1.9],
       ['Ernesto', 56, 68.9, 2.0]], dtype=object)

In [432]:
type(df.values)

numpy.ndarray

In [433]:
df.Idade.values

array([15, 24, 86, 53, 56], dtype=int64)

In [271]:
df.Peso

Adalberto    50.5
Bernardo     80.3
Carlos       75.3
Daniel       64.2
Ernesto      68.9
Fernanda      NaN
Name: Peso, dtype: float64

### Making a comparison to get a boolean result and using that boolean result as a filter

In [289]:
df.loc[(df.Peso > 70) & (df.Idade > 50)]

Unnamed: 0,Idade,Peso,Altura
Carlos,86.0,75.3,1.7
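
To make the intermediate step visible, here is a small sketch that builds the boolean mask first and then applies it. The frame `people` is an illustrative subset of the data above, not the notebook's `df`.

```python
import pandas as pd

people = pd.DataFrame(
    {"Idade": [15, 24, 86, 53], "Peso": [50.5, 80.3, 75.3, 64.2]},
    index=["Adalberto", "Bernardo", "Carlos", "Daniel"],
)

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (people.Peso > 70) & (people.Idade > 50)
print(mask)

# Passing the mask to loc keeps only the rows where it is True.
selected = people.loc[mask]
print(selected)
```

The parentheses around each comparison are required, because `&` binds more tightly than `>` in Python.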


# Modifying the row labels
### Note that without inplace=True the object would not be modified

In [253]:
df.set_index("Nome",inplace=True)

In [254]:
df

Unnamed: 0_level_0,Idade,Peso,Altura
Nome,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Adalberto,15,50.5,1.7
Bernardo,24,80.3,1.8
Carlos,86,75.3,1.7
Daniel,53,64.2,1.9
Ernesto,56,68.9,2.0
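
A hedged sketch of the alternative to `inplace=True`: keep the returned object instead. The frame `people` is illustrative.

```python
import pandas as pd

people = pd.DataFrame({"Nome": ["Adalberto", "Bernardo"], "Idade": [15, 24]})

# Without inplace=True, set_index returns a new DataFrame
# and leaves the original untouched.
indexed = people.set_index("Nome")

print(people.index.tolist())   # the original keeps its default integer index
print(indexed.index.tolist())  # the new object is indexed by name
```

Reassigning the result (`people = people.set_index("Nome")`) has the same visible effect as `inplace=True`.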


# Getting the data
## Way 1: Using the labels to get data through **loc**

In [255]:
df.loc["Bernardo"]

Idade     24.0
Peso      80.3
Altura     1.8
Name: Bernardo, dtype: float64

In [256]:
df.loc["Bernardo","Idade"]

24

In [257]:
df.loc["Bernardo":"Daniel","Idade":"Peso"]

Unnamed: 0_level_0,Idade,Peso
Nome,Unnamed: 1_level_1,Unnamed: 2_level_1
Bernardo,24,80.3
Carlos,86,75.3
Daniel,53,64.2


# Creating DataFrames
## Way 2: Providing a matrix, a list of row labels, and a list of column labels
* Note that the rows of the matrix represent rows of the DataFrame, whereas in "way 1" the lists represented columns
* I am using NumPy to transpose the matrix so I don't have to do it by hand
* The matrix can be a NumPy array, as it is here

In [262]:
import numpy as np

m = np.array([[15,24,86,53,56],
                   [50.5,80.3,75.3,64.2,68.9],
                   [1.7,1.8,1.7,1.9,2.0]])

m = m.transpose()

df = pd.DataFrame(m,
                 index=['Adalberto','Bernardo','Carlos','Daniel','Ernesto'],
                 columns=["Idade","Peso","Altura"])

df

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9
Ernesto,56.0,68.9,2.0


# Getting data
## Way 2: Using the numeric position index through **iloc**

In [263]:
df.iloc[2,1]

75.3

In [264]:
df.iloc[1:3,0:2]

Unnamed: 0,Idade,Peso
Bernardo,24.0,80.3
Carlos,86.0,75.3
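
One difference worth showing explicitly: a label slice with `loc` includes both endpoints, while a positional slice with `iloc` excludes the stop position. A minimal sketch on an illustrative frame:

```python
import pandas as pd

people = pd.DataFrame(
    {"Idade": [15, 24, 86, 53], "Peso": [50.5, 80.3, 75.3, 64.2]},
    index=["Adalberto", "Bernardo", "Carlos", "Daniel"],
)

by_label = people.loc["Bernardo":"Carlos"]  # label slice: both endpoints included
by_position = people.iloc[1:3]              # positional slice: position 3 is excluded

# Both selections contain exactly the rows Bernardo and Carlos.
same_rows = by_label.equals(by_position)
print(same_rows)
```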


# An extra difference between loc and iloc:
### loc allows creating new rows/columns

In [265]:
df.loc["Fernanda","Idade"] = 30

df

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9
Ernesto,56.0,68.9,2.0
Fernanda,30.0,,


In [425]:
# NaN is NumPy's "not a number" value:
np.nan

nan

### Creating the whole row at once with **loc**

In [290]:
df.loc["Gabriel"] = [1,2,3]

### loc also creates a column, but only when setting one value at a time

In [291]:
df.loc["Gabriel","UF"] = "ES"

In [292]:
df

Unnamed: 0,Idade,Peso,Altura,UF
Adalberto,15.0,50.5,1.7,
Bernardo,24.0,80.3,1.8,
Carlos,86.0,75.3,1.7,
Daniel,53.0,64.2,1.9,
Ernesto,56.0,68.9,2.0,
Fernanda,30.0,,,
Gabriel,1.0,2.0,3.0,ES


### We can insert the whole column at once with **.insert**

In [293]:
df.insert(4,"Cidade",["Marechal Floriano","Marechal Floriano","Marechal Floriano","Vitória","Vitória","Vitória","Domingos Martins"])

In [294]:
df

Unnamed: 0,Idade,Peso,Altura,UF,Cidade
Adalberto,15.0,50.5,1.7,,Marechal Floriano
Bernardo,24.0,80.3,1.8,,Marechal Floriano
Carlos,86.0,75.3,1.7,,Marechal Floriano
Daniel,53.0,64.2,1.9,,Vitória
Ernesto,56.0,68.9,2.0,,Vitória
Fernanda,30.0,,,,Vitória
Gabriel,1.0,2.0,3.0,ES,Domingos Martins


### Getting only the numeric columns so we can use math functions without worry:

In [297]:
df2 = df.iloc[:,:3]
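
Selecting the first three columns by position works here because the numeric columns happen to come first. As a hedged alternative, `select_dtypes` picks columns by type instead of position; the frame `people` below is illustrative.

```python
import pandas as pd

people = pd.DataFrame(
    {"Idade": [15.0, 24.0], "Peso": [50.5, 80.3], "UF": ["ES", None]},
    index=["Adalberto", "Bernardo"],
)

# Keep only the numeric columns, regardless of where they sit in the frame.
numeric = people.select_dtypes(include="number")
print(numeric.columns.tolist())
```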

In [299]:
df2.sum(axis=0)

Idade     265.0
Peso      341.2
Altura     12.1
dtype: float64

In [300]:
df2.sum(axis=1)

Adalberto     67.2
Bernardo     106.1
Carlos       163.0
Daniel       119.1
Ernesto      126.9
Fernanda      30.0
Gabriel        6.0
dtype: float64

In [301]:
df2.mean(axis=1)

Adalberto    22.400000
Bernardo     35.366667
Carlos       54.333333
Daniel       39.700000
Ernesto      42.300000
Fernanda     30.000000
Gabriel       2.000000
dtype: float64

### max does not return a row: it takes the maximum of each column independently, so the values come from different rows:

In [302]:
df2.max()

Idade     86.0
Peso      80.3
Altura     3.0
dtype: float64

### Comparing with max and using the boolean result as a filter to get a single, consistent row

In [304]:
df2.loc[df2.Peso == df2.Peso.max()]

Unnamed: 0,Idade,Peso,Altura
Bernardo,24.0,80.3,1.8
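
Another way to reach the same row, sketched on an illustrative frame, is `idxmax`, which returns the index label of the maximum instead of a boolean mask:

```python
import pandas as pd

weights = pd.DataFrame(
    {"Idade": [15.0, 24.0, 86.0], "Peso": [50.5, 80.3, 75.3]},
    index=["Adalberto", "Bernardo", "Carlos"],
)

heaviest_label = weights.Peso.idxmax()      # index label of the first maximum
heaviest_row = weights.loc[heaviest_label]  # that single row, as a Series
print(heaviest_label)
print(heaviest_row)
```

Note that `idxmax` returns only the first occurrence, whereas the boolean filter keeps every row tied for the maximum.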


## Viewing only the beginning

In [305]:
df.head(4)

Unnamed: 0,Idade,Peso,Altura,UF,Cidade
Adalberto,15.0,50.5,1.7,,Marechal Floriano
Bernardo,24.0,80.3,1.8,,Marechal Floriano
Carlos,86.0,75.3,1.7,,Marechal Floriano
Daniel,53.0,64.2,1.9,,Vitória


## Viewing only the end

In [306]:
df.tail(4)

Unnamed: 0,Idade,Peso,Altura,UF,Cidade
Daniel,53.0,64.2,1.9,,Vitória
Ernesto,56.0,68.9,2.0,,Vitória
Fernanda,30.0,,,,Vitória
Gabriel,1.0,2.0,3.0,ES,Domingos Martins


## Getting the 4 smallest ages

In [310]:
df.sort_values(by="Idade").head(4)

Unnamed: 0,Idade,Peso,Altura,UF,Cidade
Gabriel,1.0,2.0,3.0,ES,Domingos Martins
Adalberto,15.0,50.5,1.7,,Marechal Floriano
Bernardo,24.0,80.3,1.8,,Marechal Floriano
Fernanda,30.0,,,,Vitória
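
As a sketch of a shorter route to the same kind of result, `nsmallest` and `nlargest` combine the sort and the `head` call. The frame `people` is illustrative.

```python
import pandas as pd

people = pd.DataFrame(
    {"Idade": [15, 24, 86, 53, 1]},
    index=["Adalberto", "Bernardo", "Carlos", "Daniel", "Gabriel"],
)

youngest = people.nsmallest(2, "Idade")  # the two rows with the lowest Idade
oldest = people.nlargest(2, "Idade")     # the two rows with the highest Idade
print(youngest)
print(oldest)
```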


## Getting the largest ages

In [332]:
df.sort_values(by="Idade",ascending=False)

Unnamed: 0,Idade,Peso,Altura,UF,Cidade
Carlos,86.0,75.3,1.7,,Marechal Floriano
Ernesto,56.0,68.9,2.0,,Vitória
Daniel,53.0,64.2,1.9,,Vitória
Fernanda,30.0,,,,Vitória
Bernardo,24.0,80.3,1.8,,Marechal Floriano
Adalberto,15.0,50.5,1.7,,Marechal Floriano
Gabriel,1.0,2.0,3.0,ES,Domingos Martins


## We can use this to detect anomalies
### An alternative is max:

In [311]:
df.max()

Idade          86
Peso         80.3
Altura          3
Cidade    Vitória
dtype: object

## Changing the number of rows displayed

In [315]:
# Lower this value to see the difference
pd.options.display.max_rows = 10
df

Unnamed: 0,Idade,Peso,Altura,UF,Cidade
Adalberto,15.0,50.5,1.7,,Marechal Floriano
Bernardo,24.0,80.3,1.8,,Marechal Floriano
Carlos,86.0,75.3,1.7,,Marechal Floriano
Daniel,53.0,64.2,1.9,,Vitória
Ernesto,56.0,68.9,2.0,,Vitória
Fernanda,30.0,,,,Vitória
Gabriel,1.0,2.0,3.0,ES,Domingos Martins


## Removing a column

In [318]:
df.drop("Altura",axis=1)

Unnamed: 0,Idade,Peso,UF,Cidade
Adalberto,15.0,50.5,,Marechal Floriano
Bernardo,24.0,80.3,,Marechal Floriano
Carlos,86.0,75.3,,Marechal Floriano
Daniel,53.0,64.2,,Vitória
Ernesto,56.0,68.9,,Vitória
Fernanda,30.0,,,Vitória
Gabriel,1.0,2.0,ES,Domingos Martins


# Adding without getting NaN

In [322]:
df2.add(10,fill_value=0)

Unnamed: 0,Idade,Peso,Altura
Adalberto,25.0,60.5,11.7
Bernardo,34.0,90.3,11.8
Carlos,96.0,85.3,11.7
Daniel,63.0,74.2,11.9
Ernesto,66.0,78.9,12.0
Fernanda,40.0,10.0,10.0
Gabriel,11.0,12.0,13.0


In [323]:
df_a = df2.iloc[0:4]

df_a

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9


In [330]:
df_b = df2.tail(5)

df_b

Unnamed: 0,Idade,Peso,Altura
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9
Ernesto,56.0,68.9,2.0
Fernanda,30.0,,
Gabriel,1.0,2.0,3.0


In [326]:
df_a.drop("UF",axis=1) + df_b

Unnamed: 0,Altura,Cidade,Idade,Peso,UF
Adalberto,,,,,
Bernardo,,,,,
Carlos,3.4,,172.0,150.6,
Daniel,3.8,,106.0,128.4,
Ernesto,,,,,
Fernanda,,,,,
Gabriel,,,,,


In [331]:
df_a.add(df_b,fill_value=0)

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8
Carlos,172.0,150.6,3.4
Daniel,106.0,128.4,3.8
Ernesto,56.0,68.9,2.0
Fernanda,30.0,,
Gabriel,1.0,2.0,3.0
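
The Fernanda row above still shows missing values because `fill_value` only helps when a value is missing on one side. A minimal sketch with two illustrative frames, `a` and `b`:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({"Peso": [50.5, np.nan]}, index=["Adalberto", "Fernanda"])
b = pd.DataFrame({"Peso": [np.nan, 2.0]}, index=["Fernanda", "Gabriel"])

plain = a + b                    # a label missing on either side gives NaN
filled = a.add(b, fill_value=0)  # a value missing on one side is treated as 0

print(plain)
print(filled)
```

In `filled`, Adalberto and Gabriel keep their single available value, while Fernanda stays NaN because the value is missing in both frames.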


# Statistical methods

In [333]:
df2.sum()

Idade     265.0
Peso      341.2
Altura     12.1
dtype: float64

In [334]:
df2.sum(axis=1)

Adalberto     67.2
Bernardo     106.1
Carlos       163.0
Daniel       119.1
Ernesto      126.9
Fernanda      30.0
Gabriel        6.0
dtype: float64

In [335]:
df2.cumsum()

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,39.0,130.8,3.5
Carlos,125.0,206.1,5.2
Daniel,178.0,270.3,7.1
Ernesto,234.0,339.2,9.1
Fernanda,264.0,,
Gabriel,265.0,341.2,12.1


In [336]:
df.count()

Idade     7
Peso      6
Altura    6
UF        1
Cidade    7
dtype: int64

In [337]:
df.max()

Idade          86
Peso         80.3
Altura          3
Cidade    Vitória
dtype: object

In [340]:
df.describe().loc["count","Idade"]

7.0

In [342]:
g = df.groupby("Cidade")

In [343]:
g.sum()

Unnamed: 0_level_0,Idade,Peso,Altura
Cidade,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Domingos Martins,1.0,2.0,3.0
Marechal Floriano,125.0,206.1,5.2
Vitória,139.0,133.1,3.9


In [344]:
g.mean()

Unnamed: 0_level_0,Idade,Peso,Altura
Cidade,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Domingos Martins,1.0,2.0,3.0
Marechal Floriano,41.666667,68.7,1.733333
Vitória,46.333333,66.55,1.95
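
When several statistics are needed per group, `agg` can compute them in a single call. A sketch on an illustrative frame:

```python
import pandas as pd

people = pd.DataFrame(
    {
        "Cidade": ["Vitória", "Vitória", "Marechal Floriano"],
        "Idade": [53.0, 56.0, 15.0],
    }
)

# One row per city, one column per requested statistic.
summary = people.groupby("Cidade")["Idade"].agg(["mean", "max", "count"])
print(summary)
```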


# Reading from disk
## CSV

In [346]:
df3 = pd.read_csv("cases-brazil-cities-time.csv",sep=",")

df3

Unnamed: 0,epi_week,date,country,state,city,ibgeID,cod_RegiaoDeSaude,name_RegiaoDeSaude,newDeaths,deaths,newCases,totalCases,deaths_per_100k_inhabitants,totalCases_per_100k_inhabitants,deaths_by_totalCases,_source,last_info_date
0,9,2020-02-25,Brazil,SP,São Paulo/SP,3550308,35016.0,São Paulo,0,0,1,1,0.00000,0.00811,0.00000,SES,2021-05-24
1,9,2020-02-25,Brazil,TOTAL,TOTAL,0,,,0,0,1,1,0.00000,0.00047,0.00000,,
2,9,2020-02-26,Brazil,SP,São Paulo/SP,3550308,35016.0,São Paulo,0,0,0,1,0.00000,0.00811,0.00000,SES,2021-05-24
3,9,2020-02-26,Brazil,TOTAL,TOTAL,0,,,0,0,0,1,0.00000,0.00047,0.00000,,
4,9,2020-02-27,Brazil,SP,São Paulo/SP,3550308,35016.0,São Paulo,0,0,0,1,0.00000,0.00811,0.00000,SES,2021-05-24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2131457,121,2021-05-26,Brazil,PR,Ângulo/PR,4101150,41015.0,15ª RS Maringá,0,6,0,283,204.77816,9658.70307,0.02120,SES,2021-05-26
2131458,121,2021-05-26,Brazil,BA,Érico Cardoso/BA,2900504,29003.0,Brumado,0,2,5,240,18.93939,2272.72727,0.00833,MS,2021-05-26
2131459,121,2021-05-26,Brazil,PA,Óbidos/PA,1505106,15002.0,Baixo Amazonas,0,116,0,6062,221.77188,11589.49260,0.01914,MS,2021-05-26
2131460,121,2021-05-26,Brazil,SP,Óleo/SP,3533809,35094.0,Ourinhos,0,1,1,71,40.46945,2873.33064,0.01408,MS,2021-05-26


## Grouping

In [347]:
df3.groupby("state")

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001F4CCB1A490>

In [350]:
df4 = df3.groupby("state").sum()
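
A note on versions, offered as a hedged sketch: in pandas 2.x, `sum()` on a group no longer drops text columns by default, so passing `numeric_only=True` keeps the result limited to numeric columns. The frame `cases` is a tiny illustrative stand-in for `df3`.

```python
import pandas as pd

cases = pd.DataFrame(
    {
        "state": ["SP", "SP", "ES"],
        "city": ["São Paulo/SP", "Óleo/SP", "Vitória/ES"],
        "newCases": [1, 1, 2],
    }
)

# Sum only the numeric columns within each state.
totals = cases.groupby("state").sum(numeric_only=True)
print(totals)
```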

In [352]:
df4.to_csv("soma_por_estado.csv")

## Excel

In [353]:
df4.to_excel("soma_por_estado.xlsx",sheet_name="Aba 1")

In [354]:
df = pd.read_excel("soma_por_estado.xlsx",sheet_name = "Aba 1")

In [355]:
df

Unnamed: 0,state,epi_week,ibgeID,cod_RegiaoDeSaude,newDeaths,deaths,newCases,totalCases,deaths_per_100k_inhabitants,totalCases_per_100k_inhabitants,deaths_by_totalCases
0,AC,571606,10441859656,104406567,1653,297681,81853,14683162,5.102845e+05,3.647672e+07,139.38030
1,AL,2547988,106923184072,1067552596,4661,891012,190379,36708453,1.945994e+06,8.524004e+07,1574.31311
2,AM,1574765,32508873492,324662279,12943,2406643,383980,74494843,2.156820e+06,1.440802e+08,543.30396
3,AP,425155,10311196836,103099711,1674,325691,111097,22729337,3.873586e+05,5.686922e+07,66.28714
4,BA,10279272,464079779466,4615777919,20726,3225506,995364,163859066,5.790367e+06,3.651809e+08,3463.57588
...,...,...,...,...,...,...,...,...,...,...,...
23,SC,7281003,476517368203,4754890990,15006,1888572,956526,148412728,6.601104e+06,5.040826e+08,1935.46407
24,SE,1882643,82166100202,820637708,4981,864214,228827,38703150,2.212429e+06,8.715154e+07,1024.55679
25,SP,15964533,884629015192,8799236312,109241,17108550,3226875,512908354,1.614940e+07,6.406230e+08,8799.68720
26,TO,3438397,90086125830,895016228,2813,435380,175150,30034594,3.015635e+06,1.842559e+08,1171.95422


## SQL

This example requires the pymysql driver.

From the Anaconda prompt: conda install -c anaconda pymysql

In [356]:
import sqlalchemy as sqla

db = sqla.create_engine("mysql+pymysql://root@localhost/jpa_tutorial")

pd.read_sql("select * from teste",db)

Unnamed: 0,id,nome,sobrenome,idade,peso,altura,UF
0,1,Alberto,Gonçalves,25,92.3,1.76,ES
1,2,Bernardo,Miranda,52,76.5,1.65,SP
2,3,Carlos,Nascimento,24,84.1,1.79,RJ
3,4,Daniel,Vieira,65,81.7,1.9,MG
4,5,Ernesto,Lopes,44,79.6,1.82,BA
5,6,Fernanda,Ruas,13,55.2,1.52,MS
6,7,Gabriela,Botafogo,77,67.6,1.67,CE
7,8,Hiago,de Paula,42,75.2,1.8,AM
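
For readers without a MySQL server, the same `read_sql` call can be tried against an in-memory SQLite database from the standard library. This is a sketch under that substitution; the two rows are a subset of the table above.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # temporary database, discarded on close

# Create a small "teste" table from a DataFrame.
pd.DataFrame({"nome": ["Alberto", "Bernardo"], "idade": [25, 52]}).to_sql(
    "teste", conn, index=False
)

# Query parameters are passed separately instead of being pasted into the SQL text.
older = pd.read_sql("select * from teste where idade > ?", conn, params=(30,))
print(older)

conn.close()
```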


In [423]:
import requests

url = "https://api.github.com/repos/pandas-dev/pandas/issues"

resp = requests.get(url)

data = resp.json()

issues = pd.DataFrame(data)

issues

Unnamed: 0,url,repository_url,labels_url,comments_url,events_url,html_url,id,node_id,number,title,...,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,pull_request,body,performed_via_github_app
0,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/42350,936086587,MDExOlB1bGxSZXF1ZXN0NjgyODk3NzUy,42350,BUG: Don't cache args during rolling/expanding...,...,{'url': 'https://api.github.com/repos/pandas-d...,0,2021-07-02T21:59:52Z,2021-07-02T22:01:02Z,,MEMBER,,{'url': 'https://api.github.com/repos/pandas-d...,- [x] closes #42287\r\n- [x] tests added / pas...,
1,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/issues/42349,936001052,MDU6SXNzdWU5MzYwMDEwNTI=,42349,API: pd.Label to disambiguate e.g. MultiIndex ...,...,,1,2021-07-02T18:51:15Z,2021-07-02T19:11:53Z,,MEMBER,,,The 'level' argument in MultiIndex methods can...,
2,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/issues/42347,935912349,MDU6SXNzdWU5MzU5MTIzNDk=,42347,RLS: Missing assets in release 1.3.0,...,,3,2021-07-02T16:26:12Z,2021-07-02T17:43:03Z,,CONTRIBUTOR,,,- [x] I have checked that this issue has not a...,
3,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/42346,935882908,MDExOlB1bGxSZXF1ZXN0NjgyNzI0MTUx,42346,Rename index when using DataFrame.reset_index,...,,0,2021-07-02T15:42:15Z,2021-07-02T15:42:49Z,,NONE,,{'url': 'https://api.github.com/repos/pandas-d...,- [x] closes #6878\r\n- [x] tests added / pass...,
4,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/issues/42345,935876620,MDU6SXNzdWU5MzU4NzY2MjA=,42345,BUG: regression for 1.3.0: saving a dataframe ...,...,,0,2021-07-02T15:33:25Z,2021-07-02T15:37:42Z,,CONTRIBUTOR,,,- [x] I have checked that this issue has not a...,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
25,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/42304,932785286,MDExOlB1bGxSZXF1ZXN0NjgwMDg5NDk5,42304,DEPS: update setuptools min version,...,{'url': 'https://api.github.com/repos/pandas-d...,30,2021-06-29T14:51:46Z,2021-07-02T20:28:20Z,,MEMBER,,{'url': 'https://api.github.com/repos/pandas-d...,According to https://setuptools.readthedocs.io...,
26,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/issues/42303,932738401,MDU6SXNzdWU5MzI3Mzg0MDE=,42303,BUG: `__array_ufunc__` with for functions with...,...,,0,2021-06-29T14:17:16Z,2021-06-29T14:17:16Z,,CONTRIBUTOR,,,- [x] I have checked that this issue has not a...,
27,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/pull/42301,932653321,MDExOlB1bGxSZXF1ZXN0Njc5OTcyNDIx,42301,ENH: `Styler.bar` extended to allow centering ...,...,,0,2021-06-29T13:14:18Z,2021-06-30T18:49:56Z,,CONTRIBUTOR,,{'url': 'https://api.github.com/repos/pandas-d...,This refactors `Styler.bar` to allow more flex...,
28,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://api.github.com/repos/pandas-dev/pandas...,https://github.com/pandas-dev/pandas/issues/42295,932388491,MDU6SXNzdWU5MzIzODg0OTE=,42295,BUG: df.where() inconsistently casts columns t...,...,,0,2021-06-29T08:58:21Z,2021-06-30T07:15:43Z,,NONE,,,- [x] I have checked that this issue has not a...,


### These are the columns

In [424]:
issues.columns

Index(['url', 'repository_url', 'labels_url', 'comments_url', 'events_url',
       'html_url', 'id', 'node_id', 'number', 'title', 'user', 'labels',
       'state', 'locked', 'assignee', 'assignees', 'milestone', 'comments',
       'created_at', 'updated_at', 'closed_at', 'author_association',
       'active_lock_reason', 'pull_request', 'body',
       'performed_via_github_app'],
      dtype='object')
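
With that many columns, it often helps to keep only a few. A sketch of the selection, using a short hand-written list in place of the live API response (the values are copied from the output above):

```python
import pandas as pd

# Stand-in for resp.json(): a list of dictionaries, one per issue.
data = [
    {"number": 42347, "title": "RLS: Missing assets in release 1.3.0",
     "state": "open", "comments": 3},
    {"number": 42346, "title": "Rename index when using DataFrame.reset_index",
     "state": "open", "comments": 0},
]

all_columns = pd.DataFrame(data)

# A list of column names inside [] selects just those columns, in that order.
slim = all_columns[["number", "title", "state"]]
print(slim)
```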

# Cleaning

## Counting null values in the whole df

In [368]:
df2.isnull().sum().sum()

2
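
The double `sum()` works in two steps: the first counts nulls per column, the second adds those counts together. A sketch on an illustrative frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"Idade": [30.0, 1.0], "Peso": [np.nan, 2.0], "Altura": [np.nan, 3.0]},
    index=["Fernanda", "Gabriel"],
)

per_column = frame.isnull().sum()  # number of nulls in each column
total = per_column.sum()           # number of nulls in the whole frame
print(per_column)
print(total)
```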

## Changing the encoding of the file being read
### An error is expected in this case, since I did not have a file with a different encoding at hand

In [370]:
pd.read_csv("cases-brazil-cities-time.csv",encoding="Windows-1252")

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 64510: character maps to <undefined>
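
For contrast, here is a sketch of the case where the `encoding` argument is actually needed: the bytes below are encoded as Latin-1 in memory (an assumption made for the example, not a file from the notebook) and read back with the matching encoding.

```python
import io

import pandas as pd

# CSV content encoded as Latin-1 rather than UTF-8.
raw_bytes = "cidade\nVitória\nSão Paulo\n".encode("latin-1")

# Telling read_csv the real encoding decodes the accented characters correctly.
decoded = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(decoded)
```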

# Handling nulls

In [372]:
df2.to_csv("file.csv")

In [376]:
teste = pd.read_csv("file.csv",na_values=["sn","s/n"])

teste

Unnamed: 0.1,Unnamed: 0,Idade,Peso,Altura
0,Adalberto,15.0,50.5,1.7
1,Bernardo,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9
4,Ernesto,56.0,68.9,2.0
5,Fernanda,30.0,,
6,Gabriel,1.0,2.0,3.0
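
To see what `na_values` changes, this sketch reads the same illustrative CSV text twice, once without and once with the extra null markers:

```python
import io

import pandas as pd

raw = "Nome,Idade,Peso,Altura\nFernanda,30.0,sn,s/n\nGabriel,1.0,2.0,3.0\n"

# Without na_values, the markers stay as text and the columns hold strings.
as_text = pd.read_csv(io.StringIO(raw))

# With na_values, the markers become NaN and the columns are parsed as floats.
as_nan = pd.read_csv(io.StringIO(raw), na_values=["sn", "s/n"])

print(as_text.Peso.tolist())
print(as_nan.Peso.tolist())
```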


In [377]:
teste2 = pd.read_csv("file.csv")

teste2

Unnamed: 0.1,Unnamed: 0,Idade,Peso,Altura
0,Adalberto,15.0,50.5,1.7
1,Bernardo,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9
4,Ernesto,56.0,68.9,2.0
5,Fernanda,30.0,sn,s/n
6,Gabriel,1.0,2.0,3.0


In [380]:
teste2.replace("sn",np.nan,inplace=True)
teste2.replace("s/n",np.nan,inplace=True)
teste2

Unnamed: 0.1,Unnamed: 0,Idade,Peso,Altura
0,Adalberto,15.0,50.5,1.7
1,Bernardo,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9
4,Ernesto,56.0,68.9,2.0
5,Fernanda,30.0,,
6,Gabriel,1.0,2.0,3.0


## CSV header

In [383]:
teste2 = pd.read_csv("file.csv",header=1)

teste2

Unnamed: 0.1,Unnamed: 0,Idade,Peso,Altura
0,Adalberto,15.0,50.5,1.7
1,Bernardo,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9
4,Ernesto,56.0,68.9,2.0
5,Fernanda,30.0,sn,s/n
6,Gabriel,1.0,2.0,3.0


## Converting to numeric data

In [385]:
teste2.set_index("Unnamed: 0",inplace=True)

teste2.loc["Gabriel"] = ["1","2","3"]

teste2

Unnamed: 0_level_0,Idade,Peso,Altura
Unnamed: 0,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Adalberto,15,50.5,1.7
Bernardo,24,80.3,1.8
Carlos,86,75.3,1.7
Daniel,53,64.2,1.9
Ernesto,56,68.9,2.0
Fernanda,30,sn,s/n
Gabriel,1,2,3


In [387]:
pd.to_numeric(teste2.loc["Gabriel"])

Idade     1
Peso      2
Altura    3
Name: Gabriel, dtype: int64
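
When some entries cannot be parsed, `to_numeric` raises an error by default. A sketch of the `errors="coerce"` option, which turns unparseable entries into NaN instead (the Series `raw` is illustrative):

```python
import pandas as pd

raw = pd.Series(["1", "2", "s/n"], index=["Idade", "Peso", "Altura"])

# Entries that are not valid numbers become NaN instead of raising an error.
parsed = pd.to_numeric(raw, errors="coerce")
print(parsed)
```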

## Filling nulls

In [389]:
df2.fillna(0)

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9
Ernesto,56.0,68.9,2.0
Fernanda,30.0,0.0,0.0
Gabriel,1.0,2.0,3.0


In [390]:
df2.median()

Idade     30.00
Peso      66.55
Altura     1.85
dtype: float64

In [391]:
df2.Peso.fillna(df2.Peso.median())

Adalberto    50.50
Bernardo     80.30
Carlos       75.30
Daniel       64.20
Ernesto      68.90
Fernanda     66.55
Gabriel       2.00
Name: Peso, dtype: float64
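
The same idea extends to every column at once: passing the Series of medians to `fillna` fills each column with its own median. A sketch on an illustrative frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"Peso": [50.5, np.nan, 80.3], "Altura": [1.7, np.nan, 1.9]},
    index=["Adalberto", "Fernanda", "Bernardo"],
)

# frame.median() has one value per column; fillna matches them by column name.
filled = frame.fillna(frame.median())
print(filled)
```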


## Concatenating DataFrames

In [400]:
pd.concat([df1,df2],join="outer",axis=0)

Unnamed: 0,Idade,Peso,Altura
Adalberto,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9
Adalberto,15.0,50.5,1.7
...,...,...,...
Carlos,86.0,75.3,1.7
Daniel,53.0,64.2,1.9
Ernesto,56.0,68.9,2.0
Fernanda,30.0,,


In [401]:
pd.concat([df1,df2],join="outer",axis=1)

Unnamed: 0,Idade,Peso,Altura,Idade.1,Peso.1,Altura.1
Adalberto,15.0,50.5,1.7,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8,24.0,80.3,1.8
Carlos,86.0,75.3,1.7,86.0,75.3,1.7
Daniel,53.0,64.2,1.9,53.0,64.2,1.9
Ernesto,,,,56.0,68.9,2.0
Fernanda,,,,30.0,,
Gabriel,,,,1.0,2.0,3.0


In [402]:
pd.concat([df1,df2],join="inner",axis=1)

Unnamed: 0,Idade,Peso,Altura,Idade.1,Peso.1,Altura.1
Adalberto,15.0,50.5,1.7,15.0,50.5,1.7
Bernardo,24.0,80.3,1.8,24.0,80.3,1.8
Carlos,86.0,75.3,1.7,86.0,75.3,1.7
Daniel,53.0,64.2,1.9,53.0,64.2,1.9


In [408]:
df1.reset_index(inplace=True)

In [409]:
df2.reset_index(inplace=True)

In [412]:
df = pd.merge(df1,df2,on="index",how="outer")

In [413]:
df

Unnamed: 0,index,Idade_x,Peso_x,Altura_x,Idade_y,Peso_y,Altura_y
0,Adalberto,15.0,50.5,1.7,15.0,50.5,1.7
1,Bernardo,24.0,80.3,1.8,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9,53.0,64.2,1.9
4,Ernesto,,,,56.0,68.9,2.0
5,Fernanda,,,,30.0,,
6,Gabriel,,,,1.0,2.0,3.0
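
Two optional `merge` parameters can make an outer join easier to read: `suffixes` replaces the default `_x`/`_y`, and `indicator=True` adds a `_merge` column saying where each row came from. A sketch with illustrative frames `left` and `right`:

```python
import pandas as pd

left = pd.DataFrame({"index": ["Adalberto", "Bernardo"], "Idade": [15.0, 24.0]})
right = pd.DataFrame({"index": ["Bernardo", "Ernesto"], "Idade": [24.0, 56.0]})

merged = pd.merge(
    left,
    right,
    on="index",
    how="outer",
    suffixes=("_left", "_right"),  # names for the overlapping columns
    indicator=True,                # adds the _merge column
)
print(merged)
```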


## Renaming labels

In [415]:
df.rename({"index":"nomes","Idade_x":"idade"},axis=1)

Unnamed: 0,nomes,idade,Peso_x,Altura_x,Idade_y,Peso_y,Altura_y
0,Adalberto,15.0,50.5,1.7,15.0,50.5,1.7
1,Bernardo,24.0,80.3,1.8,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9,53.0,64.2,1.9
4,Ernesto,,,,56.0,68.9,2.0
5,Fernanda,,,,30.0,,
6,Gabriel,,,,1.0,2.0,3.0


In [416]:
df.rename({0:"Primeira pessoa",1:"Segunda pessoa"},axis=0)

Unnamed: 0,index,Idade_x,Peso_x,Altura_x,Idade_y,Peso_y,Altura_y
Primeira pessoa,Adalberto,15.0,50.5,1.7,15.0,50.5,1.7
Segunda pessoa,Bernardo,24.0,80.3,1.8,24.0,80.3,1.8
2,Carlos,86.0,75.3,1.7,86.0,75.3,1.7
3,Daniel,53.0,64.2,1.9,53.0,64.2,1.9
4,Ernesto,,,,56.0,68.9,2.0
5,Fernanda,,,,30.0,,
6,Gabriel,,,,1.0,2.0,3.0


In [417]:
df.index

Int64Index([0, 1, 2, 3, 4, 5, 6], dtype='int64')

In [418]:
df.describe()

Unnamed: 0,Idade_x,Peso_x,Altura_x,Idade_y,Peso_y,Altura_y
count,4.0,4.0,4.0,7.0,6.0,6.0
mean,44.5,67.575,1.775,37.857143,56.866667,2.016667
std,32.067637,13.223054,0.095743,28.898838,28.769336,0.495648
min,15.0,50.5,1.7,1.0,2.0,1.7
25%,21.75,60.775,1.7,19.5,53.925,1.725
50%,38.5,69.75,1.75,30.0,66.55,1.85
75%,61.25,76.55,1.825,54.5,73.7,1.975
max,86.0,80.3,1.9,86.0,80.3,3.0


In [419]:
type(df)

pandas.core.frame.DataFrame

In [421]:
type(df.values)

numpy.ndarray

In [422]:
issues

Unnamed: 0,number,title,state
0,42350,BUG: Don't cache args during rolling/expanding...,open
1,42349,API: pd.Label to disambiguate e.g. MultiIndex ...,open
2,42347,RLS: Missing assets in release 1.3.0,open
3,42346,Rename index when using DataFrame.reset_index,open
4,42345,BUG: regression for 1.3.0: saving a dataframe ...,open
...,...,...,...
25,42304,DEPS: update setuptools min version,open
26,42303,BUG: `__array_ufunc__` with for functions with...,open
27,42301,ENH: `Styler.bar` extended to allow centering ...,open
28,42295,BUG: df.where() inconsistently casts columns t...,open
