# Working with Pandas

---


### Importing the Pandas library

In [None]:
import pandas as pd

Reading the file Sanduiches.txt with `pd.read_table`, which expects tab-separated values by default.

In [None]:
faturas = pd.read_table('/content/drive/MyDrive/Sanduiches.txt')

In [None]:
faturas.head(10)

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,3,1,Side of Chips,,$1.69
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25
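
The call above works without extra arguments because `read_table` splits on tabs by default. A minimal sketch of that behavior, assuming a small in-memory sample (invented for illustration, not the contents of Sanduiches.txt) in place of the file on Drive:

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
sample = "order_id\tquantity\titem_name\n1\t1\tIzze\n2\t2\tChicken Bowl\n"

# No sep argument: read_table uses '\t' as its default separator.
demo = pd.read_table(io.StringIO(sample))

print(demo.columns.tolist())
print(demo.shape)
```

The first line of the sample is used as the header, which is also the default.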


Reading the file Users.txt.

In [None]:
usuarios = pd.read_table('/content/drive/MyDrive/Users.txt')

In [None]:
usuarios.head()

Unnamed: 0,1|24|M|technician|85711
0,2|53|F|other|94043
1,3|23|M|writer|32067
2,4|24|M|technician|43537
3,5|33|F|other|15213
4,6|42|M|executive|98101


For the file to be parsed correctly, I will pass a separator.

In [None]:
usuarios = pd.read_table('/content/drive/MyDrive/Users.txt', sep='|')

In [None]:
usuarios.head()

Unnamed: 0,1,24,M,technician,85711
0,2,53,F,other,94043
1,3,23,M,writer,32067
2,4,24,M,technician,43537
3,5,33,F,other,15213
4,6,42,M,executive,98101


The first line of the file is a data row, not a header, so pandas must not treat it as one. We use the `header` parameter for that.

In [None]:
usuarios = pd.read_table('/content/drive/MyDrive/Users.txt', sep='|', header=None)

In [None]:
usuarios.head()

Unnamed: 0,0,1,2,3,4
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


Next, we create a list of names to replace the numeric column labels and pass it to the read call through one more parameter, `names`.

In [None]:
colunas = ['id','idade','sexo','profissao','cep']

usuarios = pd.read_table('/content/drive/MyDrive/Users.txt', sep='|', header=None, names=colunas)

In [None]:
usuarios.head()

Unnamed: 0,id,idade,sexo,profissao,cep
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213
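
The three steps above (`sep`, `header`, `names`) can be compared side by side. This is a sketch under the assumption of a tiny in-memory sample in the Users.txt layout; the variable names `sep_only`, `no_header`, and `named` are introduced here only for illustration:

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as Users.txt (no header line).
sample = "1|24|M|technician|85711\n2|53|F|other|94043\n3|23|M|writer|32067\n"
colunas = ['id', 'idade', 'sexo', 'profissao', 'cep']

# sep only: columns are split, but the first record is consumed as the header.
sep_only = pd.read_table(io.StringIO(sample), sep='|')

# header=None: every record is kept and columns get integer labels 0..4.
no_header = pd.read_table(io.StringIO(sample), sep='|', header=None)

# names: same rows, now with descriptive labels.
# Passing names alone already implies header=None; keeping both is explicit.
named = pd.read_table(io.StringIO(sample), sep='|', header=None, names=colunas)

print(sep_only.shape, no_header.shape, named.shape)
print(named.columns.tolist())
```

Comparing the shapes makes the lost row visible: the `sep`-only read has one record fewer than the other two.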


Reading the file ufo.csv.

In [None]:
ufo = pd.read_csv('/content/drive/MyDrive/ufo.csv')

In [None]:
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,6/1/1930 22:00
1,Willingboro,,OTHER,NJ,6/30/1930 20:00
2,Holyoke,,OVAL,CO,2/15/1931 14:00
3,Abilene,,DISK,KS,6/1/1931 13:00
4,New York Worlds Fair,,LIGHT,NY,4/18/1933 19:00
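
`read_csv` and `read_table` share the same parser and differ mainly in their default separator (comma versus tab). A small sketch of that equivalence, assuming an in-memory sample built from the first two rows shown above instead of the full ufo.csv:

```python
import io
import pandas as pd

# Two rows copied from the output above, used as a stand-in for ufo.csv.
sample = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,6/1/1930 22:00\n"
    "Willingboro,,OTHER,NJ,6/30/1930 20:00\n"
)

via_csv = pd.read_csv(io.StringIO(sample))
via_table = pd.read_table(io.StringIO(sample), sep=',')

# Both calls should produce the same DataFrame.
print(via_csv.equals(via_table))
```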


### Reading a fixed-width file

The `read_fwf` function reads files of fixed-width formatted lines. Fields that are blank in the file appear as missing values in the DataFrame.

In [None]:
df = pd.read_fwf('/content/drive/MyDrive/LarguraFixa.txt')

In [None]:
df

Unnamed: 0,USAF,WBAN,STATION NAME,CTRY,ST,CALL,LAT,LON,ELEV(M),BEGIN,END
0,7005,99999,CWOS 07005,,,,,,,20120127,20120127
1,7011,99999,CWOS 07011,,,,,,,20111025,20121129
2,7018,99999,WXPOD 7018,,,,0.000,0.000,7018.0,20110309,20130730
3,7025,99999,CWOS 07025,,,,,,,20120127,20120127
4,7026,99999,WXPOD 7026,AF,,,0.000,0.000,7026.0,20120713,20141120
...,...,...,...,...,...,...,...,...,...,...,...
29332,999999,94925,GRAND FORKS AF,US,ND,KRDR,47.967,-97.400,277.7,19710101,19710101
29333,999999,94931,HIBBING CHISHOLM-HIBBIN,US,MN,KHIB,47.386,-92.839,413.6,19720101,19721231
29334,999999,94995,LINCOLN 8 ENE,US,NE,,40.848,-96.565,362.4,20020115,20150427
29335,999999,94996,LINCOLN 11 SW,US,NE,,40.695,-96.854,418.2,20020114,20150427
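
The call above lets `read_fwf` infer the column boundaries. When inference is not reliable, the boundaries can be given explicitly through the `widths` (or `colspecs`) parameter. The sketch below assumes a small in-memory sample with invented widths; it does not reproduce the actual layout of LarguraFixa.txt:

```python
import io
import pandas as pd

# Hypothetical rows and field widths, chosen only for this illustration.
rows = [
    ("USAF", "WBAN", "STATION NAME", "CTRY"),
    ("007005", "99999", "CWOS 07005", ""),
    ("007026", "99999", "WXPOD 7026", "AF"),
]
widths = [7, 6, 13, 4]

# Pad every field to its width so each column starts at a fixed position.
sample = "\n".join(
    "".join(value.ljust(width) for value, width in zip(row, widths))
    for row in rows
)

# widths tells read_fwf how many characters each consecutive field occupies.
demo = pd.read_fwf(io.StringIO(sample), widths=widths)

print(demo)
```

The blank `CTRY` field in the first data row comes back as a missing value, and the numeric codes lose their leading zeros because they are parsed as integers.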


### Reading a file with irregular spacing

In [None]:
df = pd.read_csv('/content/drive/MyDrive/SepEspaco.txt')

In [None]:
df.head()

Unnamed: 0,Year Month Day Hour Temp DewTemp Pressure WindDir WindSpeed Sky Precip1 Precip6
0,1912 4 1 6 114 94 10121 200 120 8 -9999 -9999
1,1912 4 1 9 146 96 10122 240 140 8 -9999 -9999
2,1912 4 2 6 172 122 10146 210 40 2 -9999 40
3,1912 4 2 9 193 -9999 10150 240 210 1 -9999 0
4,1912 4 2 12 188 125 10152 240 250 1 -9999 0


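The values were not split into their respective columns: `read_csv` uses a comma as its default separator, so each line was read as a single field.

**To fix this, we will use a regular expression (regex) as the separator.**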

In [None]:
# Raw string, so that '\s' is not treated as an invalid escape sequence
regex = r'\s+'

df = pd.read_csv('/content/drive/MyDrive/SepEspaco.txt', sep=regex)

In [None]:
df.head()

Unnamed: 0,Year,Month,Day,Hour,Temp,DewTemp,Pressure,WindDir,WindSpeed,Sky,Precip1,Precip6
0,1912,4,1,6,114,94,10121,200,120,8,-9999,-9999
1,1912,4,1,9,146,96,10122,240,140,8,-9999,-9999
2,1912,4,2,6,172,122,10146,210,40,2,-9999,40
3,1912,4,2,9,193,-9999,10150,240,210,1,-9999,0
4,1912,4,2,12,188,125,10152,240,250,1,-9999,0
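
The pattern `\s+` matches one or more whitespace characters, so fields separated by a varying number of spaces are still split correctly. A minimal sketch, assuming an in-memory sample with a few of the columns above and deliberately uneven spacing:

```python
import io
import pandas as pd

# Hypothetical sample: same kind of data, with an irregular number of spaces.
sample = (
    "Year Month Day Hour Temp\n"
    "1912  4  1  6  114\n"
    "1912  4  2 12  188\n"
)

# r'\s+' treats any run of whitespace as a single separator.
demo = pd.read_csv(io.StringIO(sample), sep=r'\s+')

print(demo.columns.tolist())
print(demo)
```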
