## Python for Data Science - Challenges

This notebook covers different ways to import and export files using Python's [Pandas library](https://pandas.pydata.org/docs/).

Everything here is based on the content of Alura's course [Pandas I/O: trabalhando com diferentes formatos de arquivos](https://www.alura.com.br/curso-online-pandas-io-trabalhando-diferentes-formatos-arquivos).

## Lesson 1: Reading CSV files

```python
import pandas as pd
```

```python
url = 'https://raw.githubusercontent.com/strawndri/python-ds-pandas-io/main/dados/dados_sus.csv'
```

### 1. Check whether the CSV file is separated by commas or semicolons.

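```python
# The fields are separated by semicolons, so sep=';'.
# This alone is not enough to read the file; steps 2 to 5 add the remaining parameters.
dados = pd.read_csv(url, sep=';')
```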
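
Instead of opening the file by hand, the separator can also be guessed with the standard library's `csv.Sniffer`. The sketch below uses a tiny made-up sample (not the real SUS data) and restricts the candidates to comma and semicolon. On the real file, the metadata lines at the top could mislead the heuristic, so treat the result as a hint rather than a guarantee.

```python
import csv

# Made-up sample in the same shape as the SUS table (not real data)
sample = (
    "Unidade da Federação;2008/Jan;2008/Fev\n"
    "Rondônia;100;200\n"
    "Acre;300;400\n"
)

# Only consider the two candidates the exercise asks about
dialect = csv.Sniffer().sniff(sample, delimiters=",;")
print(dialect.delimiter)  # ;
```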

### 2. The file's encoding is ISO-8859-1.

The `encoding` parameter specifies the character encoding used to read the CSV file. We usually work with `UTF-8` (the default), but files in other encodings do come up.

```python
# ISO-8859-1 (Latin-1) decodes the accented characters in this file correctly
dados = pd.read_csv(url, sep=';', encoding='ISO-8859-1')
```
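
To see why `encoding` matters, the sketch below encodes a header with accented characters as ISO-8859-1 and decodes the same bytes two ways, using only the standard library. In ISO-8859-1, `ç` and `ã` are stored as single bytes that do not form valid UTF-8 sequences, so decoding them as UTF-8 fails. This is a simplified illustration, not the actual file.

```python
# Bytes as they would be stored in an ISO-8859-1 (Latin-1) file
raw = "Unidade da Federação".encode("ISO-8859-1")

# Decoding with the matching encoding recovers the original text
print(raw.decode("ISO-8859-1"))  # Unidade da Federação

# Decoding the same bytes as UTF-8 raises UnicodeDecodeError
utf8_failed = False
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    utf8_failed = True
    print("UTF-8 decoding failed:", err.reason)
```

With Pandas, the same choice is made through the `encoding` argument of `pd.read_csv`, as in the cell above.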

### 3. The first three lines of the file can be ignored, since the header only starts on the fourth line.

The `skiprows` parameter skips a given number of lines at the beginning of the file while reading.

```python
# skiprows=3 discards the three lines above the header (line 4)
dados = pd.read_csv(url, sep=';', encoding='ISO-8859-1', skiprows=3)
dados.head()
```

| Unnamed: 0 | Unidade da Federação | 2008/Jan | 2008/Fev | 2008/Mar | 2008/Abr | 2008/Mai | 2008/Jun | 2008/Jul | 2008/Ago | 2008/Set | ... | 2020/Jul | 2020/Ago | 2020/Set | 2020/Out | 2020/Nov | 2020/Dez | 2021/Jan | 2021/Fev | 2021/Mar | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rondônia | 138852839 | 293128342 | 154168252 | 152531496 | 164595384 | 140615068 | 306527901 | 323149461 | 311717863 | ... | 1182468713 | 1173330776 | 1020198514 | 795513945 | 935794629 | 888083655 | 926601459 | 773059704 | 1102330947 | 99641125468 |
| 1 | Acre | 90241600 | 149720626 | 179402848 | 173046942 | 181944392 | 182849600 | 251175459 | 208910714 | 227570853 | ... | 391519320 | 364014282 | 339124221 | 404432144 | 327659010 | 369699731 | 371572312 | 353842792 | 407704592 | 45004853047 |
| 2 | Amazonas | 473552942 | 711899057 | 819663549 | 825937842 | 783139990 | 847727362 | 936885872 | 935253270 | 936309935 | ... | 1976946014 | 1805993143 | 1784101563 | 1640831510 | 1989561791 | 1776516769 | 2143028917 | 2591713455 | 2203217622 | 191724793605 |
| 3 | Roraima | 65788953 | 77793931 | 71868803 | 83999439 | 86234796 | 83244066 | 99669309 | 89427118 | 91042417 | ... | 301548830 | 282648618 | 292804391 | 309031373 | 362103105 | 345446094 | 326692847 | 351977373 | 398553008 | 32887696509 |
| 4 | Pará | 1886474411 | 1955375820 | 2193734270 | 2084282969 | 2324995288 | 2324068756 | 2400222356 | 2334121803 | 2517226132 | ... | 4080412643 | 4438571588 | 3682024947 | 3696593134 | 3900431580 | 3801514579 | 3835468246 | 3768831423 | 3327639289 | 470530900229 |
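
`skiprows` can be tried without downloading anything by passing `pd.read_csv` an in-memory string through `io.StringIO`. The sketch assumes a made-up file with three metadata lines above the real header, mimicking the layout described above; the values are illustrative only.

```python
import io

import pandas as pd

# Made-up file: three metadata lines, then the real header on line 4
csv_text = (
    "Metadata line 1 (hypothetical)\n"
    "Metadata line 2 (hypothetical)\n"
    "Metadata line 3 (hypothetical)\n"
    "Unidade da Federação;2008/Jan;Total\n"
    "Rondônia;100;100\n"
    "Acre;200;200\n"
)

# skiprows=3 drops the first three lines, so line 4 becomes the header
df = pd.read_csv(io.StringIO(csv_text), sep=";", skiprows=3)
print(df.columns.tolist())  # ['Unidade da Federação', '2008/Jan', 'Total']
print(len(df))  # 2
```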


### 4. The last 9 lines can also be ignored, since they only contain information about where the data was obtained.

The `skipfooter` parameter skips a given number of lines at the end of the file while reading.

```python
# skipfooter=9 discards the 9 source/notes lines at the end of the file
dados = pd.read_csv(url, sep=';', encoding='ISO-8859-1', skiprows=3, skipfooter=9)
dados.head()
```

Running this cell emits a `ParserWarning`: the default C engine does not support `skipfooter`, so Pandas falls back to the Python engine on its own.


| Unnamed: 0 | Unidade da Federação | 2008/Jan | 2008/Fev | 2008/Mar | 2008/Abr | 2008/Mai | 2008/Jun | 2008/Jul | 2008/Ago | 2008/Set | ... | 2020/Jul | 2020/Ago | 2020/Set | 2020/Out | 2020/Nov | 2020/Dez | 2021/Jan | 2021/Fev | 2021/Mar | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rondônia | 138852839 | 293128342 | 154168252 | 152531496 | 164595384 | 140615068 | 306527901 | 323149461 | 311717863 | ... | 1182468713 | 1173330776 | 1020198514 | 795513945 | 935794629 | 888083655 | 926601459 | 773059704 | 1102330947 | 99641125468 |
| 1 | Acre | 90241600 | 149720626 | 179402848 | 173046942 | 181944392 | 182849600 | 251175459 | 208910714 | 227570853 | ... | 391519320 | 364014282 | 339124221 | 404432144 | 327659010 | 369699731 | 371572312 | 353842792 | 407704592 | 45004853047 |
| 2 | Amazonas | 473552942 | 711899057 | 819663549 | 825937842 | 783139990 | 847727362 | 936885872 | 935253270 | 936309935 | ... | 1976946014 | 1805993143 | 1784101563 | 1640831510 | 1989561791 | 1776516769 | 2143028917 | 2591713455 | 2203217622 | 191724793605 |
| 3 | Roraima | 65788953 | 77793931 | 71868803 | 83999439 | 86234796 | 83244066 | 99669309 | 89427118 | 91042417 | ... | 301548830 | 282648618 | 292804391 | 309031373 | 362103105 | 345446094 | 326692847 | 351977373 | 398553008 | 32887696509 |
| 4 | Pará | 1886474411 | 1955375820 | 2193734270 | 2084282969 | 2324995288 | 2324068756 | 2400222356 | 2334121803 | 2517226132 | ... | 4080412643 | 4438571588 | 3682024947 | 3696593134 | 3900431580 | 3801514579 | 3835468246 | 3768831423 | 3327639289 | 470530900229 |
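
`skipfooter` can be checked the same way on a small in-memory file. In this made-up example, two trailing lines stand in for the 9 footer lines of the real file. `engine='python'` is passed explicitly because, as step 5 explains, the default C engine does not support `skipfooter`.

```python
import io

import pandas as pd

# Made-up file: header, two data rows, then two footer lines to discard
csv_text = (
    "Unidade da Federação;Total\n"
    "Rondônia;100\n"
    "Acre;200\n"
    "Source: hypothetical footer line\n"
    "Notes: hypothetical footer line\n"
)

# skipfooter=2 drops the last two lines so they are not parsed as data rows
df = pd.read_csv(io.StringIO(csv_text), sep=";", skipfooter=2, engine="python")
print(df["Unidade da Federação"].tolist())  # ['Rondônia', 'Acre']
```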


### 5. Skipping the last lines requires the Python engine, so add the `engine='python'` parameter.

The `engine` parameter sets the parser used to read the CSV file. By default, Pandas uses `'c'` **(the C engine)**.

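However, some features, such as `skipfooter` itself, are only available in the `'python'` engine **(which runs in the Python interpreter)**. If `engine` is omitted, Pandas switches to the Python engine on its own and emits a `ParserWarning`; passing `engine='python'` explicitly makes the choice clear and avoids the warning.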

```python
# engine='python' supports skipfooter, so no fallback warning is emitted
dados = pd.read_csv(url, sep=';', encoding='ISO-8859-1', skiprows=3, skipfooter=9, engine='python')
dados.head()
```

| Unnamed: 0 | Unidade da Federação | 2008/Jan | 2008/Fev | 2008/Mar | 2008/Abr | 2008/Mai | 2008/Jun | 2008/Jul | 2008/Ago | 2008/Set | ... | 2020/Jul | 2020/Ago | 2020/Set | 2020/Out | 2020/Nov | 2020/Dez | 2021/Jan | 2021/Fev | 2021/Mar | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Rondônia | 138852839 | 293128342 | 154168252 | 152531496 | 164595384 | 140615068 | 306527901 | 323149461 | 311717863 | ... | 1182468713 | 1173330776 | 1020198514 | 795513945 | 935794629 | 888083655 | 926601459 | 773059704 | 1102330947 | 99641125468 |
| 1 | Acre | 90241600 | 149720626 | 179402848 | 173046942 | 181944392 | 182849600 | 251175459 | 208910714 | 227570853 | ... | 391519320 | 364014282 | 339124221 | 404432144 | 327659010 | 369699731 | 371572312 | 353842792 | 407704592 | 45004853047 |
| 2 | Amazonas | 473552942 | 711899057 | 819663549 | 825937842 | 783139990 | 847727362 | 936885872 | 935253270 | 936309935 | ... | 1976946014 | 1805993143 | 1784101563 | 1640831510 | 1989561791 | 1776516769 | 2143028917 | 2591713455 | 2203217622 | 191724793605 |
| 3 | Roraima | 65788953 | 77793931 | 71868803 | 83999439 | 86234796 | 83244066 | 99669309 | 89427118 | 91042417 | ... | 301548830 | 282648618 | 292804391 | 309031373 | 362103105 | 345446094 | 326692847 | 351977373 | 398553008 | 32887696509 |
| 4 | Pará | 1886474411 | 1955375820 | 2193734270 | 2084282969 | 2324995288 | 2324068756 | 2400222356 | 2334121803 | 2517226132 | ... | 4080412643 | 4438571588 | 3682024947 | 3696593134 | 3900431580 | 3801514579 | 3835468246 | 3768831423 | 3327639289 | 470530900229 |
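
The difference between the engine choices can be checked directly when `skipfooter` is used. The sketch below runs on a made-up in-memory file and records warnings with the standard `warnings` module. Omitting `engine` should trigger the fallback `ParserWarning`, `engine='python'` should read silently, and forcing `engine='c'` should raise a `ValueError`. This reflects recent Pandas versions; older releases may behave slightly differently. The helper names `read` and `emits_parser_warning` are hypothetical, introduced only for this illustration.

```python
import io
import warnings

import pandas as pd

# Made-up file with one footer line to skip
csv_text = "Unidade da Federação;Total\nRondônia;100\nAcre;200\nSource: hypothetical\n"


def read(**kwargs):
    """Hypothetical helper: read the in-memory CSV, skipping the footer line."""
    return pd.read_csv(io.StringIO(csv_text), sep=";", skipfooter=1, **kwargs)


def emits_parser_warning(**kwargs):
    """Hypothetical helper: True if reading with these options emits a ParserWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        read(**kwargs)
    return any(issubclass(w.category, pd.errors.ParserWarning) for w in caught)


# No engine given: Pandas falls back to the Python engine and warns
print(emits_parser_warning())  # True
# Python engine requested explicitly: same data, no warning
print(emits_parser_warning(engine="python"))  # False

# C engine forced: skipfooter is unsupported, so Pandas raises instead of falling back
c_engine_rejected = False
try:
    read(engine="c")
except ValueError as err:
    c_engine_rejected = True
    print("ValueError:", err)
```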
