# General instructions

To run a CELL, select it and press the "Run" button at the top, or press:

```
Shift + Enter
```

RUN ONLY ONE CELL AT A TIME

\- To clear the output of a cell that has already run, without producing new output, go to the top toolbar, click the keyboard icon (open the command palette / jupyter-notebook command group) and select "clear cell output".

@author: Marco César Prado Soares, MSc. Lean Six Sigma Master Black Belt Specialist, Chemical Engineer, MSc. in Mechatronics Engineering (instrumentation) [Marco.Soares@br.ey.com](mailto:Marco.Soares@br.ey.com); [marcosoares.feq@gmail.com](mailto:marcosoares.feq@gmail.com)

**Simplified extraction of multiple tables from xlsx files, and from xml, html and json files**

\- These and other examples are available on the petl package documentation page:

https://petl.readthedocs.io/en/stable/io.html#extract-read

## Excel .xlsx files (openpyxl)

The following functions require [openpyxl](https://bitbucket.org/ericgazoni/openpyxl/wiki/Home) to be installed, e.g.:

```
$ pip install openpyxl
```

`petl.io.xlsx.fromxlsx(filename, sheet=None, range_string=None, min_row=None, min_col=None, max_row=None, max_col=None, read_only=False, **kwargs)`

Extract a table from a sheet in an Excel .xlsx file.

N.B., the sheet name is case sensitive.

The sheet argument can be omitted, in which case the first sheet in the workbook is used by default.

The range\_string argument can be used to provide a range string specifying a range of cells to extract.

The min\_row, min\_col, max\_row and max\_col arguments can be used to limit the range of cells to extract. They will be ignored if range\_string is provided.

The read\_only argument determines how openpyxl returns the loaded workbook. The default is False, as it prevents some LibreOffice files from being truncated at 65536 rows. Setting it to True should be faster if the file is only read and was created with Microsoft Excel.

Any other keyword arguments are passed through to `openpyxl.load_workbook()`.

In [None]:
import petl as etl
# Package optimized for extracting this type of document
from petl import io

In [None]:
# `filename` is a placeholder: set it to the path of your .xlsx file before running this cell.
etl.io.xlsx.fromxlsx(filename, sheet=None, range_string=None, min_row=None, min_col=None, max_row=None, max_col=None, read_only=False)

`petl.io.xlsx.toxlsx(tbl, filename, sheet=None, write_header=True, mode='replace')`

Write a table to a new Excel .xlsx file.

N.B., the sheet name is case sensitive.

The mode argument controls how the file and sheet are treated:

> - replace: This is the default. It either replaces or adds a named sheet, or if no sheet name is provided, all sheets (overwrites the entire file).
> - overwrite: Always overwrites the file. This produces a file with a single sheet.
> - add: Adds a new sheet. Raises ValueError if a named sheet already exists.

The sheet argument can be omitted in all cases. The new sheet will then get a default name. If the file does not exist, it will be created, unless replace mode is used with a named sheet. In the latter case, the file must exist and be a valid .xlsx file.

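In [None]:
import petl as etl
# Package optimized for extracting this type of document
from petl import io

In [None]:
# `tbl` and `filename` are placeholders: define the table to write and the target .xlsx path first.
etl.io.xlsx.toxlsx(tbl, filename, sheet=None, write_header=True, mode='replace')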

`petl.io.xlsx.appendxlsx(tbl, filename, sheet=None, write_header=False)`

Appends rows to an existing Excel .xlsx file.

In [None]:
import petl as etl
# Package optimized for extracting this type of document
from petl import io

In [None]:
# `tbl` and `filename` are placeholders: `filename` must point to an existing .xlsx file.
etl.io.xlsx.appendxlsx(tbl, filename, sheet=None, write_header=False)

## XML files

`petl.io.xml.fromxml(source, *args, **kwargs)` [[source]](https://petl.readthedocs.io/en/stable/_modules/petl/io/xml.html#fromxml)

Extract data from an XML file. E.g.:

In [None]:
import petl as etl
# Package optimized for extracting this type of document

In [None]:
# setup a file to demonstrate with
d = '''<table>
     <tr>
         <td>foo</td><td>bar</td>
     </tr>
     <tr>
         <td>a</td><td>1</td>
     </tr>
     <tr>
         <td>b</td><td>2</td>
     </tr>
     <tr>
         <td>c</td><td>2</td>
     </tr>
 </table>'''
with open('example1.xml', 'w') as f:
     f.write(d)

Note that this XML file was created for testing purposes only. Replace it with the file you want to process.

In [None]:
table1 = etl.fromxml('example1.xml', 'tr', 'td')
table1

# table1 holds the rows parsed from the XML file as a petl table (not a pandas DataFrame).

The output will be:

```
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' | '1' |
+-----+-----+
| 'b' | '2' |
+-----+-----+
| 'c' | '2' |
+-----+-----+
```

**If the data values are stored in an attribute**, provide the attribute name as an extra positional argument:

In [None]:
# CREATE THE XML FILE USED AS AN EXAMPLE:

d = '''<table>
     <tr>
         <td v='foo'/><td v='bar'/>
     </tr>
     <tr>
         <td v='a'/><td v='1'/>
     </tr>
     <tr>
         <td v='b'/><td v='2'/>
     </tr>
     <tr>
         <td v='c'/><td v='2'/>
     </tr>
 </table>'''
with open('example2.xml', 'w') as f:
     f.write(d)

In [None]:
table2 = etl.fromxml('example2.xml', 'tr', 'td', 'v')
table2

Here, the output will be:

```
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' | '1' |
+-----+-----+
| 'b' | '2' |
+-----+-----+
| 'c' | '2' |
+-----+-----+
```

Data values can also be extracted by **providing a mapping of field names to element paths**:

In [None]:
# CREATE THE XML FILE USED AS AN EXAMPLE:

d = '''<table>
     <row>
         <foo>a</foo><baz><bar v='1'/><bar v='3'/></baz>
     </row>
     <row>
         <foo>b</foo><baz><bar v='2'/></baz>
     </row>
     <row>
         <foo>c</foo><baz><bar v='2'/></baz>
     </row>
 </table>'''
with open('example3.xml', 'w') as f:
     f.write(d)

In [None]:
table3 = etl.fromxml('example3.xml', 'row', {'foo': 'foo', 'bar': ('baz/bar', 'v')})
table3

The output will be:

```
+------------+-----+
| bar        | foo |
+============+=====+
| ('1', '3') | 'a' |
+------------+-----+
| '2'        | 'b' |
+------------+-----+
| '2'        | 'c' |
+------------+-----+
```
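To make the mapping form more concrete, here is a rough standard-library sketch of what the `'row'` selector and the `{'foo': 'foo', 'bar': ('baz/bar', 'v')}` mapping select. It uses `xml.etree.ElementTree` directly and is only an illustration of the idea, not petl's actual implementation. The unwrapping of single matches is written to mirror the output shown above.

```python
import xml.etree.ElementTree as ET

# Two rows taken from the example document above.
xml_text = '''<table>
    <row><foo>a</foo><baz><bar v='1'/><bar v='3'/></baz></row>
    <row><foo>b</foo><baz><bar v='2'/></baz></row>
</table>'''

root = ET.fromstring(xml_text)

rows = []
for row in root.findall('row'):
    # 'foo': 'foo' -> text of the <foo> child element.
    foo = row.find('foo').text
    # 'bar': ('baz/bar', 'v') -> attribute 'v' of every element matching baz/bar.
    bars = tuple(el.get('v') for el in row.findall('baz/bar'))
    # Mirror the table above: one match is unwrapped, several stay a tuple.
    rows.append((foo, bars[0] if len(bars) == 1 else bars))

print(rows)
```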

## HTML files

`petl.io.html.teehtml(table, source=None, encoding=None, errors='strict', caption=None, vrepr=<class 'str'>, lineterminator='\n', index_header=False, tr_style=None, td_styles=None, truncate=None)`

Return a table that writes rows to a Unicode HTML file as they are iterated over.

In [None]:
import petl as etl
# Package optimized for extracting this type of document
from petl import io

In [None]:
# `table` is a placeholder for an existing petl table.
# `vrepr=str` is the default, displayed as <class 'str'> in the signature.
etl.io.html.teehtml(table, source=None, encoding=None, errors='strict', caption=None, vrepr=str, lineterminator='\n', index_header=False, tr_style=None, td_styles=None, truncate=None)

## JSON files

`petl.io.json.fromjson(source, *args, **kwargs)` [[source]](https://petl.readthedocs.io/en/stable/_modules/petl/io/json.html#fromjson)

Extract data from a JSON file. The file must contain a JSON array as the top level object, and each member of the array will be treated as a row of data.

In [None]:
import petl as etl
# Package optimized for extracting this type of document

In [None]:
# CREATE THE JSON FILE USED AS AN EXAMPLE:

data = '''
 [{"foo": "a", "bar": 1},
 {"foo": "b", "bar": 2},
 {"foo": "c", "bar": 2}]
 '''
with open('example.json', 'w') as f:
     f.write(data)

In [None]:
table1 = etl.fromjson('example.json', header=['foo', 'bar'])
table1

The output of this example is shown below:

```
+-----+-----+
| foo | bar |
+=====+=====+
| 'a' |   1 |
+-----+-----+
| 'b' |   2 |
+-----+-----+
| 'c' |   2 |
+-----+-----+
```

Setting the `lines` argument to `True` makes petl treat the document as a JSON Lines document. For more details about JSON Lines, see [https://jsonlines.org/](https://jsonlines.org/).

In [None]:
import petl as etl
# Package optimized for extracting this type of document

In [None]:
# CREATE THE JSON FILE USED AS AN EXAMPLE:

data_with_jlines = '''{"name": "Gilbert", "wins": [["straight", "7S"], ["one pair", "10H"]]}
 {"name": "Alexa", "wins": [["two pair", "4S"], ["two pair", "9S"]]}
 {"name": "May", "wins": []}
 {"name": "Deloise", "wins": [["three of a kind", "5S"]]}'''

with open('example2.json', 'w') as f:
     f.write(data_with_jlines)

In [None]:
table2 = etl.fromjson('example2.json', lines=True)
table2

The output of this example will be:

```
+-----------+-------------------------------------------+
| name      | wins                                      |
+===========+===========================================+
| 'Gilbert' | [['straight', '7S'], ['one pair', '10H']] |
+-----------+-------------------------------------------+
| 'Alexa'   | [['two pair', '4S'], ['two pair', '9S']]  |
+-----------+-------------------------------------------+
| 'May'     | []                                        |
+-----------+-------------------------------------------+
| 'Deloise' | [['three of a kind', '5S']]               |
+-----------+-------------------------------------------+
```

If your JSON file does not fit this structure, you will need to parse it via `json.load()` and select the array to treat as the data; see also [`petl.io.json.fromdicts()`](https://petl.readthedocs.io/en/stable/io.html#petl.io.json.fromdicts).
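As a sketch of that workaround, the snippet below parses a document whose rows are nested under a key instead of sitting at the top level. The keys `meta` and `results` and the sample data are hypothetical; only the standard library is used.

```python
import io
import json

# Hypothetical document: the array of rows lives under "results".
raw = '''
{"meta": {"count": 2},
 "results": [{"foo": "a", "bar": 1},
             {"foo": "b", "bar": 2}]}
'''

# json.load() reads from a file-like object; StringIO stands in for an open file.
doc = json.load(io.StringIO(raw))

# Select the array that should be treated as the data.
rows = doc["results"]

# `rows` is now a list of dicts, which could be handed to petl, e.g.:
#   etl.fromdicts(rows, header=['foo', 'bar'])
print(rows)
```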

_Changed in version 1.1.0._

If no header is specified, fields will be discovered by sampling keys from the first sample objects in source. The header will be constructed from keys in the order discovered. Note that this ordering may not be stable, and therefore it may be advisable to specify an explicit header or to use another function like [`petl.transform.headers.sortheader()`](https://petl.readthedocs.io/en/stable/transform.html#petl.transform.headers.sortheader) on the resulting table to guarantee stability.