petl.io

Extract/Load - reading/writing tables from files, databases and other sources

Extract (read)

The "from..." functions extract a table from a file-like source or database. For everything except petl.io.db.fromdb the source argument provides information about where to extract the underlying data from. If the source argument is None or a string it is interpreted as follows:

  • None - read from stdin
  • string starting with http://, https:// or ftp:// - read from URL
  • string ending with .gz or .bgz - read from file via gzip decompression
  • string ending with .bz2 - read from file via bz2 decompression
  • any other string - read directly from file
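The rules above can be sketched as a small dispatch function. This is an illustration only, not petl's implementation: the function name `describe_read_source` and its return labels are hypothetical, and checking the URL prefixes before the file suffixes is an assumption based on the order of the list.

```python
def describe_read_source(source):
    """Return a label for how a read source would be interpreted.

    Hypothetical helper for illustration; it mirrors the documented
    rules but is not part of petl.
    """
    if source is None:
        return "stdin"
    if not isinstance(source, str):
        # Anything else is expected to be a file-like helper object.
        return "helper object"
    if source.startswith(("http://", "https://", "ftp://")):
        return "url"
    if source.endswith((".gz", ".bgz")):
        return "gzip"
    if source.endswith(".bz2"):
        return "bz2"
    return "file"


for example in (None, "https://example.org/data.csv", "data.csv.gz",
                "data.csv.bz2", "data.csv"):
    print(repr(example), "->", describe_read_source(example))
```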

Some helper classes are also available for reading from other types of file-like sources, e.g., a Zip file, a string or a subprocess. See the section on I/O helper classes below for more information.

Be aware that loading data from stdin breaks the table container convention, because the data can usually only be read once. If you are sure that the data will only be read once in your script or interactive session then this may not be a problem. Note, however, that some petl functions access the underlying data source more than once and so will not work as expected with data from stdin.
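The read-once problem can be reproduced with the standard library alone. In this minimal sketch an `io.StringIO` object stands in for stdin (an assumption made so the example is self-contained), and `csv.reader` plays the role of a function that iterates over the source.

```python
import csv
import io

# A one-shot stream standing in for stdin.
stream = io.StringIO("foo,bar\na,1\nb,2\n")

# The first pass consumes the stream.
first_pass = list(csv.reader(stream))

# The second pass finds nothing left to read, so it yields no rows.
second_pass = list(csv.reader(stream))

print(len(first_pass), len(second_pass))
```

Any function that needs a second pass over the data would silently see an empty table in this situation.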

Load (write)

The "to..." functions load data from a table into a file-like source or database. For functions that accept a source argument, if the source argument is None or a string it is interpreted as follows:

  • None - write to stdout
  • string ending with .gz or .bgz - write to file via gzip compression
  • string ending with .bz2 - write to file via bz2 compression
  • any other string - write directly to file

Some helper classes are also available for writing to other types of file-like sources, e.g., a Zip file or a string buffer. See the section on I/O helper classes below for more information.

petl.io.csv

Python objects

petl.io.base.fromcolumns

Delimited files

petl.io.csv.fromcsv

petl.io.csv.tocsv

petl.io.csv.appendcsv

petl.io.csv.teecsv

petl.io.csv.fromtsv

petl.io.csv.totsv

petl.io.csv.appendtsv

petl.io.csv.teetsv

petl.io.pickle

Pickle files

petl.io.pickle.frompickle

petl.io.pickle.topickle

petl.io.pickle.appendpickle

petl.io.pickle.teepickle

petl.io.text

Text files

petl.io.text.fromtext

petl.io.text.totext

petl.io.text.appendtext

petl.io.text.teetext

petl.io.xml

XML files

petl.io.xml.fromxml

For writing to an XML file, see petl.io.text.totext.

petl.io.html

HTML files

petl.io.html.tohtml

petl.io.html.teehtml

petl.io.json

JSON files

petl.io.json.fromjson

petl.io.json.fromdicts

petl.io.json.tojson

petl.io.json.tojsonarrays

petl.io.db

Databases

Note

The automatic table creation feature of petl.io.db.todb requires SQLAlchemy to be installed, e.g.:

$ pip install sqlalchemy

petl.io.db.fromdb

petl.io.db.todb

petl.io.db.appenddb

petl.io.xls

Excel .xls files (xlrd/xlwt)

Note

The following functions require xlrd and xlwt to be installed, e.g.:

$ pip install xlrd xlwt-future

petl.io.xls.fromxls

petl.io.xls.toxls

petl.io.xlsx

Excel .xlsx files (openpyxl)

Note

The following functions require openpyxl to be installed, e.g.:

$ pip install openpyxl

petl.io.xlsx.fromxlsx

petl.io.xlsx.toxlsx

petl.io.numpy

Arrays (NumPy)

Note

The following functions require numpy to be installed, e.g.:

$ pip install numpy

petl.io.numpy.fromarray

petl.io.numpy.toarray

petl.io.numpy.torecarray

petl.io.numpy.valuestoarray

petl.io.pandas

DataFrames (pandas)

Note

The following functions require pandas to be installed, e.g.:

$ pip install pandas

petl.io.pandas.fromdataframe

petl.io.pandas.todataframe

petl.io.pytables

HDF5 files (PyTables)

Note

The following functions require PyTables to be installed, e.g.:

$ # install HDF5
$ apt-get install libhdf5-7 libhdf5-dev
$ # install other prerequisites
$ pip install cython
$ pip install numpy
$ pip install numexpr
$ # install PyTables
$ pip install tables

petl.io.pytables.fromhdf5

petl.io.pytables.fromhdf5sorted

petl.io.pytables.tohdf5

petl.io.pytables.appendhdf5

petl.io.bcolz

Bcolz ctables

Note

The following functions require bcolz to be installed, e.g.:

$ pip install bcolz

petl.io.bcolz.frombcolz

petl.io.bcolz.tobcolz

petl.io.bcolz.appendbcolz

petl.io.whoosh

Text indexes (Whoosh)

Note

The following functions require Whoosh to be installed, e.g.:

$ pip install whoosh

petl.io.whoosh.fromtextindex

petl.io.whoosh.searchtextindex

petl.io.whoosh.searchtextindexpage

petl.io.whoosh.totextindex

petl.io.whoosh.appendtextindex

petl.io.sources

I/O helper classes

The following classes are helpers for extract (from...()) and load (to...()) functions that use a file-like data source.

An instance of any of the following classes can be used as the source argument to data extraction functions like petl.io.csv.fromcsv etc., with the exception of petl.io.sources.StdoutSource, which is write-only.

An instance of any of the following classes can also be used as the source argument to data loading functions like petl.io.csv.tocsv etc., with the exception of petl.io.sources.StdinSource, petl.io.sources.URLSource and petl.io.sources.PopenSource, which are read-only.

The behaviour of each source can usually be configured by passing arguments to the constructor. See the source code of the petl.io.sources module for full details.

petl.io.sources.FileSource

petl.io.sources.GzipSource

petl.io.sources.BZ2Source

petl.io.sources.ZipSource

petl.io.sources.StdinSource

petl.io.sources.StdoutSource

petl.io.sources.URLSource

petl.io.sources.MemorySource

petl.io.sources.PopenSource