Initial alpha release

st-pasha released this 19 Mar 22:21

24798ca

Added

Method df.tonumpy() now has argument stype which will force conversion into
a numpy array of the specific stype.
Enums stype and ltype that encapsulate the type-system of the datatable
module.
It is now possible to fread from a bytes object.
Allow columns to be renamed by setting the names property on the datatable.
Internal "MemoryMapManager" will make datatable more robust when opening a
frame with many columns on Linux systems. In particular, error 12 "not enough
memory" should become much more rare now.
Number of threads used by fread can now be controlled via parameter nthreads.
It is now possible to supply string argument to dt.DataTable constructor,
which in turn will try to interpret that argument via fread.
fread can now read compressed .xz files.
fread now automatically skips Ctrl+Z / NUL characters at the end of the file.
It is now possible to create a datatable from string numpy array.
Added parameters skip_blank_lines, strip_white, quotechar and dec to fread.
Single-column files with blank lines can now be read successfully.
Fread now recognizes \r\r\n as a valid line ending.
Added parameters url and cmd to fread, as well as ability to detect URLs
automatically. The url parameter downloads file from HTTP/HTTPS/FTP server
into a temporary location and reads it from there. The cmd parameter executes
the provided shell command and then reads the data from the stdout.
It is now possible to pass file objects to fread (or any objects exposing
method read()).
File path given to fread can now transparently select files within .zip archives.
This doesn't work with archives-within-archives.
GenericReader now supports auto-detecting and reading UTF-16 files.
GenericReader now attempts to detect whether the input file is an HTML, and if so
raises an exception with the appropriate error message.
Datatable can now use either llvm-4.0 or llvm-5.0 depending on what the user has.
fread now allows sep="", causing the file to be read line-by-line.
range arguments can now be passed to a DataTable constructor.
datatable will now fall back to eager execution if it cannot detect LLVM runtime.
simple Excel file reader.
It is now possible to select columns from DataTable by type: df[int] selects
all integer columns from df.
Allow creating DataTable from list, while forcing a specific stype(s).
Added ability to delete rows from a DataTable: del df[rows, :]
DataTable can now accept pandas/numpy frames with columns of float16 dtype
(which will be automatically converted to float32).
.isna() function now works on strings too.
.save() is now a method of Frame class.
Warnings now have custom display hook.
Added global option nthreads which control the number of Omp threads used
by datatable for parallel execution. Example: dt.options.nthreads = 1.
Add method .scalar() to quickly convert a 1x1 Frame into a python scalar.
New methods .min1(), .max1(), .mean1(), .sum1(), .sd1(), .countna1()
that are similar to .min(), .max(), etc. but return a scalar instead of a
Frame (however they only work with a 1-column Frames).
Implemented method .nunique() to compute the number of unique values in each
column.
Added stats functions .mode() and .nmodal().

Changed

When writing "round" doubles/floats to CSV, they'll now always have trailing zero.
For example, [0.0, 1.0, 1e23] now produce "0.0,1.0,1.0e+23" instead of "0,1,1e+23".
df.stypes now returns a tuple of stype elements (previously it was returning
a list of strings). Likewise, df.types was renamed into df.ltypes and now it
returns a tuple of ltype elements instead of strings.
Parameter colnames= in DataTable constructor was renamed to names=. The old
parameter may still be used, but it will result in a warning.
DataTable can no longer have duplicate column names. If such names are given,
they will be mangled to make them unique, and a warning will be issued.
Special characters (in the ASCII range \x00 - \x1F) are no longer permitted in
the column names. If encountered, they will be replaced with a dot ..
Fread now ignores trailing whitespace on each line, even if ' ' separator is used.
Fread on an empty file now produces an empty DataTable, instead of an exception.
Fread's parameter skip_lines was replaced with skip_to_line, so that it's
more in sync with the similar argument skip_to_string.
When saving datatable containing "obj64" columns, they will no longer be saved,
and user warning will be shown (previously saving this column would eventually
lead to a segfault).
(python) DataTable class was renamed into Frame.
"eager" evaluation engine is now the default.
Parameter inplace of method rbind() was removed: instead you can now rbind
frames to an empty frame: dt.Frame().rbind(df1, df2).

Fixed

datatable will no longer cause the C locale settings to change upon importing.
reading a csv file with invalid UTF-8 characters in column names will no longer
throw an exception.
creating a DataTable from pandas.Series with explicit colnames will no longer
ignore those column names.
fread(fill=True) will correctly fill missing fields with NAs.
fread(columns=set(...)) will correctly handle the case when the input contains
multiple columns with the same names.
fread will no longer crash if the input dataset contains invalid utf8/win1252
data in the column headers (#594, #628).
fixed bug in exception handling, which occasionally caused empty exception
messages.
fixed bug in fread where string fields starting with "NaN" caused an assertion error.
Fixed bug when saving a DataTable with unicode column names into .nff format
on systems where default encoding is not unicode-aware.
More robust newline handling in fread (#634, #641, #647).
Quoted fields are now correctly unquoted in fread.
Fixed a bug in fread which occurred if the number of rows in the CSV file was
estimated too low (#664).
Fixed fread bug where an invalid DataTable was constructed if parameter max_nrows
was used and there were any string columns (#671).
Fixed a rare bug in fread which produced error message "Jump X did not finish
reading where jump X+1 started" (#682).
Prevented memory leak when using "PyObject" columns in conjunction with numpy.
View frames can now be properly saved.
Fixed crash when sorting view frame by a string column.
Deleting 0 columns is no longer an error.
Rows filter now works properly when applied to a view table and using "eager"
evaluation engine.
Computed columns expression can now be combined with rows expression, or
applied to a view Frame.

Assets 2