ENH: Add JSON export option for DataFrame #631 #1226

@@ -38,6 +38,22 @@ pandas 0.8.0
- Add support for indexes (dates or otherwise) with duplicates and common
sense indexing/selection functionality
- Series/DataFrame.update methods, in-place variant of combine_first (#961)
+ - Add ``match`` function to API (#502)
+ - Add Cython-optimized first, last, min, max, prod functions to GroupBy (#994,
+ #1043)
+ - Dates can be split across multiple columns (#1227, #1186)
+ - Add experimental support for converting pandas DataFrame to R data.frame
+ via rpy2 (#350, #1212)
+ - Can pass list of (name, function) to GroupBy.aggregate to get aggregates in
+ a particular order (#610)
+ - Can pass dicts with lists of functions or dicts to GroupBy aggregate to do
+ much more flexible multiple function aggregation (#642)
+ - New ordered_merge function for merging DataFrames with ordered
+ data. Also supports group-wise merging for panel data (#813)
+ - Add keys() method to DataFrame
+ - Add flexible replace method for replacing values in Series and
+ DataFrame (#929, #1241)
+ - Add 'kde' plot kind for Series/DataFrame.plot (#1059)
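The GroupBy aggregation items above (#610, #642) can be sketched with a small, hypothetical DataFrame; passing a list of ``(name, function)`` tuples controls the order of the output columns, as the changelog describes (this uses the current pandas API, which still accepts this form):

```python
import pandas as pd

# Hypothetical data, purely for illustration
df = pd.DataFrame({'key': ['a', 'a', 'b'], 'val': [1, 2, 3]})

# List of (name, function) pairs: output columns appear in this order (#610)
result = df.groupby('key')['val'].agg([('total', 'sum'), ('n', 'count')])
```

Here ``result`` has columns ``total`` and ``n``, in the order requested rather than an arbitrary one.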
**Improvements to existing features**
@@ -50,13 +66,21 @@ pandas 0.8.0
- Can pass arrays in addition to column names to DataFrame.set_index (#402)
- Improve the speed of "square" reindexing of homogeneous DataFrame objects
by significant margin (#836)
+ - Handle more dtypes when passed MaskedArrays in DataFrame constructor (#406)
+ - Improved performance of join operations on integer keys (#682)
+ - Can pass multiple columns to GroupBy object, e.g. grouped[[col1, col2]] to
+ only aggregate a subset of the value columns (#383)
+ - Add histogram / kde plot options for scatter_matrix diagonals (#1237)
+ - Add inplace option to DataFrame.drop_duplicates (#805)
**API Changes**
- Raise ValueError in DataFrame.__nonzero__, so "if df" no longer works
(#1073)
- Change BDay (business day) to not normalize dates by default
- Remove deprecated DataMatrix name
+ - Default merge suffixes for overlap now have underscores instead of periods
+ to facilitate tab completion, etc. (#1239)
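The suffix change (#1239) is easy to see with two DataFrames sharing a column name; in pandas today the defaults are the underscore forms this entry introduced (example data is hypothetical):

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'value': [3, 4]})

# Overlapping non-key columns get '_x' / '_y' suffixes by default,
# which tab-complete cleanly, unlike the old '.x' / '.y'
merged = pd.merge(left, right, on='key')
```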
**Bug fixes**
@@ -76,6 +100,10 @@ pandas 0.8.0
cases. Fix pivot table bug (#1181)
- Fix formatting of MultiIndex on Series/DataFrame when index name coincides
with label (#1217)
+ - Handle Excel 2003 #N/A as NaN from xlrd (#1213, #1225)
+ - Fix timestamp locale-related deserialization issues with HDFStore by moving
+ to datetime64 representation (#1081, #809)
+ - Fix DataFrame.duplicated/drop_duplicates NA value handling (#557)
pandas 0.7.3
============
@@ -25,35 +25,29 @@
SPHINX_BUILD = 'sphinxbuild'
-def sf():
- 'push a copy to the sf'
- os.system('cd build/html; rsync -avz . wesmckinn,pandas@web.sf.net'
- ':/home/groups/p/pa/pandas/htdocs/ -essh --cvs-exclude')
-
def upload_dev():
'push a copy to the pydata dev directory'
- os.system('cd build/html; rsync -avz . pandas@pandas.pydata.org'
- ':/usr/share/nginx/pandas/pandas-docs/dev/ -essh')
+ if os.system('cd build/html; rsync -avz . pandas@pandas.pydata.org'
+ ':/usr/share/nginx/pandas/pandas-docs/dev/ -essh'):
+ raise SystemExit('Upload to Pydata Dev failed')
def upload_dev_pdf():
'push a copy to the pydata dev directory'
- os.system('cd build/latex; scp pandas.pdf pandas@pandas.pydata.org'
- ':/usr/share/nginx/pandas/pandas-docs/dev/')
+ if os.system('cd build/latex; scp pandas.pdf pandas@pandas.pydata.org'
+ ':/usr/share/nginx/pandas/pandas-docs/dev/'):
+ raise SystemExit('PDF upload to Pydata Dev failed')
def upload_stable():
- 'push a copy to the pydata dev directory'
- os.system('cd build/html; rsync -avz . pandas@pandas.pydata.org'
- ':/usr/share/nginx/pandas/pandas-docs/stable/ -essh')
+ 'push a copy to the pydata stable directory'
+ if os.system('cd build/html; rsync -avz . pandas@pandas.pydata.org'
+ ':/usr/share/nginx/pandas/pandas-docs/stable/ -essh'):
+ raise SystemExit('Upload to stable failed')
def upload_stable_pdf():
'push a copy to the pydata dev directory'
- os.system('cd build/latex; scp pandas.pdf pandas@pandas.pydata.org'
- ':/usr/share/nginx/pandas/pandas-docs/stable/')
-
-def sfpdf():
- 'push a copy to the sf site'
- os.system('cd build/latex; scp pandas.pdf wesmckinn,pandas@web.sf.net'
- ':/home/groups/p/pa/pandas/htdocs/')
+ if os.system('cd build/latex; scp pandas.pdf pandas@pandas.pydata.org'
+ ':/usr/share/nginx/pandas/pandas-docs/stable/'):
+ raise SystemExit('PDF upload to stable failed')
def clean():
if os.path.exists('build'):
@@ -102,6 +96,80 @@ def all():
# clean()
html()
+def auto_dev_build(debug=False):
+ msg = ''
+ try:
+ clean()
+ html()
+ latex()
+ upload_dev()
+ upload_dev_pdf()
+ if not debug:
+ sendmail()
+ except (Exception, SystemExit), inst:
+ msg += str(inst) + '\n'
+ sendmail(msg)
+
+def sendmail(err_msg=None):
+ from_name, to_name = _get_config()
+
+ if err_msg is None:
+ msgstr = 'Daily docs build completed successfully'
+ subject = "DOC: daily build successful"
+ else:
+ msgstr = err_msg
+ subject = "DOC: daily build failed"
+
+ import smtplib
+ from email.MIMEText import MIMEText
+ msg = MIMEText(msgstr)
+ msg['Subject'] = subject
+ msg['From'] = from_name
+ msg['To'] = to_name
+
+ server_str, port, login, pwd = _get_credentials()
+ server = smtplib.SMTP(server_str, port)
+ server.ehlo()
+ server.starttls()
+ server.ehlo()
+
+ server.login(login, pwd)
+ try:
+ server.sendmail(from_name, to_name, msg.as_string())
+ finally:
+ server.close()
+
+def _get_dir():
+ import getpass
+ USERNAME = getpass.getuser()
+ if sys.platform == 'darwin':
+ HOME = '/Users/%s' % USERNAME
+ else:
+ HOME = '/home/%s' % USERNAME
+
+ tmp_dir = '%s/tmp' % HOME
+ return tmp_dir
+
+def _get_credentials():
+ tmp_dir = _get_dir()
+ cred = '%s/credentials' % tmp_dir
+ with open(cred, 'r') as fh:
+ server, port, un, domain = fh.read().split(',')
+ port = int(port)
+ login = un + '@' + domain + '.com'
+
+ import base64
+ with open('%s/cron_email_pwd' % tmp_dir, 'r') as fh:
+ pwd = base64.b64decode(fh.read())
+
+ return server, port, login, pwd
+
+def _get_config():
+ tmp_dir = _get_dir()
+ with open('%s/config' % tmp_dir, 'r') as fh:
+ from_name, to_name = fh.read().split(',')
+ return from_name, to_name
+
funcd = {
'html' : html,
'upload_dev' : upload_dev,
@@ -110,8 +178,8 @@ def all():
'upload_stable_pdf' : upload_stable_pdf,
'latex' : latex,
'clean' : clean,
- 'sf' : sf,
- 'sfpdf' : sfpdf,
+ 'auto_dev' : auto_dev_build,
+ 'auto_debug' : lambda: auto_dev_build(True),
'all' : all,
}
@@ -491,7 +491,7 @@ With a DataFrame, you can simultaneously reindex the index and columns:
df.reindex(index=['c', 'f', 'b'], columns=['three', 'two', 'one'])
For convenience, you may utilize the ``reindex_axis`` method, which takes the
-labels and a keyword ``axis`` paramater.
+labels and a keyword ``axis`` parameter.
Note that the ``Index`` objects containing the actual axis labels can be
**shared** between objects. So if we have a Series and a DataFrame, the
@@ -657,7 +657,7 @@ set of labels from an axis:
df.drop(['a', 'd'], axis=0)
df.drop(['one'], axis=1)
-Note that the following also works, but a bit less obvious / clean:
+Note that the following also works, but is a bit less obvious / clean:
.. ipython:: python
@@ -685,24 +685,25 @@ Series, it need only contain a subset of the labels as keys:
df.rename(columns={'one' : 'foo', 'two' : 'bar'},
index={'a' : 'apple', 'b' : 'banana', 'd' : 'durian'})
-The ``rename`` method also provides a ``copy`` named parameter that is by
-default ``True`` and copies the underlying data. Pass ``copy=False`` to rename
-the data in place.
+The ``rename`` method also provides an ``inplace`` named parameter that is by
+default ``False`` and copies the underlying data. Pass ``inplace=True`` to
+rename the data in place.
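A minimal sketch of the ``inplace`` parameter described above (still supported in current pandas; the data is hypothetical): with ``inplace=True``, ``rename`` mutates the object and returns ``None`` instead of returning a renamed copy:

```python
import pandas as pd

df = pd.DataFrame({'one': [1, 2], 'two': [3, 4]})

# inplace=True mutates df and returns None; the default (False) copies
df.rename(columns={'one': 'foo'}, inplace=True)
```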
.. _basics.rename_axis:
-The Panel class has an a related ``rename_axis`` class which can rename any of
+The Panel class has a related ``rename_axis`` method which can rename any of
its three axes.
Iteration
---------
-Considering the pandas as somewhat dict-like structure, basic iteration
-produces the "keys" of the objects, namely:
+Because Series is array-like, basic iteration produces the values. Other data
+structures follow the dict-like convention of iterating over the "keys" of the
+objects. In short:
- * **Series**: the index label
- * **DataFrame**: the column labels
- * **Panel**: the item labels
+ * **Series**: values
+ * **DataFrame**: column labels
+ * **Panel**: item labels
Thus, for example:
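A short sketch of the iteration semantics just listed, using current pandas and hypothetical data: iterating a Series yields its values, while iterating a DataFrame yields its column labels:

```python
import pandas as pd

s = pd.Series([10, 20], index=['x', 'y'])
df = pd.DataFrame({'one': [1], 'two': [2]})

# Series: array-like, so iteration produces the values
series_items = list(s)

# DataFrame: dict-like, so iteration produces the column labels
frame_items = list(df)
```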
@@ -171,10 +171,10 @@ accept the following arguments:
- ``window``: size of moving window
- ``min_periods``: threshold of non-null data points to require (otherwise
result is NA)
- - ``freq``: optionally specify a :ref: `frequency string <timeseries.freq>` or :ref:`DateOffset <timeseries.offsets>`
- to pre-conform the data to. Note that prior to pandas v0.8.0, a keyword
- argument ``time_rule`` was used instead of ``freq`` that referred to
- the legacy time rule constants
+ - ``freq``: optionally specify a :ref:`frequency string <timeseries.alias>`
+ or :ref:`DateOffset <timeseries.offsets>` to pre-conform the data to.
+ Note that prior to pandas v0.8.0, a keyword argument ``time_rule`` was used
+ instead of ``freq`` that referred to the legacy time rule constants
These functions can be applied to ndarrays or Series objects:
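As a sketch of the ``window`` / ``min_periods`` arguments described above, here is the modern ``.rolling`` method equivalent (the era's module-level functions such as ``rolling_mean`` were later replaced by this API; the data is hypothetical):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# window=2: each point averages itself and its predecessor;
# min_periods=1: the first point is computed from one value
# instead of coming out as NA
out = s.rolling(window=2, min_periods=1).mean()
```

With ``min_periods=2`` (the window size), the first element would instead be NA.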
@@ -209,7 +209,7 @@
latex_documents = [
('index', 'pandas.tex',
u'pandas: powerful Python data analysis toolkit',
- u'Wes McKinney', 'manual'),
+ u'Wes McKinney\n& PyData Development Team', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
@@ -200,7 +200,7 @@ of the DataFrame):
Consider the ``isin`` method of Series, which returns a boolean vector that is
true wherever the Series elements exist in the passed list. This allows you to
-select out rows where one or more columns have values you want:
+select rows where one or more columns have values you want:
.. ipython:: python
@@ -215,7 +215,7 @@ more complex criteria:
.. ipython:: python
# only want 'two' or 'three'
- criterion = df2['a'].map(lambda x: x.startswith('t')
+ criterion = df2['a'].map(lambda x: x.startswith('t'))
df2[criterion]
@@ -319,7 +319,7 @@ Duplicate Data
.. _indexing.duplicate:
-If you want to indentify and remove duplicate rows in a DataFrame, there are
+If you want to identify and remove duplicate rows in a DataFrame, there are
two methods that will help: ``duplicated`` and ``drop_duplicates``. Each
takes as an argument the columns to use to identify duplicated rows.
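A minimal sketch of the two methods just described, with hypothetical data (note that current pandas selects the columns to compare via a ``subset=`` keyword rather than a positional argument):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2], 'b': ['x', 'x', 'y']})

# duplicated: boolean Series, True for rows repeating an earlier row
dup = df.duplicated()

# drop_duplicates: keeps the first row of each duplicate group
deduped = df.drop_duplicates()
```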
@@ -567,9 +567,9 @@ Hierarchical indexing (MultiIndex)
Hierarchical indexing (also referred to as "multi-level" indexing) is brand new
in the pandas 0.4 release. It is very exciting as it opens the door to some
quite sophisticated data analysis and manipulation, especially for working with
-higher dimensional data. In essence, it enables you to effectively store and
-manipulate arbitrarily high dimension data in a 2-dimensional tabular structure
-(DataFrame), for example. It is not limited to DataFrame
+higher dimensional data. In essence, it enables you to store and manipulate
+data with an arbitrary number of dimensions in lower dimensional data
+structures like Series (1d) and DataFrame (2d).
In this section, we will show what exactly we mean by "hierarchical" indexing
and how it integrates with all of the pandas indexing functionality
@@ -611,6 +611,7 @@ As a convenience, you can pass a list of arrays directly into Series or
DataFrame to construct a MultiIndex automatically:
.. ipython:: python
+
arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
s = Series(randn(8), index=arrays)
@@ -59,7 +59,7 @@ The two workhorse functions for reading text files (a.k.a. flat files) are
They both use the same parsing code to intelligently convert tabular
data into a DataFrame object. They can take a number of arguments:
- - ``path_or_buffer``: Either a string path to a file, or any object with a
+ - ``filepath_or_buffer``: Either a string path to a file, or any object with a
``read`` method (such as an open file or ``StringIO``).
- ``sep`` or ``delimiter``: A delimiter / separator to split fields
on. `read_csv` is capable of inferring the delimiter automatically in some
@@ -204,8 +204,7 @@ for interpolation methods outside of the filling methods described above.
:suppress:
np.random.seed(123456)
- ts = Series(randn(100), index=date_range('1/1/2000', periods=100,
- timeRule='EOM'))
+ ts = Series(randn(100), index=date_range('1/1/2000', periods=100, freq='BM'))
ts[20:40] = np.nan
ts[60:80] = np.nan
ts = ts.cumsum()