Initial commit. Bringing over work from deprecated repo.
robdmc committed Jun 7, 2014
1 parent d5fd570 commit 1ca5fb1
Showing 40 changed files with 3,666 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.DS_Store
*.pyc
26 changes: 26 additions & 0 deletions LICENSE
@@ -0,0 +1,26 @@
Copyright (c) 2014, Robert deCarvalho
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those
of the authors and should not be interpreted as representing official policies,
either expressed or implied, of the FreeBSD Project.
110 changes: 106 additions & 4 deletions README.md
@@ -1,7 +1,109 @@
pandashells
===========
PANDASHELLS
===

Bringing the power of python-pandas to the shell prompt
Description
-------------------------------------------------------------------------------
The ptools library was written to bring the power of the python scientific
stack to the unix command-line. This allows well-known and time-tested tools
like grep, awk, sed, etc. to interact seamlessly with the powerful data
manipulation, visualization, and statistical libraries being developed in the
python data-science community.


Coming soon.
Installation
--------------------------------------------------------------------------------
--- master branch
pip install git+https://github.com/robdmc/ptools.git

--- experimental branch with pandas (very early stage development)
pip install git+https://github.com/robdmc/ptools.git@with_pandas


List of tools (run with -h for help, --example to see example)
--------------------------------------------------------------------------------
p.df Pandas dataframe manipulation of csv files


*********** here are some new tools I want
p.lombscargle
p.mcmc 'patsy model' (see if there's an easy way to do this)
Maybe make distribution,params,prior for each variable
p.mcmc 'y ~ x + z' 'x:Normal(mu, sigma)' 'y:Normal(mu, sigma)'
think about defaults here where partials don't have noise

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
here are some regression and classification ideas.

p.regress - statsmodels linear regression with full summary output. Maybe use --fit to add fit results to df
p.learn.regress_linear
p.learn.regress_ridge
p.learn.regress_tree
p.learn.regress_forest
p.learn.classify.logistic
p.learn.classify.tree
p.learn.classify.forest
p.learn.classify.svm

Always use patsy language

the model.pkl files (which can be user-def names) hold the model as well
as the string used to do the fit

with --fit model.pkl
saves model in model.pkl and displays rms, R^2, and cross_val scores
as well as the original string used to do the fit and the type of model


with --predict model.pkl
loads the model and the input, and adds a _fit variable to the dataframe
with --stats, does same thing, but displays rms and R2
with --hist shows hist of residuals
with --plot shows fit vs residual

of course classifiers have their own metrics and maybe have a
--roc that plots the roc curve

with
--info model.pkl, just shows the model

with --desc 'my desc' allows you to store a description that will be
displayed with the --info flag



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




********** here is list of tools I want to replicate *********************
p.cov -> covariance between columns. cols and index have respective names
*p.parallel
*p.plot
*p.geoCode
*p.crypt
p.bar
p.cdf
p.color
p.fft
p.lombscargle
p.hist
p.interp # cat xvals_file | p.interp -r .6 -t <(cat table_file.txt)
p.linspace
p.map
p.mapDots2html
p.mapPoly2html
p.mongoDump
p.normalize
p.pgsql2csv
p.pie
p.rand
p.regress
p.scat
p.server
p.shuffle
p.sigEdit
p.smooth lowess, spline, medianFilter
p.sshKeyPush
p.template
p.utc2local
43 changes: 43 additions & 0 deletions ideas.txt
@@ -0,0 +1,43 @@
p.regress - statsmodels linear regression with full summary output
p.learn.regress_linear
p.learn.regress_ridge
p.learn.regress_tree
p.learn.regress_forest
p.learn.classify.logistic
p.learn.classify.tree
p.learn.classify.forest
p.learn.classify.svm

Always use patsy language

the model.pkl files (which can be user-def names) hold the model as well
as the string used to do the fit

with --fit model.pkl
saves model in model.pkl and displays rms, R^2, and cross_val scores
as well as the original string used to do the fit and the type of model


with --predict model.pkl
loads the model and the input, and adds a _fit variable to the dataframe
with --stats, does same thing, but displays rms and R2
with --hist shows hist of residuals
with --plot shows fit vs residual

of course classifiers have their own metrics and maybe have a
--roc that plots the roc curve

with
--info model.pkl, just shows the model

with --desc 'my desc' allows you to store a description that will be
displayed with the --info flag
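
The model.pkl idea in the notes above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the planned implementation: the helper names (`fit_line`, `save_model`, `load_model`, `predict`) and the record layout are hypothetical, and a hand-written least-squares line fit stands in for the statsmodels or patsy based models the notes mention.

```python
import os
import pickle
import tempfile


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (stand-in model)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {'slope': slope, 'intercept': mean_y - slope * mean_x}


def save_model(path, model, formula, model_type, desc=''):
    """Pickle the model together with the fit string, model type and description."""
    record = {'model': model, 'formula': formula,
              'model_type': model_type, 'desc': desc}
    with open(path, 'wb') as handle:
        pickle.dump(record, handle)


def load_model(path):
    """Load a record written by save_model (what --info or --predict would read)."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)


def predict(record, xs):
    """Apply the stored line fit to new x values."""
    model = record['model']
    return [model['slope'] * x + model['intercept'] for x in xs]


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
    model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    save_model(path, model, 'y ~ x', 'regress_linear', desc='my desc')
    record = load_model(path)
    print(record['formula'], record['model_type'], record['desc'])
    print(predict(record, [4, 5]))
```

Keeping the formula string and description next to the model in one pickled record is what would let an `--info` style flag report how the model was fit without refitting it.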









Empty file added pandashells/__init__.py
Empty file.
Binary file added pandashells/bin/.p.rand.swp
Binary file not shown.
52 changes: 52 additions & 0 deletions pandashells/bin/p.config
@@ -0,0 +1,52 @@
#! /usr/bin/env python

#--- standard library imports
import os
import sys
import argparse

############# dev only. Comment out for production ######################
sys.path.append('../..')
##########################################################################


from ptools.lib import config_lib


if __name__ == '__main__':

    #--- read in the current configuration
    default_dict = config_lib.get_config()

    msg = "Need to write this. "
    msg += "and write more."

    #--- populate the arg parser with current configuration
    parser = argparse.ArgumentParser(
        description=msg)
    parser.add_argument('--force_defaults', action='store_true',
                        dest='force_defaults',
                        help='Force to default settings')
    for tup in config_lib.CONFIG_OPTS:
        msg = 'opts: '+str(tup[1])
        parser.add_argument('--%s'%tup[0], nargs=1, type=str,
                            dest=tup[0], metavar='',#default_dict[tup[0]],
                            default=[default_dict[tup[0]]], choices=tup[1], help=msg)

    #--- parse arguments
    args = parser.parse_args()

    #--- set the arguments to the current value of the arg parser
    config_dict = {t[0]:t[1][0] for t in args.__dict__.iteritems()
                   if not t[0] in ['force_defaults']}

    if args.force_defaults:
        config_dict = config_lib.DEFAULT_DICT
    config_lib.set_config(config_dict)

    print '\n Current Config'
    print ' ' + '-'*40
    for k in sorted(config_dict.keys()):
        if not k in ['--force_defaults']:
            print ' {: <20} {}'.format(k+':', config_dict[k])
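
The way p.config builds one command-line flag per configuration option can be tried in isolation with the sketch below. The option table `CONFIG_OPTS` and the settings in `CURRENT` are hypothetical stand-ins, since `config_lib` is not shown in this listing; only the argparse calls mirror the script above. It is written to run on Python 3, unlike the Python 2 script itself.

```python
import argparse

# Hypothetical option table of (name, allowed values) pairs. The real
# config_lib.CONFIG_OPTS is not shown here, so these names are assumptions.
CONFIG_OPTS = [
    ('io_input_type', ['csv', 'table']),
    ('io_output_type', ['csv', 'table', 'html']),
]

# Hypothetical current settings, standing in for config_lib.get_config().
CURRENT = {'io_input_type': 'csv', 'io_output_type': 'csv'}


def build_parser(opts, current):
    """Create one --<name> flag per option, defaulting to the current value."""
    parser = argparse.ArgumentParser(description='config sketch')
    parser.add_argument('--force_defaults', action='store_true',
                        dest='force_defaults',
                        help='Force to default settings')
    for name, choices in opts:
        parser.add_argument('--' + name, nargs=1, type=str, dest=name,
                            default=[current[name]], choices=choices,
                            help='opts: ' + str(choices))
    return parser


def config_from_args(args):
    """nargs=1 yields one-element lists, so unwrap each value."""
    return {key: value[0] for key, value in vars(args).items()
            if key != 'force_defaults'}


if __name__ == '__main__':
    parser = build_parser(CONFIG_OPTS, CURRENT)
    args = parser.parse_args(['--io_output_type', 'table'])
    print(config_from_args(args))
```

Because each default is the current setting wrapped in a list, flags that are not passed keep their existing values, so the resulting dictionary can be written back as the full configuration.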

53 changes: 53 additions & 0 deletions pandashells/bin/p.crypt
@@ -0,0 +1,53 @@
#! /usr/bin/env python

#--- standard library imports
import os
import sys
import argparse
import re

############# dev only. Comment out for production ######################
sys.path.append('../..')
##########################################################################

from ptools.lib import arg_lib

#=============================================================================
if __name__ == '__main__':
    msg = "Encrypt a file with aes-256-cbc as implemented by openssl. "

    #--- read command line arguments
    parser = argparse.ArgumentParser(
        description=msg)

    arg_lib.addArgs(parser, 'example')

    parser.add_argument('-i', '--inFile', nargs=1, type=str,
                        required=True, dest='inFile', metavar='inFileName',
                        help="The input file name")

    parser.add_argument('-o', '--outFile', nargs=1, type=str,
                        required=True, dest='outFile', metavar='outFileName',
                        help="The output file name")

    parser.add_argument('-d', '--decrypt', action='store_true', default=False,
                        dest='decrypt', help='Decrypt the input file into the output file')

    #--- parse arguments
    args = parser.parse_args()

    #--- make sure input file exists
    if not os.path.isfile(args.inFile[0]):
        sys.stderr.write("\n\nCan't find input file\n\n")
        sys.exit(1)

    #--- create a decryption command if requested
    if args.decrypt:
        cmd = "cat %s | openssl enc -d -aes-256-cbc > %s" % (args.inFile[0],
                                                               args.outFile[0])
    #--- otherwise just encrypt
    else:
        cmd = "cat %s | openssl enc -aes-256-cbc -salt > %s" % (args.inFile[0],
                                                                  args.outFile[0])
    #--- run the proper openssl command
    os.system(cmd)
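
As a possible alternative to the string-built shell command above, the same openssl call could be assembled as an argument list, which avoids problems with spaces or shell metacharacters in file names. This is a sketch, not what the script does: the helpers `build_openssl_cmd` and `run_openssl` are hypothetical, and the sketch uses openssl's `-in` and `-out` options instead of the `cat` pipe and redirect.

```python
import subprocess


def build_openssl_cmd(in_file, out_file, decrypt=False):
    """Return the openssl argument list for encrypting or decrypting a file."""
    cmd = ['openssl', 'enc', '-aes-256-cbc']
    # -d decrypts; otherwise encrypt with a salt, as in the script above.
    cmd.append('-d' if decrypt else '-salt')
    cmd.extend(['-in', in_file, '-out', out_file])
    return cmd


def run_openssl(in_file, out_file, decrypt=False):
    """Run openssl without a shell and return its exit status."""
    return subprocess.call(build_openssl_cmd(in_file, out_file, decrypt))


if __name__ == '__main__':
    # Only print the commands here; run_openssl() would actually invoke openssl.
    print(build_openssl_cmd('my notes.txt', 'notes.enc'))
    print(build_openssl_cmd('notes.enc', 'my notes.txt', decrypt=True))
```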
93 changes: 93 additions & 0 deletions pandashells/bin/p.df
@@ -0,0 +1,93 @@
#! /usr/bin/env python

#--- standard library imports
import os
import sys
import argparse
import re

############# dev only. Comment out for production ######################
sys.path.append('../..')
##########################################################################

from ptools.lib import module_checker_lib, arg_lib, io_lib

#--- import required dependencies
modulesOkay = module_checker_lib.check_for_modules(
    [
        'pandas',
        'numpy',
        'scipy',
        'dateutil',
        'matplotlib',
    ])
if not modulesOkay:
    sys.exit(1)

import pandas as pd
import numpy as np
import scipy as scp
import pylab as pl
from dateutil.parser import parse
import datetime

#=============================================================================
if __name__ == '__main__':
    msg = "Bring pandas manipulation to command line. Input from stdin "
    msg += "is placed into a dataframe named 'df'. The output of each "
    msg += "specified command must evaluate to a dataframe that will "
    msg += "overwrite 'df'. The output of the final command will be sent "
    msg += "to stdout. The namespace in which the commands are executed "
    msg += "includes pandas as pd, numpy as np, scipy as scp, pylab as pl, "
    msg += "dateutil.parser.parse as parse, datetime"

    #--- read command line arguments
    parser = argparse.ArgumentParser(
        description=msg)

    options = {}
    arg_lib.addArgs(parser, 'io_in', 'io_out', 'example')
    parser.add_argument("statement", help="Statement to execute", nargs="+")

    #--- parse arguments
    args = parser.parse_args()

    #--- get the input dataframe
    df = io_lib.df_from_input(args)

    #--- define regex to identify if supplied command is for col assignment
    rex_col_cmd = re.compile(r'.*?df\[.+\].*?=')

    #--- define regex to identify plot commands
    rex_plot_cmd = re.compile(r'.*(plot|hist)\(.*\).*')

    #--- execute the statements in sequence
    for cmd in args.statement:
        #--- if this is a column-assignment command, just execute it
        if rex_col_cmd.match(cmd):
            exec(cmd)
            temp = df
        #--- if this is a plot command, execute it and quit
        elif rex_plot_cmd.match(cmd):
            exec(cmd)
            pl.show()
            sys.exit(0)

        #--- if instead this is a command on the whole frame
        else:
            #--- put results of command in temp var
            cmd = 'temp = {}'.format(cmd)
            exec(cmd)

        #--- transform results to dataframe if needed
        if isinstance(temp, pd.DataFrame):
            df = temp
        else:
            try:
                df = pd.DataFrame(temp)
            except pd.core.common.PandasError:
                print temp
                sys.exit(0)

    #--- write dataframe to output
    io_lib.df_to_output(args, df)
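
To see how the two regular expressions in p.df route a statement, the sketch below applies them to a few sample strings without executing anything. The regexes are copied from the script; the `classify` helper and the sample statements are illustrative assumptions, and the code is written to run on Python 3.

```python
import re

# Regexes copied from p.df above.
REX_COL_CMD = re.compile(r'.*?df\[.+\].*?=')
REX_PLOT_CMD = re.compile(r'.*(plot|hist)\(.*\).*')


def classify(statement):
    """Return which branch of the p.df loop a statement would take."""
    # Same order as the script: column assignment is checked first.
    if REX_COL_CMD.match(statement):
        return 'column-assignment'
    if REX_PLOT_CMD.match(statement):
        return 'plot'
    return 'expression'


if __name__ == '__main__':
    samples = [
        "df['c'] = df.a + df.b",
        "df.plot(x='a', y='b')",
        "df.describe()",
        "df[df.a == 1]",
        "df[df.a > 1][df.b == 2]",
    ]
    for stmt in samples:
        print('{:<20} {}'.format(classify(stmt), stmt))
```

One edge case worth noting: a chained filter such as `df[df.a > 1][df.b == 2]` matches the column-assignment pattern, because an `=` character follows a closing bracket. In the script that branch only executes the statement and keeps `df` as it was, so the filter would apparently have no effect.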
