duffel

Lightweight Python data frames without bloat or typecasting, using only the standard library.

Take your data with you to the cloud without the bloat of a 200kg full-grown bear that refuses to mate.

git clone https://github.com/russellromney/duffel
cd duffel

import duffel as pd

df = pd.read_csv('duffel/data/MOCK_DATA.csv',index_col=0)

df.shape
>>> (1000, 5)

df.head(2)
>>>
index  first_name  last_name  email                     gender  ip_address      
 --    ----        ----       ----                      ----    ----            
 1     Brinn       Herity     bherity0@hugedomains.com  Female  53.183.199.223  
 2     Wylma       Lavell     wlavell1@stumbleupon.com  Female  206.172.62.206  
duffel.DataFrame (1000, 5)

df.loc[576]
>>>
      first_name  last_name  email                      gender  ip_address      
 --   ----        ----       ----                       ----    ----            
 576  Cesaro      Ohrtmann   cohrtmanng0@tuttocitta.it  Male    252.141.154.52  
duffel.Row (1, 5)

df.loc[5:7, ['first_name','gender']]
>>>
index  first_name  gender  
 --    ----        ----    
 6     Jolynn      Female  
 7     Moina       Female  
duffel.DataFrame (2, 2)
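To give a feel for how little machinery a session like the one above needs, here is a minimal sketch of a column-oriented reader built on the standard library's csv module. It is not duffel's implementation: `read_columns` and the inline `SAMPLE` text are hypothetical names made up for this illustration, and the sample only mirrors a few of the columns shown above.

```python
import csv
import io

# Hypothetical inline sample mirroring some columns shown above;
# this is not the real MOCK_DATA.csv.
SAMPLE = """id,first_name,last_name,gender
1,Brinn,Herity,Female
2,Wylma,Lavell,Female
"""


def read_columns(text, index_col=0):
    """Parse CSV text into (index, columns) using only the standard library.

    Values stay as the strings found in the file; nothing is typecast.
    This helper is an illustrative assumption, not part of duffel.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Every header name except the index column becomes a data column.
    columns = {name: [] for i, name in enumerate(header) if i != index_col}
    index = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        index.append(row[index_col])
        for i, name in enumerate(header):
            if i != index_col:
                columns[name].append(row[i])
    return index, columns


index, columns = read_columns(SAMPLE)
shape = (len(index), len(columns))  # (rows, columns), index excluded
print(shape)
print(columns["first_name"])
```

Storing each column as a plain list keeps column selection cheap and leaves every value as an ordinary Python object, which is the property the tagline refers to.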

Project inspiration

Pandas is great for hardcore analytical workloads. However, if you are using Pandas for convenient-but-basic dataframe operations in a non-analytical use case, you might encounter the following limitations:

Pandas file size is large, which makes it hard to use in size-constrained places, e.g. Lambda functions
NumPy file size is large, with the same consequence
Pandas transforms numbers to NumPy types and dates to pandas.Timestamp, which can lead to unpredictable results
Pandas has a bloated API with several ways to accomplish the same goal
Pandas sometimes returns a subset of a dataframe that is still linked to the original, instead of making a new dataframe
Pandas reports strange errors while still allowing the operation to proceed, instead of raising clear, real exceptions

It is to solve these problems that I'm building ~~red-pandas~~ duffel: a smaller, simpler dataframe tool that relies only on the standard library and is generally a drop-in replacement for the Pandas API.
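One of the limitations listed above, subsets that stay linked to the original, can be avoided by always building new containers on selection. The sketch below shows the idea with plain dicts and lists. `select_rows` is a hypothetical helper written for this illustration, and whether duffel copies in exactly this way is an assumption, not something this README confirms.

```python
import copy


def select_rows(columns, positions):
    """Return a new column dict holding only the requested row positions.

    Each value is deep-copied, so the result shares nothing with the input.
    Hypothetical helper for illustration; not duffel's API.
    """
    return {
        name: [copy.deepcopy(values[i]) for i in positions]
        for name, values in columns.items()
    }


original = {"first_name": ["Jolynn", "Moina"], "gender": ["Female", "Female"]}
subset = select_rows(original, [0])

# Mutating the subset leaves the original untouched.
subset["gender"][0] = "changed"
print(original["gender"][0])
print(subset["gender"][0])
```

Copying on every selection costs extra memory, which is a reasonable trade for small, non-analytical workloads where predictable behavior matters more than speed.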

Notes

Some inspiration on organization and structure, plus some copypasta, comes from https://github.com/paleolimbot/dflite. duffel borrows much from @paleolimbot's implementation of loc, iloc, and __repr__.

Uses the black code style. https://black.readthedocs.io/en/stable/the_black_code_style.html

Project goals

Build a dataframe solution that can be easily used in AWS Lambda functions for most non-massive-scale-analytical dataframe operations.

Implement a significant subset of the "minimally sufficient" Pandas API as laid out in https://medium.com/dunder-data/minimally-sufficient-pandas-a8e67f2a2428.

Project Progress

Names of implemented functionality are struck through.

Attributes

~~columns~~
dtypes
~~index~~
~~shape~~
T
~~values~~

Subset Selection

~~head~~
~~iloc~~
~~loc~~
~~tail~~
~~scalar comparison~~
~~vector comparison~~
~~getitem selection~~

Missing Value Handling

dropna
fillna
interpolate
isna
notna

Grouping

expanding
groupby
pivot_table
resample
rolling

Joining Data

~~append~~
merge

Other

asfreq
astype
copy
~~drop~~
drop_duplicates
equals
isin
melt
plot
rename
replace
~~reset_index~~
~~sample~~
select_dtypes
shift
~~sort_index~~
~~sort_values~~
~~to_csv~~
~~to_json~~
to_sql
~~to_dict~~

Aggregation Methods

all
any
count
describe
~~idxmax~~
~~idxmin~~
max
mean
median
min
mode
nunique
sum
std
var

Non-Aggregation Statistical Methods

abs
clip
corr
cov
cummax
cummin
cumprod
cumsum
diff
nlargest
nsmallest
pct_change
prod
quantile
rank
round

Functions

~~pd.concat~~
pd.crosstab
pd.cut
pd.qcut
~~pd.read_csv~~
~~pd.read_json~~
pd.read_sql
pd.to_datetime
pd.to_timedelta
