huyng/datapad

Datapad: A Fluent API for Exploratory Data Analysis



Datapad is a Python library for processing sequence and stream data using a fluent-style API. Data scientists and researchers use it as a lightweight toolset to explore datasets efficiently and to massage data for modeling tasks.

It can be viewed as a combination of syntactic sugar for the Python itertools module and supercharged tooling for working with Structured Sequence data.
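
To make the "syntactic sugar" idea concrete, here is a minimal sketch of a fluent wrapper over lazy iterators. The class `FluentSeq` and its methods are hypothetical and written for illustration only; they are not Datapad's implementation or API.

```python
import itertools


class FluentSeq:
    """Hypothetical fluent wrapper around an iterator (not Datapad's code)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def map(self, fn):
        # Lazily apply fn to each element and return a new wrapper for chaining.
        return FluentSeq(map(fn, self._it))

    def take(self, n):
        # itertools.islice stops after n elements without consuming the rest.
        return FluentSeq(itertools.islice(self._it, n))

    def collect(self):
        # Materialize the remaining elements into a list.
        return list(self._it)


# Works on an infinite stream because every step before collect() is lazy.
squares = FluentSeq(itertools.count(1)).map(lambda x: x * x).take(3).collect()
print(squares)  # [1, 4, 9]
```

Each method returns a new wrapper, which is what allows calls to be chained left to right instead of nested inside one another.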

Learn more in Documentation


Install

pip install datapad

Exploratory data analysis with Datapad

See what you can do with Datapad in the examples below.

Count all unique items in a sequence:

>>> import datapad as dp
>>> data = ['a', 'b', 'b', 'c', 'c', 'c']
>>> seq = dp.Sequence(data)
>>> seq.count(distinct=True) \
...    .collect()
[('a', 1),
 ('b', 2),
 ('c', 3)]
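
For comparison, roughly the same result can be obtained with the standard library alone. This is a plain-Python sketch, not a description of how Datapad computes counts; the sort is an assumption added here to give a deterministic order.

```python
from collections import Counter

data = ['a', 'b', 'b', 'c', 'c', 'c']

# Counter tallies each distinct item; sorting the pairs fixes the output order.
counts = sorted(Counter(data).items())
print(counts)  # [('a', 1), ('b', 2), ('c', 3)]
```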

Transform individual fields in a sequence:

>>> import datapad as dp
>>> import datapad.fields as F
>>> data = [
...     {'a': 1, 'b': 2},
...     {'a': 4, 'b': 4},
...     {'a': 5, 'b': 7}
... ]
>>> seq = dp.Sequence(data)
>>> seq.map(F.apply('a', lambda x: x*2)) \
...    .map(F.apply('b', lambda x: x*3)) \
...    .collect()
[{'a': 2, 'b': 6},
 {'a': 8, 'b': 12},
 {'a': 10, 'b': 21}]
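
The per-field transform above can be pictured as a function that rewrites one key of each dictionary. The helper `apply_field` below is a hypothetical stand-in written in plain Python; it is not `F.apply` itself, and whether Datapad copies or mutates rows is not something this sketch claims to know.

```python
def apply_field(key, fn):
    """Hypothetical helper: build a function that rewrites one field of a dict."""
    def transform(row):
        # Copy the row so the input data is left unmodified.
        return {**row, key: fn(row[key])}
    return transform


data = [
    {'a': 1, 'b': 2},
    {'a': 4, 'b': 4},
    {'a': 5, 'b': 7},
]

# Two lazy map passes, one per field, mirroring the chained calls above.
doubled = map(apply_field('a', lambda x: x * 2), data)
result = list(map(apply_field('b', lambda x: x * 3), doubled))
print(result)
# [{'a': 2, 'b': 6}, {'a': 8, 'b': 12}, {'a': 10, 'b': 21}]
```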

Chain together multiple transforms for the elements of a sequence:

>>> import datapad as dp
>>> data = ['a', 'b', 'b', 'c', 'c', 'c']
>>> seq = dp.Sequence(data)
>>> seq.distinct() \
...    .map(lambda x: x+'z') \
...    .map(lambda x: (x, len(x))) \
...    .collect()
[('az', 2),
 ('bz', 2),
 ('cz', 2)]
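
Written without a fluent API, the same pipeline might look like the sketch below. It uses only built-ins and assumes that "distinct" means keeping the first occurrence of each item in order, which matches the output shown above but is not confirmed for Datapad in general.

```python
data = ['a', 'b', 'b', 'c', 'c', 'c']

# dict.fromkeys keeps the first occurrence of each item in insertion order.
distinct = dict.fromkeys(data)
# Generator expression: the suffix is appended lazily, one item at a time.
suffixed = (x + 'z' for x in distinct)
result = [(x, len(x)) for x in suffixed]
print(result)  # [('az', 2), ('bz', 2), ('cz', 2)]
```

Each step needs its own intermediate name here, whereas the chained form reads top to bottom in the order the transforms run.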

Check out our documentation below to see what else is possible with Datapad:

Documentation


This project incorporates ideas from:
