GitHub - maxhumber/redframes: General Purpose Data Manipulation Library

About

redframes (rectangular data frames) is a general purpose data manipulation library that prioritizes syntax, simplicity, and speed (to a solution). Importantly, the library is fully interoperable with pandas, compatible with scikit-learn, and works great with matplotlib.

Install & Import

pip install redframes

import redframes as rf

Quickstart

Copy-and-paste this to get started:

import redframes as rf

df = rf.DataFrame({
    'bear': ['Brown bear', 'Polar bear', 'Asian black bear', 'American black bear', 'Sun bear', 'Sloth bear', 'Spectacled bear', 'Giant panda'],
    'genus': ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda'],
    'weight (male, lbs)': ['300-860', '880-1320', '220-440', '125-500', '60-150', '175-310', '220-340', '190-275'],
    'weight (female, lbs)': ['205-455', '330-550', '110-275', '90-300', '45-90', '120-210', '140-180', '155-220']
})

# | bear                | genus      | weight (male, lbs)   | weight (female, lbs)   |
# |:--------------------|:-----------|:---------------------|:-----------------------|
# | Brown bear          | Ursus      | 300-860              | 205-455                |
# | Polar bear          | Ursus      | 880-1320             | 330-550                |
# | Asian black bear    | Ursus      | 220-440              | 110-275                |
# | American black bear | Ursus      | 125-500              | 90-300                 |
# | Sun bear            | Helarctos  | 60-150               | 45-90                  |
# | Sloth bear          | Melursus   | 175-310              | 120-210                |
# | Spectacled bear     | Tremarctos | 220-340              | 140-180                |
# | Giant panda         | Ailuropoda | 190-275              | 155-220                |

(
    df
        .rename({"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
        .gather(["male", "female"], into=("sex", "weight"))
        .split("weight", into=["min", "max"], sep="-")
        .gather(["min", "max"], into=("stat", "weight"))
        .mutate({"weight": lambda row: float(row["weight"])})
        .group(["genus", "sex"])
        .rollup({"weight": ("weight", rf.stat.mean)})
        .spread("sex", using="weight")
        .mutate({"dimorphism": lambda row: round(row["male"] / row["female"], 2)})
        .drop(["male", "female"])
        .sort("dimorphism", descending=True)
)

# | genus      |   dimorphism |
# |:-----------|-------------:|
# | Ursus      |         2.01 |
# | Tremarctos |         1.75 |
# | Helarctos  |         1.56 |
# | Melursus   |         1.47 |
# | Ailuropoda |         1.24 |

For comparison, here's the equivalent pandas:

import pandas as pd

# df = pd.DataFrame({...})

df = df.rename(columns={"weight (male, lbs)": "male", "weight (female, lbs)": "female"})
df = pd.melt(df, id_vars=['bear', 'genus'], value_vars=['male', 'female'], var_name='sex', value_name='weight')
df[["min", "max"]] = df["weight"].str.split("-", expand=True)
df = df.drop("weight", axis=1)
df = pd.melt(df, id_vars=['bear', 'genus', 'sex'], value_vars=['min', 'max'], var_name='stat', value_name='weight')
df['weight'] = df["weight"].astype('float')
df = df.groupby(["genus", "sex"])["weight"].mean()
df = df.reset_index()
df = pd.pivot_table(df, index=['genus'], columns=['sex'], values='weight')
df = df.reset_index()
df = df.rename_axis(None, axis=1)
df["dimorphism"] = round(df["male"] / df["female"], 2)
df = df.drop(["female", "male"], axis=1)
df = df.sort_values("dimorphism", ascending=False)
df = df.reset_index(drop=True)

# 🤮

IO

Save, load, and convert rf.DataFrame objects:

# save .csv
rf.save(df, "bears.csv")

# load .csv
df = rf.load("bears.csv")

# convert redframes → pandas
pandas_df = rf.unwrap(df)

# convert pandas → redframes
df = rf.wrap(pandas_df)

Verbs

Verbs are pure and "chain-able" methods that manipulate rf.DataFrame objects. Here is the complete list (see docstrings for examples and more details):

Verb	Description
`accumulate`^‡	Run a cumulative sum over a column
`append`	Append rows from another DataFrame
`combine`	Combine multiple columns into a single column (opposite of `split`)
`cross`	Cross join columns from another DataFrame
`dedupe`	Remove duplicate rows
`denix`	Remove rows with missing values
`drop`	Drop entire columns (opposite of `select`)
`fill`	Fill missing values "down", "up", or with a constant
`filter`	Keep rows matching specific conditions
`gather`^‡	Gather columns into rows (opposite of `spread`)
`group`	Prepare groups for compatible verbs^‡
`join`	Join columns from another DataFrame
`mutate`	Create a new, or overwrite an existing column
`pack`^‡	Collate and concatenate row values for a target column (opposite of `unpack`)
`rank`^‡	Rank order values in a column
`rename`	Rename column keys
`replace`	Replace matching values within columns
`rollup`^‡	Apply summary functions and/or statistics to target columns
`sample`	Randomly sample any number of rows
`select`	Select specific columns (opposite of `drop`)
`shuffle`	Shuffle the order of all rows
`sort`	Sort rows by specific columns
`split`	Split a single column into multiple columns (opposite of `combine`)
`spread`	Spread rows into columns (opposite of `gather`)
`take`^‡	Take any number of rows (from the top/bottom)
`unpack`	"Explode" concatenated row values into multiple rows (opposite of `pack`)

Properties

In addition to all of the verbs there are several properties attached to each DataFrame object:

df["genus"] 
# ['Ursus', 'Ursus', 'Ursus', 'Ursus', 'Helarctos', 'Melursus', 'Tremarctos', 'Ailuropoda']

df.columns 
# ['bear', 'genus', 'weight (male, lbs)', 'weight (female, lbs)']

df.dimensions
# {'rows': 8, 'columns': 4}

df.empty
# False

df.memory
# '2 KB'

df.types
# {'bear': object, 'genus': object, 'weight (male, lbs)': object, 'weight (female, lbs)': object}

matplotlib

rf.DataFrame objects integrate seamlessly with matplotlib:

import redframes as rf
import matplotlib.pyplot as plt

football = rf.DataFrame({
    'position': ['TE', 'K', 'RB', 'WR', 'QB'],
    'avp': [116.98, 131.15, 180, 222.22, 272.91]
})

df = (
    football
        .mutate({"color": lambda row: row["position"] in ["WR", "RB"]})
        .replace({"color": {False: "orange", True: "red"}})
)

plt.barh(df["position"], df["avp"], color=df["color"]);

scikit-learn

rf.DataFrame objects are fully compatible with sklearn functions, estimators, and transformers:

import redframes as rf
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df = rf.DataFrame({
    "touchdowns": [15, 19, 5, 7, 9, 10, 12, 22, 16, 10],
    "age": [21, 22, 21, 24, 26, 28, 30, 35, 28, 21],
    "mvp": [1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
})

target = "touchdowns"
y = df[target]
X = df.drop(target)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = LinearRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)
# 0.5083194901655527

print(X_train.take(1))
# rf.DataFrame({'age': [21], 'mvp': [0]})

X_new = rf.DataFrame({'age': [22], 'mvp': [1]})
model.predict(X_new)
# array([19.])

Name		Name	Last commit message	Last commit date
Latest commit History 147 Commits
.github		.github
images		images
redframes		redframes
tests		tests
.gitignore		.gitignore
CHANGELOG		CHANGELOG
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
TODO		TODO
mypy.ini		mypy.ini
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Install & Import

Quickstart

IO

Verbs

Properties

matplotlib

scikit-learn

About

Releases 12

Sponsor this project

Languages

License

maxhumber/redframes

Folders and files

Latest commit

History

Repository files navigation

About

Install & Import

Quickstart

IO

Verbs

Properties

matplotlib

scikit-learn

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 12

Sponsor this project

Languages