pandaSQL

License: GPL v3

pandaSQL is a data-analysis library inspired by pandas, but designed to take advantage of existing database optimization techniques. It provides a familiar pandas-like API, while internally it offloads queries to SQLite to get you results faster.

Install

pandaSQL can be installed from source with pip:

git clone https://github.com/rohankumar42/pandaSQL.git
cd pandaSQL
python3 -m pip install .

How to Use

pandaSQL uses the same syntax that pandas does.

> import pandasql as ps
> df = ps.read_csv('my_data.csv')    # or ps.DataFrame(pandas_df)

A crucial difference between pandaSQL and pandas is that pandaSQL is lazy. This means that when you write:

> filtered = df[df['speed'] == 'fast']

filtered does not hold any results yet. Results are computed automatically, but only when they are needed, for example when you print filtered:

> print(filtered)
       name speed
0  pandaSQL  fast
1   SQLite3  fast

Printing forces evaluation: the query runs at this point, and the results are computed for you.
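To make the idea of lazy evaluation with SQLite offloading more concrete, here is a minimal sketch using only Python's standard sqlite3 module. It is not pandaSQL's actual implementation: the LazyTable class, its where_equals and compute methods, and the tools table are all hypothetical names invented for illustration. The sketch only records a filter when it is requested and runs the SQL query the first time results are needed.

```python
import sqlite3


class LazyTable:
    """Hypothetical lazy table: records filters, runs SQL only on demand."""

    def __init__(self, conn, table, filters=()):
        self.conn = conn
        self.table = table
        self.filters = tuple(filters)  # (column, value) pairs, not yet applied
        self._cache = None             # filled by the first compute() call

    def where_equals(self, column, value):
        # Return a new lazy object; no query is executed here.
        return LazyTable(self.conn, self.table,
                         self.filters + ((column, value),))

    def to_sql(self):
        # Build a parameterized query from the recorded filters.
        query = f'SELECT * FROM "{self.table}"'
        if self.filters:
            clauses = " AND ".join(f'"{col}" = ?' for col, _ in self.filters)
            query += f" WHERE {clauses}"
        params = [value for _, value in self.filters]
        return query, params

    def compute(self):
        # Run the query once and cache the rows.
        if self._cache is None:
            query, params = self.to_sql()
            self._cache = self.conn.execute(query, params).fetchall()
        return self._cache

    def __str__(self):
        # Printing needs the rows, so it triggers computation.
        return "\n".join(str(row) for row in self.compute())


# Made-up example data, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (name TEXT, speed TEXT)")
conn.executemany(
    "INSERT INTO tools VALUES (?, ?)",
    [("pandaSQL", "fast"), ("SQLite3", "fast"), ("snail", "slow")],
)

df = LazyTable(conn, "tools")
filtered = df.where_equals("speed", "fast")  # nothing has been computed yet
pending = filtered._cache                    # still None at this point
print(filtered)                              # the query runs here
```

Deferring execution this way lets several recorded operations be combined into a single SQL statement, so the database engine can plan the whole query at once instead of materializing each intermediate result.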

Development Note

pandaSQL is a fun project that I have been working on in my spare time. If you run into any issues, let me know!
