

pandaSQL

License: GPL v3

pandaSQL is a data-analysis library inspired by pandas, but designed to take advantage of existing database optimization techniques. It provides the familiar pandas-like API, but internally it offloads queries to SQLite so that results can be computed faster.
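To illustrate what offloading work to SQLite means, here is a minimal sketch that uses only Python's standard sqlite3 module. It is not pandaSQL's implementation: the helper filter_equal, the table name data, and the sample rows are hypothetical, and column names are assumed to be plain identifiers that contain no double quotes.

```python
import sqlite3


def filter_equal(rows, columns, column, value):
    """Return the rows whose `column` equals `value`, letting SQLite do the filtering.

    Hypothetical helper for illustration only; not part of pandaSQL's API.
    Assumes column names are plain identifiers without double quotes.
    """
    if column not in columns:
        raise KeyError(column)
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        col_list = ", ".join(f'"{c}"' for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(f"CREATE TABLE data ({col_list})")
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
        # The comparison runs inside SQLite, not in a Python loop.
        # ORDER BY rowid keeps the original insertion order.
        cursor = conn.execute(
            f'SELECT {col_list} FROM data WHERE "{column}" = ? ORDER BY rowid',
            (value,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


rows = [("pandaSQL", "fast"), ("turtle", "slow"), ("SQLite3", "fast")]
print(filter_equal(rows, ["name", "speed"], "speed", "fast"))
# [('pandaSQL', 'fast'), ('SQLite3', 'fast')]
```

The value is passed as a bound parameter (?) rather than pasted into the SQL string, so SQLite handles quoting and type comparison.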

Install

pandaSQL can be installed from source with pip as follows:

git clone https://github.com/rohankumar42/pandaSQL.git
cd pandaSQL
python3 -m pip install .

How to Use

pandaSQL uses the same syntax as pandas:

> import pandasql as ps
> df = ps.read_csv('my_data.csv')    # or ps.DataFrame(pandas_df)

A crucial difference between pandaSQL and pandas is that pandaSQL is lazy. This means that when you write:

> filtered = df[df['speed'] == 'fast']

filtered does not hold any filtered results yet. Instead, results are computed automatically when they are needed, for example, when you print filtered:

> print(filtered)
       name speed
0  pandaSQL  fast
1   SQLite3  fast

The results are automatically computed for you.
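The lazy behavior described above can be sketched in a few lines. The hypothetical LazyFrame class below is not pandaSQL's actual class: it only records filter conditions when where_equal is called, builds a SQL query from them, and runs that query against SQLite the first time the results are needed, such as when the frame is printed. Caching the result after the first run is an assumption of this sketch, not documented pandaSQL behavior.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy table: builds SQL and runs it only on demand.

    Illustration only; not pandaSQL's actual implementation.
    """

    def __init__(self, conn, table, columns, conditions=()):
        self.conn = conn
        self.table = table
        self.columns = list(columns)
        self.conditions = tuple(conditions)  # (column, value) pairs
        self._result = None  # cached rows, filled on the first compute()

    def where_equal(self, column, value):
        # Return a new LazyFrame with one more condition; no query runs here.
        return LazyFrame(self.conn, self.table, self.columns,
                         self.conditions + ((column, value),))

    def sql(self):
        # Translate the recorded conditions into one parameterized query.
        cols = ", ".join(f'"{c}"' for c in self.columns)
        query = f'SELECT {cols} FROM "{self.table}"'
        if self.conditions:
            query += " WHERE " + " AND ".join(f'"{c}" = ?' for c, _ in self.conditions)
        return query + " ORDER BY rowid", [v for _, v in self.conditions]

    def compute(self):
        # The query is executed only here, and only once per LazyFrame.
        if self._result is None:
            query, params = self.sql()
            self._result = self.conn.execute(query, params).fetchall()
        return self._result

    def __str__(self):
        # Printing forces evaluation, mirroring print(filtered) above.
        return "\n".join(" ".join(map(str, row)) for row in self.compute())


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE df ("name", "speed")')
conn.executemany("INSERT INTO df VALUES (?, ?)",
                 [("pandaSQL", "fast"), ("turtle", "slow"), ("SQLite3", "fast")])

df = LazyFrame(conn, "df", ["name", "speed"])
filtered = df.where_equal("speed", "fast")  # nothing has been executed yet
print(filtered.sql()[0])
# SELECT "name", "speed" FROM "df" WHERE "speed" = ? ORDER BY rowid
print(filtered)
# pandaSQL fast
# SQLite3 fast
```

Because each where_equal call returns a new LazyFrame, several filters can be chained before any query is executed, and SQLite then evaluates them together in a single WHERE clause.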

Development Note

pandaSQL is a fun project that I have been working on in my spare time. If you run into any issues, let me know!

About

A Pandas-inspired data analysis project with lazy semantics and query-offloading to SQLite
