pandaSQL is a data-analysis library inspired by pandas, but designed to use existing database optimization techniques. While pandaSQL provides the familiar pandas-like API, internally, it uses SQLite to get you results faster.
pandaSQL can be installed via pip as follows:
    git clone https://github.com/rohankumar42/pandaSQL.git
    cd pandaSQL
    python3 -m pip install .
How to Use
pandaSQL uses the same syntax that pandas does.
> import pandasql as ps
> df = ps.read_csv('my_data.csv')  # or ps.DataFrame(pandas_df)
A crucial difference between pandaSQL and pandas is that pandaSQL is lazy. This means that when you say:
> filtered = df[df['speed'] == 'fast']
filtered does not actually hold any results yet. Results are computed automatically when they are needed. For example, if you try to print filtered:
> print(filtered)
       name speed
0  pandaSQL  fast
1   SQLite3  fast
The results are automatically computed for you.
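To make the idea of laziness concrete, here is a minimal sketch of how deferred evaluation on top of SQLite could work. It is not pandaSQL's actual implementation: the `LazyFilter` class, its `filter_eq` and `compute` methods, and the `libs` table are all hypothetical names made up for this illustration, and only Python's standard `sqlite3` module is used.

```python
import sqlite3


class LazyFilter:
    """Hypothetical lazy table for illustration; not pandaSQL's real class."""

    def __init__(self, conn, table, where=None, params=()):
        self.conn = conn
        self.table = table
        self.where = where      # SQL condition, recorded but not yet run
        self.params = params
        self._cache = None      # filled in the first time results are needed

    def filter_eq(self, column, value):
        # Only record the condition; no SQL is executed here.
        # (This sketch supports a single condition, not chained filters.)
        return LazyFilter(self.conn, self.table, f'"{column}" = ?', (value,))

    def compute(self):
        # Run the query once, then reuse the cached rows.
        if self._cache is None:
            sql = f'SELECT * FROM "{self.table}"'
            if self.where:
                sql += f" WHERE {self.where}"
            self._cache = self.conn.execute(sql, self.params).fetchall()
        return self._cache

    def __str__(self):
        # Printing is one of the points where results are needed.
        return "\n".join(str(row) for row in self.compute())


# Assumed sample data, loosely mirroring the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE libs (name TEXT, speed TEXT)")
conn.executemany(
    "INSERT INTO libs VALUES (?, ?)",
    [("pandaSQL", "fast"), ("SQLite3", "fast"), ("snail", "slow")],
)

df = LazyFilter(conn, "libs")
filtered = df.filter_eq("speed", "fast")  # nothing has been executed yet
print(filtered)                           # the query runs here
```

Deferring the work this way means the whole filter can be handed to the database as one query, which is where an engine such as SQLite gets the chance to optimize it.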
pandaSQL is a fun project that I have been working on in my spare time. If you run into any issues, let me know!