# **3: DuckDB's Best Practices**

---

By Jean-Yves Tran | jy.tran@[datascience-jy.com](https://datascience-jy.com) | [LinkedIn](https://www.linkedin.com/in/jytran-datascience/)  
IBM Certified Data Analyst 

---

Sources:
- [Getting Started with DuckDB](https://www.packtpub.com/en-ar/product/getting-started-with-duckdb-9781803232539) by Simon Aubury & Ned Letcher
- [DuckDB documentation](https://duckdb.org/docs/)

---

The interactive links in this notebook do not work on GitHub because of its rendering limitations. You can view this notebook with working interactive links [here](https://nbviewer.org/github/jendives2000/Data_ML_Practice_2025/blob/main/1-3-SQL/practice/DuckDB/notebooks/2_duckdb_python_API.ipynb).

---

This is part 2 of the series of notebooks on DuckDB.  
This time, I will show how to **use DuckDB with Python**, which will help us **understand why it is useful, and why it performs well, for data analysis**. By "using DuckDB", I mostly mean the **2 main APIs** available for this purpose.

For an introduction to DuckDB, check [my first notebook](https://github.com/jendives2000/Data_ML_Practice_2025/blob/82571ad44176666f9cf0735c5141c6a96d5eace9/1-3-SQL/practice/DuckDB/notebooks/1_duckdb_intro.ipynb), which also covers when you should not use DuckDB.

<strong><u>DATABASE:</u></strong>


For demonstration purposes, I use the Seattle Pets Licenses [dataset](#the-seattle-pets-licenses-dataset). It:
- has over 43,000 rows
- has 7 columns
- is lightweight

<u>**MAIN OUTLINE**:</u> 

This notebook is made of two main chapters, each diving into one of the two main APIs available for querying:
- the [Relational API](#using-the-relational-api)
- the [Python DB-API](#using-the-python-db-api)

I also briefly cover how [both can be used](#using-both-relational--python-apis) on the same database.

<u> **HIGHLIGHTS**:</u>   

- Highlights for the Relational API:
  - [loading](#loading-dataset-with-relational) a dataset, [querying](#querying-relation-objects) with Relation Objects (RO)
  - [Expression API](#expression-api), [chaining methods](#chaining-methods)
  - [writing to disk](#writing-to-disk-with-the-relational-api), [inserting](#inserting-new-records) new records

- Highlights for the Python DB-API:
  - [`fetch`](#querying-with-fetch) for querying, [prepared statements](#querying-with-prepared-statements)
  - [writing to disk](#writing-to-disk), [parallel querying](#cursor-for-parallel-querying) with `cursor()`
  - [registering objects](#registering-objects-as-views) as views, converting to a [pandas df](#converting-to-a-pandas-df), [polars df](#converting-to-a-polars-df), or [pyarrow table](#converting-to-pyarrow-tables)
  - [user-defined functions](#user-defined-functions)

The [Expression API](#expression-api) is particularly interesting as it **integrates Python more deeply** into DuckDB: columns, constants and conditions become Python objects, leveraging Python's **OOP abilities** even further.  
[Chaining methods](#chaining-methods) is another enhancement offered by DuckDB, making queries quicker to build and easier to read. Polars users will feel at home.

**The two main takeaways are**:
- understanding the **flexibility** offered by using both the Relational and Python APIs:
  - the Relational API offers an experience **[similar](#similarities-with-pandas) to what Pandas offers** for data analytics
  - the Python DB-API **interacts** with the main **analytical libraries**: Pandas, Polars, PyArrow, NumPy, etc.
  - both can:
    - leverage **standard SQL**, accessible to **non-developers**
    - and also follow a **programmatic approach** based on Python objects
- and, from there, understanding how useful and performant DuckDB is **for data analytics**.

---
