# Import data from CSV files

Load data from CSV and Excel files into Pixeltable tables for processing and analysis.


## Problem

You have data in CSV or Excel files that you want to process with AI models, add computed columns to, or combine with other data sources.

| Source | Records | Use case |
|--------|---------|----------|
| customers.csv | 10,000 | Add AI-generated summaries |
| products.xlsx | 500 | Generate embeddings for search |
| logs.csv | 1,000,000 | Filter and aggregate |


## Solution

**What's in this recipe:**
- Import CSV files directly into tables
- Import from Pandas DataFrames
- Handle different data types

Use `pxt.io.import_csv()` to create a table directly from a CSV file, or call `insert()` to add DataFrame rows to an existing table.


### Setup


In [1]:
%pip install -qU pixeltable pandas


Note: you may need to restart the kernel to use updated packages.


In [2]:
import pixeltable as pxt
import pandas as pd


In [3]:
# Create a fresh directory
pxt.drop_dir('import_demo', force=True)
pxt.create_dir('import_demo')


Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata
Created directory 'import_demo'.


<pixeltable.catalog.dir.Dir at 0x32a14f050>

### Import CSV directly

Use `import_csv` to create a table from a CSV file:


In [4]:
# Import CSV from URL
csv_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv'

population = pxt.io.import_csv(
    'import_demo.population',
    csv_url
)


Created table 'population'.
Inserting rows into `population`: 234 rows [00:00, 7653.30 rows/s]
Inserted 234 rows with 0 errors.


In [5]:
# View the imported data
population.head(5)


| cca3 | country | continent | pop_2023 | pop_2022 | pop_2000 | area__km__ |
|------|---------|-----------|----------|----------|----------|------------|
| IND | India | Asia | 1428627663 | 1417173173 | 1059633675 | 3287590.0 |
| CHN | China | Asia | 1425671352 | 1425887337 | 1264099069 | 9706961.0 |
| USA | United States | North America | 339996563 | 338289857 | 282398554 | 9372610.0 |
| IDN | Indonesia | Asia | 277534122 | 275501339 | 214072421 | 1904569.0 |
| PAK | Pakistan | Asia | 240485658 | 235824862 | 154369924 | 881912.0 |


### Import from Pandas DataFrame

You can also build a DataFrame first, create a table with a matching schema, and insert the DataFrame into it:


In [6]:
# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
})

# Create table and insert DataFrame
users = pxt.create_table('import_demo.users', {
    'name': pxt.String,
    'age': pxt.Int,
    'city': pxt.String
})
users.insert(df)


Created table 'users'.
Inserting rows into `users`: 3 rows [00:00, 792.18 rows/s]
Inserted 3 rows with 0 errors.


3 rows inserted, 6 values computed.

In [7]:
# View the data
users.collect()


| name | age | city |
|------|-----|------|
| Alice | 25 | NYC |
| Bob | 30 | LA |
| Charlie | 35 | Chicago |


## Explanation

**Import methods:**

| Method | Use case |
|--------|----------|
| `pxt.io.import_csv()` | Create table from CSV file |
| `pxt.io.import_excel()` | Create table from Excel file |
| `table.insert(df)` | Insert DataFrame into existing table |
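
As a rough illustration of how the first two methods relate, the sketch below picks an importer from the file extension. The helper names `pick_importer` and `import_file` are hypothetical, and the sketch assumes that `pxt.io.import_excel()` accepts the same two positional arguments (table name, file path) as `pxt.io.import_csv()`; check the Pixeltable API reference before relying on that.

```python
from pathlib import Path

# Maps a file extension to the name of a function in pxt.io.
# Both function names come from the table above; the mapping is illustrative.
IMPORTERS = {'.csv': 'import_csv', '.xlsx': 'import_excel'}


def pick_importer(path: str) -> str:
    """Return the name of the pxt.io function suited to the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in IMPORTERS:
        raise ValueError(f'Unsupported file type: {suffix!r}')
    return IMPORTERS[suffix]


def import_file(table_name: str, path: str):
    """Create a Pixeltable table from a CSV or Excel file (requires Pixeltable).

    Assumes both importers take (table_name, path) as positional arguments.
    """
    import pixeltable as pxt  # imported lazily so pick_importer works without it

    importer = getattr(pxt.io, pick_importer(path))
    return importer(table_name, path)


print(pick_importer('customers.csv'))
print(pick_importer('products.xlsx'))
```

With Pixeltable installed, a call such as `import_file('import_demo.products', 'products.xlsx')` would then create the table, provided the assumption about the argument order holds.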

**Type inference:**

Pixeltable automatically infers column types from CSV data. To override the inferred types, create the table first with an explicit schema and then insert the data.
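
One case where an explicit schema can matter is a column of ZIP codes, where numeric inference would drop leading zeros. The sketch below is a minimal illustration, not the only way to do this: the CSV content, the table name `import_demo.addresses`, and the helper `load_addresses` are hypothetical, and it assumes that `insert()` accepts a list of row dictionaries in addition to a DataFrame.

```python
import csv
import io

# Hypothetical CSV content: ZIP codes with leading zeros.
raw = "name,zip_code\nAlice,02134\nBob,10001\n"

# csv.DictReader keeps every value as a string, so '02134' is preserved.
rows = list(csv.DictReader(io.StringIO(raw)))


def load_addresses(rows):
    """Create a table with an explicit schema and insert the rows.

    Requires Pixeltable. Assumes insert() accepts a list of dicts.
    """
    import pixeltable as pxt  # imported lazily so the row preparation runs without it

    addresses = pxt.create_table(
        'import_demo.addresses',
        {'name': pxt.String, 'zip_code': pxt.String},  # force zip_code to stay a string
    )
    addresses.insert(rows)
    return addresses


print(rows[0]['zip_code'])
```

Declaring `zip_code` as `pxt.String` up front means the column type no longer depends on what the values happen to look like.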

**Large files:**

For very large CSV files, consider:
- Using `import_csv`, which streams data
- Importing in batches if memory is limited
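
The batching option could be sketched as follows, using only the standard library to read the file in fixed-size chunks. The helper `iter_csv_batches` and its `converters` parameter are hypothetical names introduced for illustration, and passing each batch to `insert()` assumes that it accepts a list of row dictionaries.

```python
import csv
import io
from itertools import islice


def iter_csv_batches(lines, batch_size, converters=None):
    """Yield lists of row dicts with at most batch_size rows each.

    `lines` is any iterable of CSV lines (for example an open file).
    `converters` maps a column name to a function that casts its string value.
    """
    converters = converters or {}
    reader = csv.DictReader(lines)
    while True:
        batch = [
            {col: converters.get(col, str)(val) for col, val in row.items()}
            for row in islice(reader, batch_size)
        ]
        if not batch:
            return
        yield batch


# Demo with an in-memory CSV so the sketch runs without any files.
sample = io.StringIO("name,age,city\nAlice,25,NYC\nBob,30,LA\nCharlie,35,Chicago\n")
batches = list(iter_csv_batches(sample, batch_size=2, converters={'age': int}))
print([len(b) for b in batches])  # number of rows in each batch
```

With a real file and an existing table such as `users`, the loop would look like `for batch in iter_csv_batches(open(path, newline=''), 1000, {'age': int}): users.insert(batch)`, so only one batch is held in memory at a time.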


## See also

- [Tables documentation](https://docs.pixeltable.com/datastore/tables-and-operations)
- [Bringing data guide](https://docs.pixeltable.com/datastore/bringing-data)
