# Import data from CSV files

Load data from CSV and Excel files into Pixeltable tables for processing and analysis.

## Problem

You have data in CSV or Excel files that you want to process with AI models, add computed columns to, or combine with other data sources.

| Source | Records | Use case |
|--------|---------|----------|
| customers.csv | 10,000 | Add AI-generated summaries |
| products.xlsx | 500 | Generate embeddings for search |
| logs.csv | 1,000,000 | Filter and aggregate |

## Solution

**What's in this recipe:**
- Import CSV files directly into tables
- Import from Pandas DataFrames
- Handle different data types

Pass a `source` argument to `pxt.create_table()` to create a table directly from a CSV file, or call `insert()` on an existing table to add the rows of a DataFrame.

### Setup

In [None]:
%pip install -qU pixeltable pandas

In [None]:
import pixeltable as pxt
import pandas as pd

In [None]:
# Create a fresh directory
pxt.drop_dir('import_demo', force=True)
pxt.create_dir('import_demo')

### Import CSV directly

Use `create_table` with `source` to create a table from a CSV file:

In [None]:
# Import CSV from URL
csv_url = 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/world-population-data.csv'

population = pxt.create_table(
    'import_demo.population',
    source=csv_url
)

In [None]:
# View the imported data
population.head(5)

### Import from Pandas DataFrame

You can also define the table schema explicitly and then insert the rows of a DataFrame:

In [None]:
# Create a DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['NYC', 'LA', 'Chicago']
})

# Create table and insert DataFrame
users = pxt.create_table('import_demo.users', {
    'name': pxt.String,
    'age': pxt.Int,
    'city': pxt.String
})
users.insert(df)

In [None]:
# View the data
users.collect()

## Explanation

**Source types supported:**

| Source | Example |
|--------|---------|
| CSV file path | `source='/path/to/data.csv'` |
| CSV URL | `source='https://example.com/data.csv'` |
| Excel file | `source='/path/to/data.xlsx'` |
| Pandas DataFrame | `source=df` |

**Type inference:**

Pixeltable infers column types from the CSV data automatically. To override an inferred type, pass `schema_overrides` to `create_table()`.
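
As a rough illustration of overriding inferred types, the helper below forces a chosen set of columns to `pxt.Int` and leaves every other column to inference. It is a sketch under assumptions: `create_table_with_int_columns` is a hypothetical name, and `schema_overrides` is assumed to take a dictionary that maps column names to Pixeltable types.

```python
def create_table_with_int_columns(path, source, int_columns):
    """Create a table from `source`, forcing the named columns to pxt.Int.

    All other column types are left to Pixeltable's inference.
    """
    # Imported inside the function so the helper can be defined before
    # Pixeltable is installed or initialized.
    import pixeltable as pxt

    # Assumed shape of schema_overrides: {column_name: pixeltable_type}
    overrides = {name: pxt.Int for name in int_columns}
    return pxt.create_table(path, source=source, schema_overrides=overrides)
```

For example, `create_table_with_int_columns('import_demo.population_typed', csv_url, ['pop_2023'])` would import the population CSV with that one column typed as an integer. The name `pop_2023` is a placeholder here, so substitute a column that actually appears in your file.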

**Large files:**

For very large CSV files, consider:
- Using `create_table(source=...)`, which streams data
- Importing in batches if memory is limited
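
The batch option above can be sketched with the standard library alone. This is a minimal sketch rather than a Pixeltable API: `iter_csv_batches` and `insert_in_batches` are hypothetical helper names, and the only assumption about the table is that `insert()` accepts a list of row dictionaries. Because `csv.DictReader` yields every value as a string, the optional `converters` mapping casts selected columns before insertion.

```python
import csv
from itertools import islice


def iter_csv_batches(path, batch_size, converters=None):
    """Yield lists of row dicts from a CSV file, at most batch_size rows each.

    converters maps column names to callables (for example {'age': int});
    csv.DictReader returns every value as a string, so typed columns need a cast.
    """
    converters = converters or {}
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        while True:
            # islice pulls the next batch_size rows without reading the whole file
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            for row in batch:
                for column, cast in converters.items():
                    row[column] = cast(row[column])
            yield batch


def insert_in_batches(table, path, batch_size=10_000, converters=None):
    """Insert a CSV file into `table` one batch at a time; return the row count.

    Assumes table.insert() accepts a list of dicts keyed by column name.
    """
    total = 0
    for batch in iter_csv_batches(path, batch_size, converters):
        table.insert(batch)
        total += len(batch)
    return total
```

With the `users` table defined earlier, `insert_in_batches(users, 'users.csv', converters={'age': int})` would load a hypothetical `users.csv` with `name`, `age`, and `city` columns, 10,000 rows at a time. Only one batch is held in memory at any point, which is the reason to prefer this over reading the whole file into a DataFrame first.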

## See also

- [Tables documentation](https://docs.pixeltable.com/tutorials/tables-and-data-operations)
- [Bringing data guide](https://docs.pixeltable.com/howto/cookbooks/data/data-import-csv)