# Import data from JSON files

Load structured data from JSON files into Pixeltable tables for processing and analysis.

## Problem

You have data in JSON format, such as API responses, exports, or application logs, and you need to load it so you can process it with AI models or combine it with other data sources.

| Source | Records | Use case |
|--------|---------|----------|
| api_response.json | 1,000 | Analyze API data |
| user_events.json | 50,000 | Process event logs |
| products.json | 500 | Enrich with AI descriptions |

## Solution

**What's in this recipe:**

- Import JSON files directly into tables
- Import from URLs (APIs, remote files)
- Import from in-memory Python dictionaries
- Add computed columns to imported data
- Handle nested JSON structures

Use `pxt.create_table()` with the `source` parameter to create a table from a JSON file or URL. The JSON must be a top-level array of objects; each object becomes one row, and its keys become the column names.

### Setup

In [1]:
%pip install -qU pixeltable

In [2]:
import json
import tempfile
from pathlib import Path

import pixeltable as pxt

### Create sample JSON file

First, create a sample JSON file to demonstrate the import process:

In [3]:
# Create sample JSON data (array of objects)
sample_data = [
    {
        'id': 1,
        'title': 'Introduction to ML',
        'author': 'Alice',
        'tags': ['ml', 'intro'],
        'rating': 4.5,
    },
    {
        'id': 2,
        'title': 'Deep Learning Basics',
        'author': 'Bob',
        'tags': ['dl', 'neural'],
        'rating': 4.8,
    },
    {
        'id': 3,
        'title': 'NLP Fundamentals',
        'author': 'Carol',
        'tags': ['nlp', 'text'],
        'rating': 4.2,
    },
    {
        'id': 4,
        'title': 'Computer Vision',
        'author': 'Dave',
        'tags': ['cv', 'images'],
        'rating': 4.6,
    },
    {
        'id': 5,
        'title': 'Reinforcement Learning',
        'author': 'Eve',
        'tags': ['rl', 'agents'],
        'rating': 4.3,
    },
]

# Save to temporary JSON file
temp_dir = tempfile.mkdtemp()
json_path = Path(temp_dir) / 'articles.json'

with open(json_path, 'w') as f:
    json.dump(sample_data, f, indent=2)

### Import JSON file

Use `create_table` with `source` to create a table directly from a JSON file:

In [4]:
# Create a fresh directory
pxt.drop_dir('json_demo', force=True)
pxt.create_dir('json_demo')

Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata


Created directory 'json_demo'.


<pixeltable.catalog.dir.Dir at 0x1556b2800>

In [5]:
# Import JSON file into a new table
articles = pxt.create_table(
    'json_demo/articles',
    source=str(json_path),
    source_format='json',  # Explicitly specify format when using local file paths
)

Created table 'articles'.





Inserted 5 rows with 0 errors.


In [6]:
# View imported data
articles.collect()

id,title,author,tags,rating
1,Introduction to ML,Alice,"[""ml"", ""intro""]",4.5
2,Deep Learning Basics,Bob,"[""dl"", ""neural""]",4.8
3,NLP Fundamentals,Carol,"[""nlp"", ""text""]",4.2
4,Computer Vision,Dave,"[""cv"", ""images""]",4.6
5,Reinforcement Learning,Eve,"[""rl"", ""agents""]",4.3


### Import from URL

You can import JSON directly from a URL—useful for APIs and remote data:

In [7]:
# Import from a public JSON URL
# Using JSONPlaceholder API as an example
posts = pxt.create_table(
    'json_demo/posts',
    source='https://jsonplaceholder.typicode.com/posts',
    source_format='json',  # Required for URL sources
)

Created table 'posts'.







Inserted 100 rows with 0 errors.


In [8]:
# View first few rows
posts.head(5)

userId,id,title,body
1,1,sunt aut facere repellat provident occaecati excepturi optio reprehenderit,quia et suscipit suscipit recusandae consequuntur expedita et cum reprehenderit molestiae ut ut quas totam nostrum rerum est autem sunt rem eveniet architecto
1,2,qui est esse,est rerum tempore vitae sequi sint nihil reprehenderit dolor beatae ea dolores neque fugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis qui aperiam non debitis possimus qui neque nisi nulla
1,3,ea molestias quasi exercitationem repellat qui ipsa sit aut,et iusto sed quo iure voluptatem occaecati omnis eligendi aut ad voluptatem doloribus vel accusantium quis pariatur molestiae porro eius odio et labore et velit aut
1,4,eum et est occaecati,ullam et saepe reiciendis voluptatem adipisci sit amet autem assumenda provident rerum culpa quis hic commodi nesciunt rem tenetur doloremque ipsam iure quis sunt voluptatem rerum illo velit
1,5,nesciunt quas odio,repudiandae veniam quaerat sunt sed alias aut fugiat sit autem sed est voluptatem omnis possimus esse voluptatibus quis est aut tenetur dolor neque


### Import from Python dictionaries

Use `create_table` with a list of dictionaries as `source`—useful when you have data in memory:

In [9]:
# Import from a list of dictionaries
events = [
    {
        'event': 'page_view',
        'user_id': 101,
        'timestamp': '2024-01-15T10:30:00',
    },
    {
        'event': 'click',
        'user_id': 101,
        'timestamp': '2024-01-15T10:31:00',
    },
    {
        'event': 'purchase',
        'user_id': 102,
        'timestamp': '2024-01-15T10:32:00',
    },
]

event_table = pxt.create_table('json_demo/events', source=events)

Created table 'events'.





Inserted 3 rows with 0 errors.


In [10]:
# View imported events
event_table.collect()

event,user_id,timestamp
page_view,101,2024-01-15T10:30:00
click,101,2024-01-15T10:31:00
purchase,102,2024-01-15T10:32:00
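
The `timestamp` values in `events` are plain Python strings. If you would rather hand `create_table` real `datetime` objects, one option is to parse them first with the standard library. This is a minimal sketch: `parsed_events` is a name introduced here for illustration, and how Pixeltable types the resulting column is not covered by this recipe.

```python
from datetime import datetime

events = [
    {'event': 'page_view', 'user_id': 101, 'timestamp': '2024-01-15T10:30:00'},
    {'event': 'click', 'user_id': 101, 'timestamp': '2024-01-15T10:31:00'},
    {'event': 'purchase', 'user_id': 102, 'timestamp': '2024-01-15T10:32:00'},
]

# Build new dicts so the original records are left untouched.
# datetime.fromisoformat parses ISO 8601 strings like the ones above.
parsed_events = [
    {**event, 'timestamp': datetime.fromisoformat(event['timestamp'])}
    for event in events
]

for event in parsed_events:
    print(event['event'], event['timestamp'].isoformat())
```

The parsed list has the same shape as `events`, so it could be passed as `source=parsed_events` in the same way.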


### Add computed columns

Once imported, you can enrich the data with computed columns:

In [11]:
# Add a computed column combining title and author
articles.add_computed_column(
    summary=articles.title + ' by ' + articles.author
)

Added 5 column values with 0 errors.


5 rows updated, 10 values computed.

In [12]:
# View with computed column
articles.select(
    articles.title, articles.author, articles.summary
).collect()

title,author,summary
Introduction to ML,Alice,Introduction to ML by Alice
Deep Learning Basics,Bob,Deep Learning Basics by Bob
NLP Fundamentals,Carol,NLP Fundamentals by Carol
Computer Vision,Dave,Computer Vision by Dave
Reinforcement Learning,Eve,Reinforcement Learning by Eve


## Explanation

**JSON format requirements:**

The JSON file must contain an array of objects at the top level:

```json
[
  {"col1": "value1", "col2": 123},
  {"col1": "value2", "col2": 456}
]
```
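
Before importing a file you did not create yourself, it can help to confirm that it has this shape. The check below is a minimal sketch using only the standard library; `check_json_records` is a hypothetical helper, not part of Pixeltable.

```python
import json
import tempfile
from pathlib import Path


def check_json_records(path):
    """Return the records in `path`, or raise ValueError if the shape is wrong.

    Hypothetical helper for illustration; not a Pixeltable function.
    """
    with open(path, encoding='utf-8') as f:
        data = json.load(f)

    # The top level must be a JSON array (a Python list).
    if not isinstance(data, list):
        raise ValueError(
            f'expected a top-level array, got {type(data).__name__}'
        )

    # Every element must be a JSON object (a Python dict).
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(
                f'element {i} is {type(item).__name__}, expected an object'
            )

    return data


# Small demonstration with a temporary file.
demo_path = Path(tempfile.mkdtemp()) / 'demo.json'
demo_path.write_text('[{"col1": "value1", "col2": 123}]', encoding='utf-8')
print(len(check_json_records(demo_path)), 'record(s) look importable')
```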

**Source types supported:**

| Source | Example |
|--------|---------|
| JSON file path | `source='/path/to/data.json'` |
| JSON URL | `source='https://api.example.com/data'` |
| List of dicts | `source=[{'a': 1}, {'a': 2}]` |
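
Because a list of dicts is an accepted source, a response that wraps its records in an envelope object can be unwrapped in Python first. The sketch below assumes a made-up response with the rows under a `results` key; `extract_records` and the payload are hypothetical and only illustrate the idea.

```python
import json


def extract_records(payload, key):
    """Return the list of record dicts stored under `key` in an envelope object.

    Hypothetical helper for illustration; not a Pixeltable function.
    """
    records = payload[key]
    if not isinstance(records, list) or not all(
        isinstance(record, dict) for record in records
    ):
        raise ValueError(f'{key!r} does not hold an array of objects')
    return records


# Made-up API response: the rows sit under 'results', not at the top level.
response_text = (
    '{"count": 2, "results": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]}'
)
records = extract_records(json.loads(response_text), 'results')
print(records)
```

The resulting `records` list has the same form as the `events` list in the Python dictionaries example, so it could be passed as `source=records`.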

**Nested JSON handling:**

Nested objects and arrays are stored as JSON columns; the `tags` lists in the `articles` table above are one example. You can access nested fields using Pixeltable's JSON path syntax in computed columns.
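
If you would rather have flat scalar columns than a JSON column, an alternative is to flatten nested objects in Python before passing the records as a list of dicts. This is a minimal sketch of that preprocessing step, not something Pixeltable does for you; `flatten_record` and the sample record are hypothetical.

```python
def flatten_record(record, parent_key='', sep='_'):
    """Flatten nested dicts into one level; lists and scalars are kept as-is.

    Hypothetical helper for illustration; not a Pixeltable function.
    Keys are joined with `sep`, so {'author': {'name': ...}} becomes
    'author_name'. Colliding keys overwrite each other.
    """
    flat = {}
    for key, value in record.items():
        full_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up record with two levels of nesting and one list.
nested = {
    'id': 1,
    'author': {'name': 'Alice', 'org': {'city': 'Paris'}},
    'tags': ['ml', 'intro'],
}
print(flatten_record(nested))
```

Lists such as `tags` are left untouched here, so they would still end up in a JSON column.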

## See also

- [Import CSV files](https://docs.pixeltable.com/howto/cookbooks/data/data-import-csv) - For CSV and Excel imports
- [Import Parquet files](https://docs.pixeltable.com/howto/cookbooks/data/data-import-parquet) - For Parquet data
- [Extract fields from JSON](https://docs.pixeltable.com/howto/cookbooks/core/workflow-json-extraction) - Parse LLM response fields