# TikTok Scraper - Testing Notebook

Test the TikTok scraper implementation:
1. **TikTokScraper** - URL-based extraction (profiles, posts, comments, fast APIs)
2. **TikTokSearchScraper** - Parameter-based discovery with `extra_params`

---

## Setup - Use Local Development Version

In [1]:
import os
import sys
from pathlib import Path

# Add local src to path (use development version, not installed)
project_root = Path.cwd().parent
src_path = project_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

print(f"Using source from: {src_path}")

# Load environment variables
from dotenv import load_dotenv
load_dotenv(project_root / ".env")

# Get API token
API_TOKEN = os.getenv("BRIGHTDATA_API_TOKEN")
if not API_TOKEN:
    raise ValueError("BRIGHTDATA_API_TOKEN not found in environment")

print(f"API Token: {API_TOKEN[:10]}...{API_TOKEN[-4:]}")
print("Setup complete!")

Using source from: /Users/ns/Desktop/projects/sdk-python/src
API Token: 7011787d-2...3336
Setup complete!


## Import TikTok Scrapers

In [2]:
from brightdata import BrightDataClient

# Verify we're using local version
import brightdata
print(f"brightdata module location: {brightdata.__file__}")

# Initialize client
client = BrightDataClient(token=API_TOKEN)

# Verify TikTok scraper is accessible
print(f"\nTikTokScraper: {type(client.scrape.tiktok).__name__}")
print(f"TikTokSearchScraper: {type(client.search.tiktok).__name__}")

# Check for scraper methods
print("\nScraper methods (URL-based):")
print([m for m in dir(client.scrape.tiktok) if not m.startswith('_') and callable(getattr(client.scrape.tiktok, m))])

print("\nSearch scraper methods (Discovery):")
print([m for m in dir(client.search.tiktok) if not m.startswith('_') and callable(getattr(client.search.tiktok, m))])

brightdata module location: /Users/ns/Desktop/projects/sdk-python/src/brightdata/__init__.py

TikTokScraper: TikTokScraper
TikTokSearchScraper: TikTokSearchScraper

Scraper methods (URL-based):
['comments', 'comments_fetch', 'comments_fetch_sync', 'comments_status', 'comments_status_sync', 'comments_sync', 'comments_trigger', 'comments_trigger_sync', 'normalize_result', 'posts', 'posts_by_profile_fast', 'posts_by_profile_fast_sync', 'posts_by_search_url_fast', 'posts_by_search_url_fast_sync', 'posts_by_url_fast', 'posts_by_url_fast_sync', 'posts_fetch', 'posts_fetch_sync', 'posts_status', 'posts_status_sync', 'posts_sync', 'posts_trigger', 'posts_trigger_sync', 'profiles', 'profiles_fetch', 'profiles_fetch_sync', 'profiles_status', 'profiles_status_sync', 'profiles_sync', 'profiles_trigger', 'profiles_trigger_sync', 'scrape', 'scrape_async']

Search scraper methods (Discovery):
['posts_by_keyword', 'posts_by_keyword_sync', 'posts_by_profile', 'posts_by_profile_sync', 'posts_by_url', 'p

---
# Part 1: TikTokScraper (URL-based Extraction)

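Test the URL-based extraction methods. They are coroutines, so each cell calls them with top-level `await`, which Jupyter supports; the `*_sync` variants listed above appear to be the blocking equivalents for code that cannot `await`.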
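
Outside a notebook there is no running event loop to `await` on, so a script has to start one itself. A minimal sketch of that difference, using a hypothetical `fake_profiles` coroutine in place of `client.scrape.tiktok.profiles` (no SDK call is made, and the return shape is an assumption):

```python
import asyncio


async def fake_profiles(url: str, timeout: int = 240) -> dict:
    """Hypothetical stand-in for an async scraper call; not the SDK's API."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return {"success": True, "url": url, "timeout": timeout}


def run_in_script(url: str) -> dict:
    # A plain script has no running loop, so asyncio.run drives the coroutine.
    # Inside Jupyter, write `await fake_profiles(url)` instead.
    return asyncio.run(fake_profiles(url))


result_demo = run_in_script("https://www.tiktok.com/@tiktok")
print(result_demo["success"])
```

Calling `asyncio.run` from inside a notebook cell raises a `RuntimeError` because a loop is already running there, which is why the cells in this notebook use `await` directly.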

## 1.1 Profiles - Extract profile by URL

In [None]:
# Test profile extraction by URL
PROFILE_URL = "https://www.tiktok.com/@tiktok"

print(f"Scraping profile: {PROFILE_URL}")
print("This may take 1-3 minutes...\n")

async with client.scrape.tiktok.engine:
    result = await client.scrape.tiktok.profiles(url=PROFILE_URL, timeout=240)

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Profile Data ---")
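    data = result.data
    # Guard: unwrap a list result so the .get() calls below do not fail
    if isinstance(data, list) and data:
        data = data[0]
    print(f"Available keys: {list(data.keys()) if isinstance(data, dict) else 'N/A'}")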
    print(f"\nAccount ID: {data.get('account_id', 'N/A')}")
    print(f"Nickname: {data.get('nickname', 'N/A')}")
    print(f"Followers: {data.get('followers', 'N/A')}")
    print(f"Following: {data.get('following', 'N/A')}")
    print(f"Likes: {data.get('likes', 'N/A')}")
    print(f"Videos: {data.get('videos_count', 'N/A')}")
    print(f"Verified: {data.get('is_verified', 'N/A')}")
    bio = str(data.get('biography', data.get('signature', 'N/A')) or 'N/A')
    print(f"Bio: {bio[:100]}...")
else:
    print(f"\nError: {result.error}")
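
Every test cell in this notebook prints the same four-line status header. One way to factor it out is sketched below. `DemoResult` is a hypothetical stand-in that mirrors only the attributes the cells read, and `summarize` is an illustrative helper, not part of the SDK. It also checks `cost is not None`, so a cost of `0.0` is not reported as `N/A`.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DemoResult:
    """Hypothetical stand-in exposing only the attributes the cells read."""
    success: bool
    status: str
    snapshot_id: Optional[str] = None
    cost: Optional[float] = None
    data: Any = None
    error: Optional[str] = None


def summarize(result) -> str:
    """Build the four-line status header printed by each test cell."""
    # `is not None` keeps a legitimate cost of 0.0 from being shown as N/A.
    cost = f"${result.cost:.4f}" if result.cost is not None else "N/A"
    return "\n".join([
        f"Success: {result.success}",
        f"Status: {result.status}",
        f"Snapshot ID: {result.snapshot_id}",
        f"Cost: {cost}",
    ])


demo = DemoResult(success=True, status="ready", snapshot_id="sd_demo", cost=0.0)
print(summarize(demo))
```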

In [None]:
# Export raw data to JSON file for inspection
import json
from pathlib import Path

output_file = Path.cwd() / "tiktok_profile_result.json"

export_data = {
    "success": result.success,
    "status": result.status,
    "snapshot_id": result.snapshot_id,
    "cost": result.cost,
    "row_count": result.row_count,
    "data": result.data,
    "error": result.error,
}

with open(output_file, "w", encoding="utf-8") as f:
    json.dump(export_data, f, indent=2, default=str)

print(f"Exported to: {output_file}")
print(f"\nData type: {type(result.data)}")
print(f"Data preview: {str(result.data)[:500]}...")

## 1.2 Posts - Extract post by URL

In [None]:
# Test post extraction by URL
# Use a popular video URL (you may need to update this with a current video)
POST_URL = "https://www.tiktok.com/@bilstedim/video/7593754673371221269"

print(f"Scraping post: {POST_URL}")
print("This may take up to 11 minutes...\n")

async with client.scrape.tiktok.engine:
    result = await client.scrape.tiktok.posts(url=POST_URL, timeout=660)

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Post Data ---")
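    data = result.data
    # Guard: unwrap a list result so the .get() calls below do not fail
    if isinstance(data, list) and data:
        data = data[0]
    print(f"Available keys: {list(data.keys()) if isinstance(data, dict) else 'N/A'}")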
    print(f"\nPost ID: {data.get('post_id', 'N/A')}")
    print(f"Author: {data.get('profile_username', 'N/A')}")
    description = str(data.get('description', 'N/A') or 'N/A')
    print(f"Description: {description[:100]}...")
    print(f"Likes: {data.get('digg_count', 'N/A')}")
    print(f"Comments: {data.get('comment_count', 'N/A')}")
    print(f"Shares: {data.get('share_count', 'N/A')}")
    print(f"Views: {data.get('play_count', 'N/A')}")
    print(f"Duration: {data.get('video_duration', 'N/A')}")
else:
    print(f"\nError: {result.error}")

## 1.3 Comments - Extract comments by video URL

In [None]:
# Test comments extraction by video URL
VIDEO_URL = "https://www.tiktok.com/@bilstedim/video/7593754673371221269"

print(f"Scraping comments from: {VIDEO_URL}")
print("This may take up to 11 minutes...\n")

async with client.scrape.tiktok.engine:
    result = await client.scrape.tiktok.comments(
        url=VIDEO_URL,
        timeout=660
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Comments Data ---")
    data = result.data
    if isinstance(data, list):
        print(f"Number of comments: {len(data)}")
        if len(data) > 0:
            print(f"Available keys: {list(data[0].keys())}")
        for i, comment in enumerate(data[:5]):
            print(f"\nComment {i+1}:")
            print(f"  User: {comment.get('commenter_user_name', 'N/A')}")
            text = str(comment.get('comment_text', 'N/A') or 'N/A')
            print(f"  Text: {text[:80]}...")
            print(f"  Likes: {comment.get('num_likes', 'N/A')}")
            print(f"  Replies: {comment.get('num_replies', 'N/A')}")
    elif isinstance(data, dict):
        print(f"Available keys: {list(data.keys())}")
        print(f"Data: {data}")
    else:
        print(f"Data type: {type(data)}")
else:
    print("\nNo data returned. Debug info:")
    print(f"  result.data: {result.data}")
    print(f"  result.row_count: {result.row_count}")
    print(f"  result.error: {result.error}")

---
# Part 2: TikTokScraper - Fast API Methods

Test the fast API methods for quicker responses.

## 2.1 Posts by Profile (Fast API)

In [None]:
# Test fast API for posts from profile
PROFILE_URL = "https://www.tiktok.com/@bbc"

print(f"Fast API - Getting posts from profile: {PROFILE_URL}")
print("This should be faster than regular API...\n")

async with client.scrape.tiktok.engine:
    result = await client.scrape.tiktok.posts_by_profile_fast(
        url=PROFILE_URL,
        timeout=660
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Posts from Profile (Fast) ---")
    data = result.data
    if isinstance(data, list):
        print(f"Number of posts: {len(data)}")
        if len(data) > 0:
            print(f"Available keys: {list(data[0].keys()) if isinstance(data[0], dict) else 'N/A'}")
        for i, post in enumerate(data[:3]):
            print(f"\nPost {i+1}:")
            print(f"  URL: {post.get('url', 'N/A')}")
            print(f"  Views: {post.get('play_count', 'N/A')}")
            print(f"  Likes: {post.get('digg_count', 'N/A')}")
    else:
        print(f"Data type: {type(data)}")
else:
    print(f"\nError: {result.error}")

## 2.2 Posts by Search URL (Fast API)

In [None]:
# Test fast API for posts from search URL
SEARCH_URL = "https://www.tiktok.com/search?q=cooking"

print(f"Fast API - Getting posts from search: {SEARCH_URL}")
print("Requesting 10 posts...\n")

async with client.scrape.tiktok.engine:
    result = await client.scrape.tiktok.posts_by_search_url_fast(
        url=SEARCH_URL,
        num_of_posts=10,
        timeout=660
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Posts from Search (Fast) ---")
    data = result.data
    if isinstance(data, list):
        print(f"Number of posts: {len(data)}")
        for i, post in enumerate(data[:3]):
            print(f"\nPost {i+1}:")
            print(f"  URL: {post.get('url', 'N/A')}")
            description = str(post.get('description', 'N/A') or 'N/A')
            print(f"  Description: {description[:60]}...")
            print(f"  Views: {post.get('play_count', 'N/A')}")
    else:
        print(f"Data type: {type(data)}")
else:
    print(f"\nError: {result.error}")

## 2.3 Posts by URL (Fast API)

In [5]:
# Test fast API for posts by URL (discover/channel/explore pages - NOT individual videos)
# This endpoint is for: discover, channel, music, explore pages
DISCOVER_URL = "https://www.tiktok.com/discover/cooking"

print(f"Fast API - Getting posts from discover page: {DISCOVER_URL}")
print("Note: This endpoint is for discover/channel/explore pages, not individual videos")
print("For individual videos, use posts() method instead.\n")

async with client.scrape.tiktok.engine:
    result = await client.scrape.tiktok.posts_by_url_fast(
        url=DISCOVER_URL,
        timeout=660
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Posts from Discover Page (Fast) ---")
    data = result.data
    
    # Check if this is an error record
    if isinstance(data, dict) and 'error' in data:
        print(f"API Error: {data.get('error')}")
        print(f"Error Code: {data.get('error_code')}")
    elif isinstance(data, list):
        print(f"Number of posts: {len(data)}")
        if len(data) > 0:
            print(f"Available keys: {list(data[0].keys()) if isinstance(data[0], dict) else 'N/A'}")
        for i, post in enumerate(data[:3]):
            if 'error' in post:
                continue  # Skip error records
            print(f"\nPost {i+1}:")
            print(f"  URL: {post.get('url', 'N/A')}")
            print(f"  Views: {post.get('play_count', 'N/A')}")
            print(f"  Likes: {post.get('digg_count', 'N/A')}")
    else:
        print(f"Available keys: {list(data.keys()) if isinstance(data, dict) else 'N/A'}")
else:
    print(f"\nError: {result.error}")

Fast API - Getting posts from discover page: https://www.tiktok.com/discover/cooking
Note: This endpoint is for discover/channel/explore pages, not individual videos
For individual videos, use posts() method instead.

Success: True
Status: ready
Snapshot ID: sd_mkut4bx012wfebxjf1
Cost: $0.3380

--- Posts from Discover Page (Fast) ---
Number of posts: 169
Available keys: ['url', 'post_id', 'description', 'create_time', 'digg_count', 'share_count', 'collect_count', 'comment_count', 'play_count', 'video_duration', 'hashtags', 'original_sound', 'profile_id', 'profile_username', 'profile_url', 'profile_avatar', 'profile_biography', 'preview_image', 'post_type', 'offical_item', 'secu_id', 'original_item', 'shortcode', 'width', 'ratio', 'video_url', 'music', 'cdn_url', 'is_verified', 'account_id', 'carousel_images', 'tagged_user', 'profile_followers', 'tt_chain_token', 'timestamp', 'input']

Post 1:
  URL: https://www.tiktok.com/@theeunicornriah/video/7571843974005116215
  Views: 89200
  Like
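
The cell above skips error records inside the display loop, so fewer than three posts may print and the numbering can have gaps. A sketch of filtering first and then ranking by `play_count`, one of the keys listed in the output. `top_posts` is a hypothetical helper and the sample records are made up for illustration, not scraper output.

```python
from typing import Any, Dict, List


def top_posts(records: List[Dict[str, Any]], n: int = 3) -> List[Dict[str, Any]]:
    """Drop error records, then return the n posts with the highest play_count."""
    valid = [r for r in records if isinstance(r, dict) and "error" not in r]
    # A missing or None play_count sorts as 0.
    return sorted(valid, key=lambda r: r.get("play_count") or 0, reverse=True)[:n]


# Illustrative records only; the values are invented.
sample = [
    {"url": "https://example.com/a", "play_count": 120, "digg_count": 10},
    {"error": "blocked", "error_code": "demo"},
    {"url": "https://example.com/b", "play_count": 900, "digg_count": 75},
    {"url": "https://example.com/c", "play_count": None, "digg_count": 1},
]

for post in top_posts(sample, n=2):
    print(post["url"], post["play_count"])
```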

---
# Part 3: TikTokSearchScraper (Discovery with extra_params)

Test the parameter-based discovery methods. Each one uses `extra_params` to select the discovery mode:
- `type=discover_new`
- `discover_by` set to one of `search_url`, `keyword`, `profile`, or `url`
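
As a rough illustration of the parameter pair described above, the sketch below builds the dictionary for a given mode and rejects unknown ones. `build_extra_params` is a hypothetical helper written for this notebook; how the SDK assembles these parameters internally is not shown here.

```python
from typing import Dict

# Modes taken from the list above; the helper itself is hypothetical.
DISCOVER_BY_VALUES = {"search_url", "keyword", "profile", "url"}


def build_extra_params(discover_by: str) -> Dict[str, str]:
    """Return the extra_params pair for one discovery mode."""
    if discover_by not in DISCOVER_BY_VALUES:
        raise ValueError(f"unsupported discover_by: {discover_by!r}")
    return {"type": "discover_new", "discover_by": discover_by}


print(build_extra_params("keyword"))
```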

## 3.1 Profiles Discovery - by Search URL

In [7]:
# Test profiles discovery by search URL
# Uses: extra_params={"type": "discover_new", "discover_by": "search_url"}
SEARCH_URL = "https://www.tiktok.com/search?q=music"

print(f"Discovering profiles from search: {SEARCH_URL}")
print("Using extra_params: type=discover_new, discover_by=search_url")
print("This may take up to 11 minutes...\n")

async with client.search.tiktok.engine:
    result = await client.search.tiktok.profiles(
        search_url=SEARCH_URL,
        country="US",
        timeout=660
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Discovered Profiles ---")
    data = result.data
    if isinstance(data, list):
        # Filter out error records
        valid_profiles = [p for p in data if 'error' not in p]
        print(f"Number of profiles discovered: {len(valid_profiles)}")
        if len(valid_profiles) > 0:
            print(f"Available keys: {list(valid_profiles[0].keys()) if isinstance(valid_profiles[0], dict) else 'N/A'}")
        for i, profile in enumerate(valid_profiles[:5]):
            print(f"\nProfile {i+1}:")
            print(f"  Account ID: {profile.get('account_id', 'N/A')}")
            print(f"  Nickname: {profile.get('nickname', 'N/A')}")
            print(f"  Followers: {profile.get('followers', 'N/A')}")
            print(f"  Verified: {profile.get('is_verified', 'N/A')}")
    elif isinstance(data, dict):
        print(f"Available keys: {list(data.keys())}")
    else:
        print(f"Data type: {type(data)}")
else:
    print(f"\nError: {result.error}")

Discovering profiles from search: https://www.tiktok.com/search?q=music
Using extra_params: type=discover_new, discover_by=search_url
This may take up to 11 minutes...

Success: False
Status: timeout
Snapshot ID: sd_mkuu4owy2p1whvc4db
Cost: N/A

Error: Polling timeout after 660s
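
The run above ended with `Status: timeout` after 660 s of polling. The scraper exposes separate `*_trigger`, `*_status`, and `*_fetch` methods, but their signatures are not shown in this notebook, so the sketch below only illustrates the general pattern of polling against a deadline with exponential backoff. `get_status` is a hypothetical callable, and `"running"` is an assumed intermediate status; only `"ready"` and `"timeout"` appear in the recorded outputs.

```python
import time
from typing import Callable


def poll_until_ready(
    get_status: Callable[[], str],
    timeout: float,
    interval: float = 1.0,
    max_interval: float = 30.0,
) -> str:
    """Poll get_status() until it returns 'ready' or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == "ready":
            return status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "timeout"
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval)  # exponential backoff


# Simulated snapshot that becomes ready on the third check.
_states = iter(["running", "running", "ready"])
final_status = poll_until_ready(lambda: next(_states), timeout=5, interval=0.01)
print(final_status)
```

Keeping the snapshot ID from a timed-out run is worthwhile, since the job may still finish on the server after the client stops polling.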


## 3.2 Posts Discovery - by Keyword

In [None]:
# Test posts discovery by keyword
# Uses: extra_params={"type": "discover_new", "discover_by": "keyword"}
KEYWORD = "#dance"

print(f"Discovering posts for keyword: {KEYWORD}")
print("Using extra_params: type=discover_new, discover_by=keyword")
print("Requesting 10 posts...")
print("This may take 1-3 minutes...\n")

async with client.search.tiktok.engine:
    result = await client.search.tiktok.posts_by_keyword(
        keyword=KEYWORD,
        num_of_posts=10,
        timeout=240
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Discovered Posts ---")
    data = result.data
    if isinstance(data, list):
        print(f"Number of posts discovered: {len(data)}")
        if len(data) > 0:
            print(f"Available keys: {list(data[0].keys()) if isinstance(data[0], dict) else 'N/A'}")
        for i, post in enumerate(data[:5]):
            print(f"\nPost {i+1}:")
            print(f"  URL: {post.get('url', 'N/A')}")
            print(f"  Author: {post.get('author', post.get('name', 'N/A'))}")
            description = str(post.get('description', post.get('title', 'N/A')) or 'N/A')
            print(f"  Description: {description[:60]}...")
            print(f"  Likes: {post.get('likes', post.get('digg_count', 'N/A'))}")
            print(f"  Views: {post.get('views', post.get('play_count', 'N/A'))}")
    else:
        print(f"Data type: {type(data)}")
else:
    print(f"\nError: {result.error}")

## 3.3 Posts Discovery - by Profile

In [None]:
# Test posts discovery from profile
# Uses: extra_params={"type": "discover_new", "discover_by": "profile"}
PROFILE_URL = "https://www.tiktok.com/@nasa"

print(f"Discovering posts from profile: {PROFILE_URL}")
print("Using extra_params: type=discover_new, discover_by=profile")
print("Requesting 10 posts...")
print("This may take 1-3 minutes...\n")

async with client.search.tiktok.engine:
    result = await client.search.tiktok.posts_by_profile(
        url=PROFILE_URL,
        num_of_posts=10,
        timeout=240
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Discovered Posts from Profile ---")
    data = result.data
    if isinstance(data, list):
        print(f"Number of posts discovered: {len(data)}")
        if len(data) > 0:
            print(f"Available keys: {list(data[0].keys()) if isinstance(data[0], dict) else 'N/A'}")
        for i, post in enumerate(data[:5]):
            print(f"\nPost {i+1}:")
            print(f"  URL: {post.get('url', 'N/A')}")
            description = str(post.get('description', post.get('title', 'N/A')) or 'N/A')
            print(f"  Description: {description[:60]}...")
            print(f"  Likes: {post.get('likes', post.get('digg_count', 'N/A'))}")
            print(f"  Views: {post.get('views', post.get('play_count', 'N/A'))}")
    else:
        print(f"Data type: {type(data)}")
else:
    print(f"\nError: {result.error}")

## 3.4 Posts Discovery - by URL (Multiple)

In [None]:
# Test posts discovery by multiple URLs
# Uses: extra_params={"type": "discover_new", "discover_by": "url"}
# NOTE: the video IDs below look like placeholders; swap in real video URLs before running
POST_URLS = [
    "https://www.tiktok.com/@tiktok/video/7456789012345678901",
    "https://www.tiktok.com/@nasa/video/7456789012345678902"
]

print("Discovering posts by URLs:")
for url in POST_URLS:
    print(f"  - {url}")
print("Using extra_params: type=discover_new, discover_by=url")
print("This may take 1-3 minutes...\n")

async with client.search.tiktok.engine:
    result = await client.search.tiktok.posts_by_url(
        url=POST_URLS,
        timeout=240
    )

print(f"Success: {result.success}")
print(f"Status: {result.status}")
print(f"Snapshot ID: {result.snapshot_id}")
print(f"Cost: ${result.cost:.4f}" if result.cost else "Cost: N/A")

if result.success and result.data:
    print("\n--- Discovered Posts by URL ---")
    data = result.data
    if isinstance(data, list):
        print(f"Number of posts: {len(data)}")
        for i, post in enumerate(data):
            print(f"\nPost {i+1}:")
            print(f"  URL: {post.get('url', 'N/A')}")
            print(f"  Views: {post.get('views', post.get('play_count', 'N/A'))}")
            print(f"  Likes: {post.get('likes', post.get('digg_count', 'N/A'))}")
    else:
        print(f"Data type: {type(data)}")
else:
    print(f"\nError: {result.error}")

---
# Part 4: Verify Timing Metadata

In [None]:
# Check timing metadata from last result
print("=== Timing Metadata ===")
print(f"trigger_sent_at: {result.trigger_sent_at}")
print(f"snapshot_id_received_at: {result.snapshot_id_received_at}")
print(f"snapshot_polled_at: {result.snapshot_polled_at}")
print(f"data_fetched_at: {result.data_fetched_at}")
print(f"\nrow_count: {result.row_count}")
print(f"cost: {result.cost}")
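
The four timestamps printed above can be turned into per-stage durations. The sketch assumes they are `datetime` objects (or `None` when a stage never happened), which this notebook does not confirm. `stage_durations` is a hypothetical helper and the timestamps in the demo are made up.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def stage_durations(
    trigger_sent_at: Optional[datetime],
    snapshot_id_received_at: Optional[datetime],
    snapshot_polled_at: Optional[datetime],
    data_fetched_at: Optional[datetime],
) -> Dict[str, Optional[float]]:
    """Seconds spent in each stage; None when either endpoint is missing."""
    def delta(start: Optional[datetime], end: Optional[datetime]) -> Optional[float]:
        if start is None or end is None:
            return None
        return (end - start).total_seconds()

    return {
        "trigger": delta(trigger_sent_at, snapshot_id_received_at),
        "polling": delta(snapshot_id_received_at, snapshot_polled_at),
        "fetch": delta(snapshot_polled_at, data_fetched_at),
        "total": delta(trigger_sent_at, data_fetched_at),
    }


# Made-up timestamps for illustration; the fetch stage is left missing.
t0 = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
durations = stage_durations(
    t0,
    t0 + timedelta(seconds=2),
    t0 + timedelta(seconds=62),
    None,
)
print(durations)
```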

---
# Summary

## TikTokScraper (URL-based)
- `profiles(url)` - Extract profile data by URL
- `posts(url)` - Extract post/video data by URL
- `comments(url)` - Extract comments from video URL

### Fast API Methods (Quicker responses)
- `posts_by_profile_fast(url)` - Get posts from profile (fast)
- `posts_by_search_url_fast(url, num_of_posts)` - Get posts from a search URL (fast)
- `posts_by_url_fast(url)` - Get posts from discover, channel, music, or explore pages (fast); not for individual videos

## TikTokSearchScraper (Discovery with extra_params)
- `profiles(search_url)` - Discover profiles (`discover_by=search_url`)
- `posts_by_keyword(keyword)` - Discover by keyword (`discover_by=keyword`)
- `posts_by_profile(url)` - Discover from profile (`discover_by=profile`)
- `posts_by_url(url)` - Discover by URL(s) (`discover_by=url`)