Interface: CrawleeOneArgs<TType, T>

Args object passed to crawleeOne

Type parameters

| Name | Type |
| :--- | :--- |
| `TType` | extends `CrawlerType` |
| `T` | extends `CrawleeOneCtx<CrawlerMeta<TType>["context"]>` |

Properties

crawlerConfig

Optional crawlerConfig: Omit<CrawlerMeta<TType, CrawlingContext<unknown, Dictionary>, Record<string, any>>["options"], "requestHandler">

Crawlee crawler configuration that CANNOT be overridden via input or crawlerConfigDefaults.

Defined in

src/api.ts:25


crawlerConfigDefaults

Optional crawlerConfigDefaults: Omit<CrawlerMeta<TType, CrawlingContext<unknown, Dictionary>, Record<string, any>>["options"], "requestHandler">

Crawlee crawler configuration that CAN be overridden via input or crawlerConfig.
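
The two descriptions above imply a three-level precedence: crawlerConfigDefaults is the weakest source, values derived from the input sit in the middle, and crawlerConfig always wins. The following is a minimal sketch of that ordering, not the library's implementation. `resolveCrawlerConfig` and the `CrawlerOptions` alias are hypothetical names introduced for illustration; the option keys are ordinary Crawlee crawler options.

```typescript
// Hypothetical stand-in for the crawler options type (not the library's own type).
type CrawlerOptions = Record<string, unknown>;

// Assumed precedence, lowest to highest:
//   crawlerConfigDefaults < options derived from input < crawlerConfig
function resolveCrawlerConfig(
  crawlerConfigDefaults: CrawlerOptions = {},
  fromInput: CrawlerOptions = {},
  crawlerConfig: CrawlerOptions = {},
): CrawlerOptions {
  // Later spreads overwrite earlier ones, so crawlerConfig has the final say.
  return { ...crawlerConfigDefaults, ...fromInput, ...crawlerConfig };
}

const resolved = resolveCrawlerConfig(
  { maxRequestRetries: 3, maxConcurrency: 5 },       // crawlerConfigDefaults
  { maxConcurrency: 10, maxRequestsPerMinute: 120 }, // derived from input
  { maxRequestRetries: 1 },                          // crawlerConfig
);
// resolved: { maxRequestRetries: 1, maxConcurrency: 10, maxRequestsPerMinute: 120 }
```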

Defined in

src/api.ts:27


hooks

Optional hooks: Object

Type declaration

| Name | Type |
| :--- | :--- |
| `onAfterHandler?` | `CrawleeOneRouteHandler<T, CrawleeOneActorRouterCtx<T>>` |
| `onBeforeHandler?` | `CrawleeOneRouteHandler<T, CrawleeOneActorRouterCtx<T>>` |
| `onReady?` | `(actor: CrawleeOneActorInst<T>) => MaybePromise<void>` |
| `validateInput?` | `(input: null \| AllActorInputs) => MaybePromise<void>` |
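
The hook names suggest that onBeforeHandler and onAfterHandler run around each route handler. The source does not spell out the exact call order, so the sketch below is an assumption. `wrapHandler`, `Handler`, and `HandlerHooks` are hypothetical names, and the context is reduced to a plain object instead of the real router context.

```typescript
type MaybePromise<T> = T | Promise<T>;
// Simplified handler type; the real hooks use CrawleeOneRouteHandler.
type Handler<Ctx> = (ctx: Ctx) => MaybePromise<void>;

interface HandlerHooks<Ctx> {
  onBeforeHandler?: Handler<Ctx>;
  onAfterHandler?: Handler<Ctx>;
}

// Wrap a route handler so the optional hooks run before and after it.
function wrapHandler<Ctx>(
  handler: Handler<Ctx>,
  hooks: HandlerHooks<Ctx> = {},
): (ctx: Ctx) => Promise<void> {
  return async (ctx) => {
    await hooks.onBeforeHandler?.(ctx);
    await handler(ctx);
    await hooks.onAfterHandler?.(ctx);
  };
}

// Usage: record the order in which the pieces run.
const calls: string[] = [];
const wrapped = wrapHandler<{ url: string }>(
  async (ctx) => {
    calls.push(`handler:${ctx.url}`);
  },
  {
    onBeforeHandler: () => {
      calls.push('before');
    },
    onAfterHandler: () => {
      calls.push('after');
    },
  },
);
```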

Defined in

src/api.ts:115


input

Optional input: Partial<AllActorInputs>

Input configuration that CANNOT be overridden via inputDefaults or io.getInput().

Defined in

src/api.ts:67


inputDefaults

Optional inputDefaults: Partial<AllActorInputs>

Input configuration that CAN be overridden via input or io.getInput().

Defined in

src/api.ts:69


io

Optional io: T["io"]

Provide an instance that is responsible for state management:

  • Adding scraped data to datasets
  • Adding and removing requests to/from queues
  • Cache storage

This is an API based on Apify's Actor utility class, which is also the default.

In most cases, you don't need to override this.

By default, the data is saved and kept locally in the ./storage directory. If the crawler runs on Apify's platform, it uses Apify's cloud storage instead.

See CrawleeOneIO

Defined in

src/api.ts:101


mergeInput

Optional mergeInput: boolean | (sources: { defaults: Partial<AllActorInputs> ; env: Partial<AllActorInputs> ; overrides: Partial<AllActorInputs> }) => MaybePromise<Partial<AllActorInputs>>

If mergeInput is truthy, the input settings from inputDefaults, io.getInput(), and input are merged, with later sources overriding earlier ones:

{ ...inputDefaults, ...io.getInput(), ...input }

If mergeInput is falsy, io.getInput() is ignored when input is provided, so the resulting input is either:

{ ...inputDefaults, ...io.getInput() } // If `input` is not defined

OR

{ ...inputDefaults, ...input } // If `input` is defined

Alternatively, you can supply your own function that merges the sources:

{
  // `mergeInput` can also be async
  mergeInput: ({ defaults, overrides, env }) => {
    // This is the same as `mergeInput: true`
    return { ...defaults, ...env, ...overrides };
  },
}
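
The three modes described above can be combined into one resolution routine. This is a sketch under the stated rules, not the code from src/api.ts. `resolveInput` and its `envInput` parameter (standing in for the result of io.getInput()) are hypothetical names, and the `maxItems` and `debug` keys are made-up input fields.

```typescript
type Input = Record<string, unknown>;
type MaybePromise<T> = T | Promise<T>;

interface InputSources {
  defaults: Input;
  env: Input;
  overrides: Input;
}
type MergeInput = boolean | ((sources: InputSources) => MaybePromise<Input>);

async function resolveInput(opts: {
  inputDefaults?: Input;
  input?: Input;
  envInput?: Input | null; // hypothetical: the value returned by io.getInput()
  mergeInput?: MergeInput;
}): Promise<Input> {
  const defaults = opts.inputDefaults ?? {};
  const env = opts.envInput ?? {};
  const { input, mergeInput } = opts;

  // Custom merge function: hand over all three sources.
  if (typeof mergeInput === 'function') {
    return mergeInput({ defaults, env, overrides: input ?? {} });
  }
  // Truthy: merge everything, with `input` taking precedence.
  if (mergeInput) {
    return { ...defaults, ...env, ...(input ?? {}) };
  }
  // Falsy: `input`, when defined, replaces io.getInput() entirely.
  return input ? { ...defaults, ...input } : { ...defaults, ...env };
}

const sources = {
  inputDefaults: { maxItems: 10, debug: false },
  envInput: { maxItems: 50 },
  input: { debug: true },
};
// Resolves to { maxItems: 50, debug: true }
const merged = resolveInput({ ...sources, mergeInput: true });
// Resolves to { maxItems: 10, debug: true }
const unmerged = resolveInput({ ...sources, mergeInput: false });
```

Note that with mergeInput disabled, the `maxItems: 50` value from io.getInput() is dropped as soon as any input object is passed, even though that input does not set `maxItems` itself.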

Defined in

src/api.ts:61


name

Optional name: string

Unique name of the crawler instance. The name may be used in codegen and logging.

Defined in

src/api.ts:22


proxy

Optional proxy: MaybeAsyncFn<ProxyConfiguration, [CrawleeOneActorDefWithInput<T>]>

Configure the Crawlee proxy.

See ProxyConfiguration
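
The MaybeAsyncFn type in the signature suggests that proxy accepts either a ready-made value or a (possibly async) factory that receives the actor definition. The definition of MaybeAsyncFn is not shown on this page, so the shape below is an assumption. `resolveMaybeAsyncFn`, `ProxyLike`, and `ActorDefLike` are hypothetical names, and the proxy URL is a placeholder.

```typescript
type MaybePromise<T> = T | Promise<T>;
// Assumed shape of MaybeAsyncFn: a value, or a sync/async function producing it.
type MaybeAsyncFn<R, Args extends unknown[]> = R | ((...args: Args) => MaybePromise<R>);

async function resolveMaybeAsyncFn<R, Args extends unknown[]>(
  valueOrFn: MaybeAsyncFn<R, Args>,
  ...args: Args
): Promise<R> {
  // Caveat of this sketch: a value that is itself callable would be
  // mistaken for a factory and invoked.
  if (typeof valueOrFn === 'function') {
    return (valueOrFn as (...a: Args) => MaybePromise<R>)(...args);
  }
  return valueOrFn as R;
}

// Hypothetical stand-ins for ProxyConfiguration and the actor definition.
interface ProxyLike {
  urls: string[];
}
interface ActorDefLike {
  input: { useProxy?: boolean };
}

// Option 1: pass the value directly.
const fromValue = resolveMaybeAsyncFn<ProxyLike, [ActorDefLike]>(
  { urls: ['http://proxy.example:8000'] },
  { input: {} },
);

// Option 2: pass an async factory that inspects the actor definition.
const fromFactory = resolveMaybeAsyncFn<ProxyLike, [ActorDefLike]>(
  async (def) => ({ urls: def.input.useProxy ? ['http://proxy.example:8000'] : [] }),
  { input: { useProxy: true } },
);
```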

Defined in

src/api.ts:77


router

Optional router: MaybeAsyncFn<RouterHandler<T["context"]>, [CrawleeOneActorDefWithInput<T>]>

Provide a custom router instance.

By default, router is created as:

import { Router } from 'crawlee';
const router = Router.create();

See Router

Defined in

src/api.ts:113


routes

routes: Record<T["labels"], CrawleeOneRoute<T, CrawleeOneActorRouterCtx<T>>>

Defined in

src/api.ts:121


telemetry

Optional telemetry: MaybeAsyncFn<T["telemetry"], [CrawleeOneActorDefWithInput<T>]>

Provide a telemetry instance that is used for tracking errors.

See CrawleeOneTelemetry

Defined in

src/api.ts:83


type

type: "basic" | "http" | "cheerio" | "jsdom" | "playwright" | "puppeteer"

Type specifying the Crawlee crawler class, input options, and more.
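
For orientation, each type value lines up with one of the crawler classes exported by Crawlee. The lookup table below lists the class names only. How crawleeOne performs this mapping internally is not shown on this page, so `crawlerClassNames` is an illustrative, hypothetical constant.

```typescript
type CrawlerType = 'basic' | 'http' | 'cheerio' | 'jsdom' | 'playwright' | 'puppeteer';

// Crawlee class name that corresponds to each `type` value.
const crawlerClassNames: Record<CrawlerType, string> = {
  basic: 'BasicCrawler',
  http: 'HttpCrawler',
  cheerio: 'CheerioCrawler',
  jsdom: 'JSDOMCrawler',
  playwright: 'PlaywrightCrawler',
  puppeteer: 'PuppeteerCrawler',
};
```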

Defined in

src/api.ts:20