This is a Singer tap that produces JSON-formatted data following the Singer spec.
This tap:
- Pulls raw data from the Yahoo Gemini reporting API
- Extracts the reporting cubes detailed below in the "Table Schemas" section
- Outputs the schema for each resource
- Incrementally pulls data based on the input state
Follow the guidelines below to use this tap in Stitch or in a Python environment.
To install tap-gemini in Stitch, you need to create an API application and generate OAuth 2.0 credentials. See the authentication documentation on the Oath website for instructions, including their OAuth 2.0 guide. Enter the client ID as the username and the refresh token as the password.
Follow the instructions below to use the tap as a Python package.
Create a virtual environment and install the package using pip. These instructions are bash commands that will work on Unix-based platforms.
python3 -m venv ~/.virtualenvs/tap-gemini
source ~/.virtualenvs/tap-gemini/bin/activate
pip install tap-gemini
deactivate
Use the following command to run the tap with the configuration specified in the JSON file config.json:
~/.virtualenvs/tap-gemini/bin/tap-gemini --config ~/config.json
To output the data to a CSV file, pipe the data stream into target-csv:
~/.virtualenvs/tap-gemini/bin/tap-gemini --config ~/config.json | ~/.virtualenvs/target-csv/bin/target-csv
The options in the configuration file are described below.
These settings must be specified:
- start_date: The lower bound of the historical load time range
- username: The OAuth client ID
- password: The OAuth client secret
- refresh_token: The OAuth refresh token
- advertiser_ids: List of advertiser (account) ID numbers
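As an illustration of the required settings, the sketch below builds a configuration dictionary and writes it to a JSON file. The key names come from the list above; every value is a placeholder, the timestamp format used for start_date is an assumption, and write_config is a hypothetical helper rather than part of the tap.

```python
import json
from pathlib import Path

# Placeholder values only; replace them with your own credentials.
config = {
    "start_date": "2019-01-01T00:00:00Z",  # assumed timestamp format
    "username": "YOUR_OAUTH_CLIENT_ID",
    "password": "YOUR_OAUTH_CLIENT_SECRET",
    "refresh_token": "YOUR_OAUTH_REFRESH_TOKEN",
    "advertiser_ids": [123456, 234567],  # made-up account IDs
}


def write_config(path, settings):
    """Serialise the settings to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(settings, indent=2))
    return path
```

Calling write_config("config.json", config) produces a file that can be passed to the tap with the --config option.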
These additional options are available:
- api_version: The API version to use
- session: Options for the HTTP session, such as headers and proxies, which will be passed into the requests.Session as keyword arguments
- sandbox: Use the API testing environment
- poll_interval: The number of seconds (minimum: 1.0) between poll attempts when waiting for a report to be ready for download
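To show how a poll interval of this kind is typically used, here is a minimal polling sketch. It is not the tap's own code: wait_for_report, check_status, and max_attempts are hypothetical names, and raising values below the 1.0 second minimum (rather than rejecting them) is an assumption.

```python
import time

MIN_POLL_INTERVAL = 1.0  # minimum stated in the option description


def wait_for_report(check_status, poll_interval=1.0, max_attempts=60, sleep=time.sleep):
    """Call check_status() repeatedly until it returns a truthy value.

    check_status is a hypothetical callable that returns the report's
    download location once it is ready, and None before that.
    """
    # Assumption: intervals below the minimum are raised to the minimum.
    interval = max(float(poll_interval), MIN_POLL_INTERVAL)
    for _ in range(max_attempts):
        result = check_status()
        if result:
            return result
        sleep(interval)
    raise TimeoutError("report was not ready after %d attempts" % max_attempts)
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.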
Each incremental report run begins at the timestamp at which the books were marked closed (i.e. when no further changes to the data are written).
For historic data loads, the reports will run over the largest possible time frame. Some reports have a limited time range as detailed below, where the "Days" column shows the largest available number of days prior to the current calendar date:
Table | Days |
---|---|
performance_stats | 15 |
slot_performance_stats | 15 |
product_ads | 400 |
site_performance_stats | 400 |
keyword_stats | 750 |
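The effect of these limits on a historic load can be sketched as follows. The day counts are copied from the table above; effective_start is a hypothetical helper, and treating tables that are not listed as unlimited is an assumption.

```python
from datetime import date, timedelta

# Largest number of days before the current date, per the table above.
MAX_DAYS = {
    "performance_stats": 15,
    "slot_performance_stats": 15,
    "product_ads": 400,
    "site_performance_stats": 400,
    "keyword_stats": 750,
}


def effective_start(table, start_date, today):
    """Return the earliest date a report for this table can begin on."""
    limit = MAX_DAYS.get(table)
    if limit is None:
        # Assumption: tables without a listed limit can use start_date as-is.
        return start_date
    earliest_allowed = today - timedelta(days=limit)
    return max(start_date, earliest_allowed)
```

For example, effective_start("performance_stats", date(2019, 1, 1), date(2019, 6, 30)) returns date(2019, 6, 15), because that table only reaches back 15 days.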
Most tables have the following primary key columns:
- Advertiser ID
- Day
The table schemas are detailed below.
The following reporting cubes are implemented:
- adjustment_stats
  - Description: This cube provides performance metrics for over delivery adjustments for spend, which are not available in other cubes.
  - Primary key columns:
    - Advertiser ID
    - Day
  - Replication: Incremental
  - Bookmark column: Day
- ad_extension_details
- call_extension_stats
- campaign_bid_performance_stats
- conversion_rules_stats
- domain_performance_stats
- keyword_stats
- performance_stats
  - Description: This cube has performance stats for all levels down to the ad level. It is recommended to use this cube when querying for native ads campaign data. The cube does not include keyword level metrics. Data for both search and native campaigns is provided - you can use the “Source” field to filter for a specific channel. Note that the cube does not include any over delivery spend adjustments, which are available in the adjustment_stats cube.
  - Primary key columns:
    - Advertiser ID
    - Day
  - Replication: Incremental
  - Bookmark column: Day
- product_ads
- product_ad_performance_stats
- search_stats
- site_performance_stats
- slot_performance_stats
- user_stats
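In Singer terms, the primary key and bookmark columns listed above correspond to the key_properties and bookmark_properties fields of a SCHEMA message. The sketch below builds such a message for performance_stats. The field names "Advertiser ID" and "Day" are taken from the list above and may not match the tap's actual column identifiers, the property types are assumptions, and schema_message is a hypothetical helper.

```python
import json


def schema_message(stream, properties, key_properties, bookmark_properties):
    """Build a Singer SCHEMA message as a plain dictionary."""
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": {"type": "object", "properties": properties},
        "key_properties": key_properties,
        "bookmark_properties": bookmark_properties,
    }


message = schema_message(
    stream="performance_stats",
    properties={
        # Assumed types; the real schema defines many more fields.
        "Advertiser ID": {"type": "integer"},
        "Day": {"type": "string", "format": "date"},
    },
    key_properties=["Advertiser ID", "Day"],
    bookmark_properties=["Day"],
)

# Singer messages are written to standard output as one JSON object per line.
print(json.dumps(message))
```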
These cubes are not implemented:
The following account structure objects are implemented.
The other API objects are implemented in Python but schema and metadata definitions need to be written.
Some fields have been excluded from the schema (i.e. the metadata inclusion is set to unsupported) because they are incompatible with other fields. This could probably be fixed by defining metadata exclusions that depend on other fields.
- All dates and times use the advertiser time zone.
- GitHub repository tap-gemini
- Slack channel #tap-gemini
Copyright © 2019 Stitch