<img alt="QuantRocket logo" src="https://www.quantrocket.com/assets/img/notebook-header-logo.png">

<a href="https://www.quantrocket.com/disclaimer/">Disclaimer</a>

# Data Collection

Data collection has three steps: collect US ETF listings, define a universe of those ETFs, then collect 1-day bars for the universe.

First, start IB Gateway:

In [1]:
from quantrocket.launchpad import start_gateways
start_gateways(wait=True)

{'ibg1': {'status': 'running'}}

## Collect ETF listings

Then collect the listings:

In [2]:
from quantrocket.master import collect_listings
collect_listings(exchanges=["ARCA","NASDAQ","NYSE","AMEX","BATS"], sec_types="ETF")

{'status': 'the listing details will be collected asynchronously'}

Monitor flightlog for a completion message:

```
quantrocket.master: INFO Collecting ARCA ETF listings from IB website
quantrocket.master: INFO Requesting details for 2246 ARCA listings found on IB website
quantrocket.master: INFO Saved 1576 ARCA listings to securities master database
...
```
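Because the listings are collected asynchronously, it can help to track which exchanges have reported back. The following is a minimal sketch that scans log lines for the "Saved ... listings" message shown above and lists the requested exchanges that have not yet logged one. It assumes each exchange logs a line in the same format as the ARCA line (the excerpt above only shows ARCA), and the names `LISTINGS_SAVED_RE` and `pending_exchanges` are illustrative helpers, not part of QuantRocket.

```python
import re

# Exchanges requested in the collect_listings call above.
REQUESTED = {"ARCA", "NASDAQ", "NYSE", "AMEX", "BATS"}

# Matches lines like:
#   quantrocket.master: INFO Saved 1576 ARCA listings to securities master database
# Assumption: every exchange logs a line in this same format.
LISTINGS_SAVED_RE = re.compile(r"quantrocket\.master: INFO Saved (\d+) (\w+) listings")


def pending_exchanges(log_lines, requested=REQUESTED):
    """Return (saved counts by exchange, sorted list of exchanges not yet saved)."""
    saved = {}
    for line in log_lines:
        match = LISTINGS_SAVED_RE.search(line)
        if match:
            saved[match.group(2)] = int(match.group(1))
    return saved, sorted(set(requested) - set(saved))


# The log lines shown above, used as sample input.
sample_log = [
    "quantrocket.master: INFO Collecting ARCA ETF listings from IB website",
    "quantrocket.master: INFO Requesting details for 2246 ARCA listings found on IB website",
    "quantrocket.master: INFO Saved 1576 ARCA listings to securities master database",
]

saved, pending = pending_exchanges(sample_log)
print(saved)    # {'ARCA': 1576}
print(pending)  # ['AMEX', 'BATS', 'NASDAQ', 'NYSE']
```

Collection can be treated as complete once the pending list is empty.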

## Define universe of US ETFs

Next, we download a CSV of the US ETF listings we just collected:

In [1]:
from quantrocket.master import download_master_file
download_master_file("usa_etfs.csv", exchanges=["ARCA","NASDAQ","NYSE","AMEX","BATS"], sec_types="ETF")

Then upload the CSV to create the "usa-etf" universe:

In [2]:
from quantrocket.master import create_universe
create_universe("usa-etf", infilepath_or_buffer="usa_etfs.csv")

{'code': 'usa-etf',
 'provided': 2997,
 'inserted': 2997,
 'total_after_insert': 2997}
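As a sanity check, the `provided` count can be compared with the number of data rows in the CSV. The sketch below counts rows after the header using only the standard library. It assumes the file has a single header line and that `provided` corresponds to the number of data rows; `count_data_rows` is an illustrative helper, and the sample data is a placeholder rather than real master file content.

```python
import csv
import io


def count_data_rows(fileobj):
    """Count non-empty CSV rows after the header line."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the header row
    return sum(1 for row in reader if row)


# Stand-in for usa_etfs.csv; column names and values are placeholders.
sample_csv = io.StringIO("col_a,col_b\n1,AAA\n2,BBB\n")
n_rows = count_data_rows(sample_csv)
print(n_rows)  # 2
```

With the real file, open it with `open("usa_etfs.csv", newline="")` and pass the file object to `count_data_rows`; under the assumptions above, the result should match the `provided` value.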

## Collect historical data

Next, we create a database for collecting 1-day bars. Because the pairs strategy will enter and exit on the open, we specify `primary_exchange=True` to limit prices to those from the primary exchange, which gives a more accurate picture of the opening and closing auction prices. (See the usage guide for more information.)

In [1]:
from quantrocket.history import create_db
create_db("usa-etf-1d-p", universes="usa-etf", bar_size="1 day", primary_exchange=True)

{'status': 'successfully created quantrocket.history.usa-etf-1d-p.sqlite'}

Then collect the data:

In [6]:
from quantrocket.history import collect_history
collect_history("usa-etf-1d-p")

{'status': 'the historical data will be collected asynchronously'}

Monitor flightlog for completion:

```
quantrocket.history: INFO [usa-etf-1d-p] Collecting history from IB for 2889 securities in usa-etf-1d-p
quantrocket.history: INFO [usa-etf-1d-p] Saved 4872473 total records for 2648 total securities to quantrocket.history.usa-etf-1d-p.sqlite
```
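The two log lines also show how much of the universe actually returned data. The sketch below parses them to compute how many securities returned no history and the average number of records per security. It assumes the log format shown above; `summarize_history_log` and the two regex names are illustrative, not part of QuantRocket.

```python
import re

# Patterns based on the two flightlog lines shown above (format is an assumption).
HISTORY_REQUESTED_RE = re.compile(r"Collecting history from IB for (\d+) securities")
HISTORY_SAVED_RE = re.compile(r"Saved (\d+) total records for (\d+) total securities")


def summarize_history_log(log_lines):
    """Extract requested/saved counts and derive simple coverage statistics."""
    requested = records = saved = None
    for line in log_lines:
        m = HISTORY_REQUESTED_RE.search(line)
        if m:
            requested = int(m.group(1))
        m = HISTORY_SAVED_RE.search(line)
        if m:
            records, saved = int(m.group(1)), int(m.group(2))
    if requested is None or saved is None:
        raise ValueError("expected both a 'Collecting' and a 'Saved' line")
    return {
        "requested": requested,
        "saved": saved,
        "missing": requested - saved,         # securities with no saved history
        "coverage": saved / requested,        # fraction of securities with data
        "records_per_security": records / saved,
    }


history_log = [
    "quantrocket.history: INFO [usa-etf-1d-p] Collecting history from IB for 2889 securities in usa-etf-1d-p",
    "quantrocket.history: INFO [usa-etf-1d-p] Saved 4872473 total records for 2648 total securities to quantrocket.history.usa-etf-1d-p.sqlite",
]

summary = summarize_history_log(history_log)
print(summary["missing"])  # 241
print(f"{summary['coverage']:.1%} of requested securities returned data")
```

For the run shown above, 241 of the 2889 requested securities returned no data.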

***

## *Next Up*

Part 2: [Moonshot Pairs Strategy](Part2-Moonshot-Pairs-Strategy.ipynb)