<a href="https://colab.research.google.com/github/samehra/Projects/blob/master/generation/langchain/handbook/create_urls_sec_edgar.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%pip install sec-api

In [None]:
from sec_api import QueryApi

queryApi = QueryApi(api_key="YOUR_API_KEY")

In [None]:
query = {
  "query": { "query_string": { 
      "query": "formType:\"10-K\" " + 
               "AND NOT formType:\"NT 10-K\" " + 
               "AND NOT formType:\"10-K/A\" " +
               "AND filedAt:[2022-01-01 TO 2022-12-31]",
      "time_zone": "America/New_York"
  } },
  "from": "0",
  "size": "10",
  "sort": [{ "filedAt": { "order": "desc" } }]
}

response = queryApi.get_filings(query)
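
The query dict above hard-codes the year and the paging window. A minimal sketch of a helper that builds the same structure for any year and offset might look like the following. `build_10k_query` and its parameters are hypothetical names introduced here for illustration; they are not part of `sec-api`.

```python
def build_10k_query(year, start=0, size=10):
    """Return a search dict for original 10-K filings filed in `year`.

    Hypothetical helper: it mirrors the hand-written query above and only
    parameterizes the year and the paging window.
    """
    query_string = (
        'formType:"10-K" '
        'AND NOT formType:"NT 10-K" '
        'AND NOT formType:"10-K/A" '
        f"AND filedAt:[{year}-01-01 TO {year}-12-31]"
    )
    return {
        "query": {
            "query_string": {
                "query": query_string,
                "time_zone": "America/New_York",
            }
        },
        "from": str(start),
        "size": str(size),
        "sort": [{"filedAt": {"order": "desc"}}],
    }


query_2022 = build_10k_query(2022)
print(query_2022["query"]["query_string"]["query"])
```

The returned dict could then be passed to `queryApi.get_filings` in place of the literal `query`.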

In [None]:
# open the file we use to store the filing URLs;
# the with-block closes it even if a request fails
with open("filing_urls.txt", "a") as log_file:
  # page through the first 30 results in batches of 10
  for from_batch in range(0, 30, 10):
    # set new "from" starting position of search
    query["from"] = from_batch
    response = queryApi.get_filings(query)

    # no more filings in search universe
    if len(response["filings"]) == 0:
      break

    # for each filing, only save the URL pointing to the filing itself
    # and ignore all other data.
    # the URL is set in the dict key "linkToFilingDetails"
    urls_list = list(map(lambda x: x["linkToFilingDetails"], response["filings"]))

    # transform list of URLs into one string by joining all list elements
    # and add a new-line character between each element.
    urls_string = "\n".join(urls_list) + "\n"

    log_file.write(urls_string)

    print("Filing URLs downloaded for batch starting at {}".format(from_batch))

print("All URLs downloaded")
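
Because the file is opened in append mode (`"a"`), re-running the cell above writes the same URLs to `filing_urls.txt` again. One way to handle this, sketched here under the assumption that first-seen order should be kept, is to deduplicate the lines after reading them back. `unique_lines` is a hypothetical helper and the sample URLs are placeholders, not real filing links.

```python
def unique_lines(lines):
    """Return the non-empty lines in first-seen order, without duplicates."""
    seen = set()
    result = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            result.append(url)
    return result


# placeholder input; in the notebook the lines would come from
# open("filing_urls.txt").readlines()
sample = [
    "https://example.com/filing-a.htm\n",
    "https://example.com/filing-b.htm\n",
    "https://example.com/filing-a.htm\n",
]
deduplicated = unique_lines(sample)
print(deduplicated)
```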

Let's inspect the Query API response and print a subset of properties of each filing, namely `formType` and `periodOfReport`. The filings live in the `response["filings"]` list. 

Feel free to skip the next lines if you're familiar with `map` and `lambda`. The `map` function applies the `lambda` function to every filing in the list of filings. For each filing, the `lambda` function returns a new dict that contains only that filing's `formType` and `periodOfReport` values. Finally, `list` converts the result of the `map` function into a new list.

In [None]:
list(map(lambda x: {"formType": x["formType"], "periodOfReport": x["periodOfReport"]}, response["filings"]))

[{'formType': '10-K', 'periodOfReport': '2021-11-30'},
 {'formType': '10-K', 'periodOfReport': '2021-10-31'},
 {'formType': '10-K', 'periodOfReport': '2021-10-31'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'},
 {'formType': '10-K', 'periodOfReport': '2021-09-30'}]
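
For readers who prefer comprehensions, the same extraction can be written without `map` and `lambda`. The sketch below uses a small hand-written `filings` list as a stand-in for `response["filings"]` (an assumption made so the example runs without an API key) and checks that both forms produce the same result.

```python
# illustrative stand-in for response["filings"]; real filings carry many more keys
filings = [
    {"formType": "10-K", "periodOfReport": "2021-11-30", "cik": "1725516"},
    {"formType": "10-K", "periodOfReport": "2021-10-31", "cik": "72633"},
]

# map + lambda, as in the cell above
with_map = list(
    map(
        lambda x: {"formType": x["formType"], "periodOfReport": x["periodOfReport"]},
        filings,
    )
)

# equivalent list comprehension
with_comprehension = [
    {"formType": f["formType"], "periodOfReport": f["periodOfReport"]}
    for f in filings
]

print(with_map == with_comprehension)
```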

In [None]:
# map each CIK and each ticker to the list of report periods seen for it
period_by_cik = {}
period_by_ticker = {}

for filing in response["filings"]:
  cik, ticker, periodOfReport = filing["cik"], filing["ticker"], filing["periodOfReport"]

  if cik not in period_by_cik:
    period_by_cik[cik] = []

  if periodOfReport not in period_by_cik[cik]:
    period_by_cik[cik].append(periodOfReport)

  # skip filings without a ticker symbol
  if ticker:
    if ticker not in period_by_ticker:
      period_by_ticker[ticker] = []

    if periodOfReport not in period_by_ticker[ticker]:
      period_by_ticker[ticker].append(periodOfReport)

In [None]:
period_by_cik

{'1206942': ['2021-10-31'],
 '1341726': ['2021-09-30'],
 '1377167': ['2021-09-30'],
 '1435181': ['2021-09-30'],
 '1502966': ['2021-09-30'],
 '1592782': ['2021-09-30'],
 '1725516': ['2021-11-30'],
 '1844817': ['2021-09-30'],
 '1853314': ['2021-09-30'],
 '72633': ['2021-10-31']}

In [None]:
period_by_ticker

{'AACI': ['2021-09-30'],
 'DIGP': ['2021-09-30'],
 'FGCO': ['2021-09-30'],
 'GIAC': ['2021-09-30'],
 'GSPE': ['2021-09-30'],
 'NBLD': ['2021-11-30'],
 'NRT': ['2021-10-31'],
 'NUKK': ['2021-09-30'],
 'PHBI': ['2021-09-30']}
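
The two grouping loops above follow the same pattern, so they could be folded into one function. The following is a minimal sketch using `collections.defaultdict`; `group_periods` is a hypothetical helper, and the sample records are illustrative stand-ins for `response["filings"]` whose CIK and ticker pairings are assumptions. Unlike the original loop, this version skips empty values for any key, not only for `ticker`.

```python
from collections import defaultdict


def group_periods(filings, key):
    """Group unique periodOfReport values by filing[key], skipping empty keys."""
    grouped = defaultdict(list)
    for filing in filings:
        value = filing.get(key)
        period = filing["periodOfReport"]
        # only touch grouped[value] when the key is non-empty
        if value and period not in grouped[value]:
            grouped[value].append(period)
    return dict(grouped)


# illustrative records; the third has no ticker, the fourth repeats the second
sample_filings = [
    {"cik": "1725516", "ticker": "NBLD", "periodOfReport": "2021-11-30"},
    {"cik": "72633", "ticker": "NRT", "periodOfReport": "2021-10-31"},
    {"cik": "1206942", "ticker": "", "periodOfReport": "2021-10-31"},
    {"cik": "72633", "ticker": "NRT", "periodOfReport": "2021-10-31"},
]

by_cik = group_periods(sample_filings, "cik")
by_ticker = group_periods(sample_filings, "ticker")
print(by_cik)
print(by_ticker)
```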