# Earnings Call Transcript Gathering


## Step 1 - API Key

Sign in to https://rapidapi.com/apidojo/api/seeking-alpha and get an API key. It is shown under the label 'X-RapidAPI-Key'.
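
To keep the key out of the notebook itself, one option is to read it from an environment variable. The sketch below assumes a variable named `RAPIDAPI_KEY` and a helper called `build_headers`; both names are hypothetical and chosen only for this illustration, they are not prescribed by RapidAPI.

```python
import os


def build_headers(env_var="RAPIDAPI_KEY"):
    """Build the request headers from an environment variable.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var, "")
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your X-RapidAPI-Key")
    return {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": "seeking-alpha.p.rapidapi.com",
    }
```

The returned dictionary can be passed as `headers` wherever the cells below build it by hand.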



## Step 2 - List of Transcripts

Get the list of available transcripts for a given company with the following code.

In [21]:
import requests

url = "https://seeking-alpha.p.rapidapi.com/transcripts/v2/list"

# Change company_name and symbol according to the company you are interested in
company_name = 'Microsoft'
symbol = 'msft'

# size controls the number of transcript entries returned (not all of them are earnings call transcripts); the maximum is 40
querystring = {"id": symbol, "size": "40"}

headers = {
	"X-RapidAPI-Key": "", #enter your key between the double quotes here
	"X-RapidAPI-Host": "seeking-alpha.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)


transcripts_list = response.json()

In [2]:
transcripts_list

{'data': [{'id': '4676748',
   'type': 'transcript',
   'attributes': {'publishOn': '2024-03-07T15:04:09-05:00',
    'isLockedPro': False,
    'commentCount': 0,
    'gettyImageUrl': None,
    'videoPreviewUrl': None,
    'themes': {},
    'title': 'Microsoft Corporation (MSFT) Morgan Stanley Technology, Media and Telecom Conference (Transcript)',
    'isPaywalled': False},
   'relationships': {'author': {'data': {'id': '44211', 'type': 'author'}},
    'sentiments': {'data': []},
    'primaryTickers': {'data': [{'id': '575', 'type': 'tag'}]},
    'secondaryTickers': {'data': []},
    'otherTags': {'data': [{'id': '49', 'type': 'tag'}]}},
   'links': {'self': '/article/4676748-microsoft-corporation-msft-morgan-stanley-technology-media-and-telecom-conference-transcript'}},
  {'id': '4666217',
   'type': 'transcript',
   'attributes': {'publishOn': '2024-01-30T21:32:11-05:00',
    'isLockedPro': False,
    'commentCount': 2,
    'gettyImageUrl': None,
    'videoPreviewUrl': None,
    'the

Note the 'minmaxPublishOn' key in the above response. We shall use it later.

In [3]:
import pandas as pd


transcripts_list_idtitle = []

for item in transcripts_list['data']:
    transcripts_list_idtitle.append({'id': item['id'], 'Title': item['attributes']['title']})

transcripts_list_df = pd.DataFrame(transcripts_list_idtitle)

In [4]:
transcripts_list_df

Unnamed: 0,id,Title
0,4676748,Microsoft Corporation (MSFT) Morgan Stanley Te...
1,4666217,Microsoft Corporation (MSFT) Q2 2024 Earnings ...
2,4666183,Microsoft Corporation 2024 Q2 - Results - Earn...
3,4656730,Microsoft Corporation (MSFT) Presents at Barcl...
4,4655000,Microsoft Corporation (MSFT) UBS Global Techno...
5,4654726,Microsoft Corporation (MSFT) Presents at Wells...
6,4643129,Microsoft Corporation (MSFT) Q1 2024 Earnings ...
7,4633767,Microsoft Corporation (MSFT) Citi 2023 Global ...
8,4633766,Microsoft Corporation (MSFT) Goldman Sachs 202...
9,4632174,Microsoft Corporation (MSFT) Deutsche Bank's 2...


## Step 3 - Filter Earnings Call Transcripts

As you can see, the API returns many different kinds of transcripts. We are only interested in earnings call transcripts, so the next step is to keep only the IDs of those.

In [28]:
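# Keep only rows whose title contains the literal phrase 'Earnings Call Transcript'
transcripts_useful = transcripts_list_df.loc[transcripts_list_df['Title'].str.contains('Earnings Call Transcript', regex=False, na=False)]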

In [29]:
transcripts_useful

Unnamed: 0,id,Title
1,4666217,Microsoft Corporation (MSFT) Q2 2024 Earnings ...
6,4643129,Microsoft Corporation (MSFT) Q1 2024 Earnings ...
10,4619767,Microsoft Corporation (MSFT) Q4 2023 Earnings ...
14,4596556,Microsoft Corporation (MSFT) Q3 2023 Earnings ...
17,4572123,Microsoft Corporation (MSFT) Q2 2023 Earnings ...
21,4549108,Microsoft Corporation (MSFT) Q1 2023 Earnings ...
27,4526087,Microsoft Corporation (MSFT) CEO Satya Nadella...
30,4503841,Microsoft's (MSFT) CEO Satya Nadella on Q3 202...
33,4481617,Microsoft Corporation's (MSFT) CEO Satya Nadel...
39,4462243,Microsoft Corporation (MSFT) CEO Satya Nadella...


Sometimes, even after this filter, the list may still contain unnecessary transcripts (see below for examples).

### **Figure out a way to get only Earnings Call Transcripts in this list; this has to be done on a stock-by-stock basis**

Since only 500 API calls are free, avoid downloading transcripts you do not need.

Beyond this step, it is assumed that the dataframe transcripts_useful contains only earnings call transcripts.
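
As one possible starting point for that stock-specific filter, the sketch below combines two title checks: the phrase "Earnings Call Transcript" and the ticker in parentheses, as it appears in the Microsoft titles above. The helper name `filter_earnings_calls` and the sample titles are made up for illustration and are not real API output; other companies may need different rules.

```python
import pandas as pd


def filter_earnings_calls(df, symbol):
    """Keep rows whose title names an earnings call transcript for `symbol`."""
    titles = df["Title"]
    is_earnings_call = titles.str.contains("Earnings Call Transcript", regex=False, na=False)
    # Assumption: the ticker appears in parentheses in the title, e.g. "(MSFT)"
    has_ticker = titles.str.contains(f"({symbol.upper()})", regex=False, na=False)
    return df.loc[is_earnings_call & has_ticker]


# Made-up titles for illustration only
sample = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "Title": [
        "Example Corp (EXMP) Q2 2024 Earnings Call Transcript",
        "Example Corp (EXMP) Presents at Some Conference (Transcript)",
        "Other Inc. (OTHR) Q2 2024 Earnings Call Transcript",
        "Example Corp 2024 Q2 - Results - Earnings Call Presentation",
    ],
})
filtered = filter_earnings_calls(sample, "exmp")
print(filtered["id"].tolist())  # → ['1']
```

Inspecting the filtered titles by eye before downloading is still worthwhile, since each download costs an API call.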

## Step 4 - Download and Store Earnings Call Transcript

Before doing this, create a folder named after the company in the folder that contains this Jupyter notebook. In this example, I have created a folder named "Microsoft".
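
The folder can also be created from code. The sketch below uses `pathlib` from the standard library; the helper names `company_folder` and `safe_filename` are hypothetical, and the set of replaced characters is an assumption about which characters commonly cause trouble in file names.

```python
import re
from pathlib import Path


def safe_filename(title):
    """Replace characters that are not allowed in file names on common systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", title).strip()


def company_folder(company_name, base_dir="."):
    """Create the folder for one company if it does not exist yet, and return it."""
    folder = Path(base_dir) / company_name
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

For example, `company_folder("Microsoft") / safe_filename(title)` gives a path that can be opened for writing even when a title contains a slash or a colon.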

In [41]:
import json

url = "https://seeking-alpha.p.rapidapi.com/transcripts/v2/get-details"

headers = {
	"X-RapidAPI-Key": "", #enter your key between the double quotes here
	"X-RapidAPI-Host": "seeking-alpha.p.rapidapi.com"
}

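for ind in transcripts_useful.index:
    # One API call per transcript
    response = requests.get(url, headers=headers, params={"id": transcripts_useful.loc[ind, 'id']})
    response.raise_for_status()  # stop on an HTTP error instead of saving an error body
    # The title is used as the file name; a '/' in it would be read as a path separator
    filename = transcripts_useful.loc[ind, 'Title'].replace('/', '-')
    with open(company_name + '/' + filename, 'w') as f:
        json.dump(response.json(), f)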
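
Because free API calls are limited, it can help to skip transcripts that are already on disk when the notebook is re-run. The sketch below is one way to do that; `download_missing` is a hypothetical helper, and `fetch` stands for any function that takes a transcript id and returns the decoded JSON.

```python
import json
from pathlib import Path


def download_missing(rows, folder, fetch):
    """Save each (transcript_id, title) pair unless its file already exists.

    Returns the number of fetch calls actually made.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    calls = 0
    for transcript_id, title in rows:
        target = folder / title.replace("/", "_")
        if target.exists():
            continue  # already downloaded, so no API call is spent
        data = fetch(transcript_id)
        calls += 1
        with open(target, "w") as f:
            json.dump(data, f)
    return calls
```

With the names used in this notebook, `rows` could be `zip(transcripts_useful['id'], transcripts_useful['Title'])` and `fetch` could be `lambda tid: requests.get(url, headers=headers, params={"id": tid}).json()`.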

## Step 5 - Repeat for older transcripts

Recall the 'minmaxPublishOn' key in the response from Step 2. To fetch older transcripts, request the list again with an 'until' parameter set to that key's 'min' value minus 1; the response then contains only transcripts published up to that time.
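
The 'min' minus 1 step can be sketched as follows. Since the exact position of 'minmaxPublishOn' inside the response is not shown here, the hypothetical helper `find_key` searches the whole nested structure rather than assuming a path. The example response fragment is illustrative, and reading the value as Unix epoch seconds is an assumption.

```python
from datetime import datetime, timezone


def find_key(obj, key):
    """Return the first value stored under `key` anywhere in nested dicts/lists, else None."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def next_until(list_response):
    """Compute the 'until' value for the next request: 'min' minus 1."""
    minmax = find_key(list_response, "minmaxPublishOn")
    if minmax is None:
        raise KeyError("minmaxPublishOn not found in response")
    return str(int(minmax["min"]) - 1)


# Illustrative response fragment; the nesting and the numbers are assumptions
example = {"data": [], "meta": {"page": {"minmaxPublishOn": {"min": 1635297547, "max": 1709841849}}}}
print(next_until(example))  # → 1635297546

# The 'until' value used in this notebook, read as Unix epoch seconds (an assumption)
print(datetime.fromtimestamp(1635297546, tz=timezone.utc).isoformat())
# → 2021-10-27T01:19:06+00:00
```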


In [24]:
import requests

url = "https://seeking-alpha.p.rapidapi.com/transcripts/v2/list"

# Change company_name and symbol according to the company you are interested in
company_name = 'Microsoft'
symbol = 'msft'

# size controls the number of transcript entries returned (not all of them are earnings call transcripts); the maximum is 40
# The new key "until" is the time up to which transcript details are returned
querystring = {"id": symbol, "size": "40", "until": '1635297546'}

headers = {
	"X-RapidAPI-Key": "", #enter your key between the double quotes here
	"X-RapidAPI-Host": "seeking-alpha.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring)


transcripts_list2 = response.json()

In [27]:
transcripts_list2

{'data': [{'id': '4455407',
   'type': 'transcript',
   'attributes': {'publishOn': '2021-09-15T14:32:04-04:00',
    'isLockedPro': False,
    'commentCount': 0,
    'gettyImageUrl': None,
    'videoPreviewUrl': None,
    'themes': {},
    'title': 'Microsoft Corporation (MSFT) Presents at Virtual mCloud Connect 2021 Conference (Transcript)',
    'isPaywalled': False},
   'relationships': {'author': {'data': {'id': '44211', 'type': 'author'}},
    'sentiments': {'data': []},
    'primaryTickers': {'data': [{'id': '575', 'type': 'tag'}]},
    'secondaryTickers': {'data': []},
    'otherTags': {'data': [{'id': '49', 'type': 'tag'}]}},
   'links': {'self': '/article/4455407-microsoft-corporation-msft-presents-virtual-mcloud-connect-2021-conference-transcript'}},
  {'id': '4455238',
   'type': 'transcript',
   'attributes': {'publishOn': '2021-09-14T17:14:05-04:00',
    'isLockedPro': False,
    'commentCount': 0,
    'gettyImageUrl': None,
    'videoPreviewUrl': None,
    'themes': {},
  

In [25]:
transcripts_list2_idtitle = []

for item in transcripts_list2['data']:
    transcripts_list2_idtitle.append({'id': item['id'], 'Title': item['attributes']['title']})

transcripts_list2_df = pd.DataFrame(transcripts_list2_idtitle)

In [26]:
transcripts_list2_df

Unnamed: 0,id,Title
0,4455407,Microsoft Corporation (MSFT) Presents at Virtu...
1,4455238,Microsoft Corporation's (MSFT) Management Pres...
2,4454344,Microsoft Corporation's (MSFT) Management Pres...
3,4441926,Microsoft Corporation 2021 Q4 - Results - Earn...
4,4441881,Microsoft Corporation (MSFT) CEO Satya Nadella...
5,4433529,Microsoft Corporation (MSFT) Management Presen...
6,4431268,Microsoft Corporation's (MSFT) Management Pres...
7,4421835,Microsoft Corporation's (MSFT) CEO Satya Nadel...
8,4421758,Microsoft Corporation 2021 Q3 - Results - Earn...
9,4413021,Microsoft Corporation's (MSFT) Management Pres...


In [47]:
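# Keep only rows whose title contains the literal phrase 'Earnings Call Transcript'
transcripts_useful2 = transcripts_list2_df.loc[transcripts_list2_df['Title'].str.contains('Earnings Call Transcript', regex=False, na=False)]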

In [48]:
transcripts_useful2

Unnamed: 0,id,Title
4,4441881,Microsoft Corporation (MSFT) CEO Satya Nadella...
7,4421835,Microsoft Corporation's (MSFT) CEO Satya Nadel...
11,4404446,"Nuance Communications, Inc.'s (NUAN) CEO Mark ..."
12,4401205,Microsoft Corporation's (MSFT) CEO Satya Nadel...
18,4390171,"Nuance Communications, Inc. (NUAN) CEO Mark Be..."
20,4381922,Microsoft Corporation (MSFT) CEO Satya Nadella...
24,4365022,"Nuance Communications, Inc. (NUAN) CEO Mark Be..."
25,4360065,Microsoft Corp. (MSFT) CEO Satya Nadella on Q4...
27,4344465,"Nuance Communications, Inc. (NUAN) CEO Mark Be..."
28,4341291,Microsoft Corp (MSFT) CEO Satya Nadella on Q3 ...


Here you can see that the API also returns earnings call transcripts of another company (Nuance Communications, NUAN). Filter these out.

In [50]:
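# Build the mask from transcripts_useful2 itself so the mask and the dataframe share the same index
transcripts_useful2 = transcripts_useful2.loc[transcripts_useful2['Title'].str.contains(symbol.upper(), regex=False, na=False)]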

In [51]:
transcripts_useful2

Unnamed: 0,id,Title
4,4441881,Microsoft Corporation (MSFT) CEO Satya Nadella...
7,4421835,Microsoft Corporation's (MSFT) CEO Satya Nadel...
12,4401205,Microsoft Corporation's (MSFT) CEO Satya Nadel...
20,4381922,Microsoft Corporation (MSFT) CEO Satya Nadella...
25,4360065,Microsoft Corp. (MSFT) CEO Satya Nadella on Q4...
28,4341291,Microsoft Corp (MSFT) CEO Satya Nadella on Q3 ...
31,4320005,Microsoft Corporation (MSFT) CEO Satya Nadella...
36,4298421,Microsoft Corp. (MSFT) CEO Satya Nadella on Q1...
39,4275911,Microsoft Corporation (MSFT) CEO Satya Nadella...


In [52]:
import json

url = "https://seeking-alpha.p.rapidapi.com/transcripts/v2/get-details"

headers = {
	"X-RapidAPI-Key": "", #enter your key between the double quotes here
	"X-RapidAPI-Host": "seeking-alpha.p.rapidapi.com"
}

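for ind in transcripts_useful2.index:
    # One API call per transcript
    response = requests.get(url, headers=headers, params={"id": transcripts_useful2.loc[ind, 'id']})
    response.raise_for_status()  # stop on an HTTP error instead of saving an error body
    # The title is used as the file name; a '/' in it would be read as a path separator
    filename = transcripts_useful2.loc[ind, 'Title'].replace('/', '-')
    with open(company_name + '/' + filename, 'w') as f:
        json.dump(response.json(), f)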