# Crawl Data

<img src="https://a.storyblok.com/f/139616/1200x600/1eca701f0a/crawl-and-scrap-to-find-datasets.webp" width=550>

## Introduction

> The **Motley Fool** is a financial services company that provides investment advice and financial news to its subscribers. One of its services is publishing earnings call transcripts for publicly traded companies.


> The ***earnings call transcript*** is a written record of the conference call between a company's management team and its investors, analysts, and other interested parties. These transcripts can provide valuable information about a company's financial performance, strategy, and future plans.

In this guide, we will show you how to use Python, requests, and BeautifulSoup to extract data from earnings call transcripts on https://www.fool.com/earnings/call-transcripts.

## Prerequisites
#### Before we begin, you should have Python 3 and the following libraries installed on your system:

- The requests library
- The BeautifulSoup library

You can install requests and BeautifulSoup by running the following commands in your terminal:

```
pip install requests
pip install beautifulsoup4
```

## Steps to Crawl Data
**1. Identify the URL of the earnings call transcript you want to extract data from.**<br>
<br>
For example, let's use this transcript for Apple's Q1 2023 earnings call:<br>
https://www.fool.com/earnings/call-transcripts/2023/02/02/apple-aapl-q1-2023-earnings-call-transcript/

**2. Use the requests library to send a GET request to the URL and store the response in a variable:**

In [70]:
import requests

url = "https://www.fool.com/earnings/call-transcripts/2023/02/02/apple-aapl-q1-2023-earnings-call-transcript/"
response = requests.get(url)
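
The plain `requests.get(url)` call above can wait indefinitely if the server never answers, and it does not raise an error for HTTP error pages. A minimal sketch of a more defensive fetch follows. The `fetch_html` name, the 10-second timeout, and the User-Agent string are illustrative assumptions, not part of the original guide.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its HTML, raising on HTTP errors."""
    # Reuse a caller-supplied session if given, otherwise create one
    if session is None:
        session = requests.Session()
    # Placeholder User-Agent (an assumption); identify your own crawler here
    headers = {"User-Agent": "earnings-transcript-tutorial/0.1"}
    response = session.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` returns the same HTML string that `response.text` holds in the step above.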

**3. Use BeautifulSoup to parse the HTML content of the response:**

In [71]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text, 'html.parser')

**4. Extract the relevant data from the HTML using BeautifulSoup's `find()` and `find_all()` methods.**

#### Here are some examples:

1. To extract the main title of the earnings call:

In [None]:
main_title = soup.find("h1", class_="font-medium text-gray-1100 leading-42 md:text-h1").text

2. To extract the date of the earnings call:

In [73]:
date = soup.find(id="date").text

3. To extract the time of the earnings call:

In [74]:
time = soup.find(id="time").text

4. To extract the title of the earnings call (usually the name of the company):

In [75]:
title = soup.find("h2", class_="font-light leading-10 text-h3 text-gray-1100 mb-32px").text

5. To extract the content of the earnings call (transcript):

In [76]:
content = soup.find("div", class_="tailwind-article-body").text
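
When `class_` is given a string with several class names, BeautifulSoup compares it against the exact value of the `class` attribute, so the long selectors above stop matching if the site adds, removes, or reorders a class. In that case `find()` returns `None` and `.text` raises an `AttributeError`. The sketch below shows one possible alternative: CSS selectors via `select_one()` plus a `None` check. The `extract_text` helper and the sample HTML are hypothetical stand-ins, and the selectors only reuse class names and ids that appear in this guide.

```python
from bs4 import BeautifulSoup


def extract_text(soup, selector, default="N/A"):
    """Return the stripped text of the first element matching a CSS selector."""
    element = soup.select_one(selector)
    # select_one returns None when nothing matches, so guard before reading text
    return element.get_text(strip=True) if element is not None else default


# Tiny made-up stand-in for a transcript page; the real page is far larger
sample_html = """
<h1 class="md:text-h1 font-medium extra-class">Apple (AAPL) Q1 2023 Earnings Call Transcript</h1>
<span id="date">Feb 02, 2023</span>
<div class="tailwind-article-body"><p>Prepared Remarks:</p></div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

main_title = extract_text(soup, "h1.font-medium")        # matches despite reordered classes
date = extract_text(soup, "#date")
time_of_call = extract_text(soup, "#time")               # missing element, falls back to "N/A"
content = extract_text(soup, "div.tailwind-article-body")
```

Selecting on a single class is less strict than the exact-match form, so it may also match unrelated elements on a real page; check the results against the live HTML before relying on it.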

**5. Store the extracted data in a dictionary:**

In [79]:
earnings_dict = {
    "main_title": main_title,
    "date": date,
    "time": time,
    "title": title,
    "content": content
}


**6. (Optional) Repeat steps 1-5 for each additional earnings call transcript you want to extract data from.**

**7. (Optional) Store the extracted data in a database or a file for further analysis.**
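
For the file option in the last step, one simple format is JSON Lines, with one dictionary per line. The sketch below uses only the standard library. The `save_records` and `load_records` names, the `earnings.jsonl` file name, and the sample record are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_records(records, path):
    """Append each record to a JSON Lines file, one JSON object per line."""
    with Path(path).open("a", encoding="utf-8") as handle:
        for record in records:
            # ensure_ascii=False writes non-ASCII characters as-is instead of \u escapes
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read the file back into a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demonstration with a stand-in record written to a temporary directory
sample = {"main_title": "Apple (AAPL) Q1 2023 Earnings Call Transcript", "date": "Feb 02, 2023"}
with tempfile.TemporaryDirectory() as folder:
    output_path = Path(folder) / "earnings.jsonl"
    save_records([sample], output_path)
    loaded = load_records(output_path)
```

Because the file is opened in append mode, records from later crawls can be added to the same file without rewriting it.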

## Conclusion
In this guide, we showed you how to use Python and BeautifulSoup to extract data from earnings call transcripts on https://www.fool.com/earnings/call-transcripts. By following these steps, you can extract the same fields from other earnings call transcripts on the site, as long as the page layout and class names stay the same.

# Functions

In [38]:
import requests
from bs4 import BeautifulSoup

def single_crawler(ticker):
    # Build the quote-page URL for the ticker (the path assumes a NASDAQ listing)
    url = f'https://www.fool.com/quote/nasdaq/{ticker}/'
    
    # Send a GET request to the URL and store the response
    response = requests.get(url)
    
    # Use BeautifulSoup to parse the HTML content of the response
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all <a> elements that have a certain class
    links = soup.find_all("a", class_="block border-b border-gray-300 hover-trigger "\
                          "md:items-center text-gray-1100 hover:text-black py-12px")
    
    # Define the base URL for the links we found
    base_url = "https://www.fool.com"
    
    # Use a list comprehension to construct a list of full URLs
    # by concatenating the base URL with the href attribute of each <a> element
    link_list = [base_url + link['href'] for link in links]
    
    # Return the list of links
    return link_list


def earnings_crawler(url):
    # Send a GET request to the URL and store the response
    response = requests.get(url)

    # Use BeautifulSoup to parse the HTML content of the response
    soup = BeautifulSoup(response.text, 'html.parser')

    # Create an empty dictionary to store the extracted data
    earnings_dict = {}

    # find() returns None when an element is missing, so accessing .text raises
    # AttributeError; catch only that error and fall back to "N/A"

    # Extract the main title of the earnings call
    try:
        earnings_dict["main_title"] = soup.find("h1", class_="font-medium text-gray-1100 leading-42 md:text-h1").text
    except AttributeError:
        earnings_dict["main_title"] = "N/A"

    # Extract the date of the earnings call
    try:
        earnings_dict["date"] = soup.find(id="date").text
    except AttributeError:
        earnings_dict["date"] = "N/A"

    # Extract the time of the earnings call
    try:
        earnings_dict["time"] = soup.find(id="time").text
    except AttributeError:
        earnings_dict["time"] = "N/A"

    # Extract the title of the earnings call (usually the name of the company)
    try:
        earnings_dict["title"] = soup.find("h2", class_="font-light leading-10 text-h3 text-gray-1100 mb-32px").text
    except AttributeError:
        earnings_dict["title"] = "N/A"

    # Extract the content of the earnings call (transcript)
    try:
        earnings_dict["content"] = soup.find("div", class_="tailwind-article-body").text
    except AttributeError:
        earnings_dict["content"] = "N/A"

    # Return the dictionary of extracted data
    return earnings_dict
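
The two functions above can be combined to scrape every transcript link found for a ticker. The sketch below is one possible way to do that. `crawl_ticker` and its `delay_seconds` parameter are hypothetical names introduced here, and the link-listing and scraping functions are passed in as arguments so the loop itself stays independent of the network code.

```python
import time


def crawl_ticker(ticker, list_links, scrape, delay_seconds=1.0):
    """Scrape every link that list_links(ticker) returns and collect the results."""
    records = []
    for link in list_links(ticker):
        records.append(scrape(link))
        # Pause between requests so the crawl does not hammer the site
        time.sleep(delay_seconds)
    return records
```

With the functions defined in this section, a call could look like `records = crawl_ticker("aapl", single_crawler, earnings_crawler)`. The one-second default delay is an arbitrary choice.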


In [68]:
earnings_crawler(url)

{'main_title': 'Apple (AAPL) Q1 2023 Earnings Call Transcript',
 'date': 'Feb 02, 2023',
 'time': '5:00 p.m. ET',
 'title': 'AAPL earnings call for the period ending December 31, 2022.',
 'content': "\n\nImage source: The Motley Fool.\n\nApple\xa0(AAPL 0.55%)Q1\xa02023 Earnings CallFeb 02, 2023, 5:00 p.m. ETContents:  Prepared Remarks Questions and Answers Call Participants  Prepared Remarks:  OperatorGood day, everyone, and welcome to the Apple Q1 fiscal year 2023 earnings conference call. Today's call is being recorded. And now at this time, for opening remarks and introductions, I would like to turn the call over to Tejas Gala, director of investor relations and corporate finance. Please go ahead.Tejas Gala -- Director, Investor Relations and Corporate Finance Thank you. Speaking first today is Apple's CEO, Tim Cook; and he'll be followed by CFO, Luca Maestri. After that, we'll open the call to questions from analysts. Before turning the call over to Tim, I would like to remind ever
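
The output above shows that the raw `content` string still contains newlines and non-breaking spaces (`\xa0`), and that the date is plain text. A minimal cleanup sketch follows. `clean_text` and `parse_call_date` are hypothetical helpers, and the date format string is inferred from the single `'Feb 02, 2023'` value shown above, so other pages may need a different format.

```python
from datetime import datetime


def clean_text(text):
    """Collapse runs of whitespace (including newlines and \\xa0) into single spaces."""
    # str.split() with no arguments splits on any Unicode whitespace
    return " ".join(text.split())


def parse_call_date(date_text):
    """Convert a string such as 'Feb 02, 2023' into a datetime.date."""
    # %b matches abbreviated month names in the current locale (English assumed here)
    return datetime.strptime(date_text.strip(), "%b %d, %Y").date()


raw = "\n\nImage source: The Motley Fool.\n\nApple\xa0(AAPL 0.55%)Q1\xa02023 Earnings Call"
cleaned = clean_text(raw)
call_date = parse_call_date("Feb 02, 2023")
```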

## Storing crawled data in PostgreSQL

<img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/PostgresDMSHeader.max-2200x2200.png" width=550>

Storing crawled data in PostgreSQL is a common task in web scraping projects, and Python's psycopg2 package makes it straightforward. The steps below store the data crawled from https://www.fool.com/earnings/call-transcripts/ in a PostgreSQL table:

**1. Install required packages: You need to install the psycopg2 package to connect to PostgreSQL and execute SQL queries.**

```
pip install psycopg2
```

**2. Create a database and table: Open the PostgreSQL command-line interface (`psql`), create a database, connect to it, and create a table to store the data.**

```
CREATE DATABASE your_database_name;
\c your_database_name
```

```
CREATE TABLE earnings (
    id SERIAL PRIMARY KEY,
    main_title TEXT,
    date TEXT,
    time TEXT,
    title TEXT,
    content TEXT
);
```

**3. Connect to the database: Use psycopg2 to connect to the database in your Python code.**

```
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    database="your_database_name",
    user="your_username",
    password="your_password"
)
```

**4. Crawl data and insert into the database: Modify your crawling code to store the data in the earnings table.**
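
A minimal sketch of the insert step is shown below. It assumes the `earnings` table from step 2 and a dictionary with the keys produced by `earnings_crawler`. The `insert_earnings` helper is a hypothetical name, and it accepts any DB-API cursor, such as the one returned by `conn.cursor()` in psycopg2.

```python
# Column order must match the placeholders in INSERT_SQL
COLUMNS = ("main_title", "date", "time", "title", "content")

INSERT_SQL = """
    INSERT INTO earnings (main_title, date, time, title, content)
    VALUES (%s, %s, %s, %s, %s)
"""


def insert_earnings(cursor, record):
    """Insert one scraped record using a parameterized query."""
    # Passing the values separately lets the driver handle quoting and escaping
    params = tuple(record.get(column) for column in COLUMNS)
    cursor.execute(INSERT_SQL, params)
    return params


# Possible usage with the connection from step 3 (not executed here):
# with conn, conn.cursor() as cursor:   # "with conn" commits on success, rolls back on error
#     insert_earnings(cursor, earnings_crawler(url))
```

Building the SQL with string formatting instead of placeholders would break on transcripts that contain quote characters and would open the door to SQL injection.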

# Functions

In [None]:
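import psycopg2

def make_db_connection_psycopg2(database: str, autocommit: bool = False):
    """Open a psycopg2 connection for an entry in conn_dict and return (conn, cursor)."""
    # conn_dict must be defined elsewhere (for example, loaded from a config file);
    # it maps a database key to its host, user, password, and dbname
    if database not in conn_dict:
        raise KeyError(f"database '{database}' couldn't be recognized in the config file")

    settings = conn_dict[database]
    conn = psycopg2.connect(
        user=settings['user'],
        password=settings['password'],
        host=settings['host'],
        dbname=settings['dbname'],
    )

    # With autocommit enabled, each statement is committed as soon as it runs
    if autocommit:
        conn.autocommit = True

    cursor = conn.cursor()
    return conn, cursor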
