# **1. WRDS Access**

First, install the `wrds` Python module; `pip` also pulls in its required dependencies.

`pip install wrds`
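
To confirm that the install landed in the environment your notebook is actually using, a quick check like the following can help. This is a minimal sketch: `installed_version` is a hypothetical helper name, and it relies only on the standard-library `importlib.metadata` module.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: report whether wrds is visible to this interpreter
print(installed_version("wrds") or "wrds is not installed in this environment")
```

If this reports that `wrds` is missing even though `pip install wrds` succeeded, the notebook kernel is probably running a different Python environment than the one `pip` installed into.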



### **1.1 Initial Setup - The pgpass File**

To avoid entering your WRDS username and password each time you connect, you can create a `pgpass` file that stores your credentials on your machine. The file holds the password in plain text, so keep it readable only by your own user account. Run the following commands to connect to WRDS and create the `pgpass` file:

In [2]:
import wrds
db = wrds.Connection(wrds_username='your_wrds_username')
db.create_pgpass_file()

WRDS recommends setting up a .pgpass file.
Created .pgpass file successfully.
You can create this file yourself at any time with the create_pgpass_file() function.
Loading library list...
Done


Replace `'your_wrds_username'` with your actual WRDS username. The first time you run the `Connection()` command, you’ll be prompted to enter your WRDS username and password. After running `create_pgpass_file()`, you won’t need to enter your credentials again.
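
If you want to verify that the file was written where PostgreSQL clients expect it, the check below is one way to do it. It is a sketch under stated assumptions: the helper names `default_pgpass_path` and `pgpass_status` are hypothetical, and the paths are the standard PostgreSQL defaults (`~/.pgpass` on Linux and macOS, `%APPDATA%\postgresql\pgpass.conf` on Windows), which may differ if your setup overrides them.

```python
import os
import stat
import sys
from pathlib import Path


def default_pgpass_path() -> Path:
    """Return the default PostgreSQL password-file location for this platform."""
    if sys.platform.startswith("win"):
        return Path(os.environ.get("APPDATA", "")) / "postgresql" / "pgpass.conf"
    return Path.home() / ".pgpass"


def pgpass_status(path: Path) -> str:
    """Report whether the password file exists and is private to the current user."""
    if not path.is_file():
        return "missing"
    if sys.platform.startswith("win"):
        return "ok"  # permission bits are not checked on Windows
    mode = stat.S_IMODE(path.stat().st_mode)
    # PostgreSQL clients ignore the file if group or others have any access
    return "ok" if mode & 0o077 == 0 else "too permissive"


if __name__ == "__main__":
    print(pgpass_status(default_pgpass_path()))
```

A result of `too permissive` on Linux or macOS can usually be fixed with `chmod 600 ~/.pgpass`.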

To test the setup, close the connection and reconnect. If the `pgpass` file was created correctly, the connection opens without prompting for your password.

In [3]:
db.close()
db = wrds.Connection(wrds_username='your_wrds_username')

Loading library list...
Done


### **1.2 Example Query: S&P 500 Index Returns**

This query retrieves daily S&P 500 index returns (`sprtrn`) from the CRSP data on WRDS for January 1, 1963 through January 1, 2024 (both bounds inclusive). It then converts the date column (`caldt`) to datetime values and uses it as the index of the resulting DataFrame.

In [4]:
import pandas as pd

# Query S&P 500 returns from CRSP
df_sprtrn = db.raw_sql("""
    SELECT main.caldt, main.sprtrn
    FROM crsp_a_indexes.dsp500 AS main
    WHERE main.caldt >= '1963-01-01' AND main.caldt <= '2024-01-01'
    """ )

# Set the index as the date and drop the original date column
df_sprtrn.index = pd.to_datetime(df_sprtrn.caldt, format="%Y-%m-%d")
df_sprtrn = df_sprtrn.drop(columns=["caldt"])

# Show the last 5 rows
print(df_sprtrn.tail(5))

              sprtrn
caldt               
2023-12-22  0.001660
2023-12-26  0.004232
2023-12-27  0.001430
2023-12-28  0.000370
2023-12-29 -0.002826
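
The same query can be wrapped in a small function so that the date range is passed as parameters and the date handling happens inside `raw_sql`. This is a sketch, not the only way to do it: `fetch_sp500_returns` is a hypothetical helper name, `db` is assumed to be an open `wrds.Connection`, and the `%(name)s` placeholder style follows the parameterized-query example in the WRDS documentation, so it may need adjusting for your `wrds` version.

```python
import pandas as pd


def fetch_sp500_returns(db, start: str, end: str) -> pd.DataFrame:
    """Return daily S&P 500 returns indexed by date for start <= caldt <= end.

    `db` is assumed to be an open wrds.Connection (or any object that
    exposes a compatible raw_sql method).
    """
    sql = """
        SELECT caldt, sprtrn
        FROM crsp_a_indexes.dsp500
        WHERE caldt BETWEEN %(start)s AND %(end)s
        ORDER BY caldt
    """
    return db.raw_sql(
        sql,
        params={"start": start, "end": end},  # values are bound by the driver
        date_cols=["caldt"],                  # parse caldt as datetime
        index_col="caldt",                    # use the date as the index
    )


# Hypothetical usage with the connection opened earlier:
# df_sprtrn = fetch_sp500_returns(db, "1963-01-01", "2024-01-01")
# print(df_sprtrn.tail(5))
```

Passing the dates through `params` avoids building the SQL string by hand, and `date_cols` together with `index_col` replaces the separate `pd.to_datetime` and `drop` steps.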
