## How to Create an Account on WRDS

The following steps walk you through creating and setting up your WRDS account.

1. **Visit the WRDS Website:** Go to the WRDS registration page at [WRDS Registration](https://wrds-www.wharton.upenn.edu/register/) and use your university email to create an account.

2. **Fill Out Registration Form:** Complete the registration form with the required information.

3. **Email Confirmation:** After registration, check your university email for a message from WRDS. Click the link provided in the email to confirm your email address.

4. **Wait for Approval:** If you're associated with the University of Exeter, please note that your account may require approval from a university department staff member. Be prepared to wait for up to 5 working days for this approval. As a staff member, I waited for 5 working days as well, so consider this as the expected waiting period.

5. **Contact for Delays:** If the approval process takes longer than expected, send an email to Dr. Yao Chen (Y.Chen5@exeter.ac.uk) and cc me to ask about the status.

6. **Account Approval Email:** Once your account is approved (after 5 working days!), you will receive an email stating: "Your WRDS faculty account (rc813) has just been created. You can now log in and begin accessing the data as University of Exeter is subscribed."

7. **Login:** Visit "https://wrds-www.wharton.upenn.edu/" and log in using your newly created account.

8. **Set Up Credentials:** After logging in, you'll need to set up your credentials. This typically involves enabling two-factor authentication (2FA). You'll be asked to download Duo Security. It's recommended to use the Duo App for 2FA.

9. **Duo App Setup:** Upon using the Duo App for the first time, you will need to log in to "https://wrds-www.wharton.upenn.edu/" once more and accept the terms and conditions of the database.

10. **Start Using WRDS:** After accepting the terms and conditions, you're ready to start using WRDS to access and retrieve data. You can refer to the startup manual on how to use Jupyter Notebook to fetch data at [WRDS Jupyter Notebook Guide](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/jupyterhub-wrds/).

In [None]:
!pip install wrds

In [None]:
# Import WRDS Module: Start by importing the WRDS module in your Python script.
import wrds
import pandas as pd

In [None]:
# Connect to WRDS: Establish a connection using your WRDS account credentials.
db = wrds.Connection(wrds_username='your_user_name')

**The Wharton Research Data Services** (WRDS) platform offers access to a variety of financial, economic, and marketing data. When you use the WRDS Python module, you can connect to their database and access different datasets. 

These datasets are known as "libraries" in the WRDS context. Each library contains specific types of data.

Identify Available Datasets: You can list the libraries available to you, which include banking datasets such as the Federal Reserve Bank Reports and Bank Holding Companies data.
* `db.list_libraries()`

In [None]:
# list of all the datasets that you can access from WRDS
list_of_datasets=db.list_libraries()
list_of_datasets[-5:]

Each of these datasets serves a specific purpose in financial and economic research, offering insights into market behavior, corporate actions, investment analysis, and regulatory activities. The actual content of these datasets can be confirmed by accessing them via the WRDS platform and exploring the tables and variables they contain.

You can also visit the WRDS webpages, particularly the [research option page](https://wrds-www.wharton.upenn.edu/pages/wrds-research/), watch the videos there, and explore these libraries further.

If you want stock data, databases such as CRSP (Center for Research in Security Prices) and Compustat provide stock price information.

In [None]:
datasets_with_c = [i for i in list_of_datasets if "crs" in i]
datasets_with_c
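
The substring filter above is case-sensitive and matches anywhere in the name. A slightly more general helper can be sketched on a hand-written list of library names. The list and the function name `find_libraries` are illustrative assumptions, not WRDS output or part of the `wrds` package:

```python
def find_libraries(library_names, keyword):
    """Return the library names containing `keyword`, ignoring case, sorted."""
    needle = keyword.lower()
    return sorted(name for name in library_names if needle in name.lower())


# Hand-written sample standing in for the result of db.list_libraries();
# the real list depends on your institution's subscriptions.
sample_libraries = ["comp", "crsp", "crsp_a_stock", "ff", "CRSPQ"]

crsp_like = find_libraries(sample_libraries, "crsp")
print(crsp_like)
```

With a live connection, the same helper could be called as `find_libraries(db.list_libraries(), "crsp")`.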

In [None]:
# List all tables in the CRSP library, what do these tables mean? 
tables = db.list_tables(library='crsp')
tables


In [None]:
# Let us try to understand what the table dsf in crsp holds
columns = db.describe_table('crsp', 'dsf')
columns

# This is the schema information for the dsf table (for example, vol stands for volume); these column names are needed to write the SQL queries.

# Accessing Stock Data for Moderna, Pfizer, and Johnson & Johnson in WRDS

When working with the WRDS database, particularly with the CRSP (Center for Research in Security Prices) data, it's important to understand that stock identifiers like ticker symbols are not directly used in all tables. Instead, CRSP uses a unique identifier known as `permno` (permanent number). To access stock data for companies like Moderna, Pfizer, and Johnson & Johnson, you need to find their respective `permno` values first.

## Step 1: Find `permno` for Each Company

To find the `permno` for a specific company, you have to refer to a table within the CRSP database that maps ticker symbols to `permno`. A common table used for this purpose is `crsp.stocknames`.

### How to Check Which Table Contains What?
To determine which table contains the information you need (in this case, the mapping of ticker symbols to `permno`), you can follow these steps:

1. **List Available Tables in a Library**: 
   Use the `list_tables` function to see all tables in a specific library, like CRSP.
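   ```python
   tables = db.list_tables(library='crsp')
   print(tables)
   ```

2. **Inspect a Table's Columns**: 
   Use the `describe_table` function, for example `db.describe_table('crsp', 'stocknames')`, to confirm that the table holds the columns you need.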

In [None]:
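# After confirming the table structure of stocknames in crsp, write a SQL query to retrieve permno values for your companies of interest.
# comnam, namedt and nameenddt show which company used each ticker and when.
permno_query = """
SELECT permno, ticker, comnam, namedt, nameenddt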
FROM crsp.stocknames
WHERE ticker IN ('MRNA', 'PFE', 'JNJ')
"""
permnos = db.raw_sql(permno_query)
permnos


# Important Considerations for Accessing CRSP Data in WRDS

When working with the CRSP (Center for Research in Security Prices) database in WRDS, there are several key points to keep in mind, particularly when dealing with `permno` values and understanding stock prices.

## Understanding `permno` Values

`permno` is CRSP's permanent identifier for a security (a share issue). Keep the following in mind:

1. **`permno` Is Stable, Tickers Are Not**: 
   - A security keeps the same `permno` through name changes, ticker changes, and stock splits. Ticker symbols, by contrast, can change and can later be reused by a different company, so a lookup by ticker can return several `permno` values from different time periods.
   - When researching historical data, check the date range (`namedt` to `nameenddt`) and the company name (`comnam`) in `crsp.stocknames` to confirm which `permno` belongs to the company and period you are interested in.

2. **Multiple `permno` Values for a Company**: 
   - A single company can have more than one `permno` when it has several share classes; CRSP's company-level identifier is `permco`.
   - For instance, the ticker lookup for MRNA returns `permno` values 18312.0 and 80004.0. Check the name and date columns before assuming both belong to Moderna, since they may reflect different time periods, share classes, or an earlier holder of the ticker.
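
When a ticker lookup returns more than one `permno`, the name history can help decide which row applies on a given date. The sketch below uses made-up rows shaped like `crsp.stocknames`; it assumes the columns `comnam`, `namedt`, and `nameenddt` are present (worth confirming with `db.describe_table('crsp', 'stocknames')`), and the helper name `permnos_for_ticker` is hypothetical:

```python
import pandas as pd

# Made-up rows shaped like crsp.stocknames; the permno values, company names
# and dates below are placeholders, not real CRSP records.
names = pd.DataFrame(
    {
        "permno": [10001, 10002],
        "ticker": ["XYZ", "XYZ"],
        "comnam": ["OLD COMPANY INC", "NEW COMPANY INC"],
        "namedt": pd.to_datetime(["2005-01-03", "2018-12-07"]),
        "nameenddt": pd.to_datetime(["2016-06-30", "2023-12-29"]),
    }
)


def permnos_for_ticker(names, ticker, as_of):
    """Return the rows whose ticker was in use on the date `as_of`."""
    as_of = pd.Timestamp(as_of)
    active = (
        (names["ticker"] == ticker)
        & (names["namedt"] <= as_of)
        & (names["nameenddt"] >= as_of)
    )
    return names.loc[active, ["permno", "comnam"]]


match = permnos_for_ticker(names, "XYZ", "2020-03-16")
print(match)
```

Filtering by date in this way keeps only the security that actually carried the ticker during the period of interest.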

## Stock Prices in the `crsp.dsf` Table

The `crsp.dsf` table contains daily stock price data, and understanding how these prices are represented is crucial:

1. **Price Column (`prc`)**: 
   - The `prc` column holds the closing price when one is available. When there is no closing trade, CRSP reports the average of the bid and ask prices and marks it with a negative sign, so take the absolute value before using it.
   - Prices in `prc` are unadjusted. Adjusted prices account for corporate actions such as splits, which makes prices comparable over time.

2. **Adjustments for Splits and Dividends**: 
   - `crsp.dsf` provides the cumulative price adjustment factor `cfacpr`, and its `ret` column gives holding-period returns that include distributions. Confirm the definitions in the CRSP documentation before relying on them.
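
As a rough illustration of price adjustment, the sketch below works on made-up rows. It assumes two CRSP conventions that should be checked against the CRSP documentation: a negative `prc` marks a bid/ask average rather than a closing trade, and dividing by the cumulative factor `cfacpr` puts prices on a split-adjusted basis. The split and all numbers are hypothetical:

```python
import pandas as pd

# Made-up daily rows for one security around a hypothetical 2-for-1 split.
# A negative prc stands for a bid/ask average rather than a closing trade price.
dsf_sample = pd.DataFrame(
    {
        "date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
        "prc": [100.0, -102.0, 51.0],
        "cfacpr": [2.0, 2.0, 1.0],
    }
)

# Take the absolute value to drop the bid/ask flag, then divide by the
# cumulative adjustment factor to put all prices on a comparable basis.
dsf_sample["adj_prc"] = dsf_sample["prc"].abs() / dsf_sample["cfacpr"]
print(dsf_sample)
```

On the adjusted series the split no longer shows up as a 50% price drop.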

---

Given these considerations, when compiling a list of `permno` values for specific companies, you should include each unique `permno` associated with the tickers of interest:

- **Moderna (MRNA)**: `permno` values returned for the ticker are 18312.0 and 80004.0.
- **Pfizer (PFE)**: `permno` value is 21936.0.
- **Johnson & Johnson (JNJ)**: `permno` value is 22111.0.

By accounting for these important notes, you can ensure more accurate and relevant data retrieval from the CRSP database for your financial analyses.

---

**Note for Students**: 
* Understanding the structure of the database and the relationships between different tables is crucial for effective data retrieval in financial databases like WRDS. Always start by exploring the schema and available data in each table to ensure you are querying the right information.
* In financial data analysis, paying attention to the nuances of data representation and changes over time is critical. Always consider the context and specifics of the data you are working with, especially in dynamic databases like CRSP.


In [None]:
stock_price_query = """
SELECT date, permno, prc
FROM crsp.dsf
WHERE permno IN (18312.0, 80004.0, 21936.0, 22111.0)
AND date BETWEEN '2018-06-01' AND '2023-06-30'
"""
stock_prices = db.raw_sql(stock_price_query)
stock_prices


In [None]:
# To include the trading volume alongside each stock price, add the vol column to the SELECT list.
stock_price_query = """
SELECT date, permno, prc, vol
FROM crsp.dsf
WHERE permno IN (18312.0, 80004.0, 21936.0, 22111.0)
AND date BETWEEN '2018-06-01' AND '2023-06-30'
"""
stock_prices = db.raw_sql(stock_price_query)
stock_prices
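
Rather than editing the SQL string by hand for each request, the query text can be assembled from Python values. This is a minimal sketch; `build_dsf_query` is a hypothetical helper, not part of the `wrds` package, and it validates only the identifiers and dates, not the column names:

```python
from datetime import date


def build_dsf_query(permnos, start, end, columns=("date", "permno", "prc", "vol")):
    """Build a crsp.dsf query string for integer permnos and ISO dates."""
    # Casting to int rejects anything that is not a plain number and
    # removes the trailing ".0" that pandas shows for float identifiers.
    permno_list = ", ".join(str(int(p)) for p in permnos)
    # Round-tripping through date.fromisoformat validates the date strings.
    start_iso = date.fromisoformat(start).isoformat()
    end_iso = date.fromisoformat(end).isoformat()
    return (
        f"SELECT {', '.join(columns)}\n"
        f"FROM crsp.dsf\n"
        f"WHERE permno IN ({permno_list})\n"
        f"AND date BETWEEN '{start_iso}' AND '{end_iso}'"
    )


query = build_dsf_query([18312.0, 80004.0, 21936.0, 22111.0], "2018-06-01", "2023-06-30")
print(query)
# With an open connection this string could be passed to db.raw_sql(query).
```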


## Step 4: Data Analysis for Risk Management
* Data Manipulation: Use pandas to manipulate and prepare your data for analysis. This could involve cleaning, filtering, and restructuring the data.
* Risk Analysis: Perform risk analysis depending on your specific requirements. This might involve calculating financial ratios, assessing loan defaults, credit risk analysis, etc.
* Statistical and Econometric Analysis: Utilize Python’s statistical libraries like scipy or statsmodels for more advanced risk analysis.
* NB: See the Fama-French example on the WRDS website: [Fama-French Factors (Python - CIZ Format)](https://wrds-www.wharton.upenn.edu/pages/wrds-research/applications/python-replications/fama-french-factors-python-ciz-format/)
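
To make the risk analysis step concrete, here is a minimal sketch of two common measures, annualised volatility and one-day historical value at risk. It runs on simulated returns for two hypothetical stocks rather than CRSP data, and the 252-trading-day convention and the 95% level are assumptions you may want to change:

```python
import numpy as np
import pandas as pd

# Simulated daily returns for two hypothetical stocks (not CRSP data).
rng = np.random.default_rng(seed=0)
dates = pd.bdate_range("2022-01-03", periods=250)
returns = pd.DataFrame(
    rng.normal(loc=0.0005, scale=0.02, size=(250, 2)),
    index=dates,
    columns=["stock_a", "stock_b"],
)

# Annualised volatility: daily standard deviation scaled by sqrt(252),
# using the common convention of 252 trading days per year.
annual_vol = returns.std() * np.sqrt(252)

# One-day historical value at risk at the 95% level: the loss exceeded on
# the worst 5% of days in the sample, reported as a positive number.
var_95 = -returns.quantile(0.05)

summary = pd.DataFrame({"annual_vol": annual_vol, "var_95": var_95})
print(summary.round(4))
```

The same two lines of arithmetic could be applied to a wide table of real daily returns once it has been prepared.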


In [None]:
# Sort the DataFrame
stock_prices1 = stock_prices.sort_values(by=['permno', 'date'])
stock_prices1

In [None]:
# Format Date and Numeric Columns
stock_prices1['date'] = pd.to_datetime(stock_prices1['date']).dt.date
stock_prices1['prc'] = stock_prices1['prc'].round(2)
stock_prices1['vol'] = stock_prices1['vol'].round(0)
stock_prices1


In [None]:
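# Add Calculated Columns (if needed)
# Example: daily returns from raw prices. These ignore dividends and splits;
# the ret column in crsp.dsf is CRSP's own holding-period return.
stock_prices1['daily_return'] = stock_prices1.groupby('permno')['prc'].pct_change().round(4)
stock_prices1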
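
The reason for grouping by `permno` before calling `pct_change` is easier to see on a tiny table. The sketch below uses made-up prices for two hypothetical identifiers:

```python
import pandas as pd

# Made-up prices for two hypothetical permnos, already sorted by permno and date.
prices = pd.DataFrame(
    {
        "permno": [1, 1, 1, 2, 2, 2],
        "date": pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"] * 2),
        "prc": [100.0, 110.0, 99.0, 50.0, 50.0, 55.0],
    }
)

# pct_change within each permno: the first row of every group has no
# previous price, so its return is NaN rather than a jump between stocks.
prices["daily_return"] = prices.groupby("permno")["prc"].pct_change()
print(prices)
```

Without the grouping, the first row of the second stock would be compared with the last price of the first stock.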


In [None]:
# Map permno to stock names
permno_to_name = {
    18312.0: 'Moderna',
    80004.0: 'Moderna',
    21936.0: 'Pfizer',
    22111.0: 'Johnson & Johnson'
}

# Add a column for stock names
stock_prices1['stock_name'] = stock_prices1['permno'].map(permno_to_name)
stock_prices1.head()


In [None]:
# Pivot the DataFrame
stock_prices_pivoted = stock_prices1.pivot_table(index='date', 
                                                columns='stock_name', 
                                                values=['daily_return', 'vol'])
stock_prices_pivoted.tail()

In [None]:
# Rename Columns
stock_prices_pivoted.columns = ['_'.join(col).strip() for col in stock_prices_pivoted.columns.values]
stock_prices_pivoted.tail()

In [None]:
# Reset Index and Display
stock_prices_pivoted = stock_prices_pivoted.reset_index()

# Display the DataFrame
stock_prices_pivoted.tail(10)
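
The pivot, rename, and reset steps can be followed on a four-row table. This sketch uses made-up data for two hypothetical stocks. Note that `pivot_table` averages by default when several rows share the same date and name, which matters if two `permno` values are mapped to one stock name:

```python
import pandas as pd

# Made-up long-format data for two hypothetical stocks.
long_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-01-03", "2023-01-03", "2023-01-04", "2023-01-04"]),
        "stock_name": ["Alpha", "Beta", "Alpha", "Beta"],
        "daily_return": [0.01, -0.02, 0.03, 0.00],
        "vol": [1000.0, 2000.0, 1500.0, 2500.0],
    }
)

# pivot_table yields two-level column labels such as ("vol", "Alpha").
wide = long_df.pivot_table(index="date", columns="stock_name",
                           values=["daily_return", "vol"])

# Join the two levels into single strings, then turn the date index
# back into an ordinary column.
wide.columns = ["_".join(col) for col in wide.columns]
wide = wide.reset_index()
print(wide.columns.tolist())
```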


## Step 5: Visualization and Reporting
Data Visualization: Utilize libraries like matplotlib or seaborn for visualizing your analysis. This is particularly helpful in understanding trends and patterns in risk metrics.

In [None]:
# First, import the necessary Python libraries for plotting.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set()  # Optional for nicer plot styles


In [None]:
# Prepare the DataFrame
stock_prices_pivoted['date'] = pd.to_datetime(stock_prices_pivoted['date'])
stock_prices_pivoted.set_index('date', inplace=True)
stock_prices_pivoted.head()

In [None]:
# Plotting Daily Returns
plt.figure(figsize=(12, 6))
plt.plot(stock_prices_pivoted.index, stock_prices_pivoted['daily_return_Johnson & Johnson'], label='Johnson & Johnson')
plt.plot(stock_prices_pivoted.index, stock_prices_pivoted['daily_return_Moderna'], label='Moderna')
plt.plot(stock_prices_pivoted.index, stock_prices_pivoted['daily_return_Pfizer'], label='Pfizer')
plt.title('Daily Stock Returns')
plt.xlabel('Date')
plt.ylabel('Daily Return')
plt.legend()
plt.xticks(rotation=45)
plt.show()



In [None]:
# Plotting Volumes
plt.figure(figsize=(12, 6))
plt.plot(stock_prices_pivoted.index, stock_prices_pivoted['vol_Johnson & Johnson'], label='Johnson & Johnson')
plt.plot(stock_prices_pivoted.index, stock_prices_pivoted['vol_Moderna'], label='Moderna')
plt.plot(stock_prices_pivoted.index, stock_prices_pivoted['vol_Pfizer'], label='Pfizer')
plt.title('Trading Volume Over Time')
plt.xlabel('Date')
plt.ylabel('Volume')
plt.legend()
plt.xticks(rotation=45)
plt.show()


NB: Feel free to customize the plots further with different colors, styles, or additional annotations to enhance readability and insights.
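
One possible customization is to draw the lines in a loop instead of writing one `plt.plot` call per stock. The sketch below assumes the wide table uses a `daily_return_` column prefix, as in the cells above, and runs on simulated returns for three hypothetical stocks:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simulated wide-format returns for three hypothetical stocks (not CRSP data).
rng = np.random.default_rng(seed=1)
wide = pd.DataFrame(
    rng.normal(0.0, 0.02, size=(60, 3)),
    index=pd.bdate_range("2023-01-02", periods=60),
    columns=["daily_return_Alpha", "daily_return_Beta", "daily_return_Gamma"],
)

fig, ax = plt.subplots(figsize=(12, 6))
prefix = "daily_return_"
for column in wide.columns:
    if column.startswith(prefix):
        # Strip the prefix so the legend shows only the stock name.
        ax.plot(wide.index, wide[column], label=column[len(prefix):])
ax.set_title("Daily Stock Returns")
ax.set_xlabel("Date")
ax.set_ylabel("Daily Return")
ax.legend()
fig.autofmt_xdate(rotation=45)
```

Adding a fourth stock then only requires a new column in the table, not a new plotting line.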

## Step 6: Closing the Connection

* Always close the WRDS database connection once your analysis is complete.


For a fuller worked example, see the WRDS page on [Fama-French modelling](https://wrds-www.wharton.upenn.edu/pages/wrds-research/applications/python-replications/fama-french-factors-python/).

In [None]:
db.close()
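
If a query fails partway through a script, the `close()` call at the end may never run. A `try`/`finally` block is one way to guard against that. The sketch below uses a stand-in class, `FakeConnection`, which is purely hypothetical and exists only so the example runs without WRDS access:

```python
class FakeConnection:
    """Stand-in for a database connection so the sketch runs without WRDS access."""

    def __init__(self):
        self.closed = False

    def raw_sql(self, query):
        raise RuntimeError("simulated query failure")

    def close(self):
        self.closed = True


db_sketch = FakeConnection()
try:
    db_sketch.raw_sql("SELECT 1")
except RuntimeError as error:
    print(f"query failed: {error}")
finally:
    # Runs whether or not the query raised, so the connection is always released.
    db_sketch.close()

print(db_sketch.closed)
```

The same structure should carry over to a real connection object by replacing `FakeConnection()` with the `wrds.Connection(...)` call used earlier.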