# Chapter 1b: Getting Started with Research <font color='red'>**(OPTIONAL)** </font> 

This subchapter shows you how to use IU-specific computing resources with Python. 

Last edited: 05/16/2021


**Contents of this Notebook:**

- [Section 1. IU Research Desktop (RED)](#Section-1.-IU-Research-Desktop-(RED))
- [Section 2. Access WRDS](#Section-2.-Access-WRDS)


## Section 1. IU Research Desktop (RED)


If your laptop is short on storage or you have other technical concerns, IU's Research Desktop is a good choice. 

Either way, for future research I suggest that you create supercomputer accounts and get familiar with the IU Research Desktop. 


You can use Jupyter Notebook or Spyder on IU's Research Desktop ([RED](https://kb.iu.edu/d/apum)) via the ThinLinc Client. That is, you can work on IU's supercomputer through a virtual desktop. Note that RED runs Linux, so there is a small fixed cost of learning terminal commands. A typical RED session looks like this:

![RED](images/a151x.jpeg)
<font color='orange'>*Tip: Press F8 in ThinLinc to enter or exit full-screen mode.*</font>


#### 1. Creating Accounts
To use RED, you need an account on [Carbonate](https://kb.iu.edu/d/aolp). Because home directory disk storage is limited to 100 GB, I usually use [Slate](https://kb.iu.edu/d/aqnk) for storage, so you might want to create a Slate account as well. 

*There are other storage options on IU's research systems; see [Available access to allocated and short-term storage capacity on IU's research systems](https://kb.iu.edu/d/avkm).*

>
> Below are the account creation links:
> 
> 1. If you do not already have an account, create one by visiting [one.iu.edu](https://one.iu.edu/) and searching for "[Create More Accounts](https://one.iu.edu/launch-task/iu/account-creation)". 
> 2. Select `Carbonate` and `Slate`, and then `Create Account`.
>

<font color='red'>Note that it may take one or two workdays for your accounts to be authorized.</font><br>

To learn more, see [Get additional IU computing accounts](https://kb.iu.edu/d/achr).
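
Because the home directory is limited to 100 GB, it can be useful to check how much space a project folder takes before deciding whether to move it to Slate. Below is a minimal sketch using only the standard library. The helper name `directory_size_gb` is hypothetical, and the result is a plain sum of file sizes, which may differ somewhat from what the quota system reports.

```python
from pathlib import Path


def directory_size_gb(root):
    """Return the total size of all regular files under `root`, in gigabytes."""
    total_bytes = 0
    for path in Path(root).rglob("*"):
        # Count regular files only; skip directories and symbolic links.
        if path.is_file() and not path.is_symlink():
            total_bytes += path.stat().st_size
    return total_bytes / 1e9


# Example usage (replace the path with your own project folder):
# print(f"{directory_size_gb('my_project'):.3f} GB")
```

Walking a very large directory tree this way can be slow, so it is best pointed at a single project folder rather than the whole home directory.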

#### 2. Remote Desktop (ThinLinc)
ThinLinc is the simplest way to access RED. You need to download, install, and configure the ThinLinc Client on your own computer. 

Please follow the instructions CLOSELY in [Download, install, and configure ThinLinc Client to use Research Desktop (RED) at IU](https://kb.iu.edu/d/aput).

After configuring the ThinLinc Client correctly, you should be able to open RED with your IU account and authentication tool. Let's explore the new world!

**Some useful information on IU's supercomputers**:  
- [Use the IU Globus Web App to transfer data between your accounts on IU's research computing and storage systems](https://kb.iu.edu/d/bdqp)
- [Access Google at IU My Drive from Research Desktop (RED)](https://kb.iu.edu/d/bgsp)
- [Run jobs on Carbonate](https://kb.iu.edu/d/avjo)

#### 3. Jupyter Notebook
In the top-left menu bar of RED, you can find Jupyter Notebook via `Applications` - `Analytics` - `Jupyter Notebook`.

Jupyter will automatically open in the default web browser, Firefox, with your Carbonate home directory as the root directory, i.e., `/N/u/<IU-username>/Carbonate`.
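
To confirm where a notebook is actually running, you can print the working directory and the home directory from a cell. This is a small sketch using the standard library; on RED the output should be paths under `/N/u/`, but the exact values depend on your account.

```python
from pathlib import Path

# Directory the notebook kernel was started in (relative paths resolve from here).
working_dir = Path.cwd()

# Home directory of the current user.
home_dir = Path.home()

print("Working directory:", working_dir)
print("Home directory:   ", home_dir)
```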

Because Jupyter Notebook lets users author content in Markdown and HTML and provides an interactive data science environment that combines code and text, I will use Jupyter for this course.

In the future, you might want to write your projects in Spyder, which has an interface similar to RStudio.

#### 4. Spyder
RED does not list Spyder explicitly under `Applications`. To open it, you need to use a Terminal window.

1. Open `Terminal` on RED.
2. Type the following and press Enter:
> `module load anaconda`


3. Type the following and press Enter:
> `spyder`

Spyder might take about a minute to open, so be patient. <br>
*Note that you can also launch Jupyter Notebook by typing `jupyter notebook` in step 3.*
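
If `spyder` or `jupyter` is reported as not found, the Anaconda module was probably not loaded in that terminal session. One way to check from Python is sketched below with the standard library's `shutil.which`; the helper name `find_tools` is hypothetical.

```python
import shutil


def find_tools(names):
    """Map each command name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


for tool, location in find_tools(["spyder", "jupyter"]).items():
    print(f"{tool}: {location or 'not found (try `module load anaconda` first)'}")
```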

#### 5. Run jobs on Carbonate 

**Show "Hello World" example on RED**


1. Change your working directory in Terminal:

        cd intro_to_python_redproject/


2. Submit your job in Terminal:<br>
<font color='red'>Change the email address and working directory in the template before submitting!</font><br>

        sbatch intro_to_python_template

*You can generate the shell script with [HPC everywhere](https://hpceverywhere.iu.edu/).*

3. Check the job status in Terminal:  

        squeue -u iuusername -l

*I thank Rong Fan and Pengfei Ma for the helpful suggestions on submitting jobs on Carbonate.*
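
The contents of `intro_to_python_template` are not shown in this notebook. As a rough illustration only, the sketch below prints what a minimal Slurm batch script for a "Hello World" Python job could look like. The `#SBATCH` lines are standard Slurm options, but the job name, time limit, script name `hello.py`, and the helper `render_batch_script` are all assumptions, and cluster-specific options such as the partition or account are left out. Use the script generated by HPC everywhere or the course template for the actual settings.

```python
def render_batch_script(email, workdir, script="hello.py"):
    """Return the text of a minimal Slurm batch script (hypothetical template)."""
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=hello_python",  # assumed job name
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks-per-node=1",
        "#SBATCH --time=00:05:00",  # assumed wall-time limit
        f"#SBATCH --mail-user={email}",
        "#SBATCH --mail-type=BEGIN,END,FAIL",
        "",
        "module load anaconda",  # same module used to start Spyder
        f"cd {workdir}",
        f"python {script}",
    ]
    return "\n".join(lines) + "\n"


# Placeholders only: substitute your own email address and project directory.
print(render_batch_script("iuusername@iu.edu", "/path/to/intro_to_python_redproject"))
```

Generating the script from a function makes the two values you must change, the email address and the working directory, explicit arguments rather than lines to hunt for in a file.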

## Section 2. Access WRDS

This course uses Compustat, which contains financial data for publicly traded companies, as the sample dataset. You can download the sample data from the shared Google Drive. Alternatively, you can download it via Wharton Research Data Services (WRDS).

In general, you can download data from WRDS with a web query. For more specific needs, researchers usually access WRDS through SAS, Python, R, MATLAB, or Stata.
In this section, I will show you how to access WRDS data from Python on your computer.

Main reference:

[PYTHON: From Your Computer (Jupyter/Spyder)](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/python-from-your-computer/)

[Querying WRDS Data using Python](https://wrds-www.wharton.upenn.edu/pages/support/programming-wrds/programming-python/querying-wrds-data-python/)

### 1. Register for a WRDS account
IU subscribes to WRDS. To access the database, you need to register an individual account.

Please note: INDIVIDUAL REGISTRATION WITH AN <font color='orange'>**OFFICIAL INDIANA UNIVERSITY EMAIL ACCOUNT**</font> IS REQUIRED TO ACCESS THIS RESOURCE.

After you submit your registration, your account will be verified and authorized within a few workdays.

For more details, read [IU Libraries WRDS Datasets](https://libraries.indiana.edu/wrds-datasets).


### 2. Install WRDS module
Open a terminal window and run the following command:
`pip install wrds`

If you are using RED, run this instead:
`pip install wrds --user`
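
To confirm that the installation worked in the environment your notebook uses, you can check whether Python can locate the package. This is a minimal sketch with the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found in this Python environment."""
    return importlib.util.find_spec(package_name) is not None


print("wrds installed:", is_installed("wrds"))
```

If this prints `False` in Jupyter even though `pip` succeeded in the terminal, the notebook may be running a different Python installation; `import sys; print(sys.executable)` shows which one it uses.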

### 3. First-time use: set up the pgpass file
Open Jupyter Notebook and run the following:
```python
import wrds
db = wrds.Connection(wrds_username='yourusername')
db.create_pgpass_file()
```

**Expected output:**<br>
Enter your WRDS username [mymac]:yourusername<br>
Enter your password:········<br>
WRDS recommends setting up a .pgpass file.<br>
You can find more info here:<br>
https://www.postgresql.org/docs/9.5/static/libpq-pgpass.html.<br>
Loading library list...<br>
Done

After running `create_pgpass_file()` once, you should be able to connect from then on without entering your password again. Test this by disconnecting and reconnecting, using the following:
```python
db.close()
db = wrds.Connection(wrds_username='yourusername')
```
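
The `.pgpass` file is a plain-text file in PostgreSQL's standard `hostname:port:database:username:password` format. To see which entries it contains without displaying the password, something like the following could be used. It assumes the file is at `~/.pgpass` (the usual location on Linux and macOS; Windows uses a different path) and, for simplicity, ignores escaped colons. `list_pgpass_entries` is a hypothetical helper.

```python
from pathlib import Path


def list_pgpass_entries(path=None):
    """Return (host, port, database, user) tuples from a pgpass file."""
    pgpass = Path(path) if path is not None else Path.home() / ".pgpass"
    if not pgpass.exists():
        return []
    entries = []
    for line in pgpass.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":", 4)  # the fifth field is the password
        if len(fields) == 5:
            entries.append(tuple(fields[:4]))  # drop the password
    return entries


for host, port, database, user in list_pgpass_entries():
    print(f"{user}@{host}:{port}/{database}")
```

If nothing is printed after running `create_pgpass_file()`, the file was probably written somewhere other than the assumed location.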


### 4. Querying WRDS Data using Python
After connecting to WRDS, we can query data with `get_table` or with PostgreSQL commands.

The connection is stored in the variable `db` (database).

```python
# Import the wrds module and open a connection before pulling any data
import wrds

db = wrds.Connection(wrds_username="yourusername")

# Way 1 of 2: pull a table directly with get_table
db.get_table("djones", "djdaily", columns=["date", "dji"], obs=10)
```

```python
# Way 2 of 2: send a SQL query with raw_sql
db.raw_sql("select date,dji from djones.djdaily LIMIT 10;", date_cols=["date"])
```

```python
db.raw_sql("select * from comp.fundq LIMIT 10", date_cols=["datadate"])
```
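
Before narrowing a wide table such as `comp.fundq`, it can help to look at its column list. The `wrds` package provides `describe_table` for this. The sketch below wraps it in a hypothetical helper, `column_names`, and assumes that the returned table has a `name` column, which may vary by package version.

```python
def column_names(db, library, table):
    """Return the column names of `library.table` using an open WRDS connection."""
    # Assumption: describe_table returns a table with a "name" column.
    info = db.describe_table(library=library, table=table)
    return list(info["name"])


# Usage with the connection opened earlier (requires WRDS access):
# print(column_names(db, "comp", "fundq"))
```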

`comp.fundq` has too many variables to read comfortably. We can select only the columns of interest and add a `where` clause that restricts the companies and the time period. 

```python
companies = " (tic='AAPL' or tic='IBM' or tic='FB') "
time_frames = " (datadate>'1999-12-31' and datadate<'2010-01-01') "

# time_frames = " (datacqtr>'1999Q4' and datacqtr<'2010Q1') "
db.raw_sql(
    "select gvkey,datadate,datacqtr,datafqtr,fyearq,tic,conm,atq,capxy,ppegtq,ppentq,saleq from comp.fundq where ("
    + companies
    + " and "
    + time_frames
    + ")",
    date_cols=["datadate"],
)
```
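
When the list of companies or the date range changes often, building the query string from Python lists is less error-prone than editing the string by hand. The sketch below is one possible way to do that, not part of the `wrds` package: `build_fundq_query` and `COLUMNS` are hypothetical names, and the ticker check is a simple guard so that only plain symbols are embedded in the SQL text.

```python
import re
from datetime import date

COLUMNS = ["gvkey", "datadate", "datacqtr", "datafqtr", "fyearq", "tic",
           "conm", "atq", "capxy", "ppegtq", "ppentq", "saleq"]


def build_fundq_query(tickers, after, before, columns=COLUMNS):
    """Build a comp.fundq query for `tickers` with after < datadate < before."""
    if not tickers:
        raise ValueError("at least one ticker is required")
    for tic in tickers:
        # Accept only simple ticker symbols so the string is safe to embed.
        if not re.fullmatch(r"[A-Z0-9.]+", tic):
            raise ValueError(f"unexpected ticker: {tic!r}")
    # fromisoformat raises ValueError if a string is not a valid ISO date.
    after, before = date.fromisoformat(after), date.fromisoformat(before)
    ticker_list = ", ".join(f"'{tic}'" for tic in tickers)
    return (
        f"select {','.join(columns)} from comp.fundq "
        f"where tic in ({ticker_list}) "
        f"and datadate > '{after}' and datadate < '{before}'"
    )


sql = build_fundq_query(["AAPL", "IBM", "FB"], "1999-12-31", "2010-01-01")
print(sql)
# With an open connection: db.raw_sql(sql, date_cols=["datadate"])
```

The resulting string selects the same columns, companies, and dates as the hand-written query, using `tic in (...)` in place of the chained `or` conditions.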