<a href="https://colab.research.google.com/github/marketpsych/marketpsych/blob/main/notebooks/load_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Loading CSV files directly from SFTP

This notebook shows how to load MarketPsych data directly into a Jupyter Notebook using your SFTP credentials. It is intended for simple analyses and visualizations only. For larger-scale testing, use the flat files, as instructed by MRNSupport.

---
## 1. Settings
This notebook uses some widgets (to ease navigation) and MarketPsych's library. To install the libraries and enable the widgets, please run the following cell.

<font color='blue'>**HOW TO**</font>    
<font color='blue'>1. Run the following cell.</font>  

In [1]:
import sys
# Installs marketpsych library into your environment
!{sys.executable} -m pip install marketpsych ipywidgets --upgrade --quiet
# Allows using the widgets
!{sys.executable} -m jupyter nbextension enable --py widgetsnbextension

## Libraries
from marketpsych import sftp, mpwidgets
from IPython.core.magic import register_cell_magic
from IPython.display import HTML, display
import pandas as pd

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: OK


---
## 2. Selecting your login credentials

Please input your credentials, i.e., the .ppk key file provided by MRNSupport.
 
<font color='blue'>**HOW TO**</font>  
<font color='blue'>1. Run the following cell.</font>  
<font color='blue'>2. Click on `Key File`.</font>   
<font color='blue'>3. Select your .ppk key (its file name must be in the 1234567.ppk format).</font>  

In [2]:
cwdgts = mpwidgets.LoginWidgets()
cwdgts.display()

HBox(children=(FileUpload(value={}, description='Key File:'), HTML(value='', description='User ID:', placehold…
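
If the upload is rejected, a quick check of the key file name can help. The snippet below is a minimal sketch, assuming that the "1234567.ppk format" means a numeric user ID followed by `.ppk`. The helper `looks_like_key_file` is hypothetical and not part of the `marketpsych` library.

```python
import re

# Assumption: a valid key file name is a run of digits followed by ".ppk",
# as in the "1234567.ppk" example. The exact rule is not documented here.
KEY_NAME_PATTERN = re.compile(r"\d+\.ppk")


def looks_like_key_file(filename: str) -> bool:
    """Return True if the file name matches the assumed <digits>.ppk pattern."""
    return KEY_NAME_PATTERN.fullmatch(filename) is not None


print(looks_like_key_file("1234567.ppk"))
print(looks_like_key_file("my_key.ppk"))
```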

---

<a id='the_destination'></a>
## 3. Loading the data

Now you can download the files directly into a pandas dataframe. The options are set through the 5 widgets below, and each option is described in the following list:

 - **Trial**: Select the checkbox if you are trialing the data. In some special cases, even if you are trialing, you may need to uncheck it (you can try both options in case of Permission errors).

 - **Asset class**:  
 
|Asset class   | Description|
|:-------------|:------------|
|`CMPNY`       | Individual companies|
|`CMPNY_AMER`  | Individual companies domiciled in America|
|`CMPNY_APAC`  | Individual companies domiciled in APAC|
|`CMPNY_EMEA`  | Individual companies domiciled in EMEA|
|`CMPNY_ESG`   | Individual companies (ESG package)|
|`CMPNY_GRP`   | Company groups and ETFs|
|`COM_AGR`     | Agricultural commodities|  
|`COM_ENM`     | Energy and Metals|
|`COU`         | Countries|
|`COU_ESG`     | Countries (ESG package)|
|`COU_MKT`     | Stock indices, sovereign bonds, real estate|
|`CRYPTO`      | Cryptocurrencies|
|`CUR`         | Currencies|  

 - **Frequencies**:  
 
|Frequency  | Description| Use case |
|:----------|:-----------|:---------|
|`W365_UDAI`| Yearly lookback window and daily updates| ESG Core only |
|`WDAI_UDAI`| Daily lookback window and daily updates| Daily data stamped at 15:30 ET|
|`WDAI_UHOU`| Daily lookback window and hourly updates| Daily data stamped hourly (in case you want daily data adjusted to your time-zone) |
|`W01M_U01M`| Minutely lookback window and minutely updates| Low-latency data (**WARNING:** extremely large datasets)|

- **Start Date**: Select the start of the period for which you would like to download the data.
- **End Date**: Select the end of the period for which you would like to download the data.

-  **Data Type**:  

|Data type  | Description|
|:----------|:-----------|
|`News`| For news sources only (headlines and corpus)|
|`News_Headline`| For the headlines of news sources only| 
|`Social`| For social media sources| 
|`News_Social`| For the combined content|
 
- **Assets**: Asset codes as provided in the user guide. For companies, use the PermID. Use this option when you only need data for one or a few assets.
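
The "15:30 ET" stamp mentioned for `WDAI_UDAI` can be checked with pandas. The sketch below assumes the data carries UTC timestamps such as `2020-12-01 20:30:00+00:00`, the form shown in the sample output later in this notebook. The `Asia/Tokyo` zone is only an illustrative choice.

```python
import pandas as pd

# A UTC timestamp in the form used by the windowTimestamp column.
window_ts = pd.Timestamp("2020-12-01 20:30:00+00:00")

# Convert to US Eastern time to check the 15:30 ET convention.
eastern = window_ts.tz_convert("America/New_York")
print(eastern)

# Convert to another time zone ("Asia/Tokyo" is an arbitrary example).
tokyo = window_ts.tz_convert("Asia/Tokyo")
print(tokyo)
```

The same conversion can be applied to a whole column with `df["windowTimestamp"].dt.tz_convert(...)`, provided the column is timezone-aware.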

**WARNING**   
- I. Loading large files, such as CMPNY data over a long period, can take a long time and exhaust the available memory. Start by loading short periods (e.g., one month of data). Alternatively, you can select a subset of assets, as described above.

- II. Check your asset class permissions. If you try to download data that you have not been given access to, you will get a Permission error:
```python
"PermissionError": [Errno 13] Access denied
```   
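
To get a feel for the memory warning, you can measure how much memory a dataframe occupies and how much a subset of assets saves. This is a minimal sketch on synthetic data: the column names follow the sample output in this notebook, but the values are randomly generated and are not MarketPsych data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a loaded dataframe (values are random, for illustration only).
rng = np.random.default_rng(0)
n_rows = 10_000
df = pd.DataFrame({
    "assetCode": rng.choice(["ALU", "ANGS", "USCRU", "ZNC"], size=n_rows),
    "dataType": rng.choice(["News", "Social"], size=n_rows),
    "sentiment": rng.normal(size=n_rows),
})

# deep=True also counts the strings stored in text columns.
total_mb = df.memory_usage(deep=True).sum() / 1e6
print(f"{len(df)} rows use about {total_mb:.2f} MB")

# Keeping only a few assets reduces the footprint.
subset = df[df["assetCode"].isin(["ALU", "ZNC"])]
subset_mb = subset.memory_usage(deep=True).sum() / 1e6
print(f"{len(subset)} rows use about {subset_mb:.2f} MB")
```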

<font color='blue'>**HOW TO**</font>    
<font color='blue'>1. Run the cell below (after running it, you should see several widgets).</font>  
<font color='blue'>2. Make your selections according to the explanations above.</font>  
<font color='blue'>3. Click on the `Load Selection` button.</font>  

In [3]:
lwdgts = mpwidgets.LoaderWidgets(cwdgts.client)
lwdgts.display()

Checkbox(value=True, description='Trial')

HBox(children=(Dropdown(description='Asset class:', index=7, options=('CMPNY', 'CMPNY_AMER', 'CMPNY_APAC', 'CM…

HBox(children=(DatePicker(value=datetime.datetime(2020, 12, 1, 0, 0), description='Start date:'), DatePicker(v…

HBox(children=(SelectMultiple(description='Source:', index=(0, 1, 2, 3), options=('News_Social', 'News', 'News…

Button(description='Load Selection', style=ButtonStyle(), tooltip='Load data according to selections')

Output()

Loading...


Unnamed: 0,id,assetCode,windowTimestamp,dataType,systemVersion,mentions,buzz,sentiment,negative,positive,...,overvaluedVsUndervalued,volatility,consumptionVolume,productionVolume,regulatoryIssues,supplyVsDemand,supplyVsDemandForecast,newExploration,safetyAccident,futureVsPast
0,mp:2020-12-01_20.30.00.News.COM_ENM.ALU,ALU,2020-12-01 20:30:00+00:00,News,MP:4.0.0,261,1036.6,0.101775,0.188115,0.289890,...,,0.045823,,-0.001929,,-0.013506,-0.005788,0.000965,,-0.181266
1,mp:2020-12-01_20.30.00.News_Social.COM_ENM.ALU,ALU,2020-12-01 20:30:00+00:00,News_Social,MP:4.0.0,337,1253.7,0.099705,0.180266,0.279971,...,,0.043471,,0.000000,,-0.003191,-0.003191,0.000798,,-0.158650
2,mp:2020-12-01_20.30.00.News_Headline.COM_ENM.ALU,ALU,2020-12-01 20:30:00+00:00,News_Headline,MP:4.0.0,15,22.0,0.272727,0.181818,0.454545,...,,,,,,,,,,0.363636
3,mp:2020-12-01_20.30.00.Social.COM_ENM.ALU,ALU,2020-12-01 20:30:00+00:00,Social,MP:4.0.0,76,217.1,0.089820,0.142791,0.232612,...,,0.032243,,0.009212,,0.046062,0.009212,,,-0.050668
4,mp:2020-12-01_20.30.00.News.COM_ENM.ANGS,ANGS,2020-12-01 20:30:00+00:00,News,MP:4.0.0,19,110.0,0.254545,0.045455,0.300000,...,,,0.018182,,,-0.009091,,,,-0.327273
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4315,mp:2020-12-30_20.30.00.Social.COM_ENM.USCRU,USCRU,2020-12-30 20:30:00+00:00,Social,MP:4.0.0,1931,7061.5,0.075621,0.179424,0.255045,...,0.000283,0.009842,,0.001133,,-0.016569,-0.004532,0.001416,,-0.040232
4316,mp:2020-12-30_20.30.00.News.COM_ENM.ZNC,ZNC,2020-12-30 20:30:00+00:00,News,MP:4.0.0,186,772.1,0.141821,0.175495,0.317316,...,,0.006476,,0.018132,,0.040798,0.011009,0.003886,0.001295,0.154902
4317,mp:2020-12-30_20.30.00.News_Social.COM_ENM.ZNC,ZNC,2020-12-30 20:30:00+00:00,News_Social,MP:4.0.0,287,927.6,0.133139,0.181113,0.314252,...,,0.007007,,0.016171,,0.035037,0.009163,0.003234,0.001078,0.133786
4318,mp:2020-12-30_20.30.00.News_Headline.COM_ENM.ZNC,ZNC,2020-12-30 20:30:00+00:00,News_Headline,MP:4.0.0,8,17.0,0.352941,,0.352941,...,,,,0.470588,,0.470588,0.058824,,,0.058824


Done


If you can see a dataframe above, congratulations! You have downloaded some data into your notebook. From here, you can have fun exploring it. Below, you'll find a plotting tool for a first look at the data.
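
As a starting point for exploring the data, the sketch below filters and reshapes a small dataframe with plain pandas. The rows are copied from the sample output above (a few columns only) so that the block runs on its own. In the notebook you would presumably work on `lwdgts.df` instead.

```python
import pandas as pd

# Rows copied from the sample output above (subset of columns).
df = pd.DataFrame({
    "assetCode": ["ALU", "ALU", "ALU", "ALU", "ANGS"],
    "windowTimestamp": pd.to_datetime(["2020-12-01 20:30:00+00:00"] * 5),
    "dataType": ["News", "News_Social", "News_Headline", "Social", "News"],
    "buzz": [1036.6, 1253.7, 22.0, 217.1, 110.0],
    "sentiment": [0.101775, 0.099705, 0.272727, 0.089820, 0.254545],
})

# Select one asset and one data type.
alu_news = df[(df["assetCode"] == "ALU") & (df["dataType"] == "News")]
print(alu_news)

# One column per data type, one row per asset; missing combinations become NaN.
sentiment_by_type = df.pivot_table(
    index="assetCode", columns="dataType", values="sentiment"
)
print(sentiment_by_type)
```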

---
## 4. Visualizing the data

You can use the widgets below to do some basic exploration. Each widget is described in the following list.

- **Source**: The source is selected in the loading step above. If you selected only one source but would like to visualise a different one, select it in <a href='#the_destination'>Loading the data</a> and then return here.

- **Analytics**: The RMA to plot. The available analytics depend on the asset class. Several types of indicators are provided:
    - Emotional indicators such as Anger, Fear and Joy
    - 'Economic' metrics including Earnings Forecast, Interest Rate Forecast, Long vs. Short 
    - ESG measures including CarbonEmissionsControversy, ManagementTrust, and WorkplaceSafety
    - etc. 

- **Asset**: The asset of choice. To see all options, clear the cell. For a description of each asset, please search for the asset code in the User Guide or Eikon app.

- **Roll. window**: The window length of the smoothing function (a simple moving average).

- **Min. period**: Minimum number of observations in window required to have a value (otherwise result is NA).

**WARNING**                
If your plot is empty, it is likely that there is no data for that combination of variables.
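
The interplay of **Roll. window** and **Min. period** can be illustrated with pandas' own rolling mean. This is a sketch on synthetic values, assuming the widget behaves like `Series.rolling(window, min_periods).mean()`. The source does not confirm which implementation the widget uses.

```python
import pandas as pd

# Synthetic daily values with one missing observation (not MarketPsych data).
values = pd.Series(
    [1.0, 2.0, None, 4.0, 5.0],
    index=pd.date_range("2020-12-01", periods=5, freq="D"),
)

# window: number of observations in each window.
# min_periods: minimum number of non-missing observations needed for a result.
smoothed = values.rolling(window=3, min_periods=2).mean()
print(smoothed)
```

The first value is NA because only one observation is available, while later windows tolerate the gap as long as at least two observations are present.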

<font color='blue'>**HOW TO**</font>    
<font color='blue'>1. Run the cell below (after running it, you should see several widgets).</font>  
<font color='blue'>2. Make your selections according to the explanations above.</font>  

In [4]:
swdgts = mpwidgets.SlicerWidgets(lwdgts.df)
swdgts.display()

HBox(children=(Dropdown(description='Data Type:', index=2, options=('News', 'News_Headline', 'News_Social', 'S…

HBox(children=(Dropdown(description='Weighted by:', options=(False, 'buzz'), value=False), BoundedIntText(valu…

Tab(children=(Output(), Output()), _titles={'0': 'RMA plot', '1': 'RMA data'})

---
## 5. Downloading the data

If you are using Colab, you can also download your selected dataframe:

<font color='blue'>**HOW TO**</font>    
<font color='blue'>1. Run the cell below (after running it, you should see several widgets).</font>  
<font color='blue'>2. Change the file name and file extension.</font>  
<font color='blue'>3. Click on `Download`.</font>  

In [5]:
dwdgts = mpwidgets.DownloaderWidgets(lwdgts.df)
dwdgts.display()

VBox(children=(Text(value='marketpsych_file', description='File name:', placeholder='Type name of file to save…
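
Outside Colab, a dataframe can also be written to disk with pandas directly. The sketch below is one possible approach, not the widget's own method. It uses a small dataframe built from rows of the sample output and a temporary directory as an illustrative destination. The file name mirrors the widget's default, `marketpsych_file`.

```python
import os
import tempfile

import pandas as pd

# Small dataframe built from rows of the sample output (subset of columns).
df = pd.DataFrame({
    "assetCode": ["ALU", "ANGS"],
    "dataType": ["News", "News"],
    "sentiment": [0.101775, 0.254545],
})

# Illustrative destination: a fresh temporary directory.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "marketpsych_file.csv")

# index=False avoids writing the row index as an extra unnamed column.
df.to_csv(out_path, index=False)

# Read the file back to confirm that it round-trips.
restored = pd.read_csv(out_path)
print(restored)
```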