<a href="https://colab.research.google.com/github/marketpsych/marketpsych/blob/main/notebooks/load_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Loading CSV files directly from SFTP

This notebook shows how to load MarketPsych's data directly into a Jupyter Notebook using your SFTP credentials. Note, however, that this notebook is an **alpha version**. For more robust testing, we recommend downloading the trial flat files, as instructed by MRNSupport, and then loading them into your own environment.

---
## Settings
For this example to work, you'll need to install MarketPsych's library. This notebook also uses some widgets to make navigation easier. To install the libraries and enable the widgets, please run the following cell.

In [1]:
import sys
# Installs marketpsych library into your environment
!{sys.executable} -m pip install marketpsych --upgrade --quiet
# Installs ipywidget library into your environment
!{sys.executable} -m pip install ipywidgets --upgrade --quiet

# Libraries
from marketpsych import sftp
from marketpsych import mpwidgets

# Enables the widgets in the classic Jupyter Notebook
!{sys.executable} -m jupyter nbextension enable --py widgetsnbextension

# display() is used later to render the downloaded DataFrame
from IPython.display import display

%load_ext autoreload
%autoreload 2


Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: ok
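
If pip reports a permissions error during installation (it then suggests the `--user` option), the imports may fail or keep using an older version. A minimal sketch for checking which packages the current kernel can actually import, using only the standard library; `missing_modules` is a hypothetical helper, not part of the marketpsych package:

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found by this kernel."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Both packages installed by the cell above should be importable.
print(missing_modules(["marketpsych", "ipywidgets"]))
```

An empty list means both packages were found; `importlib.metadata.version("marketpsych")` then shows which version is installed.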


---
## Selecting your login credentials

Please provide your credentials, i.e., upload your `Key File` and enter your `User ID`, as provided by MRNSupport.

<font color='blue'>**IMPORTANT**</font>       
<font color='blue'>1. Run the following cell. After running it, you should see two widgets.</font>  
<font color='blue'>2. Click the `Key File` uploader and select your key file.</font>        
<font color='blue'>3. Check that your `User ID` is correct; if not, change it manually.</font>  
<font color='blue'>4. Move on to the next cell.</font>  

In [2]:
cwdgts = mpwidgets.LoginWidgets()
cwdgts.display()


**WARNING**                         
If you wait too long before running the following cell, you may get the following error:
```python
ValueError: I/O operation on closed file.
```

If so, please re-run the previous cell and upload your key. Otherwise, continue.

In [3]:
# Creates client
client = sftp.connect(user=cwdgts.id_widget.value, key=cwdgts.key_widget.content)

AttributeError: 'FileUpload' object has no attribute 'content'
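
The `AttributeError` above indicates that the `Key File` uploader (an ipywidgets `FileUpload`) has no `.content` attribute. The uploaded bytes are stored in its `.value` instead, and the layout of `.value` differs between ipywidgets 7.x (a dict keyed by file name) and 8.x (a tuple of dicts). A minimal sketch for extracting the key under either layout; `uploaded_file_bytes` is a hypothetical helper:

```python
def uploaded_file_bytes(upload_value):
    """Return the bytes of the first file stored in a FileUpload widget's `.value`.

    ipywidgets 7.x: {filename: {"metadata": {...}, "content": bytes}}
    ipywidgets 8.x: ({"name": ..., "content": memoryview, ...}, ...)
    """
    if not upload_value:
        raise ValueError("No file uploaded yet: re-run the login cell and upload your key.")
    if isinstance(upload_value, dict):  # ipywidgets 7.x layout
        first = next(iter(upload_value.values()))
    else:  # ipywidgets 8.x layout
        first = upload_value[0]
    # bytes() accepts both bytes and memoryview content.
    return bytes(first["content"])
```

Usage: `key_bytes = uploaded_file_bytes(cwdgts.key_widget.value)`. What `sftp.connect` expects for `key` (raw bytes, a file-like object such as `io.BytesIO(key_bytes)`, or something else) is not shown in this notebook, so check the marketpsych documentation before passing it in.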

---
## Loading the data

Finally, you can download the files directly into a pandas DataFrame. The download options are set through the widgets below:

 - Select the checkbox if you are trialing the data. In some special cases you may need to uncheck it even while trialing; if you get a `PermissionError`, try both options.

 - The options for **Asset class** are:  

|Asset class   | Description|
|:-------------|:------------|
|`CMPNY`       | Individual companies|
|`CMPNY_AMER`  | Individual companies domiciled in America|
|`CMPNY_APAC`  | Individual companies domiciled in APAC|
|`CMPNY_EMEA`  | Individual companies domiciled in EMEA|
|`CMPNY_ESG`   | Individual companies (ESG package)|
|`CMPNY_GRP`   | Company groups and ETFs|
|`COM_AGR`     | Agricultural commodities|  
|`COM_ENM`     | Energy and Metals|
|`COU`         | Countries|
|`COU_ESG`     | Countries (ESG package)|
|`COU_MKT`     | Stock indices, sovereign bonds, real estate|
|`CRYPTO`      | Cryptocurrencies|
|`CUR`         | Currencies|  

 - The options for **frequencies** are:  
  
|Frequency  | Description| Use case |
|:----------|:-----------|:---------|
|`W365_UDAI`| Yearly lookback window and daily updates| ESG Core only |
|`WDAI_UDAI`| Daily lookback window and daily updates| Daily data stamped 30 minutes before the NYSE close|
|`WDAI_UHOU`| Daily lookback window and hourly updates| Daily data stamped hourly (in case you want daily data adjusted to your time-zone) |
|`W01M_U01M`| Minutely lookback window and minutely updates| Low-latency data (**WARNING:** extremely large datasets)|
 
- Regarding **dates**, simply select the start and end dates of interest. Note that dates older than 2 months (relative to the current date) are packaged in monthly batches, so only the selected month matters, not the specific day. For example, if you select `2020-12-25` as the start or end date, the full month of `2020-12` will be loaded anyway.
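
The month-widening rule for dates, together with the frequency table, gives a quick way to estimate how much a selection will load before downloading it. A sketch under two assumptions: the selected dates fall in the monthly-batched range (older than about 2 months), and the files contain at most one record per asset, data type, and update interval. `UPDATE_INTERVAL`, `monthly_batch_bounds`, and `max_rows` are hypothetical names, not part of the marketpsych package:

```python
import calendar
import datetime as dt

import pandas as pd

# Update interval for each frequency code in the table above (the "U..." part).
UPDATE_INTERVAL = {
    "W365_UDAI": pd.Timedelta(days=1),
    "WDAI_UDAI": pd.Timedelta(days=1),
    "WDAI_UHOU": pd.Timedelta(hours=1),
    "W01M_U01M": pd.Timedelta(minutes=1),
}


def monthly_batch_bounds(start, end):
    """Widen [start, end] to whole calendar months, as happens for monthly-batched dates."""
    last_day = calendar.monthrange(end.year, end.month)[1]
    return start.replace(day=1), end.replace(day=last_day)


def max_rows(frequency, start, end, n_assets, n_data_types=1):
    """Upper bound on rows loaded: one record per asset, data type and update."""
    first, last = monthly_batch_bounds(start, end)
    n_days = (last - first).days + 1
    updates_per_day = pd.Timedelta(days=1) // UPDATE_INTERVAL[frequency]
    return updates_per_day * n_days * n_assets * n_data_types


print(monthly_batch_bounds(dt.date(2020, 12, 25), dt.date(2020, 12, 25)))
# → (datetime.date(2020, 12, 1), datetime.date(2020, 12, 31))
```

For example, 500 companies over December 2020 at `W01M_U01M` gives an upper bound of 1,440 × 31 × 500 = 22,320,000 rows per data type, which is why the minutely data is flagged as extremely large.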

**WARNINGS**   
Loading large files, such as CMPNY data over a long time window, can take quite a while and exhaust your memory. Start by loading a short period (e.g., one month of data), then move to progressively longer periods.

Check your asset class permissions. If you try to download data you have not been given access to, you will get a `PermissionError`:
```python
PermissionError: [Errno 13] Access denied
```
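
Following the advice on the trial checkbox, a minimal sketch that retries a download once with the opposite `trial` setting when a `PermissionError` is raised. `download_with_trial_fallback` is a hypothetical helper; it only assumes that the download function accepts a `trial` keyword, as `client.download` does in the loading cell:

```python
def download_with_trial_fallback(download, trial=True, **kwargs):
    """Call download(trial=..., **kwargs); on PermissionError, retry once with the opposite flag."""
    try:
        return download(trial=trial, **kwargs)
    except PermissionError:
        # Some accounts need the other setting of the trial checkbox.
        # If this second call also fails, the PermissionError propagates:
        # most likely the asset class is simply not in your permissions.
        return download(trial=not trial, **kwargs)
```

Usage: `download_with_trial_fallback(client.download, trial=lwdgts.trial_check_widget.value, asset_class=..., frequency=..., start=..., end=...)`, with the same remaining keyword arguments as the loading cell.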

<font color='blue'>Run the following cell ONCE. You can then select the parameters; once you have selected them, run the next cell.</font> 

In [None]:
lwdgts = mpwidgets.LoaderWidgets()
lwdgts.display()

In [None]:
%%time

df = client.download(
    asset_class=sftp.AssetClass[lwdgts.asset_class_widget.value],
    frequency=sftp.Frequency[lwdgts.frequency_widget.value],
    start=lwdgts.start_date_widget.value,
    end=lwdgts.end_date_widget.value,
    trial=lwdgts.trial_check_widget.value,
    assets=tuple(lwdgts.assets_widget.value.split()),
    sources=tuple(lwdgts.data_type_widget.value)
)

display(df)
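
To follow the advice of starting small, the download cell above can also be run one calendar month at a time. A sketch, assuming the date widgets return `datetime.date` objects; `month_chunks` and `download_by_month` are hypothetical helpers, and `download` stands for any function taking `start`/`end` keywords, such as `client.download`. Splitting at month boundaries lines up with the monthly batches used for older dates and lets you watch progress or interrupt the loop, although the concatenated result must still fit in memory:

```python
import datetime as dt

import pandas as pd


def month_chunks(start, end):
    """Split [start, end] into (chunk_start, chunk_end) pairs, one per calendar month."""
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # First day of the month after chunk_start.
        if chunk_start.month == 12:
            next_month = dt.date(chunk_start.year + 1, 1, 1)
        else:
            next_month = dt.date(chunk_start.year, chunk_start.month + 1, 1)
        chunks.append((chunk_start, min(end, next_month - dt.timedelta(days=1))))
        chunk_start = next_month
    return chunks


def download_by_month(download, start, end, **kwargs):
    """Call download(start=..., end=..., **kwargs) once per month and concatenate the results."""
    frames = []
    for chunk_start, chunk_end in month_chunks(start, end):
        print(f"Loading {chunk_start} to {chunk_end} ...")
        frames.append(download(start=chunk_start, end=chunk_end, **kwargs))
    return pd.concat(frames)
```

Usage: `df = download_by_month(client.download, start=lwdgts.start_date_widget.value, end=lwdgts.end_date_widget.value, asset_class=..., frequency=..., trial=..., assets=..., sources=...)`, passing the remaining arguments exactly as in the cell above.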

If you can see a DataFrame above, congratulations! You have downloaded data into your notebook. From here, you can explore it freely. Below, you'll find a plotting tool for a first look at the data.
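
Given the memory warning above, it can help to check how much you actually loaded before exploring further. A small sketch using only standard pandas calls; `describe_download` is a hypothetical helper name:

```python
import pandas as pd


def describe_download(df: pd.DataFrame) -> dict:
    """Quick size check for a freshly loaded DataFrame."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        # deep=True also counts the memory held by string (object) columns.
        "memory_mb": df.memory_usage(deep=True).sum() / 1024**2,
    }
```

Usage: `describe_download(df)` right after the download cell; `df.head()` then shows the first rows.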

---
## Visualizing the data

You can use the widgets below for some very basic exploration. Each widget is described here:

- The **Data Type** field represents the type of content source(s) on which the RMAs are based. There are four possible values:
    - `News` for news sources (headlines and corpus)
    - `News_Headline` for news headlines only
    - `Social` for social media sources
    - `News_Social` for the combined content               


- The **Analytics** field selects the RMA. The available values depend on the asset class. Several types of indicators are provided:
    - Emotional indicators such as Anger, Fear and Joy
    - 'Economic' metrics including Earnings Forecast, Interest Rate Forecast, Long vs. Short 
    - ESG measures including CarbonEmissionsControversy, ManagementTrust, and WorkplaceSafety
    - etc. 


- The **Asset** field selects the asset of choice. To see all options, clear the field. For a description of each asset, search for the asset code in the User Guide or the Eikon app.


- The **Roll. window** field sets the window length of the smoothing function (a simple moving average).
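
The plotting widgets described above filter the data by Data Type and Asset, pick one Analytics column, and smooth it with a simple moving average. A sketch of the same steps in plain pandas, assuming the DataFrame has `dataType`, `assetCode`, and `windowTimestamp` columns and one column per RMA; these column names are assumptions (check `df.columns`), and `smoothed_series` is a hypothetical helper, not the `SlicerWidgets` implementation:

```python
import pandas as pd


def smoothed_series(df, data_type, asset, analytic, window,
                    type_col="dataType", asset_col="assetCode", time_col="windowTimestamp"):
    """Filter df to one data type and asset, then smooth `analytic` with a simple moving average.

    The default column names are assumptions about the file layout; override them
    if your DataFrame uses different names.
    """
    mask = (df[type_col] == data_type) & (df[asset_col] == asset)
    series = (
        df.loc[mask, [time_col, analytic]]
        .assign(**{time_col: lambda d: pd.to_datetime(d[time_col])})
        .set_index(time_col)
        .sort_index()[analytic]
    )
    # Mean of the last `window` observations; the first window - 1 values are NaN.
    return series.rolling(window=window).mean()
```

Usage: `smoothed_series(df, "News_Social", "<asset code>", "<RMA column>", window=7).plot()`, where `.plot()` requires matplotlib.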

The indicators are updated every minute for companies, sectors, regions, countries, commodities, indices, bonds, currencies, and cryptocurrencies. They can be fed directly into spreadsheets or charts monitored by traders, risk managers, or analysts, or plugged straight into your algorithms for low-frequency or longer-term asset allocation and sector rotation decisions.

**WARNING**                
If your plot is empty, there is probably no data for the selected combination of Data Type, Analytics, and Asset.

<font color='blue'>Select options for plotting the data after running the cell:</font> 

In [4]:
swdgts = mpwidgets.SlicerWidgets(df)
swdgts.display()


In [5]:
import numpy as np
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Placeholder DataFrame for testing the download widgets below.
# WARNING: this overwrites any `df` downloaded above.
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

In [92]:
class NameItWidgets:
    def __init__(self, df):
        """
        Text widget for choosing the name of the file to save.
        """
        self.file_name = widgets.Text(
            value='MarketPsychData',
            placeholder='Type name of file to save',
            description='File name:',
            disabled=False
        )

    def display(self):
        """
        Display widgets.
        """
        widgets_ = widgets.HBox([self.file_name])
        display(widgets_)

Single button for downloading. It still needs to be combined with the widgets above and with a function that performs the download.

In [97]:
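class ButtonWidgets:

    def __init__(self, df):
        """
        Button that will trigger the download (the download itself is not implemented yet).
        """
        self.button = widgets.Button(
            description='Click to Download',
            disabled=False,
            button_style='',  # 'success', 'info', 'warning', 'danger' or ''
            tooltip='Download File to PC',
            icon=''  # (FontAwesome names without the `fa-` prefix)
        )
        # Output area where the click handler writes its messages
        self.output = widgets.Output()
        # Run self.on_click every time the button is pressed
        self.button.on_click(self.on_click)

    def on_click(self, button):
        """
        Placeholder click handler: only reports that the button was clicked.
        """
        with self.output:
            print('button clicked')

    def display(self):
        """
        Display widgets.
        """
        widgets_ = widgets.HBox([self.button, self.output])
        display(widgets_)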

Dropdown for choosing the file type

In [68]:
class DropDownWidgets:

    def __init__(self, df):
        """
        Dropdown widget for choosing the file type.
        """
        self.dropdown = widgets.Dropdown(
            options=['.csv', '.xlsx', '.json', '.dta'],
            value='.csv',
            description='File Type:',
            disabled=False,
        )

    def display(self):
        """
        Display widgets.
        """
        widgets_ = widgets.VBox([self.dropdown])
        display(widgets_)

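Combine the widgets so that only one object has to be called; this could eventually be done through the MarketPsych package on PyPI.

Based on the selected file name and file type, write the DataFrame with the matching pandas writer (`to_csv` for `.csv`, `to_excel` for `.xlsx`, `to_json` for `.json`, `to_stata` for `.dta`), e.g. `df.to_csv(f"{file_name}.csv")`.
Then use the button to trigger the download.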
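
The writer selection described above can be sketched with standard pandas writers. `WRITERS` and `save_dataframe` are hypothetical names; the file is written to a local directory (on Colab, that is the Colab machine, not your PC), and `.xlsx` additionally requires an Excel engine such as openpyxl or XlsxWriter:

```python
from pathlib import Path

import pandas as pd

# Map each extension offered in the dropdown to the pandas writer that handles it.
WRITERS = {
    ".csv": lambda df, path: df.to_csv(path, index=False),
    ".xlsx": lambda df, path: df.to_excel(path, index=False),
    ".json": lambda df, path: df.to_json(path, orient="records"),
    ".dta": lambda df, path: df.to_stata(path, write_index=False),
}


def save_dataframe(df: pd.DataFrame, file_name: str, file_type: str, directory: str = ".") -> Path:
    """Write df to '<directory>/<file_name><file_type>' and return the path."""
    if file_type not in WRITERS:
        raise ValueError(f"Unsupported file type: {file_type!r}")
    path = Path(directory) / f"{file_name}{file_type}"
    WRITERS[file_type](df, path)
    return path
```

A button click handler (registered with `Button.on_click`) could call `save_dataframe(df, nameitwidg.file_name.value, dropdownwidg.dropdown.value)`. On Colab, `from google.colab import files` followed by `files.download(str(path))` then sends the saved file to your browser.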

In [98]:
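# .display() returns None, so keep the widget objects and display them separately
nameitwidg, dropdownwidg, buttonwidg = NameItWidgets(df), DropDownWidgets(df), ButtonWidgets(df)
for w in (nameitwidg, dropdownwidg, buttonwidg):
    w.display()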



In [71]:
nameitwidg = NameItWidgets(df)
dropdownwidg = DropDownWidgets(df)
buttonwidg = ButtonWidgets(df)


In [62]:
class DownloadWidgets:
    def __init__(self, df):
        """
        Widgets for choosing the name and type of the file to save.
        """
        # A class can only have one __init__: a second definition silently
        # replaces the first, so both widgets are created here.
        self.file_name = widgets.Text(
            value='MarketPsychData',
            placeholder='Type name of file to save',
            description='File name:',
            disabled=False
        )
        self.dropdown = widgets.Dropdown(
            options=['.csv', '.xlsx', '.json', '.dta'],
            value='.csv',
            description='File Type:',
            disabled=False,
        )

    def display(self):
        """
        Display widgets.
        """
        widgets_ = widgets.VBox([self.file_name, self.dropdown])
        display(widgets_)

In [72]:
for w in (nameitwidg, dropdownwidg, buttonwidg):
    w.display()  # the helpers are not widgets; display(w) would only print their repr

In [75]:
widgets.ToggleButtons(
    options=['Slow', 'Regular', 'Fast'],
    description='Speed:',
    disabled=False,
    button_style='', # 'success', 'info', 'warning', 'danger' or ''
    tooltips=['Description of slow', 'Description of regular', 'Description of fast'],
#     icons=['check'] * 3
)
