# PyFusion Example Notebook #

This notebook steps through the main functions in the PyFusion module to provide an overview. Please refer to the documentation for the full list of optional parameters supported by each function.

In [None]:
from fusion import Fusion, FusionCredentials
import pandas as pd

## Getting started ##

The _FusionCredentials_ class provides a utility function, _generate_credentials_file_, that saves credentials in a format the Fusion class can read directly to establish a session.

Two optional parameters that are useful:

*  _credentials_file_ specifies the location to write the file. It defaults to _./config/client_credentials.json_.
*  _proxies_ must be set when connecting from behind an internet proxy, using the format: proxies = {'http':'myproxy.mycompany.com:8080','https':'myproxy.mycompany.com:8080'}
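
The _proxies_ format above is a plain dictionary keyed by scheme. As a minimal sketch, the hypothetical helper below (not part of PyFusion) builds that dictionary from a host and port so the same address is used for both schemes. The host and port are the placeholder values from the bullet above.

```python
def build_proxies(host: str, port: int) -> dict:
    """Hypothetical helper: route both HTTP and HTTPS through one proxy address."""
    address = f"{host}:{port}"
    return {"http": address, "https": address}


# Placeholder values taken from the example format above.
proxies = build_proxies("myproxy.mycompany.com", 8080)
print(proxies)
```

The resulting dictionary can then be supplied as the _proxies_ parameter when generating the credentials file.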



In [None]:
FusionCredentials.generate_credentials_file(
    client_id='<CLIENT ID>',
    client_secret='<CLIENT SECRET>',
)

## Start a new Fusion session ##

Creating a Fusion object without any parameters will attempt to load a credentials file from _./config/client_credentials.json_. A different path and file can be specified using the credentials parameter. Alternatively, a Fusion object can be created by passing a FusionCredentials object, or a dictionary with the required values, to the constructor.

The credentials are used to connect to the authentication server and obtain an access token. When the token expires it will be automatically renewed.

In [None]:
fusion = Fusion()
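
Because the parameterless constructor depends on a file at the default path, it can be useful to confirm the file is present before starting a session. The sketch below uses only the standard library. The helper name is hypothetical and is not part of PyFusion, and the path is the default quoted above, resolved relative to the current working directory.

```python
from pathlib import Path

# Default location named in the text above, relative to the working directory.
DEFAULT_CREDENTIALS_FILE = Path("config") / "client_credentials.json"


def credentials_file_exists(path: Path = DEFAULT_CREDENTIALS_FILE) -> bool:
    """Hypothetical helper: True if a credentials file is present at `path`."""
    return path.is_file()


if not credentials_file_exists():
    print(f"No credentials file at {DEFAULT_CREDENTIALS_FILE}; generate one first.")
```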

## List available catalogs ##

Returns a dataframe containing a list of catalogs that are available to the API functional account used to authenticate. 

In [None]:
fusion.list_catalogs()

## Get catalog resources ##

Returns the resources available for the specified catalog (the default is the common catalog). At this time, the catalog resources returned are products and datasets.

In [None]:
fusion.catalog_resources()

## List products in the catalog ##

Given a catalog name (the default is the common catalog), this will return a dataframe containing the list of products, including the product identifier and associated metadata.

* _contains_ is an optional parameter that filters the product list to products whose identifier or description contains the given string, or any one of a list of strings. The example below filters the product list to those containing either index or FX in the description or identifier attributes. Calling the function without this parameter will return all products.
* _id_contains_, if set to True, matches the values specified by the contains parameter against the product identifier only.

In [None]:
fusion.list_products(contains=['index','fx'])
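
To make the matching rule concrete, here is a plain-Python sketch of the filtering described above. It is an illustration only, not the library's implementation. The field names `identifier` and `description`, the sample rows, and the case-insensitive comparison are all assumptions made for this example.

```python
def matches(row: dict, contains, id_contains: bool = False) -> bool:
    """Illustrative rule: True if any search term occurs in the searched fields."""
    # Accept either a single string or a list of strings.
    terms = [contains] if isinstance(contains, str) else list(contains)
    # Search the identifier only, or the identifier and the description.
    fields = [row["identifier"]]
    if not id_contains:
        fields.append(row["description"])
    # Case-insensitive matching is an assumption of this sketch.
    return any(term.lower() in field.lower() for field in fields for term in terms)


# Made-up rows, for illustration only.
products = [
    {"identifier": "EQ_INDEX", "description": "Equity benchmark levels"},
    {"identifier": "RATES", "description": "FX and rates analytics"},
    {"identifier": "ESG", "description": "Sustainability scores"},
]

both = [p["identifier"] for p in products if matches(p, ["index", "fx"])]
ids_only = [p["identifier"] for p in products if matches(p, ["index", "fx"], id_contains=True)]
print(both, ids_only)
```

With the identifier-only option, the second row drops out because its match was in the description.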

## List datasets in the catalog ##

Given a catalog name (the default is the common catalog), this will return a dataframe containing the list of datasets, including the dataset identifier and associated metadata.

* _contains_ is an optional parameter that works in the same way it does for products, and is either a string or a list of strings. The example below will return datasets with the word index in the identifier or description. 
* _id_contains_ restricts the matching of the values specified by the contains parameter to the dataset identifier only.
* _max_results_ is an optional parameter that specifies the number of rows to return.

In [None]:
fusion.list_datasets(contains='index', max_results = 5)

## Get dataset resources ##

Returns a dataframe containing the resources available for the dataset, given the dataset identifier. At this time, this will always be a datasetseries, which represents the instances of the dataset, e.g. in a time series.

In [None]:
fusion.dataset_resources('<DATASET ID>')

## List dataset attributes ##

Returns a dataframe with the attributes contained in the dataset, given the dataset identifier. The name, type, and description are retrieved from the data dictionary.

In [None]:
fusion.list_dataset_attributes('<DATASET ID>')

## List datasetseries members ##

Returns a dataframe with the datasetseries members, given a dataset identifier. The identifier in the results is a label representing individual instances of the dataset, for example corresponding to a date.

In [None]:
fusion.list_datasetmembers('<DATASET ID>')
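
When the member identifiers correspond to dates, a common next step is picking the most recent one. The sketch below assumes the identifiers are strings in `YYYYMMDD` form. That format, the helper name, and the sample values are assumptions for illustration, so check the identifiers actually returned for your dataset.

```python
from datetime import datetime


def latest_member(identifiers):
    """Return the identifier with the most recent date, assuming YYYYMMDD labels."""
    return max(identifiers, key=lambda label: datetime.strptime(label, "%Y%m%d"))


# Made-up identifiers, for illustration only.
members = ["20220103", "20220105", "20220104"]
print(latest_member(members))
```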

## Get datasetseries member resources ##

Given a dataset identifier and a datasetseries member identifier, this will return a dataframe containing the resources available for this datasetseries member. Currently, this will return distributions, each of which represents a downloadable file in a specific format.

In [None]:
fusion.datasetmember_resources('<DATASET ID>','<SERIES ID>')

## List available distributions ##

Given a dataset identifier and a datasetseries member identifier, this will return a dataframe containing the available distributions. A distribution corresponds to data in a format that can be downloaded, e.g. csv or parquet.

In [None]:
fusion.list_distributions('<DATASET ID>','<SERIES ID>')

## Download distributions ##

Given a dataset identifier, a date or date range, and a file format, download the requested distributions to disk. 

In [None]:
fusion.download('<DATASET ID>',dt_str='<SERIES ID>',dataset_format='csv')
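
The text above mentions a date or a date range. As a hedged sketch, the hypothetical helper below formats `datetime.date` values into a single string suitable for a parameter such as _dt_str_. It assumes dates are written as `YYYYMMDD` and that a range is written as `start:end`. Both conventions are assumptions here and should be confirmed against the documentation.

```python
from datetime import date
from typing import Optional


def build_dt_str(start: date, end: Optional[date] = None) -> str:
    """Hypothetical helper: format one date, or a start:end range, as YYYYMMDD text."""
    fmt = "%Y%m%d"
    if end is None or end == start:
        return start.strftime(fmt)
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    # The colon-separated range form is an assumption of this sketch.
    return f"{start.strftime(fmt)}:{end.strftime(fmt)}"


single_day = build_dt_str(date(2022, 1, 3))
first_week = build_dt_str(date(2022, 1, 3), date(2022, 1, 7))
print(single_day, first_week)
```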

## Load distributions as a dataframe ##

Given a dataset identifier, a date or date range, and a file format, return the requested data as a pandas dataframe.

* _columns_ is an optional parameter specifying a list of columns to return (only applies when the dataset format is parquet).

In [None]:
fusion.to_df('<DATASET ID>', dt_str='<SERIES ID>', columns=['ID'], dataset_format='parquet')
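
Since _columns_ only applies to the parquet format, one way to keep calls consistent is to assemble the keyword arguments in one place and leave _columns_ out for other formats. The helper below is a hypothetical sketch and is not part of PyFusion. It uses only the parameter names shown in the cell above.

```python
def build_to_df_kwargs(dt_str, dataset_format="parquet", columns=None):
    """Hypothetical helper: build keyword arguments, keeping `columns` only for parquet."""
    kwargs = {"dt_str": dt_str, "dataset_format": dataset_format}
    # Column selection is only meaningful for parquet, per the note above.
    if columns is not None and dataset_format == "parquet":
        kwargs["columns"] = list(columns)
    return kwargs


parquet_kwargs = build_to_df_kwargs("<SERIES ID>", "parquet", columns=["ID"])
csv_kwargs = build_to_df_kwargs("<SERIES ID>", "csv", columns=["ID"])
print(parquet_kwargs)
print(csv_kwargs)
```

The resulting dictionary could then be unpacked into the call, for example `fusion.to_df('<DATASET ID>', **parquet_kwargs)`.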