# Connector for DBLP 

This example shows how to use Connector with DBLP.

## Preprocessing

Connector is a component of the dataprep library that simplifies data access by providing a standard set of APIs. The goal is to let users skip complex API configuration. In this tutorial, we demonstrate how to use Connector with DBLP.

If you haven't installed dataprep, run the command `pip install dataprep` or execute the following cell.

In [None]:
># Run me if you'd like to install
>!pip install dataprep

## Download and store the configuration files in dataprep

The configuration files set the parameters and initial setup for the API. You can download the available configuration files manually from [Configuration Files](https://github.com/sfu-db/DataConnectorConfigs), or let them be downloaded automatically on first use.

Store the configuration file in the dataprep folder. 

## Initialize connector

To initialize the connector, run the following code. Unlike Yelp and Spotify, DBLP does not require tokens or client information.

In [2]:
from dataprep.connector import Connector
dc = Connector('dblp')

## Functionalities

Connector provides several methods that help you explore the data available from DBLP.

### Connector.info
The info method gives information and guidelines on using the connector. The response has three sections: table, parameters, and examples.
>1. Table - The table(s) being accessed.
>2. Parameters - Identifies which parameters can be used to call the method. For DBLP, the output below lists `q` as required and the remaining parameters as optional.
>3. Examples - Shows how you can call the methods in the Connector class.

In [3]:
dc.info()


Table dblp.publication

Parameters
----------
q required 
h, f, first_name, last_name optional 

Examples
--------
>>> dc.query("publication", q="word1")
>>> dc.show_schema("publication")



### Connector.show_schema
The show_schema method returns the schema of the website data as a DataFrame. The response has two columns: the first is the column name and the second is the data type.

As an example, let's see what is in the publication table.

In [5]:
dc.show_schema("publication")

table: publication


| | column_name | data_type |
|---|---|---|
| 0 | authors | object |
| 1 | title | string |
| 2 | venue | object |
| 3 | pages | string |
| 4 | publish year | string |
| 5 | publication type | string |
| 6 | publication url | string |


### Connector.query
The query method downloads the website data and returns it as a DataFrame. For the operation to run, the parameters must meet the requirements listed by Connector.info.

The data received from the server is in either JSON or XML format. The connector converts it into a pandas DataFrame for the convenience of downstream operations.

As an example, let's get data from the "publication" table by searching for publications by the author Jian Pei.
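To make the conversion step concrete, here is a minimal sketch of flattening a JSON response into tabular rows. The payload, its `hits` and `info` keys, and the `flatten` helper are all hypothetical and chosen only for illustration; they are not the actual DBLP response format or the connector's internal code.

```python
import json

# Hypothetical payload: NOT the real DBLP response format.
raw = """
{"hits": [
  {"info": {"title": "Paper A", "year": "2020", "authors": ["Ann Lee", "Bo Chen"]}},
  {"info": {"title": "Paper B", "year": "2019", "authors": ["Ann Lee"]}}
]}
"""

def flatten(payload, columns):
    """Turn the nested JSON text into a list of flat row dictionaries."""
    records = json.loads(payload)["hits"]
    # Keep only the requested columns; a missing key becomes None.
    return [{col: rec["info"].get(col) for col in columns} for rec in records]

rows = flatten(raw, ["title", "year", "authors"])
for row in rows:
    print(row)
```

A list of dictionaries like `rows` can be passed to `pandas.DataFrame(rows)` to obtain a table with one column per key.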

In [6]:
df = await dc.query("publication", first_name="Jian", last_name="Pei")
df

| | authors | title | venue | pages | publish year | publication type | publication url |
|---|---|---|---|---|---|---|---|
| 0 | [Dong-Wan Choi, Jian Pei, Xuemin Lin 0001] | On spatial keyword covering. | [Knowl. Inf. Syst.] | 2577-2612 | 2020 | Journal Articles | https://dblp.org/rec/journals/kais/ChoiPL20 |
| 1 | [Yu Yang 0001, Xiangbo Mao, Jian Pei, Xiaofei ... | Continuous Influence Maximization. | [ACM Trans. Knowl. Discov. Data] | 29:1-29:38 | 2020 | Journal Articles | https://dblp.org/rec/journals/tkdd/YangMPH20 |
| 2 | [Wenhui Yu, Jinfei Liu, Jian Pei, Li Xiong 000... | Efficient Contour Computation of Group-Based S... | [IEEE Trans. Knowl. Data Eng.] | 1317-1332 | 2020 | Journal Articles | https://dblp.org/rec/journals/tkde/YuLPXCQ20 |
| 3 | [Sihem Amer-Yahia, Jian Pei] | VLDB SI 2018 editorial. | [VLDB J.] | 593-594 | 2020 | Journal Articles | https://dblp.org/rec/journals/vldb/Amer-YahiaP20 |
| 4 | [Mingtao Lei, Lingyang Chu, Zhefeng Wang, Jian... | Mining top-k sequential patterns in transactio... | [World Wide Web] | 103-130 | 2020 | Journal Articles | https://dblp.org/rec/journals/www/LeiCWPHZF20 |
| 5 | [Shangqian Gao, Feihu Huang, Jian Pei, Heng Hu... | Discrete Model Compression With Resource Const... | [CVPR] | 1896-1905 | 2020 | Conference and Workshop Papers | https://dblp.org/rec/conf/cvpr/GaoHPH20 |
| 6 | [Guanghan Ning, Jian Pei, Heng Huang] | LightTrack - A Generic Framework for Online To... | [CVPR Workshops] | 4456-4465 | 2020 | Conference and Workshop Papers | https://dblp.org/rec/conf/cvpr/NingPH20 |
| 7 | [Zicun Cong, Lingyang Chu, Lanjun Wang, Xia Hu... | Exact and Consistent Interpretation of Piecewi... | [ICDE] | 613-624 | 2020 | Conference and Workshop Papers | https://dblp.org/rec/conf/icde/CongCWHP20 |
| 8 | [Lei Luo, Jian Pei, Heng Huang] | Sinkhorn Regression. | [IJCAI] | 2598-2604 | 2020 | Conference and Workshop Papers | https://dblp.org/rec/conf/ijcai/LuoPH20 |
| 9 | [Xiao Wang 0017, Meiqi Zhu, Deyu Bo, Peng Cui ... | AM-GCN - Adaptive Multi-channel Graph Convolut... | [KDD] | 1243-1253 | 2020 | Conference and Workshop Papers | https://dblp.org/rec/conf/kdd/0017ZB0SP20 |


From the query results, you can see how easy it is to download publication data from DBLP into a pandas DataFrame.
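Once the data is in tabular form, simple summaries take only a few lines. As a minimal sketch of one such downstream step, the snippet below tallies the publication types of the ten rows in the output above. The values are copied by hand into a plain list so the example runs without dataprep or network access.

```python
from collections import Counter

# Publication types copied from the ten rows of the query output shown above.
publication_types = [
    "Journal Articles",
    "Journal Articles",
    "Journal Articles",
    "Journal Articles",
    "Journal Articles",
    "Conference and Workshop Papers",
    "Conference and Workshop Papers",
    "Conference and Workshop Papers",
    "Conference and Workshop Papers",
    "Conference and Workshop Papers",
]

# Count how many publications fall into each type.
counts = Counter(publication_types)
for pub_type, n in counts.most_common():
    print(f"{pub_type}: {n}")
```

With the DataFrame itself, `df["publication type"].value_counts()` should produce the same tally, assuming the column name matches the schema shown earlier.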

Now that you understand how Connector operates, you can accomplish the whole task with two lines of code:


>1. dc = Connector(...)
>2. dc.query(...)
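The query cell in this tutorial uses a top-level `await`, which works in a notebook but not in a plain Python script. The sketch below shows one way to drive a coroutine from a script with `asyncio.run`. It assumes `dc.query` is a coroutine, as the `await` suggests. `stand_in_query` and `run_query` are hypothetical helpers that replace the real call so the example runs without dataprep or network access.

```python
import asyncio

async def stand_in_query(table, **params):
    """Hypothetical stand-in for dc.query; it only echoes its arguments."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return [{"table": table, **params}]

def run_query(coro_fn, table, **params):
    # asyncio.run creates an event loop, runs the coroutine to completion,
    # and closes the loop afterwards.
    return asyncio.run(coro_fn(table, **params))

result = run_query(stand_in_query, "publication", first_name="Jian", last_name="Pei")
print(result)
```

In a real script, passing `dc.query` in place of `stand_in_query` would follow the same pattern under that assumption.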

## That's all for now
If you are interested in writing your own configuration file or modifying an existing one, refer to the [Configuration Files](https://github.com/sfu-db/DataConnectorConfigs) repository.