Commit

docs(connector): revision
peiwangdb authored and dovahcrow committed Apr 1, 2021
1 parent 36eb5fc commit 23085dd
Showing 12 changed files with 58 additions and 47 deletions.
Binary file not shown.
Binary file not shown.
Binary file not shown.
5 changes: 5 additions & 0 deletions docs/source/user_guide/connector/authorization.ipynb
Expand Up @@ -13,7 +13,12 @@
"Connector supports the most used authorization methods in Web APIs:\n",
"\n",
"* API Key\n",
" * Bearer Token\n",
" * Query parameter\n",
" * Request header parameter\n",
"* OAuth 2.0 \"Client Credentials\" and \"Authorization Code\" grants.\n",
" * Client Credentials grant\n",
" * Authorization Code grant\n",
"\n",
"Let's review them in detail:\n",
"\n",
Expand Down
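
The three API Key variants listed in this hunk differ only in where the key is placed on the outgoing request. A minimal sketch of that difference, using only the standard library, might look like the following. The function `apply_api_key`, its parameters, and the example URL are hypothetical and are not part of Connector's API.

```python
from urllib.parse import urlencode


def apply_api_key(url, headers, key, scheme, name="api_key"):
    """Attach an API key to a request (hypothetical helper, not Connector code).

    scheme is one of:
      "bearer" -> Authorization: Bearer <key> header
      "query"  -> <name>=<key> appended to the query string
      "header" -> custom request header <name>: <key>
    Returns a new (url, headers) pair; the inputs are not mutated.
    """
    headers = dict(headers)  # copy so the caller's dict stays untouched
    if scheme == "bearer":
        headers["Authorization"] = f"Bearer {key}"
    elif scheme == "query":
        separator = "&" if "?" in url else "?"
        url = f"{url}{separator}{urlencode({name: key})}"
    elif scheme == "header":
        headers[name] = key
    else:
        raise ValueError(f"unknown API key scheme: {scheme!r}")
    return url, headers
```

For example, `apply_api_key("https://api.example.com/search?q=x", {}, "abc", "query")` leaves the headers empty and appends `api_key=abc` to the query string.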
24 changes: 16 additions & 8 deletions docs/source/user_guide/connector/config.ipynb
Expand Up @@ -4,7 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loading configuration files"
"# Configuration files: usage and composition"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Loading configuration files"
]
},
{
Expand All @@ -17,19 +24,17 @@
"\n",
"### 1. Loading existing files from our github repo\n",
"\n",
"We maintain a github repo that contains configuration files for more than 20 websites.\n",
"We maintain a github repo that contains configuration files for more than 20 websites [here](https://github.com/sfu-db/APIConnectors/tree/develop/api-connectors).\n",
"\n",
"//todo: add reference\n",
"\n",
"As an example, with the following code, the system will first download the configuration file folder of DBLP from our repo, and then load it to build the connection.\n",
"As an example, with the following code, the system will download the configuration file folder of dblp from our repo, load it, and build the connection.\n",
"The file is placed at the system temporary file folder.\n",
"\n",
"```\n",
"from dataprep.connector import connect\n",
"conn = connect(\"dblp\")\n",
"```\n",
"\n",
"```connect()``` provides a parameter called ```update```, which forces update the config files if set to ```True```.\n",
"```connect()``` provides a parameter called ```update```, which forces downloading of the fresh config files if set to ```True```.\n",
"\n",
"\n",
"### 2. Loading from a local directory\n",
Expand All @@ -43,14 +48,17 @@
"conn = connect(\"./dblp\")\n",
"```\n",
"\n",
"When the website API that you want to access is not supported by us, you will want to write your own configuration files. \n",
"Or when you want to do some modification for the configuration files, you need to first download the configuration files to your local computer, change the files accordingly, and then load it from the local directory.\n",
"\n",
"See below for how to create your own configuration folder and files."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Composing the configuration files"
"## Composing the configuration files"
]
},
{
Expand Down Expand Up @@ -100,7 +108,7 @@
"* What are the parameters the query support?\n",
"* What is the schema of the returned results?\n",
"\n",
"An tutorial of how to write a configuration file is here: https://github.com/sfu-db/DataConnectorConfigs/blob/develop/CONTRIBUTING.md\n",
"A tutorial of how to write a configuration file is [here](https://github.com/sfu-db/APIConnectors/blob/develop/CONTRIBUTING.md)\n",
"\n",
"Below shows the configuration file of the publication API."
]
Expand Down
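
The loading behaviour described in config.ipynb (a bare name such as `"dblp"` is downloaded into the system temporary folder, a path such as `"./dblp"` is read locally, and `update=True` forces a fresh download) can be sketched roughly as below. This is an illustration under stated assumptions, not Connector's implementation: `resolve_config_dir`, the `fetch` callback, the `cache_root` parameter, the `connector_configs` folder name, and the rule used to recognise a local path are all hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_config_dir(name, fetch, update=False, cache_root=None):
    """Return the folder holding the config files for `name` (hypothetical sketch).

    fetch(name, target_dir) is a caller-supplied function that downloads the
    config files for `name` into `target_dir`.
    """
    # Assumption: names that look like filesystem paths are local directories.
    if name.startswith((".", "/", "~")):
        return Path(name).expanduser()

    # Assumption: downloaded configs are cached under the system temp folder.
    root = Path(cache_root) if cache_root else Path(tempfile.gettempdir()) / "connector_configs"
    target = root / name

    # Download when nothing is cached yet, or when a refresh is forced.
    if update or not target.exists():
        target.mkdir(parents=True, exist_ok=True)
        fetch(name, target)
    return target
```

With this shape, a second call for the same name reuses the cached folder, and passing `update=True` triggers `fetch` again.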
Expand Up @@ -19,17 +19,17 @@
"\n",
"With Connector, you can collect data in two steps: **connect** to a website and **query** the data.\n",
"\n",
"We currently support tens of websites: https://github.com/sfu-db/DataConnectorConfigs/tree/develop/api-connectors\n",
"We currently support tens of websites: https://github.com/sfu-db/APIConnectors/tree/develop/api-connectors\n",
"\n",
"You can also author your own configuration files for new websites.\n",
"You can also author your own configuration files to support new websites.\n",
"We look forward to seeing your contribution to facilitate other users as well."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Collecting data from DBLP\n",
"## Collecting data from DBLP\n",
"\n",
"\n",
"#### DBLP Website\n",
Expand Down
53 changes: 24 additions & 29 deletions docs/source/user_guide/connector/info.ipynb
Expand Up @@ -17,15 +17,17 @@
]
},
{
"cell_type": "raw",
"cell_type": "markdown",
"metadata": {},
"source": [
"```\n",
"from dataprep.connector import connect, info\n",
"\n",
"# Access tokens can be accessed generated here: https://www.yelp.com/developers/documentation/v3/authentication\n",
"dc = connect('yelp', _auth={'access_token':'cCMHU4M4t7rdt*********vp3whGzFjgIKIm0'})\n",
"\n",
"dc.info()"
"dc.info()\n",
"```"
]
},
{
Expand All @@ -36,10 +38,12 @@
]
},
{
"cell_type": "raw",
"cell_type": "markdown",
"metadata": {},
"source": [
"info('yelp')"
"```\n",
"info('yelp')\n",
"```"
]
},
{
Expand Down Expand Up @@ -76,42 +80,33 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is output of the info function for the Yelp API. \n",
"- Yelp has one table called \"businesses\" which contains information such as rating and location data of businesses.\n",
"- The businesses table has seven parameters: location, term, latitude, longitude, limit, categories and sort_by. The location parameter is required and must be specified in each query while the other parameters are optional. These parameters are specific to the businesses table and can be used to access certain types of business data.\n",
"- The example shows how to create a connector object for Yelp, given the _auth parameter to authenticate oneself and the _concurrency parameter to speed up data acquisition. The connector object \"dc\" can then be used to query the API via the businesses table. A dataframe will be returned with 20 rows of data for businesses located in Seattle. More details can be found in the \"connect\" and \"query\" sections.\n",
"- The schema shows there will be 20 columns of data returned when querying the businesses table. Each row of the schema displays a column name and its corresponding data type. For example the \"name\" and \"image_url\" columns contain string data while \"latitude\" and \"longitude\" columns contain float data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dataprep.connector import info\n",
"info('yelp', update=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](assets/yelp_params_example.png)"
"Below shows the case for the Yelp website. You can see:\n",
"\n",
"* Yelp has one table: \"businesses\".\n",
"\n",
"* The businesses table has seven parameters: location, term, latitude, longitude, limit, categories and sort_by. The location parameter is required, while the other parameters are optional.\n",
"\n",
"* The example shows how to connect and query Yelp. More details can be found in the \"connect\" and \"query\" sections.\n",
"\n",
"* The schema shows there will be 20 columns of data returned. Each row of the schema displays a column name and its corresponding data type. For example the \"name\" and \"image_url\" columns contain string data while \"latitude\" and \"longitude\" columns contain float data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](assets/yelp_schema_top.png)"
"```\n",
"from dataprep.connector import info\n",
"info('yelp', update=True)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![title](assets/yelp_schema_bottom.png)"
"![title](assets/yelp-1.png)\n",
"![title](assets/yelp-2.png)"
]
}
],
Expand All @@ -131,7 +126,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.7.7"
}
},
"nbformat": 4,
Expand Down
10 changes: 6 additions & 4 deletions docs/source/user_guide/connector/introduction.ipynb
Expand Up @@ -13,7 +13,9 @@
"\n",
"DataPrep.Connector aims to simplify data collection from Web APIs by providing a standard set of operations. \n",
"Connector wraps-up complex API calls into a set of easy-to-use Python functions. \n",
"By using Connector, you can skip the complex API configuration process and rapidly query different Web APIs in few steps, enabling you to execute the analysis workflow you are familiar with in a direct way."
"By using Connector, you can skip the complex API configuration process and rapidly query different Web APIs in few steps, enabling you to execute the analysis workflow you are familiar with in a direct way.\n",
"\n",
"Watch our introduction in PyGlobal Conference `here <https://www.youtube.com/watch?v=56qu-0Ka-dA/>`_."
]
},
{
Expand All @@ -28,7 +30,7 @@
"\n",
"* Authorization: Access more Web APIs quickly! Even the ones that implement authorization!\n",
"\n",
"This section includes a case study for [```dblp```](https://dblp.org/) as an example for the process and other sections to explain Connector's functionalities in detail."
"The user guide first presents a case study for [dblp](https://dblp.org/) as an example for the process overview and provides a detailed explanation of the functionalities in the following sections."
]
},
{
Expand All @@ -46,9 +48,9 @@
"## Section Contents\n",
"\n",
" * [Case study: dblp API](dblp.ipynb)\n",
" * [Configuration files: usage and composition](config_file.ipynb)\n",
" * [Configuration files: usage and composition](config.ipynb)\n",
" * [connect(): establish the connection and load the configuration files](connect.ipynb)\n",
" * [query(): fetch into DataFrames via APIs](query.ipynb)\n",
" * [query(): fetch data into DataFrames via APIs](query.ipynb)\n",
" * [info(): get information of a website](info.ipynb)\n",
" * [Authorization schemes supported](authorization.ipynb)\n",
" * [Auto-pagination](pagination.ipynb)"
Expand Down
5 changes: 3 additions & 2 deletions docs/source/user_guide/connector/pagination.ipynb
Expand Up @@ -15,7 +15,8 @@
"Connector supports two mainstream pagination schemes:\n",
"\n",
"* Offset-based\n",
"\n",
" * Offset & Limit\n",
" * Page & Perpage\n",
"* Cursor-based\n",
"\n",
"Additionally, Connector’s auto-pagination feature enables you to implement pagination without getting into unnecessary detail about a specific pagination scheme.\n",
Expand Down Expand Up @@ -184,7 +185,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.7.7"
}
},
"nbformat": 4,
Expand Down
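
The two offset-based variants named in pagination.ipynb (Offset & Limit, Page & Perpage) describe the same walk through a result set with different request parameters. A minimal sketch of the parameters each variant would send is given below. The function names and the parameter names `offset`, `limit`, `page`, and `perpage` are illustrative assumptions; real APIs name these differently, and this is not Connector's auto-pagination code.

```python
def offset_limit_requests(total, max_per_request):
    """Yield offset/limit parameters that together cover `total` records."""
    if max_per_request <= 0:
        raise ValueError("max_per_request must be positive")
    offset = 0
    while offset < total:
        # The last request asks only for the records that are still missing.
        limit = min(max_per_request, total - offset)
        yield {"offset": offset, "limit": limit}
        offset += limit


def page_perpage_requests(total, per_page, first_page=1):
    """Yield page/perpage parameters that together cover `total` records."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    page = first_page
    covered = 0
    while covered < total:
        # The page size stays fixed; the final page may return extra records.
        yield {"page": page, "perpage": per_page}
        covered += per_page
        page += 1
```

Fetching 120 records with at most 50 per request takes three requests in either variant; only the Offset & Limit form can trim the last request to exactly the 20 records that remain.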
2 changes: 1 addition & 1 deletion docs/source/user_guide/connector/query.ipynb
Expand Up @@ -54,7 +54,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Examples\n",
"## Examples\n",
"\n",
"Below shows some possible ways to call the query function.\n",
"\n",
Expand Down