Merged
28 commits
8a36255
linting
anubis05 Nov 3, 2022
1b9928f
Refactored the result parsing code
anubis05 Nov 3, 2022
aaecfe0
added test script for result parser
anubis05 Nov 3, 2022
758fd87
* Updated logic for address metadata
anubis05 Nov 4, 2022
a6beb08
* Changes to config
anubis05 Nov 4, 2022
9d68fc0
* Additional modifications to the parser
anubis05 Nov 4, 2022
3b7f200
Added CSV files to gitignore
anubis05 Nov 4, 2022
7d39964
Removed all *.csv files from being tracked
anubis05 Nov 4, 2022
1b823b6
* Refactor av_parser class
anubis05 Nov 4, 2022
0ea94fe
*Converted variables to static variables
anubis05 Nov 4, 2022
14d0329
* Created new function for address metadata separate from USPS function
anubis05 Nov 7, 2022
3a81df0
* minor commenting changes
anubis05 Nov 7, 2022
1c51a90
main
anglarett Nov 8, 2022
c7ddb25
main
anglarett Nov 8, 2022
ae0bf48
update
anglarett Nov 8, 2022
5212b06
update
anglarett Nov 8, 2022
10fef97
update
anglarett Nov 8, 2022
2402257
update
anglarett Nov 8, 2022
4bdc5b6
update
anglarett Nov 8, 2022
7a49ce0
Moved all test files to tests folder outside src/
anubis05 Nov 9, 2022
e1f7a5b
* moving test files outside of src
anubis05 Nov 9, 2022
9031467
Merge branch 'Sarthak_refactoring' of github.com:googlemaps/python-hi…
anubis05 Nov 9, 2022
9f37dfa
* Updated redme file and removed dead code
anubis05 Nov 9, 2022
0f0dc8c
* Added License information and specific details of mode
anubis05 Nov 9, 2022
f6269f7
Removed some repetative elements from the readme file
anubis05 Nov 9, 2022
52b8a87
* Removed a temp file
anubis05 Nov 9, 2022
c3f928c
* Converted to a unit test class
anubis05 Nov 9, 2022
7ccb7dc
* Cleaned unused code
anubis05 Nov 9, 2022
8 changes: 7 additions & 1 deletion .gitignore
@@ -132,13 +132,19 @@ dmypy.json
#ignore the shelve file to commit
*.db

#ignore all csv files to commit
*.csv

# ignore run files
addresses
.addresses
output.json
output.csv
api-key.js
duplicationReport.csv
only_addresses.csv

#other
/.vscode
.DS_Store
*.code-workspace
*.code-workspace
105 changes: 67 additions & 38 deletions README.md
@@ -7,7 +7,7 @@ This program is a wrapper around [Address Validation API](https://developers.goo

![High-Level-overview](/doc_images/High-Volume-Address-Validation-overview.png)

The program takes a `csv` file. It then uses the API key configured in config.yaml to start the processing of the addresses.
The program takes a `csv` file. It then uses the API key configured in `config.yaml` to start the processing of the addresses.

## Overview

@@ -20,23 +20,41 @@ You will need an API Key to call the Address Validation API.

Running modes are the different scenarios or use cases under which the software can be run. There are three running modes, all of which can be configured in the `config.yaml` described in the next section:

Details of the elements we discuss in this section can be found in the [validateAddress object reference guide](https://developers.google.com/maps/documentation/address-validation/reference/rest/v1/TopLevel/validateAddress)

1. ### Test Mode : 1

In test mode you are allowed to store more details from the Address Validation API response (this can be configured from `main.py` in variable `header`).
In test mode you can store more details from the Address Validation API response:

- place_ID
- latlong
- formatted_address
- postal_address
- verdict
- address_type
- usps_data
- address_components

> **Note:** This is an extremely permissive mode and should be avoided in most scenarios. Use it only for testing and for a very limited number of addresses. The responses have to be deleted within 15 days.

2. ### Production mode -Users : 2 (default)

A production mode <ins>not</ins> initiated after user/human interaction. Only minimal data elements are allowed to be stored, as per [Google Maps Platform Terms of Service](https://cloud.google.com/maps-platform/terms). It typically involves successive and multiple programmatic requests to the Address Validation API.

2. ### Production mode -NoUsers : 2 (default)
- place_ID
- latlong
- verdict
- address_components

a Production mode <ins>not</ins> initiated after user/human interaction, only minimal data elements are allowed to be stored as per [Google Maps Platform Terms of Service](https://cloud.google.com/maps-platform/terms). Typically involves successive and multiple programmatic requests to Address Validation API.
> **Note:** All the data elements in this mode can be cached for a maximum of 30 days and must be deleted afterwards. Only place_ID can be stored indefinitely.

3. ### Production mode -Users : 3
3. ### Production mode -NoUsers : 3

A production mode initiated after user/human interaction. Some more data may be cached for the sole purpose of the user completing their single task.

* Update the mode in `config.yaml` file inside `/src` folder :
- place_ID

```
run_mode : 2
```
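
The running modes differ mainly in which response fields may be stored. As a rough sketch only (the `FIELDS_BY_MODE` table and the `filter_result` helper are hypothetical names, not taken from this repository; the field names are copied from the lists in this section), the mode could be applied to a parsed response like this:

```python
# Hypothetical mapping from run_mode to the fields this README lists for each
# mode. The field names come from the README lists, not from the library code.
FIELDS_BY_MODE = {
    1: ["place_ID", "latlong", "formatted_address", "postal_address",
        "verdict", "address_type", "usps_data", "address_components"],
    2: ["place_ID", "latlong", "verdict", "address_components"],
    3: ["place_ID"],
}


def filter_result(parsed: dict, run_mode: int = 2) -> dict:
    """Keep only the fields that the given run mode allows to be stored."""
    if run_mode not in FIELDS_BY_MODE:
        raise ValueError(f"unknown run_mode: {run_mode}")
    allowed = FIELDS_BY_MODE[run_mode]
    # Fields missing from the parsed response are simply skipped.
    return {key: parsed[key] for key in allowed if key in parsed}
```

Keeping the allowed fields in one table means a change to the stored data per mode touches a single place, which may make it easier to review against the Terms of Service.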
- Update the mode in `config.yaml` file:

### config.yaml

@@ -68,49 +86,29 @@ separator : ","
***Shelve db file:*** This is a temporary file created to maintain persistence for a long-running process.
```shelve_db : addresses```

### Overall Flow of logic

* Reads a `csv` file
* Constructs the address as per configuration
* Stores the formatted addresses in a `shelve` object. This is done to make the program more resilient and async.
* The library then picks up addresses one by one from the `shelve` object and call the Address Validation API
* It gets the response back, parse it and store configured values back to the `shelve` object
* After all the addresses are inserted back to the datastructure, another piece of code executes and exports the data in a `csv` file
* Once the program is executed, it stores the [geocode](https://developers.google.com/maps/documentation/address-validation/requests-validate-address#response) and [`place ID`](https://developers.google.com/maps/documentation/places/web-service/place-id) against each given address and exports it in a `csv` file.

### Key features

* Maintains QPM limits set by the Address Validation API
* Async code and maintains state
* Checks for duplicates and runs repeated addresses only once
* Modes help create parity with Terms of Service
- Maintains QPM limits set by the Address Validation API
- Async code and maintains state
- Checks for duplicates and runs repeated addresses only once
- Generates a duplication report which shows which addresses are duplicated and how often
- Modes help create parity with Terms of Service
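
The duplicate check and duplication report listed above could look roughly like the following. This is a minimal sketch, not the library's implementation: `normalize` and `split_duplicates` are hypothetical helpers, and treating addresses as equal after collapsing whitespace and case is an assumption.

```python
from collections import Counter


def normalize(address: str) -> str:
    """Collapse whitespace and case so trivially different spellings match.

    This normalization rule is an assumption made for the sketch.
    """
    return " ".join(address.split()).lower()


def split_duplicates(addresses):
    """Return (unique addresses in first-seen order, {address: count} for repeats)."""
    counts = Counter(normalize(a) for a in addresses)
    unique = list(counts)  # Counter keeps first-insertion order
    report = {addr: n for addr, n in counts.items() if n > 1}
    return unique, report
```

Only the entries in `unique` would need to be sent to the API, while `report` holds what a duplication report would contain.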

## Install and run

* Requires `python3` and `PyYAML`:
- Requires `python3` and `PyYAML`:

`brew install python3`
`brew install PyYAML`

* Install: python-high-volume-address-validation-library software also requires to have [google-maps-services-python](https://github.com/googlemaps/google-maps-services-python) installed, the latest version that includes Address Validation API:
- Install: the python-high-volume-address-validation-library software also requires the latest version of [google-maps-services-python](https://github.com/googlemaps/google-maps-services-python), which includes the Address Validation API:
`
pip3 install googlemaps
`

* Update `config.yaml` file in `/src` folder with your API key, `csv` output path, and mode in which to run the library (see "Running Modes" section):

```
## Address Validation API key
api_key : 'YOUR_API_KEY'

## Name of the output csv file
output_csv : './test-results.csv'

## There are three modes for running the software.
run_mode : 1
```
- Update the `config.yaml` file with your API key, `csv` output path, and the mode in which to run the library (see "Running Modes" section):

* Run:
- Run:
`
python3 main.py
`
@@ -131,6 +129,37 @@ separator : ","

The software works in three modes. You can set the mode to comply with [Google Maps Platform Terms of Service](https://cloud.google.com/maps-platform/terms), by configuring the `config.yaml` file corresponding to the use case under which this is run.

### Overall Flow of logic

- Reads a `csv` file
- Constructs the address as per configuration
- Stores the formatted addresses in a `shelve` object. This is done to make the program more resilient and async.
- The library then picks up addresses one by one from the `shelve` object and calls the Address Validation API
- It gets the response back, parses it, and stores the configured values back in the `shelve` object
- After all the addresses are inserted back into the data structure, another piece of code executes and exports the data to a `csv` file
- Once the program is executed, it stores the [geocode](https://developers.google.com/maps/documentation/address-validation/requests-validate-address#response) and [`place ID`](https://developers.google.com/maps/documentation/places/web-service/place-id) against each given address and exports it in a `csv` file.
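
The flow above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: `run` and `fake_validate` are hypothetical names, the API call is replaced by a stub that returns a placeholder, and the address is built by simply joining the non-empty columns of each row.

```python
import csv
import os
import shelve
import tempfile


def fake_validate(address: str) -> dict:
    """Stand-in for the Address Validation API call (returns a placeholder)."""
    return {"place_ID": "PLACEHOLDER"}


def run(input_csv: str, output_csv: str, db_path: str, validate=fake_validate):
    # 1. Read the csv and stage each constructed address in the shelve store.
    with open(input_csv, newline="") as f, shelve.open(db_path) as db:
        for row in csv.reader(f):
            address = ", ".join(cell.strip() for cell in row if cell.strip())
            if address and address not in db:
                db[address] = None  # None marks "not validated yet"

    with shelve.open(db_path) as db:
        # 2. Validate only pending entries, so an interrupted run can resume.
        for address in list(db.keys()):
            if db[address] is None:
                db[address] = validate(address)

        # 3. Export the stored results to a csv file.
        with open(output_csv, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(["address", "place_ID"])
            for address in db.keys():
                writer.writerow([address, db[address]["place_ID"]])
```

Because results are written back to the `shelve` file as they arrive, rerunning `run` with the same `db_path` skips addresses that already have a stored response.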

## Output

This program outputs a `csv` file. The contents of the file change based on the mode selected above.

It also outputs a duplication `csv` file that reports all the addresses that were duplicated in the input.

## License

Copyright 2022 Google LLC.

Licensed to the Apache Software Foundation (ASF) under one or more contributor
license agreements. See the NOTICE file distributed with this work for
additional information regarding copyright ownership. The ASF licenses this
file to you under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

<http://www.apache.org/licenses/LICENSE-2.0>

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.
15 changes: 15 additions & 0 deletions __init__.py
@@ -0,0 +1,15 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.1"
15 changes: 15 additions & 0 deletions src/__init__.py
@@ -0,0 +1,15 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

__version__ = "0.1"