Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Api v2 #20

Merged
merged 8 commits into from Aug 14, 2018
Merged

Upgrade to Api v2 #20

merged 8 commits into from Aug 14, 2018

Conversation

StevenMHernandez
Copy link
Contributor

This directly handles issue #9.

In addition to the stated issue of API v1 being deprecated by Harvest, this pull request also syncs many new data points that v2 offers (in addition to some data points not available in the previous version).

Indirectly; as a result of this pull request, the following issues will be handled as well:

As a result of moving from API v1 up to v2 for Harvest, there are a few configuration changes that must occur (for use in systems such as Stitch). These config.json changes can be seen in the updated README

  • account_id is required instead of account_name
  • user_agent is required (though it appears that it was required in the past, just not documented)

I've changed the version # for the module because it makes large changes to the underlying schemas compared to that of module v1.1.1

  • many new schemas were added
  • many new schema attributes were added (as per new data from the v2 Harvest API)
  • many old schema attributes were removed (no longer available in the v2 Harvest API)
  • people schema was renamed to users (as per new Harvest API endpoints)
  • schema attribute names have been changed to match the current api (ex: active vs is_active)

Please let me know any questions, concerns or recommendations.

StevenMHernandez added 7 commits June 27, 2018 12:28
* update authentication method required for new v2 of the api
* add additional required config variables for tap usage
* update access_token request to comply with v2 of api
* update schemas to reflect data returned from the v2 api
* add additional schemas to reflect additional data points available

@see https://help.getharvest.com/api-v2/
@see singer-io#9
…a and loops through data, thus removing redundancies

* allow each parent object loaded from the `sync_endpoint` to recursively sync additional endpoints (related to the parent object) through a `for_each_handler`
* remove api requests from the individual `sync_$item` methods, instead prefer using the generic `sync_endpoint` method
* allow key_properties to be set to anything other than just `["id"]` for use with pivot table style data
* update config.json and state.json documentation in README.md
* add additional schemas and subsequent loading-code that was missed in the first pass
* additional minor fixes such as
  * incorrect schema attribute names
  * missing attributes
  * incorrect schema attribute type/formats
* minor formatting tweaks
…ompany, otherwise the user receives a 403 response error

* load company metadata from harvest api
* ensure specific features are either enabled or disabled before beginning endpoint syncs
* define default key_properties directly in `load_and_write_schema` method signature
* update version
* fix other minor inconsistencies
@cmerrick
Copy link
Contributor

Hi @StevenMHernandez, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

@cmerrick
Copy link
Contributor

You did it @StevenMHernandez!

Thank you for signing the Singer Contribution License Agreement.

@nick-mccoy
Copy link

This is awesome, @StevenMHernandez!! Thanks for the well thought out, comprehensive updates. We are going to review and test these changes during our next sprint. We'll update soon!

@StevenMHernandez
Copy link
Contributor Author

@nick-mccoy Any update on this? Is there anything I can do on my end to help?

@nick-mccoy
Copy link

Hey @StevenMHernandez -- sorry for the delay -- reviewing this is in our current sprint, so we should be getting to it by next week at the latest.

@StevenMHernandez
Copy link
Contributor Author

Cool, no worries and thanks!

@dmosorast
Copy link
Contributor

dmosorast commented Aug 10, 2018

Hi @StevenMHernandez, I'm taking a look at this to see about getting it merged. Right now, I'm having a bit of trouble with the authorization, and I'm not sure what pattern the tap is using when looking at the Harvest docs on auth.

It looks to be a mix of personal access token (using account_id) and OAuth (using refresh_token), but neither have worked thus far.

Here's the details of what I've done.

Personal Access Token

I've tried creating a personal access token using the developer UI and passing it in the config like so (put it in refresh_token, assuming the name was just off, since that's a required field and there's no special handling around it other than the OAuth access token requests):

{
    "refresh_token": "<personal access token>",
    "account_id": "<id provided with access token>",
    ...
}

but this failed the Auth object's access token request with this partial stack trace.

CRITICAL invalid_grant
CRITICAL The provided access grant is invalid, expired, or revoked (e.g. invalid assertion, expired authorization token, bad end-user password credentials, or mismatching authorization code and redirection URI).

...

  File "/opt/code/tap-harvest/tap_harvest/__init__.py", line 61, in _refresh_access_token
    self._access_token = resp_json['access_token']
KeyError: 'access_token'

OAuth

I created an OAuth app and was able to retrieve a refresh token and access token, and specified them in the config. I also removed the account_id parameter requirement and usage, since their OAuth flow doesn't seem to return that? This gets further, but then I get a 401 (Unauthorized) from the company endpoint.

Have you run into these troubles locally? It would be great if the tap could be used with both of these methods.

@dmosorast
Copy link
Contributor

It appears that the problem was that I wasn't specifying the account_id, but I'm a bit confused about how to get that without creating a personal access token. So I have a bit of research to do.

In the meantime, if you have any thoughts on the above regarding the auth method, and how this flow should work, I'm interested!

@StevenMHernandez
Copy link
Contributor Author

Sorry for the issues with this.

OAuth

Yup, account_id is required, which would make sense why you are getting a 401 Unauthorized. However, I understand from what you are saying, this should be automated. Looking through example code that harvest provides here - https://github.com/harvesthq/harvest_api_samples/tree/master/v2/oauth it appears they are able to retrieve this account_id value automatically, however I am noticing that it is actually using the v1 api endpoints (which defeats the purpose of this pull request! V1 is being deprecated!).

I'm not sure if this is an oversight in the design of the harvest API or if I'm missing something.

Personal Access Token

I hadn't considered this method in this pull request. Do you think this is an additional requirement you all will need to accept this pull request or perhaps something that can be handled independently?

@StevenMHernandez
Copy link
Contributor Author

Actually, it appears the https://id.getharvest.com/api/v2/accounts endpoint accepts just the access_code parameter as authentication which does itself return the account_id required. So it wouldn't be required that the user specifies this id necessarily.

@dmosorast
Copy link
Contributor

Yes, I was just about to comment that! When I query it with Postman and my OAuth token, I get an object like this:

{
    "user": {
        "id": 123456,
        "first_name": "Name",
        "last_name": "Lastname",
        "email": "my-email@me.net"
    },
    "accounts": [
        {
            "id": 98765,
            "name": "Company Name",
            "product": "harvest",
            "google_sign_in_required": false
        }
    ]
}

Potentially, this could be requested by the tap to see which account the access token has been granted for. It would make sense to take in both the access_token and refresh_token from the OAuth flow, request the accounts endpoint, and pull out the first account from that endpoint to run. This could also give the option to extend it to allow for multiple accounts authed with the same token (though I wouldn't suggest adding that in to this PR heh).

Personal Access Token

As for the Personal Access Token method, I was just confused by the Account ID, since that auth method was the only mention of it I could find at the time. OAuth is definitely preferred.

@StevenMHernandez
Copy link
Contributor Author

Ok @dmosorast That last commit allows the tap to automatically retrieve the first account_id.

@dmosorast
Copy link
Contributor

Great! Thanks for the quick turnaround. It's working on my end now. I'll continue with the review and figure out the plans for merging. All in all, it's looking pretty good. I should have something concrete early next week.

@StevenMHernandez
Copy link
Contributor Author

Sure thing and thanks for taking a look into this!

@dmosorast
Copy link
Contributor

dmosorast commented Aug 14, 2018

@StevenMHernandez I've got everything set up to get this merged and released. If you have a Stitch account, can you PM me in the Singer slack @.dmosora and we can see about getting you set up there for testing?

@dmosorast dmosorast merged commit ca87f8b into singer-io:master Aug 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants