Provide a ScrapingClient that doesn't need API access #5

Draft: pR0Ps wants to merge 23 commits into master from feature/standalone-scraping

@pR0Ps (Owner) commented Oct 1, 2020

Also adds the ability to list activities using web scraping instead of the API. The activities are returned as `ScrapedActivity` objects, which are mostly compatible with the normal `Activity` objects returned by the API-based list activities function.

Fixes #4

NOTE: stravalib moving to Pydantic for its models is going to break a LOT of this. Will need some work.

@pR0Ps pR0Ps force-pushed the feature/standalone-scraping branch from b2f0204 to 923a1c3 Compare October 1, 2020 06:03
@pR0Ps pR0Ps force-pushed the feature/standalone-scraping branch from 923a1c3 to 13737f7 Compare January 11, 2022 05:30
@pR0Ps pR0Ps force-pushed the feature/standalone-scraping branch 2 times, most recently from 9a82176 to d8ed33a Compare January 31, 2022 19:55
@pR0Ps pR0Ps marked this pull request as draft February 3, 2022 06:08
Also adds the ability to list activities using web scraping instead of
the API. The activities are returned as `ScrapedActivity` objects that
are mostly compatible with the normal `Activity` objects that are
returned by the list activities function that uses the API.
This should be done by the library consumer if it's needed
It's not going to be perfect, but the idea is that for the most basic of
cases it should be a pretty close replacement. The goal is to keep the
amount of work to support both API and scraping-based clients to a minimum.

To support this, the WebClient now uses delegation instead of
inheritance to add scraper-based functionality. This enables the
`ScrapingClient` class to use the same function names without
automatically overriding the `stravalib.Client` functions when used
through the `WebClient` class.
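
A minimal sketch of the delegation idea described above, not the actual implementation. The class bodies, the `_scraper` attribute, the `get_activity_data` method, and the use of `__getattr__` are all assumptions made for illustration; `Client` here is a stand-in, not the real `stravalib.Client`.

```python
class Client:
    """Stand-in for an API-backed client (hypothetical, not stravalib's class)."""

    def get_activities(self, limit=None):
        return [f"api-activity-{i}" for i in range(limit or 3)]


class ScrapingClient:
    """Stand-in scraper that deliberately reuses the same method name."""

    def get_activities(self, limit=None):
        return [f"scraped-activity-{i}" for i in range(limit or 3)]

    def get_activity_data(self, activity_id, fmt="gpx"):
        # Hypothetical scraper-only method, reachable only via delegation.
        return f"{activity_id}.{fmt}"


class WebClient(Client):
    """Inherits the API methods and delegates everything else to a scraper."""

    def __init__(self):
        self._scraper = ScrapingClient()

    def __getattr__(self, name):
        # __getattr__ is only called when normal lookup fails, so methods
        # defined on Client are never shadowed by the scraper's versions.
        if name == "_scraper":
            raise AttributeError(name)
        return getattr(self._scraper, name)
```

With this layout, `WebClient().get_activities()` still resolves to the API version, while a standalone `ScrapingClient` can offer the same name backed by scraping.
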
The default used to be to just download the JSON blob. It was changed to
request the GPX format instead since this is a more standardized format
for an activity.
Now accepts (but ignores) parameters that the `stravalib` version accepts
 - Make pagination actually work (forgot to increment page number)
 - Handle stopping based on the `before` param
 - Properly handle workout types
 - Move models to a separate file
 - Add more detailed scraping of activity details
 - Add more detailed scraping of bike data
 - Tweak LazyLoaded
 - Add scraping for challenges
 - Tweak gear access
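
The pagination and `before` fixes in the list above could look roughly like the following sketch. It is written under stated assumptions rather than taken from the PR: `fetch_page` is a hypothetical callable that returns one page of activity dicts, each dict has a `start_date` key, and pages are assumed to be ordered oldest first so that reaching `before` is a valid stopping point.

```python
from datetime import datetime


def iter_activities(fetch_page, before=None, limit=None):
    """Yield activities page by page until exhausted, `before`, or `limit`."""
    page = 1
    yielded = 0
    while True:
        batch = fetch_page(page)
        if not batch:
            return  # an empty page means there is nothing left to fetch
        for activity in batch:
            if before is not None and activity["start_date"] >= before:
                return  # assumed oldest-first order: nothing later qualifies
            if limit is not None and yielded >= limit:
                return
            yield activity
            yielded += 1
        page += 1  # the increment that is easy to forget
```

Without the final `page += 1`, the loop would request the first page forever, which matches the bug named in the first bullet.
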
BeautifulSoup v4.9.0 changed how `.text` works for `<script>` tags (i.e.
it no longer works at all), breaking parsing.

See https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/revision/564

Successfully merging this pull request may close these issues.

Allow using the library without API access